| Age | Commit message (Collapse) | Author |
|
* Include a "stack trace" with nested-import errors
When errors occur in nested `#include` files it is often helpful to have a "stack trace" / traceback of the `#include` chain that led from a root translation unit to the file with an error.
This change implements a similar feature for `import`s.
It is worth noting that `import`s don't really *require* this kind of compiler support the way `#include`s do because the intention is that the meaning of an `import`ed file does not depend on the order or nesting of `import`s. As such, when trying to *fix* an error in an `import`ed file, you usually don't care how it came to be `import`ed into your shaders.
The use case here is somebody adapting a large body of Slang code to use in a different codebase, such that they have certain `.slang` files they don't actually intend to have compile correctly, and they want to be able to diagnose how they came to include those files when/if they cause problems.
The actual feature implementation is pretty simple because we already track a stack of active `import`s so that we can detect and diagnose recursive `import`s. This change simply changes the disagnostics when there is an error in imported code so that instead of just noting the inner-most `import` site it lists all the `import` sites that were active at the time.
The change includes a test case to confirm that the behavior works (at least for the case of a parse error).
* fixup: test outputs
Co-authored-by: Yong He <yonghe@outlook.com>
Co-authored-by: jsmall-nvidia <jsmall@nvidia.com>
|
|
If the user has a derived `struct` type:
```hlsl
struct Base { int b = 1; }
struct Derived : Base { int d = 2; }
```
Then it is still reasonable for them to want to use initializer lists when declaring variables using the `Derived` type:
```hlsl
Derived x = {};
Derived y = { 7, 8 };
```
This change implements two missing pieces of functionality in the Slang compiler to allow this case:
* First, when the front-end semantic checks are applied to an initializer list, if the type being initialized is a derived `struct` type it always expects to find initialization arguments for its base type before those for its fields.
* Second, when lowering an initializer-list expression from the AST to the IR, the compiler expects the first argument in the list to be the initial value for the base field (if any). This also applies to default-initialization of fields/variables.
This change slightly entangles front-end logic with the logic for how struct inheritance is lowered to the IR, but the behavior is unlikely to confuse users who expect C++-like layout.
It is worth noting that with this change it should be possible to initialize the base type using either a nested initializer list or flat arguments:
```hlsl
struct BigBase { int x; int y; int z; }
struct BigDerived : BigBase { int w; }
BigDerived a = { {1,2,3}, 4 };
BigDerived b = { 1, 2, 3, 4 };
```
This behavior should Just Work because of the existing C-like rules for initializer lists where an aggregate can be initialized by either a `{}`-enclosed block or distinct values for its leaf fields.
|
|
During lowering from AST to IR, the Slang compiler translates code that uses `struct` inheritance:
```hlsl
struct Base { int a; }
struct Derived : Base {}
```
into code where the inheritance relationship is "witnessed" by a simple field:
```hlsl
struct Base { int a; }
struct Derived { Base __anonymous_field__; }
```
The underlying bug here is that the `__anonymous_field__` that the compiler generated during IR lowering was not being given any linkage decorations (no mangled name). As a result, if multiple separately-compiled modules all access that field they could disagree on its identity as an IR instruction. This could lead to output code being generated where the declaration of `__anonymous_field__` uses one IR instruction, but accesses use another.
This change includes a fix for the issue, and a test that serves as a reproducer for the original problem.
|
|
The basic bug here was that `enum` types with an explicit tag type:
enum Color : int32_t { ... }
would have an `InheritanceDecl` implying that `Color` inherits from
`int32_t`. The problem is that this is *not* actually an inheritance
relationship, since a `Color` needs to be explicitly cast to/from an
`int32_t`.
Various parts of the compiler currently treat this case like real
inheritance, and as a result the operations taht would apply to an
`int32_t` end up applying to a `Color` as well. This particularly leads
to an ambiguity between applying the `==` operator, because it has
overloads for both the `__EnumType` and `__Builtin{something}`
interfaces.
The fix here is to explicitly exclude the `InheritanceDecl` that
represents an enumeration tag type when considering declared subtype
relationships. A more complete version of this fix would need to go
through all places in the code where `InheritanceDecl`s are used and
make sure that any places using them for true inheritnace relationships
ignore those that represent an enumeration tag type.
(An alternative option would be to use a distinct kind of `Decl` to
represent the tag-type relationship, perhaps even going so far as to
modifying the type of the relevant AST node as part of semantic
checking)
This change includes a regression test for the way this bug surfaced in
user code.
Co-authored-by: jsmall-nvidia <jsmall@nvidia.com>
|
|
Introduction
============
Several of our target platforms share a concept of "opaque" types, including resources (`Texture2D`) and samplers (`SamplerState`), which are restricted in how they can be used. GLSL and SPIR-V place very severe restrictions, in that opaque types cannot be used for the type of:
* (mutable) local variables
* (mutable) global variables
* structure fields
* Function result/return
* `out` or `inout` parameters
The HLSL language allows all of these cases, but with the practical caveat that the compiler front-end must be able to statically analyze how opaque types have been used and "optimize away" all of the above cases. For example, it is legal to have a local variable of an opaque type, but at any point where the variable gets used it must be statically known which top-level shader parameter the variable refers to.
Existing Work
=============
In the Slang compiler we need to implement our own passes to detect these "illegal" uses of opaque types and legalize them. The work is basically broken into two distinct steps:
* The existing `legalizeResourceTypes()` pass detects illegal types (e.g., a `struct` that has a field of type `Texture2D`) and replaces them with legal types, sometimes by splitting apart declarations (e.g., a parameter using such a `struct` type gets split into multiple parameters). At a high level, we can think of this as "exposing" opaque types so that they are not hidden inside of nested structures.
* Next, the `specializeResourceOutputs()` pass detects calls to functions that output opaque types (whether by the function return value of `out` / `inout` parameters). The pass analyzes the body of such functions, and tries to isolate the logic that determines their resource-type outputs and hoise that logic into call sites (so that the opaque-type outputs can then be eliminated).
This Change
===========
One important missing case was that the type legalization step was incapable of legalizing types that appear in the result/return type of functions. The existing logic would simply diagnose an internal/unimplemented error if it ecountered a non-simple type in the return position.
At a high-level, supporting this case seems simple enough. Given a function signature like:
```
struct Things { int a; Texture2D b; }
Things myFunc(int x) { ... }
```
we want to split the result type into an "ordinary" result type and then `out` parameters for any opaque-type fields:
```
struct Things_Legal { int a; }
Things_Legal myFunc(int x, out Texture2D result_b) { ... };
```
Similarly, at a call site to a function like this:
```
Things t = myFunc(99);
```
we split the function result into ordinary and opaque-type parts, and pass the latter as `out` parameters:
```
Texture2D t_b;
Things_Legal t = myFunc(99, /*out*/ t_b);
```
The main place where things get tricky is when dealing with `return` sites within the body of a function that needs legalization:
```
Things myFunc(int x) {
...
Things things = ...;
...
return things;
}
```
In theory the answer is simple: a `return` translates into writes to the `out` parameters for any opaque-type data, followed by a return of the ordinary-type part:
```
Things_Legal myFunc(int x, out Texture2D result_b) {
...
Things_Legal things = ...;
Texture2D things_b = ...;
...
result_b = things_b;
return things;
}
```
The sticking point here is that this step requires tracking data between the legalization of the parameter list for `myFunc` and legalization of the `return`s in its body, so that we can identify the `result_b` parameter to be able to write to it. The existing type legalization pass was not built with the idea that such communication is commonly needed; it assumes that each instruction can be legalized in isolation, so long as dependencies are respected.
This change adds logic such that the `legalizeFunc()` step sets up a data structure that it used to represent information about how a function (and its parameter list) got legalized, so that the logic for a `return` can make use of that legalized information. Right now the information we track consists of just the list of parameters that were introduced to represent a return/result type.
Testing
=======
In order to confirm what features do/don't work, I added a set of tests that cover a cross-product of opaque type use cases:
* The opaque type can be used in the function result type, an `out` parameter, or an `inout` parameter
* The opaque type can be used "directly" or nested inside a `struct`.
These tests are helpful to make sure we handle the most important cases, but it is worth noting that the coverage is still lacking in that we do not sufficiently test all the options for what the function body might do. An opaque-type function result could be derived from many different sources:
* It could be a global shader parameter
* It could be an `in` or `inout` parameter of the function itself
* It could be wrapped up in one or more structure types
* It could be wrapped up in one or more array types (such that the output of specialization needs to pass around array indices)
* It could involve use of the type as a local variable (including passing it into other functions with result/`out`/`inout` outputs of opaque types)
This change makes it so that we can handle the simplest cases involving result/return types with a wrapper `struct`, and adds test cases that confirm we handle several other cases for `out` and `inout` parameters. Gaining confidence that we cover all the cases that arise in practical shaders will require more work over following changes.
|
|
* Update gfx back-ends to handle static specialization
The main goal here is to make the D3D11, D3D12 and Vulkan back-ends support static specialization of interface types in the case where the data for the type won't "fit" in the pre-allocated space for existential values. This includes all cases where the concrete type being specialized to has resources/samplers/etc., as well as any cases where its ordinary/uniform data exceeds the space available.
(Note that the CPU and CUDA targets don't need this work since they can (in theory) support arbitrary-size data in the fixed-size existential payload by using pointer indirection. Actually supporting indirection in those cases should be a distinct change)
The Slang compiler already performs layout for programs that have this kind of data that doesn't "fit," and it lays them out using an idea of "pending" type layouts. Basically, a type that contains some amount of specialized interface-type fields will produce both a "primary" type layout that just covers the data for the unspecialized case, as well as "pending" type layout that describes the layout for all the extra data needed by specialization.
When laying out a `ConstantBuffer<X>` or `ParameterBlocK<X>` ("CB" or "PB"), the front-end will try to place as much of that "pending" data into the layout of the buffer/block itself as is possible. That means that both CBs and PBs will be able to allocate trailing bytes for any ordinary data in the "pending" layout. PBs will be able to allocate any trailing resources/samplers into their layout, but for CBs they will spill out to be part of the pending layout for the buffer itself.
In order for the back-ends to properly handle pending data, they need to *either* assume the exact layout rules used by the front-end and try to reproduce them (e.g., by iterating over binding ranges and sub-objects in the exact same order that front-end layout would enumerate them), *or* they need to respect the reflection information produced by the front-end. This change takes the latter approach, trying to make only minimal assumptions about the layout rules being used. This choice is motivated by wanting to decouple the `gfx` implementation from the compiler front-end, especially insofar as this work has made me question whether the current layout rules are the best ones possible.
A common theme across all the implementations is to have a fixed-size type that can represent "binding offsets" for the chosen back-end. The offset type has fields that depend on the API-specific way bindings are indexed; e.g., for D3D11 it has offsets for CBV, SRV, UAV, and sampler bindings. This fixed-size offset type can be filled in based on Slang reflecton information, and then used to compute derived offsets with just a few add operations.
The simple offset type for each API is then extended to produce an offset type that includes both the offsets for "primary" data and also the offsets for "pending" data. Most logic that traffics in offsets doesn't have to know about this more complicated representation.
Making consistent use of these offsets required that I pretty much rewrite the logic that actually applies shader objects to the API state. Doing so might be lowering the efficiency of the system in the near term, but the increase in clarity was important for getting the work done, and it seems like it will also be important if/when we start trying to perform special-case optimizations around root and entry-point parameter setting.
While there are many API-specific differences, we can identify a repeated pattern where many steps, whether applying parameters to the pipeline stage or constructing signatures / layouts, can be broken down into three main operations on `ShaderObject`s or their layouts:
* `*AsValue()` is the core operation, and is the one used for the `ExistentialValue` case most of the time. It ignores the ordinary data in the object, and instead processes all nested binding ranges (for resources/smaplers) and sub-objects.
* `*AsConstantBuffer()` handles the `ConstntBuffer<X>` case, by dealing with the implicit buffer for ordinary data (if it is needed) and then delegates to the `*AsValue()` case.
* `*AsParameterBlock()` handles the `ParameterBlock<X>` case, by allocating/preparing/etc. any descriptor tables/sets that would be required for the current object/layout and then delegating to `*AsConstantBuffer()` to do the rest
The idea is that by having the parameter block case delegate to the constant buffer case, which delegates to the value/existential case, we can streamline a lot of the logic so that it doesn't seem quite as full of special cases.
Note: When preparing this pull request I spent a reasonable amount of time trying to clean up the D3D11 and Vulkan implementations, so they are probably the easiest to read and understand when it comes to the new code. Doing the cleanup work also helped to work out some weird corner case bugs/issues. In contrast, the D3D12 path hasn't had as much attention given to cleanliness and comments, so it really needs some attention down the line to get things into a state that is easier to understand.
* fixup: remove debugging code spotted in review
|
|
* Append proper suffixes to 16-bit literals for GLSL
The GLSL output path wasn't putting suffixes on literals of 16-bit types, and that was leading to compilation errors in downstream `glslang`. This change adds the suffixes defined by `GL_EXT_shader_explicit_arithmetic_types`.
This change also wraps up 8-bit literals so that they are emitted as, e.g., `int8_t(1)` instead of just `1`, to make sure we don't have implicit conversions in the output GLSL that weren't implicit in the Slang IR. We similarly wrap floating-point special values like infinities in their desired types when the type is `float` (e.g., `double(1.0 / 0.0)` for a double-precision infinity).
Note: Standad IEEE 754 half-precision doesn't provide an encoding for infinite or not-a-number values, so it might be considered an error if we emit `half(1.0 / 0.0)` but there really isn't a significantly better alternative for us to emit.
* fixup
|
|
The original goal of this change was to streamline the `TEST_INPUT` system by eliminating options that are no longer relevant once we have eliminated the non-shader-object execution paths. The result is more or less a re-implementation/refactor of the logic around how input is parsed and represented, that tries to set things up for a more general sytem going forward.
The main changes isthat the `ShaderInputLayout` no longer tracks a simple flat list of `ShaderInputLayoutEntry` (that is a kind of pseudo-union of the various buffer/texture/value cases), and it instead uses a hierarchical representation composed of `RefObject`-derived classes to represent "values."
There are several "simple" cases of values
* Textures
* Samplers
* Uniform/ordinary data (`uniform`)
* Buffers composed of uniform/ordinary data (`ubuffer`)
Then there are composed/aggregate values that nest other values:
* An *aggregate* value is a set of *fields* which are name/value pairs. It can be used to fill in a structure, for example.
* An *array* value is a list of values for the elements of an array. It can be used to fill out an array-of-textures parameter, for example.
* A combined texture/sampler value is a pair of a texture value and a sampler value (easy enough)
* An *object* holds an optional type name for a shader object to allocate (it defaults to the type that is "under" the current shader cursor when binding), and a nested value that describes how to fill in the contents of that object
Finally there are cases of values that are just syntactic sugar:
* A `cbuffer` is just shorthand for creating an object value with a nested uniform/ordinary data value
The big idea with this recursive structure is that it gives us a way to handle more arbitrary data types with name-based binding. Supporting this new capability requires changes to both how input layouts get parsed, and also how they get bound into shader objects.
On the parsing side, things have been refactored a bit so that parsing isn't a single monolithic routine. The refactor also tries to make it so that the various options on an input item (e.g., the `size=...` option for textures) are only supported on the relevant type of entry (so you can't specify as many useless options that will be ignored).
The bigger change to parsing is that it now supports a hierarchical structure, where certain input elements like `begin_array` can push a new "parent" value onto a stack, and subsequent `TEST_INPUT` lines will be parsed as children of that item until a matching `end` item. This approach means that we can now in principle describe arbitrary hierarchical structures as part of test input without endlessly increasing the complexity of invididual `TEST_INPUT` lines.
On the binding side, we now have a central recursive operation called `assign(ShaderCursor, ShaderInputLayout::ValPtr)` that assigns from a parsed `ShaderInputLayout` value to a particular cursor. That operation can then recurse on the fields/elements/contents of whatever the cursor points to.
Major open directions:
* With this change it is still necessary to use `uniform` entries to set things like individual integers or `float`s and that is a little silly. It would be good to have some streamlines cases for setting individual scalar values.
* Further, once we have a hierarchical representation of the values for `TEST_INPUT` lines, it becomes clear that we really ought to move to a format more like `TEST_INPUT: dstLocation = srcValue;` where `srcValue` is some kind of hierarchial expression grammar. Refactoring things in this way should make the binding logic even more clear and easy to understand. The refactored parser should make parsing hierarchical expressions easier to do in the future (even if it uses the push/pop model for now)
* One detailed note is that the representation of buffers in this change is kind of a compromise. Just as an "object" value is a thin wrapper around a recursively-contained value for its "content" it seems clear that a buffer could be represented as a wrapper around a content value that could include hierarchical aggregates/objects instead of just flat binary data (this would be important for things like a buffer over a structure type that lays out different on different targets). The main problem right now with changing the representation is actually needing to compute the size of a buffer based on its content, so that can/should be addressed in a subsequent change.
Details:
* The base `RenderTestApp` class and the `ShaderObjectRenderTestApp` classes have been merged, since the hierarchy no longer serves any purpose.
* Disabled the tess that rely on `StructuredBuffer<IWhatever>` because they aren't really supported by our current shader object implementation
* Replaced used of `Uniform` and `root_constants` in `TEST_INPUT` lines with just `uniform`
* Removed a bunch of uses of `stride` from `cbuffer` inputs, where it wasn't really correct/meaningful
* Added the `copyBuffer()` operation to VK/D3D renderers, along with some missing `Usage` cases to support it.
* Made `ShaderCursor` handle the logic to look up a name in the entry points of a root shader object, rather than just having that logic in `render-test`. (We probably need to make a clear design choice on this issue)
|
|
* Remove old code paths from render-test
Historically, the `render-test` tool was using three different code paths:
* One based on `gfx` and manual (non-reflection-based) parameter setting, used for OpenGL, D3D11, D3D12, and Vulkan
* One for CPU that used reflection-based parameter setting but shared no code with the first
* One for CUDA that used reflection-based parameter setting and shared some, but not all, code with the CPU path
Recently we've updated `render-test` to include a fourth option:
* Using `gfx` and the "shader object" system it exposes for a unified reflection-based parameter-setting system taht works across OpenGL, D3D11, D3D12, Vulkan, CUDA, and CPU
This change removes the first three options and leaves only the single unified path. A sa result, a bunch of code in `render-test` is no longer needed, and the codebase no longer relies on things like the `IDescriptorSet`-related APIs in `gfx`.
Several existing tests had to be disabled to make this change possible. Those tests will need to be audited and either re-enabled once we fix issues in the shader object system, or permanently removed if they don't test stuff we intend to support in the long run (e.g., global-scope type parameters, which aren't a clear necessity).
* fixup: CUDA detection logic
|
|
This change originally started with the simple goal of allowing generic functions with default argument values on their parameters to work:
```
void someFunction<T>(T value, int optional = 0);
```
The core problem there was that the compiler code was (correctly) anticipate the case where the default argument value for a parameter depends on a generic parameter, such as:
```
interface IDefaultable { static This getDefault(); }
void anotherFunction<T : IDefaultable>(T first, T second = T.getDefault());
```
Supporting this latter case requires some kind of ability to apply subsitutions to an `Expr`, but our compiler logic simply errored out in that case. The first major fix that went into this change was to add a new `SubstExpr<T>` type that behaves a lot like `DeclRef<T>` in that it stores a `T*` plus a set of substititions that need to be applied to it.
In addition, it was found that even if `anotherFunction<ConcreteType>(...)` might work, when generic argument inference was used for just `anotherFunction(...)` would fail because it includes a strict match on the number of arguments/parameters in the call expression.
The next problem that arose was that the test I'd created used an interace with an `__init` requirement, and it appeared that our code generation didn't work for that case:
```
interface IStuff { __init(int val); }
void f<T : IStuff>(T x = T(0));
```
In this case, the `T(0)` initialization would get compiled to `(ConcreteType) 0` in the output rather than calling the function generated for the `__init` inside `ConcreteType`. The basic problem there was a bit of crufty old logic we have in place to work around the large number of `__init` declarations in the stdlib that don't have proper `__intrinsic_op` modifiers on them. We really need to fix the underlying problem there, but I worked around it by having the IR lowering pass only do its workaround magic on stdlib declarations.
The next problem down this line was that my test had two different `__init` declarations in the concrete type and the logic for checking interface conformance was picking the wrong one to satisfying an interface requirement despite it being obviously wrong (not even the right number of parameter).
This last problem led me down the rabbit-hole of trying to actually get our semantic checking for interface requirements right. There were a few pieces to that work:
* Actually checking that the parameter and result types for two callables match is the simple part. If that was all that would be required we would have implement this logic a long time ago.
* Next we have to deal with functions that make use of the `This` type, associated types, etc. We have to know that when the interface uses `This`, we want to treat that as equivalent to `ConcreteType`, and similarly for associated types. Getting that working is mostly a matter of setting up a this-type subsitution for the interface member being checked.
* Finally, when comparing generic declarations like `IBase::doThing<T>` and `Derived::doThing<U>` we need to deal with the way that `T` and `U` represent the "same" logical type parameter, but are distinct `Decl`s. This is handled by specializing the base declaration to the parameters of the derived one (e.g., forming `IBase::doThing<U>` using the `U` from `Derived::doThing`).
The result seems to be passing our tests, but there are still a few gotchas lurking, I'm sure.
|
|
* Support `bit_cast` between complex types.
* Fix vs project file
* Fix clang build error
* fix
* fix
* Fix
* FIx
* Fix
* Fix
* Fix
* Fix
* Fix linux compile error
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|
|
The basic feature here is the ability to use the `&` operator to produce the conjunction/intersection of two interfaces. That is, you can have interfaces:
interface IFirst { int getFirst(); }
interface ISecond { int getSecoond(); }
and if you need a generic function where the type parameter `T` must conform to *both* of these interfaces, you express that by constraining the parameter to the intersection of the interfaces:
void someFunction<T : IFirst & ISecond>(T value) { ... }
Without this feature, the main alternative an application would have is to define an intermediate interface, like:
interface IBoth : IFirst, ISecond {}
Forcing users to deal with an intermediate interface creates more work for type authors (they need to remember to inherit from the right combined interface(s)), or for `extension` authors (when you add `ISecond` to a type that used to just support `IFirst`, you had better also add `IBoth`). In the worst case, a family of N related "leaf" interfaces would give rise to an exponential number of intermediate interfaces to represnt the possible combinations.
A conjunction like `IFirst & ISecond` is officially its own type, and can be used to declare a type alias:
typealias IBoth = IFirst & ISecond;
This change only includes the first pass of work on this feature, so there are several caveats to be aware of:
* Using a conjunction as part of an inheritance clause is not yet supported (e.g., `struct X : IFirst & ISecond`). This is true even if the conjunction was introduced by an intermediate `typealias`
* The `&` syntax introduced here is only parsed in places where only a type (not an expression) is possible. This means you cannot do things like cast to a conjunction with `(IFirst & ISecond)(someValue)`.
* This work *should* apply to conjunctions of more than two interfaces (like `IA & IB & IC`) but that has not yet been tested
* In the long run it may be sensible to allow conjunctions that use concrete types, but we really ought to have the semantic checking logic rule that out for now.
* During testing, I encountered compiler crashes when trying to use this feature together with `property` declarations. Further investigation and debugging is called for.
* The handling of conjunction types is currently incomplete, in that there are many equivalences the compiler does not yet understand. For example, it is clear that `IA & IB` is equivalent to `IB & IA`, but the compiler currently does not understand this and will treat them as different types. A deeper implementation approach is called for.
* Conjunctions are currently only supported for generic type parameter constraints, when performing full specialization. Use of conjunctions for existential-type value parameters or with dynamic dispatch is not yet supported.
|
|
This change converts a large number of our existing tests to use the `ShaderObject` support that was added to the `gfx` layer.
In many cases, tests were just updated to pass `-shaderobj` and the result Just Worked.
In other cases, a `name` attribute had to be added to one or more `TEST_INPUT` lines.
For tests that did not work with shader objects "out of the box," I spent a little bit of time trying to get them work, but fell back to letting those tests run in the older mode.
Future changes to the infrastructure will be needed to get those additional tests working in the new path.
Along with the changes to test files, the following implementation changes were made to get additional tests working:
* Because the shader object mode uses explicit register bindings (from reflection), the hacky logic that was offseting `u` registers for D3D12 based on the number of render targets gets disabled (by another hack).
* The "flat" reflection information coming from Slang was not correctly reporting "binding ranges" for things that consumed only uniform data (which would be everything on CUDA/CPU), so it was refactored to properly include binding ranges for anything where the type of the field/variable implied a binding range should be created (even if the `LayoutResourceKind` was `::Uniform`).
* A few fixes were made to the CUDA implementation of `Renderer`, in order to get additional tests up and running. Most of these changes had to do with texture bindings, which hadn't really been tested previously.
In addition, a few changes were made that were attempts at getting more tests working, but didn't actually help. These could be dropped if requested:
* As a quality-of-life feature (not being used) the `object` style of `TEST_INPUT` line is upgraded to support inferring the type to use from the type of the input being set.
* Any `object` shader input lines get ignored in non-shader-object mode.
|
|
The big picture here is that an `extension` can now apply to an interface type and provide convenience methods for all types that implement that interface. Suppose you have an interface for counters:
interface ICounter { [mutating] void add(int val); }
and a type that implements it:
struct SimpleCounter : ICounter { int _state = 0; ... }
If a common operation in your codebase is to increment a counter by adding one, you would be faced with the problem of either:
* Add the `increment()` operation to `ICounter`, and force every implementation to implement the new requirement
* Add the `increment()` operation to concrete counter types as needed, and thus not be able to use it in generic code
* Make `increment()` a global ("free") function, and force clients of counters to have to know which operations use member syntax (`c.add(...)`) and which use global function call syntax (`increment(c)`).
The whole idea of `extension`s is to allow for another option that is better than all of the above:
extension ICounter { [mutating] void increment() { this.add(1); } }
The core of the implementation is relatively straightforward, and consists of two complementary pieces.
The first piece is that when emitting a concrete IR entity (function/type/whatever) we treat any enclosing `interface` type (or `extension` thereof) a bit like an enclosing `GenericDecl`, and introduce an `IRGeneric` to wrap things. The generic `IRGeneric` has parameters representing the `This` type for the interface, along with the witness table that shows how `This` conforms to the interface itself.
We thus end up with an IR version of `increment()` something like:
void increment<This : ICounter>(This this) { this.add(1); }
The second (complementary) fix is that when there is code that references this `increment()` operation, we don't treat it like an interface requirement (look up based on its key), and instead treat it like a generic (since that is how it is lowered now) and speciaize it to the information we can glean from the `ThisTypeSubstitution`.
A related fix that is required here is that within the body of `increment`, when we perform `this.add`, we need to ensure that the lookup of `add` in the base interface properly takes into account the subtype relationship (`This : ICounter`) and encodes it into the lookup result, so that we get `((ICounter) this).add`, and properly generate code that looks up the `add` method in the witness table for `This`.
|
|
* Another fix for overriding property decls
The central problem we keep running into with `property` decls in `interface`s comes down to two choices:
1. When a member lookup `obj.someName` or a simple lookup for `someName` produces an overloaded result, we make no attempt to resolve the overloading right away, and instead postpone disambiguation until the point where that expression gets *used*, in case the context where it gets used can help in disambiguation (a notable case being when there is a call expression `obj.someName(...)` or `someName(...)`).
2. When looking up members in a the scope of a type (either for `obj.someName` or `someName` in the context of a method), we include all results from base types in the set of overloads returned, even in cases where the type has a direct member that "overrides" the inherited one.
The combination of these factors means that when a `struct` type implements a `property` to satisfy a requirement of an inherited `interface`, then references to `obj.someProp` end up being ambiguous between the property in the concrete `struct` type and the property it inherits through the `interface`.
There is no quick fix possible for issue (2). It might seem that we could just skip over members inherited through `interface`s when doing lookup in a type, but that solution wouldn't apply to inheritance from another `struct` type, or any future scenario where we support default implementations of methods in interfaces. The simple idea of saying that a derived-type member named `M` hides all inherited members named `M` is possible, but would lead to a bad user interface when a type wants to support both a core "bottleneck" method and a bunch of convenience overloads with the same name.
That leaves us with issue (1), and trying to find a reasonable fix for it. The common case is that any expression `e` eventually gets used in a context where it will be be subject to disambiguation:
* If we form a call expression `e(...)`, then the overload resolution logic will (obviously) work to disambiguate which `e` was meant.
* If `e` is used as an argument to another call (`f(... e ...)` or `... + e`), then `e` will be coerced to the expected parameter type for its argument position, and that coercion will disambiguate it (this is the bit that was fixed in #1501)
* If `e` is used in another context where a type is expected/known, it will also be coerced: `if(e)`, `int v = e`, etc.
The problem case that is left behind is any scenario where `e` is not subject to one of the above resolution cases, which mostly amounts to cases where an expression is never coerced to a single fixed type. There are a few important cases where this occurs today:
* When the expression is used as the left-hand side of an assignement (`e = ...`).
* When an expression is used to initialize a variable with an implicit type (`let v = e`).
* When inferring generic arguments from the value arguments at a call site (`f(e)` where `f` is defined as `f<T>(T v)`)
The key connecting thread in each of these cases is that the front-end needs to determine the type of `e` to make progress.
Our semantic checking logic already has functions that try to draw a distinction between the two cases:
* The `CheckTerm()` operation is supposed to be used when we expect that we will eventually coerce or otherwise diambiguate the term, and also in cases where we don't yet know if a term should name a type or a value
* The `CheckExpr()` operation is supposed to be used when we do not expect that we will apply coercion/disambiguation to a term, and need to have assurances that it has been coerced into a non-overloaded expression with a reasonable type
The simple part of the fix made here is to make `CheckExpr()` actually do part of what it is suppsoed to (attempt to disambiguate overloaded terms), and then audit all the call sites to `CheckExpr()` to make sure they are actually ones that intend to opt into that logic.
The messier part of the fix is dealing with generic argument inference, because we need to extract the type of the disambiguated expression for the purposes of inference, but we don't want to disturb the actual argument list at a call site (because type coercion of the arguments is supposed to handle the disambiguation). This part is done with a bit of special-casing in the overload-resolution context, by adding a method that gets the type or an argument after disambiguation (when possible).
* fixup
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
The basic idea is that if you have a namespace:
namespace MyCoolNamespace { void f() { ... } ... }
then you can bring the declarations from that namespace into scope with:
using MyCoolNamespace;
f();
The `using` construct is allowed in any scope where declarations are allowed. As an additional feature, the construct allows and then ignores the keyword `namespace` if it occurs right after `using`:
using namespace MyCoolNamespace;
Note that unlike in C++, `using` a namespace inside another namespace doesn't implicitly make the symbols available to clients of that namespace:
namespace hidden { void secret() {...} ... }
namespace api { using hidden; ... }
api.secret(); // ERROR: `secret()` isn't a member of `api`
The implementation of this feature was relatively simple, although it does leave out more advanced features that might be desirable in the future:
* No support for `using MCN = MyCoolNamespace` sorts of tricks to define a short name
* No support for `using` anything that isn't a namespace (e.g., to make the members of a type available without qualification)
* No support for cases where multiple visible modules have a namespace of the same name (or dealing with overloaded namespaces in general)
|
|
Our current lookup process always finds *all* members of a type, which can include both an inherited member (e.g., from an `interface`) and one that logically overrides/implements it. If something downstream doesn't filter this result down and favor the derived member, then an ambiguity error will result.
To date, this has mostly been a non-issue because we haven't emphasized inheritance, and the main case we did support (`struct` types implemented `interface` methods) gets disambiguated as part of overload resolution for function calls.
Recent changes to support `property` declarations to `interface`s add the possibility for ambiguity between a "base" and "derived" declaration that can't rely on overload resolution for disambiguation.
The approach in this PR is to add disambiguation logic to the other main place where the results of lookup get used. If a lookup result is being assigned to a variable, passed to a function, or otherwise used in a case where a value of a specific type is needed, it will be "coerced" to the desired type. This change makes it so that the first step in the coercion logic is to try to disambiguate the expression that is being coerced.
In order to ensure that an overloaded expression can be detected and resolved even when just checking if coercion is possible, I needed to update the `canCoerce*()` functions to also take the expression that is being tested for coercibility, and not just its type. There is only one case (that I saw) where coercion checks were being made without an expression value available, and that case didn't actually need/want to handle overloading.
In order to test the fixes here, I added logic to the `property`-in-`interface` test to make sure that the critical cases work as expected (references to a derived member using "dot syntax" and "implicit `this`" syntax).
Alternatives Considered
-----------------------
The first attempt at this fix took a simpler approach: I added the disambiguation logic as a post-process on member lookup. That is, given `obj.foo` I would take the `LookupResult` for `foo` and immediately try to filter it to include only the most-derived members. This approach has the major benefit of catching even more use cases of values (and thus helping to ensure that we don't spend forever chasing down more of these ambiguity errors), but it also has two critical problems:
1. If we only trigger disambiguation when looking up `obj.foo`, then we can't do anything to help when `foo` is looked up as an ordinary identifier, but is actually equivalent to `this.foo`. A full fix would require doing this disambiguation on *every* name lookup, which leads to the second issue:
2. It is important that for a method call like `obj.m(...)` we do *not* disambiguate when looking up `obj.m`, and instead let the overload resolution for the call resolve things. That choice is what makes it possible to call an inherited `m` declaration even when there is a derived `m` with a different signature.
Issue (1) is covered by the test case that was added here, but we should probably have a test case for (2) to make sure we don't break that use case.
Caveats
-------
An important case that we don't solve in this PR is when the result of a lookup is captured in a variable without an explicit type:
let f = obj.foo;
That case also needs disambiguation, and should be addressed in a later change.
A secondary issue is that our approach to prioritizing declarations during lookup is still quite naive. We really need a way for lookup to attach information about nesting of scopes to results (to be clear that results from inner scopes should be preferred over those from outer scopes), as well as have a robust mechanism for comparing the priority of members based on the inheritance graph of a type. This change doesn't do anything to make the situation better or worse.
|
|
The basic problem here was that in a declaration like:
```hlsl
enum Color : uint { Red, Orange, ... }
```
The `: uint` bit is represented as an `InheritanceDecl`, because that is what we use to represent the syntactic form of inheritance clauses like that. At the point where we parse the `InheritanceDecl` we don't yet know whether it represents a base interface or a "tag type" like `uint` in this case.
The root problem that is then created is: an `enum` type is *not* a subtype of its "tag type," and treating it like a subtype can create problems.
The main problem that arises is that looking in a type like `Color` will find both the members of color *and* the members of `uint`. In the case of things like `__init` declarations, that creates a problem where the `Color` type has two different `__init`s that take a `uint`:
* The one it inherits from `uint` via that `InheritanceDecl` (even though it shouldn't)
* The one it gets via an extension just for conforming to `__EnumType` (a non-user-exposed `interface` in the standard library)
Because both of those `__init`s are inherited, neither is preferred over the other one and they create an ambiguity if somebody tries to write:
```hlsl
uint u = ...;
Colorc = Color(u);
```
The solution used in this PR is to add a compiler-internal modifier to the `InheritanceDecl` that introduces a "tag type" to an `enum`, in an early phase of checking (one of the ones that occurs before it is legal to enumerate the bases of a type). Then the lookup process is modified to ignore `InheritanceDecl`s with that modifier when doing lookup in super-types (since the declaration does *not* indicate a subtype/supertype relationship).
This appears to get the basic feature working again, although it is possible that there are other parts of the compiler that use `InheritanceDecl`s and mistakently assume that all `InheritanceDecl`s introduce subtype/supertype relationships. We probably need to do a significant audit of the code to start being more clear about the nature of the relationships such declarations introduce. Such steps are left to future changes.
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
There are two main features in this change. First, we allow for `interface`s to declare `property` requirements, which can be satisfied by matching `property` declarations in a type that conforms to the interface:
interface IRectangle
{
property float width { get; }
property float height { get; }
}
struct Square : IRectangle
{
float size;
property float width { get { return size; } }
property float height { get { return size; } }
}
Second, we allow a type to satisfy a `property` requirement with an ordinary field of the same name:
struct Rectangle : IRectangle
{
float width;
float height;
// no explicit `property` declarations needed
}
The implementation of these features is mostly in `slang-check-decl.cpp` in the logic for checking conformance of a type to an interface.
The first feature simply requires adding logic to checking whether a candidate satisfying `property` declaration matches a required `property` declaration. To do so, it must have the same type, and an accessor to satisfy each of the required accessors.
The second feature requires adding logic to synthesize an AST `property` declaration for a type, based on a required `property` declaration and its accessors. This means that, more or less, any type where `this.name` yields a storage location that does what is needed can satisfy a property requirement (there is no specific rule that says the storage needs to be a field, although that is the most likely case).
The way that witnesses are stored for property declarations probably merits some description. During IR lowering, an abstract storage declaration like a subscript or `property` more or less desugars away, so that the actual interface requirements correspond to the accessors within it (the `get`, `set`, etc.). This means that a witness table should have entries/keys corresponding to the accessors and not the property itself. The process of finding/recording witnesses for `property` requirements thus installs entries for the individual accessors (with care taken to only install accessor witnesses once we are sure we have witnesses for all the requirements). Currently, the code also installs an entry for the property itself, although that is not strictly required, and might not be something we continue to do long-term.
(Aside: it was somewhat surprising that an end-to-end test of `property` declarations in `interface`s Just Worked without any changes to IR lowering.)
As we continue to write more code that synthesizes and checks AST expressions/statements, it becomes necessary to refactor the semantic checking logic so that it splits the recursive part (e.g., checking the operands of an assignment) from the validation part (e.g., checking that the assignment itself is valid). It is probably too big of a change to justify at this point, but it might be valuable in the future to have distinct hierarchies that represent unchecked and checked ASTs, with semantic checking mostly being a transformation from one to the other. The benefit of such a change is we could factor out a distinct "builder" API for constructing validated/checked AST nodes, with both semantic checking and AST synthesis being clients of that API.
|
|
The initial change to introduce `property` declarations tied them to a "modern" syntax:
property width : float { ... }
In practice, a great majority of users assume that properties in Slang will be declared like those in C#:
property float height { ... }
This change allows both options to parse correctly.
The choice made here is to only parse as the "modern" syntax when it can be detected from lookahead (an identifier followed by a `:`), and fall back to the "traditional" syntax otherwise. That choice might not produce the best diagnostic messages around syntax errors in codebases that use the modern syntax, but it is the easiest trade-offs to make.
We also add similar disambiguation logic for the `newValue` parameter of a `set` declaration (and other "modern"-style parameters). This strategy cannot be applied to all function parameters in general, because traditional parameter lists can still use `:` to introduce a semantic.
Note: the same disambiguation strategy applied here could be used for `let` and `var` declarations:
let a : int = 1;
let int b = 2;
This change does not try to introduce flexibility like that, because it seems unlikely for users to care.
|
|
* Improve handling of cast detection when have a more complex cast than just a single identifier.
* Improve comments around heuristic for casting
* Added nested enum test.
* Improve comments
* Define function like - output change.
* Use lookup for types in determining if cast or not.
* Add _isCast function
* Add heuristic test to nested-enum.slang that works if the type test fails.
* Change hueristic based on review.
Allow (..)( to always be an expression, because if it's a type it will be turned into a cast later.
* Fix output of define-function-like.slang - which changes again with improved casting support.
* Improve testing for type in cast - if we find a decl and it's not a type, then we know it's not a cast.
|
|
Entry point `uniform` parameters were a feature of the original Cg and HLSL, but have not been used much in production shader code. One of our goals on Slang is to reduce the (ab)use of the global scope, so bringing entry point `uniform` parameters up to a greater level of usability is an important goal.
Some policy choices about how global vs. entry-point `uniform` parameters behave have already been made, that shape decisions looking forward:
* For DXBC/DXIL, it makes the most sense to follow the lead of fxc/dxc, by treating entry point `uniform` parameters as a kind of syntax sugar for global shader parameters. Any parameters of "ordinary" types are bundles up into an implicit constant buffer, and all the resources (including the implicit constant buffer) are assigned `register`s just as for globals. It is up to the application to decide how to bind those parameters via a root signature (using root descriptors, root constants, descriptor tables, local vs. global root signature, etc.)
* For CPU, it makes sense to pass global vs. entry-point parameters as two different pointers, although the details of what we do for CPU are the least constrained across all current targets.
* For CUDA compute, it makes the most sense to map global shader parameters to `__constant__` global data, and entry-point `uniform` parameters to kernel parameters. This choice ensures that the signature of a kernel when translated from Slang->CUDA follows the Principle of Least Surprise, at the cost of making entry-point vs. global parameters be passed via different mechanisms.
* For OptiX ray tracing, it makes sense to expand on the precedent from CUDA compute: pass global parameters via global `__constant__` data (as is already expected by OptiX for whole-launch parameters), and pass entry-point `uniform` parameters via the "shader record." This establishes a precedent that for ray-tracing shaders, global-scope parameters map to the "global root signature" concept from DXR, while entry-point `uniform` parameters map to a "local root signature" or "shader record."
* For Vulkan ray tracing, the precedent from OptiX then argues that entry-point `uniform` parameters should map to the Vulkan "shader record" concept (and thus cannot support things like resource types).
* The remaining interesting case is what to do for non-ray-tracing shaders on Vulkan.
The dev team agrees that the most reasonable choice to make for non-ray-tracing Vulkan shaders is to map entry-point `uniform` parameters to "push constants." In particular, this makes it easy to express the case of a compute kernel with direct parameters of ordinary/value types in the way that will be implemented most efficiently.
The big picture is then that a kernel like:
```hlsl
void computeMain(uniform float someValue) { ... }
```
will map to output GLSL like:
```glsl
layout(push_constant)
uniform
{
float someValue;
} U;
void main() { ... }
```
If the user really wanted a constant-buffer binding to be created instead, they can easily change their input to make the buffer explicit:
```hlsl
struct Params { float someValue; }
void computeMain(uniform ConstantBuffer<Params> params) { ... }
```
(Forcing the user to be explicit about the desire for a buffer here creates a nice symmetry between Vulkan and CUDA; in the first case the user sets up the data in host memory and passes it to the GPU by copy, while in the second case the user must allocate and set up a device-memory buffer for the data. This symmetry extends to D3D if the application chooses to map entry-point `uniform` parameters to root constants.)
This change implements logic in the "parameter binding" part of the Slang compiler to make sure that entry-point `uniform` parameters are wrapped up in a push-constant buffer rather than an ordinary constant buffer for non-ray-tracing shaders on Vulkan (and in a shader record "buffer" for the ray-tracing case).
The majority of the actual work was in adding support for root/push constants to the test framework and the graphics API abstraction it uses. To be clear about that support:
* Root constant ranges are (perhaps confusingly) treated as a new kind of "slot" that can appear on a descriptor set. This choice ensures that the implicit numbering of registers/spaces used by the back-ends can account for these ranges correctly.
* The `TEST_INPUT` lines are extended to allow a `root_constants` case that behaves more or less like `cbuffer`
* The CPU and CUDA paths can treat a `root_constants` input identically to a `cbuffer`. They already allocate the actual buffers based on reflection, and just use `cbuffer` as a directive that causes bytes to be copied in.
* On D3D12 and Vulkan, a descriptor set allocates a `List<char>` to hold the bytes of root constant data assigned into it, and these bytes are flushed to the command list when the table is actually bound (usually right before rendering).
* On D3D11, a descriptor set treats a root constant range more or less like a constant buffer range (with a single buffer), except that it also automatically allocates a buffer to hold the data. Assigning "root constant" data automatically copies it into that buffer.
The small number of tests that used entry-point `uniform` parameters of ordinary types were updated to use the new `root_constant` input type, and the bugs that surfaced were fixed.
A new test to confirm that entry-point `uniform` parameters map to the shader record for VK ray tracing was added.
An important but technically unrelated change is the removal of the `DescriptorSetImpl::Binding` type and related function from the Vulkan implementation of `Renderer`. That type was created to ensure that objects that are bound into a descriptor set don't get released while the descriptor set is still alive, but the implementation relied on a complicated linear search to check for existing bindings, which could create a performance issue for descriptor sets that include large arrays of descriptors. The new implementation makes use of the approach already present in the various `Renderer` implementations (including the Vulkan one) for assigning ranges in a descriptor set a flat/linear index for where their pertinent data is to be bound. As a result, the Vulkan `DescriptorSetImpl` now uses a single flat array of `RefPtr`s to track bound objects, and has no need for linear search when binding.
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
The IR pass that introduces an explicit `KernelContext` for the CPU/CUDA back-ends was also responsible for adding an explicit parameter to the kernel entry point to receive the constant buffer (pointer) with all the global uniform parameters. However, if there were no global uniform parameters, this parameter wasn't getting introduced, which changed the signature/ABI of the generated entry point function.
This change makes it so that the pass unconditionally adds a parameter. In the case where there are no global uniforms it just adds a `void*` parameter that never gets used.
In order to avoid future regressions, this change also adds a test case to confirm that things work correctly when a kernel has only entry-point parameters and no global parameters.
|
|
During semantic checking, the compiler used to link together `ExtensionDecl`s into a singly-linked list dangling off of the `AggTypeDecl` that they applied to. This approach made lookup relatively easy, because given a `DeclRef` to an `AggTypeDecl` one could easily find and walk the list of candidate extensions.
Unfortunately, the simple approach has two major strikes against it:
* First, as we recently ran into, it creates a lifetime/ownership problem, in cases where the `ExtensionDecl` is outlived by the `AggTypeDecl` it applies to. This creates the one and only place in the compiler today where an "old" AST node might point to a "new" AST node, and it resulted in use-after-free problems in client code.
* Second, the scoping of `extension`s ends up being completely wrong. All of the `extension` methods on a type end up being visible in all cases, instead of just in the context of modules where the `extension` itself is visible. The comparable feature in C# (static extension methods) is careful to not make scoping mistakes like this. The Swift langauge has loose scoping for `extension` more akin to what we have in Slang today, but the maintainers seem to consider it a misfeature.
This change attempts to clean up both issues by changing the way that extension declarations are stored. There are two main pieces:
1. The primary "source of truth" for extension lookup has been moved to the `ModuleDecl`, where a module is responsible for storing a cache of the extensions declared within that module (keyed by the declaration of the type being extended). This cache is updated at the same point where the old code would mutate the AST node being depended on.
2. A secondary aggregated cache is added to the `SharedSemanticsContext` used during semantic checking. This cache includes entries from across multiple modules, and is intended to be invalidated and rebuilt on demand if new modules are added during checking.
Access to the candidate extensions has now been put behind subroutines that require a semantics-checking context to be passed in (there was always one available in contexts that care about extensions).
In addition, the operation for looking up members including those from extensions was refactored heavily to involve internal rather than external iteration and, more importantly, was changed so that it actually tests whether the `ExtensionDecl`s it loops over apply to the type in question, rather than blindly letting extensions members be looked up in ways that don't make sense.
There are three test cases added here to confirm aspects of the fix:
* First, I added a test that reproduces the crash that was being seen, so that we have a regression test for the fix.
* Second, I added a basic semantic-checking test to confirm that an `extension` from an `import`ed module is still visible/usable, to confirm that I didn't break existing valid uses of extensions.
* Third, I added a diagnostic test that ensures that we correctly ignore extensions that should not be visible in a given context as a result of `import` declarations.
Co-authored-by: jsmall-nvidia <jsmall@nvidia.com>
|
|
* Adding support for global uniform shader parameters
This change adds support for Slang programmers to declare shader parameters of "ordinary" types at global scope:
```hlsl
uniform float gScaleFactor;
void main() { ... *= gScaleFactor; ... }
```
The generated HLSL/GLSL/DXIL/SPIR-V output will be something along the lines of:
```hlsl
struct GlobalParams
{
float gScaleFactor;
}
cbuffer globalParams
{
GlobalParams globalParams;
}
void main() { ... *= globalParams.gScaleFactor; ... }
```
The binding information used for the implicit `globalParams` constant buffer will be determined by the existing implicit parameter binding logic (which already had support for this kind of transformation).
The reason this change is being pursued right now is because it is one step toward removing the implicit `KernelContext` type that is used to wrap the generated code for our CPU and CUDA C++ targets. Handling global-scope parameters of ordinary type requires an IR pass that synthesizes the `GlobalParams` structure type above, and that step ends up removing the need for the similar `UniformState` structure that was being used in the CPU/CUDA emit logic.
A more detailed guide to the changes included follows:
* The diagnostic for a global-scope variable that is implicitly a shader parameter was kept, but changed to a warning. Users can opt out of the warning by decorating their parameter as a `uniform` (since that keyword is already being used to mark entry-point parameters that should be treated as uniform shader parameters).
* To simplify the task of finding the global shader parameters, the `CLikeSourceEmitter` type has been given an `m_irModule` member. The previous emit logic for `UniformState` was having to do a roundabout solution involving the `EmitAction`s to deal with not having direct access to the module.
* Removed a few dead declarations in the emit logic (related to a much earlier point where emit was based on the AST instead of the IR).
* Made the computation of type names in C++ emit take into account `ConstantBuffer<T>` and `ParameterBlock<T>`. As far as I can tell, these were being handled with some special-case hacks in the emit logic instead of being supported more fundamentally. It might actually be good to pass these through as `ConstantBuffer<T>` and `ParameterBlock<T>` in the C++ output, and allow the prelude to customize their translation (defaulting to defining them as `T*`).
* Removed the special-case C++ emit logic for references to global shader parameters. There are now at most two global shader parameters to deal with, and the default emit logic (referring to them by name) does the Right Thing.
* Changed the handling of entry points for C++ (both CPU and CUDA) so that it handles the bundled-up shader paameters for the global and entry-point scopes the same way. The main complication here is OptiX, where parameter data is passed very differently than it is for CUDA compute kernels.
* Reverted changes to `ir-entry-point-uniforms` that had made its logic depend on the compilation target. The parameter binding logic was already responsible for deciding if a given target needed to wrap up its entry-point parameters in a constant buffer, and the IR pass was respecting that layout information. The current workaround had been removing the `ConstantBuffer<T>` indirection from this IR pass for CPU/CUDA, but then reintroducing the same indirection later on in the emit step.
* Added an explicit IR pass with the task of collecting global-scope parameters of uniform/ordinary type and packaging them up into a `struct`, and then optionally packaging that `struct` up in a constant buffer. This pass bases its decisions on the IR layout information that was already computed, so it should match whatever policy choices were made at the layout level.
* Changed the "key" operand on IR `struct` layout information to not assume an `IRStructKey`. The problem here is that the global scope gets a `StructTypeLayout` to represent its members, and this is convenient (rather than having to always special-case logic that handles the global scope), but the "fields" of that struct are global variables which do not have `IRStructKey`s associated with them. The simplest solution is to use the variables themselves as the keys, which required removing the assumption in the IR encoding.
* Updated the IR layout process to compute a layout for the global scope of an entire program, and to attach that to the `IRModule` via a decoration. Updated the IR linking process to carry through that decoration to the linked output. This is necessary so that the IR pass that transforms global parameters can access the global-scope layout information.
An important concern with this approach is that the contents and layout of the monolithic `GlobalParams` structure depends on the exact set of modules that were linked (and the order in which they were specified, in some cases). This isn't really a new thing with this change, but it becomes more important as we start to think of how to generalize things to better support separate compilation and linking.
There are changes that can (and should) be made to the way that IR layouts are computed for programs (e.g., so that we compute layout per-module and then combine them rather than as a whole-program step). In this case, the problem of forming the combined/linked global layout can be moved down the IR level and not be reliant on AST-level information.
Just changing the way layout and linking interact would not change the fundamental problem that global shader parameters as they currently exist in Slang/HLSL/GLSL are not readily compatible with true separate compilation. We either need to find a solution strategy that we can apply to allow existing shaders to work with separate compilation *or* we need to incrementally work toward removing support for global-scope shader parameters in favor of explicit entry-point parameters in all cases.
* fixup: missing files
* fixup: comment the new code
|
|
* Initial work on property declarations
Introduction
============
The main feature added here is support for `property` declarations, which provide a nicer experience for working with getter/setter pairs.
If existing code had something like this:
```hlsl
struct Sphere
{
float4 centerAndRadius; // xyz: center, w: radius
float3 getCenter() { return centerAndRadius.xyz; }
void setCenter(float3 newValue) { centerAndRadius.xyz = newValue; }
// similarly for radius...
}
void someFunc(in out Sphere s)
{
float3 c = s.getCenter();
s.setCenter(c + offset);
}
```
It can be expressed instead using a `property` declaration for `center`:
```hlsl
struct Sphere
{
float4 centerAndRadius; // xyz: center, w: radius
property center : float3
{
get { return centerAndRadius.xyz; }
set(newValue) { centerAndRadius.xyz = newValue; }
}
// similarly for radius...
}
void someFunc(in out Sphere s)
{
float3 c = s.center;
s.center = c + offset;
}
```
The benefits at the declaration site aren't that signficiant (e.g., in the example above we actually have slightly more lines of code), but the improvement in code clarity for users is significant.
Having `property` declarations should also make it easier to migrate from a simple field to a property with more complex logic without having to first abstract the use-site code using a getter and setter.
An important future benefit of `property` syntax will be if we allow `interface`s to include `property` requirements, and then also allow those requirements to be satisfied by ordinary fields in concrete types.
Subscripts
----------
The Slang compiler already has limited (stdlib-use-only) support for `__subscript` declarations, which are conceptually similar to `operator[]` from the C++ world, but are expressed in a way that is more in line with `subscript` declarations in Swift. A `SubscriptDecl` in the AST contains zero or more `AccessorDecl`s, which correspond to the `get` and `set` clauses inside the original declaration (there is also a case for a `__ref` accessor, to handle the case where access needs to return a single address/reference that can be atomically mutated).
A major goal of the implementation here is to re-use as much of the infrastructure as possible for `__subscript` declarations when implementing `property` declarations.
Nonmutating Setters
-------------------
One additional thing added in this change is the ability to mark a `set` accessor on either a subscript or a property as `[nonmutating]`, and indeed all of the existing `set` accessors declared in the stdlib have been marked this way.
The need for this modifier is a bit subtle. If we think about a typical subscript or property:
```hlsl
struct MyThing
{
int f;
property p : int
{
get { return f; }
set(newValue) { f = newValue; }
}
}
```
it is clear we want the `set` accessor to translate to output HLSL as something like:
```
void MyThing_p_set(inout MyThing this, int newValue)
{
this.f = newValue;
}
```
Note how the implicit `this` parameter is `inout` even though we didn't mark anything as `[mutating]`. This is the obvious thing a user would expect us to generate given a property declaration.
Now consider a case like the following:
```hlsl
struct MyThing
{
RWStructuredBuffer<int> storage;
property p : int
{
get { return storage[0]; }
set(newValue) { storage[0] = newValue; }
}
}
```
This new declaration doesn't require (or want) an `inout` `this` parameter at all:
```
void MyThing_p_set(MyThing this, int newValue)
{
this.storage[0] = newValue;
}
```
In fact, given the limitations in the current Slang compiler around functions that return resource types (or use them for `inout` parameters), we can only support a `set` operation like this if we can ensure that the `this` parameter is considered to be `in` instead of `inout`. This is exactly the behavior we allow users to opt into with a `[nonmutating] set` declaration.
All of the subscript operations in the stdlib today have `set` accessors that don't actually change the value of `this` that they act on (e.g., storing into a `RWStructuredBuffer` using its `operator[]` doesn't change the value of the `RWStructuredBuffer` variable -- just its contents).
We'd gotten away without this detail so far just because `set` accessors were only being declared in the stdlib and they were all implicitly `[nonmutating]` anyway, so it never surfaced as an issue that the code we generated assumed a setter wouldn't change `this`.
Implementation
==============
Parser and AST
--------------
Adding a new AST node for `PropertyDecl` and the relevant parsing logic was mostly straightforward. The biggest change was allowing a `set` declaration to introduce an explicit name for the parameter that represents the new value to be set.
This change also adds a `[nonmutating]` attribute as a dual to `[mutating]`, for reasons I will get to later.
Semantic Checking
-----------------
The `getTypeForDeclRef` logic was updated to allow references to `property` declarations.
Some of the semantic checking work for subscripts was pulled out into re-usable subroutines to allow it to be shared by `__subscript` and `property` declarations.
The checking of accessor declarations, which sets their result type based on the type of the outer `__subscript` was changed to also handle an outer `property`.
Some special-case logic was added for checking of `set` declarations to make sure that their parameter is given the expected type.
Some logic around deciding whether or not `this` is mutable had to be updated to correctly note that `this` should be mutable by default in a `set` accessor, with an explicit `[nonmutating]` modifier required to opt out of this default. (This is the inverse of how a typical method or `get` accessor works).
IR Lowering
-----------
The good news is that after IR lowering, access to properties turns into ordinary function calls (equivalent to what hand-written getters and setters would produce), so that subsequent compiler steps (including all the target-specific emit logic) doesn't have to care about the new feature.
The bad news is that adding `property` declarations has revealed a few holes in how IR lowering was handling `__subscript` declarations and their accessors, so that it didn't trivially work for the new case as-is.
The IR lowering pass already has the `LoweredValInfo` type that abstractly represents a value that resulted from lowering some AST code to the IR. One of the cases of `LoweredValInfo` was `BoundSubscript` that represented an expression of the form `baseVal[someIndex]` where the AST-level expression referenced a `__subscript` declaration. The key feature of `BoundSubscript` is that it avoided deciding whether to invoke the getter, the setter, or both "too early" and instead tried to only invoke the expected/required operations on-demand.
This change generalizes `BoundSubscript` to handle `property` references as well, so it changes to `BoundStorage`. Making the type handle user-defined property declarations required fixing a bunch of issues:
* When building up argument lists in the IR, we need to know whether an argument corresponds to an `in` or an `out`/`inout` parameter, to decide whether to pass the value directly or a pointer to the value. Some of the logic in the lowering pass had been playing fast and loose with this, so this change tries to make sure that whenever we care computing a list of `IRInst*` that represent the arguments to a call we have the information about the corresponding parameter.
* Similarly, when emitting a call to an accessor in the IR, the information about the expected type of the callee was missing/unavailable, and the code was incorrectly building up the expected type of the callee based on the types of the arguments at the call site. The logic has been changed so that we can extract the expected signature of an accessor (how it will be translated to the IR) using the same logic that is used to produce the actual `IRFunc` for the accessor (so hopefully both will always agree).
* Dealing with `in` vs. `inout` differences around parameters means also dealing with the "fixup" code that is used to assign from the temporary used to pass an `inout` argument back into the actual l-value expression that was used. That logic has all been hoisted out of the expression visitor(s) and into the global scope.
Future Work
===========
The entire approach to handling l-values in the IR lowering pass is broken, and it is in need a of a complete rewrite based on new first-principles design goals. While something like `LoweredValInfo` is decent for abstracting over the easy cases of r-values, addresses, and a few complicated l-value cases like swizzling, it just doesn't scale to highly abstract l-values like we get from `__subcript` and `property` declarations, nor other corner cases of l-values that we need to handle (e.g., passing an `int` to an `inout float` parameter is allowed in HLSL, and performs conversions in both directions!).
It Should be Easy (TM) to extend the logic that tries to synthesize an interface conformance witness method when there isn't an exact match to also support synthesizing a property declaration (plus its accessors) to witness a required property when the type has a field of the same name/type.
* fixup: pedantic template parsing error (thanks, clang!)
* fixup: cleanups and review feedback
* Removed some `#ifdef`'d out code from merge change
* Added proper diagnostics for accessor parameter constraints, which led to some fixes/refactorings
* Added a test case for the accessor-related diagnostics
|
|
|
|
The main new feature that works here is that a derived `struct` type can satisfy one or more interface requirements using methods it inherited from a base `struct` type:
```hlsl
interface ICounter { [mutating] void increment(); }
struct CounterBase { int val; [mutating] void increment() { val++; } }
struct ResetableCounter : CounterBase, ICounter
{
[mutating] void reset() { val = 0; }
}
```
Here the derived `ResetableCounter` type is satisfying the `increment()` requirement from `ICounter` using the inherited `CounterBase` method instead of one defined on `ResetableCounter`.
The crux of the problem here was that after lowering to HLSL/GLSL, the above code looks something like:
```hlsl
struct CounterBase { int val; };
void CounterBase_increment(in out CounterBase this) { this.val++; }
struct ResetableCounter { CounterBase base; }
void ResetableCounter_reset(in out ResetableCounter this) { this.base.val = 0; }
```
The central problem is that `CounterBase_increment` here is not type-compatible what we expect to find in the witness table for `ResetableCounter : ICounter`: the `this` parameter has the wrong type!
The basic solution strategy here is to intercept the search for a witness to sastify an interface requirement in `findWitnessForInterfaceRequirement` (those witnesses get collected into a witness table). The revised logic first looks for an exact match, which will only consider members introduced for the type itself, and not those introduced by base types.
If an exact match for a method requirement is not found, the semantic checker then tries to *synthesize* a witness for the requirement, which more or less amounts to generating a function like:
```hlsl
[mutating] void ResetableCounter::synthesized_increment()
{
this.increment();
}
```
The body of that synthesized method will type-check just fine in this case (because it desugars into `this.base.increment()`, more or less), and thus the synthesized method declaration can be used as the actual witness that drives downstream code generation.
Details:
* I added some options to lookup to allow us to explicitly skip member lookup through base interfaces; this should make sure that we don't accidentally satisfy an interface requirement using a member of the same or another interface (since such members are conceptually `abstract`).
* As it originally stood, the semantic checker was allowing `CounterBase.increment()` to satisfy the `increment()` requirement of `ResetableCounter` directly, with the result that we got invalid HLSL/GLSL code as output. In order to avoid this and other bad cases, I made sure that the "exact match" case of requirement satisfaction ignores members that included any "breadcrumbs" in the lookup result item (since the breadcrumbs would all indicate transformations that needed to be applied to `this` to find the right member).
* If we eventually have targets where `this` is passed by pointer/reference in all cases, then all of this work is not needed for the common case of single inheritance, and the base-type method should be usable as a witness directly. I don't see any easy way to handle that special case without producing target-dependent code in the front-end. It might be that we need an IR pass that can detect functions that are trivial "forwarding" functions and replace them with the function they forward to.
* This change includes a test case that should have come along with the original PR that started adding struct inheritance
Caveats:
* The comments in this change talk about things like allowing a method with a default parameter to satisfy a requirement without that parameter. That scenario won't actually work at present because we still have an enormous hack in our logic for checking methods against requirements: we don't actually consider their signatures! I couldn't fold a fix for that issue into this change because there are subtle corner cases around associated types that we need to handle correctly (which were part of the reason why the checking is as hacked as it is)
* This change does *not* try to test or address the case where we want to have a `Derived` type conform to `ISomething` because it inherits from `Base` and `Base : ISomething`. That case has its own details that need to be worked out, but ideally can follow a similar implementation strategy when it comes to re-using methods from `Base` to satisfy requirement on `Derived`.
|
|
This was an oversight in the stdlib, and the `!=` definition follows the `==` in a straightforward fashion.
|
|
* Working matrix swizzle.
Supports one and zero indexing and multiple elements.
Performs semantic checking of the swizzle. Matrix swizzles are
transformed into a vector of indexing operations during lowering to the
IR.
This change does not handle matrix swizzle as lvalues.
* Renaming
* Added missing semicolon
* Initialize variable for gcc
* Added the expect file for diagnostics
* Matrix swizzle updated per PR feedback
* Stylistic fix
* Formatting fixes
* Fix compiling with AST change.
Change indentation.
Co-authored-by: jsmall-nvidia <jsmall@nvidia.com>
|
|
* Fix front-end handling of generic static methods
The front-end logic that was testing if a member was usable as a static member neglected to unwrap any generic-ness and look at the declaration inside (the parser currently puts all modifiers on the inner declaration instead of the outer generic).
The test case included here is not a full compute test so that it only runs the front-end checking logic (where we had the bug).
* fixup: tabs->spaces
|
|
* Fixes for IR generics
There are a few different fixes going on here (and a single test that covers all of them).
1. Fix optionality of trailing semicolon for `struct`s
======================================================
We have logic in the parser that tries to make a trailing `;` on a `struct` declaration optional. That logic is a bit subtle and couild potentially break non-idiomatic HLSL input, so we try to only trigger it for files written in Slang (and not HLSL). For command-line `slangc` this is based on the file extension (`.slang` vs. `.hlsl`), and for the API it is based on the user-specified language.
The missing piece here was that the path for handling `import`ed code was *not* setting the source language of imported files at all, and so those files were not getting opted into the Slang-specific behavior. As a result, `import`ed code couldn't leave off the semicolon.
2. Fix generic code involving empty `interface`s
================================================
We have logic that tries to only specialize "definitions," but the definition-vs-declaration distinction at the IR level has historically been slippery. One corner case was that a witness table for an interface with no methods would always be considered a declaration, because it was empty. The notion of what is/isn't a definition has been made more nuanced so that it amounts to two main points:
* If something is decorated as `[import(...)]`, it is not a definition
* If something is a generic/func (a declaration that should have a body), and it has no body, it is a declaration
Otherwise we consider anything a definition, which means that non-`[import(...)]` witness tables are now definitions whether or not they have anything in them.
3. Fix IR lowering for members of generic types
===============================================
The IR lowering logic was trying to be a little careful in how it recurisvely emitted "all" `Decl`s to IR code. In particular, we don't want to recurse into things like function parameters, local variables, etc. since those can never be directly referenced by external code (they don't have linkage).
The existing logic was basically emitting everything at global scope, and then only recursing into (non-generic) type declarations. This created a problem where a method declared inside a generic `struct` would not be emitted to the IR for its own module at all *unless* it happened to be called by other code in the same module.
The fix here was to also recurse into the inner declaration of `GenericDecl`s. I also made the code recurse into any `AggTypeDeclBase` instead of just `AggTypeDecl`s, which means that members in `extension` declarations should not properly be emitted to the IR.
Conclusion
==========
These fixes should clear up some (but not all) cases where we might emit an `/* unhandled */` into output HLSL/GLSL. A future change will need to make that path a hard error and then clean up the remaining cases.
* fixup: tabs->spaces
|
|
This change adds logic for parsing `namespace` declarations, referencing them, and looking up their members.
* The parser changes are a bit subtle, because that is where we deal with the issue of "re-opening" a namespace. We kludge things a bit by re-using an existing `NamespaceDecl` in the same parent if one is available, and thereby ensure that all the members in the same namespace can see on another.
* In order to allow namespaces to be referenced by name they need to have a type so that a `DeclRefExpr` to them can be formed. For this purpose we introduce `NamespaceType` which is the (singleton) type of a reference to a given namespace.
* The new `NamespaceType` case is detected in the `MemberExpr` checking logic and routed to the same logic that `StaticMemberExpr` uses, and the static lookup logic was extended with support for looking up in a namespace (a thin wrapper around one of the existing worker routines in `slang-lookup.cpp`.
* I made `NamespaceDecl` have a shared base class with `ModuleDecl` in the hopes that this would allow us to allow references to modules by name in the future. That hasn't been tested as part of this change.
* I cleaned up a bunch of logic around `ModuleDecl` holding a `Scope` pointer that was being used for some of the more ad hoc lookup routines in the public API. Those have been switched over to something that is a bit more sensible given the language rules and that doesn't rely on keeping state sititng around on the `ModuleDecl`.
* I added a test case to make sure the new funcitonality works, which includes re-opening a namespace, and it also tests both `.` and `::` operations for lookup in a namespace.
* The main missing feature here is the ability to do something like C++ `using`. It would probably be cleanest if we used `import` for this, since we already have that syntax (and having both `import` and `using` seems like a recipe for confusion). Most of the infrastructure is present to support `import`ing one namespace into another (in a way that wouldn't automatically pollute the namespace for clients), but some careful thought needs to be put into how import of namespaces vs. modules should work.
|
|
The functionality already appears to work, and this test is just to make sure we don't regress on it.
The most interesting thing here is that I'm using this change to pitch a new organization for tests around what part of the language they cover (rather than the kind of test they are), since the `tests/compute/` directory is getting overly full and is hard to navigate. We can consider moving individual tests into more of a hierarchy at some later point.
|