slang.git - Making it easier to work with shaders

Age	Commit message (Collapse)	Author
2019-02-07	* Improve test coverage of bit cast, particularly for asfloat. Make the ↵	jsmall-nvidia
	values being cast between valid floats. (#832) * Typo fix
2019-02-05	Merge branch 'master' into fix-nested-type-conformances	Yong He

2019-02-05	Merge branch 'master' into gencloser	Yong He

2019-02-05	Merge branch 'master' into fix-nested-type-conformances	Tim Foley

2019-02-05	Allow entry points to have explicit generic parameters (#826)	Tim Foley
	* Allow entry points to have explicit generic parameters Prior to this change, the Slang implementation required users to use global `type_param` declarations in order to specialize a full shader. For example: ```hlsl type_param L : ILight; ParameterBlock<L> gLight; [shader("fragment")] float4 fs(...) { ... gLight.doSomething() ... } ``` With this change we can rewrite code like the above using explicit generics, plus the ability to have `uniform` entry-point parameters: ```hlsl [shader("fragment")] float4 fs<L : ILight>( uniform ParameterBlock<L> light, ...) { ... light.doSomething() ... } ``` Having this support in place should make it possible for us to eliminate global generic type parameters and the complications they cause (both at a conceptual and implementation level). The most central and visible piece of the change is that `EntryPointRequest` now holds a `DeclRef<FuncDecl>` instead of just ` RefPtr<FuncDecl>`, which allows it to refer to a specialization of a generic function. Various places in the code that refer to the `EntryPointRequest::decl` member now use a `getFuncDecl()` or `getFuncDeclRef()` method as appropriate (see `compiler.h`). In order to fill in the new data, the `findAndValidateEntryPoint` function has been greaterly overhauled. The changes to its operation include: * The by-name lookup step for the entry point function has been adapted to accept either a function or a generic function. * The generic argument strings provided by API or command line are no longer parsed all the way to `Type`s, but instead just to `Expr`s in the first pass. * There are now two cases for checking the global generic arguments against their matching parameters. The first case is the new one, where we plug the generic argument `Expr`s into the explicit generic parameters of an entry point (that case re-uses existing semantic checking logic). The second case is the pre-existing code for dealing with global generic type arguments. The `lower-to-ir.cpp` logic for hadling entry points then had to be extended. Making it deal with a full `DeclRef` instead of just a `Decl` was the easy part (just call `emitDeclRef` instead of `ensureDecl`). The more interesting bits were: * We need to carefully add the `IREntryPointDecoration` to the nested function and not the generic in the case where we have a generic entry point. There is a handy `getResolvedInstForDecorations` that can extract the return value for an IR generic so that we can decorate the right hting. * We need to make sure that in the case where we emit a `specialize` instruction (which normally wouldn't get a linkage decoration), we attach an `[export(...)]` decoration to it with the mangled name of the decl-ref, so that it can be found during the linking step. The IR linking step is then slightly more complicated because the mangled entry point name could either refer directly to an `IRFunc` or to a `specialize` instruction for a generic entry point. The logic was refactored to first clone the entry point symbol without concern for which case it is (the old code was specific to functions), and then if the result is a `specialize` instruction, we attempt to run generic specialization on-demand. That on-demand specialization is a bit of a kludge, but it deals with the fact that all the downstream passing only expect to see an `IRFunc`. A future cleanup might try to split out that specialization step into its own pass, which ends up being a limited form of the specialization pass. Since I was already having to touch a lot of the code around IR linking, I went ahead and refactored the signature of the operations. I eliminated the need for the caller to create, pass in, and then destroy an `IRSpecializationState` (really an IR linking state), and replaced it with a structure local to the pass (that data structure was a remnant of an older approach in the compiler), and then also renamed the main operation to `linkIR` to reflect what it is doing in our conceptual flow. Smaller changes made along the way include: * Refactored `visitGenericAppExpr` to create a subroutine `checkGenericAppWithCheckedArgs` so that it can be used by the entry-point validation logic described above). * Refactored the declarations around the IR passes in `emitEntryPoint()` (`emit.cpp`), to show that things are more self-contained than they used to be (e.g., that the `TypeLegalizationContext` is now only needed by one pass). * Refactored the generic specialization code so that there is a stand-along free function that can perform specialization on a `specialize` instruction without all the other context being required. This is only to support the limited specialization that needs to be done as part of linking. * Updated the `global-type-param.slang` test to actually test entry-point generic parameters. In a later pass we can/should rework all the tests/examples for global type parameters over to use explicit entry-point generic parameters (at which point we should rename the tests as well). For now I am leaving thigns with just one test case, with the expectation that bugs will be found and ironed out as we expand to more tests. * fixup * Fixup: don't leave entry-point decorations on stuff we don't want to keep The IR `[entryPoint]` decoration is effectively a "keep this alive" decoration, which means that attaching it to something we don't intend to keep around can lead to Bad Things. The approach to generic entry points was attaching `[entryPoint]` to the underlying `IRFunc` because that seemed to make sense, but that meant that the `specialize` instruction at global scope scould instantiate that generic and then keep it alive, even if the resulting function wouldn't be valid according to the language rules. As a quick fix, I'm attaching `[entryPoint]` to the `specialize` instruction instead in such cases, and then re-attaching it to the result of explicit specialization during linking. * Port most of remaining test and rename global type parameters This change ports as many as possible of the existing tests for global type parameters over to use entry-point generic parameters instead. For the most part this is a mechanical change. A few test cases remain using global generic parameters, as does the `model-viewer` example application. The reason for this is that the shaders have either or both the following features: * A vertex and fragment shader that can/shold agree on their parameters * A type declaration (e.g., a `struct`) that is dependent on one of the generic type parameters In these cases, it would really only make sense to switch to explicit parameters once we support shader entry points nested inside of a `struct` type, so that we can use an outer generic `struct` as a mechanism to scope the entry points and other type-dependent declrations. Since global-scope type parameters need to persist for at least a bit longer, I went ahead and renamed all the use sites over to use `type_param` for consistency.
2019-02-05	Allow generics to close with >>	Yong He

2019-02-05	Fix checking of interface conformances for nested types	Tim Foley
	Before this change, code like the following would crash the compiler: ```hlsl interface IThing { /* ... / } struct Outer { struct Inner : IThing {} } / go on to use Outer.Inner / ``` The problem was that the front-end logic for checking interface conformances was only* checking declarations at the top level of a module, or nested under a generic. This change fixes the logic to recurse through the entire tree of declarations. I have added a test case that uses a nested `struct` type to satisfy an associated type requirement, to confirm that the new check works as intended.
2019-01-31	Initial support for uniform parameters on entry points (#815)	Tim Foley
	* Initial support for uniform parameters on entry points The basic feature this work adds is the ability to define a shader entry point like: ```hlsl [shader("fragment")] float4 main( uniform Texture2D t, uniform SamplerState s, float2 uv : UV) { return t.Sample(s,uv); } ``` In this example, the `uniform` keyword is used to mark that the given entry point parameters are not varying input/output flowing through the pipeline, but rather uniform shader parameters that should function as if the shader was declared more like: ```hlsl Texture2D t, SamplerState s, [shader("fragment")] float4 main( float2 uv : UV) { return t.Sample(s,uv); } ``` Allowing `uniform` parameters on entry points makes it easier to define multiple entry points in one file without accidentally polluting the global scope with shader parameters that only certain entry points care about. This feature is also more or less a prerequisite for allowing generic type parameters directly on entry point functions, since the main use case for those type parameters is for determining what goes in various `ConstantBuffer`s or `ParameterBlock`s. There are two main pieces to the implementation. First, we need to be able to compute appropriate layout information for entry points that include `uniform` parameters. Second, we need to transform the entry point function to move any `uniform` parameters to be ordinary global-scope shader parameters, to make sure that all other back-end passes don't need to worry about this special case. The latter piece of the implementation is, relatively speaking, simpler. The pass in `ir-entry-point-uniforms.{h,cpp}` converts entry point parameters that are determined to be uniform (using the already-computed layout information) into fields of a `struct` type and then declares a global shader parameter based on that `struct` type (and applies already-computed layout information to that parameter). After that, the remaining IR passes (notably including type legalization) will handle things just as for any other global shader parameter. The changes to the layout step are more significant, but most of the changes are just cleanups and fixes to enable the feature. The two major changes that enable entry-point `uniform` parameters are: * In `collectEntryPointParameters` we now dispatch out to a new `computeEntryPointParameterTypeLayout` function, which decided whether to compute the type layout for a `uniform` parameter, or for a varying parameter (what used to be the default behavior handled by `processEntryPointParameterDecl`). * The main `generateParameterBindings` routine was extended so that it allocates registers/bindings to the resources required by each entry point (using `completeBindingsForParameter`) after it has allocated registers/binding to all of the global-scope parameters (this addition is mirrored in `specializeProgramLayout`). The effect of these changes is that the `uniform` parameters of any entry points specified in a compile request will be laid out after the global-scope parameters, in the order the entry points were specified in the compile request. A bunch of smaller changes were made around parameter layout that are worth enumerating so that the diffs make some sense: * The `EntryPointLayout` type was changed so that instead of trying to be a `StructTypeLayout`, it instead owns one, in the same fashion as `ProgramLayout`. This commonality was factored into a base class `ScopeLayout`, and a bunch of edits followed from that change. * Because `uniform` parameters are moved out of the entry point parameter list early in the IR transformations, the logic in `ir-glsl-legalize.cpp` that tried to look up parameter layout information by index would no longer work if the entry point parameter list had been altered. Instead, that logic now looks for the decorations directly on the parameters. * The `UsedRange` type in `parameter-binding.cpp` was tracking the existing parameter associated with a range using a `ParameterInfo` (which accounts for the possibility of multiple `VarDecl`s mapping to the same logical shader parameter), when just using a `VarLayout` is sufficient for all current use cases. The overhead of allocating a `ParameterInfo` seems like overkill for entry-point parameters, where there can't possibly be multiple declarations of the "same" parameter, so avoiding these overheads was a focus when trying to deduplicate code between the global and entry-point parameter cases. * A bunch of parameter binding logic that was specific to GLSL input has been deleted completely. There was no way to even execute this code in the compiler today, and there is pretty much zero chance of us needing (or wanting) to deal with GLSL input in the future. This includes custom `UsedRangeSet`s specific to each translation unit, which were only needed for global-scope `in` and `out` varying declarations in GLSL. * A bunch of functions with `EntryPointParameter` in their names were renamed to use `EntryPointVaryingParameter` to help distinguish that they only apply to the varying case, while entry point `uniform` parameters are handled elsewhere. * The `completeBindingsForParameter` function was re-worked into something that can be used for both global-scope shader parameters (where we have a `ParameterInfo` and possibly explicit bindings) and entry-point parameters (where we expect to have neither). This helps unify the (fairly subtle) logic for how we allocate and assign bindings for resources, constant buffers, parameter blocks, etc. * A small change was made so that the entry-point stage is attached directly to top-level parameters of the entry point, and not recursively to every field along the way. This could be a breaking change for some applications, but it makes more logical sense (to me); we'll have to check if this affects Falcor. This change produces different output for several of the reflection tests, but the changes are consistent with no longer attaching stage information to sub-fields of varying `struct`-type parameters. * Because there is a bunch of repeated logic in `parameter-binding.cpp` that has to do with computing a `struct` layout for ordinary/uniform data, I tried to factor that into a single `ScopeLayoutBuilder` type, which handles computing the offsets for any parameters with ordinary data, and then also handles wrapping up the layout in a constant buffer layout if there was any ordinary data at the end. * A similar convenience routine `maybeAllocateConstantBufferBinding` was added because I noticed multiple places in `parameter-binding.cpp` that were trying to allocate a constant buffer binding for global uniforms, and they were wildly inconsistent (and in most cases used logic that would only work for D3D). * The main `generateParameterBindings` routine is significantly shortened by using all of these utilities that were introduced. I tried to comment the places that changed to explain the overall flow correctly. * The `specializeProgramLayout` routine (used to take a `ProgramLayout` from `generateParameterBindings` and specialize it based on knowledge of global generic arguments) had basically been rewritten with more explicit commenting/rationale for what happens in each step. It makes use of the same shared utilities as `generateParameterBindings` and `collectEntryPointParameters`. In terms of testing: * I added a test case to specifically test the new behavior, and in particular I made sure to include a mix of both global and entry-point parameters and also to have entry-point parameters of both ordinary and resource/object types. * I tweaked an existing test for global type parameters to use an entry-point `uniform` parameter instead of a global one, in an effort to migrate it toward being able to use an explicitly generic entry point. * fixups from merge
2019-01-28	Support function parameters of existential (interface) type (#802)	Tim Foley
	* Support function parameters of existential (interface) type The basic idea here is that you can define a function that takes an interface-type parameter: ```hlsl interface IThing { void doSOmething(); } void coolFunction(IThing thing) { ... thing.doSomething() ... } ``` and call it with a concrete value that implements the given interface: ```hlsl struct Stuff : IThing { void doSomething() { /* secret sauce / } } ... Stuff stuff; coolFunction(stuff); ``` The compiler implementation will specialize `coolFunction` based on the concrete type that was actually passed in, resulting in output code along the lines of: ```hlsl struct Stuff { ... } void Stuff_doSomething(Stuff this) { / secret sauce / } void coolFunction_Stuff(Stuff thing) { ... Stuff_doSomething(thing); } ``` In terms of implementation the new specialization approach has been integrated into the existing pass for generic specialization (which has been refactored significantly along the way), because generic specialization can open up opportunities for existential/interface simplification and vice versa, so there is no fixed interleaving of the two passes that can clean up everything. The new logic therefore subsumes the old code for simplifying existential types (which only worked on local variables) in `ir-existential.{h,cpp}`. The local simplification rules from that implementation have become part of the core specialization pass instead, so that they can open up further transformation opportunities enabled by existential-type simplifications. This code in place right now only handles the basic case of a function parameter that directly uses an interface type, and not one that wraps up an interface type in an array, structure, etc. Additional simplifications need to be introduced to deal with those cases as well. fixup: typos
2019-01-28	Feature/bit cast glsl (#808)	jsmall-nvidia
	* First attempt at asint, asuint, asfloat intrinsics. * Test with countbits * Placing glsl definitions first makes them get picked up. * Some more improvements around asint. * Add support for vector versions of asint/asunit * Fix some typos in asuint/asint intrinsics for glsl. Simplified and increased coverage of as/u/int tests. * Added bit-cast-double test. Added notional support for asdouble bit casts to glsl - but couldn't test because glslang doesn't seem to support the extension. * Try to get double bit casts working - doesn't work cos of block issue. * Only output parents on intrinic replacement if return type is not void.
2019-01-25	Fixup handling of empty structs in function return types and parameters. (#796)	Yong He
	* Fixup handling of empty structs in function return types and parameters. * Bug fix in legalizeFunc() * More comprehensive empty struct test * Fix `legalizeFieldExtract` for empty struct field. * Add empty struct handling for construct inst
2019-01-24	Support "modern" declaration syntax as an option (#792)	Tim Foley
	* Support "modern" declaration syntax as an option Fixed #202 This change adds four new declaration keywords: The `let` and `var` keywords introduce immutable and mutable variables, respectively. They can only be used to declare a single variable at a time (unlike C declaration syntax), and they support inference of the variable's type from its initial-value expression. Examples: ``` let a : int = 1; // immutable with explicit type and initial-value expression let b = a + 1; // immutable, with type inferred var c : float; // mutable, with explicit type var d = b + c; // mutable, with type inferred ``` These declaration forms can be used wherever ordinary global, local, or member variable declarations appeared before. Right now they do not change rules about what is or is not considered a shader parameter. The `static` modifier should work on these forms as expected, but a `static let` variable is not the same as a `static const`, so an explicit `const` is still needed if you want that behavior. A `typealias` declaration introduces a named type alias, similar to `typedef`, but with more reasonable syntax. It inherits from the same AST class that `typedef` uses, so all of the code after parsing should be able to treat them as equivalent. To give a simple example: ``` // typedef int MyArray[3]; typealais MyArray = int[3]; ``` A `func` declaration introduces a function. Like `typealias` it re-uses the existing AST class, so there is no need for major changes after parsing. A `func` declaration uses a syntax similar to `let` variables for its parameters, and takes the (optional) result type in a trailing position. For example: ``` func myAdd(a: int, b: int) -> int { return a + b; } ``` If a `func` declaration leaves of the return type clause, the return type is assumed to be `void`. The main difference (beyond the trailing return type) is that the parameters of a `func`-declared function are immutable (unless they are `out`/`inout`). This change doesn't add support for declaring operator overloads with `func`, but that should be added later, and I'd like to make that the only way to declare such operations: ``` func +(left: MyType, right: MyType) -> MyType { ... } ``` The use of `:` for declaring parameter types here means that a function declared with modern syntax currently cannot include HLSL-style semantics on its parameters (or its result). We might consider introducing an `[attribute]`-based syntax for adding semantics to parameters if we think this is important, but for now it is fine to insist that users declare their entry points using traditional syntax. This change strives to avoid unecessary changes after parsing, but if the new syntax catches on with users there are some small ways we can take advantage of it for performance. In particular, since `let` declarations and parameters of modern-style functions are immutable, we do not need to generate read/write local temporaries for them during lowering to the IR (technically we can make the same optimization for `const` locals). In the process of implementing these new forms I also added a few subroutines to help share code better between existing cases in the parser. In particular, parsing of generic parameter lists on declarations that can be generic is now simplified and more unified. * Fixup: remove leftover debugging code * fixup: typos
2019-01-16	Initial support for dynamic dispatch using "tagged union" types (#772)	Tim Foley
	* Initial support for dynamic dispatch using "tagged union" types Suppose a user declares some generic shader code, like the following: ```hlsl interface IFrobnicator { ... } type_param T : IFrobincator; ParameterBlock<T : IFrobnicator> gFrobnicator; ... gFrobincator.frobnicate(value); ``` and then they have some concrete implementations of the required interface: ```hlsl struct A : IFrobnicator { ... } struct B : IFrobnicator { ... } ``` The current Slang compiler allows them to generate distinct compiled kernels for the case of `T=A` and the case of `T=B`. This means that the decision of which implementation to use must be made at or before the time when a shader gets bound in the application. This change adds a new ability where the Slang compiler can generate code to handle the case where `T` might be either `A` or `B`, and which case it is will be determined dynamically at runtime. This means a single compiled kernel can handle both cases, and the decision about which code path to run can be made any time before the shader executes. This new option is supported by defining a tagged union type. Via the API, the user specifies that `T` should be specialized to `__TaggedUnion(A,B)` (the double underscore indicates that this is an experimental and unsupported feature at present). We refer to the types `A` and `B` here as the "case" types of the tagged union. Conceptually, the compiler synthesizes a type something like: ```hlsl struct TU { union { A a; B b; } payload; uint tag; } ``` The user can then allocate a constant buffer to hold their tagged union type, and when they pick a concrete type to use (say `B`), they fill in the first `sizeof(B)` bytes of their buffer with data describing a `B` instance, and then set the `tag` field to the appopriate 0-based index of the case type they chose (in this case the `B` case gets the tag value `1`). Actually implementing tagged unions takes a few main steps: * Type parsing was extended to special-case `__TaggedUnion` as a contextual keyword. This is really only intended to be used when parsing types from the API or command-line, and Bad Things are likely to happen if a user ever puts it directly in their code. Eventually construction of tagged unions should be an API feature and not part of the language syntax. * Semantic checking was extended to recognize that a tagged union like `__TaggedUnion(A,B)` shoud support an interface like `IFrobnicator` whenever all of the case types suport it, as long as the interface is "safe" for use with tagged unions (which means it doesn't use a few of the advancd langauge features like associated types). * The IR was extended with instructions to represent tagged union types and to extract their tag and the payload for the different cases as needed. * IR generation was extended to synthesize implementations of interface methods for any interface that a tagged union needs to support. Right now the implementation is simplistic and only handles simple method requirements, which it does by emitting a `switch` instruction to pick between the different cases. * A new IR pass was introduced to "desugar" any tagged union types used in the code. The downstream HLSL and GLSL compilers don't support `union`s, so we have to instead emit a tagged union as a "bag of bits" and implement loading the data for particular cases from it manually. * Final code emit mostly Just Works after the above steps, but we had to introduce an explicit IR instruction for bit-casting to handle the output of the desugaring pass. There are a bunch of gaps and caveats in this implementation, but that seems reasonable for something that is an experimental feature. The various `TODO` comments and assertion failures in unimplemented cases are intended, so that this work can be checked in even if it isn't feature-complete. * fixup: missing files * fixup: typos
2019-01-16	Improve handling of {} initializer list expressions (#778)	Tim Foley
	Fixes #775 It was reported (in #775) that Slang doesn't handle initializer-list syntax when initializing matrix variables. When starting on a fix for that it became apparent that the time was right to fix two broad issues in the compiler's current handling of `{}`-enclosed initializer lists. The first issue was that the front-end checking of initializer lists wasn't handling the C-style behavior where an initializer list can either contain nested `{}`-enclosed lists for sub-arrays/-structures, or directly contain "leaf" values for initializing those aggregates. For example, the following two variable declarations ought to be equivalent: ```hlsl int4 a[] = { {1, 2, 3, 4}, {5, 6, 7, 8} }; int4 b[] = { 1, 2, 3, 4, 5, 6, 7, 8 }; ``` Getting this distinction right is important because we want to support initializing a matrix either from a list of vectors for its rows, or a list of scalars for its elements (in row-major order). The front-end semantic checking logic for initializer lists was revamped so that it conceptually tries to "read" an expression of a desired type from the initializer list, and decides at each step whether to consume a single expression by coercing it to the desired type, or to recursively read multiple sub-values to construct the type as an aggregate. The logic for deciding between direct vs aggregate initialization could potentially use some tweaking, but luckily it should always handle the case where users introduce explicit `{}`-enclosed sub-lists to make their intention clear, so that existing Slang code should continue to work as before. The second issue was that initializers without the expected number of elements weren't implemented in code generation, so they would lead to internal compiler errors. This change revamps the codegen logic for initializer lists so that it can synthesize default values for fields/elements that were left out during initialization. This includes an attempt to support default initialization of `struct` fields based on explicitly written initialization expressions.
2019-01-14	Add an error for global uniform parameter declarations (#773)	Tim Foley
	A global uniform parameter in HLSL might canonically be defined like this: ```hlsl uniform float gSomeParameter; ``` The fxc and dxc compilers automatically collect all such parameters into a synthesized constant buffer, along the lines of: ```hlsl cbuffer $Globals { float gSomeParameter; } ``` Slang currently supports parsing and semantic checking of declarations like the above, and computes shader parameter layout/binding information that is appropriate for a constant buffer like `$Globals` above, but it does not include the support to emit HLSL or GLSL code that matches that layout, so that use of global uniforms in Slang is silently unsupported. Making this problem worse, the HLSL language is quite lax, and will parse the following as shader parameters as well: ```hlsl int gCounter = 0; const float kScaleFactor = 2.0f; ``` Each of those declarations introduces a global shader parameter, and then provides a default value for it via the initializer. These declarations do not introduce an ordinary global variable or constant as might be expected. (For anybody who wants to know, `static` is required to introduce a "real" global variable (although it will be a thread-local global in practice), while `static const` is required to introduce a global constant) I was not too worried about users trying to use global-scope uniforms and failing (since that has fallen out of common HLSL/GLSL practice), but the possibility that users might try to declare global variables/constants and get shader parameters by mistake creates more of a risk so that this hole is worth plugging. The right long-term fix is of course to support the intended semantics of global-scope uniforms, but that feature needs to be prioritized against other requests. A few of the Slang tests were unwittingly relying on this functionality, including some compute tests that seemingly got away with it based on the DXBC generated from the HLSL output by Slang just happening to match the layout they expected. These tests have all been tweaked to use explicit `cbuffer`s or `ParameterBlock`s instead.
2019-01-11	Fix some subtle bugs in D3D constant buffer layout (#771)	Tim Foley
	* Fix some subtle bugs in D3D constant buffer layout The root of the issue here is that the D3D constant buffer layout rules require 16-byte alignment for arrays and structures, but they do not round up the size of an array/structure type to account for that alignment. That means that in cases like the following: ```hlsl cbuffer C0 { float3 a[2]; float c0; } struct A { float4 x; float3 y; }; cbuffer C1 { A a; float c1; } ``` The `c0` and `c1` fields get an offset of 28 and not 32 like you might expect if the preceding array/structure field `a` had been padded out to match its 16-byte alignment. The actual fix here is relatively simple, and mostly amount to shuffling around some code in `type-layout.cpp` to ensure that the D3D constant buffer layout don't inherit the logic that was rounding up array/structure sizes. Along the way I took the opportunity to clean up the inheritance hierarchy by making the GLSL-family layout rules not try to share anythign with the D3D family (not that there is very little to share), which in turn allowed for some simplification of the GLSL side of things. Fixing this behavior changed the output of a few reflection tests. In the case of `tests/reflection/arrays.hlsl` the output confirmed that we had been producing bad reflection information in these kinds of cases. The output for `tests/reflection/matrix-layout.slang` also showed some bugs in our reflection, but these were overall more minor: we mis-reported the size of certain matrices as 64 bytes instead of 60, and as a result also computed the size of the overall constant buffer as 4 bytes bigger than needed. In all of these cases I double-checked the expected output against dxc to make sure that the new offsets/sizes are what we should have been producing in the first place. I also updated the reflection test harness to start outputting layout information for the element type of a structured buffer, which changed the output of `tests/reflection/structured-buffer.slang`, but this didn't show any change in what we reported: it is just information that wasn't in the output to begin with. Finally, I added two new tests around these subtle cases of buffer layout behavior (especially subtle because it varies across target APIs). The `tests/compute/buffer-layout.slang` test simply sets up a type to ilustrate the troublesome scenarios and then embeds it in both a constant buffer and structured buffer that will be backed by memory with sequential `int` values. We then read out the value of a field as a way to probe its de facto offset at runtime. This test doesn't really stress the Slang compiler (except for our ability to pass through the same type declarations to downstream compilers), but it is useful to confirm our expectations about where things land in memory. The `tests/reflection/buffer-layout.slang` test then uses the reflection test infrastructure to confirm that the same type declarations used in the compute test produce the expected offsets in our reported reflection information. Before the fixes in this change this test showed us producing dangerously incorrect results in our D3D reflection information, which has now been fixed to match the empirically-determined offsets from the compute test. * fixups based on review feedback
2018-12-20	Feature/lex memory reduction (#762)	jsmall-nvidia
	* Only do scrubbing if needed. When allocating content try to limit size (with scrubbing each token takes up 1k), now it's 16 bytes min size. * Don't allocate for every call to write on the CallbackWriter - use the m_appendBuffer. * Don't allocate memory for CallbackWriter use m_appendBuffer. * Use UnownedStringSlice for suffix output for parsing float/int literals. Fix typo in invalidFloatingPointLiteralSuffix * Using memory arena to hold tokens that are not in SourceManager. * Improve comment on lexing. * Make UnownedStringSlice allocation simpler on SourceManager. * Fix error on gcc around UnownedStringSlice - because VC converted string + UnownedStringSlice automatically into a String. * Fix generateName needing concat string for gcc. * When constructing a Token in parseAttributeName - because it's a Identifier, we have to set the Name. * Remove translation through String on getIntrinsicOp * Make func-cbuffer-param disablable with -exclude compatibility-issue * Move memory leak in render-test. * From review - can just use "?:" instead of performing a concat.
2018-12-17	First step toward supporting use of interfaces as existential types (#716)	Tim Foley
	* First step toward supporting use of interfaces as existential types Traditional generics involve universal quantification. E.g., a declaration like: ``` void drive<T : IVehicle>(T vehicle); ``` indicates for for all types `T` that implement the `IVehicle` interface, the `drive()` function is available. In contrast, whend directly using an interface type like: ``` IVehicle v = ...; v.doSomething(); ``` we only know that there exists some concrete type (we could call it `E`) such that `v` refers to a value of type `E`, and `E` implements the `IVehicle` interface. In order to perform an operation like `v.doSomething()` we need to "open" the existential value so that we can look at the concrete type and how it implements the `IVehicle.doSomething` requirement. This change adds a very explicit representation of existentials to Slang's IR. An operation like `e = makeExistential(v, w)` creates a value of some existential type (interfaces being our only existential types for now), by wrapping a concrete value `v` (the type of `v` can be seen as an implicit operand) and a witness table `w` showing that the type of `v` implements the requirements of the chosen interface type. In turn, opening of an existential is handled with operations `extractExistential{Value\|Type\|WitnessTable}` which pull the corresponding piece of information out of a value of existential type (which somewhere in the code had to have been created with `makeExistential`). The change includes a trivial simplification pass that can detect cases where an `extractExistential` operation is applied direclty to a `makeExistential` operation, so that there is only one possible result that could be extracted. This allows for simplification of existential types used in trivial ways for local variables (this is mostly so I can check in a functional test, rather than to actually support useful code involving interfaces right now). The logic in the semantic checking phase of the compiler is comparatively more complex. When we are about to perform member lookup given an expression like `obj.member` we will first check if `obj` has an existential type, and if it does we will construct a suitable local context in which we extract the value, type, and witness table from the existential (these all become explicit AST expression nodes), and then use the extracted value as the base of the lookup operation. The nature of existential values is that two different values with the same existential (interface) type could wrap concrete values with differnt types, so that we need to carefully refer only to the extracted type/value/witness-table of specific values. We handle this right now by conceptually moving the existential-type value into a local variable (by introducing a `LetExpr` that amounts to `let v = <init> in <body>`) and then require that the extract expressions must refer to the (immutable) variable declaration from which they are extracting a value. (Eventually we should expand this so that when using an immutable local variable of existential type we just use that variable as-is rather than introduce a new temporary) A simple test case is included that uses an interface type in an almost trivial way for a local variable; this test can be run and produces the expected results. A more complex test case that passes an existential into a function is included, but left disabled because a more aggressive simplification approach is required to generate working code from it. Add missing file for expected test output * Fixups for merge from top-of-tree
2018-12-17	Specialize away resource-type function parameters (#759)	Tim Foley
	* Specialize away resource-type function parameters Work on #397. Introduction ------------ Suppose a user writes a function that takes a resource type as a parameter: ```hlsl float4 getThing(RWStructuredBuffer<float4> buffer, int index) { return buffer[index]; } ``` This function creates challenges when generating code for GLSL-based targets, because a global shader parameter of type `RWStructuredBuffer`: ```hlsl RWStructuredBuffer<float4> gBuffer; ``` translates to a global GLSL `buffer` declaration: ```hlsl buffer _S0 { float4 _data[]; } gBuffer; ``` There is no equivalent to that `buffer` declaration that can be used in function parameter position, and it is illegal in GLSL to pass `gBuffer` into a function. (Aside: yes, we could in principle translate a function parameter like `RWStructuredBuffer<float4> buffer` to `float4 buffer[]`, but that will not in turn generalize to arrays of structured buffers; it is a dead-end strategy) The solution employed by many shader compilers is to "inline everything" to eliminate the need for parameters of resource types, and then rely on dataflow optimization to eliminate locals of resource types. This strategy can of course lead to an increase in code size, and it also means that call stacks are lost when doing step-through debugging. Another serious issue is that an "early `return`" from a function can turn into the equivalent of a multi-level `break` when inlined, and not all of our targets support multi-level `break`. The solution implemented in this change works around some, but not all, of the problems with full inlining. The approach here generates specialized versions of a function like `getThing`, adapted to the actual arguments provided at different call sites. Thus if we have code like: ```hlsl RWStructuredBuffer<float4> gA; RWStructuredBuffer<float4> gB[10]; ... getThing(gA, x); getThing(gA, y); getThing(gB[someVal], z); ``` we will generate two specializations of `getThing`: one specialized for the `buffer` parameter being `gA` and the other for `gB`: ```hlsl float4 getThing_gA(int index) { return gA[index]; } float4 getThing_gB(int _val, int index) { return gB[_val][index]; } ``` and the call sites will change to match: ```hlsl getThing_gA(x); getThing_gA(y); getThing_gB(someVal, z); ``` Note how in the case where the argument being passed in was obtained by indexing into an array of resources, the callee is specialized to the identity of the global shader parameter (`gB`), and now accepts a new parameter to indicate the array index into it. While this description motivates the change based on GLSL output, the same basic issue can arise for other targets. For example, while current HLSL has added the `ConstantBuffer<T>` type, it is not supported on older targets, and it turns out that even dxc does not allow functions to have `ConstantBuffer<T>` parameters. Longer-term, we will likely need to do even more aggressive specialization both in order to generate SPIR-V output directly, and also to deal with function that have return values or `out` parameters of resource types. Implementation -------------- The meat of the change is in `ir-specialize-resources.{h,cpp}`, where we have a pass that looks at all call sites (`IRCall` instructions) in the program, and attempts to replace them with calls to specialized functions, where the specializations are generated on-demand. The code in this pass is heavily commented, so hopefully it serves to explain itself all right. After specialization is complete, we may still have functions like the original `getThing` that will produce invalid code when emitted as GLSL, so we need a way to make sure they don't appear in the output. To date we've had some very ad hoc approaches for ignoring IR constructs that we don't want to affect emitted code, but this change goes ahead and adds a more real dead code elimination (DCE) pass in `ir-dce.{h,cpp}`. This pass follows a straightforward approach of tagging instructions that are "live" and then propagating liveness through the whole program, before making a single pass to delete anything that isn't live. When I first added the DCE pass it eliminated everything because there were no "roots" for liveness. I solved this for now by adding a new decoration, `IREntryPointDecoration`, to mark shader entry points in the IR which should always be live (as should anything they depend on). A secondary problem that arose was that for GLSL ray tracing shaders it is possible for the incoming/outgoing payload or attributes parameters to be unused, but eliminating them as dead would change the signature of a shader an potential break the rules for how ray tracing programs communicate. I added a very simple `IRDependsOnDecoration` that allows one IR instruction to keep another alive as if it used it, without actually using it. There's also a fixup in the IR dumping logic where I was forgetting to store anything in the mapping from instruction to their names, so that the name of an instruction was getting incremented each time it was referenced. Testing ------- There are three different tests added as part of this change: * The `compute/func-resource-param` test covers the basic `RWStructuredBuffer` case above, which we expect to work fine for D3D11/12, but fail for Vulkan without specialization. * The `cross-compile/func-resource-param-array` test covers the case where we don't just have one resource, but an array of them. This is not an end-to-end compute test primarily because our `render-test` application doesn't yet handle arrays of resources correctly in its binding logic. * The `compute/func-cbuffer-param` test covers the case of a function with a `ConstantBuffer<T>` parameter, which requires specialization to become valid for any of our targets. * fixup: warnings/errors from other compilers * fixup: typos and cleanup * fixup: typos
2018-12-13	Move mangled name out of IRGlobalValue (#752)	Tim Foley
	* Move mangled name out of IRGlobalValue Previously the `IRGlobalValue` type was used as a root for all IR instructions that can have "linkage," in the sense that a definition in one module can satisfy a use in another module. The mangled symbol name was stored in state directly on each `IRGlobalValue`, which created some complications, and also forced IR instructions that wanted to support linkage to wedge into the hierarchy at that specific point. This change moves the mangled name out into a decoration: either an `IRImportDecoration` or an `IRExportDecoration`, both of which inherit from `IRLinkageDecoration` which exposes the mangled name. This change has a few benefits: * We can now have any kind of instruction be exported/imported, without having to inherit from `IRGlobalValue`. This could potentially let `IRStructType` and `IRWitnessTable` be simplified to just have operand lists instead of dummy chldren as they do today. * We can now easily have "global values" like functions that explicitly don't get linkage, instead of using a null or empty mangled name as a marker. * We can use the exact opcode on a linkage decoration to distinguish imports from exports, which could be used to more accurately resolve symbols during the linking step. Other than adding the decorations and making sure that AST->IR lowering adds them, the main changes here are around any code that used `IRGlobalValue`. Variables and parameters of type `IRGlobalValue` were changed to `IRInst` easily, so the main challenge was around code that casts to `IRGlobalValue. In cases where a cast to `IRGlobalValue` also performed a test for the mangled name being non-null/non-empty, we simply switched the code to check for the presence of an `IRLinkageDecoration`, since that is the new way of indicating a value with linakge. Most of the serious complications arose in `ir.cpp` around the "linking"/target-specialization and generic specialization steps. The "linking" logic was checking for `IRGlobalValue` to opt into some more complicated cloning logic, and just checking for a linkage decoration here wasn't sufficient since the front-end does* produce global values without linkage in some cases (e.g., for a function-`static` variable we produce a global variable without linkage). This logic was updated to just check for the cases that used to amount to `IRGlobalValue`s directly by opcode. It might be simpler in the short term to have kept `IRGlobalValue` around to make the existing casts Just Work, but I'm confident that this logic could actually be rewritten for much greater clarity and simplicity and that is the better way forward. The generic specialization logic was using some really messy code to generate a new mangled name to represent the specialized symbol, and then searching for an existing match for that name. The original idea there was that an IR module could include "pre-specialized" versions of certain generics to speed up back-end compilation by eliminating the need to specialize in some cases, but this feature has never been implemented so the overhead here is just a waste. Instead, I moved generic specialization to use a simpler dictionary to map the operands to a `specialize` instruction over to the resulting specialized value. This allows for some simplifications in the name mangling logic, because it no longer needs to figure out how to produce mangled names from IR instructions representing types/values. As part of this change I also overhauled the IR emit logic to produce cleaner output by default, borrowing some of the ideas from the logic in `emit.cpp`. IR values are now automatically given names based on their "name hint" decoration, if any, to make the code easier to follow, and I also made it so that types and literals get collapsed into their use sites in a new "simplified" IR dump mode (which is currently the default, with no way to opt into the other mode without tweaking the code). The resulting IR dumps are much nicer to look at, but as a result the one test that involves IR dumping (`ir/string-literal`) doesn't really test what it used to. One weird issue that came up during testing is that the `transitive-interface` test had previously been producing output that made no sense (that is, the expected output file wasn't really sensible), and somehow these changes were altering its behavior. Changing the test to use `int` values instead of `float` was enough to make the output be what I'd expect, and hand inspection of generating DXBC has me convinced we were compiling the `float` case correctly too. There appears to be some issue around tests with floating-point outputs that we should investigate. * fixup: C++ declaration order
2018-12-11	Decorations are instructions (#748)	Tim Foley
	* Make a test case use IR serialization * Make all IR instructions usable as parents This makes it so that every `IRInst` has the list of children that used to be on `IRParentInst` and eliminates `IRParentInst`. Most places in the code were only checking against `IRParentInst` so that they could know whether there were child instructions to iterate over. This change bloats the size of every instruction by two pointers, but we hope to be able to eliminate that overhead with a better encoding later. * Change IR decorations to be instructions. The main change here is that `IRDecoration` now inherits from `IRInst`, and `IRInst` now has a single linked list that holds both decorations and children. At each point where code used to loop over `getChildren()` on an `IRInst`, I checked whether it made sense to leave the operation as processing just the children, or if it should process both decorations and children. The thorniest bit was making sure the logic for inserting an instruction into a parent is correct. For the most part, once IR code is built all insertions are explicitly before/after another instruction, so the ordering can't get messed up. The sticking point is any code that does an explicit `insertAtStart` or `insertAtEnd`, but I surveyed those to make sure they are correct in context, and I also made all insertions bottleneck through one routine that does a better job of asserting the preconditions than what was there before. We may still want a "smart" insertion function at some point so that if somebody does `someDecoration->insertAtEnd(someInst)` the decoration intelligently goes to the end of the decoration list, and not the entire decorations-and-children list. All of the existing decoration types were refactored to provide accessors for their operands, rather than directly exposing fields. In most cases the operands are required to be `IRConstant` nodes of fixed types. Not all of these types need to be kept around in the new approach, but they were left in so that as much existing code as possible can be kept working. The `IRBuilder` was extended with factory functions to make the various decoration types and attach them. All the fields in concrete decorations that were using `StringRepresentation` or `Name` pointers are now using IR-level string operands which provide their value as an `UnownedStringSlice`, so logic that was working with those decoration values needed to be updated here and there. I also needed to add the logic to clone string-literal values to the IR cloning pass, since they are now being used in almost every piece of code. A new type of constant IR instruction for literal pointers was added, to handle the cases where an IR decoration needs an operand that is a raw AST-level pointer. These are even being serialized, although we obviously should not rely on them to round-trip through serialization in the future. Ideally, a follow-on change should add a cleanup pass where we remove any decorations from a module that shouldn't be allowed in the serialized code. The biggest overall cleanup is in the serialization logic, where a lot of code just disappears because it can process the raw "decorations and children" list as the logical children of an IR instruction. The only special cases left are literals (which seem like they will always need special-casing) and global values (because they have a mangled name, which we plan to move into a decoration). One other example of a simplification made possible by this change: the `IRNotePatchConstantFunc` instruction was implemented as an instruction only because it couldn't be encoded as a decoration at the time (it needed to have an operand that referenced an IR function). The IR dumping logic was also updated (which meant a change to the `ir/string-literal` test) to try to make it print out all decorations a bit more systematically now that they are encoded like other instructions. The formatting isn't quite perfect, but it is good enough to be able to read what is going on. I didn't include updates to the validation logic to ensure that decorations are being added in ways that follow the invariants, but that would be a nice thing to add next. * fixup: 64-bit issues * fixup: forward declaration issues
2018-12-07	Change how buffers are emitted (#741)	Tim Foley
	* Change how buffers are emitted This is a change with a lot of pieces, which can't always be separated out cleanly. I'm going to walk through them in what I hope is a logical order. The main goal of this change was to allow arrays of structured buffers to translate to Vulkan. Consider two declarations of structured buffers in HLSL/Slang: ```hlsl StructuredBuffer<X> single; StructuredBuffer<Y> multiple[10]; ``` The current translation logic was handling `single` by translating it into an unnamed GLSL `buffer` block like: ```glsl layout(std430) buffer _S1 { X single[]; }; ``` That syntax allows an expression like `single[i]` in Slang to be translated simply as `single[i]` in GLSL. But that naive translating doesn't work for `multiple`, since we need to declare a array of blocks in GLSL, which requires giving the whole thing a name: ```glsl layout(std430) buffer _S2 { Y _data[]; } multiple[10]; ``` Now a reference to `multiple[i][j]` in Slang needs to become `multiple[i]._data[j]` in GLSL. To avoid having way too many special cases around single structured buffers vs. arrays, it makes sense to allows emit things in the latter form, so that we instead lower `single` as: ```glsl layout(std430) buffer _S1 { X _data[]; } single; ``` So that now a reference to `single[i]` becomes `single._data[i]` in GLSL. Most of that can be handled in the standard library translation of the structured buffer indexing operations. The only wrinkle there is that there were some old special-case instructions in the IR intended to handle buffer load/store operations (these were added back when I was trying to keep the "VM" path working). These aren't really needed to have structured-buffer operations work; they can be handled as ordinary functions as far as the stdlib is concerned. I removed the old instructions. Along the way, it became clear that a few other cases follow the same pattern. Byte-addressed buffers are an obvious case. We were lowering HLSL/Slang: ```hlsl ByteAddressBuffer b; ... uint x = b.Load(0); ``` to GLSL like: ```glsl layout(std430) buffer _S1 { uint b[]; }; ... uint x = b[0]; ``` That logic would fail for arrays the same way that the structured buffer case was failing. The fix is the same: use named `buffer` blocks and then introduce an explicit `_data` field: ```glsl layout(std430) buffer _S1 { uint _data[]; } b; ... uint x = b._data[0]; ``` Just like with structured buffers, all of the VK translation for operations on byte-addressed buffers can be implemented directly in teh stdlib, so once the emit logic was changed it was just a matter of adding `._data` to a bunch of VK tranlsations. It turns out that arrays of constant buffers have more or less the same problem, and furthermore we have some problems with any code that directly uses the modern HLSL `ConstantBuffer<T>` type. Note: the emit logic around constant buffers sometimes refers to "parameter groups" because that is being used in the compiler as a catch-all term for constant buffers, texture buffers, and parameter blocks. The existing code was going out of its way to reproduce the way that constant buffer declarations are implicitly referenced in HLSL: ```hlsl cbuffer C { float f; } ... float tmp = f; // No reference to `C` here ``` This can be seen in the emit logic with the `isDerefBaseImplicit` function, which is used to take the internal IR representation for a reference to `f` (which is closer to the expression `(C).f` or `C->f`) and leave off any reference to `C` so that we emit just `f`. That kind of logic just flat out doesn't work in some important cases. Arrays of constant buffers are a clear one: ```hlsl ConstantBuffer<X> cbArray[3]; ... X x = cbArray[0]; ``` There is no way to translate that to an ordinary `cbuffer` declaration at all. The same problem can be created without arrays, though: ```hlsl ConstantBuffer<X> singleCB; ... X x = singleCB; ``` The current strategy for translating constant buffers was translating `singleCB` into a `cbuffer` declaration that reproduced the fields of `X` as its members, which just wouldn't work: ```hlsl cbuffer singleCB { float f; // field of `X` } ... X x = singleCB; // ERROR: there is nothing named `singleCB` in this HLSL ``` The new strategy is more consistent. We still generate a `cbuffer` declaration for a single constant buffer, but we always give it a single field of the chosen element type: ```hlsl cbuffer singleCB { X singleCB; } ... X x = singleCB; // this works fine! ``` And in the array case we generate code that uses the explicit `ConstantBuffer<T>` type: ```hlsl ConstantBuffer<X> cbArray[3]; ... X x = cbArray[0]; ``` The GLSL output is more complicated because unlike with HLSL there is no implicit conversion from a uniform block to its element type (there is no notion of an element type). The array case thus needs a `_data` field similar to what we do for structured buffers: ```glsl layout(std140) uniform _S3 { X _data; } cbArray[3]; ... X x = cbArray[0]._data; ``` And then the non-array case needs to have a similar `_data` field for consistency: ```glsl layout(std140) uniform _S1 { X _data; } singleCB; ... X x = singleCB._data; ``` This is handled by inserting the necessary reference to `_data` whenever we dereference a constant buffer, either as part of a load instruction (loading from the whole CB as a pointer), or an `IRFieldAddress` instruction which forms a pointer into the CB (e.g., `&(singleCB->f)` becomes `singleCB._data.f`). The current emit logic handles `ParameterBlock<X>` differently from `ConstantBuffer<X>`, but really only to allow parameter blocks to be explicitly named in the output, while constant buffers were left implicit by default. Thus the only difference was a legacy one (from back when trying to exactly reproduce the HLSL text we got as input was considered an important goal), and the new approach to emitting constant buffers would get rid of it. I removed the separate logic for emitting `ParameterBlock<X>` and just let the handling for constant buffers deal with it. Note that any resource types inside of a `ParameterBlock<X>` would have been moved out as part of legalization, so that a parameter block is 100% equivalent to a constant buffer when it comes time to emit code. Unsurprisingly, changing the way we generate HLSL and GLSL output for all these buffer types meant that any tests that were directly comparing the output of `slangc` against `fxc`, `dxc`, or `glslang` broke. The basic approach to fixing the breakage in GLSL tests was to update the GLSL baseline to reflect the new output startegy. In some cases I used macros to name the various `_S<digits>` temporaries so that future renaming will hopefully be easier (it would be great if we auto-generated temporary names with a bit more context). There was one GLSL test (`tests/bugs/vk-structured-buffer-binding`) that was using raw GLSL expected output, and this was changed to use a GLSL baseline to generate SPIR-V for comparison. For HLSL tests we were sometimes running the same input file through `slangc` and `fxc`/`dxc`, and in these cases I macro-ized the various `cbuffer` declarations to generate different declarations depending on the compiler. I completely dropped the tests coming from the D3D SDK because they aren't providing much coverage, and updating them would change them so far from the original code that the purported benefit (using a body of existing shaders) would be lost. I also dropped the explicit matrix layout qualifiers in the `matrix-layout` test because the new output strategy breaks those for GLSL (you can't put matrix layout qualifiers on `struct` fields, and now the body of every constant buffer is inside a `struct`). This isn't as big of a loss as it seems, because our handling of those qualifiers wasn't really right to begin with. Slang users should only be setting the matrix layout mode globally (and we should probably switch to error out on the explicit qualifiers for now). The other thing that got dropped is tests involving `packoffset` modifiers. Slang already warns that it doesn't support these, and the way they were used in the test cases is actually misleading. For the binding/layout-related tests, the goal was to show that Slang reproduces the same layout as fxc, in which case explicitly enforcing a layout via `packoffset` seems like cheating (are we sure we enforced the layout fxc would have produced?). The real reason was that Slang used to emit explicit `packoffset` on every* field of a `cbuffer` it would output, because of an `fxc` bug where you couldn't use `register` on textures/samplers declared inside a `cbuffer` unless every field in the `cbuffer` used a `register` or `packoffset` modifier. Slang hasn't required that behavior in a while because it now splits textures and samplers, and the one test case where we needed `packoffset` to work around the `fxc` bug in the baseline HLSL has been macro-ified even more to work around the bug. The amount of churn in the test cases is unfortunate, but it continues to point at the weakness of any testing strategy that checks for exact equivalent between Slang's output and that of other compilers. We need to keep working to replace these tests with better alternatives. In `check.cpp` there is logic to perform implicit dereferencing, so that if you write `obj.f` where `obj` is a `ConstantBuffer<X>` (or some other "pointer-like" type) and `f` is a field in `X`, then this effectively translates as `(obj).f`. That is, we dereference the value of type `ConstantBuffer<X>` to get a value of type `X`, and then refer to the field of the `X` value. There was a problem where the logic to insert that kind of implicit dereference operation was using a reference (`auto& type = ...`) for the type of the expression being dereferenced, and then clobbering it. This would mean that an expression of type `ConstantBuffer<X>` would have its type overwritten to be just `X` and then codegen would break later on. I'm not sure how we haven't run into that before. The `array-of-buffers` test case was added to confirm that we now support arrays of constant, structured, and byte-address buffers for both DXIL and SPIR-V output. Okay, so that was a lot of stuff, but hopefully it is clear how this all works to make the output of the compiler more consistent and explicit, while also supporting the required new functionality. fixup: review feedback
2018-11-19	Add Vulkan cross-compilation for byte-address buffers (#721)	Tim Foley
	* Add Vulkan cross-compilation for byte-address buffers This covers `ByteAddressBuffer`, `RWByteAddressBuffer`, and `RasterizerOrderedByteAddressBuffer`. A declaration of any of these types translates to a GLSL `buffer` declaration with a single `uint` array of data. Most of the methods on these types then have straightforward translations to operations on the array. The overall translation is similar to what was already being done for structured buffers. While implementing GLSL translation for the various atomic (`Interlocked`) methods, I discovered that some of these included declarations that aren't actually included in HLSL. I cleaned these up, including in the declarations of the global `Interlocked` functions. The test case that is included here covers only the most basic functionality: `Load`, `Load2`, `Load4` and `Store`. We should try to back-fill tests for the remaining methods when we have time. Two large caveats with this work: 1. We don't handle arrays of byte-address buffers, just as we don't handle arrays of structured buffers. That will take additional work. 2. We don't handle byte-address (or structured) buffers being passed as function parameters, since the parameter would need to be declared as a bare `uint[]` array. * Fixup: don't lump raytracing acceleration structures in with buffers Raytracing acceleration structures share a common base class with byte-address buffers (they are both buffer resources without a specific element type), and I was mistakently matching on this base class in an attempt to have a catch-all that applied to all byte-address buffers. The fix here was to add a distinct base class for all byte-address buffers and catch that instead. * Fixup: typos
2018-10-30	Feature/serial string pool refactor (#702)	jsmall-nvidia
	* Ongoing serialization for full debug work. * Use StringRepresentationCache and StringSlicePool for serialization. * Removed some older path handling for serialization which had some wrong underlying assumptions. * Builds with refactored use of SubStringPool in ir-serialize. * Removed prohibitedCategories because not used anywhere. * Add category 'compatibility-issue' * Remove work in progress on debug serialization.
2018-10-29	Rework command-line options handling for entry points and targets (#697)	Tim Foley
	* Rework command-line options handling for entry points and targets Overview: * The biggest functionality change is that the implicit ordering constraints when multiple `-entry` options are reversed: any `-stage` option affects the `-entry` to its left instead of to its right as it used to. This is technically a breaking change, but I expect most users aren't using this feature. * The options parsing tries to handle profile versions and stages as distinct data (rather than using the combined `Profile` type all over), and treats a `-profile` option that specifies both a profile version and a stage (e.g., `-profile ps_5_0`) as if it were sugar for both a `-profile` and a `-stage` (e.g., `-profile sm_5_0 -stage fragment`). * We now technically handle multiple `-target` options in one invocation of `-slangc`, but do not advertise that fact in the documentation because it might be confusing for users. Similar to the relationship between `-stage` and `-entry`, any `-profile` option affects the most recent `-target` option unless there is only one `-target`. * The logic for associating `-o` options with corresponding entry points and targets has been beefed up. The rule is that a `-o` option for a compiled kernel binds to the entry point to its left, unless there is only one entry point (just like for `-stage`). The associated target for a `-o` option is found via a search, however, because otherwise it would be impossible to specify `-o` options for both SPIR-V and DXIL in one pass. * The handling of output paths for entry points in the internal compiler structures was changed, because previously it could only handle one output path per entry point (even when there are multiple targets). The new logic builds up a per-target mapping from an entry point to its desired output path (if any). Details: * Support for formatting profile versions, stages, and compile targets (formats) was added to diagnostic printing, so that we can make better error messages. This is fairly ad hoc, and it would be nice to have all of the string<->enum stuff be more data-driven throughout the codebase. * Test cases were added for (almost) all of the error conditions in the current options validation. The main one that is missing is around specifying an `-entry` option before any source file when compiling multiple files. This is because the test runner is putting the source file name first on the command line automatically, so we can't reproduce that case. * Several reflection-related tests now reflect entry points where they didn't before, because the logic for detecting when to infer a default `main` entry point have been made more loose * On the dxc path, beefed up the handling of mapping from Slang `Profile`s to the coresponding string to use when invoking dxc. * A bunch of tests cases were in violation of the newly imposed rules, so those needed to be cleaned up. * There were also a bunch of test cases that had accidentally gotten "disabled" at some point because there were comparing output from `slangc` both with and without a `-pass-through` option, but that meant that any errors in command-line parsing produced the same error output in both the Slang and pass-through cases. This change updates `slang-test` to always expect a successful run for these tests, and then manually updates or disables the various test cases that are affected. * When merging the updated test for matrix layout mode, I found that the new command-line logic was failing to propagate a matrix layout mode passed to `render-test` into the compiler. This was because the `-matrix-layout` options were implemented as per-target, but the target was being set by API while the option came in via command line (passed through the API). It seems like we want matrix layout mode to be a global option anyway (rather than per-target), so I made that change here. Add missing expected output files * A 64-bit fix * Remove commented-out code noted in review
2018-10-26	Work around dxc matrix layout behavior (#694)	Tim Foley
	The Slang compiler allows the default matrix layout convention (row-major vs. column-major) to be specified via the command line or API. When generating output HLSL, Slang emits a `#pragma pack_matrix` directive for the chosen default convention, so that a user can generate plain HLSL output and still have it encode their desired defaults. The problem that has arisen is that many released versions of dxc (including those in the most recent Windows SDK at this time) ignore the `#pragma pack_matrix` directive (the feature has since been added to top-of-tree dxc). The main fix here is to instead pass the `-Zpr` option in to dxc when invoking it if the row-major (non-default) convention is requested. This will solve the problem for clients that use Slang to generate DXIL, but not for clients who use Slang to generate plain HLSL that they then pass into dxc (those clients are assumed to be able to work around the problem for themselves). In order to test the change, I added a test that fills a constant buffer with sequential integers, and then reads out the rows/columns of an `int3x4` matrix with both row- and column-major layout, as well as an integer placed after the matrix, so we can see the offset it was given. The `render-test` application did not yet support generating code via dxc/DXIL, so I added an option for that. This ends up assuming that anybody who is running the D3D12 tests will also have a version of dxc available.
2018-10-18	Add support for static methods in interfaces (#680)	Tim Foley
	This change allows an interface to include `static` methods as requirements, so that types that conform to the interface will need to satisfy the requirement with a `static` method. The essence of the check is simple: when checking that a method satisfies a requirement, we enforce that both are `static` or both are non-`static`. Making that simple change and adding a test change broke a few other places in the compiler that this change tries to fix. The main fix is to handle cases where we might look up an "effectively static" member of a type through an instance, and to make sure that we replace the instance-based lookup with type-based lookup. There was already logic along these lines in `lower-to-ir.cpp`, so this change centralizes it in `check.cpp` where it seems to logically belong.
2018-10-11	Add basic support for [mutating] methods (#667)	Tim Foley
	By default, when writing a "method" (aka "member function") in Slang, the `this` parameter is implicitly an `in` parameter. So this: ```hlsl struct Foo { int state; int getState() { return state; } void setState(int s) { state = s; } }; ``` is desugared into something like this: ```hlsl struct Foo { int state }; int Foo_getState(Foo this) { return this.state; } // BAD: void Foo_setState(Foo this, int s) { this.state = s; } ``` That "setter" doesn't really do what was intended. It modifies a local copy of type `Foo`, because `in` parameters in HLSL represent by-value copy-in semantics, and are mutable in the body the function. Slang was updated to give a static error on the original code to catch this kind of mistake (so that `this` parameters are unlike ordinary function parameters, and no longer mutable). Of course, sometimes users want a mutable `this` parameter. Rather than make a mutable `this` the default (there are arguments both for and against this), this change adds a new attribute `[mutating]` that can be put on a method (member function) to indicate that its `this` parameter should be an `in out` parameter: ```hlsl [mutating] void setState(int s) { state = s; } ``` The above will translate to, more or less: ```hlsl void Foo_setState(inout Foo this, int s) { this.state = s; } ``` One added detail is that `[mutating]` can also be used on interface requirements, with the same semantics. A `[mutating]` requirement can be satisfied with a `[mutating]` or non-`[mutating]` method, while a non-`[mutating]` requirement can't be satisfied with a `[mutating]` method (the call sites would not expect mutation to happen). The design of `[mutating]` here is heavily influenced by the equivalent `mutating` keyword in Swift. Notes on the implementation: * Adding the new attribute was straightforward using the existing support, but I had to change around where attributes get checked in the overall sequencing of static checks, because attributes were being checked after function bodies, but with this change I need to look at semantically-checked attributes to determine the mutability of `this` * The check to restrict it so that `[mutating]` methods cannot satisfy non-`[mutating]` requirements was easy to add, but it points out the fact that there is a huge TODO comment where the actual checking of method signatures is supposed to happen. That is a bug waiting to bite users and needs to be fixed! * While we had special-case logic to detect attempts to modify state accessed through an immutable `this` (e.g., `this.state = s`), that logic didn't trigger when the mutation happened through a function/operator call (e.g., `this.state += s`), so this change factors out the validation logic for that case and calls through to it from both the assignment and `out` argument cases. * The error message for the special-case check was updated to note that the user could apply `[mutating]` to their function declaration to get rid of the error. * The semantic checking logic for an explicit `this` expression was already walking up through the scopes (created during parsing) and looking for a scope that represents an outer type declaration that `this` might be referring to. We simply extend it to note when it passes through the scope for a function or similar declaration (`FunctionDeclBase`) and check for the `[mutating]` attribute. If the attribute is seen, it returns a mutable `this` expression, and otherwise leaves it immutable. * The IR lowering logic then needed to be updated so that when adding an IR-level parameter to represent `this`, it gives it the appropriate "direction" based on the attributes of the function declaration being lowered. The rest of the IR logic works as-is, because it will treat `this` just like an other parameter (whether it is `in` or `inout`). * This biggest chunk of work was the "implicit `this`" case, because ordinary name lookup may resolve an expression like `state` into `this.state`, so that the `this` expression comes out of "thin air." To handle this case, I extended the structure of the "breadcrumbs" that come along with a lookup result (the breadcrumbs are used for any case where a single identifier like `state` needs to be embellished to a more complex expression as a result of lookup), so that it can identify whether a `Breadcrumb::Kind::This` node comes from a `[mutating]` context or not. Similar to the logic for an explicit `this`, we handle this by noting when we pass through a `FunctionDeclBase` when moving up through scopes, and look for the `[mutating]` attribute on it. The rest of the work was just plumbing the additional state through.
2018-06-12	Initial support for enum declarations (#599)	Tim Foley
	Slang `enum` declarations will always be scoped, e.g.: ```hlsl enum Color { Red, Green = 2, Blue, } Color c = Color.Red; // Not just `Red` ``` A user can write `enum class` as a placebo for now (to ease sharing of headers with C++). Slang does not currently support the `::` operator for static member lookup, so it must be `Color.Green` and not `Color::Green`. Support for `::` as an alternate syntax could be added later if there is strong user demand. An `enum` type can have a declared "tag type" using syntax like C++ `enum class`: ```hlsl enum MyThings : uint { First = 0, // ... } ``` The `enum` cases will store their values using that type. An `enum` that doesn't declare a tag type will use the type `int` by default. Enum cases are assigned values just like in C/C++: cases can have explicit values, but otherwise default to one more than the previous case, or zero for the first case. All `enum` types will automatically conform to a standard-library `interface` called `__EnumType`, which is used so that basic operators like equality testing can be defined generically for all `enum` types. This change only adds one operator at first (the `==` comparison), but other should be added later. An `enum` case needs to be explicitly converted to an integer where needed (e.g., `int(Color.Red)`). This is implemented by having the main integer types (`int` and `uint`) support built-in initializers that can work for any `enum` type (or rather, anything conforming to `__EnumType`). Eventually these will be restricted so that an `enum` type can only be converted to its associated tag type. IR code generation completely eliminates `enum` types and their cases. The `enum` type will be replaced with its tag type, and the cases will be replaced with the tag values. Currently this could leave some mess in the IR where cast operations are applied between values that actually have the same type.
2018-06-05	Fix atomic operations on RWBuffer (#593)	Tim Foley
	* Fix atomic operations on RWBuffer An earlier change added support for passing true pointers to `__ref` parameters to fix the global `Interlocked()` functions when applied to `groupshared` variables or `RWStructureBuffer<T>` elements. That change didn't apply to `RWBuffer<T>` or `RWTexture2D<T>`, etc. because those types had so far only declared `get` and `set` accessors, but not any `ref` accessors (which return a pointer). The main fixes here are: Add `ref` accessors to the subscript oeprations on the `RW` resource types Adjust the logic for emitting calls to subscript accessors so that we don't get quite as eager about invoking a `ref` accessor, and instead try to invoke just a `get` or `set` accessor when these will suffice. This is important for Vulkan cross-compilation, where we don't yet support the semantics of our `ref` accessors. * Add a test case for atomics on a `RWBuffer` * Fix up `render-test` so that we can specify a format for a buffer resource, which allows us to use things other than `StructuredBuffer` and `ByteAddressBuffer`. The work there is probably not complete; I just did what I could to get the test working. * A bunch of files got whitespace edits thanks to the fact that I'm using editorconfig and others on the project seemingly arent... * fixup: remove ifdefed-out code
2018-05-30	GroupMemoryBarrierWithGroupSync only works on groupshared memory - it ↵	jsmall-nvidia
	doesn't block on global memory accesses. The fix is to copy the values to be processed by InterlockedAdd into shared array. The previous test ran successfully on Dx11, but broke on Dx12. (#586)
2018-05-29	Fix global atomic functions (#582)	Tim Foley
	Fixes #581 This change adds a new parameter passing mode `__ref` to exist alongisde `in`, `out`, and `inout`. The `__ref` modifier indicates true by-reference parameter passing (whereas `inout` is copy-in-copy-out). This is not intended to be something that users interact with directly, but rather a low-level feature that lets us provide a correct signature for the `Interlocked*()` operations in the standard library. Most of the support for passing what are logically addresses around already exists in the IR, so the majority of the work here is just in introducing the new type `Ref<T>` and then using it appropriately when lowering `__ref` parameters/arguments to the IR.
2018-05-25	Fixes 574. Eliminate empty structs during type legalization (#577)	Yong He

2018-05-24	A bunch of work to resolve #569 (#576)	Tim Foley
	* render-test should not fail on HLSL compiler warnings The logic in `render-test` that invokes `D3DCompile` was causing a test to fail if it produced any warnings (not just if compilation fails). Warning output can be dealt with by the test runner, since it will compare output between runs anyway, and it is useful to be able to run something through `render-test` that compiles with warnings. * Be more careful about deleting IR instructions There was an `IRInst::deallocate()` method that had a precondition that the instruction should already be removed from its parent and clear out all its operands before calling, but it wasn't checking this and the few call sites weren't doing things right either. I consolidated things on `IRInst::removeAndDeallocate()` which does all the things: removes from the parent, clear out operands, and then deallocates. I also made sure to clear out the type operand. This clears up some crashing issues where passes were removing instructions but those instructions would still show up as users of other instructions. * Don't emit bitwise not for non-Boolean types It seems like the logic in `emit.cpp` messed things up and decided that `Not` (the IR instruction that is equivalent to `!` in the AST) should emit as `!` for Boolean types and `~` for other types, but this makes no sense (e.g., `~(a & 1)` is very different from `!(a & 1)`, even when interpreted as a condition). It seems like this logic was intended for the `BitNot` case, where `~a` and `!a` are actually equivalent for Boolean values (but a target language might not like `~a` on `bool` values). Maybe the original plan was that the `Not` instruction should only apply to Boolean values in the first place, and that other values should be converted to `bool` (or a vector of `bool`) before applying `Not`, but even in that case the emit logic makes no sense. This caused an actual problem for one of my test cases, so it was important to fix it now. * Fix issue with cached resolution for overoaded operators The basic problem was that the lookup logic was forming a key based on the first definition it found for the overloaded operator, but that means that when processing a prefix `++a` call we might look up the postfix definition of `operator++` and decide to use its opcode as the key. This "fixes" the logic by looking for the first definition with a "compatible" definition (e.g., a `__prefix` function if we are checking a `PrefixExpr`), and then uses its opcode. A better fix in the long run would be to make the cache just be keyed on the operator name and the "fixity" of the expression (prefix, postfix, or infix). * Introduce an intermediate structured control-flow representation The code previously used a single function called `emitIRStmtsForBlocks` in `emit.cpp` that would take a logical sub-graph of the CFG and emit it as high-level statements. It would do this by recognizing operations like coniditional branches that it could turn into high-level `if` statements, etc. The main problem with this function was that it mixed together the logic for how we restructure the program with the logic for how we emit high-level code from that structure. This change splits those two parts of the algorithm by introducing an intermediate data structure: a tree of `Region`s, which represent single-entry regions of the CFG. There are subclasses of `Region` corresponding to various structured control-flow constructs, and then a leaf case that wraps a single `IRBlock`. The new function `generateRegionsForIRBlocks()` (in `ir-restructure.cpp`) now handles the restructuring work, by building one or more `Region`s to represent a sub-graph, while `emitRegion()` handles emitting HLSL/GLSL source code from a region. Splitting things in this way opens up some opportunities for future changes: * We can expand the set of IR control-flow constructs allowed, so long as we can still generate structure `Region`s from them, without having to mess with the emit logic (e.g., we could start to support multi-level `break` by introducing temporaries as needed). In the limit we can generate our `Region`s using something like the "Relooper" algorithm. * We can emit to other representations while retaining the same control-flow restructuring support. E.g., if we drop the structured information from the IR, then emitting to SPIR-V for Vulkan would require us to use the strucured control-flow information from these `Region`s. * We can do analysis that needs to understand `Region` structure. This is relevant to issue #569, which was what prompted me to start on this work. Now that we have a representation of the nesting of `Region`s, we can use it to reason about visibility of values between blocks. During development of this change I ran into a gotcha, in that I had been assuming each IR block would map to a single `Region`, forgetting that our current lowering of "continue clauses" in `for` loops leads to them being duplicated. The `Region` representation handles this by having a linked-list struct mapping IR blocks to the `SimpleRegion`s that represent them. I added a test case that includes a `for` loop with a continue clause that is reached along multiple paths just to make sure that we continue to support that case. The compiler output should not change as a result of this work; this is supposed to be a pure refactoring change. * Add a pass to resolve scoping issues in generated code Fixes #569 The basic problem arises because the structured control flow that we output in high-level HLSL/GLSL doesn't match the "scoping" rules of an SSA IR. In particular, SSA says that a value can be used in any block that is dominated by the definition, but in the presence of `break` and `continue` statements it is easy to construct cases where a block dominates something that is not in its scope for structured control flow. Consider: ```hlsl for(;;) { int a = xyz; if(a) { int b = a; break; } int c = a; } int d = b; ``` This program is invalid as HLSL, because the variable `b` is referenced outside of its scope, but if we look at the CFG for this function, it is clear that the block that computes `b` dominated the block that computes `d`. IR optimizations can easily create code like this, so we need to be ready for it. The previous change added an explicit `Region` structure to represent the structured control flow that we re-form out of the IR, and this change adds a pass that exploits the structuring information to detect cases like the above and introduce temporaries to fix the scoping issue. For example, the pass would change the earlier code block into something like: ```hlsl int tmp; for(;;) { int a = xyz; if(a) { int b = a; tmp = b; break; } int c = a; } int d = tmp; ``` That is, we introduce a new `tmp` variable at a scope "above" both the definition and use of `b`, and then we copy `b` into that temporary right where it is computed, and then use the temporary instead of the original `b` at the use site. A few details that came up during the implementation: * Downstream compilers may get confused by code like the above, and complain that `tmp` may be used before it is initialized, even though the very definition of dominators in a CFG means we don't have to worry about it. Still, I introduced some one-off code to initialize the temporaries just to silence spurious warnings coming from fxc. * We need to be careful not to apply this logic to "phi nodes" (the parameters of basic blocks) since they will already be turned into temporaries by the emit logic, and trying to introduce temporaries with this pass led to broken code (I still need to investigate why). It may be that a future version of this pass should also take the code out of SSA form, so that we can introduce both kinds of temporaries in a single pass (and maybe eliminate some unnecessary variables by doing basic register allocation). There is another transformation that could fix some issues of this kind, by moving code out of a structured control-flow construct and to the "join point" after it. For example, we could turn our loop from the start of this commit message into: ```hlsl for(;;) { int a = xyz; if(a) { break; } int c = a; } int b = a; int d = b; ``` Moving the definition of `b` to after the loop is possible because there is no way to get out of the loop without executing that code anyway. Now the scoping issue for `d`'s use of `b` has gone away, but of course we've introduced a new scoping issue for `a`, when it gets used by `b`. Adding a pass to re-arrange control flow like this could reduce the cases where we have to apply the current pass, but it wouldn't eliminate them entirely. That means such a pass can be deferred to future work. This change includes a test case the reproduces the original issue, so that we can confirm the fix works.
2018-04-18	Fix output of `groupshared` with IR type system (#492)	Tim Foley
	The basic problem was that the lowering logic was constructing (more or less) `Ptr<@GroupShared X>` instead of `@GroupShared Ptr<X>`. There were also problems with passes not propagating through rates that should have been (e.g., legalization). I've added a test case to actually validate `groupshared` support.
2018-04-10	Feature/dx12 compute (#482)	jsmall-nvidia
	* Dx12 rendering works in test framework. * Turn on dx12 render tests. * Getting simpler dx12 compute tests to work. * With expected data in test - check for specialized and then for the default, so that multiple test can share the same expected data, but specialized cases can still be set. * Fixed construction and binding on dx12 textures. * Control which render apis used in test from command line. * Small aesthetic fixes in render-test/main.cpp. * Fix binding problem for uavs/srvs dx12. Previously tried to create srv/uav for StorageBuffers (like dx11 does), but the binding breaks as you can end up with two srvs using the same register. First pass at fixing problems with Texture creation for dx12 - assertions were hit with 3d or array textures. * Fixes to improve Dx12 setup shader resource views for cubemaps/arrays. * Fixed d3d12 textureSamplingTest - problem was that cubemap/array textures were not being uploaded correctly. * Changed the order of how binding of constant buffers (as just set on the Renderer) indexes. Previously they were given the lowest indices, but they clashed with the indices from the 'Binding'. Changing this means all tests run on d3d12. * Add code to allow use of warp (although not command line switchable yet). Fix problem setting up raw UAV - as identified by warp. * Added RenderApiUtil - which can detect if a render api is potentially available. * Moved render flag testing/parsing into RenderApiUtil. * Fix signed/unsigned warning. * Fixes around enums prefixed with k on the review of feature/dx12 compute branch.
2018-03-29	Add support for default parameter values in IR codegen (#459)	Tim Foley
	Fixes #61 When lowering from AST to IR, if a call site doesn't supply an argument expression for each of the parameters to the callee, then use the default value expressions (stored as the "initializer" of the parameter decl) for each omitted parameter. This relies on the front-end to have already checked the call site for validity. Along the way I also cleaned up some of the checking of parameter declarations so that it is more like the checking of ordinary variable declarations (although the code is not yet shared). I also cleaned out some dead cases in the lowering logic for when we don't actually have a declaration available for a callee (these would only matter if we supported functions as first-class values). I added a simple test case to confirm that call sites both with and without the optional parameter work as expected. The strategy in this change is extremely simplistic, and might only be appropriate for default parameter value expressions that are compile-time constants (which should be the 99% case). This may require a major overhaul if we decide to handle default parameter values differently (e.g., by generating extra functions to ensure that the separate compilation story is what we want). Another issue that could change a lot of this logic would be if we start to support by-name parameters at call sites, since we could no longer assume that the argument and parameter lists align one-to-one (with the argument list possibly being shorter). Any work to add more flexible argument passing conventions would need to build a suitable structure to map from arguments to parameters, or vice-versa.
2018-03-16	Overhaul implementation of [attributes] (#443)	Tim Foley
	The existing code parsed all of the square-bracket `[attributes]` into `HLSLUncheckedAttribute`, and then went on to hand-convert some of them to specialized subclasses of `HLSLAttribute`. When attributes didn't check, they were left as-is, and no error message was issued, because at the time the compiler was focused on accepting arbitrary input. This change greatly overhauls the handling of `[attributes]`. Attributes are now declared in the stdlib, with declarations like: ```hlsl __attributeTarget(LoopStmt) attribute_syntax [unroll(count: int = 0)] : UnrollAttribute; ``` In this syntax, the `unroll` part is giving the attribute name (the `[]` are just for flavor, to make the declaration look like a use site; we could drop it if we don't like the clutter), the `count` is a parameter of the attribute, which we expect to be of type `int`, and which has a default value of `0` if unspecified. The `: UnrollAttribute` part specifies the meta-level C++ class that will implement this attribute (and corresponds to a class in `modifier-defs.h`). This syntax is similar to our current `syntax` declarations. I'm starting to think we should change it to something like a `__meta_class(UnrollAttribute)` modifier, and then use that uniformly across all cases (e.g., also replacing the curreent `__magic_type(Foo)` syntax). The `__attributeTarget(LoopStmt)` is a modifier that specifies the meta-level C++ class for syntax that this attribute is allowed to attach to. It is legal to have more than one of these. Attributes continue to be parsed in an unchecked form, so that we don't tie up semantic analysis and parsing more than necessary. During checking, we look up the attribute name in the current scope, and then replace the unchecked attribute with a more specific one if the checking passes. Checking proceeds in generic and attribute-specific phases. The generic phase includes checking the number of arguments against those specified in the attribute declaration (I don't currently check types, or handle default arguments), and then checking that at least one `__attributeTarget(...)` modifier applies to the syntax node being modified. The attribute-specific phase then applies to the specialized C++ subclass of `Attribute`, and does the actual checking right now (e.g., that step is responsible for actually type-checking things at present). This can obviously be improved over time. With this support I went ahead and added declarations for all the HLSL attributes I could find documented on MSDN. I also added a provisional declaration for the `[shader(...)]` attribute that has been added to dxc, but which is not yet documented. One important detail here is that lookup of attribute names needs to be done carefully, so that we don't let, e.g., local variables shadow an attribute declaration: ```hlsl int unroll = 5; // This attribute should not get confused by the local variable `unroll` [unroll] for(...) { .. } ``` The lookup logic already has a notion of a `LookupMask` that can be used to filter declarations out of the result. In this change I surfaced that mask through the main lookup API (rather than requiring a second pass to "refine" lookup results), and made is so that the default lookup mask does not include attributes, while an explicit mask can be used to look up only attributes. (An alternatie design we discussed was to follow the approach of C# and have the declaration of an attribute like `[unroll]` actually be `unrollAttribute`, with a suffix. I decided not to follow that approach for now because it seemed like printing good error messages in that case could require us to carefully trim the `Attribute` suffix off of names at times, and using the existing mask behavior seemed simpler.) To verify that the shadowing behavior is indeed correct, I modified the `loop-unroll.slang` test case. Smaller notes: * Removed the `HLSL` prefix from several of the C++ attribute classes * Made sure to actually validate the modifiers on statements * Special-cased checking for `ParamDecl` with a null type, because I'm re-using `ParamDecl` for attribute parameters, but can't give a concrete type to some of them right now * Deleting some old, dead emit-from-AST logic around attributes, rather than try to "fix" code that doesn't run (a more complete scrub of that code is still needed) * Fixed AST inheritance hierarchy so that a `Modifier` is a `SyntaxNode` rather than a `SyntaxNodeBase`. I have no idea why we have both of those, and we need to clean that up soon.
2018-02-22	Initial work on validating "constexpr"-ness in IR (#420)	Tim Foley
	* Initial work on validating "constexpr"-ness in IR The underlying issue here is that certain operations in the target shading languages constrain their operands to be compile-time constants. A notable example is the optional texel offset parameter to the `Texture2D.Sample` operation. When calling these operations in GLSL, the user is required to pass a "constant expression," and any variables in that expression must therefore be marked with the `const` qualifier (and themselves be initialized with constant expressions). Any GLSL output we generate must of course respect these rules. When calling these operations in HLSL, the user is not so constrained. Instead, they can pass an arbitrary expression, which may involve ordinary variables with no particular markup, and then the compiler is responsible for determining if the actual value after simplification works out to be a constant. In some cases, the requirement that a value be constant might actually trigger things like loop unrolling. Also, it is okay to use a function parameter to determine such a constant expression, as long as the argument turns out to be a constant at all call sites. The way we have decided to tackle these challenges in Slang is that we we propagate a notion of `constexpr`-ness through the IR. This is currently being tackled in `ir-constexpr.cpp` with a combination of forward and backward iterative dataflow: * When the operands to an instruction are all `constexpr`, and the opcode is one we believe can be constant-folded, then we infer that the instruction can be evaluated as `constexpr` * When instruction is required to be `constexpr`, then we infer that all of its operands are also required to be `constexpr`. If this process ever infers that a function parameter is required to be `constexpr`, then we might have to continue propagation at all the call sites to that function. If after all the propagation is done, there are any cases where an instruction is required to be `constexpr`, but it can't be `constexpr` (we weren't able to infer `constexpr`-ness for its operands), then we issue an error. This implementation encodes the idea of `constexpr`-ness in the IR as part of the type system, using a simplified notion of rates. This change adds a `RateQualifiedType` that can represent `@R T`, and then introduces a `ConstExprRate` that can be used for `R`. Many accessors for the type information on IR nodes were updated to distinguish when one wants the "full" type of an IR value (which might include rate information) vs. just the "data" type. A `constexpr` qualifier was added in the front-end, and is being used to decorate the texel offset parameter for `Texture2D.Sample`. Lowering from AST to IR looks for this qalifier and infers when a function parameter must be typed as `@ConstExpr T` instead of just `T`. There are lots of limitations and gotchas in the implementation so far: * The `@ConstExpr` rate is the only one added in this change, but it seems clear that the conceptual `ThreadGroup` rate that was added to represent `groupshared` should probably get folded into the representation. * I'm not 100% pleased with how many places in the IR I have to special-case for rate-qualified types. At the same type, pulling out rate as a distinct field on `IRValue` would probably require that we pay attention to rate everywhere. * I've added a test case to show that we can issue errors when users fail to provide a constant expression for the texel offset, but the actual error message isn't great because it doesn't indicate why a constant expression was required. Realistically the "initial IR" should contain a few more decorations we can use to relate error conditions back to the original code (even if this is in a side-band structure). * I've added a test case that is supposed to show that we can back-propagate `constexpr`-ness to local variables, and I've manually confirmed that it works for Vulkan/SPIR-V output, but the level of Vulkan support in `render_test` today means I can't enable the test for check-in. * While I'm attempting to propagate `@ConstExpr` information from callees to callers, I haven't implemented any logic to specialize callee functions based on values at call sites. * In a similar vein, there is no handling of control-flow dependence in the current code. If we infer that a phi (block parameter) needs to be `@ConstExpr`, then it isn't actually enough to require that the inputs to the phi (arguments from predecessor blocks) are all `@ConstExpr` because we also need any control-flow decisions that pick which incoming edge we take to be `@ConstExpr` as well. * As a practical matter, implicit propagation of `@ConstExpr` from a function body to a function parameter should only be allowed for functions that are "local" to a module. Any function that might be accessed from outside of a module should really have had its `@ConstExpr` parameter marked manually, and our pass should validate that they follow their own rules. Right now we have no kind of visibility (`public` vs `private`) system, so I'm kind of ignoring this issue. While that is a lot of gaps, this is also just enough code to get the Falcor MultiPassPostProcess example working, so I'm inclined to get it checked in. * Fixup: missing expected output for test * Fixup: disable test that relies on [unroll] for now
2018-02-13	Fix a bug in IR use-def information (#406)	Tim Foley
	The basic problem here is that when unlinking an `IRUse` from the linked list of uses, there were several cases where I was failing to set the `prevLink` field of the next node to match the `prevLink` field of the node being removed. That doesn't show up when walking the linked list of uses forward, but it breaks it whenever you have subsequent unlinking operations. This change fixes the bugs of that kind I could find, and also adds a debug validation method to try to avoid breaking it again. I also made more access to `IRUse` go through accessor methods rather than using fields directly, to try to avoid this kind of error. I stopped short of making anything `private`, because I tend to find that it creates more hassles than it avoids. A few other fixes along the way: - Made the `List<T>` type default-initialize elements when you resize it. I hadn't realized we weren't doing that. - Add a standalone `dumpIR(IRGlobalValue*)` so help when debugging issues.
2018-02-08	Basic IR support for `static const` globals (#404)	Tim Foley
	* Basic IR support for `static const` globals Our strategy for lowering global variables can fall back to putting their initialization into a function, but that isn't really appropriate for global constants (it also isn't appropriate for arrays, but we'll need to deal with that seaprately). This change adds a distinct case for global constants (rather than treating them as variables), and forces the emission logic to always emit them as a single expression. Doing this makes assumptions about how the IR for these constants gets emitted (and what optimziations might do to it). In order to make things work, I had to switch the handling of initializer-list expressions to not be lowered via temporaries and mutation (since that isn't a good fit for reverting to a single expression). I've added a single test case to ensure that this works in the simplest scenario. My next priority will be to see if this unblocks my work in Falcor. * Fixup: bug fixes
2018-02-03	Remove non-IR codegen paths (#398)	Tim Foley
	The basic change is simple: remove support for all code generation paths other than the IR. There is a lot of vestigial code left, but the main logic in `ast-legalize.` is gone. Doing this breaks a lot* of tests, for various reasons: - We can no longer guarantee exactly matching DXBC or SPIR-V output after things pass through out IR - Many builtins don't have matching versions defined for GLSL output via IR (even when they had versions defined via the earlier approach that worked with the AST) - A lot of code creates intermediate values of opaque types in the IR, which turn into opaque-type temporaries that aren't allowed (this breaks many GLSL tests, but also some HLSL) I implemented some small fixes for issues that I could get working in the time I had, but most of the above are larger than made sense to fix in this commit. For now I'm disabling the tests that cause problems, but we will need to make a concerted effort to get things working on this new substrate if we are going to make good on our goals.
2018-02-02	Remove support for the -no-checking flag (#392)	Tim Foley
	* Remove support for the -no-checking flag Fixes #381 Fixes #383 Work on #382 - No longer expose flag through API (`SLANG_COMPILE_FLAG_NO_CHECKING`) and command-line (`-no-checking`) options - Remove all logic in `check.cpp` that was withholding diagnostics (including errors) when the no-checking mode was enabled - Remove `HiddenImplicitCastExpr`, which was only created to support no-checking mode (it represented an implicit cast that our checking through was needed, but couldn't emit because it might be wrong) - Remove logic for storing function bodies as raw token lists when checking is turned off. I'm leaving in the `UnparsedStmt` AST node in case we ever need/want to lazily parse and check function bodies down the line. - Remove a few of the code-generation paths we had to contend with, but keep the comment about them in place. - Remove GLSL-based tests that can't meaningfully work with the new approach. - Fix other tests that used a GLSL baseline so that their GLSL compiles with `-pass-through glslang` instead of invoking `slang` with the `-no-checking` flag. - Remove tests that were explicitly added to test the "rewriter + IR" path, since that is no longer supported. There is more cleanup that can be done here, now that we know that AST-based rewrite and IR will never co-exist, but it is probably easier to deal with that as part of removing the AST-based rewrite path. We've lost some test coverage here, but actually not too much if we consider that we are dropping GLSL input anyway. * Fixup: test runner was mis-counting ignored tests * Fixup: turn on dumping on test failure under Travis * Fixup: enable extensions in Linux build of glslang
2018-02-02	Initial work on getting render-test to support vulkan (#391)	Tim Foley
	* Basic fixes to gets some Vulkan GLSL out of the IR path We haven't been paying much attention to the Vulkan output from the IR path, but that needs to change ASAP. This commit really just implements quick fixes, without concern for whether they are a good fit in the long term. - Add some more mappings from D3D `SV_` semantics to built-in GLSL variables, and stop redeclaring those built-in variables in our output GLSL. - Add custom output logic for HLSL `StructuredBuffer<T>` types, so that they emit as `buffer` declarations with an unsized array inside. This has some real limitations: - What if the user passes the type into a function? The parameter should be typed as an (unsized) array, and not a buffer. - What happens if we have an array of structured buffers? We need to declare an array of blocks (which GLSL allows), but this changes the GLSL we should emit when indexing. - Customize the way that we emit entry point attributes (e.g., `[numthread(...)]`) to also support outputting equivalent GLSL `layout` qualifiers. In many of these cases, a better fix might involve doing more of this work in the IR as part of legalization (e.g., we already have a pass that deals with varying input/output for GLSL, so that should probalby be responsible for swapping the `SV_` to `gl_`, especially in cases where the types don't match perfectly across langauges). * Start adding Vulkan support to render-test - Add both Vulkan and D3D12 as nominally supported back-ends - Add a git submodule to pull in the Vulkan SDK dependencies - I don't want our users to have to install it manually, since the SDK is huge - Checking in the binaries to our main repository seems like a bad idea, but my hope is that we can prune the bloat using a subodule with the `shallow` cloning option - Implement enough logic for the Vulkan back-end to get a single test passing on Vulkan * Fix warning * Fixup: disable new compute tests for Linux * Fixup: ignore Vulkan tests on AppVeyor * Dynamically load Vulkan implementation Rather than statically link to the Vulkan library, we will dynamically load all of the required functions. This removes the need to have the stub libs involved at all. * Remove vulkan submodule I had set up a `vulkan` submodule to pull in the headers and stub libs, but now that we are going to dynamically load all the symbols anyway, the stub lib binaries aren't needed and we can just commit the headers. * Add Vulkan headers to external/
2018-02-01	Implement type splitting for raw buffers (#393)	Tim Foley
	* Fix render-test to handle raw buffers I don't know if this fix will work for UAVs that are neither structured nor raw, but it fixes the code that currently only really works if every UAV is structured (since it doesn't set a format). * Make type legalization consider raw buffer types The type layout logic was already handling these, but the type splitting logic in legalization was failing to split structure types that contain, e.g., `RWByteAddressBuffer`. A compute test case has been added to confirm the fix.
2018-01-21	specialize witness tables when needed when specializing ↵	Yong He
	`lookup_witness_table` instruction. (#376)
2018-01-21	Add directive to ignore file for test runner	Tim Foley

2018-01-21	Improvements and bug fixes for global type parameters	Yong He
	1. allow spReflection_FindTypeByName to accept arbitrary type expression string 2. allow const int generic value to be used as expression value, and as array size 3. various bug fixes in witness table specialization / function cloning during specializeIRForEntryPoint to avoid creating duplicate global values, not copying the right definition of a function from the other module, not cloning witness tables that are required by specializeGenerics etc.
2018-01-20	bug fixes	Yong He
	fixes #373 fixes bug that misses current translation unit's scope when resolving entry-point global type argument expression.
2018-01-19	Allow arbitrary type string as type argument in spAddEntryPointEx.	Yong He