slang.git - Making it easier to work with shaders

Age	Commit message	Author
2019-02-11	* Use LayoutResourceKind for calculating the total number of registers used (#838)	jsmall-nvidia
	* Made diagnostic message more compliant + fixed test output * Typo fixes
2019-02-11	[[vk::shader_record]] (#836)	jsmall-nvidia
	* Replaced ShaderRecordNVLayoutModifier with ShaderRecordAttribute * Allowed attributes [[vk::shader_record]] and [[shader_record]] * Check that at most one ShaderRecord is active * Small typo fixes * Slightly improved diagnostic; replaced expected file.
2019-02-08	Hotfix/dispatch thread id improvements (#834)	jsmall-nvidia
	* Make vector comparisons output the correct functions on GLSL * Test for vector comparisons * Typo fixes * GLSL vector comparisons use functions. * Added a coercion test. * Check whether the type used with SV_DispatchThreadID appears valid. * Fix typo * Make GLSL do type conversion for the SV_DispatchThreadID parameter. * Fix the GLSL output for func-resource-param-array to match the changes to SV_DispatchThreadID handling.
2019-02-08	Fix vector compares on GLSL targets (#833)	jsmall-nvidia
	* Make vector comparisons output the correct functions on GLSL * Test for vector comparisons * Typo fixes * GLSL vector comparisons use functions. * Added a coercion test.
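Why GLSL needs functions here: its relational operators (`<`, `<=`, `>`, `>=`) accept only scalars, and `==`/`!=` on two vectors yield a single `bool`, whereas HLSL compares vectors component-wise. The C++ sketch below records the operator-to-builtin mapping as a code-emission helper. The builtin names are standard GLSL; `glslVectorCompareFunc` and `emitGlslVectorCompare` are hypothetical helpers, not Slang's actual emitter.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper: maps an HLSL comparison operator to the GLSL builtin
// that performs the same component-wise test on vector operands.
std::optional<std::string> glslVectorCompareFunc(const std::string& op)
{
    if (op == "<")  return "lessThan";
    if (op == "<=") return "lessThanEqual";
    if (op == ">")  return "greaterThan";
    if (op == ">=") return "greaterThanEqual";
    if (op == "==") return "equal";
    if (op == "!=") return "notEqual";
    return std::nullopt; // not a comparison operator
}

// Emits GLSL text for `lhs op rhs` where both operands are vectors.
std::string emitGlslVectorCompare(const std::string& op,
                                  const std::string& lhs,
                                  const std::string& rhs)
{
    auto func = glslVectorCompareFunc(op);
    assert(func && "expected a comparison operator");
    return *func + "(" + lhs + ", " + rhs + ")";
}
```

Scalar comparisons can keep the ordinary operators; only vector operands need the function form.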
2019-02-07	* Improve test coverage of bit cast, particularly for asfloat. Make the values being bit cast valid floats. (#832)	jsmall-nvidia
	* Typo fix
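HLSL's `asfloat`/`asuint` reinterpret the 32 bits of a value instead of converting it numerically. A minimal C++ model of that behavior, assuming IEEE-754 single-precision `float`, copies the bits with `std::memcpy` (C++20 code could use `std::bit_cast`). Choosing bit patterns that decode to ordinary floats presumably keeps tests away from NaN payloads, which are not guaranteed to survive every path through a GPU; that rationale is an assumption, not something the commit states.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret the bits of a float as an unsigned integer (models HLSL asuint).
std::uint32_t asuint(float f)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Reinterpret 32 bits as a float (models HLSL asfloat applied to a uint).
float asfloat(std::uint32_t u)
{
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}
```

For example, `asuint(1.0f)` is `0x3F800000`, and round-tripping an ordinary float through both functions returns the original value.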
2019-02-05	Merge branch 'master' into fix-nested-type-conformances	Yong He

2019-02-05	Merge branch 'master' into gencloser	Yong He

2019-02-05	Merge branch 'master' into fix-nested-type-conformances	Tim Foley

2019-02-05	Allow entry points to have explicit generic parameters (#826)	Tim Foley
	* Allow entry points to have explicit generic parameters
Prior to this change, the Slang implementation required users to use global `type_param` declarations in order to specialize a full shader. For example:
```hlsl
type_param L : ILight;
ParameterBlock<L> gLight;
[shader("fragment")]
float4 fs(...)
{
    ... gLight.doSomething() ...
}
```
With this change we can rewrite code like the above using explicit generics, plus the ability to have `uniform` entry-point parameters:
```hlsl
[shader("fragment")]
float4 fs<L : ILight>(
    uniform ParameterBlock<L> light,
    ...)
{
    ... light.doSomething() ...
}
```
Having this support in place should make it possible for us to eliminate global generic type parameters and the complications they cause (at both a conceptual and an implementation level).
The most central and visible piece of the change is that `EntryPointRequest` now holds a `DeclRef<FuncDecl>` instead of just a `RefPtr<FuncDecl>`, which allows it to refer to a specialization of a generic function. Various places in the code that refer to the `EntryPointRequest::decl` member now use a `getFuncDecl()` or `getFuncDeclRef()` method as appropriate (see `compiler.h`).
In order to fill in the new data, the `findAndValidateEntryPoint` function has been greatly overhauled. The changes to its operation include:
* The by-name lookup step for the entry point function has been adapted to accept either a function or a generic function.
* The generic argument strings provided by the API or command line are no longer parsed all the way to `Type`s, but only to `Expr`s in the first pass.
* There are now two cases for checking the global generic arguments against their matching parameters. The first case is the new one, where we plug the generic argument `Expr`s into the explicit generic parameters of an entry point (that case re-uses existing semantic checking logic). The second case is the pre-existing code for dealing with global generic type arguments.
The `lower-to-ir.cpp` logic for handling entry points then had to be extended. Making it deal with a full `DeclRef` instead of just a `Decl` was the easy part (just call `emitDeclRef` instead of `ensureDecl`). The more interesting bits were:
* We need to carefully add the `IREntryPointDecoration` to the nested function and not the generic in the case where we have a generic entry point. There is a handy `getResolvedInstForDecorations` that can extract the return value for an IR generic so that we can decorate the right thing.
* We need to make sure that in the case where we emit a `specialize` instruction (which normally wouldn't get a linkage decoration), we attach an `[export(...)]` decoration to it with the mangled name of the decl-ref, so that it can be found during the linking step.
The IR linking step is then slightly more complicated because the mangled entry point name could either refer directly to an `IRFunc` or to a `specialize` instruction for a generic entry point. The logic was refactored to first clone the entry point symbol without concern for which case it is (the old code was specific to functions), and then, if the result is a `specialize` instruction, we attempt to run generic specialization on demand. That on-demand specialization is a bit of a kludge, but it deals with the fact that all the downstream passes expect to see only an `IRFunc`. A future cleanup might try to split out that specialization step into its own pass, which ends up being a limited form of the specialization pass.
Since I was already having to touch a lot of the code around IR linking, I went ahead and refactored the signature of the operations.
I eliminated the need for the caller to create, pass in, and then destroy an `IRSpecializationState` (really an IR linking state), and replaced it with a structure local to the pass (that data structure was a remnant of an older approach in the compiler), and then also renamed the main operation to `linkIR` to reflect what it is doing in our conceptual flow.
Smaller changes made along the way include:
* Refactored `visitGenericAppExpr` to create a subroutine `checkGenericAppWithCheckedArgs` so that it can be used by the entry-point validation logic described above.
* Refactored the declarations around the IR passes in `emitEntryPoint()` (`emit.cpp`), to show that things are more self-contained than they used to be (e.g., that the `TypeLegalizationContext` is now only needed by one pass).
* Refactored the generic specialization code so that there is a stand-alone free function that can perform specialization on a `specialize` instruction without all the other context being required. This is only to support the limited specialization that needs to be done as part of linking.
* Updated the `global-type-param.slang` test to actually test entry-point generic parameters. In a later pass we can/should rework all the tests/examples for global type parameters over to use explicit entry-point generic parameters (at which point we should rename the tests as well). For now I am leaving things with just one test case, with the expectation that bugs will be found and ironed out as we expand to more tests.
* fixup
* Fixup: don't leave entry-point decorations on stuff we don't want to keep
The IR `[entryPoint]` decoration is effectively a "keep this alive" decoration, which means that attaching it to something we don't intend to keep around can lead to Bad Things.
The approach to generic entry points attached `[entryPoint]` to the underlying `IRFunc` because that seemed to make sense, but that meant that the `specialize` instruction at global scope could instantiate that generic and then keep it alive, even if the resulting function wouldn't be valid according to the language rules. As a quick fix, I'm attaching `[entryPoint]` to the `specialize` instruction instead in such cases, and then re-attaching it to the result of explicit specialization during linking.
* Port most of the remaining tests and rename global type parameters
This change ports as many as possible of the existing tests for global type parameters over to use entry-point generic parameters instead. For the most part this is a mechanical change.
A few test cases remain using global generic parameters, as does the `model-viewer` example application. The reason for this is that the shaders have either or both of the following features:
* A vertex and fragment shader that can/should agree on their parameters
* A type declaration (e.g., a `struct`) that is dependent on one of the generic type parameters
In these cases, it would really only make sense to switch to explicit parameters once we support shader entry points nested inside of a `struct` type, so that we can use an outer generic `struct` as a mechanism to scope the entry points and other type-dependent declarations.
Since global-scope type parameters need to persist for at least a bit longer, I went ahead and renamed all the use sites over to use `type_param` for consistency.
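A "keep this alive" decoration acts as a root for dead-code elimination: anything reachable from a kept instruction through its operands survives. The C++ sketch below models only that mark phase, over a toy IR in which instructions refer to one another by name. `Inst`, `Module`, and `liveInstructions` are hypothetical; Slang's IR is not a name-keyed map. With the decoration placed on a `specialize` instruction, the generic and its type argument stay live through that instruction's operands, while unreferenced code does not.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy IR: each instruction lists its operands by name.
struct Inst
{
    bool keepAlive = false;            // e.g. carries an [entryPoint] decoration
    std::vector<std::string> operands; // names of the instructions it uses
};

using Module = std::map<std::string, Inst>;

// Mark phase of dead-code elimination: everything reachable from a
// keep-alive root is live; everything else may be deleted.
std::set<std::string> liveInstructions(const Module& module)
{
    std::vector<std::string> work;
    for (const auto& entry : module)
        if (entry.second.keepAlive)
            work.push_back(entry.first);

    std::set<std::string> live;
    while (!work.empty())
    {
        std::string name = work.back();
        work.pop_back();
        if (!live.insert(name).second)
            continue; // already visited
        auto it = module.find(name);
        if (it == module.end())
            continue; // operand defined outside this module
        for (const auto& operand : it->second.operands)
            work.push_back(operand);
    }
    return live;
}
```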
2019-02-05	Allow generics to close with >>	Yong He
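A common way to let `>>` close two generic argument lists (as in `A<B<C>>`) is for the parser to split the lexer's single `>>` token whenever it expects a closing `>`. The commit does not describe its mechanism, so this C++ sketch of that technique is an assumption; `TypeParser` and its string-token representation are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical mini-parser for types like `A<B<C>>`, where the lexer has
// already produced `>>` as a single shift token.
struct TypeParser
{
    std::vector<std::string> tokens;
    std::size_t pos = 0;

    const std::string& peek() const
    {
        static const std::string eof = "<eof>";
        return pos < tokens.size() ? tokens[pos] : eof;
    }

    // Consume one closing `>`. A `>>` token is split: the first half closes
    // the current argument list and a single `>` is left for the caller.
    bool expectCloseAngle()
    {
        if (peek() == ">") { ++pos; return true; }
        if (peek() == ">>") { tokens[pos] = ">"; return true; }
        return false;
    }

    // type := Name ( '<' type (',' type)* '>' )?
    // Returns the type re-printed in canonical form, or "" on a parse error.
    std::string parseType()
    {
        const std::string name = peek();
        if (name == "<eof>" || name == "<" || name == ">" || name == ">>" || name == ",")
            return "";
        ++pos;
        if (peek() != "<")
            return name;
        ++pos;
        std::string result = name + "<";
        for (bool first = true;; first = false)
        {
            std::string arg = parseType();
            if (arg.empty())
                return "";
            result += (first ? "" : ",") + arg;
            if (peek() == ",") { ++pos; continue; }
            break;
        }
        if (!expectCloseAngle())
            return "";
        return result + ">";
    }
};
```

Splitting in the parser keeps the lexer context-free, so `a >> b` still lexes as a shift in expressions.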

2019-02-05	Fix checking of interface conformances for nested types	Tim Foley
	Before this change, code like the following would crash the compiler:
```hlsl
interface IThing { /* ... */ }
struct Outer
{
    struct Inner : IThing {}
}
/* go on to use Outer.Inner */
```
The problem was that the front-end logic for checking interface conformances was *only* checking declarations at the top level of a module, or nested under a generic. This change fixes the logic to recurse through the entire tree of declarations.
I have added a test case that uses a nested `struct` type to satisfy an associated type requirement, to confirm that the new check works as intended.
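The fix amounts to a full recursive walk of the declaration tree, rather than a walk over only the top level and generic bodies. A minimal C++ sketch over a hypothetical, simplified `Decl` type (not Slang's AST classes) shows that a nested `Inner` is now reached:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified declaration tree.
struct Decl
{
    std::string name;
    bool declaresConformance = false; // e.g. `struct Inner : IThing`
    std::vector<std::unique_ptr<Decl>> members;
};

// Visit every declaration, however deeply nested, and collect the ones
// whose interface conformances must be checked.
void collectConformancesRec(const Decl& decl, std::vector<std::string>& out)
{
    if (decl.declaresConformance)
        out.push_back(decl.name);
    for (const auto& member : decl.members)
        collectConformancesRec(*member, out);
}
```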
2019-01-31	Initial support for uniform parameters on entry points (#815)	Tim Foley
	* Initial support for uniform parameters on entry points
The basic feature this work adds is the ability to define a shader entry point like:
```hlsl
[shader("fragment")]
float4 main(
    uniform Texture2D t,
    uniform SamplerState s,
    float2 uv : UV)
{
    return t.Sample(s,uv);
}
```
In this example, the `uniform` keyword is used to mark that the given entry point parameters are not varying input/output flowing through the pipeline, but rather uniform shader parameters that should function as if the shader was declared more like:
```hlsl
Texture2D t;
SamplerState s;
[shader("fragment")]
float4 main(float2 uv : UV)
{
    return t.Sample(s,uv);
}
```
Allowing `uniform` parameters on entry points makes it easier to define multiple entry points in one file without accidentally polluting the global scope with shader parameters that only certain entry points care about. This feature is also more or less a prerequisite for allowing generic type parameters directly on entry point functions, since the main use case for those type parameters is for determining what goes in various `ConstantBuffer`s or `ParameterBlock`s.
There are two main pieces to the implementation. First, we need to be able to compute appropriate layout information for entry points that include `uniform` parameters. Second, we need to transform the entry point function to move any `uniform` parameters to be ordinary global-scope shader parameters, to make sure that all other back-end passes don't need to worry about this special case.
The latter piece of the implementation is, relatively speaking, simpler. The pass in `ir-entry-point-uniforms.{h,cpp}` converts entry point parameters that are determined to be uniform (using the already-computed layout information) into fields of a `struct` type and then declares a global shader parameter based on that `struct` type (and applies already-computed layout information to that parameter).
After that, the remaining IR passes (notably including type legalization) will handle things just as for any other global shader parameter.
The changes to the layout step are more significant, but most of the changes are just cleanups and fixes to enable the feature. The two major changes that enable entry-point `uniform` parameters are:
* In `collectEntryPointParameters` we now dispatch out to a new `computeEntryPointParameterTypeLayout` function, which decides whether to compute the type layout for a `uniform` parameter, or for a varying parameter (what used to be the default behavior handled by `processEntryPointParameterDecl`).
* The main `generateParameterBindings` routine was extended so that it allocates registers/bindings to the resources required by each entry point (using `completeBindingsForParameter`) after it has allocated registers/bindings to all of the global-scope parameters (this addition is mirrored in `specializeProgramLayout`).
The effect of these changes is that the `uniform` parameters of any entry points specified in a compile request will be laid out after the global-scope parameters, in the order the entry points were specified in the compile request.
A bunch of smaller changes were made around parameter layout that are worth enumerating so that the diffs make some sense:
* The `EntryPointLayout` type was changed so that instead of trying to be a `StructTypeLayout`, it instead owns one, in the same fashion as `ProgramLayout`. This commonality was factored into a base class `ScopeLayout`, and a bunch of edits followed from that change.
* Because `uniform` parameters are moved out of the entry point parameter list early in the IR transformations, the logic in `ir-glsl-legalize.cpp` that tried to look up parameter layout information by index would no longer work if the entry point parameter list had been altered. Instead, that logic now looks for the decorations directly on the parameters.
* The `UsedRange` type in `parameter-binding.cpp` was tracking the existing parameter associated with a range using a `ParameterInfo` (which accounts for the possibility of multiple `VarDecl`s mapping to the same logical shader parameter), when just using a `VarLayout` is sufficient for all current use cases. The overhead of allocating a `ParameterInfo` seems like overkill for entry-point parameters, where there can't possibly be multiple declarations of the "same" parameter, so avoiding these overheads was a focus when trying to deduplicate code between the global and entry-point parameter cases.
* A bunch of parameter binding logic that was specific to GLSL input has been deleted completely. There was no way to even execute this code in the compiler today, and there is pretty much zero chance of us needing (or wanting) to deal with GLSL input in the future. This includes custom `UsedRangeSet`s specific to each translation unit, which were only needed for global-scope `in` and `out` varying declarations in GLSL.
* A bunch of functions with `EntryPointParameter` in their names were renamed to use `EntryPointVaryingParameter` to help distinguish that they only apply to the varying case, while entry point `uniform` parameters are handled elsewhere.
* The `completeBindingsForParameter` function was re-worked into something that can be used for both global-scope shader parameters (where we have a `ParameterInfo` and possibly explicit bindings) and entry-point parameters (where we expect to have neither). This helps unify the (fairly subtle) logic for how we allocate and assign bindings for resources, constant buffers, parameter blocks, etc.
* A small change was made so that the entry-point stage is attached directly to top-level parameters of the entry point, and not recursively to every field along the way. This could be a breaking change for some applications, but it makes more logical sense (to me); we'll have to check if this affects Falcor.
This change produces different output for several of the reflection tests, but the changes are consistent with no longer attaching stage information to sub-fields of varying `struct`-type parameters.
* Because there is a bunch of repeated logic in `parameter-binding.cpp` that has to do with computing a `struct` layout for ordinary/uniform data, I tried to factor that into a single `ScopeLayoutBuilder` type, which handles computing the offsets for any parameters with ordinary data, and then also handles wrapping up the layout in a constant buffer layout if there was any ordinary data at the end.
* A similar convenience routine `maybeAllocateConstantBufferBinding` was added because I noticed multiple places in `parameter-binding.cpp` that were trying to allocate a constant buffer binding for global uniforms, and they were wildly inconsistent (and in most cases used logic that would only work for D3D).
* The main `generateParameterBindings` routine is significantly shortened by using all of these utilities that were introduced. I tried to comment the places that changed to explain the overall flow correctly.
* The `specializeProgramLayout` routine (used to take a `ProgramLayout` from `generateParameterBindings` and specialize it based on knowledge of global generic arguments) has basically been rewritten with more explicit commenting/rationale for what happens in each step. It makes use of the same shared utilities as `generateParameterBindings` and `collectEntryPointParameters`.
In terms of testing:
* I added a test case to specifically test the new behavior, and in particular I made sure to include a mix of both global and entry-point parameters and also to have entry-point parameters of both ordinary and resource/object types.
* I tweaked an existing test for global type parameters to use an entry-point `uniform` parameter instead of a global one, in an effort to migrate it toward being able to use an explicitly generic entry point.
* fixups from merge
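The ordering rule described above (global-scope parameters first, then each entry point's `uniform` parameters in the order the entry points were requested) can be modeled with one counter per register class. This C++ sketch is a simplification that assumes every parameter consumes exactly one register of a single kind; `ResourceKind`, `Param`, `EntryPoint`, and `generateBindings` are hypothetical names, not the types in `parameter-binding.cpp`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical register classes, loosely modeled on D3D `t#` and `s#`.
enum class ResourceKind { Texture, Sampler };

struct Param
{
    std::string name;
    ResourceKind kind;
    int assignedRegister = -1; // -1 until a register is allocated
};

struct EntryPoint
{
    std::string name;
    std::vector<Param> uniformParams; // entry-point `uniform` parameters
};

// Assign registers to all global-scope parameters first, then to the
// uniform parameters of each entry point, in the order they were listed.
void generateBindings(std::vector<Param>& globals, std::vector<EntryPoint>& entryPoints)
{
    std::map<ResourceKind, int> nextRegister; // next free register per kind
    auto allocate = [&](Param& p) { p.assignedRegister = nextRegister[p.kind]++; };

    for (auto& p : globals)
        allocate(p);
    for (auto& ep : entryPoints)
        for (auto& p : ep.uniformParams)
            allocate(p);
}
```

With one global texture, a vertex entry point taking one texture, and a fragment entry point taking a texture and a sampler, the three textures receive registers 0, 1, and 2, and the sampler receives register 0.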
2019-01-30	Fixing IR-lowering not properly registering func decl	Yong He

2019-01-29	Add underscores to `AttributeUsage` to signal its preview state.	Yong He

2019-01-29	Add support for user defined attributes.	Yong He

2019-01-28	Merge branch 'master' into yong-fix2	Tim Foley

2019-01-28	Support function parameters of existential (interface) type (#802)	Tim Foley
	* Support function parameters of existential (interface) type
The basic idea here is that you can define a function that takes an interface-type parameter:
```hlsl
interface IThing { void doSomething(); }
void coolFunction(IThing thing) { ... thing.doSomething() ... }
```
and call it with a concrete value that implements the given interface:
```hlsl
struct Stuff : IThing { void doSomething() { /* secret sauce */ } }
...
Stuff stuff;
coolFunction(stuff);
```
The compiler implementation will specialize `coolFunction` based on the concrete type that was actually passed in, resulting in output code along the lines of:
```hlsl
struct Stuff { ... }
void Stuff_doSomething(Stuff this) { /* secret sauce */ }
void coolFunction_Stuff(Stuff thing) { ... Stuff_doSomething(thing); }
```
In terms of implementation, the new specialization approach has been integrated into the existing pass for generic specialization (which has been refactored significantly along the way), because generic specialization can open up opportunities for existential/interface simplification and vice versa, so there is no fixed interleaving of the two passes that can clean up everything.
The new logic therefore subsumes the old code for simplifying existential types (which only worked on local variables) in `ir-existential.{h,cpp}`. The local simplification rules from that implementation have become part of the core specialization pass instead, so that they can open up further transformation opportunities enabled by existential-type simplifications.
The code in place right now only handles the basic case of a function parameter that directly uses an interface type, and not one that wraps up an interface type in an array, structure, etc. Additional simplifications need to be introduced to deal with those cases as well.
* fixup: typos
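Specializing a function per concrete argument type is usually memoized so that repeated calls with the same type share one clone. The C++ sketch below models only that bookkeeping; the `Specializer` class is hypothetical, and its `func_Type` naming merely echoes the `coolFunction_Stuff` example above rather than Slang's real name mangling.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical specialization cache: each (function, concrete type) pair
// is cloned once and the clone is reused on later requests.
class Specializer
{
public:
    // Returns the name of the specialized clone, creating it on first use.
    std::string specialize(const std::string& func, const std::string& concreteType)
    {
        auto key = std::make_pair(func, concreteType);
        auto it = cache_.find(key);
        if (it != cache_.end())
            return it->second;
        std::string clone = func + "_" + concreteType; // illustrative naming only
        ++clonesCreated_;
        cache_.emplace(key, clone);
        return clone;
    }

    int clonesCreated() const { return clonesCreated_; }

private:
    std::map<std::pair<std::string, std::string>, std::string> cache_;
    int clonesCreated_ = 0;
};
```

Calling `specialize("coolFunction", "Stuff")` twice creates one clone; a second concrete type creates a second one.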
2019-01-28	remove line directives in test files	Yong He

2019-01-28	Fix type legalization to correctly handle empty struct fields.	Yong He

2019-01-28	Merge branch 'master' into texture-fix	Yong He

2019-01-28	Feature/bit cast glsl (#808)	jsmall-nvidia
	* First attempt at asint, asuint, asfloat intrinsics. * Test with countbits * Placing GLSL definitions first makes them get picked up. * Some more improvements around asint. * Add support for vector versions of asint/asuint * Fix some typos in asuint/asint intrinsics for GLSL. Simplified and increased coverage of as/u/int tests. * Added bit-cast-double test. Added notional support for asdouble bit casts to GLSL, but couldn't test it because glslang doesn't seem to support the extension. * Try to get double bit casts working; doesn't work because of a block issue. * Only output parens on intrinsic replacement if the return type is not void.
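GLSL exposes scalar bit casts through builtins rather than `as*` intrinsics. The C++ helper below records that mapping as a lookup; the `Scalar` enum and `glslBitCast` function are hypothetical and are not the stdlib definitions this commit added. The builtin names `floatBitsToInt`, `floatBitsToUint`, `intBitsToFloat`, and `uintBitsToFloat` are standard GLSL, and GLSL's `int`/`uint` constructors preserve the bit pattern when converting between signed and unsigned integers.

```cpp
#include <cassert>
#include <string>

// Hypothetical scalar kinds involved in HLSL bit casts.
enum class Scalar { Int, UInt, Float };

// GLSL spelling of a bit cast from `from` to `to`; "" means the cast is an
// identity and no call is needed.
std::string glslBitCast(Scalar from, Scalar to)
{
    if (from == to)
        return "";
    if (from == Scalar::Float)
        return to == Scalar::Int ? "floatBitsToInt" : "floatBitsToUint";
    if (to == Scalar::Float)
        return from == Scalar::Int ? "intBitsToFloat" : "uintBitsToFloat";
    // int <-> uint: the GLSL constructor conversion keeps the bits unchanged.
    return to == Scalar::Int ? "int" : "uint";
}
```

So HLSL `asuint(x)` on a `float` would be emitted as `floatBitsToUint(x)`, and `asfloat(u)` on a `uint` as `uintBitsToFloat(u)`.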
2019-01-25	Add GLSL translation rules for `SampleCmp`, `asint` and `asfloat`.	Yong He

2019-01-25	fix up empty-struct-parameters	Yong He

2019-01-25	Move glsl entry point legalization to later stage of compilation.	Yong He
	This allows generic types to be used in entry point parameters.
2019-01-25	Fix GLSL translation of several Texture* operations (#800)	Tim Foley
	A user found that the `Texture2D<float2>.Load(...)` operation was not being compiled to GLSL properly, such that it returned a `vec4` instead of the expected `vec2`. The GLSL texture-related functions always return (and take) 4-component vectors, and we already have infrastructure in `emit.cpp` for recognizing a `$z` operator in the GLSL intrinsic definition to stand in for an appropriate swizzle based on the number of components in the texture result type. This change just adds that `$z` operator to the GLSL code for several more texture operations (including `Load()`) that are defined on a `Texture*<T>` and that return `T`. This change doesn't try to add additional GLSL translations for texture-related operations (e.g., additional variations like `SampleCmp` that we have defined in the stdlib but not given GLSL translations for). That work still needs to be done.
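A minimal model of what the `$z` operator is described as doing: the GLSL texture call yields a 4-component vector, so the emitted expression is followed by a swizzle chosen from the component count of `T`. `textureResultSwizzle` is a hypothetical helper, and emitting no swizzle for the 4-component case is an assumption about the real emitter.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: swizzle that narrows a 4-component GLSL texture
// result to an element type with `componentCount` components.
std::string textureResultSwizzle(int componentCount)
{
    assert(componentCount >= 1 && componentCount <= 4);
    static const char* const swizzles[] = {".x", ".xy", ".xyz", ""};
    return swizzles[componentCount - 1];
}
```

For the reported `Texture2D<float2>` case, the component count is 2, so the emitted call gains `.xy` and produces a `vec2`.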
2019-01-25	Fixup handling of empty structs in function return types and parameters. (#796)	Yong He
	* Fixup handling of empty structs in function return types and parameters. * Bug fix in legalizeFunc() * More comprehensive empty struct test * Fix `legalizeFieldExtract` for empty struct field. * Add empty struct handling for construct inst
2019-01-24	Support "modern" declaration syntax as an option (#792)	Tim Foley
	* Support "modern" declaration syntax as an option
Fixes #202
This change adds four new declaration keywords:
The `let` and `var` keywords introduce immutable and mutable variables, respectively. They can only be used to declare a single variable at a time (unlike C declaration syntax), and they support inference of the variable's type from its initial-value expression. Examples:
```
let a : int = 1; // immutable, with explicit type and initial-value expression
let b = a + 1;    // immutable, with type inferred
var c : float;    // mutable, with explicit type
var d = b + c;    // mutable, with type inferred
```
These declaration forms can be used wherever ordinary global, local, or member variable declarations appeared before. Right now they do not change rules about what is or is not considered a shader parameter. The `static` modifier should work on these forms as expected, but a `static let` variable is not the same as a `static const`, so an explicit `const` is still needed if you want that behavior.
A `typealias` declaration introduces a named type alias, similar to `typedef`, but with more reasonable syntax. It inherits from the same AST class that `typedef` uses, so all of the code after parsing should be able to treat them as equivalent. To give a simple example:
```
// typedef int MyArray[3];
typealias MyArray = int[3];
```
A `func` declaration introduces a function. Like `typealias` it re-uses the existing AST class, so there is no need for major changes after parsing. A `func` declaration uses a syntax similar to `let` variables for its parameters, and takes the (optional) result type in a trailing position. For example:
```
func myAdd(a: int, b: int) -> int { return a + b; }
```
If a `func` declaration leaves off the return type clause, the return type is assumed to be `void`. The main difference (beyond the trailing return type) is that the parameters of a `func`-declared function are immutable (unless they are `out`/`inout`).
This change doesn't add support for declaring operator overloads with `func`, but that should be added later, and I'd like to make that the only way to declare such operations:
```
func +(left: MyType, right: MyType) -> MyType { ... }
```
The use of `:` for declaring parameter types here means that a function declared with modern syntax currently cannot include HLSL-style semantics on its parameters (or its result). We might consider introducing an `[attribute]`-based syntax for adding semantics to parameters if we think this is important, but for now it is fine to insist that users declare their entry points using traditional syntax.
This change strives to avoid unnecessary changes after parsing, but if the new syntax catches on with users there are some small ways we can take advantage of it for performance. In particular, since `let` declarations and parameters of modern-style functions are immutable, we do not need to generate read/write local temporaries for them during lowering to the IR (technically we can make the same optimization for `const` locals).
In the process of implementing these new forms I also added a few subroutines to help share code better between existing cases in the parser. In particular, parsing of generic parameter lists on declarations that can be generic is now simplified and more unified.
* Fixup: remove leftover debugging code
* fixup: typos
2019-01-24	Fix a regression in geometry shader cross-compilation (#794)	Tim Foley
	The underlying problem here was that legalization of entry point parameters for GLSL eliminates all the parameters to `main()`, but we still left a dangling reference to one of those parameters if it was a geometry shader output stream. The un-parented parameter would lead to an infinite loop in a later IR step, because it would never be reached by the transformation, and thus could never change its status to the one for "visited" instructions. The fix here is to simply replace any references to the GS output stream parameter with an `undefined` instruction in the IR, and then rely on the fact that the downstream GLSL emit logic wouldn't actually reference that value anyway (hence why the dangling reference wasn't originally an issue). I included a basic cross-compilation test case for geometry shaders to try to avoid subsequent regressions like this (Vulkan GS support is one of the most commonly recurring regressions we've had). The comment I put into the IR legalization logic makes it clear that the strategy used there isn't 100% rock-solid anyway (it only works if all the `EmitVertex()` calls come from the shader entry point function, and not from subroutines). Adding a better (more robust) translation strategy for geometry shaders would be a nice bit of future work.
2019-01-23	Fixing GLSL sign function.	Yong He
	fixes #602
2019-01-23	Fix IR emit logic for methods in `struct` types (#791)	Tim Foley
	There was a bug in the logic for emitting initial IR, such that it was neglecting to emit "methods" (member functions) unless they were also referenced by a non-member (global) function, or were needed to satisfy an interface requirement. This would only matter for `import`ed modules, since for non-`import`ed code, anything relevant would be referenced by the entry point so that the problem would never surface. This change fixes the underlying problem by adding a step to the IR lowering pass called `ensureAllDeclsRec` that makes sure that not only global-scope declarations, but also anything nested under a `struct` type gets emitted to the initial IR module. There are also a few unrelated fixes in this PR, which are things I ran into while making the fix: * Deleted support for the (long gone) `IRDeclRef` type in our `slang.natvis` file * Added support for visualizing the value of IR string and integer literals when they appear in the debugger * Fixed IR dumping logic to not skip emitting `struct` and `interface` instructions. Switching those to inherit from `IRType` accidentally affected how they get printed in IR dumps by default. * Fixed up the IR linking logic so that it correctly takes `[export]` decorations into account, so that an exported definition will always be taken over any other (unless the latter is more specialized for the target). I initially implemented this in an attempt to fix the original issue, but found it wasn't a fix for the root cause. It is still a better approach than what was implemented previously, so I'm leaving it in place.
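The linking rule in the last bullet (an `[export]`ed definition wins over other candidates unless another candidate is more specialized for the target) can be written as a comparison. In this C++ sketch, `Candidate`, its integer `targetSpecificity` score, and `pickDefinition` are hypothetical simplifications of how the IR linker ranks definitions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical candidate definition for one mangled symbol.
struct Candidate
{
    std::string id;
    bool exported = false;     // carries an `[export]` decoration
    int targetSpecificity = 0; // higher = more specialized for the target
};

// True if `a` should be preferred over `b`: target specificity decides
// first, and among equally specific candidates an exported one wins.
bool isBetter(const Candidate& a, const Candidate& b)
{
    if (a.targetSpecificity != b.targetSpecificity)
        return a.targetSpecificity > b.targetSpecificity;
    return a.exported && !b.exported;
}

// Picks the winning definition; `candidates` must be non-empty.
const Candidate& pickDefinition(const std::vector<Candidate>& candidates)
{
    assert(!candidates.empty());
    const Candidate* best = &candidates.front();
    for (const auto& c : candidates)
        if (isBetter(c, *best))
            best = &c;
    return *best;
}
```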
2019-01-21	Path simplification/hash mode, plus bug fixes (#788)	jsmall-nvidia
	* Fix memory bug around expanding va_args: the buffer needed space for the terminating 0 * Fix problem with FileWriter defaults being globals, as the memory they allocate will only be freed after return from main; worked around by making StdWriters RefObject-derived and keeping them in scope such that the writers are destroyed before the leak check runs * Added SimplifyPathAndHash mode for CacheFileSystem, which simplifies the path and checks whether the simplified path is in the cache before reading the file (limiting the number of underlying file requests) * Added calcReplaceChar * Renamed DefaultFileSystem to OSFileSystem * Made OSFileSystem convert Windows \ to / on Linux * Simplified logic for caching in CacheFileSystem. * Added pragma-once-c to add an extra test, but also so there is an 'include' directory in preprocessor tests. * Small fixes in pragma once test. * Simplified cache handling path, so that paths/simplified paths are always added. * Improve naming of methods for different caches.
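A simplified path gives the cache a canonical key, so `a/./b/../c.slang` and `a/c.slang` hit the same entry before any file is read. The C++ sketch below performs a purely lexical simplification and converts `\` to `/`, mirroring the `OSFileSystem` bullet. `simplifyPath` is a hypothetical stand-in, not Slang's implementation, and because it never touches the file system it can disagree with the OS when symbolic links are involved.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical lexical path simplifier: normalizes separators and removes
// `.` segments and `name/..` pairs without consulting the file system.
std::string simplifyPath(std::string path)
{
    for (char& c : path)
        if (c == '\\')
            c = '/';

    const bool absolute = !path.empty() && path[0] == '/';
    std::vector<std::string> parts;
    std::stringstream stream(path);
    std::string segment;
    while (std::getline(stream, segment, '/'))
    {
        if (segment.empty() || segment == ".")
            continue;
        if (segment == ".." && !parts.empty() && parts.back() != "..")
            parts.pop_back(); // `name/..` cancels out
        else if (segment == ".." && absolute)
            continue;         // cannot go above the root
        else
            parts.push_back(segment);
    }

    std::string result = absolute ? "/" : "";
    for (std::size_t i = 0; i < parts.size(); ++i)
        result += (i ? "/" : "") + parts[i];
    return result.empty() ? "." : result;
}
```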
2019-01-17	Feature/hash for source identity (#786)	jsmall-nvidia
	* Added COMMAND_LINE_SIMPLE test type * Made how spawning works controllable by the parameter/type SpawnType * Made the break-outside-loop and global-uniform tests run the slangc command line * calcRelativePath -> calcCombinedPath * Add 64-bit version of GetHash. * Add support for hash-based mode for CacheFileSystem.
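In a hash-based mode, a source file's identity can be derived from its contents instead of its path, so two different paths that name the same file map to one cache entry. The commit does not say which function the 64-bit `GetHash` computes; the C++ sketch below uses 64-bit FNV-1a purely as a well-known stand-in, and `fnv1a64` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// 64-bit FNV-1a over the bytes of `bytes` (stand-in for Slang's GetHash).
std::uint64_t fnv1a64(const std::string& bytes)
{
    std::uint64_t hash = 0xcbf29ce484222325ULL; // FNV-1a 64-bit offset basis
    for (unsigned char c : bytes)
    {
        hash ^= c;
        hash *= 0x100000001b3ULL;               // FNV-1a 64-bit prime
    }
    return hash;
}
```

A 64-bit hash makes accidental collisions between distinct files unlikely, but a cache keyed only on the hash still trusts that no collision occurs.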
2019-01-16	Initial support for dynamic dispatch using "tagged union" types (#772)	Tim Foley
	* Initial support for dynamic dispatch using "tagged union" types
Suppose a user declares some generic shader code, like the following:
```hlsl
interface IFrobnicator { ... }
type_param T : IFrobnicator;
ParameterBlock<T> gFrobnicator;
...
gFrobnicator.frobnicate(value);
```
and then they have some concrete implementations of the required interface:
```hlsl
struct A : IFrobnicator { ... }
struct B : IFrobnicator { ... }
```
The current Slang compiler allows them to generate distinct compiled kernels for the case of `T=A` and the case of `T=B`. This means that the decision of which implementation to use must be made at or before the time when a shader gets bound in the application.
This change adds a new ability where the Slang compiler can generate code to handle the case where `T` might be either `A` or `B`, and which case it is will be determined dynamically at runtime. This means a single compiled kernel can handle both cases, and the decision about which code path to run can be made any time before the shader executes.
This new option is supported by defining a tagged union type. Via the API, the user specifies that `T` should be specialized to `__TaggedUnion(A,B)` (the double underscore indicates that this is an experimental and unsupported feature at present). We refer to the types `A` and `B` here as the "case" types of the tagged union. Conceptually, the compiler synthesizes a type something like:
```hlsl
struct TU
{
    union { A a; B b; } payload;
    uint tag;
}
```
The user can then allocate a constant buffer to hold their tagged union type, and when they pick a concrete type to use (say `B`), they fill in the first `sizeof(B)` bytes of their buffer with data describing a `B` instance, and then set the `tag` field to the appropriate 0-based index of the case type they chose (in this case the `B` case gets the tag value `1`).
Actually implementing tagged unions takes a few main steps:
* Type parsing was extended to special-case `__TaggedUnion` as a contextual keyword. This is really only intended to be used when parsing types from the API or command line, and Bad Things are likely to happen if a user ever puts it directly in their code. Eventually construction of tagged unions should be an API feature and not part of the language syntax.
* Semantic checking was extended to recognize that a tagged union like `__TaggedUnion(A,B)` should support an interface like `IFrobnicator` whenever all of the case types support it, as long as the interface is "safe" for use with tagged unions (which means it doesn't use a few of the advanced language features like associated types).
* The IR was extended with instructions to represent tagged union types and to extract their tag and the payload for the different cases as needed.
* IR generation was extended to synthesize implementations of interface methods for any interface that a tagged union needs to support. Right now the implementation is simplistic and only handles simple method requirements, which it does by emitting a `switch` instruction to pick between the different cases.
* A new IR pass was introduced to "desugar" any tagged union types used in the code. The downstream HLSL and GLSL compilers don't support `union`s, so we have to instead emit a tagged union as a "bag of bits" and implement loading the data for particular cases from it manually.
* Final code emit mostly Just Works after the above steps, but we had to introduce an explicit IR instruction for bit-casting to handle the output of the desugaring pass.
There are a bunch of gaps and caveats in this implementation, but that seems reasonable for something that is an experimental feature. The various `TODO` comments and assertion failures in unimplemented cases are intended, so that this work can be checked in even if it isn't feature-complete.
* fixup: missing files
* fixup: typos
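The desugared form described above (a payload stored as a "bag of bits" plus a tag, with each interface method dispatched by a `switch` on the tag) can be sketched in C++. The case types, the `TaggedUnionAB` layout, and the `frobnicate` functions are hypothetical; generated shader code would read the payload words from a constant buffer and bit-cast them rather than call `std::memcpy`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical case types; in Slang both would implement IFrobnicator.
struct A { float scale; };
struct B { std::int32_t offset; float bias; };

// Desugared tagged union: raw words large enough for any case, plus a
// 0-based tag selecting the case (A = 0, B = 1).
struct TaggedUnionAB
{
    std::uint32_t payload[2];
    std::uint32_t tag;
};

static_assert(sizeof(A) <= sizeof(TaggedUnionAB::payload), "payload too small for A");
static_assert(sizeof(B) <= sizeof(TaggedUnionAB::payload), "payload too small for B");

// Loads a case value by copying its bits out of the payload.
template <typename T>
T loadCase(const TaggedUnionAB& u)
{
    T value;
    std::memcpy(&value, u.payload, sizeof(T));
    return value;
}

float frobnicateA(const A& a, float v) { return v * a.scale; }
float frobnicateB(const B& b, float v) { return v + float(b.offset) + b.bias; }

// Synthesized implementation of the interface method: switch on the tag.
float frobnicate(const TaggedUnionAB& u, float v)
{
    switch (u.tag)
    {
    case 0: return frobnicateA(loadCase<A>(u), v);
    case 1: return frobnicateB(loadCase<B>(u), v);
    default: assert(false && "invalid tag"); return 0.0f;
    }
}
```

Choosing `B` then means writing a `B` into the first `sizeof(B)` bytes of the payload and setting `tag` to `1`, matching the description above.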
2019-01-16	Improve handling of {} initializer list expressions (#778)	Tim Foley
	Fixes #775 It was reported (in #775) that Slang doesn't handle initializer-list syntax when initializing matrix variables. When starting on a fix for that it became apparent that the time was right to fix two broad issues in the compiler's current handling of `{}`-enclosed initializer lists. The first issue was that the front-end checking of initializer lists wasn't handling the C-style behavior where an initializer list can either contain nested `{}`-enclosed lists for sub-arrays/-structures, or directly contain "leaf" values for initializing those aggregates. For example, the following two variable declarations ought to be equivalent: ```hlsl int4 a[] = { {1, 2, 3, 4}, {5, 6, 7, 8} }; int4 b[] = { 1, 2, 3, 4, 5, 6, 7, 8 }; ``` Getting this distinction right is important because we want to support initializing a matrix either from a list of vectors for its rows, or a list of scalars for its elements (in row-major order). The front-end semantic checking logic for initializer lists was revamped so that it conceptually tries to "read" an expression of a desired type from the initializer list, and decides at each step whether to consume a single expression by coercing it to the desired type, or to recursively read multiple sub-values to construct the type as an aggregate. The logic for deciding between direct vs aggregate initialization could potentially use some tweaking, but luckily it should always handle the case where users introduce explicit `{}`-enclosed sub-lists to make their intention clear, so that existing Slang code should continue to work as before. The second issue was that initializers without the expected number of elements weren't implemented in code generation, so they would lead to internal compiler errors. This change revamps the codegen logic for initializer lists so that it can synthesize default values for fields/elements that were left out during initialization. 
This includes an attempt to support default initialization of `struct` fields based on explicitly written initialization expressions.
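The "read a value of the desired type" strategy can be sketched in C++ as a recursive reader over a simplified, hypothetical type model (scalars, vectors, and arrays only). A `{}` sub-list always initializes exactly one value, a leaf either initializes a scalar directly or starts brace-elided aggregate initialization, and anything left unspecified defaults to zero. The real checker works on AST expressions and performs type coercions, which this sketch omits.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified type model: scalars plus vector/array aggregates.
struct Type {
    enum Kind { Scalar, Vector, Array } kind;
    std::size_t count;              // element count (Vector/Array only)
    std::shared_ptr<Type> element;  // element type (Vector/Array only)
};

std::shared_ptr<Type> makeScalar() {
    return std::make_shared<Type>(Type{Type::Scalar, 0, nullptr});
}
std::shared_ptr<Type> makeAggregate(Type::Kind kind, std::size_t count,
                                    std::shared_ptr<Type> element) {
    return std::make_shared<Type>(Type{kind, count, std::move(element)});
}

// One entry of an initializer list: a leaf integer or a nested `{}` list.
struct InitItem {
    bool isList = false;
    int leaf = 0;
    std::vector<InitItem> items;
};

InitItem leafItem(int value) { InitItem i; i.leaf = value; return i; }
InitItem listItem(std::vector<InitItem> items) {
    InitItem i; i.isList = true; i.items = std::move(items); return i;
}

// A read position within one `{}` list.
struct Cursor {
    const std::vector<InitItem>& items;
    std::size_t pos;
    explicit Cursor(const std::vector<InitItem>& list) : items(list), pos(0) {}
    bool atEnd() const { return pos >= items.size(); }
};

std::size_t scalarCount(const Type& t) {
    return t.kind == Type::Scalar ? 1 : t.count * scalarCount(*t.element);
}

void readValue(const Type& t, Cursor& c, std::vector<int>& out);

// Reads each element of an aggregate from the same cursor.
void readElements(const Type& t, Cursor& c, std::vector<int>& out) {
    for (std::size_t i = 0; i < t.count; ++i) readValue(*t.element, c, out);
}

// Reads one value of type `t`, appending its scalars in order to `out`.
void readValue(const Type& t, Cursor& c, std::vector<int>& out) {
    if (c.atEnd()) {  // value left out: synthesize a default (zero)
        out.insert(out.end(), scalarCount(t), 0);
        return;
    }
    const InitItem& next = c.items[c.pos];
    if (next.isList) {
        ++c.pos;  // an explicit `{}` sub-list initializes exactly this value
        Cursor sub(next.items);
        if (t.kind == Type::Scalar) readValue(t, sub, out);
        else readElements(t, sub, out);
    } else if (t.kind == Type::Scalar) {
        out.push_back(next.leaf);  // a leaf initializes a scalar directly
        ++c.pos;
    } else {
        readElements(t, c, out);  // brace elision: consume leaves from this list
    }
}

std::vector<int> initialize(const Type& t, const InitItem& init) {
    std::vector<InitItem> top{init};
    Cursor c(top);
    std::vector<int> out;
    readValue(t, c, out);
    return out;
}
```

With this reader, `{ {1, 2}, 5, 6, 7, 8 }` for an `int4[2]` yields `1, 2, 0, 0, 5, 6, 7, 8`: the sub-list fills the first vector (padded with defaults) and the remaining leaves fill the second by brace elision.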
2019-01-16	Feature/external compiler reporting (#776)	jsmall-nvidia
	* Added support for converting SlangResult to string in PlatformUtil. * * Added reportExternalCompilerError * Made external compilers use this * Made DiagnosticSink accept UnownedStringSlice * Made emitXXX compiler functions return SlangError * Use smart pointers to handle lifetime of COM interfaces * * Make SlangResult compatible with HRESULT for some common cases. * Make PlatformUtil::appendResult return SlangResult * Compile check SLANG_RESULT. * Add tests for checking diagnostics from external compilers. * * Make external compiler tests only run on windows for now. * Added 'windows' and 'unix' categories * Added categories based on what backends are available. Will make more tests run on linux and handle case where dxcompiler is not available on appveyor. * * Added spSessionCheckPassThroughSupport * Use it to determine what's available for categories for tests * Add support for outputting source filename/s when using pass through.
2019-01-16	Fix a bug in IR linking (#777)	Tim Foley
	The IR linking logic was recently rewritten to use the (optional) `IRLinkageDecoration`s instead of assuming `IRGlobalVals` always have a mangled name field, and in that process a bug seems to have crept in: when an instruction that would usually qualify as a "global value" does not have linkage, we were failing to register the instruction we create in the output module as a replacement for the original instruction. This problem affects `static` variables inside of functions, potentially causing them to be emitted multiple times.
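The failure mode is easiest to see with a memoizing clone map. In this hypothetical C++ sketch (the `Inst` and `Cloner` types are invented for illustration and are not Slang's IR classes), every clone is registered as the replacement for its original. Skipping that registration for instructions without linkage, which the `registerUnlinked` flag simulates, makes each use of a function-`static` variable clone it again.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an IR instruction that is a "global value".
struct Inst {
    std::string name;
    bool hasLinkage;  // false for e.g. a function-`static` variable
};

struct Cloner {
    std::vector<std::unique_ptr<Inst>> outputModule;      // emitted clones
    std::unordered_map<const Inst*, Inst*> replacements;  // original -> clone
    bool registerUnlinked = true;  // set to false to reproduce the bug

    // Returns the clone of `original` in the output module, creating it on
    // first use and reusing the registered replacement afterwards.
    Inst* cloneValue(const Inst* original) {
        auto found = replacements.find(original);
        if (found != replacements.end()) return found->second;

        outputModule.push_back(std::make_unique<Inst>(*original));
        Inst* clone = outputModule.back().get();

        // The fix: register the replacement whether or not `original` has linkage.
        if (original->hasLinkage || registerUnlinked) replacements[original] = clone;
        return clone;
    }
};
```

With registration, two references to the same unlinked `static` variable yield one clone; without it, the output module ends up with two copies.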
2019-01-16	Add proper IR codegen support for local static const variables (#779)	Tim Foley
	Previously the IR codegen logic was treating function-scope `static const` variables just like `static` variables, which results in them generating less efficient output HLSL/GLSL. This change special-cases function-local `static const` variables with logic that mirrors how we handle global-scope `static const` variables. The approach in this change attempts to find a simpler solution to deal with `static const` variables inside of generic functions than what is currently done for `static` variables in generic functions, but I haven't tested whether that works in practice, so I didn't apply the same approach to the plain `static` case. That would make a good follow-on change. I've included a single test case to demonstrate that with this fix the Slang compiler generates output DXBC that uses an indexable "immediate" constant buffer, whereas without the fix it generates an array in local memory (slow).
2019-01-15	Fix up declaration checking order for enums (#774)	Tim Foley
	The logic in `check.cpp` for declaration checking is very messy and needs to be re-written, but in the interim we need to be careful to avoid any cases where a declaration, or some piece of it, gets redundantly checked multiple times. The way the logic had been working, the different "cases" in an `enum` type were being checked twice, and that meant that any initialization expression for a case would be type-checked the first time (potentially leading to a new AST) and then the checked AST would be checked again. This created a problem if the first round of checking introduced any AST nodes that the checking logic would not expect to see (because the parser cannot possibly produce them). The fix here is to follow the style of the other declaration checking cases, where checking is separated into two distinct phases (the "header" phase makes the declaration usable by others, while the "body" phase checks its implementation details for internal consistency). This change includes a test case that produced an internal compiler error before, and compiles without error now.
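A minimal C++ sketch of the two-phase pattern, assuming a hypothetical per-declaration state field (this is not the actual `check.cpp` code): each phase runs at most once, and requesting the body phase first runs the header phase if needed, so an enum case's initializer can never be checked twice.

```cpp
#include <cassert>

// Hypothetical per-declaration checking state.
enum class CheckState { Unchecked, HeaderChecked, BodyChecked };

struct Decl {
    CheckState state = CheckState::Unchecked;
    int headerRuns = 0;  // counters only so the sketch can show each phase runs once
    int bodyRuns = 0;
};

// "Header" phase: makes the declaration usable by others (e.g. its type).
void ensureHeaderChecked(Decl& d) {
    if (d.state != CheckState::Unchecked) return;
    ++d.headerRuns;
    d.state = CheckState::HeaderChecked;
}

// "Body" phase: checks implementation details such as an enum case's
// initializer expression; it runs the header phase first if needed.
void ensureBodyChecked(Decl& d) {
    ensureHeaderChecked(d);
    if (d.state == CheckState::BodyChecked) return;
    ++d.bodyRuns;
    d.state = CheckState::BodyChecked;
}
```

Repeated or out-of-order requests are harmless because every entry point consults the recorded state before doing any work.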
2019-01-14	Add an error for global uniform parameter declarations (#773)	Tim Foley
	A global uniform parameter in HLSL might canonically be defined like this: ```hlsl uniform float gSomeParameter; ``` The fxc and dxc compilers automatically collect all such parameters into a synthesized constant buffer, along the lines of: ```hlsl cbuffer $Globals { float gSomeParameter; } ``` Slang currently supports parsing and semantic checking of declarations like the above, and computes shader parameter layout/binding information that is appropriate for a constant buffer like `$Globals` above, but it does not include the support to emit HLSL or GLSL code that matches that layout, so use of global uniforms in Slang is silently unsupported. Making this problem worse, the HLSL language is quite lax, and will parse the following as shader parameters as well: ```hlsl int gCounter = 0; const float kScaleFactor = 2.0f; ``` Each of those declarations introduces a global shader parameter, and then provides a default value for it via the initializer. These declarations do not introduce an ordinary global variable or constant as might be expected. (For anybody who wants to know, `static` is required to introduce a "real" global variable (although it will be a thread-local global in practice), while `static const` is required to introduce a global constant.) I was not too worried about users trying to use global-scope uniforms and failing (since that has fallen out of common HLSL/GLSL practice), but the possibility that users might try to declare global variables/constants and get shader parameters by mistake is a greater risk, so this hole is worth plugging. The right long-term fix is of course to support the intended semantics of global-scope uniforms, but that feature needs to be prioritized against other requests. A few of the Slang tests were unwittingly relying on this functionality, including some compute tests that seemingly got away with it based on the DXBC generated from the HLSL output by Slang just happening to match the layout they expected. 
These tests have all been tweaked to use explicit `cbuffer`s or `ParameterBlock`s instead.
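The HLSL rules described above can be summarized as a small classification function. This C++ sketch is illustrative only: the function names are hypothetical, and it assumes the new error applies to non-`static` globals of ordinary data type (those that would need a `$Globals` buffer), not to resource-typed globals such as textures.

```cpp
#include <cassert>

enum class GlobalKind { ShaderParameter, Variable, Constant };

// HLSL meaning of a global-scope declaration, based only on `static`/`const`.
// A plain `const` (without `static`) still declares a shader parameter.
GlobalKind classifyGlobal(bool isStatic, bool isConst) {
    if (!isStatic) return GlobalKind::ShaderParameter;
    return isConst ? GlobalKind::Constant : GlobalKind::Variable;
}

// Hypothetical predicate for the new error: a shader parameter of ordinary
// (non-resource) data type would need a synthesized `$Globals` buffer.
bool needsGlobalsBuffer(bool isStatic, bool isConst, bool isOrdinaryData) {
    return isOrdinaryData &&
           classifyGlobal(isStatic, isConst) == GlobalKind::ShaderParameter;
}
```

Under these rules both `int gCounter = 0;` and `const float kScaleFactor = 2.0f;` classify as shader parameters, which is why they are surprising enough to warrant a diagnostic.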
2019-01-11	Fix some subtle bugs in D3D constant buffer layout (#771)	Tim Foley
	* Fix some subtle bugs in D3D constant buffer layout The root of the issue here is that the D3D constant buffer layout rules require 16-byte alignment for arrays and structures, but they do not round up the size of an array/structure type to account for that alignment. That means that in cases like the following: ```hlsl cbuffer C0 { float3 a[2]; float c0; } struct A { float4 x; float3 y; }; cbuffer C1 { A a; float c1; } ``` The `c0` and `c1` fields get an offset of 28 and not 32 like you might expect if the preceding array/structure field `a` had been padded out to match its 16-byte alignment. The actual fix here is relatively simple, and mostly amounts to shuffling around some code in `type-layout.cpp` to ensure that the D3D constant buffer layout rules don't inherit the logic that was rounding up array/structure sizes. Along the way I took the opportunity to clean up the inheritance hierarchy by making the GLSL-family layout rules not try to share anything with the D3D family (note that there is very little to share), which in turn allowed for some simplification of the GLSL side of things. Fixing this behavior changed the output of a few reflection tests. In the case of `tests/reflection/arrays.hlsl` the output confirmed that we had been producing bad reflection information in these kinds of cases. The output for `tests/reflection/matrix-layout.slang` also showed some bugs in our reflection, but these were overall more minor: we mis-reported the size of certain matrices as 64 bytes instead of 60, and as a result also computed the size of the overall constant buffer as 4 bytes bigger than needed. In all of these cases I double-checked the expected output against dxc to make sure that the new offsets/sizes are what we should have been producing in the first place. 
I also updated the reflection test harness to start outputting layout information for the element type of a structured buffer, which changed the output of `tests/reflection/structured-buffer.slang`, but this didn't show any change in what we reported: it is just information that wasn't in the output to begin with. Finally, I added two new tests around these subtle cases of buffer layout behavior (especially subtle because it varies across target APIs). The `tests/compute/buffer-layout.slang` test simply sets up a type to illustrate the troublesome scenarios and then embeds it in both a constant buffer and structured buffer that will be backed by memory with sequential `int` values. We then read out the value of a field as a way to probe its de facto offset at runtime. This test doesn't really stress the Slang compiler (except for our ability to pass through the same type declarations to downstream compilers), but it is useful to confirm our expectations about where things land in memory. The `tests/reflection/buffer-layout.slang` test then uses the reflection test infrastructure to confirm that the same type declarations used in the compute test produce the expected offsets in our reported reflection information. Before the fixes in this change this test showed us producing dangerously incorrect results in our D3D reflection information, which has now been fixed to match the empirically-determined offsets from the compute test. * fixups based on review feedback
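A simplified C++ model of the D3D constant buffer rules discussed above reproduces the offset of 28 for `c0` and `c1`. It is a sketch, not the code in `type-layout.cpp`: it only handles 32-bit scalars/vectors, arrays, and structs (no matrices), and the `LayoutType` model is hypothetical. Scalars and vectors may not straddle a 16-byte register, arrays and structs start on a register boundary, and array elements are strided by whole registers, but the size of an array or struct is not rounded up.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout model: Basic = scalar/vector of 32-bit components
// (4, 8, 12 or 16 bytes); Array and Struct are aggregates.
struct LayoutType {
    enum Kind { Basic, Array, Struct } kind;
    std::size_t basicSize;                  // Basic only
    const LayoutType* element;              // Array only
    std::size_t count;                      // Array only
    std::vector<const LayoutType*> fields;  // Struct only
};

constexpr std::size_t kRegisterSize = 16;

std::size_t roundUp(std::size_t value, std::size_t alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

// Offset at which a field of type `t` lands if the previous field ended at `offset`.
std::size_t placeField(const LayoutType& t, std::size_t offset) {
    if (t.kind != LayoutType::Basic)
        return roundUp(offset, kRegisterSize);  // arrays/structs start a new register
    const std::size_t first = offset / kRegisterSize;
    const std::size_t last = (offset + t.basicSize - 1) / kRegisterSize;
    return first == last ? offset : roundUp(offset, kRegisterSize);  // no straddling
}

std::size_t sizeOf(const LayoutType& t) {
    switch (t.kind) {
    case LayoutType::Basic:
        return t.basicSize;
    case LayoutType::Array: {
        if (t.count == 0) return 0;
        const std::size_t elem = sizeOf(*t.element);
        // Elements are strided by whole registers, but the last element is
        // not padded: the array size is not rounded up to 16 bytes.
        return (t.count - 1) * roundUp(elem, kRegisterSize) + elem;
    }
    case LayoutType::Struct: {
        std::size_t offset = 0;
        for (const LayoutType* f : t.fields) offset = placeField(*f, offset) + sizeOf(*f);
        return offset;  // likewise not rounded up
    }
    }
    return 0;
}

std::vector<std::size_t> fieldOffsets(const LayoutType& s) {
    std::vector<std::size_t> offsets;
    std::size_t offset = 0;
    for (const LayoutType* f : s.fields) {
        offset = placeField(*f, offset);
        offsets.push_back(offset);
        offset += sizeOf(*f);
    }
    return offsets;
}
```

Treating each `cbuffer` as a struct, `float3 a[2]` occupies bytes 0 to 28 and `struct A` occupies bytes 0 to 28, so the following `float` fits in the remaining 4 bytes of the second register at offset 28.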
2019-01-07	Feature/serialization debug info (#767)	jsmall-nvidia
	* Remove AppContext. Use StdChannels to hold writers, and TestToolUtil to hold test tool specific functionality. * StdChannels -> StdWriters * getStdOut -> getOut, getStdError -> getError * Renamed main.cpp files of tools to try and stop Visual Studio getting confused between files, so that clicking on an error takes the editor to the right location. * Work in progress on being able to serialize debug information. * * Added MemoryStream * First pass converting to IRSerialData * Able to read and write IRSerialData with debug data * Start at reconstructing IR serialized data. * First pass of generating debug SourceLocs from debug data. Works for test set for line nos. * Bug fixes. Moved testing of serialization into IRSerialUtil * Work around a problem where calling irModule = generateIRForTranslationUnit(translationUnit); two times in a row produces different output(!). Fix by creating it only once. * Remove problem with use of ternary op in slang.cpp on gcc/clang. * Added -verify-debug-serial-ir option that makes IR modules go through full serialization with debug information and verification. * Add a test that does serial debug verification that is run by default on linux.
2018-12-20	Feature/lex memory reduction (#762)	jsmall-nvidia
	* Only do scrubbing if needed. When allocating content, try to limit size (with scrubbing each token takes up 1k); now it's 16 bytes min size. * Don't allocate for every call to write on the CallbackWriter - use the m_appendBuffer. * Don't allocate memory for CallbackWriter; use m_appendBuffer. * Use UnownedStringSlice for suffix output for parsing float/int literals. Fix typo in invalidFloatingPointLiteralSuffix * Using memory arena to hold tokens that are not in SourceManager. * Improve comment on lexing. * Make UnownedStringSlice allocation simpler on SourceManager. * Fix error on gcc around UnownedStringSlice - because VC converted string + UnownedStringSlice automatically into a String. * Fix generateName needing concat string for gcc. * When constructing a Token in parseAttributeName - because it's an Identifier, we have to set the Name. * Remove translation through String on getIntrinsicOp * Make func-cbuffer-param disablable with -exclude compatibility-issue * Fix memory leak in render-test. * From review - can just use "?:" instead of performing a concat.
2018-12-19	Refactor several IR passes (#761)	Tim Foley
	* Refactor several IR passes This change takes some IR passes that lived together in `ir.cpp` and moves them into their own files to improve clarity. In most cases these were passes introduced early in the life of the IR, so that it didn't seem like a big deal to have them all in one file, but now that `ir.cpp` has grown unwieldy this seems like an important cleanup to make. To give a quick rundown of the passes involved: * The IR "linking" step has been pulled out to `ir-link.{h,cpp}`. The code for this pass is pretty much identical to what was in `ir.cpp`, and no attempt has been made to clean up or refactor it in the current change. * The GLSL legalization step has been pulled out to `ir-glsl-legalize.{h,cpp}`. This used to be invoked directly from the linking step, but has been made a new top-level pass invoked from `emit.cpp`. Just like with the linking, the code in the new file is just a copy-paste of what was in `ir.cpp`, and no attempt at cleanup has been made. Also note that it might be a good idea to move this pass later in the overall sequence, but this PR doesn't attempt to do that as it could change results. * The generic specialization step has been pulled out to `ir-specialize.{h,cpp}`. The file name does not explicitly reference generic specialization because I anticipate this pass having to perform other kinds of specialization as well. The code in this case amounts to a heavy cleanup/refactoring pass and thus deserves careful scrutiny. The reason for the cleanup is that the generic specialization step used to be part of the "linking" step long ago, and continued to share infrastructure with it long after that stopped making sense. The newly cleaned up pass has much simpler logic that should be easy enough to follow from the comments. 
* In order to reduce code duplication, the IR "cloning" part of the `ir-specialize-resources.{h,cpp}` pass was pulled into its own files (`ir-clone.{h,cpp}`) that both the generic specialization step and the resource-based specialization step now share. The remaining changes then pertain to deleting a bunch of code out of `ir.cpp` and adding the new files to the build. The only test that needed updating was `vkray/raygen`, where some subtle ordering change in the refactored generic specialization logic has led to the relative order of the specialized `TraceRay` and `saturate` functions being reversed. * fixup: typo in assert * fixup: typos in comments
2018-12-18	Fix for byte-address buffers on Vulkan (#760)	Tim Foley
	* Fix output comparison for compute tests There was some vestigial logic there that was first doing a string-based comparison of actual/expected output, and then falling back to a path that parsed the expected output as a float, then converted that to an integer, then printed that integer in hex, and did the comparison with the result of that conversion. I'm not even clear on what that code was trying to accomplish, but its main effect was allowing a test failure to slide by unnoticed because somehow an all-zeroes actual output file was matching an expected output file with no zeros. My understanding is that it went something like this: * The first line of expected output was `A` (as in hexadecimal for the decimal integer `10`), and the first line of actual output was `0`. * The `StringToFloat` function was failing on the input string `"A"` and returned `0.0` to indicate failure (rather than reporting any kind of error) * We then converted the `0.0` to integer `0` and converted it to a base-16 string `"0"` * The comparison to the actual output passed, and then a careless early exit in the comparison loop meant that a full test would pass as soon as one line of output passed according to this "second chance" logic. This change removes the broken code in the test runner since nothing was relying on it, other than the one broken test case we wanted to fix anyway. * Fix the declarations of byte-address buffer methods for Vulkan The HLSL `ByteAddressBuffer` and `RWByteAddressBuffer` types have methods `Load` and `Store` that take byte offsets from the start of the buffer, but we translate them into GLSL that uses a `uint[]` array, so indexing into that array with a byte offset will be off by a factor of four. Somehow the code for mutable byte address buffers was written to add 4, 8, and 12 bytes to the base offset of a vector to get to its subsequent components, but I never thought about the implications this would have for the base address itself. 
This change includes the following fixes: * Any place in the translation of a byte-address `Load` or `Store` method that was using the address/offset value has been changed to use `$1 / 4` instead of `$1`. * The offsets of 4, 8, and 12 have been changed to 1, 2, and 3 since they are now being added to an index instead of a byte offset * The `GetDimensions` methods have introduced a factor of `* 4` to account for the fact that they need to return a byte size and not a count of elements. With this change the existing `byte-address-buffer` test now produces the desired output under Vulkan.
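The corrected byte-offset mapping can be sketched in C++ by simulating the `uint[]` backing array on the CPU. The `ByteAddressBufferAsUints` type and its method names are hypothetical; they mirror `Load4`, `Store2`, and `GetDimensions` with the `$1 / 4` indexing, the +1/+2/+3 component indices, and the `* 4` byte size described above, and they assume 4-byte-aligned offsets.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side model of the GLSL translation: the buffer is a
// `uint[]` array, so element i holds bytes [4*i, 4*i + 4).
struct ByteAddressBufferAsUints {
    std::vector<std::uint32_t> data;

    // Like `Load4(offset)`: the byte offset becomes index `offset / 4`, and
    // the remaining components are at index +1, +2, +3 (not +4, +8, +12).
    std::array<std::uint32_t, 4> load4(std::uint32_t byteOffset) const {
        assert(byteOffset % 4 == 0);
        const std::uint32_t i = byteOffset / 4;
        return {data[i], data[i + 1], data[i + 2], data[i + 3]};
    }

    // Like `Store2(offset, value)`: the same index mapping for a store.
    void store2(std::uint32_t byteOffset, std::uint32_t x, std::uint32_t y) {
        assert(byteOffset % 4 == 0);
        const std::uint32_t i = byteOffset / 4;
        data[i] = x;
        data[i + 1] = y;
    }

    // Like `GetDimensions`: reports a size in bytes, so the element count is scaled by 4.
    std::uint32_t getDimensions() const {
        return static_cast<std::uint32_t>(data.size()) * 4;
    }
};
```

For example, a `Load4` at byte offset 16 reads elements 4 through 7 of the `uint` array.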
2018-12-17	First step toward supporting use of interfaces as existential types (#716)	Tim Foley
	* First step toward supporting use of interfaces as existential types Traditional generics involve universal quantification. E.g., a declaration like: ``` void drive<T : IVehicle>(T vehicle); ``` indicates that for all types `T` that implement the `IVehicle` interface, the `drive()` function is available. In contrast, when directly using an interface type like: ``` IVehicle v = ...; v.doSomething(); ``` we only know that there exists some concrete type (we could call it `E`) such that `v` refers to a value of type `E`, and `E` implements the `IVehicle` interface. In order to perform an operation like `v.doSomething()` we need to "open" the existential value so that we can look at the concrete type and how it implements the `IVehicle.doSomething` requirement. This change adds a very explicit representation of existentials to Slang's IR. An operation like `e = makeExistential(v, w)` creates a value of some existential type (interfaces being our only existential types for now), by wrapping a concrete value `v` (the type of `v` can be seen as an implicit operand) and a witness table `w` showing that the type of `v` implements the requirements of the chosen interface type. In turn, opening of an existential is handled with operations `extractExistential{Value|Type|WitnessTable}` which pull the corresponding piece of information out of a value of existential type (which somewhere in the code had to have been created with `makeExistential`). The change includes a trivial simplification pass that can detect cases where an `extractExistential` operation is applied directly to a `makeExistential` operation, so that there is only one possible result that could be extracted. This allows for simplification of existential types used in trivial ways for local variables (this is mostly so I can check in a functional test, rather than to actually support useful code involving interfaces right now). 
The logic in the semantic checking phase of the compiler is comparatively more complex. When we are about to perform member lookup given an expression like `obj.member` we will first check if `obj` has an existential type, and if it does we will construct a suitable local context in which we extract the value, type, and witness table from the existential (these all become explicit AST expression nodes), and then use the extracted value as the base of the lookup operation. The nature of existential values is that two different values with the same existential (interface) type could wrap concrete values with different types, so we need to carefully refer only to the extracted type/value/witness-table of specific values. We handle this right now by conceptually moving the existential-type value into a local variable (by introducing a `LetExpr` that amounts to `let v = <init> in <body>`) and then require that the extract expressions must refer to the (immutable) variable declaration from which they are extracting a value. (Eventually we should expand this so that when using an immutable local variable of existential type we just use that variable as-is rather than introduce a new temporary.) A simple test case is included that uses an interface type in an almost trivial way for a local variable; this test can be run and produces the expected results. A more complex test case that passes an existential into a function is included, but left disabled because a more aggressive simplification approach is required to generate working code from it. * Add missing file for expected test output * Fixups for merge from top-of-tree
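The trivial simplification pass amounts to a peephole rule: an extraction applied directly to a `makeExistential` just returns the matching operand. Here is a C++ sketch over a hypothetical miniature IR (not Slang's `IRInst` hierarchy); it covers the value and witness-table extractions only, since this toy IR has no types to extract.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical miniature IR: an opcode plus operand pointers.
enum class Op {
    Value,
    WitnessTable,
    MakeExistential,                 // operands: value, witness table
    ExtractExistentialValue,         // operand: existential
    ExtractExistentialWitnessTable,  // operand: existential
};

struct Inst {
    Op op;
    std::string name;
    std::vector<const Inst*> operands;
};

// Peephole rule: extractExistential*(makeExistential(v, w)) folds to v or w.
// Anything else (e.g. an existential passed in from elsewhere) is left alone.
const Inst* simplify(const Inst* inst) {
    const bool isExtract = inst->op == Op::ExtractExistentialValue ||
                           inst->op == Op::ExtractExistentialWitnessTable;
    if (!isExtract) return inst;
    const Inst* existential = inst->operands[0];
    if (existential->op != Op::MakeExistential) return inst;
    return inst->op == Op::ExtractExistentialValue ? existential->operands[0]
                                                   : existential->operands[1];
}
```

An extraction from an existential whose origin is not visible (such as a function parameter) stays as-is, which is why the function-call test case needs a more aggressive approach.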
2018-12-17	Specialize away resource-type function parameters (#759)	Tim Foley
	* Specialize away resource-type function parameters Work on #397. Introduction ------------ Suppose a user writes a function that takes a resource type as a parameter: ```hlsl float4 getThing(RWStructuredBuffer<float4> buffer, int index) { return buffer[index]; } ``` This function creates challenges when generating code for GLSL-based targets, because a global shader parameter of type `RWStructuredBuffer`: ```hlsl RWStructuredBuffer<float4> gBuffer; ``` translates to a global GLSL `buffer` declaration: ```glsl buffer _S0 { float4 _data[]; } gBuffer; ``` There is no equivalent to that `buffer` declaration that can be used in function parameter position, and it is illegal in GLSL to pass `gBuffer` into a function. (Aside: yes, we could in principle translate a function parameter like `RWStructuredBuffer<float4> buffer` to `float4 buffer[]`, but that will not in turn generalize to arrays of structured buffers; it is a dead-end strategy.) The solution employed by many shader compilers is to "inline everything" to eliminate the need for parameters of resource types, and then rely on dataflow optimization to eliminate locals of resource types. This strategy can of course lead to an increase in code size, and it also means that call stacks are lost when doing step-through debugging. Another serious issue is that an "early `return`" from a function can turn into the equivalent of a multi-level `break` when inlined, and not all of our targets support multi-level `break`. The solution implemented in this change works around some, but not all, of the problems with full inlining. The approach here generates specialized versions of a function like `getThing`, adapted to the actual arguments provided at different call sites. Thus if we have code like: ```hlsl RWStructuredBuffer<float4> gA; RWStructuredBuffer<float4> gB[10]; ... 
getThing(gA, x); getThing(gA, y); getThing(gB[someVal], z); ``` we will generate two specializations of `getThing`: one specialized for the `buffer` parameter being `gA` and the other for `gB`: ```hlsl float4 getThing_gA(int index) { return gA[index]; } float4 getThing_gB(int _val, int index) { return gB[_val][index]; } ``` and the call sites will change to match: ```hlsl getThing_gA(x); getThing_gA(y); getThing_gB(someVal, z); ``` Note how in the case where the argument being passed in was obtained by indexing into an array of resources, the callee is specialized to the identity of the global shader parameter (`gB`), and now accepts a new parameter to indicate the array index into it. While this description motivates the change based on GLSL output, the same basic issue can arise for other targets. For example, while current HLSL has added the `ConstantBuffer<T>` type, it is not supported on older targets, and it turns out that even dxc does not allow functions to have `ConstantBuffer<T>` parameters. Longer-term, we will likely need to do even more aggressive specialization both in order to generate SPIR-V output directly, and also to deal with functions that have return values or `out` parameters of resource types. Implementation -------------- The meat of the change is in `ir-specialize-resources.{h,cpp}`, where we have a pass that looks at all call sites (`IRCall` instructions) in the program, and attempts to replace them with calls to specialized functions, where the specializations are generated on-demand. The code in this pass is heavily commented, so hopefully it serves to explain itself all right. After specialization is complete, we may still have functions like the original `getThing` that will produce invalid code when emitted as GLSL, so we need a way to make sure they don't appear in the output. 
To date we've had some very ad hoc approaches for ignoring IR constructs that we don't want to affect emitted code, but this change goes ahead and adds a proper dead code elimination (DCE) pass in `ir-dce.{h,cpp}`. This pass follows a straightforward approach of tagging instructions that are "live" and then propagating liveness through the whole program, before making a single pass to delete anything that isn't live. When I first added the DCE pass it eliminated everything because there were no "roots" for liveness. I solved this for now by adding a new decoration, `IREntryPointDecoration`, to mark shader entry points in the IR which should always be live (as should anything they depend on). A secondary problem that arose was that for GLSL ray tracing shaders it is possible for the incoming/outgoing payload or attribute parameters to be unused, but eliminating them as dead would change the signature of a shader and potentially break the rules for how ray tracing programs communicate. I added a very simple `IRDependsOnDecoration` that allows one IR instruction to keep another alive as if it used it, without actually using it. There's also a fixup in the IR dumping logic where I was forgetting to store anything in the mapping from instructions to their names, so that the name of an instruction was getting incremented each time it was referenced. Testing ------- There are three different tests added as part of this change: * The `compute/func-resource-param` test covers the basic `RWStructuredBuffer` case above, which we expect to work fine for D3D11/12, but fail for Vulkan without specialization. * The `cross-compile/func-resource-param-array` test covers the case where we don't just have one resource, but an array of them. This is not an end-to-end compute test primarily because our `render-test` application doesn't yet handle arrays of resources correctly in its binding logic. 
* The `compute/func-cbuffer-param` test covers the case of a function with a `ConstantBuffer<T>` parameter, which requires specialization to become valid for any of our targets. * fixup: warnings/errors from other compilers * fixup: typos and cleanup * fixup: typos
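The mark-and-sweep structure of the DCE pass can be sketched in C++ as follows. `Node` is a hypothetical stand-in for an IR instruction, not the code in `ir-dce.cpp`; `isEntryPoint` plays the role of `IREntryPointDecoration` as a liveness root, and `dependsOn` plays the role of `IRDependsOnDecoration` by keeping an otherwise unused value (such as a ray payload) alive.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an IR instruction.
struct Node {
    bool isEntryPoint = false;     // liveness root, like IREntryPointDecoration
    std::vector<Node*> operands;   // ordinary uses
    std::vector<Node*> dependsOn;  // keep-alive edges, like IRDependsOnDecoration
};

// Mark: propagate liveness from the roots. Sweep: keep only live nodes,
// preserving their original order.
std::vector<Node*> eliminateDeadCode(const std::vector<Node*>& module) {
    std::unordered_set<Node*> live;
    std::vector<Node*> worklist;
    auto markLive = [&](Node* n) {
        if (live.insert(n).second) worklist.push_back(n);
    };

    for (Node* n : module)
        if (n->isEntryPoint) markLive(n);

    while (!worklist.empty()) {
        Node* n = worklist.back();
        worklist.pop_back();
        for (Node* m : n->operands) markLive(m);
        for (Node* m : n->dependsOn) markLive(m);
    }

    std::vector<Node*> kept;
    for (Node* n : module)
        if (live.count(n) != 0) kept.push_back(n);
    return kept;
}
```

Without any entry-point roots the sketch deletes everything, which matches the first symptom described above.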
2018-12-14	Represent global shader parameters explicitly in the IR (#756)	Tim Foley
	Before this change, global shader parameters were represented in the IR as just being ordinary global variables. The only indication that a particular global represented a parameter was when it got a layout attached to it (as part of back-end processing), and we've had a number of bugs related to layouts being dropped so that what should have been a shader parameter turned into an ordinary global variable in the output. This change is more strongly motivated by the fact that making shader parameters look like globals means that we cannot easily reason about their value when doing IR transformations. If we see two `load`s from the same global variable, can we assume they yield the same value? In the general case we cannot, and this means that any transformation that wants to rely on the fact that an input `Texture2D` shader parameter can't actually change over the life of the program needs to do extra work. The fix here is to introduce a new kind of IR instruction that represents a global shader parameter directly (not a pointer to it as a global would), at which point there isn't even such a notion as a "load" from the parameter, since it represents the value directly. In several cases, logic that used to apply to global variables in case they were shader parameters (by looking for a layout) is now moved to apply to these global parameters. The biggest source of issues in this change was that switching from pointers to plain values to represent these shader parameters stresses different cases in type legalization. I also had to deal with the case of legalization for GLSL where we actually do need global shader parameters that are writable (since varying output goes in the global scope), but in that case I borrowed the use of pointer-like `Out<...>` and `InOut<...>` types to represent that intent, which we were already using for function parameters representing outputs. 
A few tests started failing because the changes led to a slightly different order of code emission, which in some HLSL tests resulted in a function parameter named `s` getting emitted before a global parameter named `s`, leading to the latter getting the name `s_1` instead of `s_0`. A few SPIR-V tests started failing because the new approach means that we no longer end up performing a load from all varying input parameters at the start of `main` and instead reference the varying inputs directly. The resulting code is more idiomatic, but it differed from the baselines for those tests.
2018-12-13	Move mangled name out of IRGlobalValue (#752)	Tim Foley
	* Move mangled name out of IRGlobalValue Previously the `IRGlobalValue` type was used as a root for all IR instructions that can have "linkage," in the sense that a definition in one module can satisfy a use in another module. The mangled symbol name was stored in state directly on each `IRGlobalValue`, which created some complications, and also forced IR instructions that wanted to support linkage to wedge into the hierarchy at that specific point. This change moves the mangled name out into a decoration: either an `IRImportDecoration` or an `IRExportDecoration`, both of which inherit from `IRLinkageDecoration` which exposes the mangled name. This change has a few benefits: * We can now have any kind of instruction be exported/imported, without having to inherit from `IRGlobalValue`. This could potentially let `IRStructType` and `IRWitnessTable` be simplified to just have operand lists instead of dummy children as they do today. * We can now easily have "global values" like functions that explicitly don't get linkage, instead of using a null or empty mangled name as a marker. * We can use the exact opcode on a linkage decoration to distinguish imports from exports, which could be used to more accurately resolve symbols during the linking step. Other than adding the decorations and making sure that AST->IR lowering adds them, the main changes here are around any code that used `IRGlobalValue`. Variables and parameters of type `IRGlobalValue` were changed to `IRInst` easily, so the main challenge was around code that casts to `IRGlobalValue`. In cases where a cast to `IRGlobalValue` also performed a test for the mangled name being non-null/non-empty, we simply switched the code to check for the presence of an `IRLinkageDecoration`, since that is the new way of indicating a value with linkage. Most of the serious complications arose in `ir.cpp` around the "linking"/target-specialization and generic specialization steps. 
The "linking" logic was checking for `IRGlobalValue` to opt into some more complicated cloning logic, and just checking for a linkage decoration here wasn't sufficient since the front-end does produce global values without linkage in some cases (e.g., for a function-`static` variable we produce a global variable without linkage). This logic was updated to check directly, by opcode, for the cases that used to amount to `IRGlobalValue`s. It might be simpler in the short term to have kept `IRGlobalValue` around to make the existing casts Just Work, but I'm confident that this logic could actually be rewritten for much greater clarity and simplicity, and that is the better way forward. The generic specialization logic was using some really messy code to generate a new mangled name to represent the specialized symbol, and then searching for an existing match for that name. The original idea there was that an IR module could include "pre-specialized" versions of certain generics to speed up back-end compilation by eliminating the need to specialize in some cases, but this feature has never been implemented, so the overhead here is just a waste. Instead, I moved generic specialization to use a simpler dictionary that maps the operands of a `specialize` instruction to the resulting specialized value. This allows for some simplifications in the name mangling logic, because it no longer needs to figure out how to produce mangled names from IR instructions representing types/values. As part of this change I also overhauled the IR emit logic to produce cleaner output by default, borrowing some of the ideas from the logic in `emit.cpp`. IR values are now automatically given names based on their "name hint" decoration, if any, to make the code easier to follow, and I also made it so that types and literals get collapsed into their use sites in a new "simplified" IR dump mode (which is currently the default, with no way to opt into the other mode without tweaking the code). 
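The dictionary-based specialization can be sketched as memoization keyed on the operand list of a `specialize` instruction (the generic followed by its arguments). `Value`, `OperandListLess`, and `SpecializationCache` are hypothetical names, and the real work of cloning and substituting the generic is replaced by a placeholder allocation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an IR value; only its identity matters here.
struct Value { int id; };

// Orders operand lists element by element; std::less gives a total order on pointers.
struct OperandListLess {
    bool operator()(const std::vector<Value*>& a, const std::vector<Value*>& b) const {
        return std::lexicographical_compare(a.begin(), a.end(), b.begin(), b.end(),
                                            std::less<Value*>());
    }
};

// Memoized specialization: the key is the generic plus its arguments,
// so no mangled name has to be constructed to find an earlier result.
class SpecializationCache {
public:
    Value* specialize(Value* generic, const std::vector<Value*>& args) {
        assert(generic != nullptr);
        std::vector<Value*> key{generic};
        key.insert(key.end(), args.begin(), args.end());

        auto found = cache_.find(key);
        if (found != cache_.end())
            return found->second; // reuse the earlier specialization

        // Placeholder for cloning the generic's body with the arguments substituted.
        storage_.push_back(std::make_unique<Value>(Value{nextId_++}));
        Value* result = storage_.back().get();
        cache_.emplace(std::move(key), result);
        return result;
    }

    std::size_t size() const { return cache_.size(); }

private:
    std::map<std::vector<Value*>, Value*, OperandListLess> cache_;
    std::vector<std::unique_ptr<Value>> storage_;
    int nextId_ = 1000;
};
```

Requesting the same generic with the same arguments twice yields the same specialized value, while different arguments produce a new entry.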
The resulting IR dumps are much nicer to look at, but as a result the one test that involves IR dumping (`ir/string-literal`) doesn't really test what it used to. One weird issue that came up during testing is that the `transitive-interface` test had previously been producing output that made no sense (that is, the expected output file wasn't really sensible), and somehow these changes were altering its behavior. Changing the test to use `int` values instead of `float` was enough to make the output match what I'd expect, and hand inspection of the generated DXBC has me convinced we were compiling the `float` case correctly too. There appears to be some issue around tests with floating-point outputs that we should investigate. * fixup: C++ declaration order
2018-12-11	Decorations are instructions (#748)	Tim Foley
	* Make a test case use IR serialization * Make all IR instructions usable as parents This makes it so that every `IRInst` has the list of children that used to be on `IRParentInst` and eliminates `IRParentInst`. Most places in the code were only checking against `IRParentInst` so that they could know whether there were child instructions to iterate over. This change bloats the size of every instruction by two pointers, but we hope to be able to eliminate that overhead with a better encoding later. * Change IR decorations to be instructions. The main change here is that `IRDecoration` now inherits from `IRInst`, and `IRInst` now has a single linked list that holds both decorations and children. At each point where code used to loop over `getChildren()` on an `IRInst`, I checked whether it made sense to leave the operation as processing just the children, or if it should process both decorations and children. The thorniest bit was making sure the logic for inserting an instruction into a parent is correct. For the most part, once IR code is built all insertions are explicitly before/after another instruction, so the ordering can't get messed up. The sticking point is any code that does an explicit `insertAtStart` or `insertAtEnd`, but I surveyed those to make sure they are correct in context, and I also made all insertions bottleneck through one routine that does a better job of asserting the preconditions than what was there before. We may still want a "smart" insertion function at some point so that if somebody does `someDecoration->insertAtEnd(someInst)` the decoration intelligently goes to the end of the decoration list, and not the entire decorations-and-children list. All of the existing decoration types were refactored to provide accessors for their operands, rather than directly exposing fields. In most cases the operands are required to be `IRConstant` nodes of fixed types. 
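The commit leaves a "smart" insertion routine as possible future work. Assuming, as the text implies, that all decorations precede the first ordinary child in the combined list, such a routine could look like the hypothetical `Parent::appendDecoration` below. `Node` and `Parent` are illustrative types, not Slang's `IRInst`, and a `std::list` stands in for the intrusive linked list.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical model: decorations and children share one ordered list,
// with every decoration kept before the first ordinary child.
struct Node {
    bool isDecoration;
    std::string name;
};

class Parent {
public:
    // Ordinary children go at the end of the combined list.
    void appendChild(const Node& child) {
        assert(!child.isDecoration);
        items_.push_back(child);
    }

    // "Smart" insertion: a new decoration lands after the last existing
    // decoration, not at the end of the whole decorations-and-children list.
    void appendDecoration(const Node& decoration) {
        assert(decoration.isDecoration);
        auto it = items_.begin();
        while (it != items_.end() && it->isDecoration)
            ++it;
        items_.insert(it, decoration);
    }

    const std::list<Node>& items() const { return items_; }

private:
    std::list<Node> items_;
};
```

Routing every decoration insert through one routine like this keeps the decorations-first invariant in a single place, which is the same motivation the commit gives for bottlenecking all insertions.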
Not all of these types need to be kept around in the new approach, but they were left in so that as much existing code as possible can be kept working. The `IRBuilder` was extended with factory functions to create the various decoration types and attach them. All the fields in concrete decorations that were using `StringRepresentation` or `Name` pointers are now using IR-level string operands, which provide their value as an `UnownedStringSlice`, so logic that was working with those decoration values needed to be updated here and there. I also needed to add the logic to clone string-literal values to the IR cloning pass, since they are now being used in almost every piece of code. A new type of constant IR instruction for literal pointers was added, to handle the cases where an IR decoration needs an operand that is a raw AST-level pointer. These are even being serialized, although we obviously should not rely on them to round-trip through serialization in the future. Ideally, a follow-on change should add a cleanup pass that removes any decorations from a module that shouldn't be allowed in the serialized code. The biggest overall cleanup is in the serialization logic, where a lot of code just disappears because it can process the raw "decorations and children" list as the logical children of an IR instruction. The only special cases left are literals (which seem like they will always need special-casing) and global values (because they have a mangled name, which we plan to move into a decoration). One other example of a simplification made possible by this change: the `IRNotePatchConstantFunc` instruction was implemented as an instruction only because it couldn't be encoded as a decoration at the time (it needed to have an operand that referenced an IR function). Now that decorations are instructions with operands, it can be represented as a decoration instead. 
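To illustrate the accessor style, here is a hypothetical decoration wrapper that reads its name from a string-literal operand instead of a dedicated field. `std::string_view` plays the role of a non-owning slice such as `UnownedStringSlice`; `Inst` and `NameHintDecoration` are sketches, not Slang's definitions.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical simplified instruction: every instruction, decorations included,
// keeps its data in operands rather than in dedicated C++ fields.
struct Inst {
    enum class Op { StringLit, NameHintDecoration };
    Op op;
    std::vector<Inst*> operands;
    std::string stringValue; // payload used only by StringLit
};

// Accessor-style wrapper for a decoration whose single operand must be a string literal.
struct NameHintDecoration {
    Inst* inst;

    // Returns a non-owning view of the literal's text (valid while the literal lives).
    std::string_view getName() const {
        assert(inst->op == Inst::Op::NameHintDecoration);
        const Inst* operand = inst->operands.at(0);
        assert(operand->op == Inst::Op::StringLit);
        return operand->stringValue;
    }
};
```

Because the name is just an operand, generic code that walks operands (cloning, serialization) handles it without decoration-specific logic.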
The IR dumping logic was also updated (which meant a change to the `ir/string-literal` test) to try to make it print out all decorations a bit more systematically now that they are encoded like other instructions. The formatting isn't quite perfect, but it is good enough to be able to read what is going on. I didn't include updates to the validation logic to ensure that decorations are being added in ways that follow the invariants, but that would be a nice thing to add next. * fixup: 64-bit issues * fixup: forward declaration issues
2018-12-07	Change how buffers are emitted (#741)	Tim Foley
	* Change how buffers are emitted This is a change with a lot of pieces, which can't always be separated out cleanly. I'm going to walk through them in what I hope is a logical order. The main goal of this change was to allow arrays of structured buffers to translate to Vulkan. Consider two declarations of structured buffers in HLSL/Slang: ```hlsl StructuredBuffer<X> single; StructuredBuffer<Y> multiple[10]; ``` The existing translation logic handled `single` by translating it into an unnamed GLSL `buffer` block like: ```glsl layout(std430) buffer _S1 { X single[]; }; ``` That syntax allows an expression like `single[i]` in Slang to be translated simply as `single[i]` in GLSL. But that naive translation doesn't work for `multiple`, since we need to declare an array of blocks in GLSL, which requires giving the whole thing a name: ```glsl layout(std430) buffer _S2 { Y _data[]; } multiple[10]; ``` Now a reference to `multiple[i][j]` in Slang needs to become `multiple[i]._data[j]` in GLSL. To avoid having way too many special cases around single structured buffers vs. arrays, it makes sense to always emit things in the latter form, so that we instead lower `single` as: ```glsl layout(std430) buffer _S1 { X _data[]; } single; ``` Now a reference to `single[i]` becomes `single._data[i]` in GLSL. Most of that can be handled in the standard library translation of the structured buffer indexing operations. The only wrinkle there is that there were some old special-case instructions in the IR intended to handle buffer load/store operations (these were added back when I was trying to keep the "VM" path working). These aren't really needed to make structured-buffer operations work; they can be handled as ordinary functions as far as the stdlib is concerned. I removed the old instructions. Along the way, it became clear that a few other cases follow the same pattern. Byte-addressed buffers are an obvious case. 
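The uniform structured-buffer lowering can be summarized as two small string-emitting helpers. These are hypothetical sketches of the output shape only (the real emitter works on IR, not on strings), and `arrayCount == 0` is an assumed convention for "not an array".

```cpp
#include <cassert>
#include <string>

// Emit a named GLSL block with a `_data` array; arrays of buffers only add
// an array suffix, so single buffers and arrays share one shape.
std::string emitStructuredBufferDecl(const std::string& blockName,
                                     const std::string& elementType,
                                     const std::string& varName,
                                     int arrayCount) {
    assert(arrayCount >= 0);
    std::string decl = "layout(std430) buffer " + blockName + " { " +
                       elementType + " _data[]; } " + varName;
    if (arrayCount > 0)
        decl += "[" + std::to_string(arrayCount) + "]";
    return decl + ";";
}

// Element access: `bufferExpr` already names one buffer ("single" or
// "multiple[i]"), and indexing always goes through its `_data` field.
std::string emitElementAccess(const std::string& bufferExpr,
                              const std::string& indexExpr) {
    return bufferExpr + "._data[" + indexExpr + "]";
}
```

With one shape for both cases, `single[i]` and `multiple[i][j]` are handled by the same rule, which is what keeps the special cases down.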
We were lowering HLSL/Slang: ```hlsl ByteAddressBuffer b; ... uint x = b.Load(0); ``` to GLSL like: ```glsl layout(std430) buffer _S1 { uint b[]; }; ... uint x = b[0]; ``` That logic would fail for arrays the same way that the structured buffer case was failing. The fix is the same: use named `buffer` blocks and then introduce an explicit `_data` field: ```glsl layout(std430) buffer _S1 { uint _data[]; } b; ... uint x = b._data[0]; ``` Just like with structured buffers, all of the VK translation for operations on byte-addressed buffers can be implemented directly in the stdlib, so once the emit logic was changed it was just a matter of adding `._data` to a bunch of VK translations. It turns out that arrays of constant buffers have more or less the same problem, and furthermore we have some problems with any code that directly uses the modern HLSL `ConstantBuffer<T>` type. Note: the emit logic around constant buffers sometimes refers to "parameter groups" because that term is used in the compiler as a catch-all for constant buffers, texture buffers, and parameter blocks. The existing code was going out of its way to reproduce the way that constant buffer declarations are implicitly referenced in HLSL: ```hlsl cbuffer C { float f; } ... float tmp = f; // No reference to `C` here ``` This can be seen in the emit logic with the `isDerefBaseImplicit` function, which is used to take the internal IR representation for a reference to `f` (which is closer to the expression `(*C).f` or `C->f`) and leave off any reference to `C` so that we emit just `f`. That kind of logic just flat out doesn't work in some important cases. Arrays of constant buffers are a clear one: ```hlsl ConstantBuffer<X> cbArray[3]; ... X x = cbArray[0]; ``` There is no way to translate that to an ordinary `cbuffer` declaration at all. The same problem can be created without arrays, though: ```hlsl ConstantBuffer<X> singleCB; ... 
X x = singleCB; ``` The old strategy for translating constant buffers turned `singleCB` into a `cbuffer` declaration that reproduced the fields of `X` as its members, which just wouldn't work: ```hlsl cbuffer singleCB { float f; // field of `X` } ... X x = singleCB; // ERROR: there is nothing named `singleCB` in this HLSL ``` The new strategy is more consistent. We still generate a `cbuffer` declaration for a single constant buffer, but we always give it a single field of the chosen element type: ```hlsl cbuffer singleCB { X singleCB; } ... X x = singleCB; // this works fine! ``` And in the array case we generate code that uses the explicit `ConstantBuffer<T>` type: ```hlsl ConstantBuffer<X> cbArray[3]; ... X x = cbArray[0]; ``` The GLSL output is more complicated because, unlike HLSL, GLSL has no implicit conversion from a uniform block to its element type (there is no notion of an element type). The array case thus needs a `_data` field similar to what we do for structured buffers: ```glsl layout(std140) uniform _S3 { X _data; } cbArray[3]; ... X x = cbArray[0]._data; ``` And then the non-array case needs to have a similar `_data` field for consistency: ```glsl layout(std140) uniform _S1 { X _data; } singleCB; ... X x = singleCB._data; ``` This is handled by inserting the necessary reference to `_data` whenever we dereference a constant buffer, either as part of a load instruction (loading from the whole CB as a pointer), or an `IRFieldAddress` instruction which forms a pointer into the CB (e.g., `&(singleCB->f)` becomes `singleCB._data.f`). The old emit logic handled `ParameterBlock<X>` differently from `ConstantBuffer<X>`, but really only to allow parameter blocks to be explicitly named in the output, while constant buffers were left implicit by default. 
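The per-target constant-buffer declarations and the `_data` dereference rule can be sketched as string-emitting helpers. `Target`, `emitConstantBufferDecl`, and `emitConstantBufferDeref` are hypothetical names; `arrayCount == 0` is an assumed convention for a single buffer, and `blockName` stands for a generated temporary such as `_S1` (used only for GLSL).

```cpp
#include <cassert>
#include <string>

enum class Target { HLSL, GLSL };

std::string emitConstantBufferDecl(Target target, const std::string& elementType,
                                   const std::string& name, int arrayCount,
                                   const std::string& blockName) {
    assert(arrayCount >= 0);
    const std::string arraySuffix =
        arrayCount > 0 ? "[" + std::to_string(arrayCount) + "]" : "";
    if (target == Target::HLSL) {
        if (arrayCount > 0) // arrays use the explicit ConstantBuffer<T> type
            return "ConstantBuffer<" + elementType + "> " + name + arraySuffix + ";";
        // A single buffer gets exactly one field, named like the buffer itself.
        return "cbuffer " + name + " { " + elementType + " " + name + "; }";
    }
    // GLSL: always a named uniform block wrapping a single `_data` field.
    return "layout(std140) uniform " + blockName + " { " + elementType +
           " _data; } " + name + arraySuffix + ";";
}

// Dereferencing a constant buffer: GLSL needs an explicit `._data`, HLSL does not.
std::string emitConstantBufferDeref(Target target, const std::string& bufferExpr) {
    return target == Target::GLSL ? bufferExpr + "._data" : bufferExpr;
}
```

A field access such as `&(singleCB->f)` then only appends `.f` to the dereferenced expression, giving `singleCB._data.f` for GLSL.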
Thus the only difference was a legacy one (from back when trying to exactly reproduce the HLSL text we got as input was considered an important goal), and the new approach to emitting constant buffers would get rid of it. I removed the separate logic for emitting `ParameterBlock<X>` and just let the handling for constant buffers deal with it. Note that any resource types inside of a `ParameterBlock<X>` would have been moved out as part of legalization, so that a parameter block is 100% equivalent to a constant buffer when it comes time to emit code. Unsurprisingly, changing the way we generate HLSL and GLSL output for all these buffer types broke any tests that were directly comparing the output of `slangc` against `fxc`, `dxc`, or `glslang`. The basic approach to fixing the breakage in GLSL tests was to update the GLSL baseline to reflect the new output strategy. In some cases I used macros to name the various `_S<digits>` temporaries so that future renaming will hopefully be easier (it would be great if we auto-generated temporary names with a bit more context). There was one GLSL test (`tests/bugs/vk-structured-buffer-binding`) that was using raw GLSL expected output, and this was changed to use a GLSL baseline to generate SPIR-V for comparison. For HLSL tests we were sometimes running the same input file through `slangc` and `fxc`/`dxc`, and in these cases I macro-ized the various `cbuffer` declarations to generate different declarations depending on the compiler. I completely dropped the tests coming from the D3D SDK because they weren't providing much coverage, and updating them would change them so far from the original code that the purported benefit (using a body of existing shaders) would be lost. 
I also dropped the explicit matrix layout qualifiers in the `matrix-layout` test because the new output strategy breaks those for GLSL (you can't put matrix layout qualifiers on `struct` fields, and now the body of every constant buffer is inside a `struct`). This isn't as big a loss as it seems, because our handling of those qualifiers wasn't really right to begin with. Slang users should only be setting the matrix layout mode globally (and we should probably make the explicit qualifiers an error for now). The other things that got dropped are tests involving `packoffset` modifiers. Slang already warns that it doesn't support these, and the way they were used in the test cases is actually misleading. For the binding/layout-related tests, the goal was to show that Slang reproduces the same layout as fxc, in which case explicitly enforcing a layout via `packoffset` seems like cheating (are we sure we enforced the layout fxc would have produced?). The real reason the tests used `packoffset` was that Slang used to emit explicit `packoffset` on every field of a `cbuffer` it would output, because of an `fxc` bug where you couldn't use `register` on textures/samplers declared inside a `cbuffer` unless every field in the `cbuffer` used a `register` or `packoffset` modifier. Slang hasn't required that behavior in a while because it now splits textures and samplers, and the one test case where we needed `packoffset` to work around the `fxc` bug in the baseline HLSL has been macro-ized even more to work around the bug. The amount of churn in the test cases is unfortunate, but it continues to point at the weakness of any testing strategy that checks for exact equivalence between Slang's output and that of other compilers. We need to keep working to replace these tests with better alternatives. 
In `check.cpp` there is logic to perform implicit dereferencing, so that if you write `obj.f` where `obj` is a `ConstantBuffer<X>` (or some other "pointer-like" type) and `f` is a field in `X`, then this effectively translates as `(*obj).f`. That is, we dereference the value of type `ConstantBuffer<X>` to get a value of type `X`, and then refer to the field of the `X` value. There was a problem where the logic to insert that kind of implicit dereference operation was using a reference (`auto& type = ...`) for the type of the expression being dereferenced, and then clobbering it. This would mean that an expression of type `ConstantBuffer<X>` would have its type overwritten to be just `X` and then codegen would break later on. I'm not sure how we haven't run into that before. The `array-of-buffers` test case was added to confirm that we now support arrays of constant, structured, and byte-addressed buffers for both DXIL and SPIR-V output. Okay, so that was a lot of stuff, but hopefully it is clear how it all works together to make the output of the compiler more consistent and explicit, while also supporting the required new functionality. * fixup: review feedback
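A self-contained illustration of the `auto&` aliasing bug in the implicit-dereference logic, using a hypothetical `Expr` whose type is just a string. This is not the actual `check.cpp` code: the buggy version binds a reference to the base expression's type and assigns through it, while the fixed version takes a copy.

```cpp
#include <cassert>
#include <string>

// Hypothetical miniature: an expression node whose type is a plain string.
struct Expr {
    std::string type;
};

// Strips a `ConstantBuffer<...>` wrapper; returns other types unchanged.
std::string elementTypeOf(const std::string& type) {
    const std::string prefix = "ConstantBuffer<";
    if (type.rfind(prefix, 0) == 0 && type.back() == '>')
        return type.substr(prefix.size(), type.size() - prefix.size() - 1);
    return type;
}

// Buggy shape: `type` aliases the base expression's type, so the assignment
// silently overwrites the base expression's type as well.
Expr derefBuggy(Expr& base) {
    auto& type = base.type;
    type = elementTypeOf(type);
    return Expr{type};
}

// Fixed shape: work on a copy, so the base expression keeps its original type.
Expr derefFixed(const Expr& base) {
    auto type = base.type;
    type = elementTypeOf(type);
    return Expr{type};
}
```

Taking the base by `const&` in the fixed version also lets the compiler reject any accidental write through it.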