slang.git - Making it easier to work with shaders

	Commit message (Collapse)	Author	Age
*	Use slang- prefix on slang compiler and core source (#973)	jsmall-nvidia	2019-05-31
\| \| \| \| \| \| \| \| \| \| \| \|	* Prefixing source files in source/slang with slang- * Prefix source in source/slang with slang- prefix. * Rename core source files with slang- prefix. * Update project files. * Fix problems from automatic merge.
*	Changes required for application adoption of interface-type parameters (#963)	Tim Foley	2019-05-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* A few changes required for application adoption of interface-type parameters There are a few small changes here that are all related in that they arose from trying to integrate support for specialization via global interface-type shader parameters into a real application. Allow querying the "pending" layout via reflection API ------------------------------------------------------ The naming here isn't ideal, and could probably use a round of "bikeshedding" to arrive at something better, but the basic idea is that when you have a type like: ``` struct MyStuff { int a; IFoo foo; int b; } ``` the fields `a` and `b` get allocated space directly in the "primary" layout for `MyStuff` (at offsets 0 and 4, with `sizeof(MyStuff) == 8`), but the `foo` field can't be allocated space until we know what concrete type will get plugged in there. If we have a concrete type in mind: ``` struct Bar : IFoo { int bar; } ``` then we can know how much space the `foo` field will take up, but we still can't allocate it space directly in `MyStuff`, because we already decided that `sizeof(MyStuff) == 8`. Now imagine we place some `MyStuff` values into constant buffers: ``` cbuffer X { MyStuff x; } cbuffer Y { MyStuff y; float4 z; } ``` In each case we know that we want to place the `MyStuff::foo` field at the end of the containing constant buffer so that it doesn't disrupt the layout of the existing fields. But that means that the offset of `MyStuff::foo` relative to the start of the `MyStuff` isn't fixed, because of unrelated fields like `z` that need to get in between. In our layout code, we handle this by having a notion of a "pending" layout. Once we know how `MyStuff::foo` will be specialized, we can compute both a "primary" and a "pending" layout for `MyStuff`, which basically treats it as if it were two distinct types: ``` struct MyStuff_Primary { int a; int b; } struct MyStuff_Pending { Bar foo; } ``` Layout for an aggregate type like the `X` or `Y` constant buffer then proceeds by computing an aggregate primary layout and an aggregate pending layout, and then finally a constant buffer or parameter block "flushes" all or part of the pending data by appending it to the primary data to get the final layout. What all this means is that a type like `MyStuff` will have two different layouts (a default one for the primary data and a "pending" one for any specialized interface-type fields), and a variable like `Y::y` will also have two variable layouts that specify offsets (one set of offsets for its primary part, and one set of offsets for its pending part). In order to handle interface-type fields with these layout rules, an application needs a way to query the "pending" part of a type or variable layout, which luckily gives it back just another type/variable layout. The API change here is minimal, although actually exploiting the new API correctly in application code could prove challenging. Allow creating of explicitly specialized types ---------------------------------------------- This feature isn't actually implemented all the way through the compiler (I just needed enough to make the API calls go through), but I've added support for specializing a type that has interface-type fields through the reflection API. This maps to an `ExistentialSpecializedType` in the AST, and I'm lowering it to the IR as a `BindExistentialsType`, although that isn't 100% correct for the future. This feature will require a future PR to actually flesh out the implementation work, but I'll wait until that is the sticking point on the application side before I do that. Introduce a tiny `Hasher` abstraction ------------------------------------- While implementing all the boilerplate for a new `Type` subclass (we really need to reduce that work...), I got fed up with how we do hash-code computation and introduced a small utility `Hasher` type that is intended to wrap up the idiom of combining hashes. For now this isn't a major change, but in the future I'd like to expand on the design a bit to clean up some of the warts around how we handle hashing: * The `Hasher` implementation can and should switch from maintaining a single `HashCode` as its state to something that contains a more complete state (larger than the hash code) and just hashes new bytes into that state as it goes. This should make it possible to implement a `Hasher` for more serious hash functions, whether MD5, CityHash, or whatever we decide is good default. * Things that are hashable shouldn't have a `getHashCode()` method, but instead should have something like a `hashInto(Hasher&)` method. This change would have the dual benefits that (1) a composite type can easily hash all the fields that contribute to its identity into the hasher with minimal fuss/boilerplate, and (2) the hashes for composite types will be of higher quality because they can exploit all the bits of the hasher's state to combine the fields, instead of restricting each sub-field to just the bits in a hash code. We should be able to incrementally improve the quality of our design there over future changes, but for now it probably isn't a critical priority. Fixes for legalization of existential types ------------------------------------------- There were some missing cases in the handling of type legalization, such that a global interface-type shader parameter that got specialized to a type that contains only resource-type fields would cause a crash in the legalization step. I added a test for this case, and then made `ir-legalize-types.cpp` account for this case (the code to handle it ias a bit of a kludge, and shows that the `declareVars()` routine there is getting to a level of complexity that is worrying. * fixup: review feedback
*	String/List closer to conventions, and use Index type (#959)	jsmall-nvidia	2019-04-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* List made members m_ Tweaked types to closer match conventions. * Use asserts for checking conditions on List. Other small improvements. * List<T>.Count() -> getSize() * List<T> Add -> add First -> getFirst Last -> getLast RemoveLast -> removeLast ReleaseBuffer -> detachBuffer GetArrayView -> getArrayView * List<T>:: AddRange -> addRange Capacity -> getCapacity Insert -> insert InsertRange -> insertRange AddRange -> addRange RemoveRange -> removeRange RemoveAt -> removeAt Remove -> remove Reverse -> reverse FastRemove -> fastRemove FastRemoveAt -> fastRemoveAt Clear -> clear * List<T> FreeBuffer -> _deallocateBuffer Free -> clearAndDeallocate SwapWith -> swapWith * List<T> SetSize -> setSize Reserve -> reserve GrowToSize growToSize * UnsafeShrinkToSize -> unsafeShrinkToSize Compress -> compress FindLast -> findLastIndex FindLast -> findLastIndex Simplify Contains * List<T> Removed m_allocator (wasn't used) Swap -> swapElements Sort -> sort Contains -> contains ForEach -> forEach QuickSort -> quickSort InsertionSort -> insertionSort BinarySearch -> binarySearch Max -> calcMax Min -> calcMin * Initializer::Initialize -> initialize List<T>:: Allocate -> _allocate Init -> _init IndexOf -> indexOf * * Put #include <assert.h> in common.h, and remove unneeded inclusions * Small refactor of ArrayView - remove stride as not used * getSize -> getCount setSize -> setCount unsafeShrinkToSize->unsafeShrinkToCount growToSize -> growToCount m_size -> m_count * Some tidy up around Allocator. * Use Index type on List. * Refactor of IntSet. First tentative look at using Index. * Made Index an Int Did preliminary fixes. Made String use Index. * Partial refactor of String. * String::Buffer -> getBuffer ToWString -> toWString * Small improvements to String. String:: Buffer() -> getBuffer() Equals() -> equals * Try to use Index where appropriate. * Fix warnings on windows x86 builds.
*	Initial support for the `precise` keyword (#958)	Tim Foley	2019-04-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes #858 The `precise` keyword exists in both HLSL and GLSL and when applied to a variable declaration is supposed to indicate that all computations that contribute to the value of that variable should not be altered based on "fast-math" optimizations. The main examples are that separate multiply and add operations should not be turned into fused multiply-add (fma) operations, and that operations cannot ignore the possibility of infinity or not-a-number values (e.g., by assuming that `x * 0.0f` is always `0.0f`). (Aside: it is possible that my understanding of what the semantics of `precise` are in HLSL and GLSL is imperfect so that either the GLSL variant isn't sufficient to provide the semantics of the HLSL keyword, or that the definition of "all computations that contribute" to a value isn't actually correct. We may need to revise this implementation based on subsequent learnings.) The basic idea here is to turn the AST `precise` keyword into a `[precise]` decoration in the IR and then emit that as a `precise` keyword again in the output. The main catch is that whereas most of our existing IR decorations apply to things like global shader parameters or `struct` members that usually stick around for the duration of compilation, `[precise]` will get slapped on local variables that will often get optimized away by our SSA pass. There are two ways a variable can get eliminated/replaced during the SSA pass: 1. A use of the variable can be replaced with an ordinary instruction that computes its value. 2. A use of the variable can be replaced with a reference to a "phi node" that will take on the appropriate value based on control flow. These two cases already had logic to copy a "name hint" decoration from the variable over to an instruction that will replace it, and I simply extended them to also propagate over a `[precise]` decoration. The test case added with this change intentionally constructs a case where `[precise]` needs to be propagated over to an SSA "phi node" in order to generate correct output code. The other gotcha is that we can emit variable declarations in various places in `emit.cpp`, and all of these needed to handle `[precise]`. Not only do we have actually local variables (`IRVar`), but we also have SSA phi nodes (`IRParam`), and then there are cases where an intermediate computation (an ordinary instruction) should be `[precise]` and thus we need to emit it as a temporary (not folding it into its use sites) and make sure that the temporary itself gets the `precise` keyword. I have manually confirmed that in the output SPIR-V, this change results in the `NoContraction` SPIR-V decoration being added to the relevant operations, and the output DXBC contains a multiply and an add in place of a multiply-add. The output DXIL does not show any obvious changes due to `precise`, although the exact order and operands of the math instructions emitted does differ when `precise` is added/removed. In all cases the output is equivalent to hand-written HLSL/GLSL with a `precise`-qualified local variable.
*	Add better control over image formats for GLSL/SPIR-V targets (#939)	Tim Foley	2019-04-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add better control over image formats for GLSL/SPIR-V targets Currently Slang emits GLSL code assuming all R/W images need to have explicit formats, and thus we try to infer a format from the element type of the image. E.g., given a `RWTexture2D<half4>` we might infer that a qualifier of `layout(rgba16f)` should be used. This strategy has two notable shortcomings: * Sometimes the user will want a format that doesn't match an existing HLSL type. E.g., if they want the equivalent of `layout(r11f_g11f_b10f)`, then what should they put in their `RWTexture2D<...>` to make the inference do what they need? * Sometimes the user knows that they don't need to specify a format at all, because using the `GL_EXT_shader_image_load_formatted` extension, they can still perform non-atomic load/store on images with no format specified in the SPIR-V. This change adds two features directed at these challenges. First, we add an explicit `[format(...)]` attribute that can be used to specify an explicit image format, including ones that don't match any HLSL type. An example of using this new attribute is: ```hlsl [format("r11f_g11f_b10f")] RWTexture2D<float3> myImage; ``` For simplicity in initial bring-up, the new formats all use the same naming as formats in GLSL (this should make it easy for a programmer who knows what they expect to get in the GLSL output). We can change the naming convention for formats at a later time, so long as we keep these existing names in as a compatibility feature. Note that this is not given a `vk::` prefix since the attribute should signal the programmer's intent to provide an image with that format on all targets (although only some targets might act on that information). Also note that the attribute takes a string (`[format("rgba8")`) instead of a bare identifier (`[format(rgba8)]`) because this is consistent with the existing convention for attributes in HLSL. When `[format(...)]` is left off, the default compiler behavior will still be to infer a format, but this behavior can be overidden for a single image using an explicit format of `"unknown"`: ```hlsl [format("unknown")] RWTexture2D<float4> mysteryMachine; ``` The second new feature is that if a user knows they are coding for a GPU that supports the `"unknown"` format in all non-atomic cases, then they can opt into making that the default for images without an explicit `[format(...)]`, using the new `-default-image-format-unknown` command-line option for `slangc`. The new test case included with this change confirms that we correctly see the explicit formats in the output GLSL and no formats for images without explicit `[format(...)]` when using the new command-line option. The test stresses images declared at global scope, in parameter blocks, and in entry-point parameter lists, to try and make sure that all the relevant IR passes in the compiler preserve the format information. * fixup: missing file
*	Improve support for interfaces as shader parameters (#886)	Tim Foley	2019-03-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Improve support for interfaces as shader parameters This change adds two main things over the existing support: 1. It is now possible to plug in concrete types that actually contain (uniform/ordinary) fields for the existential type parameters introduced by interface-type shader parameters. The `interface-shader-param2.slang` test shows that this works. 2. There is a limited amount of support for doing correct layout computation and generating output code that matches that layout, so that interface and ordinary-type fields can be interleaved to a limited extent. The `interface-shader-param3.slang` test confirms this behavior. There are several moving pieces in the change. * When it comes to terminology, we try to draw a more clear distinction between existial type parameters/arguments and existential/object value parametes/arguments. A simple way to look at it is that an `IFoo[3]` shader parameter introduces a single existential type parameter (so that a concrete type argument like `SomeThing` can be plugged in for the `IFoo`) but introduces three existential object/value parameters (to represent the concrete values for the array elements). * At the IR level, we support a few new operations. A `BindExistentialsType` can take a type that is not itself an interface/existential type but which depends on interfaces/existentials (e.g., `ConstantBuffer<IFoo>`) and plug in the concrete types to be used for its existential type slots. * Then a `wrapExistentials` instruction can take a type with all the existentials plugged in (possibly by `BindExistentialsType`) and wrap it into a value of the existential-using type (e.g., turn `ConstantBuffer<SomeThing>` into a `ConstantBuffer<IFoo>`). * The IR passes for doing generic/existential specialization have been updated to be able to desugar uses of these new operations just enough so that a `ConstantBuffer<IFoo>` can be used. * When we specialize an IR parameter of an interface type like `IFoo` based on a concrete type `SomeThing`, we turn the parameter into an `ExistentialBox<SomeThing>` to reflect the fact that we are conceptually referring to `SomeThing` indirectly (it shouldn't be factored into the layout of its surrounding type). * Parameter binding was updated so that it passes along the bound existential type arguments in a `Program` or `EntryPoint` to type layout, so that we can take them into account. The type layout code needs to do a little work to pass the appropriate range of arguments along to sub-fields when computing layout for aggregate types. * Type layout was updated to have a notion of "pending" items, which represent the concrete types of data that are logically being referenced by existential value slots. The basic idea is that these values aren't included in the layout of a type by default, but then they get "flushed" to come after all the non-existential-related data in a constant buffer, parameter block, etc. * The logic for computing a parameter group (`ConstantBuffer` or `ParameterBlock`) layout was updated to always "flush" the pending items on the element type of the group, so that the resource usage of specialized existential slots would be taken into account. * The type legalization pass has been adapted so that we can derive two different passes from it. One does resource-type legalization (which is all that the original pass did). The new pass uses the same basic machinery to legalize `ExistentialBox<T>` types by moving them out of their containing type(s), and then turning them into ordinary variables/parameters of type `T`. Big things missing from this change include: - Nothing is making sure that "pending" items at the global or entry-point level will get proper registers/bindings allocated to them. For the uniform case, all that matters in the current compiler is that we declare them in the right order in the output HLSL/GLSL, but for resources to be supported we will need to compute this layout information and start associating it with the existential/interface-type fields. - Nothing is being done to support `BindExistentials<S, ...>` where `S` is a `struct` type that might have existential-type fields (or nested fields...). Eventually we need to desugar a type like this into a fresh `struct` type that has the same field keys as `S`, but with fields replaced by suitable `BindExistentials` as needed. (The hard part of this would seem to be computing which slots go to which fields). As a practial matter, this missing feature means that interface-type members of `cbuffer` declarations won't work. The current tests carefully avoid both of these problems. They don't declare any buffer/texture fields in the concrete types, and they don't make use of `cbuffer` declarations or `ConstantBuffer`s over structure types with interface-type fields. * fixup: add override to methods * fixup: typos
*	Confirm layout for structured buffer of matrices, and related fixes (#889)	Tim Foley	2019-03-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Fix up handling of NumElements for D3D buffer views For everything but structured buffers we'd been setting this to the size in bytes, but that isn't really valid at all. The `NumElements` member in the view descs is supposed to be the number of buffer elements, so it would be capped at the byte size divided by the element size. This change fixes the computation of `NumElements` to take the size of the format into account for non-structured views. For the "raw" case, we use a size of 4 bytes since that matches the `DXGI_FORMAT_R32_TYPELESS` format we use (which seems to be required for raw buffers). I also added support for the raw case for SRVs where it didn't seem to be supported before (not that any of our tests cover it). * Fix handling of size padding for D3D11 buffers The existing code was enforcing a 256-byte-aligned size for all buffers, but this can cause problems for a structured buffer. A structured buffer must have a size that is a multiple of the stride, so a structured buffer with a 48-byte stride and a 96-byte size would get rounded up to 256 bytes, which is not an integer multiple of 48. This change makes it so that we only apply the padding to constant buffers. According to MSDN, constant buffers only require padding to a 16-byte aligned size, and no other restrictions are listed for D3D11, but it is difficult to know whether those constrains are exhaustive. I've left in the 256-byte padding for now (rather than switch to 16-byte), even though I suspect that was only needed as a band-aid for the `NumElements` issue fixed by another commit. * Fix an IR generation bug when indxing into a strutured buffer element The problem here arises when we have a structured buffer of matrices (an array type would likely trigger it too): ```hlsl RWStructuredBuffer<float3x4> gMatrices; ``` and then we index into it directly, rather than copying to a temporary: ```hlsl // CRASH: float v = gMatrices[i][j][k]; // OKAY: float3x4 m = gMatrices[i]; float v = m[j][k]; ``` The underlying issue is that our IR lowering pass tries to defer the decision about whether to use a `get` vs. `set` vs. `ref` accessor for a subscript until as late as possible (this is to deal with the fact that sometimes D3D can provide a `ref` accessor where GLSL can only provide a `get` or `set`). We probably need to overhaul that aspect of IR codegen sooner or later, but this change uses some of the existing machinery to try to force the `gMatrices[i]` subexpression to take the form of a pointer when doing sub-indexing like this. This fixes the present case, and hopefully shouldn't break anything else that used to work (because the subroutines I'm using to coerce the `gMatrices[i]` expression should be idempotent on the cases that were already implemented). * Add a test case to confirm fxc/dxc layout for structured buffers of matrices Even when row-major layout is requested globally, fxc and dxc seem to lay out a `StructuredBuffer<float3x4>` with column-major layout on the elements. This commit adds a test that confirms that behavior. This commit does not try to implement a fix for the issue (either fixing Slang's layout reflection information to be correct for what fxc/dxc do in practice, or fixing Slang's HLSL output to work around the fxc/dxc behavior), but just documents the status quo. If/when we decide how we'd like to handle the issue long-term, this test can/should be updated to match the decision we make. * fixup: build breakage on clang/gcc This is one of those cases where the Microsoft compiler is letting through some stuff that isn't technically valid C++ ("delayed template parsing"). Fixed by just moving some declarations to earlier in the file.
*	First steps toward supporting interface-type parameters on shaders (#852)	Tim Foley	2019-02-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* First steps toward supporting interface-type parameters on shaders What's New ---------- From the perspective of a user, the main thing this change adds is the ability to declare top-level shader parameters (either at global scope, or in an entry-point parameter list) with interface types. For example, the following becomes possible: ```hlsl // Define an interface to modify values interface IModifier { float4 modify(float4 val); } // Define some concrete implementations struct Doubler : IModifier { float4 modify(float4 val) { return val + val; } } struct Squarer : IModifier { ... } // Define a global shader parameter of interface type IModifier gGlobalModifier; // Define an entry point with an interface-type `uniform` parameter void myShader( unifrom IModifier entryPointModifier, float4 inColor : COLOR, out float4 outColor : SV_Target) { // Use the interface-type parameters to compute things float4 color = inColor; color = gGlobalModifier.modify(color); color = entryPointModifier.modify(color); outColor = color; } ``` The user can specialize that shader by specifying the concrete types to use for global and entry-point parameters of interface types (e.g., plugging in `Doubler` for `gGlobalModifier` and `Squarer` for `entryPointModifier`). The "plugging in" process is done in terms of a concept of both global and local "existential slots" which are a new `LayoutResourceKind` that represents the holes where concrete types need to be plugged in for existential/interface types. In simple cases like the above, each interface-type parameter will yield a single existential slot in either the global or entry-point parameter layout. Users can query the start slot and number of slots for each shader parameter, just like they would for any other resource that a parameter can consume. Before generating specialized code, the user plugs in the name of the concrete type they would like to use for each slot using `spSetTypeNameForGlobalExistentialSlot` and/or `spSetTypeNameForEntryPointExistentialSlot`. There are some major limitations to the implementation in this first change: * Parameters must be of interface type (e.g., `IFoo`) and not an array (`IFoo[3]`), or buffer (`ConstantBuffer<IFoo>`) over an interface type. Similarly, `struct` types with interface-type fields still don't work. * The work on interface-type function parameters still doesn't include support for `out` or `inout` parameters, nor for functions that return interface types (that isn't technically related to this change, but affects its usefullness). * No work is being done to correctly lay out shader parameters once the concrete types for existential slots are known, so that this change really only works when the concrete type that gets plugged in is empty. These limitations are severe enough that this feature isn't really usable as implemented in this change, and this merely represents a stepping stone toward a more complete implementation. Implementation -------------- The API side of thing largely mirrors what was already done to support passing strings for the type names to use for global/entry-point generic arguments, so there should be no major surprises there. The logic in `check.cpp` computes the list of existential slots when creating unspecialized `Program`s and `EntryPoint`s (this is logically the "front end" of the compiler), and then checks the supplied argument types against what is expected in each slot when creating specialized `Program`s and `EntryPoint`s. This again mirrors how generic arguments are handled. Type layout was extended to compute the number of existential slots that a type consumes, and will thus automatically assign ranges of slots to top-level and entry-point shader parameters in the same way it already allocates `register`s and `binding`s. The big missing feature is the ability to specialize a layout to account for the concrete types plugged into the existential-type slots. IR generation for specialized programs and entry points was slightly extended so that it attaches information about the concrete types plugged into the existential slots, and the witness tables that show how they conform to the interface for that slot. The linking step needed some small tweaks to make sure that information gets copied over to the target-specific program when we start code generation. The meat of the IR-level work is in `ir-bind-existentials.cpp`, which takes the information that was placed in the IR module by the generation/linking steps and uses it to rewrite shader parameters. For example, if there is a shader parameter `p` of type `IModifier`, and the corresponding existential slot has the type `Doubler` in it, we will rewrite the parameter to have type `Doubler`, and rewrite any uses of `p` to instead use `makeExistential(p, /witness that Doubler conforms to IModifier/)`. Once the replacement is done on the parameters, the existing work for specializing existential-based code when the input type(s) are known kicks in and does the rest. Testing ------- A single compute test is added to validate that this feature works. It is narrowly tailored to not require any of the features not supported by the initial implementation (e.g., all of the concrete types used have no members). The test case does include use of an associated type through one of these existential-type parameters, which has exposed a subtle bug in how "opening" of existential values is implemented in the front-end. Rather than fix the underlying problem, I cleaned up the code in the front-end to special case when the existential value being opened is a variable bound with `let`, to directly use a reference to that variable rather than introduce a temporary. Similarly, in the IR generation step, I added an optimization to make variables declared with `let` skip introducing an IR-level variable and just use the SSA value of their initializer directly instead. * fixup: missing files * fixup: incorrect type for unreachable return * fixup: actually comment ir-bind-existentials.cpp
*	Split front- and back-ends (#846)	Tim Foley	2019-02-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Split front- and back-ends This change is a major refactor of several of the types that provide the behind-the-scenes implementation of the public C API. The goal of this refactor is primarily to allow for future API services that let the user operate both the front- and back-ends of the compiler in a more complex fashion. For example, as user should be able to compile a bunch of source code into modules, look up types, functions, etc. in those modules, specialize generic types/functions to the types they've looked up, and then finally request target code to be gernerated for specialized entry points. The back-end code generation they trigger should re-use the front-end compilation work (parsing, semantic checking, IR generation) that was already performed. The most visible change is that `CompileRequest` has been split up into several smaller types that take responsibility for parts of what it did: * The `Linkage` type owns the storage for `import`ed modules, and well as the `TargetRequest`s that represent code-generation targets. The intention is that an application could use a single `Linkage` for the duration of its runtime (so long as it was okay with the memory usage), so that each `import`ed module only gets loaded once. For now, this type needs to manage the search paths, file system, and source manager, because of its responsibility for loading files. * A `FrontEndCompileRequest` owns the stuff related to parsing, semantic checking, and initial IR generation. This most notably includes the `TranslationUnitRequest`s and the `FrontEndEntryPointRequest`s (which used to be just `EntryPointRequest`s). It's main job is to produce AST and IR modules for each translation unit, and to find and validate the entry points. The front-end request does not interact with generic arguments for global or entry-point generic parameters. * The main output of both `import` operations and front-end translation units is the `Module` type, which is just a simple container for both the AST module (to service the reflection/layout APIs, and also for semantic checking of code that `import`s the module) and the IR module (for linking and code generation). This type captures the commonalities between the old `LoadedModule` (which is now just an alias for `Module`) and `TranslationUnitRequest` (which now owns a `Module`). * The secondary output of front-end compilation is a `Program`, which comprises a list of referenced `Module`s and validated `EntryPoint`s that will be used together. Layout and code generation both need a `Program` to tell them what modules and entry points will be used together (we don't want to just code-gen everythin that has ever been loaded into the linakge). The `Program`s created by the front-end do not include generic arguments, so they may provide incomplete layout information and/or be unsuitable for code generation. * A `BackEndCompileRequest` owns stuff related to turning a `Program` into output kernels for the targets of a `Linkage`. Most of the data it owns beyond the `Program` to be compiled is minor, so this is a good candidate for demotion from a heap-allocated object to just a `struct` of options that gets passed around. * The `CompileRequestBase` type is an attempt to wrap up the common functionality of both front-end and back-end compile requests. Most of it is just exposing the availability of a linkage and `DiagnosticSink`, so this type is a good candidate for subsequent removal. The main interesting thing it has is the flags related to dumping and validation of IR, so there is probably a good refactoring still to be made around deciding how options should be handled going forward. * Behind the scenes, the `Program` type is set up to handle some level of on-line compilation and layout work. The `Program` knows the `Linkage` it belongs to, and allows for a `TargetProgram` to be looked up based on a specific `TargetRequest`. A `TargetProgram` then allows layout information and compiled kernel code to be asked for on-demand, in order to support eventual "live" compilation scenarios. * The `EndToEndCompileRequest` type is a composition/coordination type that replaces the old `CompileRequest` in a way that uses the services of the various other types. It owns a few pieces of state that only make sense in the context of an end-to-end compile (e.g., there is really no way to "pass through" code when the front- and back-ends are run separately) or a command-line compile (everything to do with specifying output paths for files is really just for the benefit of `slangc`, and might even be moved there over time). * One important detail is that the `EndToEndCompilRequest` owns all of the string-based generic arguments for both global and entry-point generic parameters. The logic in `check.cpp` for dealing with those arguments has been heavily refactored to separate out the parsings steps that are specific to end-to-end compilation with string-based type arguments, and the semantic checking steps that result in a specialized `Program` (which can be exposed through new APIs that aren't tied to end-to-end compilation). It is perhaps not surprising that this change had a lot of consequences, so I'll briefly run over some of the main categories of changes required: * I changed the way that global generic arguments are passed via API (use `spSetGlobalGenericArgs` instead of the generic arguments for `spAddEntryPointEx`, which are not just for entry-point generics), which has been a change that we've needed for a long time. This is technically a breaking API change, although we should have very few client applications that care about it. * A bunch of places that used to take "big" objects like `CompileRequest` now just take the sub-pieces they care about (e.g., a function might have only needed a `Linkage` and a `DiagnosticSink`). This makes many subroutines or "context" struct types more generally useful, at the cost of taking more parameters. * In a few cases the conceptually clean separation of the layers breaks down (often for edge-case or compatibility features), and so we may pass along additional objects that are allowed to be null, but are used when present. A big example of this is how the back-end code generation routines accept an `EndToEndCompileRequest` that is optional, and only used to check whether "pass through" compilation is needed. We should probably look into cleaning this kind of logic up over time so that we don't need to violate the apparent separation of phases of compilation. * In cases where separation of layers was being broken for the sake of GLSL features, I went ahead and ripped them out, since all of that should be dead code anyway. * In many cases I increased the encapsulation of data in the core types to help track down use sites and make sure they are following invariants better. * In cases where code was doing, e.g., `context->shared->compileRequest->session->getThing()` I have tried to introduce convenience routines so that the usage site is just `context->getThing()` to improve encapsulation and allow changes to be made more easily going forward. * The `noteInternalErrorLoc` functionality was moved off of the compile request and into `DiagnosticSink`, since that is the one type you can rely on having around when you want to note an internal error. We may consider going forward if (and how) it should reset the counter used for noting locations on internal errors. * A few APIs now take `DiagnosticSink` arguments where they didn't before, and as a result some public APIs need to create `DiagnosticSink`s to pass in, before going ahead and ignoring the messages. In the future there should be variations of these APIs that accept an `ISlangBlob` parameter for the output. fixup: missing include for compilers with accurate template checking (non-VS) * fixup: review feedback
*	Fix typos in diagnostic message and comments (#843)	jsmall-nvidia	2019-02-13
\| \| \| \| \| \| \|	* * Fix some comment typos * Fix typo in diagnostic message * Fix typo in expected output of undefined-in-preprocessor-conditional
*	Allow entry points to have explicit generic parameters (#826)	Tim Foley	2019-02-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Allow entry points to have explicit generic parameters Prior to this change, the Slang implementation required users to use global `type_param` declarations in order to specialize a full shader. For example: ```hlsl type_param L : ILight; ParameterBlock<L> gLight; [shader("fragment")] float4 fs(...) { ... gLight.doSomething() ... } ``` With this change we can rewrite code like the above using explicit generics, plus the ability to have `uniform` entry-point parameters: ```hlsl [shader("fragment")] float4 fs<L : ILight>( uniform ParameterBlock<L> light, ...) { ... light.doSomething() ... } ``` Having this support in place should make it possible for us to eliminate global generic type parameters and the complications they cause (both at a conceptual and implementation level). The most central and visible piece of the change is that `EntryPointRequest` now holds a `DeclRef<FuncDecl>` instead of just ` RefPtr<FuncDecl>`, which allows it to refer to a specialization of a generic function. Various places in the code that refer to the `EntryPointRequest::decl` member now use a `getFuncDecl()` or `getFuncDeclRef()` method as appropriate (see `compiler.h`). In order to fill in the new data, the `findAndValidateEntryPoint` function has been greaterly overhauled. The changes to its operation include: * The by-name lookup step for the entry point function has been adapted to accept either a function or a generic function. * The generic argument strings provided by API or command line are no longer parsed all the way to `Type`s, but instead just to `Expr`s in the first pass. * There are now two cases for checking the global generic arguments against their matching parameters. The first case is the new one, where we plug the generic argument `Expr`s into the explicit generic parameters of an entry point (that case re-uses existing semantic checking logic). The second case is the pre-existing code for dealing with global generic type arguments. The `lower-to-ir.cpp` logic for hadling entry points then had to be extended. Making it deal with a full `DeclRef` instead of just a `Decl` was the easy part (just call `emitDeclRef` instead of `ensureDecl`). The more interesting bits were: * We need to carefully add the `IREntryPointDecoration` to the nested function and not the generic in the case where we have a generic entry point. There is a handy `getResolvedInstForDecorations` that can extract the return value for an IR generic so that we can decorate the right hting. * We need to make sure that in the case where we emit a `specialize` instruction (which normally wouldn't get a linkage decoration), we attach an `[export(...)]` decoration to it with the mangled name of the decl-ref, so that it can be found during the linking step. The IR linking step is then slightly more complicated because the mangled entry point name could either refer directly to an `IRFunc` or to a `specialize` instruction for a generic entry point. The logic was refactored to first clone the entry point symbol without concern for which case it is (the old code was specific to functions), and then if the result is a `specialize` instruction, we attempt to run generic specialization on-demand. That on-demand specialization is a bit of a kludge, but it deals with the fact that all the downstream passing only expect to see an `IRFunc`. A future cleanup might try to split out that specialization step into its own pass, which ends up being a limited form of the specialization pass. Since I was already having to touch a lot of the code around IR linking, I went ahead and refactored the signature of the operations. I eliminated the need for the caller to create, pass in, and then destroy an `IRSpecializationState` (really an IR linking state), and replaced it with a structure local to the pass (that data structure was a remnant of an older approach in the compiler), and then also renamed the main operation to `linkIR` to reflect what it is doing in our conceptual flow. Smaller changes made along the way include: * Refactored `visitGenericAppExpr` to create a subroutine `checkGenericAppWithCheckedArgs` so that it can be used by the entry-point validation logic described above). * Refactored the declarations around the IR passes in `emitEntryPoint()` (`emit.cpp`), to show that things are more self-contained than they used to be (e.g., that the `TypeLegalizationContext` is now only needed by one pass). * Refactored the generic specialization code so that there is a stand-along free function that can perform specialization on a `specialize` instruction without all the other context being required. This is only to support the limited specialization that needs to be done as part of linking. * Updated the `global-type-param.slang` test to actually test entry-point generic parameters. In a later pass we can/should rework all the tests/examples for global type parameters over to use explicit entry-point generic parameters (at which point we should rename the tests as well). For now I am leaving thigns with just one test case, with the expectation that bugs will be found and ironed out as we expand to more tests. * fixup * Fixup: don't leave entry-point decorations on stuff we don't want to keep The IR `[entryPoint]` decoration is effectively a "keep this alive" decoration, which means that attaching it to something we don't intend to keep around can lead to Bad Things. The approach to generic entry points was attaching `[entryPoint]` to the underlying `IRFunc` because that seemed to make sense, but that meant that the `specialize` instruction at global scope scould instantiate that generic and then keep it alive, even if the resulting function wouldn't be valid according to the language rules. As a quick fix, I'm attaching `[entryPoint]` to the `specialize` instruction instead in such cases, and then re-attaching it to the result of explicit specialization during linking. * Port most of remaining test and rename global type parameters This change ports as many as possible of the existing tests for global type parameters over to use entry-point generic parameters instead. For the most part this is a mechanical change. A few test cases remain using global generic parameters, as does the `model-viewer` example application. The reason for this is that the shaders have either or both the following features: * A vertex and fragment shader that can/shold agree on their parameters * A type declaration (e.g., a `struct`) that is dependent on one of the generic type parameters In these cases, it would really only make sense to switch to explicit parameters once we support shader entry points nested inside of a `struct` type, so that we can use an outer generic `struct` as a mechanism to scope the entry points and other type-dependent declrations. Since global-scope type parameters need to persist for at least a bit longer, I went ahead and renamed all the use sites over to use `type_param` for consistency.
*	Fix confused definitions of pre/post increment/decrement (#827)	Tim Foley	2019-02-05
\| \| \| \| \| \| \| \| \| \|	* Fix confused definitions of pre/post increment/decrement We somehow have been compiling `++i` as `i++` (and vice versa) for a long time without noticing. This change fixes the implementation of these pseudo-ops in IR codegen, and adds a comment to explain the rationale for why their definitions should be what they are. * fixup: typo
*	Feature/casting tidyup (#822)	jsmall-nvidia	2019-02-04
\| \| \| \| \| \| \| \| \| \|	* Use 'is' over 'as' where appropriate. * dynamic_cast -> dynamicCast * Replace 'dynamicCast' with 'as' where has no change in behavior/ambiguity. * Replace dynamicCast with as where doesn't change behavior/non ambiguous.
*	Feature/as refactor (#817)	jsmall-nvidia	2019-01-31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Made dynamicCast a free function. * Replace As with as or dynamicCast depending on if it is a type. * Fix problem with using non smart pointer cast. * Removed legacy asXXXX methods. * Remove As from Type. * Removed As from Qual type -> made coercable into Type, such that can just use free 'as'. Remove left over QualType::As() impl. * Remove As from SyntaxNodeBase. * Made as for instructions implemented by dynamicCast. * Replace As on DeclRef. Use the global as<> to do the cast. * Add const safe versions of dynamicCast and as for IRInst
*	Fixing IR-lowering not properly registering func decl	Yong He	2019-01-30
\|
*	Fix IR emit logic for methods in `struct` types (#791)	Tim Foley	2019-01-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There was a bug in the logic for emitting initial IR, such that it was neglecting to emit "methods" (member functions) unless they were also referenced by a non-member (global) function, or were needed to satisfy an interface requirement. This would only matter for `import`ed modules, since for non-`import`ed code, anything relevant would be referenced by the entry point so that the problem would never surface. This change fixes the underlying problem by adding a step to the IR lowering pass called `ensureAllDeclsRec` that makes sure that not only global-scope declarations, but also anything nested under a `struct` type gets emitted to the initial IR module. There are also a few unrelated fixes in this PR, which are things I ran into while making the fix: * Deleted support for the (long gone) `IRDeclRef` type in our `slang.natvis` file * Added support for visualizing the value of IR string and integer literals when they appear in the debugger * Fixed IR dumping logic to not skip emitting `struct` and `interface` instructions. Switching those to inherit from `IRType` accidentally affected how they get printed in IR dumps by default. * Fixed up the IR linking logic so that it correctly takes `[export]` decorations into account, so that an exported definition will always be taken over any other (unless the latter is more specialized for the target). I initially implemented this in an attempt to fix the original issue, but found it wasn't a fix for the root cause. It is still a better approach than what was implemented previously, so I'm leaving it in place.
*	Clean up variable declaration class hierarchy (#787)	Tim Foley	2019-01-22
\| \| \| \| \| \| \| \| \| \| \| \|	The AST class hierarchy for variable declarations had a few messy bits. First, there are more subclasses of `VarDeclBase` than seem strictly necessary; especially for stuff like `struct` member variable which use `StructField` even for `static` fields (which are effectively globals). Second, the AST node type for the "cases" within an `enum` was made a subclass of `VarDeclBase` for expediency, but this isn't really semantically accurate (and doesn't seem to be paying off much in deduplication of code). This change tries to address both of those problems. First, we replace the existing `Variable` and `StructField` cases with a single `VarDecl` case that covers globals, locals, and member variables. I haven't gone so far as to replace function parameters or generic value parameters, but that might be worth considering as a further clean-up. Second, we change `EnumCaseDecl` to inherit directly from `Decl` instead of `VarDeclBase` and add an explicit case for handling them where they were previously handled as if they were variable declarations (this was done by manually surveying all locations in the code that referenced `VarDeclBase`).
*	Initial support for dynamic dispatch using "tagged union" types (#772)	Tim Foley	2019-01-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Initial support for dynamic dispatch using "tagged union" types Suppose a user declares some generic shader code, like the following: ```hlsl interface IFrobnicator { ... } type_param T : IFrobincator; ParameterBlock<T : IFrobnicator> gFrobnicator; ... gFrobincator.frobnicate(value); ``` and then they have some concrete implementations of the required interface: ```hlsl struct A : IFrobnicator { ... } struct B : IFrobnicator { ... } ``` The current Slang compiler allows them to generate distinct compiled kernels for the case of `T=A` and the case of `T=B`. This means that the decision of which implementation to use must be made at or before the time when a shader gets bound in the application. This change adds a new ability where the Slang compiler can generate code to handle the case where `T` might be either `A` or `B`, and which case it is will be determined dynamically at runtime. This means a single compiled kernel can handle both cases, and the decision about which code path to run can be made any time before the shader executes. This new option is supported by defining a tagged union type. Via the API, the user specifies that `T` should be specialized to `__TaggedUnion(A,B)` (the double underscore indicates that this is an experimental and unsupported feature at present). We refer to the types `A` and `B` here as the "case" types of the tagged union. Conceptually, the compiler synthesizes a type something like: ```hlsl struct TU { union { A a; B b; } payload; uint tag; } ``` The user can then allocate a constant buffer to hold their tagged union type, and when they pick a concrete type to use (say `B`), they fill in the first `sizeof(B)` bytes of their buffer with data describing a `B` instance, and then set the `tag` field to the appopriate 0-based index of the case type they chose (in this case the `B` case gets the tag value `1`). Actually implementing tagged unions takes a few main steps: * Type parsing was extended to special-case `__TaggedUnion` as a contextual keyword. This is really only intended to be used when parsing types from the API or command-line, and Bad Things are likely to happen if a user ever puts it directly in their code. Eventually construction of tagged unions should be an API feature and not part of the language syntax. * Semantic checking was extended to recognize that a tagged union like `__TaggedUnion(A,B)` shoud support an interface like `IFrobnicator` whenever all of the case types suport it, as long as the interface is "safe" for use with tagged unions (which means it doesn't use a few of the advancd langauge features like associated types). * The IR was extended with instructions to represent tagged union types and to extract their tag and the payload for the different cases as needed. * IR generation was extended to synthesize implementations of interface methods for any interface that a tagged union needs to support. Right now the implementation is simplistic and only handles simple method requirements, which it does by emitting a `switch` instruction to pick between the different cases. * A new IR pass was introduced to "desugar" any tagged union types used in the code. The downstream HLSL and GLSL compilers don't support `union`s, so we have to instead emit a tagged union as a "bag of bits" and implement loading the data for particular cases from it manually. * Final code emit mostly Just Works after the above steps, but we had to introduce an explicit IR instruction for bit-casting to handle the output of the desugaring pass. There are a bunch of gaps and caveats in this implementation, but that seems reasonable for something that is an experimental feature. The various `TODO` comments and assertion failures in unimplemented cases are intended, so that this work can be checked in even if it isn't feature-complete. * fixup: missing files * fixup: typos
*	Improve handling of {} initializer list expressions (#778)	Tim Foley	2019-01-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes #775 It was reported (in #775) that Slang doesn't handle initializer-list syntax when initializing matrix variables. When starting on a fix for that it became apparent that the time was right to fix two broad issues in the compiler's current handling of `{}`-enclosed initializer lists. The first issue was that the front-end checking of initializer lists wasn't handling the C-style behavior where an initializer list can either contain nested `{}`-enclosed lists for sub-arrays/-structures, or directly contain "leaf" values for initializing those aggregates. For example, the following two variable declarations ought to be equivalent: ```hlsl int4 a[] = { {1, 2, 3, 4}, {5, 6, 7, 8} }; int4 b[] = { 1, 2, 3, 4, 5, 6, 7, 8 }; ``` Getting this distinction right is important because we want to support initializing a matrix either from a list of vectors for its rows, or a list of scalars for its elements (in row-major order). The front-end semantic checking logic for initializer lists was revamped so that it conceptually tries to "read" an expression of a desired type from the initializer list, and decides at each step whether to consume a single expression by coercing it to the desired type, or to recursively read multiple sub-values to construct the type as an aggregate. The logic for deciding between direct vs aggregate initialization could potentially use some tweaking, but luckily it should always handle the case where users introduce explicit `{}`-enclosed sub-lists to make their intention clear, so that existing Slang code should continue to work as before. The second issue was that initializers without the expected number of elements weren't implemented in code generation, so they would lead to internal compiler errors. This change revamps the codegen logic for initializer lists so that it can synthesize default values for fields/elements that were left out during initialization. This includes an attempt to support default initialization of `struct` fields based on explicitly written initialization expressions.
*	Add proper IR codegen support for local static const variables (#779)	Tim Foley	2019-01-16
\| \| \| \| \| \| \| \| \|	Previously the IR codegen logic was treating function-scope `static const` variables just like `static` variables, which results in them generating less efficient output HLSL/GLSL. This change special-cases function-local `static const` variables with logic that mirrors how we handle global-scope `static const` variables. The approach in this change attempts to find a simpler solution to deal with `static const` variables inside of generic functions than what is currently done for `static` variables in generic functions, but I haven't tested whether that works in practice, so I didn't apply the same approach to the plain `static` case. That would make a good follow-on change. I've included a single test case to demonstrate that with this fix the Slang compiler generates output DXBC that uses an indexable "immediate" constant buffer, whereas without the fix it generates an array in local memory (slow).
*	Feature/lex memory reduction (#762)	jsmall-nvidia	2018-12-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Only do scrubbing if needed. When allocating content try to limit size (with scrubbing each token takes up 1k), now it's 16 bytes min size. * Don't allocate for every call to write on the CallbackWriter - use the m_appendBuffer. * Don't allocate memory for CallbackWriter use m_appendBuffer. * Use UnownedStringSlice for suffix output for parsing float/int literals. Fix typo in invalidFloatingPointLiteralSuffix * Using memory arena to hold tokens that are not in SourceManager. * Improve comment on lexing. * Make UnownedStringSlice allocation simpler on SourceManager. * Fix error on gcc around UnownedStringSlice - because VC converted string + UnownedStringSlice automatically into a String. * Fix generateName needing concat string for gcc. * When constructing a Token in parseAttributeName - because it's a Identifier, we have to set the Name. * Remove translation through String on getIntrinsicOp * Make func-cbuffer-param disablable with -exclude compatibility-issue * Move memory leak in render-test. * From review - can just use "?:" instead of performing a concat.
*	First step toward supporting use of interfaces as existential types (#716)	Tim Foley	2018-12-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* First step toward supporting use of interfaces as existential types Traditional generics involve universal quantification. E.g., a declaration like: ``` void drive<T : IVehicle>(T vehicle); ``` indicates for for all types `T` that implement the `IVehicle` interface, the `drive()` function is available. In contrast, whend directly using an interface type like: ``` IVehicle v = ...; v.doSomething(); ``` we only know that there exists some concrete type (we could call it `E`) such that `v` refers to a value of type `E`, and `E` implements the `IVehicle` interface. In order to perform an operation like `v.doSomething()` we need to "open" the existential value so that we can look at the concrete type and how it implements the `IVehicle.doSomething` requirement. This change adds a very explicit representation of existentials to Slang's IR. An operation like `e = makeExistential(v, w)` creates a value of some existential type (interfaces being our only existential types for now), by wrapping a concrete value `v` (the type of `v` can be seen as an implicit operand) and a witness table `w` showing that the type of `v` implements the requirements of the chosen interface type. In turn, opening of an existential is handled with operations `extractExistential{Value\|Type\|WitnessTable}` which pull the corresponding piece of information out of a value of existential type (which somewhere in the code had to have been created with `makeExistential`). The change includes a trivial simplification pass that can detect cases where an `extractExistential` operation is applied direclty to a `makeExistential` operation, so that there is only one possible result that could be extracted. This allows for simplification of existential types used in trivial ways for local variables (this is mostly so I can check in a functional test, rather than to actually support useful code involving interfaces right now). The logic in the semantic checking phase of the compiler is comparatively more complex. When we are about to perform member lookup given an expression like `obj.member` we will first check if `obj` has an existential type, and if it does we will construct a suitable local context in which we extract the value, type, and witness table from the existential (these all become explicit AST expression nodes), and then use the extracted value as the base of the lookup operation. The nature of existential values is that two different values with the same existential (interface) type could wrap concrete values with differnt types, so that we need to carefully refer only to the extracted type/value/witness-table of specific values. We handle this right now by conceptually moving the existential-type value into a local variable (by introducing a `LetExpr` that amounts to `let v = <init> in <body>`) and then require that the extract expressions must refer to the (immutable) variable declaration from which they are extracting a value. (Eventually we should expand this so that when using an immutable local variable of existential type we just use that variable as-is rather than introduce a new temporary) A simple test case is included that uses an interface type in an almost trivial way for a local variable; this test can be run and produces the expected results. A more complex test case that passes an existential into a function is included, but left disabled because a more aggressive simplification approach is required to generate working code from it. Add missing file for expected test output * Fixups for merge from top-of-tree
*	Specialize away resource-type function parameters (#759)	Tim Foley	2018-12-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Specialize away resource-type function parameters Work on #397. Introduction ------------ Suppose a user writes a function that takes a resource type as a parameter: ```hlsl float4 getThing(RWStructuredBuffer<float4> buffer, int index) { return buffer[index]; } ``` This function creates challenges when generating code for GLSL-based targets, because a global shader parameter of type `RWStructuredBuffer`: ```hlsl RWStructuredBuffer<float4> gBuffer; ``` translates to a global GLSL `buffer` declaration: ```hlsl buffer _S0 { float4 _data[]; } gBuffer; ``` There is no equivalent to that `buffer` declaration that can be used in function parameter position, and it is illegal in GLSL to pass `gBuffer` into a function. (Aside: yes, we could in principle translate a function parameter like `RWStructuredBuffer<float4> buffer` to `float4 buffer[]`, but that will not in turn generalize to arrays of structured buffers; it is a dead-end strategy) The solution employed by many shader compilers is to "inline everything" to eliminate the need for parameters of resource types, and then rely on dataflow optimization to eliminate locals of resource types. This strategy can of course lead to an increase in code size, and it also means that call stacks are lost when doing step-through debugging. Another serious issue is that an "early `return`" from a function can turn into the equivalent of a multi-level `break` when inlined, and not all of our targets support multi-level `break`. The solution implemented in this change works around some, but not all, of the problems with full inlining. The approach here generates specialized versions of a function like `getThing`, adapted to the actual arguments provided at different call sites. Thus if we have code like: ```hlsl RWStructuredBuffer<float4> gA; RWStructuredBuffer<float4> gB[10]; ... getThing(gA, x); getThing(gA, y); getThing(gB[someVal], z); ``` we will generate two specializations of `getThing`: one specialized for the `buffer` parameter being `gA` and the other for `gB`: ```hlsl float4 getThing_gA(int index) { return gA[index]; } float4 getThing_gB(int _val, int index) { return gB[_val][index]; } ``` and the call sites will change to match: ```hlsl getThing_gA(x); getThing_gA(y); getThing_gB(someVal, z); ``` Note how in the case where the argument being passed in was obtained by indexing into an array of resources, the callee is specialized to the identity of the global shader parameter (`gB`), and now accepts a new parameter to indicate the array index into it. While this description motivates the change based on GLSL output, the same basic issue can arise for other targets. For example, while current HLSL has added the `ConstantBuffer<T>` type, it is not supported on older targets, and it turns out that even dxc does not allow functions to have `ConstantBuffer<T>` parameters. Longer-term, we will likely need to do even more aggressive specialization both in order to generate SPIR-V output directly, and also to deal with function that have return values or `out` parameters of resource types. Implementation -------------- The meat of the change is in `ir-specialize-resources.{h,cpp}`, where we have a pass that looks at all call sites (`IRCall` instructions) in the program, and attempts to replace them with calls to specialized functions, where the specializations are generated on-demand. The code in this pass is heavily commented, so hopefully it serves to explain itself all right. After specialization is complete, we may still have functions like the original `getThing` that will produce invalid code when emitted as GLSL, so we need a way to make sure they don't appear in the output. To date we've had some very ad hoc approaches for ignoring IR constructs that we don't want to affect emitted code, but this change goes ahead and adds a more real dead code elimination (DCE) pass in `ir-dce.{h,cpp}`. This pass follows a straightforward approach of tagging instructions that are "live" and then propagating liveness through the whole program, before making a single pass to delete anything that isn't live. When I first added the DCE pass it eliminated everything because there were no "roots" for liveness. I solved this for now by adding a new decoration, `IREntryPointDecoration`, to mark shader entry points in the IR which should always be live (as should anything they depend on). A secondary problem that arose was that for GLSL ray tracing shaders it is possible for the incoming/outgoing payload or attributes parameters to be unused, but eliminating them as dead would change the signature of a shader an potential break the rules for how ray tracing programs communicate. I added a very simple `IRDependsOnDecoration` that allows one IR instruction to keep another alive as if it used it, without actually using it. There's also a fixup in the IR dumping logic where I was forgetting to store anything in the mapping from instruction to their names, so that the name of an instruction was getting incremented each time it was referenced. Testing ------- There are three different tests added as part of this change: * The `compute/func-resource-param` test covers the basic `RWStructuredBuffer` case above, which we expect to work fine for D3D11/12, but fail for Vulkan without specialization. * The `cross-compile/func-resource-param-array` test covers the case where we don't just have one resource, but an array of them. This is not an end-to-end compute test primarily because our `render-test` application doesn't yet handle arrays of resources correctly in its binding logic. * The `compute/func-cbuffer-param` test covers the case of a function with a `ConstantBuffer<T>` parameter, which requires specialization to become valid for any of our targets. * fixup: warnings/errors from other compilers * fixup: typos and cleanup * fixup: typos
*	Represent global shader parameters explicitly in the IR (#756)	Tim Foley	2018-12-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before this change, global shader parameters were represented in the IR as just being ordinary global variables. The only indication that a particular global represented a parameter was when it got a layotu attached to it (as part of back-end processing), and we've had a number of bugs related to layouts being dropped so that what should have been a shader parameter turned into an ordinary global variable in the output. This change is more strongly motivated by the fact that making shader parameters look like globals means that we cannot easily reason about their value when doing IR transformations. If we see two `load`s from the same global variable can we assume they yield the same value? In the general case we cannot, and this means that any transformation that wants to rely on the fact that an input `Texture2D` shader parameter can't actually change over the life of the program needs to do extra work. The fix here is to introduce a new kind of IR instruction that represents a global shader parameter directly (not a pointer to it as a global would), at which point there isn't even such a notion as a "load" from the parameter, since it represents the value directly. In several cases logic that used to apply to global variables in case they were shader parameters (by looking for a layout) is now moved to apply to these global parameters. The biggest source of issues in this change was that switching from pointers to plain values to represent these shader parameters stresses different cases in type legalization. I also had to deal with the case of legalization for GLSL where we actually do need global shader parameters that are writable (since varying output goes in the global scope), but in that case I borrowed the use of pointer-like `Out<...>` and `InOut<...>` types to represent that intent, which we were already using for function parameters representing outputs. A few tests started failing because the changes lead to a slightly different order of code emission, which in some HLSL tests resulted in a function parameter named `s` getting emitted before a global parameter named `s`, leading to the latter getting the name `s_1` instead of `s_0`. A few SPIR-V tests started failing because the new approach means that we no longer end up performing a load from all varying input parameters at the start of `main` and instead reference the varying inputs directly. The resulting code is more idomatic, but it differed from the baselines for those tests.
*	Move mangled name out of IRGlobalValue (#752)	Tim Foley	2018-12-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Move mangled name out of IRGlobalValue Previously the `IRGlobalValue` type was used as a root for all IR instructions that can have "linkage," in the sense that a definition in one module can satisfy a use in another module. The mangled symbol name was stored in state directly on each `IRGlobalValue`, which created some complications, and also forced IR instructions that wanted to support linkage to wedge into the hierarchy at that specific point. This change moves the mangled name out into a decoration: either an `IRImportDecoration` or an `IRExportDecoration`, both of which inherit from `IRLinkageDecoration` which exposes the mangled name. This change has a few benefits: * We can now have any kind of instruction be exported/imported, without having to inherit from `IRGlobalValue`. This could potentially let `IRStructType` and `IRWitnessTable` be simplified to just have operand lists instead of dummy chldren as they do today. * We can now easily have "global values" like functions that explicitly don't get linkage, instead of using a null or empty mangled name as a marker. * We can use the exact opcode on a linkage decoration to distinguish imports from exports, which could be used to more accurately resolve symbols during the linking step. Other than adding the decorations and making sure that AST->IR lowering adds them, the main changes here are around any code that used `IRGlobalValue`. Variables and parameters of type `IRGlobalValue` were changed to `IRInst` easily, so the main challenge was around code that casts to `IRGlobalValue. In cases where a cast to `IRGlobalValue` also performed a test for the mangled name being non-null/non-empty, we simply switched the code to check for the presence of an `IRLinkageDecoration`, since that is the new way of indicating a value with linakge. Most of the serious complications arose in `ir.cpp` around the "linking"/target-specialization and generic specialization steps. The "linking" logic was checking for `IRGlobalValue` to opt into some more complicated cloning logic, and just checking for a linkage decoration here wasn't sufficient since the front-end does* produce global values without linkage in some cases (e.g., for a function-`static` variable we produce a global variable without linkage). This logic was updated to just check for the cases that used to amount to `IRGlobalValue`s directly by opcode. It might be simpler in the short term to have kept `IRGlobalValue` around to make the existing casts Just Work, but I'm confident that this logic could actually be rewritten for much greater clarity and simplicity and that is the better way forward. The generic specialization logic was using some really messy code to generate a new mangled name to represent the specialized symbol, and then searching for an existing match for that name. The original idea there was that an IR module could include "pre-specialized" versions of certain generics to speed up back-end compilation by eliminating the need to specialize in some cases, but this feature has never been implemented so the overhead here is just a waste. Instead, I moved generic specialization to use a simpler dictionary to map the operands to a `specialize` instruction over to the resulting specialized value. This allows for some simplifications in the name mangling logic, because it no longer needs to figure out how to produce mangled names from IR instructions representing types/values. As part of this change I also overhauled the IR emit logic to produce cleaner output by default, borrowing some of the ideas from the logic in `emit.cpp`. IR values are now automatically given names based on their "name hint" decoration, if any, to make the code easier to follow, and I also made it so that types and literals get collapsed into their use sites in a new "simplified" IR dump mode (which is currently the default, with no way to opt into the other mode without tweaking the code). The resulting IR dumps are much nicer to look at, but as a result the one test that involves IR dumping (`ir/string-literal`) doesn't really test what it used to. One weird issue that came up during testing is that the `transitive-interface` test had previously been producing output that made no sense (that is, the expected output file wasn't really sensible), and somehow these changes were altering its behavior. Changing the test to use `int` values instead of `float` was enough to make the output be what I'd expect, and hand inspection of generating DXBC has me convinced we were compiling the `float` case correctly too. There appears to be some issue around tests with floating-point outputs that we should investigate. * fixup: C++ declaration order
*	Running tests in slang-test process (#740)	jsmall-nvidia	2018-12-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* First pass at having an interface to write text to that can be replaced. Simplifed and made more rigerous the interface used to write formatted strings. * Added AppContext to simplify setting up and parsing around of streams. * Added more simplified way to get the std error/out from AppContext. * Work in progress using dll for tools to speed up testing. * First pass at ISlangWriter interface. * Added support for writing VaArgs. Added NullWriter. * Use ISlangWriter for output. * Use ISlangWriter for output - replacing OutputCallback. Make IRDump go to ISlangWriter * SlangWriterTargetType -> SlangWriterChannel Improvements around AppContext * Shared library working with slang-reflection-test. * Dll testing working for render-test. * Include va_list definintion from header. * Fix errors from clang. * Fix typo for linux. * Added -usexes option * Fix typo. * Fix arguments problem on linux. * Fix typo for linux. * Add windows tool shared library projects. * Fix warning from x86 win build. Fix signed warning from slang-test/main.cpp * First attempt at getting premake to work on travis, and run tests. * Try moving build out into script. * Invoke bash scripts so they don't have to be executable. * Drive configuration/tests from env parameters set by travis * Try using source to run travis tests. * Remove the build.linux directory - but doing so will overwrite Makefile. * Made -fno-delete-null-pointer-checks gcc only. * Try to fix warning from -fno-delete-null-pointer-checks * Turn of warnings for unknown switches. * Try to make premake choose the correct tooling. * Disabled missing braces warning. * Disable -Wundefined-var-template on clang. * -Wunused-function disabled for clang. * Fix typo due to SlangBool. * Remove this nullptr tests. * "-Wno-unused-private-field" for clang. * Added "-Wno-undefined-bool-conversion" * Add DominatorList::end fix. * Split scripts into travis_build.sh travis_test.sh * Fix gcc/clang template pre-declaration issue around QualType. * Fix premake to build such that pthread correctly links with slang-glslang
*	Decorations are instructions (#748)	Tim Foley	2018-12-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Make a test case use IR serialization * Make all IR instructions usable as parents This makes it so that every `IRInst` has the list of children that used to be on `IRParentInst` and eliminates `IRParentInst`. Most places in the code were only checking against `IRParentInst` so that they could know whether there were child instructions to iterate over. This change bloats the size of every instruction by two pointers, but we hope to be able to eliminate that overhead with a better encoding later. * Change IR decorations to be instructions. The main change here is that `IRDecoration` now inherits from `IRInst`, and `IRInst` now has a single linked list that holds both decorations and children. At each point where code used to loop over `getChildren()` on an `IRInst`, I checked whether it made sense to leave the operation as processing just the children, or if it should process both decorations and children. The thorniest bit was making sure the logic for inserting an instruction into a parent is correct. For the most part, once IR code is built all insertions are explicitly before/after another instruction, so the ordering can't get messed up. The sticking point is any code that does an explicit `insertAtStart` or `insertAtEnd`, but I surveyed those to make sure they are correct in context, and I also made all insertions bottleneck through one routine that does a better job of asserting the preconditions than what was there before. We may still want a "smart" insertion function at some point so that if somebody does `someDecoration->insertAtEnd(someInst)` the decoration intelligently goes to the end of the decoration list, and not the entire decorations-and-children list. All of the existing decoration types were refactored to provide accessors for their operands, rather than directly exposing fields. In most cases the operands are required to be `IRConstant` nodes of fixed types. Not all of these types need to be kept around in the new approach, but they were left in so that as much existing code as possible can be kept working. The `IRBuilder` was extended with factory functions to make the various decoration types and attach them. All the fields in concrete decorations that were using `StringRepresentation` or `Name` pointers are now using IR-level string operands which provide their value as an `UnownedStringSlice`, so logic that was working with those decoration values needed to be updated here and there. I also needed to add the logic to clone string-literal values to the IR cloning pass, since they are now being used in almost every piece of code. A new type of constant IR instruction for literal pointers was added, to handle the cases where an IR decoration needs an operand that is a raw AST-level pointer. These are even being serialized, although we obviously should not rely on them to round-trip through serialization in the future. Ideally, a follow-on change should add a cleanup pass where we remove any decorations from a module that shouldn't be allowed in the serialized code. The biggest overall cleanup is in the serialization logic, where a lot of code just disappears because it can process the raw "decorations and children" list as the logical children of an IR instruction. The only special cases left are literals (which seem like they will always need special-casing) and global values (because they have a mangled name, which we plan to move into a decoration). One other example of a simplification made possible by this change: the `IRNotePatchConstantFunc` instruction was implemented as an instruction only because it couldn't be encoded as a decoration at the time (it needed to have an operand that referenced an IR function). The IR dumping logic was also updated (which meant a change to the `ir/string-literal` test) to try to make it print out all decorations a bit more systematically now that they are encoded like other instructions. The formatting isn't quite perfect, but it is good enough to be able to read what is going on. I didn't include updates to the validation logic to ensure that decorations are being added in ways that follow the invariants, but that would be a nice thing to add next. * fixup: 64-bit issues * fixup: forward declaration issues
*	Add support for globallycoherent modifier (#732)	Tim Foley	2018-11-29
\| \| \| \| \| \| \| \| \|	The `globallycoherent` modifier indicates that resource might be read or written by threads outside of the current thread group, so that any memory barriers that affect it should guarantee coherency at the global memory scope, and not just thread-group scope. The equivalent GLSL modifier appears to be `coherent`. This change adds the front-end modifier, transforms it into an IR-level decoration during lowering, and then checks for the modifier during code emit. Note: this logic may not behave correctly when `globallycoherent` is added to a field in a `struct`, since the modifier would then need to be propagated to any variables created during type legalization. Checking up on that is left to future work. Note: it isn't entirely clear if `globallycoherent` should be treated as a declaration modifier or a type modifier. The point is moot for now because Slang doesn't have any support for type modifiers, but when we get around to that we will need to make a decision.
*	Feature/early depth stencil (#727)	jsmall-nvidia	2018-11-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	* First pass support for early depth stencil. * Add a simple test to check if output has attributes. * Use cross compilation to test [earlydepthstencil] on glsl. * If target is dxil, use dxc to test against. Add hlsl to test earlydepthstencil against. * * Added spSessionHasCompileTargetSupport * Made slang-test use spSessionHasCompileTargetSupport to ignore tests that cannot run
*	Add callable shader support for Vulkan ray tracing (#718)	Tim Foley	2018-11-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add callable shader support for Vulkan ray tracing This change extends the previous work to update Vulkan ray tracing support for the finished `GL_NV_ray_tracing` spec. One of the features missing in the experimental extension that was added to the final spec is "callable shaders," which allow ray tracing shaders to call other shaders as general-purpose subroutines. Most of the implementation work here mirrors what was done for the `TraceRay()` function to map it to `traceNV()`. We map the generic `CallShader<P>` function to the non-generic `executeCallableNV`, with a payload identifier that indicates a specific global variable of type `P` (the global variable being generated from a `static` local in `CallShader`). A new modifier is added to identify the payload structure, and the parameter binding/layout logic introduces a new resource kind for callable-shader payload data (where previously the logic had assumed ray and callable payloads should use the same resource kind). Two test shaders are included: one for the callable shader (`callable.slang`) and one for a ray generation shader that calls it (`callable-caller.slang`). Just for kicks, the payload data type is defined in a shared file so that we can be sure the two agree (trying to emulate what might be good practice, and ensure that ray tracing support works together with other Slang mechanisms). * Typo fix: assocaited->associated One instance was found in review, but I went ahead and fixed a bunch since I seem to make this typo a lot. * Typo fix: defintiion->definition
*	Update Vulkan ray tracing support to final extension spec (#717)	Tim Foley	2018-11-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Update version of glslang used * Update VK raytracing support for final extension spec A lot of this change is just plain renaming: The `NVX` suffixes become just `NV`, and the extension name changes from `GL_NVX_raytracing` to `GL_NV_ray_tracing`. The Slang standard library and the GLSL baselines for the tests are consistently updated. The other detail is that the final spec requires the "payload" identifier in a `traceNV()` call to be a compile-time constant, which means it cannot be defined as a local variable first, as in: ```glsl int payloadID = 0; traceNV(..., payloadID); // ERROR ``` In terms of how the original support was implemented, the payload ID is being computed via a special builtin function that maps each global GLSL payload variable to a unique ID. There are a few ways we could try to resolve the problem here: 1. We could aspire to put our equivalent of the `constexpr` modifier on the output of the function, so that the GLSL variable gets declared `const` and thus fits the GLSL rules for a constant expression. 2. We could introduce a pass to replace the payload-location instructions with literal integers. 3. We could use a special-purpose instruction instead of a builtin function call, and have that instruction indicate that it doesn't have side effects (so it can be folded into the call site) 4. We could somehow mark the builtin function as not having side effects. We choose option (4) simply because it provides a feature that could have other applications. This change adds a `[__readNone]` attribute that can be applied to function declarations to express a promise on the part of the programmer that the given function has no side effects and computes its result strictly from the bits of its input arguments (and not things they point to, etc.). This mirrors an equivalent function attribute in LLVM. We mark the function that computes a ray payload location with this attribute, and propagate the attribute through the layers of the IR, so that when the emit logic asks if an operation has side effects (to see if it can be folded into the arguments of a subsequent expression), we get an affirmative response. This change should get all of the features that were present in the experiemntal `NVX` extension working with the final extension spec. It does not address callable shaders, which will come as a subsequent change.
*	Fix a crash on function-static variables with initializers (#703)	Tim Foley	2018-10-30
\| \| \| \| \|	This code path hadn't been used, and it had a crash due to not inserting the basic blocks it created (for initializing the variable) into the parent function. The fix adds a bit more smarts to the `IRBuilder` to help with inserting basic blocks into the flow of a function. The actual user issue was around `static const` declarations, and it is clear that the code is incorrectly treating a function local `static const` as if it were just `static`. That will need to be fixed in another change.
*	Feature/premake linux (#689)	jsmall-nvidia	2018-10-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Premake work in progress for linux. * Added dump function. * Remove examples on linux Small warning fix. * * Don't build render-test on linux * Removed work around virtual destructor warning, and just used virtual dtor for simplicity * Git ignore obj directories * Fix premake working on windows. * * Fix sprintf_s functions * Make generates arg parsing more robust * Added FloatIntUnion to avoid type punning/strong aliasing issues, and repeated union definitions. * Work around problems building on linux with getClass claiming a strict aliasing issue. * Fix for targetBlock appearing potentiall used unintialized to gcc. * Linux slang link options -fPIC to make dll. * Add -fPIC to build options on linux. * Add -ldl for linux on slang. * Fixes to try and get premake working with .so on linux. * Make core compile with -fPIC * Try to fix linux linking with --no-as-needed before -ldl * Add rpath back. * Remove render-gl from linux build. * Re-add location for linux. * Don't include <malloc.h> except on windows. * Remove unused line to fix warning on osx. * Remove ambiguity on OSX for operator <<. * Fixing ambiguity with operator overloading and Int types for OSX. * Fix ambiguity around UInt and operator * Fix ambiguity of UInt conversion for OSX. * Added UnambiguousInt and UnambiguousUInt to make it easier to work around OSX integer coercion for UInt/Int types.
*	Osx build fixes (#681)	Matt Pharr	2018-10-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Remove 'register' qualifiers. These will be illegal come c++17 and give a warning on OSX. * Add UNREACHABLE_RETURNs to silence compiler warnings. * Make FileStream::GetPosition() compile on OSX (w.r.t. the linux build, I believe that strictly-speaking, fpos64_t is specified as an opaque type and the cast to an Int64 is not necessarily well-defined.) * Avoid an inadvertent trigraph.
*	Vulkan implicit sampler fixups (#686)	Tim Foley	2018-10-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Fix sampler-less texture functions (#685) * Fix sampler-less texeture functions I'm honestly not sure how the original work on this feature in #648 worked at all (probably insufficient testing). We have these front-end modifiers to indicate that a particular function definition requires a certain GLSL version, or a GLSL extension in order to be used, and they are supposed to be automatically employed by the logic in `emit.cpp` to output `#extension` lines in the output GLSL. However, it turns out that nothing is actually wired up right now, so that adding the modifiers to a declaration is a placebo. This change propagates the modifiers through as decorations, and then uses them during GLSL code emit, which allows the functions that require `EXT_samplerless_texture_functions` to work. * fixup: 32-bit warning * Add serialization support for GLSL extension/version decorations
*	Add support for static methods in interfaces (#680)	Tim Foley	2018-10-18
\| \| \| \| \| \| \|	This change allows an interface to include `static` methods as requirements, so that types that conform to the interface will need to satisfy the requirement with a `static` method. The essence of the check is simple: when checking that a method satisfies a requirement, we enforce that both are `static` or both are non-`static`. Making that simple change and adding a test change broke a few other places in the compiler that this change tries to fix. The main fix is to handle cases where we might look up an "effectively static" member of a type through an instance, and to make sure that we replace the instance-based lookup with type-based lookup. There was already logic along these lines in `lower-to-ir.cpp`, so this change centralizes it in `check.cpp` where it seems to logically belong.
*	Add a warning on missing return, and initial SCCP pass (#671)	Tim Foley	2018-10-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add a warning on missing return, and initial SCCP pass The user-visible feature added here is a diagnostic for functions with non-`void` return type where control flow might fall off the end. This sounds like a trivial diagnostic to add as part of the front-end AST checking, but that can run afoul of really basic stuff like: ```hlsl int thisFunctionisOkay(int a) { while(true) { if(a > 10) return a; a = a2 + 1; } // no return here! } ``` This function "obviously" doesn't need to have a `return` statement at the end there, but realizing this fact relies on the compiler to understand that the `while(true)` loop can't exit normally, and doesn't contain any `break` statement. One can write "obvious" examples that need more and more complex analysis to rule out. The answer Slang uses for stuff like this is to do the analysis at the IR level right after initial code generation (this would be before serialization, BTW, so that attached `IRHighLevelDeclDecoration`s can be used). When lowering the AST to the IR, we always emit a `missingReturn` instruction (a subtype of `IRUnreachable`) at the end of its body if it isn't already terminated. The IR analysis pass to detect missing `return` statements is then as simple as just walking through all the functions in the module and making sure they don't contain `missingReturn` instructions. For that simple pass to work, we first need to make some effort to remove dead blocks that control flow can never reach. This change adds a very basic initial implementation of Spare Conditional Constant Propagation (SCCP), which is a well-known SSA optimization that combines constant propagation over SSA form with dead code elimination over a CFG to achieve optimizations that are not possible with either optimization along. For the moment, we don't actually implement any constant folding* as part of the SCCP pass, so we can eliminate the dead block in a case like the function above (and those in the test case added in this change), but will not catch things like a `while(0 < 1)` loop. Handling more "obvious" cases like that is left for future work. * fixup: warning on unreachable code * Handle case where user of an inst isn't in same function/code The code as assuming any instruction in the SSA work list has to come from the function/code being processed, but this misses the case where an instruction in a generic has a use inside the function that the generic produces. This change adds code to guard against that case.
*	Add basic support for [mutating] methods (#667)	Tim Foley	2018-10-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	By default, when writing a "method" (aka "member function") in Slang, the `this` parameter is implicitly an `in` parameter. So this: ```hlsl struct Foo { int state; int getState() { return state; } void setState(int s) { state = s; } }; ``` is desugared into something like this: ```hlsl struct Foo { int state }; int Foo_getState(Foo this) { return this.state; } // BAD: void Foo_setState(Foo this, int s) { this.state = s; } ``` That "setter" doesn't really do what was intended. It modifies a local copy of type `Foo`, because `in` parameters in HLSL represent by-value copy-in semantics, and are mutable in the body the function. Slang was updated to give a static error on the original code to catch this kind of mistake (so that `this` parameters are unlike ordinary function parameters, and no longer mutable). Of course, sometimes users want a mutable `this` parameter. Rather than make a mutable `this` the default (there are arguments both for and against this), this change adds a new attribute `[mutating]` that can be put on a method (member function) to indicate that its `this` parameter should be an `in out` parameter: ```hlsl [mutating] void setState(int s) { state = s; } ``` The above will translate to, more or less: ```hlsl void Foo_setState(inout Foo this, int s) { this.state = s; } ``` One added detail is that `[mutating]` can also be used on interface requirements, with the same semantics. A `[mutating]` requirement can be satisfied with a `[mutating]` or non-`[mutating]` method, while a non-`[mutating]` requirement can't be satisfied with a `[mutating]` method (the call sites would not expect mutation to happen). The design of `[mutating]` here is heavily influenced by the equivalent `mutating` keyword in Swift. Notes on the implementation: * Adding the new attribute was straightforward using the existing support, but I had to change around where attributes get checked in the overall sequencing of static checks, because attributes were being checked after function bodies, but with this change I need to look at semantically-checked attributes to determine the mutability of `this` * The check to restrict it so that `[mutating]` methods cannot satisfy non-`[mutating]` requirements was easy to add, but it points out the fact that there is a huge TODO comment where the actual checking of method signatures is supposed to happen. That is a bug waiting to bite users and needs to be fixed! * While we had special-case logic to detect attempts to modify state accessed through an immutable `this` (e.g., `this.state = s`), that logic didn't trigger when the mutation happened through a function/operator call (e.g., `this.state += s`), so this change factors out the validation logic for that case and calls through to it from both the assignment and `out` argument cases. * The error message for the special-case check was updated to note that the user could apply `[mutating]` to their function declaration to get rid of the error. * The semantic checking logic for an explicit `this` expression was already walking up through the scopes (created during parsing) and looking for a scope that represents an outer type declaration that `this` might be referring to. We simply extend it to note when it passes through the scope for a function or similar declaration (`FunctionDeclBase`) and check for the `[mutating]` attribute. If the attribute is seen, it returns a mutable `this` expression, and otherwise leaves it immutable. * The IR lowering logic then needed to be updated so that when adding an IR-level parameter to represent `this`, it gives it the appropriate "direction" based on the attributes of the function declaration being lowered. The rest of the IR logic works as-is, because it will treat `this` just like an other parameter (whether it is `in` or `inout`). * This biggest chunk of work was the "implicit `this`" case, because ordinary name lookup may resolve an expression like `state` into `this.state`, so that the `this` expression comes out of "thin air." To handle this case, I extended the structure of the "breadcrumbs" that come along with a lookup result (the breadcrumbs are used for any case where a single identifier like `state` needs to be embellished to a more complex expression as a result of lookup), so that it can identify whether a `Breadcrumb::Kind::This` node comes from a `[mutating]` context or not. Similar to the logic for an explicit `this`, we handle this by noting when we pass through a `FunctionDeclBase` when moving up through scopes, and look for the `[mutating]` attribute on it. The rest of the work was just plumbing the additional state through.
*	Support cross-compilation of ray tracing shaders to Vulkan (#663)	Tim Foley	2018-10-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Move to newer glslang * Support cross-compilation of ray tracing shaders to Vulkan This change allows HLSL shaders authored for DirectX Raytracing (DXR) to be cross-compiled to run with the experimental `GL_NVX_raytracing` extension (aka "VKRay"). * The GLSL extension spec is marked as experimental, so that any shaders written using this support should be ready for breaking changes when the spec is finalized. * "Callable shaders" are not exposed throug the GLSL extension, so this feature of DXR will not be cross-compiled. * The experimental Vulkan raytracing extension does not have an equivalent to DXR's "local root signature" concept. This does not visibly impact shader translation (because the local/global root signature mapping is handled outside of the HLSL code), but in practice it means that applications which rely on local root signatures on their DXR path will not be able to use the translation in this change as-is; more work will be needed. The simplest part of the implementation was to go into the Slang standard library and start adding GLSL translations for the various DXR operations. In some cases, like mapping `IgnoreHit()` to `ignoreIntersectionNVX()` this is almost trivial. The various functions to query system-provided values (e.g., `RayTMin()`) were also easy, with the only gotcha being that they map to variables rather than function calls in GLSL, and our handling of `__target_intrinsic` assumes that a bare identifier represents a replacement function name, and not a full expression, so we have to wrap these definitions in parentheses. The tricky operations are then `TraceRay<P>()` and `ReportHit<A>()`, because these two are generics/templates in HLSL. GLSL doesn't support generics, even for "standard library" functions, so the raytracing extension implements a slightly complex workaround: the matching operations `traceNVX()` and `reportIntersectionNVX()` pass the payload/attributes argument data via a global variable. That is, shader code for the GLSL extensions writes to the global variable and then calls the intrinsic function. The linkage between the call site and the global is established by a modifier keyword (`rayPayloadNVX` and `hitAttributeNVX`, respectively) and in the case of ray payload also uses `location` number to identify which payload global to use (since a single shader can trace rays with multiple payload types). Our translation strategy in Slang tries to leverage standard language mechanisms instead of special-case logic. For example, to translate the `ReportHit<A>()` function, we provide both a default declaration that will work for HLSL (where the operation is built-in with the signature given), and a definition marked with the `__specialized_for_target(glsl)` modifier. The GLSL definition declares a function `static` variable that will fill the role of the required global, and then does what the GLSL spec requires: assigns to the global, and then calls the `reportIntersectionNVX` builtin (which we declare as a separate builtin). Our ordinary lowering process will turn that `static` variable into an ordinary global in the IR, and the `[__vulkanHitAttributes]` attribute on the variable will be emitted as `hitAttributeNVX` in the output. There is no additional cross-compilation logic in Slang specific to `ReportHit<A>()` - the target-specific definition in the standard library Just Works. The case for `TraceRay<P>()` is a bit more complicated, simply because the GLSL `traceNVX()` function needs to be passed the `location` for the payload global. We implement the payload global as a function-`static` variable, with the knowledge that every unique specialization of `TraceRay<P>()` will generate a unique global variable of type `P` to implement our function-`static` variable. We then add a slightly magical builtin function `__rayPayloadLocation()` that can map such a variable to its generated `location`; the logic for this is implemented in `emit.cpp` and described below. We also changed the `RayDesc` and `BuiltinTriangleIntersectionAttributes` types from "magic" intrinsic types over to ordinary types (because the GLSL output needs to declare them as ordinary `struct` types). This ends up removing some cases in the AST and IR type representations. By itself this change would break HLSL emit, because in that case the types really are intrinsic. We added a `__target_intrinsic` modifier to these types to make them intrinsic for HLSL, and then updated the downstream passes to handle the notion of target-intrinsic types. The logic for binding/layout of entry point inputs and outputs was updated so that raytracing stages don't follow the default logic for varying input/output parameters. This is because the input/output parameters of a raytracing entry point aren't really "varying" in the same sense as those in the rasterization pipeline. In particular, the SPIR-V model for raytracing input and output treats "ray payload" and "hit attributes" parameters as being in a distinct storage class from `in` or `out` parameters. We also detect cases where a ray tracing stage declares inputs/outputs that it shouldn't have. This logic could conceivably be extended to other stages (e.g., to give an error on a compute shader with user-defined varying input/output). The type layout logic added cases for handling raytracing payload and hit-attribute data, but this is currently just a stub implementation that follows the same logic as for varying `in` and `out` parameters (it cannot give meaningful byte sizes/offsets right now). To my knowledge the GLSL spec doesn't currently specify anything about layout, and I haven't read the DXR spec language carefully enough to know what it says about layout. A future change should update the layout logic to allow for byte-based layout of ray payloads, etc. so that we can query this information via reflection. The GLSL legalization logic in `ir.cpp` was updated to factor out the per-entry-point-parameter code into its own function, and then that function was updated to special-case the input/output of a ray-tracing shader. While for rasterization stages we typically want to take the user-declared input/output and "scalarize" it for use in GLSL (in part to deal with language limitations, and in part to tease system values apart from user-defined input/output), the GLSL spec for raytracing requires payload and hit attribute parameters to be declared as single variables. There is also the issue that even for an `in out` parameter, a ray payload parameter should only turn into a single global, whereas the handling for varying `in out` parameters generates both an `in` and an `out` global for the GLSL case. Other than the handling of entry point parameters, the GLSL legalization pass doesn't need to do anything special for ray tracing shaders. The trickiest change in the `emit.cpp` logic is that we now generate `location`s for ray payload arguments (the outgoing from a `TraceRay()` call) on demand during code generation. This is a bit hacky, and it would be nice to handle it as a separate pass on the IR rather than clutter up the emit logic, but this approach was expedient. Basically, any of the global variables that got generated from the `static` declarations in the standard library implementation of `TraceRay()` will trigger the logic to assign them a `location`. The logic for emitting intrinsic operations added a few new `$`-based escape sequences. The `$XP` case handles emitting the location of a generated ray payload variable; this is how we emit the matching location at the site where we call `traceNVX`. The `$XT` case emits the appropriate translation for `RayTCurrent()` in HLSL, because it maps to something different depending on the target stage. All of the test cases here consist of a pair of an HLSL/Slang shader written to the DXR spec, plus a matching GLSL shader for a baseline. The GLSL shaders are carefully designed so that when fed into glslang they will produce the same SPIR-V as our cross-compilation process. This kind of testing is quite fragile, but it seems to be the best we can do until our testing framework code supports both DXR and VKRay. A bunch of the core changes ended up being blocked on issues in the rest of the compiler, so some additional features go implemented or fixed along the way: The first big wall this work ran into was that the `__specialized_for_target` modifier hasn't actually been working correctly for a while. It turns out that for the one function that is using it, `saturate()`, we have been outputting the workaround GLSL function in all cases (including for HLSL output) rather than only on GLSL targets. The problem here is that for a generic function with a `__specialized_for_target` modifier or a `__target_intrinsic` modifier, the IR-level decoration will end up attached to the `IRFunc` instruction nested in the `IRGeneric`, but the logic for comparing IR declarations to see which is more specialized (via `getTargetSpecializationLevel()`) was looking only at decorations on the top-level value (the generic). The quick (hacky) fix here is to make `getTargetSpecializationLevel()` try to look at the return value of a generic rather than the generic itself, so that it can see the decorations that indicate target-specific functions. A more refined fix would be to attach target-specificity decorations to the outer-most generic (to simplify the "linking" logic). The only reason not to fold that into the current fix is that the `__target_intrinsic` modifier currently serves double-duty as a marker of target specialization and information to drive emit logic. The latter (the emit-related stuff) currently needs to live on the `IRFunc`, and moving it to the generic could easily break a lot of code. This needs more work in a follow-on fix, but for now target specialization should again be working. The other big gotcha that the simple "just use the standard library" strategy ran into was that function-`static` variables weren't actually implemented yet, and in particular function-`static` variables inside of generic functions required some careful coding. The logic in `lower-to-ir.cpp` has this `emitOuterGenerics()` function that is supposed to take a declaration that might be nested inside of zero or more levels of AST generics, and emit corresponding IR generics for all those levels. This is needed because two different AST functions nested inside a single generic `struct` declaration should turn into distinct `IRFunc`s nested in distinct `IRGeneric`s. The tricky bit to making that all work is that the same AST-level generic type parameter will then map to different IR-level instructions (the parameters of distinct `IRGeneric`s) when lowering each function. The existing logic handled this in an idiomatic way by making "sub-builders" and "sub-contexts." This change refactors some of the repeated logic into a `NestedContext` type to help simplify the pattern, and applies it consistently throughout the `lower-to-ir.cpp` file. Besides that cleanup, the major change is `lowerFunctionStaticVarDecl` which, unsurprisingly, handles lower of function-`static` variables to IR globals. The careful handling of nested contexts here is needed because if we are in the middle of lowering a generic function, then a `static` variable should turn into its own `IRGeneric` wrapping an `IRGlobalVar`. The body of the function should refer to the global variable by specializing the global variable's `IRGeneric` to the parameters of the functions `IRGeneric`. This tricky detail is handled by `defaultSpecializeOuterGenerics`. An additional subtlety not actually required for this raytracing work (and thus not properly tested right now) is handling function-`static` variables with initializers. These can't just be lowered to globals with initializers, because HLSL follows the C rule that function-`static` variables are initialized when the declaration statement is first executed (and this could be visible in the presence of side-effects). The lowering strategy here translates any `static` variable with an initializer into two globals: one for the actual storage, plus a second `bool` variable to track whether it has been initialized yet. There are some opportunities to optimize this case, especially for `static const` data, but that will need to wait for future changes. We've slowly been shifting away from the model where a user thinks of a "profile" as including both a stage and a feature level. Instead, the user should think about selecting a profile that only describes a feature level (e.g., `sm_6_1`, `glsl_450`, etc.), and then separately specifying a stage (`vertex`, `raygeneration, etc.) for each entry point. The challenge here is that the command-line processing still only had a single `-profile` switch, and no way to specify the stage. Adding the `-stage` option was relatively easy, but making it work with the existing validation logic for command-line arguments was tricky, because of the complex model that `slangc` supports for compiling multiple entry points in a single pass. * In `slang.h` add new reflection parameter categories for ray payloads and hit attributes, as part of entry point input/output signatures. * A previous change already updated our copy of glslang to one that supports the `GL_NVX_raytracing` extension, so in `slang-glslang.cpp` we just needed to map Slang's `enum` values for the raytracing stage names to their equivalents in the glslang code. * Moved the logic for looking up a stage by name (`findStageByName()`) out of `check.cpp` and into `compiler.cpp`, with a declaration in `profile.h` * Added a `$z` suffix to the GLSL translation of `Texture.SampleLevel()`, to handle cases where the texture element type is not a 4-component vector. Note that this fix should actually be applied to all* these texture-sampling operations, but I didn't want to add a bunch of changes that are (clearly) not being tested right now. * The layout logic for entry points was updated to correctly skip producing a `TypeLayout` for an entry point result of type `void`, which meant that the related emit logic now needs to guard against a null value for the result layout. * In `ir.cpp`, dump decorations on every instruction instead of just selected ones, so that our IR dump output is more complete. * Added a command-line `-line-directive-mode` option so that we can easily turn off `#line` directives in the output when debugging. Not all cases where plumbed through because the `none` case is realistically the most important. * Parser was fixed to properly initialize parent links for "scope" declarations used for statements, so that we can walk backwards from a function-scope variable (including a `static`) and see the outer function/generics/etc. * Added GLSL 460 profile, since it is required for ray tracing. Also updated the logic for computing the "effective" profile to use to recognize that GLSL raytracing stages require GLSL 460. * Added some conventional ray-tracing shader suffixes to the handling in `slang-test`. This code isn't actually used, but was relevant when I started by copy-pasting some existing VKRay shaders as the starting point for my testing. * Fixup: typos
*	Remap IROp value ranges	jsmall-nvidia	2018-09-24
\| \| \| \|	* Change the layout of IROp such that 'main' IROps are 0-x. (#649) * Removed MANUAL_RANGE instuction types, as no longer needed.
*	Support for IRStringLit (#645)	jsmall-nvidia	2018-09-19
\| \| \| \| \| \| \| \|	* * Added support for strings in IR with IRStringLit - with storage of chars after it * Added kIRDecorationOp_Transitory - can be used for detecting instructions constructed on stack * Made IRConstant hashing work off type * Fix comment that is out of date about how an instruction is determines to hold a transitory string.
*	Improvements around IR representation and memory usage (#635)	jsmall-nvidia	2018-09-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* * Remove dispose from IRInst * Use MemoryArena instead of MemoryPool * Make all IRInst not require Dtor - by having ref counted array store ptrs that need freeing * Increase block size - typically compilation is 2Mb of IR space(!) * Fix issues around StringRepresentation::equal because null has special meaning. * Don't bother to construct as String to compare StringRepresentation, just used UnownedStringSlice. * Added fromLiteral support to UnownedStringSlice and use instead of strlen version. * Use more conventional way to test StringRepresentation against a String. * Fix gcc/clang template problem with cast.
*	Fix translation of RWTexture subscript operations for Vulkan (#618)	Tim Foley	2018-07-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Partially fixes #615 There's kind of a mess going on here, and it is difficult to be sure which of the changes here are strictly necessary. Also, our testing isn't setup to run tests that use `RWTexture2D`, so the only testing I can really run is manual tests using Falcor. The most basic issue here is that in an earlier change I added `ref` accessors for the subscript operation on various `RW` types in the standard library, and that included `RWTexture2D` (and the other `RWTexture` types). The compiler ended up favoring a `ref` accessor over a `set` accessor even when the `set` would suffice, but only the `set` accessor could be lowerd to GLSL/SPIR-V. This change ends up implementing two different fixes for the same problem: * Logic has been added to try and favor a `set` accessor over a `ref` accessor in the cases where either could be used (but still require a `ref` accessor to be used when it is really needed) * The `ref` accessor for `RWTexture*` has been removed, since it turns out that the operations that might have benefited from it (atomics, and component-granularity stores) aren't actually allowed on typed UAVs anyway. There is a deeper issue here that somebody needs to go through and rationalize our representation and handling of accessors like this, but I'm not going to be able to do that in the time I can put into this PR.
*	Support for Tessellation (#607)	jsmall-nvidia	2018-06-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Fix typo OuptutTopologyAttribute -> OutputTopologyAttribute First pass support for handing tesselation shaders - domain and hull. * Added attribute PatchConstantFuncAttribute * Added visitHLSLPatchType(HLSLPatchType* type) such that the patch type template parameters are handled * Added IRNotePatchConstantFunc - such that the patch constant function is referenced within IR * Added support for outputing typical tesselation attributes (although minimal validation is performed) * Added findFunctionDeclByName * Small improvements to diagnostic. * Improved diagnostics and checking for geometry shader attributes. * Added diagnostic if patchconstantfunc is not found Handle assert failure when outputing a domain shader alone and therefore attr->patchConstantFuncDecl is not set. * Simple script tess.hlsl to test out domain/hull shaders. * Added url for where hull shader attributes are defined. * Fix unsigned/signed comparison warning. * Restore removal of fix in "Improve generic argument inference for builtins (#598)" * Update tessellation test case to compare against fxc The test was previously comparing against fixed expected DXBC output, but this caused problems when the test runner tried to execute the test on Linux (where there is no fxc to invoke...), and would also be a potential source of problems down the road if different users run using different builds of fxc. The simple solution here is to convert the test to compare against fxc output generated on the fly. That test type is already filtered out on non-Windows builds, so it eliminates the portability issue (in a crude way). I also changed the test to compile both entry points in one compiler invocation, just to streamline things into fewer distinct tests. * Eliminate unnecessary call to `lowerFuncDecl` In a very obscure case this could cause a bug, if the patch-constant function had somehow already been lowered (because it was called somewhere else in the code). The call should not be needed because `ensureDecl` will lower a declaration on-demand if required, so eliminating it causes no problems for code that wouldn't be in that extreme corner case.
*	Fix some issues around codegen for l-values and assignment (#601)	Tim Foley	2018-06-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The problem here arose when a complicated l-value was formed like: ```hlsl struct Foo { float4 a; } RWStructuredBuffer<Foo> gBuffer; gBuffer[index].a.xz += whatever; ``` In this case the `gBuffer[index].a.xz` expression is a complex l-value in multiple ways: * The `gBuffer[index]` subscript could be routed to either a `get` accessor or a `ref` accessor (and maybe also a `set` accessor if we add one to the stdlib definition), and we defer the choice of which to call until as late as possible in codegen today. * The `_.a` part then becomes a "bound member acess" because we can't actually produce a direct pointer until we've resolved how to implement the subscript operation. * The `_.xz` part becomes a "swizzled l-value" because there is no way to materialize it as a pointer to contiguous storage in the orignal object (the `x` and `z` components of a vector aren't contiguous). Recent changes to support atomic operations on buffer elements introduced the `ref` accessor on `RWStructuredBuffer`, which made it possible to form a pointer to a buffer element in the IR. This interacted with some code for the "bound member" case that was trying to only introduce a temporary when absolutely necessary, and was doing so by assuming anything with an address didn't need to be moved into a temporary. The first fix is to clean up that logic in the bound-member case for assignment: always create a temporary, rather than do it conditionally. The second fix here is more systemic: we add logic to try to coerce the representation of an l-value during codegen into being a simple address, and employ that in cases where we know an address is desired. In a case like the above this helps to get things into the form that is required, so that a swizzled store can be issued. There is still some potential for cleanup in this logic, but I don't want to introduce more changes than seem necessary to fix the original problem.
*	Initial support for enum declarations (#599)	Tim Foley	2018-06-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Slang `enum` declarations will always be scoped, e.g.: ```hlsl enum Color { Red, Green = 2, Blue, } Color c = Color.Red; // Not just `Red` ``` A user can write `enum class` as a placebo for now (to ease sharing of headers with C++). Slang does not currently support the `::` operator for static member lookup, so it must be `Color.Green` and not `Color::Green`. Support for `::` as an alternate syntax could be added later if there is strong user demand. An `enum` type can have a declared "tag type" using syntax like C++ `enum class`: ```hlsl enum MyThings : uint { First = 0, // ... } ``` The `enum` cases will store their values using that type. An `enum` that doesn't declare a tag type will use the type `int` by default. Enum cases are assigned values just like in C/C++: cases can have explicit values, but otherwise default to one more than the previous case, or zero for the first case. All `enum` types will automatically conform to a standard-library `interface` called `__EnumType`, which is used so that basic operators like equality testing can be defined generically for all `enum` types. This change only adds one operator at first (the `==` comparison), but other should be added later. An `enum` case needs to be explicitly converted to an integer where needed (e.g., `int(Color.Red)`). This is implemented by having the main integer types (`int` and `uint`) support built-in initializers that can work for any `enum` type (or rather, anything conforming to `__EnumType`). Eventually these will be restricted so that an `enum` type can only be converted to its associated tag type. IR code generation completely eliminates `enum` types and their cases. The `enum` type will be replaced with its tag type, and the cases will be replaced with the tag values. Currently this could leave some mess in the IR where cast operations are applied between values that actually have the same type.
*	Fix atomic operations on RWBuffer (#593)	Tim Foley	2018-06-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Fix atomic operations on RWBuffer An earlier change added support for passing true pointers to `__ref` parameters to fix the global `Interlocked()` functions when applied to `groupshared` variables or `RWStructureBuffer<T>` elements. That change didn't apply to `RWBuffer<T>` or `RWTexture2D<T>`, etc. because those types had so far only declared `get` and `set` accessors, but not any `ref` accessors (which return a pointer). The main fixes here are: Add `ref` accessors to the subscript oeprations on the `RW` resource types Adjust the logic for emitting calls to subscript accessors so that we don't get quite as eager about invoking a `ref` accessor, and instead try to invoke just a `get` or `set` accessor when these will suffice. This is important for Vulkan cross-compilation, where we don't yet support the semantics of our `ref` accessors. * Add a test case for atomics on a `RWBuffer` * Fix up `render-test` so that we can specify a format for a buffer resource, which allows us to use things other than `StructuredBuffer` and `ByteAddressBuffer`. The work there is probably not complete; I just did what I could to get the test working. * A bunch of files got whitespace edits thanks to the fact that I'm using editorconfig and others on the project seemingly arent... * fixup: remove ifdefed-out code
*	Fix global atomic functions (#582)	Tim Foley	2018-05-29
\| \| \| \| \| \| \| \| \|	Fixes #581 This change adds a new parameter passing mode `__ref` to exist alongisde `in`, `out`, and `inout`. The `__ref` modifier indicates true by-reference parameter passing (whereas `inout` is copy-in-copy-out). This is not intended to be something that users interact with directly, but rather a low-level feature that lets us provide a correct signature for the `Interlocked*()` operations in the standard library. Most of the support for passing what are logically addresses around already exists in the IR, so the majority of the work here is just in introducing the new type `Ref<T>` and then using it appropriately when lowering `__ref` parameters/arguments to the IR.
*	Cleanups around behavior when the compiler fails (#553)	Tim Foley	2018-05-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Cleanups around behavior when the compiler fails * Add another case where we try to `noteInternalErrorLoc()` if an exception in thrown. This one is the in the logic for emitting an IR instruciton. This could be improved by adding another layer at the function level (as a catch-all for instructions with no location), but something is better than nothing. * Change a bunch of `assert()`s over to `SLANG_ASSERT()`s, so that we can theoretically take more control over them (e.g., make release builds with asserts enabled) * Some other small cleanups around the assertions we perform. In the survey I made, I didn't really see many obvious "smoking gun" cases where we could produce a significantly better error message for some of the unimplemented/unexpected paths, other than to actually implement the missing functionality. * fixup
*	Fixes #559 (#560)	Yong He	2018-05-11
\|