slang.git - Making it easier to work with shaders

	Commit message (Collapse)	Author	Age
*	Use slang- prefix on slang compiler and core source (#973)	jsmall-nvidia	2019-05-31
\| \| \| \| \| \| \| \| \| \| \| \|	* Prefixing source files in source/slang with slang- * Prefix source in source/slang with slang- prefix. * Rename core source files with slang- prefix. * Update project files. * Fix problems from automatic merge.
*	Initial support for the `precise` keyword (#958)	Tim Foley	2019-04-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes #858 The `precise` keyword exists in both HLSL and GLSL and when applied to a variable declaration is supposed to indicate that all computations that contribute to the value of that variable should not be altered based on "fast-math" optimizations. The main examples are that separate multiply and add operations should not be turned into fused multiply-add (fma) operations, and that operations cannot ignore the possibility of infinity or not-a-number values (e.g., by assuming that `x * 0.0f` is always `0.0f`). (Aside: it is possible that my understanding of what the semantics of `precise` are in HLSL and GLSL is imperfect so that either the GLSL variant isn't sufficient to provide the semantics of the HLSL keyword, or that the definition of "all computations that contribute" to a value isn't actually correct. We may need to revise this implementation based on subsequent learnings.) The basic idea here is to turn the AST `precise` keyword into a `[precise]` decoration in the IR and then emit that as a `precise` keyword again in the output. The main catch is that whereas most of our existing IR decorations apply to things like global shader parameters or `struct` members that usually stick around for the duration of compilation, `[precise]` will get slapped on local variables that will often get optimized away by our SSA pass. There are two ways a variable can get eliminated/replaced during the SSA pass: 1. A use of the variable can be replaced with an ordinary instruction that computes its value. 2. A use of the variable can be replaced with a reference to a "phi node" that will take on the appropriate value based on control flow. These two cases already had logic to copy a "name hint" decoration from the variable over to an instruction that will replace it, and I simply extended them to also propagate over a `[precise]` decoration. The test case added with this change intentionally constructs a case where `[precise]` needs to be propagated over to an SSA "phi node" in order to generate correct output code. The other gotcha is that we can emit variable declarations in various places in `emit.cpp`, and all of these needed to handle `[precise]`. Not only do we have actually local variables (`IRVar`), but we also have SSA phi nodes (`IRParam`), and then there are cases where an intermediate computation (an ordinary instruction) should be `[precise]` and thus we need to emit it as a temporary (not folding it into its use sites) and make sure that the temporary itself gets the `precise` keyword. I have manually confirmed that in the output SPIR-V, this change results in the `NoContraction` SPIR-V decoration being added to the relevant operations, and the output DXBC contains a multiply and an add in place of a multiply-add. The output DXIL does not show any obvious changes due to `precise`, although the exact order and operands of the math instructions emitted does differ when `precise` is added/removed. In all cases the output is equivalent to hand-written HLSL/GLSL with a `precise`-qualified local variable.
*	Add better control over image formats for GLSL/SPIR-V targets (#939)	Tim Foley	2019-04-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add better control over image formats for GLSL/SPIR-V targets Currently Slang emits GLSL code assuming all R/W images need to have explicit formats, and thus we try to infer a format from the element type of the image. E.g., given a `RWTexture2D<half4>` we might infer that a qualifier of `layout(rgba16f)` should be used. This strategy has two notable shortcomings: * Sometimes the user will want a format that doesn't match an existing HLSL type. E.g., if they want the equivalent of `layout(r11f_g11f_b10f)`, then what should they put in their `RWTexture2D<...>` to make the inference do what they need? * Sometimes the user knows that they don't need to specify a format at all, because using the `GL_EXT_shader_image_load_formatted` extension, they can still perform non-atomic load/store on images with no format specified in the SPIR-V. This change adds two features directed at these challenges. First, we add an explicit `[format(...)]` attribute that can be used to specify an explicit image format, including ones that don't match any HLSL type. An example of using this new attribute is: ```hlsl [format("r11f_g11f_b10f")] RWTexture2D<float3> myImage; ``` For simplicity in initial bring-up, the new formats all use the same naming as formats in GLSL (this should make it easy for a programmer who knows what they expect to get in the GLSL output). We can change the naming convention for formats at a later time, so long as we keep these existing names in as a compatibility feature. Note that this is not given a `vk::` prefix since the attribute should signal the programmer's intent to provide an image with that format on all targets (although only some targets might act on that information). Also note that the attribute takes a string (`[format("rgba8")`) instead of a bare identifier (`[format(rgba8)]`) because this is consistent with the existing convention for attributes in HLSL. When `[format(...)]` is left off, the default compiler behavior will still be to infer a format, but this behavior can be overidden for a single image using an explicit format of `"unknown"`: ```hlsl [format("unknown")] RWTexture2D<float4> mysteryMachine; ``` The second new feature is that if a user knows they are coding for a GPU that supports the `"unknown"` format in all non-atomic cases, then they can opt into making that the default for images without an explicit `[format(...)]`, using the new `-default-image-format-unknown` command-line option for `slangc`. The new test case included with this change confirms that we correctly see the explicit formats in the output GLSL and no formats for images without explicit `[format(...)]` when using the new command-line option. The test stresses images declared at global scope, in parameter blocks, and in entry-point parameter lists, to try and make sure that all the relevant IR passes in the compiler preserve the format information. * fixup: missing file
*	[[vk::shader_record]] (#836)	jsmall-nvidia	2019-02-11
\| \| \| \| \| \| \| \| \| \|	* * Replaced ShaderRecordNVLayoutModifier with ShaderRecordAttribute * Allowed attributed [[vk::shader_record] and [[shader_record]] * Checking there is at most 1 ShaderRecord active * Small typo fixes * Slightly improve diagnostic. Replace expected file.
*	Add support for user defined attributes.	Yong He	2019-01-29
\|
*	Add support for Vulkan raytraicng "shader record" (#735)	Tim Foley	2018-11-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The syntax for this is a placeholder for now, since we will probably want to migrate to whatever gets decided on for dxc. To declare that some data should be part of the "shader record" use `layout(shaderRecordNV)` to mirror the GLSL raytracing extension: ```hlsl layout(shaderRecordNV) cbuffer MyShaderRecord { float4 someColor; uint someValue; } ``` The intention (not enforced) is that an application would map `MyShaderRecord` to "root constants" in the "local root signature" when compiling for DXR, while the output code in GLSL will always map to the shader record in Vulkan raytracing: ```glsl layout(shaderRecordNV) buffer MyShaderRecord { float4 someColor; uint someValue; }; ``` This change does not support declaring a global value of `struct` type with `layout(shaderRecordNV)` (or a `ParameterBlock` with the modifiers, although that would be a nice-to-have feature) and it does not support having the contents of the shader record be mutable (even if GLSL/Vulkan allows it). Those can/should be added in future changes. In terms of implementation, this closely mirrors the way that `layout(push_constant)` buffers were being handled, where the data inside the `ConstantBuffer<X>` (the value of type `X`) gets laid out using ordinary rules (and consuming ordinary `UNIFORM` storage, while the buffer itself is given a different layout resource to reflect that fact that it does not consume a VK `binding` any more, but a different conceptual resource. Note: an alternative design here (that might actually be preferrable) would be to have both push-constant and shader-record buffers be handled as alternative aliases for `ConstantBuffer` (or maybe `ParameterBlock`) so that you have, e.g.: ```hlsl PushConstantBuffer<X> myPushConstants; ShaderRecord<Y> myShaderRecord; ``` This alternative design avoids API-specific decorations on the declarations, and reflects the intent of the programmer very directly, even when they are compiling for a target like D3D that doesn't reflect these choices at the IL level (it could still be exposed through the Slang reflection API).
*	Add support for globallycoherent modifier (#732)	Tim Foley	2018-11-29
\| \| \| \| \| \| \| \| \|	The `globallycoherent` modifier indicates that resource might be read or written by threads outside of the current thread group, so that any memory barriers that affect it should guarantee coherency at the global memory scope, and not just thread-group scope. The equivalent GLSL modifier appears to be `coherent`. This change adds the front-end modifier, transforms it into an IR-level decoration during lowering, and then checks for the modifier during code emit. Note: this logic may not behave correctly when `globallycoherent` is added to a field in a `struct`, since the modifier would then need to be propagated to any variables created during type legalization. Checking up on that is left to future work. Note: it isn't entirely clear if `globallycoherent` should be treated as a declaration modifier or a type modifier. The point is moot for now because Slang doesn't have any support for type modifiers, but when we get around to that we will need to make a decision.
*	Feature/early depth stencil (#727)	jsmall-nvidia	2018-11-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	* First pass support for early depth stencil. * Add a simple test to check if output has attributes. * Use cross compilation to test [earlydepthstencil] on glsl. * If target is dxil, use dxc to test against. Add hlsl to test earlydepthstencil against. * * Added spSessionHasCompileTargetSupport * Made slang-test use spSessionHasCompileTargetSupport to ignore tests that cannot run
*	Add callable shader support for Vulkan ray tracing (#718)	Tim Foley	2018-11-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add callable shader support for Vulkan ray tracing This change extends the previous work to update Vulkan ray tracing support for the finished `GL_NV_ray_tracing` spec. One of the features missing in the experimental extension that was added to the final spec is "callable shaders," which allow ray tracing shaders to call other shaders as general-purpose subroutines. Most of the implementation work here mirrors what was done for the `TraceRay()` function to map it to `traceNV()`. We map the generic `CallShader<P>` function to the non-generic `executeCallableNV`, with a payload identifier that indicates a specific global variable of type `P` (the global variable being generated from a `static` local in `CallShader`). A new modifier is added to identify the payload structure, and the parameter binding/layout logic introduces a new resource kind for callable-shader payload data (where previously the logic had assumed ray and callable payloads should use the same resource kind). Two test shaders are included: one for the callable shader (`callable.slang`) and one for a ray generation shader that calls it (`callable-caller.slang`). Just for kicks, the payload data type is defined in a shared file so that we can be sure the two agree (trying to emulate what might be good practice, and ensure that ray tracing support works together with other Slang mechanisms). * Typo fix: assocaited->associated One instance was found in review, but I went ahead and fixed a bunch since I seem to make this typo a lot. * Typo fix: defintiion->definition
*	Update Vulkan ray tracing support to final extension spec (#717)	Tim Foley	2018-11-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Update version of glslang used * Update VK raytracing support for final extension spec A lot of this change is just plain renaming: The `NVX` suffixes become just `NV`, and the extension name changes from `GL_NVX_raytracing` to `GL_NV_ray_tracing`. The Slang standard library and the GLSL baselines for the tests are consistently updated. The other detail is that the final spec requires the "payload" identifier in a `traceNV()` call to be a compile-time constant, which means it cannot be defined as a local variable first, as in: ```glsl int payloadID = 0; traceNV(..., payloadID); // ERROR ``` In terms of how the original support was implemented, the payload ID is being computed via a special builtin function that maps each global GLSL payload variable to a unique ID. There are a few ways we could try to resolve the problem here: 1. We could aspire to put our equivalent of the `constexpr` modifier on the output of the function, so that the GLSL variable gets declared `const` and thus fits the GLSL rules for a constant expression. 2. We could introduce a pass to replace the payload-location instructions with literal integers. 3. We could use a special-purpose instruction instead of a builtin function call, and have that instruction indicate that it doesn't have side effects (so it can be folded into the call site) 4. We could somehow mark the builtin function as not having side effects. We choose option (4) simply because it provides a feature that could have other applications. This change adds a `[__readNone]` attribute that can be applied to function declarations to express a promise on the part of the programmer that the given function has no side effects and computes its result strictly from the bits of its input arguments (and not things they point to, etc.). This mirrors an equivalent function attribute in LLVM. We mark the function that computes a ray payload location with this attribute, and propagate the attribute through the layers of the IR, so that when the emit logic asks if an operation has side effects (to see if it can be folded into the arguments of a subsequent expression), we get an affirmative response. This change should get all of the features that were present in the experiemntal `NVX` extension working with the final extension spec. It does not address callable shaders, which will come as a subsequent change.
*	Add basic support for [mutating] methods (#667)	Tim Foley	2018-10-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	By default, when writing a "method" (aka "member function") in Slang, the `this` parameter is implicitly an `in` parameter. So this: ```hlsl struct Foo { int state; int getState() { return state; } void setState(int s) { state = s; } }; ``` is desugared into something like this: ```hlsl struct Foo { int state }; int Foo_getState(Foo this) { return this.state; } // BAD: void Foo_setState(Foo this, int s) { this.state = s; } ``` That "setter" doesn't really do what was intended. It modifies a local copy of type `Foo`, because `in` parameters in HLSL represent by-value copy-in semantics, and are mutable in the body the function. Slang was updated to give a static error on the original code to catch this kind of mistake (so that `this` parameters are unlike ordinary function parameters, and no longer mutable). Of course, sometimes users want a mutable `this` parameter. Rather than make a mutable `this` the default (there are arguments both for and against this), this change adds a new attribute `[mutating]` that can be put on a method (member function) to indicate that its `this` parameter should be an `in out` parameter: ```hlsl [mutating] void setState(int s) { state = s; } ``` The above will translate to, more or less: ```hlsl void Foo_setState(inout Foo this, int s) { this.state = s; } ``` One added detail is that `[mutating]` can also be used on interface requirements, with the same semantics. A `[mutating]` requirement can be satisfied with a `[mutating]` or non-`[mutating]` method, while a non-`[mutating]` requirement can't be satisfied with a `[mutating]` method (the call sites would not expect mutation to happen). The design of `[mutating]` here is heavily influenced by the equivalent `mutating` keyword in Swift. Notes on the implementation: * Adding the new attribute was straightforward using the existing support, but I had to change around where attributes get checked in the overall sequencing of static checks, because attributes were being checked after function bodies, but with this change I need to look at semantically-checked attributes to determine the mutability of `this` * The check to restrict it so that `[mutating]` methods cannot satisfy non-`[mutating]` requirements was easy to add, but it points out the fact that there is a huge TODO comment where the actual checking of method signatures is supposed to happen. That is a bug waiting to bite users and needs to be fixed! * While we had special-case logic to detect attempts to modify state accessed through an immutable `this` (e.g., `this.state = s`), that logic didn't trigger when the mutation happened through a function/operator call (e.g., `this.state += s`), so this change factors out the validation logic for that case and calls through to it from both the assignment and `out` argument cases. * The error message for the special-case check was updated to note that the user could apply `[mutating]` to their function declaration to get rid of the error. * The semantic checking logic for an explicit `this` expression was already walking up through the scopes (created during parsing) and looking for a scope that represents an outer type declaration that `this` might be referring to. We simply extend it to note when it passes through the scope for a function or similar declaration (`FunctionDeclBase`) and check for the `[mutating]` attribute. If the attribute is seen, it returns a mutable `this` expression, and otherwise leaves it immutable. * The IR lowering logic then needed to be updated so that when adding an IR-level parameter to represent `this`, it gives it the appropriate "direction" based on the attributes of the function declaration being lowered. The rest of the IR logic works as-is, because it will treat `this` just like an other parameter (whether it is `in` or `inout`). * This biggest chunk of work was the "implicit `this`" case, because ordinary name lookup may resolve an expression like `state` into `this.state`, so that the `this` expression comes out of "thin air." To handle this case, I extended the structure of the "breadcrumbs" that come along with a lookup result (the breadcrumbs are used for any case where a single identifier like `state` needs to be embellished to a more complex expression as a result of lookup), so that it can identify whether a `Breadcrumb::Kind::This` node comes from a `[mutating]` context or not. Similar to the logic for an explicit `this`, we handle this by noting when we pass through a `FunctionDeclBase` when moving up through scopes, and look for the `[mutating]` attribute on it. The rest of the work was just plumbing the additional state through.
*	Support cross-compilation of ray tracing shaders to Vulkan (#663)	Tim Foley	2018-10-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Move to newer glslang * Support cross-compilation of ray tracing shaders to Vulkan This change allows HLSL shaders authored for DirectX Raytracing (DXR) to be cross-compiled to run with the experimental `GL_NVX_raytracing` extension (aka "VKRay"). * The GLSL extension spec is marked as experimental, so that any shaders written using this support should be ready for breaking changes when the spec is finalized. * "Callable shaders" are not exposed throug the GLSL extension, so this feature of DXR will not be cross-compiled. * The experimental Vulkan raytracing extension does not have an equivalent to DXR's "local root signature" concept. This does not visibly impact shader translation (because the local/global root signature mapping is handled outside of the HLSL code), but in practice it means that applications which rely on local root signatures on their DXR path will not be able to use the translation in this change as-is; more work will be needed. The simplest part of the implementation was to go into the Slang standard library and start adding GLSL translations for the various DXR operations. In some cases, like mapping `IgnoreHit()` to `ignoreIntersectionNVX()` this is almost trivial. The various functions to query system-provided values (e.g., `RayTMin()`) were also easy, with the only gotcha being that they map to variables rather than function calls in GLSL, and our handling of `__target_intrinsic` assumes that a bare identifier represents a replacement function name, and not a full expression, so we have to wrap these definitions in parentheses. The tricky operations are then `TraceRay<P>()` and `ReportHit<A>()`, because these two are generics/templates in HLSL. GLSL doesn't support generics, even for "standard library" functions, so the raytracing extension implements a slightly complex workaround: the matching operations `traceNVX()` and `reportIntersectionNVX()` pass the payload/attributes argument data via a global variable. That is, shader code for the GLSL extensions writes to the global variable and then calls the intrinsic function. The linkage between the call site and the global is established by a modifier keyword (`rayPayloadNVX` and `hitAttributeNVX`, respectively) and in the case of ray payload also uses `location` number to identify which payload global to use (since a single shader can trace rays with multiple payload types). Our translation strategy in Slang tries to leverage standard language mechanisms instead of special-case logic. For example, to translate the `ReportHit<A>()` function, we provide both a default declaration that will work for HLSL (where the operation is built-in with the signature given), and a definition marked with the `__specialized_for_target(glsl)` modifier. The GLSL definition declares a function `static` variable that will fill the role of the required global, and then does what the GLSL spec requires: assigns to the global, and then calls the `reportIntersectionNVX` builtin (which we declare as a separate builtin). Our ordinary lowering process will turn that `static` variable into an ordinary global in the IR, and the `[__vulkanHitAttributes]` attribute on the variable will be emitted as `hitAttributeNVX` in the output. There is no additional cross-compilation logic in Slang specific to `ReportHit<A>()` - the target-specific definition in the standard library Just Works. The case for `TraceRay<P>()` is a bit more complicated, simply because the GLSL `traceNVX()` function needs to be passed the `location` for the payload global. We implement the payload global as a function-`static` variable, with the knowledge that every unique specialization of `TraceRay<P>()` will generate a unique global variable of type `P` to implement our function-`static` variable. We then add a slightly magical builtin function `__rayPayloadLocation()` that can map such a variable to its generated `location`; the logic for this is implemented in `emit.cpp` and described below. We also changed the `RayDesc` and `BuiltinTriangleIntersectionAttributes` types from "magic" intrinsic types over to ordinary types (because the GLSL output needs to declare them as ordinary `struct` types). This ends up removing some cases in the AST and IR type representations. By itself this change would break HLSL emit, because in that case the types really are intrinsic. We added a `__target_intrinsic` modifier to these types to make them intrinsic for HLSL, and then updated the downstream passes to handle the notion of target-intrinsic types. The logic for binding/layout of entry point inputs and outputs was updated so that raytracing stages don't follow the default logic for varying input/output parameters. This is because the input/output parameters of a raytracing entry point aren't really "varying" in the same sense as those in the rasterization pipeline. In particular, the SPIR-V model for raytracing input and output treats "ray payload" and "hit attributes" parameters as being in a distinct storage class from `in` or `out` parameters. We also detect cases where a ray tracing stage declares inputs/outputs that it shouldn't have. This logic could conceivably be extended to other stages (e.g., to give an error on a compute shader with user-defined varying input/output). The type layout logic added cases for handling raytracing payload and hit-attribute data, but this is currently just a stub implementation that follows the same logic as for varying `in` and `out` parameters (it cannot give meaningful byte sizes/offsets right now). To my knowledge the GLSL spec doesn't currently specify anything about layout, and I haven't read the DXR spec language carefully enough to know what it says about layout. A future change should update the layout logic to allow for byte-based layout of ray payloads, etc. so that we can query this information via reflection. The GLSL legalization logic in `ir.cpp` was updated to factor out the per-entry-point-parameter code into its own function, and then that function was updated to special-case the input/output of a ray-tracing shader. While for rasterization stages we typically want to take the user-declared input/output and "scalarize" it for use in GLSL (in part to deal with language limitations, and in part to tease system values apart from user-defined input/output), the GLSL spec for raytracing requires payload and hit attribute parameters to be declared as single variables. There is also the issue that even for an `in out` parameter, a ray payload parameter should only turn into a single global, whereas the handling for varying `in out` parameters generates both an `in` and an `out` global for the GLSL case. Other than the handling of entry point parameters, the GLSL legalization pass doesn't need to do anything special for ray tracing shaders. The trickiest change in the `emit.cpp` logic is that we now generate `location`s for ray payload arguments (the outgoing from a `TraceRay()` call) on demand during code generation. This is a bit hacky, and it would be nice to handle it as a separate pass on the IR rather than clutter up the emit logic, but this approach was expedient. Basically, any of the global variables that got generated from the `static` declarations in the standard library implementation of `TraceRay()` will trigger the logic to assign them a `location`. The logic for emitting intrinsic operations added a few new `$`-based escape sequences. The `$XP` case handles emitting the location of a generated ray payload variable; this is how we emit the matching location at the site where we call `traceNVX`. The `$XT` case emits the appropriate translation for `RayTCurrent()` in HLSL, because it maps to something different depending on the target stage. All of the test cases here consist of a pair of an HLSL/Slang shader written to the DXR spec, plus a matching GLSL shader for a baseline. The GLSL shaders are carefully designed so that when fed into glslang they will produce the same SPIR-V as our cross-compilation process. This kind of testing is quite fragile, but it seems to be the best we can do until our testing framework code supports both DXR and VKRay. A bunch of the core changes ended up being blocked on issues in the rest of the compiler, so some additional features go implemented or fixed along the way: The first big wall this work ran into was that the `__specialized_for_target` modifier hasn't actually been working correctly for a while. It turns out that for the one function that is using it, `saturate()`, we have been outputting the workaround GLSL function in all cases (including for HLSL output) rather than only on GLSL targets. The problem here is that for a generic function with a `__specialized_for_target` modifier or a `__target_intrinsic` modifier, the IR-level decoration will end up attached to the `IRFunc` instruction nested in the `IRGeneric`, but the logic for comparing IR declarations to see which is more specialized (via `getTargetSpecializationLevel()`) was looking only at decorations on the top-level value (the generic). The quick (hacky) fix here is to make `getTargetSpecializationLevel()` try to look at the return value of a generic rather than the generic itself, so that it can see the decorations that indicate target-specific functions. A more refined fix would be to attach target-specificity decorations to the outer-most generic (to simplify the "linking" logic). The only reason not to fold that into the current fix is that the `__target_intrinsic` modifier currently serves double-duty as a marker of target specialization and information to drive emit logic. The latter (the emit-related stuff) currently needs to live on the `IRFunc`, and moving it to the generic could easily break a lot of code. This needs more work in a follow-on fix, but for now target specialization should again be working. The other big gotcha that the simple "just use the standard library" strategy ran into was that function-`static` variables weren't actually implemented yet, and in particular function-`static` variables inside of generic functions required some careful coding. The logic in `lower-to-ir.cpp` has this `emitOuterGenerics()` function that is supposed to take a declaration that might be nested inside of zero or more levels of AST generics, and emit corresponding IR generics for all those levels. This is needed because two different AST functions nested inside a single generic `struct` declaration should turn into distinct `IRFunc`s nested in distinct `IRGeneric`s. The tricky bit to making that all work is that the same AST-level generic type parameter will then map to different IR-level instructions (the parameters of distinct `IRGeneric`s) when lowering each function. The existing logic handled this in an idiomatic way by making "sub-builders" and "sub-contexts." This change refactors some of the repeated logic into a `NestedContext` type to help simplify the pattern, and applies it consistently throughout the `lower-to-ir.cpp` file. Besides that cleanup, the major change is `lowerFunctionStaticVarDecl` which, unsurprisingly, handles lower of function-`static` variables to IR globals. The careful handling of nested contexts here is needed because if we are in the middle of lowering a generic function, then a `static` variable should turn into its own `IRGeneric` wrapping an `IRGlobalVar`. The body of the function should refer to the global variable by specializing the global variable's `IRGeneric` to the parameters of the functions `IRGeneric`. This tricky detail is handled by `defaultSpecializeOuterGenerics`. An additional subtlety not actually required for this raytracing work (and thus not properly tested right now) is handling function-`static` variables with initializers. These can't just be lowered to globals with initializers, because HLSL follows the C rule that function-`static` variables are initialized when the declaration statement is first executed (and this could be visible in the presence of side-effects). The lowering strategy here translates any `static` variable with an initializer into two globals: one for the actual storage, plus a second `bool` variable to track whether it has been initialized yet. There are some opportunities to optimize this case, especially for `static const` data, but that will need to wait for future changes. We've slowly been shifting away from the model where a user thinks of a "profile" as including both a stage and a feature level. Instead, the user should think about selecting a profile that only describes a feature level (e.g., `sm_6_1`, `glsl_450`, etc.), and then separately specifying a stage (`vertex`, `raygeneration, etc.) for each entry point. The challenge here is that the command-line processing still only had a single `-profile` switch, and no way to specify the stage. Adding the `-stage` option was relatively easy, but making it work with the existing validation logic for command-line arguments was tricky, because of the complex model that `slangc` supports for compiling multiple entry points in a single pass. * In `slang.h` add new reflection parameter categories for ray payloads and hit attributes, as part of entry point input/output signatures. * A previous change already updated our copy of glslang to one that supports the `GL_NVX_raytracing` extension, so in `slang-glslang.cpp` we just needed to map Slang's `enum` values for the raytracing stage names to their equivalents in the glslang code. * Moved the logic for looking up a stage by name (`findStageByName()`) out of `check.cpp` and into `compiler.cpp`, with a declaration in `profile.h` * Added a `$z` suffix to the GLSL translation of `Texture.SampleLevel()`, to handle cases where the texture element type is not a 4-component vector. Note that this fix should actually be applied to all* these texture-sampling operations, but I didn't want to add a bunch of changes that are (clearly) not being tested right now. * The layout logic for entry points was updated to correctly skip producing a `TypeLayout` for an entry point result of type `void`, which meant that the related emit logic now needs to guard against a null value for the result layout. * In `ir.cpp`, dump decorations on every instruction instead of just selected ones, so that our IR dump output is more complete. * Added a command-line `-line-directive-mode` option so that we can easily turn off `#line` directives in the output when debugging. Not all cases where plumbed through because the `none` case is realistically the most important. * Parser was fixed to properly initialize parent links for "scope" declarations used for statements, so that we can walk backwards from a function-scope variable (including a `static`) and see the outer function/generics/etc. * Added GLSL 460 profile, since it is required for ray tracing. Also updated the logic for computing the "effective" profile to use to recognize that GLSL raytracing stages require GLSL 460. * Added some conventional ray-tracing shader suffixes to the handling in `slang-test`. This code isn't actually used, but was relevant when I started by copy-pasting some existing VKRay shaders as the starting point for my testing. * Fixup: typos
*	Support for [[vk::push_constant]] (#629)	jsmall-nvidia	2018-08-22
\| \| \| \| \| \| \| \| \| \|	* Support for attributed [[vk::push_constant]] and [[push_constant]]. Can also use layout(push_constant). * Fix test so matches the expected output. * Add expected output to binding-push-constant-gl.hlsl * Trivial change to force travis rebuild to test the gcc linux build really has a problem.
*	Feature/attributed binding (#621)	jsmall-nvidia	2018-07-31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Typo fix, and added dxc to command line documentation. * Fix small typos. Added support for Scope to lexer. Fix bug in Token ctor. * Add support for attribute names that are scoped. * Added GLSLBindingAttribute. Make binding work through core.met.slang. * Allow [[gl::binding(binding, set)]] [[vk::binding(binding,set)]]
*	Support for Tessellation (#607)	jsmall-nvidia	2018-06-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Fix typo OuptutTopologyAttribute -> OutputTopologyAttribute First pass support for handing tesselation shaders - domain and hull. * Added attribute PatchConstantFuncAttribute * Added visitHLSLPatchType(HLSLPatchType* type) such that the patch type template parameters are handled * Added IRNotePatchConstantFunc - such that the patch constant function is referenced within IR * Added support for outputing typical tesselation attributes (although minimal validation is performed) * Added findFunctionDeclByName * Small improvements to diagnostic. * Improved diagnostics and checking for geometry shader attributes. * Added diagnostic if patchconstantfunc is not found Handle assert failure when outputing a domain shader alone and therefore attr->patchConstantFuncDecl is not set. * Simple script tess.hlsl to test out domain/hull shaders. * Added url for where hull shader attributes are defined. * Fix unsigned/signed comparison warning. * Restore removal of fix in "Improve generic argument inference for builtins (#598)" * Update tessellation test case to compare against fxc The test was previously comparing against fixed expected DXBC output, but this caused problems when the test runner tried to execute the test on Linux (where there is no fxc to invoke...), and would also be a potential source of problems down the road if different users run using different builds of fxc. The simple solution here is to convert the test to compare against fxc output generated on the fly. That test type is already filtered out on non-Windows builds, so it eliminates the portability issue (in a crude way). I also changed the test to compile both entry points in one compiler invocation, just to streamline things into fewer distinct tests. * Eliminate unnecessary call to `lowerFuncDecl` In a very obscure case this could cause a bug, if the patch-constant function had somehow already been lowered (because it was called somewhere else in the code). The call should not be needed because `ensureDecl` will lower a declaration on-demand if required, so eliminating it causes no problems for code that wouldn't be in that extreme corner case.
*	Fix global atomic functions (#582)	Tim Foley	2018-05-29
\| \| \| \| \| \| \| \| \|	Fixes #581 This change adds a new parameter passing mode `__ref` to exist alongisde `in`, `out`, and `inout`. The `__ref` modifier indicates true by-reference parameter passing (whereas `inout` is copy-in-copy-out). This is not intended to be something that users interact with directly, but rather a low-level feature that lets us provide a correct signature for the `Interlocked*()` operations in the standard library. Most of the support for passing what are logically addresses around already exists in the IR, so the majority of the work here is just in introducing the new type `Ref<T>` and then using it appropriately when lowering `__ref` parameters/arguments to the IR.
*	Add support for explicit register space bindings (#542)	Tim Foley	2018-05-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change adds support for specifying explicit register spaces, like: ```hlsl // Bind to texture register #2 in space #1 Texture2D t : register(t2, space1); ``` I added a test case to confirm that the register space is properly propagated through the Slang reflection API. This change also adds proper error messages for some error/unsupported cases that weren't being diagnosed: * Specifying a completely bogus register "class" (e.g., `register(bad99)`) * Failing to specify a register index (`register(u)`) * Specifying a component mask (`register(t0.x)`) * Using `packoffset` bindings I added test cases to cover all of these, as well as the new errors around support for register `space` bindings. In order to get the existing tests to pass, I had to remove explicit `packoffset` bindings from some DXSDK test shaders. None of these `packoffset` bindings were semantically significant (they matched what the compiler would do anyway, for both Slang and the standard HLSL compiler). Removing them is required for Slang now that we give an explicit error about our lack of `packoffset` support. In a future change we might add logic to either detect semantically insignificant `packoffset`s, or to just go ahead and support them properly (as a general feature on `struct` types).
*	Introduce an IR-level type system (#481)	Tim Foley	2018-04-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Introduce an IR-level type system Up to this point, the Slang IR has used the front-end type system to represent types in the IR. As a result (but ultimately more importantly) the IR representation of generics and specialization has used AST-level concepts embedded in the IR. For example, to express the specialization of `vector<T,N>` to a concrete type `float` for `T`, we needed an IR operation that could represent the specialization, with operands that somehow represented the type argument `float`. The whole thing was very complicated. The big idea of this change is to introduce a new representation in which types in the IR are just ordinary instructions, so that using them as operands makes sense. The hierarchy of IR types closely mirrors the AST-side hierarchy for now, and that will probably be something we should maintain going forward. In order to make these changes work, though, I also had to do major overhauls of things like the way substitutions are performed, how we check interface conformances, the way lookup through interface types is done, etc. etc. This is a big change, and unfortunately any attempt to summarize it in the commit message wouldn't do it justice. * Fix 64-bit build warning * Fix up some clang warnings/errors
*	Pass AST interpolation modifiers through to codegen. (#475)	Tim Foley	2018-04-04
\| \| \| \| \| \| \| \| \|	This is a short-term fix, because we (1) don't have an IR-level representation of interpolation qualifiers, and (2) can't introduce one until after the IR-level type system is introduced (to be able to handle `struct` fields). The approach here is to find the AST-level declaration, either from layout information (in the case of an ordinary variable or function parameter), or from struct field information (because structs are being output from the AST form anyway). I've included a single end-to-end rendering test to confirm that we handle the `nointerpolation` modifier the same as HLSL. I also added the `noperspective` modifier, which seemed to be missing from our implementation.
*	Overhaul implementation of [attributes] (#443)	Tim Foley	2018-03-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The existing code parsed all of the square-bracket `[attributes]` into `HLSLUncheckedAttribute`, and then went on to hand-convert some of them to specialized subclasses of `HLSLAttribute`. When attributes didn't check, they were left as-is, and no error message was issued, because at the time the compiler was focused on accepting arbitrary input. This change greatly overhauls the handling of `[attributes]`. Attributes are now declared in the stdlib, with declarations like: ```hlsl __attributeTarget(LoopStmt) attribute_syntax [unroll(count: int = 0)] : UnrollAttribute; ``` In this syntax, the `unroll` part is giving the attribute name (the `[]` are just for flavor, to make the declaration look like a use site; we could drop it if we don't like the clutter), the `count` is a parameter of the attribute, which we expect to be of type `int`, and which has a default value of `0` if unspecified. The `: UnrollAttribute` part specifies the meta-level C++ class that will implement this attribute (and corresponds to a class in `modifier-defs.h`). This syntax is similar to our current `syntax` declarations. I'm starting to think we should change it to something like a `__meta_class(UnrollAttribute)` modifier, and then use that uniformly across all cases (e.g., also replacing the curreent `__magic_type(Foo)` syntax). The `__attributeTarget(LoopStmt)` is a modifier that specifies the meta-level C++ class for syntax that this attribute is allowed to attach to. It is legal to have more than one of these. Attributes continue to be parsed in an unchecked form, so that we don't tie up semantic analysis and parsing more than necessary. During checking, we look up the attribute name in the current scope, and then replace the unchecked attribute with a more specific one if the checking passes. Checking proceeds in generic and attribute-specific phases. The generic phase includes checking the number of arguments against those specified in the attribute declaration (I don't currently check types, or handle default arguments), and then checking that at least one `__attributeTarget(...)` modifier applies to the syntax node being modified. The attribute-specific phase then applies to the specialized C++ subclass of `Attribute`, and does the actual checking right now (e.g., that step is responsible for actually type-checking things at present). This can obviously be improved over time. With this support I went ahead and added declarations for all the HLSL attributes I could find documented on MSDN. I also added a provisional declaration for the `[shader(...)]` attribute that has been added to dxc, but which is not yet documented. One important detail here is that lookup of attribute names needs to be done carefully, so that we don't let, e.g., local variables shadow an attribute declaration: ```hlsl int unroll = 5; // This attribute should not get confused by the local variable `unroll` [unroll] for(...) { .. } ``` The lookup logic already has a notion of a `LookupMask` that can be used to filter declarations out of the result. In this change I surfaced that mask through the main lookup API (rather than requiring a second pass to "refine" lookup results), and made is so that the default lookup mask does not include attributes, while an explicit mask can be used to look up only attributes. (An alternatie design we discussed was to follow the approach of C# and have the declaration of an attribute like `[unroll]` actually be `unrollAttribute`, with a suffix. I decided not to follow that approach for now because it seemed like printing good error messages in that case could require us to carefully trim the `Attribute` suffix off of names at times, and using the existing mask behavior seemed simpler.) To verify that the shadowing behavior is indeed correct, I modified the `loop-unroll.slang` test case. Smaller notes: * Removed the `HLSL` prefix from several of the C++ attribute classes * Made sure to actually validate the modifiers on statements * Special-cased checking for `ParamDecl` with a null type, because I'm re-using `ParamDecl` for attribute parameters, but can't give a concrete type to some of them right now * Deleting some old, dead emit-from-AST logic around attributes, rather than try to "fix" code that doesn't run (a more complete scrub of that code is still needed) * Fixed AST inheritance hierarchy so that a `Modifier` is a `SyntaxNode` rather than a `SyntaxNodeBase`. I have no idea why we have both of those, and we need to clean that up soon.
*	Initial work on validating "constexpr"-ness in IR (#420)	Tim Foley	2018-02-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Initial work on validating "constexpr"-ness in IR The underlying issue here is that certain operations in the target shading languages constrain their operands to be compile-time constants. A notable example is the optional texel offset parameter to the `Texture2D.Sample` operation. When calling these operations in GLSL, the user is required to pass a "constant expression," and any variables in that expression must therefore be marked with the `const` qualifier (and themselves be initialized with constant expressions). Any GLSL output we generate must of course respect these rules. When calling these operations in HLSL, the user is not so constrained. Instead, they can pass an arbitrary expression, which may involve ordinary variables with no particular markup, and then the compiler is responsible for determining if the actual value after simplification works out to be a constant. In some cases, the requirement that a value be constant might actually trigger things like loop unrolling. Also, it is okay to use a function parameter to determine such a constant expression, as long as the argument turns out to be a constant at all call sites. The way we have decided to tackle these challenges in Slang is that we we propagate a notion of `constexpr`-ness through the IR. This is currently being tackled in `ir-constexpr.cpp` with a combination of forward and backward iterative dataflow: * When the operands to an instruction are all `constexpr`, and the opcode is one we believe can be constant-folded, then we infer that the instruction can be evaluated as `constexpr` * When instruction is required to be `constexpr`, then we infer that all of its operands are also required to be `constexpr`. If this process ever infers that a function parameter is required to be `constexpr`, then we might have to continue propagation at all the call sites to that function. If after all the propagation is done, there are any cases where an instruction is required to be `constexpr`, but it can't be `constexpr` (we weren't able to infer `constexpr`-ness for its operands), then we issue an error. This implementation encodes the idea of `constexpr`-ness in the IR as part of the type system, using a simplified notion of rates. This change adds a `RateQualifiedType` that can represent `@R T`, and then introduces a `ConstExprRate` that can be used for `R`. Many accessors for the type information on IR nodes were updated to distinguish when one wants the "full" type of an IR value (which might include rate information) vs. just the "data" type. A `constexpr` qualifier was added in the front-end, and is being used to decorate the texel offset parameter for `Texture2D.Sample`. Lowering from AST to IR looks for this qalifier and infers when a function parameter must be typed as `@ConstExpr T` instead of just `T`. There are lots of limitations and gotchas in the implementation so far: * The `@ConstExpr` rate is the only one added in this change, but it seems clear that the conceptual `ThreadGroup` rate that was added to represent `groupshared` should probably get folded into the representation. * I'm not 100% pleased with how many places in the IR I have to special-case for rate-qualified types. At the same type, pulling out rate as a distinct field on `IRValue` would probably require that we pay attention to rate everywhere. * I've added a test case to show that we can issue errors when users fail to provide a constant expression for the texel offset, but the actual error message isn't great because it doesn't indicate why a constant expression was required. Realistically the "initial IR" should contain a few more decorations we can use to relate error conditions back to the original code (even if this is in a side-band structure). * I've added a test case that is supposed to show that we can back-propagate `constexpr`-ness to local variables, and I've manually confirmed that it works for Vulkan/SPIR-V output, but the level of Vulkan support in `render_test` today means I can't enable the test for check-in. * While I'm attempting to propagate `@ConstExpr` information from callees to callers, I haven't implemented any logic to specialize callee functions based on values at call sites. * In a similar vein, there is no handling of control-flow dependence in the current code. If we infer that a phi (block parameter) needs to be `@ConstExpr`, then it isn't actually enough to require that the inputs to the phi (arguments from predecessor blocks) are all `@ConstExpr` because we also need any control-flow decisions that pick which incoming edge we take to be `@ConstExpr` as well. * As a practical matter, implicit propagation of `@ConstExpr` from a function body to a function parameter should only be allowed for functions that are "local" to a module. Any function that might be accessed from outside of a module should really have had its `@ConstExpr` parameter marked manually, and our pass should validate that they follow their own rules. Right now we have no kind of visibility (`public` vs `private`) system, so I'm kind of ignoring this issue. While that is a lot of gaps, this is also just enough code to get the Falcor MultiPassPostProcess example working, so I'm inclined to get it checked in. * Fixup: missing expected output for test * Fixup: disable test that relies on [unroll] for now
*	Fix legalization of generic types (#377)	Tim Foley	2018-01-21
\| \| \| \| \| \| \| \|	Previously, all legalizations of a generic type would use the name of the original decl for the "ordinary" part of things, and this would lead to collisions because the names didn't include the mangled generic arguments. This is now fixed by storing the mangled name of the original inside of `struct` declarations created for legalization, and using those names instead. Also adds support for `getElementPtr` instructions when doing IR type legalization. Also tries to make a `DeclRefType` convert to a string using the underlying `DeclRef`. This doesn't help because `DeclRef::toString` doesn't actually include generic arguments either.
*	All compiler fixes to get ir branch work with falcor feature demo.	Yong He	2018-01-17
\| \| \| \| \| \| \| \| \|	- support overloaded generic function. this involves adding a new expression type, `OverloadedExpr2` to hold the candidate expressions for the generic function decl being referenced. - make BitNot a normal IROp instead of an IRPseudoOp - make sure we clone the decorations of parameters when cloning ir functions - propagate geometry shader entry point attributes (`[maxvertexcount]` and `[instance]`) through HLSL emit - IR emit: handle geometry shader entry-point parameter decorations, such as 'triangle'. - IR emit: treat geometry shader stream output typed ir value as `should fold into use`.
*	Make AST and IR share type legalization code (#303)	Tim Foley	2017-12-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Make AST and IR share type legalization code A previous change already made it so that the AST-to-AST lowering/legalization pass could work together with IR-based lowering of `import`ed code, but that change didn't take into account the case where a function written in the AST needed to call an IR function and pass in a type that required legalization. Both the IR-based and AST-based passes had their own approaches to type legalization, that mostly agreed on the desired output, but they ended up creating their own representations for legalized types which would mean that for a function call the caller and callee might end up legalizing the parameter list to use different types. This change tries to fix this issue (and adds a new test case that relies on the fix) by massively overhauling the AST-based legalization pass so that it uses the same type legalization code as the IR. The shared code has been moved out into `legalize-types.{h,cpp}`. Notes: - I eliminated the `FilteredTupleType` type, since it was starting to cause code duplication in a lot of places. Instead, type legalization just creates new `struct` types to represent the result of filtering. - One big consequence of this is that the `LegalType::pair` case needs to remember for each field in the original type which field (if any) in the new `struct` type it maps to - A big source of complexity (and probably bugs) in this code is trying to figure out how to parent these new `struct` definitions effectively. A good follow-on change would be something that outputs declarations on-demand during the AST emit logic (as we do for the IR), just to avoid some of this song and dance. - The old AST type legalization had a notion of both a "tuple" type and a "varying tuple" type. The "tuple" case was quite complex, and combined behavior currently handled by `LegalType::pair` (for splitting into ordinary and special sides) and `LegalType::tuple` (for holding multiple distinct elements to represent the fields of an aggregate). The "varying tuple" case was closer to `LegalType::tuple`, so I tried to just re-use the existing logic for that too. The one place this potentially gets messy is in `reifyTuple()`. - The messiest bit of handling the "varying tuple" concept (which is used for GLSL shader inputs/outputs since they have to be scalarized) is that when passing them as function arguments we need to reify the tuple back into a structured value. Because the `LegalExpr` hierarchy doesn't have type information, but constructing a value of the "original" type requires such information, things get a little messy. - I did not try to deal with any of the logic related to handling system inputs/outputs for cross-compilation purposes. Of course, the long-term goal is that any actual cross-compilation is handled via the IR, but this change can't afford to break the AST-based path just yet. As a result, there is still quite a bit of complexity in the handling of assignment, to deal with cases where "fixups" are required. * fixup: bad code in macro, not caught by Visual Studio compiler * fixup: more stuff missed by VS compiler * fixup: VS continutes to miss stuff in UNREACHABLE_RETURN
*	Parameter blocks (#245)	Tim Foley	2017-11-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Rename existing ParameterBlock to ParameterGroup We are planning to add a new `ParameterBlock<T>` type, which maps to the notion of a "parameter block" as used in the Spire research work. Unfortunately, the compiler codebase already uses the term `ParameterBlock` as catch-all to encompass all of HLSL `cbuffer`/`tbuffer` and GLSL `uniform`/`buffer`/`in`/`out` blocks (all of which are lexical `{}`-enclosed blocks that define parameters...). This change instead renames all of the existing concepts over to `ParameterGroup`, which isn't an ideal name, but at least doesn't directly overlap the new terminology or any existing terminology. The new `ParameterBlockType` case will probably be a subclass of `ParameterGroupType`, since it is a logical extension of the underlying concept. * Add Shader Model 5.1 profiles The HLSL `register(..., space0)` syntax is only allowed on "SM5.1" and later profiles (which is supported by the newer version of `d3dcompiler_47.dll` that comes with the Win10 SDK, but not the older version of `d3dcompiler_47.dll` - good luck figuring out which you have!). This change adds those profiles to our master list of profiles, and nothing else. * First pass at support for `ParameterBlock<T>` - Add the type declaration in stdlib - Add a special case of `ParameterGroupType` for parameter blocks - Handle parameter blocks in type layout (currently handling them identically to constant buffers for now, which isn't going to be right in the long term) - Add an IR pass that basically replaces `ParameterBlock<T>` with `T` - Eventually this should replace it with either `T` or `ConstantBuffer<T>`, depending on whether the layout that was computed required a constant buffer to hold any "free" uniforms - Add first stab at an IR pass to "scalarize" global variables using aggregate types with resources inside. - This currently only applies to global variables, so it won't handle things passed through functions, or used as local variables - It also only supports cases where the references to the original variable are always references to its fields, and not the whole value itself - Add a single test case that technically passes with this level of support, but probably isn't very representative of what we need from the feature * Fold parameter-block desugaring into a more complete "type legalization" pass The basic problem that was arising is that once you desugar `ParameterBlock<T>` into `T`, you then need todeal with splitting `T` into its constituent fields if it contains any resource types. Handling those transformations by following the usual use-def chains wasn't really helping, because you might need systematic rewriting that can really only be handled bottom-up. This change adds a new pass that is intended to perform multiple kinds of type "legalization" at once: - It will turn `ParameterBlock<T>` into `T` - It may at some point also convert `ConstantBuffer<T>` into `T` as well - It will turn an value of an aggregate type that contains resources into N different values (one per field) - As a result of this, it will also deal with AOS-to-SOA conversion of these types Legalization is applied to every function/instruction/value, so that it can make large-scale changes that would be tough to manage with a work list. This pass needs to be run after generics have been fully specialized, so that we know we are always dealing with fully concrete types, so that their legalization for a given target is completely known. This is still work in progress; there's more to be done to get this working with all our test cases, and finish the remaining `ParameterBlock<T>` work. * Improve binding/layout information when using parameter blocks - When doing type layout for a parameter block, don't include the resources consumed by the element type in the resource usage for the parameter block - Note that this is pretty much identical to how a `ConstantBuffer<T>` does not report any `LayoutResourceKind::Uniform` usage, except that `ParameterBlock<T>` is also going to hide underlying texture/sampler reigster usage - The one exception here is that any nested items that use up entire `space`s or `set`s those need to be exposed in the resource usage of the parent (I don't have a test for this) - When type legalization needs to scalarize things, it must propagate layout information down to the new leaf variables. In general, the register/index for a new leaf parameter should be the sum of the offsets for all of the parent variables along the "chain" from the original variable down to the leaf (we aren't dealing with arrays here just yet). - When type legalization decides to eliminate a pointer(-like) type (e.g., desugar `ParameterBlock<T>` over to `T`), actually deal with that in terms of the `LegalVal`s created, so that we can know to turn a `load` into a no-op when applied to a value that got indirection removed. - Hack up the "complex" parameter-block test so that it actually passes (the big hack here is that the HLSL baseline is using names that are generated by the IR, and are unlikely to be stable as we add/remove transformations). - Note: I can't make these be compute tests right now, because regsiter spaces/sets are a feature of D3D12/Vulkan, and our test runner isn't using those APIs.
*	Work towards target-specific function overloads (#210)	Tim Foley	2017-10-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Checkpoint: interface conformance work - Add explicit definition of `saturate` for the GLSL target, which calls through to `clamp` - Needed to add explicit initializer to `__BuiltinFloatingPointType` to allow initialization from a single `float`, so that the `saturate` implementation can be sure that it can initialize a `T` from `0.0` or `1.0`. - This triggered errors in overload resolution, because the logic in place could not figure out that the `T` of the outer generic (`saturate<T>()`) conformed to the interface required by the callee. At this point I have the call to the scalar `clamp()` getting past type-checking, but not the vector or matrix cases. * More fixups for overload resolution inside generics - Make sure value parameters are treated the same as type parameters: we only want to solve for the parameters of the generic actually being applied, and not accidentally generate constraints for outer generics (e.g., when checking the body of a generic function). - Make sure that the diagnostics stuff uses the correct source manager when expanding the location of a builtin. * Fixes for function redeclaration - Handle case of redeclaring a generic function - Enumerate siblings in the parent of the generic not the parent of the function - Add logic to compare generic signatures - When generic signatures match, specialize functions to compatible generic arguments before comparing the function signatures - Fix redeclaration logic to not detect prefix/postifx operators as redeclarations of one another - Build an explicit representation of function redeclaration groups - First declaration is the "primary" and others are stored in a linked list - Make overload resolution handle redeclared functions - Only consider the primary declaration and skip others
*	Replace old notion of "intrinsic" operations	Tim Foley	2017-09-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The code previously had an enumerated type for "intrinsic" operations, and allowed functions to be marked `__intrinsic_op(...)` to indicate the operation they map to. The nature of the IR meant that each of these intrinsic ops had to have a corresponding IR opcode, but the `enum` types weren't the same. This change cleans things up a bit by deciding that the `__intrinsic_op(...)` modifier names an actual IR opcode, and so the `IntrinsicOp` enum is gone. The biggest source of complexity here is that there are certain operations that need to be "intrinsic"-ish for the purposes of the current AST-based translation path, because we need them to round-trip from source to AST and back. Right now this is being handled by defining a bunch of "pseudo-ops" which can be used in the `__intrinsic_op` modifier, but which are not meant to be represented in the IR. Currently I don't actually handle this during IR generation. In the long run, once we are using IR for everything that needs cross-compilation, we should be able to eliminate the pseudo-ops in favor of just having these be ordinary (inline) functions defined in the stdlib (e.g., the `+=` operator can just have a direct definition). There was a second category of modifier that gets a little caught up in this, which is the `__intrinsic` modifier, which got used in two ways: 1. A function marked `__intrinsic(glsl, ...)` had what I call a "target intrinsic" modifier, which specified how to lower it for a specific target (e.g., GLSL). 2. A function just marked `__intrinsic` was supposed to be a marker for "this function shouldn't be emitted in the output, because the implementation is expected to be provided" The latter category of function should really be an `__intrinsic_op`, so I translated all those uses. I added a tiny bit of sugar so that `__intrinsic_op` without an explicit opcode will look up an opcode based on the name of the function being called, so that an operation like `sin` can automatically be plumbed through to an equivalent IR op. (The first category is a stopgap for the AST-based cross-compilation, and will hopefully be replaced by something better as we get the IR-based path working). Getting the switch from `__intrinsic` to `__intrinsic_op` working required shuffling around some code in `emit.cpp` that handles looking up those modifiers and emitting builtin operations appropriately during cross-compilation. Depending on where we go with things, a possible extension of this approach is to allow multiple operands to `__intrinsic_op` so that the first specifies the opcode, and then the rest are literal arguments to specify "sub-ops." This could help us handle stuff like texture-fetch operations without an explosion in the number of opcodes. I still need to think about whether this is a good idea or not.
*	Continue work on IR-based codegen	Tim Foley	2017-09-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This gets us far enough that we can convert a single test case to use the IR, under the new `-use-ir` flag. Getting this merged into mainline will at least ensure that we keep the IR path working in a minimal fashion, even when we have to add functionality the existing AST-based path There is definitely some clutter here from keeping both IR-based and AST-based translation around, but I don't want to have a long-lived branch for the IR that gets further and further away from the `master` branch that is actually getting used and tested. Summary of changes: - Add pointer types and basic `load` operation to be able to handle variable declarations - Add basic `call` instruction type - Add simple address math for field reference in l-value - Always add IR for referenced decls to global scope - Add notion of "intrinsic" type modifier, which maps a type declaration directly to an IR opcode (plus optional literal operands to handle things like texture/sampler flavor) - Improve printing of IR instructions, types, operands - Add constant-buffer type to IR - Allow any instruction to be detected as "should be folded into use sites" and use this to tag things of constant-buffer type - Also add logic for implicit base on member expressions, to handle references to `cbuffer` members - Add connection back to original decl to IR variables (including global shader parameters...) - Use reflection name instead of true name when emitting HLSL from IR (so that we can match HLSL output) - Make IR include decorations for type layout - Re-use existing emit logic for HLSL semantics to output `register` semantics for IR-based code - Make IR-based codegen be an option we can enable from the command line - It still isn't on by default (it can barely manage a trivial shader), but it seems better to enable it always instead of putting it under an `#ifdef` - Fix up how we check for intrinsic operations suring AST-based cross compilation so that adding new intrinsic ops for the IR won't break codegen.
*	Move implicit conversion operations to stdlib	Tim Foley	2017-09-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Previously, there were a variety of rules in `check.cpp` to pick the conversion cost for various cases involving scalar, vector, and matrix types. - The main problem of the previous approach is that any lowering pass would need to convert an arbitrary "type cast" node into the right low-level operation(s). - The new approach is that a type conversion (implicit or explicit) always resolves as a call to a constructor/initializer for the destination type. This means that the existing rules around marking operations as builtins should work for lowering. - The support this, the checking logic needs to perform lookup of intializers/constructors when asked to perform conversion between types. It does this by re-using the existing logic for lookup and overload resolution if/when a type was applied in an ordinary context. - Next, we define a modifier that can be attached to constructors/initializers to mark them as suitable for implicit conversion, and associate them with the correct cost to be used when doing overload comparisons. - We add the modifier to all the scalar-to-scalar cases in the stdlib, using the logic that previously existed in semantic checking. - Next we add cases for general vector-to-scalar conversions that also convert type, using the same cost computation as above. - This probably misses various cases, but at this point they can hopefully be added just in the stdlib. - One gotcha here is that in lowering, we need to make sure to lower any kind of call expression to another call expression of the same AST node class, so that we don't lose information on what casts were implicit/hidden in teh source-to-source case. Two notes for potential longer-term changes: 1. There is still some duplication between the type conversion declarations here and the "join" logic for types used for generic arguments. Ideally we'd eventually clean up the "join" logic to be based on convertability, but that isn't a high priority right now, as long as joins continue to pick the right type. 2. It is a bit gross to have to declare all the N^2 conversions for vector/matrix types to duplicate the cases for scalars. For the simple scalar-to-vector case, we might try to support multiple conversion "steps" where both a scalar-to-scalar and a scalar-to-vector step can be allowed (this could be tagged on the modifiers already introduced). That simple option doesn't scale to vector-to-vector element type conversions, though, where you'd really want to make it a generic with a constraint like: vector<T,N> init<U>(vector<U,N> value) where T : ConvertibleFrom<U>; Here the `ConvertibleFrom<U>` interface expresses the fact that a conforming type has an initializer that takes a `U`. What doesn't appear in this context is any notion of conversion costs. We'd need some kind of system for computing the conversion cost of the vector conversion from the cost of the `T` to `U` converion.
*	Add an explicit `Name` type	Tim Foley	2017-08-14
\| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes #23 Up to this point, the compiler has used the ordinary `String` type to represent declaration names, which means a bunch of lookup structures throughout the compiler were string-to-whatever maps, which can reduce efficiency. It also means that things like the `Token` type end up carying a `String` by value and paying for things like reference-counting. This change adds a `Name` type that is used to represent names of variables, types, macros, etc. Names are cached and unique'd globally for a session, and the string-to-name mapping gets done during lexing. From that point on, most mapping is from pointers, which should make all the various table lookups faster. More importantly (possibly), this brings us one step closer to being able to pool-allocate the AST nodes.
*	Data-driven parsing of modifiers	Tim Foley	2017-08-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Just like the previous change did for declaration keywords, this change uses the lexical environment to drive the lookup and dispatch of modifier parsing. This allows us to easily add modifiers to Slang, even when they might conflict with identifiers used in user code (because the modifier names are no longer special keywords, but ordinary identifiers). There was already some support for ideas like this with `__modifier` declarations (`ModifierDecl`) used to introduce some GLSL-specific keywords (so that they wouldn't pollute the namespace of HLSL files). The new approach changes these to be actual `syntax` declarations (`SyntaxDecl`) with the same representation as those used to introduce declaration keywords. Because many modifiers just introduce a single keyword that maps to a simple AST node (no further tokens/data), I modified the handling of syntax declarations so that they can take a user-data parameter, and this allows the common case ("just create an AST node of this type...") to be handled with minimal complications. This also adds in a general-purpose string-based lookup path for AST node classes, that should support programmatic creation in more cases. Statements are now the main case of keywords that need to be made table driven.
*	Major naming overhaul:	Tim Foley	2017-08-09
\| \| \| \| \| \| \| \| \| \|	- `ExpressionSyntaxNode` becomes `Expr` - `StatementSyntaxNode` becomes `Stmt` - `StructSyntaxNode` becomes `StructDecl` - `ProgramSyntaxNode` becomes `ModuleDecl` - `ExpressionType` becomes `Type` - Existing fields names `Type` become `type` - There might be some collateral damage here if there were, e.g., `enum`s named `Type`, but I can live with that for now and fix those up as a I see them
*	Fix up translation of `GetDimensions()`	Tim Foley	2017-07-19
\| \| \| \| \| \| \| \| \| \| \| \|	Fixes #122 - In cases with an explicit mip level being specified, there was a mistake in how the argument for setting the mip level in the GLSL code was constructed that led to a parse error in GLSL - Also, that argument is a `uint` in HLSL and an `int` in GLSL, so an explicit cast was needed - The GLSL functions here seem to require a newer GLSL (at least higher than `420`), so I had to add in a capability for builtins to specify a required GLSL version. For now I made these ones require `450`. - Added a test case to confirm that our lowering works (for some definition of "works")
*	Add reflection support for GLSL thread-group-size modifier	Tim Foley	2017-07-14
\| \| \| \| \| \| \| \| \| \| \| \|	Fixes #15 These are the modifiers like: layout(local_size_x = 16) in; Unlike the HLSL case, these don't get attache to the entry point function itself, so there is a bit more work involed in looking them up. Just to make sure I didn't mess up the HLSL case, I went ahead and added two tests for this capability: one for GLSL and one for HLSL.
*	Don't assign a `binding` to a `push_constant` buffer	Tim Foley	2017-07-14
\| \| \| \| \| \| \| \| \| \| \| \|	Fixes #12 - This was a latent issue, but the previous commit brought it to the front. - As indicated in #12, I don't allocate a descriptor-table slot to the block - Instead I allocate a `PushConstantBuffer` - Unlike what #12 asks for, I don't use a different resource type for the contents of the block - Pretty much all the logic is easiest if these continue to be just plain `Uniform` data
*	Add ability for intrinsics to require GLSL extensions	Tim Foley	2017-07-12
\| \| \| \| \| \|	When cross-compiling, we need to detect when an intrinsic is used that required non-default GLSL capabilities and emit an appropriate `#extension ... : require` line. I'm handling this by attaching a custom modifier to declarations that require an extension in order to be callable.
*	Don't emit interpolation modifiers on struct fields when outputting GLSL	Tim Foley	2017-07-12
\| \| \| \| \| \| \| \|	HLSL (and thus Slang) commonly puts interpolation modifiers like `sample` on the fields of `struct` types used as stage input/output, while GLSL only allows them on global-scope `in` and `out` variables (or ones in blocks). This change emits a really hacky filtering step to skip over certain modifiers when emitting a declaration. This lets us skip interpolation-mode modifiers when outputting a struct field to GLSL. Note: this probably gets the `in` or `out` block case wrong...
*	Initial work on handling resources in structs during cross-compilation	Tim Foley	2017-07-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- The basic idea is that during the "lowering" pass, some types (notably: aggregate types that contain resource variables) will get turned into "tuple" types, which are pseduo-types that aren't meant to survive lowering. - An attempt to declare a variable with a tuple type expands into a tuple of declarations - An attempt to reference such a tuple-ified variable leads to a tuple of expressions - An attempt to extract a member from such a tuple expression will pick the appropriate sub-element - Dereference a tuple by dereferencing the primary expression - Expand a tuple in the argument list to a call into N arguments (by recursively flattening the tuple) - Don't create tuple types when not generating GLSL - Make sure to preserve the specialized type of a call expression through lowering, since emission of unchecked calls relies on that info. - TODO: maybe the infix/prefix/postifx/select information should come in as a side-band? Should we have modifiers on expressions? - Make sure to offset the layout for a nested field based on teh base offset of its parent variable, when generating declarations for nested fields
*	More cross-compilation fixes	Tim Foley	2017-07-10
\| \| \| \| \| \| \| \| \|	- Add GLSL mappings for more `Texture` methods - The annoying one here is `Texture.Load()` because it doesn't take a sampler, but the GLSL equivalent needs one (while the SPIR-V does not). I've hacked this pretty seriously for now. - Try to ensure that we add `uniform` to global declarations that need it in GLSL - When outputting an `in` or `out` variable that might have been created from an `inout` shader parameter, filter the layout qualifiers that we output to only cover the appropriate resource kind.
*	Start to support cross-compilation via "lowering" pass	Tim Foley	2017-07-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- The big change here is the introduction of a "lowering" pass that takes an input AST from the semantic checker, and produces an output AST suitable for emitting. The intention is that he lowering pass is responsible for: - Stripping out unused code (when we have enough information to do so), by only outputting declarations that are transitively references from an entry point - When cross-compiling to GLSL, generating a suitable `void main()` entry point to wrap the user-written entry-point function - (Eventually) legalizing types in the program, by scalarizing aggregate types that mix uniform and resource types - (Eventually) instantiating generic declarations so that the resulting code only deals with fully specialized declarations - (Eventually) de-sugaring OOP constructs into basic "structs and functions" form - (Eventually) instantiating code that depends on interface types at the concrete types chosen - It is clear that there is still a lot of work to be done there, to this change is really about getting infrastructure in place without breaking the existing test cases. - One cleanup here is that we get rid of the idea of whole-translation-unit output, since that was specific to HLSL output, and there is really no strong reason for keeping it. Users should now just ask for the output for each entry point that they wanted to generate. - The biggest source of complexity for the lowering process is that it needs to produce the same AST structure as the input, to deal with the complexity of the rewriter case. That is, we need the output to be able to reproduce the input exactly in the case where we are rewriting and nothing needs to change, so the output format needs at least the degrees of freedom of the input. - As a result, we end up having to distinguish "rewriter" and "full" modes in both lowering and code-emit steps, so that we can react appropriately. - Generating a GLSL `main()` also adds a lot of complexity. Right now I'm using the simplest approach, where we always output the Slang/HLSL entry point as an ordinary function (as written) and then emit a simple GLSL `main()` to call it. I generate globals for all the shader inputs/outputs (these need to be scalarized and have explicit `location`s attached), and then collect these into the `struct` types of the original parameters as needed. - This approach will start to have some major down-sides once we have to deal with "arrayed" input/output - A long-term question here is how to replace entry-point parameter types with scalarized and/or "transposed" versions, while still letting the original code work as written (including copying those inputs to temporary arrays) - Split `BlockStatementSyntaxNode` into: - `BlockStmt` which just provides a scope around a `body` statement - `SeqStmt` which just allows multiple statements to be treated as one - Change how we emit `for` loops, to deal with the case where the initialization part might expand into multiple statements - Basically `for(A;B;C) {D}` becomes `{A; for(;B;C) {D}}`, so we can handle arbitrary statements for `A` - As an additional wrinkle, when we are rewriting HLSL, we just generate `A; for(;B;C) {D}` to deal with the broken scoping there - This change is needed because the lowering pass was sometimes expanding the original initialization statement `A` into a block `{A}`. Certainly if it declared multiple variables we'd need to handle it, and this seemed the easiest way - A more significant challenge for lowering would come if/when we ever wanted to support true short-circuiting behavior for `&&` and `\|\|` - For right now I'm not changing the behavior of the "rewriter" mode, so we still have `UnparsedStmt` instances being generated, but it is clear that eventually we need to parse all input, even if we can't type-check 100% of it. This is required so that we can rewrite user code that might refer to a shader input with interface type.
*	Add meta-definitions for AST types	Tim Foley	2017-06-30
	- The big change here is that all the definitions for syntax-node classes have been macro-ized, to that we can do light metaprogramming over them - The use of macros for this has big down-sides, but I'm not quite ready to do anything more heavy-weight right now - The macro-ized definitions can be included multiple times, to generate different declarations/code as needed - The first example of using this meta-programming facility is a new visitor system - The actual visitor base classes and the dispatch logic are all generated from the meta-files - There was only one visitor left in the code: the semantics checker, so that was ported to the new system. - All current test cases pass, so of course that means all is well.