| Age | Commit message (Collapse) | Author |
|
* Add __truncate and __sampledType for spirv_asm
Allows some texture tests to start passing
* add __isVector
Currently unused
* Add 1-vector legalization pass (WIP)
* Add capabilities for image types
* neaten instruction dumping
* add 1-vector test
* Add a couple of cases to vec1 legalization
* Remove texture tests from expected failures
* comment
* regenerate vs projects
* Remove redundant define form synchapi emulation
* refactoring image methods
* All sample functions refactored
* Remove incorrect glsl intrinsics
Partially addresses https://github.com/shader-slang/slang/issues/3174
* __subscript image ops via writing funcs
* Extract texture struct writing from core.meta.slang
* Abstract out cuda intrinsic
* Remvoe erroneous call to opDecorateIndex
* spirv asm IR utils
* Correct position of loads for SPIR-V asm inst operands
* Raise constructors to global scope during spir-v legalization
* Correct snippet output
* Implement most texture sampling ops for SPIR-V
* Legalize 1-vectors for glsl too
* Make SPIR-V inst operands non-hoistable
* Better 1-vector legalization
* Put textures in ptrs for spirv
* insert missing break
* Add vec1 legalization test
* Add some missing pieces to slang-ir-insts
* Greatly neaten vec1 legalization
* a
* Neaten vec1 legalization
* Add image read and write intrinsics for spir-v
* Squash warnings
* regenerate vs projects
* Drop redundant guards
* Drop 5 tests from expected failure list
* Inst numbering changes to cross compile tests
* vec1 legalization tests only on vk
* Correct location of asm op emit
* Inline constant in spirv-asm
* Correct signedness for lane in wave intrinsics
* Extract element from float1 for cuda
* squash warnings
* Neaten spirv-emit
* dedupe more capabilities
* warnings
* neaten assert
* comments
* comments
|
|
* Misc. SPIRV Fixes, Part 2.
* Fix up.
* Fix.
* Add system smenatic values.
* 16 bit int and floats, matrix/vector reshape, bool ops.
* Fix.
* Fix.
* Allow push constant entry point params.
* entrypoint params.
* swizzleSet and swizzledStore.
* packoffset.
* string hash.
* Fix.
* Matrix arithmetics.
---------
Co-authored-by: Yong He <yhe@nvidia.com>
|
|
* Various dxc/fxc compatibility fixes.
* Cleanup.
* Fix test cases.
* Fix comments.
---------
Co-authored-by: Yong He <yhe@nvidia.com>
|
|
* For C-like targets, emit resource declarations before other globals
* Remove unused tests
|
|
* Fix optimization pass not converging.
* Fix.
* Fix tests.
---------
Co-authored-by: Yong He <yhe@nvidia.com>
|
|
* More control flow and Phi param simplifications.
* Fix.
* Fix gcc error.
* Fix.
* More IR cleanup.
* Fix bug in phi param dce + ifelse simplify.
* Propagate and DCE side-effect-free functions.
* Enhance CFG simplifcation to remove loops with no side effects.
* Fix.
* Fixes.
* Fix tests. Add [__AlwaysFoldIntoUseSite] for rayPayloadLocation.
* More cleanup.
* Fixes.
* Fix.
---------
Co-authored-by: Yong He <yhe@nvidia.com>
|
|
* Overhaul global inst deduplication and cpp/cuda backend.
* Update IR documentation.
---------
Co-authored-by: Yong He <yhe@nvidia.com>
|
|
* Work to mitigate SPIR-V bloat
SPIR-V is not an especially compact format, but some patterns in how Slang generates code and then runs it through `spirv-opt` lead to many redundant field-by-field copy operations being emitted. This change attempts to address some of the resulting bloat from the Slang side of things.
Note: experimentation shows that the bloat is less pronounced when running either *no* SPIR-V optimizations or *full* SPIR-V optimizations, so it is also likely that the bloat should be addressed by changing which `spirv-opt` passes the Slang compiler runs in default (`-O1`) builds. Such changes should come as a distinct pull request.
This change primarily does two things:
First, the code generation strategy for passing arguments to `out` and `inout` parameters has been changed. In the past, the compiler would *always* copy the argument value into a temporary, then pass the address of the temporary, and then write back the value after the call. The new code generation strategy attempts to identify when an argument value already has a simple address in memory and passes that address directly when possible. This eliminates many copy operations that occur before/after calls to functions with `out`/`inout` parameters.
Second, we introduce an IR optimization pass that detects call sites where the entire contents of a buffer (usually a constant buffer) is being passed to a callee function, such that many bytes are loaded and then passed even if only very few are used in the callee. The pass moves the load operations from the caller to a specialized version of the the callee where possible (e.g., when the constant buffer in question is a global shader parameter). Doing this eliminates another major category of copies.
Notes:
* The IR lowering logic is complicated by the fact that several kinds of l-values (values that are usable as the desitnation of assignment, or for `out`/`inout` arguments) are not actually addressable. An easy example is a non-contiguous swizzle like `v.xwz` on a `float4`, where the value occupies 12 bytes, but not 12 consecutive bytes with a single address. There are many more corner cases like that and the IR lowering pass carries a lot of complexity to deal with them. A more systematic overhaul is due some time soon.
* The IR representation of `out` and `inout` parameters deserves some careful scrutiny when making these kinds of changes. The official semantics of `inout` in HLSL has been "copy in copy out" (and `out` is just "copy out") which is observably different from any solution that passes in the address of an l-value directly. By making this change we are saying that Slang's semantics are not precisely those of legacy HLSL, and that our semantics for `inout` parameters are closer to those of `inout` in Swift or of a mutable borrow in Rust. In the Swift case the implementation can freely pass the underlying storage of an l-value or the address of a temporary, and valid programs may not observe the different. It is thus illegal to observe the value in a storage local while a mutation to that location is "in flight." All of this is way more detailed and technical than 99% of Slang users will ever care about, but importantly it gives us semantic cover to eliminate these copies in the IR, and also to emit output C++ code that implements `out` and `inout` as by-reference parameter passing.
* There was an exsting generic pass for specializing functions based on call sites that uses a "template method" style of pattern to customize its behavior. That pass needed to be generalized to handle this use case because it had previously operated on the assumption that the "desire" to specialize a callee function must be driven by the parameter declarations of that function, and not on the argument values passed in. The code has been slightly refactored to allow the policy for specialization to consider both parameters and arguments.
* Unsurprisingly, a bunch of the GLSL (and thus SPIR-V) generated has changed with this work, so several baseline `.slang.glsl` files needed to be updated.
* This change is incomplete in that it does not address broader cases of buffer loads, including both partial loads from constant buffers (just loading one field, but a field that uses a "large" structure type), and loads from multi-element buffers (a lot from a structured buffer where the element type is "large"). The main question in each of those cases is how to define how "large" a structure needs to be before we decide to try and sink loads into callee functions like this. In the worst case, sinking loads in this way may actually create *more* memory traffic (because the same values get loaded in multiple callee functions).
* fixup: run premake
* fixup: typo
|
|
* Use "capability" system to select VKRT extension
Slang currently supports translation of ray tracing shader code to Vulkan GLSL code that uses the `GL_NV_ray_tracing` extension. A multi-vendor equivalent of that extension has been released as `GL_EXT_ray_tracing` and we want Slang to support that extension as well.
At the simplest, making the change from one extension to the other is just a matter of changing a few strings, since it does not appear that anything of significance was changed at the GLSL level (or even in SPIR-V). Where this gets trickier is when we have users who want us to support *both* extensions, and to be able to switch between them.
The solution we've implemented here more or less amounts to:
* If you don't tell the compiler which extension to use, it will default to `GL_EXT_ray_tracing` (the newer multi-vendor one).
* If you explicitly want the older extension, you can opt into it using the `-profile` option or via a new API for explicitly adding capabilities to your target.
Making that work required a few different kinds of changes:
* The options parsing and public API needed ways to add optional capabilities to a target.
* During GLSL code emit, we can check the capabilities that were added to the target to see if the `GL_NV_ray_tracing` extension was explicitly enabled and, if not, default to using the `GL_EXT_ray_tracing` names for things. This step is needed because some of the modifiers/attributes involved in the extension have to be handled explicitly in the code generator rather than implicitly as part of mapping intrinsic functions.
* We add two different translations to the relevant operatiosn in the stdlib, one marked with each of the extensions. If profile/capability-based overload resolution can be relied on to pick the right one, this should Just Work.
* Next, a bunch of work had to go into making capability-based overloading Just Work for the purposes of this change. There's been a nearly complete reworking of the implementation of `CapabilitySet` here to make it more suitable for our needs.
* The tests that were using ray tracing translation for Vulkan needed to be updated. For some of them I updated their baselines to use `GL_EXT_ray_tracing` so that they can test the new path. For others, I updated the command line for the test case so that it explicitly opts into using `GL_NV_ray_tracing`. The result is that we have some coverage of each extension. I would have liked to have each test run in both modes, but our pass-through glslang support doesn't support `-D` options, so I couldn't take that step easily.
This change does *not* add support for `GL_EXT_ray_query`, the extension that supports "DXR 1.1" style queries under Vulkan. Adding support for that extension should hopefully be a smaller step because it doesn't have the same multiple-extensions issue.
This change does *not* address a lot of possible avenues for improvement or cleanup around the capability system. It focuses only on those changes that are necessary to make the ray tracing feature work and leaves the rest for future work.
* fixup: infinite loop
* Comment-only change to retrigger TC build
|
|
* Run SSA pass to clean up generic temporary variables during lowering.
* Fix `undefined` emitting logic.
* revert dumpir control flag
* Defer fold decision of `undefined` values after special case logic for GLSL and HLSL.
* Update expected test result.
* Manually update raygen.slang.glsl to minimize change.
* fix formatting
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|
|
* Refactor several IR passes
This change takes some IR passes that lived together in `ir.cpp` and moves them into their own files to improve clarity.
In most cases these were passes introduced early in the life of the IR, so that it didn't seem like a big deal to have them all in one file, but now that `ir.cpp` has grown unwieldly this seems like an important cleanup to make.
To give a quick rundown of the passes involved:
* The IR "linking" step has been pulled out to `ir-link.{h,cpp}`. This code for this pass is pretty much identical to what was in `ir.cpp`, and no attempt has been made to clean up or refactor it in the current change.
* The GLSL legalization step has been pulled out to `ir-glsl-legalize.{h,cpp}`. This used to be invoked directly from the linking step, but has been made a new top-level pass invoked from `emit.cpp`. Just like with the linking, the code in the new file is just a copy-paste of what was in `ir.cpp`, and no attempt at cleanup has been made. Also note that it might be a good idea to move this pass later in the overall sequence, but this PR doesn't attempt to do that as it could change results.
* The generic specialization step has been pulled out to `ir-specialize.{h,cpp}`. The file name does not explicitly reference *generic* specialization because I anticipate this pass having to perform other kinds of specialization as well. The code in this case amounts to a heavy cleanup/refactoring pass and thus deserves careful scrutiny. The reason for the cleanup is that the generic specialization step used to be part of the "linking" step long ago, and continued to share infrastructure with it long after that stopped making sense. The newly cleaned up pass has much simpler logic that should be easy enough to follow from the comments.
* In order to reduce code dulication, the IR "cloning" part of the `ir-specialize-resources.{h,cpp}` pass was pulled into its own files (`ir-clone.{h,cpp}`) that both the generic specialization step and the resource-based specialization step now share.
The remaining changes then pertain to deleting a bunch of code out of `ir.cpp` and adding the new files to the build.
The only test that needed updating was `vkray/raygen`, where some subtle ordering change in the refactored generic specialization logic has lead to the relative order of the specialized `TraceRay` and `saturate` functions beind reversed.
* fixup: typo in assert
* fixup: typos in comments
|
|
* Change how buffers are emitted
This is a change with a lot of pieces, which can't always be separated out cleanly. I'm going to walk through them in what I hope is a logical order.
The main goal of this change was to allow arrays of structured buffers to translate to Vulkan. Consider two declarations of structured buffers in HLSL/Slang:
```hlsl
StructuredBuffer<X> single;
StructuredBuffer<Y> multiple[10];
```
The current translation logic was handling `single` by translating it into an *unnamed* GLSL `buffer` block like:
```glsl
layout(std430)
buffer _S1
{
X single[];
};
```
That syntax allows an expression like `single[i]` in Slang to be translated simply as `single[i]` in GLSL.
But that naive translating doesn't work for `multiple`, since we need to declare a array of blocks in GLSL, which requires giving the whole thing a name:
```glsl
layout(std430)
buffer _S2
{
Y _data[];
} multiple[10];
```
Now a reference to `multiple[i][j]` in Slang needs to become `multiple[i]._data[j]` in GLSL.
To avoid having way too many special cases around single structured buffers vs. arrays, it makes sense to allows emit things in the latter form, so that we instead lower `single` as:
```glsl
layout(std430)
buffer _S1
{
X _data[];
} single;
```
So that now a reference to `single[i]` becomes `single._data[i]` in GLSL.
Most of that can be handled in the standard library translation of the structured buffer indexing operations.
The only wrinkle there is that there were some *old* special-case instructions in the IR intended to handle buffer load/store operations (these were added back when I was trying to keep the "VM" path working). These aren't really needed to have structured-buffer operations work; they can be handled as ordinary functions as far as the stdlib is concerned. I removed the old instructions.
Along the way, it became clear that a few other cases follow the same pattern. Byte-addressed buffers are an obvious case. We were lowering HLSL/Slang:
```hlsl
ByteAddressBuffer b;
...
uint x = b.Load(0);
```
to GLSL like:
```glsl
layout(std430)
buffer _S1
{
uint b[];
};
...
uint x = b[0];
```
That logic would fail for arrays the same way that the structured buffer case was failing. The fix is the same: use named `buffer` blocks and then introduce an explicit `_data` field:
```glsl
layout(std430)
buffer _S1
{
uint _data[];
} b;
...
uint x = b._data[0];
```
Just like with structured buffers, all of the VK translation for operations on byte-addressed buffers can be implemented directly in teh stdlib, so once the emit logic was changed it was just a matter of adding `._data` to a bunch of VK tranlsations.
It turns out that arrays of constant buffers have more or less the same problem, and furthermore we have some problems with any code that directly uses the modern HLSL `ConstantBuffer<T>` type.
Note: the emit logic around constant buffers sometimes refers to "parameter groups" because that is being used in the compiler as a catch-all term for constant buffers, texture buffers, and parameter blocks.
The existing code was going out of its way to reproduce the way that constant buffer declarations are implicitly referenced in HLSL:
```hlsl
cbuffer C { float f; }
...
float tmp = f; // No reference to `C` here
```
This can be seen in the emit logic with the `isDerefBaseImplicit` function, which is used to take the internal IR representation for a reference to `f` (which is closer to the expression `(*C).f` or `C->f`) and leave off any reference to `C` so that we emit just `f`.
That kind of logic just flat out doesn't work in some important cases. Arrays of constant buffers are a clear one:
```hlsl
ConstantBuffer<X> cbArray[3];
...
X x = cbArray[0];
```
There is no way to translate that to an ordinary `cbuffer` declaration at all. The same problem can be created without arrays, though:
```hlsl
ConstantBuffer<X> singleCB;
...
X x = singleCB;
```
The current strategy for translating constant buffers was translating `singleCB` into a `cbuffer` declaration that reproduced the fields of `X` as its members, which just wouldn't work:
```hlsl
cbuffer singleCB
{
float f; // field of `X`
}
...
X x = singleCB; // ERROR: there is nothing named `singleCB` in this HLSL
```
The new strategy is more consistent. We still generate a `cbuffer` declaration for a single constant buffer, but we always give it a single field of the chosen element type:
```hlsl
cbuffer singleCB
{
X singleCB;
}
...
X x = singleCB; // this works fine!
```
And in the array case we generate code that uses the explicit `ConstantBuffer<T>` type:
```hlsl
ConstantBuffer<X> cbArray[3];
...
X x = cbArray[0];
```
The GLSL output is more complicated because unlike with HLSL there is no implicit conversion from a uniform block to its element type (there is no notion of an element type). The array case thus needs a `_data` field similar to what we do for structured buffers:
```glsl
layout(std140)
uniform _S3
{
X _data;
} cbArray[3];
...
X x = cbArray[0]._data;
```
And then the non-array case needs to have a similar `_data` field for consistency:
```glsl
layout(std140)
uniform _S1
{
X _data;
} singleCB;
...
X x = singleCB._data;
```
This is handled by inserting the necessary reference to `_data` whenever we dereference a constant buffer, either as part of a load instruction (loading from the whole CB as a pointer), or an `IRFieldAddress` instruction which forms a pointer into the CB (e.g., `&(singleCB->f)` becomes `singleCB._data.f`).
The current emit logic handles `ParameterBlock<X>` differently from `ConstantBuffer<X>`, but really only to allow parameter blocks to be explicitly named in the output, while constant buffers were left implicit by default. Thus the only difference was a legacy one (from back when trying to exactly reproduce the HLSL text we got as input was considered an important goal), and the new approach to emitting constant buffers would get rid of it.
I removed the separate logic for emitting `ParameterBlock<X>` and just let the handling for constant buffers deal with it.
Note that any resource types inside of a `ParameterBlock<X>` would have been moved out as part of legalization, so that a parameter block is 100% equivalent to a constant buffer when it comes time to emit code.
Unsurprisingly, changing the way we generate HLSL and GLSL output for all these buffer types meant that any tests that were directly comparing the output of `slangc` against `fxc`, `dxc`, or `glslang` broke.
The basic approach to fixing the breakage in GLSL tests was to update the GLSL baseline to reflect the new output startegy. In some cases I used macros to name the various `_S<digits>` temporaries so that future renaming will hopefully be easier (it would be great if we auto-generated temporary names with a bit more context). There was one GLSL test (`tests/bugs/vk-structured-buffer-binding`) that was using raw GLSL expected output, and this was changed to use a GLSL baseline to generate SPIR-V for comparison.
For HLSL tests we were sometimes running the same input file through `slangc` and `fxc`/`dxc`, and in these cases I macro-ized the various `cbuffer` declarations to generate different declarations depending on the compiler.
I completely dropped the tests coming from the D3D SDK because they aren't providing much coverage, and updating them would change them so far from the original code that the purported benefit (using a body of existing shaders) would be lost.
I also dropped the explicit matrix layout qualifiers in the `matrix-layout` test because the new output strategy breaks those for GLSL (you can't put matrix layout qualifiers on `struct` fields, and now the body of every constant buffer is inside a `struct`). This isn't as big of a loss as it seems, because our handling of those qualifiers wasn't really right to begin with. Slang users should only be setting the matrix layout mode globally (and we should probably switch to error out on the explicit qualifiers for now).
The other thing that got dropped is tests involving `packoffset` modifiers.
Slang already warns that it doesn't support these, and the way they were used in the test cases is actually misleading. For the binding/layout-related tests, the goal was to show that Slang reproduces the same layout as fxc, in which case explicitly enforcing a layout via `packoffset` seems like cheating (are we sure we enforced the layout fxc would have produced?). The real reason was that Slang used to emit explicit `packoffset` on *every* field of a `cbuffer` it would output, because of an `fxc` bug where you couldn't use `register` on textures/samplers declared inside a `cbuffer` unless *every* field in the `cbuffer` used a `register` or `packoffset` modifier. Slang hasn't required that behavior in a while because it now splits textures and samplers, and the one test case where we needed `packoffset` to work around the `fxc` bug in the baseline HLSL has been macro-ified even more to work around the bug.
The amount of churn in the test cases is unfortunate, but it continues to point at the weakness of any testing strategy that checks for exact equivalent between Slang's output and that of other compilers. We need to keep working to replace these tests with better alternatives.
In `check.cpp` there is logic to perform implicit dereferencing, so that if you write `obj.f` where `obj` is a `ConstantBuffer<X>` (or some other "pointer-like" type) and `f` is a field in `X`, then this effectively translates as `(*obj).f`. That is, we dereference the value of type `ConstantBuffer<X>` to get a value of type `X`, and then refer to the field of the `X` value.
There was a problem where the logic to insert that kind of implicit dereference operation was using a reference (`auto& type = ...`) for the type of the expression being dereferenced, and then clobbering it. This would mean that an expression of type `ConstantBuffer<X>` would have its type overwritten to be just `X` and then codegen would break later on.
I'm not sure how we haven't run into that before.
The `array-of-buffers` test case was added to confirm that we now support arrays of constant, structured, and byte-address buffers for both DXIL and SPIR-V output.
Okay, so that was a lot of stuff, but hopefully it is clear how this all works to make the output of the compiler more consistent and explicit, while also supporting the required new functionality.
* fixup: review feedback
|
|
* Update version of glslang used
* Update VK raytracing support for final extension spec
A lot of this change is just plain renaming: The `NVX` suffixes become just `NV`, and the extension name changes from `GL_NVX_raytracing` to `GL_NV_ray_tracing`.
The Slang standard library and the GLSL baselines for the tests are consistently updated.
The other detail is that the final spec requires the "payload" identifier in a `traceNV()` call to be a compile-time constant, which means it cannot be defined as a local variable first, as in:
```glsl
int payloadID = 0;
traceNV(..., payloadID); // ERROR
```
In terms of how the original support was implemented, the payload ID is being computed via a special builtin function that maps each global GLSL payload variable to a unique ID. There are a few ways we could try to resolve the problem here:
1. We could aspire to put our equivalent of the `constexpr` modifier on the output of the function, so that the GLSL variable gets declared `const` and thus fits the GLSL rules for a constant expression.
2. We could introduce a pass to replace the payload-location instructions with literal integers.
3. We could use a special-purpose instruction instead of a builtin function call, and have that instruction indicate that it doesn't have side effects (so it can be folded into the call site)
4. We could somehow mark the builtin function as not having side effects.
We choose option (4) simply because it provides a feature that could have other applications. This change adds a `[__readNone]` attribute that can be applied to function declarations to express a promise on the part of the programmer that the given function has no side effects and computes its result strictly from the bits of its input arguments (and not things they point to, etc.). This mirrors an equivalent function attribute in LLVM.
We mark the function that computes a ray payload location with this attribute, and propagate the attribute through the layers of the IR, so that when the emit logic asks if an operation has side effects (to see if it can be folded into the arguments of a subsequent expression), we get an affirmative response.
This change should get all of the features that were present in the experiemntal `NVX` extension working with the final extension spec. It does not address callable shaders, which will come as a subsequent change.
|
|
* Move to newer glslang
* Support cross-compilation of ray tracing shaders to Vulkan
This change allows HLSL shaders authored for DirectX Raytracing (DXR) to be cross-compiled to run with the experimental `GL_NVX_raytracing` extension (aka "VKRay").
* The GLSL extension spec is marked as experimental, so that any shaders written using this support should be ready for breaking changes when the spec is finalized.
* "Callable shaders" are not exposed throug the GLSL extension, so this feature of DXR will not be cross-compiled.
* The experimental Vulkan raytracing extension does not have an equivalent to DXR's "local root signature" concept. This does not visibly impact shader translation (because the local/global root signature mapping is handled outside of the HLSL code), but in practice it means that applications which rely on local root signatures on their DXR path will not be able to use the translation in this change as-is; more work will be needed.
The simplest part of the implementation was to go into the Slang standard library and start adding GLSL translations for the various DXR operations.
In some cases, like mapping `IgnoreHit()` to `ignoreIntersectionNVX()` this is almost trivial.
The various functions to query system-provided values (e.g., `RayTMin()`) were also easy, with the only gotcha being that they map to variables rather than function calls in GLSL, and our handling of `__target_intrinsic` assumes that a bare identifier represents a replacement function name, and not a full expression, so we have to wrap these definitions in parentheses.
The tricky operations are then `TraceRay<P>()` and `ReportHit<A>()`, because these two are generics/templates in HLSL.
GLSL doesn't support generics, even for "standard library" functions, so the raytracing extension implements a slightly complex workaround: the matching operations `traceNVX()` and `reportIntersectionNVX()` pass the payload/attributes argument data via a global variable.
That is, shader code for the GLSL extensions writes to the global variable and then calls the intrinsic function.
The linkage between the call site and the global is established by a modifier keyword (`rayPayloadNVX` and `hitAttributeNVX`, respectively) and in the case of ray payload also uses `location` number to identify which payload global to use (since a single shader can trace rays with multiple payload types).
Our translation strategy in Slang tries to leverage standard language mechanisms instead of special-case logic.
For example, to translate the `ReportHit<A>()` function, we provide both a default declaration that will work for HLSL (where the operation is built-in with the signature given), and a *definition* marked with the `__specialized_for_target(glsl)` modifier.
The GLSL definition declares a function `static` variable that will fill the role of the required global, and then does what the GLSL spec requires: assigns to the global, and then calls the `reportIntersectionNVX` builtin (which we declare as a separate builtin).
Our ordinary lowering process will turn that `static` variable into an ordinary global in the IR, and the `[__vulkanHitAttributes]` attribute on the variable will be emitted as `hitAttributeNVX` in the output.
There is no additional cross-compilation logic in Slang specific to `ReportHit<A>()` - the target-specific definition in the standard library Just Works.
The case for `TraceRay<P>()` is a bit more complicated, simply because the GLSL `traceNVX()` function needs to be passed the `location` for the payload global.
We implement the payload global as a function-`static` variable, with the knowledge that every unique specialization of `TraceRay<P>()` will generate a unique global variable of type `P` to implement our function-`static` variable.
We then add a slightly magical builtin function `__rayPayloadLocation()` that can map such a variable to its generated `location`; the logic for this is implemented in `emit.cpp` and described below.
We also changed the `RayDesc` and `BuiltinTriangleIntersectionAttributes` types from "magic" intrinsic types over to ordinary types (because the GLSL output needs to declare them as ordinary `struct` types).
This ends up removing some cases in the AST and IR type representations.
By itself this change would break HLSL emit, because in that case the types really are intrinsic.
We added a `__target_intrinsic` modifier to these types to make them intrinsic for HLSL, and then updated the downstream passes to handle the notion of target-intrinsic types.
The logic for binding/layout of entry point inputs and outputs was updated so that raytracing stages don't follow the default logic for varying input/output parameters.
This is because the input/output parameters of a raytracing entry point aren't really "varying" in the same sense as those in the rasterization pipeline.
In particular, the SPIR-V model for raytracing input and output treats "ray payload" and "hit attributes" parameters as being in a distinct storage class from `in` or `out` parameters.
We also detect cases where a ray tracing stage declares inputs/outputs that it shouldn't have. This logic could conceivably be extended to other stages (e.g., to give an error on a compute shader with user-defined varying input/output).
The type layout logic added cases for handling raytracing payload and hit-attribute data, but this is currently just a stub implementation that follows the same logic as for varying `in` and `out` parameters (it cannot give meaningful byte sizes/offsets right now).
To my knowledge the GLSL spec doesn't currently specify anything about layout, and I haven't read the DXR spec language carefully enough to know what it says about layout.
A future change should update the layout logic to allow for byte-based layout of ray payloads, etc. so that we can query this information via reflection.
The GLSL legalization logic in `ir.cpp` was updated to factor out the per-entry-point-parameter code into its own function, and then that function was updated to special-case the input/output of a ray-tracing shader.
While for rasterization stages we typically want to take the user-declared input/output and "scalarize" it for use in GLSL (in part to deal with language limitations, and in part to tease system values apart from user-defined input/output), the GLSL spec for raytracing requires payload and hit attribute parameters to be declared as single variables. There is also the issue that even for an `in out` parameter, a ray payload parameter should only turn into a single global, whereas the handling for varying `in out` parameters generates both an `in` and an `out` global for the GLSL case.
Other than the handling of entry point parameters, the GLSL legalization pass doesn't need to do anything special for ray tracing shaders.
The trickiest change in the `emit.cpp` logic is that we now generate `location`s for ray payload arguments (the outgoing from a `TraceRay()` call) on demand during code generation.
This is a bit hacky, and it would be nice to handle it as a separate pass on the IR rather than clutter up the emit logic, but this approach was expedient.
Basically, any of the global variables that got generated from the `static` declarations in the standard library implementation of `TraceRay()` will trigger the logic to assign them a `location`.
The logic for emitting intrinsic operations added a few new `$`-based escape sequences. The `$XP` case handles emitting the location of a generated ray payload variable; this is how we emit the matching location at the site where we call `traceNVX`. The `$XT` case emits the appropriate translation for `RayTCurrent()` in HLSL, because it maps to something different depending on the target stage.
All of the test cases here consist of a pair of an HLSL/Slang shader written to the DXR spec, plus a matching GLSL shader for a baseline.
The GLSL shaders are carefully designed so that when fed into glslang they will produce the same SPIR-V as our cross-compilation process.
This kind of testing is quite fragile, but it seems to be the best we can do until our testing framework code supports *both* DXR and VKRay.
A bunch of the core changes ended up being blocked on issues in the rest of the compiler, so some additional features go implemented or fixed along the way:
The first big wall this work ran into was that the `__specialized_for_target` modifier hasn't actually been working correctly for a while.
It turns out that for the one function that is using it, `saturate()`, we have been outputting the workaround GLSL function in *all* cases (including for HLSL output) rather than only on GLSL targets.
The problem here is that for a generic function with a `__specialized_for_target` modifier or a `__target_intrinsic` modifier, the IR-level decoration will end up attached to the `IRFunc` instruction nested in the `IRGeneric`, but the logic for comparing IR declarations to see which is more specialized (via `getTargetSpecializationLevel()`) was looking only at decorations on the top-level value (the generic).
The quick (hacky) fix here is to make `getTargetSpecializationLevel()` try to look at the return value of a generic rather than the generic itself, so that it can see the decorations that indicate target-specific functions.
A more refined fix would be to attach target-specificity decorations to the outer-most generic (to simplify the "linking" logic).
The only reason not to fold that into the current fix is that the `__target_intrinsic` modifier currently serves double-duty as a marker of target specialization *and* information to drive emit logic. The latter (the emit-related stuff) currently needs to live on the `IRFunc`, and moving it to the generic could easily break a lot of code.
This needs more work in a follow-on fix, but for now target specialization should again be working.
The other big gotcha that the simple "just use the standard library" strategy ran into was that function-`static` variables weren't actually implemented yet, and in particular function-`static` variables inside of generic functions required some careful coding.
The logic in `lower-to-ir.cpp` has this `emitOuterGenerics()` function that is supposed to take a declaration that might be nested inside of zero or more levels of AST generics, and emit corresponding IR generics for all those levels.
This is needed because two different AST functions nested inside a single generic `struct` declaration should turn into distinct `IRFunc`s nested in distinct `IRGeneric`s.
The tricky bit to making that all work is that the same AST-level generic type parameter will then map to *different* IR-level instructions (the parameters of distinct `IRGeneric`s) when lowering each function.
The existing logic handled this in an idiomatic way by making "sub-builders" and "sub-contexts."
This change refactors some of the repeated logic into a `NestedContext` type to help simplify the pattern, and applies it consistently throughout the `lower-to-ir.cpp` file.
Besides that cleanup, the major change is `lowerFunctionStaticVarDecl` which, unsurprisingly, handles lower of function-`static` variables to IR globals.
The careful handling of nested contexts here is needed because if we are in the middle of lowering a generic function, then a `static` variable should turn into its *own* `IRGeneric` wrapping an `IRGlobalVar`. The body of the function should refer to the global variable by specializing the global variable's `IRGeneric` to the parameters of the *functions* `IRGeneric`. This tricky detail is handled by `defaultSpecializeOuterGenerics`.
An additional subtlety not actually required for this raytracing work (and thus not properly tested right now) is handling function-`static` variables with initializers.
These can't just be lowered to globals with initializers, because HLSL follows the C rule that function-`static` variables are initialized when the declaration statement is first executed (and this could be visible in the presence of side-effects).
The lowering strategy here translates any `static` variable with an initializer into *two* globals: one for the actual storage, plus a second `bool` variable to track whether it has been initialized yet.
There are some opportunities to optimize this case, especially for `static const` data, but that will need to wait for future changes.
We've slowly been shifting away from the model where a user thinks of a "profile" as including both a stage and a feature level.
Instead, the user should think about selecting a profile that only describes a feature level (e.g., `sm_6_1`, `glsl_450`, etc.), and then separately specifying a stage (`vertex`, `raygeneration, etc.) for each entry point.
The challenge here is that the command-line processing still only had a single `-profile` switch, and no way to specify the stage.
Adding the `-stage` option was relatively easy, but making it work with the existing validation logic for command-line arguments was tricky, because of the complex model that `slangc` supports for compiling multiple entry points in a single pass.
* In `slang.h` add new reflection parameter categories for ray payloads and hit attributes, as part of entry point input/output signatures.
* A previous change already updated our copy of glslang to one that supports the `GL_NVX_raytracing` extension, so in `slang-glslang.cpp` we just needed to map Slang's `enum` values for the raytracing stage names to their equivalents in the glslang code.
* Moved the logic for looking up a stage by name (`findStageByName()`) out of `check.cpp` and into `compiler.cpp`, with a declaration in `profile.h`
* Added a `$z` suffix to the GLSL translation of `Texture*.SampleLevel()`, to handle cases where the texture element type is not a 4-component vector. Note that this fix should actually be applied to *all* these texture-sampling operations, but I didn't want to add a bunch of changes that are (clearly) not being tested right now.
* The layout logic for entry points was updated to correctly skip producing a `TypeLayout` for an entry point result of type `void`, which meant that the related emit logic now needs to guard against a null value for the result layout.
* In `ir.cpp`, dump decorations on every instruction instead of just selected ones, so that our IR dump output is more complete.
* Added a command-line `-line-directive-mode` option so that we can easily turn off `#line` directives in the output when debugging. Not all cases where plumbed through because the `none` case is realistically the most important.
* Parser was fixed to properly initialize parent links for "scope" declarations used for statements, so that we can walk backwards from a function-scope variable (including a `static`) and see the outer function/generics/etc.
* Added GLSL 460 profile, since it is required for ray tracing. Also updated the logic for computing the "effective" profile to use to recognize that GLSL raytracing stages require GLSL 460.
* Added some conventional ray-tracing shader suffixes to the handling in `slang-test`. This code isn't actually used, but was relevant when I started by copy-pasting some existing VKRay shaders as the starting point for my testing.
* Fixup: typos
|