| Age | Commit message (Collapse) | Author |
|
* Work to mitigate SPIR-V bloat
SPIR-V is not an especially compact format, but some patterns in how Slang generates code and then runs it through `spirv-opt` lead to many redundant field-by-field copy operations being emitted. This change attempts to address some of the resulting bloat from the Slang side of things.
Note: experimentation shows that the bloat is less pronounced when running either *no* SPIR-V optimizations or *full* SPIR-V optimizations, so it is also likely that the bloat should be addressed by changing which `spirv-opt` passes the Slang compiler runs in default (`-O1`) builds. Such changes should come as a distinct pull request.
This change primarily does two things:
First, the code generation strategy for passing arguments to `out` and `inout` parameters has been changed. In the past, the compiler would *always* copy the argument value into a temporary, then pass the address of the temporary, and then write back the value after the call. The new code generation strategy attempts to identify when an argument value already has a simple address in memory and passes that address directly when possible. This eliminates many copy operations that occur before/after calls to functions with `out`/`inout` parameters.
Second, we introduce an IR optimization pass that detects call sites where the entire contents of a buffer (usually a constant buffer) is being passed to a callee function, such that many bytes are loaded and then passed even if only very few are used in the callee. The pass moves the load operations from the caller to a specialized version of the the callee where possible (e.g., when the constant buffer in question is a global shader parameter). Doing this eliminates another major category of copies.
Notes:
* The IR lowering logic is complicated by the fact that several kinds of l-values (values that are usable as the desitnation of assignment, or for `out`/`inout` arguments) are not actually addressable. An easy example is a non-contiguous swizzle like `v.xwz` on a `float4`, where the value occupies 12 bytes, but not 12 consecutive bytes with a single address. There are many more corner cases like that and the IR lowering pass carries a lot of complexity to deal with them. A more systematic overhaul is due some time soon.
* The IR representation of `out` and `inout` parameters deserves some careful scrutiny when making these kinds of changes. The official semantics of `inout` in HLSL has been "copy in copy out" (and `out` is just "copy out") which is observably different from any solution that passes in the address of an l-value directly. By making this change we are saying that Slang's semantics are not precisely those of legacy HLSL, and that our semantics for `inout` parameters are closer to those of `inout` in Swift or of a mutable borrow in Rust. In the Swift case the implementation can freely pass the underlying storage of an l-value or the address of a temporary, and valid programs may not observe the different. It is thus illegal to observe the value in a storage local while a mutation to that location is "in flight." All of this is way more detailed and technical than 99% of Slang users will ever care about, but importantly it gives us semantic cover to eliminate these copies in the IR, and also to emit output C++ code that implements `out` and `inout` as by-reference parameter passing.
* There was an exsting generic pass for specializing functions based on call sites that uses a "template method" style of pattern to customize its behavior. That pass needed to be generalized to handle this use case because it had previously operated on the assumption that the "desire" to specialize a callee function must be driven by the parameter declarations of that function, and not on the argument values passed in. The code has been slightly refactored to allow the policy for specialization to consider both parameters and arguments.
* Unsurprisingly, a bunch of the GLSL (and thus SPIR-V) generated has changed with this work, so several baseline `.slang.glsl` files needed to be updated.
* This change is incomplete in that it does not address broader cases of buffer loads, including both partial loads from constant buffers (just loading one field, but a field that uses a "large" structure type), and loads from multi-element buffers (a lot from a structured buffer where the element type is "large"). The main question in each of those cases is how to define how "large" a structure needs to be before we decide to try and sink loads into callee functions like this. In the worst case, sinking loads in this way may actually create *more* memory traffic (because the same values get loaded in multiple callee functions).
* fixup: run premake
* fixup: typo
|
|
* Update glslang to 11.1.0
This change pulls new versions of glslang, spirv-headers, and spirv-tools as submodules, and makes the necessary changes to other files in the repository to get it all building (at least on Windows).
This change also enables building of glslang from source by default, so that we can easily generate new binaries for inclusion in the `slang-binaries` repository.
* fixup: missing file
|
|
This provides a stand-alone option distinct from `-profile` that can be used to add capabilities to a target. A test has been added to confirm that `-profile X -capability Y` works the same as `-profile X+Y`.
The intention is that this option could be used in applications that use the API to set up their target but then use the options-parsing logic to handle things like capabilities.
Note: that latter bit has not been confirmed, so it is possible that this approach does not actually suffice for hybrid API + options usage. That will need to be confirmed in follow-up work.
|
|
* Use "capability" system to select VKRT extension
Slang currently supports translation of ray tracing shader code to Vulkan GLSL code that uses the `GL_NV_ray_tracing` extension. A multi-vendor equivalent of that extension has been released as `GL_EXT_ray_tracing` and we want Slang to support that extension as well.
At the simplest, making the change from one extension to the other is just a matter of changing a few strings, since it does not appear that anything of significance was changed at the GLSL level (or even in SPIR-V). Where this gets trickier is when we have users who want us to support *both* extensions, and to be able to switch between them.
The solution we've implemented here more or less amounts to:
* If you don't tell the compiler which extension to use, it will default to `GL_EXT_ray_tracing` (the newer multi-vendor one).
* If you explicitly want the older extension, you can opt into it using the `-profile` option or via a new API for explicitly adding capabilities to your target.
Making that work required a few different kinds of changes:
* The options parsing and public API needed ways to add optional capabilities to a target.
* During GLSL code emit, we can check the capabilities that were added to the target to see if the `GL_NV_ray_tracing` extension was explicitly enabled and, if not, default to using the `GL_EXT_ray_tracing` names for things. This step is needed because some of the modifiers/attributes involved in the extension have to be handled explicitly in the code generator rather than implicitly as part of mapping intrinsic functions.
* We add two different translations to the relevant operatiosn in the stdlib, one marked with each of the extensions. If profile/capability-based overload resolution can be relied on to pick the right one, this should Just Work.
* Next, a bunch of work had to go into making capability-based overloading Just Work for the purposes of this change. There's been a nearly complete reworking of the implementation of `CapabilitySet` here to make it more suitable for our needs.
* The tests that were using ray tracing translation for Vulkan needed to be updated. For some of them I updated their baselines to use `GL_EXT_ray_tracing` so that they can test the new path. For others, I updated the command line for the test case so that it explicitly opts into using `GL_NV_ray_tracing`. The result is that we have some coverage of each extension. I would have liked to have each test run in both modes, but our pass-through glslang support doesn't support `-D` options, so I couldn't take that step easily.
This change does *not* add support for `GL_EXT_ray_query`, the extension that supports "DXR 1.1" style queries under Vulkan. Adding support for that extension should hopefully be a smaller step because it doesn't have the same multiple-extensions issue.
This change does *not* address a lot of possible avenues for improvement or cleanup around the capability system. It focuses only on those changes that are necessary to make the ray tracing feature work and leaves the rest for future work.
* fixup: infinite loop
* Comment-only change to retrigger TC build
|
|
* Enable all dynamic dispatch tests on CUDA.
* Fix expected cross-compile test results.
|
|
Entry point `uniform` parameters were a feature of the original Cg and HLSL, but have not been used much in production shader code. One of our goals on Slang is to reduce the (ab)use of the global scope, so bringing entry point `uniform` parameters up to a greater level of usability is an important goal.
Some policy choices about how global vs. entry-point `uniform` parameters behave have already been made, that shape decisions looking forward:
* For DXBC/DXIL, it makes the most sense to follow the lead of fxc/dxc, by treating entry point `uniform` parameters as a kind of syntax sugar for global shader parameters. Any parameters of "ordinary" types are bundles up into an implicit constant buffer, and all the resources (including the implicit constant buffer) are assigned `register`s just as for globals. It is up to the application to decide how to bind those parameters via a root signature (using root descriptors, root constants, descriptor tables, local vs. global root signature, etc.)
* For CPU, it makes sense to pass global vs. entry-point parameters as two different pointers, although the details of what we do for CPU are the least constrained across all current targets.
* For CUDA compute, it makes the most sense to map global shader parameters to `__constant__` global data, and entry-point `uniform` parameters to kernel parameters. This choice ensures that the signature of a kernel when translated from Slang->CUDA follows the Principle of Least Surprise, at the cost of making entry-point vs. global parameters be passed via different mechanisms.
* For OptiX ray tracing, it makes sense to expand on the precedent from CUDA compute: pass global parameters via global `__constant__` data (as is already expected by OptiX for whole-launch parameters), and pass entry-point `uniform` parameters via the "shader record." This establishes a precedent that for ray-tracing shaders, global-scope parameters map to the "global root signature" concept from DXR, while entry-point `uniform` parameters map to a "local root signature" or "shader record."
* For Vulkan ray tracing, the precedent from OptiX then argues that entry-point `uniform` parameters should map to the Vulkan "shader record" concept (and thus cannot support things like resource types).
* The remaining interesting case is what to do for non-ray-tracing shaders on Vulkan.
The dev team agrees that the most reasonable choice to make for non-ray-tracing Vulkan shaders is to map entry-point `uniform` parameters to "push constants." In particular, this makes it easy to express the case of a compute kernel with direct parameters of ordinary/value types in the way that will be implemented most efficiently.
The big picture is then that a kernel like:
```hlsl
void computeMain(uniform float someValue) { ... }
```
will map to output GLSL like:
```glsl
layout(push_constant)
uniform
{
float someValue;
} U;
void main() { ... }
```
If the user really wanted a constant-buffer binding to be created instead, they can easily change their input to make the buffer explicit:
```hlsl
struct Params { float someValue; }
void computeMain(uniform ConstantBuffer<Params> params) { ... }
```
(Forcing the user to be explicit about the desire for a buffer here creates a nice symmetry between Vulkan and CUDA; in the first case the user sets up the data in host memory and passes it to the GPU by copy, while in the second case the user must allocate and set up a device-memory buffer for the data. This symmetry extends to D3D if the application chooses to map entry-point `uniform` parameters to root constants.)
This change implements logic in the "parameter binding" part of the Slang compiler to make sure that entry-point `uniform` parameters are wrapped up in a push-constant buffer rather than an ordinary constant buffer for non-ray-tracing shaders on Vulkan (and in a shader record "buffer" for the ray-tracing case).
The majority of the actual work was in adding support for root/push constants to the test framework and the graphics API abstraction it uses. To be clear about that support:
* Root constant ranges are (perhaps confusingly) treated as a new kind of "slot" that can appear on a descriptor set. This choice ensures that the implicit numbering of registers/spaces used by the back-ends can account for these ranges correctly.
* The `TEST_INPUT` lines are extended to allow a `root_constants` case that behaves more or less like `cbuffer`
* The CPU and CUDA paths can treat a `root_constants` input identically to a `cbuffer`. They already allocate the actual buffers based on reflection, and just use `cbuffer` as a directive that causes bytes to be copied in.
* On D3D12 and Vulkan, a descriptor set allocates a `List<char>` to hold the bytes of root constant data assigned into it, and these bytes are flushed to the command list when the table is actually bound (usually right before rendering).
* On D3D11, a descriptor set treats a root constant range more or less like a constant buffer range (with a single buffer), except that it also automatically allocates a buffer to hold the data. Assigning "root constant" data automatically copies it into that buffer.
The small number of tests that used entry-point `uniform` parameters of ordinary types were updated to use the new `root_constant` input type, and the bugs that surfaced were fixed.
A new test to confirm that entry-point `uniform` parameters map to the shader record for VK ray tracing was added.
An important but technically unrelated change is the removal of the `DescriptorSetImpl::Binding` type and related function from the Vulkan implementation of `Renderer`. That type was created to ensure that objects that are bound into a descriptor set don't get released while the descriptor set is still alive, but the implementation relied on a complicated linear search to check for existing bindings, which could create a performance issue for descriptor sets that include large arrays of descriptors. The new implementation makes use of the approach already present in the various `Renderer` implementations (including the Vulkan one) for assigning ranges in a descriptor set a flat/linear index for where their pertinent data is to be bound. As a result, the Vulkan `DescriptorSetImpl` now uses a single flat array of `RefPtr`s to track bound objects, and has no need for linear search when binding.
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
* Run SSA pass to clean up generic temporary variables during lowering.
* Fix `undefined` emitting logic.
* revert dumpir control flag
* Defer fold decision of `undefined` values after special case logic for GLSL and HLSL.
* Update expected test result.
* Manually update raygen.slang.glsl to minimize change.
* fix formatting
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|
|
buffer (#936)
|
|
* Output readonly on buffers for glsl if resource is readonly.
Didn't add to emitGLSLParameterGroup because the cases there seem to to either be implicitly read only, or allow write.
* * Improve comments around use of 'readonly' on glsl output
* Use readonly with shaderRecord
* Add comment pointing out shader record can be rw on vk, so might require changes in the future.
|
|
* Refactor several IR passes
This change takes some IR passes that lived together in `ir.cpp` and moves them into their own files to improve clarity.
In most cases these were passes introduced early in the life of the IR, so that it didn't seem like a big deal to have them all in one file, but now that `ir.cpp` has grown unwieldly this seems like an important cleanup to make.
To give a quick rundown of the passes involved:
* The IR "linking" step has been pulled out to `ir-link.{h,cpp}`. This code for this pass is pretty much identical to what was in `ir.cpp`, and no attempt has been made to clean up or refactor it in the current change.
* The GLSL legalization step has been pulled out to `ir-glsl-legalize.{h,cpp}`. This used to be invoked directly from the linking step, but has been made a new top-level pass invoked from `emit.cpp`. Just like with the linking, the code in the new file is just a copy-paste of what was in `ir.cpp`, and no attempt at cleanup has been made. Also note that it might be a good idea to move this pass later in the overall sequence, but this PR doesn't attempt to do that as it could change results.
* The generic specialization step has been pulled out to `ir-specialize.{h,cpp}`. The file name does not explicitly reference *generic* specialization because I anticipate this pass having to perform other kinds of specialization as well. The code in this case amounts to a heavy cleanup/refactoring pass and thus deserves careful scrutiny. The reason for the cleanup is that the generic specialization step used to be part of the "linking" step long ago, and continued to share infrastructure with it long after that stopped making sense. The newly cleaned up pass has much simpler logic that should be easy enough to follow from the comments.
* In order to reduce code dulication, the IR "cloning" part of the `ir-specialize-resources.{h,cpp}` pass was pulled into its own files (`ir-clone.{h,cpp}`) that both the generic specialization step and the resource-based specialization step now share.
The remaining changes then pertain to deleting a bunch of code out of `ir.cpp` and adding the new files to the build.
The only test that needed updating was `vkray/raygen`, where some subtle ordering change in the refactored generic specialization logic has lead to the relative order of the specialized `TraceRay` and `saturate` functions beind reversed.
* fixup: typo in assert
* fixup: typos in comments
|
|
Before this change, global shader parameters were represented in the IR as just being ordinary global variables.
The only indication that a particular global represented a parameter was when it got a layotu attached to it (as part of back-end processing), and we've had a number of bugs related to layouts being dropped so that what should have been a shader parameter turned into an ordinary global variable in the output.
This change is more strongly motivated by the fact that making shader parameters look like globals means that we cannot easily reason about their value when doing IR transformations.
If we see two `load`s from the same global variable can we assume they yield the same value?
In the general case we cannot, and this means that any transformation that wants to rely on the fact that an input `Texture2D` shader parameter can't actually change over the life of the program needs to do extra work.
The fix here is to introduce a new kind of IR instruction that represents a global shader parameter directly (not a pointer to it as a global would), at which point there isn't even such a notion as a "load" from the parameter, since it represents the value directly.
In several cases logic that used to apply to global variables in case they were shader parameters (by looking for a layout) is now moved to apply to these global parameters.
The biggest source of issues in this change was that switching from pointers to plain values to represent these shader parameters stresses different cases in type legalization. I also had to deal with the case of legalization for GLSL where we actually *do* need global shader parameters that are writable (since varying output goes in the global scope), but in that case I borrowed the use of pointer-like `Out<...>` and `InOut<...>` types to represent that intent, which we were already using for function parameters representing outputs.
A few tests started failing because the changes lead to a slightly different order of code emission, which in some HLSL tests resulted in a function parameter named `s` getting emitted before a global parameter named `s`, leading to the latter getting the name `s_1` instead of `s_0`.
A few SPIR-V tests started failing because the new approach means that we no longer end up performing a load from all varying input parameters at the start of `main` and instead reference the varying inputs directly. The resulting code is more idomatic, but it differed from the baselines for those tests.
|
|
* Change how buffers are emitted
This is a change with a lot of pieces, which can't always be separated out cleanly. I'm going to walk through them in what I hope is a logical order.
The main goal of this change was to allow arrays of structured buffers to translate to Vulkan. Consider two declarations of structured buffers in HLSL/Slang:
```hlsl
StructuredBuffer<X> single;
StructuredBuffer<Y> multiple[10];
```
The current translation logic was handling `single` by translating it into an *unnamed* GLSL `buffer` block like:
```glsl
layout(std430)
buffer _S1
{
X single[];
};
```
That syntax allows an expression like `single[i]` in Slang to be translated simply as `single[i]` in GLSL.
But that naive translating doesn't work for `multiple`, since we need to declare a array of blocks in GLSL, which requires giving the whole thing a name:
```glsl
layout(std430)
buffer _S2
{
Y _data[];
} multiple[10];
```
Now a reference to `multiple[i][j]` in Slang needs to become `multiple[i]._data[j]` in GLSL.
To avoid having way too many special cases around single structured buffers vs. arrays, it makes sense to allows emit things in the latter form, so that we instead lower `single` as:
```glsl
layout(std430)
buffer _S1
{
X _data[];
} single;
```
So that now a reference to `single[i]` becomes `single._data[i]` in GLSL.
Most of that can be handled in the standard library translation of the structured buffer indexing operations.
The only wrinkle there is that there were some *old* special-case instructions in the IR intended to handle buffer load/store operations (these were added back when I was trying to keep the "VM" path working). These aren't really needed to have structured-buffer operations work; they can be handled as ordinary functions as far as the stdlib is concerned. I removed the old instructions.
Along the way, it became clear that a few other cases follow the same pattern. Byte-addressed buffers are an obvious case. We were lowering HLSL/Slang:
```hlsl
ByteAddressBuffer b;
...
uint x = b.Load(0);
```
to GLSL like:
```glsl
layout(std430)
buffer _S1
{
uint b[];
};
...
uint x = b[0];
```
That logic would fail for arrays the same way that the structured buffer case was failing. The fix is the same: use named `buffer` blocks and then introduce an explicit `_data` field:
```glsl
layout(std430)
buffer _S1
{
uint _data[];
} b;
...
uint x = b._data[0];
```
Just like with structured buffers, all of the VK translation for operations on byte-addressed buffers can be implemented directly in teh stdlib, so once the emit logic was changed it was just a matter of adding `._data` to a bunch of VK tranlsations.
It turns out that arrays of constant buffers have more or less the same problem, and furthermore we have some problems with any code that directly uses the modern HLSL `ConstantBuffer<T>` type.
Note: the emit logic around constant buffers sometimes refers to "parameter groups" because that is being used in the compiler as a catch-all term for constant buffers, texture buffers, and parameter blocks.
The existing code was going out of its way to reproduce the way that constant buffer declarations are implicitly referenced in HLSL:
```hlsl
cbuffer C { float f; }
...
float tmp = f; // No reference to `C` here
```
This can be seen in the emit logic with the `isDerefBaseImplicit` function, which is used to take the internal IR representation for a reference to `f` (which is closer to the expression `(*C).f` or `C->f`) and leave off any reference to `C` so that we emit just `f`.
That kind of logic just flat out doesn't work in some important cases. Arrays of constant buffers are a clear one:
```hlsl
ConstantBuffer<X> cbArray[3];
...
X x = cbArray[0];
```
There is no way to translate that to an ordinary `cbuffer` declaration at all. The same problem can be created without arrays, though:
```hlsl
ConstantBuffer<X> singleCB;
...
X x = singleCB;
```
The current strategy for translating constant buffers was translating `singleCB` into a `cbuffer` declaration that reproduced the fields of `X` as its members, which just wouldn't work:
```hlsl
cbuffer singleCB
{
float f; // field of `X`
}
...
X x = singleCB; // ERROR: there is nothing named `singleCB` in this HLSL
```
The new strategy is more consistent. We still generate a `cbuffer` declaration for a single constant buffer, but we always give it a single field of the chosen element type:
```hlsl
cbuffer singleCB
{
X singleCB;
}
...
X x = singleCB; // this works fine!
```
And in the array case we generate code that uses the explicit `ConstantBuffer<T>` type:
```hlsl
ConstantBuffer<X> cbArray[3];
...
X x = cbArray[0];
```
The GLSL output is more complicated because unlike with HLSL there is no implicit conversion from a uniform block to its element type (there is no notion of an element type). The array case thus needs a `_data` field similar to what we do for structured buffers:
```glsl
layout(std140)
uniform _S3
{
X _data;
} cbArray[3];
...
X x = cbArray[0]._data;
```
And then the non-array case needs to have a similar `_data` field for consistency:
```glsl
layout(std140)
uniform _S1
{
X _data;
} singleCB;
...
X x = singleCB._data;
```
This is handled by inserting the necessary reference to `_data` whenever we dereference a constant buffer, either as part of a load instruction (loading from the whole CB as a pointer), or an `IRFieldAddress` instruction which forms a pointer into the CB (e.g., `&(singleCB->f)` becomes `singleCB._data.f`).
The current emit logic handles `ParameterBlock<X>` differently from `ConstantBuffer<X>`, but really only to allow parameter blocks to be explicitly named in the output, while constant buffers were left implicit by default. Thus the only difference was a legacy one (from back when trying to exactly reproduce the HLSL text we got as input was considered an important goal), and the new approach to emitting constant buffers would get rid of it.
I removed the separate logic for emitting `ParameterBlock<X>` and just let the handling for constant buffers deal with it.
Note that any resource types inside of a `ParameterBlock<X>` would have been moved out as part of legalization, so that a parameter block is 100% equivalent to a constant buffer when it comes time to emit code.
Unsurprisingly, changing the way we generate HLSL and GLSL output for all these buffer types meant that any tests that were directly comparing the output of `slangc` against `fxc`, `dxc`, or `glslang` broke.
The basic approach to fixing the breakage in GLSL tests was to update the GLSL baseline to reflect the new output startegy. In some cases I used macros to name the various `_S<digits>` temporaries so that future renaming will hopefully be easier (it would be great if we auto-generated temporary names with a bit more context). There was one GLSL test (`tests/bugs/vk-structured-buffer-binding`) that was using raw GLSL expected output, and this was changed to use a GLSL baseline to generate SPIR-V for comparison.
For HLSL tests we were sometimes running the same input file through `slangc` and `fxc`/`dxc`, and in these cases I macro-ized the various `cbuffer` declarations to generate different declarations depending on the compiler.
I completely dropped the tests coming from the D3D SDK because they aren't providing much coverage, and updating them would change them so far from the original code that the purported benefit (using a body of existing shaders) would be lost.
I also dropped the explicit matrix layout qualifiers in the `matrix-layout` test because the new output strategy breaks those for GLSL (you can't put matrix layout qualifiers on `struct` fields, and now the body of every constant buffer is inside a `struct`). This isn't as big of a loss as it seems, because our handling of those qualifiers wasn't really right to begin with. Slang users should only be setting the matrix layout mode globally (and we should probably switch to error out on the explicit qualifiers for now).
The other thing that got dropped is tests involving `packoffset` modifiers.
Slang already warns that it doesn't support these, and the way they were used in the test cases is actually misleading. For the binding/layout-related tests, the goal was to show that Slang reproduces the same layout as fxc, in which case explicitly enforcing a layout via `packoffset` seems like cheating (are we sure we enforced the layout fxc would have produced?). The real reason was that Slang used to emit explicit `packoffset` on *every* field of a `cbuffer` it would output, because of an `fxc` bug where you couldn't use `register` on textures/samplers declared inside a `cbuffer` unless *every* field in the `cbuffer` used a `register` or `packoffset` modifier. Slang hasn't required that behavior in a while because it now splits textures and samplers, and the one test case where we needed `packoffset` to work around the `fxc` bug in the baseline HLSL has been macro-ified even more to work around the bug.
The amount of churn in the test cases is unfortunate, but it continues to point at the weakness of any testing strategy that checks for exact equivalent between Slang's output and that of other compilers. We need to keep working to replace these tests with better alternatives.
In `check.cpp` there is logic to perform implicit dereferencing, so that if you write `obj.f` where `obj` is a `ConstantBuffer<X>` (or some other "pointer-like" type) and `f` is a field in `X`, then this effectively translates as `(*obj).f`. That is, we dereference the value of type `ConstantBuffer<X>` to get a value of type `X`, and then refer to the field of the `X` value.
There was a problem where the logic to insert that kind of implicit dereference operation was using a reference (`auto& type = ...`) for the type of the expression being dereferenced, and then clobbering it. This would mean that an expression of type `ConstantBuffer<X>` would have its type overwritten to be just `X` and then codegen would break later on.
I'm not sure how we haven't run into that before.
The `array-of-buffers` test case was added to confirm that we now support arrays of constant, structured, and byte-address buffers for both DXIL and SPIR-V output.
Okay, so that was a lot of stuff, but hopefully it is clear how this all works to make the output of the compiler more consistent and explicit, while also supporting the required new functionality.
* fixup: review feedback
|
|
The syntax for this is a placeholder for now, since we will probably want to migrate to whatever gets decided on for dxc. To declare that some data should be part of the "shader record" use `layout(shaderRecordNV)` to mirror the GLSL raytracing extension:
```hlsl
layout(shaderRecordNV)
cbuffer MyShaderRecord
{
float4 someColor;
uint someValue;
}
```
The intention (not enforced) is that an application would map `MyShaderRecord` to "root constants" in the "local root signature" when compiling for DXR, while the output code in GLSL will always map to the shader record in Vulkan raytracing:
```glsl
layout(shaderRecordNV)
buffer MyShaderRecord
{
float4 someColor;
uint someValue;
};
```
This change does *not* support declaring a global value of `struct` type with `layout(shaderRecordNV)` (or a `ParameterBlock` with the modifiers, although that would be a nice-to-have feature) and it does *not* support having the contents of the shader record be mutable (even if GLSL/Vulkan allows it). Those can/should be added in future changes.
In terms of implementation, this closely mirrors the way that `layout(push_constant)` buffers were being handled, where the data inside the `ConstantBuffer<X>` (the value of type `X`) gets laid out using ordinary rules (and consuming ordinary `UNIFORM` storage, while the buffer itself is given a different layout resource to reflect that fact that it does not consume a VK `binding` any more, but a different conceptual resource.
Note: an alternative design here (that might actually be preferrable) would be to have both push-constant and shader-record buffers be handled as alternative aliases for `ConstantBuffer` (or maybe `ParameterBlock`) so that you have, e.g.:
```hlsl
PushConstantBuffer<X> myPushConstants;
ShaderRecord<Y> myShaderRecord;
```
This alternative design avoids API-specific decorations on the declarations, and reflects the intent of the programmer very directly, even when they are compiling for a target like D3D that doesn't reflect these choices at the IL level (it could still be exposed through the Slang reflection API).
|
|
* Fix output of binding of structured buffer on GLSL.
* Added test to check vk binding is coming thru.
* Fix closethit binding inconsistency.
|
|
* Add callable shader support for Vulkan ray tracing
This change extends the previous work to update Vulkan ray tracing support for the finished `GL_NV_ray_tracing` spec.
One of the features missing in the experimental extension that was added to the final spec is "callable shaders," which allow ray tracing shaders to call other shaders as general-purpose subroutines.
Most of the implementation work here mirrors what was done for the `TraceRay()` function to map it to `traceNV()`.
We map the generic `CallShader<P>` function to the non-generic `executeCallableNV`, with a payload identifier that indicates a specific global variable of type `P` (the global variable being generated from a `static` local in `CallShader`). A new modifier is added to identify the payload structure, and the parameter binding/layout logic introduces a new resource kind for callable-shader payload data (where previously the logic had assumed ray and callable payloads should use the same resource kind).
Two test shaders are included: one for the callable shader (`callable.slang`) and one for a ray generation shader that calls it (`callable-caller.slang`). Just for kicks, the payload data type is defined in a shared file so that we can be sure the two agree (trying to emulate what might be good practice, and ensure that ray tracing support works together with other Slang mechanisms).
* Typo fix: assocaited->associated
One instance was found in review, but I went ahead and fixed a bunch since I seem to make this typo a lot.
* Typo fix: defintiion->definition
|
|
* Update version of glslang used
* Update VK raytracing support for final extension spec
A lot of this change is just plain renaming: The `NVX` suffixes become just `NV`, and the extension name changes from `GL_NVX_raytracing` to `GL_NV_ray_tracing`.
The Slang standard library and the GLSL baselines for the tests are consistently updated.
The other detail is that the final spec requires the "payload" identifier in a `traceNV()` call to be a compile-time constant, which means it cannot be defined as a local variable first, as in:
```glsl
int payloadID = 0;
traceNV(..., payloadID); // ERROR
```
In terms of how the original support was implemented, the payload ID is being computed via a special builtin function that maps each global GLSL payload variable to a unique ID. There are a few ways we could try to resolve the problem here:
1. We could aspire to put our equivalent of the `constexpr` modifier on the output of the function, so that the GLSL variable gets declared `const` and thus fits the GLSL rules for a constant expression.
2. We could introduce a pass to replace the payload-location instructions with literal integers.
3. We could use a special-purpose instruction instead of a builtin function call, and have that instruction indicate that it doesn't have side effects (so it can be folded into the call site)
4. We could somehow mark the builtin function as not having side effects.
We choose option (4) simply because it provides a feature that could have other applications. This change adds a `[__readNone]` attribute that can be applied to function declarations to express a promise on the part of the programmer that the given function has no side effects and computes its result strictly from the bits of its input arguments (and not things they point to, etc.). This mirrors an equivalent function attribute in LLVM.
We mark the function that computes a ray payload location with this attribute, and propagate the attribute through the layers of the IR, so that when the emit logic asks if an operation has side effects (to see if it can be folded into the arguments of a subsequent expression), we get an affirmative response.
This change should get all of the features that were present in the experiemntal `NVX` extension working with the final extension spec. It does not address callable shaders, which will come as a subsequent change.
|
|
* Move to newer glslang
* Support cross-compilation of ray tracing shaders to Vulkan
This change allows HLSL shaders authored for DirectX Raytracing (DXR) to be cross-compiled to run with the experimental `GL_NVX_raytracing` extension (aka "VKRay").
* The GLSL extension spec is marked as experimental, so that any shaders written using this support should be ready for breaking changes when the spec is finalized.
* "Callable shaders" are not exposed throug the GLSL extension, so this feature of DXR will not be cross-compiled.
* The experimental Vulkan raytracing extension does not have an equivalent to DXR's "local root signature" concept. This does not visibly impact shader translation (because the local/global root signature mapping is handled outside of the HLSL code), but in practice it means that applications which rely on local root signatures on their DXR path will not be able to use the translation in this change as-is; more work will be needed.
The simplest part of the implementation was to go into the Slang standard library and start adding GLSL translations for the various DXR operations.
In some cases, like mapping `IgnoreHit()` to `ignoreIntersectionNVX()` this is almost trivial.
The various functions to query system-provided values (e.g., `RayTMin()`) were also easy, with the only gotcha being that they map to variables rather than function calls in GLSL, and our handling of `__target_intrinsic` assumes that a bare identifier represents a replacement function name, and not a full expression, so we have to wrap these definitions in parentheses.
The tricky operations are then `TraceRay<P>()` and `ReportHit<A>()`, because these two are generics/templates in HLSL.
GLSL doesn't support generics, even for "standard library" functions, so the raytracing extension implements a slightly complex workaround: the matching operations `traceNVX()` and `reportIntersectionNVX()` pass the payload/attributes argument data via a global variable.
That is, shader code for the GLSL extensions writes to the global variable and then calls the intrinsic function.
The linkage between the call site and the global is established by a modifier keyword (`rayPayloadNVX` and `hitAttributeNVX`, respectively) and in the case of ray payload also uses `location` number to identify which payload global to use (since a single shader can trace rays with multiple payload types).
Our translation strategy in Slang tries to leverage standard language mechanisms instead of special-case logic.
For example, to translate the `ReportHit<A>()` function, we provide both a default declaration that will work for HLSL (where the operation is built-in with the signature given), and a *definition* marked with the `__specialized_for_target(glsl)` modifier.
The GLSL definition declares a function `static` variable that will fill the role of the required global, and then does what the GLSL spec requires: assigns to the global, and then calls the `reportIntersectionNVX` builtin (which we declare as a separate builtin).
Our ordinary lowering process will turn that `static` variable into an ordinary global in the IR, and the `[__vulkanHitAttributes]` attribute on the variable will be emitted as `hitAttributeNVX` in the output.
There is no additional cross-compilation logic in Slang specific to `ReportHit<A>()` - the target-specific definition in the standard library Just Works.
The case for `TraceRay<P>()` is a bit more complicated, simply because the GLSL `traceNVX()` function needs to be passed the `location` for the payload global.
We implement the payload global as a function-`static` variable, with the knowledge that every unique specialization of `TraceRay<P>()` will generate a unique global variable of type `P` to implement our function-`static` variable.
We then add a slightly magical builtin function `__rayPayloadLocation()` that can map such a variable to its generated `location`; the logic for this is implemented in `emit.cpp` and described below.
We also changed the `RayDesc` and `BuiltinTriangleIntersectionAttributes` types from "magic" intrinsic types over to ordinary types (because the GLSL output needs to declare them as ordinary `struct` types).
This ends up removing some cases in the AST and IR type representations.
By itself this change would break HLSL emit, because in that case the types really are intrinsic.
We added a `__target_intrinsic` modifier to these types to make them intrinsic for HLSL, and then updated the downstream passes to handle the notion of target-intrinsic types.
The logic for binding/layout of entry point inputs and outputs was updated so that raytracing stages don't follow the default logic for varying input/output parameters.
This is because the input/output parameters of a raytracing entry point aren't really "varying" in the same sense as those in the rasterization pipeline.
In particular, the SPIR-V model for raytracing input and output treats "ray payload" and "hit attributes" parameters as being in a distinct storage class from `in` or `out` parameters.
We also detect cases where a ray tracing stage declares inputs/outputs that it shouldn't have. This logic could conceivably be extended to other stages (e.g., to give an error on a compute shader with user-defined varying input/output).
The type layout logic added cases for handling raytracing payload and hit-attribute data, but this is currently just a stub implementation that follows the same logic as for varying `in` and `out` parameters (it cannot give meaningful byte sizes/offsets right now).
To my knowledge the GLSL spec doesn't currently specify anything about layout, and I haven't read the DXR spec language carefully enough to know what it says about layout.
A future change should update the layout logic to allow for byte-based layout of ray payloads, etc. so that we can query this information via reflection.
The GLSL legalization logic in `ir.cpp` was updated to factor out the per-entry-point-parameter code into its own function, and then that function was updated to special-case the input/output of a ray-tracing shader.
While for rasterization stages we typically want to take the user-declared input/output and "scalarize" it for use in GLSL (in part to deal with language limitations, and in part to tease system values apart from user-defined input/output), the GLSL spec for raytracing requires payload and hit attribute parameters to be declared as single variables. There is also the issue that even for an `in out` parameter, a ray payload parameter should only turn into a single global, whereas the handling for varying `in out` parameters generates both an `in` and an `out` global for the GLSL case.
Other than the handling of entry point parameters, the GLSL legalization pass doesn't need to do anything special for ray tracing shaders.
The trickiest change in the `emit.cpp` logic is that we now generate `location`s for ray payload arguments (the outgoing from a `TraceRay()` call) on demand during code generation.
This is a bit hacky, and it would be nice to handle it as a separate pass on the IR rather than clutter up the emit logic, but this approach was expedient.
Basically, any of the global variables that got generated from the `static` declarations in the standard library implementation of `TraceRay()` will trigger the logic to assign them a `location`.
The logic for emitting intrinsic operations added a few new `$`-based escape sequences. The `$XP` case handles emitting the location of a generated ray payload variable; this is how we emit the matching location at the site where we call `traceNVX`. The `$XT` case emits the appropriate translation for `RayTCurrent()` in HLSL, because it maps to something different depending on the target stage.
All of the test cases here consist of a pair of an HLSL/Slang shader written to the DXR spec, plus a matching GLSL shader for a baseline.
The GLSL shaders are carefully designed so that when fed into glslang they will produce the same SPIR-V as our cross-compilation process.
This kind of testing is quite fragile, but it seems to be the best we can do until our testing framework code supports *both* DXR and VKRay.
A bunch of the core changes ended up being blocked on issues in the rest of the compiler, so some additional features go implemented or fixed along the way:
The first big wall this work ran into was that the `__specialized_for_target` modifier hasn't actually been working correctly for a while.
It turns out that for the one function that is using it, `saturate()`, we have been outputting the workaround GLSL function in *all* cases (including for HLSL output) rather than only on GLSL targets.
The problem here is that for a generic function with a `__specialized_for_target` modifier or a `__target_intrinsic` modifier, the IR-level decoration will end up attached to the `IRFunc` instruction nested in the `IRGeneric`, but the logic for comparing IR declarations to see which is more specialized (via `getTargetSpecializationLevel()`) was looking only at decorations on the top-level value (the generic).
The quick (hacky) fix here is to make `getTargetSpecializationLevel()` try to look at the return value of a generic rather than the generic itself, so that it can see the decorations that indicate target-specific functions.
A more refined fix would be to attach target-specificity decorations to the outer-most generic (to simplify the "linking" logic).
The only reason not to fold that into the current fix is that the `__target_intrinsic` modifier currently serves double-duty as a marker of target specialization *and* information to drive emit logic. The latter (the emit-related stuff) currently needs to live on the `IRFunc`, and moving it to the generic could easily break a lot of code.
This needs more work in a follow-on fix, but for now target specialization should again be working.
The other big gotcha that the simple "just use the standard library" strategy ran into was that function-`static` variables weren't actually implemented yet, and in particular function-`static` variables inside of generic functions required some careful coding.
The logic in `lower-to-ir.cpp` has this `emitOuterGenerics()` function that is supposed to take a declaration that might be nested inside of zero or more levels of AST generics, and emit corresponding IR generics for all those levels.
This is needed because two different AST functions nested inside a single generic `struct` declaration should turn into distinct `IRFunc`s nested in distinct `IRGeneric`s.
The tricky bit to making that all work is that the same AST-level generic type parameter will then map to *different* IR-level instructions (the parameters of distinct `IRGeneric`s) when lowering each function.
The existing logic handled this in an idiomatic way by making "sub-builders" and "sub-contexts."
This change refactors some of the repeated logic into a `NestedContext` type to help simplify the pattern, and applies it consistently throughout the `lower-to-ir.cpp` file.
Besides that cleanup, the major change is `lowerFunctionStaticVarDecl` which, unsurprisingly, handles lower of function-`static` variables to IR globals.
The careful handling of nested contexts here is needed because if we are in the middle of lowering a generic function, then a `static` variable should turn into its *own* `IRGeneric` wrapping an `IRGlobalVar`. The body of the function should refer to the global variable by specializing the global variable's `IRGeneric` to the parameters of the *functions* `IRGeneric`. This tricky detail is handled by `defaultSpecializeOuterGenerics`.
An additional subtlety not actually required for this raytracing work (and thus not properly tested right now) is handling function-`static` variables with initializers.
These can't just be lowered to globals with initializers, because HLSL follows the C rule that function-`static` variables are initialized when the declaration statement is first executed (and this could be visible in the presence of side-effects).
The lowering strategy here translates any `static` variable with an initializer into *two* globals: one for the actual storage, plus a second `bool` variable to track whether it has been initialized yet.
There are some opportunities to optimize this case, especially for `static const` data, but that will need to wait for future changes.
We've slowly been shifting away from the model where a user thinks of a "profile" as including both a stage and a feature level.
Instead, the user should think about selecting a profile that only describes a feature level (e.g., `sm_6_1`, `glsl_450`, etc.), and then separately specifying a stage (`vertex`, `raygeneration, etc.) for each entry point.
The challenge here is that the command-line processing still only had a single `-profile` switch, and no way to specify the stage.
Adding the `-stage` option was relatively easy, but making it work with the existing validation logic for command-line arguments was tricky, because of the complex model that `slangc` supports for compiling multiple entry points in a single pass.
* In `slang.h` add new reflection parameter categories for ray payloads and hit attributes, as part of entry point input/output signatures.
* A previous change already updated our copy of glslang to one that supports the `GL_NVX_raytracing` extension, so in `slang-glslang.cpp` we just needed to map Slang's `enum` values for the raytracing stage names to their equivalents in the glslang code.
* Moved the logic for looking up a stage by name (`findStageByName()`) out of `check.cpp` and into `compiler.cpp`, with a declaration in `profile.h`
* Added a `$z` suffix to the GLSL translation of `Texture*.SampleLevel()`, to handle cases where the texture element type is not a 4-component vector. Note that this fix should actually be applied to *all* these texture-sampling operations, but I didn't want to add a bunch of changes that are (clearly) not being tested right now.
* The layout logic for entry points was updated to correctly skip producing a `TypeLayout` for an entry point result of type `void`, which meant that the related emit logic now needs to guard against a null value for the result layout.
* In `ir.cpp`, dump decorations on every instruction instead of just selected ones, so that our IR dump output is more complete.
* Added a command-line `-line-directive-mode` option so that we can easily turn off `#line` directives in the output when debugging. Not all cases where plumbed through because the `none` case is realistically the most important.
* Parser was fixed to properly initialize parent links for "scope" declarations used for statements, so that we can walk backwards from a function-scope variable (including a `static`) and see the outer function/generics/etc.
* Added GLSL 460 profile, since it is required for ray tracing. Also updated the logic for computing the "effective" profile to use to recognize that GLSL raytracing stages require GLSL 460.
* Added some conventional ray-tracing shader suffixes to the handling in `slang-test`. This code isn't actually used, but was relevant when I started by copy-pasting some existing VKRay shaders as the starting point for my testing.
* Fixup: typos
|