| Age | Commit message | Author |
|
* Compile append and consume structured buffers to glsl.
* Fix.
* Update CI config.
---------
Co-authored-by: Yong He <yhe@nvidia.com>
|
|
* Fixes for Shader Execution Reordering on VK
There are some mismatches between the way that hit objects are
handled between the current NVAPI/HLSL and proposed GLSL extensions
for shader execution reordering. These mismatches create complications
for generating valid GLSL/SPIR-V code from input Slang.
Many of the problems that apply to `HitObject` also apply to the
existing `RayQuery<>` type used for "inline" ray tracing.
In the case of `RayQuery<>` we have that for *both* HLSL and
GLSL/SPIR-V:
* A `RayQuery` (or `rayQueryEXT`) is an opaque handle to underlying
mutable storage
* The storage that backs a `RayQuery` is allocated as part of the
"default constructor" for a local variable declared with type
`RayQuery`.
* The `RayQuery` API provides numerous operations that mutate the
storage referred to by the opaque handle.
The key difference between HLSL and GLSL/SPIR-V for the case of a
`RayQuery` amounts to:
* In HLSL, local variables of type `RayQuery` can be assigned to,
and assignment has by-reference semantics. It is possible to create
multiple aliased handles to the same underlying storage.
* In GLSL/SPIR-V, local variables of type `rayQueryEXT` cannot be
assigned to, returned from functions, etc. It is impossible to
create multiple aliased handles to the same underlying storage.
The case for `HitObject`s is significantly *more* messy, because:
* In NVAPI/HLSL a `HitObject` is effectively a "value type" in that
it only exposes constructors, and there is no way to mutate the
state of a `HitObject` other than by assignment to a variable of that
type. It makes no semantic difference whether a `HitObject` directly
stores the value(s), or if it is a handle, since there is no way
to introduce aliasing of mutable state. Assignment of `HitObject`s
semantically creates a copy.
* In GLSL/SPIR-V, a `hitObjectNV` is, like a `rayQueryEXT`, a handle
to underlying mutable state. These handles cannot be assigned,
returned from functions, etc. There is no way to make a copy of
a hit object.
This change includes several changes to how *both* `RayQuery<>` and
`HitObject` are implemented, with the intention of getting more cases
to work correctly when compiling for GLSL/SPIR-V, and to set up a
clearer mental model for the semantics we want to give to these
types in Slang, and how those semantics can/should map to our targets.
An overview of important changes:
* Marked a few operations on `RayQuery` as `[mutating]` that
realistically should have already been that way.
* Marked the `HitObject` type as being non-copyable (an attribute we
do not currently enforce), and marked the various GLSL operations that
construct a hit object as having an `out` parameter of the `HitObject`
type (even if they are nominally specified in GLSL as not writing
to the corresponding parameter).
* Added a distinct IR opcode (`allocateOpaqueHandle`) to represent the
implicit allocation that happens when declaring a variable of type
`HitObject` or `RayQuery`, and made the "implicit constructor" for
those types map to the new op. This operation took a lot of tweaking
to get emitting in a reasonable way, and I'm still not 100% sure that
all of the emission-related logic for it is strictly required
(or correct).
* Added new IR instructions for `HitObject` and `RayQuery` types, and
made the stdlib types map to those IR instructions.
* Treat `HitObject` and `RayQuery` as resource types for the purpose
of our existing pass that specializes calls to functions that have
outputs of resource type
* Added a new test case that includes a function that returns a
`HitObject` as its result.
* Many test cases saw slight changes in their output (especially around
the relative ordering of declarations of `HitObject`s and `RayQuery`s
with other instructions)
* Remove debugging logic
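A minimal C++ sketch of the handle model described above, assuming a GLSL-like target where a hit object cannot be copied or returned. `HitObjectHandle`, `makeHit`, and `traceAndQuery` are hypothetical names for illustration, not Slang's actual types or generated code. It shows how a function that nominally returns a hit object can be reshaped to write into storage owned by the caller:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for an opaque, non-copyable handle.
struct HitObjectHandle
{
    int hitKind = -1;                  // -1 models a freshly allocated "miss"
    HitObjectHandle() = default;       // the declaration allocates the storage
    HitObjectHandle(const HitObjectHandle&) = delete;            // no copies
    HitObjectHandle& operator=(const HitObjectHandle&) = delete; // no assignment
};

// Source shape would be `HitObject makeHit(int kind)`, returning by value.
// Here the result is an output parameter, so no copy is ever needed.
void makeHit(int kind, HitObjectHandle& outHit)
{
    outHit.hitKind = kind;
}

int traceAndQuery(int kind)
{
    HitObjectHandle hit;   // models the implicit allocation at the declaration
    makeHit(kind, hit);    // the callee fills in the caller's storage
    return hit.hitKind;
}
```

Because the copy operations are deleted, any code path that tried to alias or duplicate the handle would fail to compile, which mirrors the restriction on the GLSL side.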
|
|
|
|
* Various dxc/fxc compatibility fixes.
* Cleanup.
* Fix test cases.
* Fix comments.
---------
Co-authored-by: Yong He <yhe@nvidia.com>
|
|
* Fix Phi simplification bug.
* Fix up.
* Fix.
* Fix.
* Fix.
* Fix.
* Fix.
* Fix test.
* Fix test.
---------
Co-authored-by: Yong He <yhe@nvidia.com>
|
|
* More control flow and Phi param simplifications.
* Fix.
* Fix gcc error.
* Fix.
* More IR cleanup.
* Fix bug in phi param dce + ifelse simplify.
* Propagate and DCE side-effect-free functions.
* Enhance CFG simplification to remove loops with no side effects.
* Fix.
* Fixes.
* Fix tests. Add [__AlwaysFoldIntoUseSite] for rayPayloadLocation.
* More cleanup.
* Fixes.
* Fix.
---------
Co-authored-by: Yong He <yhe@nvidia.com>
|
|
Co-authored-by: Yong He <yhe@nvidia.com>
|
|
|
|
* Register allocation during phi elimination.
* Enhance the test case.
* Cleanup line breaks in test case.
* Remove unnecessary line break changes.
* More cleanups.
---------
Co-authored-by: Yong He <yhe@nvidia.com>
|
|
* Reimplement address elimination pass.
* Fix error.
* Update test references.
Co-authored-by: Yong He <yhe@nvidia.com>
|
|
|
|
parameter on GLSL. (#2207)
Improved the trace-ray-inline test to check that the flag is not ignored anymore.
|
|
* Various gfx fixes.
* Fix test case.
* Fix crash.
* Trigger build
* Trigger build 2
* Fix vulkan unit tests.
Co-authored-by: Yong He <yhe@nvidia.com>
|
|
* Work to mitigate SPIR-V bloat
SPIR-V is not an especially compact format, but some patterns in how Slang generates code and then runs it through `spirv-opt` lead to many redundant field-by-field copy operations being emitted. This change attempts to address some of the resulting bloat from the Slang side of things.
Note: experimentation shows that the bloat is less pronounced when running either *no* SPIR-V optimizations or *full* SPIR-V optimizations, so it is also likely that the bloat should be addressed by changing which `spirv-opt` passes the Slang compiler runs in default (`-O1`) builds. Such changes should come as a distinct pull request.
This change primarily does two things:
First, the code generation strategy for passing arguments to `out` and `inout` parameters has been changed. In the past, the compiler would *always* copy the argument value into a temporary, then pass the address of the temporary, and then write back the value after the call. The new code generation strategy attempts to identify when an argument value already has a simple address in memory and passes that address directly when possible. This eliminates many copy operations that occur before/after calls to functions with `out`/`inout` parameters.
Second, we introduce an IR optimization pass that detects call sites where the entire contents of a buffer (usually a constant buffer) is being passed to a callee function, such that many bytes are loaded and then passed even if only very few are used in the callee. The pass moves the load operations from the caller to a specialized version of the callee where possible (e.g., when the constant buffer in question is a global shader parameter). Doing this eliminates another major category of copies.
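The effect of the second pass can be sketched in C++ as a before/after pair. This is an illustration only: `Params`, `gParams`, and the function names are hypothetical, and the real pass operates on Slang IR rather than on source code.

```cpp
#include <cassert>

// Hypothetical "large" constant buffer layout.
struct Params
{
    float weights[64];
    float scale;
};

static Params gParams = {}; // models a global shader parameter

// Before: the caller loads the whole buffer and passes it by value,
// so all 65 floats are copied even though only `scale` is read.
float applyScale(Params p, float x) { return p.scale * x; }
float callerBefore(float x) { return applyScale(gParams, x); }

// After: a specialized callee refers to the global directly, so the only
// load that remains is the one field the callee actually uses.
float applyScale_gParams(float x) { return gParams.scale * x; }
float callerAfter(float x) { return applyScale_gParams(x); }
```

Both callers compute the same result; the specialized form simply avoids materializing the full structure at the call site.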
Notes:
* The IR lowering logic is complicated by the fact that several kinds of l-values (values that are usable as the destination of assignment, or for `out`/`inout` arguments) are not actually addressable. An easy example is a non-contiguous swizzle like `v.xwz` on a `float4`, where the value occupies 12 bytes, but not 12 consecutive bytes with a single address. There are many more corner cases like that and the IR lowering pass carries a lot of complexity to deal with them. A more systematic overhaul is due some time soon.
* The IR representation of `out` and `inout` parameters deserves some careful scrutiny when making these kinds of changes. The official semantics of `inout` in HLSL has been "copy in copy out" (and `out` is just "copy out") which is observably different from any solution that passes in the address of an l-value directly. By making this change we are saying that Slang's semantics are not precisely those of legacy HLSL, and that our semantics for `inout` parameters are closer to those of `inout` in Swift or of a mutable borrow in Rust. In the Swift case the implementation can freely pass the underlying storage of an l-value or the address of a temporary, and valid programs may not observe the difference. It is thus illegal to observe the value in a storage location while a mutation to that location is "in flight." All of this is way more detailed and technical than 99% of Slang users will ever care about, but importantly it gives us semantic cover to eliminate these copies in the IR, and also to emit output C++ code that implements `out` and `inout` as by-reference parameter passing.
* There was an existing generic pass for specializing functions based on call sites that uses a "template method" style of pattern to customize its behavior. That pass needed to be generalized to handle this use case because it had previously operated on the assumption that the "desire" to specialize a callee function must be driven by the parameter declarations of that function, and not on the argument values passed in. The code has been slightly refactored to allow the policy for specialization to consider both parameters and arguments.
* Unsurprisingly, a bunch of the GLSL (and thus SPIR-V) generated has changed with this work, so several baseline `.slang.glsl` files needed to be updated.
* This change is incomplete in that it does not address broader cases of buffer loads, including both partial loads from constant buffers (just loading one field, but a field that uses a "large" structure type), and loads from multi-element buffers (a load from a structured buffer where the element type is "large"). The main question in each of those cases is how to define how "large" a structure needs to be before we decide to try and sink loads into callee functions like this. In the worst case, sinking loads in this way may actually create *more* memory traffic (because the same values get loaded in multiple callee functions).
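The observable difference between "copy in copy out" and by-reference passing mentioned in the notes can be sketched in C++. This is a hypothetical illustration (`gValue`, `bump`, and the two callers are made-up names), not code from the compiler. The callee reads a global while a mutation of that same global is "in flight", which is exactly the kind of program the Swift-like rule treats as invalid:

```cpp
#include <cassert>

static int gValue = 0;

// The callee mutates its parameter and then observes the global.
int bump(int& x)
{
    x += 1;
    return gValue;
}

// By-reference passing: the parameter aliases the global's storage,
// so the callee sees its own in-flight mutation.
int callByReference()
{
    gValue = 10;
    return bump(gValue);
}

// Copy-in/copy-out: the callee works on a temporary, and the global
// keeps its old value until the write-back after the call.
int callCopyInCopyOut()
{
    gValue = 10;
    int tmp = gValue;     // copy in
    int seen = bump(tmp); // callee mutates only the temporary
    gValue = tmp;         // copy out
    return seen;
}
```

Both strategies leave the global with the same final value; they differ only in what the callee can observe during the call, which is why ruling out such observations frees the compiler to pick either one.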
* fixup: run premake
* fixup: typo
|
|
* Fix handling of RT acceleration structures for non-RT stages
The recent change that added support for the `GL_EXT_ray_query` extension made it so that a shader that declares a `RaytracingAccelerationStructure` as an input to a non-RT shader stage but then never *uses* it wouldn't enable any RT extension, resulting in a compilation failure in glslang.
This change reverts that behavior so that such shaders enable `GL_EXT_ray_tracing`, since that is the older of the two RT extensions that introduce `accelerationStructureEXT`. It is possible that we will need to revisit this decision based on which of the two extensions ends up being more broadly supported, but I think that right now it is fair to say that there exist drivers that support `GL_EXT_ray_tracing` but not `GL_EXT_ray_query`, so the former is the better default.
* fixup: failing test
|
|
For the most part, this translation is straightforward because the `GL_EXT_ray_query` extension is well aligned with the DXR 1.1 `RayQuery` feature. Many functions map one-to-one from one extension to the other.
A few notable details:
* The equivalent of the `RayQuery<Flags>` type is non-generic in GLSL, and the GLSL path previously didn't have support for trying to look up an intrinsic type name on an IR type declaration, so that required some tweaks to the emit logic.
* All the GLSL functions are free functions instead of member functions, but our IR doesn't recognize that distinction anyway
* The main `TraceRayInline()` call is the one that took the most tweaking, just because it takes a `RayDesc` structure for D3D/HLSL but takes individual vectors and scalars for VK/GLSL. The approach here is a standard one for how we manage this stuff in the stdlib (and I wanted to avoid adding even more `$` magic for intrinsics).
* For several other calls, the HLSL API had distinct `Candidate***()` and `Committed***()` calls that return information about a candidate hit vs. the one committed into the query. In contrast, the GLSL API uses a single call that takes an additional "must be compile-time constant" `bool` parameter to select between the two behaviors. This is even the case for one call that basically returns a value of a different `enum` type depending on the state of that `bool`. The D3D API model here seems almost strictly better and I have no idea why the GLSL extension was defined this way.
* Because both the `GL_EXT_ray_query` and `GL_EXT_ray_tracing` extensions declare the `accelerationStructureEXT` type, we can no longer infer what extension is supposed to be used based only on the presence of such a type. The logic right now is a bit slippery, because in theory a program that declares an acceleration structure but never traces into it could end up getting a compilation error now. We will have to see if that corner case comes up in practice. :(
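The `Candidate***()`/`Committed***()` mapping described above can be sketched in C++, using a template parameter to model the "must be compile-time constant" `bool`. All names here (`RayQueryModel`, `getIntersectionT`, `candidateRayT`, `committedRayT`) are hypothetical and only mimic the shape of the two APIs:

```cpp
#include <cassert>

// Hypothetical query state; the fields are illustrative only.
struct RayQueryModel
{
    float candidateT = 0.0f;
    float committedT = 0.0f;
};

// GLSL-style shape: one function with a compile-time `bool` selector.
template <bool committed>
float getIntersectionT(const RayQueryModel& q)
{
    return committed ? q.committedT : q.candidateT;
}

// HLSL-style shape: distinct entry points that forward with a fixed selector.
float candidateRayT(const RayQueryModel& q) { return getIntersectionT<false>(q); }
float committedRayT(const RayQueryModel& q) { return getIntersectionT<true>(q); }
```

Since the selector is fixed at each HLSL-style entry point, the translation never needs a run-time value for it, which is what makes the one-to-two mapping workable.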
The one big detail that is looming after doing this work is that both the HLSL and GLSL exposures of ray queries are extremely "slippery" about the actual identity of queries (e.g., when is one query a copy of another, vs. just being a new variable that references the existing query). Somehow queries get their identity from the original declaration, and as such our "default constructor" approach to them seems semantically correct, but the whole thing is kind of slippery at a foundational level and I don't know how to fix it with the API as defined. Oh well; just something to keep an eye on.
Co-authored-by: Yong He <yonghe@outlook.com>
|