slang.git - Making it easier to work with shaders

Age	Commit message (Collapse)	Author
2022-08-17	Warning on lossy implicit casts. (#2367)	Yong He
	* Warning on bool to float conversion. * Fix test cases. * Improve. * LanguageServer: don't show constant value for non constant variables. * Fix tests. * Fix warnings in tests. Co-authored-by: Yong He <yhe@nvidia.com>
2022-04-25	Fixed the implementation of RayQuery flags passed through the generic ↵	Alexey Panteleev
	parameter on GLSL. (#2207) Improved the trace-ray-inline test to check that the flag is not ignored anymore.
2022-02-25	Improved SCCP, inlining and resource specialization passes, legalize ↵	Yong He
	`ImageSubscript` for GLSL (#2146)
2022-02-16	Various gfx fixes. (#2132)	Yong He
	* Various gfx fixes. * Fix test case. * Fix crash. * Trigger build * Trigger build 2 * Fix vulkan unit tests. Co-authored-by: Yong He <yhe@nvidia.com>
2021-07-21	Work to mitigate SPIR-V bloat (#1914)	Theresa Foley
	* Work to mitigate SPIR-V bloat SPIR-V is not an especially compact format, but some patterns in how Slang generates code and then runs it through `spirv-opt` lead to many redundant field-by-field copy operations being emitted. This change attempts to address some of the resulting bloat from the Slang side of things. Note: experimentation shows that the bloat is less pronounced when running either no SPIR-V optimizations or full SPIR-V optimizations, so it is also likely that the bloat should be addressed by changing which `spirv-opt` passes the Slang compiler runs in default (`-O1`) builds. Such changes should come as a distinct pull request. This change primarily does two things: First, the code generation strategy for passing arguments to `out` and `inout` parameters has been changed. In the past, the compiler would always copy the argument value into a temporary, then pass the address of the temporary, and then write back the value after the call. The new code generation strategy attempts to identify when an argument value already has a simple address in memory and passes that address directly when possible. This eliminates many copy operations that occur before/after calls to functions with `out`/`inout` parameters. Second, we introduce an IR optimization pass that detects call sites where the entire contents of a buffer (usually a constant buffer) is being passed to a callee function, such that many bytes are loaded and then passed even if only very few are used in the callee. The pass moves the load operations from the caller to a specialized version of the the callee where possible (e.g., when the constant buffer in question is a global shader parameter). Doing this eliminates another major category of copies. Notes: * The IR lowering logic is complicated by the fact that several kinds of l-values (values that are usable as the desitnation of assignment, or for `out`/`inout` arguments) are not actually addressable. An easy example is a non-contiguous swizzle like `v.xwz` on a `float4`, where the value occupies 12 bytes, but not 12 consecutive bytes with a single address. There are many more corner cases like that and the IR lowering pass carries a lot of complexity to deal with them. A more systematic overhaul is due some time soon. * The IR representation of `out` and `inout` parameters deserves some careful scrutiny when making these kinds of changes. The official semantics of `inout` in HLSL has been "copy in copy out" (and `out` is just "copy out") which is observably different from any solution that passes in the address of an l-value directly. By making this change we are saying that Slang's semantics are not precisely those of legacy HLSL, and that our semantics for `inout` parameters are closer to those of `inout` in Swift or of a mutable borrow in Rust. In the Swift case the implementation can freely pass the underlying storage of an l-value or the address of a temporary, and valid programs may not observe the different. It is thus illegal to observe the value in a storage local while a mutation to that location is "in flight." All of this is way more detailed and technical than 99% of Slang users will ever care about, but importantly it gives us semantic cover to eliminate these copies in the IR, and also to emit output C++ code that implements `out` and `inout` as by-reference parameter passing. * There was an exsting generic pass for specializing functions based on call sites that uses a "template method" style of pattern to customize its behavior. That pass needed to be generalized to handle this use case because it had previously operated on the assumption that the "desire" to specialize a callee function must be driven by the parameter declarations of that function, and not on the argument values passed in. The code has been slightly refactored to allow the policy for specialization to consider both parameters and arguments. * Unsurprisingly, a bunch of the GLSL (and thus SPIR-V) generated has changed with this work, so several baseline `.slang.glsl` files needed to be updated. * This change is incomplete in that it does not address broader cases of buffer loads, including both partial loads from constant buffers (just loading one field, but a field that uses a "large" structure type), and loads from multi-element buffers (a lot from a structured buffer where the element type is "large"). The main question in each of those cases is how to define how "large" a structure needs to be before we decide to try and sink loads into callee functions like this. In the worst case, sinking loads in this way may actually create more memory traffic (because the same values get loaded in multiple callee functions). * fixup: run premake * fixup: typo
2021-03-16	Fix the "acceleration structure in compute" bug for GL_NV_ray_tracing too ↵	Tim Foley
	(#1759) A recent change broke code that uses `RayTracingAccelerationStructure` in non-RT shader stages for Vulkan/GLSL when also not doing any ray tracing in the shader code. A recent fix patched that up for code using `GL_EXT_ray_tracing` and/or `GL_EXT_ray_query`, but that fix didn't apply on the path that uses `GL_NV_ray_tracing` via an opt-in. This change fixes that gap and checks in a test for it.
2021-03-15	Fix handling of RT accelerations structures for non-RT stages (#1753)	Tim Foley
	* Fix handling of RT accelerations structures for non-RT stages The recent change that added support for the `GL_EXT_ray_query` extension made is so that a shader that declares a `RaytracingAccelerationStructure` as an input to a non-RT shader stage but then never uses it wouldn't enable any RT extension, resulting in a compilation failure in glslang. This change reverts that behavior so that such shaders enable `GL_EXT_ray_tracing`, since that is the older of the two RT extensions that introduce `accelerationStructureEXT`. It is possible that we will need to revisit this decision based on which of the two extensions ends up being more broadly supported, but I think that right now it is fair to say that there exist drivers that support `GL_EXT_ray_tracing` but not `GL_EXT_ray_query`, so the former is the better default. * fixup: failing test
2021-03-08	Add GLSL support for SV_InnerCoverage (#1740)	Tim Foley
	This was a fairly straightforward addition once I found the correct GLSL extension spec to use.
2021-03-05	Add Vulkan/SPIR-V support for TraceRayInline() (#1737)	Tim Foley
	For the most part, this translation is straightforward because the `GL_EXT_ray_query` extension is well aligned with the DXR 1.1 `RayQuery` feature. Many function map one-to-one from one extension to the other. A few notable details: * The equivalent of the `RayQuery<Flags>` type is non-generic in GLSL, and the GLSL path previously didn't have support for trying to look up an intrinsic type name on an IR type declaration, so that required some tweaks to the emit logic. * All the GLSL functions are free functions instead of member functions, but our IR doesn't recognize that distinction anyway * The main `TraceRayInline()` call is the one that took the most tweaking, just because it takes a `RayDesc` structure for D3D/HLSL but takes individual vector sand scalars for VK/GLSL. The approach here is a standard one for how we manage this stuff in the stdlib (and I wanted to avoid adding even more `$` magic for intrinsics). * For several other calls, the HLSL API had distinct `Candidate*()` and `Committed()` calls that return information about a candidate hit vs. the one committed into the query. In contrast, the GLSL API uses a single call that takes an additional "must be compile-time constant" `bool` parameter to select between the two behaviors. This is even the case for one call that basically returns a value of a different `enum` type depending on the state of that `bool`. The D3D API model here seems almost strictly better and I have no idea why the GLSL extension was defined this way. Because both the `GL_EXT_ray_query` and `GL_EXT_ray_tracing` extensions declare the `accelerationStructureEXT` type, we can no longer infer what extension is supposed to be used based only on the presene of such a type. The logic right now is a bit slippery, because in theory a program that declares an acceleration structure but never traces into it could end up getting a compilation error now. We will have to see if that corner case comes up in practice. :( The one big detail that is looming after doing this work is that both the HLSL and GLSL exposures of ray queries are extremely "slippery" about the actual identity of queries (e.g., when is one query a copy of another, vs. just being a new variable that references the existing query). Somehow queries get their identity from the original declaration, and as such our "default constructor" approach to them seems semanticay correct, but the whole thing is kind of slippery at a foundational level and I don't know how to fix it with the API as defined. Oh well; just something to keep an eye on. Co-authored-by: Yong He <yonghe@outlook.com>
2021-03-03	Add GLSL/SPIR-V support got GetAttributeAtVertex (#1733)	Tim Foley
	This change allows varying fragment shader inputs to be declared in a way that allows the `GetAttributeAtVertex` operation to compile to valid code for both D3D and GLSL/SPIR-V/Vulkan. The key is that rather than just use ordinary `nointerpolation`-qualified inputs the code must declare these varying inputs with a new `pervertex` qualifier that marks them as only being usable with `GetAttributeAtVertex`. The `pervertex`-tagged inputs then translate to GLSL inputs using the `pervertexNV` qualifier Note that this change does not include any enforcement of the requirements around how these qualifiers are used (and the compiler doesn't have enforcement for the existing operations like `EvaluateAttributeAtCentroid`). The underlying problem is that the inerpolation-mode qualifiers and explicit interpolation functions in HLSL constitute a kind of rate-qualified type system, but without any systematic rules. It seems wasteful to encode a bunch of ad hoc rules for this stuff as special cases in the compiler when the clear right answer is to implement a systematic approach to rates.
2021-02-24	Add support for GetAttributeAtVertex for D3D (#1725)	Tim Foley
	This operation was added along with the `SV_Barycentrics` system-value input, and allows for a `nointerpolation` varying input to a fragment shader to be fetched at a specific vertex index within the primitive that is causing the fragment shader to be invoked. This change adds support for the new operations in the standard library, and also includes a test case to make sure that we emit it correctly when producing HLSL/DXIL. This change also includes a small bug fix to our emission logic for function parameters so that we properly emit layout-related attributes for varying parameters declared directly on an entry point. (Note that most attribute end up being declared in `struct` types in existing HLSL shaders, and our IR passes produce only global variables for attributes on GLSL; the only case this affects is inidividual scalar/vector attributes declared declared as entry-point parameters, when outputting HLSL) Note that this change only adds support for the new function on the HLSL/DXIL path, and doesn't yet add any cross-compilation support for GLSL/SPIR-V. The reason for this is that the equivalent GLSL feature(s) appear to use a different model to the HLSL version, and we need to invent a suitable approach to align them to make portable code possible.
2021-02-22	Add basic support for fragment shader interlock (FSI) (#1722)	Tim Foley
	Both D3D "rasterizer ordered views" (ROVs) and GLSL "fragment shader interlock" (FSI) are aimed at the same basic use case: they allow for fragment shaders to contain operations that require mutual exclusion and/or deterinistics ordering between fragment shader invocations that affect the same framebuffer coordinates. The language-level exposure of the features varies greatly between the two API families, though: * ROVs define an implicit ordering and mutual exclusion constraint: certain resoure parameters are marked as `RasterizerOrdered`, and reads/writes to these resources must be sequences as if fragment-shader invocations ran in sequential order for each pixel. * FSI defines paired begin/end functions that mark a critical section of code. All memory operations in the critical section must be sequences as if fragment-shader invocations ran in sequential order for each pixel. In order to make this model tractable, only a single critical section is allowed per fragment shader, and the begin/end must appear at the top level of the shader entry point function (not under control flow or after a possible conditional `return`. The simplest way for Slang to support portable programs that run across both API families is to insist that code that cares about these ordering guarantees must use both mechanisms, and then each of them will only affect the API that cares about it. Slang already supports ROV resource types, and already lowers them to plain textures for GLSL/SPIR-V. This change adds the missing feature of a begin/end function pair for FSI, which will map to empty functions on non-GLSL targets.
2021-01-29	Fix issue when passing ray query to a subroutine (#1680)	Tim Foley
	The problem would manifest for any code that declared a DXR 1.1 `RayQuery` value, but then only used it as one location in their code. The most common way for this to arise in user code was declaring a `RayQuery` and then handing it off to a helper/worker subroutine. RayQuery<0> myRayQuery; helperRoutine(myRayQuery, ...); The root cause was in the emit logic, where the initialization of `myRayQuery` above (a `defaultConstruct` operation in our IR) was getting folded into its (only) use site. This folding makes some sense, because the initialization of a ray query is not an operation with side effects, but doesn't work in practice because our way of handling default construction in HLSL output is by using a variable declaration. The simple fix here is to ensure that `defaultConstruct` instructions never get folded into use sites. If we decide to revisit the logic here, it might be possible to separate out the case where a `defaultConstruct` is being used as a stand-alone instruction, where we can emit it as: RayQuery<0> myRayQuery; versus cases where the `defaultConstruct` is being used as a sub-expression, such as: helperRoutine(RayQuery<0>(), ...); Whether or not we can emit the latter form (or if it would be equivalent) depends on details of how constructors like this are being implemented in dxc. For now it seems safest to emit things in a form that is obviously expected to work. Aside: Historically, the HLSL language has had no notion of "constructors" as being a thing. A variable that is declared but not initialized in HLSL has always been left uninitialized, since the first version of the language. The `RayQuery` type in DXR 1.1 is the first example of a type that appears to have a C++-style "default constructor," although HLSL as implemented by dxc still does not expose constructors as a user-visible or documented feature. (There is the small detail that the DXR 1.0 `HitGroup` type also relied on C++ constructor syntax, but I'm not aware of anybody using that feature right now, so it is mostly a curiosity.)
2021-01-15	Convert more tests to use shader objects (#1659)	Tim Foley
	This change converts a large number of our existing tests to use the `ShaderObject` support that was added to the `gfx` layer. In many cases, tests were just updated to pass `-shaderobj` and the result Just Worked. In other cases, a `name` attribute had to be added to one or more `TEST_INPUT` lines. For tests that did not work with shader objects "out of the box," I spent a little bit of time trying to get them work, but fell back to letting those tests run in the older mode. Future changes to the infrastructure will be needed to get those additional tests working in the new path. Along with the changes to test files, the following implementation changes were made to get additional tests working: * Because the shader object mode uses explicit register bindings (from reflection), the hacky logic that was offseting `u` registers for D3D12 based on the number of render targets gets disabled (by another hack). * The "flat" reflection information coming from Slang was not correctly reporting "binding ranges" for things that consumed only uniform data (which would be everything on CUDA/CPU), so it was refactored to properly include binding ranges for anything where the type of the field/variable implied a binding range should be created (even if the `LayoutResourceKind` was `::Uniform`). * A few fixes were made to the CUDA implementation of `Renderer`, in order to get additional tests up and running. Most of these changes had to do with texture bindings, which hadn't really been tested previously. In addition, a few changes were made that were attempts at getting more tests working, but didn't actually help. These could be dropped if requested: * As a quality-of-life feature (not being used) the `object` style of `TEST_INPUT` line is upgraded to support inferring the type to use from the type of the input being set. * Any `object` shader input lines get ignored in non-shader-object mode.
2020-07-10	CUDA/CPU varying compute inputs as IR pass (#1438)	Tim Foley
	The main change here is that the CPU and CUDA C++ emit paths now rely on an earlier IR pass to legalize the varying parameter list of a kernel and translate references to varying parameters with semantics like `SV_DispatchThreadID`. Doing so removes a lot of special-case logic from the emit passes. This work moves us even closer to being able to eliminate `KernelContext` from the CPU/CUDA emit logic, because it removes the issue of state related to varying inputs being stored in `KernelContext`. The new pass that handles the legalization is in `slang-ir-legalize-varying-params.cpp`, and it borrows heavily from the existing `slang-ir-glsl-legalize.cpp` pass. The new pass factors out the target-independent and target-dependent logic, so that both CPU and CUDA can share much of the same code despite having very different rules for how the system-value parameters are being provided. An eventual goal is to have the new pass also handle the GLSL case, but doing so requires copying even more logic out of the GLSL-specific pass, and doing so seemed like a step to far for what was meant to be a stepping-stone change as part of other work. As a result of the incomplete nature of the pass, certain cases don't work for compute shader inputs for CPU/CUDA (e.g., wrapping your varying inputs in a `struct` type parameter), but those were cases that also didn't work in the existing `emit`-based logic. One major consequence of this change is that the logic for emitting the various different functions that represent an entry point for our CPU back-end has been streamlined and simplified. The original logic had a fair bit of cleverness built in to try and avoid unnecessary math ops when computing the various IDs/indices, while the new logic is much more simplistic (the main dispatch function loops over threadgroups with a triply-nested `for` and then delegates to the group-level function loops over threads with its own nested `for`s). Longer term, it will be important to simplify the CPU functions we emit further, by eliminating things like the `_Thread` function that should never really be exposed to users (the minimum granularity of invoking a CPU compute kernel should be a single threadgroup). We may eventually decide to synthesize all of the extra code that is being generated in the `emit` pass as IR instead.
2020-04-22	Disable OptiX tests by default. (#1331)	Tim Foley
	When running `slang-test`, the OptiX tests will be skipped by default for now, and must be explicitly enabled by adding `-category optix` on the command line. I will need to add a better discovery mechanism down the line, closer to how support for different graphics APIs is being tested, but for now this should be enough to unblock our CI builds.
2020-04-17	Add support for global shader parameters to OptiX path (#1323)	Tim Foley
	There are two main pieces here. First, we specialize the code generaiton for CUDA kernels to account for the way that shader parameters are passed differently for ordinary compute kernels vs. ray-tracing kernels. Both global and entry-point shader parameters in Slang are translated to kernel function parameters for CUDA compute kernels, while for OptiX ray tracing kernels we need to use a global `__constant__` variable for the global parameters, and the SBT data (accessed via an OptiX API function) for entry-point shader parameters. This choice bakes in a few pieces of policy when it comes to how Slang ray-tracing shaders translate to OptiX: * It fixes the name used for the global `__constant__` variable for global shader parameters to be `SLANG_globalParams`. Since that name has to be specified when creating a pipeline with the OptiX API, the choice of name effectively becomes an ABI contract for Slang's code generation. * It fixes the choice that global parameters in Slang map to per-launch parameters in OptiX, and entry-point parameters in Slang map to SBT-backed parameters in OptiX. This is a reasonable policy, and it is also one that we are likely to need to codify for Vulkan as well, but it is always a bit unfortunate to bake policy choices like this into the compiler (especially when shaders compiled for D3D can often decouple the form of their HLSL/Slang code from how things are bound in the API). The second piece is a lot of refactoring of the logic in `render-test/cuda/cuda-compute-util.cpp`, so that the logic for setting up (and reading back) the buffers of parameter data can be shared between the compute and ray-tracing paths. The result may not be a true global optimum for how the code is organized, but it at least serves the goal of not duplicating the parameter-binding logic between compute and ray-tracing.