slang.git - Making it easier to work with shaders

Age	Commit message (Collapse)	Author
2025-09-02	render-test: Change D3D12 default to sm_6_5 (#8320)	James Helferty (NVIDIA)
	Changes default for render-test to sm_6_5. Since sm_6_5 is the new default, remove the -use-dxil option, add -use-dxcb option Remove -use-dxil option from all test cases. Add -use-dxcb to two tests that needed it. Fixes #7611
2025-08-25	Fix#8082: Batch-6: Enable cuda tests (#8266)	Harsh Aggarwal (NVIDIA)

2025-08-25	Fix#8081: Batch-5: Enable cuda tests (#8263)	Harsh Aggarwal (NVIDIA)

2025-07-03	Enable tests for CUDA (#7593)	Mukund Keshava
	Enable intrinsic tests for cuda. Most of these tests were either disabled or just not enabled for cuda. Fixes #7592 Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com>
2022-08-17	Warning on lossy implicit casts. (#2367)	Yong He
	* Warning on bool to float conversion. * Fix test cases. * Improve. * LanguageServer: don't show constant value for non constant variables. * Fix tests. * Fix warnings in tests. Co-authored-by: Yong He <yhe@nvidia.com>
2021-11-11	Fix two test cases that were causing problems (#2011)	Theresa Foley
	First, we have a CUDA-only test that simply needed a format name to be changed to match the new conventions in `gfx`. Second, we have one of the "active mask" tests that seems to produce different results locally for developers (under Vulkan) than it does on CI. This is almost certainly down to differences in GPUs and/or drivers. The inconsistency ultimately proves the point that I was trying to make when I wrote those tests - the "active mask" concept is effectively meaningless as exposed in D3D and Vulkan because it has not been specified in a way that allows programmers to reason about its value, and drivers have implemented wildly different interpretations of its supposed semantics for so long that there is no real hope of turning `WaveGetActiveMask()` into something that returns a well-defined value in any but the most trivial cases. TLDR: I disabled that test for Vulkan, which means it is completely disabled.
2021-01-15	Convert more tests to use shader objects (#1659)	Tim Foley
	This change converts a large number of our existing tests to use the `ShaderObject` support that was added to the `gfx` layer. In many cases, tests were just updated to pass `-shaderobj` and the result Just Worked. In other cases, a `name` attribute had to be added to one or more `TEST_INPUT` lines. For tests that did not work with shader objects "out of the box," I spent a little bit of time trying to get them work, but fell back to letting those tests run in the older mode. Future changes to the infrastructure will be needed to get those additional tests working in the new path. Along with the changes to test files, the following implementation changes were made to get additional tests working: * Because the shader object mode uses explicit register bindings (from reflection), the hacky logic that was offseting `u` registers for D3D12 based on the number of render targets gets disabled (by another hack). * The "flat" reflection information coming from Slang was not correctly reporting "binding ranges" for things that consumed only uniform data (which would be everything on CUDA/CPU), so it was refactored to properly include binding ranges for anything where the type of the field/variable implied a binding range should be created (even if the `LayoutResourceKind` was `::Uniform`). * A few fixes were made to the CUDA implementation of `Renderer`, in order to get additional tests up and running. Most of these changes had to do with texture bindings, which hadn't really been tested previously. In addition, a few changes were made that were attempts at getting more tests working, but didn't actually help. These could be dropped if requested: * As a quality-of-life feature (not being used) the `object` style of `TEST_INPUT` line is upgraded to support inferring the type to use from the type of the input being set. * Any `object` shader input lines get ignored in non-shader-object mode.
2020-07-24	Enable CUDA for active-mask tests. (#1458)	jsmall-nvidia
	Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-07-23	Fix for vulkan tests failing (#1456)	jsmall-nvidia
	* Clean up device when VKRenderer dtor is run. Added destroy methods to VulkanSwapChain & VulkanDeviceQueue * Small fixes around testing if DeviceQueue is valid. * Disable active-mask tests. Different drivers appear to change the results.
2020-06-05	Fixes for active mask synthesis + tests (#1370)	Tim Foley
	* Fixes for active mask synthesis + tests There are two fixes here: * The code generation that follows active mask synthesis was requiring CUDA SM architecture version 7.0 for one of the introduced instructions, but not all of them. This change centralizes the handling of upgrading the required CUDA SM architecture version, and makes sure that the instructions introduced by active mask synthesis request version 7.0. * The tests for active mask synthesis were not flagged as requiring the `cuda_sm_7_0` feature when invoking `render-test-tool`, which meant they run but produce unexpected results when invoked on a GPU without the required semantics for functions like `__ballot_sync()`. This change adds the missing `-render-feature cuda_sm_7_0` to those tests. * fixup: mark more tests that rely on implicit active mask
2020-06-03	Disable CUDA active mask tests as failing on CI. (#1367)	jsmall-nvidia

2020-05-26	Synthesize "active mask" for CUDA (#1352)	Tim Foley
	* Synthesize "active mask" for CUDA The Big Picture =============== The most important change here is to `hlsl.meta.slang`, where the declaration of `WaveGetActiveMask()` is changed so that instead of mapping to `__activemask()` on CUDA (which is semantically incorrect) it maps to a dedicated IR instruction. The other `WaveActive()` intrinsics that make use of the implicit "active mask" concept had already been changed in #1336 so that they explicitly translate to call the equivalent `WaveMask()` intrinsic with the result of `WaveGetActiveMask()`. As a result, all of the `WaveActive()` functions are now no different from a user-defined function that uses `WaveGetActiveMask()`. The bulk of the work in this change goes into an IR pass to replace the new instruction for getting the active mask gets replaced with appropriately computed values before we generate output CUDA code. That work is in `slang-ir-synthesize-active-mask.{h,cpp}`. Utilities ========= There are a few pieces of code that were helpful in writing the main pass but that can be explained separately: IR instructions were added corresponding to the Slang `WaveMaskBallot()` and `WaveMaskMatch()` functions, which map to the CUDA `__ballot_sync()` and `__match_any_sync()` operations, respectively. These are only implemented for the CUDA target because they are only being generated as part of our CUDA-only pass. * The `IRDominatorTree` type was updated to make it a bit more robust in the presence of unreachable blocks in the CFG. It is possible that the same ends could be achieved more efficiently by folding the corner cases into the main logic, but I went ahead and made things very explicit for now. * I added an `IREdge` utility type to better encapsulate the way that certain code operating on the predecessors/successors of an `IRBlock` were using an `IRUse` to represent a control-flow edge. The `IREdge` type makes the logic of those operations more explicit. A future change should proably change it so that `IRBlock::getPredecessors()` and `getSuccessors()` are instead `getIncomingEdges()` and `getOutgoingEdges()` and work as iterators over `IREdge` values, given the way that the predecessor and successor lists today can contain duplicates. Using the above `IREdge` type, the logic for detecting and break critical edges was broken down into something that is a bit more clear (I hope), and that also factors out the breaking of an edge (by inserting a block along it) into a reusable subroutine. The Main Pass ============= The implementation of the new pass is in `slang-ir-synthesize-active-mask.cpp`, and that file attempts to include enough comments to make the logic clear. A brief summary for the benefit of the commit history: * The first order of business is to identify functions that need to have the active mask value piped into them, and to add an additional parameter to them so that the active mask is passed down explicitly. Call sites are adjusted to pass down the active mask which can then result in new functions being identified as needing the active mask. * The next challenge is for a function that uses the active mask, to compute the active mask value to use in each basic block. The entry block can easily use the active mask value that was passed in, while other blocks need more work. * When doing a conditional branch, we can compute the new mask for the block we branch to as a function of the existing mask and the branch condition. E.g., the value `WaveMaskBallot(existingMask, condition)` can be used as the mask for the "then" block of an `if` statement. * When control flow paths need to "reconverge" at a point after a structured control-flow statement, we need to insert logic to synchronize and re-build the mask that will execute after the statement, while also excluding any lanes/threads that exited the statement in other ways (e.g., an early `return` from the function). The explanation here is fairly hand-wavy, but the actual pass uses much more crisp definitions, so the code itself should be inspected if you care about the details. Tests ===== The tests for the new feature are all under `tests/hlsl-intrinsic/active-mask/`. Most of them stress a single control-flow construct (`if`, `switch`, or loop) and write out the value of `WaveGetActiveMask()` at various points in the code. In practice, our definition of the active mask doesn't always agree with what D3D/Vulkan implementations seem to produce in practice, and as a result a certain amount of effort has gone into adding tweaks to the tests that force them to produce the expected output on existing graphics APIs. These tweaks usually amount to introducing conditional branches that aren't actually conditional in practice (the branch condition is always `true` or always `false` at runtime), in order to trick some simplistic analysis approaches that downstream compilers seem to employ. One test case currently fails on our CUDA target (`switch-trivial-fallthrough.slang`) and has been disabled. This is an expected failure, because making it produce the expected value requires a bit of detailed/careful coding that would add a lot of additional complexity to this change. It seemed better to leave that as future work. Future Work =========== * As discussed under "Tests" above, the handling of simple `switch` statements in the current pass is incomplete. * There's an entire can of worms to be dealt with around the handling of fall-through for `switch`. * The current work also doesn't handle `discard` statements, which is unimportant right now (CUDA doesn't have fragment shaders), but might matter if we decide to synthesize masks for other targets. Similar work would probably be needed if we ever have `throw` or other non-local control flow that crosses function boundaries. * An important optimization opportunity is being left on the floor in this change. When block that comes "after" a structured control-flow region (which is encoded explicitly in Slang IR and SPIR-V) post-dominates the entry block of the region, then we know that the active mask when exiting the region must be the same as the mask when entering the region, and there is no need to insert explicit code to cause "re-convergence." This should be addressed in a follow-on change once we add code to Slang for computing a post-dominator tree from a function CFG. * Related to the above, the decision-making around whether a basic block "needs" the active mask is perhaps too conservative, since it decides that any block that precedes one needing the active mask also needs it. This isn't true in cases where the active mask for a merge block can be inferred by post-dominance (as described above), so that the blocks that branch to it don't need to compute an active mask at all. * If/when we extend the CPU target to support these operations (along with SIMD code generation, I assume), we will also need to synthesize an active mask on that platform, but the approach taken here (which pretty much relies on support for CUDA "cooperative groups") wouldn't seem to apply in the SIMD case. * Similarly, the approach taken to computing the active mask here requires a new enough CUDA SM architecture version to support explicit cooperative groups. If we want to run on older CUDA-supporting architectures, we will need a new and potentially very different strategy. * Because the new pass here changes the signature of functions that require the active mask (and not those that don't), it creates possible problems for generating code that uses dynamic dispatch (via function pointers). In principle, we need to know at a call site whether or not the callee uses the active mask. There are multiple possible solutions to this problem, and they'd need to be worked through before we can make the implicit active mask and dynamic dispatch be mutually compatible. * Related to changing function signatures: no effort is made in this pass to clean up the IR type of the functions it modifies, so there could technically be mismatches between the IR type of a function and its actual signature. If/when this causes problems for downstream passes we probably need to do some cleanup. * fixup: backslash-escaped lines I did some "ASCII art" sorts of diagrams to explain cases in the CFG, and some of those diagrams used backslash (`\`) characters as the last character on the line, causing them to count as escaped newlines for C/C++. The gcc compiler apparently balked at those lines, since they made some of the single-line comments into multi-line comments. I solved the problem by adding a terminating column of `\|` characters at the end of each line that was part of an ASCII art diagram. * fixup: typos Co-authored-by: jsmall-nvidia <jsmall@nvidia.com>