| author | Tim Foley <tfoleyNV@users.noreply.github.com> | 2020-05-26 15:11:22 -0700 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2020-05-26 15:11:22 -0700 |
| commit | e5d0f3360f44a4cdd2390e7817db17bb3cc0dd04 (patch) | |
| tree | ef98562a33308f268aafb7ecc3ffb369cddd08a1 /tests | |
| parent | b1369040c3d6d6a8704bdb17d9de99f36a108e07 (diff) | |
Synthesize "active mask" for CUDA (#1352)
* Synthesize "active mask" for CUDA
The Big Picture
===============
The most important change here is to `hlsl.meta.slang`, where the declaration of `WaveGetActiveMask()` is changed so that instead of mapping to `__activemask()` on CUDA (which is semantically incorrect) it maps to a dedicated IR instruction.
The other `WaveActive*()` intrinsics that make use of the implicit "active mask" concept had already been changed in #1336 so that they explicitly translate to a call to the equivalent `WaveMask*()` intrinsic, passing the result of `WaveGetActiveMask()`. As a result, all of the `WaveActive*()` functions are now no different from a user-defined function that uses `WaveGetActiveMask()`.
The bulk of the work in this change goes into an IR pass that replaces the new instruction for getting the active mask with appropriately computed values before we generate output CUDA code. That work is in `slang-ir-synthesize-active-mask.{h,cpp}`.
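As a rough illustration of the relationship between the implicit-mask and explicit-mask forms, here is a host-side C++ model of a single wave. It is only a sketch: `WaveModel`, `waveMaskSum`, and `waveActiveSum` are hypothetical names chosen for this example, and nothing here is code from the Slang standard library or from this change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side model of one wave; NOT Slang's implementation.
// `values[i]` is lane i's operand, and bit i of `activeMask` is set when
// lane i is executing the current statement.
struct WaveModel
{
    std::vector<int> values;
    uint32_t activeMask;
};

// Stand-in for the operation that the new pass later replaces with a
// computed value.
uint32_t waveGetActiveMask(const WaveModel& wave)
{
    return wave.activeMask;
}

// Explicit-mask form: sums the operand over exactly the lanes in `mask`.
int waveMaskSum(const WaveModel& wave, uint32_t mask)
{
    assert(wave.values.size() <= 32);
    int sum = 0;
    for (std::size_t lane = 0; lane < wave.values.size(); ++lane)
    {
        if (mask & (1u << lane))
            sum += wave.values[lane];
    }
    return sum;
}

// Implicit-mask form, written the way an ordinary user function would be:
// fetch the active mask, then defer to the explicit-mask form.
int waveActiveSum(const WaveModel& wave)
{
    return waveMaskSum(wave, waveGetActiveMask(wave));
}
```

With lanes holding 1, 2, 3, 4 and an active mask of `0xA`, `waveActiveSum` adds only lanes 1 and 3. The point of the model is that the implicit form needs nothing beyond the mask query, which is why only that query needs special treatment on CUDA.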
Utilities
=========
There are a few pieces of code that were helpful in writing the main pass but that can be explained separately:
* IR instructions were added corresponding to the Slang `WaveMaskBallot()` and `WaveMaskMatch()` functions, which map to the CUDA `__ballot_sync()` and `__match_any_sync()` operations, respectively. These are only implemented for the CUDA target because they are only being generated as part of our CUDA-only pass.
* The `IRDominatorTree` type was updated to make it a bit more robust in the presence of unreachable blocks in the CFG. It is possible that the same ends could be achieved more efficiently by folding the corner cases into the main logic, but I went ahead and made things very explicit for now.
* I added an `IREdge` utility type to better encapsulate the way that certain code operating on the predecessors/successors of an `IRBlock` was using an `IRUse*` to represent a control-flow edge. The `IREdge` type makes the logic of those operations more explicit. A future change should probably change it so that `IRBlock::getPredecessors()` and `getSuccessors()` are instead `getIncomingEdges()` and `getOutgoingEdges()` and work as iterators over `IREdge` values, given the way that the predecessor and successor lists today can contain duplicates.
* Using the above `IREdge` type, the logic for detecting and breaking critical edges was broken down into something that is a bit clearer (I hope), and that also factors out the breaking of an edge (by inserting a block along it) into a reusable subroutine.
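The critical-edge logic in the last bullet can be sketched on a toy CFG. This is an assumption-laden model, not the Slang code: `ToyCFG` stores edges as plain `(from, to)` pairs, whereas the real `IREdge` wraps an `IRUse*`, and all function names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy CFG: blocks are indices, and each edge is a (from, to) pair.
// Hypothetical model; Slang's real `IREdge` wraps an `IRUse*`.
struct ToyCFG
{
    int blockCount = 0;
    std::vector<std::pair<int, int>> edges;
};

static int countSuccessors(const ToyCFG& cfg, int block)
{
    int n = 0;
    for (const auto& e : cfg.edges)
        if (e.first == block) ++n;
    return n;
}

static int countPredecessors(const ToyCFG& cfg, int block)
{
    int n = 0;
    for (const auto& e : cfg.edges)
        if (e.second == block) ++n;
    return n;
}

// An edge is critical when its source has several successors and its
// destination has several predecessors.
bool isCriticalEdge(const ToyCFG& cfg, std::size_t edgeIndex)
{
    assert(edgeIndex < cfg.edges.size());
    const auto& e = cfg.edges[edgeIndex];
    return countSuccessors(cfg, e.first) > 1 && countPredecessors(cfg, e.second) > 1;
}

// Break one edge by inserting a fresh block along it; returns the new block.
int breakEdge(ToyCFG& cfg, std::size_t edgeIndex)
{
    assert(edgeIndex < cfg.edges.size());
    const int newBlock = cfg.blockCount++;
    const int oldTarget = cfg.edges[edgeIndex].second;
    cfg.edges[edgeIndex].second = newBlock;      // from -> newBlock
    cfg.edges.emplace_back(newBlock, oldTarget); // newBlock -> oldTarget
    return newBlock;
}

// Break every critical edge; returns how many blocks were inserted.
int breakCriticalEdges(ToyCFG& cfg)
{
    // Decide first, then mutate, so the test runs on the original CFG.
    std::vector<std::size_t> critical;
    for (std::size_t i = 0; i < cfg.edges.size(); ++i)
        if (isCriticalEdge(cfg, i)) critical.push_back(i);
    for (std::size_t i : critical)
        breakEdge(cfg, i);
    return static_cast<int>(critical.size());
}
```

For a one-sided `if` (edges `0->1`, `0->2`, `1->2`), only `0->2` is critical, and breaking it yields `0->3->2`. Keeping `breakEdge` separate from the detection loop mirrors the "reusable subroutine" factoring described above.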
The Main Pass
=============
The implementation of the new pass is in `slang-ir-synthesize-active-mask.cpp`, and that file attempts to include enough comments to make the logic clear. A brief summary for the benefit of the commit history:
* The first order of business is to identify functions that need to have the active mask value piped into them, and to add an additional parameter to them so that the active mask is passed down explicitly. Call sites are adjusted to pass down the active mask, which can in turn cause new functions to be identified as needing the active mask.
* The next challenge, for a function that uses the active mask, is to compute the active mask value to use in each basic block. The entry block can easily use the active mask value that was passed in, while other blocks need more work.
* When doing a conditional branch, we can compute the new mask for the block we branch to as a function of the existing mask and the branch condition. E.g., the value `WaveMaskBallot(existingMask, condition)` can be used as the mask for the "then" block of an `if` statement.
* When control flow paths need to "reconverge" at a point after a structured control-flow statement, we need to insert logic to synchronize and re-build the mask that will execute after the statement, while also excluding any lanes/threads that exited the statement in other ways (e.g., an early `return` from the function).
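The first step in the list above (finding every function that needs the mask passed in) amounts to a fixed-point computation over the call graph. A minimal sketch follows, assuming a toy call graph indexed by integers; `ToyCallGraph` and `findFunctionsNeedingMask` are hypothetical names, and the real pass works on Slang IR functions and call instructions rather than on this structure.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy call graph: `callees[f]` lists the functions that `f` calls, and
// `usesMaskDirectly[f]` is true when `f` itself reads the active mask.
// Hypothetical model; the real pass works on Slang IR.
struct ToyCallGraph
{
    std::vector<std::vector<int>> callees;
    std::vector<bool> usesMaskDirectly;
};

// Returns, for each function, whether it needs an extra active-mask
// parameter: either it reads the mask itself, or it calls a function
// that needs the mask passed down.
std::vector<bool> findFunctionsNeedingMask(const ToyCallGraph& graph)
{
    const std::size_t count = graph.callees.size();
    assert(graph.usesMaskDirectly.size() == count);

    std::vector<bool> needsMask = graph.usesMaskDirectly;

    // Iterate to a fixed point: marking one function can cause its
    // callers to be marked on a later sweep.
    bool changed = true;
    while (changed)
    {
        changed = false;
        for (std::size_t f = 0; f < count; ++f)
        {
            if (needsMask[f]) continue;
            for (int callee : graph.callees[f])
            {
                assert(callee >= 0 && static_cast<std::size_t>(callee) < count);
                if (needsMask[callee])
                {
                    needsMask[f] = true;
                    changed = true;
                    break;
                }
            }
        }
    }
    return needsMask;
}
```

A chain such as `computeMain -> test -> inc`, where only `inc` reads the mask, ends with all three functions marked, while an unrelated helper stays unmarked.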
The explanation here is fairly hand-wavy, but the actual pass uses much more crisp definitions, so the code itself should be inspected if you care about the details.
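To make the per-block mask computation a little more concrete, the following host-side C++ model reproduces the masks that `if.slang` and `if-early-exit.slang` expect for a four-lane wave. It is a sketch under simplifying assumptions: `modelBallot` is a hypothetical stand-in that operates on a bitset of per-lane conditions, whereas the real `WaveMaskBallot()` is a collective operation executed by the lanes themselves, and the real pass also inserts synchronization at reconvergence points.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kLaneCount = 4; // matches THREAD_COUNT in the tests

// Host-side stand-in for `WaveMaskBallot(mask, condition)`: returns the
// lanes in `mask` whose condition is true. `conditionBits` holds one
// condition bit per lane. Hypothetical model, not the CUDA intrinsic.
uint32_t modelBallot(uint32_t mask, uint32_t conditionBits)
{
    assert((mask >> kLaneCount) == 0);
    return mask & conditionBits;
}

struct IfMasks
{
    uint32_t thenMask;  // lanes that take the branch
    uint32_t elseMask;  // lanes that do not
    uint32_t afterMask; // lanes that reach the code after the `if`
};

// Masks for `if (cond) { ...; [return;] } else { ... }`.
// When `thenReturns` is true the "then" lanes leave the function and are
// excluded from the mask after the statement.
IfMasks masksForIf(uint32_t entryMask, uint32_t conditionBits, bool thenReturns)
{
    IfMasks m;
    m.thenMask = modelBallot(entryMask, conditionBits);
    m.elseMask = modelBallot(entryMask, ~conditionBits);
    m.afterMask = thenReturns ? m.elseMask : (m.thenMask | m.elseMask);
    return m;
}

// Condition bits for `tid & 1` across lanes 0..kLaneCount-1.
uint32_t oddLaneBits()
{
    uint32_t bits = 0;
    for (int tid = 0; tid < kLaneCount; ++tid)
        if (tid & 1) bits |= (1u << tid);
    return bits;
}
```

With an entry mask of `0xF` and the condition `tid & 1`, the model gives `0xA` for the "then" block and `0x5` for the "else" block. The mask after the statement is `0xF` for a plain `if`/`else` and `0x5` when the "then" side returns early, which matches the values in the corresponding `.expected.txt` files.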
Tests
=====
The tests for the new feature are all under `tests/hlsl-intrinsic/active-mask/`. Most of them stress a single control-flow construct (`if`, `switch`, or loop) and write out the value of `WaveGetActiveMask()` at various points in the code.
Our definition of the active mask doesn't always agree with what D3D/Vulkan implementations seem to produce in practice, and as a result a certain amount of effort has gone into adding tweaks to the tests that force them to produce the expected output on existing graphics APIs. These tweaks usually amount to introducing conditional branches that aren't actually conditional (the branch condition is always `true` or always `false` at runtime), in order to trick some simplistic analysis approaches that downstream compilers seem to employ.
One test case currently fails on our CUDA target (`switch-trivial-fallthrough.slang`) and has been disabled. This is an expected failure, because making it produce the expected value requires a bit of detailed/careful coding that would add a lot of additional complexity to this change. It seemed better to leave that as future work.
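The loop and `switch` tests pack several fields into each value they write, using the expression `0xA0000000 | (tid << 24) | (ITER << 16) | (LOC << 8) | WaveGetActiveMask()` from their `WRITE` macro (the `switch` tests omit the `ITER` field). When reading the `.expected.txt` files it can help to unpack those fields. The helper below is a hypothetical convenience for that purpose and is not part of the test suite.

```cpp
#include <cassert>
#include <cstdint>

// Fields packed by the tests' `WRITE(LOC, ITER)` macro:
//   0xA0000000 | (tid << 24) | (ITER << 16) | (LOC << 8) | activeMask
struct DecodedWrite
{
    uint32_t tid;
    uint32_t iter;
    uint32_t loc;
    uint32_t activeMask;
};

// Hypothetical helper that unpacks one non-zero entry of an
// `.expected.txt` file.
DecodedWrite decodeWrite(uint32_t value)
{
    // Every written entry carries the 0xA tag in its top nibble; a zero
    // entry means that lane never wrote to that slot.
    assert((value >> 28) == 0xAu);
    DecodedWrite d;
    d.tid = (value >> 24) & 0xFu;
    d.iter = (value >> 16) & 0xFFu;
    d.loc = (value >> 8) & 0xFFu;
    d.activeMask = value & 0xFFu;
    return d;
}

// Inverse of `decodeWrite`, mirroring the macro's expression.
uint32_t encodeWrite(uint32_t tid, uint32_t iter, uint32_t loc, uint32_t activeMask)
{
    return 0xA0000000u | (tid << 24) | (iter << 16) | (loc << 8) | activeMask;
}
```

For example, the entry `A100020E` in `for-break.slang.expected.txt` decodes to lane 1, iteration 0, location 2, with an active mask of `0xE` (lanes 1 through 3, since lane 0 has already left the loop).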
Future Work
===========
* As discussed under "Tests" above, the handling of simple `switch` statements in the current pass is incomplete.
* There's an entire can of worms to be dealt with around the handling of fall-through for `switch`.
* The current work also doesn't handle `discard` statements, which is unimportant right now (CUDA doesn't have fragment shaders), but might matter if we decide to synthesize masks for other targets. Similar work would probably be needed if we ever have `throw` or other non-local control flow that crosses function boundaries.
* An important optimization opportunity is being left on the floor in this change. When the block that comes "after" a structured control-flow region (which is encoded explicitly in Slang IR and SPIR-V) post-dominates the entry block of the region, then we know that the active mask when exiting the region must be the same as the mask when entering the region, and there is no need to insert explicit code to cause "re-convergence." This should be addressed in a follow-on change once we add code to Slang for computing a post-dominator tree from a function CFG.
* Related to the above, the decision-making around whether a basic block "needs" the active mask is perhaps too conservative, since it decides that any block that precedes one needing the active mask also needs it. This isn't true in cases where the active mask for a merge block can be inferred by post-dominance (as described above), so that the blocks that branch to it don't need to compute an active mask at all.
* If/when we extend the CPU target to support these operations (along with SIMD code generation, I assume), we will also need to synthesize an active mask on that platform, but the approach taken here (which pretty much relies on support for CUDA "cooperative groups") wouldn't seem to apply in the SIMD case.
* Similarly, the approach taken to computing the active mask here requires a new enough CUDA SM architecture version to support explicit cooperative groups. If we want to run on older CUDA-supporting architectures, we will need a new and potentially very different strategy.
* Because the new pass here changes the signature of functions that require the active mask (and not those that don't), it creates possible problems for generating code that uses dynamic dispatch (via function pointers). In principle, we need to know at a call site whether or not the callee uses the active mask. There are multiple possible solutions to this problem, and they'd need to be worked through before we can make the implicit active mask and dynamic dispatch be mutually compatible.
* Related to changing function signatures: no effort is made in this pass to clean up the IR type of the functions it modifies, so there could technically be mismatches between the IR type of a function and its actual signature. If/when this causes problems for downstream passes we probably need to do some cleanup.
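The post-dominance condition mentioned in the list above can be checked with a standard iterative dataflow computation. The sketch below is not code from Slang (which, per the note above, has no post-dominator computation at the time of this change). It assumes a small CFG with a single exit block that every `return` branches to, represents sets as 32-bit masks, and uses hypothetical function names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Computes post-dominator sets for a small CFG with a single exit block.
// `successors[b]` lists the successors of block `b`. Bit `p` of
// result[b] is set when block `p` post-dominates block `b`.
// Hypothetical helper, assuming every non-exit block has a successor.
std::vector<uint32_t> computePostDominators(
    const std::vector<std::vector<int>>& successors, int exitBlock)
{
    const std::size_t count = successors.size();
    assert(count > 0 && count <= 32);
    assert(exitBlock >= 0 && static_cast<std::size_t>(exitBlock) < count);

    const uint32_t all = (count == 32) ? 0xFFFFFFFFu : ((1u << count) - 1u);
    std::vector<uint32_t> pdom(count, all);
    pdom[exitBlock] = 1u << exitBlock;

    bool changed = true;
    while (changed)
    {
        changed = false;
        for (std::size_t b = 0; b < count; ++b)
        {
            if (static_cast<int>(b) == exitBlock) continue;
            assert(!successors[b].empty());
            // Intersect over successors, then add the block itself.
            uint32_t meet = all;
            for (int s : successors[b])
                meet &= pdom[s];
            const uint32_t next = meet | (1u << b);
            if (next != pdom[b])
            {
                pdom[b] = next;
                changed = true;
            }
        }
    }
    return pdom;
}

// True when `merge` post-dominates `entry`, i.e. when the mask after the
// region could simply reuse the mask from before it.
bool mergePostDominatesEntry(const std::vector<uint32_t>& pdom, int entry, int merge)
{
    return ((pdom[entry] >> merge) & 1u) != 0;
}
```

For a plain `if`/`else` (entry 0, branches 1 and 2, merge 3, exit 4) the merge block post-dominates the entry, so no re-convergence code would be needed. If block 1 instead branches straight to the exit, as with an early `return`, the check fails and the explicit re-convergence logic is still required.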
* fixup: backslash-escaped lines
I did some "ASCII art" sorts of diagrams to explain cases in the CFG, and some of those diagrams used backslash (`\`) characters as the last character on the line, causing them to count as escaped newlines for C/C++.
The gcc compiler apparently balked at those lines, since they made some of the single-line comments into multi-line comments.
I solved the problem by adding a terminating column of `|` characters at the end of each line that was part of an ASCII art diagram.
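A small illustration of the fix, using a made-up diagram rather than one taken from the pass: every diagram line ends in `|`, so no `//` comment ends in a backslash and the statement after the diagram is still compiled as code.

```cpp
#include <cassert>

// Illustrative function (hypothetical, not from the pass): counts the
// edges leaving the root block `A` in the diagram below.
int edgesLeavingRoot()
{
    int edges = 0;
    //      A      |
    //     / \     |
    edges += 2;
    // Without the trailing `|`, the diagram line above would end in a
    // backslash, the statement would be spliced into the comment, and
    // this check would fail.
    assert(edges == 2);
    return edges;
}
```

Even when the spliced line is itself a comment and behavior is unaffected, gcc reports this pattern under `-Wcomment`, which is part of `-Wall`.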
* fixup: typos
Co-authored-by: jsmall-nvidia <jsmall@nvidia.com>
Diffstat (limited to 'tests')
23 files changed, 1060 insertions, 0 deletions
diff --git a/tests/hlsl-intrinsic/active-mask/README.md b/tests/hlsl-intrinsic/active-mask/README.md new file mode 100644 index 000000000..862630433 --- /dev/null +++ b/tests/hlsl-intrinsic/active-mask/README.md @@ -0,0 +1,14 @@ +Active Mask Tests +================= + +The tests in this directory are designed to ensure that the "active mask" used by HLSL wave-level operations matches what is expected, even on targets where the active mask must be synthesized. + +Note that the exact active mask that should be used on wave operations isn't precisely defined in documentation for HLSL. The nearest thing to a public statement of the intended behavior is this statement on the [wiki for dxc](https://github.com/Microsoft/DirectXShaderCompiler/wiki/Wave-Intrinsics) (emphasis ours): + +> These intrinsics are dependent on active lanes and therefore flow control. In the model of this document, implementations must enforce that the number of active lanes exactly corresponds to the *programmer’s view of flow control*. In a future version, there may be a compiler flag to relax this requirement as a default, but also enable applications to be explicit about the exact set of lanes to be used in a particular wave operation (see section Wave Handles in the Future Features section below). + +The requirement is then to compute an explicit mask that matches the "programmers view" of control flow, which is arguably something up to interpretation. + +The GLSL "subgroup" operations are slightly more precise in the language they use, but ultimately leaves the expected value of the active mask under-specified in many cases. + +The goal of these tests is to establish some empirical results for what the active mask is expected/required to be in various cases. 
We will do our best to match the observed behavior of APIs where the implicit "active mask" is an existing feature, but we also reserve the right to take a stand and define what the behavior *ought* to be based on the necessarily more precise definitions that we use in the Slang implementation. diff --git a/tests/hlsl-intrinsic/active-mask/for-break.slang b/tests/hlsl-intrinsic/active-mask/for-break.slang new file mode 100644 index 000000000..e9976afa8 --- /dev/null +++ b/tests/hlsl-intrinsic/active-mask/for-break.slang @@ -0,0 +1,71 @@ +// for-break.slang + +// Test active mask synthesis for a `for` loop that +// has no ordinary exit condition and can thus +// only be exited via a (single) `break`. + +//DISABLE_TEST(compute):COMPARE_COMPUTE_EX:-cpu -compute +//DISABLE_TEST(compute):COMPARE_COMPUTE_EX:-slang -compute +//TEST(compute):COMPARE_COMPUTE_EX:-slang -compute -dx12 -use-dxil -profile cs_6_0 -xslang -DHACK +//TEST(compute, vulkan):COMPARE_COMPUTE_EX:-vk -compute -xslang -DHACK +//TEST(compute):COMPARE_COMPUTE_EX:-cuda -compute + +//TEST_INPUT:ubuffer(data=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0], stride=4):out,name buffer +RWStructuredBuffer<int> buffer; + +#define THREAD_COUNT 4 +#define LOC_COUNT 4 +#define ITER_COUNT THREAD_COUNT +#define WRITE(LOC, ITER) buffer[tid + (LOC)*THREAD_COUNT + (ITER)*THREAD_COUNT*LOC_COUNT] = 0xA0000000 | (tid << 24) | (ITER << 16) | (LOC << 8) | WaveGetActiveMask() + +//TEST_INPUT:cbuffer(data=[0 1]):name C +cbuffer C +{ + int alwaysFalse; + int alwaysTrue; +} + +void test(int tid) +{ + int ii = 0; + for(;;) + { + WRITE(0, ii); + if(ii >= tid) + { + WRITE(1, ii); + + // Note: This flag has been included to force + // D3D and Vulkan implementations to provide + // the expected/desired behavior for the active + // mask in the preceding code. 
+ // + // It seems that without making the `break` + // conditional, some implementation will see that + // this block post-dominates the entire loop, + // and thus decide that the code is semantically + // "outside" the loop, despite the fact that + // it is clearly lexically *inside* the loop. + // + // Making the `break` conditional introduces + // a new control-flow edge that changes the + // post-dominator relationship and thus makes + // such implementations see this code as being + // "inside" the loop again. + // + #ifdef HACK + if(alwaysTrue) + #endif + break; + } + WRITE(2, ii); + ii++; + } + WRITE(3, ITER_COUNT); +} + +[numthreads(THREAD_COUNT, 1, 1)] +void computeMain(uint3 dispatchThreadID : SV_DispatchThreadID) +{ + test(dispatchThreadID.x); +}
\ No newline at end of file diff --git a/tests/hlsl-intrinsic/active-mask/for-break.slang.expected.txt b/tests/hlsl-intrinsic/active-mask/for-break.slang.expected.txt new file mode 100644 index 000000000..9ddb7cd23 --- /dev/null +++ b/tests/hlsl-intrinsic/active-mask/for-break.slang.expected.txt @@ -0,0 +1,108 @@ +A000000F +A100000F +A200000F +A300000F +A0000101 +0 +0 +0 +0 +A100020E +A200020E +A300020E +0 +0 +0 +0 +0 +A101000E +A201000E +A301000E +0 +A1010102 +0 +0 +0 +0 +A201020C +A301020C +0 +0 +0 +0 +0 +0 +A202000C +A302000C +0 +0 +A2020104 +0 +0 +0 +0 +A3020208 +0 +0 +0 +0 +0 +0 +0 +A3030008 +0 +0 +0 +A3030108 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +A004030F +A104030F +A204030F +A304030F +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 diff --git a/tests/hlsl-intrinsic/active-mask/for-continue-ext.slang b/tests/hlsl-intrinsic/active-mask/for-continue-ext.slang new file mode 100644 index 000000000..8b8ff5540 --- /dev/null +++ b/tests/hlsl-intrinsic/active-mask/for-continue-ext.slang @@ -0,0 +1,74 @@ +// for-continue-ext.slang + +// Test case of a `for` loop that +// has multiple paths to continue +// the loop (both the ordinary one +// and an explicit `continue`) + +//DISABLE_TEST(compute):COMPARE_COMPUTE_EX:-cpu -compute +//DISABLE_TEST(compute):COMPARE_COMPUTE_EX:-slang -compute +//TEST(compute):COMPARE_COMPUTE_EX:-slang -compute -dx12 -use-dxil -profile cs_6_0 -xslang -DHACK +//TEST(compute, vulkan):COMPARE_COMPUTE_EX:-vk -compute -xslang -DHACK +//TEST(compute):COMPARE_COMPUTE_EX:-cuda -compute + +//TEST_INPUT:ubuffer(data=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0], stride=4):out,name buffer +RWStructuredBuffer<int> buffer; + +#define THREAD_COUNT 4 +#define LOC_COUNT 4 +#define ITER_COUNT THREAD_COUNT +#define WRITE_VAL(LOC, 
ITER, VAL) buffer[tid + (LOC)*THREAD_COUNT + (ITER)*THREAD_COUNT*LOC_COUNT] = 0xA0000000 | (tid << 24) | (ITER << 16) | (LOC << 8) | VAL +#define WRITE(LOC, ITER) WRITE_VAL(LOC, ITER, WaveGetActiveMask()) + +//TEST_INPUT:cbuffer(data=[0 1]):name C +cbuffer C +{ + int alwaysFalse; + int alwaysTrue; +} + +void test(uint tid) +{ + for(int ii = 0; ii < tid; ++ii) + { + WRITE(0, ii); + if(tid & 1) + { + if(tid & 2) + { + // Note: because of the two `if(alwaysFalse) break;` branches + // here, the `WRITE(3,ii)` at the end of the loop body no + // longer post-dominates the entry to the `if`, so that + // implemenations that use immediate post-dominator + // reconvergence will not properly reconvergence lanes + // one and two at that point. + // + // The Slang synthesis pass for the active mask produces + // the expected/desired result even in the presence of + // these additional control-flow edges. + #ifndef HACK + if(alwaysFalse) break; + #endif + + WRITE(1, ii); + continue; + } + else + { + #ifndef HACK + if(alwaysFalse) break; + #endif + + WRITE(2, ii); + } + } + WRITE(3, ii); + } + WRITE(4, 0); +} + +[numthreads(THREAD_COUNT, 1, 1)] +void computeMain(uint3 dispatchThreadID : SV_DispatchThreadID) +{ + test(dispatchThreadID.x); +}
\ No newline at end of file diff --git a/tests/hlsl-intrinsic/active-mask/for-continue-ext.slang.expected.txt b/tests/hlsl-intrinsic/active-mask/for-continue-ext.slang.expected.txt new file mode 100644 index 000000000..41c2e2591 --- /dev/null +++ b/tests/hlsl-intrinsic/active-mask/for-continue-ext.slang.expected.txt @@ -0,0 +1,108 @@ +0 +A100000E +A200000E +A300000E +0 +0 +0 +A3000108 +0 +A1000202 +0 +0 +0 +A1000306 +A2000306 +0 +A000040F +A100040F +A200040F +A300040F +0 +0 +0 +A3010108 +0 +0 +0 +0 +0 +0 +A2010304 +0 +0 +0 +0 +A3020008 +0 +0 +0 +A3020108 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 diff --git a/tests/hlsl-intrinsic/active-mask/for-continue.slang b/tests/hlsl-intrinsic/active-mask/for-continue.slang new file mode 100644 index 000000000..a117832f7 --- /dev/null +++ b/tests/hlsl-intrinsic/active-mask/for-continue.slang @@ -0,0 +1,83 @@ +// for-continue.slang + +// Test case of a `for` loop that +// has multiple paths to continue +// the loop (both the ordinary one +// and an explicit `continue`) + +//DISABLE_TEST(compute):COMPARE_COMPUTE_EX:-cpu -compute +//DISABLE_TEST(compute):COMPARE_COMPUTE_EX:-slang -compute +//TEST(compute):COMPARE_COMPUTE_EX:-slang -compute -dx12 -use-dxil -profile cs_6_0 -xslang -DHACK +//TEST(compute, vulkan):COMPARE_COMPUTE_EX:-vk -compute -xslang -DHACK +//TEST(compute):COMPARE_COMPUTE_EX:-cuda -compute + +//TEST_INPUT:ubuffer(data=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0], stride=4):out,name buffer +RWStructuredBuffer<int> buffer; + +#define THREAD_COUNT 4 +#define LOC_COUNT 4 +#define ITER_COUNT THREAD_COUNT +#define WRITE_VAL(LOC, ITER, VAL) buffer[tid + (LOC)*THREAD_COUNT + 
(ITER)*THREAD_COUNT*LOC_COUNT] = 0xA0000000 | (tid << 24) | (ITER << 16) | (LOC << 8) | VAL +#define WRITE(LOC, ITER) WRITE_VAL(LOC, ITER, WaveGetActiveMask()) + +//TEST_INPUT:cbuffer(data=[0 1]):name C +cbuffer C +{ + int alwaysFalse; + int alwaysTrue; +} + +// In order to make a test of `continue` behavior +// adversarial, we need to observe the value of +// the active mask during the code that executes +// on a `continue`. +// +// We therefore define a subroutine that +// will perform the increment action for an +// ordinary counted loop, allowing us to +// observe the value of the active mask inside +// the function. +// +void inc(uint tid, in out int ii) +{ + // NOTE: For our current HLSL and GLSL output + // strategies, we end up duplicating the + // "continue clause" of a `for` loop into + // each of the sites in the code where control + // flow continues the loop. Unsurprisingly, + // those copies mean that the active mask + // seen on those platforms is not the expected + // one. + // + // We will therefore write out the expected + // value directly instead of using `WaveGetActiveMask()` + // on those targets. + // +#ifdef HACK + WRITE_VAL(3, ii, (0xE << ii) & 0xF); +#else + WRITE(3, ii); +#endif + ii++; +} + +void test(uint tid) +{ + for(int ii = 0; ii < tid; inc(tid, ii)) + { + WRITE(0, ii); + if(tid & 1) + { + WRITE(1, ii); + continue; + } + WRITE(2, ii); + } + WRITE(4, 0); +} + +[numthreads(THREAD_COUNT, 1, 1)] +void computeMain(uint3 dispatchThreadID : SV_DispatchThreadID) +{ + test(dispatchThreadID.x); +}
\ No newline at end of file diff --git a/tests/hlsl-intrinsic/active-mask/for-continue.slang.expected.txt b/tests/hlsl-intrinsic/active-mask/for-continue.slang.expected.txt new file mode 100644 index 000000000..ad1ca4559 --- /dev/null +++ b/tests/hlsl-intrinsic/active-mask/for-continue.slang.expected.txt @@ -0,0 +1,108 @@ +0 +A100000E +A200000E +A300000E +0 +A100010A +0 +A300010A +0 +0 +A2000204 +0 +0 +A100030E +A200030E +A300030E +A000040F +A100040F +A200040F +A300040F +0 +0 +0 +A3010108 +0 +0 +A2010204 +0 +0 +0 +A201030C +A301030C +0 +0 +0 +A3020008 +0 +0 +0 +A3020108 +0 +0 +0 +0 +0 +0 +0 +A3020308 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 +0 diff --git a/tests/hlsl-intrinsic/active-mask/for.slang b/tests/hlsl-intrinsic/active-mask/for.slang new file mode 100644 index 000000000..d620818f8 --- /dev/null +++ b/tests/hlsl-intrinsic/active-mask/for.slang @@ -0,0 +1,32 @@ +// for.slang + +// Test active mask synthesis for a standard `for` +// loop over an integer, with no `break` or `continue` +// logic. + +//DISABLE_TEST(compute):COMPARE_COMPUTE_EX:-cpu -compute +//DISABLE_TEST(compute):COMPARE_COMPUTE_EX:-slang -compute +//TEST(compute):COMPARE_COMPUTE_EX:-slang -compute -dx12 -use-dxil -profile cs_6_0 +//TEST(compute, vulkan):COMPARE_COMPUTE_EX:-vk -compute +//TEST(compute):COMPARE_COMPUTE_EX:-cuda -compute + +//TEST_INPUT:ubuffer(data=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0], stride=4):out,name buffer +RWStructuredBuffer<int> buffer; + +#define THREAD_COUNT 4 +#define WRITE(IDX) buffer[IDX*THREAD_COUNT + tid] = WaveGetActiveMask() + +void test(int tid) +{ + for(int ii = 0; ii < tid; ++ii) + { + WRITE(ii); + } + WRITE(THREAD_COUNT); +} + +[numthreads(THREAD_COUNT, 1, 1)] +void computeMain(uint3 dispatchThreadID : SV_DispatchThreadID) +{ + test(dispatchThreadID.x); +}
\ No newline at end of file diff --git a/tests/hlsl-intrinsic/active-mask/for.slang.expected.txt b/tests/hlsl-intrinsic/active-mask/for.slang.expected.txt new file mode 100644 index 000000000..9d3c89743 --- /dev/null +++ b/tests/hlsl-intrinsic/active-mask/for.slang.expected.txt @@ -0,0 +1,27 @@ +0 +E +E +E +0 +0 +C +C +0 +0 +0 +8 +0 +0 +0 +0 +F +F +F +F +0 +0 +0 +0 +0 +0 +0 diff --git a/tests/hlsl-intrinsic/active-mask/if-conditional-exit.slang b/tests/hlsl-intrinsic/active-mask/if-conditional-exit.slang new file mode 100644 index 000000000..99a400f96 --- /dev/null +++ b/tests/hlsl-intrinsic/active-mask/if-conditional-exit.slang @@ -0,0 +1,45 @@ +// if-conditional-exit.slang + +// Test active mask synthesis for an `if` where one side +// conditionally exits the current function/scope, +// and thus may or may not "re-converge" after the +// conditional. + +//DISABLE_TEST(compute):COMPARE_COMPUTE_EX:-cpu -compute +//DISABLE_TEST(compute):COMPARE_COMPUTE_EX:-slang -compute +//TEST(compute):COMPARE_COMPUTE_EX:-slang -compute -dx12 -use-dxil -profile cs_6_0 +//TEST(compute, vulkan):COMPARE_COMPUTE_EX:-vk -compute +//TEST(compute):COMPARE_COMPUTE_EX:-cuda -compute + + +//TEST_INPUT:ubuffer(data=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0], stride=4):out,name buffer +RWStructuredBuffer<int> buffer; + +#define THREAD_COUNT 4 +#define WRITE(IDX) buffer[IDX*THREAD_COUNT + tid] = WaveGetActiveMask() + +void test(int tid) +{ + WRITE(0); + if(tid & 1) + { + WRITE(1); + if(tid & 2) + { + WRITE(2); + return; + } + WRITE(3); + } + else + { + WRITE(4); + } + WRITE(5); +} + +[numthreads(THREAD_COUNT, 1, 1)] +void computeMain(uint3 dispatchThreadID : SV_DispatchThreadID) +{ + test(dispatchThreadID.x); +}
\ No newline at end of file diff --git a/tests/hlsl-intrinsic/active-mask/if-conditional-exit.slang.expected.txt b/tests/hlsl-intrinsic/active-mask/if-conditional-exit.slang.expected.txt new file mode 100644 index 000000000..2880950d5 --- /dev/null +++ b/tests/hlsl-intrinsic/active-mask/if-conditional-exit.slang.expected.txt @@ -0,0 +1,24 @@ +F +F +F +F +0 +A +0 +A +0 +0 +0 +8 +0 +2 +0 +0 +5 +0 +5 +0 +7 +7 +7 +0 diff --git a/tests/hlsl-intrinsic/active-mask/if-early-exit.slang b/tests/hlsl-intrinsic/active-mask/if-early-exit.slang new file mode 100644 index 000000000..75f438905 --- /dev/null +++ b/tests/hlsl-intrinsic/active-mask/if-early-exit.slang @@ -0,0 +1,38 @@ +// if-early-exit.slang + +// Test active mask synthesis for an `if` where one side +// unconditionally exits the current function/scope, +// and thus cannot "re-converge" after the conditional. + +//DISABLE_TEST(compute):COMPARE_COMPUTE_EX:-cpu -compute +//DISABLE_TEST(compute):COMPARE_COMPUTE_EX:-slang -compute +//TEST(compute):COMPARE_COMPUTE_EX:-slang -compute -dx12 -use-dxil -profile cs_6_0 +//TEST(compute, vulkan):COMPARE_COMPUTE_EX:-vk -compute +//TEST(compute):COMPARE_COMPUTE_EX:-cuda -compute + +//TEST_INPUT:ubuffer(data=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0], stride=4):out,name buffer +RWStructuredBuffer<int> buffer; + +#define THREAD_COUNT 4 +#define WRITE(IDX) buffer[IDX*THREAD_COUNT + tid] = WaveGetActiveMask() + +void test(int tid) +{ + WRITE(0); + if(tid & 1) + { + WRITE(1); + return; + } + else + { + WRITE(2); + } + WRITE(3); +} + +[numthreads(THREAD_COUNT, 1, 1)] +void computeMain(uint3 dispatchThreadID : SV_DispatchThreadID) +{ + test(dispatchThreadID.x); +}
\ No newline at end of file diff --git a/tests/hlsl-intrinsic/active-mask/if-early-exit.slang.expected.txt b/tests/hlsl-intrinsic/active-mask/if-early-exit.slang.expected.txt new file mode 100644 index 000000000..e4244443a --- /dev/null +++ b/tests/hlsl-intrinsic/active-mask/if-early-exit.slang.expected.txt @@ -0,0 +1,16 @@ +F +F +F +F +0 +A +0 +A +5 +0 +5 +0 +5 +0 +5 +0 diff --git a/tests/hlsl-intrinsic/active-mask/if-one-sided.slang b/tests/hlsl-intrinsic/active-mask/if-one-sided.slang new file mode 100644 index 000000000..760512863 --- /dev/null +++ b/tests/hlsl-intrinsic/active-mask/if-one-sided.slang @@ -0,0 +1,31 @@ +// if-one-sided.slang + +// Test active mask synthesis in the "easy case" of a one-sided `if`. + +//DISABLE_TEST(compute):COMPARE_COMPUTE_EX:-cpu -compute +//DISABLE_TEST(compute):COMPARE_COMPUTE_EX:-slang -compute +//TEST(compute):COMPARE_COMPUTE_EX:-slang -compute -dx12 -use-dxil -profile cs_6_0 +//TEST(compute, vulkan):COMPARE_COMPUTE_EX:-vk -compute +//TEST(compute):COMPARE_COMPUTE_EX:-cuda -compute + +//TEST_INPUT:ubuffer(data=[0 0 0 0 0 0 0 0 0 0 0 0], stride=4):out,name buffer +RWStructuredBuffer<int> buffer; + +#define THREAD_COUNT 4 +#define WRITE(IDX) buffer[IDX*THREAD_COUNT + tid] = WaveGetActiveMask() + +void test(int tid) +{ + WRITE(0); + if(tid & 1) + { + WRITE(1); + } + WRITE(2); +} + +[numthreads(THREAD_COUNT, 1, 1)] +void computeMain(uint3 dispatchThreadID : SV_DispatchThreadID) +{ + test(dispatchThreadID.x); +}
\ No newline at end of file diff --git a/tests/hlsl-intrinsic/active-mask/if-one-sided.slang.expected.txt b/tests/hlsl-intrinsic/active-mask/if-one-sided.slang.expected.txt new file mode 100644 index 000000000..637cfb771 --- /dev/null +++ b/tests/hlsl-intrinsic/active-mask/if-one-sided.slang.expected.txt @@ -0,0 +1,12 @@ +F +F +F +F +0 +A +0 +A +F +F +F +F diff --git a/tests/hlsl-intrinsic/active-mask/if.slang b/tests/hlsl-intrinsic/active-mask/if.slang new file mode 100644 index 000000000..b8aa30b03 --- /dev/null +++ b/tests/hlsl-intrinsic/active-mask/if.slang @@ -0,0 +1,35 @@ +// if.slang + +// Test active mask synthesis in the "easy case" of a two-sided `if`. + +//DISABLE_TEST(compute):COMPARE_COMPUTE_EX:-cpu -compute +//DISABLE_TEST(compute):COMPARE_COMPUTE_EX:-slang -compute +//TEST(compute):COMPARE_COMPUTE_EX:-slang -compute -dx12 -use-dxil -profile cs_6_0 +//TEST(compute, vulkan):COMPARE_COMPUTE_EX:-vk -compute +//TEST(compute):COMPARE_COMPUTE_EX:-cuda -compute + +//TEST_INPUT:ubuffer(data=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0], stride=4):out,name buffer +RWStructuredBuffer<int> buffer; + +#define THREAD_COUNT 4 +#define WRITE(IDX) buffer[IDX*THREAD_COUNT + tid] = WaveGetActiveMask() + +void test(int tid) +{ + WRITE(0); + if(tid & 1) + { + WRITE(1); + } + else + { + WRITE(2); + } + WRITE(3); +} + +[numthreads(THREAD_COUNT, 1, 1)] +void computeMain(uint3 dispatchThreadID : SV_DispatchThreadID) +{ + test(dispatchThreadID.x); +}
\ No newline at end of file diff --git a/tests/hlsl-intrinsic/active-mask/if.slang.expected.txt b/tests/hlsl-intrinsic/active-mask/if.slang.expected.txt new file mode 100644 index 000000000..4f9fabfa6 --- /dev/null +++ b/tests/hlsl-intrinsic/active-mask/if.slang.expected.txt @@ -0,0 +1,16 @@ +F +F +F +F +0 +A +0 +A +5 +0 +5 +0 +F +F +F +F diff --git a/tests/hlsl-intrinsic/active-mask/switch-no-default.slang b/tests/hlsl-intrinsic/active-mask/switch-no-default.slang new file mode 100644 index 000000000..8afc5ed68 --- /dev/null +++ b/tests/hlsl-intrinsic/active-mask/switch-no-default.slang @@ -0,0 +1,53 @@ +// switch-no-default.slang + +// Test active mask synthesis for a `switch` statement +// with no `default` case (such that unhandled values +// branch directly to the `break` target after the +// `switch`). + +//DISABLE_TEST(compute):COMPARE_COMPUTE_EX:-cpu -compute +//DISABLE_TEST(compute):COMPARE_COMPUTE_EX:-slang -compute +//TEST(compute):COMPARE_COMPUTE_EX:-slang -compute -dx12 -use-dxil -profile cs_6_0 -xslang -DHACK +//TEST(compute, vulkan):COMPARE_COMPUTE_EX:-vk -compute -xslang -DHACK +//TEST(compute):COMPARE_COMPUTE_EX:-cuda -compute + +//TEST_INPUT:ubuffer(data=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0], stride=4):out,name buffer +RWStructuredBuffer<int> buffer; + +#define THREAD_COUNT 4 +#define LOC_COUNT 5 +#define WRITE(LOC) buffer[tid + (LOC)*THREAD_COUNT] = 0xA0000000 | (tid << 24) | (LOC << 8) | WaveGetActiveMask() + +//TEST_INPUT:cbuffer(data=[0 1]):name C +cbuffer C +{ + int alwaysFalse; + int alwaysTrue; +} + +void test(int tid) +{ + switch(tid) + { + case 0: + WRITE(0); + break; + + case 1: + WRITE(1); + break; + + case 2: + WRITE(2); + break; + + // NOTE: no `default:` + } + WRITE(4); +} + +[numthreads(THREAD_COUNT, 1, 1)] +void computeMain(uint3 dispatchThreadID : SV_DispatchThreadID) +{ + test(dispatchThreadID.x); +}
\ No newline at end of file diff --git a/tests/hlsl-intrinsic/active-mask/switch-no-default.slang.expected.txt b/tests/hlsl-intrinsic/active-mask/switch-no-default.slang.expected.txt new file mode 100644 index 000000000..f529cc9b2 --- /dev/null +++ b/tests/hlsl-intrinsic/active-mask/switch-no-default.slang.expected.txt @@ -0,0 +1,20 @@ +A0000001 +0 +0 +0 +0 +A1000102 +0 +0 +0 +0 +A2000204 +0 +0 +0 +0 +0 +A000040F +A100040F +A200040F +A300040F diff --git a/tests/hlsl-intrinsic/active-mask/switch-trivial-fallthrough.slang b/tests/hlsl-intrinsic/active-mask/switch-trivial-fallthrough.slang new file mode 100644 index 000000000..ceb7d236d --- /dev/null +++ b/tests/hlsl-intrinsic/active-mask/switch-trivial-fallthrough.slang @@ -0,0 +1,53 @@ +// switch-trivial-fallthrough.slang + +// Test active mask synthesis for a `switch` statement +// that exhibits "trivial fall-through" from one `case` +// to another. + +//DISABLE_TEST(compute):COMPARE_COMPUTE_EX:-cpu -compute +//DISABLE_TEST(compute):COMPARE_COMPUTE_EX:-slang -compute +//TEST(compute):COMPARE_COMPUTE_EX:-slang -compute -dx12 -use-dxil -profile cs_6_0 -xslang -DHACK +//TEST(compute, vulkan):COMPARE_COMPUTE_EX:-vk -compute -xslang -DHACK + +// Note: this test is currently disabled on the CUDA +// target because we do not synthesize the active +// mask value we want/expect to see. 
+// +//DISABLE_TEST(compute):COMPARE_COMPUTE_EX:-cuda -compute + +//TEST_INPUT:ubuffer(data=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0], stride=4):out,name buffer +RWStructuredBuffer<int> buffer; + +#define THREAD_COUNT 4 +#define LOC_COUNT 5 +#define WRITE(LOC) buffer[tid + (LOC)*THREAD_COUNT] = 0xA0000000 | (tid << 24) | (LOC << 8) | WaveGetActiveMask() + +//TEST_INPUT:cbuffer(data=[0 1]):name C +cbuffer C +{ + int alwaysFalse; + int alwaysTrue; +} + +void test(int tid) +{ + switch(tid) + { + case 0: + case 1: + WRITE(0); + break; + + case 2: + default: + WRITE(2); + break; + } + WRITE(4); +} + +[numthreads(THREAD_COUNT, 1, 1)] +void computeMain(uint3 dispatchThreadID : SV_DispatchThreadID) +{ + test(dispatchThreadID.x); +}
\ No newline at end of file diff --git a/tests/hlsl-intrinsic/active-mask/switch-trivial-fallthrough.slang.expected.txt b/tests/hlsl-intrinsic/active-mask/switch-trivial-fallthrough.slang.expected.txt new file mode 100644 index 000000000..8d144548e --- /dev/null +++ b/tests/hlsl-intrinsic/active-mask/switch-trivial-fallthrough.slang.expected.txt @@ -0,0 +1,20 @@ +A0000003 +A1000003 +0 +0 +0 +0 +0 +0 +0 +0 +A200020C +A300020C +0 +0 +0 +0 +A000040F +A100040F +A200040F +A300040F diff --git a/tests/hlsl-intrinsic/active-mask/switch.slang b/tests/hlsl-intrinsic/active-mask/switch.slang new file mode 100644 index 000000000..cd959c59f --- /dev/null +++ b/tests/hlsl-intrinsic/active-mask/switch.slang @@ -0,0 +1,52 @@ +// switch.slang + +// Test active mask synthesis for a trivial `switch` statement + +//DISABLE_TEST(compute):COMPARE_COMPUTE_EX:-cpu -compute +//DISABLE_TEST(compute):COMPARE_COMPUTE_EX:-slang -compute +//TEST(compute):COMPARE_COMPUTE_EX:-slang -compute -dx12 -use-dxil -profile cs_6_0 -xslang -DHACK +//TEST(compute, vulkan):COMPARE_COMPUTE_EX:-vk -compute -xslang -DHACK +//TEST(compute):COMPARE_COMPUTE_EX:-cuda -compute + +//TEST_INPUT:ubuffer(data=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0], stride=4):out,name buffer +RWStructuredBuffer<int> buffer; + +#define THREAD_COUNT 4 +#define LOC_COUNT 5 +#define WRITE(LOC) buffer[tid + (LOC)*THREAD_COUNT] = 0xA0000000 | (tid << 24) | (LOC << 8) | WaveGetActiveMask() + +//TEST_INPUT:cbuffer(data=[0 1]):name C +cbuffer C +{ + int alwaysFalse; + int alwaysTrue; +} + +void test(int tid) +{ + switch(tid) + { + case 0: + WRITE(0); + break; + + case 1: + WRITE(1); + break; + + case 2: + WRITE(2); + break; + + default: + WRITE(3); + break; + } + WRITE(4); +} + +[numthreads(THREAD_COUNT, 1, 1)] +void computeMain(uint3 dispatchThreadID : SV_DispatchThreadID) +{ + test(dispatchThreadID.x); +}
\ No newline at end of file diff --git a/tests/hlsl-intrinsic/active-mask/switch.slang.expected.txt b/tests/hlsl-intrinsic/active-mask/switch.slang.expected.txt new file mode 100644 index 000000000..d661b0bcb --- /dev/null +++ b/tests/hlsl-intrinsic/active-mask/switch.slang.expected.txt @@ -0,0 +1,20 @@ +A0000001 +0 +0 +0 +0 +A1000102 +0 +0 +0 +0 +A2000204 +0 +0 +0 +0 +A3000308 +A000040F +A100040F +A200040F +A300040F |
