From 1241006b6d89cae09766ca9795187ef9c0dd2085 Mon Sep 17 00:00:00 2001
From: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com>
Date: Mon, 26 Feb 2024 19:09:09 -0500
Subject: Partially implement shader_subgroup extension(s); Partially resolves #3548 (#3580)

* Partially implement, with tests, functions and built-in variables that are part of GL_KHR_shader_subgroup; Partially resolves #3548

GL_KHR_shader_subgroup implemented based on https://github.com/KhronosGroup/GLSL/blob/main/extensions/khr/GL_KHR_shader_subgroup.txt

The implementation is broken down into separate GLSL extensions due to the ***large differences*** in the implementation, functionality, and testing of each section.

GL_KHR_shader_subgroup_basic{
**Partially implemented**

Implementation:
* All 9 built-in variables have been stubbed without proper values; these system variables still need to be implemented; related to #411.
* Functions were reimplemented, despite nearly identical HLSL functions existing, because:
  * hlsl.meta implementations target workgroups rather than a warp/wave/subgroup:
    * `__syncwarp` vs `__syncthreads`
    * `SubgroupMemory` vs `WorkgroupMemory`
    * etc.
  * hlsl.meta implementations target broader SPIR-V memory targets to block on:
    * ImageMemory|UniformMemory, versus SPIR-V specifying barriers for ImageMemory and, separately, an option for UniformMemory
* `subgroupElect` for CUDA has a different implementation than `WaveIsFirstLane`. This is because the spec states that `subgroupElect()` returns true only for the active invocation with the lowest gl_SubgroupInvocationID; therefore we must fetch the current active mask, even if some invocations are turned off by branches

Testing:
tests for the variables -- `tests/glsl/shader-subgroup-built-in-variables.slang`
* these tests do not test functionality, since the variables are not implemented yet
tests for the functions -- `tests/glsl/shader-subgroup-basic.slang`
* concurrency is tested for SubgroupMemory and UniformMemory by attempting to create a GPU-side race condition when writing and reading memory
* due to the testing tools available, there are no tests for ImageMemory
* subgroupElect is tested to return invocation #0, the lowest invocation that will always run; the wave size is 32, therefore #0 is always active and will always be the elected invocation.
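A host-side sketch of the `subgroupElect` rule noted above: the elected invocation is the one whose bit is the lowest set bit of the active mask. `electModel` and its parameters are hypothetical names used only for illustration; this models the logic in plain C++ and is not the code added to the .meta files. On CUDA the mask would presumably come from `__activemask()`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: returns true when `lane` is the lowest active lane
// in a 32-lane subgroup described by `activeMask` (bit i set = lane i active).
inline bool electModel(std::uint32_t activeMask, std::uint32_t lane)
{
    assert(lane < 32u); // this model only covers 32-lane subgroups

    // mask & (~mask + 1) isolates the lowest set bit (0 if no lane is active).
    const std::uint32_t lowest = activeMask & (~activeMask + 1u);
    return lowest != 0u && lowest == (1u << lane);
}
```

With a full mask lane 0 is elected, which matches the test note above; with lanes 0 to 2 branched off (mask `0b11000`) lane 3 is elected instead.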
}

GL_KHR_shader_subgroup_vote{
**Fully implemented**

Implementation:
* 3/3 functions use the hlsl.meta implementation

Testing: `tests/glsl/shader-subgroup-vote.slang`
* Each function is tested with a positive (returns true) and a negative (returns false) test case to ensure vote results are correct
}

GL_KHR_shader_subgroup_ballot{
**Partially implemented**

Implementation:
All 10/10 functions are implemented:
* 3 use the hlsl.meta implementation
* 7 use new implementations -- these only support GLSL, SPIR-V, HLSL, CUDA
  * These implementations did not exist in hlsl.meta, so they were added
  * `subgroupInverseBallot` lacks an analogous function to call; this feature was emulated:
    * in CUDA, by relying on warps having 32 lanes (so the ballot fits in 32 bits) and lanes being 0-indexed; this implies that `(ballotResult >> YOUR_INVOCATION) & 1` checks whether your invocation's bit is set. For example, in `(0b11001 >> 3) & 1` the ballot `0b11001` means that only the fifth, fourth, and first invocations are set, and 3 means that `YOUR_INVOCATION` is the fourth invocation in the subgroup. `(0b11001 >> 3) & 1` returns true, since your bit is set and the expression evaluates to `0b11 & 0b1`
    * in HLSL, by testing whether the wave lane count is 32 or less (in which case the same logic as CUDA is used); otherwise, find the vector component that `YOUR_INVOCATION` corresponds to, where each component holds 32 bits (32 lanes), avoiding division in the process, then run the same algorithm CUDA employs.
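A host-side sketch of the inverse-ballot bit test described above, assuming the ballot is stored as four 32-bit words (as in a GLSL `uvec4`). `Ballot` and `inverseBallotModel` are hypothetical names; the word index uses a shift instead of a division, which is one way to read "avoid division" and not necessarily what the commit does.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of a ballot: four 32-bit words, 128 lanes at most.
using Ballot = std::array<std::uint32_t, 4>;

// Returns true when the bit belonging to `lane` is set in `ballot`.
inline bool inverseBallotModel(const Ballot& ballot, std::uint32_t lane)
{
    assert(lane < 128u); // four words of 32 bits each

    // lane >> 5 selects the word (lane / 32 without a division);
    // lane & 31 selects the bit inside that word.
    const std::uint32_t word = ballot[lane >> 5];
    return ((word >> (lane & 31u)) & 1u) != 0u;
}
```

For subgroups of 32 lanes or fewer only the first word is used, and the expression reduces to the `(ballotResult >> YOUR_INVOCATION) & 1` form from the CUDA note.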
  * `subgroupBallotBitExtract` is logically the same as `subgroupInverseBallot`
  * 5 functions do not have a CUDA, HLSL, or CPP implementation yet (subgroupBallotFindMSB, subgroupBallotFindLSB, subgroupBallotExclusiveBitCount, subgroupBallotInclusiveBitCount, subgroupBallotBitCount) because they are out of scope for the commit

Testing: `tests/glsl/shader-subgroup-ballot.slang`
* the test checks each ballot function against an expected value; tests pass ballots with more than 32 set bits as function parameters to ensure the implementation correctly handles values up to the subgroup invocation count, as per the extension specification (otherwise the functionality is fairly trivial to test)
}

GL_KHR_shader_subgroup_arithmetic{
**Partially implemented**

Implementation:
* There are 21 functions to implement:
  * 14 functions use the hlsl.meta implementation
  * 7 functions are new implementations -- only implemented for GLSL and SPIR-V
    * GLSL & SPIR-V both use their related functions, no emulation required
    * CUDA, CPP, HLSL are out of scope for the commit

Testing: `tests/glsl/shader-subgroup-arithmetic.slang`
* all tests silently kill the shader; the emitted GLSL was checked and no issue could be found
* these tests only check basic functionality and correctness of all functions implemented; not an exhaustive test [further continued in "Other notes of worthy" at end of commit]
}

GL_KHR_shader_subgroup_shuffle{
**Partially implemented**

Implementation:
* There are 2 functions to implement:
  * 1 function uses the existing hlsl.meta implementation
  * 1 function uses a new implementation (subgroupShuffleXor) -- only implemented for GLSL & SPIR-V
    * GLSL & SPIR-V both use their related functions, no emulation required

Testing: `tests/glsl/shader-subgroup-shuffle.slang`
* these tests only check basic functionality and correctness of all functions implemented; not an exhaustive test [further continued in "Other notes of worthy" at end of commit]
* tests fail with cpp due to `kIROp_WaveGetActiveMask` failing to be called
}

GL_KHR_shader_subgroup_shuffle_relative{
**Partially implemented**

Implementation:
* There are 2 functions to implement:
  * both functions use a new implementation -- only implemented for GLSL & SPIR-V
    * GLSL & SPIR-V both use their related functions, no emulation required

Testing: `tests/glsl/shader-subgroup-shuffle-relative.slang`
* these tests only check basic functionality and correctness of all functions implemented; not an exhaustive test [further continued in "Other notes of worthy" at end of commit]
}

GL_KHR_shader_subgroup_clustered{
**Partially implemented**

Implementation:
* There are 7 functions to implement:
  * all 7 functions use a new implementation -- only implemented for GLSL & SPIR-V
    * GLSL & SPIR-V both use their related functions, no emulation required

Testing: `tests/glsl/shader-subgroup-shuffle-clustered.slang`
* these tests only check basic functionality and correctness of all functions implemented; not an exhaustive test [further continued in "Other notes of worthy" at end of commit]
}

GL_KHR_shader_subgroup_quad{
**Partially implemented**

Implementation:
* There are 4 functions to implement:
  * all 4 functions use hlsl.meta implementations -- only implemented for GLSL & SPIR-V & HLSL

Testing: `tests/glsl/shader-subgroup-shuffle-quad.slang`
* these tests only check basic functionality and correctness of all functions implemented; not an exhaustive test [further continued in "Other notes of worthy" at end of commit]
}

---------

Failing tests and why:

Note: due to system variables largely not being implemented for CUDA and CPP, these tests will fail (#3 and #4){
tests/glsl/shader-subgroup-arithmetic.slang.3
tests/glsl/shader-subgroup-arithmetic.slang.4
tests/glsl/shader-subgroup-ballot.slang.4
tests/glsl/shader-subgroup-basic.slang.3
tests/glsl/shader-subgroup-basic.slang.4
tests/glsl/shader-subgroup-quad.slang.3
tests/glsl/shader-subgroup-quad.slang.4
tests/glsl/shader-subgroup-vote.slang.3
tests/glsl/shader-subgroup-vote.slang.4
}

Note: due to kIROp_WaveGetActiveMask not being loaded for cpp, the following test will fail{
tests/glsl/shader-subgroup-shuffle.slang.4
}

Note: due to an unknown silent error, the following will fail [could not spot an error in the generated GLSL and SPIR-V]{
tests/glsl/shader-subgroup-arithmetic.slang.5 (vk)
tests/glsl/shader-subgroup-arithmetic.slang.6 (vk)
}

Other notes of worthy:{
* only a few types are currently checked in tests because equality templates do not allow freely casting to int/uint, meaning that testing types en masse is not trivial; these tests will most likely be completely replaced once templates can cast & check equality more freely.
* did not implement vector types for any functions that may use them (mostly in reference to SPIR-V, since many may accept scalar or vector inputs); applicable to subgroup-shuffle, subgroup-shuffle-relative, subgroup-arithmetic, subgroup_clustered, subgroup_quad
* did not implement checks for half floats
* CUDA, CPP, HLSL implementations were largely out of scope; where one is not implemented, this is due to the implementation not being trivial
}

Random fixes encountered:{
* hlsl.meta incorrectly sets `OpCapability` to `GroupNonUniformBallot` when the `OpCapability` should be `GroupNonUniformVote`; this is as per the SPIR-V spec for all SPIR-V calls used in `GL_KHR_shader_subgroup_vote`: https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpGroupNonUniformAll
}

* added vector types and tests; Partially implement, with tests, functions and built-in variables that are part of GL_KHR_shader_subgroup; Partially resolves #3548

GL_KHR_shader_subgroup implemented based on https://github.com/KhronosGroup/GLSL/blob/main/extensions/khr/GL_KHR_shader_subgroup.txt

GL_KHR_shader_subgroup_* & GLSL ref:
* https://github.com/KhronosGroup/GLSL/blob/main/extensions/khr/GL_KHR_shader_subgroup.txt
* https://www.khronos.org/blog/vulkan-subgroup-tutorial
* https://www.khronos.org/assets/uploads/developers/library/2018-vulkan-devday/06-subgroups.pdf

HLSL ref:
* https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-intrinsic-functions
* https://github.com/Microsoft/DirectXShaderCompiler/wiki/Wave-Intrinsics

CUDA ref:
* https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html

SPIR-V ref:
* https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_memory_semantics_id

The implementation is broken down into separate GLSL extensions due to the ***large differences*** in the implementation, functionality, and testing of each section.

GL_KHR_shader_subgroup_basic{
**Partially implemented**

Implementation:
* All 9 built-in variables have been stubbed without proper values; these system variables still need to be implemented; related to #411.
* Functions were reimplemented, despite nearly identical HLSL functions existing, because:
  * hlsl.meta implementations target workgroups rather than a warp/wave/subgroup:
    * `__syncwarp` vs `__syncthreads`
    * `SubgroupMemory` vs `WorkgroupMemory`
    * etc.
  * hlsl.meta implementations target broader SPIR-V memory targets to block on:
    * ImageMemory|UniformMemory, versus SPIR-V specifying barriers for ImageMemory and, separately, an option for UniformMemory
* `subgroupElect` for CUDA has a different implementation than `WaveIsFirstLane`. This is because the spec states that `subgroupElect()` returns true only for the active invocation with the lowest gl_SubgroupInvocationID; therefore we must fetch the current active mask, even if some invocations are turned off by branches

Testing:
tests for the variables -- `tests/glsl/shader-subgroup-built-in-variables.slang`
* these tests do not test functionality, since the variables are not implemented yet
tests for the functions -- `tests/glsl/shader-subgroup-basic.slang`
* concurrency is tested for SubgroupMemory and UniformMemory by attempting to create a GPU-side race condition when writing and reading memory
* due to the testing tools available, there are no tests for ImageMemory
* subgroupElect is tested to return invocation #0, the lowest invocation that will always run; the wave size is 32, therefore #0 is always active and will always be the elected invocation.
}

GL_KHR_shader_subgroup_vote{
**Fully implemented**

Implementation:
* 3/3 functions use the hlsl.meta implementation

Testing: `tests/glsl/shader-subgroup-vote.slang`
* Each function is tested with a positive (returns true) and a negative (returns false) test case to ensure vote results are correct
}

GL_KHR_shader_subgroup_ballot{
**Partially implemented**

Implementation:
All 10/10 functions are implemented:
* 3 use the hlsl.meta implementation
* 7 use new implementations -- these only support GLSL, SPIR-V, HLSL, CUDA
  * These implementations did not exist in hlsl.meta, so they were added
  * `subgroupInverseBallot` lacks an analogous function to call; this feature was emulated:
    * in CUDA, by relying on warps having 32 lanes (so the ballot fits in 32 bits) and lanes being 0-indexed; this implies that `(ballotResult >> YOUR_INVOCATION) & 1` checks whether your invocation's bit is set. For example, in `(0b11001 >> 3) & 1` the ballot `0b11001` means that only the fifth, fourth, and first invocations are set, and 3 means that `YOUR_INVOCATION` is the fourth invocation in the subgroup. `(0b11001 >> 3) & 1` returns true, since your bit is set and the expression evaluates to `0b11 & 0b1`
    * in HLSL, by testing whether the wave lane count is 32 or less (in which case the same logic as CUDA is used); otherwise, find the vector component that `YOUR_INVOCATION` corresponds to, where each component holds 32 bits (32 lanes), avoiding division in the process, then run the same algorithm CUDA employs.
  * `subgroupBallotBitExtract` is logically the same as `subgroupInverseBallot`
  * 5 functions do not have a CUDA, HLSL, or CPP implementation yet (subgroupBallotFindMSB, subgroupBallotFindLSB, subgroupBallotExclusiveBitCount, subgroupBallotInclusiveBitCount, subgroupBallotBitCount) because they are out of scope for the commit

Testing: `tests/glsl/shader-subgroup-ballot.slang`
* the test checks each ballot function against an expected value; tests pass ballots with more than 32 set bits as function parameters to ensure the implementation correctly handles values up to the subgroup invocation count, as per the extension specification (otherwise the functionality is fairly trivial to test)
}

GL_KHR_shader_subgroup_arithmetic{
**Partially implemented**

Implementation:
* There are 21 functions to implement:
  * 14 functions use the hlsl.meta implementation
  * 7 functions are new implementations -- only implemented for GLSL and SPIR-V
    * GLSL & SPIR-V both use their related functions, no emulation required
    * CUDA, CPP, HLSL are out of scope for the commit

Testing: `tests/glsl/shader-subgroup-arithmetic.slang`
* all tests silently kill the shader; the emitted GLSL was checked and no issue could be found
* these tests only check basic functionality and correctness of all functions implemented; [further continued in "Other notes of worthy" at end of commit]
}

GL_KHR_shader_subgroup_shuffle{
**Partially implemented**

Implementation:
* There are 2 functions to implement:
  * 1 function uses the existing hlsl.meta implementation
  * 1 function uses a new implementation (subgroupShuffleXor) -- only implemented for GLSL & SPIR-V
    * GLSL & SPIR-V both use their related functions, no emulation required

Testing: `tests/glsl/shader-subgroup-shuffle.slang`
* these tests only check basic functionality and correctness of all functions implemented; [further continued in "Other notes of worthy" at end of commit]
* tests fail with cpp due to `kIROp_WaveGetActiveMask` failing to be called
}

GL_KHR_shader_subgroup_shuffle_relative{
**Partially implemented**

Implementation:
* There are 2 functions to implement:
  * both functions use a new implementation -- only implemented for GLSL & SPIR-V
    * GLSL & SPIR-V both use their related functions, no emulation required

Testing: `tests/glsl/shader-subgroup-shuffle-relative.slang`
* these tests only check basic functionality and correctness of all functions implemented; [further continued in "Other notes of worthy" at end of commit]
}

GL_KHR_shader_subgroup_clustered{
**Partially implemented**

Implementation:
* There are 7 functions to implement:
  * all 7 functions use a new implementation -- only implemented for GLSL & SPIR-V
    * GLSL & SPIR-V both use their related functions, no emulation required

Testing: `tests/glsl/shader-subgroup-shuffle-clustered.slang`
* these tests only check basic functionality and correctness of all functions implemented; [further continued in "Other notes of worthy" at end of commit]
}

GL_KHR_shader_subgroup_quad{
**Partially implemented**

Implementation:
* There are 4 functions to implement:
  * all 4 functions use hlsl.meta implementations -- only implemented for GLSL & SPIR-V & HLSL

Testing: `tests/glsl/shader-subgroup-shuffle-quad.slang`
* these tests only check basic functionality and correctness of all functions implemented; [further continued in "Other notes of worthy" at end of commit]
}

---------

Failing tests and why:

Note: test numbers assume none of the existing tests are toggled off

Note: due to system variables largely not being implemented for CUDA and CPP, these tests will fail (#3 and #4){
tests/glsl/shader-subgroup-arithmetic.slang.3
tests/glsl/shader-subgroup-arithmetic.slang.4
tests/glsl/shader-subgroup-ballot.slang.4
tests/glsl/shader-subgroup-basic.slang.3
tests/glsl/shader-subgroup-basic.slang.4
tests/glsl/shader-subgroup-quad.slang.3
tests/glsl/shader-subgroup-quad.slang.4
tests/glsl/shader-subgroup-vote.slang.3
tests/glsl/shader-subgroup-vote.slang.4
}

Note: due to kIROp_WaveGetActiveMask not being loaded for cpp, the following tests will fail{
tests/glsl/shader-subgroup-shuffle.slang.4
tests/glsl/shader-subgroup-shuffle-relative.slang.4
tests/glsl/shader-subgroup-basic.slang.4
}

Note: due to an unknown silent error, the following will fail [could not spot an error in the generated GLSL and SPIR-V]{
tests/glsl/shader-subgroup-arithmetic.slang.5 (vk)
tests/glsl/shader-subgroup-arithmetic.slang.6 (vk)
}

Other notes of worthy:{
* only a few types are currently checked in the arithmetic test; this is due to the test silently failing, meaning I can't actually test anything implemented
* did not implement checks for half floats
* CUDA, CPP, HLSL implementations were largely out of scope and not implemented; this is due to the implementation being non-trivial for many functions
}

Random fixes encountered:{
* hlsl.meta incorrectly sets `OpCapability` to `GroupNonUniformBallot` when the `OpCapability` should be `GroupNonUniformVote`; this is as per the SPIR-V spec for all SPIR-V calls used in `GL_KHR_shader_subgroup_vote`: https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpGroupNonUniformAll
}

* Partially implement, with tests, functions and built-in variables that are part of GL_KHR_shader_subgroup; Partially resolves #3548

GL_KHR_shader_subgroup implemented based on https://github.com/KhronosGroup/GLSL/blob/main/extensions/khr/GL_KHR_shader_subgroup.txt

GL_KHR_shader_subgroup_* & GLSL ref:
* https://github.com/KhronosGroup/GLSL/blob/main/extensions/khr/GL_KHR_shader_subgroup.txt
* https://www.khronos.org/blog/vulkan-subgroup-tutorial
* https://www.khronos.org/assets/uploads/developers/library/2018-vulkan-devday/06-subgroups.pdf

HLSL ref:
* https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-intrinsic-functions
* https://github.com/Microsoft/DirectXShaderCompiler/wiki/Wave-Intrinsics

CUDA ref:
* https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html

SPIR-V ref:
* https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_memory_semantics_id

The implementation is broken down into separate GLSL extensions due to the ***large differences*** in the implementation, functionality, and testing of each section.

GL_KHR_shader_subgroup_basic{
**Partially implemented**

Implementation:
* All 9 built-in variables have been stubbed without proper values; these system variables still need to be implemented; related to #411.
* Functions were reimplemented, despite nearly identical HLSL functions existing, because:
  * hlsl.meta implementations target workgroups rather than a warp/wave/subgroup:
    * `__syncwarp` vs `__syncthreads`
    * `SubgroupMemory` vs `WorkgroupMemory`
    * etc.
  * hlsl.meta implementations target broader SPIR-V memory targets to block on:
    * ImageMemory|UniformMemory, versus SPIR-V specifying barriers for ImageMemory and, separately, an option for UniformMemory
* `subgroupElect` for CUDA has a different implementation than `WaveIsFirstLane`. This is because the spec states that `subgroupElect()` returns true only for the active invocation with the lowest gl_SubgroupInvocationID; therefore we must fetch the current active mask, even if some invocations are turned off by branches

Testing:
tests for the variables -- `tests/glsl/shader-subgroup-built-in-variables.slang`
* these tests do not test functionality, since the variables are not implemented yet
tests for the functions -- `tests/glsl/shader-subgroup-basic.slang`
* concurrency is tested for SubgroupMemory and UniformMemory by attempting to create a GPU-side race condition when writing and reading memory
* due to the testing tools available, there are no tests for ImageMemory
* subgroupElect is tested to return invocation #0, the lowest invocation that will always run; the wave size is 32, therefore #0 is always active and will always be the elected invocation.
}

GL_KHR_shader_subgroup_vote{
**Fully implemented**

Implementation:
* 3/3 functions use the hlsl.meta implementation

Testing: `tests/glsl/shader-subgroup-vote.slang`
* Each function is tested with a positive (returns true) and a negative (returns false) test case to ensure vote results are correct
}

GL_KHR_shader_subgroup_ballot{
**Partially implemented**

Implementation:
All 10/10 functions are implemented:
* 3 use the hlsl.meta implementation
* 7 use new implementations -- these only support GLSL, SPIR-V, HLSL, CUDA
  * These implementations did not exist in hlsl.meta, so they were added
  * `subgroupInverseBallot` lacks an analogous function to call; this feature was emulated:
    * in CUDA, by relying on warps having 32 lanes (so the ballot fits in 32 bits) and lanes being 0-indexed; this implies that `(ballotResult >> YOUR_INVOCATION) & 1` checks whether your invocation's bit is set. For example, in `(0b11001 >> 3) & 1` the ballot `0b11001` means that only the fifth, fourth, and first invocations are set, and 3 means that `YOUR_INVOCATION` is the fourth invocation in the subgroup. `(0b11001 >> 3) & 1` returns true, since your bit is set and the expression evaluates to `0b11 & 0b1`
    * in HLSL, by testing whether the wave lane count is 32 or less (in which case the same logic as CUDA is used); otherwise, find the vector component that `YOUR_INVOCATION` corresponds to, where each component holds 32 bits (32 lanes), avoiding division in the process, then run the same algorithm CUDA employs.
  * `subgroupBallotBitExtract` is logically the same as `subgroupInverseBallot`
  * 5 functions do not have a CUDA, HLSL, or CPP implementation yet (subgroupBallotFindMSB, subgroupBallotFindLSB, subgroupBallotExclusiveBitCount, subgroupBallotInclusiveBitCount, subgroupBallotBitCount) because they are out of scope for the commit

Testing: `tests/glsl/shader-subgroup-ballot.slang`
* the test checks each ballot function against an expected value; tests pass ballots with more than 32 set bits as function parameters to ensure the implementation correctly handles values up to the subgroup invocation count, as per the extension specification (otherwise the functionality is fairly trivial to test)
}

GL_KHR_shader_subgroup_arithmetic{
**Partially implemented**

Implementation:
* There are 21 functions to implement:
  * 14 functions use the hlsl.meta implementation
  * 7 functions are new implementations -- only implemented for GLSL and SPIR-V
    * GLSL & SPIR-V both use their related functions, no emulation required
    * CUDA, CPP, HLSL are out of scope for the commit

Testing: `tests/glsl/shader-subgroup-arithmetic.slang`
* all tests silently kill the shader; the emitted GLSL was checked and no issue could be found
* these tests only check basic functionality and correctness of all functions implemented; [further continued in "Other notes of worthy" at end of commit]
}

GL_KHR_shader_subgroup_shuffle{
**Partially implemented**

Implementation:
* There are 2 functions to implement:
  * 1 function uses the existing hlsl.meta implementation
  * 1 function uses a new implementation (subgroupShuffleXor) -- only implemented for GLSL & SPIR-V
    * GLSL & SPIR-V both use their related functions, no emulation required

Testing: `tests/glsl/shader-subgroup-shuffle.slang`
* these tests only check basic functionality and correctness of all functions implemented; [further continued in "Other notes of worthy" at end of commit]
* tests fail with cpp due to `kIROp_WaveGetActiveMask` failing to be called
}

GL_KHR_shader_subgroup_shuffle_relative{
**Partially implemented**

Implementation:
* There are 2 functions to implement:
  * both functions use a new implementation -- only implemented for GLSL & SPIR-V
    * GLSL & SPIR-V both use their related functions, no emulation required

Testing: `tests/glsl/shader-subgroup-shuffle-relative.slang`
* these tests only check basic functionality and correctness of all functions implemented; [further continued in "Other notes of worthy" at end of commit]
}

GL_KHR_shader_subgroup_clustered{
**Partially implemented**

Implementation:
* There are 7 functions to implement:
  * all 7 functions use a new implementation -- only implemented for GLSL & SPIR-V
    * GLSL & SPIR-V both use their related functions, no emulation required

Testing: `tests/glsl/shader-subgroup-shuffle-clustered.slang`
* these tests only check basic functionality and correctness of all functions implemented; [further continued in "Other notes of worthy" at end of commit]
}

GL_KHR_shader_subgroup_quad{
**Partially implemented**

Implementation:
* There are 4 functions to implement:
  * all 4 functions use hlsl.meta implementations -- only implemented for GLSL & SPIR-V & HLSL

Testing: `tests/glsl/shader-subgroup-shuffle-quad.slang`
* these tests only check basic functionality and correctness of all functions implemented; [further continued in "Other notes of worthy" at end of commit]
}

---------

Failing tests and why:

Note: test numbers assume none of the existing tests are toggled off

Note: due to system variables largely not being implemented for CUDA and CPP, these tests will fail (#3 and #4){
tests/glsl/shader-subgroup-arithmetic.slang.3
tests/glsl/shader-subgroup-arithmetic.slang.4
tests/glsl/shader-subgroup-ballot.slang.4
tests/glsl/shader-subgroup-basic.slang.3
tests/glsl/shader-subgroup-basic.slang.4
tests/glsl/shader-subgroup-quad.slang.3
tests/glsl/shader-subgroup-quad.slang.4
tests/glsl/shader-subgroup-vote.slang.3
tests/glsl/shader-subgroup-vote.slang.4
}

Note: due to kIROp_WaveGetActiveMask not being loaded for cpp, the following tests will fail{
tests/glsl/shader-subgroup-shuffle.slang.4
tests/glsl/shader-subgroup-shuffle-relative.slang.4
tests/glsl/shader-subgroup-basic.slang.4
}

Other notes of worthy:{
* added a preamble function and macros for implementing subgroup functionality (and tests) to make it possible to iterate on the functionality with reasonable effort in the future
* CUDA, CPP, HLSL implementations were largely out of scope and not implemented; this is due to the implementation being non-trivial for many functions
* doubles cause a silent crash on most subgroup functions tested (silent shader hang)
* __requireGLSLExtension does not work as intended inside glsl.meta; as a result half, int16, int64, and int8 are all omitted from testing
}

Random fixes encountered:{
* hlsl.meta incorrectly sets `OpCapability` to `GroupNonUniformBallot` when the `OpCapability` should be `GroupNonUniformVote`; this is as per the SPIR-V spec for all SPIR-V calls used in `GL_KHR_shader_subgroup_vote`: https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpGroupNonUniformAll
* hlsl.meta incorrectly uses OpGroupNonUniformBitwiseAnd instead of OpGroupNonUniformBitwiseOr for WaveMaskPrefixBitOr (SPIR-V); this was fixed
}

* redesign tests following suggestions that they should be smaller, more maintainable, and test as much data as reasonably possible (balanced against fast iterations);

  optional double testing
  varying parameter testing
  most tests chain results now

* fix missing impl and merge conflict resolutions

* redundant test code cleanup and organization

  move tests to proper location (glsl-intrinsic)
  clean up redundant code (input buffers)

* add missing logical operands support (and remove hlsl/cuda code reuse due to the functional differences) under all And, Or, Xor ops

  redesign tests to conform to a better testing paradigm

* testing code style change to not use white space as a toggle for tests

* provided crash reason for doubles (Intel Iris GPUs crash in GLSL with doubles due to missing support in device caps [as per Vulkan validation layer])

  uncommented the `__requireGLSLExtension` code so that once it is fixed int16/8/64/half will work with subgroup without requiring future intervention

* fixing some vk validation layer errors (OpMemoryBarrier, Shuffle operations)

  modified style of tests; removed redundancy (extra code that does nothing); fixed some incorrect run targets; added error reasons for all encountered problems (and if needed, a #define/#if toggle)

* stop commenting out important tests; use a #define around the broken feature of extended shader_subgroup types instead

* removed macros inside glsl.meta

  removed erroneous __target_switch to directly call hlsl.meta function
  added elaboration on the problem with __requireGLSLExtension
  changed WaveMaskPrefixBit[or|and|xor] to support the expected type of only as per `HLSL Shader Model 6.5` specs
  removed "precision highp" since it does not affect tests

* changed some of the hlsl.meta functions used to more appropriate ones (as suggested)

  WaveMask -> WaveActive.*
  WaveMaskPrefix.* -> WavePrefix.*
  remove __target_switch cases for unimplemented cases of intrinsics
  fix _getLaneId() being removed by some regex used earlier

* fix usage of __target_intrinsic instead of __intrinsic_asm; this would silently cause only the arguments to be emitted as the return value

  changed usage of `__requireGLSLExtension` because it now causes a crash from the missing intrinsic (instead of a silent error)

* fix shader subgroup extended types support for GLSL and SPIR-V:

  1. separate the intrinsic/__requireGLSL generating functionality of shader_subgroup_preamble into child function calls, because otherwise `__requireGLSLExtension` is ignored if the calling function of shader_subgroup_preamble calls an `__intrinsic_asm`
  2. fixed HLSL.meta logic for wave operations (Add, Mul, exclusiveAdd, exclusiveMul) to no longer cast the input type T into a uint, due to cost-of-op & crash.
     * An int8_t bit cast into uint32_t crashed the compiler. As per the SPIR-V spec, OpGroupNonUniformI.* work on uint and int types, meaning the function has no need to cast to a uint.
  3. removed erroneous __target_switch for subgroupShuffle

* 1. ignore tests gracefully
  2. remove unneeded SPIR-V capability specifying (with OpCapability)
  3. clean up structure of typeRequireChecks_shader_subgroup_GLSL
  4. explain why HLSL/CUDA are not targeted for shader-subgroup-arithmetic.slang

* syntax changes + `property` declaration fix + builtin var glsl implementation + changed incorrect HLSL.meta assumptions

  (#1) `property` declaration as *non member* implementation change/fix (all of the changes to `slang-lower-to-ir.cpp`)

  using (#1), implemented subgroup builtins for GLSL/SPIR-V; did not implement builtins completely for HLSL/CUDA due to non-trivial implementations.
CPP has no implementation due to missing support of system values changed some incorrect HLSL.meta subgroup implementation assumptions of type usage (bit casting 8bit->32bit, wrong capabilities causing errors) dumping ast crash with spir-v when using builtin's fixed by adding the `builtin` spirv case (all of the changes to `slang-ast-dump.cpp`) [ForceInline] addition to functions missing it return instead of spirv_asm when empty blocks are used * syntax & organization of tests adjustment (specifically how if'def's are managed) * figuring out where ci fails * figuring out where ci fails -- testing with enclusive & regular * testing CI with exclusive, regular, inclusive * remove unneeded white space test CI inconsistency issues further with arithmetic.slang * testing if the ci run fails due to some timeout/recovery issue * split up arithmetic tests and push to test with CI --------- Co-authored-by: Yong He --- tests/glsl-intrinsic/intrinsic-texture.slang | 4 +- .../shader-subgroup-arithmetic_Exclusive.slang | 191 +++++++++++++++++++++ .../shader-subgroup-arithmetic_Inclusive.slang | 191 +++++++++++++++++++++ .../shader-subgroup-arithmetic_None.slang | 191 +++++++++++++++++++++ .../shader-subgroup/shader-subgroup-ballot.slang | 142 +++++++++++++++ .../shader-subgroup/shader-subgroup-basic.slang | 66 +++++++ .../shader-subgroup-builtin-variables.slang | 44 +++++ .../shader-subgroup-clustered.slang | 171 ++++++++++++++++++ .../shader-subgroup/shader-subgroup-quad.slang | 129 ++++++++++++++ .../shader-subgroup-shuffle-relative.slang | 121 +++++++++++++ .../shader-subgroup/shader-subgroup-shuffle.slang | 139 +++++++++++++++ .../shader-subgroup/shader-subgroup-vote.slang | 167 ++++++++++++++++++ 12 files changed, 1554 insertions(+), 2 deletions(-) create mode 100644 tests/glsl-intrinsic/shader-subgroup/shader-subgroup-arithmetic_Exclusive.slang create mode 100644 tests/glsl-intrinsic/shader-subgroup/shader-subgroup-arithmetic_Inclusive.slang create mode 100644 
tests/glsl-intrinsic/shader-subgroup/shader-subgroup-arithmetic_None.slang
 create mode 100644 tests/glsl-intrinsic/shader-subgroup/shader-subgroup-ballot.slang
 create mode 100644 tests/glsl-intrinsic/shader-subgroup/shader-subgroup-basic.slang
 create mode 100644 tests/glsl-intrinsic/shader-subgroup/shader-subgroup-builtin-variables.slang
 create mode 100644 tests/glsl-intrinsic/shader-subgroup/shader-subgroup-clustered.slang
 create mode 100644 tests/glsl-intrinsic/shader-subgroup/shader-subgroup-quad.slang
 create mode 100644 tests/glsl-intrinsic/shader-subgroup/shader-subgroup-shuffle-relative.slang
 create mode 100644 tests/glsl-intrinsic/shader-subgroup/shader-subgroup-shuffle.slang
 create mode 100644 tests/glsl-intrinsic/shader-subgroup/shader-subgroup-vote.slang

(limited to 'tests')

diff --git a/tests/glsl-intrinsic/intrinsic-texture.slang b/tests/glsl-intrinsic/intrinsic-texture.slang
index 3b42be715..591ced099 100644
--- a/tests/glsl-intrinsic/intrinsic-texture.slang
+++ b/tests/glsl-intrinsic/intrinsic-texture.slang
@@ -6,8 +6,8 @@
 //TEST:SIMPLE(filecheck=CHECK_CUDA): -allow-glsl -stage fragment -entry computeMain -target cuda
 
 // Disabling following targets because they are currently causing compile errors.
-//T-EST:SIMPLE(filecheck=CHECK_HLSL): -allow-glsl -stage fragment -entry computeMain -target hlsl
-//T-EST:SIMPLE(filecheck=CHECK_CPP): -allow-glsl -stage fragment -entry computeMain -target cpp
+//DISABLE_TEST:SIMPLE(filecheck=CHECK_HLSL): -allow-glsl -stage fragment -entry computeMain -target hlsl
+//DISABLE_TEST:SIMPLE(filecheck=CHECK_CPP): -allow-glsl -stage fragment -entry computeMain -target cpp
 
 // "Offset" family of texture functions in GLSL requires offset parameter to be a constant value.
 // It appears that slangc removes the constant-ness of constant values.
diff --git a/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-arithmetic_Exclusive.slang b/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-arithmetic_Exclusive.slang new file mode 100644 index 000000000..7bfc4d886 --- /dev/null +++ b/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-arithmetic_Exclusive.slang @@ -0,0 +1,191 @@ +//TEST:SIMPLE(filecheck=CHECK_GLSL): -allow-glsl -stage compute -entry computeMain -target glsl -DTARGET_GLSL +//TEST:SIMPLE(filecheck=CHECK_SPV): -allow-glsl -stage compute -entry computeMain -target spirv -emit-spirv-directly -DTARGET_SPIRV +//TEST:SIMPLE(filecheck=CHECK_HLSL): -allow-glsl -stage compute -entry computeMain -target hlsl -DTARGET_HLSL +//TEST:SIMPLE(filecheck=CHECK_CUDA): -allow-glsl -stage compute -entry computeMain -target cuda -DTARGET_CUDA + +// not testing cpp due to missing impl +//DISABLE_TEST:SIMPLE(filecheck=CHECK_CPP): -allow-glsl -stage compute -entry computeMain -target cpp -DTARGET_CPP + +//TEST(compute, vulkan):COMPARE_COMPUTE(filecheck-buffer=BUF):-vk -compute -entry computeMain -allow-glsl +//TEST(compute, vulkan):COMPARE_COMPUTE(filecheck-buffer=BUF):-vk -compute -entry computeMain -allow-glsl -emit-spirv-directly +#version 430 + +#if 1 \ + && !defined(TARGET_HLSL) \ + && !defined(TARGET_CUDA) +// hlsl does not treat boolean types with subgroup.* as a logical operator +// cuda is missing an implementation +#define TEST_when_logical_operators_are_implemented +#endif + +//TEST_INPUT:ubuffer(data=[0 0], stride=4):out,name=outputBuffer +buffer MyBlockName2 +{ + uint data[]; +} outputBuffer; + +#define local_size_x_v 4 +layout(local_size_x = local_size_x_v) in; + +__generic +bool test1Logical() { + return true +#if defined(TEST_when_logical_operators_are_implemented) + && subgroupExclusiveAnd(T(1)) == T(1) + && subgroupExclusiveOr(T(1)) == T(1) + && subgroupExclusiveXor(T(1)) == T(1) +#endif // #if defined(TEST_when_logical_operators_are_implemented) + ; +} + +__generic +bool testVLogical() { + 
typealias gvec = vector; + + return true +#if defined(TEST_when_logical_operators_are_implemented) + && subgroupExclusiveAnd(gvec(T(1))) == gvec(T(1)) + && subgroupExclusiveOr(gvec(T(1))) == gvec(T(1)) + && subgroupExclusiveXor(gvec(T(1))) == gvec(T(1)) +#endif // #if defined(TEST_when_logical_operators_are_implemented) + ; +} + +bool testLogical() { + return true + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + ; +} + +__generic +bool test1Arithmetic() { + return true + && subgroupExclusiveAdd(T(1)) == T(3) + && subgroupExclusiveMul(T(1)) == T(1) + && subgroupExclusiveMin(T(1)) == T(1) + && subgroupExclusiveMax(T(1)) == T(1) + ; +} +__generic +bool testVArithmetic() { + typealias gvec = vector; + + return true + && subgroupExclusiveAdd(gvec(T(1))) == gvec(T(3)) + && subgroupExclusiveMul(gvec(T(1))) == gvec(T(1)) + && subgroupExclusiveMin(gvec(T(1))) == gvec(T(1)) + && subgroupExclusiveMax(gvec(T(1))) == gvec(T(1)) + ; +} + +bool testArithmetic() { + return true + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() // WARNING: intel GPU's lack FP64 support + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && 
testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + ; +} + +void computeMain() +{ + + bool res0 = true + && testLogical() + ; + + bool res1 = true + && testArithmetic() + ; + + if (gl_LocalInvocationID.x == 3) { + // seperate so if there is an erroneous error the "major" + // tests are issolated into 2 branches without polluting the + // file with a bunch of individual test values + outputBuffer.data[0] = res0; + outputBuffer.data[1] = res1; + } + + // CHECK_GLSL: void main( + // CHECK_SPV: OpEntryPoint + // CHECK_HLSL: void computeMain( + // CHECK_CUDA: void computeMain( + // CHECK_CPP: void _computeMain( + // BUF: 1 + // BUF-NEXT: 1 +} diff --git a/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-arithmetic_Inclusive.slang b/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-arithmetic_Inclusive.slang new file mode 100644 index 000000000..09c6bdbdf --- /dev/null +++ b/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-arithmetic_Inclusive.slang @@ -0,0 +1,191 @@ +//TEST:SIMPLE(filecheck=CHECK_GLSL): -allow-glsl -stage compute -entry computeMain -target glsl -DTARGET_GLSL +//TEST:SIMPLE(filecheck=CHECK_SPV): -allow-glsl -stage compute -entry computeMain -target spirv -emit-spirv-directly -DTARGET_SPIRV +//TEST:SIMPLE(filecheck=CHECK_HLSL): -allow-glsl -stage compute -entry computeMain -target hlsl -DTARGET_HLSL 
+//TEST:SIMPLE(filecheck=CHECK_CUDA): -allow-glsl -stage compute -entry computeMain -target cuda -DTARGET_CUDA + +// not testing cpp due to missing impl +//DISABLE_TEST:SIMPLE(filecheck=CHECK_CPP): -allow-glsl -stage compute -entry computeMain -target cpp -DTARGET_CPP + +//TEST(compute, vulkan):COMPARE_COMPUTE(filecheck-buffer=BUF):-vk -compute -entry computeMain -allow-glsl +//TEST(compute, vulkan):COMPARE_COMPUTE(filecheck-buffer=BUF):-vk -compute -entry computeMain -allow-glsl -emit-spirv-directly +#version 430 + +#if 1 \ + && !defined(TARGET_HLSL) \ + && !defined(TARGET_CUDA) +// hlsl does not treat boolean types with subgroup.* as a logical operator +// cuda is missing an implementation +#define TEST_when_logical_operators_are_implemented +#endif + +//TEST_INPUT:ubuffer(data=[0 0], stride=4):out,name=outputBuffer +buffer MyBlockName2 +{ + uint data[]; +} outputBuffer; + +#define local_size_x_v 4 +layout(local_size_x = local_size_x_v) in; + +__generic +bool test1Logical() { + return true +#if defined(TEST_when_logical_operators_are_implemented) + && subgroupInclusiveAnd(T(1)) == T(1) + && subgroupInclusiveOr(T(1)) == T(1) + && subgroupInclusiveXor(T(1)) == T(0) +#endif // #if defined(TEST_when_logical_operators_are_implemented) + ; +} + +__generic +bool testVLogical() { + typealias gvec = vector; + + return true +#if defined(TEST_when_logical_operators_are_implemented) + && subgroupInclusiveAnd(gvec(T(1))) == gvec(T(1)) + && subgroupInclusiveOr(gvec(T(1))) == gvec(T(1)) + && subgroupInclusiveXor(gvec(T(1))) == gvec(T(0)) +#endif // #if defined(TEST_when_logical_operators_are_implemented) + ; +} + +bool testLogical() { + return true + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + && 
test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + ; +} + +__generic +bool test1Arithmetic() { + return true + && subgroupInclusiveAdd(T(1)) == T(4) + && subgroupInclusiveMul(T(1)) == T(1) + && subgroupInclusiveMin(T(1)) == T(1) + && subgroupInclusiveMax(T(1)) == T(1) + ; +} +__generic +bool testVArithmetic() { + typealias gvec = vector; + + return true + && subgroupInclusiveAdd(gvec(T(1))) == gvec(T(4)) + && subgroupInclusiveMul(gvec(T(1))) == gvec(T(1)) + && subgroupInclusiveMin(gvec(T(1))) == gvec(T(1)) + && subgroupInclusiveMax(gvec(T(1))) == gvec(T(1)) + ; +} + +bool testArithmetic() { + return true + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() // WARNING: intel GPU's lack FP64 support + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && 
testVArithmetic() + ; +} + +void computeMain() +{ + + bool res0 = true + && testLogical() + ; + + bool res1 = true + && testArithmetic() + ; + + if (gl_LocalInvocationID.x == 3) { + // seperate so if there is an erroneous error the "major" + // tests are issolated into 2 branches without polluting the + // file with a bunch of individual test values + outputBuffer.data[0] = res0; + outputBuffer.data[1] = res1; + } + + // CHECK_GLSL: void main( + // CHECK_SPV: OpEntryPoint + // CHECK_HLSL: void computeMain( + // CHECK_CUDA: void computeMain( + // CHECK_CPP: void _computeMain( + // BUF: 1 + // BUF-NEXT: 1 +} diff --git a/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-arithmetic_None.slang b/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-arithmetic_None.slang new file mode 100644 index 000000000..5300e6796 --- /dev/null +++ b/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-arithmetic_None.slang @@ -0,0 +1,191 @@ +//TEST:SIMPLE(filecheck=CHECK_GLSL): -allow-glsl -stage compute -entry computeMain -target glsl -DTARGET_GLSL +//TEST:SIMPLE(filecheck=CHECK_SPV): -allow-glsl -stage compute -entry computeMain -target spirv -emit-spirv-directly -DTARGET_SPIRV +//TEST:SIMPLE(filecheck=CHECK_HLSL): -allow-glsl -stage compute -entry computeMain -target hlsl -DTARGET_HLSL +//TEST:SIMPLE(filecheck=CHECK_CUDA): -allow-glsl -stage compute -entry computeMain -target cuda -DTARGET_CUDA + +// not testing cpp due to missing impl +//DISABLE_TEST:SIMPLE(filecheck=CHECK_CPP): -allow-glsl -stage compute -entry computeMain -target cpp -DTARGET_CPP + +//TEST(compute, vulkan):COMPARE_COMPUTE(filecheck-buffer=BUF):-vk -compute -entry computeMain -allow-glsl +//TEST(compute, vulkan):COMPARE_COMPUTE(filecheck-buffer=BUF):-vk -compute -entry computeMain -allow-glsl -emit-spirv-directly +#version 430 + +#if 1 \ + && !defined(TARGET_HLSL) \ + && !defined(TARGET_CUDA) +// hlsl does not treat boolean types with subgroup.* as a logical operator +// cuda is missing an implementation 
+#define TEST_when_logical_operators_are_implemented +#endif + +//TEST_INPUT:ubuffer(data=[0 0], stride=4):out,name=outputBuffer +buffer MyBlockName2 +{ + uint data[]; +} outputBuffer; + +#define local_size_x_v 4 +layout(local_size_x = local_size_x_v) in; + +__generic +bool test1Logical() { + return true +#if defined(TEST_when_logical_operators_are_implemented) + && subgroupAnd(T(1)) == T(1) + && subgroupOr(T(1)) == T(1) + && subgroupXor(T(1)) == T(0) +#endif // #if defined(TEST_when_logical_operators_are_implemented) + ; +} + +__generic +bool testVLogical() { + typealias gvec = vector; + + return true +#if defined(TEST_when_logical_operators_are_implemented) + && subgroupAnd(gvec(T(1))) == gvec(T(1)) + && subgroupOr(gvec(T(1))) == gvec(T(1)) + && subgroupXor(gvec(T(1))) == gvec(T(0)) +#endif // #if defined(TEST_when_logical_operators_are_implemented) + ; +} + +bool testLogical() { + return true + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + ; +} + +__generic +bool test1Arithmetic() { + return true + && subgroupAdd(T(1)) == T(local_size_x_v) // 32 + && subgroupMul(T(1)) == T(1) + && subgroupMin(T(1)) == T(1) + && subgroupMax(T(1)) == T(1) + ; +} +__generic +bool testVArithmetic() { + typealias gvec = vector; + + return true + && subgroupAdd(gvec(T(1))) == gvec(T(local_size_x_v)) // 32 + && 
subgroupMul(gvec(T(1))) == gvec(T(1)) + && subgroupMin(gvec(T(1))) == gvec(T(1)) + && subgroupMax(gvec(T(1))) == gvec(T(1)) + ; +} + +bool testArithmetic() { + return true + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() // WARNING: intel GPU's lack FP64 support + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + ; +} + +void computeMain() +{ + + bool res0 = true + && testLogical() + ; + + bool res1 = true + && testArithmetic() + ; + + if (gl_LocalInvocationID.x == 3) { + // seperate so if there is an erroneous error the "major" + // tests are issolated into 2 branches without polluting the + // file with a bunch of individual test values + outputBuffer.data[0] = res0; + outputBuffer.data[1] = res1; + } + + // CHECK_GLSL: void main( + // CHECK_SPV: OpEntryPoint + // CHECK_HLSL: void computeMain( + // CHECK_CUDA: void computeMain( + // CHECK_CPP: void _computeMain( + // BUF: 1 + // BUF-NEXT: 1 +} diff --git a/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-ballot.slang b/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-ballot.slang new file mode 100644 index 
000000000..8bbd60689 --- /dev/null +++ b/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-ballot.slang @@ -0,0 +1,142 @@ +//TEST:SIMPLE(filecheck=CHECK_GLSL): -allow-glsl -stage compute -entry computeMain -target glsl +//TEST:SIMPLE(filecheck=CHECK_SPV): -allow-glsl -stage compute -entry computeMain -target spirv -emit-spirv-directly +//TEST:SIMPLE(filecheck=CHECK_HLSL): -allow-glsl -stage compute -entry computeMain -target hlsl -DTARGET_HLSL + +// not testing cuda due to missing impl +//DISABLE_TEST:SIMPLE(filecheck=CHECK_CUDA): -allow-glsl -stage compute -entry computeMain -target cuda -DTARGET_CUDA +// not testing cpp due to missing impl +//DISABLE_TEST:SIMPLE(filecheck=CHECK_CPP): -allow-glsl -stage compute -entry computeMain -target cpp -DTARGET_CPP + +//TEST(compute, vulkan):COMPARE_COMPUTE(filecheck-buffer=BUF):-vk -compute -entry computeMain -allow-glsl +//TEST(compute, vulkan):COMPARE_COMPUTE(filecheck-buffer=BUF):-vk -compute -entry computeMain -allow-glsl -emit-spirv-directly +#version 430 + +// breaks on Nvidia GPU by returning 0 which is trivially wrong (works on Intel Iris Xe) +//#define TEST_when_glsl_subgroupBallotExclusiveBitCount_is_not_bugged + +//TEST_INPUT:ubuffer(data=[0 0], stride=4):out,name=outputBuffer +buffer MyBlockName2 +{ + uint data[]; +} outputBuffer; + +layout(local_size_x = 32) in; + +__generic +bool test1BroadcastX() { + return true + && subgroupBroadcast(T(1), 0) == T(1) + && subgroupBroadcastFirst(T(1)) == T(1) + ; +} +__generic +bool testVBroadcastX() { + typealias gvec = vector; + + return true + && subgroupBroadcast(gvec(T(1)), 0) == gvec(T(1)) + && subgroupBroadcastFirst(gvec(T(1))) == gvec(T(1)) + ; +} + +__generic +bool test1BroadcastX() { + return true + && subgroupBroadcast(T(1), 0) == T(1) + && subgroupBroadcastFirst(T(1)) == T(1) + ; +} +__generic +bool testVBroadcastX() { + typealias gvec = vector; + + return true + && subgroupBroadcast(gvec(T(1)), 0) == gvec(T(1)) + && subgroupBroadcastFirst(gvec(T(1))) == 
gvec(T(1)) + ; +} +bool testBroadcastX() { + return true + && test1BroadcastX() + && testVBroadcastX() + && testVBroadcastX() + && testVBroadcastX() + && test1BroadcastX() // WARNING: intel GPU's lack FP64 support + && testVBroadcastX() + && testVBroadcastX() + && testVBroadcastX() + && test1BroadcastX() + && testVBroadcastX() + && testVBroadcastX() + && testVBroadcastX() + && test1BroadcastX() + && testVBroadcastX() + && testVBroadcastX() + && testVBroadcastX() + && test1BroadcastX() + && testVBroadcastX() + && testVBroadcastX() + && testVBroadcastX() + && test1BroadcastX() + && testVBroadcastX() + && testVBroadcastX() + && testVBroadcastX() + && test1BroadcastX() + && testVBroadcastX() + && testVBroadcastX() + && testVBroadcastX() + && test1BroadcastX() + && testVBroadcastX() + && testVBroadcastX() + && testVBroadcastX() + && test1BroadcastX() + && testVBroadcastX() + && testVBroadcastX() + && testVBroadcastX() + && test1BroadcastX() + && testVBroadcastX() + && testVBroadcastX() + && testVBroadcastX() + && test1BroadcastX() + && testVBroadcastX() + && testVBroadcastX() + && testVBroadcastX() + && test1BroadcastX() + && testVBroadcastX() + && testVBroadcastX() + && testVBroadcastX() + ; +} + +bool testBallot() { + return true + && (subgroupBallot(true).x == 0xFFFFFFFF) + && (subgroupInverseBallot(uvec4(0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF)) == true) + && (subgroupBallotBitExtract(uvec4(0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF), 0) == true) + && (subgroupBallotBitCount(uvec4(0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF)) == 32) + && (subgroupBallotInclusiveBitCount(uvec4(0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF)) != 0) +#ifdef TEST_when_glsl_subgroupBallotExclusiveBitCount_is_not_bugged + && (subgroupBallotExclusiveBitCount(uvec4(0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF)) != 0) +#endif + && (subgroupBallotFindLSB(uvec4(0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF)) == 0) + && (subgroupBallotFindMSB(uvec4(0xFFFFFFFF, 0xFFFFFFFF, 
0xFFFFFFFF, 0xFFFFFFFF)) == 31)
+        ;
+}
+
+void computeMain()
+{
+    outputBuffer.data[0] = true
+        && testBroadcastX()
+        ;
+    outputBuffer.data[1] = true
+        && testBallot()
+        ;
+
+    // CHECK_GLSL: void main(
+    // CHECK_SPV: OpEntryPoint
+    // CHECK_HLSL: void computeMain(
+    // CHECK_CUDA: void computeMain(
+    // CHECK_CPP: void _computeMain(
+    // BUF: 1
+    // BUF-NEXT: 1
+}
diff --git a/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-basic.slang b/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-basic.slang
new file mode 100644
index 000000000..82f2dc8e2
--- /dev/null
+++ b/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-basic.slang
@@ -0,0 +1,66 @@
+//TEST:SIMPLE(filecheck=CHECK_GLSL): -allow-glsl -stage compute -entry computeMain -target glsl
+//TEST:SIMPLE(filecheck=CHECK_SPV): -allow-glsl -stage compute -entry computeMain -target spirv -emit-spirv-directly
+//TEST:SIMPLE(filecheck=CHECK_HLSL): -allow-glsl -stage compute -entry computeMain -target hlsl -DTARGET_HLSL
+
+// not testing cuda due to missing impl
+//DISABLE_TEST:SIMPLE(filecheck=CHECK_CUDA): -allow-glsl -stage compute -entry computeMain -target cuda -DTARGET_CUDA
+// not testing cpp due to missing impl
+//DISABLE_TEST:SIMPLE(filecheck=CHECK_CPP): -allow-glsl -stage compute -entry computeMain -target cpp -DTARGET_CPP
+
+//TEST(compute, vulkan):COMPARE_COMPUTE(filecheck-buffer=BUF):-vk -compute -entry computeMain -allow-glsl
+//TEST(compute, vulkan):COMPARE_COMPUTE(filecheck-buffer=BUF):-vk -compute -entry computeMain -allow-glsl -emit-spirv-directly
+#version 430
+
+//TEST_INPUT:ubuffer(data=[0 0 0 0 0], stride=4):out,name=outputBuffer
+buffer MyBlockName2
+{
+    uint data[];
+} outputBuffer;
+
+layout(local_size_x = 32) in;
+
+shared uint shareMem;
+
+void computeMain()
+{
+    // TODO: no test for image memory was done -- subgroupMemoryBarrierImage();
+    // tests are seperate since concurrency testing
+
+    shareMem = 100;
+    subgroupMemoryBarrierShared();
+    outputBuffer.data[0] = 1;
+    subgroupBarrier();
+    outputBuffer.data[0] = 2;
+    subgroupBarrier();
+
+    outputBuffer.data[1] = 1;
+    subgroupMemoryBarrier();
+    outputBuffer.data[1] = 2;
+    subgroupBarrier();
+
+    outputBuffer.data[2] = 1;
+    subgroupMemoryBarrierBuffer();
+    outputBuffer.data[2] = 2;
+    subgroupBarrier();
+
+    shareMem = 2;
+    subgroupMemoryBarrierShared();
+    outputBuffer.data[3] = shareMem;
+    subgroupBarrier();
+
+    if (subgroupElect()) {
+        outputBuffer.data[4] = gl_GlobalInvocationID.x + 2;
+    }
+
+    // CHECK_GLSL: void main(
+    // CHECK_SPV: OpEntryPoint
+    // CHECK_HLSL: void computeMain(
+    // CHECK_CUDA: void computeMain(
+    // CHECK_CPP: void _computeMain(
+
+    // BUF: 2
+    // BUF-NEXT: 2
+    // BUF-NEXT: 2
+    // BUF-NEXT: 2
+    // BUF-NEXT: 2
+}
diff --git a/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-builtin-variables.slang b/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-builtin-variables.slang
new file mode 100644
index 000000000..21b533178
--- /dev/null
+++ b/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-builtin-variables.slang
@@ -0,0 +1,44 @@
+//TEST:SIMPLE(filecheck=CHECK_GLSL): -allow-glsl -stage compute -entry computeMain -target glsl
+//TEST:SIMPLE(filecheck=CHECK_SPV): -allow-glsl -stage compute -entry computeMain -target spirv -emit-spirv-directly
+
+// missing implementation of most builtin values due to non trivial translation
+//DISABLE_TEST:SIMPLE(filecheck=CHECK_HLSL): -allow-glsl -stage compute -entry computeMain -target hlsl -DTARGET_HLSL
+// missing implementation of most builtin values due to non trivial translation
+//DISABLE_TEST:SIMPLE(filecheck=CHECK_CUDA): -allow-glsl -stage compute -entry computeMain -target cuda -DTARGET_CUDA
+//missing implementation of system (varying?) values
+//DISABLE_TEST:SIMPLE(filecheck=CHECK_CPP): -allow-glsl -stage compute -entry computeMain -target cpp -DTARGET_CPP
+
+//TEST(compute, vulkan):COMPARE_COMPUTE(filecheck-buffer=BUF):-vk -compute -entry computeMain -allow-glsl
+//TEST(compute, vulkan):COMPARE_COMPUTE(filecheck-buffer=BUF):-vk -compute -entry computeMain -allow-glsl -emit-spirv-directly
+#version 430
+
+//TEST_INPUT:ubuffer(data=[0], stride=4):out,name=outputBuffer
+buffer MyBlockName2
+{
+    uint data[];
+} outputBuffer;
+
+layout(local_size_x = 32) in;
+
+void computeMain()
+{
+    if (gl_GlobalInvocationID.x == 3) {
+        outputBuffer.data[0] = true
+            && gl_NumSubgroups == 1
+            && gl_SubgroupID == 0 //1 subgroup, 0 based indexing
+            && gl_SubgroupSize == 32
+            && gl_SubgroupInvocationID == 3
+            && gl_SubgroupEqMask == uvec4(0b1000,0,0,0)
+            && gl_SubgroupGeMask == uvec4(0xFFFFFFF8,0,0,0)
+            && gl_SubgroupGtMask == uvec4(0xFFFFFFF0,0,0,0)
+            && gl_SubgroupLeMask == uvec4(0b1111,0,0,0)
+            && gl_SubgroupLtMask == uvec4(0b111,0,0,0)
+            ;
+    }
+    // CHECK_GLSL: void main(
+    // CHECK_SPV: OpEntryPoint
+    // CHECK_HLSL: void computeMain(
+    // CHECK_CUDA: void computeMain(
+    // CHECK_CPP: void _computeMain(
+    // BUF: 1
+}
diff --git a/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-clustered.slang b/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-clustered.slang
new file mode 100644
index 000000000..9e9b089d2
--- /dev/null
+++ b/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-clustered.slang
@@ -0,0 +1,171 @@
+//TEST:SIMPLE(filecheck=CHECK_GLSL): -allow-glsl -stage compute -entry computeMain -target glsl
+//TEST:SIMPLE(filecheck=CHECK_SPV): -allow-glsl -stage compute -entry computeMain -target spirv -emit-spirv-directly
+
+// not testing hlsl due to missing impl
+//DISABLE_TEST:SIMPLE(filecheck=CHECK_HLSL): -allow-glsl -stage compute -entry computeMain -target hlsl -DTARGET_HLSL
+// not testing cuda due to missing impl
+//DISABLE_TEST:SIMPLE(filecheck=CHECK_CUDA): -allow-glsl -stage compute -entry
computeMain -target cuda -DTARGET_CUDA +// not testing cpp due to missing impl +//DISABLE_TEST:SIMPLE(filecheck=CHECK_CPP): -allow-glsl -stage compute -entry computeMain -target cpp -DTARGET_CPP + +//TEST(compute, vulkan):COMPARE_COMPUTE(filecheck-buffer=BUF):-vk -compute -entry computeMain -allow-glsl +//TEST(compute, vulkan):COMPARE_COMPUTE(filecheck-buffer=BUF):-vk -compute -entry computeMain -allow-glsl -emit-spirv-directly +#version 430 + +//TEST_INPUT:ubuffer(data=[0 0], stride=4):out,name=outputBuffer +buffer MyBlockName2 +{ + uint data[]; +} outputBuffer; + +layout(local_size_x = 32) in; + +__generic +bool test1Logical() { + return true + && subgroupClusteredAnd(T(1), 1) == T(1) + && subgroupClusteredOr(T(1), 1) == T(1) + && subgroupClusteredXor(T(1), 1) == T(1) + ; +} + +__generic +bool testVLogical() { + typealias gvec = vector; + + return true + && subgroupClusteredAnd(gvec(T(1)), 1) == gvec(T(1)) + && subgroupClusteredOr(gvec(T(1)), 1) == gvec(T(1)) + && subgroupClusteredXor(gvec(T(1)), 1) == gvec(T(1)) + ; +} + +bool testLogical() { + return true + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + && test1Logical() + && testVLogical() + && testVLogical() + && testVLogical() + ; +} + +__generic +bool test1Arithmetic() { + return true + && subgroupClusteredAdd(T(1), 1) == T(1) + && subgroupClusteredMul(T(1), 1) == T(1) + && subgroupClusteredMin(T(1), 1) == T(1) + && 
subgroupClusteredMax(T(1), 1) == T(1) + ; +} + +__generic +bool testVArithmetic() { + typealias gvec = vector; + + return true + && subgroupClusteredAdd(gvec(T(1)), 1) == gvec(T(1)) + && subgroupClusteredMul(gvec(T(1)), 1) == gvec(T(1)) + && subgroupClusteredMin(gvec(T(1)), 1) == gvec(T(1)) + && subgroupClusteredMax(gvec(T(1)), 1) == gvec(T(1)) + ; +} + +bool testArithmetic() { + return true + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() // WARNING: intel GPU's lack FP64 support + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + && test1Arithmetic() + && testVArithmetic() + && testVArithmetic() + && testVArithmetic() + ; +} + +void computeMain() +{ + outputBuffer.data[0] = true + && testLogical() + ; + outputBuffer.data[1] = true + && testArithmetic() + ; + + // CHECK_GLSL: void main( + // CHECK_SPV: OpEntryPoint + // CHECK_HLSL: void computeMain( + // CHECK_CUDA: void computeMain( + // CHECK_CPP: void _computeMain( + // BUF: 1 + // BUF-NEXT: 1 +} diff --git a/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-quad.slang b/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-quad.slang new file mode 100644 index 000000000..5ed6398b2 --- /dev/null +++ 
b/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-quad.slang @@ -0,0 +1,129 @@ +//TEST:SIMPLE(filecheck=CHECK_GLSL): -allow-glsl -stage compute -entry computeMain -target glsl +//TEST:SIMPLE(filecheck=CHECK_SPV): -allow-glsl -stage compute -entry computeMain -target spirv -emit-spirv-directly +//TEST:SIMPLE(filecheck=CHECK_HLSL): -allow-glsl -stage compute -entry computeMain -target hlsl -DTARGET_HLSL + +// not testing cuda due to missing impl +//DISABLE_TEST:SIMPLE(filecheck=CHECK_CUDA): -allow-glsl -stage compute -entry computeMain -target cuda -DTARGET_CUDA +// not testing cpp due to missing impl +//DISABLE_TEST:SIMPLE(filecheck=CHECK_CPP): -allow-glsl -stage compute -entry computeMain -target cpp -DTARGET_CPP + +//TEST(compute, vulkan):COMPARE_COMPUTE(filecheck-buffer=BUF):-vk -compute -entry computeMain -allow-glsl +//TEST(compute, vulkan):COMPARE_COMPUTE(filecheck-buffer=BUF):-vk -compute -entry computeMain -allow-glsl -emit-spirv-directly +#version 430 + +//TEST_INPUT:ubuffer(data=[0], stride=4):out,name=outputBuffer +buffer MyBlockName2 +{ + uint data[]; +} outputBuffer; + +layout(local_size_x = 4) in; + +__generic +bool test1QuadX() { + return true + && subgroupQuadSwapHorizontal(T(2)) == T(2) + && subgroupQuadSwapVertical(T(2)) == T(2) + && subgroupQuadSwapDiagonal(T(3)) == T(3) + && subgroupQuadBroadcast(T(1), 1) == T(1) + ; +} +__generic +bool testVQuadX() { + typealias gvec = vector; + + return true + && subgroupQuadSwapHorizontal(gvec(T(2))) == gvec(T(2)) + && subgroupQuadSwapVertical(gvec(T(2))) == gvec(T(2)) + && subgroupQuadSwapDiagonal(gvec(T(3))) == gvec(T(3)) + && subgroupQuadBroadcast(gvec(T(1)), 1) == gvec(T(1)) + ; +} + +__generic +bool test1QuadX() { + return true + && subgroupQuadSwapHorizontal(T(2)) == T(2) + && subgroupQuadSwapVertical(T(2)) == T(2) + && subgroupQuadSwapDiagonal(T(3)) == T(3) + && subgroupQuadBroadcast(T(1), 1) == T(1) + ; +} +__generic +bool testVQuadX() { + typealias gvec = vector; + + return true + && 
subgroupQuadSwapHorizontal(gvec(T(2))) == gvec(T(2)) + && subgroupQuadSwapVertical(gvec(T(2))) == gvec(T(2)) + && subgroupQuadSwapDiagonal(gvec(T(3))) == gvec(T(3)) + && subgroupQuadBroadcast(gvec(T(1)), 1) == gvec(T(1)) + ; +} +bool testQuadSwapX() { + return true + && test1QuadX() + && testVQuadX() + && testVQuadX() + && testVQuadX() + && test1QuadX() // WARNING: Intel GPUs lack FP64 support + && testVQuadX() + && testVQuadX() + && testVQuadX() + && test1QuadX() + && testVQuadX() + && testVQuadX() + && testVQuadX() + && test1QuadX() + && testVQuadX() + && testVQuadX() + && testVQuadX() + && test1QuadX() + && testVQuadX() + && testVQuadX() + && testVQuadX() + && test1QuadX() + && testVQuadX() + && testVQuadX() + && testVQuadX() + && test1QuadX() + && testVQuadX() + && testVQuadX() + && testVQuadX() + && test1QuadX() + && testVQuadX() + && testVQuadX() + && testVQuadX() + && test1QuadX() + && testVQuadX() + && testVQuadX() + && testVQuadX() + && test1QuadX() + && testVQuadX() + && testVQuadX() + && testVQuadX() + && test1QuadX() + && testVQuadX() + && testVQuadX() + && testVQuadX() + && test1QuadX() + && testVQuadX() + && testVQuadX() + && testVQuadX() + ; +} + +void computeMain() +{ + + outputBuffer.data[0] = true + && testQuadSwapX() + ; + + // CHECK_GLSL: void main( + // CHECK_SPV: OpEntryPoint + // CHECK_HLSL: void computeMain( + // CHECK_CUDA: void computeMain( + // CHECK_CPP: void _computeMain( + // BUF: 1 +} diff --git a/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-shuffle-relative.slang b/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-shuffle-relative.slang new file mode 100644 index 000000000..0e187c568 --- /dev/null +++ b/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-shuffle-relative.slang @@ -0,0 +1,121 @@ +//TEST:SIMPLE(filecheck=CHECK_GLSL): -allow-glsl -stage compute -entry computeMain -target glsl +//TEST:SIMPLE(filecheck=CHECK_SPV): -allow-glsl -stage compute -entry computeMain -target spirv -emit-spirv-directly + +// not 
testing hlsl due to missing impl +//DISABLE_TEST:SIMPLE(filecheck=CHECK_HLSL): -allow-glsl -stage compute -entry computeMain -target hlsl -DTARGET_HLSL +// not testing cuda due to missing impl +//DISABLE_TEST:SIMPLE(filecheck=CHECK_CUDA): -allow-glsl -stage compute -entry computeMain -target cuda -DTARGET_CUDA +// not testing cpp due to missing impl +//DISABLE_TEST:SIMPLE(filecheck=CHECK_CPP): -allow-glsl -stage compute -entry computeMain -target cpp -DTARGET_CPP + +//TEST(compute, vulkan):COMPARE_COMPUTE(filecheck-buffer=BUF):-vk -compute -entry computeMain -allow-glsl +//TEST(compute, vulkan):COMPARE_COMPUTE(filecheck-buffer=BUF):-vk -compute -entry computeMain -allow-glsl -emit-spirv-directly +#version 430 + +//TEST_INPUT:ubuffer(data=[0], stride=4):out,name=outputBuffer +buffer MyBlockName2 +{ + uint data[]; +} outputBuffer; + +layout(local_size_x = 32) in; + +__generic +bool test1ShuffleX() { + return true + && subgroupShuffleUp(T(1), 1) == T(1) + && subgroupShuffleDown(T(1), 1) == T(1) + ; +} +__generic +bool testVShuffleX() { + typealias gvec = vector; + + return true + && subgroupShuffleUp(gvec(T(1)), 1) == gvec(T(1)) + && subgroupShuffleDown(gvec(T(1)), 1) == gvec(T(1)) + ; +} + +__generic +bool test1ShuffleX() { + return true + && subgroupShuffleUp(T(1), 1) == T(1) + && subgroupShuffleDown(T(1), 1) == T(1) + ; +} +__generic +bool testVShuffleX() { + typealias gvec = vector; + + return true + && subgroupShuffleUp(gvec(T(1)), 1) == gvec(T(1)) + && subgroupShuffleDown(gvec(T(1)), 1) == gvec(T(1)) + ; +} +bool testShuffleX() { + return true + && test1ShuffleX() + && testVShuffleX() + && testVShuffleX() + && testVShuffleX() + && test1ShuffleX() // WARNING: Intel GPUs lack FP64 support + && testVShuffleX() + && testVShuffleX() + && testVShuffleX() + && test1ShuffleX() + && testVShuffleX() + && testVShuffleX() + && testVShuffleX() + && test1ShuffleX() + && testVShuffleX() + && testVShuffleX() + && testVShuffleX() + && test1ShuffleX() + && testVShuffleX() + && 
testVShuffleX() + && testVShuffleX() + && test1ShuffleX() + && testVShuffleX() + && testVShuffleX() + && testVShuffleX() + && test1ShuffleX() + && testVShuffleX() + && testVShuffleX() + && testVShuffleX() + && test1ShuffleX() + && testVShuffleX() + && testVShuffleX() + && testVShuffleX() + && test1ShuffleX() + && testVShuffleX() + && testVShuffleX() + && testVShuffleX() + && test1ShuffleX() + && testVShuffleX() + && testVShuffleX() + && testVShuffleX() + && test1ShuffleX() + && testVShuffleX() + && testVShuffleX() + && testVShuffleX() + && test1ShuffleX() + && testVShuffleX() + && testVShuffleX() + && testVShuffleX() + ; +} + +void computeMain() +{ + outputBuffer.data[0] = true + && testShuffleX() + ; + + // CHECK_GLSL: void main( + // CHECK_SPV: OpEntryPoint + // CHECK_HLSL: void computeMain( + // CHECK_CUDA: void computeMain( + // CHECK_CPP: void _computeMain( + // BUF: 1 +} diff --git a/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-shuffle.slang b/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-shuffle.slang new file mode 100644 index 000000000..5dca1a588 --- /dev/null +++ b/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-shuffle.slang @@ -0,0 +1,139 @@ +//TEST:SIMPLE(filecheck=CHECK_GLSL): -allow-glsl -stage compute -entry computeMain -target glsl +//TEST:SIMPLE(filecheck=CHECK_SPV): -allow-glsl -stage compute -entry computeMain -target spirv -emit-spirv-directly + +// not testing hlsl due to missing impl +//DISABLE_TEST:SIMPLE(filecheck=CHECK_HLSL): -allow-glsl -stage compute -entry computeMain -target hlsl -DTARGET_HLSL +// not testing cuda due to missing impl +//DISABLE_TEST:SIMPLE(filecheck=CHECK_CUDA): -allow-glsl -stage compute -entry computeMain -target cuda -DTARGET_CUDA +// not testing cpp due to missing impl +//DISABLE_TEST:SIMPLE(filecheck=CHECK_CPP): -allow-glsl -stage compute -entry computeMain -target cpp -DTARGET_CPP + +//TEST(compute, vulkan):COMPARE_COMPUTE(filecheck-buffer=BUF):-vk -compute -entry computeMain -allow-glsl 
+//TEST(compute, vulkan):COMPARE_COMPUTE(filecheck-buffer=BUF):-vk -compute -entry computeMain -allow-glsl -emit-spirv-directly +#version 430 + +#if 1 \ + && !defined(TARGET_HLSL) \ + && !defined(TARGET_CUDA) +// hlsl is missing an implementation +// cuda is missing an implementation +#define TEST_when_subgroupShuffleXor_is_implemented +#endif + +//TEST_INPUT:ubuffer(data=[0], stride=4):out,name=outputBuffer +buffer MyBlockName2 +{ + uint data[]; +} outputBuffer; + +layout(local_size_x = 32) in; + +__generic +bool test1ShuffleX() { + return true + && subgroupShuffle(T(1), 1) == T(1) +#ifdef TEST_when_subgroupShuffleXor_is_implemented + && subgroupShuffleXor(T(1), 1) == T(1) +#endif // #ifdef TEST_when_subgroupShuffleXor_is_implemented + ; +} +__generic +bool testVShuffleX() { + typealias gvec = vector; + + return true + && subgroupShuffle(gvec(T(1)), 1) == gvec(T(1)) +#ifdef TEST_when_subgroupShuffleXor_is_implemented + && subgroupShuffleXor(gvec(T(1)), 1) == gvec(T(1)) +#endif // #ifdef TEST_when_subgroupShuffleXor_is_implemented + ; +} + +__generic +bool test1ShuffleX() { + return true + && subgroupShuffle(T(1), 1) == T(1) +#if !defined(TARGET_CUDA) && !defined(TARGET_HLSL) + && subgroupShuffleXor(T(1), 1) == T(1) +#endif // #if !defined(TARGET_CUDA) && !defined(TARGET_HLSL) + ; +} +__generic +bool testVShuffleX() { + typealias gvec = vector; + + return true + && subgroupShuffle(gvec(T(1)), 1) == gvec(T(1)) +#if !defined(TARGET_CUDA) && !defined(TARGET_HLSL) + && subgroupShuffleXor(gvec(T(1)), 1) == gvec(T(1)) +#endif // #if !defined(TARGET_CUDA) && !defined(TARGET_HLSL) + ; +} +bool testShuffleX() { + return true + && test1ShuffleX() + && testVShuffleX() + && testVShuffleX() + && testVShuffleX() + && test1ShuffleX() // WARNING: Intel GPUs lack FP64 support + && testVShuffleX() + && testVShuffleX() + && testVShuffleX() + && test1ShuffleX() + && testVShuffleX() + && testVShuffleX() + && testVShuffleX() + && test1ShuffleX() + && testVShuffleX() + && 
testVShuffleX() + && testVShuffleX() + && test1ShuffleX() + && testVShuffleX() + && testVShuffleX() + && testVShuffleX() + && test1ShuffleX() + && testVShuffleX() + && testVShuffleX() + && testVShuffleX() + && test1ShuffleX() + && testVShuffleX() + && testVShuffleX() + && testVShuffleX() + && test1ShuffleX() + && testVShuffleX() + && testVShuffleX() + && testVShuffleX() + && test1ShuffleX() + && testVShuffleX() + && testVShuffleX() + && testVShuffleX() + && test1ShuffleX() + && testVShuffleX() + && testVShuffleX() + && testVShuffleX() + && test1ShuffleX() + && testVShuffleX() + && testVShuffleX() + && testVShuffleX() + && test1ShuffleX() + && testVShuffleX() + && testVShuffleX() + && testVShuffleX() + ; +} + + +void computeMain() +{ + + outputBuffer.data[0] = true + && testShuffleX() + ; + + // CHECK_GLSL: void main( + // CHECK_SPV: OpEntryPoint + // CHECK_HLSL: void computeMain( + // CHECK_CUDA: void computeMain( + // CHECK_CPP: void _computeMain( + // BUF: 1 +} diff --git a/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-vote.slang b/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-vote.slang new file mode 100644 index 000000000..bcd4aeb56 --- /dev/null +++ b/tests/glsl-intrinsic/shader-subgroup/shader-subgroup-vote.slang @@ -0,0 +1,167 @@ +//TEST:SIMPLE(filecheck=CHECK_GLSL): -allow-glsl -stage compute -entry computeMain -target glsl +//TEST:SIMPLE(filecheck=CHECK_SPV): -allow-glsl -stage compute -entry computeMain -target spirv -emit-spirv-directly +//TEST:SIMPLE(filecheck=CHECK_HLSL): -allow-glsl -stage compute -entry computeMain -target hlsl -DTARGET_HLSL + +// not testing cuda due to missing impl +//DISABLE_TEST:SIMPLE(filecheck=CHECK_CUDA): -allow-glsl -stage compute -entry computeMain -target cuda -DTARGET_CUDA +// not testing cpp due to missing impl +//DISABLE_TEST:SIMPLE(filecheck=CHECK_CPP): -allow-glsl -stage compute -entry computeMain -target cpp -DTARGET_CPP + +//TEST(compute, vulkan):COMPARE_COMPUTE(filecheck-buffer=BUF):-vk -compute -entry 
computeMain -allow-glsl +//TEST(compute, vulkan):COMPARE_COMPUTE(filecheck-buffer=BUF):-vk -compute -entry computeMain -allow-glsl -emit-spirv-directly +#version 430 + +//TEST_INPUT:ubuffer(data=[9], stride=4):name=inputBuffer +buffer MyBlockName +{ + uint data[]; +} inputBuffer; + +//TEST_INPUT:ubuffer(data=[0 0 0 0 0], stride=4):out,name=outputBuffer +buffer MyBlockName2 +{ + uint data[]; +} outputBuffer; + +layout(local_size_x = 32) in; + +__generic +bool test1AllEqual() { + return true + && subgroupAllEqual(T(1)) == true + && subgroupAllEqual(T(gl_GlobalInvocationID.x)) == false + ; +} +__generic +bool testVAllEqual() { + typealias gvec = vector; + + return true + && subgroupAllEqual(gvec(T(1))) == true + && subgroupAllEqual(gvec(T(gl_GlobalInvocationID.x))) == false + ; +} + +__generic +bool test1AllEqual() { + return true + && subgroupAllEqual(T(1)) == true + && subgroupAllEqual(T(gl_GlobalInvocationID.x)) == false + ; +} +__generic +bool testVAllEqual() { + typealias gvec = vector; + + return true + && subgroupAllEqual(gvec(T(1))) == true + && subgroupAllEqual(gvec(T(gl_GlobalInvocationID.x))) == false + ; +} +bool testAllEqual() { + return true + && test1AllEqual() + && testVAllEqual() + && testVAllEqual() + && testVAllEqual() + && test1AllEqual() // WARNING: Intel GPUs lack FP64 support + && testVAllEqual() + && testVAllEqual() + && testVAllEqual() + && test1AllEqual() + && testVAllEqual() + && testVAllEqual() + && testVAllEqual() + && test1AllEqual() + && testVAllEqual() + && testVAllEqual() + && testVAllEqual() + && test1AllEqual() + && testVAllEqual() + && testVAllEqual() + && testVAllEqual() + && test1AllEqual() + && testVAllEqual() + && testVAllEqual() + && testVAllEqual() + && test1AllEqual() + && testVAllEqual() + && testVAllEqual() + && testVAllEqual() + && test1AllEqual() + && testVAllEqual() + && testVAllEqual() + && testVAllEqual() + && test1AllEqual() + && 
testVAllEqual() + && testVAllEqual() + && testVAllEqual() + && test1AllEqual() + && testVAllEqual() + && testVAllEqual() + && testVAllEqual() + && test1AllEqual() + && testVAllEqual() + && testVAllEqual() + && testVAllEqual() + && test1AllEqual() + && testVAllEqual() + && testVAllEqual() + && testVAllEqual() + ; +} + +void computeMain() +{ + //separate tests since testing concurrency + + // one is true, rest false, positive + outputBuffer.data[0] = 1; + bool t1 = inputBuffer.data[0] == gl_GlobalInvocationID.x; + if (subgroupAny(t1)) { + subgroupBarrier(); + outputBuffer.data[0] = 2; + } + + // all false, negative + outputBuffer.data[1] = 1; + t1 = false; + if (!subgroupAny(t1)) { + subgroupBarrier(); + outputBuffer.data[1] = 2; + } + + // all true, positive + outputBuffer.data[2] = 1; + t1 = true; + if (subgroupAll(t1)) { + subgroupBarrier(); + outputBuffer.data[2] = 2; + } + + // all false, negative + outputBuffer.data[3] = 1; + t1 = false; + if (!subgroupAll(t1)) { + subgroupBarrier(); + outputBuffer.data[3] = 2; + } + + outputBuffer.data[4] = 1; + + if (testAllEqual()) { + subgroupBarrier(); + outputBuffer.data[4] = 2; + } + + // CHECK_GLSL: void main( + // CHECK_SPV: OpEntryPoint + // CHECK_HLSL: void computeMain( + // CHECK_CUDA: void computeMain( + // CHECK_CPP: void _computeMain( + // BUF: 2 + // BUF-NEXT: 2 + // BUF-NEXT: 2 + // BUF-NEXT: 2 + // BUF-NEXT: 2 +} -- cgit v1.2.3