author ArielG-NV <159081215+ArielG-NV@users.noreply.github.com> 2024-02-26 19:09:09 -0500
committer GitHub <noreply@github.com> 2024-02-26 16:09:09 -0800
commit 1241006b6d89cae09766ca9795187ef9c0dd2085
tree 9b114fc0ac218cae917b43f31090f1e20e2d96e5 /source
parent 4f03eb9d657fd74da341bb2b0d391c6b855073af
Partially implement shader_subgroup extension(s); Partially resolves #3548 (#3580)
* Partially implement, with tests, the functions and built-in variables that are part of GL_KHR_shader_subgroup; partially resolves #3548

GL_KHR_shader_subgroup is implemented based on https://github.com/KhronosGroup/GLSL/blob/main/extensions/khr/GL_KHR_shader_subgroup.txt

The implementation is broken down into separate GLSL extensions due to the ***large differences*** in the implementation, functionality, and testing of each section.

GL_KHR_shader_subgroup_basic{
**Partially implemented**
Implementation:
* All 9 built-in variables have been stubbed without proper values; implementation is still required for these system variables; related to #411.
* Functions were reimplemented despite nearly mirrored HLSL functions because:
  * hlsl.meta implementations target workgroups rather than a warp/wave/subgroup:
    * `__syncwarp` vs `__syncthreads`
    * `SubgroupMemory` vs `WorkgroupMemory`
    * etc.
  * hlsl.meta implementations target broader SPIR-V memory targets to block on:
    * ImageMemory|UniformMemory, versus SPIR-V specifying barriers for ImageMemory and, separately, an option for UniformMemory
* `subgroupElect` for CUDA has a different implementation than `WaveIsFirstLane`, because the spec states that `subgroupElect()` returns true only for the active invocation with the lowest gl_SubgroupInvocationID; therefore we are supposed to fetch the current active mask even if some invocations are turned off by branches
Testing:
tests for the variables -- `tests/glsl/shader-subgroup-built-in-variables.slang`
* these tests do not test functionality, since the variables are not implemented yet
tests for the functions -- `tests/glsl/shader-subgroup-basic.slang`
* concurrency is tested using SubgroupMemory and UniformMemory by attempting to create a GPU-side race condition when writing and reading memory
* due to the testing tools available, there are no tests for ImageMemory
* subgroupElect is tested to elect invocation #0, the lowest invocation that will always run; wave size is 32, therefore #0 is always active and will always be the elected invocation.
}

GL_KHR_shader_subgroup_vote{
**Fully implemented**
Implementation:
* 3/3 functions use the hlsl.meta implementation
Testing: `tests/glsl/shader-subgroup-vote.slang`
* each function is tested with a positive (returns true) and a negative (returns false) case to ensure vote results are correct
}

GL_KHR_shader_subgroup_ballot{
**Partially implemented**
Implementation:
10/10 functions are implemented:
* 3 use the hlsl.meta implementation
* 7 use new implementations -- these only support GLSL, SPIR-V, HLSL, CUDA
  * These implementations did not exist in hlsl.meta, so they were added
  * `subgroupInverseBallot` lacks an analogous function to call; this feature was emulated:
    * in CUDA, by knowing that the ballot is 32 bits wide and lanes are 0-indexed; this implies that `(ballotResult >> YOUR_INVOCATION) & 1` checks whether your invocation is marked active in the ballot. For example, in `(0b11001 >> 3) & 1` the ballot `0b11001` means that only the 5th, 4th, and 1st invocations (IDs 4, 3, and 0) are active, and the shift of 3 means `YOUR_INVOCATION` is the fourth invocation in the subgroup. `(0b11001 >> 3) & 1` returns true since your bit is toggled: it evaluates to `0b11 & 0b1`
    * in HLSL, by testing whether the wave size is 32 or less (use the same logic as CUDA in this case); otherwise find the vector component that `YOUR_INVOCATION` corresponds to, where each component holds 32 bits (32 lanes), avoiding division in the process; then run the same algorithm CUDA employs.
  * `subgroupBallotBitExtract` is logically the same as `subgroupInverseBallot`
  * 5 functions do not have a CUDA, HLSL, or CPP implementation yet (subgroupBallotFindMSB, subgroupBallotFindLSB, subgroupBallotExclusiveBitCount, subgroupBallotInclusiveBitCount, subgroupBallotBitCount) because they are out of scope for this commit
Testing: `tests/glsl/shader-subgroup-ballot.slang`
* the tests check for the expected value of each ballot function; they try inputs with more than 32 toggled bits as function parameters to ensure the implementation correctly identifies values up to a maximum of the subgroup invocation count, as per the extension specification (otherwise the functionality is fairly trivial to test)
}

GL_KHR_shader_subgroup_arithmetic{
**Partially implemented**
Implementation:
* There are 21 functions to implement:
  * 14 functions use the hlsl.meta implementation
  * 7 functions are new implementations -- only implemented for GLSL and SPIR-V
    * GLSL & SPIR-V both use their related functions; no emulation required
    * CUDA, CPP, HLSL are out of scope for this commit
Testing: `tests/glsl/shader-subgroup-arithmetic.slang`
* all tests silently kill the shader; the generated GLSL was checked and no issue could be found
* these tests only check basic functionality and correctness of all functions implemented; not an exhaustive test [further continued in "Other notes of worthy" at end of commit]
}

GL_KHR_shader_subgroup_shuffle{
**Partially implemented**
Implementation:
* There are 2 functions to implement:
  * 1 function uses the existing hlsl.meta implementation
  * 1 function uses a new implementation (subgroupShuffleXor) -- only implemented for GLSL & SPIR-V
    * GLSL & SPIR-V both use their related functions; no emulation required
Testing: `tests/glsl/shader-subgroup-shuffle.slang`
* these tests only check basic functionality and correctness of all functions implemented; not an exhaustive test [further continued in "Other notes of worthy" at end of commit]
* tests fail with cpp due to `kIROp_WaveGetActiveMask` failing to be called
}

GL_KHR_shader_subgroup_shuffle_relative{
**Partially implemented**
Implementation:
* There are 2 functions to implement:
  * both functions use a new implementation -- only implemented for GLSL & SPIR-V
    * GLSL & SPIR-V both use their related functions; no emulation required
Testing: `tests/glsl/shader-subgroup-shuffle-relative.slang`
* these tests only check basic functionality and correctness of all functions implemented; not an exhaustive test [further continued in "Other notes of worthy" at end of commit]
}

GL_KHR_shader_subgroup_clustered{
**Partially implemented**
Implementation:
* There are 7 functions to implement:
  * all 7 functions use a new implementation -- only implemented for GLSL & SPIR-V
    * GLSL & SPIR-V both use their related functions; no emulation required
Testing: `tests/glsl/shader-subgroup-shuffle-clustered.slang`
* these tests only check basic functionality and correctness of all functions implemented; not an exhaustive test [further continued in "Other notes of worthy" at end of commit]
}

GL_KHR_shader_subgroup_quad{
**Partially implemented**
Implementation:
* There are 4 functions to implement:
  * all 4 functions use hlsl.meta implementations -- only implemented for GLSL & SPIR-V & HLSL
Testing: `tests/glsl/shader-subgroup-shuffle-quad.slang`
* these tests only check basic functionality and correctness of all functions implemented; not an exhaustive test [further continued in "Other notes of worthy" at end of commit]
}

---------
Failing tests and why:
Note: because system variables are largely not implemented for CUDA and CPP, these tests will fail (#3 and #4){
tests/glsl/shader-subgroup-arithmetic.slang.3
tests/glsl/shader-subgroup-arithmetic.slang.4
tests/glsl/shader-subgroup-ballot.slang.4
tests/glsl/shader-subgroup-basic.slang.3
tests/glsl/shader-subgroup-basic.slang.4
tests/glsl/shader-subgroup-quad.slang.3
tests/glsl/shader-subgroup-quad.slang.4
tests/glsl/shader-subgroup-vote.slang.3
tests/glsl/shader-subgroup-vote.slang.4
}
Note: because kIROp_WaveGetActiveMask is not loaded for cpp, the following test will fail{
tests/glsl/shader-subgroup-shuffle.slang.4
}
Note: due to an unknown silent error, the following will fail [could not spot an error in the generated GLSL and SPIR-V]{
tests/glsl/shader-subgroup-arithmetic.slang.5 (vk)
tests/glsl/shader-subgroup-arithmetic.slang.6 (vk)
}
Other notes of worthy:{
* only a few types are currently checked in tests because equality templates do not allow freely casting to int/uint, meaning that testing types en masse is not trivial; these tests will most likely be completely replaced once templates can cast & check equality more freely.
* did not implement vector types for any functions that may use them (mostly in reference to SPIR-V, since many may accept scalar or vector inputs); applicable to subgroup-shuffle, subgroup-shuffle-relative, subgroup-arithmetic, subgroup_clustered, subgroup_quad
* did not implement checks for half floats
* CUDA, CPP, HLSL implementations were largely out of scope; where they are not implemented, this is because the implementation is not trivial
}
Random fixes encountered:{
* hlsl.meta incorrectly sets `OpCapability` to `GroupNonUniformBallot` when the `OpCapability` should be `GroupNonUniformVote`; this is as per the SPIR-V spec for all SPIR-V calls used in `GL_KHR_shader_subgroup_vote`: https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpGroupNonUniformAll
}

* added vector types and tests; partially implement, with tests, the functions and built-in variables that are part of GL_KHR_shader_subgroup; partially resolves #3548

GL_KHR_shader_subgroup_* & GLSL ref:
* https://github.com/KhronosGroup/GLSL/blob/main/extensions/khr/GL_KHR_shader_subgroup.txt
* https://www.khronos.org/blog/vulkan-subgroup-tutorial
* https://www.khronos.org/assets/uploads/developers/library/2018-vulkan-devday/06-subgroups.pdf
HLSL ref:
* https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-intrinsic-functions
* https://github.com/Microsoft/DirectXShaderCompiler/wiki/Wave-Intrinsics
CUDA ref:
* https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
SPIR-V ref:
* https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_memory_semantics_id

---------
Failing tests and why:
Note: test numbers assume none of the existing tests are toggled off
Note: because system variables are largely not implemented for CUDA and CPP, these tests will fail (#3 and #4){
tests/glsl/shader-subgroup-arithmetic.slang.3
tests/glsl/shader-subgroup-arithmetic.slang.4
tests/glsl/shader-subgroup-ballot.slang.4
tests/glsl/shader-subgroup-basic.slang.3
tests/glsl/shader-subgroup-basic.slang.4
tests/glsl/shader-subgroup-quad.slang.3
tests/glsl/shader-subgroup-quad.slang.4
tests/glsl/shader-subgroup-vote.slang.3
tests/glsl/shader-subgroup-vote.slang.4
}
Note: because kIROp_WaveGetActiveMask is not loaded for cpp, the following tests will fail{
tests/glsl/shader-subgroup-shuffle.slang.4
tests/glsl/shader-subgroup-shuffle-relative.slang.4
tests/glsl/shader-subgroup-basic.slang.4
}
Note: due to an unknown silent error, the following will fail [could not spot an error in the generated GLSL and SPIR-V]{
tests/glsl/shader-subgroup-arithmetic.slang.5 (vk)
tests/glsl/shader-subgroup-arithmetic.slang.6 (vk)
}
Other notes of worthy:{
* only a few types are currently checked in the arithmetic test; this is because the test fails silently, meaning I can't actually test anything implemented
* did not implement checks for half floats
* CUDA, CPP, HLSL implementations were largely out of scope and not implemented, because the implementation is non-trivial for many functions
}
Random fixes encountered:{
* hlsl.meta incorrectly sets `OpCapability` to `GroupNonUniformBallot` when the `OpCapability` should be `GroupNonUniformVote`; this is as per the SPIR-V spec for all SPIR-V calls used in `GL_KHR_shader_subgroup_vote`: https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpGroupNonUniformAll
}

* Partially implement, with tests, the functions and built-in variables that are part of GL_KHR_shader_subgroup; partially resolves #3548

---------
Failing tests and why:
Note: test numbers assume none of the existing tests are toggled off
Note: because system variables are largely not implemented for CUDA and CPP, these tests will fail (#3 and #4){
tests/glsl/shader-subgroup-arithmetic.slang.3
tests/glsl/shader-subgroup-arithmetic.slang.4
tests/glsl/shader-subgroup-ballot.slang.4
tests/glsl/shader-subgroup-basic.slang.3
tests/glsl/shader-subgroup-basic.slang.4
tests/glsl/shader-subgroup-quad.slang.3
tests/glsl/shader-subgroup-quad.slang.4
tests/glsl/shader-subgroup-vote.slang.3
tests/glsl/shader-subgroup-vote.slang.4
}
Note: because kIROp_WaveGetActiveMask is not loaded for cpp, the following tests will fail{
tests/glsl/shader-subgroup-shuffle.slang.4
tests/glsl/shader-subgroup-shuffle-relative.slang.4
tests/glsl/shader-subgroup-basic.slang.4
}
Other notes of worthy:{
* added a preamble function and macros for implementing subgroup functionality (and tests) to make it possible to iterate on the functionality with reasonable effort in the future
* CUDA, CPP, HLSL implementations were largely out of scope and not implemented, because the implementation is non-trivial for many functions
* doubles cause a silent crash on most subgroup functions tested (silent shader hang)
* __requireGLSLExtension does not work as intended inside glsl.meta; as a result half, int16, int64, and int8 are all omitted from testing
}
Random fixes encountered:{
* hlsl.meta incorrectly sets `OpCapability` to `GroupNonUniformBallot` when the `OpCapability` should be `GroupNonUniformVote`; this is as per the SPIR-V spec for all SPIR-V calls used in `GL_KHR_shader_subgroup_vote`: https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpGroupNonUniformAll
* hlsl.meta incorrectly used OpGroupNonUniformBitwiseAnd instead of OpGroupNonUniformBitwiseOr for WaveMaskPrefixBitOr (SPIR-V); this was fixed
}

* redesign tests following suggestions that they should be smaller, more maintainable, and test as much data as reasonably possible (balanced with fast iterations);
  optional double testing
  varying parameter testing
  most tests chain results now

* fix missing impl and merge conflict resolutions

* redundant test code cleanup and organization
  move tests to proper location (glsl-intrinsic)
  clean up redundant code (input buffers)

* add missing logical operands support (and remove hlsl/cuda code reuse due to the functional differences) under all And, Or, Xor ops
  redesign tests to conform to a better testing paradigm

* testing code style change to not use white space as a toggle for tests

* provided crash reason for doubles (Intel Iris GPUs crash in GLSL with doubles due to missing support in device caps [as per the Vulkan validation layer])
  uncommented the `__requireGLSLExtension` code so that, once it is fixed, int16/8/64/half will work with subgroup without requiring future intervention

* fixing some vk validation layer errors (OpMemoryBarrier, Shuffle operations)
  modified style of tests; removed redundancy (extra code that does nothing); fixed some incorrect run targets; added error reasons for all encountered problems (and if needed, a #define/#if toggle)

* stop commenting out important tests; use a #define over the broken feature of extended shader_subgroup types instead

* removed macros inside glsl.meta
  removed erroneous __target_switch to directly call hlsl.meta function
  added elaboration on the problem with __requireGLSLExtension
  changed WaveMaskPrefixBit[or|and|xor] to support the expected type of <int> only, as per `HLSL Shader Model 6.5` specs
  removed "precision highp" since it does not affect tests

* changed some of the hlsl.meta functions used to more appropriate ones (as suggested)
  WaveMask -> WaveActive.*
  WaveMaskPrefix.* -> WavePrefix.*
  remove __target_switch cases for unimplemented cases of intrinsics
  fix _getLaneId() being removed by some regex used earlier

* fix usage of __target_intrinsic instead of __intrinsic_asm; it would silently cause only the arguments to be emitted as the return
  changed usage of `__requireGLSLExtension` because it now causes a crash from the missing intrinsic (instead of a silent error)

* fix shader subgroup extended types support for GLSL and SPIR-V:
  1. separate the intrinsic/__requireGLSL generating functionality of shader_subgroup_preamble into child function calls, because otherwise `__requireGLSLExtension` is ignored if the calling function of shader_subgroup_preamble calls an `__intrinsic_asm`
  2. fixed HLSL.meta logic for wave operations (Add, Mul, exclusiveAdd, exclusiveMul) to no longer cast the input type T into a uint, due to cost-of-op & crash.
     * Int8_t bit cast into uint32_t crashed the compiler. As per the SPIR-V spec, OpGroupNonUniformI.* work on uint and int types, meaning the function has no need to cast to a uint.
  3. removed erroneous __target_switch for subgroupShuffle

* 1. ignore tests gracefully
  2. remove unneeded SPIR-V capability specification (with OpCapability)
  3. clean up structure of typeRequireChecks_shader_subgroup_GLSL
  4. explain why HLSL/CUDA are not targeted for shader-subgroup-arithmetic.slang

* syntax changes + `property` declaration fix + builtin var glsl implementation + changed incorrect HLSL.meta assumptions
  (#1) `property` declaration as *non member* implementation change/fix (all of the changes to `slang-lower-to-ir.cpp`)
  using (#1), implemented subgroup built-ins for GLSL/SPIR-V; did not implement built-ins completely for HLSL/CUDA due to non-trivial implementations. CPP has no implementation due to missing support of system values
  changed some incorrect HLSL.meta subgroup implementation assumptions of type usage (bit casting 8bit->32bit, wrong capabilities causing errors)
  AST dump crash with SPIR-V when using built-ins fixed by adding the `builtin` spirv case (all of the changes to `slang-ast-dump.cpp`)
  added [ForceInline] to functions missing it
  return instead of spirv_asm when empty blocks are used

* syntax & organization of tests adjustment (specifically how ifdefs are managed)

* figuring out where ci fails

* figuring out where ci fails -- testing with enclusive & regular

* testing CI with exclusive, regular, inclusive

* remove unneeded white space
  test CI inconsistency issues further with arithmetic.slang

* testing if the ci run fails due to some timeout/recovery issue

* split up arithmetic tests and push to test with CI

---------

Co-authored-by: Yong He <yonghe@outlook.com>
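
The bit tests described in the commit message for `subgroupInverseBallot` and `subgroupElect` can be sketched on the host side. This is a minimal C++ illustration, not the code added to glsl.meta: the function names (`inverseBallot32`, `inverseBallot128`, `electLowestActive`, `checkWorkedExample`) are hypothetical, and it assumes a ballot is stored as one 32-bit word for waves of 32 lanes or fewer, or as four 32-bit words for wider waves.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; none of these names come from the commit.

// 32-lane case (the CUDA path in the message): test bit `lane` of the ballot.
// Assumes lane < 32.
bool inverseBallot32(std::uint32_t ballot, std::uint32_t lane)
{
    return ((ballot >> lane) & 1u) != 0u;
}

// Wider case (the HLSL path in the message): the ballot is four 32-bit words.
// `lane >> 5` picks the word and `lane & 31` picks the bit, so no division is needed.
// Assumes lane < 128.
bool inverseBallot128(const std::array<std::uint32_t, 4>& ballot, std::uint32_t lane)
{
    const std::uint32_t word = ballot[(lane >> 5) & 3u];
    return ((word >> (lane & 31u)) & 1u) != 0u;
}

// Election from an active mask: only the lowest set bit wins.
// `mask & (~mask + 1)` isolates the lowest set bit. Assumes lane < 32.
bool electLowestActive(std::uint32_t activeMask, std::uint32_t lane)
{
    const std::uint32_t lowest = activeMask & (~activeMask + 1u);
    return lowest != 0u && lowest == (1u << lane);
}

// The worked example from the commit message: (0b11001 >> 3) & 1.
void checkWorkedExample()
{
    assert(inverseBallot32(0b11001u, 3u));   // bit 3 is set
    assert(!inverseBallot32(0b11001u, 1u));  // bit 1 is clear
}
```

With an active mask of `0b11000`, `electLowestActive` picks lane 3 even though lane 0 is inactive, which is the case the message gives as the reason for fetching the active mask rather than assuming lane 0.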
Diffstat (limited to 'source')
-rw-r--r-- source/slang/glsl.meta.slang 1793
-rw-r--r-- source/slang/hlsl.meta.slang 151
-rw-r--r-- source/slang/slang-ast-dump.cpp 3
-rw-r--r-- source/slang/slang-lower-to-ir.cpp 13
4 files changed, 1835 insertions, 125 deletions
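Among the functions added to glsl.meta.slang in the diff that follows is `subgroupElect`, which should return true only for the active invocation with the lowest `gl_SubgroupInvocationID`. The Python sketch below shows one way to derive that lane index from an active-lane bitmask. It assumes a mask in which bit i is set when lane i is active (the shape of value that CUDA's `__activemask()` returns); the helper names are hypothetical and exist only for this example.

```python
def lowest_active_lane(active_mask: int) -> int:
    """Return the index of the lowest set bit in active_mask."""
    if active_mask == 0:
        raise ValueError("at least one lane must be active")
    lowest_bit = active_mask & -active_mask  # isolates the lowest set bit as a bit VALUE
    return lowest_bit.bit_length() - 1       # converts that bit value into a lane INDEX


def subgroup_elect(lane_id: int, active_mask: int) -> bool:
    """True only for the active lane with the lowest index."""
    return lane_id == lowest_active_lane(active_mask)


if __name__ == "__main__":
    mask = 0b101000  # only lanes 3 and 5 are active
    elected = [lane for lane in range(32) if subgroup_elect(lane, mask)]
    print(elected)
```

Note that `mask & -mask` yields a bit value such as `0b1000`, not a lane index, so it has to be converted (here with `bit_length() - 1`) before it is compared with a lane ID.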
diff --git a/source/slang/glsl.meta.slang b/source/slang/glsl.meta.slang
index 8403d1391..824b3e3f3 100644
--- a/source/slang/glsl.meta.slang
+++ b/source/slang/glsl.meta.slang
@@ -2825,21 +2825,1794 @@ public uint rayQueryGetIntersectionTypeEXT(rayQueryEXT q, bool committed)
return 0;
}
+// TODO: implementation of built-in variables; proper tests; these are stubs
+// likely related to the following issue since GLSL adds new
+// 'system' variables: https://github.com/shader-slang/slang/issues/411
-//
-// Subgroup
-//
+__generic<T : __BuiltinType>
+[ForceInline]
+void typeRequireChecks_shader_subgroup_GLSL() {
+ // This is a separate function call because otherwise the `__requireGLSLExtension` and its associated __intrinsic_asm are ignored if the calling function also calls an __intrinsic_asm
+ __target_switch
+ {
+ case glsl:
+ if (__type_equals<T, half>()
+ || __type_equals<T, float16_t>()
+ ) __requireGLSLExtension("GL_EXT_shader_subgroup_extended_types_float16");
+ else if (__type_equals<T, uint8_t>()
+ || __type_equals<T, int8_t>()
+ ) __requireGLSLExtension("GL_EXT_shader_subgroup_extended_types_int8");
+ else if (__type_equals<T, uint16_t>()
+ || __type_equals<T, int16_t>()
+ ) __requireGLSLExtension("GL_EXT_shader_subgroup_extended_types_int16");
+ else if (__type_equals<T, uint64_t>()
+ || __type_equals<T, int64_t>()
+ ) __requireGLSLExtension("GL_EXT_shader_subgroup_extended_types_int64");
+
+ __intrinsic_asm "";
+ }
+}
+
+__generic<T : __BuiltinType>
+void shader_subgroup_preamble() {
+ // Checks needed for shader_subgroup functions. __requireGLSLExtension does not work here:
+ // the specified extension is not added correctly to the compile output, so using an
+ // extended type results in an error.
+ __target_switch
+ {
+ case glsl:
+ typeRequireChecks_shader_subgroup_GLSL<T>();
+ case spirv:
+ return;
+ }
+
+}
+
+// GL_KHR_shader_subgroup_basic Built-in Variables
+
+void requireGLSLExtForSubgroupBasicBuiltin() {
+ __target_switch
+ {
+ case glsl:
+ __requireGLSLExtension("GL_KHR_shader_subgroup_basic");
+ __intrinsic_asm "";
+ }
+}
+
+__spirv_version(1.3)
+void setupExtForSubgroupBasicBuiltIn() {
+ __target_switch
+ {
+ case glsl:
+ requireGLSLExtForSubgroupBasicBuiltin();
+ case spirv:
+ return;
+ }
+}
+
+void requireGLSLExtForSubgroupBallotBuiltin() {
+ __target_switch
+ {
+ case glsl:
+ __requireGLSLExtension("GL_KHR_shader_subgroup_ballot");
+ __intrinsic_asm "";
+ }
+}
+
+__spirv_version(1.3)
+void setupExtForSubgroupBallotBuiltIn() {
+ __target_switch
+ {
+ case glsl:
+ requireGLSLExtForSubgroupBallotBuiltin();
+ case spirv:
+ return;
+ }
+}
+
+[require(glsl)]
+[require(spirv)]
+public property uint gl_NumSubgroups {
+
+ get {
+ setupExtForSubgroupBasicBuiltIn();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "(gl_NumSubgroups)";
+ case spirv:
+ return spirv_asm {
+ OpCapability GroupNonUniform;
+ result:$$uint = OpLoad builtin(NumSubgroups:uint);
+ };
+ }
+
+ }
+}
+
+[require(glsl)]
+[require(spirv)]
+public property uint gl_SubgroupID
+{
+ get {
+ setupExtForSubgroupBasicBuiltIn();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "(gl_SubgroupID)";
+ case spirv:
+ return spirv_asm {
+ OpCapability GroupNonUniform;
+ result:$$uint = OpLoad builtin(SubgroupId:uint);
+ };
+ }
+ }
+}
+
+[require(glsl)]
+[require(spirv)]
+public property uint gl_SubgroupSize
+{
+ get {
+ setupExtForSubgroupBasicBuiltIn();
+ return WaveGetLaneCount();
+ }
+}
+
+[require(glsl)]
+[require(spirv)]
+public property uint gl_SubgroupInvocationID
+{
+ get {
+ setupExtForSubgroupBasicBuiltIn();
+ return WaveGetLaneIndex();
+ }
+}
+
+[require(glsl)]
+[require(spirv)]
+public property uvec4 gl_SubgroupEqMask
+{
+ get {
+ setupExtForSubgroupBasicBuiltIn();
+ setupExtForSubgroupBallotBuiltIn();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "(gl_SubgroupEqMask)";
+ case spirv:
+ return spirv_asm {
+ OpCapability GroupNonUniformBallot;
+ result:$$uvec4 = OpLoad builtin(SubgroupEqMask:uvec4);
+ };
+ }
+ }
+}
+
+[require(glsl)]
+[require(spirv)]
+public property uvec4 gl_SubgroupGeMask
+{
+ get {
+ setupExtForSubgroupBasicBuiltIn();
+ setupExtForSubgroupBallotBuiltIn();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "(gl_SubgroupGeMask)";
+ case spirv:
+ return spirv_asm {
+ OpCapability GroupNonUniformBallot;
+ result:$$uvec4 = OpLoad builtin(SubgroupGeMask:uvec4);
+ };
+ }
+ }
+}
+
+[require(glsl)]
+[require(spirv)]
+public property uvec4 gl_SubgroupGtMask
+{
+ get {
+ setupExtForSubgroupBasicBuiltIn();
+ setupExtForSubgroupBallotBuiltIn();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "(gl_SubgroupGtMask)";
+ case spirv:
+ return spirv_asm {
+ OpCapability GroupNonUniformBallot;
+ result:$$uvec4 = OpLoad builtin(SubgroupGtMask:uvec4);
+ };
+ }
+ }
+}
+
+[require(glsl)]
+[require(spirv)]
+public property uvec4 gl_SubgroupLeMask
+{
+ get {
+ setupExtForSubgroupBasicBuiltIn();
+ setupExtForSubgroupBallotBuiltIn();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "(gl_SubgroupLeMask)";
+ case spirv:
+ return spirv_asm {
+ OpCapability GroupNonUniformBallot;
+ result:$$uvec4 = OpLoad builtin(SubgroupLeMask:uvec4);
+ };
+ }
+ }
+}
+
+[require(glsl)]
+[require(spirv)]
+public property uvec4 gl_SubgroupLtMask
+{
+ get {
+ setupExtForSubgroupBasicBuiltIn();
+ setupExtForSubgroupBallotBuiltIn();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "(gl_SubgroupLtMask)";
+ case spirv:
+ return spirv_asm {
+ OpCapability GroupNonUniformBallot;
+ result:$$uvec4 = OpLoad builtin(SubgroupLtMask:uvec4);
+ };
+ }
+ }
+}
+
+// GL_KHR_shader_subgroup_basic
+
+__glsl_extension(GL_KHR_shader_subgroup_basic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public void subgroupBarrier()
+{
+ __target_switch
+ {
+ case cuda:
+ __intrinsic_asm "__syncwarp()";
+ case hlsl:
+ __intrinsic_asm "AllMemoryBarrierWithGroupSync()";
+ case glsl:
+ __intrinsic_asm "subgroupBarrier()";
+ case spirv:
+ spirv_asm {
+ OpCapability Shader;
+ OpControlBarrier Subgroup Subgroup AcquireRelease|SubgroupMemory|ImageMemory|UniformMemory
+ };
+
+ }
+}
+
+__glsl_extension(GL_KHR_shader_subgroup_basic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public void subgroupMemoryBarrier()
+{
+ __target_switch
+ {
+ case cuda:
+ __intrinsic_asm "__threadfence_block()";
+ case hlsl:
+ __intrinsic_asm "AllMemoryBarrier()";
+ case glsl:
+ __intrinsic_asm "subgroupMemoryBarrier()";
+ case spirv:
+ spirv_asm {
+ OpCapability Shader;
+ OpMemoryBarrier Subgroup AcquireRelease|SubgroupMemory|ImageMemory|UniformMemory
+ };
+
+ }
+}
+
+__glsl_extension(GL_KHR_shader_subgroup_basic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public void subgroupMemoryBarrierBuffer()
+{
+ // the following implementation is NOT the same as DeviceMemoryBarrier
+ // HLSL lacks the same granularity of blocking on subgroup memory within a subgroup
+ __target_switch
+ {
+ case cuda:
+ __intrinsic_asm "__threadfence_block()";
+ case hlsl:
+ __intrinsic_asm "DeviceMemoryBarrier()";
+ case glsl:
+ __intrinsic_asm "subgroupMemoryBarrierBuffer()";
+ case spirv:
+ spirv_asm {
+ OpCapability Shader;
+ OpMemoryBarrier Subgroup AcquireRelease|UniformMemory
+ };
+
+ }
+}
+
+__glsl_extension(GL_KHR_shader_subgroup_basic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public void subgroupMemoryBarrierImage()
+{
+ __target_switch
+ {
+ case cuda:
+ __intrinsic_asm "__threadfence_block()";
+ case hlsl:
+ __intrinsic_asm "DeviceMemoryBarrier()";
+ case glsl:
+ __intrinsic_asm "subgroupMemoryBarrierImage()";
+ case spirv:
+ spirv_asm {
+ OpMemoryBarrier Subgroup AcquireRelease|ImageMemory
+ };
+
+ }
+}
+
+__glsl_extension(GL_KHR_shader_subgroup_basic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public void subgroupMemoryBarrierShared()
+{
+ __target_switch
+ {
+ case cuda:
+ __intrinsic_asm "__threadfence_block()";
+ case hlsl:
+ __intrinsic_asm "GroupMemoryBarrier()";
+ case glsl:
+ __intrinsic_asm "subgroupMemoryBarrierShared()";
+ case spirv:
+ spirv_asm {
+ // SubgroupMemory triggers vulkan validation layer error;
+ // WorkgroupMemory is the next level of granularity
+ OpMemoryBarrier Subgroup AcquireRelease|WorkgroupMemory
+ };
+
+ }
+}
+
+__glsl_extension(GL_KHR_shader_subgroup_basic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public bool subgroupElect()
+{
+ __target_switch
+ {
+ case cuda:
+ // __ffs is 1-based, so subtract 1 to get the index of the lowest active lane
+ __intrinsic_asm "((__ffs(__activemask()) - 1) == _getLaneId())";
+ case glsl:
+ case spirv:
+ case hlsl:
+ return WaveIsFirstLane();
+
+ }
+}
+
+// GL_KHR_shader_subgroup_vote
+
+__glsl_extension(GL_KHR_shader_subgroup_vote) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public bool subgroupAll(bool value)
+{
+
+ return WaveActiveAllTrue(value);
+
+}
+
+__glsl_extension(GL_KHR_shader_subgroup_vote) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public bool subgroupAny(bool value)
+{
+ return WaveActiveAnyTrue(value);
+
+}
+
+__generic<T : __BuiltinType>
+__glsl_extension(GL_KHR_shader_subgroup_vote) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public bool subgroupAllEqual(T value)
+{
+ shader_subgroup_preamble<T>();
+ return WaveActiveAllEqual(value);
+}
+
+__generic<T : __BuiltinType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_vote) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public bool subgroupAllEqual(vector<T,N> value)
+{
+ shader_subgroup_preamble<T>();
+ return WaveActiveAllEqual(value);
+}
+
+// GL_KHR_shader_subgroup_arithmetic
+
+__generic<T : __BuiltinArithmeticType>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupAdd(T value)
+{
+ shader_subgroup_preamble<T>();
+ return WaveActiveSum(value);
+}
+
+__generic<T : __BuiltinArithmeticType>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupMul(T value)
+{
+ shader_subgroup_preamble<T>();
+ return WaveActiveProduct(value);
+}
+
+__generic<T : __BuiltinArithmeticType>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupMin(T value)
+{
+ shader_subgroup_preamble<T>();
+ return WaveActiveMin(value);
+}
+
+__generic<T : __BuiltinArithmeticType>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupMax(T value)
+{
+ shader_subgroup_preamble<T>();
+ return WaveActiveMax(value);
+}
+
+__generic<T : __BuiltinLogicalType>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupAnd(T value)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl: __intrinsic_asm "subgroupAnd($0)";
+ case spirv:
+ if (__isBool<T>()) {
+ return spirv_asm {
+ OpCapability GroupNonUniformArithmetic;
+ OpGroupNonUniformLogicalAnd $$T result Subgroup 0 $value
+ };
+ }
+ else {
+ return spirv_asm {
+ OpCapability GroupNonUniformArithmetic;
+ OpGroupNonUniformBitwiseAnd $$T result Subgroup 0 $value
+ };
+ }
+ }
+}
+
+__generic<T : __BuiltinLogicalType>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupOr(T value)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl: __intrinsic_asm "subgroupOr($0)";
+ case spirv:
+ if (__isBool<T>()) {
+ return spirv_asm {
+ OpCapability GroupNonUniformArithmetic;
+ OpGroupNonUniformLogicalOr $$T result Subgroup 0 $value
+ };
+ }
+ else {
+ return spirv_asm {
+ OpCapability GroupNonUniformArithmetic;
+ OpGroupNonUniformBitwiseOr $$T result Subgroup 0 $value
+ };
+ }
+ }
+}
+
+__generic<T : __BuiltinLogicalType>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupXor(T value)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl: __intrinsic_asm "subgroupXor($0)";
+ case spirv:
+ if (__isBool<T>()) {
+ return spirv_asm {
+ OpCapability GroupNonUniformArithmetic;
+ OpGroupNonUniformLogicalXor $$T result Subgroup 0 $value
+ };
+ }
+ else {
+ return spirv_asm {
+ OpCapability GroupNonUniformArithmetic;
+ OpGroupNonUniformBitwiseXor $$T result Subgroup 0 $value
+ };
+ }
+ }
+}
+
+__generic<T : __BuiltinArithmeticType>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupInclusiveAdd(T value)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupInclusiveAdd($0)";
+ case spirv:
+ if (__isFloat<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformFAdd $$T result Subgroup InclusiveScan $value};
+ else if (__isInt<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformIAdd $$T result Subgroup InclusiveScan $value};
+ else return value;
+ }
+}
+
+__generic<T : __BuiltinArithmeticType>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupInclusiveMul(T value)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupInclusiveMul($0)";
+ case spirv:
+ if (__isFloat<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformFMul $$T result Subgroup InclusiveScan $value};
+ else if (__isInt<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformIMul $$T result Subgroup InclusiveScan $value};
+ else return value;
+ }
+}
+
+__generic<T : __BuiltinArithmeticType>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupInclusiveMin(T value)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupInclusiveMin($0)";
+ case spirv:
+ if (__isFloat<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformFMin $$T result Subgroup InclusiveScan $value};
+ else if (__isSignedInt<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformSMin $$T result Subgroup InclusiveScan $value};
+ else if (__isUnsignedInt<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformUMin $$T result Subgroup InclusiveScan $value};
+ else return value;
+ }
+}
+
+__generic<T : __BuiltinArithmeticType>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupInclusiveMax(T value)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupInclusiveMax($0)";
+ case spirv:
+ if (__isFloat<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformFMax $$T result Subgroup InclusiveScan $value};
+ else if (__isSignedInt<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformSMax $$T result Subgroup InclusiveScan $value};
+ else if (__isUnsignedInt<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformUMax $$T result Subgroup InclusiveScan $value};
+ else return value;
+ }
+}
+
+__generic<T : __BuiltinLogicalType>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupInclusiveAnd(T value)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl: __intrinsic_asm "subgroupInclusiveAnd($0)";
+ case spirv:
+ if (__isBool<T>()) {
+ return spirv_asm {
+ OpCapability GroupNonUniformArithmetic;
+ OpGroupNonUniformLogicalAnd $$T result Subgroup InclusiveScan $value
+ };
+ }
+ else {
+ return spirv_asm {
+ OpCapability GroupNonUniformArithmetic;
+ OpGroupNonUniformBitwiseAnd $$T result Subgroup InclusiveScan $value
+ };
+ }
+ }
+}
+
+__generic<T : __BuiltinLogicalType>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupInclusiveOr(T value)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl: __intrinsic_asm "subgroupInclusiveOr($0)";
+ case spirv:
+ if (__isBool<T>()) {
+ return spirv_asm {
+ OpCapability GroupNonUniformArithmetic;
+ OpGroupNonUniformLogicalOr $$T result Subgroup InclusiveScan $value
+ };
+ }
+ else {
+ return spirv_asm {
+ OpCapability GroupNonUniformArithmetic;
+ OpGroupNonUniformBitwiseOr $$T result Subgroup InclusiveScan $value
+ };
+ }
+ }
+}
-__glsl_extension(KHR_shader_subgroup)
-__glsl_version(450)
-public void subgroupBarrier()
+__generic<T : __BuiltinLogicalType>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupInclusiveXor(T value)
{
- //__subgroupBarrier();
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupInclusiveXor($0)";
+ case spirv:
+ if (__isBool<T>()) return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformLogicalXor $$T result Subgroup InclusiveScan $value};
+ else return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformBitwiseXor $$T result Subgroup InclusiveScan $value};
+ }
+ return T(0);
}
-__glsl_extension(KHR_shader_subgroup)
-__glsl_version(450)
-public void subgroupMemoryBarrier()
+__generic<T : __BuiltinArithmeticType>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupExclusiveAdd(T value)
{
+ shader_subgroup_preamble<T>();
+ return WavePrefixSum(value);
}
+
+__generic<T : __BuiltinArithmeticType>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupExclusiveMul(T value)
+{
+ shader_subgroup_preamble<T>();
+ return WavePrefixProduct(value);
+}
+
+__generic<T : __BuiltinArithmeticType>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupExclusiveMin(T value)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupExclusiveMin($0)";
+ case spirv:
+ if (__isFloat<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformFMin $$T result Subgroup ExclusiveScan $value};
+ else if (__isSignedInt<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformSMin $$T result Subgroup ExclusiveScan $value};
+ else if (__isUnsignedInt<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformUMin $$T result Subgroup ExclusiveScan $value};
+ else return value;
+ }
+}
+
+__generic<T : __BuiltinArithmeticType>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupExclusiveMax(T value)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupExclusiveMax($0)";
+ case spirv:
+ if (__isFloat<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformFMax $$T result Subgroup ExclusiveScan $value};
+ else if (__isSignedInt<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformSMax $$T result Subgroup ExclusiveScan $value};
+ else if (__isUnsignedInt<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformUMax $$T result Subgroup ExclusiveScan $value};
+ else return value;
+ }
+}
+
+__generic<T : __BuiltinLogicalType>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupExclusiveAnd(T value)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl: __intrinsic_asm "subgroupExclusiveAnd($0)";
+ case spirv:
+ if (__isBool<T>()) return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformLogicalAnd $$T result Subgroup ExclusiveScan $value};
+ else return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformBitwiseAnd $$T result Subgroup ExclusiveScan $value};
+ }
+}
+
+__generic<T : __BuiltinLogicalType>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupExclusiveOr(T value)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl: __intrinsic_asm "subgroupExclusiveOr($0)";
+ case spirv:
+ if (__isBool<T>()) return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformLogicalOr $$T result Subgroup ExclusiveScan $value};
+ else return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformBitwiseOr $$T result Subgroup ExclusiveScan $value};
+ }
+}
+
+__generic<T : __BuiltinLogicalType>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupExclusiveXor(T value)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl: __intrinsic_asm "subgroupExclusiveXor($0)";
+ case spirv:
+ if (__isBool<T>()) return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformLogicalXor $$T result Subgroup ExclusiveScan $value};
+ else return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformBitwiseXor $$T result Subgroup ExclusiveScan $value};
+ }
+}
+
+// GL_KHR_shader_subgroup_arithmetic
+//note: this is a separate section because it is so huge that the only reasonable way to implement this is to just regex replace code
+
+__generic<T : __BuiltinArithmeticType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupAdd(vector<T,N> value)
+{
+ shader_subgroup_preamble<T>();
+ return WaveActiveSum(value);
+}
+
+__generic<T : __BuiltinArithmeticType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupMul(vector<T,N> value)
+{
+ shader_subgroup_preamble<T>();
+ return WaveActiveProduct(value);
+}
+
+__generic<T : __BuiltinArithmeticType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupMin(vector<T,N> value)
+{
+ shader_subgroup_preamble<T>();
+ return WaveActiveMin(value);
+}
+
+__generic<T : __BuiltinArithmeticType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupMax(vector<T,N> value)
+{
+ shader_subgroup_preamble<T>();
+ return WaveActiveMax(value);
+}
+
+__generic<T : __BuiltinLogicalType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupAnd(vector<T,N> value)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl: __intrinsic_asm "subgroupAnd($0)";
+ case spirv:
+ if (__isBool<T>()) {
+ return spirv_asm {
+ OpCapability GroupNonUniformArithmetic;
+ OpGroupNonUniformLogicalAnd $$vector<T,N> result Subgroup 0 $value
+ };
+ }
+ else {
+ return spirv_asm {
+ OpCapability GroupNonUniformArithmetic;
+ OpGroupNonUniformBitwiseAnd $$vector<T,N> result Subgroup 0 $value
+ };
+ }
+
+ }
+}
+
+__generic<T : __BuiltinLogicalType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupOr(vector<T,N> value)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl: __intrinsic_asm "subgroupOr($0)";
+ case spirv:
+ if (__isBool<T>()) {
+ return spirv_asm {
+ OpCapability GroupNonUniformArithmetic;
+ OpGroupNonUniformLogicalOr $$vector<T,N> result Subgroup 0 $value
+ };
+ }
+ else {
+ return spirv_asm {
+ OpCapability GroupNonUniformArithmetic;
+ OpGroupNonUniformBitwiseOr $$vector<T,N> result Subgroup 0 $value
+ };
+ }
+
+ }
+}
+
+__generic<T : __BuiltinLogicalType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupXor(vector<T,N> value)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl: __intrinsic_asm "subgroupXor($0)";
+ case spirv:
+ if (__isBool<T>()) {
+ return spirv_asm {
+ OpCapability GroupNonUniformArithmetic;
+ OpGroupNonUniformLogicalXor $$vector<T,N> result Subgroup 0 $value
+ };
+ }
+ else {
+ return spirv_asm {
+ OpCapability GroupNonUniformArithmetic;
+ OpGroupNonUniformBitwiseXor $$vector<T,N> result Subgroup 0 $value
+ };
+ }
+ }
+}
+
+__generic<T : __BuiltinArithmeticType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupInclusiveAdd(vector<T,N> value)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupInclusiveAdd($0)";
+ case spirv:
+ if (__isFloat<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformFAdd $$vector<T,N> result Subgroup InclusiveScan $value};
+ else if (__isInt<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformIAdd $$vector<T,N> result Subgroup InclusiveScan $value};
+ else return value;
+ }
+}
+
+__generic<T : __BuiltinArithmeticType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupInclusiveMul(vector<T,N> value)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupInclusiveMul($0)";
+ case spirv:
+ if (__isFloat<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformFMul $$vector<T,N> result Subgroup InclusiveScan $value};
+ else if (__isInt<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformIMul $$vector<T,N> result Subgroup InclusiveScan $value};
+ else return value;
+ }
+}
+
+__generic<T : __BuiltinArithmeticType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupInclusiveMin(vector<T,N> value)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupInclusiveMin($0)";
+ case spirv:
+ if (__isFloat<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformFMin $$vector<T,N> result Subgroup InclusiveScan $value};
+ else if (__isSignedInt<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformSMin $$vector<T,N> result Subgroup InclusiveScan $value};
+ else if (__isUnsignedInt<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformUMin $$vector<T,N> result Subgroup InclusiveScan $value};
+ else return value;
+ }
+}
+
+__generic<T : __BuiltinArithmeticType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupInclusiveMax(vector<T,N> value)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupInclusiveMax($0)";
+ case spirv:
+ if (__isFloat<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformFMax $$vector<T,N> result Subgroup InclusiveScan $value};
+ else if (__isSignedInt<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformSMax $$vector<T,N> result Subgroup InclusiveScan $value};
+ else if (__isUnsignedInt<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformUMax $$vector<T,N> result Subgroup InclusiveScan $value};
+ else return value;
+ }
+}
+
+__generic<T : __BuiltinLogicalType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupInclusiveAnd(vector<T,N> value)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupInclusiveAnd($0)";
+ case spirv:
+ if (__isBool<T>()) return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformLogicalAnd $$vector<T,N> result Subgroup InclusiveScan $value};
+ else return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformBitwiseAnd $$vector<T,N> result Subgroup InclusiveScan $value};
+ }
+}
+
+__generic<T : __BuiltinLogicalType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupInclusiveOr(vector<T,N> value)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupInclusiveOr($0)";
+ case spirv:
+ if (__isBool<T>()) return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformLogicalOr $$vector<T,N> result Subgroup InclusiveScan $value};
+ else return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformBitwiseOr $$vector<T,N> result Subgroup InclusiveScan $value};
+ }
+}
+
+__generic<T : __BuiltinLogicalType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupInclusiveXor(vector<T,N> value)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupInclusiveXor($0)";
+ case spirv:
+ if (__isBool<T>()) return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformLogicalXor $$vector<T,N> result Subgroup InclusiveScan $value};
+ else return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformBitwiseXor $$vector<T,N> result Subgroup InclusiveScan $value};
+ }
+}
+
+__generic<T : __BuiltinArithmeticType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupExclusiveAdd(vector<T,N> value)
+{
+ shader_subgroup_preamble<T>();
+ return WavePrefixSum(value);
+}
+
+
+__generic<T : __BuiltinArithmeticType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupExclusiveMul(vector<T,N> value)
+{
+ shader_subgroup_preamble<T>();
+ return WavePrefixProduct(value);
+}
+
+__generic<T : __BuiltinArithmeticType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupExclusiveMin(vector<T,N> value)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupExclusiveMin($0)";
+ case spirv:
+ if (__isFloat<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformFMin $$vector<T,N> result Subgroup ExclusiveScan $value};
+ else if (__isSignedInt<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformSMin $$vector<T,N> result Subgroup ExclusiveScan $value};
+ else if (__isUnsignedInt<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformUMin $$vector<T,N> result Subgroup ExclusiveScan $value};
+ else return value;
+ }
+}
+
+__generic<T : __BuiltinArithmeticType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupExclusiveMax(vector<T,N> value)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupExclusiveMax($0)";
+ case spirv:
+ if (__isFloat<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformFMax $$vector<T,N> result Subgroup ExclusiveScan $value};
+ else if (__isSignedInt<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformSMax $$vector<T,N> result Subgroup ExclusiveScan $value};
+ else if (__isUnsignedInt<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformUMax $$vector<T,N> result Subgroup ExclusiveScan $value};
+ else return value;
+ }
+}
+
+__generic<T : __BuiltinLogicalType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupExclusiveAnd(vector<T,N> value)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl: __intrinsic_asm "subgroupExclusiveAnd($0)";
+ case spirv:
+ if (__isBool<T>()) return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformLogicalAnd $$vector<T,N> result Subgroup ExclusiveScan $value};
+ else return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformBitwiseAnd $$vector<T,N> result Subgroup ExclusiveScan $value};
+ }
+}
+
+__generic<T : __BuiltinLogicalType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupExclusiveOr(vector<T,N> value)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl: __intrinsic_asm "subgroupExclusiveOr($0)";
+ case spirv:
+ if (__isBool<T>()) return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformLogicalOr $$vector<T,N> result Subgroup ExclusiveScan $value};
+ else return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformBitwiseOr $$vector<T,N> result Subgroup ExclusiveScan $value};
+ }
+}
+
+__generic<T : __BuiltinLogicalType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_arithmetic) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupExclusiveXor(vector<T,N> value)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl: __intrinsic_asm "subgroupExclusiveXor($0)";
+ case spirv:
+ if (__isBool<T>()) return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformLogicalXor $$vector<T,N> result Subgroup ExclusiveScan $value};
+ else return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformBitwiseXor $$vector<T,N> result Subgroup ExclusiveScan $value};
+ }
+}
+
+// GL_KHR_shader_subgroup_ballot
+
+__generic<T : __BuiltinType>
+__glsl_extension(GL_KHR_shader_subgroup_ballot) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupBroadcast(T value, uint id)
+{
+ shader_subgroup_preamble<T>();
+ return WaveMaskBroadcastLaneAt(WaveGetActiveMask(), value, id);
+}
+
+__generic<T : __BuiltinType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_ballot) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupBroadcast(vector<T,N> value, uint id)
+{
+ shader_subgroup_preamble<T>();
+ return WaveMaskBroadcastLaneAt(WaveGetActiveMask(), value, id);
+}
+
+__generic<T : __BuiltinType>
+__glsl_extension(GL_KHR_shader_subgroup_ballot) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupBroadcastFirst(T value)
+{
+ shader_subgroup_preamble<T>();
+ return WaveMaskReadLaneFirst(WaveGetActiveMask(), value);
+}
+
+__generic<T : __BuiltinType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_ballot) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupBroadcastFirst(vector<T,N> value)
+{
+ shader_subgroup_preamble<T>();
+ return WaveMaskReadLaneFirst(WaveGetActiveMask(), value);
+}
+
+// WaveMaskBallot is not equivalent here: it forcibly truncates the result
+__glsl_extension(GL_KHR_shader_subgroup_ballot) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public uvec4 subgroupBallot(bool value)
+{
+ return WaveActiveBallot(value);
+}
+
+// Logic for HLSL and CUDA, which lack an InverseBallot intrinsic.
+// CUDA: warps are always 32 lanes wide, so only component x is needed.
+// HLSL: {
+// 1. Index of the component to read: index = trunc(float(lane) * (1/32))
+// 2. Test the lane's bit in that component: (value[index] >> lane) & 1
+// Note: the 1/32 is converted into a multiplication;
+// we use 1/32 because one uint stores the bits of 32 lanes.
+// Note 2: the wave lane count is checked because it tells us whether we can take the
+// fast path (index is known to be 0) or must take the slow path (index may be non-zero).
+// }
+__glsl_extension(GL_KHR_shader_subgroup_ballot) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public bool subgroupInverseBallot(uvec4 value)
+{
+ __target_switch
+ {
+ case cuda:
+ // a CUDA warp always has 32 lanes
+ __intrinsic_asm "(($0).x >> (_getLaneId()) & 1)";
+ case hlsl:
+ // much like _WaveCountBits, but here we hope that we hit case 0; we can then avoid the expensive logic
+ const uint waveLaneCount = WaveGetLaneCount();
+ switch ((waveLaneCount - 1) / 32)
+ {
+ case 0:
+ __intrinsic_asm "((($0)[0] >> WaveGetLaneIndex()) & 1)";
+ case 1:
+ case 2:
+ case 3:
+ __intrinsic_asm "((($0)[uint(float(WaveGetLaneIndex())*0.03125f)] >> WaveGetLaneIndex()) & 1)";
+ }
+ case glsl:
+ __intrinsic_asm "subgroupInverseBallot($0)";
+ case spirv:
+ return spirv_asm {
+ OpCapability GroupNonUniformBallot;
+ OpGroupNonUniformInverseBallot $$bool result Subgroup $value
+ };
+ }
+ return false;
+}
+
+// same logic as subgroupInverseBallot
+__glsl_extension(GL_KHR_shader_subgroup_ballot) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public bool subgroupBallotBitExtract(uvec4 value, uint index)
+{
+ __target_switch
+ {
+ case cuda:
+ __intrinsic_asm "(((($0).x >> ($1)) & 1) != 0)";
+ case hlsl:
+ const uint waveLaneCount = WaveGetLaneCount();
+ switch ((waveLaneCount - 1) / 32)
+ {
+ case 0:
+ __intrinsic_asm "((($0)[0] >> ($1)) & 1)";
+ case 1:
+ case 2:
+ case 3:
+ __intrinsic_asm "((($0)[uint(float($1)*0.03125f)] >> (($1) & 31)) & 1)";
+ }
+ case glsl:
+ __intrinsic_asm "subgroupBallotBitExtract($0, $1)";
+ case spirv:
+ return spirv_asm {
+ OpCapability GroupNonUniformBallot;
+ OpGroupNonUniformBallotBitExtract $$bool result Subgroup $value $index
+ };
+ }
+ return false;
+}
+
+
+// The count must only consider the low bits of the uvec4 that correspond to invocations in the subgroup; it is not a plain countbits.
+__glsl_extension(GL_KHR_shader_subgroup_ballot) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public uint subgroupBallotBitCount(uvec4 value)
+{
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupBallotBitCount($0)";
+ case spirv:
+ return spirv_asm {
+ OpCapability GroupNonUniformBallot;
+ OpGroupNonUniformBallotBitCount $$uint result Subgroup Reduce $value
+ };
+ }
+}
+
+__glsl_extension(GL_KHR_shader_subgroup_ballot) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public uint subgroupBallotInclusiveBitCount(uvec4 value)
+{
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupBallotInclusiveBitCount($0)";
+ case spirv:
+ return spirv_asm {
+ OpCapability GroupNonUniformBallot;
+ OpGroupNonUniformBallotBitCount $$uint result Subgroup InclusiveScan $value
+ };
+ }
+}
+
+__glsl_extension(GL_KHR_shader_subgroup_ballot) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public uint subgroupBallotExclusiveBitCount(uvec4 value)
+{
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupBallotExclusiveBitCount($0)";
+ case spirv:
+ return spirv_asm {
+ OpCapability GroupNonUniformBallot;
+ OpGroupNonUniformBallotBitCount $$uint result Subgroup ExclusiveScan $value
+ };
+ }
+}
+
+__glsl_extension(GL_KHR_shader_subgroup_ballot) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public uint subgroupBallotFindLSB(uvec4 value)
+{
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupBallotFindLSB($0)";
+ case spirv:
+ return spirv_asm {
+ OpCapability GroupNonUniformBallot;
+ OpGroupNonUniformBallotFindLSB $$uint result Subgroup $value
+ };
+ }
+}
+
+__glsl_extension(GL_KHR_shader_subgroup_ballot) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public uint subgroupBallotFindMSB(uvec4 value)
+{
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupBallotFindMSB($0)";
+ case spirv:
+ return spirv_asm {
+ OpCapability GroupNonUniformBallot;
+ OpGroupNonUniformBallotFindMSB $$uint result Subgroup $value
+ };
+ }
+}
+
+// GL_KHR_shader_subgroup_shuffle
+
+__generic<T : __BuiltinType>
+__glsl_extension(GL_KHR_shader_subgroup_shuffle) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupShuffle(T value, uint index)
+{
+ shader_subgroup_preamble<T>();
+ return WaveShuffle(value, index);
+}
+
+__generic<T : __BuiltinType>
+__glsl_extension(GL_KHR_shader_subgroup_shuffle) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupShuffleXor(T value, uint mask)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupShuffleXor($0,$1)";
+ case spirv:
+ return spirv_asm {
+ OpCapability GroupNonUniformShuffle;
+ OpGroupNonUniformShuffleXor $$T result Subgroup $value $mask
+ };
+ }
+}
+
+__generic<T : __BuiltinType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_shuffle) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupShuffle(vector<T,N> value, uint index)
+{
+ shader_subgroup_preamble<T>();
+ return WaveShuffle(value, index);
+}
+
+__generic<T : __BuiltinType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_shuffle) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupShuffleXor(vector<T,N> value, uint mask)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupShuffleXor($0,$1)";
+ case spirv:
+ return spirv_asm {
+ OpCapability GroupNonUniformShuffle;
+ OpGroupNonUniformShuffleXor $$vector<T,N> result Subgroup $value $mask
+ };
+ }
+}
+
+
+// GL_KHR_shader_subgroup_shuffle_relative
+
+__generic<T : __BuiltinType>
+__glsl_extension(GL_KHR_shader_subgroup_shuffle_relative) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupShuffleUp(T value, uint delta)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupShuffleUp($0, $1)";
+ case spirv:
+ return spirv_asm {
+ OpCapability GroupNonUniformShuffleRelative;
+ OpGroupNonUniformShuffleUp $$T result Subgroup $value $delta
+ };
+ }
+}
+
+__generic<T : __BuiltinType>
+__glsl_extension(GL_KHR_shader_subgroup_shuffle_relative) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupShuffleDown(T value, uint delta)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupShuffleDown($0, $1)";
+ case spirv:
+ return spirv_asm {
+ OpCapability GroupNonUniformShuffleRelative;
+ OpGroupNonUniformShuffleDown $$T result Subgroup $value $delta
+ };
+ }
+}
+
+
+__generic<T : __BuiltinType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_shuffle_relative) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupShuffleUp(vector<T,N> value, uint delta)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupShuffleUp($0, $1)";
+ case spirv:
+ return spirv_asm {
+ OpCapability GroupNonUniformShuffleRelative;
+ OpGroupNonUniformShuffleUp $$vector<T,N> result Subgroup $value $delta
+ };
+ }
+}
+
+__generic<T : __BuiltinType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_shuffle_relative) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupShuffleDown(vector<T,N> value, uint delta)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupShuffleDown($0, $1)";
+ case spirv:
+ return spirv_asm {
+ OpCapability GroupNonUniformShuffleRelative;
+ OpGroupNonUniformShuffleDown $$vector<T,N> result Subgroup $value $delta
+ };
+ }
+}
+// GL_KHR_shader_subgroup_clustered
+
+__generic<T : __BuiltinArithmeticType>
+__glsl_extension(GL_KHR_shader_subgroup_clustered) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupClusteredAdd(T value, uint clusterSize)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupClusteredAdd($0, $1)";
+ case spirv:
+ if (__isFloat<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpCapability GroupNonUniformClustered; OpGroupNonUniformFAdd $$T result Subgroup ClusteredReduce $value $clusterSize};
+ else if (__isInt<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpCapability GroupNonUniformClustered; OpGroupNonUniformIAdd $$T result Subgroup ClusteredReduce $value $clusterSize};
+ else return value;
+ }
+}
+
+__generic<T : __BuiltinArithmeticType>
+__glsl_extension(GL_KHR_shader_subgroup_clustered) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupClusteredMul(T value, uint clusterSize)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupClusteredMul($0, $1)";
+ case spirv:
+ if (__isFloat<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpCapability GroupNonUniformClustered; OpGroupNonUniformFMul $$T result Subgroup ClusteredReduce $value $clusterSize};
+ else if (__isInt<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpCapability GroupNonUniformClustered; OpGroupNonUniformIMul $$T result Subgroup ClusteredReduce $value $clusterSize};
+ else return value;
+ }
+}
+
+__generic<T : __BuiltinArithmeticType>
+__glsl_extension(GL_KHR_shader_subgroup_clustered) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupClusteredMin(T value, uint clusterSize)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupClusteredMin($0, $1)";
+ case spirv:
+ if (__isFloat<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpCapability GroupNonUniformClustered; OpGroupNonUniformFMin $$T result Subgroup ClusteredReduce $value $clusterSize};
+ else if (__isSignedInt<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpCapability GroupNonUniformClustered; OpGroupNonUniformSMin $$T result Subgroup ClusteredReduce $value $clusterSize};
+ else if (__isUnsignedInt<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpCapability GroupNonUniformClustered; OpGroupNonUniformUMin $$T result Subgroup ClusteredReduce $value $clusterSize};
+ else return value;
+ }
+}
+
+__generic<T : __BuiltinArithmeticType>
+__glsl_extension(GL_KHR_shader_subgroup_clustered) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupClusteredMax(T value, uint clusterSize)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupClusteredMax($0, $1)";
+ case spirv:
+ if (__isFloat<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpCapability GroupNonUniformClustered; OpGroupNonUniformFMax $$T result Subgroup ClusteredReduce $value $clusterSize};
+ else if (__isSignedInt<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpCapability GroupNonUniformClustered; OpGroupNonUniformSMax $$T result Subgroup ClusteredReduce $value $clusterSize};
+ else if (__isUnsignedInt<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpCapability GroupNonUniformClustered; OpGroupNonUniformUMax $$T result Subgroup ClusteredReduce $value $clusterSize};
+ else return value;
+ }
+}
+
+__generic<T : __BuiltinLogicalType>
+__glsl_extension(GL_KHR_shader_subgroup_clustered) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupClusteredAnd(T value, uint clusterSize)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupClusteredAnd($0, $1)";
+ case spirv:
+ if (__isBool<T>()) return spirv_asm {OpCapability GroupNonUniformArithmetic; OpCapability GroupNonUniformClustered; OpGroupNonUniformLogicalAnd $$T result Subgroup ClusteredReduce $value $clusterSize};
+ else return spirv_asm {OpCapability GroupNonUniformArithmetic; OpCapability GroupNonUniformClustered; OpGroupNonUniformBitwiseAnd $$T result Subgroup ClusteredReduce $value $clusterSize};
+ }
+}
+
+__generic<T : __BuiltinLogicalType>
+__glsl_extension(GL_KHR_shader_subgroup_clustered) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupClusteredOr(T value, uint clusterSize)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupClusteredOr($0, $1)";
+ case spirv:
+ if (__isBool<T>()) return spirv_asm {OpCapability GroupNonUniformArithmetic; OpCapability GroupNonUniformClustered; OpGroupNonUniformLogicalOr $$T result Subgroup ClusteredReduce $value $clusterSize};
+ else return spirv_asm {OpCapability GroupNonUniformArithmetic; OpCapability GroupNonUniformClustered; OpGroupNonUniformBitwiseOr $$T result Subgroup ClusteredReduce $value $clusterSize};
+ }
+}
+
+
+
+__generic<T : __BuiltinLogicalType>
+__glsl_extension(GL_KHR_shader_subgroup_clustered) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupClusteredXor(T value, uint clusterSize)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupClusteredXor($0, $1)";
+ case spirv:
+ if (__isBool<T>()) return spirv_asm {OpCapability GroupNonUniformArithmetic; OpCapability GroupNonUniformClustered; OpGroupNonUniformLogicalXor $$T result Subgroup ClusteredReduce $value $clusterSize};
+ else return spirv_asm {OpCapability GroupNonUniformArithmetic; OpCapability GroupNonUniformClustered; OpGroupNonUniformBitwiseXor $$T result Subgroup ClusteredReduce $value $clusterSize};
+ }
+}
+
+
+
+__generic<T : __BuiltinArithmeticType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_clustered) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupClusteredAdd(vector<T,N> value, uint clusterSize)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupClusteredAdd($0, $1)";
+ case spirv:
+ if (__isFloat<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpCapability GroupNonUniformClustered;
+ OpGroupNonUniformFAdd $$vector<T,N> result Subgroup ClusteredReduce $value $clusterSize};
+ else if (__isInt<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpCapability GroupNonUniformClustered; OpGroupNonUniformIAdd $$vector<T,N> result Subgroup ClusteredReduce $value $clusterSize};
+ else return value;
+ }
+}
+
+__generic<T : __BuiltinArithmeticType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_clustered) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupClusteredMul(vector<T,N> value, uint clusterSize)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupClusteredMul($0, $1)";
+ case spirv:
+ if (__isFloat<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpCapability GroupNonUniformClustered; OpGroupNonUniformFMul $$vector<T,N> result Subgroup ClusteredReduce $value $clusterSize};
+ else if (__isInt<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpCapability GroupNonUniformClustered; OpGroupNonUniformIMul $$vector<T,N> result Subgroup ClusteredReduce $value $clusterSize};
+ else return value;
+ }
+}
+
+__generic<T : __BuiltinArithmeticType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_clustered) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupClusteredMin(vector<T,N> value, uint clusterSize)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupClusteredMin($0, $1)";
+ case spirv:
+ if (__isFloat<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpCapability GroupNonUniformClustered; OpGroupNonUniformFMin $$vector<T,N> result Subgroup ClusteredReduce $value $clusterSize};
+ else if (__isSignedInt<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpCapability GroupNonUniformClustered; OpGroupNonUniformSMin $$vector<T,N> result Subgroup ClusteredReduce $value $clusterSize};
+ else if (__isUnsignedInt<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpCapability GroupNonUniformClustered; OpGroupNonUniformUMin $$vector<T,N> result Subgroup ClusteredReduce $value $clusterSize};
+ else return value;
+ }
+}
+
+__generic<T : __BuiltinArithmeticType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_clustered) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupClusteredMax(vector<T,N> value, uint clusterSize)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupClusteredMax($0, $1)";
+ case spirv:
+ if (__isFloat<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpCapability GroupNonUniformClustered; OpGroupNonUniformFMax $$vector<T,N> result Subgroup ClusteredReduce $value $clusterSize};
+ else if (__isSignedInt<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpCapability GroupNonUniformClustered; OpGroupNonUniformSMax $$vector<T,N> result Subgroup ClusteredReduce $value $clusterSize};
+ else if (__isUnsignedInt<T>())
+ return spirv_asm {OpCapability GroupNonUniformArithmetic; OpCapability GroupNonUniformClustered; OpGroupNonUniformUMax $$vector<T,N> result Subgroup ClusteredReduce $value $clusterSize};
+ else return value;
+ }
+}
+
+__generic<T : __BuiltinLogicalType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_clustered) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupClusteredAnd(vector<T,N> value, uint clusterSize)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupClusteredAnd($0, $1)";
+ case spirv:
+ if (__isBool<T>()) return spirv_asm {OpCapability GroupNonUniformArithmetic; OpCapability GroupNonUniformClustered; OpGroupNonUniformLogicalAnd $$vector<T,N> result Subgroup ClusteredReduce $value $clusterSize};
+ else return spirv_asm {OpCapability GroupNonUniformArithmetic; OpCapability GroupNonUniformClustered; OpGroupNonUniformBitwiseAnd $$vector<T,N> result Subgroup ClusteredReduce $value $clusterSize};
+ }
+}
+
+__generic<T : __BuiltinLogicalType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_clustered) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupClusteredOr(vector<T,N> value, uint clusterSize)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupClusteredOr($0, $1)";
+ case spirv:
+ if (__isBool<T>()) return spirv_asm {OpCapability GroupNonUniformArithmetic; OpCapability GroupNonUniformClustered; OpGroupNonUniformLogicalOr $$vector<T,N> result Subgroup ClusteredReduce $value $clusterSize};
+ else return spirv_asm {OpCapability GroupNonUniformArithmetic; OpCapability GroupNonUniformClustered; OpGroupNonUniformBitwiseOr $$vector<T,N> result Subgroup ClusteredReduce $value $clusterSize};
+ }
+}
+
+__generic<T : __BuiltinLogicalType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_clustered) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupClusteredXor(vector<T,N> value, uint clusterSize)
+{
+ shader_subgroup_preamble<T>();
+ __target_switch
+ {
+ case glsl:
+ __intrinsic_asm "subgroupClusteredXor($0, $1)";
+ case spirv:
+ if (__isBool<T>()) return spirv_asm {OpCapability GroupNonUniformArithmetic; OpCapability GroupNonUniformClustered; OpGroupNonUniformLogicalXor $$vector<T,N> result Subgroup ClusteredReduce $value $clusterSize};
+ else return spirv_asm {OpCapability GroupNonUniformArithmetic; OpCapability GroupNonUniformClustered; OpGroupNonUniformBitwiseXor $$vector<T,N> result Subgroup ClusteredReduce $value $clusterSize};
+ }
+}
+
+// GL_KHR_shader_subgroup_quad
+
+__generic<T : __BuiltinType>
+__glsl_extension(GL_KHR_shader_subgroup_quad) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupQuadBroadcast(T value, uint id)
+{
+ shader_subgroup_preamble<T>();
+ return QuadReadLaneAt(value, id);
+}
+
+__generic<T : __BuiltinType>
+__glsl_extension(GL_KHR_shader_subgroup_quad) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupQuadSwapHorizontal(T value)
+{
+ shader_subgroup_preamble<T>();
+ return QuadReadAcrossX(value);
+}
+
+__generic<T : __BuiltinType>
+__glsl_extension(GL_KHR_shader_subgroup_quad) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupQuadSwapVertical(T value)
+{
+ shader_subgroup_preamble<T>();
+ return QuadReadAcrossY(value);
+}
+
+__generic<T : __BuiltinType>
+__glsl_extension(GL_KHR_shader_subgroup_quad) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public T subgroupQuadSwapDiagonal(T value)
+{
+ shader_subgroup_preamble<T>();
+ return QuadReadAcrossDiagonal(value);
+}
+
+
+__generic<T : __BuiltinType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_quad) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupQuadBroadcast(vector<T,N> value, uint id)
+{
+ shader_subgroup_preamble<T>();
+ return QuadReadLaneAt(value, id);
+}
+
+__generic<T : __BuiltinType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_quad) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupQuadSwapHorizontal(vector<T,N> value)
+{
+ shader_subgroup_preamble<T>();
+ return QuadReadAcrossX(value);
+}
+
+__generic<T : __BuiltinType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_quad) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupQuadSwapVertical(vector<T,N> value)
+{
+ shader_subgroup_preamble<T>();
+ return QuadReadAcrossY(value);
+}
+
+__generic<T : __BuiltinType, let N : int>
+__glsl_extension(GL_KHR_shader_subgroup_quad) [require(glsl)]
+__spirv_version(1.3) [require(spirv)]
+[ForceInline] public vector<T,N> subgroupQuadSwapDiagonal(vector<T,N> value)
+{
+ shader_subgroup_preamble<T>();
+ return QuadReadAcrossDiagonal(value);
+}
\ No newline at end of file
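Editor's note on the ballot helpers in the file above: the comment ahead of subgroupInverseBallot describes picking one component of the uvec4 by dividing the lane index by 32 and then testing that lane's bit. The following is a minimal host-side C++ sketch of that recipe, not part of the patch. The names `Ballot`, `ballotBitExtract` and `inverseBallot` are hypothetical, and a 128-lane upper bound (4 x 32 bits) is assumed.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a GLSL uvec4 ballot: 4 x 32 bits = 128 lanes.
using Ballot = std::array<std::uint32_t, 4>;

// Returns true when the bit for `index` is set in `value`.
// Follows the commented recipe: select the component with index / 32,
// then test bit (index % 32) inside that component.
inline bool ballotBitExtract(const Ballot& value, std::uint32_t index)
{
    assert(index < 128u);                         // assumed upper bound on lanes
    const std::uint32_t component = index / 32u;  // which uint holds the lane
    const std::uint32_t bit = index % 32u;        // position inside that uint
    return ((value[component] >> bit) & 1u) != 0u;
}

// Inverse ballot is the same test, applied to the calling lane's own index.
inline bool inverseBallot(const Ballot& value, std::uint32_t laneIndex)
{
    return ballotBitExtract(value, laneIndex);
}
```

When the wave is at most 32 lanes wide, `component` is always 0, which corresponds to the fast path (`case 0`) in the HLSL branch.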
diff --git a/source/slang/hlsl.meta.slang b/source/slang/hlsl.meta.slang
index 156ecc194..84ba11cad 100644
--- a/source/slang/hlsl.meta.slang
+++ b/source/slang/hlsl.meta.slang
@@ -7807,19 +7807,14 @@ T WaveMaskProduct(WaveMask mask, T expr)
case spirv:
if (__isFloat<T>())
return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformFMul $$T result Subgroup 0 $expr};
- else if (__isSignedInt<T>())
+ else if (__isInt<T>())
{
return spirv_asm
{
OpCapability GroupNonUniformArithmetic;
- // TODO: use the correct integer width
- OpBitcast $$uint %uvalue $expr;
- OpGroupNonUniformIMul $$uint %mulResult Subgroup 0 %uvalue;
- OpBitcast $$T result %mulResult
+ OpGroupNonUniformIMul $$T result Subgroup 0 $expr;
};
}
- else if (__isUnsignedInt<T>())
- return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformIMul $$T result Subgroup 0 $expr};
else return expr;
}
}
@@ -7837,19 +7832,14 @@ vector<T,N> WaveMaskProduct(WaveMask mask, vector<T,N> expr)
case spirv:
if (__isFloat<T>())
return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformFMul $$vector<T,N> result Subgroup 0 $expr};
- else if (__isSignedInt<T>())
+ else if (__isInt<T>())
{
return spirv_asm
{
OpCapability GroupNonUniformArithmetic;
- // TODO: use the correct integer width
- OpBitcast $$vector<uint,N> %uvalue $expr;
- OpGroupNonUniformIMul $$vector<uint,N> %mulResult Subgroup 0 %uvalue;
- OpBitcast $$vector<T,N> result %mulResult
+ OpGroupNonUniformIMul $$vector<T,N> result Subgroup 0 $expr;
};
}
- else if (__isUnsignedInt<T>())
- return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformIMul $$vector<T,N> result Subgroup 0 $expr};
else return expr;
}
}
@@ -7877,19 +7867,14 @@ T WaveMaskSum(WaveMask mask, T expr)
case spirv:
if (__isFloat<T>())
return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformFAdd $$T result Subgroup 0 $expr};
- else if (__isSignedInt<T>())
+ else if (__isInt<T>())
{
return spirv_asm
{
OpCapability GroupNonUniformArithmetic;
- // TODO: use the correct integer width
- OpBitcast $$uint %uvalue $expr;
- OpGroupNonUniformIAdd $$uint %mulResult Subgroup 0 %uvalue;
- OpBitcast $$T result %mulResult
+ OpGroupNonUniformIAdd $$T result Subgroup 0 $expr;
};
}
- else if (__isUnsignedInt<T>())
- return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformIAdd $$T result Subgroup 0 $expr};
else return expr;
}
}
@@ -7908,19 +7893,14 @@ vector<T,N> WaveMaskSum(WaveMask mask, vector<T,N> expr)
case spirv:
if (__isFloat<T>())
return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformFAdd $$vector<T,N> result Subgroup 0 $expr};
- else if (__isSignedInt<T>())
+ else if (__isInt<T>())
{
return spirv_asm
{
OpCapability GroupNonUniformArithmetic;
- // TODO: use the correct integer width
- OpBitcast $$vector<uint,N> %uvalue $expr;
- OpGroupNonUniformIAdd $$vector<uint,N> %mulResult Subgroup 0 %uvalue;
- OpBitcast $$vector<T,N> result %mulResult
+ OpGroupNonUniformIAdd $$vector<T,N> result Subgroup 0 $expr;
};
}
- else if (__isUnsignedInt<T>())
- return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformIAdd $$vector<T,N> result Subgroup 0 $expr};
else return expr;
}
}
@@ -8002,19 +7982,14 @@ T WaveMaskPrefixProduct(WaveMask mask, T expr)
case spirv:
if (__isFloat<T>())
return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformFMul $$T result Subgroup ExclusiveScan $expr};
- else if (__isSignedInt<T>())
+ else if (__isInt<T>())
{
return spirv_asm
{
OpCapability GroupNonUniformArithmetic;
- // TODO: use the correct integer width
- OpBitcast $$uint %uvalue $expr;
- OpGroupNonUniformIMul $$uint %mulResult Subgroup ExclusiveScan %uvalue;
- OpBitcast $$T result %mulResult
+ OpGroupNonUniformIMul $$T result Subgroup ExclusiveScan $expr;
};
}
- else if (__isUnsignedInt<T>())
- return spirv_asm {OpGroupNonUniformIMul $$T result Subgroup ExclusiveScan $expr};
else return expr;
}
}
@@ -8033,19 +8008,14 @@ vector<T,N> WaveMaskPrefixProduct(WaveMask mask, vector<T,N> expr)
case spirv:
if (__isFloat<T>())
return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformFMul $$vector<T,N> result Subgroup ExclusiveScan $expr};
- else if (__isSignedInt<T>())
+ else if (__isInt<T>())
{
return spirv_asm
{
OpCapability GroupNonUniformArithmetic;
- // TODO: use the correct integer width
- OpBitcast $$vector<uint,N> %uvalue $expr;
- OpGroupNonUniformIMul $$vector<uint,N> %mulResult Subgroup ExclusiveScan %uvalue;
- OpBitcast $$vector<T,N> result %mulResult
+ OpGroupNonUniformIMul $$vector<T,N> result Subgroup ExclusiveScan $expr;
};
}
- else if (__isUnsignedInt<T>())
- return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformIMul $$vector<T,N> result Subgroup ExclusiveScan $expr};
else return expr;
}
}
@@ -8069,19 +8039,14 @@ T WaveMaskPrefixSum(WaveMask mask, T expr)
case spirv:
if (__isFloat<T>())
return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformFAdd $$T result Subgroup ExclusiveScan $expr};
- else if (__isSignedInt<T>())
+ else if (__isInt<T>())
{
return spirv_asm
{
OpCapability GroupNonUniformArithmetic;
- // TODO: use the correct integer width
- %uvalue:$$uint = OpBitcast $expr;
- %mulResult:$$uint = OpGroupNonUniformIAdd Subgroup ExclusiveScan %uvalue;
- result:$$T = OpBitcast %mulResult
+ result:$$T = OpGroupNonUniformIAdd Subgroup ExclusiveScan $expr;
};
}
- else if (__isUnsignedInt<T>())
- return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformIAdd $$T result Subgroup ExclusiveScan $expr};
else return expr;
}
}
@@ -8101,19 +8066,14 @@ vector<T,N> WaveMaskPrefixSum(WaveMask mask, vector<T,N> expr)
case spirv:
if (__isFloat<T>())
return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformFAdd $$vector<T,N> result Subgroup ExclusiveScan $expr};
- else if (__isSignedInt<T>())
+ else if (__isInt<T>())
{
return spirv_asm
{
OpCapability GroupNonUniformArithmetic;
- // TODO: use the correct integer width
- %uvalue: $$vector<uint,N> = OpBitcast $expr;
- %mulResult: $$vector<uint,N> = OpGroupNonUniformIAdd Subgroup ExclusiveScan %uvalue;
- result: $$vector<T,N> = OpBitcast %mulResult
+ result:$$vector<T,N> = OpGroupNonUniformIAdd Subgroup ExclusiveScan $expr;
};
}
- else if (__isUnsignedInt<T>())
- return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformIAdd $$vector<T,N> result Subgroup ExclusiveScan $expr};
else return expr;
}
}
@@ -8612,23 +8572,14 @@ T WaveActive$(opName.hlslName)(T expr)
OpCapability GroupNonUniformArithmetic;
OpGroupNonUniformF$(opName.glslName) $$T result Subgroup 0 $expr
};
- else if (__isSignedInt<T>())
+ else if (__isInt<T>())
{
return spirv_asm
{
OpCapability GroupNonUniformArithmetic;
- // TODO: use the correct integer width
- OpBitcast $$uint %uvalue $expr;
- OpGroupNonUniformI$(opName.glslName) $$uint %mulResult Subgroup 0 %uvalue;
- OpBitcast $$T result %mulResult
+ OpGroupNonUniformI$(opName.glslName) $$T result Subgroup 0 $expr;
};
}
- else if (__isUnsignedInt<T>())
- return spirv_asm
- {
- OpCapability GroupNonUniformArithmetic;
- OpGroupNonUniformI$(opName.glslName) $$T result Subgroup 0 $expr
- };
else return expr;
default:
return WaveMask$(opName.hlslName)(WaveGetActiveMask(), expr);
@@ -8653,23 +8604,14 @@ vector<T,N> WaveActive$(opName.hlslName)(vector<T,N> expr)
OpCapability GroupNonUniformArithmetic;
OpGroupNonUniformF$(opName.glslName) $$vector<T,N> result Subgroup 0 $expr
};
- else if (__isSignedInt<T>())
+ else if (__isInt<T>())
{
return spirv_asm
{
OpCapability GroupNonUniformArithmetic;
- // TODO: use the correct integer width
- OpBitcast $$vector<uint,N> %uvalue $expr;
- OpGroupNonUniformI$(opName.glslName) $$vector<uint,N> %$(opName.glslName)Result Subgroup 0 %uvalue;
- OpBitcast $$vector<T,N> result %$(opName.glslName)Result
+ OpGroupNonUniformI$(opName.glslName) $$vector<T,N> result Subgroup 0 $expr;
};
}
- else if (__isUnsignedInt<T>())
- return spirv_asm
- {
- OpCapability GroupNonUniformArithmetic;
- OpGroupNonUniformI$(opName.glslName) $$vector<T,N> result Subgroup 0 $expr
- };
else return expr;
default:
return WaveMask$(opName.hlslName)(WaveGetActiveMask(), expr);
@@ -8909,19 +8851,14 @@ T WavePrefixProduct(T expr)
OpCapability GroupNonUniformArithmetic;
OpGroupNonUniformFMul $$T result Subgroup ExclusiveScan $expr
};
- else if (__isSignedInt<T>())
+ else if (__isInt<T>())
{
return spirv_asm
{
OpCapability GroupNonUniformArithmetic;
- // TODO: use the correct integer width
- OpBitcast $$uint %uvalue $expr;
- OpGroupNonUniformIMul $$uint %mulResult Subgroup ExclusiveScan %uvalue;
- OpBitcast $$T result %mulResult
+ OpGroupNonUniformIMul $$T result Subgroup ExclusiveScan $expr;
};
}
- else if (__isUnsignedInt<T>())
- return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformIMul $$T result Subgroup ExclusiveScan $expr};
else return expr;
default:
return WaveMaskPrefixProduct(WaveGetActiveMask(), expr);
@@ -8943,19 +8880,14 @@ vector<T,N> WavePrefixProduct(vector<T,N> expr)
case spirv:
if (__isFloat<T>())
return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformFMul $$vector<T,N> result Subgroup ExclusiveScan $expr};
- else if (__isSignedInt<T>())
+ else if (__isInt<T>())
{
return spirv_asm
{
OpCapability GroupNonUniformArithmetic;
- // TODO: use the correct integer width
- OpBitcast $$vector<uint,N> %uvalue $expr;
- OpGroupNonUniformIMul $$vector<uint,N> %mulResult Subgroup ExclusiveScan %uvalue;
- OpBitcast $$vector<T,N> result %mulResult
+ OpGroupNonUniformIMul $$vector<T,N> result Subgroup ExclusiveScan $expr;
};
}
- else if (__isUnsignedInt<T>())
- return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformIMul $$vector<T,N> result Subgroup ExclusiveScan $expr};
else return expr;
default:
return WaveMaskPrefixProduct(WaveGetActiveMask(), expr);
@@ -8983,19 +8915,14 @@ T WavePrefixSum(T expr)
case spirv:
if (__isFloat<T>())
return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformFAdd $$T result Subgroup ExclusiveScan $expr};
- else if (__isSignedInt<T>())
+ else if (__isInt<T>())
{
return spirv_asm
{
OpCapability GroupNonUniformArithmetic;
- // TODO: use the correct integer width
- %uvalue:$$uint = OpBitcast $expr;
- %mulResult:$$uint = OpGroupNonUniformIAdd Subgroup ExclusiveScan %uvalue;
- result:$$T = OpBitcast %mulResult
+ result:$$T = OpGroupNonUniformIAdd Subgroup ExclusiveScan $expr;
};
}
- else if (__isUnsignedInt<T>())
- return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformIAdd $$T result Subgroup ExclusiveScan $expr};
else return expr;
default:
return WaveMaskPrefixSum(WaveGetActiveMask(), expr);
@@ -9016,19 +8943,14 @@ vector<T,N> WavePrefixSum(vector<T,N> expr)
case spirv:
if (__isFloat<T>())
return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformFAdd $$vector<T,N> result Subgroup ExclusiveScan $expr};
- else if (__isSignedInt<T>())
+ else if (__isInt<T>())
{
return spirv_asm
{
OpCapability GroupNonUniformArithmetic;
- // TODO: use the correct integer width
- %uvalue:$$vector<uint,N> = OpBitcast $expr;
- %mulResult:$$vector<uint,N> = OpGroupNonUniformIAdd Subgroup ExclusiveScan %uvalue;
- result:$$vector<T,N> = OpBitcast %mulResult
+ result:$$vector<T,N> = OpGroupNonUniformIAdd Subgroup ExclusiveScan $expr;
};
}
- else if (__isUnsignedInt<T>())
- return spirv_asm {OpCapability GroupNonUniformArithmetic; OpGroupNonUniformIAdd $$vector<T,N> result Subgroup ExclusiveScan $expr};
else return expr;
default:
return WaveMaskPrefixSum(WaveGetActiveMask(), expr);
@@ -11036,6 +10958,7 @@ struct HitObject
let tmin = Ray.TMin;
let tmax = Ray.TMax;
spirv_asm {
+ OpCapability ShaderInvocationReorderNV;
OpHitObjectTraceRayNV
/**/ &__return_val
/**/ $AccelerationStructure
@@ -11781,7 +11704,7 @@ struct HitObject
}
}
- /// Returns the attributes of a hit. Valid if the hit object represents a hit or a miss.
+ /// Returns the attributes of a hit. Valid if the hit object represents a hit or a miss.
[ForceInline]
attr_t GetAttributes<attr_t>()
{
@@ -13164,6 +13087,14 @@ struct ConstBufferPointer
}
}
+
+
+ __subscript(int index) -> T
+ {
+ [ForceInline]
+ get {return ConstBufferPointer<T>.fromUInt(toUInt() + __naturalStrideOf<T>() * index).get(); }
+ }
+
__glsl_version(450)
__glsl_extension(GL_EXT_shader_explicit_arithmetic_types_int64)
__glsl_extension(GL_EXT_buffer_reference)
@@ -13215,10 +13146,4 @@ struct ConstBufferPointer
};
}
}
-
- __subscript(int index)->T
- {
- [ForceInline]
- get { return ConstBufferPointer<T>.fromUInt(toUInt() + __naturalStrideOf<T>() * index).get(); }
- }
}
diff --git a/source/slang/slang-ast-dump.cpp b/source/slang/slang-ast-dump.cpp
index ccd9b9ee7..3bb83f80b 100644
--- a/source/slang/slang-ast-dump.cpp
+++ b/source/slang/slang-ast-dump.cpp
@@ -666,6 +666,9 @@ struct ASTDumpContext
case SPIRVAsmOperand::SlangImmediateValue:
m_writer->emit("!");
break;
+ case SPIRVAsmOperand::BuiltinVar:
+ m_writer->emit("builtin");
+ break;
default:
SLANG_UNREACHABLE("Unhandled case in ast dump for SPIRVAsmOperand");
}
diff --git a/source/slang/slang-lower-to-ir.cpp b/source/slang/slang-lower-to-ir.cpp
index 416a6671b..1359e1242 100644
--- a/source/slang/slang-lower-to-ir.cpp
+++ b/source/slang/slang-lower-to-ir.cpp
@@ -3714,10 +3714,19 @@ struct ExprLoweringVisitorBase : public ExprVisitor<Derived, LoweredValInfo>
LoweredValInfo visitVarExpr(VarExpr* expr)
{
+ auto lowerTypeOfExpr = lowerType(context, expr->type);
+ auto declRef = expr->declRef;
+ if (auto propertyDeclRef = declRef.as<PropertyDecl>())
+ {
+ // A reference to a property is a special case, because
+ // we must translate the reference to the property
+ // into a reference to one of its accessors.
+ return lowerStorageReference(context, lowerTypeOfExpr, propertyDeclRef, LoweredValInfo(), 0, nullptr);
+ }
LoweredValInfo info = emitDeclRef(
context,
- expr->declRef,
- lowerType(context, expr->type));
+ declRef,
+ lowerTypeOfExpr);
return info;
}