| author | ArielG-NV <159081215+ArielG-NV@users.noreply.github.com> | 2024-02-26 19:09:09 -0500 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2024-02-26 16:09:09 -0800 |
| commit | 1241006b6d89cae09766ca9795187ef9c0dd2085 (patch) | |
| tree | 9b114fc0ac218cae917b43f31090f1e20e2d96e5 /source/slang/slang-ast-dump.cpp | |
| parent | 4f03eb9d657fd74da341bb2b0d391c6b855073af (diff) | |
Partially implement shader_subgroup extension(s); Partially resolves #3548 (#3580)
* Partially implement functions and built-in variables, with tests, as part of GL_KHR_shader_subgroup; partially resolves #3548
GL_KHR_shader_subgroup implemented based on https://github.com/KhronosGroup/GLSL/blob/main/extensions/khr/GL_KHR_shader_subgroup.txt
Implementation is broken down into separate GLSL extensions due to the ***large differences*** in the implementation, functionality, and testing of each section.
GL_KHR_shader_subgroup_basic{
**Partially implemented**
Implementation:
* All 9 built-in variables have been stubbed without proper values; implementation is still required for these system variables; related to #411.
* Functions were reimplemented, despite nearly identical HLSL functions existing, because:
* hlsl.meta implementations target workgroups rather than a warp/wave/subgroup:
* `__syncwarp` vs `__syncthreads`
* `SubgroupMemory` vs `WorkgroupMemory`
* etc.
* hlsl.meta implementations target broader SPIR-V memory targets to block on:
* hlsl.meta combines ImageMemory|UniformMemory, whereas SPIR-V specifies a barrier for ImageMemory and, separately, an option for UniformMemory
* `subgroupElect` for CUDA has a different implementation than `WaveIsFirstLane` because the spec states that `subgroupElect()` returns true only for the active invocation with the lowest `gl_SubgroupInvocationID`; therefore the current active mask must be fetched, since some invocations may be turned off by branches
Testing:
Tests for the variables -- `tests/glsl/shader-subgroup-built-in-variables.slang`
* these tests do not test functionality, since the variables are not implemented yet
Tests for the functions -- `tests/glsl/shader-subgroup-basic.slang`
* concurrency is tested for SubgroupMemory and UniformMemory by attempting to create a GPU-side race condition when writing and reading memory
* due to the testing tools available, there are no tests for ImageMemory
* `subgroupElect` is tested to elect invocation #0, the lowest invocation, which always runs; the wave size is 32, so #0 is always active and is always the elected invocation.
}
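The `subgroupElect` rule described above (only the lowest active invocation is elected) can be sketched on the host as plain bit logic. This is an illustration only, not the code added for the CUDA target: `lowestActiveLane` and `electEmulated` are hypothetical names, and a 32-lane subgroup whose active mask fits in one `uint32_t` is assumed.

```cpp
#include <cstdint>

// Index of the lowest set bit in a non-zero 32-bit active mask.
inline unsigned lowestActiveLane(std::uint32_t activeMask) {
    unsigned lane = 0;
    while (((activeMask >> lane) & 1u) == 0u) ++lane;
    return lane;
}

// Emulates the election rule: true only for the lowest active invocation.
// Assumes laneId < 32; an empty mask elects nobody.
inline bool electEmulated(std::uint32_t activeMask, unsigned laneId) {
    if (activeMask == 0u) return false;
    return laneId == lowestActiveLane(activeMask);
}
```

With all 32 lanes active the elected lane is #0, which matches what the basic test expects; if a branch disables lanes 0 to 2, lane 3 is elected instead.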
GL_KHR_shader_subgroup_vote{
**Fully implemented**
Implementation:
* 3/3 functions are using the hlsl.meta implementation
Testing:
`tests/glsl/shader-subgroup-vote.slang`
* Each function is tested with a positive (returns true) and a negative (returns false) case to ensure vote results are correct
}
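As a rough host-side model of what the positive and negative vote cases check, the three vote functions (`subgroupAll`, `subgroupAny`, `subgroupAllEqual`) can be emulated over one value per invocation. The names `voteAll`, `voteAny` and `voteAllEqual` are hypothetical, and every invocation is assumed to be active.

```cpp
#include <cstddef>
#include <vector>

// True when every invocation voted true.
inline bool voteAll(const std::vector<bool>& votes) {
    for (bool v : votes)
        if (!v) return false;
    return true;
}

// True when at least one invocation voted true.
inline bool voteAny(const std::vector<bool>& votes) {
    for (bool v : votes)
        if (v) return true;
    return false;
}

// True when every invocation holds the same value.
template <typename T>
bool voteAllEqual(const std::vector<T>& values) {
    for (std::size_t i = 1; i < values.size(); ++i)
        if (!(values[i] == values[0])) return false;
    return true;
}
```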
GL_KHR_shader_subgroup_ballot{
**Partially implemented**
Implementation:
There are 10/10 functions that are implemented:
* 3 are using hlsl.meta implementation
* 7 are using new implementations -- only support GLSL, SPIR-V, HLSL, CUDA
* These implementations do not exist in hlsl.meta, so they were added
* `subgroupInverseBallot` lacks an analogous function to call, so it was emulated:
* in CUDA, warps are 32 lanes wide and lanes are 0-indexed, which implies that `(ballotResult >> YOUR_INVOCATION) & 1` checks whether your invocation's bit is set. For example, in `(0b11001 >> 3) & 1` the ballot `0b11001` means that only the 5th, 4th, and 1st invocations are active, and `3` means `YOUR_INVOCATION` is the fourth invocation in the subgroup. The expression evaluates to `0b11 & 0b1`, which is true since your bit is set
* in HLSL, test whether the wave size is 32 or less (if so, use the same logic as CUDA); otherwise find the vector component that `YOUR_INVOCATION` corresponds to, where each component holds 32 bits (32 lanes), avoiding division in the process, then run the same algorithm CUDA employs.
* `subgroupBallotBitExtract` is logically the same as `subgroupInverseBallot`
* 5 functions do not have a CUDA, HLSL, or CPP implementation yet (subgroupBallotFindMSB, subgroupBallotFindLSB, subgroupBallotExclusiveBitCount, subgroupBallotInclusiveBitCount, subgroupBallotBitCount) because they were out of scope for the commit
Testing:
`tests/glsl/shader-subgroup-ballot.slang`
* the tests check each ballot function against an expected value; they also pass inputs with more than 32 bits set, to ensure the implementation correctly handles values up to a maximum of the subgroup invocation count, as per the extension specification (otherwise the functionality is fairly trivial to test)
}
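The `subgroupInverseBallot` emulation described above boils down to a shift and a mask. A minimal host-side sketch follows, assuming the ballot is either a single 32-bit word (the CUDA case) or four 32-bit words (the larger-wave HLSL case). `inverseBallot32` and `inverseBallot128` are hypothetical names, not the functions added to hlsl.meta.

```cpp
#include <array>
#include <cstdint>

// 32-lane case: is this lane's bit set in the ballot? Requires lane < 32.
inline bool inverseBallot32(std::uint32_t ballot, unsigned lane) {
    return ((ballot >> lane) & 1u) != 0u;
}

// Up to 128 lanes: pick the 32-bit component first, then the bit in it.
// Requires lane < 128. Shift and mask stand in for division and modulo.
inline bool inverseBallot128(const std::array<std::uint32_t, 4>& ballot,
                             unsigned lane) {
    unsigned component = lane >> 5; // lane / 32
    unsigned bit = lane & 31u;      // lane % 32
    return ((ballot[component] >> bit) & 1u) != 0u;
}
```

For the worked example in the text, `inverseBallot32(0x19, 3)` (`0x19` is `0b11001`) is true, while lane 1 gets false.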
GL_KHR_shader_subgroup_arithmetic{
**Partially implemented**
Implementation:
* There are 21 functions to implement:
* 14 functions are using the hlsl.meta implementation
* 7 functions are new implementations -- only implemented for GLSL and SPIR-V
* GLSL & SPIR-V both use their related functions, no emulation required
* CUDA, CPP, HLSL are out of scope for the commit
Testing:
`tests/glsl/shader-subgroup-arithmetic.slang`
* all tests silently kill the shader; the generated GLSL was checked and no issue could be seen
* these tests only check basic functionality and correctness of all functions implemented; they are not exhaustive [further continued in "Other notes of worthy" at end of commit]
}
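The 21 arithmetic functions are seven operations (add, mul, min, max, and, or, xor), each in a reduce, inclusive-scan and exclusive-scan form. The sketch below models the three forms for addition on the host, with one vector element per invocation. The function names are hypothetical and all invocations are assumed active.

```cpp
#include <cstddef>
#include <vector>

// Reduce: every invocation receives the sum over the whole subgroup.
inline int reduceAdd(const std::vector<int>& lanes) {
    int sum = 0;
    for (int v : lanes) sum += v;
    return sum;
}

// Inclusive scan: lane i receives lanes[0] + ... + lanes[i].
inline std::vector<int> inclusiveAdd(const std::vector<int>& lanes) {
    std::vector<int> out(lanes.size());
    int acc = 0;
    for (std::size_t i = 0; i < lanes.size(); ++i) {
        acc += lanes[i];
        out[i] = acc;
    }
    return out;
}

// Exclusive scan: lane i receives lanes[0] + ... + lanes[i - 1];
// lane 0 receives the identity of the operation (0 for addition).
inline std::vector<int> exclusiveAdd(const std::vector<int>& lanes) {
    std::vector<int> out(lanes.size());
    int acc = 0;
    for (std::size_t i = 0; i < lanes.size(); ++i) {
        out[i] = acc;
        acc += lanes[i];
    }
    return out;
}
```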
GL_KHR_shader_subgroup_shuffle{
**Partially implemented**
Implementation:
* There are 2 functions to implement:
* 1 function is using the existing hlsl.meta implementation
* 1 function is using a new implementation (subgroupShuffleXor) -- only implemented for GLSL & SPIR-V
* GLSL & SPIR-V both use their related functions, no emulation required
Testing:
`tests/glsl/shader-subgroup-shuffle.slang`
* these tests only check basic functionality and correctness of all functions implemented; they are not exhaustive [further continued in "Other notes of worthy" at end of commit]
* tests fail with cpp due to `kIROp_WaveGetActiveMask` failing to be called
}
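For reference, `subgroupShuffleXor` reads the value held by the invocation whose ID is the caller's ID XOR a mask. A host-side sketch with one vector element per invocation is shown below. `shuffleXor` is a hypothetical name, and returning the lane's own value when the source lane is out of range is an assumption made only to keep the example total.

```cpp
#include <cstddef>
#include <vector>

// Each lane i reads the value held by lane (i ^ mask).
inline std::vector<int> shuffleXor(const std::vector<int>& lanes,
                                   std::size_t mask) {
    std::vector<int> out(lanes.size());
    for (std::size_t i = 0; i < lanes.size(); ++i) {
        std::size_t src = i ^ mask;
        // Assumption: keep the lane's own value if the source is out of range.
        out[i] = (src < lanes.size()) ? lanes[src] : lanes[i];
    }
    return out;
}
```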
GL_KHR_shader_subgroup_shuffle_relative{
**Partially implemented**
Implementation:
* There are 2 functions to implement:
* both functions use a new implementation -- only implemented for GLSL & SPIR-V
* GLSL & SPIR-V both use their related functions, no emulation required
Testing:
`tests/glsl/shader-subgroup-shuffle-relative.slang`
* these tests only check basic functionality and correctness of all functions implemented; they are not exhaustive [further continued in "Other notes of worthy" at end of commit]
}
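The two relative shuffles (`subgroupShuffleUp`, `subgroupShuffleDown`) read from the invocation `delta` positions below or above the caller. A host-side sketch follows. The names are hypothetical, and because the result for an out-of-range source is not something to rely on, this sketch simply keeps the lane's own value in that case (an assumption).

```cpp
#include <cstddef>
#include <vector>

// Lane i reads from lane (i - delta).
inline std::vector<int> shuffleUp(const std::vector<int>& lanes,
                                  std::size_t delta) {
    std::vector<int> out(lanes.size());
    for (std::size_t i = 0; i < lanes.size(); ++i)
        out[i] = (i >= delta) ? lanes[i - delta] : lanes[i]; // assumption for i < delta
    return out;
}

// Lane i reads from lane (i + delta).
inline std::vector<int> shuffleDown(const std::vector<int>& lanes,
                                    std::size_t delta) {
    std::vector<int> out(lanes.size());
    for (std::size_t i = 0; i < lanes.size(); ++i)
        out[i] = (i + delta < lanes.size()) ? lanes[i + delta] : lanes[i]; // assumption past the end
    return out;
}
```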
GL_KHR_shader_subgroup_clustered{
**Partially implemented**
Implementation:
* There are 7 functions to implement:
* all 7 functions use a new implementation -- only implemented for GLSL & SPIR-V
* GLSL & SPIR-V both use their related functions, no emulation required
Testing:
`tests/glsl/shader-subgroup-shuffle-clustered.slang`
* these tests only check basic functionality and correctness of all functions implemented; they are not exhaustive [further continued in "Other notes of worthy" at end of commit]
}
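A clustered operation is a reduction restricted to aligned groups of `clusterSize` consecutive invocations, where `clusterSize` is a power of two. The host-side sketch below models the add variant. `clusteredAdd` is a hypothetical name and all invocations are assumed active.

```cpp
#include <cstddef>
#include <vector>

// Each lane receives the sum over its own cluster of clusterSize lanes.
// clusterSize must be a power of two and at least 1.
inline std::vector<int> clusteredAdd(const std::vector<int>& lanes,
                                     std::size_t clusterSize) {
    std::vector<int> out(lanes.size());
    for (std::size_t i = 0; i < lanes.size(); ++i) {
        std::size_t base = i & ~(clusterSize - 1); // first lane of this cluster
        int sum = 0;
        for (std::size_t j = base; j < base + clusterSize && j < lanes.size(); ++j)
            sum += lanes[j];
        out[i] = sum;
    }
    return out;
}
```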
GL_KHR_shader_subgroup_quad{
**Partially implemented**
Implementation:
* There are 4 functions to implement:
* all 4 functions use hlsl.meta implementations -- only implemented for GLSL & SPIR-V & HLSL
Testing:
`tests/glsl/shader-subgroup-shuffle-quad.slang`
* these tests only check basic functionality and correctness of all functions implemented; they are not exhaustive [further continued in "Other notes of worthy" at end of commit]
}
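The four quad functions (broadcast plus horizontal, vertical and diagonal swap) operate on aligned groups of four invocations. In a host-side model the swaps are XORs of the lane index with 1, 2 and 3 respectively. `quadSwap` and `quadBroadcast` are hypothetical names, and the lane count is assumed to be a multiple of 4.

```cpp
#include <cstddef>
#include <vector>

// xorMask 1 = horizontal swap, 2 = vertical swap, 3 = diagonal swap.
// Assumes lanes.size() is a multiple of 4, so (i ^ xorMask) stays in range.
inline std::vector<int> quadSwap(const std::vector<int>& lanes,
                                 std::size_t xorMask) {
    std::vector<int> out(lanes.size());
    for (std::size_t i = 0; i < lanes.size(); ++i)
        out[i] = lanes[i ^ xorMask];
    return out;
}

// Every lane reads the value of lane `id` (0..3) within its own quad.
inline std::vector<int> quadBroadcast(const std::vector<int>& lanes,
                                      std::size_t id) {
    std::vector<int> out(lanes.size());
    for (std::size_t i = 0; i < lanes.size(); ++i)
        out[i] = lanes[(i & ~static_cast<std::size_t>(3)) + id];
    return out;
}
```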
---------
Failing tests and why:
Note: because system variables are largely not implemented for CUDA and CPP, these tests will fail (#3 and #4){
tests/glsl/shader-subgroup-arithmetic.slang.3
tests/glsl/shader-subgroup-arithmetic.slang.4
tests/glsl/shader-subgroup-ballot.slang.4
tests/glsl/shader-subgroup-basic.slang.3
tests/glsl/shader-subgroup-basic.slang.4
tests/glsl/shader-subgroup-quad.slang.3
tests/glsl/shader-subgroup-quad.slang.4
tests/glsl/shader-subgroup-vote.slang.3
tests/glsl/shader-subgroup-vote.slang.4
}
Note: due to kIROp_WaveGetActiveMask not being loaded for cpp the following test will fail{
tests/glsl/shader-subgroup-shuffle.slang.4
}
Note: due to an unknown silent error the following will fail [could not spot an error in the generated GLSL and SPIR-V]{
tests/glsl/shader-subgroup-arithmetic.slang.5 (vk)
tests/glsl/shader-subgroup-arithmetic.slang.6 (vk)
}
Other notes of worthy:{
* only a few types are currently checked in tests because equality templates do not allow freely casting to int/uint, meaning that testing types en masse is not trivial; the tests will most likely be completely replaced once templates can cast & check equality more freely.
* did not implement vector types for any functions that may use them (mostly in reference to SPIR-V, since many may accept scalar or vector inputs); applicable to subgroup-shuffle, subgroup-shuffle-relative, subgroup-arithmetic, subgroup-shuffle, subgroup_clustered, subgroup_quad
* did not implement checks for half floats
* CUDA, CPP, HLSL implementations were largely out of scope; where one is not implemented, this is because the implementation is not trivial
}
Random fixes encountered:{
* hlsl.meta incorrectly sets `OpCapability` as `GroupNonUniformBallot` when the `OpCapability` should be `GroupNonUniformVote`; this is as per SPIR-V spec for all SPIR-V calls used in `GL_KHR_shader_subgroup_vote`: https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpGroupNonUniformAll
}
* added vector types and tests;
Partially implement functions and built-in variables, with tests, as part of GL_KHR_shader_subgroup; partially resolves #3548
GL_KHR_shader_subgroup implemented based on https://github.com/KhronosGroup/GLSL/blob/main/extensions/khr/GL_KHR_shader_subgroup.txt
GL_KHR_shader_subgroup_* & GLSL ref:
* https://github.com/KhronosGroup/GLSL/blob/main/extensions/khr/GL_KHR_shader_subgroup.txt
* https://www.khronos.org/blog/vulkan-subgroup-tutorial
* https://www.khronos.org/assets/uploads/developers/library/2018-vulkan-devday/06-subgroups.pdf
HLSL ref:
* https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-intrinsic-functions
* https://github.com/Microsoft/DirectXShaderCompiler/wiki/Wave-Intrinsics
CUDA ref:
* https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
SPIR-V ref:
* https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_memory_semantics_id
GL_KHR_shader_subgroup_arithmetic{
**Partially implemented**
Implementation:
* There are 21 functions to implement:
* 14 functions are using the hlsl.meta implementation
* 7 functions are new implementations -- only implemented for GLSL and SPIR-V
* GLSL & SPIR-V both use their related functions, no emulation required
* CUDA, CPP, HLSL are out of scope for the commit
Testing:
`tests/glsl/shader-subgroup-arithmetic.slang`
* all tests silently kill the shader; the generated GLSL was checked and no issue could be seen
* these tests only check basic functionality and correctness of all functions implemented; [further continued in "Other notes of worthy" at end of commit]
}
GL_KHR_shader_subgroup_shuffle{
**Partially implemented**
Implementation:
* There are 2 functions to implement:
* 1 function is using the existing hlsl.meta implementation
* 1 function is using a new implementation (subgroupShuffleXor) -- only implemented for GLSL & SPIR-V
* GLSL & SPIR-V both use their related functions, no emulation required
Testing:
`tests/glsl/shader-subgroup-shuffle.slang`
* these tests only check basic functionality and correctness of all functions implemented; [further continued in "Other notes of worthy" at end of commit]
* tests fail with cpp due to `kIROp_WaveGetActiveMask` failing to be called
}
GL_KHR_shader_subgroup_shuffle_relative{
**Partially implemented**
Implementation:
* There are 2 functions to implement:
* both functions use a new implementation -- only implemented for GLSL & SPIR-V
* GLSL & SPIR-V both use their related functions, no emulation required
Testing:
`tests/glsl/shader-subgroup-shuffle-relative.slang`
* these tests only check basic functionality and correctness of all functions implemented; [further continued in "Other notes of worthy" at end of commit]
}
GL_KHR_shader_subgroup_clustered{
**Partially implemented**
Implementation:
* There are 7 functions to implement:
* all 7 functions use a new implementation -- only implemented for GLSL & SPIR-V
* GLSL & SPIR-V both use their related functions, no emulation required
Testing:
`tests/glsl/shader-subgroup-shuffle-clustered.slang`
* these tests only check basic functionality and correctness of all functions implemented; [further continued in "Other notes of worthy" at end of commit]
}
GL_KHR_shader_subgroup_quad{
**Partially implemented**
Implementation:
* There are 4 functions to implement:
* all 4 functions use hlsl.meta implementations -- only implemented for GLSL & SPIR-V & HLSL
Testing:
`tests/glsl/shader-subgroup-shuffle-quad.slang`
* these tests only check basic functionality and correctness of all functions implemented; [further continued in "Other notes of worthy" at end of commit]
}
---------
Failing tests and why:
Note: test numbers are assuming none of the existing tests are toggled off
Note: because system variables are largely not implemented for CUDA and CPP, these tests will fail (#3 and #4){
tests/glsl/shader-subgroup-arithmetic.slang.3
tests/glsl/shader-subgroup-arithmetic.slang.4
tests/glsl/shader-subgroup-ballot.slang.4
tests/glsl/shader-subgroup-basic.slang.3
tests/glsl/shader-subgroup-basic.slang.4
tests/glsl/shader-subgroup-quad.slang.3
tests/glsl/shader-subgroup-quad.slang.4
tests/glsl/shader-subgroup-vote.slang.3
tests/glsl/shader-subgroup-vote.slang.4
}
Note: due to kIROp_WaveGetActiveMask not being loaded for cpp the following test will fail{
tests/glsl/shader-subgroup-shuffle.slang.4
tests/glsl/shader-subgroup-shuffle-relative.slang.4
tests/glsl/shader-subgroup-basic.slang.4
}
Note: due to an unknown silent error the following will fail [could not spot an error in the generated GLSL and SPIR-V]{
tests/glsl/shader-subgroup-arithmetic.slang.5 (vk)
tests/glsl/shader-subgroup-arithmetic.slang.6 (vk)
}
Other notes of worthy:{
* only a few types are checked currently in arithmetic test; this is due to the test silently failing, meaning I can't actually test anything implemented
* did not implement checks for half floats
* CUDA, CPP, HLSL implementations were largely out of scope and not implemented, because the implementation is non-trivial for many functions
}
Random fixes encountered:{
* hlsl.meta incorrectly sets `OpCapability` as `GroupNonUniformBallot` when the `OpCapability` should be `GroupNonUniformVote`; this is as per SPIR-V spec for all SPIR-V calls used in `GL_KHR_shader_subgroup_vote`: https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpGroupNonUniformAll
}
* Partially implement functions and built-in variables, with tests, as part of GL_KHR_shader_subgroup; partially resolves #3548
Other notes of worthy:{
* added preamble function and macros for implementing subgroup functionality (and tests) to make it possible to iterate on the functionality with reasonable effort in the future
* CUDA, CPP, HLSL implementations were largely out of scope and not implemented, because the implementation is non-trivial for many functions
* doubles cause a silent crash on most subgroup functions tested (silent shader hang)
* __requireGLSLExtension does not work as intended inside glsl.meta; as a result half, int16, int64, and int8 are all omitted from testing
}
Random fixes encountered:{
* hlsl.meta incorrectly sets `OpCapability` as `GroupNonUniformBallot` when the `OpCapability` should be `GroupNonUniformVote`; this is as per SPIR-V spec for all SPIR-V calls used in `GL_KHR_shader_subgroup_vote`: https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpGroupNonUniformAll
* hlsl.meta incorrectly used OpGroupNonUniformBitwiseAnd instead of OpGroupNonUniformBitwiseOr for WaveMaskPrefixBitOr (SPIR-V); this was fixed
}
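To illustrate why the And/Or mix-up noted above matters, here is a host-side model of a prefix bit-or next to a prefix bit-and. It assumes exclusive-prefix semantics (each lane sees only the lanes before it) and all lanes active. The function names are hypothetical and are not the hlsl.meta code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Exclusive prefix OR: lane i receives the OR of lanes[0..i-1].
inline std::vector<std::uint32_t> exclusivePrefixOr(
    const std::vector<std::uint32_t>& lanes) {
    std::vector<std::uint32_t> out(lanes.size());
    std::uint32_t acc = 0u; // identity for OR
    for (std::size_t i = 0; i < lanes.size(); ++i) {
        out[i] = acc;
        acc |= lanes[i];
    }
    return out;
}

// Exclusive prefix AND: lane i receives the AND of lanes[0..i-1].
inline std::vector<std::uint32_t> exclusivePrefixAnd(
    const std::vector<std::uint32_t>& lanes) {
    std::vector<std::uint32_t> out(lanes.size());
    std::uint32_t acc = 0xFFFFFFFFu; // identity for AND
    for (std::size_t i = 0; i < lanes.size(); ++i) {
        out[i] = acc;
        acc &= lanes[i];
    }
    return out;
}
```

For the inputs 1, 2, 4 the OR variant yields 0, 1, 3 while the AND variant yields 0xFFFFFFFF, 1, 0, so emitting the wrong opcode changes the result on every lane.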
* redesign tests following suggestions that they should be smaller, more maintainable, and test as much data as reasonably possible (balanced with fast iterations);
optional double testing
varying parameter testing
most tests chain results now
* fix missing impl and merge conflict resolutions
* redundant test code cleanup and organization
move tests to proper location (glsl-intrinsic)
clean up redundant code (input buffers)
* add missing logical operands support (and remove hlsl/cuda code reuse due to the functional differences) under all And, Or, Xor ops
redesign tests to conform to a better testing paradigm
* testing code style change to not use white space as a toggle for tests
* provided crash reason for doubles (Intel Iris GPUs crash in GLSL with doubles due to missing support in device caps [as per Vulkan validation layer])
uncommented the `__requireGLSLExtension` code so that once it is fixed, int16/8/64/half will work with subgroup without requiring future intervention
* fixing some vk validation layer errors (OpMemoryBarrier, Shuffle operations)
modified style of tests; removed redundancy (extra code that does nothing); fixed some incorrect run targets; added error reasons for all encountered problems (and if needed, a #define/#if toggle)
* replace commented-out important tests with a #define guarding the broken feature of extended shader_subgroup types
* removed macros inside glsl.meta
removed erroneous __target_switch to directly call hlsl.meta function
added elaboration on the problem with __requireGLSLExtension
changed WaveMaskPrefixBit[or|and|xor] to support the expected type of <int> only as per `HLSL Shader Model 6.5` specs
removed "precision highp" since it does not affect tests
* changes some hlsl.meta functions used to be more appropriate (as per suggested)
WaveMask -> WaveActive.*
WaveMaskPrefix.* -> WavePrefix.*
remove __target_switch cases for unimplemented cases of intrinsics
fix _getLaneId() being removed from some regex used earlier
* fix usage of __target_intrinsic instead of __intrinsic_asm; this silently caused only the arguments to be emitted as the return value
changed usage of `__requireGLSLExtension` because now it causes a crash from the missing intrinsic (instead of a silent error)
* fix shader subgroup extended types support for GLSL and SPIR-V:
1. separated the intrinsic/__requireGLSL generating functionality of shader_subgroup_preamble into child function calls, because otherwise `__requireGLSLExtension` is ignored if the function calling shader_subgroup_preamble calls an `__intrinsic_asm`
2. fixed HLSL.meta logic for wave operations (Add, Mul, exclusiveAdd, exclusiveMul) to no longer cast the input type T into a uint due to cost-of-op & crash.
* Int8_t bit cast into uint32_t crashed the compiler. As per the SPIR-V spec, OpGroupNonUniformI.* work on uint and int types, meaning the function has no need to cast to a uint.
3. removed erroneous __target_switch for subgroupShuffle
* 1. ignore tests gracefully
2. remove un-needed SPIRV capability specifying (with OpCapability)
3. clean up structure of typeRequireChecks_shader_subgroup_GLSL
4. explain why HLSL/CUDA are not targeted for shader-subgroup-arithmetic.slang
* syntax changes + `property` declaration fix + builtin var glsl implementation + changed incorrect HLSL.meta assumptions
(#1)`property` declaration as *non member* implementation change/fix (all of the changes to `slang-lower-to-ir.cpp`)
using (#1), implemented subgroup built-ins for GLSL/SPIR-V; did not implement built-ins completely for HLSL/CUDA due to non-trivial implementations. CPP has no implementation due to missing support for system values
changed some incorrect HLSL.meta subgroup implementation assumptions of type usage (bit casting 8bit->32bit, wrong capabilities causing errors)
dumping ast crash with spir-v when using builtin's fixed by adding the `builtin` spirv case (all of the changes to `slang-ast-dump.cpp`)
[ForceInline] addition to functions missing it
return instead of spirv_asm when empty blocks are used
* syntax & organization of tests adjustment (specifically how if'def's are managed)
* figuring out where ci fails
* figuring out where ci fails -- testing with enclusive & regular
* testing CI with exclusive, regular, inclusive
* remove unneeded white space
test CI inconsistency issues further with arithmetic.slang
* testing if the ci run fails due to some timeout/recovery issue
* split up arithmetic tests and push to test with CI
---------
Co-authored-by: Yong He <yonghe@outlook.com>
Diffstat (limited to 'source/slang/slang-ast-dump.cpp')
| -rw-r--r-- | source/slang/slang-ast-dump.cpp | 3 |
1 files changed, 3 insertions, 0 deletions
diff --git a/source/slang/slang-ast-dump.cpp b/source/slang/slang-ast-dump.cpp
index ccd9b9ee7..3bb83f80b 100644
--- a/source/slang/slang-ast-dump.cpp
+++ b/source/slang/slang-ast-dump.cpp
@@ -666,6 +666,9 @@ struct ASTDumpContext
             case SPIRVAsmOperand::SlangImmediateValue:
                 m_writer->emit("!");
                 break;
+            case SPIRVAsmOperand::BuiltinVar:
+                m_writer->emit("builtin");
+                break;
             default:
                 SLANG_UNREACHABLE("Unhandled case in ast dump for SPIRVAsmOperand");
             }
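The pattern behind this hunk is a switch over operand kinds whose default branch is treated as unreachable, so any kind without its own case aborts the dump. The standalone sketch below mirrors that shape. `OperandKind`, `dumpOperandPrefix` and the thrown exception are hypothetical stand-ins for `SPIRVAsmOperand`, the `m_writer->emit` calls and `SLANG_UNREACHABLE`.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the operand kinds handled by the dump code.
enum class OperandKind { SlangImmediateValue, BuiltinVar, Unhandled };

// Returns the text the dump would emit for an operand kind.
inline std::string dumpOperandPrefix(OperandKind kind) {
    switch (kind) {
    case OperandKind::SlangImmediateValue:
        return "!";
    case OperandKind::BuiltinVar: // the kind of case this commit adds
        return "builtin";
    default:
        // Stand-in for the unreachable default: kinds without a case end here.
        throw std::logic_error("Unhandled case in ast dump");
    }
}
```

Before the new case existed, a built-in variable operand would have fallen through to the default branch, which is consistent with the AST dump crash mentioned in the commit message.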
