| author | jsmall-nvidia <jsmall@nvidia.com> | 2020-02-19 14:16:38 -0500 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2020-02-19 11:16:38 -0800 |
| commit | 46a1b5f58a528bf1cd2fa2907234a9090cd7ba10 (patch) | |
| tree | f6581768499187bf1572079d65b99c15a95e7743 /docs/cuda-target.md | |
| parent | 1d9152bd2d0b1234680ce6a9f7ef940d7f179e9a (diff) | |
Initial partial support for WaveXXX intrinsics on CUDA (#1228)
* Start work on wave intrinsics for CUDA.
* Add preliminary CUDA support for some Wave intrinsics.
Document the issue around WaveGetLaneIndex
Diffstat (limited to 'docs/cuda-target.md')
| -rw-r--r-- | docs/cuda-target.md | 12 |
1 files changed, 12 insertions, 0 deletions
diff --git a/docs/cuda-target.md b/docs/cuda-target.md
index 743e9830c..a9b35d735 100644
--- a/docs/cuda-target.md
+++ b/docs/cuda-target.md
@@ -18,6 +18,7 @@ These limitations apply to Slang transpiling to CUDA.
 * Samplers are not separate objects in CUDA - they are combined into a single 'TextureObject'. So samplers are effectively ignored on CUDA targets.
 * Whilst there is tex1Dfetch there are no equivalents for higher dimensions - so such accesses are not currently supported
 * When using a TextureArray (layered texture in CUDA) - the index will be treated as an int, as this is all CUDA allows
+* Care must be used in using `WaveGetLaneIndex` wave intrinsic - it will only give the right results for appopriate launches
 
 The following are a work in progress or not implmented but are planned to be so in the future
 
@@ -143,6 +144,17 @@ void setDownstreamCompilerPrelude(SlangPassThrough passThrough, const char* prel
 
 The code that sets up the prelude for the test infrastucture and command line usage can be found in ```TestToolUtil::setSessionDefaultPrelude```. Essentially this determines what the absolute path is to `slang-cpp-prelude.h` is and then just makes the prelude `#include "the absolute path"`.
 
+Limitations
+===========
+
+## WaveGetLaneIndex
+
+This defaults to `threadIdx.x & SLANG_CUDA_WARP_MASK`. Depending on how the kernel is launched this could be incorrect.
+
+There other ways to get lane index, for example using inline assembly. This mechanism though is apparently slower than the simple method used here.
+
+There is potential to calculate the lane id using the [numthreads] markup in Slang/HLSL, but that also requires some assumptions of how that maps to a lane index.
+
 Language aspects
 ================
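To make the `WaveGetLaneIndex` caveat in the added "Limitations" text concrete, here is a minimal host-side C++ sketch (not part of the commit) that compares the simple `threadIdx.x & SLANG_CUDA_WARP_MASK` computation against a lane index derived from the linearized thread id. It assumes a warp size of 32 (so the mask is 31) and that CUDA forms warps from consecutive linear thread ids with `x` varying fastest. `kWarpMask`, `Dim3`, `simpleLaneIndex`, `linearLaneIndex`, and `simpleMethodIsCorrect` are hypothetical names introduced only for this illustration.

```cpp
#include <cstdint>

// Assumption: 32 lanes per warp, so the mask is 31. This stands in for
// SLANG_CUDA_WARP_MASK, whose actual definition is not shown in the diff.
constexpr std::uint32_t kWarpMask = 31;

// Hypothetical stand-in for CUDA's dim3 / threadIdx / blockDim.
struct Dim3 {
    std::uint32_t x, y, z;
};

// The simple method described in the doc: only threadIdx.x is considered.
constexpr std::uint32_t simpleLaneIndex(Dim3 tid) {
    return tid.x & kWarpMask;
}

// Lane index derived from the linear thread id within the block,
// assuming x varies fastest, then y, then z.
constexpr std::uint32_t linearLaneIndex(Dim3 tid, Dim3 blockDim) {
    const std::uint32_t linear =
        tid.x + tid.y * blockDim.x + tid.z * blockDim.x * blockDim.y;
    return linear & kWarpMask;
}

// Returns true if the simple method agrees with the linearized lane index
// for every thread in a block of the given shape.
bool simpleMethodIsCorrect(Dim3 blockDim) {
    for (std::uint32_t z = 0; z < blockDim.z; ++z) {
        for (std::uint32_t y = 0; y < blockDim.y; ++y) {
            for (std::uint32_t x = 0; x < blockDim.x; ++x) {
                const Dim3 tid{x, y, z};
                if (simpleLaneIndex(tid) != linearLaneIndex(tid, blockDim)) {
                    return false;
                }
            }
        }
    }
    return true;
}
```

Under these assumptions, a launch with block shape `(64, 1, 1)` or `(32, 4, 1)` gives matching results, while `(16, 2, 1)` does not: thread `(0, 1, 0)` has linear id 16, but the simple method reports lane 0. This is one way to read "appropriate launches" in the doc text, though the commit itself does not spell out the exact condition.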
