author: jsmall-nvidia <jsmall@nvidia.com> 2020-02-19 14:16:38 -0500
committer: GitHub <noreply@github.com> 2020-02-19 11:16:38 -0800
commit: 46a1b5f58a528bf1cd2fa2907234a9090cd7ba10
tree: f6581768499187bf1572079d65b99c15a95e7743 /docs
parent: 1d9152bd2d0b1234680ce6a9f7ef940d7f179e9a
Initial partial support for WaveXXX intrinsics on CUDA (#1228)
* Start work on wave intrinsics for CUDA.
* Add preliminary CUDA support for some Wave intrinsics. Document the issue around WaveGetLaneIndex
Diffstat (limited to 'docs')
-rw-r--r-- docs/cuda-target.md 12
1 files changed, 12 insertions, 0 deletions
diff --git a/docs/cuda-target.md b/docs/cuda-target.md
index 743e9830c..a9b35d735 100644
--- a/docs/cuda-target.md
+++ b/docs/cuda-target.md
@@ -18,6 +18,7 @@ These limitations apply to Slang transpiling to CUDA.
* Samplers are not separate objects in CUDA - they are combined into a single 'TextureObject'. So samplers are effectively ignored on CUDA targets.
* Whilst there is tex1Dfetch there are no equivalents for higher dimensions - so such accesses are not currently supported
* When using a TextureArray (layered texture in CUDA) - the index will be treated as an int, as this is all CUDA allows
+* Care must be taken when using the `WaveGetLaneIndex` wave intrinsic - it will only give the right results for appropriate launches
The following are a work in progress or not implemented but are planned to be so in the future
@@ -143,6 +144,17 @@ void setDownstreamCompilerPrelude(SlangPassThrough passThrough, const char* prel
The code that sets up the prelude for the test infrastructure and command line usage can be found in ```TestToolUtil::setSessionDefaultPrelude```. Essentially this determines what the absolute path to `slang-cpp-prelude.h` is and then just makes the prelude `#include "the absolute path"`.
+Limitations
+===========
+
+## WaveGetLaneIndex
+
+This defaults to `threadIdx.x & SLANG_CUDA_WARP_MASK`. Depending on how the kernel is launched this could be incorrect.
+
+There are other ways to get the lane index, for example using inline assembly. This mechanism though is apparently slower than the simple method used here.
+
+There is potential to calculate the lane id using the [numthreads] markup in Slang/HLSL, but that also requires some assumptions of how that maps to a lane index.
+
Language aspects
================
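
As an illustration of the `WaveGetLaneIndex` limitation documented in this change, the following host-side C++ sketch compares the simple `threadIdx.x & SLANG_CUDA_WARP_MASK` calculation with the lane derived from the linearized thread index. It is not taken from the Slang sources. `Dim3`, `kWarpSize`, `kWarpMask`, `laneFromX` and `laneFromLinearIndex` are hypothetical names, and the mask value of 31 assumes a warp size of 32. The linearization `x + y * Dx + z * Dx * Dy` follows the thread ordering described in the CUDA programming guide, where consecutive linear indices are grouped into warps.

```cpp
#include <cstdint>

// Hypothetical stand-in for CUDA's built-in threadIdx / blockDim values.
struct Dim3 { std::uint32_t x, y, z; };

// Assumption: a warp is 32 threads wide, so the mask is 31.
// The real SLANG_CUDA_WARP_MASK is defined in the Slang CUDA prelude.
constexpr std::uint32_t kWarpSize = 32;
constexpr std::uint32_t kWarpMask = kWarpSize - 1;

// The simple method described in the doc change: only the x component is used.
inline std::uint32_t laneFromX(Dim3 tid) {
    return tid.x & kWarpMask;
}

// Lane derived from the linearized thread index within the block.
inline std::uint32_t laneFromLinearIndex(Dim3 tid, Dim3 dim) {
    const std::uint32_t linear = tid.x + tid.y * dim.x + tid.z * dim.x * dim.y;
    return linear & kWarpMask;
}
```

Under these assumptions the two functions agree for a one-dimensional block such as `{64, 1, 1}`, and more generally whenever the block's x extent is a multiple of 32. They disagree for a block such as `{16, 4, 1}`: thread `{0, 1, 0}` has linear index 16, so `laneFromLinearIndex` gives 16 while `laneFromX` gives 0. That is the kind of launch for which the simple method can be incorrect.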