Read half->float RWTexture conversion (#1842)

* #include an absolute path didn't work - because paths were taken to always be relative. * Fix for writing to RWTexture with half types on CUDA. * CUDA half functionality doc updates. * First pass support for sust.p RWTexture format conversion on write. * Tidy up implementation of $C. Made clamping mode #define able. * A simple test for RWTexture CUDA format conversion. * Add support for float2 and float4. * WIP conversion testing. * Use $E to fix byte addressing in X in CUDA. * Do not scale when accessing via _convert versions of surface functions. * Revert to previous test. * Test with half/float convert write/read. * More broad half->float read conversion testing. * Improve documentation around half and RWTexture conversion.
author: jsmall-nvidia <jsmall@nvidia.com> 2021-05-15 11:45:58 -0400
committer: GitHub <noreply@github.com> 2021-05-15 11:45:58 -0400
commit: d5e8044d0a9723bb0bbd7ae1738d1157265da783 (patch)
tree: d330e87e67646fd6e978e4debad17b4f7fbe2c40 /docs/cuda-target.md
parent: bfe75618be81566882be8570b8db82ad5a2f8fe4 (diff)
1 files changed, 51 insertions, 6 deletions
diff --git a/docs/cuda-target.md b/docs/cuda-target.md
index 47b058f50..d5affc0e9 100644
--- a/docs/cuda-target.md
+++ b/docs/cuda-target.md
@@ -20,11 +20,11 @@ These limitations apply to Slang transpiling to CUDA.
 * Samplers are not separate objects in CUDA - they are combined into a single 'TextureObject'. So samplers are effectively ignored on CUDA targets. 
 * When using a TextureArray.Sample (layered texture in CUDA) - the index will be treated as an int, as this is all CUDA allows
 * Care must be used in using `WaveGetLaneIndex` wave intrinsic - it will only give the right results for appropriate launches
-* CUDA 'surfaces' are used for textures which are read/write. CUDA does NOT do format conversion with surfaces.
+* CUDA 'surfaces' are used for textures which are read/write (aka RWTexture). 
 
 The following are a work in progress or not implemented but are planned to be so in the future
 
-* Some resource types remain unsupported, and not all methods on types are supported
+* Some resource types remain unsupported, and not all methods on all types are supported
 
 # How it works
 
@@ -122,8 +122,6 @@ The UniformState and UniformEntryPointParams struct typically vary by shader. Un
     size_t sizeInBytes;
 ```  
 
-
-
 ## Texture
 
 Read only textures will be bound as the opaque CUDA type CUtexObject. This type is the combination of both a texture AND a sampler. This is somewhat different from HLSL, where there can be separate `SamplerState` variables. This allows access of a single texture binding with different types of sampling. 
@@ -138,11 +136,58 @@ Load is only supported for Texture1D, and the mip map selection argument is igno
  
 RWTexture types are converted into CUsurfObject type. 
 
-In CUDA it is not possible to do a format conversion on an access to a CUsurfObject, so it must be backed by the same data format as is used within the Slang source code. 
+In regular CUDA it is not possible to do a format conversion on an access to a CUsurfObject. Slang does add support for hardware write conversions where they are available. To enable the feature it is necessary to attribute your RWTexture with `format`. For example
+
+```
+[format("rg16f")]
+RWTexture2D<float2> rwt2D_2;
+```
+
+The format names used are the same as for (GLSL layout format types)[https://www.khronos.org/opengl/wiki/Layout_Qualifier_(GLSL)]. If no format is specified Slang will *assume* that the format is the same as the type specified. 
+
+Note that the format attribution is on variables/paramters/fields and not part of the type system. This means that if you have a scenario like...
+
+```
+[format(rg16f)]
+RWTexture2d<float2> g_texture;
+
+float2 getValue(RWTexture2D<float2> t)
+{
+    return t[int2(0, 0];
+}
+
+void doThing()
+{
+    float2 v = getValue(g_texture);
+}
+```
+
+Even `getValue` will receive t *without* the format attribute, and so will access it, presumably erroneously. A work around for this specific scenario would be to attribute the parameter
+
+```
+float2 getValue([format("rg16f")] RWTexture2D<float2> t)
+{
+    return t[int2(0, 0];
+}
+```
+
+This will only work correctly if `getValue` is called with a `t` that has that format attribute. As it stands no checking is performed on this matching so no error or warning will be produced if there is a mismatch. 
+
+There is limited software support for doing a conversion on reading. Currently this only supports only 1D, 2D, 3D RWTexture, backed with half1, half2 or half4. For this path to work NVRTC must have the `cuda_fp16.h` and associated files available. Please check the section on `Half Support`.
+
+If hardware read conversions are desired, this can be achieved by having a Texture<T> that uses the surface of a RWTexture<T>. Using the Texture<T> not only allows hardware conversion but also filtering. 
 
 It is also worth noting that CUsurfObjects in CUDA are NOT allowed to have mip maps. 
 
-By default surface access uses cudaBoundaryModeZero, this can be replaced using the macro SLANG_CUDA_BOUNDARY_MODE in the CUDA prelude.
+By default surface access uses cudaBoundaryModeZero, this can be replaced using the macro SLANG_CUDA_BOUNDARY_MODE in the CUDA prelude. For HW format conversions the macro SLANG_PTX_BOUNDARY_MODE. These boundary settings are in effect global for the whole of the kernel. 
+
+`SLANG_CUDA_BOUNDARY_MODE` can be one of
+
+* cudaBoundaryModeZero      causes an execution trap on out-of-bounds addresses
+* cudaBoundaryModeClamp     stores data at the nearest surface location (sized appropriately)
+* cudaBoundaryModeTrap      drops stores to out-of-bounds addresses 
+
+`SLANG_PTX_BOUNDARY_MODE` can be one of `trap`, `clamp` or `zero`. In general it is recommended to have both set to the same type of value, for example `cudaBoundaryModeZero` and `zero`.
 
 ## Sampler
author	jsmall-nvidia <jsmall@nvidia.com>	2021-05-15 11:45:58 -0400
committer	GitHub <noreply@github.com>	2021-05-15 11:45:58 -0400
commit	d5e8044d0a9723bb0bbd7ae1738d1157265da783 (patch)
tree	d330e87e67646fd6e978e4debad17b4f7fbe2c40 /docs/cuda-target.md
parent	bfe75618be81566882be8570b8db82ad5a2f8fe4 (diff)