First pass at a Target Compatibility document (#1287)

* WIP compatibility docs.
* Test transpose in matrix-float.
* Small improvement to CUDA docs.
* Added some discussion around tessellation.
* Small improvements to target-compatibility.md
* Improve compatibility documentation.

Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
author: jsmall-nvidia <jsmall@nvidia.com> 2020-03-23 17:55:49 -0400
committer: GitHub <noreply@github.com> 2020-03-23 17:55:49 -0400
commit: 7b4e0e1892bad9f51677b191c69b01aee7403632 (patch)
tree: 613acd7984b16242fd0f7c4138a71ba71576f974 /docs
parent: 05c9a5c9dc23a716c7cbeae91f581bbc13f10ed2 (diff)
2 files changed, 117 insertions, 1 deletions
diff --git a/docs/cuda-target.md b/docs/cuda-target.md
index c41858bd4..17a79b1d0 100644
--- a/docs/cuda-target.md
+++ b/docs/cuda-target.md
@@ -16,7 +16,7 @@ These limitations apply to Slang transpiling to CUDA.
 
 * Only supports the 'texture object' style binding (The texture object API is only supported on devices of compute capability 3.0 or higher. )
 * Samplers are not separate objects in CUDA - they are combined into a single 'TextureObject'. So samplers are effectively ignored on CUDA targets. 
-* When using a TextureArray (layered texture in CUDA) - the index will be treated as an int, as this is all CUDA allows
+* When using a TextureArray.Sample (layered texture in CUDA) - the index will be treated as an int, as this is all CUDA allows
 * Care must be used in using `WaveGetLaneIndex` wave intrinsic - it will only give the right results for appropriate launches
 * CUDA 'surfaces' are used for textures which are read/write. CUDA does NOT do format conversion with surfaces.
 
diff --git a/docs/target-compatibility.md b/docs/target-compatibility.md
new file mode 100644
index 000000000..64695c09c
--- /dev/null
+++ b/docs/target-compatibility.md
@@ -0,0 +1,116 @@
+Slang Target Compatibility 
+==========================
+
+
+Shader Model (SM) numbers are D3D Shader Model versions, unless explicitly stated otherwise.
+OpenGL compatibility is not listed here, because OpenGL isn't an officially supported target. 
+
+Items marked with + indicate that the feature is anticipated to be added in the future.
+Items marked with ^ indicate that support on that target is discussed later in this document.
+
+| Feature                     |    D3D11     |    D3D12     |     VK     |      CUDA     |    CPU
+|-----------------------------|--------------|--------------|------------|---------------|---------------
+| Half Type                   |     No       |     Yes      |   Yes      |     No +      |    No +
+| Double Type                 |     Yes      |     Yes      |   Yes      |     Yes       |    Yes
+| Double Intrinsics           |     No       |   Limited +  |  Limited   |     Most      |    Yes
+| u/int64_t Type              |     No       |   Yes ^      |   Yes      |     Yes       |    Yes
+| u/int64_t Intrinsics        |     No       |   No         |   Yes      |     Yes       |    Yes
+| int matrix                  |     Yes      |   Yes        |   No +     |     Yes       |    Yes
+| tex.GetDimensions           |     Yes      |   Yes        |   Yes      |     No        |    Yes
+| SM6.0 Wave Intrinsics       |     No       |   Yes        |  Partial   |     Yes       |    No
+| SM6.0 Quad Intrinsics       |     No       |   Yes        |   No +     |     No        |    No
+| SM6.5 Wave Intrinsics       |     No       |   Yes ^      |   No +     |     Yes       |    No
+| Tessellation                |     Yes ^    |   Yes ^      |   No +     |     No        |    No
+| Graphics Pipeline           |     Yes      |   Yes        |   Yes      |     No        |    No
+| Ray Tracing DXR 1.0         |     No       |   Yes ^      |   Yes ^    |     No        |    No
+| Ray Tracing DXR 1.1         |     No       |   Yes        |   No +     |     No        |    No
+| Native Bindless             |     No       |    No        |   No       |     Yes       |    Yes
+| Buffer bounds               |     Yes      |   Yes        |   Yes      |   Limited ^   |    Limited ^
+| Resource bounds             |     Yes      |   Yes        |   Yes      | Yes (optional)|    Yes
+| Atomics                     |     Yes      |   Yes        |   Yes      |     Yes       |    Yes
+| Group shared mem/Barriers   |     Yes      |   Yes        |   Yes      |     Yes       |    No + 
+| TextureArray.Sample float   |     Yes      |   Yes        |   Yes      |     No        |    Yes
+| Separate Sampler            |     Yes      |   Yes        |   Yes      |     No        |    Yes
+| tex.Load                    |     Yes      |   Yes        |   Yes      |  Limited ^    |    Yes
+| Full bool                   |     Yes      |   Yes        |   Yes      |     No        |    Yes ^ 
+| Mesh Shader                 |     No       |   No +       |   No +     |     No        |    No
+
+## Half Type
+
+There appears to be a problem writing to a StructuredBuffer containing half on D3D12. D3D12 also appears to have problems doing calculations with half.
+
+## u/int64_t Type
+
+Requires SM6.0, which on D3D12 requires DXIL. It is therefore not available with DXBC on either D3D11 or D3D12.
+
+## int matrix
+
+Means that matrix types with integer element types can be used.
+
+## tex.GetDimensions
+
+tex.GetDimensions is the GetDimensions method on 'texture' objects. This is not supported on CUDA, as CUDA has no equivalent functionality to get these values. GetDimensions does work on Buffer resource types on CUDA.
+
+## SM6.5 Wave Intrinsics
+
+SM6.5 Wave Intrinsics are supported, but require a downstream DXC compiler that supports SM6.5. As it stands, the DXC shipping with Windows does not.
+
+## Tessellation
+
+Although tessellation stages should work on D3D11 and D3D12, they are not tested within our test framework and may have problems.
+
+## Native Bindless  
+
+Bindless is possible on targets that support it, but it is not the default behavior for those targets and typically requires significant effort in Slang code.
+
+'Native Bindless' targets always use a form of 'bindless'. On CUDA this requires the target to use 'texture object' style binding and the device to have 'compute capability 3.0' or higher.
+
+## Resource bounds 
+
+For CUDA this is optional, as it can be controlled via the SLANG_CUDA_BOUNDARY_MODE macro in `slang-cuda-prelude.h`. By default its behavior is `cudaBoundaryModeZero`.
+
+## Buffer Bounds
+
+This feature means that accessing outside the bounds of a Buffer has well-defined behavior: a read returns all 0s, and a write is ignored.
+
+On CPU, bounds are only checked in debug compilations of the C++ code, which will assert if an access is out of range.
+
+On CUDA, out-of-bounds accesses default to element 0 (!). The behavior can be controlled via the SLANG_CUDA_BOUND_CHECK macro in `slang-cuda-prelude.h`. This behavior may seem a little strange, and it requires the buffer to have at least one element to avoid doing something nasty. It is really a 'least worst' answer to a difficult problem, and is better than out-of-range reads or, worse, out-of-range writes.
+
+## TextureArray.Sample float 
+
+When using 'Sample' on a TextureArray, CUDA treats the array index parameter as an int, even though it is passed as a float.
+
+## Separate Sampler
+
+This feature means that multiple Samplers can be used with a Texture. In HLSL code this appears as the 'SamplerState' being passed as a parameter to the 'Sample' method on a texture object.
+
+On CUDA the SamplerState is ignored, because on this target a 'texture object' is the Texture and Sampler combination.
+
+## Graphics Pipeline
+
+CPU and CUDA currently only support compute shaders.
+
+## Ray Tracing DXR 1.0
+
+Vulkan does not support a local root signature, but there is the concept of a 'shader record'. In Slang a single constant buffer can be marked as a shader record with the `[[vk::shader_record]]` attribute, for example:
+
+```
+[[vk::shader_record]]
+cbuffer ShaderRecord
+{
+	uint shaderRecordID;
+} 
+```
+
+In practice, to write shader code that works across D3D12 and VK, you should have a single constant buffer marked as 'shader record' for VK, and on D3D bind that same constant buffer in the local root signature.
+
+## tex.Load
+
+tex.Load is only supported on CUDA for Texture1D. Additionally, CUDA only allows such access for linear memory, meaning the bound texture also cannot have mip maps. Load *is* allowed on CUDA for RWTexture types of other dimensions as well as 1D.
+
+## Full bool
+
+Means fully featured bool support. CUDA has issues around bool because there is no built-in vector bool type. Currently bool aliases to an int vector type.
+
+On CPU there are some issues, in that the size and alignment of bool are not well defined. Most C++ compilers now use a byte to represent a bool; in the past some compilers backed it with an int.
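
The CPU note on `bool` layout can be illustrated with a small C++ sketch. It is not taken from Slang or its prelude: `Bool32` and `hostBoolSize` are hypothetical names introduced here, and the idea of storing a flag in a fixed-width integer is only one possible way to keep host-side buffer layouts independent of the compiler's `sizeof(bool)`. It also assumes a platform with 8-bit bytes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of Slang): a boolean stored in an explicitly
// sized 32-bit integer, so its size does not depend on how a particular
// C++ compiler chooses to represent `bool`.
struct Bool32
{
    std::uint32_t value;

    // Normalizes the stored value to 0 or 1.
    static Bool32 from(bool b) { return Bool32{ b ? 1u : 0u }; }

    // Any non-zero stored value reads back as true.
    bool get() const { return value != 0; }
};

// Holds on platforms with 8-bit bytes (an assumption of this sketch).
static_assert(sizeof(Bool32) == 4, "Bool32 is expected to occupy 4 bytes");

// sizeof(bool) is implementation-defined in C++. Most current compilers
// use 1, but code that shares buffers with other targets should not rely on it.
inline std::size_t hostBoolSize() { return sizeof(bool); }
```

A struct shared between host code and a shader could then declare its flags as `Bool32` rather than `bool`, at the cost of converting explicitly with `from` and `get`.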