| author | jsmall-nvidia <jsmall@nvidia.com> | 2023-02-14 18:30:04 -0500 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2023-02-14 18:30:04 -0500 |
| commit | 598e07f580d47c998885c946c0bfacd08bfec6e6 | |
| tree | 859bb98f6b8e6cfa062091ef34cea1436ae221d9 /docs | |
| parent | b92a75db2aab1adffe08ae0103cafb080f9795e2 | |
Preliminary Shader Execution Reordering Doc (#2648)
* `#include` of an absolute path didn't work, because paths were always taken to be relative.
* Add preliminary Shader Execution Reordering doc.
Update target-compatibility docs.
* Fix debugBreak.
Diffstat (limited to 'docs')
| -rw-r--r-- | docs/shader-execution-reordering.md | 453 |
| -rw-r--r-- | docs/target-compatibility.md | 116 |
2 files changed, 530 insertions, 39 deletions
diff --git a/docs/shader-execution-reordering.md b/docs/shader-execution-reordering.md
new file mode 100644
index 000000000..f6d44fcab
--- /dev/null
+++ b/docs/shader-execution-reordering.md
@@ -0,0 +1,453 @@
+Shader Execution Reordering (SER)
+=================================
+
+Slang provides preliminary support for Shader Execution Reordering (SER). The API hasn't been finalized and may change in the future.
+
+The feature is available on D3D12 via [NVAPI](nvapi-support.md) and on Vulkan through the [GL_NV_shader_invocation_reorder](https://github.com/KhronosGroup/GLSL/blob/master/extensions/nv/GLSL_NV_shader_invocation_reorder.txt) extension.
+
+Note: An upgrade is required to `slang-glslang` and associated projects to add support for SPIR-V output via Slang.
+
+## Vulkan
+
+SER as implemented on Vulkan has extra limitations on usage. On D3D12 via NVAPI, `HitObject` variables behave like regular variables: they can be assigned, passed to functions and so forth. With `GL_NV_shader_invocation_reorder` on Vulkan this isn't the case. There `HitObject` variables are special and act as if their introduction allocates a single unique entry. One implication is that on Vulkan there are limitations on using `HitObject` with flow control, and on assigning to `HitObject` variables.
+
+TODO: Examples and discussion around these limitations.
+
+## Links
+
+* [SER white paper for NVAPI](https://developer.nvidia.com/sites/default/files/akamai/gameworks/ser-whitepaper.pdf)
+
+# Preliminary API
+
+The API is preliminary and based on the NVAPI SER interface. It may change with future Slang versions.
+
+## Free Functions
+
+* [ReorderThread](#reorder-thread)
+
+--------------------------------------------------------------------------------
+# `struct HitObject`
+
+## Description
+
+Immutable data type representing a ray hit or a miss. Can be used to invoke hit or miss shading,
+or as a key in ReorderThread. Created by one of several methods described below. HitObject
+and its related functions are available in raytracing shader types only.
+
+## Methods
+
+* [TraceRay](#trace-ray)
+* [TraceMotionRay](#trace-motion-ray)
+* [MakeMiss](#make-miss)
+* [MakeHit](#make-hit)
+* [MakeMotionHit](#make-motion-hit)
+* [MakeMotionMiss](#make-motion-miss)
+* [MakeNop](#make-nop)
+* [Invoke](#invoke)
+* [IsMiss](#is-miss)
+* [IsHit](#is-hit)
+* [IsNop](#is-nop)
+* [GetRayDesc](#get-ray-desc)
+* [GetShaderTableIndex](#get-shader-table-index)
+* [GetInstanceIndex](#get-instance-index)
+* [GetInstanceID](#get-instance-id)
+* [GetGeometryIndex](#get-geometry-index)
+* [GetPrimitiveIndex](#get-primitive-index)
+* [GetHitKind](#get-hit-kind)
+* [LoadLocalRootTableConstant](#load-local-root-table-constant)
+
+--------------------------------------------------------------------------------
+<a id="trace-ray"></a>
+# `HitObject.TraceRay`
+
+## Description
+
+Executes ray traversal (including anyhit and intersection shaders) like TraceRay, but returns the
+resulting hit information as a HitObject and does not trigger closesthit or miss shaders.
+
+## Signature
+
+```
+static HitObject HitObject.TraceRay<payload_t>(
+    RaytracingAccelerationStructure AccelerationStructure,
+    uint RayFlags,
+    uint InstanceInclusionMask,
+    uint RayContributionToHitGroupIndex,
+    uint MultiplierForGeometryContributionToHitGroupIndex,
+    uint MissShaderIndex,
+    RayDesc Ray,
+    inout payload_t Payload);
+```
+
+--------------------------------------------------------------------------------
+<a id="trace-motion-ray"></a>
+# `HitObject.TraceMotionRay`
+
+## Description
+
+Executes motion ray traversal (including anyhit and intersection shaders) like TraceRay, but returns the
+resulting hit information as a HitObject and does not trigger closesthit or miss shaders.
+
+## Signature
+
+```
+static HitObject HitObject.TraceMotionRay<payload_t>(
+    RaytracingAccelerationStructure AccelerationStructure,
+    uint RayFlags,
+    uint InstanceInclusionMask,
+    uint RayContributionToHitGroupIndex,
+    uint MultiplierForGeometryContributionToHitGroupIndex,
+    uint MissShaderIndex,
+    RayDesc Ray,
+    float CurrentTime,
+    inout payload_t Payload);
+```
+
+
+--------------------------------------------------------------------------------
+<a id="make-hit"></a>
+# `HitObject.MakeHit`
+
+## Description
+
+Creates a HitObject representing a hit based on values explicitly passed as arguments, without
+tracing a ray. The primitive specified by AccelerationStructure, InstanceIndex, GeometryIndex,
+and PrimitiveIndex must exist. The shader table index is computed using the formula used with
+TraceRay. The computed index must reference a valid hit group record in the shader table. The
+Attributes parameter must either be an attribute struct, such as
+BuiltInTriangleIntersectionAttributes, or another HitObject to copy the attributes from.
+
+## Signature
+
+```
+static HitObject HitObject.MakeHit<attr_t>(
+    RaytracingAccelerationStructure AccelerationStructure,
+    uint InstanceIndex,
+    uint GeometryIndex,
+    uint PrimitiveIndex,
+    uint HitKind,
+    uint RayContributionToHitGroupIndex,
+    uint MultiplierForGeometryContributionToHitGroupIndex,
+    RayDesc Ray,
+    attr_t attributes);
+static HitObject HitObject.MakeHit<attr_t>(
+    uint HitGroupRecordIndex,
+    RaytracingAccelerationStructure AccelerationStructure,
+    uint InstanceIndex,
+    uint GeometryIndex,
+    uint PrimitiveIndex,
+    uint HitKind,
+    RayDesc Ray,
+    attr_t attributes);
+```
+
+--------------------------------------------------------------------------------
+<a id="make-motion-hit"></a>
+# `HitObject.MakeMotionHit`
+
+## Description
+
+Like MakeHit, but for motion rays: both overloads take an additional CurrentTime parameter.
+Currently only supported on Vulkan.
+
+## Signature
+
+```
+static HitObject HitObject.MakeMotionHit<attr_t>(
+    RaytracingAccelerationStructure AccelerationStructure,
+    uint InstanceIndex,
+    uint GeometryIndex,
+    uint PrimitiveIndex,
+    uint HitKind,
+    uint RayContributionToHitGroupIndex,
+    uint MultiplierForGeometryContributionToHitGroupIndex,
+    RayDesc Ray,
+    float CurrentTime,
+    attr_t attributes);
+static HitObject HitObject.MakeMotionHit<attr_t>(
+    uint HitGroupRecordIndex,
+    RaytracingAccelerationStructure AccelerationStructure,
+    uint InstanceIndex,
+    uint GeometryIndex,
+    uint PrimitiveIndex,
+    uint HitKind,
+    RayDesc Ray,
+    float CurrentTime,
+    attr_t attributes);
+```
+
+--------------------------------------------------------------------------------
+<a id="make-miss"></a>
+# `HitObject.MakeMiss`
+
+## Description
+
+Creates a HitObject representing a miss based on values explicitly passed as arguments, without
+tracing a ray. The provided shader table index must reference a valid miss record in the shader
+table.
+
+## Signature
+
+```
+static HitObject HitObject.MakeMiss(
+    uint MissShaderIndex,
+    RayDesc Ray);
+```
+
+--------------------------------------------------------------------------------
+<a id="make-motion-miss"></a>
+# `HitObject.MakeMotionMiss`
+
+## Description
+
+Like MakeMiss, but for motion rays: takes an additional CurrentTime parameter.
+Currently only supported on Vulkan.
+
+## Signature
+
+```
+static HitObject HitObject.MakeMotionMiss(
+    uint MissShaderIndex,
+    RayDesc Ray,
+    float CurrentTime);
+```
+
+--------------------------------------------------------------------------------
+<a id="make-nop"></a>
+# `HitObject.MakeNop`
+
+## Description
+
+Creates a HitObject representing “NOP” (no operation) which is neither a hit nor a miss. Invoking a
+NOP hit object using HitObject.Invoke has no effect. Reordering by hit objects using
+ReorderThread will group NOP hit objects together. This can be useful in some reordering
+scenarios where future control flow for some threads is known to process neither a hit nor a
+miss.
+
+## Signature
+
+```
+static HitObject HitObject.MakeNop();
+```
+
+--------------------------------------------------------------------------------
+<a id="invoke"></a>
+# `HitObject.Invoke`
+
+## Description
+
+Invokes closesthit or miss shading for the specified hit object. In case of a NOP HitObject, no
+shader is invoked.
+
+## Signature
+
+```
+static void HitObject.Invoke<payload_t>(
+    RaytracingAccelerationStructure AccelerationStructure,
+    HitObject HitOrMiss,
+    inout payload_t Payload);
+```
+
+--------------------------------------------------------------------------------
+<a id="is-miss"></a>
+# `HitObject.IsMiss`
+
+## Description
+
+Returns true if the HitObject encodes a miss, otherwise returns false.
+
+## Signature
+
+```
+bool HitObject.IsMiss();
+```
+
+--------------------------------------------------------------------------------
+<a id="is-hit"></a>
+# `HitObject.IsHit`
+
+## Description
+
+Returns true if the HitObject encodes a hit, otherwise returns false.
+
+## Signature
+
+```
+bool HitObject.IsHit();
+```
+
+--------------------------------------------------------------------------------
+<a id="is-nop"></a>
+# `HitObject.IsNop`
+
+## Description
+
+Returns true if the HitObject encodes a nop, otherwise returns false.
+
+## Signature
+
+```
+bool HitObject.IsNop();
+```
+
+--------------------------------------------------------------------------------
+<a id="get-ray-desc"></a>
+# `HitObject.GetRayDesc`
+
+## Description
+
+Queries ray properties from HitObject. Valid if the hit object represents a hit or a miss.
+
+## Signature
+
+```
+RayDesc HitObject.GetRayDesc();
+```
+
+--------------------------------------------------------------------------------
+<a id="get-shader-table-index"></a>
+# `HitObject.GetShaderTableIndex`
+
+## Description
+
+Queries shader table index from HitObject. Valid if the hit object represents a hit or a miss.
+
+## Signature
+
+```
+uint HitObject.GetShaderTableIndex();
+```
+
+--------------------------------------------------------------------------------
+<a id="get-instance-index"></a>
+# `HitObject.GetInstanceIndex`
+
+## Description
+
+Returns the instance index of a hit. Valid if the hit object represents a hit.
+
+## Signature
+
+```
+uint HitObject.GetInstanceIndex();
+```
+
+--------------------------------------------------------------------------------
+<a id="get-instance-id"></a>
+# `HitObject.GetInstanceID`
+
+## Description
+
+Returns the instance ID of a hit. Valid if the hit object represents a hit.
+
+## Signature
+
+```
+uint HitObject.GetInstanceID();
+```
+
+--------------------------------------------------------------------------------
+<a id="get-geometry-index"></a>
+# `HitObject.GetGeometryIndex`
+
+## Description
+
+Returns the geometry index of a hit. Valid if the hit object represents a hit.
+
+## Signature
+
+```
+uint HitObject.GetGeometryIndex();
+```
+
+--------------------------------------------------------------------------------
+<a id="get-primitive-index"></a>
+# `HitObject.GetPrimitiveIndex`
+
+## Description
+
+Returns the primitive index of a hit. Valid if the hit object represents a hit.
+
+## Signature
+
+```
+uint HitObject.GetPrimitiveIndex();
+```
+
+--------------------------------------------------------------------------------
+<a id="get-hit-kind"></a>
+# `HitObject.GetHitKind`
+
+## Description
+
+Returns the hit kind. Valid if the hit object represents a hit.
+
+## Signature
+
+```
+uint HitObject.GetHitKind();
+```
+
+--------------------------------------------------------------------------------
+<a id="get-attributes"></a>
+# `HitObject.GetAttributes`
+
+## Description
+
+Returns the attributes of a hit. Valid if the hit object represents a hit or a miss.
+
+## Signature
+
+```
+attr_t HitObject.GetAttributes<attr_t>();
+```
+
+--------------------------------------------------------------------------------
+<a id="load-local-root-table-constant"></a>
+# `HitObject.LoadLocalRootTableConstant`
+
+## Description
+
+Loads a root constant from the local root table referenced by the hit object. Valid if the hit object
+represents a hit or a miss. RootConstantOffsetInBytes must be a multiple of 4.
+
+## Signature
+
+```
+uint HitObject.LoadLocalRootTableConstant(uint RootConstantOffsetInBytes);
+```
+
+--------------------------------------------------------------------------------
+<a id="reorder-thread"></a>
+# `ReorderThread`
+
+## Description
+
+Reorders threads based on a coherence hint value. NumCoherenceHintBitsFromLSB indicates how many of
+the least significant bits of CoherenceHint should be considered during reordering (max: 16).
+Applications should set this to the lowest value required to represent all possible values in
+CoherenceHint. For best performance, all threads should provide the same value for
+NumCoherenceHintBitsFromLSB.
+Where possible, reordering will also attempt to retain locality in the thread’s launch indices
+(DispatchRaysIndex in DXR).
+
+`ReorderThread(HitOrMiss)` is equivalent to
+
+```
+void ReorderThread( HitObject HitOrMiss, uint CoherenceHint, uint NumCoherenceHintBitsFromLSB );
+```
+
+with CoherenceHint and NumCoherenceHintBitsFromLSB both set to 0, so that no hint bits are considered.
+
+## Signature
+
+```
+void ReorderThread(
+    uint CoherenceHint,
+    uint NumCoherenceHintBitsFromLSB);
+void ReorderThread(
+    HitObject HitOrMiss,
+    uint CoherenceHint,
+    uint NumCoherenceHintBitsFromLSB);
+void ReorderThread(HitObject HitOrMiss);
+```
diff --git a/docs/target-compatibility.md b/docs/target-compatibility.md
index 7769fe8db..96d1353a9 100644
--- a/docs/target-compatibility.md
+++ b/docs/target-compatibility.md
@@ -7,89 +7,100 @@ OpenGL compatibility is not listed here, because OpenGL isn't an officially supp
 Items marked with + mean that the feature is anticipated to be added in the future.
 Items marked with ^ mean there is some discussion about support for this target later in the document.
 
-| Feature | D3D11 | D3D12 | VK | CUDA | CPU
-|-----------------------------|--------------|--------------|------------|---------------|---------------
-| Half Type | No | Yes ^ | Yes | Yes ^ | No +
-| Double Type | Yes | Yes | Yes | Yes | Yes
-| Double Intrinsics | No | Limited + | Limited | Most | Yes
-| u/int8_t Type | No | No | Yes ^ | Yes | Yes
-| u/int16_t Type | No | Yes ^ | Yes ^ | Yes | Yes
-| u/int64_t Type | No | Yes ^ | Yes | Yes | Yes
-| u/int64_t Intrinsics | No | No | Yes | Yes | Yes
-| int matrix | Yes | Yes | No + | Yes | Yes
-| tex.GetDimension | Yes | Yes | Yes | No | Yes
-| SM6.0 Wave Intrinsics | No | Yes | Partial | Yes ^ | No
-| SM6.0 Quad Intrinsics | No | Yes | No + | No | No
-| SM6.5 Wave Intrinsics | No | Yes ^ | No + | Yes ^ | No
-| WaveMask Intrinsics | Yes ^ | Yes ^ | Yes + | Yes | No
-| WaveShuffle | No | Limited ^ | Yes | Yes | No
-| Tesselation | Yes ^ | Yes ^ | No + | No | No
-| Graphics Pipeline | Yes | Yes | Yes | No | No
-| Ray Tracing DXR 1.0 | No | Yes ^ | Yes ^ | No | No
-| Ray Tracing DXR 1.1 | No | Yes | No + | No | No
-| Native Bindless | No | No | No | Yes | Yes
-| Buffer bounds | Yes | Yes | Yes | Limited ^ | Limited ^
-| Resource bounds | Yes | Yes | Yes | Yes (optional)| Yes
-| Atomics | Yes | Yes | Yes | Yes | Yes
-| Group shared mem/Barriers | Yes | Yes | Yes | Yes | No +
-| TextureArray.Sample float | Yes | Yes | Yes | No | Yes
-| Separate Sampler | Yes | Yes | Yes | No | Yes
-| tex.Load | Yes | Yes | Yes | Limited ^ | Yes
-| Full bool | Yes | Yes | Yes | No | Yes ^
-| Mesh Shader | No | No + | No + | No | No
-| `[unroll]` | Yes | Yes | Yes ^ | Yes | Limited +
-| Atomics | Yes | Yes | Yes | Yes | No +
-| Atomics on RWBuffer | Yes | Yes | Yes | No | No +
-| Sampler Feedback | No | Yes | No + | No | Yes ^
-| RWByteAddressBuffer Atomic | No | Yes ^ | Yes ^ | Yes | No +
-| debugBreak | No | No | Yes | Yes | Yes
-
+| Feature | D3D11 | D3D12 | VK | CUDA | CPU
+|-----------------------------------------------------|--------------|--------------|------------|---------------|---------------
+| [Half Type](#half) | No | Yes ^ | Yes | Yes ^ | No +
+| Double Type | Yes | Yes | Yes | Yes | Yes
+| Double Intrinsics | No | Limited + | Limited | Most | Yes
+| [u/int8_t Type](#int8_t) | No | No | Yes ^ | Yes | Yes
+| [u/int16_t Type](#int16_t) | No | Yes ^ | Yes ^ | Yes | Yes
+| [u/int64_t Type](#int64_t) | No | Yes ^ | Yes | Yes | Yes
+| u/int64_t Intrinsics | No | No | Yes | Yes | Yes
+| [int matrix](#int-matrix) | Yes | Yes | No + | Yes | Yes
+| [tex.GetDimensions](#tex-get-dimensions) | Yes | Yes | Yes | No | Yes
+| [SM6.0 Wave Intrinsics](#sm6-wave) | No | Yes | Partial | Yes ^ | No
+| SM6.0 Quad Intrinsics | No | Yes | No + | No | No
+| [SM6.5 Wave Intrinsics](#sm6.5-wave) | No | Yes ^ | No + | Yes ^ | No
+| [WaveMask Intrinsics](#wave-mask) | Yes ^ | Yes ^ | Yes + | Yes | No
+| [WaveShuffle](#wave-shuffle) | No | Limited ^ | Yes | Yes | No
+| [Tessellation](#tesselation) | Yes ^ | Yes ^ | No + | No | No
+| [Graphics Pipeline](#graphics-pipeline) | Yes | Yes | Yes | No | No
+| [Ray Tracing DXR 1.0](#ray-tracing-1.0) | No | Yes ^ | Yes ^ | No | No
+| Ray Tracing DXR 1.1 | No | Yes | No + | No | No
+| [Native Bindless](#native-bindless) | No | No | No | Yes | Yes
+| [Buffer bounds](#buffer-bounds) | Yes | Yes | Yes | Limited ^ | Limited ^
+| [Resource bounds](#resource-bounds) | Yes | Yes | Yes | Yes (optional)| Yes
+| Atomics | Yes | Yes | Yes | Yes | Yes
+| Group shared mem/Barriers | Yes | Yes | Yes | Yes | No +
+| [TextureArray.Sample float](#tex-array-sample-float)| Yes | Yes | Yes | No | Yes
+| [Separate Sampler](#separate-sampler) | Yes | Yes | Yes | No | Yes
+| [tex.Load](#tex-load) | Yes | Yes | Yes | Limited ^ | Yes
+| [Full bool](#full-bool) | Yes | Yes | Yes | No | Yes ^
+| [Mesh Shader](#mesh-shader) | No | Yes | Yes | No | No
+| [`[unroll]`](#unroll) | Yes | Yes | Yes ^ | Yes | Limited +
+| Atomics | Yes | Yes | Yes | Yes | No +
+| [Atomics on RWBuffer](#rwbuffer-atomics) | Yes | Yes | Yes | No | No +
+| [Sampler Feedback](#sampler-feedback) | No | Yes | No + | No | Yes ^
+| [RWByteAddressBuffer Atomic](#byte-address-atomic) | No | Yes ^ | Yes ^ | Yes | No +
+| [Shader Execution Reordering](#ser) | No | Yes ^ | Yes ^ | No | No
+| [debugBreak](#debug-break) | No | No | Yes | Yes | Yes
+
+<a id="half"></a>
 ## Half Type
 
 There appears to be a problem writing to a StructuredBuffer containing half on D3D12. D3D12 also appears to have problems doing calculations with half.
 
 In order for half to work in CUDA, NVRTC must be able to include `cuda_fp16.h` and related files. Please read the [CUDA target documentation](cuda-target.md) for more details.
 
+<a id="int8_t"></a>
 ## u/int8_t Type
 
 Not currently supported in D3D11/D3D12 because not supported in HLSL/DXIL/DXBC.
 
 Supported in Vulkan via the extensions `GL_EXT_shader_explicit_arithmetic_types` and `GL_EXT_shader_8bit_storage`.
 
+<a id="int16_t"></a>
 ## u/int16_t Type
 
 Requires SM6.2 which requires DXIL and therefore DXC and D3D12. For DXC this is discussed [here](https://github.com/Microsoft/DirectXShaderCompiler/wiki/16-Bit-Scalar-Types).
 
 Supported in Vulkan via the extensions `GL_EXT_shader_explicit_arithmetic_types` and `GL_EXT_shader_16bit_storage`.
 
+<a id="int64_t"></a>
 ## u/int64_t Type
 
 Requires SM6.0 which requires DXIL for D3D12. Therefore not available with DXBC on D3D11 or D3D12.
 
+<a id="int-matrix"></a>
 ## int matrix
 
 Means matrix types containing integer types can be used.
 
+<a id="tex-get-dimensions"></a>
 ## tex.GetDimensions
 
 tex.GetDimensions is the GetDimensions method on 'texture' objects. This is not supported on CUDA as CUDA has no equivalent functionality to get these values. GetDimensions does work on Buffer resource types on CUDA.
 
+<a id="sm6-wave"></a>
 ## SM6.0 Wave Intrinsics
 
 CUDA has preliminary support for Wave Intrinsics, introduced in [PR #1352](https://github.com/shader-slang/slang/pull/1352). Slang synthesizes the 'WaveMask' based on program flow and the implied 'programmer view' of execution. This support is built on top of WaveMask intrinsics with Wave Intrinsics being replaced with WaveMask Intrinsic calls with Slang generating the code to calculate the appropriate WaveMasks.
 
 Please read [PR #1352](https://github.com/shader-slang/slang/pull/1352) for a better description of the status.
 
+<a id="sm6.5-wave"></a>
 ## SM6.5 Wave Intrinsics
 
 SM6.5 Wave Intrinsics are supported, but require a downstream DXC compiler that supports SM6.5. As it stands the DXC shipping with Windows does not.
 
+<a id="wave-mask"></a>
 ## WaveMask Intrinsics
 
 In order to map better to the CUDA sync/mask model Slang supports 'WaveMask' intrinsics. They operate in broadly the same way as the Wave intrinsics, but require the programmer to specify the lanes that are involved. To write code that uses wave intrinsics across targets including CUDA, currently the WaveMask intrinsics must be used. For this to work, the masks passed to the WaveMask functions should exactly match the 'Active lanes' concept that HLSL uses, otherwise the result is undefined.
 
 The WaveMask intrinsics are not part of HLSL and are only available on Slang.
 
+<a id="wave-shuffle"></a>
 ## WaveShuffle
 
 `WaveShuffle` and `WaveBroadcastLaneAt` are Slang specific intrinsic additions to expand the options available around `WaveReadLaneAt`.
@@ -110,20 +121,24 @@ Other than the different restrictions on laneId they act identically to WaveRead
 
 On HLSL based targets currently `WaveShuffle` will be converted into `WaveReadLaneAt`. Strictly speaking this means it *requires* the `laneId` to be `dynamically uniform` across the Wave. In practice some hardware supports the loosened usage, and others do not. In the future this may be fixed in Slang and/or HLSL to work across all hardware. For now if you use `WaveShuffle` on HLSL based targets it will be necessary to confirm that `WaveReadLaneAt` has the loosened behavior for all the hardware intended. If target hardware does not support the loosened restrictions its behavior is undefined.
 
+<a id="tesselation"></a>
 ## Tessellation
 
 Although tessellation stages should work on D3D11 and D3D12 they are not tested within our test framework, and may have problems.
 
+<a id="native-bindless"></a>
 ## Native Bindless
 
 Bindless is possible on targets that support it - but is not the default behavior for those targets, and typically requires significant effort in Slang code.
 
 'Native Bindless' targets use a form of 'bindless' for all targets. On CUDA this requires the target to use 'texture object' style binding and for the device to have 'compute capability 3.0' or higher.
 
+<a id="resource-bounds"></a>
 ## Resource bounds
 
 For CUDA this is optional as it can be controlled via the SLANG_CUDA_BOUNDARY_MODE macro in the `slang-cuda-prelude.h`. By default its behavior is `cudaBoundaryModeZero`.
 
+<a id="buffer-bounds"></a>
 ## Buffer Bounds
 
 This is the feature when accessing outside of the bounds of a Buffer there is well defined behavior - on read returning all 0s, and on write, the write being ignored.
@@ -132,20 +147,24 @@ On CPU there is only bounds checking on debug compilation of C++ code. This will
 
 On CUDA out of bounds accesses default to element 0 (!). The behavior can be controlled via the SLANG_CUDA_BOUND_CHECK macro in the `slang-cuda-prelude.h`. This behavior may seem a little strange - and it requires a buffer that has at least one member to not do something nasty. It is really a 'least worst' answer to a difficult problem and is better than out of range accesses or worse writes.
 
+<a id="tex-array-sample-float"></a>
 ## TextureArray.Sample float
 
 When using 'Sample' on a TextureArray, CUDA treats the array index parameter as an int, even though it is passed as a float.
 
+<a id="separate-sampler"></a>
 ## Separate Sampler
 
 This feature means that multiple Samplers can be used with a Texture. In terms of the HLSL code this can be seen as the 'SamplerState' being a parameter passed to the 'Sample' method on a texture object.
 
 On CUDA the SamplerState is ignored, because on this target a 'texture object' is the Texture and Sampler combination.
 
+<a id="graphics-pipeline"></a>
 ## Graphics Pipeline
 
 CPU and CUDA only currently support compute shaders.
 
+<a id="ray-tracing-1.0"></a>
 ## Ray Tracing DXR 1.0
 
 Vulkan does not support a local root signature, but there is the concept of a 'shader record'. In Slang a single constant buffer can be marked as a shader record with the `[[vk::shader_record]]` attribute, for example:
@@ -160,16 +179,19 @@ cbuffer ShaderRecord
 
 In practice to write shader code that works across D3D12 and VK you should have a single constant buffer marked as 'shader record' for VK and then on D3D that constant buffer should be bound in the local root signature on D3D.
 
+<a id="tex-load"></a>
 ## tex.Load
 
 tex.Load is only supported on CUDA for Texture1D. Additionally CUDA only allows such access for linear memory, meaning the bound texture can also not have mip maps. Load *is* allowed on RWTexture types of other dimensions including 1D on CUDA.
 
+<a id="full-bool"></a>
 ## Full bool
 
 Means fully featured bool support. CUDA has issues around bool because there isn't a vector bool type built in. Currently bool aliases to an int vector type.
 
 On CPU there are some issues in so far as bool's size is not well defined in size and alignment. Most C++ compilers now use a byte to represent a bool. In the past it has been backed by an int on some compilers.
 
+<a id="unroll"></a>
 ## `[unroll]`
 
 The unroll attribute allows for unrolling `for` loops. At the moment the feature is dependent on downstream compiler support which is mixed. In the longer term the intention is for Slang to contain its own loop unroller - and therefore not be dependent on the feature on downstream compilers.
@@ -180,6 +202,7 @@ On GLSL and VK targets loop unrolling uses the [GL_EXT_control_flow_attributes](
 
 Slang does have a cross target mechanism to [unroll loops](language-reference/06-statements.md), in the section `Compile-Time For Statement`.
 
+<a id="rwbuffer-atomics"></a>
 ## Atomics on RWBuffer
 
 For VK the GLSL output from Slang seems plausible, but VK binding fails in the test harness.
@@ -188,6 +211,7 @@ On CUDA RWBuffer becomes CUsurfObject, which is a 'texture' type and does not su
 
 On the CPU atomics are not supported, but will be in the future.
 
+<a id="sampler-feedback"></a>
 ## Sampler Feedback
 
 The HLSL [sampler feedback feature](https://microsoft.github.io/DirectX-Specs/d3d/SamplerFeedback.html) is available for DirectX12. The feature requires shader model 6.5 and therefore a version of [DXC](https://github.com/Microsoft/DirectXShaderCompiler) that supports that model or higher. The Shader Model 6.5 requirement also means only DXIL binary format is supported.
@@ -196,6 +220,7 @@ There doesn't not appear to be a similar feature available in Vulkan yet, but wh
 
 For CPU targets there is the IFeedbackTexture interface that requires an implementation for use. Slang does not currently include CPU implementations for texture types.
 
+<a id="byte-address-atomic"></a>
 ## RWByteAddressBuffer Atomic
 
 The additional supported methods on RWByteAddressBuffer are...
@@ -224,9 +249,22 @@ in the separate [NVAPI Support](nvapi-support.md) document.
 
 On Vulkan, for float the [`GL_EXT_shader_atomic_float`](https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VK_EXT_shader_atomic_float.html) extension is required. For int64 the [`GL_EXT_shader_atomic_int64`](https://raw.githubusercontent.com/KhronosGroup/GLSL/master/extensions/ext/GL_EXT_shader_atomic_int64.txt) extension is required.
 
-CUDA requires SM6.0 or higher for int64 support.
+CUDA requires SM6.0 or higher for int64 support.
+
+<a id="mesh-shader"></a>
+## Mesh Shader
+
+There is preliminary [Mesh Shader support](https://github.com/shader-slang/slang/pull/2464).
+
+<a id="ser"></a>
+## Shader Execution Reordering
+
+See [Shader Execution Reordering](shader-execution-reordering.md) for more information.
+
+Support is currently available on D3D12 via NVAPI, and on Vulkan via the [GL_NV_shader_invocation_reorder](https://github.com/KhronosGroup/GLSL/blob/master/extensions/nv/GLSL_NV_shader_invocation_reorder.txt) extension.
 
-## debugBreak
+<a id="debug-break"></a>
+## Debug Break
 
 Slang has preliminary support for the `debugBreak()` intrinsic. With the appropriate tooling, when `debugBreak` is hit, execution halts and is displayed in the attached debugger.
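The SER document added in this commit has no usage example yet (its Vulkan section still carries a TODO). Below is a minimal ray generation sketch of the trace, reorder, then shade pattern. It assumes the preliminary API exactly as documented in the diff and a target where SER is available (D3D12 via NVAPI, or Vulkan with `GL_NV_shader_invocation_reorder`). The `Payload` struct, the `gScene`/`gOutput` bindings and the camera setup are hypothetical. Static members are called with `::`, which Slang accepts for the methods the document writes as `HitObject.TraceRay` and `HitObject.Invoke`.

```slang
// Sketch only: assumes the preliminary SER API as documented in this commit.
// Payload, gScene, gOutput and the camera math are hypothetical placeholders.

struct Payload
{
    float4 color;
};

RaytracingAccelerationStructure gScene;  // hypothetical scene binding
RWTexture2D<float4> gOutput;             // hypothetical output image

[shader("raygeneration")]
void rayGenMain()
{
    uint2 pixel = DispatchRaysIndex().xy;
    float2 uv = (float2(pixel) + 0.5) / float2(DispatchRaysDimensions().xy);

    // Simple pinhole-style ray through the pixel (hypothetical camera).
    RayDesc ray;
    ray.Origin = float3(0.0, 0.0, -1.0);
    ray.Direction = normalize(float3(uv * 2.0 - 1.0, 1.0));
    ray.TMin = 0.001;
    ray.TMax = 1000.0;

    Payload payload;
    payload.color = float4(0.0, 0.0, 0.0, 0.0);

    // Traversal only: anyhit/intersection shaders may run, but closesthit
    // and miss shaders are not triggered here.
    HitObject hit = HitObject::TraceRay(
        gScene, RAY_FLAG_NONE, 0xFF,
        0,   // RayContributionToHitGroupIndex
        1,   // MultiplierForGeometryContributionToHitGroupIndex
        0,   // MissShaderIndex
        ray, payload);

    // Regroup threads by the hit object. Per the document this is the same
    // as passing 0 for both the coherence hint and its bit count.
    ReorderThread(hit);

    // Run closesthit or miss shading (nothing for a NOP hit object).
    HitObject::Invoke(gScene, hit, payload);

    gOutput[pixel] = payload.color;
}
```

The `HitObject` is declared once, never reassigned, and not used under divergent control flow, which keeps the sketch clear of the Vulkan restrictions the document describes. To also reorder on an application-specific key, the three-argument overload can be used, for example `ReorderThread(hit, hint, 2)` for a hypothetical `hint` with at most four distinct values. Per the ReorderThread description, the bit count should be the smallest that covers every hint value and should be the same for all threads.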
