Metal documentation (#5549)

Closes https://github.com/shader-slang/slang/issues/4262 Co-authored-by: Yong He <yonghe@outlook.com>
author: Ellie Hermaszewska <ellieh@nvidia.com> 2024-11-20 02:23:59 +0800
committer: GitHub <noreply@github.com> 2024-11-19 10:23:59 -0800
commit: 6e4473d1dc18d6a1d6c8e57df7af649f67719419 (patch)
tree: ef8bd7e79d83add61aad385292460da1d5e2e8c6
parent: 0bf6a668208c65c980648fbe74a8c0a7bf4ded77 (diff)
3 files changed, 465 insertions, 133 deletions
diff --git a/docs/target-compatibility.md b/docs/target-compatibility.md
index 0fdeeca8d..5ea793c68 100644
--- a/docs/target-compatibility.md
+++ b/docs/target-compatibility.md
@@ -1,66 +1,68 @@
-Slang Target Compatibility 
-==========================
+# Slang Target Compatibility
 
 Shader Model (SM) numbers are D3D Shader Model versions, unless explicitly stated otherwise.
-OpenGL compatibility is not listed here, because OpenGL isn't an officially supported target. 
+OpenGL compatibility is not listed here, because OpenGL isn't an officially supported target.
 
 Items with a + means that the feature is anticipated to be added in the future.
 Items with ^ means there is some discussion about support later in the document for this target.
 
-| Feature                                             |    D3D11     |    D3D12     |     VK     |      CUDA     |    CPU
-|-----------------------------------------------------|--------------|--------------|------------|---------------|---------------
-| [Half Type](#half)                                  |     No       |     Yes ^    |   Yes      |     Yes ^     |    No +
-| Double Type                                         |     Yes      |     Yes      |   Yes      |     Yes       |    Yes
-| Double Intrinsics                                   |     No       |   Limited +  |  Limited   |     Most      |    Yes
-| [u/int8_t Type](#int8_t)                            |     No       |   No         |   Yes ^    |     Yes       |    Yes
-| [u/int16_t Type](#int16_t)                          |     No       |   Yes ^      |   Yes ^    |     Yes       |    Yes
-| [u/int64_t Type](#int64_t)                          |     No       |   Yes ^      |   Yes      |     Yes       |    Yes
-| u/int64_t Intrinsics                                |     No       |   No         |   Yes      |     Yes       |    Yes
-| [int matrix](#int-matrix)                           |     Yes      |   Yes        |   No +     |     Yes       |    Yes
-| [tex.GetDimensions](#tex-get-dimensions)            |     Yes      |   Yes        |   Yes      |     No        |    Yes
-| [SM6.0 Wave Intrinsics](#sm6-wave)                  |     No       |   Yes        |  Partial   |     Yes ^     |    No
-| SM6.0 Quad Intrinsics                               |     No       |   Yes        |   No +     |     No        |    No
-| [SM6.5 Wave Intrinsics](#sm6.5-wave)                |     No       |   Yes ^      |   No +     |     Yes ^     |    No
-| [WaveMask Intrinsics](#wave-mask)                   |     Yes ^    |   Yes ^      |   Yes +    |     Yes       |    No
-| [WaveShuffle](#wave-shuffle)                        |     No       |   Limited ^  |   Yes      |     Yes       |    No
-| [Tesselation](#tesselation)                         |     Yes ^    |   Yes ^      |   No +     |     No        |    No
-| [Graphics Pipeline](#graphics-pipeline)             |     Yes      |   Yes        |   Yes      |     No        |    No
-| [Ray Tracing DXR 1.0](#ray-tracing-1.0)             |     No       |   Yes ^      |   Yes ^    |     No        |    No
-| Ray Tracing DXR 1.1                                 |     No       |   Yes        |   No +     |     No        |    No
-| [Native Bindless](#native-bindless)                 |     No       |    No        |   No       |     Yes       |    Yes
-| [Buffer bounds](#buffer-bounds)                     |     Yes      |   Yes        |   Yes      |   Limited ^   |    Limited ^
-| [Resource bounds](#resource-bounds)                 |     Yes      |   Yes        |   Yes      | Yes (optional)|    Yes
-| Atomics                                             |     Yes      |   Yes        |   Yes      |     Yes       |    Yes
-| Group shared mem/Barriers                           |     Yes      |   Yes        |   Yes      |     Yes       |    No + 
-| [TextureArray.Sample float](#tex-array-sample-float)|     Yes      |   Yes        |   Yes      |     No        |    Yes
-| [Separate Sampler](#separate-sampler)               |     Yes      |   Yes        |   Yes      |     No        |    Yes
-| [tex.Load](#tex-load)                               |     Yes      |   Yes        |   Yes      |  Limited ^    |    Yes
-| [Full bool](#full-bool)                             |     Yes      |   Yes        |   Yes      |     No        |    Yes ^ 
-| [Mesh Shader](#mesh-shader)                         |     No       |   Yes        |   Yes      |     No        |    No
-| [`[unroll]`](#unroll]                               |     Yes      |   Yes        |   Yes ^    |     Yes       |    Limited + 
-| Atomics                                             |     Yes      |   Yes        |   Yes      |     Yes       |    No + 
-| [Atomics on RWBuffer](#rwbuffer-atomics)            |     Yes      |   Yes        |   Yes      |     No        |    No + 
-| [Sampler Feedback](#sampler-feedback)               |     No       |   Yes        |   No +     |     No        |    Yes ^
-| [RWByteAddressBuffer Atomic](#byte-address-atomic)  |     No       |   Yes ^      |   Yes ^    |     Yes       |    No +
-| [Shader Execution Reordering](#ser)                 |     No       |   Yes ^      |   Yes ^    |     No        |    No 
-| [debugBreak](#debug-break)                          |     No       |   No         |   Yes      |     Yes       |    Yes
-| [realtime clock](#realtime-clock)                   |     No       |   Yes ^      |   Yes      |     Yes       |    No
+| Feature                                              | D3D11 | D3D12     | VK      | CUDA           | Metal | CPU       |
+| ---------------------------------------------------- | ----- | --------- | ------- | -------------- | ----- | --------- |
+| [Half Type](#half)                                   | No    | Yes ^     | Yes     | Yes ^          | Yes   | No +      |
+| Double Type                                          | Yes   | Yes       | Yes     | Yes            | No    | Yes       |
+| Double Intrinsics                                    | No    | Limited + | Limited | Most           | No    | Yes       |
+| [u/int8_t Type](#int8_t)                             | No    | No        | Yes ^   | Yes            | Yes   | Yes       |
+| [u/int16_t Type](#int16_t)                           | No    | Yes ^     | Yes ^   | Yes            | Yes   | Yes       |
+| [u/int64_t Type](#int64_t)                           | No    | Yes ^     | Yes     | Yes            | Yes   | Yes       |
+| u/int64_t Intrinsics                                 | No    | No        | Yes     | Yes            | Yes   | Yes       |
+| [int matrix](#int-matrix)                            | Yes   | Yes       | No +    | Yes            | No    | Yes       |
+| [tex.GetDimensions](#tex-get-dimensions)             | Yes   | Yes       | Yes     | No             | Yes   | Yes       |
+| [SM6.0 Wave Intrinsics](#sm6-wave)                   | No    | Yes       | Partial | Yes ^          | No    | No        |
+| SM6.0 Quad Intrinsics                                | No    | Yes       | No +    | No             | No    | No        |
+| [SM6.5 Wave Intrinsics](#sm6.5-wave)                 | No    | Yes ^     | No +    | Yes ^          | No    | No        |
+| [WaveMask Intrinsics](#wave-mask)                    | Yes ^ | Yes ^     | Yes +   | Yes            | No    | No        |
+| [WaveShuffle](#wave-shuffle)                         | No    | Limited ^ | Yes     | Yes            | No    | No        |
+| [Tesselation](#tesselation)                          | Yes ^ | Yes ^     | No +    | No             | No    | No        |
+| [Graphics Pipeline](#graphics-pipeline)              | Yes   | Yes       | Yes     | No             | Yes   | No        |
+| [Ray Tracing DXR 1.0](#ray-tracing-1.0)              | No    | Yes ^     | Yes ^   | No             | No    | No        |
+| Ray Tracing DXR 1.1                                  | No    | Yes       | No +    | No             | No    | No        |
+| [Native Bindless](#native-bindless)                  | No    | No        | No      | Yes            | No    | Yes       |
+| [Buffer bounds](#buffer-bounds)                      | Yes   | Yes       | Yes     | Limited ^      | No ^  | Limited ^ |
+| [Resource bounds](#resource-bounds)                  | Yes   | Yes       | Yes     | Yes (optional) | Yes   | Yes       |
+| Atomics                                              | Yes   | Yes       | Yes     | Yes            | Yes   | Yes       |
+| Group shared mem/Barriers                            | Yes   | Yes       | Yes     | Yes            | Yes   | No +      |
+| [TextureArray.Sample float](#tex-array-sample-float) | Yes   | Yes       | Yes     | No             | Yes   | Yes       |
+| [Separate Sampler](#separate-sampler)                | Yes   | Yes       | Yes     | No             | Yes   | Yes       |
+| [tex.Load](#tex-load)                                | Yes   | Yes       | Yes     | Limited ^      | Yes   | Yes       |
+| [Full bool](#full-bool)                              | Yes   | Yes       | Yes     | No             | Yes   | Yes ^     |
+| [Mesh Shader](#mesh-shader)                          | No    | Yes       | Yes     | No             | Yes   | No        |
+| [`[unroll]`](#unroll]                                | Yes   | Yes       | Yes ^   | Yes            | No ^  | Limited + |
+| Atomics                                              | Yes   | Yes       | Yes     | Yes            | Yes   | No +      |
+| [Atomics on RWBuffer](#rwbuffer-atomics)             | Yes   | Yes       | Yes     | No             | Yes   | No +      |
+| [Sampler Feedback](#sampler-feedback)                | No    | Yes       | No +    | No             | No    | Yes ^     |
+| [RWByteAddressBuffer Atomic](#byte-address-atomic)   | No    | Yes ^     | Yes ^   | Yes            | Yes   | No +      |
+| [Shader Execution Reordering](#ser)                  | No    | Yes ^     | Yes ^   | No             | No    | No        |
+| [debugBreak](#debug-break)                           | No    | No        | Yes     | Yes            | No    | Yes       |
+| [realtime clock](#realtime-clock)                    | No    | Yes ^     | Yes     | Yes            | No    | No        |
 
 <a id="half"></a>
+
 ## Half Type
 
 There appears to be a problem writing to a StructuredBuffer containing half on D3D12. D3D12 also appears to have problems doing calculations with half.
 
-In order for half to work in CUDA, NVRTC must be able to include `cuda_fp16.h` and related files. Please read the [CUDA target documentation](cuda-target.md) for more details. 
+In order for half to work in CUDA, NVRTC must be able to include `cuda_fp16.h` and related files. Please read the [CUDA target documentation](cuda-target.md) for more details.
 
 <a id="int8_t"></a>
+
 ## u/int8_t Type
 
-Not currently supported in D3D11/D3D12 because not supported in HLSL/DXIL/DXBC. 
+Not currently supported in D3D11/D3D12 because not supported in HLSL/DXIL/DXBC.
 
 Supported in Vulkan via the extensions `GL_EXT_shader_explicit_arithmetic_types` and `GL_EXT_shader_8bit_storage`.
 
 <a id="int16_t"></a>
+
 ## u/int16_t Type
 
 Requires SM6.2 which requires DXIL and therefore DXC and D3D12. For DXC this is discussed [here](https://github.com/Microsoft/DirectXShaderCompiler/wiki/16-Bit-Scalar-Types).
@@ -68,21 +70,25 @@ Requires SM6.2 which requires DXIL and therefore DXC and D3D12. For DXC this is
 Supported in Vulkan via the extensions `GL_EXT_shader_explicit_arithmetic_types` and `GL_EXT_shader_16bit_storage`.
 
 <a id="int64_t"></a>
+
 ## u/int64_t Type
 
 Requires SM6.0 which requires DXIL for D3D12. Therefore not available with DXBC on D3D11 or D3D12.
 
 <a id="int-matrix"></a>
+
 ## int matrix
 
-Means can use matrix types containing integer types. 
+Means can use matrix types containing integer types.
 
 <a id="tex-get-dimensions"></a>
+
 ## tex.GetDimensions
 
 tex.GetDimensions is the GetDimensions method on 'texture' objects. This is not supported on CUDA as CUDA has no equivalent functionality to get these values. GetDimensions work on Buffer resource types on CUDA.
 
 <a id="sm6-wave"></a>
+
 ## SM6.0 Wave Intrinsics
 
 CUDA has premliminary support for Wave Intrinsics, introduced in [PR #1352](https://github.com/shader-slang/slang/pull/1352). Slang synthesizes the 'WaveMask' based on program flow and the implied 'programmer view' of exectution. This support is built on top of WaveMask intrinsics with Wave Intrinsics being replaced with WaveMask Intrinsic calls with Slang generating the code to calculate the appropriate WaveMasks.
@@ -90,56 +96,63 @@ CUDA has premliminary support for Wave Intrinsics, introduced in [PR #1352](http
 Please read [PR #1352](https://github.com/shader-slang/slang/pull/1352) for a better description of the status.
 
 <a id="sm6.5-wave"></a>
+
 ## SM6.5 Wave Intrinsics
 
-SM6.5 Wave Intrinsics are supported, but requires a downstream DXC compiler that supports SM6.5. As it stands the DXC shipping with windows does not. 
+SM6.5 Wave Intrinsics are supported, but requires a downstream DXC compiler that supports SM6.5. As it stands the DXC shipping with windows does not.
 
 <a id="wave-mask"></a>
+
 ## WaveMask Intrinsics
 
-In order to map better to the CUDA sync/mask model Slang supports 'WaveMask' intrinsics. They operate in broadly the same way as the Wave intrinsics, but require the programmer to specify the lanes that are involved. To write code that uses wave intrinsics acrosss targets including CUDA, currently the WaveMask intrinsics must be used. For this to work, the masks passed to the WaveMask functions should exactly match the 'Active lanes' concept that HLSL uses, otherwise the result is undefined. 
+In order to map better to the CUDA sync/mask model Slang supports 'WaveMask' intrinsics. They operate in broadly the same way as the Wave intrinsics, but require the programmer to specify the lanes that are involved. To write code that uses wave intrinsics acrosss targets including CUDA, currently the WaveMask intrinsics must be used. For this to work, the masks passed to the WaveMask functions should exactly match the 'Active lanes' concept that HLSL uses, otherwise the result is undefined.
 
 The WaveMask intrinsics are not part of HLSL and are only available on Slang.
 
 <a id="wave-shuffle"></a>
+
 ## WaveShuffle
 
-`WaveShuffle` and `WaveBroadcastLaneAt` are Slang specific intrinsic additions to expand the options available around `WaveReadLaneAt`. 
+`WaveShuffle` and `WaveBroadcastLaneAt` are Slang specific intrinsic additions to expand the options available around `WaveReadLaneAt`.
 
-To be clear this means they will not compile directly on 'standard' HLSL compilers such as `dxc`, but Slang HLSL *output* (which will not contain these intrinsics) can (and typically is) compiled via dxc.
+To be clear this means they will not compile directly on 'standard' HLSL compilers such as `dxc`, but Slang HLSL _output_ (which will not contain these intrinsics) can (and typically is) compiled via dxc.
 
 The difference between them can be summarized as follows
 
-* WaveBroadcastLaneAt - laneId must be a compile time constant 
-* WaveReadLaneAt - laneId can be dynamic but *MUST* be the same value across the Wave ie 'dynamically uniform' across the Wave
-* WaveShuffle - laneId can be truly dynamic (NOTE! That it is not strictly truly available currently on all targets, specifically HLSL)
+- WaveBroadcastLaneAt - laneId must be a compile time constant
+- WaveReadLaneAt - laneId can be dynamic but _MUST_ be the same value across the Wave ie 'dynamically uniform' across the Wave
+- WaveShuffle - laneId can be truly dynamic (NOTE! That it is not strictly truly available currently on all targets, specifically HLSL)
 
 Other than the different restrictions on laneId they act identically to WaveReadLaneAt.
 
 `WaveBroadcastLaneAt` and `WaveReadLaneAt` will work on all targets that support wave intrinsics, with the only current restriction being that on GLSL targets, only scalars and vectors are supported.
 
-`WaveShuffle` will always work on CUDA/Vulkan. 
+`WaveShuffle` will always work on CUDA/Vulkan.
 
-On HLSL based targets currently `WaveShuffle` will be converted into `WaveReadLaneAt`. Strictly speaking this means it *requires* the `laneId` to be `dynamically uniform` across the Wave. In practice some hardware supports the loosened usage, and others does not. In the future this may be fixed in Slang and/or HLSL to work across all hardware. For now if you use `WaveShuffle` on HLSL based targets it will be necessary to confirm that `WaveReadLaneAt` has the loosened behavior for all the hardware intended. If target hardware does not support the loosened restrictions it's behavior is undefined. 
+On HLSL based targets currently `WaveShuffle` will be converted into `WaveReadLaneAt`. Strictly speaking this means it _requires_ the `laneId` to be `dynamically uniform` across the Wave. In practice some hardware supports the loosened usage, and others does not. In the future this may be fixed in Slang and/or HLSL to work across all hardware. For now if you use `WaveShuffle` on HLSL based targets it will be necessary to confirm that `WaveReadLaneAt` has the loosened behavior for all the hardware intended. If target hardware does not support the loosened restrictions it's behavior is undefined.
 
 <a id="tesselation"></a>
+
 ## Tesselation
 
-Although tesselation stages should work on D3D11 and D3D12 they are not tested within our test framework, and may have problems. 
+Although tesselation stages should work on D3D11 and D3D12 they are not tested within our test framework, and may have problems.
 
 <a id="native-bindless"></a>
-## Native Bindless  
 
-Bindless is possible on targets that support it - but is not the default behavior for those targets, and typically require significant effort in Slang code. 
+## Native Bindless
+
+Bindless is possible on targets that support it - but is not the default behavior for those targets, and typically require significant effort in Slang code.
 
 'Native Bindless' targets use a form of 'bindless' for all targets. On CUDA this requires the target to use 'texture object' style binding and for the device to have 'compute capability 3.0' or higher.
 
 <a id="resource-bounds"></a>
-## Resource bounds 
+
+## Resource bounds
 
 For CUDA this is optional as can be controlled via the SLANG_CUDA_BOUNDARY_MODE macro in the `slang-cuda-prelude.h`. By default it's behavior is `cudaBoundaryModeZero`.
 
 <a id="buffer-bounds"></a>
+
 ## Buffer Bounds
 
 This is the feature when accessing outside of the bounds of a Buffer there is well defined behavior - on read returning all 0s, and on write, the write being ignored.
@@ -148,24 +161,30 @@ On CPU there is only bounds checking on debug compilation of C++ code. This will
 
 On CUDA out of bounds accesses default to element 0 (!). The behavior can be controlled via the SLANG_CUDA_BOUND_CHECK macro in the `slang-cuda-prelude.h`. This behavior may seem a little strange - and it requires a buffer that has at least one member to not do something nasty. It is really a 'least worst' answer to a difficult problem and is better than out of range accesses or worse writes.
 
+In Metal, accessing a buffer out of bounds is undefined behavior.
+
 <a id="tex-array-sample-float"></a>
-## TextureArray.Sample float 
+
+## TextureArray.Sample float
 
 When using 'Sample' on a TextureArray, CUDA treats the array index parameter as an int, even though it is passed as a float.
 
 <a id="separate-sampler"></a>
+
 ## Separate Sampler
 
-This feature means that a multiple Samplers can be used with a Texture. In terms of the HLSL code this can be seen as the 'SamplerState' being a parameter passed to the 'Sample' method on a texture object. 
+This feature means that a multiple Samplers can be used with a Texture. In terms of the HLSL code this can be seen as the 'SamplerState' being a parameter passed to the 'Sample' method on a texture object.
 
 On CUDA the SamplerState is ignored, because on this target a 'texture object' is the Texture and Sampler combination.
 
 <a id="graphics-pipeline"></a>
+
 ## Graphics Pipeline
 
-CPU and CUDA only currently support compute shaders. 
+CPU and CUDA only currently support compute shaders.
 
 <a id="ray-tracing-1.0"></a>
+
 ## Ray Tracing DXR 1.0
 
 Vulkan does not support a local root signature, but there is the concept of a 'shader record'. In Slang a single constant buffer can be marked as a shader record with the `[[vk::shader_record]]` attribute, for example:
@@ -175,53 +194,61 @@ Vulkan does not support a local root signature, but there is the concept of a 's
 cbuffer ShaderRecord
 {
 	uint shaderRecordID;
-} 
+}
 ```
 
-In practice to write shader code that works across D3D12 and VK you should have a single constant buffer marked as 'shader record' for VK and then on D3D that constant buffer should be bound in the local root signature on D3D. 
+In practice to write shader code that works across D3D12 and VK you should have a single constant buffer marked as 'shader record' for VK and then on D3D that constant buffer should be bound in the local root signature on D3D.
 
 <a id="tex-load"></a>
+
 ## tex.Load
 
-tex.Load is only supported on CUDA for Texture1D. Additionally CUDA only allows such access for linear memory, meaning the bound texture can also not have mip maps. Load *is* allowed on RWTexture types of other dimensions including 1D on CUDA.
+tex.Load is only supported on CUDA for Texture1D. Additionally CUDA only allows such access for linear memory, meaning the bound texture can also not have mip maps. Load _is_ allowed on RWTexture types of other dimensions including 1D on CUDA.
 
 <a id="full-bool"></a>
+
 ## Full bool
 
-Means fully featured bool support. CUDA has issues around bool because there isn't a vector bool type built in. Currently bool aliases to an int vector type. 
+Means fully featured bool support. CUDA has issues around bool because there isn't a vector bool type built in. Currently bool aliases to an int vector type.
 
-On CPU there are some issues in so far as bool's size is not well defined in size an alignment. Most C++ compilers now use a byte to represent a bool. In the past it has been backed by an int on some compilers. 
+On CPU there are some issues in so far as bool's size is not well defined in size an alignment. Most C++ compilers now use a byte to represent a bool. In the past it has been backed by an int on some compilers.
 
 <a id="unroll"></a>
+
 ## `[unroll]`
 
-The unroll attribute allows for unrolling `for` loops. At the moment the feature is dependent on downstream compiler support which is mixed. In the longer term the intention is for Slang to contain it's own loop unroller - and therefore not be dependent on the feature on downstream compilers. 
+The unroll attribute allows for unrolling `for` loops. At the moment the feature is dependent on downstream compiler support which is mixed. In the longer term the intention is for Slang to contain it's own loop unroller - and therefore not be dependent on the feature on downstream compilers.
 
-On C++ this attribute becomes SLANG_UNROLL which is defined in the prelude. This can be predefined if there is a suitable mechanism, if there isn't a definition SLANG_UNROLL will be an empty definition. 
+On C++ this attribute becomes SLANG_UNROLL which is defined in the prelude. This can be predefined if there is a suitable mechanism, if there isn't a definition SLANG_UNROLL will be an empty definition.
 
 On GLSL and VK targets loop unrolling uses the [GL_EXT_control_flow_attributes](https://github.com/KhronosGroup/GLSL/blob/master/extensions/ext/GL_EXT_control_flow_attributes.txt) extension.
 
+Metal Shading Language does not support loop unrolling.
+
 Slang does have a cross target mechanism to [unroll loops](language-reference/06-statements.md), in the section `Compile-Time For Statement`.
 
 <a id="rwbuffer-atomics"></a>
+
 ## Atomics on RWBuffer
 
 For VK the GLSL output from Slang seems plausible, but VK binding fails in tests harness.
 
-On CUDA RWBuffer becomes CUsurfObject, which is a 'texture' type and does not support atomics. 
+On CUDA RWBuffer becomes CUsurfObject, which is a 'texture' type and does not support atomics.
 
 On the CPU atomics are not supported, but will be in the future.
 
 <a id="sampler-feedback"></a>
+
 ## Sampler Feedback
 
-The HLSL [sampler feedback feature](https://microsoft.github.io/DirectX-Specs/d3d/SamplerFeedback.html) is available for DirectX12. The features requires shader model 6.5 and therefore a version of [DXC](https://github.com/Microsoft/DirectXShaderCompiler) that supports that model or higher. The Shader Model 6.5 requirement also means only DXIL binary format is supported. 
+The HLSL [sampler feedback feature](https://microsoft.github.io/DirectX-Specs/d3d/SamplerFeedback.html) is available for DirectX12. The features requires shader model 6.5 and therefore a version of [DXC](https://github.com/Microsoft/DirectXShaderCompiler) that supports that model or higher. The Shader Model 6.5 requirement also means only DXIL binary format is supported.
 
 There doesn't not appear to be a similar feature available in Vulkan yet, but when it is available support should be addeed.
 
-For CPU targets there is the IFeedbackTexture interface that requires an implemention for use. Slang does not currently include CPU implementations for texture types.  
+For CPU targets there is the IFeedbackTexture interface that requires an implemention for use. Slang does not currently include CPU implementations for texture types.
 
 <a id="byte-address-atomic"></a>
+
 ## RWByteAddressBuffer Atomic
 
 The additional supported methods on RWByteAddressBuffer are...
@@ -246,18 +273,20 @@ uint64_t RWByteAddressBuffer::InterlockedXorU64(uint byteAddress, uint64_t value
 ```
 
 On HLSL based targets this functionality is achieved using [NVAPI](https://developer.nvidia.com/nvapi). Support for NVAPI is described
-in the separate [NVAPI Support](nvapi-support.md) document.  
+in the separate [NVAPI Support](nvapi-support.md) document.
 
 On Vulkan, for float the [`GL_EXT_shader_atomic_float`](https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VK_EXT_shader_atomic_float.html) extension is required. For int64 the [`GL_EXT_shader_atomic_int64`](https://raw.githubusercontent.com/KhronosGroup/GLSL/master/extensions/ext/GL_EXT_shader_atomic_int64.txt) extension is required.
 
-CUDA requires SM6.0 or higher for int64 support. 
+CUDA requires SM6.0 or higher for int64 support.
 
 <a id="mesh-shader"></a>
+
 ## Mesh Shader
 
 There is preliminary [Mesh Shader support](https://github.com/shader-slang/slang/pull/2464).
 
 <a id="ser"></a>
+
 ## Shader Execution Reordering
 
 More information about [Shader Execution Reordering](shader-execution-reordering.md).
@@ -265,20 +294,22 @@ More information about [Shader Execution Reordering](shader-execution-reordering
 Currently support is available in D3D12 via NVAPI, and for Vulkan via the [GL_NV_shader_invocation_reorder](https://github.com/KhronosGroup/GLSL/blob/master/extensions/nv/GLSL_NV_shader_invocation_reorder.txt) extension.
 
 <a id="debug-break"></a>
+
 ## Debug Break
 
 Slang has preliminary support for `debugBreak()` intrinsic. With the appropriate tooling, when `debugBreak` is hit it will cause execution to halt and display in the attached debugger.
 
-Currently this is supported in all targets except HLSL. Note that on some targets if there isn't an appropriate debugging environment the debugBreak might cause execution to fail or potentially it is ignored. 
+Currently this is supported in all targets except HLSL. Note that on some targets if there isn't an appropriate debugging environment the debugBreak might cause execution to fail or potentially it is ignored.
 
-On C++ targets debugBreak is implemented using SLANG_BREAKPOINT defined in "slang-cpp-prelude.h". If there isn't a suitable intrinsic, this will default to attempting to write to `nullptr` leading to a crash. 
+On C++ targets debugBreak is implemented using SLANG_BREAKPOINT defined in "slang-cpp-prelude.h". If there isn't a suitable intrinsic, this will default to attempting to write to `nullptr` leading to a crash.
 
 Some additional details:
 
-* If [slang-llvm](cpu-target.md#slang-llvm) is being used as the downstream compiler (as is typical with `host-callable`), it will crash into the debugger, but may not produce a usable stack trace.
-* For "normal" C++ downstream compilers such as Clang/Gcc/Visual Studio, to break into readable source code, debug information is typically necessary. Disabling optimizations may be useful to break on the appropriate specific line, and have variables inspectable.
+- If [slang-llvm](cpu-target.md#slang-llvm) is being used as the downstream compiler (as is typical with `host-callable`), it will crash into the debugger, but may not produce a usable stack trace.
+- For "normal" C++ downstream compilers such as Clang/Gcc/Visual Studio, to break into readable source code, debug information is typically necessary. Disabling optimizations may be useful to break on the appropriate specific line, and have variables inspectable.
 
 <a id="realtime-clock"></a>
+
 ## Realtime Clock
 
 Realtime clock support is available via the API
@@ -290,10 +321,10 @@ uint getRealtimeClockLow();
 uint2 getRealtimeClock();
 ```
 
-On D3D this is supported through NVAPI via `NvGetSpecial`. 
+On D3D this is supported through NVAPI via `NvGetSpecial`.
 
 On Vulkan this is supported via [VK_KHR_shader_clock extension](https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_KHR_shader_clock.html)
 
 On CUDA this is supported via [clock](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#time-function).
 
-Currently this is not supported on CPU, although this will potentially be added in the future.
-\ No newline at end of file
+Currently this is not supported on CPU, although this will potentially be added in the future.
diff --git a/docs/user-guide/09-targets.md b/docs/user-guide/09-targets.md
index d6aebab0c..acebd1860 100644
--- a/docs/user-guide/09-targets.md
+++ b/docs/user-guide/09-targets.md
@@ -3,13 +3,11 @@ layout: user-guide
 permalink: /user-guide/targets
 ---
 
-Supported Compilation Targets
-============================
+# Supported Compilation Targets
 
 This chapter provides a brief overview of the compilation targets supported by Slang, and their different capabilities.
 
-Background and Terminology
---------------------------
+## Background and Terminology
 
 ### Code Formats
 
@@ -48,13 +46,13 @@ Just as applications can do computation outside of the dedicated compute pipelin
 
 The kernels that execute within a pipeline typically has access to four different kinds of data:
 
-* _Varying inputs_ coming from the system or from a preceding pipeline stage
+- _Varying inputs_ coming from the system or from a preceding pipeline stage
 
-* _Varying outputs_ which will be passed along to the system or to a following pipeline stage
+- _Varying outputs_ which will be passed along to the system or to a following pipeline stage
 
-* _Temporaries_ which are scratch memory or registers used by each invocation of the kernel and then dismissed on exit.
+- _Temporaries_ which are scratch memory or registers used by each invocation of the kernel and then dismissed on exit.
 
-* _Shader parameters_ (sometimes also called _uniform parameters_), which provide access to data from outside the pipeline dataflow
+- _Shader parameters_ (sometimes also called _uniform parameters_), which provide access to data from outside the pipeline dataflow
 
 The first three of these kinds of data are largely handled by the implementation of a pipeline.
 In contrast, an application programmer typically needs to manually prepare shader parameters, using the appropriate mechanisms and rules for each target platform.
@@ -100,8 +98,7 @@ Using root constants can eliminate some overheads from passing parameters of ord
 Passing a single `float` using a root constant rather than a buffer obviously eliminates a level of indirection.
 More importantly, though, using a root constant can avoid application code having to allocate and manage the lifetime of a buffer in a concurrent CPU/GPU program.
 
-Direct3D 11
------------
+## Direct3D 11
 
 Direct3D 11 (D3D11) is a older graphics API, but remains popular because it is much simpler to learn and use than some more recent APIs.
 In this section we will give an overview of the relevant features of D3D11 when used as a target platform for Slang.
@@ -117,28 +114,28 @@ D3D11 exposes two pipelines: rasterization and compute.
 
 The D3D11 rasterization pipeline can include up to five programmable stages, although most of them are optional:
 
-* The `vertex` stage (VS) transforms vertex data loaded from memory
+- The `vertex` stage (VS) transforms vertex data loaded from memory
 
-* The optional `hull` stage (HS) typically sets up and computes desired tessellation levels for a higher-order primitive
+- The optional `hull` stage (HS) typically sets up and computes desired tessellation levels for a higher-order primitive
 
-* The optional `domain` stage (DS) evaluates a higher-order surface at domain locations chosen by a fixed-function tessellator
+- The optional `domain` stage (DS) evaluates a higher-order surface at domain locations chosen by a fixed-function tessellator
 
-* The optional `geometry` stage  (GS) receives as input a primitive and can produce zero or more new primitives as output
+- The optional `geometry` stage (GS) receives as input a primitive and can produce zero or more new primitives as output
 
-* The optional `fragment` stage transforms fragments produced by the fixed-function rasterizer, determining the values for those fragments that will be merged with values in zero or more render targets. The fragment stage is sometimes called a "pixel" stage (PS), even when it does not process pixels.
+- The optional `fragment` stage transforms fragments produced by the fixed-function rasterizer, determining the values for those fragments that will be merged with values in zero or more render targets. The fragment stage is sometimes called a "pixel" stage (PS), even when it does not process pixels.
 
 ### Parameter Passing
 
 Shader parameters are passed to each D3D11 stage via slots.
 Each stage has its own slots of the following types:
 
-* **Constant buffers** are used for passing relatively small (4KB or less) amounts of data that will be read by GPU code. Constant bufers are passed via `b` registers.
+- **Constant buffers** are used for passing relatively small (4KB or less) amounts of data that will be read by GPU code. Constant bufers are passed via `b` registers.
 
-* **Shader resource views** (SRVs) include most textures, buffers, and other opaque resource types thare are read or sampled by GPU code. SRVs use `t` registers.
+- **Shader resource views** (SRVs) include most textures, buffers, and other opaque resource types thare are read or sampled by GPU code. SRVs use `t` registers.
 
-* **Unordered access views** (UAVs) include textures, buffers, and other opaque resource types used for write or read-write operations in GPU code. UAVs use `u` registers.
+- **Unordered access views** (UAVs) include textures, buffers, and other opaque resource types used for write or read-write operations in GPU code. UAVs use `u` registers.
 
-* **Samplers** are used to pass opaque texture-sampling stage, and use `s` registers.
+- **Samplers** are used to pass opaque texture-sampling stage, and use `s` registers.
 
 In addition, the D3D11 pipeline provides _vertex buffer_ slots and a single _index buffer_ slot to be used as the source vertex and index data that defines primitives.
 User-defined varying vertex shader inputs are bound to _vertex attribute_ slots (referred to as "input elements" in D3D11) which define how data from vertex buffers should be fetched to provide values for vertex attributes.
@@ -149,8 +146,7 @@ User-defined fragment shader varying outputs (with `SV_Target` binding semantics
 One notable detail of the D3D11 API is that the slots for fragment-stage UAVs and RTVs overlap.
 For example, a fragment kernel cannot use both `u0` and `SV_Target0` at once.
 
-Direct3D 12
------------
+## Direct3D 12
 
 Direct3D 12 (D3D12) is the current major version of the Direct3D API.
 
@@ -167,14 +163,15 @@ Revisions to D3D12 have added additional stages to the rasterization pipeline, a
 
 #### Mesh Shaders
 
-> #### Note ###
+> #### Note
+>
 > The Slang system does not currently support mesh shaders.
 
 The D3D12 rasterization pipeline provides alternative geometry processing stages that may be used as an alternative to the `vertex`, `hull`, `domain`, and `geometry` stages:
 
-* The `mesh` stage runs groups of threads which are responsible cooperating to produce both the vertex and index data for a _meshlet_ a bounded-size chunk of geometry.
+- The `mesh` stage runs groups of threads which are responsible cooperating to produce both the vertex and index data for a _meshlet_ a bounded-size chunk of geometry.
 
-* The optional `amplification` stage precedes the mesh stage and is responsible for determining how many mesh shader invocations should be run.
+- The optional `amplification` stage precedes the mesh stage and is responsible for determining how many mesh shader invocations should be run.
 
 Compared to the D3D11 pipeline without tesselllation (hull and domain shaders), a mesh shader is kind of like a combined/generalized vertex and geometry shader.
 
@@ -185,17 +182,17 @@ Compared to the D3D11 pipeline with tessellation, an amplification shader is kin
 The DirectX Ray Tracing (DXR) feature added a ray tracing pipeline to D3D12.
 The D3D12 ray tracing pipeline exposes the following programmable stages:
 
-* The ray generation (`raygeneration`) stage is similar to a compute stage, but can trace zero or more rays and make use of the results of those traces.
+- The ray generation (`raygeneration`) stage is similar to a compute stage, but can trace zero or more rays and make use of the results of those traces.
 
-* The `intersection` stage runs kernels to compute whether a ray intersects a user-defined primitive type. The system also includes a default intersector that handles triangle meshes.
+- The `intersection` stage runs kernels to compute whether a ray intersects a user-defined primitive type. The system also includes a default intersector that handles triangle meshes.
 
-* The so-called any-hit (`anyhit`) stage runs on _candidate_ hits where a ray has intersected some geometry, but the hit must be either accepted or rejected by application logic. Note that the any-hit stage does not necessarily run on *all* hits, because configuration options on both scene geometry and rays can lead to these checks being bypassed.
+- The so-called any-hit (`anyhit`) stage runs on _candidate_ hits where a ray has intersected some geometry, but the hit must be either accepted or rejected by application logic. Note that the any-hit stage does not necessarily run on _all_ hits, because configuration options on both scene geometry and rays can lead to these checks being bypassed.
 
-* The closest-hit (`closesthit`) stage runs a single _accepted_ hit for a ray; under typical circumstances this will be the closest hit to the origin of the ray. A typical closest-hit shader might compute the apparent color of a surface, similar to a typical fragment shader.
+- The closest-hit (`closesthit`) stage runs a single _accepted_ hit for a ray; under typical circumstances this will be the closest hit to the origin of the ray. A typical closest-hit shader might compute the apparent color of a surface, similar to a typical fragment shader.
 
-* The `miss` stage runs for rays that do not find or accept any hits in a scene. A typical miss shader might return a background color or sample an environment map.
+- The `miss` stage runs for rays that do not find or accept any hits in a scene. A typical miss shader might return a background color or sample an environment map.
 
-* The `callable` stage allows user-defined kernels to be invoked like subroutines in the context of the ray tracing pipeline.
+- The `callable` stage allows user-defined kernels to be invoked like subroutines in the context of the ray tracing pipeline.
 
 Compared to existing rasterization and compute pipelines, an important difference in the design of the D3D12 ray tracing pipeline is that multiple kernels can be loaded into the pipeline for each of the programming stages.
 The specific closest-hit, miss, or other kernel that runs for a given hit or ray is determined by indexing into an appropriate _shader table_, which is effectively an array of kernels.
@@ -204,7 +201,6 @@ The indexing into a shader table can depend on many factors including the type o
 Note that DXR version 1.1 adds ray tracing types and operations that can be used outside of the dedicated ray tracing pipeline.
 These new mechanisms have less visible impact for a programmer using or integrating Slang.
 
-
 ### Parameter Passing
 
 The mechanisms for parameter passing in D3D12 differ greatly from D3D11.
@@ -218,7 +214,7 @@ While shader parameters are bound registers and spaces, those registers and spac
 Instead, the configuration of the root parameters and the correspondence of registers/spaces to root parameters, blocks, and/or slots are defined by a _pipeline layout_ that D3D12 calls a "root signature."
 
 Unlike D3D11, all of the stages in a D3D12 pipeline share the same root parameters.
-A D3D12 pipeline layout can specify that certain root parameters or certain slots within blocks will only be accessed by a subset of stages, and can map the *same* register/space pair to different parameters/blocks/slots as long as this is done for disjoint subset of stages.
+A D3D12 pipeline layout can specify that certain root parameters or certain slots within blocks will only be accessed by a subset of stages, and can map the _same_ register/space pair to different parameters/blocks/slots as long as this is done for disjoint subset of stages.
 
 #### Ray Tracing Specifics
 
@@ -231,8 +227,7 @@ Shader parameters are still bound to registers and spaces as for non-ray-tracing
 One important detail is that some shader table entries are associated with a kernel for a single stage (e.g., a single miss shader), while other shader table entries are associated with a "hit group" consisting of up to one each of an intersection, any-hit, and closest-hit kernel.
 Because multiple kernels in a hit group share the same shader table entry, they also share the configured slots in that entry for binding root constants, blocks, etc.
 
-Vulkan
-------
+## Vulkan
 
 Vulkan is a cross-platform GPU API for graphics and compute with a detailed specification produced by a multi-vendor standards body.
 In contrast with OpenGL, Vulkan focuses on providing explicit control over as many aspects of GPU work as possible.
@@ -266,10 +261,10 @@ That is, a buffer and a texture both using `binding=2` in `set=3` for Vulkan wil
 The Vulkan ray tracing pipeline also uses a shader table, and also forms hit groups similar to D3D12.
 Unlike D3D12, each shader table entry in Vulkan can only be used to pass ordinary values (akin to root constants), and cannot be configured for binding of opaque types or blocks.
 
-OpenGL
-------
+## OpenGL
 
-> #### Note ####
+> #### Note
+>
 > Slang has only limited support for compiling code for OpenGL.
 
 OpenGL has existed for many years, and predates programmable GPU pipelines of the kind this chapter discusses; we will focus solely on use of OpenGL as an API for programmable GPU pipelines.
@@ -296,32 +291,71 @@ The binding index of a parameter is the zero-based index of the slot (of the app
 Note that while OpenGL and Vulkan both use binding indices for shader parameters like textures, the semantics of those are different because OpenGL uses distinct slots for passing buffers and textures.
 For OpenGL it is legal to have a texture that uses `binding=2` and a buffer that uses `binding=2` in the same kernel, because those are indices of distinct kinds of slots, while this scenario would typically be invalid for Vulkan.
 
-Metal
------
+## Metal
 
-> #### Note ####
+> #### Note
+>
 > Slang support for Metal is a work in progress.
 
-Metal is a shading language exclusive on Apple platforms. The functionality from Metal is similar to DX12 or Vulkan with more or less features.
+Metal is Apple's proprietary graphics and compute API for iOS and macOS
+platforms. It provides a modern, low-overhead architecture similar to Direct3D
+12 and Vulkan.
+
+Metal kernels must be compiled to the Metal Shading Language (MSL), which is
+based on C++14 with additional GPU-specific features and constraints. Unlike
+some other APIs, Metal does not use an intermediate representation - MSL source
+code is compiled directly to platform-specific binaries by Apple's compiler.
 
 ### Pipelines
 
-Metal includes vertex, fragment, task, mesh and tessellation stages for rasterization, as well as compute, and ray tracing stages.
+Metal supports rasterization, compute, and ray tracing pipelines.
 
-> #### Note ####
+> #### Note
+>
 > Ray-tracing support for Metal is a work in progress.
 
+The Metal rasterization pipeline includes the following programmable stages:
+
+- The vertex stage transforms vertex data loaded from memory
+
+- The optional mesh stage allows groups of threads to cooperatively generate geometry
+
+- The optional task stage can be used to control mesh shader invocations
+
+- The optional tessellation stages (kernel, post-tessellation vertex) enable hardware tessellation
+
+- The fragment stage processes fragments produced by the rasterizer
+
 ### Parameter Passing
 
-Metal uses slots for binding resources, and it has three types of bindings: buffer, texture and sampler.
-In addition, it has argument buffer which is itself a buffer, but any further resource members of the argument buffer does not occupy any explicit binding points, and instead set via an offset within the buffer referred to as id in the metal spec.
+Metal uses a combination of slots and blocks for parameter passing:
 
-Note that Metal 3.1 currently doesn't support arrays of buffers.
+- Resources (buffers, textures, samplers) are bound to slots using explicit
+  binding indices
 
-CUDA and OptiX
---------------
+- Argument buffers (similar to descriptor tables/sets in other APIs) can group
+  multiple resources together
 
-> #### Note ####
+- Each resource type (buffer, texture, sampler) has its own independent binding
+  space
+
+- Arguments within argument buffers are referenced by offset rather than
+  explicit bindings
+
+Unlike some other APIs, Metal:
+
+- Does not support arrays of buffers as of version 3.1
+- Shares binding slots across all pipeline stages
+- Uses argument buffers that can contain nested resources without consuming additional binding slots
+
+The Metal ray tracing pipeline follows similar parameter passing conventions to
+the rasterization and compute pipelines, while adding intersection,
+closest-hit, and miss stages comparable to those in Direct3D 12 and Vulkan.
+
+## CUDA and OptiX
+
+> #### Note
+>
 > Slang support for OptiX is a work in progress.
 
 CUDA C/C++ is a language for expressing heterogeneous CPU and GPU code with a simple interface to invoking GPU compute work.
@@ -330,7 +364,6 @@ We focus here on OptiX version 7 and up.
 
 CUDA and OptiX allow kernels to be loaded as GPU-specific binaries, or using the PTX intermediate language.
 
-
 ### Pipelines
 
 CUDA supports a compute pipeline that is similar to D3D12 or Vulkan, with additional features.
@@ -358,10 +391,10 @@ OptiX supports use of constant memory storage for ray tracing pipelines, where a
 OptiX uses a shader table for managing kernels and hit groups, and allows kernels to access the bytes of their shader table entry via a pointer.
 Similar to the compute pipeline, application code can layer many different policies on top of these mechanisms.
 
-CPU Compute
------------
+## CPU Compute
 
-> #### Note ####
+> #### Note
+>
 > Slang's support for CPU compute is functional, but not feature- or performance-complete.
 > Backwards-incompatible changes to this target may come in future versions of Slang.
 
@@ -379,8 +412,7 @@ Slang's CPU compute target supports only a compute pipeline.
 
 Because CPU target support flexible pointer-based addressing and large low-latency caches, a compute kernel can simply be passed a small fixed number of pointers and be relied upon to load parameter values of any types via indirection through those pointers.
 
-Summary
--------
+## Summary
 
 This chapter has reviewed the main target platforms supported by the Slang compiler and runtime system.
 A key point to take away is that there is great variation in the capabilities of these systems.
diff --git a/docs/user-guide/a2-02-metal-target-specific.md b/docs/user-guide/a2-02-metal-target-specific.md
new file mode 100644
index 000000000..a69f466a9
--- /dev/null
+++ b/docs/user-guide/a2-02-metal-target-specific.md
@@ -0,0 +1,269 @@
+---
+layout: user-guide
+permalink: /user-guide/metal-target-specific
+---
+
+# Metal-specific functionalities
+
+This chapter provides information for Metal-specific functionalities and
+behaviors in Slang.
+
+## Entry Point Parameter Handling
+
+Slang performs several transformations on entry point parameters when targeting Metal:
+
+- Struct parameters are flattened to eliminate nested structures
+- Input parameters with varying inputs are packed into a single struct
+- System value semantics are translated to Metal attributes
+- Parameters without semantics are given automatic attribute indices
+
+## System-Value semantics
+
+The system-value semantics are translated to the following Metal attributes:
+
+| SV semantic name            | Metal attribute                                      |
+| --------------------------- | ---------------------------------------------------- |
+| `SV_Position`               | `[[position]]`                                       |
+| `SV_Coverage`               | `[[sample_mask]]`                                    |
+| `SV_Depth`                  | `[[depth(any)]]`                                     |
+| `SV_DepthGreaterEqual`      | `[[depth(greater)]]`                                 |
+| `SV_DepthLessEqual`         | `[[depth(less)]]`                                    |
+| `SV_DispatchThreadID`       | `[[thread_position_in_grid]]`                        |
+| `SV_GroupID`                | `[[threadgroup_position_in_grid]]`                   |
+| `SV_GroupThreadID`          | `[[thread_position_in_threadgroup]]`                 |
+| `SV_GroupIndex`             | Calculated from `SV_GroupThreadID` and group extents |
+| `SV_InstanceID`             | `[[instance_id]]`                                    |
+| `SV_IsFrontFace`            | `[[front_facing]]`                                   |
+| `SV_PrimitiveID`            | `[[primitive_id]]`                                   |
+| `SV_RenderTargetArrayIndex` | `[[render_target_array_index]]`                      |
+| `SV_SampleIndex`            | `[[sample_id]]`                                      |
+| `SV_Target<N>`              | `[[color(N)]]`                                       |
+| `SV_VertexID`               | `[[vertex_id]]`                                      |
+| `SV_ViewportArrayIndex`     | `[[viewport_array_index]]`                           |
+
+Custom semantics are mapped to user attributes:
+
+- `[[user(SEMANTIC_NAME)]]` For non-system value semantics
+- `[[user(SEMANTIC_NAME_INDEX)]]` When semantic has an index
+
+## Interpolation Modifiers
+
+Slang maps interpolation modifiers to Metal's interpolation attributes:
+
+| Slang Interpolation | Metal Attribute             |
+| ------------------- | --------------------------- |
+| `nointerpolation`   | `[[flat]]`                  |
+| `noperspective`     | `[[center_no_perspective]]` |
+| `linear`            | `[[sample_no_perspective]]` |
+| `sample`            | `[[sample_perspective]]`    |
+| `centroid`          | `[[center_perspective]]`    |
+
+## Resource Types
+
+Resource types are translated with appropriate Metal qualifiers:
+
+| Slang Type            | Metal Translation  |
+| --------------------- | ------------------ |
+| `Texture2D`           | `texture2d`        |
+| `RWTexture2D`         | `texture2d`        |
+| `ByteAddressBuffer`   | `uint32_t device*` |
+| `StructuredBuffer<T>` | `device* T`        |
+| `ConstantBuffer<T>`   | `constant* T`      |
+
+| Slang Type                        | Metal Translation                     |
+| --------------------------------- | ------------------------------------- |
+| `Texture1D`                       | `texture1d`                           |
+| `Texture1DArray`                  | `texture1d_array`                     |
+| `RWTexture1D`                     | `texture1d`                           |
+| `RWTexture1DArray`                | `texture1d_array`                     |
+| `Texture2D`                       | `texture2d`                           |
+| `Texture2DArray`                  | `texture2d_array`                     |
+| `RWTexture2D`                     | `texture2d`                           |
+| `RWTexture2DArray`                | `texture2d_array`                     |
+| `Texture3D`                       | `texture3d`                           |
+| `RWTexture3D`                     | `texture3d`                           |
+| `TextureCube`                     | `texturecube`                         |
+| `TextureCubeArray`                | `texturecube_array`                   |
+| `Buffer<T>`                       | `device* T`                           |
+| `RWBuffer<T>`                     | `device* T`                           |
+| `ByteAddressBuffer`               | `device* uint32_t`                    |
+| `RWByteAddressBuffer`             | `device* uint32_t`                    |
+| `StructuredBuffer<T>`             | `device* T`                           |
+| `RWStructuredBuffer<T>`           | `device* T`                           |
+| `AppendStructuredBuffer<T>`       | `device* T`                           |
+| `ConsumeStructuredBuffer<T>`      | `device* T`                           |
+| `ConstantBuffer<T>`               | `constant* T`                         |
+| `SamplerState`                    | `sampler`                             |
+| `SamplerComparisonState`          | `sampler`                             |
+| `RaytracingAccelerationStructure` | `(Not supported)`                     |
+| `RasterizerOrderedTexture2D`      | `texture2d [[raster_order_group(0)]]` |
+| `RasterizerOrderedBuffer<T>`      | `device* T [[raster_order_group(0)]]` |
+
+Raster-ordered access resources receive the `[[raster_order_group(0)]]`
+attribute, for example `texture2d<float, access::read_write> tex
+[[raster_order_group(0)]]`.
+
+# Array Types
+
+Array types in Metal are declared using the array template:
+
+| Slang Type          | Metal Translation          |
+| ------------------- | -------------------------- |
+| `ElementType[Size]` | `array<ElementType, Size>` |
+
+# Matrix Layout
+
+Metal exclusively uses column-major matrix layout. Slang automatically handles
+the translation of matrix operations to maintain correct semantics:
+
+- Matrix multiplication is transformed to account for layout differences
+- Matrix types are declared as `matrix<T, Columns, Rows>`, for example
+  `float3x4` is represented as `matrix<float, 3, 4>`
+
+# Mesh Shader Support
+
+Mesh shaders can be targeted using the following types and syntax. The same as task/mesh shaders generally in Slang.
+
+```slang
+[outputtopology("triangle")]
+[numthreads(12, 1, 1)]
+void meshMain(
+    in uint tig: SV_GroupIndex,
+    in payload MeshPayload meshPayload,
+    OutputVertices<Vertex, MAX_VERTS> verts,
+    OutputIndices<uint3, MAX_PRIMS> triangles,
+    OutputPrimitives<Primitive, MAX_PRIMS> primitives
+    )
+```
+
+## Header Inclusions and Namespace
+
+When targeting Metal, Slang automatically includes the following headers, these
+are available to any intrinsic code.
+
+```cpp
+#include <metal_stdlib>
+#include <metal_math>
+#include <metal_texture>
+using namespace metal;
+```
+
+## Parameter blocks and Argument Buffers
+
+`ParameterBlock` values are translated into _Argument Buffers_ potentially
+containing nested resources. For example this Slang code...
+
+```slang
+struct MyParameters
+{
+    int x;
+    int y;
+    StructuredBuffer<float> buffer1;
+    RWStructuredBuffer<uint3> buffer2;
+}
+
+ParameterBlock<MyParameters> gObj;
+
+void main(){ ... gObj ... }
+```
+
+... results in this Metal output:
+
+```cpp
+struct MyParameters
+{
+    int x;
+    int y;
+    float device* buffer1;
+    uint3 device* buffer2;
+};
+
+[[kernel]] void main(MyParameters constant* gObj [[buffer(1)]])
+```
+
+## Struct Parameter Flattening
+
+When targeting Metal, top-level nested struct parameters are automatically
+flattened. For example:
+
+```slang
+struct NestedStruct
+{
+    float2 uv;
+};
+struct InputStruct
+{
+    float4 position;
+    float3 normal;
+    NestedStruct nested;
+};
+```
+
+Will be flattened to:
+
+```cpp
+struct InputStruct
+{
+    float4 position;
+    float3 normal;
+    float2 uv;
+};
+```
+
+## Return Value Handling
+
+Non-struct return values from entry points are automatically wrapped in a
+struct with appropriate semantics. For example:
+
+```slang
+float4 main() : SV_Target
+{
+    return float4(1,2,3,4);
+}
+```
+
+becomes:
+
+```c++
+struct FragmentOutput
+{
+    float4 value : SV_Target;
+};
+FragmentOutput main()
+{
+    return { float4(1,2,3,4) };
+}
+```
+
+## Value Type Conversion
+
+Metal enforces strict type requirements for certain operations. Slang
+automatically performs the following conversions:
+
+- Vector size expansion (e.g., float2 to float4), for example when the user
+  specified `float2` but the semantic type in Metal is float4.
+- Image store value expansion to 4-components
+
+For example:
+
+```slang
+RWTexture2D<float2> tex;
+tex[coord] = float2(1,2);  // Automatically expanded to float4(1,2,0,0)
+```
+
+## Conservative Rasterization
+
+Since Metal doesn't support conservative rasterization, SV_InnerCoverage is always false.
+
+## Address Space Assignment
+
+Metal requires explicit address space qualifiers. Slang automatically assigns appropriate address spaces:
+
+| Variable Type         | Metal Address Space |
+| --------------------- | ------------------- |
+| Local Variables       | `thread`            |
+| Global Variables      | `device`            |
+| Uniform Buffers       | `constant`          |
+| RW/Structured Buffers | `device`            |
+| Group Shared          | `threadgroup`       |
+| Parameter Blocks      | `constant`          |
author	Ellie Hermaszewska <ellieh@nvidia.com>	2024-11-20 02:23:59 +0800
committer	GitHub <noreply@github.com>	2024-11-19 10:23:59 -0800
commit	6e4473d1dc18d6a1d6c8e57df7af649f67719419 (patch)
tree	ef8bd7e79d83add61aad385292460da1d5e2e8c6
parent	0bf6a668208c65c980648fbe74a8c0a7bf4ded77 (diff)