summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorjsmall-nvidia <jsmall@nvidia.com>2021-05-15 11:45:58 -0400
committerGitHub <noreply@github.com>2021-05-15 11:45:58 -0400
commitd5e8044d0a9723bb0bbd7ae1738d1157265da783 (patch)
treed330e87e67646fd6e978e4debad17b4f7fbe2c40
parentbfe75618be81566882be8570b8db82ad5a2f8fe4 (diff)
Read half->float RWTexture conversion (#1842)
* #include an absolute path didn't work - because paths were taken to always be relative. * Fix for writing to RWTexture with half types on CUDA. * CUDA half functionality doc updates. * First pass support for sust.p RWTexture format conversion on write. * Tidy up implementation of $C. Made clamping mode #define able. * A simple test for RWTexture CUDA format conversion. * Add support for float2 and float4. * WIP conversion testing. * Use $E to fix byte addressing in X in CUDA. * Do not scale when accessing via _convert versions of surface functions. * Revert to previous test. * Test with half/float convert write/read. * More broad half->float read conversion testing. * Improve documentation around half and RWTexture conversion.
-rw-r--r--docs/cuda-target.md57
-rw-r--r--docs/target-compatibility.md3
-rw-r--r--prelude/slang-cuda-prelude.h90
-rw-r--r--source/slang/core.meta.slang2
-rw-r--r--source/slang/slang-intrinsic-expand.cpp86
-rw-r--r--tests/compute/half-rw-texture-convert.slang14
-rw-r--r--tests/compute/half-rw-texture-convert2.slang53
-rw-r--r--tests/compute/half-rw-texture-convert2.slang.expected.txt5
8 files changed, 276 insertions, 34 deletions
diff --git a/docs/cuda-target.md b/docs/cuda-target.md
index 47b058f50..d5affc0e9 100644
--- a/docs/cuda-target.md
+++ b/docs/cuda-target.md
@@ -20,11 +20,11 @@ These limitations apply to Slang transpiling to CUDA.
* Samplers are not separate objects in CUDA - they are combined into a single 'TextureObject'. So samplers are effectively ignored on CUDA targets.
* When using a TextureArray.Sample (layered texture in CUDA) - the index will be treated as an int, as this is all CUDA allows
* Care must be used in using `WaveGetLaneIndex` wave intrinsic - it will only give the right results for appropriate launches
-* CUDA 'surfaces' are used for textures which are read/write. CUDA does NOT do format conversion with surfaces.
+* CUDA 'surfaces' are used for textures which are read/write (aka RWTexture).
The following are a work in progress or not implemented but are planned to be so in the future
-* Some resource types remain unsupported, and not all methods on types are supported
+* Some resource types remain unsupported, and not all methods on all types are supported
# How it works
@@ -122,8 +122,6 @@ The UniformState and UniformEntryPointParams struct typically vary by shader. Un
size_t sizeInBytes;
```
-
-
## Texture
Read only textures will be bound as the opaque CUDA type CUtexObject. This type is the combination of both a texture AND a sampler. This is somewhat different from HLSL, where there can be separate `SamplerState` variables. This allows access of a single texture binding with different types of sampling.
@@ -138,11 +136,58 @@ Load is only supported for Texture1D, and the mip map selection argument is igno
RWTexture types are converted into CUsurfObject type.
-In CUDA it is not possible to do a format conversion on an access to a CUsurfObject, so it must be backed by the same data format as is used within the Slang source code.
+In regular CUDA it is not possible to do a format conversion on an access to a CUsurfObject. Slang does add support for hardware write conversions where they are available. To enable the feature it is necessary to attribute your RWTexture with `format`. For example
+
+```
+[format("rg16f")]
+RWTexture2D<float2> rwt2D_2;
+```
+
+The format names used are the same as for (GLSL layout format types)[https://www.khronos.org/opengl/wiki/Layout_Qualifier_(GLSL)]. If no format is specified Slang will *assume* that the format is the same as the type specified.
+
+Note that the format attribution is on variables/paramters/fields and not part of the type system. This means that if you have a scenario like...
+
+```
+[format(rg16f)]
+RWTexture2d<float2> g_texture;
+
+float2 getValue(RWTexture2D<float2> t)
+{
+ return t[int2(0, 0];
+}
+
+void doThing()
+{
+ float2 v = getValue(g_texture);
+}
+```
+
+Even `getValue` will receive t *without* the format attribute, and so will access it, presumably erroneously. A work around for this specific scenario would be to attribute the parameter
+
+```
+float2 getValue([format("rg16f")] RWTexture2D<float2> t)
+{
+ return t[int2(0, 0];
+}
+```
+
+This will only work correctly if `getValue` is called with a `t` that has that format attribute. As it stands no checking is performed on this matching so no error or warning will be produced if there is a mismatch.
+
+There is limited software support for doing a conversion on reading. Currently this only supports only 1D, 2D, 3D RWTexture, backed with half1, half2 or half4. For this path to work NVRTC must have the `cuda_fp16.h` and associated files available. Please check the section on `Half Support`.
+
+If hardware read conversions are desired, this can be achieved by having a Texture<T> that uses the surface of a RWTexture<T>. Using the Texture<T> not only allows hardware conversion but also filtering.
It is also worth noting that CUsurfObjects in CUDA are NOT allowed to have mip maps.
-By default surface access uses cudaBoundaryModeZero, this can be replaced using the macro SLANG_CUDA_BOUNDARY_MODE in the CUDA prelude.
+By default surface access uses cudaBoundaryModeZero, this can be replaced using the macro SLANG_CUDA_BOUNDARY_MODE in the CUDA prelude. For HW format conversions the macro SLANG_PTX_BOUNDARY_MODE. These boundary settings are in effect global for the whole of the kernel.
+
+`SLANG_CUDA_BOUNDARY_MODE` can be one of
+
+* cudaBoundaryModeZero causes an execution trap on out-of-bounds addresses
+* cudaBoundaryModeClamp stores data at the nearest surface location (sized appropriately)
+* cudaBoundaryModeTrap drops stores to out-of-bounds addresses
+
+`SLANG_PTX_BOUNDARY_MODE` can be one of `trap`, `clamp` or `zero`. In general it is recommended to have both set to the same type of value, for example `cudaBoundaryModeZero` and `zero`.
## Sampler
diff --git a/docs/target-compatibility.md b/docs/target-compatibility.md
index 54d7bf997..6fda4ceed 100644
--- a/docs/target-compatibility.md
+++ b/docs/target-compatibility.md
@@ -1,7 +1,6 @@
Slang Target Compatibility
==========================
-
Shader Model (SM) numbers are D3D Shader Model versions, unless explicitly stated otherwise.
OpenGL compatibility is not listed here, because OpenGL isn't an officially supported target.
@@ -203,8 +202,6 @@ uint64_t RWByteAddressBuffer::InterlockedMinU64(uint byteAddress, uint64_t value
uint64_t RWByteAddressBuffer::InterlockedAndU64(uint byteAddress, uint64_t value);
uint64_t RWByteAddressBuffer::InterlockedOrU64(uint byteAddress, uint64_t value);
uint64_t RWByteAddressBuffer::InterlockedXorU64(uint byteAddress, uint64_t value);
-
-
```
On HLSL based targets this functionality is achieved using [NVAPI](https://developer.nvidia.com/nvapi). Support for NVAPI is described
diff --git a/prelude/slang-cuda-prelude.h b/prelude/slang-cuda-prelude.h
index a18da027b..4df60e965 100644
--- a/prelude/slang-cuda-prelude.h
+++ b/prelude/slang-cuda-prelude.h
@@ -381,6 +381,41 @@ SLANG_SURFACE_WRITE(surf2DLayeredwrite, (int x, int y, int layer), (x, y, layer)
SLANG_SURFACE_WRITE(surfCubemapwrite, (int x, int y, int face), (x, y, face))
SLANG_SURFACE_WRITE(surfCubemapLayeredwrite, (int x, int y, int layerFace), (x, y, layerFace))
+// ! Hack to test out reading !!!
+// Only works converting *from* half
+
+//template <typename T>
+//SLANG_FORCE_INLINE SLANG_CUDA_CALL T surf2Dread_convert(cudaSurfaceObject_t surfObj, int x, int y, cudaSurfaceBoundaryMode boundaryMode);
+
+#define SLANG_SURFACE_READ_HALF_CONVERT(FUNC_NAME, TYPE_ARGS, ARGS) \
+\
+template <typename T> \
+SLANG_FORCE_INLINE SLANG_CUDA_CALL T FUNC_NAME##_convert(cudaSurfaceObject_t surfObj, SLANG_DROP_PARENS TYPE_ARGS, cudaSurfaceBoundaryMode boundaryMode); \
+\
+template <> \
+SLANG_FORCE_INLINE SLANG_CUDA_CALL float FUNC_NAME##_convert<float>(cudaSurfaceObject_t surfObj, SLANG_DROP_PARENS TYPE_ARGS, cudaSurfaceBoundaryMode boundaryMode) \
+{ \
+ return __ushort_as_half(FUNC_NAME<uint16_t>(surfObj, SLANG_DROP_PARENS ARGS, boundaryMode)); \
+} \
+\
+template <> \
+SLANG_FORCE_INLINE SLANG_CUDA_CALL float2 FUNC_NAME##_convert<float2>(cudaSurfaceObject_t surfObj, SLANG_DROP_PARENS TYPE_ARGS, cudaSurfaceBoundaryMode boundaryMode) \
+{ \
+ const __half2 v = __ushort_as_half(FUNC_NAME<ushort2>(surfObj, SLANG_DROP_PARENS ARGS, boundaryMode)); \
+ return float2{v.x, v.y}; \
+} \
+\
+template <> \
+SLANG_FORCE_INLINE SLANG_CUDA_CALL float4 FUNC_NAME##_convert<float4>(cudaSurfaceObject_t surfObj, SLANG_DROP_PARENS TYPE_ARGS, cudaSurfaceBoundaryMode boundaryMode) \
+{ \
+ const __half4 v = __ushort_as_half(FUNC_NAME<ushort4>(surfObj, SLANG_DROP_PARENS ARGS, boundaryMode)); \
+ return float4{v.xy.x, v.xy.y, v.zw.x, v.zw.y}; \
+}
+
+SLANG_SURFACE_READ_HALF_CONVERT(surf1Dread, (int x), (x))
+SLANG_SURFACE_READ_HALF_CONVERT(surf2Dread, (int x, int y), (x, y))
+SLANG_SURFACE_READ_HALF_CONVERT(surf3Dread, (int x, int y, int z), (x, y, z))
+
#endif
// Support for doing format conversion when writing to a surface/RWTexture
@@ -392,10 +427,14 @@ template <typename T>
SLANG_FORCE_INLINE SLANG_CUDA_CALL void surf1Dwrite_convert(T, cudaSurfaceObject_t surfObj, int x, cudaSurfaceBoundaryMode boundaryMode);
template <typename T>
SLANG_FORCE_INLINE SLANG_CUDA_CALL void surf2Dwrite_convert(T, cudaSurfaceObject_t surfObj, int x, int y, cudaSurfaceBoundaryMode boundaryMode);
+template <typename T>
+SLANG_FORCE_INLINE SLANG_CUDA_CALL void surf3Dwrite_convert(T, cudaSurfaceObject_t surfObj, int x, int y, int z, cudaSurfaceBoundaryMode boundaryMode);
// https://docs.nvidia.com/cuda/inline-ptx-assembly/index.html
// https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#surface-instructions-sust
+// Float
+
template <>
SLANG_FORCE_INLINE SLANG_CUDA_CALL void surf1Dwrite_convert<float>(float v, cudaSurfaceObject_t surfObj, int x, cudaSurfaceBoundaryMode boundaryMode)
{
@@ -408,6 +447,57 @@ SLANG_FORCE_INLINE SLANG_CUDA_CALL void surf2Dwrite_convert<float>(float v, cuda
asm volatile ( "{sust.p.2d.b32." SLANG_PTX_BOUNDARY_MODE " [%0, {%1,%2}], {%3};}\n\t" :: "l"(surfObj),"r"(x),"r"(y),"f"(v));
}
+template <>
+SLANG_FORCE_INLINE SLANG_CUDA_CALL void surf3Dwrite_convert<float>(float v, cudaSurfaceObject_t surfObj, int x, int y, int z, cudaSurfaceBoundaryMode boundaryMode)
+{
+ asm volatile ( "{sust.p.2d.b32." SLANG_PTX_BOUNDARY_MODE " [%0, {%1,%2,%3}], {%4};}\n\t" :: "l"(surfObj),"r"(x),"r"(y),"r"(z),"f"(v));
+}
+
+// Float2
+
+template <>
+SLANG_FORCE_INLINE SLANG_CUDA_CALL void surf1Dwrite_convert<float2>(float2 v, cudaSurfaceObject_t surfObj, int x, cudaSurfaceBoundaryMode boundaryMode)
+{
+ const float vx = v.x, vy = v.y;
+ asm volatile ( "{sust.p.1d.b32." SLANG_PTX_BOUNDARY_MODE " [%0, {%1}], {%2,%3};}\n\t" :: "l"(surfObj),"r"(x),"f"(vx),"f"(vy));
+}
+
+template <>
+SLANG_FORCE_INLINE SLANG_CUDA_CALL void surf2Dwrite_convert<float2>(float2 v, cudaSurfaceObject_t surfObj, int x, int y, cudaSurfaceBoundaryMode boundaryMode)
+{
+ const float vx = v.x, vy = v.y;
+ asm volatile ( "{sust.p.2d.b32." SLANG_PTX_BOUNDARY_MODE " [%0, {%1,%2}], {%3,%4};}\n\t" :: "l"(surfObj),"r"(x),"r"(y),"f"(vx),"f"(vy));
+}
+
+template <>
+SLANG_FORCE_INLINE SLANG_CUDA_CALL void surf3Dwrite_convert<float2>(float2 v, cudaSurfaceObject_t surfObj, int x, int y, int z, cudaSurfaceBoundaryMode boundaryMode)
+{
+ const float vx = v.x, vy = v.y;
+ asm volatile ( "{sust.p.2d.b32." SLANG_PTX_BOUNDARY_MODE " [%0, {%1,%2,%3}], {%4,%5};}\n\t" :: "l"(surfObj),"r"(x),"r"(y),"r"(z),"f"(vx),"f"(vy));
+}
+
+// Float4
+template <>
+SLANG_FORCE_INLINE SLANG_CUDA_CALL void surf1Dwrite_convert<float4>(float4 v, cudaSurfaceObject_t surfObj, int x, cudaSurfaceBoundaryMode boundaryMode)
+{
+ const float vx = v.x, vy = v.y, vz = v.z, vw = v.w;
+ asm volatile ( "{sust.p.1d.b32." SLANG_PTX_BOUNDARY_MODE " [%0, {%1}], {%2,%3,%4,%5};}\n\t" :: "l"(surfObj),"r"(x),"f"(vx),"f"(vy),"f"(vz),"f"(vw));
+}
+
+template <>
+SLANG_FORCE_INLINE SLANG_CUDA_CALL void surf2Dwrite_convert<float4>(float4 v, cudaSurfaceObject_t surfObj, int x, int y, cudaSurfaceBoundaryMode boundaryMode)
+{
+ const float vx = v.x, vy = v.y, vz = v.z, vw = v.w;
+ asm volatile ( "{sust.p.2d.b32." SLANG_PTX_BOUNDARY_MODE " [%0, {%1,%2}], {%3,%4,%5,%6};}\n\t" :: "l"(surfObj),"r"(x),"r"(y),"f"(vx),"f"(vy),"f"(vz),"f"(vw));
+}
+
+template <>
+SLANG_FORCE_INLINE SLANG_CUDA_CALL void surf3Dwrite_convert<float4>(float4 v, cudaSurfaceObject_t surfObj, int x, int y, int z, cudaSurfaceBoundaryMode boundaryMode)
+{
+ const float vx = v.x, vy = v.y, vz = v.z, vw = v.w;
+ asm volatile ( "{sust.p.2d.b32." SLANG_PTX_BOUNDARY_MODE " [%0, {%1,%2,%3}], {%4,%5,%6,%7};}\n\t" :: "l"(surfObj),"r"(x),"r"(y),"r"(z),"f"(vx),"f"(vy),"f"(vz),"f"(vw));
+}
+
// ----------------------------- F32 -----------------------------------------
// Unary
diff --git a/source/slang/core.meta.slang b/source/slang/core.meta.slang
index 9e5cf80c8..6b73630a3 100644
--- a/source/slang/core.meta.slang
+++ b/source/slang/core.meta.slang
@@ -1083,7 +1083,7 @@ for (int tt = 0; tt < kBaseTextureTypeCount; ++tt)
}
sb << (isArray ? "Layered" : "");
- sb << "read<$T0>($0";
+ sb << "read$C<$T0>($0";
for (int i = 0; i < vecCount; ++i)
{
diff --git a/source/slang/slang-intrinsic-expand.cpp b/source/slang/slang-intrinsic-expand.cpp
index c5bedbc37..bd2e17b28 100644
--- a/source/slang/slang-intrinsic-expand.cpp
+++ b/source/slang/slang-intrinsic-expand.cpp
@@ -1,6 +1,8 @@
// slang-intrinsic-expand.cpp
#include "slang-intrinsic-expand.h"
+#include "slang-emit-cuda.h"
+
namespace Slang {
void IntrinsicExpandContext::emit(IRCall* inst, IRUse* args, Int argCount, const UnownedStringSlice& intrinsicText)
@@ -101,13 +103,13 @@ static BaseType _getBaseTypeFromScalarType(SlangScalarType type)
// The VK back-end gets away with this kind of coincidentally, since the "legalization" we have to do for resources means that there wouldn't be a single f() function any more.
// But for CUDA and C++ that's not the case or generally desirable.
-static IRFormatDecoration* _findImageFormatDecoration(IRInst* inst)
+static IRFormatDecoration* _findImageFormatDecoration(IRInst* resourceInst)
{
// JS(TODO):
// There could perhaps be other situations, that need to be covered
// If this is a load, we need to get the decoration from the field key
- if (IRLoad* load = as<IRLoad>(inst))
+ if (IRLoad* load = as<IRLoad>(resourceInst))
{
if (IRFieldAddress* fieldAddress = as<IRFieldAddress>(load->getOperand(0)))
{
@@ -116,7 +118,7 @@ static IRFormatDecoration* _findImageFormatDecoration(IRInst* inst)
}
}
// Otherwise just try on the instruction
- return inst->findDecoration<IRFormatDecoration>();
+ return resourceInst->findDecoration<IRFormatDecoration>();
}
// Returns true if dataType and imageFormat are compatible - that they have the same representation,
@@ -149,36 +151,26 @@ static bool _isImageFormatCompatible(ImageFormat imageFormat, IRType* dataType)
return formatBaseType == baseType;
}
-static bool _isConvertRequired(ImageFormat imageFormat, IRInst* resourceVar)
+static bool _isConvertRequired(ImageFormat imageFormat, IRInst* callee)
{
- auto textureType = as<IRTextureTypeBase>(resourceVar->getDataType());
+ auto textureType = as<IRTextureTypeBase>(callee->getDataType());
IRType* elementType = textureType ? textureType->getElementType() : nullptr;
return elementType && !_isImageFormatCompatible(imageFormat, elementType);
}
-static size_t _calcBackingElementSizeInBytes(IRInst* resourceVar)
+static size_t _calcBackingElementSizeInBytes(IRInst* resourceInst)
{
// First see if there is a format associated with the resource
- if (IRFormatDecoration* formatDecoration = _findImageFormatDecoration(resourceVar))
+ if (IRFormatDecoration* formatDecoration = _findImageFormatDecoration(resourceInst))
{
- const ImageFormat imageFormat = formatDecoration->getFormat();
-
- if (_isConvertRequired(imageFormat, resourceVar))
- {
- // If the access is a converting access then the x coordinate is *NOT* scaled
- // This is a CUDA specific issue(!).
- return 1;
- }
-
- const auto& imageFormatInfo = getImageFormatInfo(imageFormat);
- return imageFormatInfo.sizeInBytes;
+ return getImageFormatInfo(formatDecoration->getFormat()).sizeInBytes;
}
else
{
// If not we *assume* the backing format is the same as the element type used for access.
/// Ie in RWTexture<T>, this would return sizeof(T)
- auto textureType = as<IRTextureTypeBase>(resourceVar->getDataType());
+ auto textureType = as<IRTextureTypeBase>(resourceInst->getDataType());
IRType* elementType = textureType ? textureType->getElementType() : nullptr;
if (elementType)
@@ -206,6 +198,18 @@ static size_t _calcBackingElementSizeInBytes(IRInst* resourceVar)
return 4;
}
+static bool _isResourceRead(IRCall* call)
+{
+ IRType* returnType = call->getDataType();
+ return returnType && (as<IRVoidType>(returnType) == nullptr);
+}
+
+static bool _isResourceWrite(IRCall* call)
+{
+ IRType* returnType = call->getDataType();
+ return returnType && (as<IRVoidType>(returnType) != nullptr);
+}
+
const char* IntrinsicExpandContext::_emitSpecial(const char* cursor)
{
const char*const end = m_text.end();
@@ -323,13 +327,35 @@ const char* IntrinsicExpandContext::_emitSpecial(const char* cursor)
// writes that will do a format conversion.
if (m_emitter->getTarget() == CodeGenTarget::CUDASource)
{
- IRInst* arg0 = m_callInst->getArg(0);
+ IRInst* resourceInst = m_callInst->getArg(0);
- if (IRFormatDecoration* formatDecoration = _findImageFormatDecoration(arg0))
+ if (IRFormatDecoration* formatDecoration = _findImageFormatDecoration(resourceInst))
{
const ImageFormat imageFormat = formatDecoration->getFormat();
- if (_isConvertRequired(imageFormat, arg0))
+ if (_isConvertRequired(imageFormat, resourceInst))
{
+ // If the function returns something it's a reader so we may need to convert
+ // and in doing so require half
+ if (_isResourceRead(m_callInst))
+ {
+ // If the source format if half derived, then we need to enable half
+ switch (imageFormat)
+ {
+ case ImageFormat::r16f:
+ case ImageFormat::rg16f:
+ case ImageFormat::rgba16f:
+ {
+ CUDAExtensionTracker* extensionTracker = as<CUDAExtensionTracker>(m_emitter->getExtensionTracker());
+ if (extensionTracker)
+ {
+ extensionTracker->requireBaseType(BaseType::Half);
+ }
+ break;
+ }
+ default: break;
+ }
+ }
+
// Append _convert on the name to signify we need to use a code path, that will automatically
// do the format conversion.
m_writer->emit("_convert");
@@ -344,7 +370,21 @@ const char* IntrinsicExpandContext::_emitSpecial(const char* cursor)
/// Sometimes accesses need to be scaled. For example in CUDA the x coordinate for surface
/// access is byte addressed.
/// $E will return the byte size of the *backing element*.
- size_t elemSizeInBytes = _calcBackingElementSizeInBytes(m_callInst->getArg(0));
+
+ IRInst* resourceInst = m_callInst->getArg(0);
+ size_t elemSizeInBytes = _calcBackingElementSizeInBytes(resourceInst);
+
+ // If we have a format converstion and its a *write* we don't need to scale
+ if (IRFormatDecoration* formatDecoration = _findImageFormatDecoration(resourceInst))
+ {
+ const ImageFormat imageFormat = formatDecoration->getFormat();
+ if (_isConvertRequired(imageFormat, resourceInst) && _isResourceWrite(m_callInst))
+ {
+ // If there is a conversion *and* it's a write we don't need to scale.
+ elemSizeInBytes = 1;
+ }
+ }
+
SLANG_ASSERT(elemSizeInBytes > 0);
m_writer->emitUInt64(UInt64(elemSizeInBytes));
break;
diff --git a/tests/compute/half-rw-texture-convert.slang b/tests/compute/half-rw-texture-convert.slang
index cf6eea4ea..338f44454 100644
--- a/tests/compute/half-rw-texture-convert.slang
+++ b/tests/compute/half-rw-texture-convert.slang
@@ -25,6 +25,14 @@
[format("r16f")]
RWTexture2D<float> rwt2D;
+//TEST_INPUT: RWTexture2D(format=RG_Float16, size=4, content = one, mipMaps = 1):name rwt2D_2
+[format("rg16f")]
+RWTexture2D<float2> rwt2D_2;
+
+//TEST_INPUT: RWTexture2D(format=RGBA_Float16, size=4, content = one, mipMaps = 1):name rwt2D_4
+[format("rgba16f")]
+RWTexture2D<float4> rwt2D_4;
+
//TEST_INPUT: ubuffer(data=[0 0 0 0], stride=4):out,name outputBuffer
RWStructuredBuffer<float> outputBuffer;
@@ -38,5 +46,9 @@ void computeMain(uint3 dispatchThreadID : SV_DispatchThreadID)
// Do a format converting write!
rwt2D[uint2(idx, idx)] = val;
+ rwt2D_2[uint2(idx, idx)] = float2(val * 2, val * 3);
+
+ rwt2D_4[uint2(idx, idx)] = float4(val + 1, val - 1, val * 4, val * -4);
+
outputBuffer[idx] = val;
-}
+} \ No newline at end of file
diff --git a/tests/compute/half-rw-texture-convert2.slang b/tests/compute/half-rw-texture-convert2.slang
new file mode 100644
index 000000000..e9b7200c4
--- /dev/null
+++ b/tests/compute/half-rw-texture-convert2.slang
@@ -0,0 +1,53 @@
+// Native half not supported on CPU currently
+//DISABLE_TEST(compute):COMPARE_COMPUTE_EX:-cpu -compute -output-using-type -shaderobj
+// Doesn't work on DX11 currently - locks up on binding
+//DISABLE_TEST(compute):COMPARE_COMPUTE_EX:-slang -compute -output-using-type -shaderobj
+// Produces a different result on DX12 with DXBC than expected(!). So disabled for now
+//DISABLE_TEST(compute):COMPARE_COMPUTE_EX:-slang -compute -dx12 -output-using-type -shaderobj
+//DISABLE_TEST(compute):COMPARE_COMPUTE_EX:-slang -compute -dx12 -profile cs_6_0 -use-dxil -output-using-type -shaderobj
+// TODO(JS): Doesn't work on vk currently, because createTextureView not implemented on vk renderer
+//DIABLE_TEST(compute, vulkan):COMPARE_COMPUTE_EX:-vk -compute -output-using-type -shaderobj
+
+//TEST(compute):COMPARE_COMPUTE_EX:-cuda -compute -output-using-type -shaderobj -render-features half
+
+//TEST_INPUT: RWTexture2D(format=R_Float16, size=4, content = one, mipMaps = 1):name rwt2D
+[format("r16f")]
+RWTexture2D<float> rwt2D;
+
+//TEST_INPUT: RWTexture2D(format=RG_Float16, size=4, content = one, mipMaps = 1):name rwt2D_2
+[format("rg16f")]
+RWTexture2D<float2> rwt2D_2;
+
+//TEST_INPUT: RWTexture2D(format=RGBA_Float16, size=4, content = one, mipMaps = 1):name rwt2D_4
+[format("rgba16f")]
+RWTexture2D<float4> rwt2D_4;
+
+//TEST_INPUT: ubuffer(data=[0 0 0 0], stride=4):out,name outputBuffer
+RWStructuredBuffer<float> outputBuffer;
+
+[numthreads(4, 1, 1)]
+void computeMain(uint3 dispatchThreadID : SV_DispatchThreadID)
+{
+ int idx = dispatchThreadID.x;
+
+ float val = 0;
+
+ // Do a format converting write2!
+ rwt2D[uint2(idx, idx)] = idx - 1;
+ rwt2D_2[uint2(idx, idx)] = float2(idx + idx, idx * idx);
+ rwt2D_4[uint2(idx, idx)] = float4(idx + 97, idx + 8, idx + 16, idx + 24);
+
+ // May not be strictly necessary
+ AllMemoryBarrierWithGroupSync();
+
+ // Do read converting
+ // There is *only* CUDA support for half/float *converting* reads for 1d, 2d, 3d shapes for RWTexture/surface
+
+ float4 v4 = rwt2D_4[uint2(idx, idx)];
+ float2 v2 = rwt2D_2[uint2(idx, idx)];
+ float v = rwt2D[uint2(idx, idx)];
+
+ val += v4.x + v2.x + v;
+
+ outputBuffer[idx] = val;
+}
diff --git a/tests/compute/half-rw-texture-convert2.slang.expected.txt b/tests/compute/half-rw-texture-convert2.slang.expected.txt
new file mode 100644
index 000000000..462941fda
--- /dev/null
+++ b/tests/compute/half-rw-texture-convert2.slang.expected.txt
@@ -0,0 +1,5 @@
+type: float
+96.000000
+100.000000
+104.000000
+108.000000