diff options
| -rw-r--r-- | docs/cuda-target.md | 57 | ||||
| -rw-r--r-- | docs/target-compatibility.md | 3 | ||||
| -rw-r--r-- | prelude/slang-cuda-prelude.h | 90 | ||||
| -rw-r--r-- | source/slang/core.meta.slang | 2 | ||||
| -rw-r--r-- | source/slang/slang-intrinsic-expand.cpp | 86 | ||||
| -rw-r--r-- | tests/compute/half-rw-texture-convert.slang | 14 | ||||
| -rw-r--r-- | tests/compute/half-rw-texture-convert2.slang | 53 | ||||
| -rw-r--r-- | tests/compute/half-rw-texture-convert2.slang.expected.txt | 5 |
8 files changed, 276 insertions, 34 deletions
diff --git a/docs/cuda-target.md b/docs/cuda-target.md index 47b058f50..d5affc0e9 100644 --- a/docs/cuda-target.md +++ b/docs/cuda-target.md @@ -20,11 +20,11 @@ These limitations apply to Slang transpiling to CUDA. * Samplers are not separate objects in CUDA - they are combined into a single 'TextureObject'. So samplers are effectively ignored on CUDA targets. * When using a TextureArray.Sample (layered texture in CUDA) - the index will be treated as an int, as this is all CUDA allows * Care must be used in using `WaveGetLaneIndex` wave intrinsic - it will only give the right results for appropriate launches -* CUDA 'surfaces' are used for textures which are read/write. CUDA does NOT do format conversion with surfaces. +* CUDA 'surfaces' are used for textures which are read/write (aka RWTexture). The following are a work in progress or not implemented but are planned to be so in the future -* Some resource types remain unsupported, and not all methods on types are supported +* Some resource types remain unsupported, and not all methods on all types are supported # How it works @@ -122,8 +122,6 @@ The UniformState and UniformEntryPointParams struct typically vary by shader. Un size_t sizeInBytes; ``` - - ## Texture Read only textures will be bound as the opaque CUDA type CUtexObject. This type is the combination of both a texture AND a sampler. This is somewhat different from HLSL, where there can be separate `SamplerState` variables. This allows access of a single texture binding with different types of sampling. @@ -138,11 +136,58 @@ Load is only supported for Texture1D, and the mip map selection argument is igno RWTexture types are converted into CUsurfObject type. -In CUDA it is not possible to do a format conversion on an access to a CUsurfObject, so it must be backed by the same data format as is used within the Slang source code. +In regular CUDA it is not possible to do a format conversion on an access to a CUsurfObject. Slang does add support for hardware write conversions where they are available. To enable the feature it is necessary to attribute your RWTexture with `format`. For example + +``` +[format("rg16f")] +RWTexture2D<float2> rwt2D_2; +``` + +The format names used are the same as for (GLSL layout format types)[https://www.khronos.org/opengl/wiki/Layout_Qualifier_(GLSL)]. If no format is specified Slang will *assume* that the format is the same as the type specified. + +Note that the format attribution is on variables/paramters/fields and not part of the type system. This means that if you have a scenario like... + +``` +[format(rg16f)] +RWTexture2d<float2> g_texture; + +float2 getValue(RWTexture2D<float2> t) +{ + return t[int2(0, 0]; +} + +void doThing() +{ + float2 v = getValue(g_texture); +} +``` + +Even `getValue` will receive t *without* the format attribute, and so will access it, presumably erroneously. A work around for this specific scenario would be to attribute the parameter + +``` +float2 getValue([format("rg16f")] RWTexture2D<float2> t) +{ + return t[int2(0, 0]; +} +``` + +This will only work correctly if `getValue` is called with a `t` that has that format attribute. As it stands no checking is performed on this matching so no error or warning will be produced if there is a mismatch. + +There is limited software support for doing a conversion on reading. Currently this only supports only 1D, 2D, 3D RWTexture, backed with half1, half2 or half4. For this path to work NVRTC must have the `cuda_fp16.h` and associated files available. Please check the section on `Half Support`. + +If hardware read conversions are desired, this can be achieved by having a Texture<T> that uses the surface of a RWTexture<T>. Using the Texture<T> not only allows hardware conversion but also filtering. It is also worth noting that CUsurfObjects in CUDA are NOT allowed to have mip maps. -By default surface access uses cudaBoundaryModeZero, this can be replaced using the macro SLANG_CUDA_BOUNDARY_MODE in the CUDA prelude. +By default surface access uses cudaBoundaryModeZero, this can be replaced using the macro SLANG_CUDA_BOUNDARY_MODE in the CUDA prelude. For HW format conversions the macro SLANG_PTX_BOUNDARY_MODE. These boundary settings are in effect global for the whole of the kernel. + +`SLANG_CUDA_BOUNDARY_MODE` can be one of + +* cudaBoundaryModeZero causes an execution trap on out-of-bounds addresses +* cudaBoundaryModeClamp stores data at the nearest surface location (sized appropriately) +* cudaBoundaryModeTrap drops stores to out-of-bounds addresses + +`SLANG_PTX_BOUNDARY_MODE` can be one of `trap`, `clamp` or `zero`. In general it is recommended to have both set to the same type of value, for example `cudaBoundaryModeZero` and `zero`. ## Sampler diff --git a/docs/target-compatibility.md b/docs/target-compatibility.md index 54d7bf997..6fda4ceed 100644 --- a/docs/target-compatibility.md +++ b/docs/target-compatibility.md @@ -1,7 +1,6 @@ Slang Target Compatibility ========================== - Shader Model (SM) numbers are D3D Shader Model versions, unless explicitly stated otherwise. OpenGL compatibility is not listed here, because OpenGL isn't an officially supported target. @@ -203,8 +202,6 @@ uint64_t RWByteAddressBuffer::InterlockedMinU64(uint byteAddress, uint64_t value uint64_t RWByteAddressBuffer::InterlockedAndU64(uint byteAddress, uint64_t value); uint64_t RWByteAddressBuffer::InterlockedOrU64(uint byteAddress, uint64_t value); uint64_t RWByteAddressBuffer::InterlockedXorU64(uint byteAddress, uint64_t value); - - ``` On HLSL based targets this functionality is achieved using [NVAPI](https://developer.nvidia.com/nvapi). Support for NVAPI is described diff --git a/prelude/slang-cuda-prelude.h b/prelude/slang-cuda-prelude.h index a18da027b..4df60e965 100644 --- a/prelude/slang-cuda-prelude.h +++ b/prelude/slang-cuda-prelude.h @@ -381,6 +381,41 @@ SLANG_SURFACE_WRITE(surf2DLayeredwrite, (int x, int y, int layer), (x, y, layer) SLANG_SURFACE_WRITE(surfCubemapwrite, (int x, int y, int face), (x, y, face)) SLANG_SURFACE_WRITE(surfCubemapLayeredwrite, (int x, int y, int layerFace), (x, y, layerFace)) +// ! Hack to test out reading !!! +// Only works converting *from* half + +//template <typename T> +//SLANG_FORCE_INLINE SLANG_CUDA_CALL T surf2Dread_convert(cudaSurfaceObject_t surfObj, int x, int y, cudaSurfaceBoundaryMode boundaryMode); + +#define SLANG_SURFACE_READ_HALF_CONVERT(FUNC_NAME, TYPE_ARGS, ARGS) \ +\ +template <typename T> \ +SLANG_FORCE_INLINE SLANG_CUDA_CALL T FUNC_NAME##_convert(cudaSurfaceObject_t surfObj, SLANG_DROP_PARENS TYPE_ARGS, cudaSurfaceBoundaryMode boundaryMode); \ +\ +template <> \ +SLANG_FORCE_INLINE SLANG_CUDA_CALL float FUNC_NAME##_convert<float>(cudaSurfaceObject_t surfObj, SLANG_DROP_PARENS TYPE_ARGS, cudaSurfaceBoundaryMode boundaryMode) \ +{ \ + return __ushort_as_half(FUNC_NAME<uint16_t>(surfObj, SLANG_DROP_PARENS ARGS, boundaryMode)); \ +} \ +\ +template <> \ +SLANG_FORCE_INLINE SLANG_CUDA_CALL float2 FUNC_NAME##_convert<float2>(cudaSurfaceObject_t surfObj, SLANG_DROP_PARENS TYPE_ARGS, cudaSurfaceBoundaryMode boundaryMode) \ +{ \ + const __half2 v = __ushort_as_half(FUNC_NAME<ushort2>(surfObj, SLANG_DROP_PARENS ARGS, boundaryMode)); \ + return float2{v.x, v.y}; \ +} \ +\ +template <> \ +SLANG_FORCE_INLINE SLANG_CUDA_CALL float4 FUNC_NAME##_convert<float4>(cudaSurfaceObject_t surfObj, SLANG_DROP_PARENS TYPE_ARGS, cudaSurfaceBoundaryMode boundaryMode) \ +{ \ + const __half4 v = __ushort_as_half(FUNC_NAME<ushort4>(surfObj, SLANG_DROP_PARENS ARGS, boundaryMode)); \ + return float4{v.xy.x, v.xy.y, v.zw.x, v.zw.y}; \ +} + +SLANG_SURFACE_READ_HALF_CONVERT(surf1Dread, (int x), (x)) +SLANG_SURFACE_READ_HALF_CONVERT(surf2Dread, (int x, int y), (x, y)) +SLANG_SURFACE_READ_HALF_CONVERT(surf3Dread, (int x, int y, int z), (x, y, z)) + #endif // Support for doing format conversion when writing to a surface/RWTexture @@ -392,10 +427,14 @@ template <typename T> SLANG_FORCE_INLINE SLANG_CUDA_CALL void surf1Dwrite_convert(T, cudaSurfaceObject_t surfObj, int x, cudaSurfaceBoundaryMode boundaryMode); template <typename T> SLANG_FORCE_INLINE SLANG_CUDA_CALL void surf2Dwrite_convert(T, cudaSurfaceObject_t surfObj, int x, int y, cudaSurfaceBoundaryMode boundaryMode); +template <typename T> +SLANG_FORCE_INLINE SLANG_CUDA_CALL void surf3Dwrite_convert(T, cudaSurfaceObject_t surfObj, int x, int y, int z, cudaSurfaceBoundaryMode boundaryMode); // https://docs.nvidia.com/cuda/inline-ptx-assembly/index.html // https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#surface-instructions-sust +// Float + template <> SLANG_FORCE_INLINE SLANG_CUDA_CALL void surf1Dwrite_convert<float>(float v, cudaSurfaceObject_t surfObj, int x, cudaSurfaceBoundaryMode boundaryMode) { @@ -408,6 +447,57 @@ SLANG_FORCE_INLINE SLANG_CUDA_CALL void surf2Dwrite_convert<float>(float v, cuda asm volatile ( "{sust.p.2d.b32." SLANG_PTX_BOUNDARY_MODE " [%0, {%1,%2}], {%3};}\n\t" :: "l"(surfObj),"r"(x),"r"(y),"f"(v)); } +template <> +SLANG_FORCE_INLINE SLANG_CUDA_CALL void surf3Dwrite_convert<float>(float v, cudaSurfaceObject_t surfObj, int x, int y, int z, cudaSurfaceBoundaryMode boundaryMode) +{ + asm volatile ( "{sust.p.2d.b32." SLANG_PTX_BOUNDARY_MODE " [%0, {%1,%2,%3}], {%4};}\n\t" :: "l"(surfObj),"r"(x),"r"(y),"r"(z),"f"(v)); +} + +// Float2 + +template <> +SLANG_FORCE_INLINE SLANG_CUDA_CALL void surf1Dwrite_convert<float2>(float2 v, cudaSurfaceObject_t surfObj, int x, cudaSurfaceBoundaryMode boundaryMode) +{ + const float vx = v.x, vy = v.y; + asm volatile ( "{sust.p.1d.b32." SLANG_PTX_BOUNDARY_MODE " [%0, {%1}], {%2,%3};}\n\t" :: "l"(surfObj),"r"(x),"f"(vx),"f"(vy)); +} + +template <> +SLANG_FORCE_INLINE SLANG_CUDA_CALL void surf2Dwrite_convert<float2>(float2 v, cudaSurfaceObject_t surfObj, int x, int y, cudaSurfaceBoundaryMode boundaryMode) +{ + const float vx = v.x, vy = v.y; + asm volatile ( "{sust.p.2d.b32." SLANG_PTX_BOUNDARY_MODE " [%0, {%1,%2}], {%3,%4};}\n\t" :: "l"(surfObj),"r"(x),"r"(y),"f"(vx),"f"(vy)); +} + +template <> +SLANG_FORCE_INLINE SLANG_CUDA_CALL void surf3Dwrite_convert<float2>(float2 v, cudaSurfaceObject_t surfObj, int x, int y, int z, cudaSurfaceBoundaryMode boundaryMode) +{ + const float vx = v.x, vy = v.y; + asm volatile ( "{sust.p.2d.b32." SLANG_PTX_BOUNDARY_MODE " [%0, {%1,%2,%3}], {%4,%5};}\n\t" :: "l"(surfObj),"r"(x),"r"(y),"r"(z),"f"(vx),"f"(vy)); +} + +// Float4 +template <> +SLANG_FORCE_INLINE SLANG_CUDA_CALL void surf1Dwrite_convert<float4>(float4 v, cudaSurfaceObject_t surfObj, int x, cudaSurfaceBoundaryMode boundaryMode) +{ + const float vx = v.x, vy = v.y, vz = v.z, vw = v.w; + asm volatile ( "{sust.p.1d.b32." SLANG_PTX_BOUNDARY_MODE " [%0, {%1}], {%2,%3,%4,%5};}\n\t" :: "l"(surfObj),"r"(x),"f"(vx),"f"(vy),"f"(vz),"f"(vw)); +} + +template <> +SLANG_FORCE_INLINE SLANG_CUDA_CALL void surf2Dwrite_convert<float4>(float4 v, cudaSurfaceObject_t surfObj, int x, int y, cudaSurfaceBoundaryMode boundaryMode) +{ + const float vx = v.x, vy = v.y, vz = v.z, vw = v.w; + asm volatile ( "{sust.p.2d.b32." SLANG_PTX_BOUNDARY_MODE " [%0, {%1,%2}], {%3,%4,%5,%6};}\n\t" :: "l"(surfObj),"r"(x),"r"(y),"f"(vx),"f"(vy),"f"(vz),"f"(vw)); +} + +template <> +SLANG_FORCE_INLINE SLANG_CUDA_CALL void surf3Dwrite_convert<float4>(float4 v, cudaSurfaceObject_t surfObj, int x, int y, int z, cudaSurfaceBoundaryMode boundaryMode) +{ + const float vx = v.x, vy = v.y, vz = v.z, vw = v.w; + asm volatile ( "{sust.p.2d.b32." SLANG_PTX_BOUNDARY_MODE " [%0, {%1,%2,%3}], {%4,%5,%6,%7};}\n\t" :: "l"(surfObj),"r"(x),"r"(y),"r"(z),"f"(vx),"f"(vy),"f"(vz),"f"(vw)); +} + // ----------------------------- F32 ----------------------------------------- // Unary diff --git a/source/slang/core.meta.slang b/source/slang/core.meta.slang index 9e5cf80c8..6b73630a3 100644 --- a/source/slang/core.meta.slang +++ b/source/slang/core.meta.slang @@ -1083,7 +1083,7 @@ for (int tt = 0; tt < kBaseTextureTypeCount; ++tt) } sb << (isArray ? "Layered" : ""); - sb << "read<$T0>($0"; + sb << "read$C<$T0>($0"; for (int i = 0; i < vecCount; ++i) { diff --git a/source/slang/slang-intrinsic-expand.cpp b/source/slang/slang-intrinsic-expand.cpp index c5bedbc37..bd2e17b28 100644 --- a/source/slang/slang-intrinsic-expand.cpp +++ b/source/slang/slang-intrinsic-expand.cpp @@ -1,6 +1,8 @@ // slang-intrinsic-expand.cpp #include "slang-intrinsic-expand.h" +#include "slang-emit-cuda.h" + namespace Slang { void IntrinsicExpandContext::emit(IRCall* inst, IRUse* args, Int argCount, const UnownedStringSlice& intrinsicText) @@ -101,13 +103,13 @@ static BaseType _getBaseTypeFromScalarType(SlangScalarType type) // The VK back-end gets away with this kind of coincidentally, since the "legalization" we have to do for resources means that there wouldn't be a single f() function any more. // But for CUDA and C++ that's not the case or generally desirable. -static IRFormatDecoration* _findImageFormatDecoration(IRInst* inst) +static IRFormatDecoration* _findImageFormatDecoration(IRInst* resourceInst) { // JS(TODO): // There could perhaps be other situations, that need to be covered // If this is a load, we need to get the decoration from the field key - if (IRLoad* load = as<IRLoad>(inst)) + if (IRLoad* load = as<IRLoad>(resourceInst)) { if (IRFieldAddress* fieldAddress = as<IRFieldAddress>(load->getOperand(0))) { @@ -116,7 +118,7 @@ static IRFormatDecoration* _findImageFormatDecoration(IRInst* inst) } } // Otherwise just try on the instruction - return inst->findDecoration<IRFormatDecoration>(); + return resourceInst->findDecoration<IRFormatDecoration>(); } // Returns true if dataType and imageFormat are compatible - that they have the same representation, @@ -149,36 +151,26 @@ static bool _isImageFormatCompatible(ImageFormat imageFormat, IRType* dataType) return formatBaseType == baseType; } -static bool _isConvertRequired(ImageFormat imageFormat, IRInst* resourceVar) +static bool _isConvertRequired(ImageFormat imageFormat, IRInst* callee) { - auto textureType = as<IRTextureTypeBase>(resourceVar->getDataType()); + auto textureType = as<IRTextureTypeBase>(callee->getDataType()); IRType* elementType = textureType ? textureType->getElementType() : nullptr; return elementType && !_isImageFormatCompatible(imageFormat, elementType); } -static size_t _calcBackingElementSizeInBytes(IRInst* resourceVar) +static size_t _calcBackingElementSizeInBytes(IRInst* resourceInst) { // First see if there is a format associated with the resource - if (IRFormatDecoration* formatDecoration = _findImageFormatDecoration(resourceVar)) + if (IRFormatDecoration* formatDecoration = _findImageFormatDecoration(resourceInst)) { - const ImageFormat imageFormat = formatDecoration->getFormat(); - - if (_isConvertRequired(imageFormat, resourceVar)) - { - // If the access is a converting access then the x coordinate is *NOT* scaled - // This is a CUDA specific issue(!). - return 1; - } - - const auto& imageFormatInfo = getImageFormatInfo(imageFormat); - return imageFormatInfo.sizeInBytes; + return getImageFormatInfo(formatDecoration->getFormat()).sizeInBytes; } else { // If not we *assume* the backing format is the same as the element type used for access. /// Ie in RWTexture<T>, this would return sizeof(T) - auto textureType = as<IRTextureTypeBase>(resourceVar->getDataType()); + auto textureType = as<IRTextureTypeBase>(resourceInst->getDataType()); IRType* elementType = textureType ? textureType->getElementType() : nullptr; if (elementType) @@ -206,6 +198,18 @@ static size_t _calcBackingElementSizeInBytes(IRInst* resourceVar) return 4; } +static bool _isResourceRead(IRCall* call) +{ + IRType* returnType = call->getDataType(); + return returnType && (as<IRVoidType>(returnType) == nullptr); +} + +static bool _isResourceWrite(IRCall* call) +{ + IRType* returnType = call->getDataType(); + return returnType && (as<IRVoidType>(returnType) != nullptr); +} + const char* IntrinsicExpandContext::_emitSpecial(const char* cursor) { const char*const end = m_text.end(); @@ -323,13 +327,35 @@ const char* IntrinsicExpandContext::_emitSpecial(const char* cursor) // writes that will do a format conversion. if (m_emitter->getTarget() == CodeGenTarget::CUDASource) { - IRInst* arg0 = m_callInst->getArg(0); + IRInst* resourceInst = m_callInst->getArg(0); - if (IRFormatDecoration* formatDecoration = _findImageFormatDecoration(arg0)) + if (IRFormatDecoration* formatDecoration = _findImageFormatDecoration(resourceInst)) { const ImageFormat imageFormat = formatDecoration->getFormat(); - if (_isConvertRequired(imageFormat, arg0)) + if (_isConvertRequired(imageFormat, resourceInst)) { + // If the function returns something it's a reader so we may need to convert + // and in doing so require half + if (_isResourceRead(m_callInst)) + { + // If the source format if half derived, then we need to enable half + switch (imageFormat) + { + case ImageFormat::r16f: + case ImageFormat::rg16f: + case ImageFormat::rgba16f: + { + CUDAExtensionTracker* extensionTracker = as<CUDAExtensionTracker>(m_emitter->getExtensionTracker()); + if (extensionTracker) + { + extensionTracker->requireBaseType(BaseType::Half); + } + break; + } + default: break; + } + } + // Append _convert on the name to signify we need to use a code path, that will automatically // do the format conversion. m_writer->emit("_convert"); @@ -344,7 +370,21 @@ const char* IntrinsicExpandContext::_emitSpecial(const char* cursor) /// Sometimes accesses need to be scaled. For example in CUDA the x coordinate for surface /// access is byte addressed. /// $E will return the byte size of the *backing element*. - size_t elemSizeInBytes = _calcBackingElementSizeInBytes(m_callInst->getArg(0)); + + IRInst* resourceInst = m_callInst->getArg(0); + size_t elemSizeInBytes = _calcBackingElementSizeInBytes(resourceInst); + + // If we have a format converstion and its a *write* we don't need to scale + if (IRFormatDecoration* formatDecoration = _findImageFormatDecoration(resourceInst)) + { + const ImageFormat imageFormat = formatDecoration->getFormat(); + if (_isConvertRequired(imageFormat, resourceInst) && _isResourceWrite(m_callInst)) + { + // If there is a conversion *and* it's a write we don't need to scale. + elemSizeInBytes = 1; + } + } + SLANG_ASSERT(elemSizeInBytes > 0); m_writer->emitUInt64(UInt64(elemSizeInBytes)); break; diff --git a/tests/compute/half-rw-texture-convert.slang b/tests/compute/half-rw-texture-convert.slang index cf6eea4ea..338f44454 100644 --- a/tests/compute/half-rw-texture-convert.slang +++ b/tests/compute/half-rw-texture-convert.slang @@ -25,6 +25,14 @@ [format("r16f")] RWTexture2D<float> rwt2D; +//TEST_INPUT: RWTexture2D(format=RG_Float16, size=4, content = one, mipMaps = 1):name rwt2D_2 +[format("rg16f")] +RWTexture2D<float2> rwt2D_2; + +//TEST_INPUT: RWTexture2D(format=RGBA_Float16, size=4, content = one, mipMaps = 1):name rwt2D_4 +[format("rgba16f")] +RWTexture2D<float4> rwt2D_4; + //TEST_INPUT: ubuffer(data=[0 0 0 0], stride=4):out,name outputBuffer RWStructuredBuffer<float> outputBuffer; @@ -38,5 +46,9 @@ void computeMain(uint3 dispatchThreadID : SV_DispatchThreadID) // Do a format converting write! rwt2D[uint2(idx, idx)] = val; + rwt2D_2[uint2(idx, idx)] = float2(val * 2, val * 3); + + rwt2D_4[uint2(idx, idx)] = float4(val + 1, val - 1, val * 4, val * -4); + outputBuffer[idx] = val; -} +}
\ No newline at end of file diff --git a/tests/compute/half-rw-texture-convert2.slang b/tests/compute/half-rw-texture-convert2.slang new file mode 100644 index 000000000..e9b7200c4 --- /dev/null +++ b/tests/compute/half-rw-texture-convert2.slang @@ -0,0 +1,53 @@ +// Native half not supported on CPU currently +//DISABLE_TEST(compute):COMPARE_COMPUTE_EX:-cpu -compute -output-using-type -shaderobj +// Doesn't work on DX11 currently - locks up on binding +//DISABLE_TEST(compute):COMPARE_COMPUTE_EX:-slang -compute -output-using-type -shaderobj +// Produces a different result on DX12 with DXBC than expected(!). So disabled for now +//DISABLE_TEST(compute):COMPARE_COMPUTE_EX:-slang -compute -dx12 -output-using-type -shaderobj +//DISABLE_TEST(compute):COMPARE_COMPUTE_EX:-slang -compute -dx12 -profile cs_6_0 -use-dxil -output-using-type -shaderobj +// TODO(JS): Doesn't work on vk currently, because createTextureView not implemented on vk renderer +//DIABLE_TEST(compute, vulkan):COMPARE_COMPUTE_EX:-vk -compute -output-using-type -shaderobj + +//TEST(compute):COMPARE_COMPUTE_EX:-cuda -compute -output-using-type -shaderobj -render-features half + +//TEST_INPUT: RWTexture2D(format=R_Float16, size=4, content = one, mipMaps = 1):name rwt2D +[format("r16f")] +RWTexture2D<float> rwt2D; + +//TEST_INPUT: RWTexture2D(format=RG_Float16, size=4, content = one, mipMaps = 1):name rwt2D_2 +[format("rg16f")] +RWTexture2D<float2> rwt2D_2; + +//TEST_INPUT: RWTexture2D(format=RGBA_Float16, size=4, content = one, mipMaps = 1):name rwt2D_4 +[format("rgba16f")] +RWTexture2D<float4> rwt2D_4; + +//TEST_INPUT: ubuffer(data=[0 0 0 0], stride=4):out,name outputBuffer +RWStructuredBuffer<float> outputBuffer; + +[numthreads(4, 1, 1)] +void computeMain(uint3 dispatchThreadID : SV_DispatchThreadID) +{ + int idx = dispatchThreadID.x; + + float val = 0; + + // Do a format converting write2! + rwt2D[uint2(idx, idx)] = idx - 1; + rwt2D_2[uint2(idx, idx)] = float2(idx + idx, idx * idx); + rwt2D_4[uint2(idx, idx)] = float4(idx + 97, idx + 8, idx + 16, idx + 24); + + // May not be strictly necessary + AllMemoryBarrierWithGroupSync(); + + // Do read converting + // There is *only* CUDA support for half/float *converting* reads for 1d, 2d, 3d shapes for RWTexture/surface + + float4 v4 = rwt2D_4[uint2(idx, idx)]; + float2 v2 = rwt2D_2[uint2(idx, idx)]; + float v = rwt2D[uint2(idx, idx)]; + + val += v4.x + v2.x + v; + + outputBuffer[idx] = val; +} diff --git a/tests/compute/half-rw-texture-convert2.slang.expected.txt b/tests/compute/half-rw-texture-convert2.slang.expected.txt new file mode 100644 index 000000000..462941fda --- /dev/null +++ b/tests/compute/half-rw-texture-convert2.slang.expected.txt @@ -0,0 +1,5 @@ +type: float +96.000000 +100.000000 +104.000000 +108.000000 |
