<feed xmlns='http://www.w3.org/2005/Atom'>
<title>slang.git/prelude, branch master</title>
<subtitle>Making it easier to work with shaders</subtitle>
<id>https://git.yummers.dev/slang.git/atom?h=master</id>
<link rel='self' href='https://git.yummers.dev/slang.git/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/'/>
<updated>2025-10-16T03:59:47+00:00</updated>
<entry>
<title>Immutable access qualifier for pointers and use `__ldg` on cuda. (#8710)</title>
<updated>2025-10-16T03:59:47+00:00</updated>
<author>
<name>Yong He</name>
<email>yonghe@outlook.com</email>
</author>
<published>2025-10-16T03:59:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=01510f2c922af8629c7a730ef92a31fa83bd9f49'/>
<id>urn:sha1:01510f2c922af8629c7a730ef92a31fa83bd9f49</id>
<content type='text'>
This PR implements `Access.Immutable` to allow pointers to immutable
data.

The new type `ImmutablePtr&lt;T&gt;` is defined as an alias of `Ptr&lt;T,
Address.Immutable&gt;`.
By forming a immutable pointer, the programmer is conveying to the
compiler that the data at the pointer address will never change during
the execution of the current program. Therefore loads from immutable
pointers can be deduplicated by the compiler, and will translate to
`__ldg` when generating code for CUDA.

The SPIRV backend is not changed in this PR, since the current SPIRV
spec makes it very difficult to specify loads from immutable address
without generating tons of wrappers and boilerplate type declarations.
We would like to see the spec evolved a bit to around its support of
`NonWritable` physical storage pointers or immutable loads before we
attempt to express such immutability in SPIRV. For now we simply emit
ordinary pointers and loads when generating spirv.

---------

Co-authored-by: slangbot &lt;186143334+slangbot@users.noreply.github.com&gt;</content>
</entry>
<entry>
<title>Add support targeting older OptiX versions (#8700)</title>
<updated>2025-10-14T15:21:03+00:00</updated>
<author>
<name>Simon Kallweit</name>
<email>64953474+skallweitNV@users.noreply.github.com</email>
</author>
<published>2025-10-14T15:21:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=5978f934ee9a8a3e710dc743a4af92191639b718'/>
<id>urn:sha1:5978f934ee9a8a3e710dc743a4af92191639b718</id>
<content type='text'>
Currently, the emitted CUDA code does only compile with latest OptiX
9.0. This change allows code to be compiled with OptiX 8.0 upwards by
not emitting OptiX calls that are not available. In a later step we
should add proper capabilities for the various OptiX versions.</content>
</entry>
<entry>
<title>Improve texture loads and stores on CUDA (#8644)</title>
<updated>2025-10-08T18:18:23+00:00</updated>
<author>
<name>Simon Kallweit</name>
<email>64953474+skallweitNV@users.noreply.github.com</email>
</author>
<published>2025-10-08T18:18:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=4b3fd0a4536f873649e6fcdefbaf2f8b6f0897e0'/>
<id>urn:sha1:4b3fd0a4536f873649e6fcdefbaf2f8b6f0897e0</id>
<content type='text'>
- fix handling layer and mip level
- add support for 1D layered textures
- reduce code by using macros
- assert when trying to emit unsupported intrinsics

There is a new set of unit tests in slang-rhi for exhaustive testing of
shader loads/stores on textures. These fixes allow to enable most of
these tests. Formatted loads/stores on surfaces are not supported in PTX
ISA, so this would require codegen for the conversion which in theory
should be possible but not as part of the CUDA prelude.</content>
</entry>
<entry>
<title>Enhance buffer load specialization pass to specialize past field extracts. (#8547)</title>
<updated>2025-10-01T02:08:23+00:00</updated>
<author>
<name>Yong He</name>
<email>yonghe@outlook.com</email>
</author>
<published>2025-10-01T02:08:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=e4611e2e30a3e5969d402f5ed7e72706a0e3b024'/>
<id>urn:sha1:e4611e2e30a3e5969d402f5ed7e72706a0e3b024</id>
<content type='text'>
This allows us to specialize functions whose argument is a sub element
of a constant buffer, instead of being only applicable to entire buffer
element. Closes #8421.

This change also implements a proper heuristic to determine when to
specialize the calls and defer the buffer loads.

This PR addresses a pathological case exposed in
`slangpy\slangpy\benchmarks\test_benchmark_tensor.py`, which used to
take 27ms to finish, and now takes 1.25ms.


For example, given:
```
struct Bottom
{
    float bigArray[1024];

    [mutating]
    void setVal(int index, float value) { bigArray[index] = value; }
}

struct Root
{
    Bottom top[2];
    [mutating]
    void setTopVal(int x, int y, float value)
    {
        top[x].setVal(y, value);
    }
}

RWStructuredBuffer&lt;Root&gt; sb;

[shader("compute")]
[numthreads(1, 1, 1)]
void compute_main(uint3 tid: SV_DispatchThreadID)
{
    sb[0].setTopVal(1, 2, 100.0f);
}
```

We are now able to specialize the call to `setTopVal` into:
```
void compute_main(uint3 tid: SV_DispatchThreadID)
{
    setTopVal_specialized(0, 1, 2, 100.0f);
}

void setTopVal_specialized(int sbIdx, int x, int y, float value)
{
      Bottom_setVal_specialized(sbIdx, x, y, value);
}

void Bottom_setVal_specialized(int sbIdx, int x, int y, float value)
{
     sb[sbIdx].top[x].bigArray[y] = value;
}
```

And get rid of all unnecessary loads. Achieving this requires a
combination of function call specialization and buffer-load-defer pass.
The buffer-load-defer pass has been completely rewritten to be more
correct and avoid introducing redundant loads.

This PR also adds tests to make sure pointers, bindless handles, and
loads from structured buffer or constant buffers works as expected.</content>
</entry>
<entry>
<title>Always define OptixTraversableHandle (#8411)</title>
<updated>2025-09-09T15:55:27+00:00</updated>
<author>
<name>Simon Kallweit</name>
<email>64953474+skallweitNV@users.noreply.github.com</email>
</author>
<published>2025-09-09T15:55:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=3745d75543ded3bc457ff5aba152141d382ba306'/>
<id>urn:sha1:3745d75543ded3bc457ff5aba152141d382ba306</id>
<content type='text'>
This fixes an issue where non-raytracing kernels couldn't contain any
RaytracingAccelerationStructure resources even when not used.</content>
</entry>
<entry>
<title>Enable CUDA support for additional HLSL intrinsic tests (#8293)</title>
<updated>2025-09-04T05:28:02+00:00</updated>
<author>
<name>Harsh Aggarwal (NVIDIA)</name>
<email>haaggarwal@nvidia.com</email>
</author>
<published>2025-09-04T05:28:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=5ec41675d817f82a7ce3c4d79c68548db0bd4227'/>
<id>urn:sha1:5ec41675d817f82a7ce3c4d79c68548db0bd4227</id>
<content type='text'>
Enable CUDA support for additional HLSL intrinsic tests by implementing
missing functionality and fixing compiler bugs affecting CUDA targets.

- Fix critical bug in InterlockedCompareStore64 where division used /4
instead of /8 for 64-bit types, causing incorrect memory addressing for
all signed int 64_t atomics
- Add signed int64_t atomic wrappers (atomicExch, atomicCAS) to CUDA
prelu de that properly cast to/from unsigned types as required by CUDA's
atomic API
- Enable tests: atomic-intrinsics-64bit.slang

- Implement CUDA support for QuadAny and QuadAll operations using warp
shu ffle primitives (__shfl_sync with quad-level lane masking)
- Add CUDA to quad_control capability definition in
slang-capabilities.capdef
- Add _slang_quadAny/_slang_quadAll helper functions to CUDA prelude
- Enable tests: quad-control-comp-functionality.slang,
subgroup-quad.slang

---------

Co-authored-by: szihs &lt;675653+szihs@users.noreply.github.com&gt;</content>
</entry>
<entry>
<title>Updated support to enable batch3  (#8219)</title>
<updated>2025-08-20T09:11:06+00:00</updated>
<author>
<name>Harsh Aggarwal (NVIDIA)</name>
<email>haaggarwal@nvidia.com</email>
</author>
<published>2025-08-20T09:11:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=e0c20a076f2ec84586b6508664df4f59273c6aaf'/>
<id>urn:sha1:e0c20a076f2ec84586b6508664df4f59273c6aaf</id>
<content type='text'>
Enable CUDA support for batch 3 tests
    - Enhanced wave operations with exclusive support
    - Added proper identity values for min/max operations
    - Fixed intrinsic name mapping issues
    - Updated test configurations

Co-authored-by: Ellie Hermaszewska &lt;ellieh@nvidia.com&gt;</content>
</entry>
<entry>
<title>Enable CUDA testing for batch 2 (#8147)</title>
<updated>2025-08-12T18:19:15+00:00</updated>
<author>
<name>jarcherNV</name>
<email>jarcher@nvidia.com</email>
</author>
<published>2025-08-12T18:19:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=3618f7a4c35de32405e6ebe458bae835869e2876'/>
<id>urn:sha1:3618f7a4c35de32405e6ebe458bae835869e2876</id>
<content type='text'>
Enable CUDA for the tests listed in issue #8078
This requires a minor CUDA prelude change, adding some math functions.</content>
</entry>
<entry>
<title>Fix intrinsic LoadLocalRootTableConstant for optix (#7949)</title>
<updated>2025-08-07T21:43:25+00:00</updated>
<author>
<name>Harsh Aggarwal (NVIDIA)</name>
<email>haaggarwal@nvidia.com</email>
</author>
<published>2025-08-07T21:43:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=e595743b5aa4f6bd88800cfbcfd6eead3cc3d01b'/>
<id>urn:sha1:e595743b5aa4f6bd88800cfbcfd6eead3cc3d01b</id>
<content type='text'>
Due to an older version of spec referred there was an inconsitency v1.29
2/20/2025 - [HitObject LoadLocalRootArgumentsConstant]

Latest spec

https://microsoft.github.io/DirectX-Specs/d3d/Raytracing.html#hitobject-loadlocalroottableconstant

Refer:
OptiX backend support for Shader Execution Reordering (SER) features as
outlined in issue #6647. -</content>
</entry>
<entry>
<title>Initial copy elision pass (#8042)</title>
<updated>2025-08-07T07:22:22+00:00</updated>
<author>
<name>ArielG-NV</name>
<email>159081215+ArielG-NV@users.noreply.github.com</email>
</author>
<published>2025-08-07T07:22:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=063cbeaaea2fb00a10c6058ea4a9632092772ea5'/>
<id>urn:sha1:063cbeaaea2fb00a10c6058ea4a9632092772ea5</id>
<content type='text'>
Fixes #7574

Changes:
* Add an initial (fairly simple) optimization pass which is able to
eliminate redundant copies.
* Our current existing optimizer passes remove redundant load/store very
robustly, this pass will focus on other cases of copy elimination
* Primary approach is to make all functions which are `in T` and `T` is
trivial to copy into a `__constref T`. We then (depending on scenario)
manually insert a variable+load if a pass-by-reference is not possible;
otherwise we pass by `constref`.
* Added optimizations to eliminate redundant code which causes
`constref` to fail to compile

---------

Co-authored-by: Harsh Aggarwal &lt;haaggarwal@nvidia.com&gt;
Co-authored-by: Claude &lt;noreply@anthropic.com&gt;
Co-authored-by: slangbot &lt;ellieh+slangbot@nvidia.com&gt;
Co-authored-by: slangbot &lt;186143334+slangbot@users.noreply.github.com&gt;</content>
</entry>
</feed>
