<feed xmlns='http://www.w3.org/2005/Atom'>
<title>slang.git/tests/cuda, branch master</title>
<subtitle>Making it easier to work with shaders</subtitle>
<id>https://git.yummers.dev/slang.git/atom?h=master</id>
<link rel='self' href='https://git.yummers.dev/slang.git/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/'/>
<updated>2025-10-16T03:59:47+00:00</updated>
<entry>
<title>Immutable access qualifier for pointers and use `__ldg` on cuda. (#8710)</title>
<updated>2025-10-16T03:59:47+00:00</updated>
<author>
<name>Yong He</name>
<email>yonghe@outlook.com</email>
</author>
<published>2025-10-16T03:59:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=01510f2c922af8629c7a730ef92a31fa83bd9f49'/>
<id>urn:sha1:01510f2c922af8629c7a730ef92a31fa83bd9f49</id>
<content type='text'>
This PR implements `Access.Immutable` to allow pointers to immutable
data.

The new type `ImmutablePtr&lt;T&gt;` is defined as an alias of `Ptr&lt;T,
Address.Immutable&gt;`.
By forming a immutable pointer, the programmer is conveying to the
compiler that the data at the pointer address will never change during
the execution of the current program. Therefore loads from immutable
pointers can be deduplicated by the compiler, and will translate to
`__ldg` when generating code for CUDA.

The SPIRV backend is not changed in this PR, since the current SPIRV
spec makes it very difficult to specify loads from immutable address
without generating tons of wrappers and boilerplate type declarations.
We would like to see the spec evolved a bit to around its support of
`NonWritable` physical storage pointers or immutable loads before we
attempt to express such immutability in SPIRV. For now we simply emit
ordinary pointers and loads when generating spirv.

---------

Co-authored-by: slangbot &lt;186143334+slangbot@users.noreply.github.com&gt;</content>
</entry>
<entry>
<title>Allow 1D SV_DispatchThreadID in CPU targets (#8612)</title>
<updated>2025-10-08T23:13:27+00:00</updated>
<author>
<name>Julius Ikkala</name>
<email>julius.ikkala@gmail.com</email>
</author>
<published>2025-10-08T23:13:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=1e4265edd4ec4c44e3d8f209fca802727076aa46'/>
<id>urn:sha1:1e4265edd4ec4c44e3d8f209fca802727076aa46</id>
<content type='text'>
The varying param legalization pass didn't deal with this 1D form of
SV_DispatchThreadID for CPU targets:

```slang
void computeMain(int i : SV_DispatchThreadID)
```

Instead, it just overrode the type of `i` with a `uint3`, breaking lots
of code that attempted to use `i` for something, like a `switch`
statement for example.

I ran across this when going through `language-feature` tests for the
LLVM target, which will also use this legalization pass. I'm separately
submitting this now because this also fixes the existing CPU target. The
test I enable in this PR is one that was previously generating broken
code on CPU.

(somewhat related issue: #7468)</content>
</entry>
<entry>
<title>Enhance buffer load specialization pass to specialize past field extracts. (#8547)</title>
<updated>2025-10-01T02:08:23+00:00</updated>
<author>
<name>Yong He</name>
<email>yonghe@outlook.com</email>
</author>
<published>2025-10-01T02:08:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=e4611e2e30a3e5969d402f5ed7e72706a0e3b024'/>
<id>urn:sha1:e4611e2e30a3e5969d402f5ed7e72706a0e3b024</id>
<content type='text'>
This allows us to specialize functions whose argument is a sub element
of a constant buffer, instead of being only applicable to entire buffer
element. Closes #8421.

This change also implements a proper heuristic to determine when to
specialize the calls and defer the buffer loads.

This PR addresses a pathological case exposed in
`slangpy\slangpy\benchmarks\test_benchmark_tensor.py`, which used to
take 27ms to finish, and now takes 1.25ms.


For example, given:
```
struct Bottom
{
    float bigArray[1024];

    [mutating]
    void setVal(int index, float value) { bigArray[index] = value; }
}

struct Root
{
    Bottom top[2];
    [mutating]
    void setTopVal(int x, int y, float value)
    {
        top[x].setVal(y, value);
    }
}

RWStructuredBuffer&lt;Root&gt; sb;

[shader("compute")]
[numthreads(1, 1, 1)]
void compute_main(uint3 tid: SV_DispatchThreadID)
{
    sb[0].setTopVal(1, 2, 100.0f);
}
```

We are now able to specialize the call to `setTopVal` into:
```
void compute_main(uint3 tid: SV_DispatchThreadID)
{
    setTopVal_specialized(0, 1, 2, 100.0f);
}

void setTopVal_specialized(int sbIdx, int x, int y, float value)
{
      Bottom_setVal_specialized(sbIdx, x, y, value);
}

void Bottom_setVal_specialized(int sbIdx, int x, int y, float value)
{
     sb[sbIdx].top[x].bigArray[y] = value;
}
```

And get rid of all unnecessary loads. Achieving this requires a
combination of function call specialization and buffer-load-defer pass.
The buffer-load-defer pass has been completely rewritten to be more
correct and avoid introducing redundant loads.

This PR also adds tests to make sure pointers, bindless handles, and
loads from structured buffer or constant buffers works as expected.</content>
</entry>
<entry>
<title>Add Optix Intrinsics Coverage (#8159) (#8310)</title>
<updated>2025-09-03T05:16:49+00:00</updated>
<author>
<name>Harsh Aggarwal (NVIDIA)</name>
<email>haaggarwal@nvidia.com</email>
</author>
<published>2025-09-03T05:16:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=f5fae010863c0c836a5a80e659673bcd58010dba'/>
<id>urn:sha1:f5fae010863c0c836a5a80e659673bcd58010dba</id>
<content type='text'>
Add 29 intrinsics to the list by new test</content>
</entry>
<entry>
<title>[CUDA] Fix incorrect `kIROp_RaytracingAccelerationStructureType` emitting logic (#8168)</title>
<updated>2025-08-15T15:24:06+00:00</updated>
<author>
<name>ArielG-NV</name>
<email>159081215+ArielG-NV@users.noreply.github.com</email>
</author>
<published>2025-08-15T15:24:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=f75bf474ef87737c87ef6dcb431bd0b87faee0a8'/>
<id>urn:sha1:f75bf474ef87737c87ef6dcb431bd0b87faee0a8</id>
<content type='text'>
Fixes: #8167

Current emitting logic does not work, this has been corrected.
The provided test ensures our CUDA code is valid by compiling PTX from
it.

`m_writer-&gt;emit("OptixTraversableHandle");` should be `out &lt;&lt;` since
`out` adds to type-name-cache; otherwise using a type twice will produce
bad type-names (since we filled type-name cache with "" instead of
"typeName")</content>
</entry>
<entry>
<title>Fix intrinsic LoadLocalRootTableConstant for optix (#7949)</title>
<updated>2025-08-07T21:43:25+00:00</updated>
<author>
<name>Harsh Aggarwal (NVIDIA)</name>
<email>haaggarwal@nvidia.com</email>
</author>
<published>2025-08-07T21:43:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=e595743b5aa4f6bd88800cfbcfd6eead3cc3d01b'/>
<id>urn:sha1:e595743b5aa4f6bd88800cfbcfd6eead3cc3d01b</id>
<content type='text'>
Due to an older version of spec referred there was an inconsitency v1.29
2/20/2025 - [HitObject LoadLocalRootArgumentsConstant]

Latest spec

https://microsoft.github.io/DirectX-Specs/d3d/Raytracing.html#hitobject-loadlocalroottableconstant

Refer:
OptiX backend support for Shader Execution Reordering (SER) features as
outlined in issue #6647. -</content>
</entry>
<entry>
<title>Initial copy elision pass (#8042)</title>
<updated>2025-08-07T07:22:22+00:00</updated>
<author>
<name>ArielG-NV</name>
<email>159081215+ArielG-NV@users.noreply.github.com</email>
</author>
<published>2025-08-07T07:22:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=063cbeaaea2fb00a10c6058ea4a9632092772ea5'/>
<id>urn:sha1:063cbeaaea2fb00a10c6058ea4a9632092772ea5</id>
<content type='text'>
Fixes #7574

Changes:
* Add an initial (fairly simple) optimization pass which is able to
eliminate redundant copies.
* Our current existing optimizer passes remove redundant load/store very
robustly, this pass will focus on other cases of copy elimination
* Primary approach is to make all functions which are `in T` and `T` is
trivial to copy into a `__constref T`. We then (depending on scenario)
manually insert a variable+load if a pass-by-reference is not possible;
otherwise we pass by `constref`.
* Added optimizations to eliminate redundant code which causes
`constref` to fail to compile

---------

Co-authored-by: Harsh Aggarwal &lt;haaggarwal@nvidia.com&gt;
Co-authored-by: Claude &lt;noreply@anthropic.com&gt;
Co-authored-by: slangbot &lt;ellieh+slangbot@nvidia.com&gt;
Co-authored-by: slangbot &lt;186143334+slangbot@users.noreply.github.com&gt;</content>
</entry>
<entry>
<title>Fix 7441: CUDA boolean vector layout to use 1-byte elements (#7862)</title>
<updated>2025-08-01T09:18:53+00:00</updated>
<author>
<name>Harsh Aggarwal (NVIDIA)</name>
<email>haaggarwal@nvidia.com</email>
</author>
<published>2025-08-01T09:18:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=bdda8a90cdd44ca822b21233ac988f92d1f20826'/>
<id>urn:sha1:bdda8a90cdd44ca822b21233ac988f92d1f20826</id>
<content type='text'>
* Fix 7441: CUDA boolean vector layout to use 1-byte elements

Boolean vectors (bool1, bool2, bool3, bool4) were incorrectly implemented
as integer-based types using 4 bytes per element instead of actual 1-byte
boolean elements on CUDA targets.

Changes:
- Update CUDA prelude to define boolean vectors as structs with bool fields
  instead of typedef aliases to integer vectors
- Implement CUDALayoutRulesImpl::GetVectorLayout to use 1-byte alignment
  for boolean vectors, matching actual CUDA memory layout behavior
- Update make_bool functions to populate struct fields correctly

This ensures boolean vectors have the same memory layout as bool[4] arrays:
- bool1: 1 byte (was 4 bytes)
- bool2: 2 bytes (was 8 bytes)
- bool3: 3 bytes (was 12 bytes)
- bool4: 4 bytes (was 16 bytes)

Fixes memory layout mismatch between Slang reflection API and actual
CUDA compilation, achieving 75% memory savings for boolean vector usage.

* Fix CI issues -

Add and update associated functions and operators

* Make boolX same as uchar

* Use align construct on struct for boolX

* Improve Test case for robust alignment checks

* Formatting

* Disable selected slangpy tests

* add metal check which is slightly different than cuda

* Test-1

* Test-2

* Test-3

* Test-4

* ReflectionChange

* cleanup and update

* _slang_select with plain bool is needed for reverse-loop-checkpoint-test</content>
</entry>
<entry>
<title>Add new capdef for lss intrinsics (#7427)</title>
<updated>2025-06-13T06:20:17+00:00</updated>
<author>
<name>Mukund Keshava</name>
<email>mkeshava@nvidia.com</email>
</author>
<published>2025-06-13T06:20:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=e9dcdd884ac616b4c9f5b258fb2a78d10bfce293'/>
<id>urn:sha1:e9dcdd884ac616b4c9f5b258fb2a78d10bfce293</id>
<content type='text'>
* Add new capdef for lss intrinsics

Fixes #7426

Raygen shaders need to be supported for only hitobject APIs. So we need
a special capability for that, instead of a common one.

* regenerate command line reference

---------

Co-authored-by: slangbot &lt;186143334+slangbot@users.noreply.github.com&gt;</content>
</entry>
<entry>
<title>Add optix support for coopvec (#7286)</title>
<updated>2025-06-10T04:48:24+00:00</updated>
<author>
<name>Mukund Keshava</name>
<email>mkeshava@nvidia.com</email>
</author>
<published>2025-06-10T04:48:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=d70da65a90ccd73439895a43b3958c0ea1441f35'/>
<id>urn:sha1:d70da65a90ccd73439895a43b3958c0ea1441f35</id>
<content type='text'>
* WiP: Add coopvec support for Optix

* format code

* fix minor issues

* Fix review comments

---------

Co-authored-by: slangbot &lt;186143334+slangbot@users.noreply.github.com&gt;</content>
</entry>
</feed>
