<feed xmlns='http://www.w3.org/2005/Atom'>
<title>slang.git/prelude/slang-cpp-types.h, branch master</title>
<subtitle>Making it easier to work with shaders</subtitle>
<id>https://git.yummers.dev/slang.git/atom?h=master</id>
<link rel='self' href='https://git.yummers.dev/slang.git/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/'/>
<updated>2025-10-01T02:08:23+00:00</updated>
<entry>
<title>Enhance buffer load specialization pass to specialize past field extracts. (#8547)</title>
<updated>2025-10-01T02:08:23+00:00</updated>
<author>
<name>Yong He</name>
<email>yonghe@outlook.com</email>
</author>
<published>2025-10-01T02:08:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=e4611e2e30a3e5969d402f5ed7e72706a0e3b024'/>
<id>urn:sha1:e4611e2e30a3e5969d402f5ed7e72706a0e3b024</id>
<content type='text'>
This allows us to specialize functions whose argument is a sub-element
of a constant buffer, instead of only functions whose argument is an
entire buffer element. Closes #8421.

This change also implements a proper heuristic to determine when to
specialize the calls and defer the buffer loads.

This PR addresses a pathological case exposed in
`slangpy\slangpy\benchmarks\test_benchmark_tensor.py`, which used to
take 27ms to finish, and now takes 1.25ms.


For example, given:
```
struct Bottom
{
    float bigArray[1024];

    [mutating]
    void setVal(int index, float value) { bigArray[index] = value; }
}

struct Root
{
    Bottom top[2];
    [mutating]
    void setTopVal(int x, int y, float value)
    {
        top[x].setVal(y, value);
    }
}

RWStructuredBuffer&lt;Root&gt; sb;

[shader("compute")]
[numthreads(1, 1, 1)]
void compute_main(uint3 tid: SV_DispatchThreadID)
{
    sb[0].setTopVal(1, 2, 100.0f);
}
```

We are now able to specialize the call to `setTopVal` into:
```
void compute_main(uint3 tid: SV_DispatchThreadID)
{
    setTopVal_specialized(0, 1, 2, 100.0f);
}

void setTopVal_specialized(int sbIdx, int x, int y, float value)
{
    Bottom_setVal_specialized(sbIdx, x, y, value);
}

void Bottom_setVal_specialized(int sbIdx, int x, int y, float value)
{
    sb[sbIdx].top[x].bigArray[y] = value;
}
```

This gets rid of all unnecessary loads. Achieving this requires combining
function call specialization with the buffer-load-defer pass.
The buffer-load-defer pass has been completely rewritten to be more
correct and to avoid introducing redundant loads.
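
As a rough illustration of why deferring the load matters, the following C++
sketch contrasts two shapes of the same update. It is an analogy only, not the
compiler's output: it assumes the unspecialized call behaves like
copy-in/copy-out of the whole buffer element, and `setTopValViaCopy`,
`setTopValInPlace`, `demoViaCopy` and `demoInPlace` are hypothetical names.

```cpp
// C++ analogy of the Slang example above; all function names are hypothetical.
struct Bottom
{
    float bigArray[1024];
};

struct Root
{
    Bottom top[2];
};

// Assumed unspecialized shape: the whole buffer element is loaded into a
// temporary, one float is changed, and the whole element is stored back.
void setTopValViaCopy(Root* sb, int sbIdx, int x, int y, float value)
{
    Root tmp = sb[sbIdx];            // copies 2 * 1024 floats out of the buffer
    tmp.top[x].bigArray[y] = value;  // the only write that matters
    sb[sbIdx] = tmp;                 // copies 2 * 1024 floats back
}

// Specialized shape: the access chain is applied directly to the buffer,
// so only one float is written.
void setTopValInPlace(Root* sb, int sbIdx, int x, int y, float value)
{
    sb[sbIdx].top[x].bigArray[y] = value;
}

// Small drivers that return the element that was written.
float demoViaCopy()
{
    static Root buffer[1] = {};
    setTopValViaCopy(buffer, 0, 1, 2, 100.0f);
    return buffer[0].top[1].bigArray[2];
}

float demoInPlace()
{
    static Root buffer[1] = {};
    setTopValInPlace(buffer, 0, 1, 2, 100.0f);
    return buffer[0].top[1].bigArray[2];
}
```

Both drivers leave the same value in the buffer; the difference is only in how
much data is moved to get there.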

This PR also adds tests to make sure pointers, bindless handles, and
loads from structured buffers or constant buffers work as expected.</content>
</entry>
<entry>
<title>Initial copy elision pass (#8042)</title>
<updated>2025-08-07T07:22:22+00:00</updated>
<author>
<name>ArielG-NV</name>
<email>159081215+ArielG-NV@users.noreply.github.com</email>
</author>
<published>2025-08-07T07:22:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=063cbeaaea2fb00a10c6058ea4a9632092772ea5'/>
<id>urn:sha1:063cbeaaea2fb00a10c6058ea4a9632092772ea5</id>
<content type='text'>
Fixes #7574

Changes:
* Add an initial (fairly simple) optimization pass which is able to
eliminate redundant copies.
* Our existing optimizer passes already remove redundant loads/stores very
robustly, so this pass focuses on other cases of copy elimination
* The primary approach is to turn every `in T` parameter, where `T` is
trivial to copy, into a `__constref T`. We then (depending on the scenario)
manually insert a variable+load if pass-by-reference is not possible;
otherwise we pass by `constref`.
* Added optimizations to eliminate redundant code which causes
`constref` to fail to compile
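
The effect of rewriting an `in T` parameter as `__constref T` can be sketched
with a loose C++ analogy: by-value versus const-reference parameters. This is
not the Slang IR pass itself, and `Big`, `firstByValue`, `firstByConstRef` and
the two `copiesFor...` helpers are hypothetical names used only for this sketch.

<![CDATA[
```cpp
// Hypothetical stand-in for a large struct passed to a function; it counts
// how many times it is copied. None of these names come from the Slang sources.
struct Big
{
    static int copies;
    float data[256];

    Big()
    {
        for (int i = 0; i < 256; ++i)
            data[i] = 1.0f;
    }

    Big(const Big& other)
    {
        for (int i = 0; i < 256; ++i)
            data[i] = other.data[i];
        ++copies;
    }
};

int Big::copies = 0;

// Analogue of an `in T` parameter: the callee works on its own copy.
float firstByValue(Big b)
{
    return b.data[0];
}

// Analogue of a `__constref T` parameter: the callee reads the caller's object.
float firstByConstRef(const Big& b)
{
    return b.data[0];
}

// Each helper returns how many copies a single call made.
int copiesForByValueCall()
{
    Big big;
    int before = Big::copies;
    firstByValue(big);
    return Big::copies - before;
}

int copiesForByConstRefCall()
{
    Big big;
    int before = Big::copies;
    firstByConstRef(big);
    return Big::copies - before;
}
```
]]>

In this sketch the by-value call copies the struct once, while the
const-reference call copies nothing.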

---------

Co-authored-by: Harsh Aggarwal &lt;haaggarwal@nvidia.com&gt;
Co-authored-by: Claude &lt;noreply@anthropic.com&gt;
Co-authored-by: slangbot &lt;ellieh+slangbot@nvidia.com&gt;
Co-authored-by: slangbot &lt;186143334+slangbot@users.noreply.github.com&gt;</content>
</entry>
<entry>
<title>format</title>
<updated>2024-10-29T06:49:26+00:00</updated>
<author>
<name>Ellie Hermaszewska</name>
<email>ellieh@nvidia.com</email>
</author>
<published>2024-10-29T06:49:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=f65d756bff8d4c5cbc15bd0322a2ae8e6b896a21'/>
<id>urn:sha1:f65d756bff8d4c5cbc15bd0322a2ae8e6b896a21</id>
<content type='text'>
* format

* Minor test fixes

* enable checking cpp format in ci</content>
</entry>
<entry>
<title>Implement 8.14-8.19 of OpenGL-GLSL specification</title>
<updated>2024-04-03T13:30:46+00:00</updated>
<author>
<name>ArielG-NV</name>
<email>159081215+ArielG-NV@users.noreply.github.com</email>
</author>
<published>2024-04-03T13:30:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=a697b2c6707ee699cb734a03fa529dd214ac66cc'/>
<id>urn:sha1:a697b2c6707ee699cb734a03fa529dd214ac66cc</id>
<content type='text'>
The following PR implements 8.14-8.19 of the [OpenGL-GLSL specification](https://registry.khronos.org/OpenGL/specs/gl/GLSLangSpec.4.60.pdf).

Fully implements all functions and built-in types; resolves https://github.com/shader-slang/slang/issues/3692 for GLSL &amp; SPIR-V targets.

_Notes:_
Testing Tools:
* Fragment shaders cannot test computational results, so only the emitted OpCodes are checked.

Implementation Notes:
* SubpassInput requires an unknown image format.
* SubpassInput is disjoint from TextureType: __SubpassImpl (.slang) &amp; SubpassInputType (Compiler) to reduce code generation required.
* SubpassInput required an additional input layout modifier, input_attachment_index, which was added as a new parameter binding attribute. Since the following qualifiers can overlap with different resources (`layout(input_attachment_index = 0, binding = 0, set = 0)`), input_attachment_index is checked for overlapping resource bindings separately from other qualifiers with `LayoutResourceKind::InputAttachmentIndex`.
* `GLSLInputAttachmentIndexLayoutModifier` was added to enforce function parameters only accepting `in` decorated variables.
* Emitting of `in` decorated variables had to be modified so that the variable can be emitted directly into function calls when used as a parameter. Normally Slang has a "global variable" shadow a "global parameter" through a copy. This does not work here and is solved using `GlobalVariableShadowingGlobalParameterDecoration` to build a relationship from "global variable" to "global parameter"; we then resolve this relationship and replace uses of the "global variable" later in compilation.
* The `AtomicCounterMemory` memory constraint requires `OpCapability AtomicStorage`, and `AtomicStorage` is invalid for Vulkan targets. glslang outputs `AtomicCounterMemory` as a memory constraint for `barrier`, `memoryBarrier`, and `groupMemoryBarrier`. This compiles as valid SPIR-V for Vulkan since `OpCapability AtomicStorage` is not declared. This behavior of glslang is undefined as per [3.31.Capability of the SPIR-V specification](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_capability). We will omit `AtomicCounterMemory` from our barrier calls.</content>
</entry>
<entry>
<title>Improve cpp prelude. (#3725)</title>
<updated>2024-03-09T02:09:13+00:00</updated>
<author>
<name>Yong He</name>
<email>yonghe@outlook.com</email>
</author>
<published>2024-03-09T02:09:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=5074ee7c8a7f154273ed26815a8018df27dc03bb'/>
<id>urn:sha1:5074ee7c8a7f154273ed26815a8018df27dc03bb</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Add PyTorch C++ binding generation. (#2734)</title>
<updated>2023-03-26T20:59:11+00:00</updated>
<author>
<name>Yong He</name>
<email>yonghe@outlook.com</email>
</author>
<published>2023-03-26T20:59:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=d64ee86a3130f8eeb75d09193c38c621d7565eba'/>
<id>urn:sha1:d64ee86a3130f8eeb75d09193c38c621d7565eba</id>
<content type='text'>
* Add PyTorch C++ binding generation.

* fix

---------

Co-authored-by: Yong He &lt;yhe@nvidia.com&gt;</content>
</entry>
<entry>
<title>Overhaul global inst deduplication and cpp/cuda backend. (#2654)</title>
<updated>2023-02-16T21:55:32+00:00</updated>
<author>
<name>Yong He</name>
<email>yonghe@outlook.com</email>
</author>
<published>2023-02-16T21:55:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=4c4826d47eeef4675daae4ae53ff76f4d5ebd84a'/>
<id>urn:sha1:4c4826d47eeef4675daae4ae53ff76f4d5ebd84a</id>
<content type='text'>
* Overhaul global inst deduplication and cpp/cuda backend.

* Update IR documentation.

---------

Co-authored-by: Yong He &lt;yhe@nvidia.com&gt;</content>
</entry>
<entry>
<title>Fix code generation for matrix reshape. (#2568)</title>
<updated>2022-12-14T17:37:55+00:00</updated>
<author>
<name>Yong He</name>
<email>yonghe@outlook.com</email>
</author>
<published>2022-12-14T17:37:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=1c2c4908c64396de2d1bee197c8f000ae2fed0fc'/>
<id>urn:sha1:1c2c4908c64396de2d1bee197c8f000ae2fed0fc</id>
<content type='text'>
Co-authored-by: Yong He &lt;yhe@nvidia.com&gt;</content>
</entry>
<entry>
<title>Improved bounds checking for C++/CUDA (#2263)</title>
<updated>2022-06-08T23:51:49+00:00</updated>
<author>
<name>jsmall-nvidia</name>
<email>jsmall@nvidia.com</email>
</author>
<published>2022-06-08T23:51:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=4db6bd3cd6da1871fdac520c280bd9f933e48489'/>
<id>urn:sha1:4db6bd3cd6da1871fdac520c280bd9f933e48489</id>
<content type='text'>
* #include an absolute path didn't work - because paths were taken to always be relative.

* Use TerminatedUnownedStringSlice for literals in output C++.

* Remove Escape/Unescape functions used in slang-token-reader.cpp
Add target type of 'host-cpp' etc to map to the target types.

* Fix some corner cases around string encoding.

* Added unit test for string escaping.
Fixed some assorted escaping bugs.

* Updated test output.

* Added decode test.

* Stop using hex output, to get around 'greedy' aspect. Use octal instead.

* Added HostHostCallable
Small changes to use ArtifactDesc/Info instead of large switches.

* Fix C++ emit to handle arbitrary function export.

* Add options handling for callable without an output being specified.

* Can compile with COM interface. Added example using com interface.

* Use the IR Ptr type instead of hack in C++ emit for interfaces.

* Fix issue with outputting the COM call when ptr is used.

* Fix crash issue on compilation failure.

* Add support for __global.

* Added `ActualGlobalRate`
Added special handling around globals and COM interfaces.
Tested out in cpu-com-example.

* Fix typo in NodeBase.

* Support for accessing globals by name working.

* Bounds checking for C++
Improved bounds checks for CUDA.

* Check that actual global initialization is working.

* Fix typo.

* Refactor the com replacement such that it doesn't need a cache or do anything special with GlobalVar.

* Fix typo in CUDA prelude.

* Remove context.
Only create replacement if needed.

* Split out COM host-callable into a unit-test.

* host-callable com testing on C++ and llvm.

* Comment around the COM ptr replacement.

* WIP Zero bound test.

* Disable com test on vs 32 bit.
Fix C++ prelude

* Disable 32 bit targets testing com host-callable.

* For now disable zero index test.

* Enable bounds checking for CPU/CUDA.

* Small fixes.
Disable CUDA zero index bound fix.

* Add test result for bound check.

* Work around for index wrapping issue.

* Added Fixed array test.

* Only enable prelude asserts via SLANG_PRELUDE_ENABLE_ASSERT (unless defined by the user)</content>
</entry>
<entry>
<title>First Slang LLVM integration (#1934)</title>
<updated>2021-09-10T20:31:26+00:00</updated>
<author>
<name>jsmall-nvidia</name>
<email>jsmall@nvidia.com</email>
</author>
<published>2021-09-10T20:31:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=27ce5eb0de9f792f3e433bcb239c07d79371cf45'/>
<id>urn:sha1:27ce5eb0de9f792f3e433bcb239c07d79371cf45</id>
<content type='text'>
* #include an absolute path didn't work - because paths were taken to always be relative.

* First integration with 'slang-llvm'.

* Fix project.

* Fix test output.

* First pass assert support.

* Add inline impls for min and max.

* Add inline abs impl for llvm.

* Make abs not use ternary op

* Fix typo in slang-llvm.h

* Sundry fixes to make remaining tests using llvm backend pass.</content>
</entry>
</feed>
