<feed xmlns='http://www.w3.org/2005/Atom'>
<title>slang.git/source/slang/slang-emit.cpp, branch master</title>
<subtitle>Making it easier to work with shaders</subtitle>
<id>https://git.yummers.dev/slang.git/atom?h=master</id>
<link rel='self' href='https://git.yummers.dev/slang.git/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/'/>
<updated>2025-10-18T01:37:45+00:00</updated>
<entry>
<title>Optionally disable entry point param cbuffer transform</title>
<updated>2025-10-18T01:37:45+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2025-10-11T20:34:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=482914e1b42dd44f4696b48c834136631672cebe'/>
<id>urn:sha1:482914e1b42dd44f4696b48c834136631672cebe</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Inline global constants for shader style CPU targets (#8686)</title>
<updated>2025-10-16T18:53:10+00:00</updated>
<author>
<name>Julius Ikkala</name>
<email>julius.ikkala@gmail.com</email>
</author>
<published>2025-10-16T18:53:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=537697d3aa21b418c348b1017005603668b6a4cb'/>
<id>urn:sha1:537697d3aa21b418c348b1017005603668b6a4cb</id>
<content type='text'>
On the shader-host-callable target, test `gh-4874.slang` generates IR
that contains global constants referencing global params. These need to
get inlined into functions, as otherwise
`introduceExplicitGlobalContext()` will fail with "no outer func at use
site for global", making the test crash the compiler.</content>
</entry>
<entry>
<title>Immutable access qualifier for pointers and use `__ldg` on cuda. (#8710)</title>
<updated>2025-10-16T03:59:47+00:00</updated>
<author>
<name>Yong He</name>
<email>yonghe@outlook.com</email>
</author>
<published>2025-10-16T03:59:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=01510f2c922af8629c7a730ef92a31fa83bd9f49'/>
<id>urn:sha1:01510f2c922af8629c7a730ef92a31fa83bd9f49</id>
<content type='text'>
This PR implements `Access.Immutable` to allow pointers to immutable
data.

The new type `ImmutablePtr&lt;T&gt;` is defined as an alias of `Ptr&lt;T,
Address.Immutable&gt;`.
By forming a immutable pointer, the programmer is conveying to the
compiler that the data at the pointer address will never change during
the execution of the current program. Therefore loads from immutable
pointers can be deduplicated by the compiler, and will translate to
`__ldg` when generating code for CUDA.

The SPIRV backend is not changed in this PR, since the current SPIRV
spec makes it very difficult to specify loads from immutable address
without generating tons of wrappers and boilerplate type declarations.
We would like to see the spec evolved a bit to around its support of
`NonWritable` physical storage pointers or immutable loads before we
attempt to express such immutability in SPIRV. For now we simply emit
ordinary pointers and loads when generating spirv.

---------

Co-authored-by: slangbot &lt;186143334+slangbot@users.noreply.github.com&gt;</content>
</entry>
<entry>
<title>Improve perf with `-separate-debug-info` (#8670)</title>
<updated>2025-10-10T02:22:22+00:00</updated>
<author>
<name>Jay Kwak</name>
<email>82421531+jkwak-work@users.noreply.github.com</email>
</author>
<published>2025-10-10T02:22:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=a3372605363cfe511b5248cb1393f59b25cb2483'/>
<id>urn:sha1:a3372605363cfe511b5248cb1393f59b25cb2483</id>
<content type='text'>
When Slang form a new spirv code without the debug info, List container
had to reserve the memory space before adding items in it.

This improves the given repro test time from 56 minutes to 6 minutes.</content>
</entry>
<entry>
<title>Defer `IRCastStorageToLogicalDeref` in lowerBufferElementType pass. (#8668)</title>
<updated>2025-10-10T01:30:24+00:00</updated>
<author>
<name>Yong He</name>
<email>yonghe@outlook.com</email>
</author>
<published>2025-10-10T01:30:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=e420f2f980813559b186a6a6bcd5540f74310d02'/>
<id>urn:sha1:e420f2f980813559b186a6a6bcd5540f74310d02</id>
<content type='text'>
Fix a regression on metal test.

In `lowerBufferElementTypeToStorageType` pass, not only we want to defer
an argument that is `CastStorageToLogical` to the callee, but also apply
the same defer logic to `CastStorageToLogicalDeref` as well.

Because `CastStorageToLogicalDeref` will appear as argumnet if
`lowerBufferElementTypeToStorageType` is run before we apply the
`in-&gt;borrow` transformation pass, which is the case for metal parameter
block legalization.</content>
</entry>
<entry>
<title>Small fix to buffer load specialization pass to allow more specialization to happen. (#8653)</title>
<updated>2025-10-10T01:26:28+00:00</updated>
<author>
<name>Yong He</name>
<email>yonghe@outlook.com</email>
</author>
<published>2025-10-10T01:26:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=3cf1f5a616917480c63b76aae906dc36b29e46ce'/>
<id>urn:sha1:3cf1f5a616917480c63b76aae906dc36b29e46ce</id>
<content type='text'>
This allows us to further cleanup unnecessary copies in the target code
we generate.

Part of effort of #8652.</content>
</entry>
<entry>
<title>Fix legalization crash when processing metal parameter blocks. (#8591)</title>
<updated>2025-10-03T19:52:26+00:00</updated>
<author>
<name>Yong He</name>
<email>yonghe@outlook.com</email>
</author>
<published>2025-10-03T19:52:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=6a2cf239a89340ed2985d04609499e8c4a2d8f89'/>
<id>urn:sha1:6a2cf239a89340ed2985d04609499e8c4a2d8f89</id>
<content type='text'>
Closes #7606.

When Slang compile for a bindful target, we will run the resource type
legalization pass to hoist resource typed struct fields outside of the
struct type and define them as global parameters and passing them around
via dedicated function parameters.
When we compile for a bindless target, we don't run this pass.
However, Metal is a hybrid bindful and bindless target. We need to run
type legalization for the constant buffer, but skip type legalization
for parameter block.

The previous attempt to support this behavior is to hack the type
legalization pass to return `LegalVal::simple` when it sees a
`ParameterBlock&lt;T&gt;`. However, whenever the code is accessing
`parameterBlock.someNestedField`, the type of the nested field may get a
`LegalType::tuple`, and now we will run into inconsistent scenarios
where we have a `LegalVal::simple` on the operand val, and but the
legalization logic is expecting that val to be a `LegalType::tuple`.
This breaks a lot of assumptions and invariants in the type legalization
pass, resulting unstable/fragile behavior.

To systematically solve this problem, this change generalizes the
existing legalize buffer element type pass to translate
`ParameterBlock&lt;Texture2D&gt;` (and similar cases) to
`ParameterBlock&lt;Texture2D.Handle&gt;`. So that such parameter block will
always be legalized to `LegalType:::simple` during type legalization,
and we will never run into any inconsistent cases. This allowed us to
get rid of the hacky logic in the type legalization pass to try to
workaround the inconsistencies.</content>
</entry>
<entry>
<title>Enhance buffer load specialization pass to specialize past field extracts. (#8547)</title>
<updated>2025-10-01T02:08:23+00:00</updated>
<author>
<name>Yong He</name>
<email>yonghe@outlook.com</email>
</author>
<published>2025-10-01T02:08:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=e4611e2e30a3e5969d402f5ed7e72706a0e3b024'/>
<id>urn:sha1:e4611e2e30a3e5969d402f5ed7e72706a0e3b024</id>
<content type='text'>
This allows us to specialize functions whose argument is a sub element
of a constant buffer, instead of being only applicable to entire buffer
element. Closes #8421.

This change also implements a proper heuristic to determine when to
specialize the calls and defer the buffer loads.

This PR addresses a pathological case exposed in
`slangpy\slangpy\benchmarks\test_benchmark_tensor.py`, which used to
take 27ms to finish, and now takes 1.25ms.


For example, given:
```
struct Bottom
{
    float bigArray[1024];

    [mutating]
    void setVal(int index, float value) { bigArray[index] = value; }
}

struct Root
{
    Bottom top[2];
    [mutating]
    void setTopVal(int x, int y, float value)
    {
        top[x].setVal(y, value);
    }
}

RWStructuredBuffer&lt;Root&gt; sb;

[shader("compute")]
[numthreads(1, 1, 1)]
void compute_main(uint3 tid: SV_DispatchThreadID)
{
    sb[0].setTopVal(1, 2, 100.0f);
}
```

We are now able to specialize the call to `setTopVal` into:
```
void compute_main(uint3 tid: SV_DispatchThreadID)
{
    setTopVal_specialized(0, 1, 2, 100.0f);
}

void setTopVal_specialized(int sbIdx, int x, int y, float value)
{
      Bottom_setVal_specialized(sbIdx, x, y, value);
}

void Bottom_setVal_specialized(int sbIdx, int x, int y, float value)
{
     sb[sbIdx].top[x].bigArray[y] = value;
}
```

And get rid of all unnecessary loads. Achieving this requires a
combination of function call specialization and buffer-load-defer pass.
The buffer-load-defer pass has been completely rewritten to be more
correct and avoid introducing redundant loads.

This PR also adds tests to make sure pointers, bindless handles, and
loads from structured buffer or constant buffers works as expected.</content>
</entry>
<entry>
<title>Rewriting the lower-buffer-element-type pass to avoid unnecessary packing/unpacking. (#8526)</title>
<updated>2025-09-30T00:45:08+00:00</updated>
<author>
<name>Yong He</name>
<email>yonghe@outlook.com</email>
</author>
<published>2025-09-30T00:45:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=a6deb5ed82cb8fc6b4f4c5c5fee264e09f97ff89'/>
<id>urn:sha1:a6deb5ed82cb8fc6b4f4c5c5fee264e09f97ff89</id>
<content type='text'>
Part of the effort to improve the performance of generated SPIRV code.

The existing lower-buffer-element-type pass works by loading the entire
buffer element content from memory, and translate it to logical type
stored in a local variable at the earliest reference of a buffer handle.
This means that is can generate inefficient code that reads more than
necessary.

Consider this example:
```
struct BigStruct { bool values[1024]; }
ConstantBuffer&lt;BigStruct&gt; cb;

void test(BigStruct v)
{
      if (v.values[0]) { printf("ok"); }
}

[numthreads(1,1,1)]
void computeMain()
{
    test(cb);
}
```

In IR, the `computeMain` function before lower-buffer-element-type pass
is something like following:
```
func test:
   %v = param : BigStruct
   %barr = fieldExtract(%v, "values")
   %element = elementExtract(%barr, 0)
    ... // uses %element 

func computeMain:
  %v = load(cb)
  call %test %v
```

The existing lower-buffer-element-type pass will rewrite the bool array
in `BigStruct` into `int` array so it is legal in SPIRV. However, it
does so by inserting the translation on the first `load` of the constant
buffer:

```
struct BigStruct_std430 {
    int values[1024];
}
var cb : ConstantBuffer&lt;BigStruct_std430&gt;;
func computeMain:
   %tmpVar : var&lt;BigStruct&gt;
    call %unpackStorage(%tmpVar, cb)
   %v : BigStruct = load %tmpVar
   call %test %v
```

This means that the entire array will be loaded and translated to int,
before calling `test`, which only uses one element. It turns out that
the downstream compiler isn't always able to optimize out this
inefficient translation/copy.

This PR completely rewrites the way buffer-element-type lowering is
handled to avoid producing this inefficient code. It works in two parts:
first we turn on the `transformParamsToConstRef` pass for SPIRV target
as well, so we will translate the `test` function to take the `v`
parameter as `constref`. The second part is a redesigned
buffer-element-type pass that defers the storage-type to logical-type
translation until a value is actually used by a `load` instruction.

In this example, after `transformParamsToConstRef`, the IR is:

```
func test:
   %v = param : ConstRef&lt;BigStruct&gt;
   %barr = fieldAddr(%v, "values")
   %elementPtr = elementAddr(%barr, 0)
   %element = load(%elementPtr)
    ... // uses %element 

func computeMain:
  call %test %cb
```

The new `buffer-element-type-lowering` pass will take this IR, and
insert translation at latest possible time across the entire call graph,
and translate the IR into:

```
func test:
   %v = param : ConstRef&lt;BigStruct_std430&gt;
   %barr = fieldAddr(%v, "values")
   %elementPtr : ptr&lt;int&gt; = elementAddr(%barr, 0)
   %element_int = load(%elementPtr)
    %element = cast(%element_int) : %bool
    ... // uses %element 

func computeMain:
  call %test %cb
```

In this new IR, there is no longer a load and conversion of the entire
array.

See new comment in `slang-ir-lower-buffer-element-type.cpp` for more
details of how the pass works.

This PR also address many other issues surfaced by turning on
`transformParamsToConstRef` pass on SPIRV backend.

---------

Co-authored-by: slangbot &lt;186143334+slangbot@users.noreply.github.com&gt;</content>
</entry>
<entry>
<title>Fix CUDA global variable initialization with constructor calls (#8340)</title>
<updated>2025-09-18T15:46:44+00:00</updated>
<author>
<name>Harsh Aggarwal (NVIDIA)</name>
<email>haaggarwal@nvidia.com</email>
</author>
<published>2025-09-18T15:46:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=54faa55c0bd4c4beede7337a76ed3a56d1eb4f15'/>
<id>urn:sha1:54faa55c0bd4c4beede7337a76ed3a56d1eb4f15</id>
<content type='text'>
Fix CUDA global variable initialization with constructor calls

Resolves CUDA compilation failure where global variables with struct
constructor
initialization generated illegal `__device__` variable runtime
initialization.

  **Problem:**
  ```cuda
  // Generated invalid CUDA code:
  __device__ static const Stuff_0 gStuff_0 = Stuff_x24init_0(args...);
// Error: "dynamic initialization is not supported for a __device__
variable"

  Root Cause Discovered:
Through extensive debugging, found that
moveGlobalVarInitializationToEntryPoints
pass only handled kIROp_GlobalVar instructions, but global constants
with
  constructor calls appeared as kIROp_Call instructions at module scope.

  Solution:
1. IR Pipeline Fix: Extended moveGlobalVarInitializationToEntryPoints to
detect
and transform module-level constructor calls into proper global
variables with
  entry-point initialization
2. Field Access Fix: Enhanced kIROp_FieldExtract logic to emit correct
-&gt;
  syntax for pointer types and address-of operations
3. Constructor Emission: Added CUDA-specific handling for constructor
calls

  Architecture:
- Transforms let %gStuff = call %Constructor(...) into kernel context
initialization
- Moves runtime initialization from global scope to entry-point
execution
  - Follows CUDA best practices for global state management

  Files:
- source/slang/slang-ir-explicit-global-init.cpp: Extended IR
transformation pass
- source/slang/slang-emit-c-like.cpp: Enhanced field access and foldable
value logic
- source/slang/slang-emit-cuda.cpp: Added CUDA-specific field extraction
handling

  Result:
  // Now generates proper CUDA code:
  struct KernelContext_0 { Stuff_0 gStuff_1; };
  // Runtime initialization in entry point:
  kernelContext_1.gStuff_1 = constructor_call();

  Fixes: tests/compute/type-legalize-global-with-init.slang</content>
</entry>
</feed>
