<feed xmlns='http://www.w3.org/2005/Atom'>
<title>slang.git/source/slang/slang-ir-specialize-buffer-load-arg.cpp, branch master</title>
<subtitle>Making it easier to work with shaders</subtitle>
<id>https://git.yummers.dev/slang.git/atom?h=master</id>
<link rel='self' href='https://git.yummers.dev/slang.git/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/'/>
<updated>2025-10-10T01:26:28+00:00</updated>
<entry>
<title>Small fix to buffer load specialization pass to allow more specialization to happen. (#8653)</title>
<updated>2025-10-10T01:26:28+00:00</updated>
<author>
<name>Yong He</name>
<email>yonghe@outlook.com</email>
</author>
<published>2025-10-10T01:26:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=3cf1f5a616917480c63b76aae906dc36b29e46ce'/>
<id>urn:sha1:3cf1f5a616917480c63b76aae906dc36b29e46ce</id>
<content type='text'>
This allows us to further clean up unnecessary copies in the target code
we generate.

Part of the effort for #8652.</content>
</entry>
<entry>
<title>Enhance buffer load specialization pass to specialize past field extracts. (#8547)</title>
<updated>2025-10-01T02:08:23+00:00</updated>
<author>
<name>Yong He</name>
<email>yonghe@outlook.com</email>
</author>
<published>2025-10-01T02:08:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=e4611e2e30a3e5969d402f5ed7e72706a0e3b024'/>
<id>urn:sha1:e4611e2e30a3e5969d402f5ed7e72706a0e3b024</id>
<content type='text'>
This allows us to specialize functions whose argument is a sub-element
of a constant buffer, rather than only functions whose argument is an
entire buffer element. Closes #8421.

This change also implements a proper heuristic to determine when to
specialize the calls and defer the buffer loads.

This PR addresses a pathological case exposed in
`slangpy\slangpy\benchmarks\test_benchmark_tensor.py`, which used to
take 27ms to finish, and now takes 1.25ms.


For example, given:
```
struct Bottom
{
    float bigArray[1024];

    [mutating]
    void setVal(int index, float value) { bigArray[index] = value; }
}

struct Root
{
    Bottom top[2];
    [mutating]
    void setTopVal(int x, int y, float value)
    {
        top[x].setVal(y, value);
    }
}

RWStructuredBuffer&lt;Root&gt; sb;

[shader("compute")]
[numthreads(1, 1, 1)]
void compute_main(uint3 tid: SV_DispatchThreadID)
{
    sb[0].setTopVal(1, 2, 100.0f);
}
```

We are now able to specialize the call to `setTopVal` into:
```
void compute_main(uint3 tid: SV_DispatchThreadID)
{
    setTopVal_specialized(0, 1, 2, 100.0f);
}

void setTopVal_specialized(int sbIdx, int x, int y, float value)
{
    Bottom_setVal_specialized(sbIdx, x, y, value);
}

void Bottom_setVal_specialized(int sbIdx, int x, int y, float value)
{
    sb[sbIdx].top[x].bigArray[y] = value;
}
```

This gets rid of all unnecessary loads. Achieving this requires
combining function call specialization with the buffer-load-defer pass.
The buffer-load-defer pass has been completely rewritten to be more
correct and avoid introducing redundant loads.
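
To make the saving concrete, here is a toy Python model (not Slang, and
not the compiler's algorithm) that counts how many floats move for the
example above. It assumes a naive load-call-store lowering for the
unspecialized path; `Buffer`, `set_top_val_naive`, and
`set_top_val_specialized` are hypothetical names used only in this
sketch.

```python
ARRAY_LEN = 1024  # matches bigArray[1024] in the example
NUM_BOTTOMS = 2   # matches Bottom top[2]


class Buffer:
    """Hypothetical buffer of Root elements that counts floats moved."""

    def __init__(self, num_roots):
        self.roots = [[[0.0] * ARRAY_LEN for _ in range(NUM_BOTTOMS)]
                      for _ in range(num_roots)]
        self.floats_moved = 0

    def load_root(self, i):
        # Whole-element load: every float of the Root is copied out.
        self.floats_moved += NUM_BOTTOMS * ARRAY_LEN
        return [list(bottom) for bottom in self.roots[i]]

    def store_root(self, i, root):
        # Whole-element store: every float of the Root is written back.
        self.floats_moved += NUM_BOTTOMS * ARRAY_LEN
        self.roots[i] = [list(bottom) for bottom in root]

    def store_float(self, i, x, y, value):
        # Deferred access: only the addressed float is touched.
        self.floats_moved += 1
        self.roots[i][x][y] = value


def set_top_val_naive(buf, i, x, y, value):
    root = buf.load_root(i)   # load the entire Root in the caller
    root[x][y] = value        # mutate the local copy
    buf.store_root(i, root)   # write the entire Root back


def set_top_val_specialized(buf, i, x, y, value):
    buf.store_float(i, x, y, value)  # index straight into the buffer


naive, specialized = Buffer(1), Buffer(1)
set_top_val_naive(naive, 0, 1, 2, 100.0)
set_top_val_specialized(specialized, 0, 1, 2, 100.0)
print(naive.floats_moved, specialized.floats_moved)  # 4096 versus 1
```

Both paths leave the buffer in the same state; only the amount of data
moved differs.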

This PR also adds tests to make sure pointers, bindless handles, and
loads from structured buffers or constant buffers work as expected.</content>
</entry>
<entry>
<title>format</title>
<updated>2024-10-29T06:49:26+00:00</updated>
<author>
<name>Ellie Hermaszewska</name>
<email>ellieh@nvidia.com</email>
</author>
<published>2024-10-29T06:49:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=f65d756bff8d4c5cbc15bd0322a2ae8e6b896a21'/>
<id>urn:sha1:f65d756bff8d4c5cbc15bd0322a2ae8e6b896a21</id>
<content type='text'>
* format

* Minor test fixes

* enable checking cpp format in ci</content>
</entry>
<entry>
<title>Fix most of the disabled warnings on gcc/clang (#2839)</title>
<updated>2023-04-27T04:36:59+00:00</updated>
<author>
<name>Ellie Hermaszewska</name>
<email>ellieh@nvidia.com</email>
</author>
<published>2023-04-27T04:36:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=3acbe8145c60f4d1e7a180b4602a94269a489df5'/>
<id>urn:sha1:3acbe8145c60f4d1e7a180b4602a94269a489df5</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Refactor: eliminate BackEndCompileRequest (#2178)</title>
<updated>2022-04-11T19:01:31+00:00</updated>
<author>
<name>Theresa Foley</name>
<email>10618364+tangent-vector@users.noreply.github.com</email>
</author>
<published>2022-04-11T19:01:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=1409a5379d38ac153eabb4c19c7f4463a8b030ca'/>
<id>urn:sha1:1409a5379d38ac153eabb4c19c7f4463a8b030ca</id>
<content type='text'>
An earlier refactoring pass over the compiler codebase split the
type that had been called `CompileRequest` into three distinct
pieces:

* `FrontEndCompileRequest` which was supposed to own state and
  options related to running the compiler front end and producing
  IR + reflection (e.g., what translation units and source
  files/strings are included).

* `BackEndCompileRequest` which was supposed to own state and options
  related to running the compiler back end to translate the IR
  for a `ComponentType` (program) into output code. (Note that the
  `BackEndCompileRequest` was conceived of as orthogonal to the
  `TargetRequest`s, which store per-target and target-specific
  options.)

* `EndToEndCompileRequest` which was an umbrella object that owns
  separate front-end and back-end requests, plus any state that is
  only relevant when doing a true end-to-end compile (such as the
  kinds of compiles initiated with `slangc`). As originally conceived,
  the only state that this type was supposed to own was stuff related
  to "pass-through" compilation, as well as state related to writing
  of generated code to output files.

That refactoring work was very useful at the time, because it allowed
us to "scrub" the back end compilation steps to remove all
dependencies on front-end and AST state (this was important for our
goals of enabling linking and codegen from serialized Slang IR).

At this point, however, it is clear that the hierarchy that was built
up serves very little purpose:

* The `BackEndCompileRequest` type is only used in two places:

    * As part of an `EndToEndCompileRequest`, where the settings on
      the `BackEndCompileRequest` can be configured, but only through
      the `EndToEndCompileRequest`

    * As part of on-demand code generation through the `IComponentType`
      APIs. In this case, the settings stored on the
      `BackEndCompileRequest` are not accessible to the application
      at all, and will always use their default values, so that
      instantiating a "request" object doesn't really make any sense.

* The `FrontEndCompileRequest` type has a similar situation:

    * Front-end compilation as part of an `EndToEndCompileRequest`
      supports user configuration of `FrontEndCompileRequest` settings,
      but only through the `EndToEndCompileRequest`

    * Front-end compilation triggered by an `import` or a `loadModule()`
      call does not support user configuration of settings at all. It
      will always derive all relevant settings from those on the
      session ("linkage").

In addition, subsequent changes have been made to the compiler that
show a bit of a "code smell" and/or forward-looking worries for this
decomposition:

* In some cases we've had to add the same setting to multiple types
  in the breakdown (front-end, back-end, end-to-end, linkage, target,
  etc.) which makes it harder for us to validate that all the possible
  mixtures of state work correctly.

* Related to the above, in some cases we have manual logic that copies
  state from one of the objects in the breakdown to another, in order
  to ensure that the user's intention is actually followed.

* As a forward-looking concern, it seems that developers have sometimes
  added new configuration options and state to places that don't really
  make sense according to the rationale of the original decomposition
  (e.g., we probably don't want to have a lot of state that is only
  available via end-to-end requests, given that the API structure is
  meant to push users *away* from end-to-end compiles).

As a result of all of the above, I've been planning a large refactor
with the following big-picture goals:

* Eliminate `BackEndCompileRequest`

    * Move all relevant state/options from the back-end request to
      the end-to-end request, since that is the only place they could
      be set anyway.

    * Introduce a transient "context" type to be used for the duration
      of code generation that serves the main functions that back-end
      requests really served in the codebase

* Make `EndToEndCompileRequest` be a subclass of
  `FrontEndCompileRequest`

    * Consider adding a transient "context" type for front-end
      compiles that can be used in `import`-like cases rather than
      needing a full front-end request object. If this works, then
      eliminate `FrontEndCompileRequest` and be back to a world with
      just a single `CompileRequest` type

* Move *all* compiler configuration options to a distinct type (named
  something like `CompilerConfig` or `CompilerOptions` or whatever)
  which stores settings as key-value pairs, and has a notion of
  "inheritance" such that one configuration can extend or build on top
  of another. Make all the relevant types use this catch-all structure
  instead of redundantly storing flags in many places.
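
The last bullet is only a plan, and this change does not implement it.
A minimal Python sketch of the idea (key-value settings with a parent
to fall back on) might look like the following. The class name
`CompilerOptions` comes from the bullet; the `get` and `set` methods
and the option keys are hypothetical.

```python
class CompilerOptions:
    """Key-value settings that fall back to an optional parent configuration."""

    def __init__(self, parent=None):
        self._parent = parent
        self._values = {}

    def set(self, key, value):
        self._values[key] = value

    def get(self, key, default=None):
        # Walk up the inheritance chain until some configuration defines the key.
        node = self
        while node is not None:
            if key in node._values:
                return node._values[key]
            node = node._parent
        return default


# Hypothetical keys: a session-wide configuration extended by one target.
session = CompilerOptions()
session.set("optimization", "O1")
session.set("debug-info", False)

target = CompilerOptions(parent=session)
target.set("optimization", "O3")

print(target.get("optimization"))  # O3, overridden on the target
print(target.get("debug-info"))    # False, inherited from the session
```

With a structure like this, a setting lives in exactly one place and
derived configurations override it, so no manual copying between
objects is needed.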

This change deals with the first of those bullets: removal of
`BackEndCompileRequest`. The addition of the `CodeGenContext` type is
perhaps an unnecessary additional step, but making that change helps
clean up a bunch of the code related to per-target code generation,
so I think it is the right choice.

Co-authored-by: Yong He &lt;yonghe@outlook.com&gt;</content>
</entry>
<entry>
<title>Work to mitigate SPIR-V bloat (#1914)</title>
<updated>2021-07-21T19:52:08+00:00</updated>
<author>
<name>Theresa Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2021-07-21T19:52:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=23d406f8a3b325f91fecd9ad52bd510ded5f49a7'/>
<id>urn:sha1:23d406f8a3b325f91fecd9ad52bd510ded5f49a7</id>
<content type='text'>
* Work to mitigate SPIR-V bloat

SPIR-V is not an especially compact format, but some patterns in how Slang generates code and then runs it through `spirv-opt` lead to many redundant field-by-field copy operations being emitted. This change attempts to address some of the resulting bloat from the Slang side of things.

Note: experimentation shows that the bloat is less pronounced when running either *no* SPIR-V optimizations or *full* SPIR-V optimizations, so it is also likely that the bloat should be addressed by changing which `spirv-opt` passes the Slang compiler runs in default (`-O1`) builds. Such changes should come as a distinct pull request.

This change primarily does two things:

First, the code generation strategy for passing arguments to `out` and `inout` parameters has been changed. In the past, the compiler would *always* copy the argument value into a temporary, then pass the address of the temporary, and then write back the value after the call. The new code generation strategy attempts to identify when an argument value already has a simple address in memory and passes that address directly when possible. This eliminates many copy operations that occur before/after calls to functions with `out`/`inout` parameters.

Second, we introduce an IR optimization pass that detects call sites where the entire contents of a buffer (usually a constant buffer) is being passed to a callee function, such that many bytes are loaded and then passed even if only very few are used in the callee. The pass moves the load operations from the caller to a specialized version of the callee where possible (e.g., when the constant buffer in question is a global shader parameter). Doing this eliminates another major category of copies.
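
As a rough illustration of the first point, the toy Python model below contrasts the old copy-in/copy-out strategy with passing the storage directly. It is not Slang code and not how the compiler is implemented; the names `Cell`, `call_copy_in_out`, and `call_by_address` are made up for this sketch, which only counts field copies.

```python
class Cell:
    """Hypothetical stand-in for an addressable l-value holding a struct."""

    def __init__(self, fields):
        self.fields = list(fields)


copies = {"count": 0}


def copy_fields(fields):
    # Model a field-by-field copy and record how many fields moved.
    copies["count"] += len(fields)
    return list(fields)


def set_first(target, value):
    # The callee: writes one field through its inout parameter.
    target.fields[0] = value


def call_copy_in_out(cell, value):
    # Old strategy: copy into a temporary, call, then write the result back.
    temp = Cell(copy_fields(cell.fields))
    set_first(temp, value)
    cell.fields = copy_fields(temp.fields)


def call_by_address(cell, value):
    # New strategy: the argument already has a simple address, so pass it.
    set_first(cell, value)


a = Cell([0.0] * 16)
call_copy_in_out(a, 1.0)
old_copies = copies["count"]

copies["count"] = 0
b = Cell([0.0] * 16)
call_by_address(b, 1.0)
new_copies = copies["count"]

print(old_copies, new_copies)  # 32 field copies versus 0
```

Both calls leave the argument with the same contents; the difference is the copy traffic around the call.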

Notes:

* The IR lowering logic is complicated by the fact that several kinds of l-values (values that are usable as the destination of assignment, or for `out`/`inout` arguments) are not actually addressable. An easy example is a non-contiguous swizzle like `v.xwz` on a `float4`, where the value occupies 12 bytes, but not 12 consecutive bytes with a single address. There are many more corner cases like that and the IR lowering pass carries a lot of complexity to deal with them. A more systematic overhaul is due some time soon.

* The IR representation of `out` and `inout` parameters deserves some careful scrutiny when making these kinds of changes. The official semantics of `inout` in HLSL has been "copy in copy out" (and `out` is just "copy out") which is observably different from any solution that passes in the address of an l-value directly. By making this change we are saying that Slang's semantics are not precisely those of legacy HLSL, and that our semantics for `inout` parameters are closer to those of `inout` in Swift or of a mutable borrow in Rust. In the Swift case the implementation can freely pass the underlying storage of an l-value or the address of a temporary, and valid programs may not observe the difference. It is thus illegal to observe the value in a storage location while a mutation to that location is "in flight." All of this is way more detailed and technical than 99% of Slang users will ever care about, but importantly it gives us semantic cover to eliminate these copies in the IR, and also to emit output C++ code that implements `out` and `inout` as by-reference parameter passing.

* There was an existing generic pass for specializing functions based on call sites that uses a "template method" style of pattern to customize its behavior. That pass needed to be generalized to handle this use case because it had previously operated on the assumption that the "desire" to specialize a callee function must be driven by the parameter declarations of that function, and not on the argument values passed in. The code has been slightly refactored to allow the policy for specialization to consider both parameters and arguments.

* Unsurprisingly, a bunch of the GLSL (and thus SPIR-V) generated has changed with this work, so several baseline `.slang.glsl` files needed to be updated.

* This change is incomplete in that it does not address broader cases of buffer loads, including both partial loads from constant buffers (just loading one field, but a field that uses a "large" structure type), and loads from multi-element buffers (a load from a structured buffer where the element type is "large"). The main question in each of those cases is how to define how "large" a structure needs to be before we decide to try and sink loads into callee functions like this. In the worst case, sinking loads in this way may actually create *more* memory traffic (because the same values get loaded in multiple callee functions).

* fixup: run premake

* fixup: typo</content>
</entry>
</feed>
