<feed xmlns='http://www.w3.org/2005/Atom'>
<title>slang.git/tests/compute/unbounded-array-of-array-syntax.slang.glsl, branch master</title>
<subtitle>Making it easier to work with shaders</subtitle>
<id>https://git.yummers.dev/slang.git/atom?h=master</id>
<link rel='self' href='https://git.yummers.dev/slang.git/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/'/>
<updated>2023-08-23T12:49:33+00:00</updated>
<entry>
<title>Lower all ByteAddressBuffer uses for SPIRV. (#3143)</title>
<updated>2023-08-23T12:49:33+00:00</updated>
<author>
<name>Yong He</name>
<email>yonghe@outlook.com</email>
</author>
<published>2023-08-23T12:49:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=6437c38e0a3c2c1daf36cb5e543dc0b467fa4b15'/>
<id>urn:sha1:6437c38e0a3c2c1daf36cb5e543dc0b467fa4b15</id>
<content type='text'>
Co-authored-by: Yong He &lt;yhe@nvidia.com&gt;</content>
</entry>
<entry>
<title>Compile append and consume structured buffers to glsl. (#3142)</title>
<updated>2023-08-22T00:07:34+00:00</updated>
<author>
<name>Yong He</name>
<email>yonghe@outlook.com</email>
</author>
<published>2023-08-22T00:07:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=bd6dbaf7c3ea720b4ed39904fe08878f9dcbd947'/>
<id>urn:sha1:bd6dbaf7c3ea720b4ed39904fe08878f9dcbd947</id>
<content type='text'>
* Compile append and consume structured buffers to glsl.

* Fix.

* Update CI config.

---------

Co-authored-by: Yong He &lt;yhe@nvidia.com&gt;</content>
</entry>
<entry>
<title>Support per field matrix layout (#3101)</title>
<updated>2023-08-14T23:23:19+00:00</updated>
<author>
<name>Yong He</name>
<email>yonghe@outlook.com</email>
</author>
<published>2023-08-14T23:23:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=661d6198bbb9857d3fdc6df477e0742ed0b0765c'/>
<id>urn:sha1:661d6198bbb9857d3fdc6df477e0742ed0b0765c</id>
<content type='text'>
* Support per field matrix layout

* Fix warnings.

* Fix.

* Fix tests.

* Fix spirv gen.

* Fix.

* More test fixes.

* Fix.

* Run only GPU tests on self-hosted servers.

* Remove -use-glsl-matrix-layout-modifier.

* Fix.

---------

Co-authored-by: Yong He &lt;yhe@nvidia.com&gt;</content>
</entry>
<entry>
<title>Be lenient on same-size unsigned-&gt;signed conversion. (#2913)</title>
<updated>2023-06-01T20:53:31+00:00</updated>
<author>
<name>Yong He</name>
<email>yonghe@outlook.com</email>
</author>
<published>2023-06-01T20:53:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=16cd361dd67471bcc355d1b3b72b0b022518088f'/>
<id>urn:sha1:16cd361dd67471bcc355d1b3b72b0b022518088f</id>
<content type='text'>
* Be lenient on same-size unsigned-&gt;signed conversion.

* Fix tests.

* Use 250.

* wip

* Fix.

* Fix tests.

* Fix.

---------

Co-authored-by: Yong He &lt;yhe@nvidia.com&gt;</content>
</entry>
<entry>
<title>Arithmetic simplifications and more IR clean up logic. (#2632)</title>
<updated>2023-02-08T02:36:35+00:00</updated>
<author>
<name>Yong He</name>
<email>yonghe@outlook.com</email>
</author>
<published>2023-02-08T02:36:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=4be623c52a6518eb86756a0369706c1d6670f6bb'/>
<id>urn:sha1:4be623c52a6518eb86756a0369706c1d6670f6bb</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Use IR pass to eliminate phi nodes (#2226)</title>
<updated>2022-05-10T14:18:03+00:00</updated>
<author>
<name>Theresa Foley</name>
<email>10618364+tangent-vector@users.noreply.github.com</email>
</author>
<published>2022-05-10T14:18:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=8c540f216f9fe9366bbe57732063607b41344b9f'/>
<id>urn:sha1:8c540f216f9fe9366bbe57732063607b41344b9f</id>
<content type='text'>
* Use IR pass to eliminate phi nodes

"Phi nodes" are one of the key contrivances that makes SSA (Static
Single Assignment) form work. Because SSA is so great for compiler
IRs, we kind of need to deal with phi nodes, but they also get in
the way because they don't have a direct analog in most lower-level
machine ISAs or execution models, nor in most of the high-level
languages a transpiler wants to emit. As a result a compiler like
ours needs to be able to eliminate the phi nodes from a program as
part of generating output code.

(For any clever people noting that SPIR-V supports phi nodes
directly: yes, it does. It doesn't need to and it probably *shouldn't*.
Anybody involved in the decision-making knows my reasoning, and
anybody else should feel free to ask me if they want the lecture.
Anyway...)

The basic idea of eliminating phi nodes is simple enough. We replace
each phi node with a temporary variable. Uses of the phi use values
loaded from the temporary. The operation of the phi itself
(assigning a value based on the branch taken) amounts to an assignment
into the temporary.
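
As a rough illustration of that idea (not the code in this change), here
is a minimal Python sketch over a made-up mini-IR. The dict layout and
the name `eliminate_phis` are assumptions for the example only; phi nodes
are modeled as block parameters, and the sketch ignores ordering hazards
between the stores it emits.

```python
# Hypothetical mini-IR, not Slang's actual data structures.
# Each block is a dict with:
#   "params": phi nodes, modeled as block parameters
#   "body":   list of instruction strings
#   "branch": (target block name, argument list) or None


def eliminate_phis(blocks):
    """Return (temps, new_blocks) with every block parameter turned into a temporary."""
    temps = []
    out = {}
    for name, block in blocks.items():
        # Each phi (block parameter) becomes a temporary variable.
        # Uses of the parameter name in a body now read that temporary.
        temps.extend(block["params"])
        out[name] = {"body": list(block["body"]), "branch": None}
    for name, block in blocks.items():
        if block["branch"] is None:
            continue
        target, args = block["branch"]
        params = blocks[target]["params"]
        # The phi's own operation becomes stores into the temporaries,
        # placed just before the jump that used to carry the arguments.
        for param, arg in zip(params, args):
            out[name]["body"].append(f"store {param}, {arg}")
        out[name]["branch"] = (target, [])
    return temps, out


example = {
    "entry": {"params": [], "body": [], "branch": ("loop", ["0"])},
    "loop": {"params": ["i"], "body": ["n = add i, 1"], "branch": ("loop", ["n"])},
}
temps, lowered = eliminate_phis(example)
```

In this example the loop counter `i` stops being a block parameter and
becomes a temporary that both the entry block and the back-edge store to.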

Previously, the Slang compiler dealt with phi nodes very late in
the process of generating code: in the middle of emitting strings
of source code in a high-level language like HLSL or GLSL. Doing the
work that late in compilation has two big drawbacks:

1. Our ability to emit clean and/or optimal code is limited because
we may not be able to make certain changes to the IR, or because we
cannot make use of additional information like a dominator tree that
might be available at other points in compilation.

2. Any other IR passes that relate to temporary variables won't be
able to see the variables that we generate for phi nodes. This could
raise issues with correctness (e.g., if we want to compute live-range
information for *all* temporary variables), or performance (we have no
way to run additional IR optimization passes after phis are eliminated).

This change addresses these problems by making the elimination of
phi nodes an explicit IR pass. Additional optimizations can easily be
run after this pass (although we'd need to be careful not to run
passes that could end up introducing new phis). The pass makes use
of the information available to it to try to produce code that will
emit to "clean" HLSL/GLSL.

The core of the pass is in `slang-ir-eliminate-phis.cpp`, and is
heavily commented, so I won't describe the approach in detail here.
There are two related issues that came up, though:

First, it turned out that our emit logic for local variables (`IRVar`
instructions) wasn't using the function we'd defined named `emitVar()`.
One worrying consequence of that oversight was that the `precise`
modifier would impact generated HLSL/GLSL for variables that turned
into SSA values (including phi nodes), but *not* for local variables
that had not been SSA'd (or that had been SSA'd and then de-SSA'd).
This change also fixes that bug; it is unclear how widespread the
impact of the original issue might be.

Second, generating explicit IR temporaries for phi nodes exposed a
pre-existing bug in the `slang-ir-restructure-scoping` pass. That pass
basically detects cases where we have an instruction `I` with a use
`U` such that the use follows the rules of SSA form ("def dominates
use," meaning `I` dominations `U`), but does not follow the more
restrictive scoping rules of high-level-language output (where a value
computed "inside" a loop is not automatically visible to code outside
the loop just because it dominates that code). That pass did not
correctly account for the case where `I` was a temporary variable.
It seems that case could not arise before now because we didn't have
any passes that would move `var`, `load`, or `store` operations out
of the basic block they started in. The fix for that pass was relatively
simple, and will make the whole thing more robust in case we add more
aggressive optimizations later.

* fixup: expected test output</content>
</entry>
<entry>
<title>Work to mitigate SPIR-V bloat (#1914)</title>
<updated>2021-07-21T19:52:08+00:00</updated>
<author>
<name>Theresa Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2021-07-21T19:52:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=23d406f8a3b325f91fecd9ad52bd510ded5f49a7'/>
<id>urn:sha1:23d406f8a3b325f91fecd9ad52bd510ded5f49a7</id>
<content type='text'>
* Work to mitigate SPIR-V bloat

SPIR-V is not an especially compact format, but some patterns in how Slang generates code and then runs it through `spirv-opt` lead to many redundant field-by-field copy operations being emitted. This change attempts to address some of the resulting bloat from the Slang side of things.

Note: experimentation shows that the bloat is less pronounced when running either *no* SPIR-V optimizations or *full* SPIR-V optimizations, so it is also likely that the bloat should be addressed by changing which `spirv-opt` passes the Slang compiler runs in default (`-O1`) builds. Such changes should come as a distinct pull request.

This change primarily does two things:

First, the code generation strategy for passing arguments to `out` and `inout` parameters has been changed. In the past, the compiler would *always* copy the argument value into a temporary, then pass the address of the temporary, and then write back the value after the call. The new code generation strategy attempts to identify when an argument value already has a simple address in memory and passes that address directly when possible. This eliminates many copy operations that occur before/after calls to functions with `out`/`inout` parameters.

Second, we introduce an IR optimization pass that detects call sites where the entire contents of a buffer (usually a constant buffer) is being passed to a callee function, such that many bytes are loaded and then passed even if only very few are used in the callee. The pass moves the load operations from the caller to a specialized version of the callee where possible (e.g., when the constant buffer in question is a global shader parameter). Doing this eliminates another major category of copies.

Notes:

* The IR lowering logic is complicated by the fact that several kinds of l-values (values that are usable as the destination of assignment, or for `out`/`inout` arguments) are not actually addressable. An easy example is a non-contiguous swizzle like `v.xwz` on a `float4`, where the value occupies 12 bytes, but not 12 consecutive bytes with a single address. There are many more corner cases like that and the IR lowering pass carries a lot of complexity to deal with them. A more systematic overhaul is due some time soon.

* The IR representation of `out` and `inout` parameters deserves some careful scrutiny when making these kinds of changes. The official semantics of `inout` in HLSL has been "copy in copy out" (and `out` is just "copy out") which is observably different from any solution that passes in the address of an l-value directly. By making this change we are saying that Slang's semantics are not precisely those of legacy HLSL, and that our semantics for `inout` parameters are closer to those of `inout` in Swift or of a mutable borrow in Rust. In the Swift case the implementation can freely pass the underlying storage of an l-value or the address of a temporary, and valid programs may not observe the difference. It is thus illegal to observe the value in a storage location while a mutation to that location is "in flight." All of this is way more detailed and technical than 99% of Slang users will ever care about, but importantly it gives us semantic cover to eliminate these copies in the IR, and also to emit output C++ code that implements `out` and `inout` as by-reference parameter passing.

* There was an existing generic pass for specializing functions based on call sites that uses a "template method" style of pattern to customize its behavior. That pass needed to be generalized to handle this use case because it had previously operated on the assumption that the "desire" to specialize a callee function must be driven by the parameter declarations of that function, and not on the argument values passed in. The code has been slightly refactored to allow the policy for specialization to consider both parameters and arguments.

* Unsurprisingly, a bunch of the GLSL (and thus SPIR-V) generated has changed with this work, so several baseline `.slang.glsl` files needed to be updated.

* This change is incomplete in that it does not address broader cases of buffer loads, including both partial loads from constant buffers (just loading one field, but a field that uses a "large" structure type), and loads from multi-element buffers (a load from a structured buffer where the element type is "large"). The main question in each of those cases is how to define how "large" a structure needs to be before we decide to try and sink loads into callee functions like this. In the worst case, sinking loads in this way may actually create *more* memory traffic (because the same values get loaded in multiple callee functions).

* fixup: run premake

* fixup: typo</content>
</entry>
<entry>
<title>Fix a bug in exiting SSA form for loops (#1293)</title>
<updated>2020-03-25T23:13:26+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2020-03-25T23:13:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=423b558bcc04c52626973475a3e4f758c6405f0c'/>
<id>urn:sha1:423b558bcc04c52626973475a3e4f758c6405f0c</id>
<content type='text'>
The Slang compiler was bitten by a known issue when translating from SSA form back to straight-line code. Given code like the following:

    int x = 0;
    int y = 1;
    while(...)
    {
        ...
        int t = x;
        x = y;
        y = t;
    }
    ...

The SSA construction pass will eliminate the temporary `t` and yield code something like:

        br(b, 0, 1);

    block b(param x : Int, param y : Int):
        ...
        br(b, y, x);

The loop-dependent variables have become parameters of the loop block, and the branches to the top of the loop pass the appropriate values for the next iteration (e.g., the jump that starts the loop sends in `0` and `1`).

The problem comes up when translating the back-edge that continues the loop out of SSA form. Our generated code will re-introduce temporaries for `x` and `y`:

    int x;
    int y;

    // jump into loop becomes:
    x = 0;
    y = 1;

    for(;;)
    {
        ...
        // back-edge becomes
        x = y;
        y = x;
        continue;
    }

The problem there is that we've naively translated a branch like `br(b, &lt;a&gt;, &lt;b&gt;)` into `x = &lt;a&gt;; y = &lt;b&gt;;` but that doesn't work correctly in the case where `&lt;b&gt;` is `x`, because we will have already clobbered the value of `x` with `&lt;a&gt;`.

The simplest fix is to introduce a temporary (just like the input code had), and generate:

    // back-edge becomes
    int t = x;
    x = y;
    y = t;

This change modifies the `emitPhiVarAssignments()` function so that it detects bad cases like the above and emits temporaries to work around the problem. A new test case is included that produced incorrect output before the change, and now produces the expected results.
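
For illustration only, the detection can be sketched in a few lines of Python. This is not the actual body of `emitPhiVarAssignments()`; the helper name `sequentialize`, the `_tmp` suffix, and the hard-coded `int` type are assumptions made for the example. The idea is that any branch argument naming a block parameter that gets assigned earlier in the sequence is first saved into a temporary:

```python
def sequentialize(params, args):
    """Turn the parallel assignment params := args into ordered statements.

    Hypothetical helper: params are the block parameter names (assumed unique)
    and args are the branch arguments, both given as strings.
    """
    saves = []
    renamed = {}
    for i, arg in enumerate(args):
        # The argument in slot i is clobbered if it names a parameter
        # that an earlier slot has already assigned.
        if arg in params[:i] and arg not in renamed:
            renamed[arg] = f"{arg}_tmp"
            saves.append(f"int {arg}_tmp = {arg};")
    assigns = []
    for param, arg in zip(params, args):
        source = renamed.get(arg, arg)
        if source != param:  # a self-assignment needs no statement
            assigns.append(f"{param} = {source};")
    # All saves run before any parameter is overwritten.
    return saves + assigns


swap = sequentialize(["x", "y"], ["y", "x"])
entry = sequentialize(["x", "y"], ["0", "1"])
```

For the back-edge `br(b, y, x)` this yields `int x_tmp = x; x = y; y = x_tmp;`, while the loop entry `br(b, 0, 1)` needs no temporary at all.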

A secondary change is folded in here that tries to guard against a more subtle version of the problem:

    for(...)
    {
        ...
        int x1 = x + 1;
        int y1 = y + 1;
        x = y1;
        y = x1;
    }

In this more complicated case, each of `x` and `y` is being assigned to a value derived from the other, but neither is being set using a block parameter directly, so the changes to `emitPhiVarAssignments()` do not apply.

The problem in this case would be if the `shouldFoldInstIntoUseSites()` logic decided to fold the computation of `x1` or `y1` into the branch instruction, resulting in:

    x = y + 1;
    y = x + 1;

which would again violate the semantics of the original code, because now there is an assignment to `x` before the computation of `x + 1`.
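
To make the difference concrete, here is a small Python sketch of one loop iteration in each form; the function names are made up for the example:

```python
def unfolded_step(x, y):
    # Mirrors the original loop body: both sums are computed first.
    x1 = x + 1
    y1 = y + 1
    x = y1
    y = x1
    return x, y


def folded_step(x, y):
    # Mirrors the folded output: the second line reads the updated x.
    x = y + 1
    y = x + 1
    return x, y


good = unfolded_step(0, 1)
bad = folded_step(0, 1)
```

Starting from `x = 0, y = 1`, the original code produces `(2, 1)` while the folded form produces `(2, 3)`.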

Right now it seems impossible to force this case to arise in practice, due to implementation details in how we generate IR code for loops. In particular, the block that computes the `x+1` and `y+1` values is currently always distinct from the block that branches back to the top of the loop, and we do not allow "folding" of sub-expressions from different blocks. It is possible, however, that future changes to the compiler could change the form of the IR we generate and make it possible for this problem to arise.

The right fix for this issue would be to say that we should introduce a temporary for any branch argument that "involves" a block parameter (whether directly using it or using it as a sub-expression). Unfortunately, the ad hoc approach we use for folding sub-expressions today means that testing if an operand "involves" something would be both expensive and unwieldy.

A more expedient fix is to disallow *all* folding of sub-expressions into unconditional branch instructions (the ones that can pass arguments to the target block), which is what I ended up implementing in this change. Making that defensive change alters the GLSL we output for some of our cross-compilation tests, in a way that required me to update the baseline/gold GLSL.

A better long-term fix for this whole space of issues would be to have the "de-SSA" operation be something we do explicitly on the IR. Such an IR pass would still need to be careful about the first issue addressed in this change, but the second one should (in principle) be a non-issue given that our emit/folding logic already handles code with explicit mutable local variables correctly.</content>
</entry>
<entry>
<title>Feature/gpu unbound array of array (#1083)</title>
<updated>2019-10-17T16:06:58+00:00</updated>
<author>
<name>jsmall-nvidia</name>
<email>jsmall@nvidia.com</email>
</author>
<published>2019-10-17T16:06:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=1102c53513837e7f052730b847270f533876833f'/>
<id>urn:sha1:1102c53513837e7f052730b847270f533876833f</id>
<content type='text'>
* Simple testing of unbounded array of array on GPU.

* Fix problem on CPU targets around NonUniformResourceIndex
Use the unbounded-array-of-array-syntax test for CPU and GPU tests.
</content>
</entry>
</feed>
