<feed xmlns='http://www.w3.org/2005/Atom'>
<title>slang.git/source/slang/ir-ssa.cpp, branch master</title>
<subtitle>Making it easier to work with shaders</subtitle>
<id>https://git.yummers.dev/slang.git/atom?h=master</id>
<link rel='self' href='https://git.yummers.dev/slang.git/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/'/>
<updated>2019-05-31T21:20:37+00:00</updated>
<entry>
<title>Use slang- prefix on slang compiler and core source (#973)</title>
<updated>2019-05-31T21:20:37+00:00</updated>
<author>
<name>jsmall-nvidia</name>
<email>jsmall@nvidia.com</email>
</author>
<published>2019-05-31T21:20:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=6cbc3929a54d37bd23cb5efa8e3320ba02f78b2f'/>
<id>urn:sha1:6cbc3929a54d37bd23cb5efa8e3320ba02f78b2f</id>
<content type='text'>
* Prefixing source files in source/slang with slang-

* Prefix source in source/slang with slang- prefix.

* Rename core source files with slang- prefix.

* Update project files.

* Fix problems from automatic merge.
</content>
</entry>
<entry>
<title>String/List closer to conventions, and use Index type (#959)</title>
<updated>2019-04-29T21:03:46+00:00</updated>
<author>
<name>jsmall-nvidia</name>
<email>jsmall@nvidia.com</email>
</author>
<published>2019-04-29T21:03:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=4880789e3003441732cca4471091563f36531635'/>
<id>urn:sha1:4880789e3003441732cca4471091563f36531635</id>
<content type='text'>
* List made members m_
Tweaked types to closer match conventions.

* Use asserts for checking conditions on List.
Other small improvements.

* List&lt;T&gt;.Count() -&gt; getSize()

* List&lt;T&gt;
Add -&gt; add
First -&gt; getFirst
Last -&gt; getLast
RemoveLast -&gt; removeLast
ReleaseBuffer -&gt; detachBuffer
GetArrayView -&gt; getArrayView

* List&lt;T&gt;::
AddRange -&gt; addRange
Capacity -&gt; getCapacity
Insert -&gt; insert
InsertRange -&gt; insertRange
AddRange -&gt; addRange
RemoveRange -&gt; removeRange
RemoveAt -&gt; removeAt
Remove -&gt; remove
Reverse -&gt; reverse
FastRemove -&gt; fastRemove
FastRemoveAt -&gt; fastRemoveAt
Clear -&gt; clear

* List&lt;T&gt;
FreeBuffer -&gt; _deallocateBuffer
Free -&gt; clearAndDeallocate
SwapWith -&gt; swapWith

* List&lt;T&gt;
SetSize -&gt; setSize
Reserve -&gt; reserve
GrowToSize growToSize

* UnsafeShrinkToSize -&gt; unsafeShrinkToSize
Compress -&gt; compress
FindLast -&gt; findLastIndex
FindLast -&gt; findLastIndex
Simplify Contains

* List&lt;T&gt;
Removed m_allocator (wasn't used)
Swap -&gt; swapElements
Sort -&gt; sort
Contains -&gt; contains
ForEach -&gt; forEach
QuickSort -&gt; quickSort
InsertionSort -&gt; insertionSort
BinarySearch -&gt; binarySearch

Max -&gt; calcMax
Min -&gt; calcMin

* Initializer::Initialize -&gt; initialize
List&lt;T&gt;::
Allocate -&gt; _allocate
Init -&gt; _init
IndexOf -&gt; indexOf

* * Put #include &lt;assert.h&gt; in common.h, and remove unneeded inclusions
* Small refactor of ArrayView - remove stride as not used

* getSize -&gt; getCount
setSize -&gt; setCount
unsafeShrinkToSize-&gt;unsafeShrinkToCount
growToSize -&gt; growToCount
m_size -&gt; m_count

* Some tidy up around Allocator.

* Use Index type on List.

* Refactor of IntSet.
First tentative look at using Index.

* Made Index an Int
Did preliminary fixes.
Made String use Index.

* Partial refactor of String.

* String::Buffer -&gt; getBuffer
ToWString -&gt; toWString

* Small improvements to String.
String::
Buffer() -&gt; getBuffer()
Equals() -&gt; equals

* Try to use Index where appropriate.

* Fix warnings on windows x86 builds.
</content>
</entry>
<entry>
<title>Initial support for the `precise` keyword (#958)</title>
<updated>2019-04-29T16:31:25+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2019-04-29T16:31:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=ded340beb4b5197b559626acc39920abb2d39e77'/>
<id>urn:sha1:ded340beb4b5197b559626acc39920abb2d39e77</id>
<content type='text'>
Fixes #858

The `precise` keyword exists in both HLSL and GLSL and when applied to a variable declaration is supposed to indicate that all computations that contribute to the value of that variable should not be altered based on "fast-math" optimizations. The main examples are that separate multiply and add operations should not be turned into fused multiply-add (fma) operations, and that operations cannot ignore the possibility of infinity or not-a-number values (e.g., by assuming that `x * 0.0f` is always `0.0f`).

(Aside: it is possible that my understanding of what the semantics of `precise` are in HLSL and GLSL is imperfect so that either the GLSL variant isn't sufficient to provide the semantics of the HLSL keyword, or that the definition of "all computations that contribute" to a value isn't actually correct. We may need to revise this implementation based on subsequent learnings.)

The basic idea here is to turn the AST `precise` keyword into a `[precise]` decoration in the IR and then emit that as a `precise` keyword again in the output.

The main catch is that whereas most of our existing IR decorations apply to things like global shader parameters or `struct` members that usually stick around for the duration of compilation, `[precise]` will get slapped on local variables that will often get optimized away by our SSA pass. There are two ways a variable can get eliminated/replaced during the SSA pass:

1. A use of the variable can be replaced with an ordinary instruction that computes its value.
2. A use of the variable can be replaced with a reference to a "phi node" that will take on the appropriate value based on control flow.

These two cases already had logic to copy a "name hint" decoration from the variable over to an instruction that will replace it, and I simply extended them to also propagate over a `[precise]` decoration.

The test case added with this change intentionally constructs a case where `[precise]` needs to be propagated over to an SSA "phi node" in order to generate correct output code.

The other gotcha is that we can emit variable declarations in various places in `emit.cpp`, and all of these needed to handle `[precise]`. Not only do we have actually local variables (`IRVar`), but we also have SSA phi nodes (`IRParam`), and then there are cases where an intermediate computation (an ordinary instruction) should be `[precise]` and thus we need to emit it as a temporary (not folding it into its use sites) and make sure that the temporary itself gets the `precise` keyword.

I have manually confirmed that in the output SPIR-V, this change results in the `NoContraction` SPIR-V decoration being added to the relevant operations, and the output DXBC contains a multiply and an add in place of a multiply-add. The output DXIL does not show any obvious changes due to `precise`, although the exact order and operands of the math instructions emitted does differ when `precise` is added/removed. In all cases the output is equivalent to hand-written HLSL/GLSL with a `precise`-qualified local variable.</content>
</entry>
<entry>
<title>Initial support for dynamic dispatch using "tagged union" types (#772)</title>
<updated>2019-01-16T20:48:11+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2019-01-16T20:48:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=aedf61784606406c090302efd8b7ac668ac997fc'/>
<id>urn:sha1:aedf61784606406c090302efd8b7ac668ac997fc</id>
<content type='text'>
* Initial support for dynamic dispatch using "tagged union" types

Suppose a user declares some generic shader code, like the following:

```hlsl
interface IFrobnicator { ... }
type_param T : IFrobincator;
ParameterBlock&lt;T : IFrobnicator&gt; gFrobnicator;
...
gFrobincator.frobnicate(value);
```

and then they have some concrete implementations of the required interface:

```hlsl
struct A : IFrobnicator { ... }
struct B : IFrobnicator { ... }
```

The current Slang compiler allows them to generate  distinct compiled kernels for the case of `T=A` and the case of `T=B`. This means that the decision of which implementation to use must be made at or before the time when a shader gets bound in the application.

This change adds a new ability where the Slang compiler can generate code to handle the case where `T` might be *either* `A` or `B`, and which case it is will be determined dynamically at runtime. This means a single compiled kernel can handle both cases, and the decision about which code path to run can be made any time before the shader executes.

This new option is supported by defining a *tagged union* type. Via the API, the user specifies that `T` should be specialized to `__TaggedUnion(A,B)` (the double underscore indicates that this is an experimental and unsupported feature at present). We refer to the types `A` and `B` here as the "case" types of the tagged union. Conceptually, the compiler synthesizes a type something like:

```hlsl
struct TU { union { A a; B b; } payload; uint tag; }
```

The user can then allocate a constant buffer to hold their tagged union type, and when they pick a concrete type to use (say `B`), they fill in the first `sizeof(B)` bytes of their buffer with data describing a `B` instance, and then set the `tag` field to the appopriate 0-based index of the case type they chose (in this case the `B` case gets the tag value `1`).

Actually implementing tagged unions takes a few main steps:

* Type parsing was extended to special-case `__TaggedUnion` as a contextual keyword. This is really only intended to be used when parsing types from the API or command-line, and Bad Things are likely to happen if a user ever puts it directly in their code. Eventually construction of tagged unions should be an API feature and not part of the language syntax.

* Semantic checking was extended to recognize that a tagged union like `__TaggedUnion(A,B)` shoud support an interface like `IFrobnicator` whenever all of the case types suport it, as long as the interface is "safe" for use with tagged unions (which means it doesn't use a few of the advancd langauge features like associated types).

* The IR was extended with instructions to represent tagged union types and to extract their tag and the payload for the different cases as needed.

* IR generation was extended to synthesize implementations of interface methods for any interface that a tagged union needs to support. Right now the implementation is simplistic and only handles simple method requirements, which it does by emitting a `switch` instruction to pick between the different cases.

* A new IR pass was introduced to "desugar" any tagged union types used in the code. The downstream HLSL and GLSL compilers don't support `union`s, so we have to instead emit a tagged union as a "bag of bits" and implement loading the data for particular cases from it manually.

* Final code emit mostly Just Works after the above steps, but we had to introduce an explicit IR instruction for bit-casting to handle the output of the desugaring pass.

There are a bunch of gaps and caveats in this implementation, but that seems reasonable for something that is an experimental feature. The various `TODO` comments and assertion failures in unimplemented cases are intended, so that this work can be checked in even if it isn't feature-complete.

* fixup: missing files

* fixup: typos
</content>
</entry>
<entry>
<title>Decorations are instructions (#748)</title>
<updated>2018-12-11T23:17:55+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2018-12-11T23:17:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=62d3e387774255be4d507cca045ac97dabac9970'/>
<id>urn:sha1:62d3e387774255be4d507cca045ac97dabac9970</id>
<content type='text'>
* Make a test case use IR serialization

* Make all IR instructions usable as parents

This makes it so that every `IRInst` has the list of children that used to be on `IRParentInst` and eliminates `IRParentInst`.
Most places in the code were only checking against `IRParentInst` so that they could know whether there were child instructions to iterate over.

This change bloats the size of every instruction by two pointers, but we hope to be able to eliminate that overhead with a better encoding later.

* Change IR decorations to be instructions.

The main change here is that `IRDecoration` now inherits from `IRInst`, and `IRInst` now has a single linked list that holds both decorations *and* children.
At each point where code used to loop over `getChildren()` on an `IRInst`, I checked whether it made sense to leave the operation as processing just the children, or if it should process both decorations and children.

The thorniest bit was making sure the logic for inserting an instruction into a parent is correct. For the most part, once IR code is built all insertions are explicitly before/after another instruction, so the ordering can't get messed up. The sticking point is any code that does an explicit `insertAtStart` or `insertAtEnd`, but I surveyed those to make sure they are correct in context, and I also made all insertions bottleneck through one routine that does a better job of asserting the preconditions than what was there before. We may still want a "smart" insertion function at some point so that if somebody does `someDecoration-&gt;insertAtEnd(someInst)` the decoration intelligently goes to the end of the decoration list, and not the entire decorations-and-children list.

All of the existing decoration types were refactored to provide accessors for their operands, rather than directly exposing fields. In most cases the operands are required to be `IRConstant` nodes of fixed types. Not all of these types need to be kept around in the new approach, but they were left in so that as much existing code as possible can be kept working.
The `IRBuilder` was extended with factory functions to make the various decoration types and attach them.

All the fields in concrete decorations that were using `StringRepresentation` or `Name` pointers are now using IR-level string operands which provide their value as an `UnownedStringSlice`, so logic that was working with those decoration values needed to be updated here and there. I also needed to add the logic to clone string-literal values to the IR cloning pass, since they are now being used in almost every piece of code.

A new type of constant IR instruction for literal pointers was added, to handle the cases where an IR decoration needs an operand that is a raw AST-level pointer. These are even being serialized, although we obviously should not rely on them to round-trip through serialization in the future. Ideally, a follow-on change should add a cleanup pass where we remove any decorations from a module that shouldn't be allowed in the serialized code.

The biggest overall cleanup is in the serialization logic, where a lot of code just disappears because it can process the raw "decorations and children" list as the logical children of an IR instruction. The only special cases left are literals (which seem like they will always need special-casing) and global values (because they have a mangled name, which we plan to move into a decoration).

One other example of a simplification made possible by this change: the `IRNotePatchConstantFunc` instruction was implemented as an instruction only because it couldn't be encoded as a decoration at the time (it needed to have an operand that referenced an IR function).

The IR dumping logic was also updated (which meant a change to the `ir/string-literal` test) to try to make it print out all decorations a bit more systematically now that they are encoded like other instructions. The formatting isn't quite perfect, but it is good enough to be able to read what is going on.

I didn't include updates to the validation logic to ensure that decorations are being added in ways that follow the invariants, but that would be a nice thing to add next.

* fixup: 64-bit issues

* fixup: forward declaration issues
</content>
</entry>
<entry>
<title>Fix an SSA construction bug (#739)</title>
<updated>2018-12-04T21:56:59+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2018-12-04T21:56:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=4f69056d7c50fd19da3120866ae7e38d402d3d95'/>
<id>urn:sha1:4f69056d7c50fd19da3120866ae7e38d402d3d95</id>
<content type='text'>
Fixes #723

This fixes a case where the SSA construction pass wasn't dealing with the possibility of a phi node that it had provisionally introduced being replaced later. The result was invalid IR (caught with `-validate-ir`) that referenced an instruction nowhere in the IR module (because it was dropped).

The fix centralizes the code for dealing with phi nodes that have been replaced, so that the two different paths where variables get "read" during SSA construction can both use the same logic.</content>
</entry>
<entry>
<title>Cleanups around behavior when the compiler fails (#553)</title>
<updated>2018-05-11T20:56:14+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2018-05-11T20:56:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=5e604a6f39ef8e8086702d41113ea78856804c99'/>
<id>urn:sha1:5e604a6f39ef8e8086702d41113ea78856804c99</id>
<content type='text'>
* Cleanups around behavior when the compiler fails

* Add another case where we try to `noteInternalErrorLoc()` if an exception in thrown. This one is the in the logic for emitting an IR instruciton. This could be improved by adding another layer at the function level (as a catch-all for instructions with no location), but something is better than nothing.

* Change a bunch of `assert()`s over to `SLANG_ASSERT()`s, so that we can theoretically take more control over them (e.g., make release builds with asserts enabled)

* Some other small cleanups around the assertions we perform.

In the survey I made, I didn't really see many obvious "smoking gun" cases where we could produce a significantly better error message for some of the unimplemented/unexpected paths, other than to actually implement the missing functionality.

* fixup
</content>
</entry>
<entry>
<title>Add a pass for computing dominator trees (#541)</title>
<updated>2018-05-04T00:40:26+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2018-05-04T00:40:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=494330d4941ccaf50e07ef309fd783c2f44a492e'/>
<id>urn:sha1:494330d4941ccaf50e07ef309fd783c2f44a492e</id>
<content type='text'>
This code is currently not used by anything, but I wanted to check in a first pass at an implementation of dominator tree construction so that we don't have to keep avoiding implementing algorithms that rely on having dominator information available.

The algorithm used to construct the dominator tree is taken from "A Simple, Fast Dominance Algorithm" by Keith D. Cooper, Timothy J. Harvey, and Ken Kennedy.
This is not the "best" algorithm in terms of asymptotic performance, but it is among the simplest algorithms for computing a dominator tree that still outperforms naive iterative set-based methods.

The actual data structure and API for the dominator tree has a bit of "cleverness" in it to try to make the common queries reasonably fast (e.g., you can check whether A dominates B in constant time).
My hope is that even if we implement a more advanced algorithm for constructing the dominator tree, we can retain compatibility with passes that might make use of this API.

Because no code is currently using this logic, I have done only minimal testing by stepping through this code and validating the results on paper for some very small CFGs.
More serious testing/debugging may need to wait until we have an optimization pass that needs the dominator tree we compute here.
One open question I have is how best to introduce traditional unit testing into Slang, since this is an example of code that would benefit greatly from being unit tested.</content>
</entry>
<entry>
<title>Pass through original names for most declarations (#547)</title>
<updated>2018-05-03T23:34:49+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2018-05-03T23:34:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=00afea1e59e8929324df882009618534d3065138'/>
<id>urn:sha1:00afea1e59e8929324df882009618534d3065138</id>
<content type='text'>
The basic idea here is that when lowering to the IR, the front-end will attach a "name hint" to the IR instruction(s) that represent a given declaration, and then the passes that work on the IR will try to preserve and propagate those names, and then finally the emit logic will use them in place of mangled or unique names when available.

This change does *not* try to deal with the issues that arise when we try to use those variable names in the output without any modification (e.g., handling cases where they might clash with keywords or builtins in the target language). Instead, it tries to establish baseline behavior for propagating through names, so that a later change can concentrate on the issue of using those names exactly when it is legal to do so.

In order to avoid issues around the name "hints" causing problems we take two main steps:

1. We "scrub" each name to reduce it down to the allowed set of identifier characters in C-like languages, and then ensure that it doesn't do things that would be illegal in some downstream languages (e.g., consecutive underscores are not allowed in GLSL) or could clash with Slang's mangled names. This process isn't guaranteed to give distinct results for distinct inputs (it isn't a mangling scheme, after all).

2. We generate a unique ID for each occurence of a given name and always use that as a suffix. This means that even if a name happens to overlap with a keyword (if you somehow have a variable named `do`), we will still add a suffix that makes it not a problem (we'd output `do_0` which is fine).

The logic for generating these names is mostly straightforward. For simple variables, we use their given name directly, while for other declarations we try to form a name that includes their parent declaration (e.g. `SomeType.someMethod`).

Various IR passes need to propagate or preserve this information. The most interesting is type legalization, when we take a variable with an aggregate type and split some of the fields out into their own variables. In that case we generate "dotted" names like `someVar.someTexture` and rely on the emit logic to turn that into `someVar_someTexture`.

During SSA generation, if we are promoting a variable to SSA temporaries, we will try to propagate the name of the variable over to the temporaries (unless they already have a name from some other place). The same applies to block parameters ("phi nodes").

Many of the test changes need their expected output to be updated for this change. Luckily in most cases the output has gotten easier to understand.</content>
</entry>
<entry>
<title>Improve SSA promotion for arrays and structs (#521)</title>
<updated>2018-04-23T17:37:56+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2018-04-23T17:37:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=9a7849d893ebb755a607befff6b3429830421112'/>
<id>urn:sha1:9a7849d893ebb755a607befff6b3429830421112</id>
<content type='text'>
* Improve SSA promotion for arrays and structs

Fixes #518

The existing SSA pass would only handle `load(v)` and `store(v,...)`
where `v` is the variable instruction, and would bail out if `v` was
used as an operand in any other fashion.

The new pass adds support for `load(ac)` where `ac` is an "access chain"
with a gramar like:

    ac :: v
        | getElementPtr(ac, ...)
	| getFieldAddress(ac, ...)

What this means in practical terms is that we can promote a local
variable of array or structure type to an SSA temporary even if there
are loads of individual elements/fields, as along as any *assignment* to
the variable assigns the whole thing.

I've added a test case to confirm that this change fixes passing of
arrays as function parameters for Vulkan.

* Fixup: disable test on Vulkan because render-test isn't ready

This is a fix for Vulkan, but I don't think our testing setup is ready
for it.

* Fixup: error in unreachable return case, caught by clang

* Fixups based on testing

These are fixes found when testing the original changes against the user code that originated the bug report.

* `emit.cpp`: Make sure to handle array-of-texture types when deciding whether to declare a temporary as a local variable in GLSL output

* `ir-legalize-types.cpp`: Make a not of a source of validation failures that we need to clean up sooner or later (just not in scope for this bug fix change).

* `ir-ssa.cpp`:
  * When checking if something is an access chain with a promotable var at the end, make sure the recursive case recurses into the "access chain" logic instead of the leaf case
  * Add some assertions to guard the assumption that any access chain we apply has been scheduled for removal
  * Correctly emit an element *extract* instead of getting an element *address* when promoting an element access into an array being promoted
  * Eliminate a wrapper routine that was setting up an `IRBuilder` and use the one from the block being processed in the SSA pass (since it was set up for stuff just like this)

* `ir-validate.cpp`
  * Add a hack to avoid validation failures when running IR validation on the stdlib code. This case triggers for an initializer (`__init`) declaration inside an interface, since the logical "return type" is the interface type itself, which has no representation at the IR level and thus yields a null result type in a `FuncType` instruction.
</content>
</entry>
</feed>
