<feed xmlns='http://www.w3.org/2005/Atom'>
<title>slang.git/source/slang/ir-restructure.cpp, branch master</title>
<subtitle>Making it easier to work with shaders</subtitle>
<id>https://git.yummers.dev/slang.git/atom?h=master</id>
<link rel='self' href='https://git.yummers.dev/slang.git/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/'/>
<updated>2019-05-31T21:20:37+00:00</updated>
<entry>
<title>Use slang- prefix on slang compiler and core source (#973)</title>
<updated>2019-05-31T21:20:37+00:00</updated>
<author>
<name>jsmall-nvidia</name>
<email>jsmall@nvidia.com</email>
</author>
<published>2019-05-31T21:20:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=6cbc3929a54d37bd23cb5efa8e3320ba02f78b2f'/>
<id>urn:sha1:6cbc3929a54d37bd23cb5efa8e3320ba02f78b2f</id>
<content type='text'>
* Prefixing source files in source/slang with slang-

* Prefix source in source/slang with slang- prefix.

* Rename core source files with slang- prefix.

* Update project files.

* Fix problems from automatic merge.
</content>
</entry>
<entry>
<title>String/List closer to conventions, and use Index type (#959)</title>
<updated>2019-04-29T21:03:46+00:00</updated>
<author>
<name>jsmall-nvidia</name>
<email>jsmall@nvidia.com</email>
</author>
<published>2019-04-29T21:03:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=4880789e3003441732cca4471091563f36531635'/>
<id>urn:sha1:4880789e3003441732cca4471091563f36531635</id>
<content type='text'>
* List made members m_
Tweaked types to closer match conventions.

* Use asserts for checking conditions on List.
Other small improvements.

* List&lt;T&gt;.Count() -&gt; getSize()

* List&lt;T&gt;
Add -&gt; add
First -&gt; getFirst
Last -&gt; getLast
RemoveLast -&gt; removeLast
ReleaseBuffer -&gt; detachBuffer
GetArrayView -&gt; getArrayView

* List&lt;T&gt;::
AddRange -&gt; addRange
Capacity -&gt; getCapacity
Insert -&gt; insert
InsertRange -&gt; insertRange
AddRange -&gt; addRange
RemoveRange -&gt; removeRange
RemoveAt -&gt; removeAt
Remove -&gt; remove
Reverse -&gt; reverse
FastRemove -&gt; fastRemove
FastRemoveAt -&gt; fastRemoveAt
Clear -&gt; clear

* List&lt;T&gt;
FreeBuffer -&gt; _deallocateBuffer
Free -&gt; clearAndDeallocate
SwapWith -&gt; swapWith

* List&lt;T&gt;
SetSize -&gt; setSize
Reserve -&gt; reserve
GrowToSize growToSize

* UnsafeShrinkToSize -&gt; unsafeShrinkToSize
Compress -&gt; compress
FindLast -&gt; findLastIndex
FindLast -&gt; findLastIndex
Simplify Contains

* List&lt;T&gt;
Removed m_allocator (wasn't used)
Swap -&gt; swapElements
Sort -&gt; sort
Contains -&gt; contains
ForEach -&gt; forEach
QuickSort -&gt; quickSort
InsertionSort -&gt; insertionSort
BinarySearch -&gt; binarySearch

Max -&gt; calcMax
Min -&gt; calcMin

* Initializer::Initialize -&gt; initialize
List&lt;T&gt;::
Allocate -&gt; _allocate
Init -&gt; _init
IndexOf -&gt; indexOf

* * Put #include &lt;assert.h&gt; in common.h, and remove unneeded inclusions
* Small refactor of ArrayView - remove stride as not used

* getSize -&gt; getCount
setSize -&gt; setCount
unsafeShrinkToSize-&gt;unsafeShrinkToCount
growToSize -&gt; growToCount
m_size -&gt; m_count

* Some tidy up around Allocator.

* Use Index type on List.

* Refactor of IntSet.
First tentative look at using Index.

* Made Index an Int
Did preliminary fixes.
Made String use Index.

* Partial refactor of String.

* String::Buffer -&gt; getBuffer
ToWString -&gt; toWString

* Small improvements to String.
String::
Buffer() -&gt; getBuffer()
Equals() -&gt; equals

* Try to use Index where appropriate.

* Fix warnings on windows x86 builds.
</content>
</entry>
<entry>
<title>Feature/premake linux (#689)</title>
<updated>2018-10-25T15:24:16+00:00</updated>
<author>
<name>jsmall-nvidia</name>
<email>jsmall@nvidia.com</email>
</author>
<published>2018-10-25T15:24:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=4f0415e338862ffec50c2d47eddea958255b504e'/>
<id>urn:sha1:4f0415e338862ffec50c2d47eddea958255b504e</id>
<content type='text'>
* Premake work in progress for linux.

* Added dump function.

* Remove examples on linux
Small warning fix.

* * Don't build render-test on linux
* Removed work around virtual destructor warning, and just used virtual dtor for simplicity
* Git ignore obj directories

* Fix premake working on windows.

* * Fix sprintf_s functions
* Make generates arg parsing more robust

* Added FloatIntUnion to avoid type punning/strong aliasing issues, and repeated union definitions.

* Work around problems building on linux with getClass claiming a strict aliasing issue.

* Fix for targetBlock appearing potentiall used unintialized to gcc.

* Linux slang link options -fPIC to make dll.

* Add -fPIC to build options on linux.

* Add -ldl for linux on slang.

* Fixes to try and get premake working with .so on linux.

* Make core compile with -fPIC

* Try to fix linux linking with --no-as-needed before -ldl

* Add rpath back.

* Remove render-gl from linux build.

* Re-add location for linux.

* Don't include &lt;malloc.h&gt; except on windows.

* Remove unused line to fix warning on osx.

* Remove ambiguity on OSX for operator &lt;&lt;.

* Fixing ambiguity with operator overloading and Int types for OSX.

* Fix ambiguity around UInt and operator

* Fix ambiguity of UInt conversion for OSX.

* Added UnambiguousInt and UnambiguousUInt to make it easier to work around OSX integer coercion for UInt/Int types.
</content>
</entry>
<entry>
<title>Add a warning on missing return, and initial SCCP pass (#671)</title>
<updated>2018-10-12T19:23:14+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2018-10-12T19:23:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=c9ad36868961009fcd67e579f9b51e1333688208'/>
<id>urn:sha1:c9ad36868961009fcd67e579f9b51e1333688208</id>
<content type='text'>
* Add a warning on missing return, and initial SCCP pass

The user-visible feature added here is a diagnostic for functions with non-`void` return type where control flow might fall off the end. This *sounds* like a trivial diagnostic to add as part of the front-end AST checking, but that can run afoul of really basic stuff like:

```hlsl
int thisFunctionisOkay(int a)
{
    while(true)
    {
        if(a &gt; 10) return a;
        a = a*2 + 1;
    }
    // no return here!
}
```

This function "obviously" doesn't need to have a `return` statement at the end there, but realizing this fact relies on the compiler to understand that the `while(true)` loop can't exit normally, and doesn't contain any `break` statement. One can write "obvious" examples that need more and more complex analysis to rule out.

The answer Slang uses for stuff like this is to do the analysis at the IR level right after initial code generation (this would be before serialization, BTW, so that attached `IRHighLevelDeclDecoration`s can be used).

When lowering the AST to the IR, we always emit a `missingReturn` instruction (a subtype of `IRUnreachable`) at the end of its body if it isn't already terminated. The IR analysis pass to detect missing `return` statements is then as simple as just walking through all the functions in the module and making sure they don't contain `missingReturn` instructions.

For that simple pass to work, we first need to make some effort to remove dead blocks that control flow can never reach. This change adds a very basic initial implementation of Spare Conditional Constant Propagation (SCCP), which is a well-known SSA optimization that combines constant propagation over SSA form with dead code elimination over a CFG to achieve optimizations that are not possible with either optimization along.

For the moment, we don't actually implement any constant *folding* as part of the SCCP pass, so we can eliminate the dead block in a case like the function above (and those in the test case added in this change), but will not catch things like a `while(0 &lt; 1)` loop. Handling more "obvious" cases like that is left for future work.

* fixup: warning on unreachable code

* Handle case where user of an inst isn't in same function/code

The code as assuming any instruction in the SSA work list has to come from the function/code being processed, but this misses the case where an instruction in a generic has a use inside the function that the generic produces.

This change adds code to guard against that case.
</content>
</entry>
<entry>
<title>A bunch of work to resolve #569 (#576)</title>
<updated>2018-05-25T02:20:11+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2018-05-25T02:20:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=18709fbaa03fe0ef0727a802d864fae6c5163fc0'/>
<id>urn:sha1:18709fbaa03fe0ef0727a802d864fae6c5163fc0</id>
<content type='text'>
* render-test should not fail on HLSL compiler *warnings*

The logic in `render-test` that invokes `D3DCompile` was causing a test to fail if it produced any warnings (not just if compilation fails).
Warning output can be dealt with by the test runner, since it will compare output between runs anyway, and it is useful to be able to run something through `render-test` that compiles with warnings.

* Be more careful about deleting IR instructions

There was an `IRInst::deallocate()` method that had a precondition that the instruction should already be removed from its parent and clear out all its operands before calling, but it wasn't checking this and the few call sites weren't doing things right either.
I consolidated things on `IRInst::removeAndDeallocate()` which does all the things: removes from the parent, clear out operands, and then deallocates.
I also made sure to clear out the type operand.

This clears up some crashing issues where passes were removing instructions but those instructions would still show up as users of other instructions.

* Don't emit bitwise not for non-Boolean types

It seems like the logic in `emit.cpp` messed things up and decided that `Not` (the IR instruction that is equivalent to `!` in the AST) should emit as `!` for Boolean types and `~` for other types, but this makes no sense (e.g., `~(a &amp; 1)` is very different from `!(a &amp; 1)`, even when interpreted as a condition).

It seems like this logic was intended for the `BitNot` case, where `~a` and `!a` are actually equivalent for Boolean values (but a target language might not like `~a` on `bool` values).

Maybe the original plan was that the `Not` instruction should only apply to Boolean values in the first place, and that other values should be converted to `bool` (or a vector of `bool`) before applying `Not`, but even in that case the emit logic makes no sense.

This caused an actual problem for one of my test cases, so it was important to fix it now.

* Fix issue with cached resolution for overoaded operators

The basic problem was that the lookup logic was forming a key based on the *first* definition it found for the overloaded operator, but that means that when processing a prefix `++a` call we might look up the *postfix* definition of `operator++` and decide to use its opcode as the key.

This "fixes" the logic by looking for the first definition with a "compatible" definition (e.g., a `__prefix` function if we are checking a `PrefixExpr`), and then uses its opcode.

A better fix in the long run would be to make the cache just be keyed on the operator name and the "fixity" of the expression (prefix, postfix, or infix).

* Introduce an intermediate structured control-flow representation

The code previously used a single function called `emitIRStmtsForBlocks` in `emit.cpp` that would take a logical sub-graph of the CFG and emit it as high-level statements.
It would do this by recognizing operations like coniditional branches that it could turn into high-level `if` statements, etc.
The main problem with this function was that it mixed together the logic for how we restructure the program with the logic for how we emit high-level code from that structure.

This change splits those two parts of the algorithm by introducing an intermediate data structure: a tree of `Region`s, which represent single-entry regions of the CFG.
There are subclasses of `Region` corresponding to various structured control-flow constructs, and then a leaf case that wraps a single `IRBlock`.

The new function `generateRegionsForIRBlocks()` (in `ir-restructure.cpp`) now handles the restructuring work, by building one or more `Region`s to represent a sub-graph, while `emitRegion()` handles emitting HLSL/GLSL source code from a region.
Splitting things in this way opens up some opportunities for future changes:

* We can expand the set of IR control-flow constructs allowed, so long as we can still generate structure `Region`s from them, without having to mess with the emit logic (e.g., we could start to support multi-level `break` by introducing temporaries as needed). In the limit we can generate our `Region`s using something like the "Relooper" algorithm.

* We can emit to other representations while retaining the same control-flow restructuring support. E.g., if we drop the structured information from the IR, then emitting to SPIR-V for Vulkan would require us to use the strucured control-flow information from these `Region`s.

* We can do analysis that needs to understand `Region` structure. This is relevant to issue #569, which was what prompted me to start on this work. Now that we have a representation of the nesting of `Region`s, we can use it to reason about visibility of values between blocks.

During development of this change I ran into a gotcha, in that I had been assuming each IR block would map to a single `Region`, forgetting that our current lowering of "continue clauses" in `for` loops leads to them being duplicated. The `Region` representation handles this by having a linked-list struct mapping IR blocks to the `SimpleRegion`s that represent them. I added a test case that includes a `for` loop with a continue clause that is reached along multiple paths just to make sure that we continue to support that case.

The compiler output should not change as a result of this work; this is supposed to be a pure refactoring change.

* Add a pass to resolve scoping issues in generated code

Fixes #569

The basic problem arises because the structured control flow that we output in high-level HLSL/GLSL doesn't match the "scoping" rules of an SSA IR.
In particular, SSA says that a value can be used in any block that is dominated by the definition, but in the presence of `break` and `continue` statements it is easy to construct cases where a block dominates something that is not in its scope for structured control flow. Consider:

```hlsl
for(;;) {
    int a = xyz;
    if(a) { int b = a; break; }
    int c = a;
}
int d = b;
```

This program is invalid as HLSL, because the variable `b` is referenced outside of its scope, but if we look at the CFG for this function, it is clear that the block that computes `b` dominated the block that computes `d`. IR optimizations can easily create code like this, so we need to be ready for it.

The previous change added an explicit `Region` structure to represent the structured control flow that we re-form out of the IR, and this change adds a pass that exploits the structuring information to detect cases like the above and introduce temporaries to fix the scoping issue. For example, the pass would change the earlier code block into something like:

```hlsl
int tmp;
for(;;) {
    int a = xyz;
    if(a) { int b = a; tmp = b; break; }
    int c = a;
}
int d = tmp;
```

That is, we introduce a new `tmp` variable at a scope "above" both the definition and use of `b`, and then we copy `b` into that temporary right where it is computed, and then use the temporary instead of the original `b` at the use site.

A few details that came up during the implementation:

* Downstream compilers may get confused by code like the above, and complain that `tmp` may be used before it is initialized, even though the very definition of dominators in a CFG means we don't have to worry about it. Still, I introduced some one-off code to initialize the temporaries just to silence spurious warnings coming from fxc.

* We need to be careful not to apply this logic to "phi nodes" (the parameters of basic blocks) since they will already be turned into temporaries by the emit logic, and trying to introduce temporaries with this pass led to broken code (I still need to investigate why). It may be that a future version of this pass should also take the code out of SSA form, so that we can introduce both kinds of temporaries in a single pass (and maybe eliminate some unnecessary variables by doing basic register allocation).

There is another transformation that could fix some issues of this kind, by moving code out of a structured control-flow construct and to the "join point" after it. For example, we could turn our loop from the start of this commit message into:

```hlsl
for(;;) {
    int a = xyz;
    if(a) { break; }
    int c = a;
}
int b = a;
int d = b;
```

Moving the definition of `b` to after the loop is possible because there is no way to get out of the loop without executing that code anyway. Now the scoping issue for `d`'s use of `b` has gone away, but of course we've introduced a *new* scoping issue for `a`, when it gets used by `b`.

Adding a pass to re-arrange control flow like this could reduce the cases where we have to apply the current pass, but it wouldn't eliminate them entirely. That means such a pass can be deferred to future work.

This change includes a test case the reproduces the original issue, so that we can confirm the fix works.
</content>
</entry>
</feed>
