<feed xmlns='http://www.w3.org/2005/Atom'>
<title>slang.git/tools/render-test/d3d-util.cpp, branch master</title>
<subtitle>Making it easier to work with shaders</subtitle>
<id>https://git.yummers.dev/slang.git/atom?h=master</id>
<link rel='self' href='https://git.yummers.dev/slang.git/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/'/>
<updated>2018-06-28T18:14:48+00:00</updated>
<entry>
<title>Share graphics API layer between tests/examples (#603)</title>
<updated>2018-06-28T18:14:48+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2018-06-28T18:14:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=dfe13b54286b27dd15f591455bbb86b7798285c2'/>
<id>urn:sha1:dfe13b54286b27dd15f591455bbb86b7798285c2</id>
<content type='text'>
The `render-test` project has an in-progress graphics API abstraction layer, and it makes sense to share this code with our examples rather than write a bunch of redundant code between examples and tests.

Most of this change is just moving files from `tools/render-test/*` to a new library project at `tools/slang-graphics/`. The most complicated code change there is renaming from `render_test` to `slang_graphics`.

The existing `hello` example was ported to use the graphics API layer instead of raw D3D11 API calls. It is still hard-coded to use the D3D11 back-end and the `SLANG_DXBC` target, so more work is needed if we want to actually support multiple APIs in the examples.

I also went ahead and implemented an extremely rudimentary set of APIs to abstract over the Windows platform calls that were being made in the example, so that we could potentially run that same example on other platforms. I did *not* port `render-test` to use those APIs, and I also did not implement them for anything but Windows (my assumption is that for most other platforms we would just use SDL2, and require people to ensure it is installed on their machine before building Slang examples).</content>
</entry>
<entry>
<title>Make render-test use Slang for all shader compilation (#597)</title>
<updated>2018-06-13T22:39:04+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2018-06-13T22:39:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=77562ef82bcbab569ebbbd769957948d825c92ad'/>
<id>urn:sha1:77562ef82bcbab569ebbbd769957948d825c92ad</id>
<content type='text'>
* Make render-test use Slang for all shader compilation

This streamlines the code for render-test by having all its shader compilation go through the Slang API, so that it doesn't have to deal with custom logic to compile HLSL-&gt;DXBC and HLSL-&gt;DXIL. We were already leaning on Slang to generate SPIR-V for Vulkan, so this makes all the paths more consistent.

My original plan with this change was to make the D3D12 render path start using DXIL at this point, since the change would make that easy, but it turns out that some aspects of how we handle parameter binding are not compatible with that right now, so it would need to come as a later change.

There are a lot of details here, so I will try to walk through the changes, including the incidental ones:

* Add logic to `premake5.lua` so that we copy the necessary libraries for HLSL shader compilation to our target directory from the Windows SDK. This is necessary so that our tests can actually invoke `dxcompiler.dll`

* Re-run Premake to generate new project files. This moves around a few files that I manually added in previous changes without re-running Premake.

* When invoking `fxc` as a pass-through compiler, be sure to pass along any macros defined via the API or command line. This isn't a strictly required change with how things worked out, but it is a positive one anyway, because it makes `slangc -pass-through fxc` more useful.

* Don't print output from a downstream `fxc` invocation if it produces warnings but no errors. The main reason for this is so that our tests don't fail because of `fxc` warnings on Slang's output (which then don't match the baselines), but it can also be rationalized as not wanting to confuse users with warnings that don't come from the "real" compiler they are using. This probably needs fine-tuning as a policy.

* Add the HLSL `NonUniformResourceIndex` function. This was an oversight because it isn't documented as a builtin on MSDN, and only gets mentioned obliquely when they talk about resource indexing.

* Add `glsl_&lt;version&gt;` profiles to match our `sm_&lt;version&gt;` profiles, so that it is easy for a user to use the profile mechanism to request a specific GLSL version without also specifying a stage name.

* Update the render-test logic so that there is a single `ShaderCompiler` implementation that *always* uses Slang, and get rid of all of the renderer-specific `ShaderCompiler` implementations.

* Update logic in render-test `main.cpp` to select the options to use for the eventual Slang compile based on the choice of renderer and input language. I didn't change the options that render-test exposes, even though they are getting increasingly silly (e.g., `-glsl-rewrite` doesn't use GLSL as its input...).

* Note: the D3D12 renderer will still use fxc, DXBC, and SM 5.0 for now, since trying to update it to switch to dxc, DXIL, and SM 6.0 didn't work well at the time.

* Add a bit of supporting D3D12 code to make sure that we don't allocate a structured buffer when a buffer has a format.

* Make sure to *also* define the `__HLSL__` macro when compiling Slang code, because otherwise a bunch of tests don't work (I'm not clear on how it worked before...).

* fixup: missing file
</content>
</entry>
<entry>
<title>Fix atomic operations on RWBuffer (#593)</title>
<updated>2018-06-06T04:35:48+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2018-06-06T04:35:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=1a698128c15bce0c05b0664bb1458842e1e55511'/>
<id>urn:sha1:1a698128c15bce0c05b0664bb1458842e1e55511</id>
<content type='text'>
* Fix atomic operations on RWBuffer

An earlier change added support for passing true pointers to `__ref` parameters to fix the global `Interlocked*()` functions when applied to `groupshared` variables or `RWStructuredBuffer&lt;T&gt;` elements.
That change didn't apply to `RWBuffer&lt;T&gt;` or `RWTexture2D&lt;T&gt;`, etc. because those types had so far only declared `get` and `set` accessors, but not any `ref` accessors (which return a pointer).

The main fixes here are:

* Add `ref` accessors to the subscript operations on the `RW*` resource types

* Adjust the logic for emitting calls to subscript accessors so that we don't get quite as eager about invoking a `ref` accessor, and instead try to invoke just a `get` or `set` accessor when these will suffice. This is important for Vulkan cross-compilation, where we don't yet support the semantics of our `ref` accessors.

* Add a test case for atomics on a `RWBuffer`

* Fix up `render-test` so that we can specify a format for a buffer resource, which allows us to use things other than `*StructuredBuffer` and `*ByteAddressBuffer`. The work there is probably not complete; I just did what I could to get the test working.

* A bunch of files got whitespace edits thanks to the fact that I'm using editorconfig and others on the project seemingly aren't...

* fixup: remove ifdefed-out code
</content>
</entry>
<entry>
<title>A bunch of work to resolve #569 (#576)</title>
<updated>2018-05-25T02:20:11+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2018-05-25T02:20:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=18709fbaa03fe0ef0727a802d864fae6c5163fc0'/>
<id>urn:sha1:18709fbaa03fe0ef0727a802d864fae6c5163fc0</id>
<content type='text'>
* render-test should not fail on HLSL compiler *warnings*

The logic in `render-test` that invokes `D3DCompile` was causing a test to fail if it produced any warnings (not just if compilation fails).
Warning output can be dealt with by the test runner, since it will compare output between runs anyway, and it is useful to be able to run something through `render-test` that compiles with warnings.

* Be more careful about deleting IR instructions

There was an `IRInst::deallocate()` method with a precondition that the instruction had already been removed from its parent and had all its operands cleared before the call, but it wasn't checking this and the few call sites weren't doing things right either.
I consolidated things on `IRInst::removeAndDeallocate()` which does all the things: removes from the parent, clears out operands, and then deallocates.
I also made sure to clear out the type operand.

This clears up some crashing issues where passes were removing instructions but those instructions would still show up as users of other instructions.
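
The three steps can be illustrated with a toy model. This is only a sketch in Python (the real code is C++), and the `Inst` and `Block` classes and their fields are hypothetical stand-ins, not the compiler's actual data structures:

```python
class Inst:
    """Toy IR instruction that tracks its parent, type, operands, and users."""

    def __init__(self, name, operands=(), type_inst=None):
        self.name = name
        self.parent = None
        self.type_inst = None
        self.operands = []
        self.users = []  # instructions that reference this one
        if type_inst is not None:
            self.type_inst = type_inst
            type_inst.users.append(self)
        for op in operands:
            self.operands.append(op)
            op.users.append(self)

    def remove_and_deallocate(self):
        # 1. Unlink from the parent block.
        if self.parent is not None:
            self.parent.insts.remove(self)
            self.parent = None
        # 2. Clear operands so we no longer show up as a user of them.
        for op in self.operands:
            op.users.remove(self)
        self.operands = []
        # 3. The type is an operand too, so clear it as well.
        if self.type_inst is not None:
            self.type_inst.users.remove(self)
            self.type_inst = None
        # 4. C++ would free the memory here; Python leaves that to the GC.


class Block:
    """Toy basic block: an ordered list of instructions."""

    def __init__(self):
        self.insts = []

    def append(self, inst):
        inst.parent = self
        self.insts.append(inst)
        return inst
```

After `remove_and_deallocate()` the instruction no longer appears in any `users` list, which is the property the crashing passes were relying on.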

* Don't emit bitwise not for non-Boolean types

It seems like the logic in `emit.cpp` messed things up and decided that `Not` (the IR instruction that is equivalent to `!` in the AST) should emit as `!` for Boolean types and `~` for other types, but this makes no sense (e.g., `~(a &amp; 1)` is very different from `!(a &amp; 1)`, even when interpreted as a condition).

It seems like this logic was intended for the `BitNot` case, where `~a` and `!a` are actually equivalent for Boolean values (but a target language might not like `~a` on `bool` values).

Maybe the original plan was that the `Not` instruction should only apply to Boolean values in the first place, and that other values should be converted to `bool` (or a vector of `bool`) before applying `Not`, but even in that case the emit logic makes no sense.

This caused an actual problem for one of my test cases, so it was important to fix it now.
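
A quick worked check of why the two operators are not interchangeable on integers. This is a sketch in Python, relying only on the fact that its `~` on integers behaves like the two's-complement `~` of C-like languages; the function names are hypothetical and `masked` stands in for the masked value from the example above:

```python
def logical_not(value):
    """C-style `!`: true exactly when the value is zero."""
    return value == 0


def bitwise_not_as_condition(value):
    """C-style `~` used as a condition: true when the complement is nonzero."""
    return (~value) != 0


def compare(masked):
    """Return the results of `!masked` and `~masked` as conditions."""
    return logical_not(masked), bitwise_not_as_condition(masked)
```

For `masked == 1` the logical form is false while the bitwise form is true (the complement is `-2`), so emitting `~` in place of `!` flips the branch.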

* Fix issue with cached resolution for overloaded operators

The basic problem was that the lookup logic was forming a key based on the *first* definition it found for the overloaded operator, but that means that when processing a prefix `++a` call we might look up the *postfix* definition of `operator++` and decide to use its opcode as the key.

This "fixes" the logic by looking for the first "compatible" definition (e.g., a `__prefix` function if we are checking a `PrefixExpr`), and then uses its opcode.

A better fix in the long run would be to make the cache just be keyed on the operator name and the "fixity" of the expression (prefix, postfix, or infix).
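
That longer-term idea might look roughly like the following. This is a sketch in Python rather than the compiler's C++; `Fixity`, `OverloadCache`, and the `resolve` callback are hypothetical names, and including the argument types in the key is an assumption:

```python
from enum import Enum


class Fixity(Enum):
    PREFIX = "prefix"
    POSTFIX = "postfix"
    INFIX = "infix"


class OverloadCache:
    """Caches resolved operator overloads by name and fixity."""

    def __init__(self):
        self._entries = {}

    def lookup(self, name, fixity, arg_types, resolve):
        # Assumption: argument types are part of the key alongside the
        # operator name and fixity, so `++a` and `a++` never collide.
        key = (name, fixity, tuple(arg_types))
        if key not in self._entries:
            # `resolve` is the slow path: full overload resolution.
            self._entries[key] = resolve(name, fixity, arg_types)
        return self._entries[key]
```

Because the key is built from the call site rather than from whichever definition lookup happens to find first, a prefix call can never pick up an entry cached for the postfix form.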

* Introduce an intermediate structured control-flow representation

The code previously used a single function called `emitIRStmtsForBlocks` in `emit.cpp` that would take a logical sub-graph of the CFG and emit it as high-level statements.
It would do this by recognizing operations like conditional branches that it could turn into high-level `if` statements, etc.
The main problem with this function was that it mixed together the logic for how we restructure the program with the logic for how we emit high-level code from that structure.

This change splits those two parts of the algorithm by introducing an intermediate data structure: a tree of `Region`s, which represent single-entry regions of the CFG.
There are subclasses of `Region` corresponding to various structured control-flow constructs, and then a leaf case that wraps a single `IRBlock`.

The new function `generateRegionsForIRBlocks()` (in `ir-restructure.cpp`) now handles the restructuring work, by building one or more `Region`s to represent a sub-graph, while `emitRegion()` handles emitting HLSL/GLSL source code from a region.
Splitting things in this way opens up some opportunities for future changes:

* We can expand the set of IR control-flow constructs allowed, so long as we can still generate structured `Region`s from them, without having to mess with the emit logic (e.g., we could start to support multi-level `break` by introducing temporaries as needed). In the limit we can generate our `Region`s using something like the "Relooper" algorithm.

* We can emit to other representations while retaining the same control-flow restructuring support. E.g., if we drop the structured information from the IR, then emitting to SPIR-V for Vulkan would require us to use the structured control-flow information from these `Region`s.

* We can do analysis that needs to understand `Region` structure. This is relevant to issue #569, which was what prompted me to start on this work. Now that we have a representation of the nesting of `Region`s, we can use it to reason about visibility of values between blocks.

During development of this change I ran into a gotcha, in that I had been assuming each IR block would map to a single `Region`, forgetting that our current lowering of "continue clauses" in `for` loops leads to them being duplicated. The `Region` representation handles this by having a linked-list struct mapping IR blocks to the `SimpleRegion`s that represent them. I added a test case that includes a `for` loop with a continue clause that is reached along multiple paths just to make sure that we continue to support that case.

The compiler output should not change as a result of this work; this is supposed to be a pure refactoring change.
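
To make the split concrete, here is a sketch of the emit half operating on a pre-built region tree. It is written in Python for brevity (the real code is C++); apart from `Region` and `SimpleRegion`, the class names are hypothetical, and IR blocks are modeled as plain lists of statement strings:

```python
class Region:
    """Base class for a single-entry region of the CFG."""


class SimpleRegion(Region):
    """Leaf region wrapping one IR block (here, a list of statements)."""

    def __init__(self, statements):
        self.statements = statements


class SeqRegion(Region):
    def __init__(self, children):
        self.children = children


class IfRegion(Region):
    def __init__(self, condition, then_region, else_region=None):
        self.condition = condition
        self.then_region = then_region
        self.else_region = else_region


class LoopRegion(Region):
    def __init__(self, body):
        self.body = body


def emit_region(region, indent=0):
    """Return the lines of high-level source for a region tree."""
    pad = "    " * indent
    if isinstance(region, SimpleRegion):
        return [pad + s for s in region.statements]
    if isinstance(region, SeqRegion):
        lines = []
        for child in region.children:
            lines += emit_region(child, indent)
        return lines
    if isinstance(region, IfRegion):
        lines = [pad + "if(" + region.condition + ") {"]
        lines += emit_region(region.then_region, indent + 1)
        if region.else_region is not None:
            lines.append(pad + "} else {")
            lines += emit_region(region.else_region, indent + 1)
        lines.append(pad + "}")
        return lines
    if isinstance(region, LoopRegion):
        lines = [pad + "for(;;) {"]
        lines += emit_region(region.body, indent + 1)
        lines.append(pad + "}")
        return lines
    raise TypeError("unknown region kind")
```

The emitter never inspects branch instructions; all decisions about what counts as an `if` or a loop are already encoded in the tree, which is the point of the refactoring.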

* Add a pass to resolve scoping issues in generated code

Fixes #569

The basic problem arises because the structured control flow that we output in high-level HLSL/GLSL doesn't match the "scoping" rules of an SSA IR.
In particular, SSA says that a value can be used in any block that is dominated by the definition, but in the presence of `break` and `continue` statements it is easy to construct cases where a block dominates something that is not in its scope for structured control flow. Consider:

```hlsl
for(;;) {
    int a = xyz;
    if(a) { int b = a; break; }
    int c = a;
}
int d = b;
```

This program is invalid as HLSL, because the variable `b` is referenced outside of its scope, but if we look at the CFG for this function, it is clear that the block that computes `b` dominates the block that computes `d`. IR optimizations can easily create code like this, so we need to be ready for it.

The previous change added an explicit `Region` structure to represent the structured control flow that we re-form out of the IR, and this change adds a pass that exploits the structuring information to detect cases like the above and introduce temporaries to fix the scoping issue. For example, the pass would change the earlier code block into something like:

```hlsl
int tmp;
for(;;) {
    int a = xyz;
    if(a) { int b = a; tmp = b; break; }
    int c = a;
}
int d = tmp;
```

That is, we introduce a new `tmp` variable at a scope "above" both the definition and use of `b`, and then we copy `b` into that temporary right where it is computed, and then use the temporary instead of the original `b` at the use site.
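
One possible way to decide when a temporary is needed, and where to declare it, is sketched below. This is Python rather than the compiler's C++, it is not the pass's actual implementation, and the `Scope` class and helper names are hypothetical; it also ignores ordering within a single scope:

```python
class Scope:
    """One structured scope (a stand-in for a region) with a parent link."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def ancestors(self):
        """Yield this scope, then each enclosing scope, innermost first."""
        scope = self
        while scope is not None:
            yield scope
            scope = scope.parent


def is_visible(def_scope, use_scope):
    """A definition is visible in its own scope and any scope nested in it."""
    return any(s is def_scope for s in use_scope.ancestors())


def scope_for_temporary(def_scope, use_scope):
    """Return None if no fix is needed, else the scope that should hold `tmp`."""
    if is_visible(def_scope, use_scope):
        return None
    enclosing_def = list(def_scope.ancestors())
    for scope in use_scope.ancestors():
        if any(scope is d for d in enclosing_def):
            return scope  # nearest scope enclosing both definition and use
    raise ValueError("scopes do not share a root")
```

For the loop above, `b` is defined in the `if` body and used by `d` at function scope, so the helper returns the function scope, matching where `tmp` is declared in the rewritten code.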

A few details that came up during the implementation:

* Downstream compilers may get confused by code like the above, and complain that `tmp` may be used before it is initialized, even though the very definition of dominators in a CFG means we don't have to worry about it. Still, I introduced some one-off code to initialize the temporaries just to silence spurious warnings coming from fxc.

* We need to be careful not to apply this logic to "phi nodes" (the parameters of basic blocks) since they will already be turned into temporaries by the emit logic, and trying to introduce temporaries with this pass led to broken code (I still need to investigate why). It may be that a future version of this pass should also take the code out of SSA form, so that we can introduce both kinds of temporaries in a single pass (and maybe eliminate some unnecessary variables by doing basic register allocation).

There is another transformation that could fix some issues of this kind, by moving code out of a structured control-flow construct and to the "join point" after it. For example, we could turn our loop from the start of this commit message into:

```hlsl
for(;;) {
    int a = xyz;
    if(a) { break; }
    int c = a;
}
int b = a;
int d = b;
```

Moving the definition of `b` to after the loop is possible because there is no way to get out of the loop without executing that code anyway. Now the scoping issue for `d`'s use of `b` has gone away, but of course we've introduced a *new* scoping issue for `a`, when it gets used by `b`.

Adding a pass to re-arrange control flow like this could reduce the cases where we have to apply the current pass, but it wouldn't eliminate them entirely. That means such a pass can be deferred to future work.

This change includes a test case that reproduces the original issue, so that we can confirm the fix works.
</content>
</entry>
<entry>
<title>Feature/vulkan first render (#545)</title>
<updated>2018-05-03T18:25:13+00:00</updated>
<author>
<name>jsmall-nvidia</name>
<email>jsmall@nvidia.com</email>
</author>
<published>2018-05-03T18:25:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=367f3a78a40731da45ee12b9a18c94707f1d1429'/>
<id>urn:sha1:367f3a78a40731da45ee12b9a18c94707f1d1429</id>
<content type='text'>
* First pass at InputLayout for Vulkan
Add support for RGBA_Float32

* Use VulkanModule and VulkanApi to handle accessing Vulkan types.

* First pass at Vulkan swap chain/Device queue.

* Added VulkanUtil for generic functions.

* Move more functionality to VulkanApi and VulkanUtil.
Make Buffer able to initialize itself.

* More tidy up around VulkanDeviceQueue

* First pass use of VulkanDeviceQueue in VkRenderer

* First pass use of VulkanSwapChain on VkRenderer

* Added depth formats.
Binding for constant and vertex buffers for Vulkan.

* Setting up VkImageView on backbuffers.

* First pass support for setting up vkRenderPass.

* Fixes to work around Vulkan swap chain/verification issues.

* Added support for Pipeline and a pipeline cache.

* Working without waiting - because of use of pipeline cache.

* Added support for VkFramebuffer in Vulkan.

* First pass at creating Vulkan graphics pipeline.

* More efforts to get Vulkan to render.

* Small improvement for checking of Binding flags.

* Removed setConstantBuffers from the Renderer interface - so that all resource binding takes place through the BindingState.
To make this work required a 'hack' in render-test main.cpp - so that the constant buffer binding that is needed in some tests is only added when it doesn't clash.

* RendererID -&gt; unified into RendererType. Added getRendererType to Renderer interface.
Added ProjectionStyle, and function to get from RendererType.
Added getIdentityProjection to RendererUtil - to get projection that is the 'identity' - but hits the same pixels for all projection styles.

* Fix build problem on Win32 on Vulkan where should use VK_NULL_HANDLE.

* Improve naming, comments. Remove dead code.

* Remove unwanted comment.
</content>
</entry>
<entry>
<title>Feature/renderer binding (#489)</title>
<updated>2018-04-17T20:59:03+00:00</updated>
<author>
<name>jsmall-nvidia</name>
<email>jsmall@nvidia.com</email>
</author>
<published>2018-04-17T20:59:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=00389a127af8db18a3ec8fe7ad2dd114a65ac024'/>
<id>urn:sha1:00389a127af8db18a3ec8fe7ad2dd114a65ac024</id>
<content type='text'>
* Dx12 rendering works in test framework.

* Turn on dx12 render tests.

* First pass at Resource and TextureResource/BufferResource types.

* Fix bug in Dx11 impl for BufferResource.

* Dx12 supports TextureResource and binds using TextureResource type, and all tests pass.

* Added TextureBuffer::Size type to make handling mips a little simpler.

* Small improvements to Dx12 constant buffer binding
Removed k prefix on an enum

* First pass impl of dx11 createTextureResource
Added setDefaults to TextureResource::Desc and BufferResource::Desc to simplify setup
accessFlags -&gt; cpuAccessFlags
desc -&gt; srcDesc

* Split out generateTextureResource - can produce the texture using createTextureResource on the Renderer.

* Added support for read mapping to Dx11
accessFlags -&gt; cpuAccessFlags
First pass at using TextureResource/BufferResource on Dx11
Some tests fail with this checkin

* TextureResource working on all tests on dx11.

* Construct ResourceBuffers on Dx11 and Dx12 using utility function createInputBufferResource.

* First pass at OpenGl TextureResource

* Small fixes to dx12 and dx11 setup.
Gl working using BufferResource and TextureResource

* Tidy up around the compareSampler - looks like the previous test was incorrect.

* Small documentation/naming improvements.

* Fix some more small documentation issues.
</content>
</entry>
<entry>
<title>Fixes based on review of feature/dx12 PR. (#473)</title>
<updated>2018-04-03T16:25:51+00:00</updated>
<author>
<name>jsmall-nvidia</name>
<email>jsmall@nvidia.com</email>
</author>
<published>2018-04-03T16:25:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=3115ba7a3640937df01ecf60f7ff55f71a3ab7c2'/>
<id>urn:sha1:3115ba7a3640937df01ecf60f7ff55f71a3ab7c2</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Feature/dx12 (#469)</title>
<updated>2018-04-02T23:59:33+00:00</updated>
<author>
<name>jsmall-nvidia</name>
<email>jsmall@nvidia.com</email>
</author>
<published>2018-04-02T23:59:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=38d5ef4764e9271ce2360f42b0759a236cddd9fe'/>
<id>urn:sha1:38d5ef4764e9271ce2360f42b0759a236cddd9fe</id>
<content type='text'>
* Fix signed/unsigned comparison warning.

* Split out d3d functions that will work across dx11 and 12.

* Improve slang-test/README.md around command line options.

* Make Guid comparison honor alignment for comparisons, so that the mechanism works on architectures that can only do aligned accesses.

* Initial setup of D3D12 Renderer, with presentFrame and clearFrame.

* More support for D3D12

* Added FreeList
* Added D3D12CircularResourceHeap
* First attempt at createBuffer

* First pass at map/unmap.

* First pass binding vertex/constant buffers, and setting up InputLayout. Note that memory is not kept in scope on binding yet.

* First pass of D3DDescriptorHeap

* Small tidy up in render-d3d11. Added D3DDescriptorHeap to project.

* First pass at D3D12 bind state.

* Fix typos in D3D12Resource

* Tidy up Dx11 render binding a little to match more with Dx12 style.

* First pass at Dx12 BindingState

* Handling of the command list d3d12. Support for submitGpuWork and waitForGpu.

* First attempt at Dx12 capture of backbuffer to file.

* First attempt at D3D12 binding for graphics.

* D3D12 setup viewport etc - does now render triangle in render0.hlsl.

* First pass at support for compute on D3D12Renderer

* Use spaces over tabs in D3DUtil

* Tabs to spaces in D3D12DescriptorHeap

* Convert tab-&gt;spaces on render-d3d12.cpp
</content>
</entry>
</feed>
