diff options
| author | Tim Foley <tfoleyNV@users.noreply.github.com> | 2018-12-17 14:48:48 -0800 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2018-12-17 14:48:48 -0800 |
| commit | 3a02c590afdd2624b2c729e989ada9393d708f75 (patch) | |
| tree | ad09bfe9c1bcc132e211c64854ef22b1b42b3678 /source/slang/emit.cpp | |
| parent | d2ddc590601778f309c81f7d19d5e7fed34210de (diff) | |
Specialize away resource-type function parameters (#759)
* Specialize away resource-type function parameters
Work on #397.
Introduction
------------
Suppose a user writes a function that takes a resource type as a parameter:
```hlsl
float4 getThing(RWStructuredBuffer<float4> buffer, int index)
{
return buffer[index];
}
```
This function creates challenges when generating code for GLSL-based targets, because a global shader parameter of type `RWStructuredBuffer`:
```hlsl
RWStructuredBuffer<float4> gBuffer;
```
translates to a global GLSL `buffer` declaration:
```hlsl
buffer _S0
{
float4 _data[];
} gBuffer;
```
There is no equivalent to that `buffer` declaration that can be used in function parameter position, and it is illegal in GLSL to pass `gBuffer` into a function.
(Aside: yes, we could in principle translate a function parameter like `RWStructuredBuffer<float4> buffer` to `float4 buffer[]`, but that will not in turn generalize to arrays of structured buffers; it is a dead-end strategy)
The solution employed by many shader compilers is to "inline everything" to eliminate the need for parameters of resource types, and then rely on dataflow optimization to eliminate locals of resource types. This strategy can of course lead to an increase in code size, and it also means that call stacks are lost when doing step-through debugging. Another serious issue is that an "early `return`" from a function can turn into the equivalent of a multi-level `break` when inlined, and not all of our targets support multi-level `break`.
The solution implemented in this change works around some, but not all, of the problems with full inlining.
The approach here generates specialized versions of a function like `getThing`, adapted to the actual arguments provided at different call sites.
Thus if we have code like:
```hlsl
RWStructuredBuffer<float4> gA;
RWStructuredBuffer<float4> gB[10];
...
getThing(gA, x);
getThing(gA, y);
getThing(gB[someVal], z);
```
we will generate two specializations of `getThing`: one specialized for the `buffer` parameter being `gA` and the other for `gB`:
```hlsl
float4 getThing_gA(int index) { return gA[index]; }
float4 getThing_gB(int _val, int index) { return gB[_val][index]; }
```
and the call sites will change to match:
```hlsl
getThing_gA(x);
getThing_gA(y);
getThing_gB(someVal, z);
```
Note how in the case where the argument being passed in was obtained by indexing into an array of resources, the callee is specialized to the identity of the global shader parameter (`gB`), and now accepts a new parameter to indicate the array index into it.
While this description motivates the change based on GLSL output, the same basic issue can arise for other targets.
For example, while current HLSL has added the `ConstantBuffer<T>` type, it is not supported on older targets, and it turns out that even dxc does not allow functions to have `ConstantBuffer<T>` parameters.
Longer-term, we will likely need to do even more aggressive specialization both in order to generate SPIR-V output directly, and also to deal with function that have return values or `out` parameters of resource types.
Implementation
--------------
The meat of the change is in `ir-specialize-resources.{h,cpp}`, where we have a pass that looks at all call sites (`IRCall` instructions) in the program, and attempts to replace them with calls to specialized functions, where the specializations are generated on-demand.
The code in this pass is heavily commented, so hopefully it serves to explain itself all right.
After specialization is complete, we may still have functions like the original `getThing` that will produce invalid code when emitted as GLSL, so we need a way to make sure they don't appear in the output.
To date we've had some very ad hoc approaches for ignoring IR constructs that we don't want to affect emitted code, but this change goes ahead and adds a more real dead code elimination (DCE) pass in `ir-dce.{h,cpp}`.
This pass follows a straightforward approach of tagging instructions that are "live" and then propagating liveness through the whole program, before making a single pass to delete anything that isn't live.
When I first added the DCE pass it eliminated *everything* because there were no "roots" for liveness.
I solved this for now by adding a new decoration, `IREntryPointDecoration`, to mark shader entry points in the IR which should always be live (as should anything they depend on).
A secondary problem that arose was that for GLSL ray tracing shaders it is possible for the incoming/outgoing payload or attributes parameters to be unused, but eliminating them as dead would change the signature of a shader an potential break the rules for how ray tracing programs communicate.
I added a very simple `IRDependsOnDecoration` that allows one IR instruction to keep another alive *as if* it used it, without actually using it.
There's also a fixup in the IR dumping logic where I was forgetting to store anything in the mapping from instruction to their names, so that the name of an instruction was getting incremented each time it was referenced.
Testing
-------
There are three different tests added as part of this change:
* The `compute/func-resource-param` test covers the basic `RWStructuredBuffer` case above, which we expect to work fine for D3D11/12, but fail for Vulkan without specialization.
* The `cross-compile/func-resource-param-array` test covers the case where we don't just have one resource, but an array of them. This is not an end-to-end compute test primarily because our `render-test` application doesn't yet handle arrays of resources correctly in its binding logic.
* The `compute/func-cbuffer-param` test covers the case of a function with a `ConstantBuffer<T>` parameter, which requires specialization to become valid for any of our targets.
* fixup: warnings/errors from other compilers
* fixup: typos and cleanup
* fixup: typos
Diffstat (limited to 'source/slang/emit.cpp')
| -rw-r--r-- | source/slang/emit.cpp | 105 |
1 files changed, 86 insertions, 19 deletions
diff --git a/source/slang/emit.cpp b/source/slang/emit.cpp index 39616b3df..f8b14b0f3 100644 --- a/source/slang/emit.cpp +++ b/source/slang/emit.cpp @@ -2,9 +2,11 @@ #include "emit.h" #include "../core/slang-writer.h" +#include "ir-dce.h" #include "ir-insts.h" #include "ir-restructure.h" #include "ir-restructure-scoping.h" +#include "ir-specialize-resources.h" #include "ir-ssa.h" #include "ir-validate.h" #include "legalize-types.h" @@ -5603,21 +5605,56 @@ struct EmitVisitor emit("}\n"); } + /// Emit the array brackets that go on the end of a declaration of the given type. void emitArrayBrackets( EmitContext* ctx, - IRType* type) + IRType* inType) { SLANG_UNUSED(ctx); - if(auto arrayType = as<IRArrayType>(type)) - { - emit("["); - EmitVal(arrayType->getElementCount(), kEOp_General); - emit("]"); - } - else if(auto unsizedArrayType = as<IRUnsizedArrayType>(type)) + // A declaration may require zero, one, or + // more array brackets. When writing out array + // brackets from left to right, they represent + // the structure of the type from the "outside" + // in (that is, if we have a 5-element array of + // 3-element arrays we should output `[5][3]`), + // because of C-style declarator rules. + // + // This conveniently means that we can print + // out all the array brackets with a looping + // rather than a recursive structure. + // + // We will peel the input type like an onion, + // looking at one layer at a time until we + // reach a non-array type in the middle. + // + IRType* type = inType; + for(;;) { - emit("[]"); + if(auto arrayType = as<IRArrayType>(type)) + { + emit("["); + EmitVal(arrayType->getElementCount(), kEOp_General); + emit("]"); + + // Continue looping on the next layer in. + // + type = arrayType->getElementType(); + } + else if(auto unsizedArrayType = as<IRUnsizedArrayType>(type)) + { + emit("[]"); + + // Continue looping on the next layer in. + // + type = unsizedArrayType->getElementType(); + } + else + { + // This layer wasn't an array, so we are done. + // + return; + } } } @@ -5752,16 +5789,6 @@ struct EmitVisitor emit(";\n"); } - IRType* unwrapArray(IRType* type) - { - IRType* t = type; - while( auto arrayType = as<IRArrayTypeBase>(t) ) - { - t = arrayType->getElementType(); - } - return t; - } - void emitIRStructuredBuffer_GLSL( EmitContext* ctx, IRGlobalParam* varDecl, @@ -6546,6 +6573,46 @@ String emitEntryPoint( #endif validateIRModuleIfEnabled(compileRequest, irModule); + // After type legalization and subsequent SSA cleanup we expect + // that any resource types passed to functions are exposed + // as their own top-level parameters (which might have + // resource or array-of-...-resource types). + // + // Many of our targets place restrictions on how certain + // resource types can be used, so that having them as + // function parameters is invalid. To clean this up, + // we will try to specialize called functions based + // on the actual resources that are being passed to them + // at specific call sites. + // + // Because the legalization may depend on what target + // we are compiling for (certain things might be okay + // for D3D targets that are not okay for Vulkan), we + // pass down the target request along with the IR. + // + specializeResourceParameters(compileRequest, targetRequest, irModule); + +#if 0 + dumpIRIfEnabled(compileRequest, irModule, "AFTER RESOURCE SPECIALIZATION"); +#endif + validateIRModuleIfEnabled(compileRequest, irModule); + + // The resource-based specialization pass above + // may create specialized versions of functions, but + // it does not try to completely eliminate the original + // functions, so there might still be invalid code in + // our IR module. + // + // To clean up the code, we will apply a fairly general + // dead-code-elimination (DCE) pass that only retains + // whatever code is "live." + // + eliminateDeadCode(compileRequest, irModule); +#if 0 + dumpIRIfEnabled(compileRequest, irModule, "AFTER DCE"); +#endif + validateIRModuleIfEnabled(compileRequest, irModule); + // After all of the required optimization and legalization // passes have been performed, we can emit target code from // the IR module. |
