summaryrefslogtreecommitdiffstats
path: root/source/slang/ir-dce.cpp
diff options
context:
space:
mode:
authorTim Foley <tfoleyNV@users.noreply.github.com>2018-12-17 14:48:48 -0800
committerGitHub <noreply@github.com>2018-12-17 14:48:48 -0800
commit3a02c590afdd2624b2c729e989ada9393d708f75 (patch)
treead09bfe9c1bcc132e211c64854ef22b1b42b3678 /source/slang/ir-dce.cpp
parentd2ddc590601778f309c81f7d19d5e7fed34210de (diff)
Specialize away resource-type function parameters (#759)
* Specialize away resource-type function parameters Work on #397. Introduction ------------ Suppose a user writes a function that takes a resource type as a parameter: ```hlsl float4 getThing(RWStructuredBuffer<float4> buffer, int index) { return buffer[index]; } ``` This function creates challenges when generating code for GLSL-based targets, because a global shader parameter of type `RWStructuredBuffer`: ```hlsl RWStructuredBuffer<float4> gBuffer; ``` translates to a global GLSL `buffer` declaration: ```hlsl buffer _S0 { float4 _data[]; } gBuffer; ``` There is no equivalent to that `buffer` declaration that can be used in function parameter position, and it is illegal in GLSL to pass `gBuffer` into a function. (Aside: yes, we could in principle translate a function parameter like `RWStructuredBuffer<float4> buffer` to `float4 buffer[]`, but that will not in turn generalize to arrays of structured buffers; it is a dead-end strategy) The solution employed by many shader compilers is to "inline everything" to eliminate the need for parameters of resource types, and then rely on dataflow optimization to eliminate locals of resource types. This strategy can of course lead to an increase in code size, and it also means that call stacks are lost when doing step-through debugging. Another serious issue is that an "early `return`" from a function can turn into the equivalent of a multi-level `break` when inlined, and not all of our targets support multi-level `break`. The solution implemented in this change works around some, but not all, of the problems with full inlining. The approach here generates specialized versions of a function like `getThing`, adapted to the actual arguments provided at different call sites. Thus if we have code like: ```hlsl RWStructuredBuffer<float4> gA; RWStructuredBuffer<float4> gB[10]; ... getThing(gA, x); getThing(gA, y); getThing(gB[someVal], z); ``` we will generate two specializations of `getThing`: one specialized for the `buffer` parameter being `gA` and the other for `gB`: ```hlsl float4 getThing_gA(int index) { return gA[index]; } float4 getThing_gB(int _val, int index) { return gB[_val][index]; } ``` and the call sites will change to match: ```hlsl getThing_gA(x); getThing_gA(y); getThing_gB(someVal, z); ``` Note how in the case where the argument being passed in was obtained by indexing into an array of resources, the callee is specialized to the identity of the global shader parameter (`gB`), and now accepts a new parameter to indicate the array index into it. While this description motivates the change based on GLSL output, the same basic issue can arise for other targets. For example, while current HLSL has added the `ConstantBuffer<T>` type, it is not supported on older targets, and it turns out that even dxc does not allow functions to have `ConstantBuffer<T>` parameters. Longer-term, we will likely need to do even more aggressive specialization both in order to generate SPIR-V output directly, and also to deal with function that have return values or `out` parameters of resource types. Implementation -------------- The meat of the change is in `ir-specialize-resources.{h,cpp}`, where we have a pass that looks at all call sites (`IRCall` instructions) in the program, and attempts to replace them with calls to specialized functions, where the specializations are generated on-demand. The code in this pass is heavily commented, so hopefully it serves to explain itself all right. After specialization is complete, we may still have functions like the original `getThing` that will produce invalid code when emitted as GLSL, so we need a way to make sure they don't appear in the output. To date we've had some very ad hoc approaches for ignoring IR constructs that we don't want to affect emitted code, but this change goes ahead and adds a more real dead code elimination (DCE) pass in `ir-dce.{h,cpp}`. This pass follows a straightforward approach of tagging instructions that are "live" and then propagating liveness through the whole program, before making a single pass to delete anything that isn't live. When I first added the DCE pass it eliminated *everything* because there were no "roots" for liveness. I solved this for now by adding a new decoration, `IREntryPointDecoration`, to mark shader entry points in the IR which should always be live (as should anything they depend on). A secondary problem that arose was that for GLSL ray tracing shaders it is possible for the incoming/outgoing payload or attributes parameters to be unused, but eliminating them as dead would change the signature of a shader an potential break the rules for how ray tracing programs communicate. I added a very simple `IRDependsOnDecoration` that allows one IR instruction to keep another alive *as if* it used it, without actually using it. There's also a fixup in the IR dumping logic where I was forgetting to store anything in the mapping from instruction to their names, so that the name of an instruction was getting incremented each time it was referenced. Testing ------- There are three different tests added as part of this change: * The `compute/func-resource-param` test covers the basic `RWStructuredBuffer` case above, which we expect to work fine for D3D11/12, but fail for Vulkan without specialization. * The `cross-compile/func-resource-param-array` test covers the case where we don't just have one resource, but an array of them. This is not an end-to-end compute test primarily because our `render-test` application doesn't yet handle arrays of resources correctly in its binding logic. * The `compute/func-cbuffer-param` test covers the case of a function with a `ConstantBuffer<T>` parameter, which requires specialization to become valid for any of our targets. * fixup: warnings/errors from other compilers * fixup: typos and cleanup * fixup: typos
Diffstat (limited to 'source/slang/ir-dce.cpp')
-rw-r--r--source/slang/ir-dce.cpp325
1 files changed, 325 insertions, 0 deletions
diff --git a/source/slang/ir-dce.cpp b/source/slang/ir-dce.cpp
new file mode 100644
index 000000000..0f037bfe5
--- /dev/null
+++ b/source/slang/ir-dce.cpp
@@ -0,0 +1,325 @@
+// ir-dce.cpp
+#include "ir-dce.h"
+
+#include "ir.h"
+#include "ir-insts.h"
+
+namespace Slang
+{
+
+struct DeadCodeEliminationContext
+{
+ // This type implements a simple global DCE pass over
+ // an entire module.
+ //
+ // We start with member variables to stand in for
+ // the parameters that were passed to the top-level
+ // `eliminateDeadCode` function.
+ //
+ CompileRequest* compileRequest;
+ IRModule* module;
+
+ // Our overall process is going to be to determine
+ // which instructions in the module are "live"
+ // and then eliminate anything that wasn't found to
+ // be live.
+ //
+ // We will track the liveness state by keeping
+ // a set of all instructions we have so far determined
+ // to be live.
+ //
+ HashSet<IRInst*> liveInsts;
+
+ // Querying whether an instruction has been
+ // determined to be live is easy.
+ //
+ bool isInstLive(IRInst* inst)
+ {
+ // The only wrinkle is that we want to safeguard
+ // against a null instruction (there are some
+ // corner cases where we still construct IR
+ // instructions with a null type).
+ //
+ if(!inst) return false;
+
+ return liveInsts.Contains(inst);
+ }
+
+ // We are going to do an iterative analysis
+ // where we mark instructions we know are
+ // live, and then see if that can help us
+ // identify any other instructions that
+ // must also be live.
+ //
+ // For this, we will use a work list of
+ // instructions that have been marked
+ // as live, but for which we haven't
+ // looked at their impact on other
+ // instructions.
+ //
+ List<IRInst*> workList;
+
+ // When we discover that an instruction seems
+ // to be live, we will add it to our set,
+ // and also the work list, but only if we
+ // haven't done so previously.
+ //
+ void markInstAsLive(IRInst* inst)
+ {
+ // Again, we safeguard against null instructions
+ // just in case.
+ //
+ if(!inst) return;
+
+ if(liveInsts.Contains(inst))
+ return;
+ liveInsts.Add(inst);
+ workList.Add(inst);
+ }
+
+ // Given the basic infrastructrure above, let's
+ // dive into the task of actually finding all
+ // the live code in a module.
+ //
+ void processModule()
+ {
+ // First of all, we know that the root module instruction
+ // should be considered as live, because otherwise
+ // we'd end up eliminating it, so that is a
+ // good place to start.
+ //
+ markInstAsLive(module->getModuleInst());
+
+ // Marking the module as live should have
+ // seeded our work list, so we can now start
+ // processing entries off of our work list
+ // until it goes dry.
+ //
+ while( workList.Count() )
+ {
+ auto inst = workList.Last();
+ workList.RemoveLast();
+
+ // At this point we know that `inst` is live,
+ // and we want to start considering which other
+ // instructions must be live because of that
+ // knowlege.
+ //
+ // A first easy case is that the parent (if any)
+ // of a live instruction had better be live, or
+ // else we might delete the parent, and
+ // the child with it.
+ //
+ markInstAsLive(inst->getParent());
+
+ // Next the type of a live instruction, and all
+ // of its operands must also be live, or else
+ // we won't be able to compute its value.
+ //
+ markInstAsLive(inst->getFullType());
+ UInt operandCount = inst->getOperandCount();
+ for( UInt ii = 0; ii < operandCount; ++ii )
+ {
+ markInstAsLive(inst->getOperand(ii));
+ }
+
+ // Finally, we need to consider the children
+ // and decorations of the instruction.
+ //
+ // Note that just because an instruction is
+ // live doesn't mean its children must be, or
+ // else we'd never eliminate *anything* (we
+ // marked the whole module as live, and everything
+ // is a transitive child of the module).
+ //
+ // Decorations, in contrast, are always live if their
+ // parents are (because we don't want to silently drop
+ // decorations). It is still important to *mark*
+ // decorations as live, because they have operands,
+ // and those operands need to be marked as live.
+ // We will fold decorations into the same loop
+ // as children for simplicity.
+ //
+ // To keep the code here simple, we'll defer the
+ // decision of whether a child (or decoration)
+ // should be live when its parent is to a subroutine.
+ //
+ for( auto child : inst->getDecorationsAndChildren() )
+ {
+ if(shouldInstBeLiveIfParentIsLive(child))
+ {
+ // In this case, we know `inst` is live and
+ // its `child` should be live if its parent is,
+ // so the `child` must be live too.
+ //
+ markInstAsLive(child);
+ }
+ }
+ }
+
+ // If our work list runs dry, that means we've reached a steady
+ // state where everything that is transitively relevant to
+ // the "outputs" of the module has been marked as live.
+ //
+ // Now we can simply walk through all of our instructions
+ // recursively and eliminate those that are "dead" by
+ // virtue of not having been found live.
+ //
+ eliminateDeadInstsRec(module->getModuleInst());
+ }
+
+ void eliminateDeadInstsRec(IRInst* inst)
+ {
+ // Given the instruction `inst` we need to eliminate
+ // any dead code at, or under it.
+ //
+ // The easy case is if `inst` is dead (that is, not live).
+ //
+ if( !isInstLive(inst) )
+ {
+ // We can simply remove and deallocate `inst` because it is
+ // dead, and not worry about any of its descendents,
+ // because they must have been dead too (since we always
+ // mark the parent of a live instruction as live).
+ //
+ inst->removeAndDeallocate();
+ }
+ else
+ {
+ // If `inst` is live, then we need to deal with the possibility
+ // that its children/decorations (or descendents in general)
+ // might still be dead.
+ //
+ // The biggest wrinkle is that we walk the linked list of
+ // children/decorations a bit carefully, using a temporary
+ // to hold the next node, in case we eliminate one of
+ // the children as we go.
+ //
+ IRInst* next = nullptr;
+ for( IRInst* child = inst->getFirstDecorationOrChild(); child; child = next )
+ {
+ next = child->getNextInst();
+ eliminateDeadInstsRec(child);
+ }
+ }
+ }
+
+ // Now we come to the decision procedure we put off before:
+ // should a given `inst` be live if its parent is?
+ //
+ bool shouldInstBeLiveIfParentIsLive(IRInst* inst)
+ {
+ // The main source of confusion/complexity here is that
+ // we are using the same routine to decide:
+ //
+ // * Should some ordinary instruction in a basic block be kept around?
+ // * Should a basic block in some function be kept around?
+ // * Should a function/type/variable in a module be kept around?
+ //
+ // Still, there are a few basic patterns we can observe.
+ // First, if `inst` is an instruction that might have some effects
+ // when it is executed, then we should keep it around.
+ //
+ if(inst->mightHaveSideEffects())
+ return true;
+ //
+ // The `mightHaveSideEffects` query is conservative, and will
+ // return `true` as its default mode, so once we are past that
+ // query we know that `inst` is either something "structural"
+ // (that makes up the program) rather than executable, or it
+ // is executable but was on a white list of things that are
+ // safe to eliminate.
+
+ // Most top-level objects (functions, types, etc.) obviously
+ // do *not* have side effects. That creates the risk that
+ // we'll just go ahead and eliminate every single function/type
+ // in a module. There needs to be a way to identify the
+ // functions we want to keep around, and for right now
+ // that is handled with the `[entryPoint]` decoration.
+ //
+ if(inst->findDecorationImpl(kIROp_EntryPointDecoration))
+ return true;
+ //
+ // TODO: Eventually it would make sense to consider everything
+ // with an `[export(...)]` decoration as live, but our current
+ // approach to linking for back-end compilation leaves many
+ // linkage decorations in place that we seemingly don't need/want.
+
+ // A basic block is an interesting case. Knowing that a function
+ // is live means that its entry block is live, but the liveness
+ // of any other blocks is determined by whether they are referenced
+ // by other instructions (e.g., a branch from one block to
+ // another).
+ //
+ if( auto block = as<IRBlock>(inst) )
+ {
+ // To determine whether this is the first block in its
+ // parent function (or what-have-you) we can simply
+ // check if there is a previous block before it.
+ //
+ auto prevBlock = block->getPrevBlock();
+ return prevBlock == nullptr;
+ }
+
+ // There are a few special cases of "structural" instructions
+ // that we don't want to eliminate, so we'll check for those next.
+ //
+ switch( inst->op )
+ {
+ // Function parameters obviously shouldn't get eliminated,
+ // even if nothing references them, and block parameters
+ // (phi nodes) will be considered live when their block is,
+ // just so that we don't have to deal with any complications
+ // around re-writing the relevant inter-block argument passing.
+ //
+ // TODO: A smarter DCE pass could deal with this case more
+ // carefully, or we could improve the interprocedural SCCP
+ // pass to deal with block parameters instead.
+ //
+ case kIROp_Param:
+ return true;
+
+ // IR struct types and witness tables are currently kludged
+ // so that they have child instructions that represent their
+ // entries (effectively `(key,value)` pairs), and those child
+ // instructions are never directly referenced (e.g., an access
+ // to a struct field references the *key* but not the `(key,value)`
+ // pair that is the `IRField` instruction.
+ //
+ // TODO: at some point the IR should use a different representation
+ // for struct types and witness tables that does away with
+ // this problem.
+ //
+ case kIROp_StructField:
+ case kIROp_WitnessTableEntry:
+ return true;
+
+ default:
+ break;
+ }
+
+ // If none of the explicit cases above matched, then we will consider
+ // the instruction to not be live just because its parent is. Further
+ // analysis could still lead to a change in the status of `inst`, if
+ // an instruction that uses it as an operand is marked live.
+ //
+ return false;
+ }
+};
+
+// The top-level function for invoking the DCE pass
+// is straighforward. We set up the context object
+// and then defer to it for the real work.
+//
+void eliminateDeadCode(
+ CompileRequest* compileRequest,
+ IRModule* module)
+{
+ DeadCodeEliminationContext context;
+ context.compileRequest = compileRequest;
+ context.module = module;
+
+ context.processModule();
+}
+
+}