diff options
| author | Tim Foley <tfoleyNV@users.noreply.github.com> | 2018-02-07 14:37:37 -0800 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2018-02-07 14:37:37 -0800 |
| commit | be8b891c4e0b7541a1c5f1aafa6f562113d5cdcb (patch) | |
| tree | 471279604e75ba6c28bf999da09fa57955b09757 /source | |
| parent | 1fbc73d96fbc0a199d823dfb38fc8f02bf7ada0a (diff) | |
Generate SSA form for IR functions (#400)
* Generate SSA form for IR functions
The basic idea here is simple: in the front-end after we have lowered the AST to initial IR we will apply a set of "mandatory" optimization passes. The first of these is to attempt to translate the all functions into SSA form so that they are amenable to subsequent dataflow optimizations. Eventually, the mandatory optimization passes would include diagnostic passes that make sure variables aren't used when undefined, etc.
Just doing basic SSA generation already cleans up a lot of the messiness in our IR today, because constructs that used to involve many local variables can now be handled via SSA temporaries.
The implementation of SSA generation is in `ir-ssa.cpp`, and it follows the approach of Braun et al.'s "Simple and Efficient Construction of Static Single Assignment Form." I used this instead of the more well-known Cytron et al. algorithm because Braun's algorith mis very simple to code, and does not require auxiliary analyses to generate the dominance frontier.
The main wrinkle in our SSA representation right now is that instead of using ordinary phi nodes, we instead allow basic blocks to have parameters, where predecessor blocks pass in different parameter values. This encodes information equivalent to traditional phi nodes, but has two (small) benefits:
1. There is no fixed relationship between the order of phi operands and predecessor blocks, so we don't have to worry about breaking the phis when we alter the order in which predecessors are stored. This is important for us because predecessors are being stored implicitly.
2. It is easy to operationalize a "branch with arguments" either when lowering to other languages, or when interpreting the IR. A branch with arguments is implemented as a sequence of stores from the arguments to the parameters of the target block (very similar to a call), followed by a jump to the block.
Relevant to the above, this change also adds an interface for enumerating the predecessors or successors of a block in our CFG. Rather than use an auxliary structure, we directly use the information already encoded in the IR:
* The sucessors of a block are the target label operands of its terminator instruction. In our IR this is a contiguous range of `IRUse`s, possible with a stride (to account for the way `switch` interleaves values and blocks).
* The predecessors of a block are a subset of the uses of the block's value. Specifically, they are any uses that are on a terminator instruction, and within the range of values that represent the successor list of that instruction.
One important limitation of the "blocks with arguments" model for handling phis is that it is really only convenient to stash extra arguments on an unconditional terminator instruction. This change works around this prob lem by breaking any "critical edges" - edges between a block with multiple successors and one with multiple predecessors. We assume that "phi" nodes will only ever be needed on a block with multiple predecessors, and because critical edges are broken, each of these predecessors will then have only a single successor, so its branch instruction can handle the extra arguments.
This change introduces a notion of an "undefined" instruction in the IR. This is handled as an instruction rather than a value because I anticipate that we will want to distinguish different undefined values when it comes time to start issuing error messages (those messages will need to point to the variable that was used when undefined).
* Fix expected test output.
Another change was merged that enabled the `glsl-parameter-blocks` test, and its output is affected by our IR optimization work.
Diffstat (limited to 'source')
| -rw-r--r-- | source/slang/emit.cpp | 114 | ||||
| -rw-r--r-- | source/slang/ir-inst-defs.h | 2 | ||||
| -rw-r--r-- | source/slang/ir-insts.h | 16 | ||||
| -rw-r--r-- | source/slang/ir-ssa.cpp | 848 | ||||
| -rw-r--r-- | source/slang/ir-ssa.h | 9 | ||||
| -rw-r--r-- | source/slang/ir.cpp | 261 | ||||
| -rw-r--r-- | source/slang/ir.h | 62 | ||||
| -rw-r--r-- | source/slang/lower-to-ir.cpp | 37 | ||||
| -rw-r--r-- | source/slang/slang.vcxproj | 2 | ||||
| -rw-r--r-- | source/slang/slang.vcxproj.filters | 2 | ||||
| -rw-r--r-- | source/slang/vm.cpp | 40 |
11 files changed, 1381 insertions, 12 deletions
diff --git a/source/slang/emit.cpp b/source/slang/emit.cpp index f9f24fcf6..eaabd520c 100644 --- a/source/slang/emit.cpp +++ b/source/slang/emit.cpp @@ -6074,6 +6074,14 @@ emitDeclImpl(decl, nullptr); emit(";\n"); break; + case kIROp_undefined: + { + auto type = inst->getType(); + emitIRType(ctx, type, getIRName(inst)); + emit(";\n"); + } + break; + case kIROp_Var: { auto ptrType = inst->getType(); @@ -6202,6 +6210,37 @@ emitDeclImpl(decl, nullptr); } } + // When we are about to traverse an edge from one block to another, + // we need to emit the assignments that conceptually occur "along" + // the edge. In traditional SSA these are the phi nodes in the + // target block, while in our representation these use the arguments + // to the branch instruction to fill in the parameters of the target. + void emitPhiVarAssignments( + EmitContext* ctx, + UInt argCount, + IRUse* args, + IRBlock* targetBlock) + { + UInt argCounter = 0; + for (auto pp = targetBlock->getFirstParam(); pp; pp = pp->getNextParam()) + { + UInt argIndex = argCounter++; + + if (argIndex >= argCount) + { + assert(!"not enough arguments for branch"); + break; + } + + IRValue* arg = args[argIndex].usedValue; + + emitIROperand(ctx, pp); + emit(" = "); + emitIROperand(ctx, arg); + emit(";\n"); + } + } + // We want to emit a range of code in the IR, represented // by the blocks that are logically in the interval [begin, end) // which we consider as a single-entry multiple-exit region. @@ -6305,6 +6344,14 @@ emitDeclImpl(decl, nullptr); auto breakBlock = t->getBreakBlock(); auto continueBlock = t->getContinueBlock(); + UInt argCount = t->getArgCount(); + static const UInt kFixedArgCount = 3; + emitPhiVarAssignments( + ctx, + argCount - kFixedArgCount, + t->getArgs() + kFixedArgCount, + targetBlock); + if (auto loopControlDecoration = t->findDecoration<IRLoopControlDecoration>()) { switch (loopControlDecoration->mode) @@ -6388,7 +6435,20 @@ emitDeclImpl(decl, nullptr); break; case kIROp_break: - emit("break;\n"); + { + auto t = (IRBreak*)terminator; + auto targetBlock = t->getTargetBlock(); + + UInt argCount = t->getArgCount(); + static const UInt kFixedArgCount = 1; + emitPhiVarAssignments( + ctx, + argCount - kFixedArgCount, + t->getArgs() + kFixedArgCount, + targetBlock); + + emit("break;\n"); + } return; case kIROp_continue: @@ -6405,6 +6465,15 @@ emitDeclImpl(decl, nullptr); { auto continueInst = (IRContinue*) terminator; auto targetBlock = continueInst->getTargetBlock(); + + UInt argCount = continueInst->getArgCount(); + static const UInt kFixedArgCount = 1; + emitPhiVarAssignments( + ctx, + argCount - kFixedArgCount, + continueInst->getArgs() + kFixedArgCount, + targetBlock); + IRBlock* loopHead = nullptr; ctx->shared->irMapContinueTargetToLoopHead.TryGetValue(targetBlock, loopHead); SLANG_ASSERT(loopHead); @@ -6422,10 +6491,16 @@ emitDeclImpl(decl, nullptr); auto t = (IRLoopTest*)terminator; auto afterBlock = t->getTrueBlock(); + auto exitBlock = t->getFalseBlock(); emit("if("); emitIROperand(ctx, t->getCondition()); - emit(")\n{} else break;\n"); + emit(")\n{} else {\n"); + emitIRStmtsForBlocks( + ctx, + exitBlock, + nullptr); + emit("}\n"); // Continue with the block after the test block = afterBlock; @@ -6440,6 +6515,16 @@ emitDeclImpl(decl, nullptr); // block, or a backward edge to the top // of a loop. auto t = (IRUnconditionalBranch*)terminator; + auto targetBlock = t->getTargetBlock(); + + UInt argCount = t->getArgCount(); + static const UInt kFixedArgCount = 1; + emitPhiVarAssignments( + ctx, + argCount - kFixedArgCount, + t->getArgs() + kFixedArgCount, + targetBlock); + block = t->getTargetBlock(); } break; @@ -6758,6 +6843,27 @@ emitDeclImpl(decl, nullptr); } } + void emitPhiVarDecls( + EmitContext* ctx, + IRFunc* func) + { + // We will skip the first block, since its parameters are + // the parameters of the whole function. + auto bb = func->getFirstBlock(); + if (!bb) + return; + bb = bb->getNextBlock(); + + for (; bb; bb = bb->getNextBlock()) + { + for (auto pp = bb->getFirstParam(); pp; pp = pp->getNextParam()) + { + emitIRType(ctx, pp->getType(), getIRName(pp)); + emit(";\n"); + } + } + } + void emitIRSimpleFunc( EmitContext* ctx, IRFunc* func) @@ -6816,6 +6922,10 @@ emitDeclImpl(decl, nullptr); { emit("\n{\n"); + // HACK: forward-declare all the local variables needed for the + // prameters of non-entry blocks. + emitPhiVarDecls(ctx, func); + // Need to emit the operations in the blocks of the function emitIRStmtsForBlocks(ctx, func->getFirstBlock(), nullptr); diff --git a/source/slang/ir-inst-defs.h b/source/slang/ir-inst-defs.h index fa02da683..38515e6f0 100644 --- a/source/slang/ir-inst-defs.h +++ b/source/slang/ir-inst-defs.h @@ -96,6 +96,8 @@ INST(IntLit, integer_constant, 0, 0) INST(FloatLit, float_constant, 0, 0) INST(decl_ref, decl_ref, 0, 0) +INST(undefined, undefined, 0, 0) + INST(specialize, specialize, 2, 0) INST(lookup_interface_method, lookup_interface_method, 2, 0) INST(lookup_witness_table, lookup_witness_table, 2, 0) diff --git a/source/slang/ir-insts.h b/source/slang/ir-insts.h index 46a804a1c..266374545 100644 --- a/source/slang/ir-insts.h +++ b/source/slang/ir-insts.h @@ -326,6 +326,15 @@ struct IRWitnessTable : IRGlobalValue IRValueList<IRWitnessTableEntry> entries; }; +// An instruction that yields an undefined value. +// +// Note that we make this an instruction rather than a value, +// so that we will be able to identify a variable that is +// used when undefined. +struct IRUndefined : IRInst +{ +}; + // Description of an instruction to be used for global value numbering struct IRInstKey { @@ -394,6 +403,7 @@ struct IRBuilder IRValue* getBoolValue(bool value); IRValue* getIntValue(IRType* type, IRIntegerValue value); IRValue* getFloatValue(IRType* type, IRFloatingPointValue value); + IRValue* getDeclRefVal( DeclRefBase const& declRef); IRValue* getTypeVal(IRType* type); // create an IR value that represents a type @@ -443,6 +453,10 @@ struct IRBuilder UInt argCount, IRValue* const* args); + IRUndefined* emitUndefined(IRType* type); + + + IRModule* createModule(); IRFunc* createFunc(); @@ -458,6 +472,8 @@ struct IRBuilder IRBlock* createBlock(); IRBlock* emitBlock(); + IRParam* createParam( + IRType* type); IRParam* emitParam( IRType* type); diff --git a/source/slang/ir-ssa.cpp b/source/slang/ir-ssa.cpp new file mode 100644 index 000000000..366ecc279 --- /dev/null +++ b/source/slang/ir-ssa.cpp @@ -0,0 +1,848 @@ +// ir-ssa.cpp +#include "ir-ssa.h" + +#include "ir.h" +#include "ir-insts.h" + +namespace Slang { + +// Track information on a phi node we are in +// the process of constructing. +struct PhiInfo : RefObject +{ + // The phi node will be represented as a parameter + // to a (non-entry) basic block. + IRParam* phi; + + // The original variable that this phi will be replacing. + IRVar* var; + + // The operands to the phi will be stored as uses here, + // because our IR parameters don't have operands. + // + // Once we've collected all the values we plan to use, + // we will turn this into argument in predecessor blocks + // that branch to this one. + // + // The order of elements in this list must match the + // order in which the predecessor blocks get enumerated. + List<IRUse> operands; + + // Did we end up removing this phi because it was determined + // to be trivial? + bool wasRemoved = false; +}; + +// Information about a basic block that we generate/use +// during SSA construction. +struct SSABlockInfo : RefObject +{ + // Map a promotable variable to the value to + // use for that variable + Dictionary<IRVar*, IRValue*> valueForVar; + + // The underlying basic block. + IRBlock* block; + + // Have we processed all the instructions in the + // body of this block (so that we would have + // found any stores to SSA variables)? + bool isFilled = false; + + // Have we filled all the predecessors of + // this block, so that we can actually perform + // look up in them? + bool isSealed = false; + + // An IR builder to use when we want to construct + // stuff in the context of this block + IRBuilder builder; + + // Phi nodes we are creating for this block. + List<PhiInfo*> phis; + + // Arguments that this block needs to pass along + // to the phi nodes defined by is sucessor + List<IRValue*> successorArgs; +}; + +// State for constructing SSA form for a global value +// with code (usually a function). +struct ConstructSSAContext +{ + // The value that we want to rewrite into SSA form + // (usually an IR function) + IRGlobalValueWithCode* globalVal; + + // Variables that we've identified for promotion + // to SSA values. + List<IRVar*> promotableVars; + + // Information about each basic block + Dictionary<IRBlock*, RefPtr<SSABlockInfo>> blockInfos; + + // IR building state to use during the operation + SharedIRBuilder sharedBuilder; + + + Dictionary<IRParam*, RefPtr<PhiInfo>> phiInfos; + + PhiInfo* getPhiInfo(IRParam* phi) + { + return *phiInfos.TryGetValue(phi); + } +}; + +// Is the given variable one that we can promote to SSA form? +bool isPromotableVar( + ConstructSSAContext* /*context*/, + IRVar* var) +{ + // If the variable is only used directly as the pointer + // operand of load and store instructions, then it should + // be promote-able. + // + // For now, we won't deal with cases where the variable + // is an aggregate an we sometimes pull out individual + // fields or elements. This is an important extension, + // but we probably also need to think about scalarizing + // any aggregates when we promote them to registers. + + for (auto u = var->firstUse; u; u = u->nextUse) + { + auto user = u->user; + switch (user->op) + { + default: + // If the variable gets used by any operation + // we can't account for directly, then it isn't + // promotable. + return false; + + case kIROp_Load: + { + // A load has only a single argument, so + // it had better be our pointer. + assert(u == &((IRLoad*) user)->ptr); + } + break; + + case kIROp_Store: + { + auto storeInst = (IRStore*)user; + + // We don't want to promote a variable if + // its address gets stored into another + // variable, so check for that case. + if (u == &storeInst->val) + return false; + + // Otherwise our variable is being used + // as the destination for the store, and + // that is okay by us. + assert(u == &storeInst->ptr); + } + break; + } + } + + // If all of the uses passed our checking, then + // we are good to go. + return true; +} + +// Identify local variables that can be promoted to SSA form +void identifyPromotableVars( + ConstructSSAContext* context) +{ + for (auto bb = context->globalVal->getFirstBlock(); bb; bb = bb->getNextBlock()) + { + for (auto ii = bb->getFirstInst(); ii; ii = ii->getNextInst()) + { + if (ii->op != kIROp_Var) + continue; + + IRVar* var = (IRVar*)ii; + + if (isPromotableVar(context, var)) + { + context->promotableVars.Add(var); + } + } + } +} + +IRVar* asPromotableVar( + ConstructSSAContext* context, + IRValue* value) +{ + if (value->op != kIROp_Var) + return nullptr; + + IRVar* var = (IRVar*)value; + if (!context->promotableVars.Contains(var)) + return nullptr; + + return var; +} + +// Try to read the value of an SSA variable +// in the context of the given block. If +// the variable is defined in the block, then +// that value will be used. If not, this all +// may recursively work its way up through +// the predecessors of the block. +IRValue* readVar( + ConstructSSAContext* context, + SSABlockInfo* blockInfo, + IRVar* var); + + +// Add a phi node to represent the given variable +PhiInfo* addPhi( + ConstructSSAContext* context, + SSABlockInfo* blockInfo, + IRVar* var) +{ + auto builder = &blockInfo->builder; + IRParam* phi = builder->createParam(var->getType()->getValueType()); + + RefPtr<PhiInfo> phiInfo = new PhiInfo(); + context->phiInfos.Add(phi, phiInfo); + + phiInfo->phi = phi; + phiInfo->var = var; + + blockInfo->phis.Add(phiInfo); + + return phiInfo; +} + +IRValue* tryRemoveTrivialPhi( + ConstructSSAContext* /*context*/, + PhiInfo* phiInfo) +{ + auto phi = phiInfo->phi; + + // We are going to check if all of the operands + // to the phi are either the same, or are equal + // to the phi itself. + + IRValue* same = nullptr; + for (auto u : phiInfo->operands) + { + auto usedVal = u.usedValue; + assert(usedVal); + + if (usedVal == same || usedVal == phi) + { + // Either this is a self-reference, or it refers + // to the same value we've seen already. + continue; + } + if (same != nullptr) + { + // We've found at least two distinct values + // other than the phi itself, so this phi + // indeed appears to be non-trivial. + // + // We will keep the phi around. + return phi; + } + else + { + // This value is distinct from the phi itself, + // so we need to track its value. + same = usedVal; + } + } + + if (!same) + { + // There were no operands other than the phi itself. + // This implies that the value at the use sites should + // actually be undefined. + + assert(!"unimplemented"); + } + + // replace uses of the phi (including its possible uses + // of itself) with the unique non-phi value. + phi->replaceUsesWith(same); + + // We will mark the phi as removed in the `PhiInfo` structure, + // so that we can avoid adding it to the block later. + phiInfo->wasRemoved = true; + + // TODO: we need a way to record that this parameter + // is no longer to be used... + + // TODO: need to recursively consider chances to simplify + // other phi nodes now. + + return same; +} + +IRValue* addPhiOperands( + ConstructSSAContext* context, + SSABlockInfo* blockInfo, + PhiInfo* phiInfo) +{ + auto var = phiInfo->var; + + auto block = blockInfo->block; + + List<IRValue*> operandValues; + for (auto predBlock : block->getPredecessors()) + { + // Precondition: if we have multiple predecessors, then + // each must have only one successor (no critical edges). + // + assert(predBlock->getSuccessors().getCount() == 1); + + auto predInfo = *context->blockInfos.TryGetValue(predBlock); + + auto phiOperand = readVar(context, predInfo, var); + + operandValues.Add(phiOperand); + } + + // The `IRUse` type needs to stay at a stable location + // since they get threaded into lists. We allocate the + // list with its final size so that we can preserve the + // required invariant. + + UInt operandCount = operandValues.Count(); + phiInfo->operands.SetSize(operandCount); + for(UInt ii = 0; ii < operandCount; ++ii) + { + phiInfo->operands[ii].init(nullptr, operandValues[ii]); + } + + return tryRemoveTrivialPhi(context, phiInfo); +} + +void writeVar( + ConstructSSAContext* /*context*/, + SSABlockInfo* blockInfo, + IRVar* var, + IRValue* val) +{ + blockInfo->valueForVar[var] = val; +} + +void maybeSealBlock( + ConstructSSAContext* context, + SSABlockInfo* blockInfo) +{ + // We can't seal a block that has already been sealed. + if (blockInfo->isSealed) + return; + + // We can't seal a block until all of its predecessors + // have been filled. + for (auto pp : blockInfo->block->getPredecessors()) + { + auto predInfo = *context->blockInfos.TryGetValue(pp); + if (!predInfo->isFilled) + return; + } + + // All the checks passed, so it seems like we can be sealed. + + // We will loop over any incomplete phis that have been recoreded + // for this block, and complete them here. + // + // Note that we are doing the "inefficient" loop where we compute + // the count on each iteration to account for the possibility that + // new incomplete phis will get added while we are working. + for (UInt ii = 0; ii < blockInfo->phis.Count(); ++ii) + { + auto incompletePhi = blockInfo->phis[ii]; + addPhiOperands(context, blockInfo, incompletePhi); + } + + // After we've completed all our incomplete phis, we can mark this + // block as sealed and move along. + blockInfo->isSealed = true; +} + +IRValue* readVarRec( + ConstructSSAContext* context, + SSABlockInfo* blockInfo, + IRVar* var) +{ + IRValue* val = nullptr; + if (!blockInfo->isSealed) + { + // If block isn't sealed, we need to + // speculatively add a phi to it. + // This phi may get removed later, once + // we are able to seal this block. + + PhiInfo* phiInfo = addPhi(context, blockInfo, var); + val = phiInfo->phi; + } + else + { + // If the block is sealed, then we are free to look at + // it predecessor list, and use that to decide what to do. + auto predecessors = blockInfo->block->getPredecessors(); + + // + IRBlock* firstPred = nullptr; + bool multiplePreds = false; + for (auto pp : predecessors) + { + if (!firstPred) + { + // A candidate for the sole predecessor + firstPred = pp; + } + else if (pp == firstPred) + { + // Same as existing predecessor + } + else + { + // Multiple unique predecessors + multiplePreds = true; + } + } + + if (!firstPred) + { + // The block had *no* predecssors. This will commonly + // happen for the entry block, but could also conceivably + // happen for a block that is somehow disconnected + // from the CFG and thus unreachable. + + // We would only reach this function (`readVarRec`) if + // a local lookup in the block had already failed, so + // at this point we are dealing with an undefined value. + + auto type = var->getType()->getValueType(); + val = blockInfo->builder.emitUndefined(type); + } + else if (!multiplePreds) + { + // There is only a single predecessor for this block, + // so there is no need to insert a phi. Instead, we + // just perform the lookup step recursively in + // the predecessor. + auto predInfo = *context->blockInfos.TryGetValue(firstPred); + val = readVar(context, predInfo, var); + } + else + { + // The default/fallback case requires us to create + // a phi node in the current block, and then look + // up the appropriate operands in the predecessor + // blocks, which will eventually become the operands + // that drive the phi. + + // Create the phi node for the given variable + PhiInfo* phiInfo = addPhi(context, blockInfo, var); + + // Mark the phi as the value for the variable inside + // this block + writeVar(context, blockInfo, var, phiInfo->phi); + + // Now add operands to the phi and maybe simplify + // it, based on what gets found. + + val = addPhiOperands(context, blockInfo, phiInfo); + } + } + + // Whatever value we find, we need to mark it as the + // value for the given variable in this block + writeVar(context, blockInfo, var, val); + + return val; +} + + +IRValue* readVar( + ConstructSSAContext* context, + SSABlockInfo* blockInfo, + IRVar* var) +{ + // In the easy case, there will be a preceeding + // store in the same block, so we can use + // that local value. + IRValue* val = nullptr; + if (blockInfo->valueForVar.TryGetValue(var, val)) + { + // Hooray, we found a value to use, and we + // can proceed without too many complications. + return val; + } + + // Otherwise we need to try to non-trivial/recursive + // case of lookup. + return readVarRec(context, blockInfo, var); +} + +void processBlock( + ConstructSSAContext* context, + IRBlock* block, + SSABlockInfo* blockInfo) +{ + // Before starting, check if this block can be sealed + maybeSealBlock(context, blockInfo); + + // Walk the instructions in the block, and either + // leave them as-is, or replace them with a value + // that we look up with local/global value numbering + + IRInst* next = nullptr; + for (auto ii = block->getFirstInst(); ii; ii = next) + { + next = ii->getNextInst(); + + // Any new instructions we create to represent + // the new value will get inserted before whatever + // instruction we are working with. + blockInfo->builder.insertBeforeInst = ii; + + switch (ii->op) + { + default: + // Ordinary instruction -> leave as-is + break; + + case kIROp_Store: + { + auto storeInst = (IRStore*)ii; + auto ptrArg = storeInst->ptr.usedValue; + auto valArg = storeInst->val.usedValue; + + if (auto var = asPromotableVar(context, ptrArg)) + { + // We are storing to a promotable variable, + // so we want to register the value being + // stored as the value for the given SSA + // variable. + writeVar(context, blockInfo, var, valArg); + + // Also eliminate the store instruction, + // since it is no longer needed. + storeInst->removeAndDeallocate(); + } + } + break; + + case kIROp_Load: + { + IRLoad* loadInst = (IRLoad*)ii; + auto ptrArg = loadInst->ptr.usedValue; + + if (auto var = asPromotableVar(context, ptrArg)) + { + // We are loading from a promotable variable. + // Look up the value in the context of this + // block. + auto val = readVar(context, blockInfo, var); + + // We can just replace all uses of this + // load instruction with the given value. + loadInst->replaceUsesWith(val); + + // Also eliminate the load instruction, + // since it is no longer needed. + loadInst->removeAndDeallocate(); + } + } + break; + } + } + + blockInfo->builder.insertBeforeInst = block->lastInst; + + // Once we are done with all of the instructions + // in a block, we can mark it as "filled," which + // means we can actually consider lookups into + // it. + blockInfo->isFilled = true; + + // Having filled this block might allow us to seal some + // of its successor(s) + for (auto ss : block->getSuccessors()) + { + auto successorInfo = *context->blockInfos.TryGetValue(ss); + maybeSealBlock(context, successorInfo); + } +} + +static void breakCriticalEdges( + ConstructSSAContext* context) +{ + // A critical edge is an edge P -> S where + // P has multiple sucessors, and S has multiple + // predecessors. + // + // In the context of our CFG representation, such an edge + // will be an `IRUse` in the terminator instruction of block P, + // which refers to block S. + // + // We will make a pass over the CFG to collect all the critical + // edges, and then we will break them in a follow-up pass. + + List<IRUse*> criticalEdges; + + auto globalVal = context->globalVal; + for (auto pred = globalVal->getFirstBlock(); pred; pred = pred->getNextBlock()) + { + auto successors = pred->getSuccessors(); + if (successors.getCount() <= 1) + continue; + + auto succIter = successors.begin(); + auto succEnd = successors.end(); + + for (; succIter != succEnd; ++succIter) + { + auto succ = *succIter; + + // For the edge to be critical, the successor must have + // more than one predecessor. + // More than that, we require that it has more than one + // *unique* predecessor, to handle the case where multiple + // cases of a `switch` might lead to the same block. + // + // To implement this, we test if it has any predecessor + // other than `pred` which we already know about. + + bool multiplePreds = false; + for (auto pp : succ->getPredecessors()) + { + if (pp != pred) + { + multiplePreds = true; + break; + } + } + if (!multiplePreds) + continue; + + // We have found a critical edge from `pred` to `succ`. + // + // Furthermore, the `IRUse` embedded in `succIter` represents + // that edge directly. + auto edgeUse = succIter.use; + criticalEdges.Add(edgeUse); + } + } + + // Now we will iterate over the critical edges and break each + // one by inserting a new block. Note that we do not try + // to break the edges while doing the initial walk, because + // that would change the CFG while we are walking it. + + for (auto edgeUse : criticalEdges) + { + auto pred = (IRBlock*) edgeUse->user->parent; + assert(pred->op == kIROp_Block); + + auto succ = (IRBlock*)edgeUse->usedValue; + assert(succ->op == kIROp_Block); + + IRBuilder builder; + builder.sharedBuilder = &context->sharedBuilder; + builder.curFunc = globalVal; + builder.curBlock = pred; + + // Create a new block that will sit "along" the edge + IRBlock* edgeBlock = builder.createBlock(); + + // The predecessor block should now branch to + // the edge block. + edgeUse->set(edgeBlock); + + // The edge block should branch (unconditionally) + // to the successor block. + builder.curBlock = edgeBlock; + builder.emitBranch(succ); + + // Insert the new block into the block list + // for the function. + // + // In principle, the order of this list shouldn't + // affect the semantics of a program, but we + // might want to be careful about ordering anyway. + edgeBlock->insertAfter(pred); + } +} + +// Construct SSA form for a global value with code +void constructSSA(ConstructSSAContext* context) +{ + // First, detect and and break any critical edges in the CFG, + // because our representation of SSA form doesn't allow for them. + breakCriticalEdges(context); + + + // Figure out what variables we can promote to + // SSA temporaries. + identifyPromotableVars(context); + + // If none of the variables are promote-able, + // then we can exit without making any changes + if (context->promotableVars.Count() == 0) + return; + + // We are going to walk the blocks in order, + // and try to process each, by replacing loads + // and stores of promotable variables with simple values. + + auto globalVal = context->globalVal; + for (auto bb = globalVal->firstBlock; bb; bb = bb->nextBlock) + { + auto blockInfo = new SSABlockInfo(); + blockInfo->block = bb; + + blockInfo->builder.sharedBuilder = &context->sharedBuilder; + blockInfo->builder.curBlock = bb; + blockInfo->builder.curFunc = globalVal; + blockInfo->builder.insertBeforeInst = bb->lastInst; + + context->blockInfos.Add(bb, blockInfo); + } + for (auto bb = globalVal->firstBlock; bb; bb = bb->nextBlock) + { + auto blockInfo = * context->blockInfos.TryGetValue(bb); + processBlock(context, bb, blockInfo); + } + + // We need to transfer the logical arguments to our phi nodes + // from the phi nodes back to the predecessor blocks that will + // pass them in. + for (auto bb = globalVal->firstBlock; bb; bb = bb->nextBlock) + { + auto blockInfo = *context->blockInfos.TryGetValue(bb); + + for (auto phiInfo : blockInfo->phis) + { + // If we eliminated this phi, then we had better not + // include it in the result. + if (phiInfo->wasRemoved) + continue; + + // We should add the phi as an explicit parameter of + // the given block. + bb->addParam(phiInfo->phi); + + UInt predCounter = 0; + for (auto pp : bb->getPredecessors()) + { + UInt predIndex = predCounter++; + auto predInfo = *context->blockInfos.TryGetValue(pp); + + IRValue* operandVal = phiInfo->operands[predIndex].usedValue; + + phiInfo->operands[predIndex].clear(); + + predInfo->successorArgs.Add(operandVal); + } + } + } + + // Some blocks may now need to pass along arguments to their sucessor, + // which have been stored into the `SSABlockInfo::successorArgs` field. + for (auto bb = globalVal->firstBlock; bb; bb = bb->nextBlock) + { + auto blockInfo = * context->blockInfos.TryGetValue(bb); + + // Sanity check: all blocks should be filled and sealed. + assert(blockInfo->isSealed); + assert(blockInfo->isFilled); + + // Don't do any work for blocks that don't need to pass along + // values to the sucessor block. + auto addedArgCount = blockInfo->successorArgs.Count(); + if (addedArgCount == 0) + continue; + + // We need to replace the terminator instruction with one that + // has additional arguments. + + IRTerminatorInst* oldTerminator = (IRTerminatorInst*) bb->getLastInst(); + assert(isTerminatorInst(oldTerminator)); + + blockInfo->builder.insertBeforeInst = nullptr; + + auto oldArgCount = oldTerminator->argCount; + auto newArgCount = oldArgCount + addedArgCount; + + List<IRValue*> newArgs; + for (UInt aa = 0; aa < oldArgCount; ++aa) + { + newArgs.Add(oldTerminator->getArg(aa)); + } + for (UInt aa = 0; aa < addedArgCount; ++aa) + { + newArgs.Add(blockInfo->successorArgs[aa]); + } + + IRTerminatorInst* newTerminator = (IRTerminatorInst*)blockInfo->builder.emitIntrinsicInst( + oldTerminator->type, + oldTerminator->op, + newArgCount, + newArgs.Buffer()); + + // Swap decorations over to the new instruction + newTerminator->firstDecoration = oldTerminator->firstDecoration; + oldTerminator->firstDecoration = nullptr; + + // A terminator better not have uses, so we shouldn't have + // to replace them. + assert(!oldTerminator->firstUse); + + + // Okay, we should be clear to remove the old terminator + oldTerminator->removeAndDeallocate(); + } + + // Now we should be able to go through and remove + // of of the variables + for (auto var : context->promotableVars) + { + var->removeAndDeallocate(); + } +} + +// Construct SSA form for a global value with code +void constructSSA(IRModule* module, IRGlobalValueWithCode* globalVal) +{ + ConstructSSAContext context; + context.globalVal = globalVal; + + context.sharedBuilder.module = module; + context.sharedBuilder.session = module->session; + + constructSSA(&context); +} + +void constructSSA(IRModule* module, IRGlobalValue* globalVal) +{ + switch (globalVal->op) + { + case kIROp_Func: + case kIROp_global_var: + constructSSA(module, (IRGlobalValueWithCode*)globalVal); + + default: + break; + } +} + +void constructSSA(IRModule* module) +{ + for (auto gv = module->getFirstGlobalValue(); gv; gv = gv->getNextValue()) + { + constructSSA(module, gv); + } +} + +} diff --git a/source/slang/ir-ssa.h b/source/slang/ir-ssa.h new file mode 100644 index 000000000..b9a05c2ff --- /dev/null +++ b/source/slang/ir-ssa.h @@ -0,0 +1,9 @@ +// ir-ssa.h +#pragma once + +struct IRModule; + +namespace Slang +{ + void constructSSA(IRModule* module); +} diff --git a/source/slang/ir.cpp b/source/slang/ir.cpp index 77aac3f5a..634632f0d 100644 --- a/source/slang/ir.cpp +++ b/source/slang/ir.cpp @@ -28,16 +28,27 @@ namespace Slang { // TODO: need to make this faster by using a dictionary... - if (0) {} + static const struct { + char const* mnemonic; + IROp op; + } kOps[] = { #define INST(ID, MNEMONIC, ARG_COUNT, FLAGS) \ - else if(strcmp(name, #MNEMONIC) == 0) return kIROp_##ID; + { #MNEMONIC, kIROp_##ID }, #define PSEUDO_INST(ID) \ - else if(strcmp(name, #ID) == 0) return kIRPseudoOp_##ID; + { #ID, kIRPseudoOp_##ID }, #include "ir-inst-defs.h" + }; + + for (auto ee : kOps) + { + if (strcmp(name, ee.mnemonic) == 0) + return ee.op; + } + return IROp(kIROp_Invalid); } @@ -125,6 +136,228 @@ namespace Slang lastParam = param; } + // The predecessors of a block should all show up as users + // of its value, so rather than explicitly store the CFG, + // we will recover it on demand from the use-def information. + // + // Note: we are really iterating over incoming/outgoing *edges* + // for a block, because there might be multiple uses of a block, + // if more than one way of an N-way branch targets the same block. + + // Get the list of successor blocks for an instruction, + // which we expect to be the last instruction in a block. + static IRBlock::SuccessorList getSuccessors(IRInst* terminator) + { + // If the block somehow isn't terminated, then + // there is no way to read its successors, so + // we return an empty list. + if (!terminator || !isTerminatorInst(terminator)) + return IRBlock::SuccessorList(nullptr, nullptr); + + // Otherwise, based on the opcode of the terminator + // instruction, we will build up our list of uses. + IRUse* begin = nullptr; + IRUse* end = nullptr; + UInt stride = 1; + + auto args = terminator->getArgs(); + switch (terminator->op) + { + case kIROp_ReturnVal: + case kIROp_ReturnVoid: + case kIROp_unreachable: + case kIROp_discard: + break; + + case kIROp_unconditionalBranch: + case kIROp_break: + case kIROp_continue: + case kIROp_loop: + // unconditonalBranch <block> + begin = args + 0; + end = begin + 1; + break; + + case kIROp_conditionalBranch: + case kIROp_if: + case kIROp_ifElse: + case kIROp_loopTest: + // conditionalBranch <condition> <trueBlock> <falseBlock> + begin = args + 1; + end = begin + 2; + break; + + case kIROp_switch: + // switch <val> <break> <default> <caseVal1> <caseBlock1> ... + begin = args + 4; + end = args + terminator->getArgCount() + 1; + stride = 2; + break; + + default: + assert(!"unepxected"); + return IRBlock::SuccessorList(nullptr, nullptr); + } + + return IRBlock::SuccessorList(begin, end, stride); + } + + void IRBlock::insertAfter(IRBlock* other) + { + assert(other); + insertAfter(other, other->parentFunc); + } + + void IRBlock::insertAfter(IRBlock* other, IRGlobalValueWithCode* func) + { + assert(other || func); + + if (!other) other = func->lastBlock; + if (!func) func = other->parentFunc; + + assert(func); + + auto pp = other; + auto nn = other ? other->nextBlock : nullptr; + + if (pp) + { + pp->nextBlock = this; + } + else + { + func->firstBlock = this; + } + + if (nn) + { + nn->prevBlock = this; + } + else + { + func->lastBlock = this; + } + + this->prevBlock = pp; + this->nextBlock = nn; + this->parentFunc = func; + } + + static IRUse* adjustPredecessorUse(IRUse* use) + { + // We will search until we either find a + // suitable use, or run out of uses. + for (;use; use = use->nextUse) + { + // We only want to deal with uses that represent + // a "sucessor" operand to some terminator instruction. + // We will re-use the logic for getting the successor + // list from such an instruction. + + auto successorList = getSuccessors((IRInst*) use->user); + + if(use >= successorList.begin_ + && use < successorList.end_) + { + UInt index = (use - successorList.begin_); + if ((index % successorList.stride) == 0) + { + // This use is in the range of the sucessor list, + // and so it represents a real edge between + // blocks. + return use; + } + } + } + + // If we ran out of uses, then we are at the end + // of the list of incoming edges. + return nullptr; + } + + IRBlock::PredecessorList IRBlock::getPredecessors() + { + // We want to iterate over the predecessors of this block. + // First, we resign ourselves to iterating over the + // incoming edges, rather than the blocks themselves. + // This might sound like a trival distinction, but it is + // possible for there to be multiple edges between two + // blocks (as for a `switch` with multiple cases that + // map to the same code). Any client that wants just + // the unique predecessor blocks needs to deal with + // the deduplication themselves. + // + // Next, we note that for any predecessor edge, there will + // be a use of this block in the terminator instruction of + // the predecessor. We basically just want to iterate over + // the users of this block, then, but we need to be careful + // to rule out anything that doesn't actually represent + // an edge. The `adjustPredecessorUse` function will be + // used to search for a use that actually represents an edge. + + return PredecessorList( + adjustPredecessorUse(firstUse)); + } + + UInt IRBlock::PredecessorList::getCount() + { + UInt count = 0; + for (auto ii : *this) + { + (void)ii; + count++; + } + return count; + } + + void IRBlock::PredecessorList::Iterator::operator++() + { + if (!use) return; + use = adjustPredecessorUse(use->nextUse); + } + + IRBlock* IRBlock::PredecessorList::Iterator::operator*() + { + if (!use) return nullptr; + return (IRBlock*)use->user->parent; + } + + IRBlock::SuccessorList IRBlock::getSuccessors() + { + // The successors of a block will all be listed + // as operands of its terminator instruction. + // Depending on the terminator, we might have + // different numbers of operands to deal with. + // + // (We might also have to deal with a "stride" + // in the case where the basic-block operands + // are mixed up with non-block operands) + + auto terminator = getLastInst(); + return Slang::getSuccessors(terminator); + } + + UInt IRBlock::SuccessorList::getCount() + { + UInt count = 0; + for (auto ii : *this) + { + (void)ii; + count++; + } + return count; + } + + void IRBlock::SuccessorList::Iterator::operator++() + { + use += stride; + } + + IRBlock* IRBlock::SuccessorList::Iterator::operator*() + { + return (IRBlock*)use->usedValue; + } + // IRFunc IRType* IRFunc::getResultType() { return getType()->getResultType(); } @@ -609,6 +842,18 @@ namespace Slang &value); } + IRUndefined* IRBuilder::emitUndefined(IRType* type) + { + auto inst = createInst<IRUndefined>( + this, + kIROp_undefined, + type); + + addInst(inst); + + return inst; + } + IRValue* IRBuilder::getDeclRefVal( DeclRefBase const& declRef) { @@ -1004,13 +1249,20 @@ namespace Slang return bb; } - IRParam* IRBuilder::emitParam( + IRParam* IRBuilder::createParam( IRType* type) { auto param = createValue<IRParam>( this, kIROp_Param, type); + return param; + } + + IRParam* IRBuilder::emitParam( + IRType* type) + { + auto param = createParam(type); if (auto bb = curBlock) { @@ -3283,6 +3535,7 @@ namespace Slang case kIROp_Func: case kIROp_witness_table: return cloneGlobalValue(this, (IRGlobalValue*) originalValue); + case kIROp_boolConst: { IRConstant* c = (IRConstant*)originalValue; diff --git a/source/slang/ir.h b/source/slang/ir.h index 1b4529a3c..028468308 100644 --- a/source/slang/ir.h +++ b/source/slang/ir.h @@ -375,6 +375,68 @@ struct IRBlock : IRValue IRGlobalValueWithCode* getParent() { return parentFunc; } + void insertAfter(IRBlock* other); + void insertAfter(IRBlock* other, IRGlobalValueWithCode* func); + + struct PredecessorList + { + PredecessorList(IRUse* begin) : b(begin) {} + IRUse* b; + + UInt getCount(); + + struct Iterator + { + Iterator(IRUse* use) : use(use) {} + + IRBlock* operator*(); + + void operator++(); + + bool operator!=(Iterator const& that) + { + return use != that.use; + } + + IRUse* use; + }; + + Iterator begin() { return Iterator(b); } + Iterator end() { return Iterator(nullptr); } + }; + + struct SuccessorList + { + SuccessorList(IRUse* begin, IRUse* end, UInt stride = 1) : begin_(begin), end_(end), stride(stride) {} + IRUse* begin_; + IRUse* end_; + UInt stride; + + UInt getCount(); + + struct Iterator + { + Iterator(IRUse* use, UInt stride) : use(use), stride(stride) {} + + IRBlock* operator*(); + + void operator++(); + + bool operator!=(Iterator const& that) + { + return use != that.use; + } + + IRUse* use; + UInt stride; + }; + + Iterator begin() { return Iterator(begin_, stride); } + Iterator end() { return Iterator(end_, stride); } + }; + + PredecessorList getPredecessors(); + SuccessorList getSuccessors(); }; // For right now, we will represent the type of diff --git a/source/slang/lower-to-ir.cpp b/source/slang/lower-to-ir.cpp index bde50aa89..5c6f62c38 100644 --- a/source/slang/lower-to-ir.cpp +++ b/source/slang/lower-to-ir.cpp @@ -5,6 +5,7 @@ #include "ir.h" #include "ir-insts.h" +#include "ir-ssa.h" #include "mangle.h" #include "type-layout.h" #include "visitor.h" @@ -4201,6 +4202,42 @@ IRModule* generateIRForTranslationUnit( ensureDecl(context, decl); } + // We will perform certain "mandatory" optimization passes now. + // These passes serve two purposes: + // + // 1. To simplify the code that we use in backend compilation, + // or when serializing/deserializing modules, so that we can + // amortize this effort when we compile multiple entry points + // that use the same module(s). + // + // 2. To ensure certain semantic properties that can't be + // validated without dataflow information. For example, we want + // to detect when a variable might be used before it is initialized. + + // Note: if you need to debug the IR that is created before + // any mandatory optimizations have been applied, then + // uncomment this line while debugging. + + // dumpIR(module); + + // First, attempt to promote local variables to SSA + // temporaries whenever possible. + constructSSA(module); + + // TODO: Do basic constant folding and DCE + + // TODO: give error messages if any `undefined` or + // `unreachable` instructions remain. + + // TODO: consider doing some more aggressive optimizations + // (in particular specialization of generics) here, so + // that we can avoid doing them downstream. + // + // Note: doing specialization or inlining involving code + // from other modules potentially makes the IR we generate + // "fragile" in that we'd now need to recompile when + // a module we depend on changes. + // If we are being sked to dump IR during compilation, // then we can dump the initial IR for the module here. if(compileRequest->shouldDumpIR) diff --git a/source/slang/slang.vcxproj b/source/slang/slang.vcxproj index 23400bb23..1ff88020f 100644 --- a/source/slang/slang.vcxproj +++ b/source/slang/slang.vcxproj @@ -179,6 +179,7 @@ <ClInclude Include="expr-defs.h" /> <ClInclude Include="ir-inst-defs.h" /> <ClInclude Include="ir-insts.h" /> + <ClInclude Include="ir-ssa.h" /> <ClInclude Include="ir.h" /> <ClInclude Include="legalize-types.h" /> <ClInclude Include="lexer.h" /> @@ -217,6 +218,7 @@ <ClCompile Include="dxc-support.cpp" /> <ClCompile Include="emit.cpp" /> <ClCompile Include="ir-legalize-types.cpp" /> + <ClCompile Include="ir-ssa.cpp" /> <ClCompile Include="ir.cpp" /> <ClCompile Include="legalize-types.cpp" /> <ClCompile Include="lexer.cpp" /> diff --git a/source/slang/slang.vcxproj.filters b/source/slang/slang.vcxproj.filters index 4c463a62a..e0bf59025 100644 --- a/source/slang/slang.vcxproj.filters +++ b/source/slang/slang.vcxproj.filters @@ -44,6 +44,7 @@ <ClInclude Include="vm.h" /> <ClInclude Include="mangle.h" /> <ClInclude Include="legalize-types.h" /> + <ClInclude Include="ir-ssa.h" /> </ItemGroup> <ItemGroup> <ClCompile Include="check.cpp" /> @@ -73,6 +74,7 @@ <ClCompile Include="dxc-support.cpp" /> <ClCompile Include="ir-legalize-types.cpp" /> <ClCompile Include="legalize-types.cpp" /> + <ClCompile Include="ir-ssa.cpp" /> </ItemGroup> <ItemGroup> <CustomBuild Include="core.meta.slang" /> diff --git a/source/slang/vm.cpp b/source/slang/vm.cpp index ffc455232..c486615c9 100644 --- a/source/slang/vm.cpp +++ b/source/slang/vm.cpp @@ -1003,17 +1003,45 @@ void resumeThread( VMType type = decodeType(frame, &ip); UInt argCount = decodeUInt(&ip); - Int destinationBlock = decodeSInt(&ip); - for( UInt aa = 2; aa < argCount; ++aa ) + Int destinationBlockIndex = decodeSInt(&ip); + + auto func = frame->func; + auto& destinationBlock = func->bcFunc->blocks[destinationBlockIndex]; + + // We may be passing arguments through to the destination + // block, so any remaining operands of the branch instruction + // need to be used to fill in the parameter registers of + // the destination block. + + UInt paramCount = destinationBlock.paramCount; + + // There might be additional operands, because some branches + // also include information on merge points and break/continue labels. + UInt remainingArgCount = argCount - 1; + assert(remainingArgCount >= paramCount); + + UInt extraArgCount = remainingArgCount - paramCount; + + for (UInt ee = 0; ee < extraArgCount; ++ee) { decodeOperandPtr<void>(frame, &ip); } - // TODO: we need to deal with the case of - // passing arguments to the block, which - // means copying between the registers... + auto baseRegIndex = destinationBlock.params - func->bcFunc->regs; + for( UInt pp = 0; pp < paramCount; ++pp ) + { + auto regIndex = baseRegIndex + pp; + + void* argPtr = decodeOperandPtr<void>(frame, &ip); + void* regPtr = getRegPtrImpl(frame, regIndex); + + VMType regType = func->regs[regIndex].type; + memcpy(regPtr, argPtr, regType.getSize()); + } + + // Now simply jump to the destination block. // - ip = frame->func->bcFunc->blocks[destinationBlock].code; + ip = destinationBlock.code; } break; |
