diff options
| author | Tim Foley <tfoleyNV@users.noreply.github.com> | 2018-05-24 19:20:11 -0700 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2018-05-24 19:20:11 -0700 |
| commit | 18709fbaa03fe0ef0727a802d864fae6c5163fc0 (patch) | |
| tree | a7bfddb97aaff016d38b5772ef7929e5c36572c5 /source/slang/ir-restructure-scoping.cpp | |
| parent | d7515c30209cc995573d9b0258de1716c30c6012 (diff) | |
A bunch of work to resolve #569 (#576)
* render-test should not fail on HLSL compiler *warnings*
The logic in `render-test` that invokes `D3DCompile` was causing a test to fail if it produced any warnings (not just if compilation fails).
Warning output can be dealt with by the test runner, since it will compare output between runs anyway, and it is useful to be able to run something through `render-test` that compiles with warnings.
* Be more careful about deleting IR instructions
There was an `IRInst::deallocate()` method that had a precondition that the instruction should already be removed from its parent and clear out all its operands before calling, but it wasn't checking this and the few call sites weren't doing things right either.
I consolidated things on `IRInst::removeAndDeallocate()` which does all the things: removes from the parent, clear out operands, and then deallocates.
I also made sure to clear out the type operand.
This clears up some crashing issues where passes were removing instructions but those instructions would still show up as users of other instructions.
* Don't emit bitwise not for non-Boolean types
It seems like the logic in `emit.cpp` messed things up and decided that `Not` (the IR instruction that is equivalent to `!` in the AST) should emit as `!` for Boolean types and `~` for other types, but this makes no sense (e.g., `~(a & 1)` is very different from `!(a & 1)`, even when interpreted as a condition).
It seems like this logic was intended for the `BitNot` case, where `~a` and `!a` are actually equivalent for Boolean values (but a target language might not like `~a` on `bool` values).
Maybe the original plan was that the `Not` instruction should only apply to Boolean values in the first place, and that other values should be converted to `bool` (or a vector of `bool`) before applying `Not`, but even in that case the emit logic makes no sense.
This caused an actual problem for one of my test cases, so it was important to fix it now.
* Fix issue with cached resolution for overoaded operators
The basic problem was that the lookup logic was forming a key based on the *first* definition it found for the overloaded operator, but that means that when processing a prefix `++a` call we might look up the *postfix* definition of `operator++` and decide to use its opcode as the key.
This "fixes" the logic by looking for the first definition with a "compatible" definition (e.g., a `__prefix` function if we are checking a `PrefixExpr`), and then uses its opcode.
A better fix in the long run would be to make the cache just be keyed on the operator name and the "fixity" of the expression (prefix, postfix, or infix).
* Introduce an intermediate structured control-flow representation
The code previously used a single function called `emitIRStmtsForBlocks` in `emit.cpp` that would take a logical sub-graph of the CFG and emit it as high-level statements.
It would do this by recognizing operations like coniditional branches that it could turn into high-level `if` statements, etc.
The main problem with this function was that it mixed together the logic for how we restructure the program with the logic for how we emit high-level code from that structure.
This change splits those two parts of the algorithm by introducing an intermediate data structure: a tree of `Region`s, which represent single-entry regions of the CFG.
There are subclasses of `Region` corresponding to various structured control-flow constructs, and then a leaf case that wraps a single `IRBlock`.
The new function `generateRegionsForIRBlocks()` (in `ir-restructure.cpp`) now handles the restructuring work, by building one or more `Region`s to represent a sub-graph, while `emitRegion()` handles emitting HLSL/GLSL source code from a region.
Splitting things in this way opens up some opportunities for future changes:
* We can expand the set of IR control-flow constructs allowed, so long as we can still generate structure `Region`s from them, without having to mess with the emit logic (e.g., we could start to support multi-level `break` by introducing temporaries as needed). In the limit we can generate our `Region`s using something like the "Relooper" algorithm.
* We can emit to other representations while retaining the same control-flow restructuring support. E.g., if we drop the structured information from the IR, then emitting to SPIR-V for Vulkan would require us to use the strucured control-flow information from these `Region`s.
* We can do analysis that needs to understand `Region` structure. This is relevant to issue #569, which was what prompted me to start on this work. Now that we have a representation of the nesting of `Region`s, we can use it to reason about visibility of values between blocks.
During development of this change I ran into a gotcha, in that I had been assuming each IR block would map to a single `Region`, forgetting that our current lowering of "continue clauses" in `for` loops leads to them being duplicated. The `Region` representation handles this by having a linked-list struct mapping IR blocks to the `SimpleRegion`s that represent them. I added a test case that includes a `for` loop with a continue clause that is reached along multiple paths just to make sure that we continue to support that case.
The compiler output should not change as a result of this work; this is supposed to be a pure refactoring change.
* Add a pass to resolve scoping issues in generated code
Fixes #569
The basic problem arises because the structured control flow that we output in high-level HLSL/GLSL doesn't match the "scoping" rules of an SSA IR.
In particular, SSA says that a value can be used in any block that is dominated by the definition, but in the presence of `break` and `continue` statements it is easy to construct cases where a block dominates something that is not in its scope for structured control flow. Consider:
```hlsl
for(;;) {
int a = xyz;
if(a) { int b = a; break; }
int c = a;
}
int d = b;
```
This program is invalid as HLSL, because the variable `b` is referenced outside of its scope, but if we look at the CFG for this function, it is clear that the block that computes `b` dominated the block that computes `d`. IR optimizations can easily create code like this, so we need to be ready for it.
The previous change added an explicit `Region` structure to represent the structured control flow that we re-form out of the IR, and this change adds a pass that exploits the structuring information to detect cases like the above and introduce temporaries to fix the scoping issue. For example, the pass would change the earlier code block into something like:
```hlsl
int tmp;
for(;;) {
int a = xyz;
if(a) { int b = a; tmp = b; break; }
int c = a;
}
int d = tmp;
```
That is, we introduce a new `tmp` variable at a scope "above" both the definition and use of `b`, and then we copy `b` into that temporary right where it is computed, and then use the temporary instead of the original `b` at the use site.
A few details that came up during the implementation:
* Downstream compilers may get confused by code like the above, and complain that `tmp` may be used before it is initialized, even though the very definition of dominators in a CFG means we don't have to worry about it. Still, I introduced some one-off code to initialize the temporaries just to silence spurious warnings coming from fxc.
* We need to be careful not to apply this logic to "phi nodes" (the parameters of basic blocks) since they will already be turned into temporaries by the emit logic, and trying to introduce temporaries with this pass led to broken code (I still need to investigate why). It may be that a future version of this pass should also take the code out of SSA form, so that we can introduce both kinds of temporaries in a single pass (and maybe eliminate some unnecessary variables by doing basic register allocation).
There is another transformation that could fix some issues of this kind, by moving code out of a structured control-flow construct and to the "join point" after it. For example, we could turn our loop from the start of this commit message into:
```hlsl
for(;;) {
int a = xyz;
if(a) { break; }
int c = a;
}
int b = a;
int d = b;
```
Moving the definition of `b` to after the loop is possible because there is no way to get out of the loop without executing that code anyway. Now the scoping issue for `d`'s use of `b` has gone away, but of course we've introduced a *new* scoping issue for `a`, when it gets used by `b`.
Adding a pass to re-arrange control flow like this could reduce the cases where we have to apply the current pass, but it wouldn't eliminate them entirely. That means such a pass can be deferred to future work.
This change includes a test case the reproduces the original issue, so that we can confirm the fix works.
Diffstat (limited to 'source/slang/ir-restructure-scoping.cpp')
| -rw-r--r-- | source/slang/ir-restructure-scoping.cpp | 434 |
1 files changed, 434 insertions, 0 deletions
diff --git a/source/slang/ir-restructure-scoping.cpp b/source/slang/ir-restructure-scoping.cpp new file mode 100644 index 000000000..61b56e190 --- /dev/null +++ b/source/slang/ir-restructure-scoping.cpp @@ -0,0 +1,434 @@ +// ir-restructure-scoping.cpp +#include "ir-restructure-scoping.h" + +#include "ir.h" +#include "ir-insts.h" +#include "ir-restructure.h" + +namespace Slang +{ + +/// Try to find the first structured region that represents `block` +/// +/// In general the same block may appear as multiple regions, +/// so this will return the first region in the linked list. +static SimpleRegion* getFirstRegionForBlock( + RegionTree* regionTree, + IRBlock* block) +{ + SimpleRegion* region = nullptr; + if( regionTree->mapBlockToRegion.TryGetValue(block, region) ) + { + return region; + } + return nullptr; +} + +/// Try to find the first structured region that contains `inst`. +static SimpleRegion* getFirstRegionForInst( + RegionTree* regionTree, + IRInst* inst) +{ + auto ii = inst; + while(ii) + { + if(auto block = as<IRBlock>(ii)) + return getFirstRegionForBlock(regionTree, block); + + ii = ii->getParent(); + } + + return nullptr; +} + +/// Compute the depth of a node in the region tree. +/// +/// This is the number of nodes (including `region`) +/// on a path from `region` to the root. +/// +static Int computeDepth(Region* region) +{ + Int depth = 0; + for( Region* rr = region; rr; rr = rr->getParent() ) + { + depth++; + } + return depth; +} + +/// Get the `n`th ancestor of `region`. +/// +/// When `n` is zero, this returns `region`. +/// When `n` is one, this returns the parent of `region`, and so forth. +/// +static Region* getAncestor(Region* region, Int n) +{ + Region* rr = region; + for( Int ii = 0; ii < n; ++ii ) + { + SLANG_ASSERT(rr); + rr = rr->getParent(); + } + return rr; +} + +/// Find a region that is an ancestor of both `left` and `right`. +static Region* findCommonAncestorRegion( + Region* left, + Region* right) +{ + // Rather than blinding search through each ancestor of `left` + // and see if it is also an ancestor of `right` and vice-versa, + // let's try to be smart about this. + // + // We will start by computing the depth of `left` and `right`: + // + Int leftDepth = computeDepth(left); + Int rightDepth = computeDepth(right); + + // Whatever the common ancestor is, it can't be any deeper + // than the minimum of these two depths. + // + Int minDepth = Math::Min(leftDepth, rightDepth); + + // Let's fetch the ancestor of each of `left` and `right` + // corresponding to that depth: + // + Region* leftAncestor = getAncestor(left, leftDepth - minDepth); + Region* rightAncestor = getAncestor(right, rightDepth - minDepth); + + // Now we know that `leftAncestor` and `rightAncestor` + // must have the same depth. Let's go ahead and assert + // it just to be safe: + // + SLANG_ASSERT(computeDepth(leftAncestor) == minDepth); + SLANG_ASSERT(computeDepth(rightAncestor) == minDepth); + + // If `leftAncestor` and `rightAncestor` are the same node, + // then we've found a common ancestor, otherwise we should + // look at their parents. Because the depth must match + // on both sides, we will never risk missing an ancestor. + // + while( leftAncestor != rightAncestor ) + { + leftAncestor = leftAncestor->getParent(); + rightAncestor = rightAncestor->getParent(); + } + + // Okay, we've found a common ancestor. + // + Region* commonAncestor = leftAncestor; + return commonAncestor; +} + +/// Find a simple region that is an ancestor of both `left` and `right`. +static SimpleRegion* findSimpleCommonAncestorRegion( + Region* left, + Region* right) +{ + // Start by finding a common ancestor without worrying about it being simple. + Region* ancestor = findCommonAncestorRegion(left, right); + + // Now search for a simple region up the tree. + while( ancestor ) + { + if(ancestor->getFlavor() == Region::Flavor::Simple) + return (SimpleRegion*) ancestor; + + ancestor = ancestor->getParent(); + } + + // This shouldn't ever occur. The root of the region tree should + // be a simple regions that represents the entry block of the + // function. + // + SLANG_UNEXPECTED("no common ancestor found in region tree"); + UNREACHABLE_RETURN(nullptr); +} + +IRInst* getDefaultInitVal( + IRBuilder* builder, + IRType* type) +{ + switch( type->op ) + { + default: + return nullptr; + + case kIROp_BoolType: + return builder->getBoolValue(false); + + case kIROp_IntType: + case kIROp_UIntType: + case kIROp_UInt64Type: + return builder->getIntValue(type, 0); + + case kIROp_HalfType: + case kIROp_FloatType: + case kIROp_DoubleType: + return builder->getFloatValue(type, 0.0); + + // TODO: handle vector/matrix types here, by + // creating an appropriate scalar value and + // then "splatting" it. + } +} + +/// Initialize a variable to a sane default value, if possible. +void defaultInitializeVar( + IRBuilder* builder, + IRVar* var, + IRType* type) +{ + IRInst* initVal = nullptr; + switch( type->op ) + { + case kIROp_VoidType: + default: + // By default, see if we can synthesize an IR value + // to be used as the default, and allow the logic + // below to store it into the variable. + initVal = getDefaultInitVal(builder, type); + break; + + // TODO: Handle aggregate types (structures, arrays) + // explicitly here, since they need to be careful about + // the cases where an element/field type might not + // be something we can default-initialize. + } + + if( initVal ) + { + builder->emitStore(var, initVal); + } +} + +/// Detect and fix any structured scoping issues for a given `def` instruction. +/// +/// The `defRegion` should be the region that contains `def`, and `regionTree` +/// should be the region tree for the function that contains `def`. +/// +static void fixValueScopingForInst( + IRInst* def, + SimpleRegion* defRegion, + RegionTree* regionTree) +{ + // This algorithm should not consider "phi nodes" for now, + // because the emit logic will already create variables for them. + // We could consider folding the logic to move out of SSA form + // into this function, but that would add a lot of complexity for now. + if(def->op == kIROp_Param) + return; + + // We would have a scoping violation if there exists some + // use `u` of `def` such that the region containing `u` + // (call it `useRegion`) is not a descendent of `defRegion` + // in the region tree. + // + // If there are no scoping violations, we don't want to do + // anything. If there *are* any scoping violations, then + // we ill need to introduce a temporary `tmp`, store into + // it right after `def`, and then load from it at any "bad" + // use sites. + // + // Of course, for the whole thing to work, we also need + // to put `tmp` into a block somwhere, and it needs to + // be a block that is visible to all of the uses, or we + // are just back int the same mess again. + // + // The right block to use for inserting `tmp` is the least + // common ancestor of `def` and all the "bad" uses, so + // we will get a bit "clever" and fold in the search for + // bad uses with the computation of the region we should + // insert `tmp` into (to avoid looping over the uses + // twice). + // + SimpleRegion* insertRegion = defRegion; + IRVar* tmp = nullptr; + + // If we end up needing to insert code we'll need an IR builder, + // so we will go ahead and create one now. + // + // TODO: the logic to compute `module` here could be hoisted + // out earlier, rather than being done per-instruction. + // + IRModule* module = regionTree->irCode->getModule(); + + SharedIRBuilder sharedBuilder; + sharedBuilder.session = module->session; + sharedBuilder.module = module; + + IRBuilder builder; + builder.sharedBuilder = &sharedBuilder; + + // Because we will be changing some of the uses of `def` + // to use other values while we iterate the list, we + // need to be a bit careful and extract the next use + // in the linked list *before* we operator on `u`. + // + IRUse* nextUse = nullptr; + for( auto u = def->firstUse; u; u = nextUse ) + { + nextUse = u->nextUse; + + // Looking at the use site `u`, we'd like to check if + // it violates our scoping rules. + // + // As a simple early-exit case, if the user is in + // the same block as the definition, there are no problems. + // + IRInst* user = u->getUser(); + if(user->getParent() == defRegion->block) + continue; + + // Otherwise, let's find the structures control-flow + // region that holds the user. We expect to always + // find one, because the use site must be in the same + // function. + // + // TODO: Double check that logic if we ever introduce + // things like nested function. + // + SimpleRegion* useRegion = getFirstRegionForInst(regionTree, user); + + // If there is no region associated with the use, then + // the use must be in unreachable code (part of the CFG, + // but not part of the region tree). We will skip + // such uses for now, since they won't even appear in + // the output. + // + if(!useRegion) + continue; + + // Now we want to check if `useRegion` is a child/descendent + // of a region that has the same block as `defRegion`. + // If it is, then there is no scoping problem with this use. + // + if(useRegion->isDescendentOf(defRegion->block)) + continue; + + // If we've gotten this far, we know that `u` is a "bad" + // use of `def`, and needs fixing. + // + // We will create the `tmp` variable on demand, so + // that we create it when the first "bad" use is encountered, + // and then re-use it for subsequent bad uses. + // + if( !tmp ) + { + // We will create a temporary to represent `def`, + // and insert a `store` into it right after `def`. + // + // Note: we are inserting the new variable right + // after `def` for now, just because we don't + // yet know the final region that it should be + // placed into. We will move it to the correct + // location when we are done. + // + builder.setInsertBefore(def->getNextInst()); + tmp = builder.emitVar(def->getDataType()); + builder.emitStore(tmp, def); + } + + // In order to know where `tmp` should be defined + // at the end of the algorithm, we need to compute + // a valid `insertRegion` that is an ancestor of + // all of the use sites (and it also a simple region + // so that we can insert into its IR block). + // + // We need to deal with one complexity in our restructuring + // process, which is that a block may be duplicated into + // one or more regions, so we loop over all the regions + // for the same block as `useRegion`. + // + for(auto rr = useRegion; rr; rr = rr->nextSimpleRegionForSameBlock) + { + insertRegion = findSimpleCommonAncestorRegion( + insertRegion, + rr); + } + + // To fix up the use `u`, we will need to change + // it from using `def` to using a load from `tmp` + // + builder.setInsertBefore(user); + IRInst* tmpVal = builder.emitLoad(tmp); + + // We are clobbering the value used by the `IRUse` `u`, + // while will cut it out of the list of uses for `def`. + // We need to be careful when doing this to not disrupt + // our iteration of the uses of `def`, so we carefully + // used the `nextUse` temporary at the start of the loop. + // + u->set(tmpVal); + } + + // At the end of the loop, the `tmp` variable will have + // been created if and only if we fixed up anything. + // + if( tmp ) + { + // If we created a temporary, then now we need to move + // its definition to the right place, which is the + // `insertRegion` that we computed during the loop. + // + // We'd like to insert our temporary near the top + // of the region, since that is the conventional + // place for local variables to go. + // + tmp->insertBefore( + insertRegion->block->getFirstOrdinaryInst()); + + // The whole point of the transformation we are doing + // here is that `def` is not on the "obvious" control + // flow path to one or more uses (which are now using + // `tmp`), but that means that it might not be "obvious" + // to a downstream compiler that `tmp` always gets + // initialized (by the code we inserted after `def`) + // before each of these use sites. + // + // We *know* that things are valid as long as our + // dominator tree was valid - there is no way to + // get to the block that loads from `tmp` without passing + // through the block that computes `def` (and then + // stores it into `tmp`) first. + // + // To avoid warnings/errros, we will go ahead and try + // to emit logic to "default initialize" the `tmp` + // variable if possible. + // + builder.setInsertBefore(tmp->getNextInst()); + defaultInitializeVar(&builder, tmp, def->getDataType()); + } +} + +void fixValueScoping(RegionTree* regionTree) +{ + // We are going to have to walk through every instruction + // in the code of the function to detect an bad cases. + // + auto code = regionTree->irCode; + for(auto block : code->getBlocks()) + { + // All of the instruction in `block` will have the same + // parent region, so we will look it up now rather than + // have to re-do this work on a per-instruction basis. + // + auto parentRegion = getFirstRegionForBlock(regionTree, block); + + // If a block has no region then it must be unreachable, + // so we will skip it entirely for this pass. + // + // TODO: we should be eliminating unrechable blocks anyway. + // + if(!parentRegion) + continue; + + for(auto inst : block->getChildren()) + { + fixValueScopingForInst(inst, parentRegion, regionTree); + } + } +} + +} |
