| author | Theresa Foley <tfoleyNV@users.noreply.github.com> | 2021-07-21 12:52:08 -0700 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2021-07-21 12:52:08 -0700 |
| commit | 23d406f8a3b325f91fecd9ad52bd510ded5f49a7 (patch) | |
| tree | 54d770593e38fcc5e60b9d6188f0a14641e7b002 /source/slang/slang-ir-specialize-function-call.cpp | |
| parent | e57ea944c4aba0cf385f0f3db6b6ddc7760b8ffa (diff) | |
Work to mitigate SPIR-V bloat (#1914)
* Work to mitigate SPIR-V bloat
SPIR-V is not an especially compact format, but some patterns in how Slang generates code and then runs it through `spirv-opt` lead to many redundant field-by-field copy operations being emitted. This change attempts to address some of the resulting bloat from the Slang side of things.
Note: experimentation shows that the bloat is less pronounced when running either *no* SPIR-V optimizations or *full* SPIR-V optimizations, so it is also likely that the bloat should be addressed by changing which `spirv-opt` passes the Slang compiler runs in default (`-O1`) builds. Such changes should come as a distinct pull request.
This change primarily does two things:
First, the code generation strategy for passing arguments to `out` and `inout` parameters has been changed. In the past, the compiler would *always* copy the argument value into a temporary, then pass the address of the temporary, and then write back the value after the call. The new code generation strategy attempts to identify when an argument value already has a simple address in memory and passes that address directly when possible. This eliminates many copy operations that occur before/after calls to functions with `out`/`inout` parameters.
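As a rough illustration of the two strategies described above, here is a minimal C++ sketch. It is not the code Slang emits: `Particle`, `addOne`, and the two caller functions are hypothetical names, and an `inout` parameter is modeled as a plain pointer.

```cpp
#include <cassert>

// Hypothetical callee with an `inout`-style parameter, modeled as a pointer.
void addOne(int* value) { *value += 1; }

struct Particle { int age; };

// Old strategy: always copy the argument into a temporary, pass the
// address of the temporary, then write the result back after the call.
void callWithCopy(Particle& p)
{
    int tmp = p.age;   // copy in
    addOne(&tmp);
    p.age = tmp;       // copy out
}

// New strategy: `p.age` already has a simple address in memory,
// so that address is passed directly and both copies disappear.
void callDirect(Particle& p)
{
    addOne(&p.age);
}

// Both strategies leave the argument in the same final state here.
void checkStrategiesAgree()
{
    Particle a{1};
    Particle b{1};
    callWithCopy(a);
    callDirect(b);
    assert(a.age == b.age);
}
```

For a single `int` the copies are trivial, but for a large structure each copy becomes a run of field-by-field operations in the generated SPIR-V, which is the bloat the change targets.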
Second, we introduce an IR optimization pass that detects call sites where the entire contents of a buffer (usually a constant buffer) are being passed to a callee function, such that many bytes are loaded and then passed even if only very few are used in the callee. The pass moves the load operations from the caller to a specialized version of the callee where possible (e.g., when the constant buffer in question is a global shader parameter). Doing this eliminates another major category of copies.
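The effect of moving the load into a specialized callee can be sketched in plain C++. Everything here is an assumption for illustration: `Params` stands in for a large constant buffer, `gParams` for a global shader parameter, and `applyScale_gParams` for the specialized function the pass would generate.

```cpp
#include <cassert>

// Stand-in for a large constant buffer layout (hypothetical).
struct Params
{
    float scale;
    float unused[63];
};

// Models a global shader parameter.
Params gParams = { 2.0f, {} };

// Before: the callee takes the whole buffer by value, so the caller
// loads every field of `Params` even though only `scale` is read.
float applyScale(Params params, float x) { return params.scale * x; }
float callerBefore(float x) { return applyScale(gParams, x); }

// After: a specialized callee refers to the global directly and loads
// only the field it uses; the buffer parameter has been removed.
float applyScale_gParams(float x) { return gParams.scale * x; }
float callerAfter(float x) { return applyScale_gParams(x); }

// The specialization changes how much is loaded, not the result.
void checkSameResult()
{
    assert(callerBefore(3.0f) == callerAfter(3.0f));
}
```

The specialized version is only valid because the argument is known to be the same global at that call site, which is why the pass restricts itself to arguments it can trace back to a global shader parameter.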
Notes:
* The IR lowering logic is complicated by the fact that several kinds of l-values (values that are usable as the destination of assignment, or for `out`/`inout` arguments) are not actually addressable. An easy example is a non-contiguous swizzle like `v.xwz` on a `float4`, where the value occupies 12 bytes, but not 12 consecutive bytes with a single address. There are many more corner cases like that and the IR lowering pass carries a lot of complexity to deal with them. A more systematic overhaul is due some time soon.
* The IR representation of `out` and `inout` parameters deserves some careful scrutiny when making these kinds of changes. The official semantics of `inout` in HLSL has been "copy in copy out" (and `out` is just "copy out") which is observably different from any solution that passes in the address of an l-value directly. By making this change we are saying that Slang's semantics are not precisely those of legacy HLSL, and that our semantics for `inout` parameters are closer to those of `inout` in Swift or of a mutable borrow in Rust. In the Swift case the implementation can freely pass the underlying storage of an l-value or the address of a temporary, and valid programs may not observe the difference. It is thus illegal to observe the value in a storage location while a mutation to that location is "in flight." All of this is way more detailed and technical than 99% of Slang users will ever care about, but importantly it gives us semantic cover to eliminate these copies in the IR, and also to emit output C++ code that implements `out` and `inout` as by-reference parameter passing.
* There was an existing generic pass for specializing functions based on call sites that uses a "template method" style of pattern to customize its behavior. That pass needed to be generalized to handle this use case because it had previously operated on the assumption that the "desire" to specialize a callee function must be driven by the parameter declarations of that function, and not on the argument values passed in. The code has been slightly refactored to allow the policy for specialization to consider both parameters and arguments.
* Unsurprisingly, a bunch of the GLSL (and thus SPIR-V) generated has changed with this work, so several baseline `.slang.glsl` files needed to be updated.
* This change is incomplete in that it does not address broader cases of buffer loads, including both partial loads from constant buffers (just loading one field, but a field that uses a "large" structure type), and loads from multi-element buffers (a load from a structured buffer where the element type is "large"). The main question in each of those cases is how to define how "large" a structure needs to be before we decide to try and sink loads into callee functions like this. In the worst case, sinking loads in this way may actually create *more* memory traffic (because the same values get loaded in multiple callee functions).
* fixup: run premake
* fixup: typo
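The note above on `inout` semantics says that copy-in/copy-out and direct address passing are observably different. A minimal C++ sketch of one such observation follows; the names are hypothetical and an `inout` parameter is again modeled as a pointer. The callee reads a global while a mutation to that same storage is "in flight", which is exactly the kind of program the new semantics treat as invalid.

```cpp
#include <cassert>

int gCounter = 0;

// Models a callee with an `inout int x` parameter that also reads a
// global during the call. Returns the value of the global it observed.
int bumpAndObserve(int* x)
{
    *x += 1;
    return gCounter;
}

// Copy-in/copy-out: the callee mutates a temporary, so the global
// still holds its old value while the call is running.
int callCopyInCopyOut()
{
    gCounter = 0;
    int tmp = gCounter;               // copy in
    int seen = bumpAndObserve(&tmp);
    gCounter = tmp;                   // copy out
    return seen;
}

// By reference: the callee mutates the global's storage directly,
// so it observes the updated value.
int callByReference()
{
    gCounter = 0;
    return bumpAndObserve(&gCounter);
}

// Both calling conventions agree on the final value of the global;
// they differ only in what the callee sees mid-call.
void checkFinalStateMatches()
{
    callCopyInCopyOut();
    int afterCopy = gCounter;
    callByReference();
    assert(afterCopy == gCounter);
}
```

Programs that never read an l-value while it is bound to an `inout` parameter cannot tell the two strategies apart, which is what lets the compiler pick whichever is cheaper.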
Diffstat (limited to 'source/slang/slang-ir-specialize-function-call.cpp')
| -rw-r--r-- | source/slang/slang-ir-specialize-function-call.cpp | 162 |
1 files changed, 93 insertions, 69 deletions
diff --git a/source/slang/slang-ir-specialize-function-call.cpp b/source/slang/slang-ir-specialize-function-call.cpp
index eb574c002..0341438c5 100644
--- a/source/slang/slang-ir-specialize-function-call.cpp
+++ b/source/slang/slang-ir-specialize-function-call.cpp
@@ -8,6 +8,61 @@
 namespace Slang
 {
 
+bool FunctionCallSpecializeCondition::isParamSuitableForSpecialization(IRParam* param, IRInst* inArg)
+{
+    SLANG_UNUSED(param);
+
+    // Determining if an argument is suitable for
+    // specializing a callee function requires
+    // looking at its (recurisve) structure.
+    //
+    // Rather than write a recursively procedure
+    // here, we will be tail-recursive by using
+    // a simple loop.
+    //
+    IRInst* arg = inArg;
+    for (;;)
+    {
+        // The leaf case we care about is when the
+        // argument at the call site is a global
+        // shader parameter, because then we can
+        // specialize a callee to refer to the same
+        // global parameter directly.
+        //
+        if (as<IRGlobalParam>(arg)) return true;
+
+        // As we will see later, we can also
+        // specialize a call when the argument
+        // is the result of indexing into an
+        // array (`base[index]`) *if* the `base`
+        // of the indexing operation is also
+        // suitable for specialization.
+        //
+        if (arg->getOp() == kIROp_getElement || arg->getOp() == kIROp_Load)
+        {
+            auto base = arg->getOperand(0);
+
+            // We will "recurse" on the base of
+            // the indexing operation by continuing
+            // our loop with the `base` as our new
+            // argument.
+            //
+            arg = base;
+            continue;
+        }
+
+        // By default, we will *not* consider an argument
+        // suitable for specialization.
+        //
+        // TODO: There may be other cases that are worth
+        // handling here. The current code is based on
+        // observation of what simple shaders do in
+        // practice.
+        //
+        return false;
+    }
+}
+
 struct FunctionParameterSpecializationContext
 {
     // This type implements a pass to specialize functions
@@ -121,14 +176,15 @@ struct FunctionParameterSpecializationContext
         // two conditions we care about:
         //
         // 1. Should we specialize? This amounts to whether
-        // `func` has any parameters that need specialization.
-        // We will call those "specializable" parameters for
-        // lack of a better name.
+        // `func` has any parameters that "want" specialization,
+        // or wheter `call` has any arguments that "want" specialization.
+        // If either the parameter or argument at a given position
+        // want specialization, we will call the coresponding parameter
+        // a "specializable" parameter for lack of a better name.
         //
         // 2. Can we specialize? This amounts to whether the
-        // arguments in `call` that correspond to those
-        // specializable parameters are "suitable" for use
-        // in specialization.
+        // parameter of `func` and the corresponding argument to
+        // `call` are both "suitable" for specialization.
         //
         // We are going to answer both of these queries in
         // a single loop that walks over the parameters of
@@ -147,23 +203,23 @@ struct FunctionParameterSpecializationContext
             SLANG_ASSERT(argIndex < call->getArgCount());
             auto arg = call->getArg(argIndex);
 
-            // If the given parameter doesn't need specialization,
+            // If neither the parameter nor the argument wants specialization,
             // then we need to keep looking.
             //
-            if(!doesParamNeedSpecialization(param))
+            if(!doesParamWantSpecialization(param, arg))
                 continue;
 
-            // If we have run into a `param` that needs specialization,
+            // If we have run into a `param` or `arg` that wants specialization,
             // then our first condition is met.
             //
             anySpecializableParam = true;
 
-            // Now we need to check whether `arg` is actually suitable
+            // Now we need to check whether `param` and `arg` are actually suitable
             // for specialization (our second condition). If not, we
             // can bail out immediately because our second condition
             // cannot be met.
             //
-            if(!isArgSuitableForSpecialization(arg))
+            if(!isParamSuitableForSpecialization(param, arg))
                 return false;
         }
 
@@ -178,62 +234,14 @@ struct FunctionParameterSpecializationContext
     // Of course, now we need to back-fill the predicates that
     // the above function used to evaluate prameters and arguments.
 
-    bool doesParamNeedSpecialization(IRParam* param)
+    bool doesParamWantSpecialization(IRParam* param, IRInst* arg)
     {
-        return condition->doesParamNeedSpecialization(param);
+        return condition->doesParamWantSpecialization(param, arg);
     }
 
-    bool isArgSuitableForSpecialization(IRInst* inArg)
+    bool isParamSuitableForSpecialization(IRParam* param, IRInst* arg)
     {
-        // Determining if an argument is suitable for
-        // specializing a callee function requires
-        // looking at its (recurisve) structure.
-        //
-        // Rather than write a recursively procedure
-        // here, we will be tail-recursive by using
-        // a simple loop.
-        //
-        IRInst* arg = inArg;
-        for(;;)
-        {
-            // The leaf case we care about is when the
-            // argument at the call site is a global
-            // shader parameter, because then we can
-            // specialize a callee to refer to the same
-            // global parameter directly.
-            //
-            if(as<IRGlobalParam>(arg)) return true;
-
-            // As we will see later, we can also
-            // specialize a call when the argument
-            // is the result of indexing into an
-            // array (`base[index]`) *if* the `base`
-            // of the indexing operation is also
-            // suitable for specialization.
-            //
-            if( arg->getOp() == kIROp_getElement || arg->getOp() == kIROp_Load )
-            {
-                auto base = arg->getOperand(0);
-
-                // We will "recurse" on the base of
-                // the indexing operation by continuing
-                // our loop with the `base` as our new
-                // argument.
-                //
-                arg = base;
-                continue;
-            }
-
-            // By default, we will *not* consider an argument
-            // suitable for specialization.
-            //
-            // TODO: There may be other cases that are worth
-            // handling here. The current code is based on
-            // observation of what simple shaders do in
-            // practice.
-            //
-            return false;
-        }
+        return condition->isParamSuitableForSpecialization(param, arg);
     }
 
     // Once we'e determined that a given call site can/should
@@ -451,10 +459,10 @@ struct FunctionParameterSpecializationContext
         IRParam* oldParam,
         IRInst* oldArg)
     {
-        // We know that the case where a parameter
-        // doesn't need specialization is easy.
+        // We know that the case where the parameter
+        // and argument don't want specialization is easy.
         //
-        if( !doesParamNeedSpecialization(oldParam) )
+        if( !doesParamWantSpecialization(oldParam, oldArg) )
         {
             // The new call site will use the same argument
             // value as the old one, and we don't need
@@ -470,6 +478,12 @@ struct FunctionParameterSpecializationContext
             // is handled with a different function
             // because it needs to recurse in some cases.
             //
+            // We will add the parameter that we are specializing to
+            // the key for caching of specializations, because functions
+            // specialized at different parameter positions should not
+            // be shared.
+            //
+            ioInfo.key.vals.add(oldParam);
             getCallInfoForArg(ioInfo, oldArg);
         }
     }
@@ -572,7 +586,7 @@ struct FunctionParameterSpecializationContext
         // As always, the easy case is when the parameter of
         // the original function doesn't need specialization.
         //
-        if( !doesParamNeedSpecialization(oldParam) )
+        if( !doesParamWantSpecialization(oldParam, oldArg) )
         {
             // The specialized callee will need a new parameter
             // that fills the same role as the old one, so we
@@ -677,9 +691,19 @@ struct FunctionParameterSpecializationContext
 
             return newVal;
         }
-        else if (oldArg->getOp() == kIROp_Load)
+        else if (auto oldArgLoad = as<IRLoad>(oldArg))
         {
-            return getSpecializedValueForArg(ioInfo, oldArg->getOperand(0));
+            auto oldPtr = oldArgLoad->getPtr();
+            auto newPtr = getSpecializedValueForArg(ioInfo, oldPtr);
+
+            auto builder = getBuilder();
+            builder->setInsertInto(nullptr);
+            auto newVal = builder->emitLoad(
+                oldArg->getFullType(),
+                newPtr);
+            ioInfo.newBodyInsts.add(newVal);
+
+            return newVal;
         }
         else
         {
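The suitability check that this diff moves into `FunctionCallSpecializeCondition::isParamSuitableForSpecialization` can be tried in isolation with toy types. The sketch below is an assumption-laden stand-in, not Slang's IR: `Op`, `Inst`, and `isArgSuitable` are hypothetical names, and the null check on `base` is a guard for the toy data structure that the real code does not have.

```cpp
#include <cassert>

// Toy stand-ins for IR opcodes and instructions (not Slang's real types).
enum class Op { GlobalParam, GetElement, Load, Other };

struct Inst
{
    Op op;
    const Inst* base;   // models operand 0; nullptr for a leaf
};

// Walks from the argument toward its root. The argument is suitable
// only if a chain of element accesses and loads ends at a global
// shader parameter.
bool isArgSuitable(const Inst* arg)
{
    for (;;)
    {
        // Leaf case: the callee can refer to the same global directly.
        if (arg->op == Op::GlobalParam) return true;

        // "Recurse" on the base of an index or load by continuing the loop.
        if ((arg->op == Op::GetElement || arg->op == Op::Load) && arg->base)
        {
            arg = arg->base;
            continue;
        }

        // Anything else is conservatively rejected.
        return false;
    }
}

// Example chain: load(getElement(globalParam)) is accepted.
void checkLoadOfIndexedGlobal()
{
    Inst global{Op::GlobalParam, nullptr};
    Inst element{Op::GetElement, &global};
    Inst load{Op::Load, &element};
    assert(isArgSuitable(&load));
}
```

Writing the walk as a loop rather than a recursive function keeps the stack depth constant regardless of how deeply the indexing and load operations are nested.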
