path: root/source/slang/slang-lower-to-ir.cpp
author: Theresa Foley <tfoleyNV@users.noreply.github.com> 2021-07-21 12:52:08 -0700
committer: GitHub <noreply@github.com> 2021-07-21 12:52:08 -0700
commit: 23d406f8a3b325f91fecd9ad52bd510ded5f49a7
tree: 54d770593e38fcc5e60b9d6188f0a14641e7b002 /source/slang/slang-lower-to-ir.cpp
parent: e57ea944c4aba0cf385f0f3db6b6ddc7760b8ffa
Work to mitigate SPIR-V bloat (#1914)
* Work to mitigate SPIR-V bloat

SPIR-V is not an especially compact format, but some patterns in how Slang generates code and then runs it through `spirv-opt` lead to many redundant field-by-field copy operations being emitted.

This change attempts to address some of the resulting bloat from the Slang side of things.

Note: experimentation shows that the bloat is less pronounced when running either *no* SPIR-V optimizations or *full* SPIR-V optimizations, so it is also likely that the bloat should be addressed by changing which `spirv-opt` passes the Slang compiler runs in default (`-O1`) builds. Such changes should come as a distinct pull request.

This change primarily does two things:

First, the code generation strategy for passing arguments to `out` and `inout` parameters has been changed. In the past, the compiler would *always* copy the argument value into a temporary, then pass the address of the temporary, and then write back the value after the call. The new code generation strategy attempts to identify when an argument value already has a simple address in memory and passes that address directly when possible. This eliminates many copy operations that occur before/after calls to functions with `out`/`inout` parameters.

Second, we introduce an IR optimization pass that detects call sites where the entire contents of a buffer (usually a constant buffer) are being passed to a callee function, such that many bytes are loaded and then passed even if only very few are used in the callee. The pass moves the load operations from the caller to a specialized version of the callee where possible (e.g., when the constant buffer in question is a global shader parameter). Doing this eliminates another major category of copies.

Notes:

* The IR lowering logic is complicated by the fact that several kinds of l-values (values that are usable as the destination of assignment, or for `out`/`inout` arguments) are not actually addressable. An easy example is a non-contiguous swizzle like `v.xwz` on a `float4`, where the value occupies 12 bytes, but not 12 consecutive bytes with a single address. There are many more corner cases like that and the IR lowering pass carries a lot of complexity to deal with them. A more systematic overhaul is due some time soon.

* The IR representation of `out` and `inout` parameters deserves some careful scrutiny when making these kinds of changes. The official semantics of `inout` in HLSL have been "copy in copy out" (and `out` is just "copy out"), which is observably different from any solution that passes in the address of an l-value directly. By making this change we are saying that Slang's semantics are not precisely those of legacy HLSL, and that our semantics for `inout` parameters are closer to those of `inout` in Swift or of a mutable borrow in Rust. In the Swift case the implementation can freely pass the underlying storage of an l-value or the address of a temporary, and valid programs may not observe the difference. It is thus illegal to observe the value in a storage location while a mutation to that location is "in flight." All of this is way more detailed and technical than 99% of Slang users will ever care about, but importantly it gives us semantic cover to eliminate these copies in the IR, and also to emit output C++ code that implements `out` and `inout` as by-reference parameter passing.

* There was an existing generic pass for specializing functions based on call sites that uses a "template method" style of pattern to customize its behavior. That pass needed to be generalized to handle this use case because it had previously operated on the assumption that the "desire" to specialize a callee function must be driven by the parameter declarations of that function, and not by the argument values passed in. The code has been slightly refactored to allow the policy for specialization to consider both parameters and arguments.

* Unsurprisingly, a bunch of the generated GLSL (and thus SPIR-V) has changed with this work, so several baseline `.slang.glsl` files needed to be updated.

* This change is incomplete in that it does not address broader cases of buffer loads, including both partial loads from constant buffers (just loading one field, but a field that uses a "large" structure type), and loads from multi-element buffers (a load from a structured buffer where the element type is "large"). The main question in each of those cases is how to define how "large" a structure needs to be before we decide to try to sink loads into callee functions like this. In the worst case, sinking loads in this way may actually create *more* memory traffic (because the same values get loaded in multiple callee functions).

* fixup: run premake

* fixup: typo
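The observable difference between "copy in copy out" and by-reference passing that the commit message mentions can be illustrated with a small C++ sketch. This is not Slang compiler code: `gCounter`, `bumpAndPeek`, `callCopyInCopyOut`, and `callByReference` are hypothetical names invented for the illustration, and a C++ reference stands in for the pointer that the Slang IR would pass.

```cpp
#include <cassert>

// Hypothetical global that the callee can reach on its own,
// independent of how the argument is passed.
static int gCounter = 1;

// Hypothetical callee with an `inout`-style parameter. It mutates the
// parameter and then reads the global while that mutation is "in flight".
static int bumpAndPeek(int& param)
{
    param += 10;
    return gCounter;
}

// Legacy HLSL-style lowering: copy in, call with a temporary, copy out.
static int callCopyInCopyOut(int& arg)
{
    int temp = arg;               // copy in
    int seen = bumpAndPeek(temp); // callee only sees the temporary
    arg = temp;                   // copy out
    return seen;
}

// Direct lowering: pass the storage of the argument itself.
static int callByReference(int& arg)
{
    return bumpAndPeek(arg);
}
```

When `gCounter` itself is passed as the argument, both versions leave it at 11, but the callee observes 1 under copy-in/copy-out and 11 under by-reference passing. Programs that rely on either result are the ones the commit message rules out as invalid.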
Diffstat (limited to 'source/slang/slang-lower-to-ir.cpp')
-rw-r--r-- source/slang/slang-lower-to-ir.cpp 150
1 file changed, 70 insertions, 80 deletions
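As a rough illustration of the argument-passing strategy that the change to this file implements, the sketch below contrasts the two paths: passing an addressable l-value directly, and falling back to a temporary with a write-back for a non-contiguous swizzle such as `v.xwz`. All names here (`Float3`, `Float4`, `doubleAll`, `callWithAddressable`, `callWithSwizzleXWZ`) are assumptions made for the example and do not appear in the Slang sources.

```cpp
#include <array>
#include <cassert>

using Float4 = std::array<float, 4>;   // components x, y, z, w
struct Float3 { float x, y, z; };

// Hypothetical callee with an `inout`-style parameter.
static void doubleAll(Float3& v)
{
    v.x *= 2.0f;
    v.y *= 2.0f;
    v.z *= 2.0f;
}

// Case 1: the argument has a single simple address, so it is passed
// directly with no copies around the call.
static void callWithAddressable(Float3& storage)
{
    doubleAll(storage);
}

// Case 2: the argument is the swizzle `v.xwz`. Its three components are
// not contiguous in memory, so no single address exists for it.
static void callWithSwizzleXWZ(Float4& v)
{
    // Copy in: needed because the parameter is `inout`, not just `out`.
    Float3 temp{v[0], v[3], v[2]};

    doubleAll(temp);

    // Fix-up after the call: write the temporary back through the swizzle.
    v[0] = temp.x;
    v[3] = temp.y;
    v[2] = temp.z;
}
```

In the second case the `y` component of `v` is never touched, which is the behavior a write through `v.xwz` should have.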
diff --git a/source/slang/slang-lower-to-ir.cpp b/source/slang/slang-lower-to-ir.cpp
index d2d15735c..c7e32072e 100644
--- a/source/slang/slang-lower-to-ir.cpp
+++ b/source/slang/slang-lower-to-ir.cpp
@@ -2014,6 +2014,40 @@ LoweredValInfo createVar(
return LoweredValInfo::ptr(irAlloc);
}
+// When we try to turn a `LoweredValInfo` into an address of some temporary storage,
+// we can either do it "aggressively" or not (what we'll call the "default" behavior,
+// although it isn't strictly more common).
+//
+// The case that this is mostly there to address is when somebody writes an operation
+// like:
+//
+// foo[a] = b;
+//
+// In that case, we might as well just use the `set` accessor if there is one, rather
+// than complicate things. However, in more complex cases like:
+//
+// foo[a].x = b;
+//
+// there is no way to satisfy the semantics of the code the user wrote (in terms of
+// only writing one vector component, and not a full vector) by using the `set`
+// accessor, and we need to be "aggressive" in turning the lvalue `foo[a]` into
+// an address.
+//
+// TODO: realistically IR lowering is too early to be binding to this choice,
+// because different accessors might be supported on different targets.
+//
+enum class TryGetAddressMode
+{
+ Default,
+ Aggressive,
+};
+
+/// Try to coerce `inVal` into a `LoweredValInfo::ptr()` with a simple address.
+LoweredValInfo tryGetAddress(
+ IRGenContext* context,
+ LoweredValInfo const& inVal,
+ TryGetAddressMode mode);
+
/// Add a single `in` argument value to a list of arguments
void addInArg(
IRGenContext* context,
@@ -2092,59 +2126,49 @@ void addArg(
// According to our "calling convention" we need to
// pass a pointer into the callee.
//
- // A naive approach would be to just take the address
- // of `loweredArg` above and pass it in, but that
- // has two issues:
- //
- // 1. The l-value might not be something that has a single
- // well-defined "address" (e.g., `foo.xzy`).
- //
- // 2. The l-value argument might actually alias some other
- // storage that the callee will access (e.g., we are
- // passing in a global variable, or two `out` parameters
- // are being passed the same location in an array).
- //
- // In each of these cases, the safe option is to create
- // a temporary variable to use for argument-passing,
- // and then do copy-in/copy-out around the call.
+ // Ideally we would like to just pass the address of
+ // `loweredArg`, and when that is possible we will do so.
+ // It may happen, though, that `loweredArg` is not an
+ // addressable l-value (e.g., it is `foo.xzy`, so that
+ // the bytes of the l-value are not contiguous).
//
- // TODO: We should consider ruling out case (2) as undefined
- // behavior, and specify that whether `inout` and `out` are
- // handled via copy-in-copy-out or by-reference parameter
- // passing is an implementation detail. That would allow
- // us to avoid introducing a copy except where it is required
- // for the semantics of (1).
- //
- // TODO: We should confirm whether such a change will make
- // it harder to create SSA values for variables that get
- // used with `out` or `inout` parameters.
-
- LoweredValInfo tempVar = createVar(context, paramType);
-
- // If the parameter is `in out` or `inout`, then we need
- // to ensure that we pass in the original value stored
- // in the argument, which we accomplish by assigning
- // from the l-value to our temp.
- if(paramDirection == kParameterDirection_InOut)
+ LoweredValInfo argPtr = tryGetAddress(context, argVal, TryGetAddressMode::Default);
+ if(argPtr.flavor == LoweredValInfo::Flavor::Ptr)
{
- assign(context, tempVar, argVal);
+ addInArg(context, ioArgs, LoweredValInfo::simple(argPtr.val));
}
+ else
+ {
+ // If the value is not one that could yield a simple l-value
+ // then we need to convert it into a temporary
+ //
+ LoweredValInfo tempVar = createVar(context, paramType);
- // Now we can pass the address of the temporary variable
- // to the callee as the actual argument for the `in out`
- SLANG_ASSERT(tempVar.flavor == LoweredValInfo::Flavor::Ptr);
- IRInst* tempPtr = getAddress(context, tempVar, loc);
- addInArg(context, ioArgs, LoweredValInfo::simple(tempPtr));
+ // If the parameter is `in out` or `inout`, then we need
+ // to ensure that we pass in the original value stored
+ // in the argument, which we accomplish by assigning
+ // from the l-value to our temp.
+ //
+ if (paramDirection == kParameterDirection_InOut)
+ {
+ assign(context, tempVar, argVal);
+ }
- // Finally, after the call we will need
- // to copy in the other direction: from our
- // temp back to the original l-value.
- OutArgumentFixup fixup;
- fixup.src = tempVar;
- fixup.dst = argVal;
+ // Now we can pass the address of the temporary variable
+ // to the callee as the actual argument for the `in out`
+ SLANG_ASSERT(tempVar.flavor == LoweredValInfo::Flavor::Ptr);
+ IRInst* tempPtr = getAddress(context, tempVar, loc);
+ addInArg(context, ioArgs, LoweredValInfo::simple(tempPtr));
- (*ioFixups).add(fixup);
+ // Finally, after the call we will need
+ // to copy in the other direction: from our
+ // temp back to the original l-value.
+ OutArgumentFixup fixup;
+ fixup.src = tempVar;
+ fixup.dst = argVal;
+ (*ioFixups).add(fixup);
+ }
}
break;
@@ -2196,40 +2220,6 @@ void addCallArgsForParam(
//
-// When we try to turn a `LoweredValInfo` into an address of some temporary storage,
-// we can either do it "aggressively" or not (what we'll call the "default" behavior,
-// although it isn't strictly more common).
-//
-// The case that this is mostly there to address is when somebody writes an operation
-// like:
-//
-// foo[a] = b;
-//
-// In that case, we might as well just use the `set` accessor if there is one, rather
-// than complicate things. However, in more complex cases like:
-//
-// foo[a].x = b;
-//
-// there is no way to satisfy the semantics of the code the user wrote (in terms of
-// only writing one vector component, and not a full vector) by using the `set`
-// accessor, and we need to be "aggressive" in turning the lvalue `foo[a]` into
-// an address.
-//
-// TODO: realistically IR lowering is too early to be binding to this choice,
-// because different accessors might be supported on different targets.
-//
-enum class TryGetAddressMode
-{
- Default,
- Aggressive,
-};
-
-/// Try to coerce `inVal` into a `LoweredValInfo::ptr()` with a simple address.
-LoweredValInfo tryGetAddress(
- IRGenContext* context,
- LoweredValInfo const& inVal,
- TryGetAddressMode mode);
-
/// Compute the direction for a parameter based on its declaration
ParameterDirection getParameterDirection(VarDeclBase* paramDecl)
{