| author | Tim Foley <tfoleyNV@users.noreply.github.com> | 2020-07-15 09:31:27 -0700 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2020-07-15 09:31:27 -0700 |
| commit | 723c9b1b3607ba910abbeb72f4f13bdff3cbd502 (patch) | |
| tree | 387ecf8c0a3324ebeb8361bb1abda08f8589721d /source/slang/slang-ir-entry-point-uniforms.cpp | |
| parent | 48f26ef082fa3b0c2a02dc57585f7e43210bbb63 (diff) | |
Remove KernelContext wrapper from CPU/CUDA emit (#1440)
* Remove KernelContext wrapper from CPU/CUDA emit
Currently, the CPU and CUDA C++ targets rely on a `KernelContext` type that is generated during emit, as a way to provide implicit access to things that were global in the input Slang code, but that can't actually be emitted as globals in the target language (because the semantics of global declarations differ).
For example, input like:
```hlsl
ConstantBuffer<Stuff> gStuff; // shader parameter
groupshared int gData[1024]; // thread-group shared variable
static int gCounter = 0; // "thread-local" global-scope variable
void subroutine() { ... }
[shader("compute")] void computeMain() { ... }
```
would translate to output C++ for CPU a bit like:
```c++
struct KernelContext
{
ConstantBuffer<Stuff> gStuff;
int gData[1024];
int gCounter = 0;
void subroutine() { ... }
void computeMain() { ... }
};
```
Note that both `computeMain()` and `subroutine()` are non-`static` member functions of `KernelContext`, so they have an implicit `this` parameter of type `KernelContext*`, which allows their bodies to reference `gStuff`, etc. by name.
Because `KernelContext::computeMain()` is a member function, we end up emitting an additional global-scope function to expose the entry point to the outside world, and that function is responsible for declaring a local `KernelContext` and invoking the generated entry point on it.
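A minimal sketch of that wrapper pattern follows. The function name `computeMain_wrapper`, its signature, the `Stuff` fields, and the function bodies are assumptions made for illustration, not the code the compiler emits; `gStuff` is modeled as a plain pointer.

```cpp
#include <cassert>

// Hypothetical payload type for the shader parameter.
struct Stuff { int scale; };

// Stand-in for the generated context type; the bodies are invented.
struct KernelContext
{
    Stuff* gStuff = nullptr;   // shader parameter (modeled as a pointer)
    int gData[1024] = {};      // thread-group shared variable
    int gCounter = 0;          // "thread-local" global-scope variable

    void subroutine() { gCounter += gStuff->scale; }
    void computeMain() { subroutine(); gData[0] = gCounter; }
};

// Global-scope wrapper that exposes the entry point to the outside world.
// It declares a local context and invokes the member function on it.
int computeMain_wrapper(Stuff* stuff)
{
    assert(stuff != nullptr);
    KernelContext context;     // one context per invocation
    context.gStuff = stuff;
    context.computeMain();
    return context.gData[0];   // returned only to make the sketch observable
}
```

Each call builds a fresh `KernelContext`, so state such as `gCounter` does not persist from one invocation to the next.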
This approach has several important drawbacks:
* It complicates the emit logic for CPU and CUDA, with many special cases around when/how things get emitted.
* It complicates the implementation of dynamic dispatch, because what seems like a function pointer in Slang IR needs to be a pointer-to-member-function in C++.
* It makes it difficult to have a non-kernel-oriented mode of compilation for CPU where a Slang function with a given signature gets output as a C++ CPU function with the "same" signature (not wrapped up as a member function of `KernelContext`).
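The dynamic-dispatch drawback can be illustrated with a small comparison. This is a sketch with hypothetical names (`bump`, `bumpFree`, `callMember`, `callFree`), not code taken from the compiler: with member functions the callee must be referenced through a pointer-to-member type, whereas a function that takes the context explicitly can be called through an ordinary function pointer.

```cpp
#include <cassert>

struct KernelContext
{
    int gCounter = 0;

    // Member-function form: the context arrives as the implicit `this`.
    int bump(int x) { gCounter += x; return gCounter; }
};

// Indirect call in the member-function scheme needs a pointer-to-member.
using MemberFn = int (KernelContext::*)(int);
int callMember(KernelContext& context, MemberFn fn, int x)
{
    assert(fn != nullptr);
    return (context.*fn)(x);
}

// Free-function form: the context is an ordinary explicit parameter.
int bumpFree(KernelContext* context, int x)
{
    context->gCounter += x;
    return context->gCounter;
}

// Indirect call is now a plain function pointer call.
using FreeFn = int (*)(KernelContext*, int);
int callFree(KernelContext* context, FreeFn fn, int x)
{
    assert(fn != nullptr);
    return fn(context, x);
}
```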
This change takes a step toward addressing these issues by introducing the `KernelContext` type in an explicit IR pass instead of handling it as part of the last-mile emit logic.
The most important change is the removal of code related to `KernelContext` from the `slang-emit-{cpp,cuda}.{h,cpp}` files, with the equivalent logic instead being handled in a new pass in `slang-ir-explicit-global-context.{h,cpp}`. It should be noted that further cleanups to the emit logic should now be possible; in particular, both the CPU and CUDA emit paths are manually sequencing the `EmitAction`s instead of relying on the default logic, but at this point they should be able to just use the default. The additional cleanups are left for future work.
The explicit IR pass does more or less what one would expect: it identifies global-scope entities (global variables and parameters) that need to be wrapped and turns them into fields of a `KernelContext` type. It then modifies all entry points to initialize a `KernelContext` as part of their startup. Finally, any code that used to refer to the global entities is changed to refer to a field of the context, with the context passed via new function parameters (the new parameter is only added to functions that need it for now).
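As a rough picture of the result, the output for a small program might look like the C++ below. This is a hand-written sketch, not compiler output: the function bodies and the `helper` function are hypothetical, and the entry point returns a value only so the behavior can be checked.

```cpp
#include <cassert>

// Former global variables become fields of the context type.
struct KernelContext
{
    int gCounter;
    int gData[4];
};

// This function used a global, so it gains a context parameter.
void subroutine(KernelContext* context)
{
    assert(context != nullptr);
    context->gCounter++;
}

// This function never touched a global, so its signature is unchanged.
int helper(int x) { return x * 2; }

// The entry point sets up the context as part of its startup and
// passes it along to callees that need it.
int computeMain()
{
    KernelContext context = {};
    subroutine(&context);
    context.gData[0] = helper(context.gCounter);
    return context.gData[0];
}
```

Because every function is now a free function, none of them needs to be emitted as a member of `KernelContext`.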
Transforming global variables into fields of a `KernelContext` type in the IR pass ends up dropping their initial-value expressions (since those were attached as basic blocks on the `IRGlobalVar`). To avoid breaking code that relies on global-scope (but thread-local) variables, this change also adds an explicit pass that takes the initialization logic on all global variables and moves it to explicit logic that runs at the start of every entry point in a linked module (`slang-ir-explicit-global-init.{h,cpp}`). This pass would also be useful when we get back to direct SPIR-V emit, since SPIR-V also requires initialization logic for globals to be emitted into entry points.
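The combined effect of the two passes can be sketched as follows. The names `initGlobals` and `computeScale` are hypothetical, and whether the initialization logic ends up in a separate function or inline in the entry point is an assumption of this sketch.

```cpp
#include <cassert>

struct KernelContext
{
    int gCounter;   // was: static int gCounter = 0;
    int gScale;     // was: static int gScale = computeScale();
};

// Hypothetical initializer expression for a global.
int computeScale() { return 7; }

// Initialization that used to live on the global variables themselves,
// now expressed as explicit logic run at entry-point startup.
void initGlobals(KernelContext* context)
{
    assert(context != nullptr);
    context->gCounter = 0;
    context->gScale = computeScale();
}

int computeMain()
{
    KernelContext context;
    initGlobals(&context);               // runs before any user code
    context.gCounter += context.gScale;  // user code sees initialized values
    return context.gCounter;
}
```

Since the initialization runs on every entry-point invocation, each invocation starts from the same initial values.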
One complication that arises when the IR is introducing the types for entry-point parameters, global-scope parameters, and the `KernelContext` type is that it becomes harder for the emit logic to utter the names of those types (they might not even have names, since `IRNameHint`s might get stripped). This created a problem since the wrapper operations that were being generated for CPU were taking `void*` parameters and casting them to the appropriate type. To work around this issue, we have added an explicit IR pass (`slang-ir-entry-point-raw-ptr-params.{h,cpp}`) that transforms the signature of entry points so that any pointer parameters instead become raw pointer (`void*`) parameters, with the casting being handled inside the entry point itself.
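A sketch of the transformed signature, under assumed names: the parameter types `EntryPointParams` and `GlobalParams`, their fields, and the parameter names are hypothetical, and the return value exists only to make the example observable.

```cpp
#include <cassert>

// Types introduced by IR passes; a caller need not be able to name them.
struct EntryPointParams { int offset; };
struct GlobalParams { int scale; };

// The signature mentions only `void*`; the casts to the real types
// happen inside the entry point itself.
int computeMain(void* rawEntryPointParams, void* rawGlobalParams)
{
    assert(rawEntryPointParams != nullptr && rawGlobalParams != nullptr);
    auto* entryPointParams = static_cast<EntryPointParams*>(rawEntryPointParams);
    auto* globalParams = static_cast<GlobalParams*>(rawGlobalParams);
    return entryPointParams->offset + globalParams->scale;
}
```

With this shape, any wrapper or host code can forward opaque pointers without referring to the generated type names.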
One consequence of all the above changes is that for the CUDA target we no longer need a wrapper function to invoke the generated entry point, because the IR function for the entry point ends up having the correct/expected signature already. This is also the case for CPU when it comes to the `*_Thread` wrapper function, but this change doesn't try to eliminate the wrapper because of a belief that the `*_Thread`-level interface is going away anyway.
Because the IR is now responsible for ensuring the signature of the IR entry point for CUDA and CPU is what is expected, I needed to modify the `slang-ir-entry-point-uniforms` pass to always create an explicit parameter for the entry point uniforms when compiling for CUDA/CPU, even if there were no `uniform` parameters on the entry point as written. This also ended up requiring some tweaks to the parameter layout logic to ensure that CPU/CUDA targets always treat `ConstantBuffer<T>` as a `T*` even in the case where `T` is an empty `struct` type (which happens when we construct a `struct` type to represent the uniform parameters of an entry point with no uniform parameters...).
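To illustrate why the empty case matters, the sketch below models `ConstantBuffer<T>` as a plain `T*` alias. That alias, and the names `EmptyParams`, `Params`, `mainWithoutUniforms`, and `mainWithUniforms`, are assumptions for illustration rather than the definitions used by the CPU/CUDA prelude.

```cpp
#include <cassert>
#include <type_traits>

// Model of the layout rule: a constant buffer is just a pointer.
template<typename T>
using ConstantBuffer = T*;

struct EmptyParams {};        // entry point written with no uniform parameters
struct Params { int x; };     // entry point written with one uniform parameter

// The rule holds even when the element type is an empty struct.
static_assert(std::is_same<ConstantBuffer<EmptyParams>, EmptyParams*>::value,
              "empty parameter struct is still passed by pointer");

// Both entry points therefore have the same shape: one pointer parameter.
int mainWithoutUniforms(ConstantBuffer<EmptyParams> entryPointParams)
{
    assert(entryPointParams != nullptr);
    return 0;
}

int mainWithUniforms(ConstantBuffer<Params> entryPointParams)
{
    assert(entryPointParams != nullptr);
    return entryPointParams->x;
}
```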
There are several future changes that can/should build on this work:
* We should change the generated signatures for CUDA kernels, so that they don't rely on `KernelContext` for global-scope parameters. At that point we can avoid generating a `KernelContext` at all for CUDA, except when a program uses global-scope thread-local variables.
* We should figure out how to make the "ABI" for dynamic-dispatch calls ensure that the kernel context is either always passed, or always *not* passed. Making a hard-and-fast rule as part of the calling convention for dynamic calls would ensure that access through the context continues to work with dynamic calls (this change might break it in some cases).
* We should figure out how to handle the layout for the `KernelContext` in cases where a program is composed of multiple separately-compiled modules. Right now the layout of the `KernelContext` requires global knowledge (as does the pass that introduces explicit initialization for global-scope thread-locals).
* We should try to further clean up the CPU/CUDA C++ emit logic to fall back on the default emit behavior more, now that the various special-case approaches that were taken are no longer needed.
* fixup: restore build files to default configuration
Diffstat (limited to 'source/slang/slang-ir-entry-point-uniforms.cpp')
| -rw-r--r-- | source/slang/slang-ir-entry-point-uniforms.cpp | 382 |
1 files changed, 249 insertions, 133 deletions
diff --git a/source/slang/slang-ir-entry-point-uniforms.cpp b/source/slang/slang-ir-entry-point-uniforms.cpp index 9c3c029a5..47e361d07 100644 --- a/source/slang/slang-ir-entry-point-uniforms.cpp +++ b/source/slang/slang-ir-entry-point-uniforms.cpp @@ -10,7 +10,7 @@ namespace Slang { -// The transformation in this file will solve the problem of taking +// The transformations in this file will solve the problem of taking // code like the following: // // float4 fragmentMain( @@ -88,21 +88,103 @@ namespace Slang // `params` above into individual variables for the `t` and // `s` fields. -// The overall structure here is similar to many other IR passes. -// We define a "context" structure to encapsulate the pass. +// For clarity and flexibility, the work is split across two +// different IR passes: // -struct MoveEntryPointUniformParametersToGlobalScope +// * The first pass simply collects together uniform parameters +// into a single parameter of `struct` or `ConstantBuffer<...>` type. +// +// * The second pass transforms entry-point uniform parameters +// into global shader parameters. + +// First we start with some helper subroutines for detecting +// whether a parameter represents a varying input rather than +// a uniform parameter. + + +// In order to determine whether a parameter is varying based on its +// layout, we need to know which resource kinds represent varying +// shader parameters. +// +bool isVaryingResourceKind(LayoutResourceKind kind) +{ + switch( kind ) + { + default: + return false; + + // Note: The set of cases that are considered + // varying here would need to be extended if we + // add more fine-grained resource kinds (e.g., + // if we ever add an explicit resource kind + // for geometry shader output streams). 
+ // + // Ordinary varying input/output: + case LayoutResourceKind::VaryingInput: + case LayoutResourceKind::VaryingOutput: + // + // Ray-tracing shader input/output: + case LayoutResourceKind::CallablePayload: + case LayoutResourceKind::HitAttributes: + case LayoutResourceKind::RayPayload: + return true; + } +} + +bool isVaryingParameter(IRTypeLayout* typeLayout) +{ + // If *any* of the resources consumed by the parameter type + // is *not* a varying resource kind, then we consider the + // whole parameter to be uniform (and thus not varying). + // + // Note that this means that an empty type will always + // be considered varying, even if it had been explicitly + // marked `uniform`. + // + // Note that this logic rules out support for parameters + // that mix varying and non-varying resource kinds. + // + // TODO: This whole convoluted definition exists because + // we currently don't give system-value parameters any + // reosurce kind, so they show up as empty. Simply + // adding `LayoutResourceKind`s for system-value inputs + // and outputs would allow for simpler logic here. + // + for(auto sizeAttr : typeLayout->getSizeAttrs()) + { + if(!isVaryingResourceKind(sizeAttr->getResourceKind())) + return false; + } + return true; +} + +bool isVaryingParameter(IRVarLayout* varLayout) +{ + return isVaryingParameter(varLayout->getTypeLayout()); +} + +// Our two passes have a fair amount in common in terms of +// how they traverse the IR, so we will factor out the +// shared logic into a base type. + +struct PerEntryPointPass { // We'll hang on to the module we are processing, // so that we can refer to it when setting up `IRBuilder`s. // IRModule* module; + + SharedIRBuilder* m_sharedBuilder = nullptr; + // We will process a whole module by visiting all // its global functions, looking for entry points. 
// void processModule() { + SharedIRBuilder sharedBuilder(module); + m_sharedBuilder = &sharedBuilder; + // Note that we are only looking at true global-scope // functions and not functions nested inside of // IR generics. When using generic entry points, this @@ -130,21 +212,57 @@ struct MoveEntryPointUniformParametersToGlobalScope if( !func->findDecorationImpl(kIROp_EntryPointDecoration) ) continue; - // If we fine a candidate entry point, then we + // If we find a candidate entry point, then we // will process it. // processEntryPoint(func); } } - void processEntryPoint(IRFunc* func) + void processEntryPoint(IRFunc* entryPointFunc) + { + m_entryPointFunc = entryPointFunc; + processEntryPointImpl(entryPointFunc); + } + + IRFunc* m_entryPointFunc = nullptr; + + virtual void processEntryPointImpl(IRFunc* entryPointFunc) = 0; +}; + + +struct CollectEntryPointUniformParams : PerEntryPointPass +{ + CollectEntryPointUniformParamsOptions m_options; + + // *If* the entry point has any uniform parameter then we want to create a + // structure type to house them, and a single collected shader parameter (either + // an instance of that type or a constant buffer). + // + // We only want to create these if actually needed, so we will declare + // them here and then initialize them on-demand. + // + IRStructType* paramStructType = nullptr; + IRParam* collectedParam = nullptr; + + IRVarLayout* entryPointParamsLayout = nullptr; + bool needConstantBuffer = false; + + void processEntryPointImpl(IRFunc* entryPointFunc) SLANG_OVERRIDE { + // This pass object may be used across multiple entry points, + // so we need to make sure to reset state that could have been + // left over from a previous entry point. + // + paramStructType = nullptr; + collectedParam = nullptr; + // We expect all entry points to have explicit layout information attached. // // We will assert that we have the information we need, but try to be // defensive and bail out in the failure case in release builds. 
// - auto funcLayoutDecoration = func->findDecoration<IRLayoutDecoration>(); + auto funcLayoutDecoration = entryPointFunc->findDecoration<IRLayoutDecoration>(); SLANG_ASSERT(funcLayoutDecoration); if(!funcLayoutDecoration) return; @@ -161,31 +279,18 @@ struct MoveEntryPointUniformParametersToGlobalScope // If we are in the latter case we will need to make sure to allocate // an explicit IR constant buffer for that wrapper, // - auto entryPointParamsLayout = entryPointLayout->getParamsLayout(); - bool needConstantBuffer = as<IRParameterGroupTypeLayout>(entryPointParamsLayout->getTypeLayout()) != nullptr; + entryPointParamsLayout = entryPointLayout->getParamsLayout(); + needConstantBuffer = as<IRParameterGroupTypeLayout>(entryPointParamsLayout->getTypeLayout()) != nullptr; auto entryPointParamsStructLayout = getScopeStructLayout(entryPointLayout); // We will set up an IR builder so that we are ready to generate code. // - SharedIRBuilder sharedBuilderStorage; - auto sharedBuilder = &sharedBuilderStorage; - sharedBuilder->module = module; - sharedBuilder->session = module->getSession(); - - IRBuilder builderStorage; + IRBuilder builderStorage(m_sharedBuilder); auto builder = &builderStorage; - builder->sharedBuilder = sharedBuilder; - // *If* the entry point has any uniform parameter then we want to create a - // structure type to house them, and a global shader parameter (either - // an instance of that type or a constant buffer). - // - // We only want to create these if actually needed, so we will declare - // them here and then initialize them on-demand. 
- // - IRStructType* paramStructType = nullptr; - IRGlobalParam* globalParam = nullptr; + if(m_options.alwaysCreateCollectedParam) + ensureCollectedParamAndTypeHaveBeenCreated(); // We will be removing any uniform parameters we run into, so we // need to iterate the parameter list carefully to deal with @@ -193,7 +298,7 @@ struct MoveEntryPointUniformParametersToGlobalScope // IRParam* nextParam = nullptr; UInt paramCounter = 0; - for( IRParam* param = func->getFirstParam(); param; param = nextParam ) + for( IRParam* param = entryPointFunc->getFirstParam(); param; param = nextParam ) { nextParam = param->getNextParam(); UInt paramIndex = paramCounter++; @@ -225,62 +330,9 @@ struct MoveEntryPointUniformParametersToGlobalScope // to deal with creating the structure type and global shader // parameter that our transformed entry point will use. // - if( !paramStructType ) - { - // First we create the structure to hold the parameters. - // - builder->setInsertBefore(func); - paramStructType = builder->createStructType(); - builder->addNameHintDecoration(paramStructType, UnownedTerminatedStringSlice("EntryPointParams")); - - if( needConstantBuffer ) - { - // If we need a constant buffer, then the global - // shader parameter will be a `ConstantBuffer<paramStructType>` - // - auto constantBufferType = builder->getConstantBufferType(paramStructType); - globalParam = builder->createGlobalParam(constantBufferType); - } - else - { - // Otherwise, the global shader parameter is just - // an instance of `paramStructType`. - // - globalParam = builder->createGlobalParam(paramStructType); - } - - // No matter what, the global shader parameter should have the layout - // information from the entry point attached to it, so that the - // contained parameters will end up in the right place(s). - // - builder->addLayoutDecoration(globalParam, entryPointParamsLayout); - - // We add a name hint to the global parameter so that it will - // emit to more readable code when referenced. 
- // - builder->addNameHintDecoration(globalParam, UnownedTerminatedStringSlice("entryPointParams")); - - // We also decorate the parameter for the entry-point parameters - // so that we can find it again in downstream passes (like emit - // for CPU/CUDA) that might want to treat entry-point parameters - // different from other cases. - // - // TODO: Once we have support for multiple entry points to be emitted - // at once, we need a way to associate these per-entry-point parameters - // more closely with the original entry point. The two easiest options - // are: - // - // 1. Don't move the new aggregate parameter to the global scope - // on those targets, and instead keep it as a parameter of the - // entry point. - // - // 2. Use a decoration on the entry point itself to point at the - // global parameter for its per-entry-point parameter data. - // - builder->addDecoration(globalParam, kIROp_EntryPointParamDecoration); - } + ensureCollectedParamAndTypeHaveBeenCreated(); - // Now that we've ensured the global `struct` type and shader paramter + // Now that we've ensured the global `struct` type and collected shader paramter // exist, we need to add a field to the `struct` to represent the // current parameter. 
// @@ -349,7 +401,7 @@ struct MoveEntryPointUniformParametersToGlobalScope // auto fieldAddress = builder->emitFieldAddress( builder->getPtrType(paramType), - globalParam, + collectedParam, paramFieldKey); fieldVal = builder->emitLoad(fieldAddress); } @@ -361,7 +413,7 @@ struct MoveEntryPointUniformParametersToGlobalScope // fieldVal = builder->emitFieldExtract( paramType, - globalParam, + collectedParam, paramFieldKey); } @@ -380,76 +432,140 @@ struct MoveEntryPointUniformParametersToGlobalScope param->removeAndDeallocate(); } - fixUpFuncType(func); + if( collectedParam ) + { + collectedParam->insertBefore(entryPointFunc->getFirstBlock()->getFirstChild()); + } + + fixUpFuncType(entryPointFunc); } - // We need to be able to determine if a parameter is logically - // a "varying" parameter based on its layout. - // - bool isVaryingParameter(IRVarLayout* layout) + void ensureCollectedParamAndTypeHaveBeenCreated() { - // If *any* of the resources consumed by the parameter - // is a varying resource kind (e.g., varying input) then - // we consider the whole parameter to be varying. - // - // This is reasonable because there is no way to declare - // a parameter that mixes varying and non-varying fields. + if(paramStructType) + return; + + IRBuilder builder(m_sharedBuilder); + + // First we create the structure to hold the parameters. 
// - for( auto resInfo : layout->getOffsetAttrs() ) + builder.setInsertBefore(m_entryPointFunc); + paramStructType = builder.createStructType(); + builder.addNameHintDecoration(paramStructType, UnownedTerminatedStringSlice("EntryPointParams")); + + if( needConstantBuffer ) { - if(isVaryingResourceKind(resInfo->getResourceKind())) - return true; + // If we need a constant buffer, then the global + // shader parameter will be a `ConstantBuffer<paramStructType>` + // + auto constantBufferType = builder.getConstantBufferType(paramStructType); + collectedParam = builder.createParam(constantBufferType); + } + else + { + // Otherwise, the global shader parameter is just + // an instance of `paramStructType`. + // + collectedParam = builder.createParam(paramStructType); } - // TODO(JS): We probably want a more accurate way of determining if system semantic value - // We can use the flags Flag::SemanticValue for one. But main issue with this test, is for some - // targets currently (CPU) no resources are consumed. Perhaps this is fixed elsewhere by using a 'notional' resource. - - // Varying parameters with "system value" semantics currently show up as - // consuming no resources, so we need to special-case that here. - // - // Note: an empty `struct` parameter would also show up the same way, but - // we should eliminate any such parameters later on during type legalization. + // No matter what, the global shader parameter should have the layout + // information from the entry point attached to it, so that the + // contained parameters will end up in the right place(s). // - if(layout->getOffsetAttrs().getCount() == 0) - return true; + builder.addLayoutDecoration(collectedParam, entryPointParamsLayout); - // if none of the above tests determined that the - // parameter was varying, then we can safely consider - // it to be non-varying (uniform): - return false; + // We add a name hint to the global parameter so that it will + // emit to more readable code when referenced. 
+ // + builder.addNameHintDecoration(collectedParam, UnownedTerminatedStringSlice("entryPointParams")); } +}; - // In order to determine whether a parameter is varying based on its - // layout, we need to know which resource kinds represent varying - // shader parameters. - // - bool isVaryingResourceKind(LayoutResourceKind kind) +struct MoveEntryPointUniformParametersToGlobalScope : PerEntryPointPass +{ + void processEntryPointImpl(IRFunc* entryPointFunc) SLANG_OVERRIDE { - switch( kind ) + // We will set up an IR builder so that we are ready to generate code. + // + IRBuilder builderStorage(m_sharedBuilder); + auto builder = &builderStorage; + + builder->setInsertBefore(entryPointFunc); + + // We will be removing any uniform parameters we run into, so we + // need to iterate the parameter list carefully to deal with + // us modifying it along the way. + // + IRParam* nextParam = nullptr; + for( IRParam* param = entryPointFunc->getFirstParam(); param; param = nextParam ) { - default: - return false; + nextParam = param->getNextParam(); - // Note: The set of cases that are considered - // varying here would need to be extended if we - // add more fine-grained resource kinds (e.g., - // if we ever add an explicit resource kind - // for geometry shader output streams). + // We expect all entry-point parameters to have layout information, + // but we will be defensive and skip parameters without the required + // information when we are in a release build. + // + auto layoutDecoration = param->findDecoration<IRLayoutDecoration>(); + SLANG_ASSERT(layoutDecoration); + if(!layoutDecoration) + continue; + auto paramLayout = as<IRVarLayout>(layoutDecoration->getLayout()); + SLANG_ASSERT(paramLayout); + if(!paramLayout) + continue; + + // A parameter that has varying input/output behavior should be left alone, + // since this pass is only supposed to apply to uniform (non-varying) + // parameters. 
+ // + if(isVaryingParameter(paramLayout)) + continue; + + auto paramType = param->getFullType(); + + builder->setInsertBefore(entryPointFunc); + auto globalParam = builder->createGlobalParam(paramType); + + param->transferDecorationsTo(globalParam); + + // We also decorate the parameter for the entry-point parameters + // so that we can find it again in downstream passes (like emit + // for CPU/CUDA) that might want to treat entry-point parameters + // different from other cases. + // + // TODO: Once we have support for multiple entry points to be emitted + // at once, we need a way to associate these per-entry-point parameters + // more closely with the original entry point. The two easiest options + // are: + // + // 1. Don't move the new aggregate parameter to the global scope + // on those targets, and instead keep it as a parameter of the + // entry point. // - // Ordinary varying input/output: - case LayoutResourceKind::VaryingInput: - case LayoutResourceKind::VaryingOutput: + // 2. Use a decoration on the entry point itself to point at the + // global parameter for its per-entry-point parameter data. // - // Ray-tracing shader input/output: - case LayoutResourceKind::CallablePayload: - case LayoutResourceKind::HitAttributes: - case LayoutResourceKind::RayPayload: - return true; + builder->addDecoration(globalParam, kIROp_EntryPointParamDecoration); + + param->replaceUsesWith(globalParam); + param->removeAndDeallocate(); } + + fixUpFuncType(entryPointFunc); } }; +void collectEntryPointUniformParams( + IRModule* module, + CollectEntryPointUniformParamsOptions const& options) +{ + CollectEntryPointUniformParams context; + context.module = module; + context.m_options = options; + context.processModule(); +} + void moveEntryPointUniformParamsToGlobalScope( IRModule* module) { |
