| author | Tim Foley <tfoleyNV@users.noreply.github.com> | 2020-08-05 11:47:18 -0700 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2020-08-05 11:47:18 -0700 |
| commit | e713b56a63dcbf945e3e0e6d82666318795c74ff (patch) | |
| tree | 7883169c68f9516d1ebff70c5529b6f10933e1d5 /tools/gfx/d3d11 | |
| parent | 6fb2aa70a2681bffbac7e8de67e9598105389945 (diff) | |
Change the policy for entry-point uniform parameters on Vulkan (#1476)
Entry point `uniform` parameters were a feature of the original Cg and HLSL, but have not been used much in production shader code. One of our goals on Slang is to reduce the (ab)use of the global scope, so bringing entry point `uniform` parameters up to a greater level of usability is an important goal.
Some policy choices about how global vs. entry-point `uniform` parameters behave have already been made, and they shape the decisions going forward:
* For DXBC/DXIL, it makes the most sense to follow the lead of fxc/dxc, by treating entry point `uniform` parameters as a kind of syntax sugar for global shader parameters. Any parameters of "ordinary" types are bundled up into an implicit constant buffer, and all the resources (including the implicit constant buffer) are assigned `register`s just as for globals. It is up to the application to decide how to bind those parameters via a root signature (using root descriptors, root constants, descriptor tables, local vs. global root signature, etc.)
* For CPU, it makes sense to pass global vs. entry-point parameters as two different pointers, although the details of what we do for CPU are the least constrained across all current targets.
* For CUDA compute, it makes the most sense to map global shader parameters to `__constant__` global data, and entry-point `uniform` parameters to kernel parameters. This choice ensures that the signature of a kernel when translated from Slang->CUDA follows the Principle of Least Surprise, at the cost of making entry-point vs. global parameters be passed via different mechanisms.
* For OptiX ray tracing, it makes sense to expand on the precedent from CUDA compute: pass global parameters via global `__constant__` data (as is already expected by OptiX for whole-launch parameters), and pass entry-point `uniform` parameters via the "shader record." This establishes a precedent that for ray-tracing shaders, global-scope parameters map to the "global root signature" concept from DXR, while entry-point `uniform` parameters map to a "local root signature" or "shader record."
* For Vulkan ray tracing, the precedent from OptiX then argues that entry-point `uniform` parameters should map to the Vulkan "shader record" concept (and thus cannot support things like resource types).
* The remaining interesting case is what to do for non-ray-tracing shaders on Vulkan.
The dev team agrees that the most reasonable choice to make for non-ray-tracing Vulkan shaders is to map entry-point `uniform` parameters to "push constants." In particular, this makes it easy to express the case of a compute kernel with direct parameters of ordinary/value types in the way that will be implemented most efficiently.
The big picture is then that a kernel like:
```hlsl
void computeMain(uniform float someValue) { ... }
```
will map to output GLSL like:
```glsl
layout(push_constant)
uniform
{
float someValue;
} U;
void main() { ... }
```
If the user really wanted a constant-buffer binding to be created instead, they can easily change their input to make the buffer explicit:
```hlsl
struct Params { float someValue; }
void computeMain(uniform ConstantBuffer<Params> params) { ... }
```
(Forcing the user to be explicit about the desire for a buffer here creates a nice symmetry between Vulkan and CUDA: with a plain `uniform` parameter the user sets up the data in host memory and passes it to the GPU by copy, while with an explicit `ConstantBuffer` the user must allocate and set up a device-memory buffer for the data. This symmetry extends to D3D if the application chooses to map entry-point `uniform` parameters to root constants.)
This change implements logic in the "parameter binding" part of the Slang compiler to make sure that entry-point `uniform` parameters are wrapped up in a push-constant buffer rather than an ordinary constant buffer for non-ray-tracing shaders on Vulkan (and in a shader record "buffer" for the ray-tracing case).
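The per-target policy described in this message can be summarized as a small lookup. The following is only an illustrative sketch: `Target`, `ParamHome`, and `entryPointUniformHome` are hypothetical names invented here and are not types or functions from the Slang compiler.

```cpp
#include <cassert>

// Hypothetical names for illustration only; not Slang compiler types.
enum class Target { D3D, CPU, CUDA, OptiX, Vulkan };

enum class ParamHome
{
    ImplicitConstantBuffer, // D3D: registers assigned just as for globals
    SeparatePointer,        // CPU: entry-point parameters get their own pointer
    KernelParameter,        // CUDA compute: ordinary kernel parameters
    ShaderRecord,           // OptiX and Vulkan ray tracing
    PushConstant,           // Vulkan, non-ray-tracing stages
};

// Where ordinary-typed entry-point `uniform` parameters end up,
// following the policy laid out in the commit message.
inline ParamHome entryPointUniformHome(Target target, bool isRayTracingStage)
{
    switch (target)
    {
    case Target::D3D:   return ParamHome::ImplicitConstantBuffer;
    case Target::CPU:   return ParamHome::SeparatePointer;
    case Target::CUDA:  return ParamHome::KernelParameter;
    case Target::OptiX: return ParamHome::ShaderRecord;
    case Target::Vulkan:
        return isRayTracingStage ? ParamHome::ShaderRecord
                                 : ParamHome::PushConstant;
    }
    assert(false && "unknown target");
    return ParamHome::ImplicitConstantBuffer;
}
```

Only the Vulkan row depends on the stage, which is the case this change addresses.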
The majority of the actual work was in adding support for root/push constants to the test framework and the graphics API abstraction it uses. To be clear about that support:
* Root constant ranges are (perhaps confusingly) treated as a new kind of "slot" that can appear on a descriptor set. This choice ensures that the implicit numbering of registers/spaces used by the back-ends can account for these ranges correctly.
* The `TEST_INPUT` lines are extended to allow a `root_constants` case that behaves more or less like `cbuffer`.
* The CPU and CUDA paths can treat a `root_constants` input identically to a `cbuffer`. They already allocate the actual buffers based on reflection, and just use `cbuffer` as a directive that causes bytes to be copied in.
* On D3D12 and Vulkan, a descriptor set allocates a `List<char>` to hold the bytes of root constant data assigned into it, and these bytes are flushed to the command list when the table is actually bound (usually right before rendering).
* On D3D11, a descriptor set treats a root constant range more or less like a constant buffer range (with a single buffer), except that it also automatically allocates a buffer to hold the data. Assigning "root constant" data automatically copies it into that buffer.
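The D3D12/Vulkan behavior in the list above (stage the bytes on the CPU, flush them when the set is bound) might look roughly like the sketch below. It uses `std::vector<char>` in place of `List<char>`, and `FakeCommandList` and `StagedDescriptorSet` are hypothetical stand-ins rather than types from `tools/gfx`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Stand-in for an API command list. A real back-end would issue a call
// such as vkCmdPushConstants or SetGraphicsRoot32BitConstants here.
struct FakeCommandList
{
    std::vector<char> pushed;
    void pushConstants(const char* data, std::size_t size)
    {
        pushed.assign(data, data + size);
    }
};

// Hypothetical descriptor set that stages root-constant bytes on the CPU.
class StagedDescriptorSet
{
public:
    explicit StagedDescriptorSet(std::size_t rootConstantBytes)
        : m_rootConstantData(rootConstantBytes, 0)
    {}

    // Copy user data into the staging array; nothing reaches the GPU yet.
    void setRootConstants(std::size_t offset, std::size_t size, const void* data)
    {
        assert(offset <= m_rootConstantData.size());
        assert(size <= m_rootConstantData.size() - offset);
        std::memcpy(m_rootConstantData.data() + offset, data, size);
    }

    // Flush the staged bytes when the set is actually bound.
    void bind(FakeCommandList& cmd) const
    {
        if (!m_rootConstantData.empty())
            cmd.pushConstants(m_rootConstantData.data(), m_rootConstantData.size());
    }

private:
    std::vector<char> m_rootConstantData;
};
```

Deferring the flush to bind time means several `setRootConstants` calls with different offsets accumulate into one upload.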
The small number of tests that used entry-point `uniform` parameters of ordinary types were updated to use the new `root_constants` input type, and the bugs that surfaced were fixed.
A new test to confirm that entry-point `uniform` parameters map to the shader record for VK ray tracing was added.
An important but technically unrelated change is the removal of the `DescriptorSetImpl::Binding` type and related function from the Vulkan implementation of `Renderer`. That type was created to ensure that objects that are bound into a descriptor set don't get released while the descriptor set is still alive, but the implementation relied on a complicated linear search to check for existing bindings, which could create a performance issue for descriptor sets that include large arrays of descriptors. The new implementation makes use of the approach already present in the various `Renderer` implementations (including the Vulkan one) for assigning ranges in a descriptor set a flat/linear index for where their pertinent data is to be bound. As a result, the Vulkan `DescriptorSetImpl` now uses a single flat array of `RefPtr`s to track bound objects, and has no need for linear search when binding.
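As a rough illustration of the flat-index approach described above, the sketch below keeps one reference per descriptor in a single array and computes the slot from a per-range base offset. It uses `std::shared_ptr` in place of `RefPtr`, and `FlatBindingTracker` and `BoundObject` are hypothetical names, not the actual Vulkan `DescriptorSetImpl`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct BoundObject { int id; };

// Hypothetical tracker that keeps bound objects alive without any search.
class FlatBindingTracker
{
public:
    // rangeCounts[i] is the number of descriptors in range i.
    explicit FlatBindingTracker(const std::vector<std::size_t>& rangeCounts)
    {
        std::size_t total = 0;
        for (std::size_t count : rangeCounts)
        {
            m_rangeBase.push_back(total); // first flat slot of this range
            total += count;
        }
        m_bound.resize(total);
    }

    // Flat slot for (range, index); a 3-entry range placed after a
    // 5-entry range starts at slot 5.
    std::size_t slotOf(std::size_t range, std::size_t index) const
    {
        return m_rangeBase.at(range) + index;
    }

    // Binding is a single indexed store. The previous occupant of the
    // slot is released automatically when it is overwritten.
    void bind(std::size_t range, std::size_t index, std::shared_ptr<BoundObject> object)
    {
        m_bound.at(slotOf(range, index)) = std::move(object);
    }

    const std::shared_ptr<BoundObject>& get(std::size_t range, std::size_t index) const
    {
        return m_bound.at(slotOf(range, index));
    }

private:
    std::vector<std::size_t> m_rangeBase;
    std::vector<std::shared_ptr<BoundObject>> m_bound;
};
```

Binding cost is constant regardless of how many descriptors the set holds, which is the property the linear search lacked.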
Co-authored-by: Yong He <yonghe@outlook.com>
Diffstat (limited to 'tools/gfx/d3d11')
| -rw-r--r-- | tools/gfx/d3d11/render-d3d11.cpp | 166 |
1 files changed, 163 insertions, 3 deletions
diff --git a/tools/gfx/d3d11/render-d3d11.cpp b/tools/gfx/d3d11/render-d3d11.cpp
index cf2ae75e2..4eba4edaf 100644
--- a/tools/gfx/d3d11/render-d3d11.cpp
+++ b/tools/gfx/d3d11/render-d3d11.cpp
@@ -139,14 +139,66 @@ public:
     class DescriptorSetLayoutImpl : public DescriptorSetLayout
     {
     public:
+            // Each descriptor set for the D3D11 renderer stores distinct
+            // arrays for each kind of shader-visible entity D3D11 understands:
+            // shader resource views (SRVs), unordered access views (UAVs),
+            // constant buffers (CBs), and samplers.
+            //
+            // (This description will ignore compiled image/sampler pairs,
+            // since they aren't really well supported at present)
+            //
+            // Each descriptor range in an input `DescriptorSetLayout::Desc`
+            // will map to a range of entries in one of those arrays, but
+            // in general there can be multiple `DescriptorSlotType`s that
+            // map to the same `D3D11DescriptorSlotType`.
+            //
+            // Each `RangeInfo` in a D3D11 descriptor set layout represents
+            // of of the descriptor slot ranges in the original `Desc`,
+            // and stores the information that is relevant to its layout
+            // in our D3D11 implementation.
+
         struct RangeInfo
         {
+                /// The type of descriptors in the range, in D3D11 terms (SRVs, UAVs, etc.)
             D3D11DescriptorSlotType type;
+
+                /// The start index of this range in the relevant descriptor-type-specific array.
+                ///
+                /// Note: This is *not* the same as the index of the range, both because multiple
+                /// `DescriptorSlotType`s might map to the same array in the D3D11 implementation,
+                /// and also because a given range might store multiple descriptors (so a 3-texture
+                /// range that comes after a 5-texture range will have an `arrayIndex` of 5 but
+                /// a range index of 1).
+                ///
             UInt arrayIndex;
+
+                /// For the case of a combined image/sampler pair, the `arrayIndex` is an index
+                /// into the array of SRVs, and we store a separate index into the array of
+                /// samplers.
+                ///
             UInt pairedSamplerArrayIndex;
         };
         List<RangeInfo> m_ranges;
 
+            // Because D3D11 does not support root constants as they appear in
+            // D3D12 and Vulkan, we need to map root-constant ranges in the original `Desc`
+            // over to ordinary constant buffers. Each root-constant range (of whatever
+            // size) will map to a constant-buffer range of a single buffer.
+            //
+            // In order to be able to properly allocate/initialize these root constant
+            // buffers, we store additional information about them in a flattened array
+            // that only stores information for root constant ranges.
+
+        struct RootConstantRangeInfo
+        {
+                /// Index of the `RangeInfo` corresponding to this root-constant range
+            Index rangeIndex;
+
+                /// Size of the original root-constant range, in bytes.
+            UInt size;
+        };
+        List<RootConstantRangeInfo> m_rootConstantRanges;
+
         UInt m_counts[int(D3D11DescriptorSlotType::CountOf)];
     };
 
@@ -174,6 +226,13 @@ public:
             UInt index,
             ResourceView* textureView,
             SamplerState* sampler) override;
+        virtual void setRootConstants(
+            UInt range,
+            UInt offset,
+            UInt size,
+            void const* data) override;
+
+
         D3D11Renderer* m_renderer = nullptr;
         RefPtr<DescriptorSetLayoutImpl> m_layout;
 
@@ -1762,6 +1821,7 @@ Result D3D11Renderer::createDescriptorSetLayout(const DescriptorSetLayout::Desc&
 
         DescriptorSetLayoutImpl::RangeInfo rangeInfo;
 
+        UInt slotCount = rangeDesc.count;
         switch(rangeDesc.type)
         {
         default:
@@ -1776,6 +1836,11 @@ Result D3D11Renderer::createDescriptorSetLayout(const DescriptorSetLayout::Desc&
             rangeInfo.type = D3D11DescriptorSlotType::CombinedTextureSampler;
             break;
 
+        case DescriptorSlotType::RootConstant:
+            // A root-constant range will be treated as if it were
+            // a constant-buffer range with a single buffer in it.
+            //
+            slotCount = 1;
         case DescriptorSlotType::UniformBuffer:
         case DescriptorSlotType::DynamicUniformBuffer:
             rangeInfo.type = D3D11DescriptorSlotType::ConstantBuffer;
@@ -1803,17 +1868,31 @@ Result D3D11Renderer::createDescriptorSetLayout(const DescriptorSetLayout::Desc&
             rangeInfo.arrayIndex = counts[srvTypeIndex];
             rangeInfo.pairedSamplerArrayIndex = counts[samplerTypeIndex];
 
-            counts[srvTypeIndex] += rangeDesc.count;
-            counts[samplerTypeIndex] += rangeDesc.count;
+            counts[srvTypeIndex] += slotCount;
+            counts[samplerTypeIndex] += slotCount;
         }
         else
         {
             auto typeIndex = int(rangeInfo.type);
             rangeInfo.arrayIndex = counts[typeIndex];
 
-            counts[typeIndex] += rangeDesc.count;
+            counts[typeIndex] += slotCount;
         }
+        Index rangeIndex = descriptorSetLayoutImpl->m_ranges.getCount();
         descriptorSetLayoutImpl->m_ranges.add(rangeInfo);
+
+        if(rangeDesc.type == DescriptorSlotType::RootConstant)
+        {
+            // If the range represents a root constant range, then
+            // we need to also store the information we will need when
+            // allocating a constant buffer to provide backing storage
+            // for the range.
+            //
+            DescriptorSetLayoutImpl::RootConstantRangeInfo rootConstantRangeInfo;
+            rootConstantRangeInfo.rangeIndex = rangeIndex;
+            rootConstantRangeInfo.size = rangeDesc.count;
+            descriptorSetLayoutImpl->m_rootConstantRanges.add(rootConstantRangeInfo);
+        }
     }
 
     for(int ii = 0; ii < int(D3D11DescriptorSlotType::CountOf); ++ii)
@@ -1860,12 +1939,55 @@ Result D3D11Renderer::createDescriptorSet(DescriptorSetLayout* layout, Descripto
 
     RefPtr<DescriptorSetImpl> descriptorSetImpl = new DescriptorSetImpl();
 
+    descriptorSetImpl->m_renderer = this;
     descriptorSetImpl->m_layout = layoutImpl;
     descriptorSetImpl->m_cbs .setCount(layoutImpl->m_counts[int(D3D11DescriptorSlotType::ConstantBuffer)]);
     descriptorSetImpl->m_srvs .setCount(layoutImpl->m_counts[int(D3D11DescriptorSlotType::ShaderResourceView)]);
     descriptorSetImpl->m_uavs .setCount(layoutImpl->m_counts[int(D3D11DescriptorSlotType::UnorderedAccessView)]);
     descriptorSetImpl->m_samplers.setCount(layoutImpl->m_counts[int(D3D11DescriptorSlotType::Sampler)]);
 
+    // If the layout includes any root constant ranges, then
+    // we will need to allocate a constant buffer for each
+    // range to provide "backing storage" for its data.
+    //
+    for(auto rootConstantRange : layoutImpl->m_rootConstantRanges)
+    {
+        // The root constant range will refer to a descriptor slot
+        // range that represents a range with a single constant
+        // buffer in it. We need to grab that range so that we
+        // know what constant-buffer bindign slot to fill in.
+        //
+        auto rangeIndex = rootConstantRange.rangeIndex;
+        auto bufferRange = layoutImpl->m_ranges[rangeIndex];
+
+        // We will allocate the constant buffer that provides
+        // backing storage directly using D3D11 API calls,
+        // rather than allocate it as a buffer resource.
+        //
+        // TODO: We could revisit that decision if allocating
+        // a buffer resource proves easier down the line.
+
+        // Note: A D3D11 constant buffer must be a multiple of 16 bytes
+        // in size, so we will round up the allocation size to match
+        // the requirement.
+        //
+        UINT size = (UINT) rootConstantRange.size;
+        size = (size + 15) & ~15;
+
+        D3D11_BUFFER_DESC bufferDesc;
+        bufferDesc.BindFlags = D3D11_BIND_CONSTANT_BUFFER;
+        bufferDesc.ByteWidth = size;
+        bufferDesc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
+        bufferDesc.MiscFlags = 0;
+        bufferDesc.StructureByteStride = 0;
+        bufferDesc.Usage = D3D11_USAGE_DYNAMIC;
+
+        Slang::ComPtr<ID3D11Buffer> buffer;
+        SLANG_RETURN_ON_FAIL(m_device->CreateBuffer(&bufferDesc, nullptr, buffer.writeRef()));
+
+        descriptorSetImpl->m_cbs[bufferRange.arrayIndex] = buffer;
+    }
+
     *outDescriptorSet = descriptorSetImpl.detach();
     return SLANG_OK;
 }
@@ -2277,6 +2399,44 @@ void D3D11Renderer::DescriptorSetImpl::setCombinedTextureSampler(
     m_srvs[rangeInfo.pairedSamplerArrayIndex + index] = srvImpl->m_srv;
 }
 
+void D3D11Renderer::DescriptorSetImpl::setRootConstants(
+    UInt range,
+    UInt offset,
+    UInt size,
+    void const* data)
+{
+    // The `range` parameter represents the index of a descriptor
+    // slot range in the layout of this descriptor set.
+    //
+    // A root constant range will have been translated into
+    // a constnat buffer range at creation time for the layout.
+    //
+    auto& rangeInfo = m_layout->m_ranges[range];
+    assert(rangeInfo.type == D3D11DescriptorSlotType::ConstantBuffer);
+
+    // At the time the descriptor set was allocated, a
+    // constant buffer will have been created and bound
+    // into `m_cbs` to provide backing storage for the
+    // root constant range.
+    //
+    auto dxBuffer = m_cbs[rangeInfo.arrayIndex];
+    auto dxContext = m_renderer->m_immediateContext;
+
+    // Once we have the buffer that provides backing
+    // storage we simply need to map it and write
+    // the user-provided data into it.
+    //
+    D3D11_MAPPED_SUBRESOURCE mapped;
+    HRESULT hr = dxContext->Map(dxBuffer, 0, D3D11_MAP_WRITE_NO_OVERWRITE, 0, &mapped);
+    if( FAILED(hr) )
+    {
+        SLANG_ASSERT(!"failed to map backing storage for root constant range");
+        return;
+    }
+    memcpy((char*)mapped.pData + offset, data, size);
+    dxContext->Unmap(dxBuffer, 0);
+}
+
 void D3D11Renderer::setDescriptorSet(PipelineType pipelineType, PipelineLayout* layout, UInt index, DescriptorSet* descriptorSet)
 {
     auto pipelineLayoutImpl = (PipelineLayoutImpl*)layout;
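Two small rules in the patch above are easy to check in isolation: a root-constant range occupies exactly one constant-buffer slot regardless of its byte size, and its backing buffer is rounded up to a multiple of 16 bytes. The following standalone sketch restates both; `RangeKind`, `constantBufferSlots`, and `roundUpTo16` are hypothetical helper names and do not appear in `render-d3d11.cpp`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, reduced version of the slot types relevant here.
enum class RangeKind { UniformBuffer, RootConstant };

// Number of constant-buffer slots a range consumes. For a root-constant
// range, `count` is a size in bytes, but the range still maps to one buffer.
constexpr std::uint32_t constantBufferSlots(RangeKind kind, std::uint32_t count)
{
    return kind == RangeKind::RootConstant ? 1u : count;
}

// Round a byte size up to the next multiple of 16, using the same
// bit trick as the patch. A size of zero stays zero and is not
// handled specially in this sketch.
constexpr std::uint32_t roundUpTo16(std::uint32_t size)
{
    return (size + 15u) & ~15u;
}

static_assert(roundUpTo16(4) == 16, "small sizes round up to one 16-byte unit");
static_assert(roundUpTo16(16) == 16, "exact multiples are unchanged");
static_assert(roundUpTo16(17) == 32, "one byte over rounds to the next unit");
static_assert(constantBufferSlots(RangeKind::RootConstant, 12) == 1,
              "a 12-byte root-constant range uses a single buffer slot");
```

So a single `float` entry-point parameter (4 bytes) would end up with a 16-byte backing buffer in one constant-buffer slot.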
