Shader object specialization work-in-progress (#1728)

* Shader object specialization work-in-progress The big change here is in the `setObject()` implementations, where we now take write the witness table ID and data for the value being assigned in both the CUDA and graphics-API paths (it is possible the code could be shared...). The logic for deciding whether a value "fits" in the existential value payload should actually be correct here, since it uses the reflection data. The other relevant change is that the logic for writing out the ordinary/uniform data for a shader object on the graphics-API path has been updated so that it only allocates the GPU buffer *after* it knows the specialized layout, and can thus allocate space for any extra parameter data that wasn't in the original layout but got added by specialization. There is some inactive code in place that tries to sketch how the implementation should handle writing the data of sub-objects for interface-type fields into the appropriate areas of the allocated buffer for a parent object, but that is stubbed out for now pending implementation of the relevant reflection information. This change also introduces logic in the graphics-API path to create a specialized layout for a shader object on-demand (so that it will only be created after the specialization arguments are known or can be inferred). The implementation needs to treat ordinary shader objects and root shader objects differently because the Slang API handles specialization differently for ordinary types vs. `IComponentType`s. Some notes and caveats: * The CUDA path doesn't need to compute specialized layouts the way the graphics-API path does because layout doesn't change based on specialization for that path (just as it won't for the CPU path) * This code just skips over the RTTI field in existential values because it seems that we currently aren't using it in generated code. * We are completely missing the logic for recursively writing the resource ranges of sub-objects bound to interface-type fields into the descriptor set(s) of the parent object. The missing link there is reflection API support, just as it is for filling in the ordinary/uniform data. We need a way to get the binding range offset (and binding array stride) for the "pending" data of a specialized interface-type field. * The logic for computing specialization arguments based on the shader objects bound to interface-type fields has a lot of holes. Some of the indexing math is flat-out incorrect, and it also doesn't make any attempt to handle sub-object ranges with more than one element in them. I tweaked some of the code there to make it *more* correct, but that doesn't mean it is actually correct at this point. * The logic for computing a specialized `IComponentType` for a `ProgramVars` in the graphics-API path seems to have a lot of overlap with `maybeSpecializeProgram()`, so we should look into ways to avoid the duplication over time. * clang error fix
author: Tim Foley <tfoleyNV@users.noreply.github.com> 2021-02-26 09:43:03 -0800
committer: GitHub <noreply@github.com> 2021-02-26 09:43:03 -0800
commit: b3501add6c4dda798acf7b84b71b638d8d0b7898 (patch)
tree: 0bb3fdc542f3e828a2526e9937060e9152e91b51 /tools/gfx/cuda/render-cuda.cpp
parent: af63ee4e4df8a4a7a8eead72780222eaeb746f34 (diff)
1 files changed, 98 insertions, 10 deletions
diff --git a/tools/gfx/cuda/render-cuda.cpp b/tools/gfx/cuda/render-cuda.cpp
index eabb2d301..a32bd2d03 100644
--- a/tools/gfx/cuda/render-cuda.cpp
+++ b/tools/gfx/cuda/render-cuda.cpp
@@ -514,28 +514,116 @@ public:
         setObject(ShaderOffset const& offset, IShaderObject* object)
     {
         auto layout = getLayout();
-        SLANG_ASSERT(offset.bindingRangeIndex >= 0);
-        SLANG_ASSERT(offset.bindingRangeIndex < layout->m_bindingRanges.getCount());
-        auto& bindingRange = layout->m_bindingRanges[offset.bindingRangeIndex];
+
+        auto bindingRangeIndex = offset.bindingRangeIndex;
+        SLANG_ASSERT(bindingRangeIndex >= 0);
+        SLANG_ASSERT(bindingRangeIndex < layout->m_bindingRanges.getCount());
+
+        auto& bindingRange = layout->m_bindingRanges[bindingRangeIndex];
 
         auto subObjectIndex = bindingRange.baseIndex + offset.bindingArrayIndex;
-        auto cudaObject = dynamic_cast<CUDAShaderObject*>(object);
+        auto subObject = dynamic_cast<CUDAShaderObject*>(object);
         if (subObjectIndex >= objects.getCount())
             objects.setCount(subObjectIndex + 1);
-        objects[subObjectIndex] = cudaObject;
+
+        // TODO: We should really not need to retain the objects here
+        objects[subObjectIndex] = subObject;
 
         switch( bindingRange.bindingType )
         {
         default:
-            SLANG_RETURN_ON_FAIL(setData(offset, &cudaObject->bufferResource->m_cudaMemory, sizeof(void*)));
+            SLANG_RETURN_ON_FAIL(setData(offset, &subObject->bufferResource->m_cudaMemory, sizeof(void*)));
             break;
 
+        // If the range being assigned into represents an interface/existential-type leaf field,
+        // then we need to consider how the `object` being assigned here affects specialization.
+        // We may also need to assign some data from the sub-object into the ordinary data
+        // buffer for the parent object.
+        //
         case slang::BindingType::ExistentialValue:
-            // TODO: handle the "does it fit" logic
             {
-                auto valueSize = cudaObject->m_layout->getElementTypeLayout()->getSize();
-                auto valueOffset = 16;
-                SLANG_RETURN_ON_FAIL(setDeviceData(offset.uniformOffset + valueOffset, cudaObject->getBuffer(), valueSize));
+                auto renderer = getRenderer();
+
+                ComPtr<slang::ISession> slangSession;
+                SLANG_RETURN_ON_FAIL(renderer->getSlangSession(slangSession.writeRef()));
+
+                // A leaf field of interface type is laid out inside of the parent object
+                // as a tuple of `(RTTI, WitnessTable, Payload)`. The layout of these fields
+                // is a contract between the compiler and any runtime system, so we will
+                // need to rely on details of the binary layout.
+
+                // We start by querying the layout/type of the concrete value that the application
+                // is trying to store into the field, and also the layout/type of the leaf
+                // existential-type field itself.
+                //
+                auto concreteTypeLayout = subObject->getElementTypeLayout();
+                auto concreteType = concreteTypeLayout->getType();
+                //
+                auto existentialTypeLayout = layout->getElementTypeLayout()->getBindingRangeLeafTypeLayout(bindingRangeIndex);
+                auto existentialType = existentialTypeLayout->getType();
+
+                // The first field of the tuple (offset zero) is the run-time type information (RTTI)
+                // ID for the concrete type being stored into the field.
+                //
+                // TODO: We need to be able to gather the RTTI type ID from `object` and then
+                // use `setData(offset, &TypeID, sizeof(TypeID))`.
+
+                // The second field of the tuple (offset 8) is the ID of the "witness" for the
+                // conformance of the concrete type to the interface used by this field.
+                //
+                auto witnessTableOffset = offset;
+                witnessTableOffset.uniformOffset += 8;
+                //
+                // Conformances of a type to an interface are computed and then stored by the
+                // Slang runtime, so we can look up the ID for this particular conformance (which
+                // will create it on demand).
+                //
+                // Note: If the type doesn't actually conform to the required interface for
+                // this sub-object range, then this is the point where we will detect that
+                // fact and error out.
+                //
+                uint32_t conformanceID = 0xFFFFFFFF;
+                SLANG_RETURN_ON_FAIL(slangSession->getTypeConformanceWitnessSequentialID(
+                    concreteType, existentialType, &conformanceID));
+                //
+                // Once we have the conformance ID, then we can write it into the object
+                // at the required offset.
+                //
+                SLANG_RETURN_ON_FAIL(setData(witnessTableOffset, &conformanceID, sizeof(conformanceID)));
+
+                // The third field of the tuple (offset 16) is the "payload" that is supposed to
+                // hold the data for a value of the given concrete type.
+                //
+                auto payloadOffset = offset;
+                payloadOffset.uniformOffset += 16;
+
+                // There are two cases we need to consider here for how the payload might be used:
+                //
+                // * If the concrete type of the value being bound is one that can "fit" into the
+                //   available payload space,  then it should be stored in the payload.
+                //
+                // * If the concrete type of the value cannot fit in the payload space, then it
+                //   will need to be stored somewhere else.
+                //
+                if(_doesValueFitInExistentialPayload(concreteTypeLayout, existentialTypeLayout))
+                {
+                    // If the value can fit in the payload area, then we will go ahead and copy
+                    // its bytes into that area.
+                    //
+                    auto valueSize = concreteTypeLayout->getSize();
+                    SLANG_RETURN_ON_FAIL(setDeviceData(payloadOffset.uniformOffset, subObject->getBuffer(), valueSize));
+                }
+                else
+                {
+                    // If the value cannot fit in the payload area, then we will pass a pointer
+                    // to the sub-object instead.
+                    //
+                    // Note: The Slang compiler does not currently emit code that handles the
+                    // pointer case, but that is the expected implementation for values
+                    // that do not fit into the fixed-size payload.
+                    //
+                    SLANG_RETURN_ON_FAIL(setData(payloadOffset, &subObject->bufferResource->m_cudaMemory, sizeof(void*)));
+                }
             }
             break;
         }
author	Tim Foley <tfoleyNV@users.noreply.github.com>	2021-02-26 09:43:03 -0800
committer	GitHub <noreply@github.com>	2021-02-26 09:43:03 -0800
commit	b3501add6c4dda798acf7b84b71b638d8d0b7898 (patch)
tree	0bb3fdc542f3e828a2526e9937060e9152e91b51 /tools/gfx/cuda/render-cuda.cpp
parent	af63ee4e4df8a4a7a8eead72780222eaeb746f34 (diff)