| Commit message (Collapse) | Author | Age |
| |
|
| |
Co-authored-by: Yong He <yhe@nvidia.com>
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
* gfx: D3D12 and VK Fence implementation.
* Fix.
* Update project files.
* Revert project file changes.
* Remove project files
Co-authored-by: Yong He <yhe@nvidia.com>
|
| |
|
|
|
|
|
| |
* gfx Mutable Root shader object implementation.
* Fix x86 build.
Co-authored-by: Yong He <yhe@nvidia.com>
|
| |
|
|
|
|
|
| |
* gfx ShaderObject interface update, getTextureAllocationInfo()
* Fix render-vk compiler warnings and errors.
Co-authored-by: Yong He <yhe@nvidia.com>
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
| |
* Added both the SharedHandle struct containing a handle and the API the handle originated from and the getSharedHandle() method to IResource, which returns a Windows system handle for the resource that can then be shared between multiple APIs (currently only fully implemented for D3D12); Added createTextureFromNativeHandle() and createBufferFromNativeHandle() to IDevice, which creates a buffer or texture resource using the provided handle (currently only fully implemented for D3D12); Added createBufferFromSharedHandle() to IDevice, which creates a BufferResource using the provided system handle (currently only fully implemented for the D3D12 to CUDA interface); Provided a proper implementation for CUDADevice::getNativeHandle(); Added several new tests testing the aforementioned implementations; Moved NativeHandle and getNativeHandle() for IBufferResource and ITextureResource up a layer into IResource and renamed to NativeResourceHandle; Modified NativeResourceHandle to be a struct containing the handle and the API it originated from and propagated these changes where appropriate
* Combined all native and shared handle representations into a unified InteropHandle struct which tracks the handle's value and source API; Modified all getNativeHandle() and getSharedHandle() variants to operate on InteropHandle and modified all affected files
* D3D12 buffers and textures are now responsible for closing their shared handles if they exist; Renamed IDevice::getNativeHandle() to getNativeDeviceHandles()
* Fixed getNativeDeviceHandles() in render-cuda to match updated method elsewhere
* Temporarily disabling existingDeviceHandleCUDA and sharedHandleD3D12ToCUDA due to currently unreproducable test failures on TC
|
| |
|
|
|
|
|
|
|
|
|
| |
* Add interface for new gfx features.
* Add cuda implementation.
* Code review fixes.
* Fix.
Co-authored-by: Yong He <yhe@nvidia.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* GFX: implement mutable shader objects.
* Revert unnecessary changes
* Revert more changes.
* Fix clang errors.
* Fix clang/gcc errors.
* Fix clang errors.
* Remove CPU test.
* Fix after merge.
* Fix after merge.
* Remove gl test
* Code review fixes.
* Fixing all vk validation errors.
* Flush test output more often.
* Fix a crash in `specializeDynamicAssociatedTypeLookup`.
* temporarily disable std-lib-serialize test to see what happens
* Fix crashes.
* Make sure cpu gfx unit tests are properly disabled on TeamCity.
* Disable cpu test.
* Fix.
* Fix cuda.
* Disable nv-ray-tracing-motion-blur
Co-authored-by: Yong He <yhe@nvidia.com>
|
| |
|
|
|
|
|
|
|
|
|
| |
* Added getNativeHandle() to TextureResource and BufferResource; Implemented getNativeHandle() in Vulkan and D3D12; Added new unit test files for the aforementioned implementation
* Added missing getNativeHandle() implementations to renderer-shared.cpp and CUDA
* Finished new getNativeHandle() unit tests for ITextureResource and IBufferResource; Modified ICommandQueue and ICommandBuffer unit tests to call QueryInterface to convert to IUnknown then back and compare resulting pointers for equality
* Unit tests updated and pass locally
* Cast m_buffer.m_buffer and m_image to uint64_t
|
| |
|
|
|
|
|
| |
* Added a getNativeHandle() method that retrieves the natively created handles; Modified RendererBase, VKDevice, D3D12Device, and DebugDevice to implement this new method
* Moved ExistingDeviceHandles out of Desc directly inside IDevice and renamed to NativeHandles; Modified calls accessing the struct accordingly in RendererBase, DebugDevice, VKDevice, and D3D12Device
* Minor cleanup changes (renames, etc.)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Experimental DXR1.0 support in gfx.
- Add `dispatchRays` command.
- Add `createRayTracingPipelineState` method to construct a D3D ray tracing state object from a linked slang program and user specified shader table.
Limitations/simplifications: no local root signature support, shader table entries contains only shader identifiers and is specified at pipeline creation time, owned by the pipeline state object.
* Root object binding for raytracing pipelines.
* `maybeSpecializePipeline` implementation for raytracing pipelines.
* Add ray-tracing-pipeline example.
* Fixes.
* Update README.md
* Update comments on the lifespan of specialized pipelines
Co-authored-by: Yong He <yhe@nvidia.com>
Co-authored-by: jsmall-nvidia <jsmall@nvidia.com>
|
| |
|
|
|
|
|
|
|
|
| |
* Minor refactor to gfx D3D12 implementation.
- Allow more flexible collection of shader stages in a shader program.
- Add `createRayTracingPipelineState` public interface. (no implementation).
* Fix Vulkan initialization.
Co-authored-by: Yong He <yhe@nvidia.com>
|
| | |
|
| |
|
|
|
|
|
| |
* Support timestamp queries in `gfx`.
* Fix tab
Co-authored-by: Yong He <yhe@nvidia.com>
|
| |
|
| |
Co-authored-by: T. Foley <tfoleyNV@users.noreply.github.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Update gfx back-ends to handle static specialization
The main goal here is to make the D3D11, D3D12 and Vulkan back-ends support static specialization of interface types in the case where the data for the type won't "fit" in the pre-allocated space for existential values. This includes all cases where the concrete type being specialized to has resources/samplers/etc., as well as any cases where its ordinary/uniform data exceeds the space available.
(Note that the CPU and CUDA targets don't need this work since they can (in theory) support arbitrary-size data in the fixed-size existential payload by using pointer indirection. Actually supporting indirection in those cases should be a distinct change)
The Slang compiler already performs layout for programs that have this kind of data that doesn't "fit," and it lays them out using an idea of "pending" type layouts. Basically, a type that contains some amount of specialized interface-type fields will produce both a "primary" type layout that just covers the data for the unspecialized case, as well as "pending" type layout that describes the layout for all the extra data needed by specialization.
When laying out a `ConstantBuffer<X>` or `ParameterBlocK<X>` ("CB" or "PB"), the front-end will try to place as much of that "pending" data into the layout of the buffer/block itself as is possible. That means that both CBs and PBs will be able to allocate trailing bytes for any ordinary data in the "pending" layout. PBs will be able to allocate any trailing resources/samplers into their layout, but for CBs they will spill out to be part of the pending layout for the buffer itself.
In order for the back-ends to properly handle pending data, they need to *either* assume the exact layout rules used by the front-end and try to reproduce them (e.g., by iterating over binding ranges and sub-objects in the exact same order that front-end layout would enumerate them), *or* they need to respect the reflection information produced by the front-end. This change takes the latter approach, trying to make only minimal assumptions about the layout rules being used. This choice is motivated by wanting to decouple the `gfx` implementation from the compiler front-end, especially insofar as this work has made me question whether the current layout rules are the best ones possible.
A common theme across all the implementations is to have a fixed-size type that can represent "binding offsets" for the chosen back-end. The offset type has fields that depend on the API-specific way bindings are indexed; e.g., for D3D11 it has offsets for CBV, SRV, UAV, and sampler bindings. This fixed-size offset type can be filled in based on Slang reflecton information, and then used to compute derived offsets with just a few add operations.
The simple offset type for each API is then extended to produce an offset type that includes both the offsets for "primary" data and also the offsets for "pending" data. Most logic that traffics in offsets doesn't have to know about this more complicated representation.
Making consistent use of these offsets required that I pretty much rewrite the logic that actually applies shader objects to the API state. Doing so might be lowering the efficiency of the system in the near term, but the increase in clarity was important for getting the work done, and it seems like it will also be important if/when we start trying to perform special-case optimizations around root and entry-point parameter setting.
While there are many API-specific differences, we can identify a repeated pattern where many steps, whether applying parameters to the pipeline stage or constructing signatures / layouts, can be broken down into three main operations on `ShaderObject`s or their layouts:
* `*AsValue()` is the core operation, and is the one used for the `ExistentialValue` case most of the time. It ignores the ordinary data in the object, and instead processes all nested binding ranges (for resources/smaplers) and sub-objects.
* `*AsConstantBuffer()` handles the `ConstntBuffer<X>` case, by dealing with the implicit buffer for ordinary data (if it is needed) and then delegates to the `*AsValue()` case.
* `*AsParameterBlock()` handles the `ParameterBlock<X>` case, by allocating/preparing/etc. any descriptor tables/sets that would be required for the current object/layout and then delegating to `*AsConstantBuffer()` to do the rest
The idea is that by having the parameter block case delegate to the constant buffer case, which delegates to the value/existential case, we can streamline a lot of the logic so that it doesn't seem quite as full of special cases.
Note: When preparing this pull request I spent a reasonable amount of time trying to clean up the D3D11 and Vulkan implementations, so they are probably the easiest to read and understand when it comes to the new code. Doing the cleanup work also helped to work out some weird corner case bugs/issues. In contrast, the D3D12 path hasn't had as much attention given to cleanliness and comments, so it really needs some attention down the line to get things into a state that is easier to understand.
* fixup: remove debugging code spotted in review
|
| |
|
| |
* `gfx` DebugCallback and debug layer.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Fixing `PseudoPtr` legalization and `gfx` lifetime issues.
* Fixing `model-viewer` example.
This change contains various fixes to bring `model-viewer` example to fully functional. These fixes include:
1. Add `spReflectionTypeLayout_getSubObjectRangeSpaceOffset` function to return the space index for a sub object referenced through a `ParameterBlock` binding.
2. Make sure `D3D12Device` specifies column major matrix order creating a Slang session.
3. Fix `platform::Window::close()` and `platform::Application::quit()`.
4. Fix memory leak during `model-viewer''s model loading.
5. Fix command buffer recording in `model-viewer`.
With these changes, model viewer can now produce an image with a gray cube. The lighting is still incorrect becuase the `gfx` shader object implementation still does not handle "pending layout" resulting from global existential parameters.
* Fix d3d12 root signature creation.
* Use row-major matrix layout in model-viewer
|
| |
|
|
|
|
|
|
|
| |
* Improve robustness of gfx lifetime management.
* fix clang error
* fix clang error
* Fix clang warning
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
| |
* Reimplement Vulkan shader objects.
This change reimplements Vulkan shader objects in the `gfx` layer so that it is no longer layered on top of the `DescriptorSet` abstraction. Since this is the last implementation that uses `DescriptorSet`, the change also removes all `DescriptorSet` related API from public `gfx` interface.
The Vulkan implementation now passes all test cases, but it still have two issues:
1. The PushConstant setting is not correct, this is because we don't seem to be able to get correct reflection data about the size of push constants for an entry-point.
2. The `shader-toy` example can't run on Vulkan, because it currently sets nullptr to `Texture` bindings, and this change doesn't properly handle setting resource to null in `ShaderObject`s yet. If we can use the `nullDescriptor` feature on vulkan, this implementation will be simple. However we still want to decide whether we want to use a Vulkan 1.2 feature for this.
* Fix up
|
| | |
|
| |
|
|
|
| |
* Swapchain resize
* Fix.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
* Refactor `gfx` to surface `CommandBuffer` interface.
* Fixes.
* Fix code review issues, and make vulkan runnable on devices without VK_EXT_extended_dynamic_states.
* Update solution files
* Move out-of-date examples to examples/experimental
Co-authored-by: Yong He <yhe@nvidia.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Shader object specialization work-in-progress
The big change here is in the `setObject()` implementations, where we now take write the witness table ID and data for the value being assigned in both the CUDA and graphics-API paths (it is possible the code could be shared...). The logic for deciding whether a value "fits" in the existential value payload should actually be correct here, since it uses the reflection data.
The other relevant change is that the logic for writing out the ordinary/uniform data for a shader object on the graphics-API path has been updated so that it only allocates the GPU buffer *after* it knows the specialized layout, and can thus allocate space for any extra parameter data that wasn't in the original layout but got added by specialization. There is some inactive code in place that tries to sketch how the implementation should handle writing the data of sub-objects for interface-type fields into the appropriate areas of the allocated buffer for a parent object, but that is stubbed out for now pending implementation of the relevant reflection information.
This change also introduces logic in the graphics-API path to create a specialized layout for a shader object on-demand (so that it will only be created after the specialization arguments are known or can be inferred). The implementation needs to treat ordinary shader objects and root shader objects differently because the Slang API handles specialization differently for ordinary types vs. `IComponentType`s.
Some notes and caveats:
* The CUDA path doesn't need to compute specialized layouts the way the graphics-API path does because layout doesn't change based on specialization for that path (just as it won't for the CPU path)
* This code just skips over the RTTI field in existential values because it seems that we currently aren't using it in generated code.
* We are completely missing the logic for recursively writing the resource ranges of sub-objects bound to interface-type fields into the descriptor set(s) of the parent object. The missing link there is reflection API support, just as it is for filling in the ordinary/uniform data. We need a way to get the binding range offset (and binding array stride) for the "pending" data of a specialized interface-type field.
* The logic for computing specialization arguments based on the shader objects bound to interface-type fields has a lot of holes. Some of the indexing math is flat-out incorrect, and it also doesn't make any attempt to handle sub-object ranges with more than one element in them. I tweaked some of the code there to make it *more* correct, but that doesn't mean it is actually correct at this point.
* The logic for computing a specialized `IComponentType` for a `ProgramVars` in the graphics-API path seems to have a lot of overlap with `maybeSpecializeProgram()`, so we should look into ways to avoid the duplication over time.
* clang error fix
|
| |
|
|
|
|
|
|
|
| |
* Explicit swapchain interface in `gfx`.
* Correctly return nullptr when `IRenderer` creation failed.
* Fix crashes on CUDA tests.
* Cleanups.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change kind of rolls together two different simplifications:
1. The `createShaderObject()` shouldn't really need to take an `IShaderObjectLayout` because it could just take the `slang::TypeLayoutReflection` instead and create the shader-object layout behind the scenes.
2. For that matter, it needn't take a `slang::TypeLayoutReflection` either, becaues it could just take a `slang::TypeReflection` and query the layout of that type behind the scenes.
The combination of these two changes means:
* `IShaderObjectLayout` is gone from the public API, as is `createShaderObjectLayout()`
* `createShaderObject()` directly takes a `slang::TypeReflection` and allocates a shader object of that type
The result is simpler and more streamlined application code.
Note that under the hood the implementation still has shader-object layouts, using the `ShaderObjectLayoutBase` class. A few locations had to change to use `RefPtr`s instead of `ComPtr`s now that the class is no longer a public COM-lite API type.
The hope is that this change makes it easier to allocate/cache layouts for things like specialized types "under the hood," as is needed to implement parameter setting for static specialization.
|
| | |
|
| | |
|
| | |
|
|
|
* COM-ify all slang-gfx interfaces.
|