| Age | Commit message (Collapse) | Author |
|
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
* Added getNativeHandle() to TextureResource and BufferResource; Implemented getNativeHandle() in Vulkan and D3D12; Added new unit test files for the aforementioned implementation
* Added missing getNativeHandle() implementations to renderer-shared.cpp and CUDA
* Finished new getNativeHandle() unit tests for ITextureResource and IBufferResource; Modified ICommandQueue and ICommandBuffer unit tests to call QueryInterface to convert to IUnknown then back and compare resulting pointers for equality
* Unit tests updated and pass locally
* Cast m_buffer.m_buffer and m_image to uint64_t
|
|
* Added support for getting command buffer and command queue handles to ICommandBuffer and ICommandQueue; D3D12Device, VkDevice, and DebugDevice modifieid to implement this new functionality; immediate-renderer-base.cpp also modified to implement the new functions
* Removed excess boilerplate
* Changed readRef() to get() in D3D12 getNativeHandle() implementation for ICommandBuffer and ICommandQueue
* Added unit tests for new getNativeHandle() implementations, unfinished
* Queue test added; Minor cleanup changes
* getBufferHandleTestImpl() now closes the command buffer before returning
* Added getNativeHandle() implementations to CUDADevice
* Added comment clarifying that the Vulkan check is checking for a null handle, which is defined to be 0
|
|
* Resource barrier API and VK implementation
* Stub implementations
* Handle VK Acceleration Structure flag
* Add a couple more cases to pipeline barrier stages
|
|
* Update VS projects to 2019.
* Empty commit to trigger build
* Implement gfx inline ray tracing on D3D12.
|
|
|
|
|
|
* Support timestamp queries in `gfx`.
* Fix tab
Co-authored-by: Yong He <yhe@nvidia.com>
|
|
- Fix emitting `StructuredBuffer<ISomething>::Load`, which triggers emitting for `IROp_WrapExistential` that is previously unhandled.
- Fix cuda layout around vectors, they should be aligned to 1,2,4,8,16 bytes instead of just using element type's alignment. That means `float4` has alignment of 16 instead of 4.
- Fix `SLANG_CUDA_HANDLE_ERROR` macro definition.
- Fix navis sometimes fail to find `Slang::kIROp_*` enum values when debugging external projects.
Co-authored-by: Yong He <yhe@nvidia.com>
Co-authored-by: jsmall-nvidia <jsmall@nvidia.com>
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Re-enable CUDA RWTexture tests.
Re-enable RWTexture1D test
Make sure tests have only single mip for RWTexture (required for CUDA)
* Fix issue with reading CUDA surface.
Re-enable working CUDA RWTextureTest.
Enable 1D case.
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Fixes around Float16. Incorrect calculation of 'elementSize'.
|
|
|
|
Co-authored-by: T. Foley <tfoleyNV@users.noreply.github.com>
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Split out StringEscapeUtil.
* Added StringEscapeUtil.
* Fix typo in unix quoting type.
* Small comment improvements.
* Try to fix linux linking issue.
* Fix typo.
* Attempt to fix linux link issue.
* Update VS proj even though nothing really changed.
* Fix another typo issue.
* Fix for windows issue.
Fixed bug.
* Make separate Utils for escaping.
* Fix typo.
* Split out into StringEscapeHandler.
* Windows shell does handle removing quotes (so remove code to remove them).
* Handle unescaping if not initiating using the shell.
* Slight improvement around shell like decoding.
* Simplify command extraction.
* Add shared-library category type.
* Fix bug in command extraction.
* Typo in transcendental category.
* Enable unit-test on in smoke test category.
* Make parsing failing output as a failing test.
* Fixes for transcendental tests. Disable tests that do not work.
* Changed category parsing.
* Removed the TestResult parameter from _gatherTestsForFile.
Made testsList only output.
* Remove testing if all tests were disabled.
* Make args of CommandLine always unescaped.
* Add category.
* Don't need escaping on unix/linux.
* Remove some no longer used functions.
* Add requireSMVersion to CUDAExtensionTracker.
* half-calc.slang now works for CUDA.
* bit-cast-16-bit works on CUDA.
* WIP handling of CUDA vector<half> types.
* Half swizzle CUDA.
* Half vector test.
* Fix swizzle half bug.
* Fix compilation issue with narrowing to Index.
* Add unary ops.
* Add some vector scalar maths ops.
* Add half vector conversions for CUDA.
* Fix erroneous comment.
* Support for half comparisons.
* First pass test for half compare.
* Fix bug in CUDA specialized emit control.
Updated tests to have pre and post inc/dec.
* Removed unneeded parts of the cuda prelude.
* Half structured buffer works on CUDA.
* Added name lookup for Gfx::Format
* Support half texture type in test system.
* Test for half reading on CUDA.
* Add half formats to Vk and D3D utils.
* Fix getAt for CUDA - where there might not be a .x member in a vector.
|
|
* `gfx` DebugCallback and debug layer.
|
|
* Fix `model-viewer` crash when using Vulkan.
Fixing an issue in shader object layout creation for to make sure a correct descriptor set layout is calculated for types that need an implicit constant buffer.
* Fix formatting.
* Fixes.
* Fix memory leak in vulkan.
* Remove resource `Usage` from `gfx` interface.
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* WIP CUDA half support.
* Working support for half on CUDA - requires cuda_fp16.h and associated files can be found.
* Fix for win32 for unused funcs.
* Fix for Clang.
* Hack to disable unused local function warning.
|
|
* Improve robustness of gfx lifetime management.
* fix clang error
* fix clang error
* Fix clang warning
|
|
|
|
|
|
* Reimplement Vulkan shader objects.
This change reimplements Vulkan shader objects in the `gfx` layer so that it is no longer layered on top of the `DescriptorSet` abstraction. Since this is the last implementation that uses `DescriptorSet`, the change also removes all `DescriptorSet` related API from public `gfx` interface.
The Vulkan implementation now passes all test cases, but it still have two issues:
1. The PushConstant setting is not correct, this is because we don't seem to be able to get correct reflection data about the size of push constants for an entry-point.
2. The `shader-toy` example can't run on Vulkan, because it currently sets nullptr to `Texture` bindings, and this change doesn't properly handle setting resource to null in `ShaderObject`s yet. If we can use the `nullDescriptor` feature on vulkan, this implementation will be simple. However we still want to decide whether we want to use a Vulkan 1.2 feature for this.
* Fix up
|
|
* Remove old code paths from render-test
Historically, the `render-test` tool was using three different code paths:
* One based on `gfx` and manual (non-reflection-based) parameter setting, used for OpenGL, D3D11, D3D12, and Vulkan
* One for CPU that used reflection-based parameter setting but shared no code with the first
* One for CUDA that used reflection-based parameter setting and shared some, but not all, code with the CPU path
Recently we've updated `render-test` to include a fourth option:
* Using `gfx` and the "shader object" system it exposes for a unified reflection-based parameter-setting system taht works across OpenGL, D3D11, D3D12, Vulkan, CUDA, and CPU
This change removes the first three options and leaves only the single unified path. A sa result, a bunch of code in `render-test` is no longer needed, and the codebase no longer relies on things like the `IDescriptorSet`-related APIs in `gfx`.
Several existing tests had to be disabled to make this change possible. Those tests will need to be audited and either re-enabled once we fix issues in the shader object system, or permanently removed if they don't test stuff we intend to support in the long run (e.g., global-scope type parameters, which aren't a clear necessity).
* fixup: CUDA detection logic
|
|
|
|
* Add a CPU renderer implementation
This change adds a CPU back-end to `gfx` and ensures that most of our existing CPU tests pass when using it.
Detailed notes:
* Most of the CPU renderer implementation is copy-pasted from the CUDA case, so they share a lot of similar logic
* The main addition to the CPU renderer is a semi-complete implementation of host-memory textures. The logic here handles all the main shapes (Buffer, 1D, 2D, 3D, Cube) and all the currently-supported `Format`s that are sample-able as-is (no D24S8). The implementation is not intended to be fast, and it currently only does nearest-neighbor sampling, but otherwise it tries to avoid cutting too many corners and should be ar reasonable starting point for a more complete (but not performance-oriented) implementation.
* Refactored the CPU prelude `IRWTexture` interface to inherit from `ITexture`, since in most cases a single type will end up implementing both. It might be worth it to collapse it all down to a single interface later.
* Changed the CPU prelude `ITexture`/`IRWTexture` interface so that it takes both a pointer *and* a size for output arguments. This change seems necessary to allow a shader variable declared as a `Texture2D<float>` to fetch a single `float` when the underlying texture might be using RGBA32F.
* Added to the `IComponentType` public API so that we can query a "host callable" for an entry point and not just a binary.
* Turned off the `-shaderobj` flag on two tests that weren't yet compatible with shader objects but still had the flag left in on the path (since previously the CPU path always used the non-`gfx` non-shader-object logic anyway)
* Disabled one test (`dynamic-dispatch-11`) that relied on the `ConstantBuffer<IInterface>` idiom that we know we are planning to chagne soon anyway.
* Made a few changes to the CUDA path to bring it into line with what I added for the CPU path. These were mostly bug fixes around indexing logic for sub-objects and resources.
* fixup
|
|
* Change representation of initial data for textures
Before this change, initial data for a texture has been provided with the `ITextureResource::Data` type, where a call to `IDevice::createTexture()` would take zero or one `Data` and, if present, use it to initialize all the subresources of a texture.
The organization of `Data` was not actually quite how its own documentation comment described it (the implementations didn't agree with the comment), and while it aggressively factored out redundancies (e.g., only storing the stride for each mip level once, instead of once per subresource for large arrays), the result was that setting up a `Data` correcty was a bit confusing.
This change makes the initial data for a texture using a `SubresourceData` type that is almost identical to what D3D11 uses, so that developers are more likely to be comfortable filling it in. All of the existing implementations were easily adapted to use the new type, so it seems like a net win.
Note: Both Vulkan and D3D11 do away with the idea of initializing a texture with data as part of allocating it, and we might eventually want to do the same given the complexity that this system entails. The main reason to preserve this detail is for better compatibility with D3D11, where immutable textures/buffers need to have their data specified at creation time. It seems good to preserve the ability to have immutable resources on target APIs where this distinction could affect performance (e.g., immutable resources do not need state/transition tracking on APIs like D3D11).
* fixup: CUDA
|
|
* Swapchain resize
* Fix.
|
|
* Refactor `gfx` to surface `CommandBuffer` interface.
* Fixes.
* Fix code review issues, and make vulkan runnable on devices without VK_EXT_extended_dynamic_states.
* Update solution files
* Move out-of-date examples to examples/experimental
Co-authored-by: Yong He <yhe@nvidia.com>
|
|
* Shader object specialization work-in-progress
The big change here is in the `setObject()` implementations, where we now take write the witness table ID and data for the value being assigned in both the CUDA and graphics-API paths (it is possible the code could be shared...). The logic for deciding whether a value "fits" in the existential value payload should actually be correct here, since it uses the reflection data.
The other relevant change is that the logic for writing out the ordinary/uniform data for a shader object on the graphics-API path has been updated so that it only allocates the GPU buffer *after* it knows the specialized layout, and can thus allocate space for any extra parameter data that wasn't in the original layout but got added by specialization. There is some inactive code in place that tries to sketch how the implementation should handle writing the data of sub-objects for interface-type fields into the appropriate areas of the allocated buffer for a parent object, but that is stubbed out for now pending implementation of the relevant reflection information.
This change also introduces logic in the graphics-API path to create a specialized layout for a shader object on-demand (so that it will only be created after the specialization arguments are known or can be inferred). The implementation needs to treat ordinary shader objects and root shader objects differently because the Slang API handles specialization differently for ordinary types vs. `IComponentType`s.
Some notes and caveats:
* The CUDA path doesn't need to compute specialized layouts the way the graphics-API path does because layout doesn't change based on specialization for that path (just as it won't for the CPU path)
* This code just skips over the RTTI field in existential values because it seems that we currently aren't using it in generated code.
* We are completely missing the logic for recursively writing the resource ranges of sub-objects bound to interface-type fields into the descriptor set(s) of the parent object. The missing link there is reflection API support, just as it is for filling in the ordinary/uniform data. We need a way to get the binding range offset (and binding array stride) for the "pending" data of a specialized interface-type field.
* The logic for computing specialization arguments based on the shader objects bound to interface-type fields has a lot of holes. Some of the indexing math is flat-out incorrect, and it also doesn't make any attempt to handle sub-object ranges with more than one element in them. I tweaked some of the code there to make it *more* correct, but that doesn't mean it is actually correct at this point.
* The logic for computing a specialized `IComponentType` for a `ProgramVars` in the graphics-API path seems to have a lot of overlap with `maybeSpecializeProgram()`, so we should look into ways to avoid the duplication over time.
* clang error fix
|
|
* Explicit swapchain interface in `gfx`.
* Correctly return nullptr when `IRenderer` creation failed.
* Fix crashes on CUDA tests.
* Cleanups.
|
|
* Make gfx library visible to external user.
* Fixup
|
|
This change kind of rolls together two different simplifications:
1. The `createShaderObject()` shouldn't really need to take an `IShaderObjectLayout` because it could just take the `slang::TypeLayoutReflection` instead and create the shader-object layout behind the scenes.
2. For that matter, it needn't take a `slang::TypeLayoutReflection` either, becaues it could just take a `slang::TypeReflection` and query the layout of that type behind the scenes.
The combination of these two changes means:
* `IShaderObjectLayout` is gone from the public API, as is `createShaderObjectLayout()`
* `createShaderObject()` directly takes a `slang::TypeReflection` and allocates a shader object of that type
The result is simpler and more streamlined application code.
Note that under the hood the implementation still has shader-object layouts, using the `ShaderObjectLayoutBase` class. A few locations had to change to use `RefPtr`s instead of `ComPtr`s now that the class is no longer a public COM-lite API type.
The hope is that this change makes it easier to allocate/cache layouts for things like specialized types "under the hood," as is needed to implement parameter setting for static specialization.
|
|
The purpose of these changes is to make the `shader-object` example work correctly on CUDA.
Originally I had tried to add changes to the "flat" reflection information so that it introduced descriptor ranges to match the binding ranges it added for interface/existential-type fields. This approach helped the CUDA code that was using that information to try and compute uniform offsets for those fields, but it broke most of the other renderer back-ends. Instead, I removed the relevant asserts from `CUDAShaderObject::setObject()`.
Note taht there are leftover changes from my edits to the flat reflection information, around how it handles "leaf" fields that consume multiple resource kinds. I believe that those changes are, on balance, "more correct" now than they were before, so I decided to leave them in.
The other major fix here is to specialize the `CUDAShaderObject::setObject()` logic to handle the case of setting a shader object for a parameter that has interface type instead of a constant-buffer or parameter block. Mostly I just copy bytes from the child object into the parent object. There are a few caveats, though:
* I am not writing the RTTI or witness-table information, so dynamic dispatch won't work.
* I am assuming a hard-coded offset of 16 bytes for the any-value, which will work for now but is a bit too "magical" and might also break once we support conjunctions of interfaces with dynamic dispatch
* I am assuming that the child value to be writen into the field will "fit" into the any-value area. We need some way to determine whether or not things fit dynamically (ideally using the reflection data), and adapt accordingly.
* I had to add another method on the base CUDA shader object type to handle setting data using a device-memory pointr instead of a host-memory pointer
* There's not a lot we can do about it, but in the case of assigning an ordinary `CUDAShaderObject` into an interface-type field of a `CUDAEntryPointShaderObject` we end up needing to perform a device->host memory copy, because the bytes of the value will have already been written to GPU memory, but need to be in GPU memory for the dispatch call.
* The implementation I'm using here basically assumes that the child shader object must have been finalized before it gets plugged into the parent shader object. We haven't yet made a policy decision about that bit.
|
|
|
|
|
|
|
|
|
|
* Make `gfx` compile to a DLL.
* Fix cuda
* Fix cuda build
* Bug gl screen capture bug.
|
|
This change converts a large number of our existing tests to use the `ShaderObject` support that was added to the `gfx` layer.
In many cases, tests were just updated to pass `-shaderobj` and the result Just Worked.
In other cases, a `name` attribute had to be added to one or more `TEST_INPUT` lines.
For tests that did not work with shader objects "out of the box," I spent a little bit of time trying to get them work, but fell back to letting those tests run in the older mode.
Future changes to the infrastructure will be needed to get those additional tests working in the new path.
Along with the changes to test files, the following implementation changes were made to get additional tests working:
* Because the shader object mode uses explicit register bindings (from reflection), the hacky logic that was offseting `u` registers for D3D12 based on the number of render targets gets disabled (by another hack).
* The "flat" reflection information coming from Slang was not correctly reporting "binding ranges" for things that consumed only uniform data (which would be everything on CUDA/CPU), so it was refactored to properly include binding ranges for anything where the type of the field/variable implied a binding range should be created (even if the `LayoutResourceKind` was `::Uniform`).
* A few fixes were made to the CUDA implementation of `Renderer`, in order to get additional tests up and running. Most of these changes had to do with texture bindings, which hadn't really been tested previously.
In addition, a few changes were made that were attempts at getting more tests working, but didn't actually help. These could be dropped if requested:
* As a quality-of-life feature (not being used) the `object` style of `TEST_INPUT` line is upgraded to support inferring the type to use from the type of the input being set.
* Any `object` shader input lines get ignored in non-shader-object mode.
|
|
* COM-ify all slang-gfx interfaces.
|
|
* Make `gfx::Renderer` a COM interface.
This is a first step towards making the `gfx` library expose a COM compatible DLL interface. Remaining classes will come as separate PRs.
* Fixup project files
* Fix calling conventions
* Make gfx::create*Renderer() functions increase ref count by 1
* Make renderer createFunc return via out parameter
|
|
Co-authored-by: Yong He <yhe@nvidia.com>
|
|
* Cleanup CUDA renderer.
* More cleanup
* fixes.
* update comments
Co-authored-by: Yong He <yhe@nvidia.com>
|
|
* Implements CUDA renderer in gfx.
* Revert unnecessary change.
* Revert unnecessary changes.
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|