summaryrefslogtreecommitdiffstats
path: root/tools/gfx/vulkan
Commit message (Collapse)AuthorAge
...
* Add GLSL450 intrinsics to SPIRV direct emit. (#1921)Yong He2021-08-17
| | | | | | | | | | | | | * Add GLSL450 intrinsics to SPIRV direct emit. * Fix. * Fix compiler error. * Fix. * Fix compiler error. * Make direct-spirv tests actually run.
* Further implementation of SPIRV direct emit. (#1920)Yong He2021-08-12
| | | | | | | | | | | | | | * Further implementation of SPIRV direct emit. This change implements: - Struct, Vector, Matrix and Unsized Array types. - Basic arithmetic opcodes, vector construct, swizzle etc. - getElementPtr, getElement, fieldAddress, extractField. - SPIRV target intrinsics with SPIRV asm code in stdlib. - RWStructuredBuffer and StructuredBuffer. - Pointer storage class propagation. - Control flow. * Fix.
* Experimental DXR1.0 support in gfx. (#1915)Yong He2021-07-28
| | | | | | | | | | | | | | | | | | | | | | | * Experimental DXR1.0 support in gfx. - Add `dispatchRays` command. - Add `createRayTracingPipelineState` method to construct a D3D ray tracing state object from a linked slang program and user specified shader table. Limitations/simplifications: no local root signature support, shader table entries contains only shader identifiers and is specified at pipeline creation time, owned by the pipeline state object. * Root object binding for raytracing pipelines. * `maybeSpecializePipeline` implementation for raytracing pipelines. * Add ray-tracing-pipeline example. * Fixes. * Update README.md * Update comments on the lifespan of specialized pipelines Co-authored-by: Yong He <yhe@nvidia.com> Co-authored-by: jsmall-nvidia <jsmall@nvidia.com>
* Minor refactor to gfx D3D12 implementation. (#1913)Yong He2021-07-20
| | | | | | | | | | * Minor refactor to gfx D3D12 implementation. - Allow more flexible collection of shader stages in a shader program. - Add `createRayTracingPipelineState` public interface. (no implementation). * Fix Vulkan initialization. Co-authored-by: Yong He <yhe@nvidia.com>
* Enable swiftshader in linux CI builds (#1909)Yong He2021-07-19
|
* Enable testing with Swiftshader. (#1906)Yong He2021-07-09
|
* Allow render-test to run inline ray tracing tests. (#1903)Yong He2021-07-08
| | | | | | | | | | | * Update VS projects to 2019. * Empty commit to trigger build * Implement gfx inline ray tracing on D3D12. * Allow render-test to run inline ray tracing tests. Co-authored-by: Yong He <yhe@nvidia.com>
* Implement gfx inline ray tracing on D3D12. (#1902)Yong He2021-07-08
| | | | | | | * Update VS projects to 2019. * Empty commit to trigger build * Implement gfx inline ray tracing on D3D12.
* [gfx] Add inline ray tracing support. (#1899)Yong He2021-06-30
|
* [gfx] Add `IBufferResource::getDeviceAddress()`. (#1892)Yong He2021-06-23
|
* Support timestamp queries in `gfx`. (#1880)Yong He2021-06-10
| | | | | | | * Support timestamp queries in `gfx`. * Fix tab Co-authored-by: Yong He <yhe@nvidia.com>
* Fix D3D11 `uploadBufferResource`. (#1869)Yong He2021-06-04
|
* [gfx] Support StructuredBuffer<IInterface>. (#1851)Yong He2021-05-21
| | | Co-authored-by: T. Foley <tfoleyNV@users.noreply.github.com>
* Half texture support (#1836)jsmall-nvidia2021-05-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * #include an absolute path didn't work - because paths were taken to always be relative. * Split out StringEscapeUtil. * Added StringEscapeUtil. * Fix typo in unix quoting type. * Small comment improvements. * Try to fix linux linking issue. * Fix typo. * Attempt to fix linux link issue. * Update VS proj even though nothing really changed. * Fix another typo issue. * Fix for windows issue. Fixed bug. * Make separate Utils for escaping. * Fix typo. * Split out into StringEscapeHandler. * Windows shell does handle removing quotes (so remove code to remove them). * Handle unescaping if not initiating using the shell. * Slight improvement around shell like decoding. * Simplify command extraction. * Add shared-library category type. * Fix bug in command extraction. * Typo in transcendental category. * Enable unit-test on in smoke test category. * Make parsing failing output as a failing test. * Fixes for transcendental tests. Disable tests that do not work. * Changed category parsing. * Removed the TestResult parameter from _gatherTestsForFile. Made testsList only output. * Remove testing if all tests were disabled. * Make args of CommandLine always unescaped. * Add category. * Don't need escaping on unix/linux. * Remove some no longer used functions. * Add requireSMVersion to CUDAExtensionTracker. * half-calc.slang now works for CUDA. * bit-cast-16-bit works on CUDA. * WIP handling of CUDA vector<half> types. * Half swizzle CUDA. * Half vector test. * Fix swizzle half bug. * Fix compilation issue with narrowing to Index. * Add unary ops. * Add some vector scalar maths ops. * Add half vector conversions for CUDA. * Fix erroneous comment. * Support for half comparisons. * First pass test for half compare. * Fix bug in CUDA specialized emit control. Updated tests to have pre and post inc/dec. * Removed unneeded parts of the cuda prelude. * Half structured buffer works on CUDA. * Added name lookup for Gfx::Format * Support half texture type in test system. * Test for half reading on CUDA. * Add half formats to Vk and D3D utils. * Fix getAt for CUDA - where there might not be a .x member in a vector.
* Clean up the way we stride over sub-object ranges for D3D11 and Vulkan (#1829)Tim Foley2021-04-30
| | | | | | | | | | | | | | This change applies to the case where we have a sub-object range with more than one object in it (so arrays of constant buffers, parameter blocks, or existentials). It is worth noting that we don't really seem to have any tests covering this case right now, so it is entirely possible that things are busted already, and/or that this change doesn't work the way I think it does. When we go to actually bind the state from a sub-object range into the pipeline, we care about: * The offset to where the first object should be written * The "stride" between consecutive objects We were already capturing the offset information from Slang reflection data, and the same was true for the part of the offset that pertains to "pending" ordinary data. For other cases, though, we were manually incrementing the offset by values computed manually in `gfx` for Vulkan, and we were just skipping the offsetting step entirely for D3D11. With this change we extract more complete stride information from the Slang reflection data and use that instead of ad hoc computations. Hopefully this data is correct, and if it isn't we can consider whether it needs fixing at the Slang reflection level rather than in `gfx`. This change doesn't apply to the OpenGL back-end (which hasn't been updated to match the new approach to static specialization at all) or to the D3D12 back end (which has been updated a bit, but probably needs other cleanup work to be done first to bring it in line).
* Update gfx back-ends to handle static specialization (#1826)Tim Foley2021-04-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Update gfx back-ends to handle static specialization The main goal here is to make the D3D11, D3D12 and Vulkan back-ends support static specialization of interface types in the case where the data for the type won't "fit" in the pre-allocated space for existential values. This includes all cases where the concrete type being specialized to has resources/samplers/etc., as well as any cases where its ordinary/uniform data exceeds the space available. (Note that the CPU and CUDA targets don't need this work since they can (in theory) support arbitrary-size data in the fixed-size existential payload by using pointer indirection. Actually supporting indirection in those cases should be a distinct change) The Slang compiler already performs layout for programs that have this kind of data that doesn't "fit," and it lays them out using an idea of "pending" type layouts. Basically, a type that contains some amount of specialized interface-type fields will produce both a "primary" type layout that just covers the data for the unspecialized case, as well as "pending" type layout that describes the layout for all the extra data needed by specialization. When laying out a `ConstantBuffer<X>` or `ParameterBlocK<X>` ("CB" or "PB"), the front-end will try to place as much of that "pending" data into the layout of the buffer/block itself as is possible. That means that both CBs and PBs will be able to allocate trailing bytes for any ordinary data in the "pending" layout. PBs will be able to allocate any trailing resources/samplers into their layout, but for CBs they will spill out to be part of the pending layout for the buffer itself. In order for the back-ends to properly handle pending data, they need to *either* assume the exact layout rules used by the front-end and try to reproduce them (e.g., by iterating over binding ranges and sub-objects in the exact same order that front-end layout would enumerate them), *or* they need to respect the reflection information produced by the front-end. This change takes the latter approach, trying to make only minimal assumptions about the layout rules being used. This choice is motivated by wanting to decouple the `gfx` implementation from the compiler front-end, especially insofar as this work has made me question whether the current layout rules are the best ones possible. A common theme across all the implementations is to have a fixed-size type that can represent "binding offsets" for the chosen back-end. The offset type has fields that depend on the API-specific way bindings are indexed; e.g., for D3D11 it has offsets for CBV, SRV, UAV, and sampler bindings. This fixed-size offset type can be filled in based on Slang reflecton information, and then used to compute derived offsets with just a few add operations. The simple offset type for each API is then extended to produce an offset type that includes both the offsets for "primary" data and also the offsets for "pending" data. Most logic that traffics in offsets doesn't have to know about this more complicated representation. Making consistent use of these offsets required that I pretty much rewrite the logic that actually applies shader objects to the API state. Doing so might be lowering the efficiency of the system in the near term, but the increase in clarity was important for getting the work done, and it seems like it will also be important if/when we start trying to perform special-case optimizations around root and entry-point parameter setting. While there are many API-specific differences, we can identify a repeated pattern where many steps, whether applying parameters to the pipeline stage or constructing signatures / layouts, can be broken down into three main operations on `ShaderObject`s or their layouts: * `*AsValue()` is the core operation, and is the one used for the `ExistentialValue` case most of the time. It ignores the ordinary data in the object, and instead processes all nested binding ranges (for resources/smaplers) and sub-objects. * `*AsConstantBuffer()` handles the `ConstntBuffer<X>` case, by dealing with the implicit buffer for ordinary data (if it is needed) and then delegates to the `*AsValue()` case. * `*AsParameterBlock()` handles the `ParameterBlock<X>` case, by allocating/preparing/etc. any descriptor tables/sets that would be required for the current object/layout and then delegating to `*AsConstantBuffer()` to do the rest The idea is that by having the parameter block case delegate to the constant buffer case, which delegates to the value/existential case, we can streamline a lot of the logic so that it doesn't seem quite as full of special cases. Note: When preparing this pull request I spent a reasonable amount of time trying to clean up the D3D11 and Vulkan implementations, so they are probably the easiest to read and understand when it comes to the new code. Doing the cleanup work also helped to work out some weird corner case bugs/issues. In contrast, the D3D12 path hasn't had as much attention given to cleanliness and comments, so it really needs some attention down the line to get things into a state that is easier to understand. * fixup: remove debugging code spotted in review
* `gfx` DebugCallback and debug layer. (#1822)Yong He2021-04-29
| | | * `gfx` DebugCallback and debug layer.
* Remove resource `Usage` from `gfx` interface. (#1813)Yong He2021-04-24
| | | | | | | | | | | | | * Fix `model-viewer` crash when using Vulkan. Fixing an issue in shader object layout creation for to make sure a correct descriptor set layout is calculated for types that need an implicit constant buffer. * Fix formatting. * Fixes. * Fix memory leak in vulkan. * Remove resource `Usage` from `gfx` interface.
* Fix `model-viewer` crash when using Vulkan. (#1804)Yong He2021-04-23
|
* Various fixes to make `model-viewer` example almost working. (#1801)Yong He2021-04-20
| | | | | | | | | | | | | | | | | | * Fixing `PseudoPtr` legalization and `gfx` lifetime issues. * Fixing `model-viewer` example. This change contains various fixes to bring `model-viewer` example to fully functional. These fixes include: 1. Add `spReflectionTypeLayout_getSubObjectRangeSpaceOffset` function to return the space index for a sub object referenced through a `ParameterBlock` binding. 2. Make sure `D3D12Device` specifies column major matrix order creating a Slang session. 3. Fix `platform::Window::close()` and `platform::Application::quit()`. 4. Fix memory leak during `model-viewer''s model loading. 5. Fix command buffer recording in `model-viewer`. With these changes, model viewer can now produce an image with a gray cube. The lighting is still incorrect becuase the `gfx` shader object implementation still does not handle "pending layout" resulting from global existential parameters. * Fix d3d12 root signature creation. * Use row-major matrix layout in model-viewer
* Improve robustness of gfx lifetime management. (#1788)Yong He2021-04-08
| | | | | | | | | * Improve robustness of gfx lifetime management. * fix clang error * fix clang error * Fix clang warning
* Transient root shader object. (#1782)Yong He2021-04-05
|
* `gfx` explicit transient resource management. (#1774)Yong He2021-03-31
|
* Clean up render-test handling of input (#1766)Tim Foley2021-03-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The original goal of this change was to streamline the `TEST_INPUT` system by eliminating options that are no longer relevant once we have eliminated the non-shader-object execution paths. The result is more or less a re-implementation/refactor of the logic around how input is parsed and represented, that tries to set things up for a more general sytem going forward. The main changes isthat the `ShaderInputLayout` no longer tracks a simple flat list of `ShaderInputLayoutEntry` (that is a kind of pseudo-union of the various buffer/texture/value cases), and it instead uses a hierarchical representation composed of `RefObject`-derived classes to represent "values." There are several "simple" cases of values * Textures * Samplers * Uniform/ordinary data (`uniform`) * Buffers composed of uniform/ordinary data (`ubuffer`) Then there are composed/aggregate values that nest other values: * An *aggregate* value is a set of *fields* which are name/value pairs. It can be used to fill in a structure, for example. * An *array* value is a list of values for the elements of an array. It can be used to fill out an array-of-textures parameter, for example. * A combined texture/sampler value is a pair of a texture value and a sampler value (easy enough) * An *object* holds an optional type name for a shader object to allocate (it defaults to the type that is "under" the current shader cursor when binding), and a nested value that describes how to fill in the contents of that object Finally there are cases of values that are just syntactic sugar: * A `cbuffer` is just shorthand for creating an object value with a nested uniform/ordinary data value The big idea with this recursive structure is that it gives us a way to handle more arbitrary data types with name-based binding. Supporting this new capability requires changes to both how input layouts get parsed, and also how they get bound into shader objects. On the parsing side, things have been refactored a bit so that parsing isn't a single monolithic routine. The refactor also tries to make it so that the various options on an input item (e.g., the `size=...` option for textures) are only supported on the relevant type of entry (so you can't specify as many useless options that will be ignored). The bigger change to parsing is that it now supports a hierarchical structure, where certain input elements like `begin_array` can push a new "parent" value onto a stack, and subsequent `TEST_INPUT` lines will be parsed as children of that item until a matching `end` item. This approach means that we can now in principle describe arbitrary hierarchical structures as part of test input without endlessly increasing the complexity of invididual `TEST_INPUT` lines. On the binding side, we now have a central recursive operation called `assign(ShaderCursor, ShaderInputLayout::ValPtr)` that assigns from a parsed `ShaderInputLayout` value to a particular cursor. That operation can then recurse on the fields/elements/contents of whatever the cursor points to. Major open directions: * With this change it is still necessary to use `uniform` entries to set things like individual integers or `float`s and that is a little silly. It would be good to have some streamlines cases for setting individual scalar values. * Further, once we have a hierarchical representation of the values for `TEST_INPUT` lines, it becomes clear that we really ought to move to a format more like `TEST_INPUT: dstLocation = srcValue;` where `srcValue` is some kind of hierarchial expression grammar. Refactoring things in this way should make the binding logic even more clear and easy to understand. The refactored parser should make parsing hierarchical expressions easier to do in the future (even if it uses the push/pop model for now) * One detailed note is that the representation of buffers in this change is kind of a compromise. Just as an "object" value is a thin wrapper around a recursively-contained value for its "content" it seems clear that a buffer could be represented as a wrapper around a content value that could include hierarchical aggregates/objects instead of just flat binary data (this would be important for things like a buffer over a structure type that lays out different on different targets). The main problem right now with changing the representation is actually needing to compute the size of a buffer based on its content, so that can/should be addressed in a subsequent change. Details: * The base `RenderTestApp` class and the `ShaderObjectRenderTestApp` classes have been merged, since the hierarchy no longer serves any purpose. * Disabled the tess that rely on `StructuredBuffer<IWhatever>` because they aren't really supported by our current shader object implementation * Replaced used of `Uniform` and `root_constants` in `TEST_INPUT` lines with just `uniform` * Removed a bunch of uses of `stride` from `cbuffer` inputs, where it wasn't really correct/meaningful * Added the `copyBuffer()` operation to VK/D3D renderers, along with some missing `Usage` cases to support it. * Made `ShaderCursor` handle the logic to look up a name in the entry points of a root shader object, rather than just having that logic in `render-test`. (We probably need to make a clear design choice on this issue)
* Improve Vulkan shader-objects implementation. (#1765)Yong He2021-03-25
| | | | | | | | | | | | | | | | | | * Improve Vulkan shader-objects implementation. 1. Null bindings no longer crashes. 2. No longer copies push constants to staging CPU buffer before setting it into command buffer. The entry-point shader object now directly sets it into command buffer upon `bindObject` call. * Update comments * Fix * Re-enable 3 tests. Improved vulkan implementation so that each shader object is responsible for creating descriptor sets on-demand. Fixed slang reflection to correctly report `ParameterBlock` binding. * Fix gcc compile error.
* Reimplement Vulkan shader objects. (#1764)Yong He2021-03-24
| | | | | | | | | | | * Reimplement Vulkan shader objects. This change reimplements Vulkan shader objects in the `gfx` layer so that it is no longer layered on top of the `DescriptorSet` abstraction. Since this is the last implementation that uses `DescriptorSet`, the change also removes all `DescriptorSet` related API from public `gfx` interface. The Vulkan implementation now passes all test cases, but it still have two issues: 1. The PushConstant setting is not correct, this is because we don't seem to be able to get correct reflection data about the size of push constants for an entry-point. 2. The `shader-toy` example can't run on Vulkan, because it currently sets nullptr to `Texture` bindings, and this change doesn't properly handle setting resource to null in `ShaderObject`s yet. If we can use the `nullDescriptor` feature on vulkan, this implementation will be simple. However we still want to decide whether we want to use a Vulkan 1.2 feature for this. * Fix up
* Remove `DescriptorSet` from D3D11 and GL devices. (#1761)Yong He2021-03-18
|
* Change representation of initial data for textures (#1747)Tim Foley2021-03-11
| | | | | | | | | | | | | * Change representation of initial data for textures Before this change, initial data for a texture has been provided with the `ITextureResource::Data` type, where a call to `IDevice::createTexture()` would take zero or one `Data` and, if present, use it to initialize all the subresources of a texture. The organization of `Data` was not actually quite how its own documentation comment described it (the implementations didn't agree with the comment), and while it aggressively factored out redundancies (e.g., only storing the stride for each mip level once, instead of once per subresource for large arrays), the result was that setting up a `Data` correcty was a bit confusing. This change makes the initial data for a texture using a `SubresourceData` type that is almost identical to what D3D11 uses, so that developers are more likely to be comfortable filling it in. All of the existing implementations were easily adapted to use the new type, so it seems like a net win. Note: Both Vulkan and D3D11 do away with the idea of initializing a texture with data as part of allocating it, and we might eventually want to do the same given the complexity that this system entails. The main reason to preserve this detail is for better compatibility with D3D11, where immutable textures/buffers need to have their data specified at creation time. It seems good to preserve the ability to have immutable resources on target APIs where this distinction could affect performance (e.g., immutable resources do not need state/transition tracking on APIs like D3D11). * fixup: CUDA
* Add Linux support to `platform` and `gfx`. (#1744)Yong He2021-03-11
|
* Swapchain resize and rename to `IDevice` (#1741)Yong He2021-03-10
| | | | | * Swapchain resize * Fix.
* Refactor `gfx` to surface `CommandBuffer` interface. (#1735)Yong He2021-03-04
| | | | | | | | | | | | | * Refactor `gfx` to surface `CommandBuffer` interface. * Fixes. * Fix code review issues, and make vulkan runnable on devices without VK_EXT_extended_dynamic_states. * Update solution files * Move out-of-date examples to examples/experimental Co-authored-by: Yong He <yhe@nvidia.com>
* Explicit swapchain interface in `gfx`. (#1726)Yong He2021-02-24
| | | | | | | | | * Explicit swapchain interface in `gfx`. * Correctly return nullptr when `IRenderer` creation failed. * Fix crashes on CUDA tests. * Cleanups.
* Make gfx library visible to external user. (#1719)Yong He2021-02-19
| | | | | * Make gfx library visible to external user. * Fixup
* Add `SampleGrad` overload for lod clamp. (#1711)Yong He2021-02-17
| | | | | | | | | | | | * Add `SampleGrad` overload for lod clamp. * Fix gfx to run the test on vulkan. * Whitespace change to trigger CI build * remove presentFrame call in render-test Co-authored-by: Yong He <yhe@nvidia.com> Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
* Shader-Object example (#1694)Yong He2021-02-05
|
* [gfx] Shader-object driven shader compilation. (#1688)Yong He2021-02-04
|
* Make own a slang session. (#1678)Yong He2021-01-27
|
* Integrate reflection more deeply into gfx layer (#1677)Tim Foley2021-01-26
|
* Add `StructuredBuffer` support in `gfx`. (#1666)Yong He2021-01-21
|
* Make `ShaderCursor` no longer depend on core. (#1661)Yong He2021-01-19
|
* Make `gfx` compile to a DLL. (#1660)Yong He2021-01-17
| | | | | | | | | * Make `gfx` compile to a DLL. * Fix cuda * Fix cuda build * Bug gl screen capture bug.
* COM-ify all slang-gfx interfaces. (#1656)Yong He2021-01-14
| | | * COM-ify all slang-gfx interfaces.
* Make `gfx::Renderer` a COM interface. (#1653)Yong He2021-01-11
| | | | | | | | | | | | | * Make `gfx::Renderer` a COM interface. This is a first step towards making the `gfx` library expose a COM compatible DLL interface. Remaining classes will come as separate PRs. * Fixup project files * Fix calling conventions * Make gfx::create*Renderer() functions increase ref count by 1 * Make renderer createFunc return via out parameter
* Refactor GUI/Window utils out of gfx library (#1649)Yong He2021-01-06
| | | Co-authored-by: Yong He <yhe@nvidia.com>
* Move ShaderObject to be under renderer interface. (#1633)Yong He2020-12-10
| | | | | | | * Move ShaderObject to be under renderer interface. * Make `create*PipelineState` take `const PipelineStateDesc&`. * Move ShaderCursor implementation to a cpp file
* Add shader object parameter binding to renderer_test. (#1622)Yong He2020-12-03
| | | | | | | | | * Add shader object parameter binding to renderer_test. * remove multiple-definitions.hlsl * Fix cuda implementation. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
* Fix Vk leak (#1579)jsmall-nvidia2020-10-15
| | | | | | | * #include an absolute path didn't work - because paths were taken to always be relative. * Handle scope of VkShaderModule. * Fix tabbing issue.
* Use new vulkan debug layer. (#1566)Yong He2020-10-02
| | | | | | | * Use new vulkan debug layer. * Try use VK_LAYER_KHRONOS_validation when it exists. Co-authored-by: Tim Foley <tim.foley.is@gmail.com>
* Vulkan update/NVAPI support (#1511)jsmall-nvidia2020-08-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * First pass at incorporating nvapi into test harness. * D3d12 Atomic Float Add via NVAPI working * Dx12 atomic float appears to work. * Atomic float add on Dx12. * Added atomic64 feature addition to vk. Fix correct output for atomic-float-byte-address.slang * Disable atomic float failing tests. * Upgraded VK headers. * Detect atomic float availability on VK. * Try to get test working for in64 atomic. * Made HLSL prelude controlled via the render-test requirements. * Added -enable-nvapi to premake. * Fix D3D12Renderer when NVAPI is not available. * Small improvements to VKRenderer. * Improve atomic documentation in target-compatibility.md.
* Change the policy for entry-point uniform parameters on Vulkan (#1476)Tim Foley2020-08-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Entry point `uniform` parameters were a feature of the original Cg and HLSL, but have not been used much in production shader code. One of our goals on Slang is to reduce the (ab)use of the global scope, so bringing entry point `uniform` parameters up to a greater level of usability is an important goal. Some policy choices about how global vs. entry-point `uniform` parameters behave have already been made, that shape decisions looking forward: * For DXBC/DXIL, it makes the most sense to follow the lead of fxc/dxc, by treating entry point `uniform` parameters as a kind of syntax sugar for global shader parameters. Any parameters of "ordinary" types are bundles up into an implicit constant buffer, and all the resources (including the implicit constant buffer) are assigned `register`s just as for globals. It is up to the application to decide how to bind those parameters via a root signature (using root descriptors, root constants, descriptor tables, local vs. global root signature, etc.) * For CPU, it makes sense to pass global vs. entry-point parameters as two different pointers, although the details of what we do for CPU are the least constrained across all current targets. * For CUDA compute, it makes the most sense to map global shader parameters to `__constant__` global data, and entry-point `uniform` parameters to kernel parameters. This choice ensures that the signature of a kernel when translated from Slang->CUDA follows the Principle of Least Surprise, at the cost of making entry-point vs. global parameters be passed via different mechanisms. * For OptiX ray tracing, it makes sense to expand on the precedent from CUDA compute: pass global parameters via global `__constant__` data (as is already expected by OptiX for whole-launch parameters), and pass entry-point `uniform` parameters via the "shader record." This establishes a precedent that for ray-tracing shaders, global-scope parameters map to the "global root signature" concept from DXR, while entry-point `uniform` parameters map to a "local root signature" or "shader record." * For Vulkan ray tracing, the precedent from OptiX then argues that entry-point `uniform` parameters should map to the Vulkan "shader record" concept (and thus cannot support things like resource types). * The remaining interesting case is what to do for non-ray-tracing shaders on Vulkan. The dev team agrees that the most reasonable choice to make for non-ray-tracing Vulkan shaders is to map entry-point `uniform` parameters to "push constants." In particular, this makes it easy to express the case of a compute kernel with direct parameters of ordinary/value types in the way that will be implemented most efficiently. The big picture is then that a kernel like: ```hlsl void computeMain(uniform float someValue) { ... } ``` will map to output GLSL like: ```glsl layout(push_constant) uniform { float someValue; } U; void main() { ... } ``` If the user really wanted a constant-buffer binding to be created instead, they can easily change their input to make the buffer explicit: ```hlsl struct Params { float someValue; } void computeMain(uniform ConstantBuffer<Params> params) { ... } ``` (Forcing the user to be explicit about the desire for a buffer here creates a nice symmetry between Vulkan and CUDA; in the first case the user sets up the data in host memory and passes it to the GPU by copy, while in the second case the user must allocate and set up a device-memory buffer for the data. This symmetry extends to D3D if the application chooses to map entry-point `uniform` parameters to root constants.) This change implements logic in the "parameter binding" part of the Slang compiler to make sure that entry-point `uniform` parameters are wrapped up in a push-constant buffer rather than an ordinary constant buffer for non-ray-tracing shaders on Vulkan (and in a shader record "buffer" for the ray-tracing case). The majority of the actual work was in adding support for root/push constants to the test framework and the graphics API abstraction it uses. To be clear about that support: * Root constant ranges are (perhaps confusingly) treated as a new kind of "slot" that can appear on a descriptor set. This choice ensures that the implicit numbering of registers/spaces used by the back-ends can account for these ranges correctly. * The `TEST_INPUT` lines are extended to allow a `root_constants` case that behaves more or less like `cbuffer` * The CPU and CUDA paths can treat a `root_constants` input identically to a `cbuffer`. They already allocate the actual buffers based on reflection, and just use `cbuffer` as a directive that causes bytes to be copied in. * On D3D12 and Vulkan, a descriptor set allocates a `List<char>` to hold the bytes of root constant data assigned into it, and these bytes are flushed to the command list when the table is actually bound (usually right before rendering). * On D3D11, a descriptor set treats a root constant range more or less like a constant buffer range (with a single buffer), except that it also automatically allocates a buffer to hold the data. Assigning "root constant" data automatically copies it into that buffer. The small number of tests that used entry-point `uniform` parameters of ordinary types were updated to use the new `root_constant` input type, and the bugs that surfaced were fixed. A new test to confirm that entry-point `uniform` parameters map to the shader record for VK ray tracing was added. An important but technically unrelated change is the removal of the `DescriptorSetImpl::Binding` type and related function from the Vulkan implementation of `Renderer`. That type was created to ensure that objects that are bound into a descriptor set don't get released while the descriptor set is still alive, but the implementation relied on a complicated linear search to check for existing bindings, which could create a performance issue for descriptor sets that include large arrays of descriptors. The new implementation makes use of the approach already present in the various `Renderer` implementations (including the Vulkan one) for assigning ranges in a descriptor set a flat/linear index for where their pertinent data is to be bound. As a result, the Vulkan `DescriptorSetImpl` now uses a single flat array of `RefPtr`s to track bound objects, and has no need for linear search when binding. Co-authored-by: Yong He <yonghe@outlook.com>