| Age | Commit message (Collapse) | Author |
|
|
|
* Allow overriding specialization args via `IShaderObject`.
* Fixes.
Co-authored-by: T. Foley <tfoleyNV@users.noreply.github.com>
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Added SourceLoc handling for command line parsing.
* Fix typo in debug.
* Fix issue around the DiagnosticSink used in options parsing not having a writer available - by having DiagnosticSink parenting.
* Small rename for clarity.
* WIP extracting command line args for downstream tools.
* Unit tests/bug fixes around extracting args.
* Use DownstreamArgs in the EndToEndCompileRequest
* Passing downstream compiler options downstream.
* Fix issue with endToEndReq being nullptr.
* Fix issue with diagnostics number change.
* Small improvements to how the source line is displayed if it's too long.
Default to 120, as suggested in previous review.
* Make render test use x-args parsing and CommandArgReader.
* Added missing diagnostics.
* More DownstreamArgs to linkage so can be seen by 'components'.
Added dxc-x-arg test.
* Used combination of name and args instead of two Lists, which whilst equivalent was perhaps a little confusing.
* Added documentation for -X support.
* Added test for x-args parsing diagnostic. Improved diagnostic with list of known names.
* Fix issues from merge.
* Fix lookup for -matrix-layout-column-major in render test.
* Remove commented out line.
|
|
Co-authored-by: T. Foley <tfoleyNV@users.noreply.github.com>
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Split out StringEscapeUtil.
* Added StringEscapeUtil.
* Fix typo in unix quoting type.
* Small comment improvements.
* Try to fix linux linking issue.
* Fix typo.
* Attempt to fix linux link issue.
* Update VS proj even though nothing really changed.
* Fix another typo issue.
* Fix for windows issue.
Fixed bug.
* Make separate Utils for escaping.
* Fix typo.
* Split out into StringEscapeHandler.
* Windows shell does handle removing quotes (so remove code to remove them).
* Handle unescaping if not initiating using the shell.
* Slight improvement around shell like decoding.
* Simplify command extraction.
* Add shared-library category type.
* Fix bug in command extraction.
* Typo in transcendental category.
* Enable unit-test on in smoke test category.
* Make parsing failing output as a failing test.
* Fixes for transcendental tests. Disable tests that do not work.
* Changed category parsing.
* Removed the TestResult parameter from _gatherTestsForFile.
Made testsList only output.
* Remove testing if all tests were disabled.
* Make args of CommandLine always unescaped.
* Add category.
* Don't need escaping on unix/linux.
* Remove some no longer used functions.
* Add requireSMVersion to CUDAExtensionTracker.
* half-calc.slang now works for CUDA.
* bit-cast-16-bit works on CUDA.
* WIP handling of CUDA vector<half> types.
* Half swizzle CUDA.
* Half vector test.
* Fix swizzle half bug.
* Fix compilation issue with narrowing to Index.
* Add unary ops.
* Add some vector scalar maths ops.
* Add half vector conversions for CUDA.
* Fix erroneous comment.
* Support for half comparisons.
* First pass test for half compare.
* Fix bug in CUDA specialized emit control.
Updated tests to have pre and post inc/dec.
* Removed unneeded parts of the cuda prelude.
* Half structured buffer works on CUDA.
* Added name lookup for Gfx::Format
* Support half texture type in test system.
* Test for half reading on CUDA.
* Add half formats to Vk and D3D utils.
* Fix getAt for CUDA - where there might not be a .x member in a vector.
|
|
* `gfx` DebugCallback and debug layer.
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Split out StringEscapeUtil.
* Added StringEscapeUtil.
* Fix typo in unix quoting type.
* Small comment improvements.
* Try to fix linux linking issue.
* Fix typo.
* Attempt to fix linux link issue.
* Update VS proj even though nothing really changed.
* Fix another typo issue.
* Fix for windows issue.
Fixed bug.
* Make separate Utils for escaping.
* Fix typo.
* Split out into StringEscapeHandler.
* Windows shell does handle removing quotes (so remove code to remove them).
* Handle unescaping if not initiating using the shell.
* Slight improvement around shell like decoding.
* Simplify command extraction.
* Add shared-library category type.
* Fix bug in command extraction.
* Typo in transcendental category.
* Enable unit-test on in smoke test category.
* Make parsing failing output as a failing test.
* Fixes for transcendental tests. Disable tests that do not work.
* Changed category parsing.
* Removed the TestResult parameter from _gatherTestsForFile.
Made testsList only output.
* Remove testing if all tests were disabled.
* Make args of CommandLine always unescaped.
* Add category.
* Don't need escaping on unix/linux.
* Remove some no longer used functions.
|
|
* Fix `model-viewer` crash when using Vulkan.
Fixing an issue in shader object layout creation for to make sure a correct descriptor set layout is calculated for types that need an implicit constant buffer.
* Fix formatting.
* Fixes.
* Fix memory leak in vulkan.
* Remove resource `Usage` from `gfx` interface.
|
|
* Improve robustness of gfx lifetime management.
* fix clang error
* fix clang error
* Fix clang warning
|
|
|
|
This change originated as an attempt to re-enable a test case, but it has ended up disabling more tests (for good reasons) than it re-enables.
The main change here is a significant overhaul of the way that the D3D12 render path extracts information from the Slang reflection API to produce a root signature.
There were also some supporting fixes in the reflection information to make sure it returns what the D3D12 back-end needed.
The big picture here is that the D3D12 path now uses the descriptor ranges stored in the reflection data more or less directly.
It still needs to use register/space offset information queried via the "old" reflection API, but it only does so at the top level now, for the program and entry points themselves.
All other layout information is derived directly from what Slang provides.
Smaller changes:
* The "flat" reflection API was expanded to include `getBindingRangeDescriptorRangeCount()` which was clearly missing.
* The "flat" reflection results for a constant buffer or parameter block that didn't contain any uniform data and was mapped to a plain constant buffer needed to be fixed up. That logic is still way to subtle to be trusted.
* Several additional tests were disabled that relied on static specialization, global/entry-point generi type parameters, structured buffers of interfaces or other features we don't officially support with shader objects right now. All of the affected tests were somehow passing by sheer luck and because they often passed in specialization arguments via explicit `TEST_INPUT` lines.
* The `inteface-shader-param` test is re-enabled now that we can properly describe its input with the new `set` mode on `TEST_INPUT`
* `ShaderCursor::getElement()` can now be used on structure types (in addition to arrays) to support by-index access to fields
* The `TEST_INPUT` system was expanded to support both by-name and by-index setting of structure fields for aggregates
* The `TEST_INPUT` system was expanded to allow an `out` prefix to mark parts of an expression as outputs on a `set` lines
* The `TEST_INPUT` system was expanded so that anything that would be allowed on a `TEST_INPUT` line by itself (like `ubuffer(...)`) can now be used as a sub-expression on a `set` line
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
|
|
This change allows the `TEST_INPUT` syntax used by `render-test` to support aggregate values with a single input line more easily.
The test writer can now use a syntax like:
```
//TEST_INPUT:set someVar = 3.0
```
Input lines that start with the `set` keyword will now use a simpler `dst = src` format (instead of `dst:name=src` as the existing syntax used). The right-hand side expression can include:
* Numeric literals, both integer and floating-point (currently only supporting 32-bit scalar types; we could fix this later)
* Arrays, consisting of zero or more comma-separated expressions inside `[]`
* Aggregates, consisting of zero or more comma-separated "fields" inside `{}`. A field can either be `name: <expr>` or just `<expr>`
* Objects, which can be written as either `new SomeType{ <fields> }` or `new{ <fields> }` in the case where the type is know-able from context
With this approach is should be possible to support almost arbitrary-type inputs on a single line. For now, I have used this support to re-enable an existing test that had been disabled due to lack of support for setting up arrays of objects.
Major things left to do:
* The new syntax doesn't support the existing cases we had for `Texture2D`, etc. Those should probably be supported but I'd like to find a way to do it without duplicating the parsing logic (ideally the value cases from the existing code should Just Work in the new model)
* There is no support right now for non-32-bit scalar types
* It would be good if this support (and the shader cursor system) supported treating vectors like aggregates
* The actual value-setting logic doesn't currently handle aggregates without field names, so `{ a:0, b:1 }` will work but `{ 0, 1 }` will parse but fail when it comes time to set values
* While this approach lets complicated values be set with a single line, that isn't always what a user will want to do: in the future we should provide a way to break up an aggregate value over multiple lines that is consistent with this approach
* Once we port all of the relvant tests over, it would be great to drop the `set` prefix and have these lines look as simple and conventional as possible
|
|
The original goal of this change was to streamline the `TEST_INPUT` system by eliminating options that are no longer relevant once we have eliminated the non-shader-object execution paths. The result is more or less a re-implementation/refactor of the logic around how input is parsed and represented, that tries to set things up for a more general sytem going forward.
The main changes isthat the `ShaderInputLayout` no longer tracks a simple flat list of `ShaderInputLayoutEntry` (that is a kind of pseudo-union of the various buffer/texture/value cases), and it instead uses a hierarchical representation composed of `RefObject`-derived classes to represent "values."
There are several "simple" cases of values
* Textures
* Samplers
* Uniform/ordinary data (`uniform`)
* Buffers composed of uniform/ordinary data (`ubuffer`)
Then there are composed/aggregate values that nest other values:
* An *aggregate* value is a set of *fields* which are name/value pairs. It can be used to fill in a structure, for example.
* An *array* value is a list of values for the elements of an array. It can be used to fill out an array-of-textures parameter, for example.
* A combined texture/sampler value is a pair of a texture value and a sampler value (easy enough)
* An *object* holds an optional type name for a shader object to allocate (it defaults to the type that is "under" the current shader cursor when binding), and a nested value that describes how to fill in the contents of that object
Finally there are cases of values that are just syntactic sugar:
* A `cbuffer` is just shorthand for creating an object value with a nested uniform/ordinary data value
The big idea with this recursive structure is that it gives us a way to handle more arbitrary data types with name-based binding. Supporting this new capability requires changes to both how input layouts get parsed, and also how they get bound into shader objects.
On the parsing side, things have been refactored a bit so that parsing isn't a single monolithic routine. The refactor also tries to make it so that the various options on an input item (e.g., the `size=...` option for textures) are only supported on the relevant type of entry (so you can't specify as many useless options that will be ignored).
The bigger change to parsing is that it now supports a hierarchical structure, where certain input elements like `begin_array` can push a new "parent" value onto a stack, and subsequent `TEST_INPUT` lines will be parsed as children of that item until a matching `end` item. This approach means that we can now in principle describe arbitrary hierarchical structures as part of test input without endlessly increasing the complexity of invididual `TEST_INPUT` lines.
On the binding side, we now have a central recursive operation called `assign(ShaderCursor, ShaderInputLayout::ValPtr)` that assigns from a parsed `ShaderInputLayout` value to a particular cursor. That operation can then recurse on the fields/elements/contents of whatever the cursor points to.
Major open directions:
* With this change it is still necessary to use `uniform` entries to set things like individual integers or `float`s and that is a little silly. It would be good to have some streamlines cases for setting individual scalar values.
* Further, once we have a hierarchical representation of the values for `TEST_INPUT` lines, it becomes clear that we really ought to move to a format more like `TEST_INPUT: dstLocation = srcValue;` where `srcValue` is some kind of hierarchial expression grammar. Refactoring things in this way should make the binding logic even more clear and easy to understand. The refactored parser should make parsing hierarchical expressions easier to do in the future (even if it uses the push/pop model for now)
* One detailed note is that the representation of buffers in this change is kind of a compromise. Just as an "object" value is a thin wrapper around a recursively-contained value for its "content" it seems clear that a buffer could be represented as a wrapper around a content value that could include hierarchical aggregates/objects instead of just flat binary data (this would be important for things like a buffer over a structure type that lays out different on different targets). The main problem right now with changing the representation is actually needing to compute the size of a buffer based on its content, so that can/should be addressed in a subsequent change.
Details:
* The base `RenderTestApp` class and the `ShaderObjectRenderTestApp` classes have been merged, since the hierarchy no longer serves any purpose.
* Disabled the tess that rely on `StructuredBuffer<IWhatever>` because they aren't really supported by our current shader object implementation
* Replaced used of `Uniform` and `root_constants` in `TEST_INPUT` lines with just `uniform`
* Removed a bunch of uses of `stride` from `cbuffer` inputs, where it wasn't really correct/meaningful
* Added the `copyBuffer()` operation to VK/D3D renderers, along with some missing `Usage` cases to support it.
* Made `ShaderCursor` handle the logic to look up a name in the entry points of a root shader object, rather than just having that logic in `render-test`. (We probably need to make a clear design choice on this issue)
|
|
* Reimplement Vulkan shader objects.
This change reimplements Vulkan shader objects in the `gfx` layer so that it is no longer layered on top of the `DescriptorSet` abstraction. Since this is the last implementation that uses `DescriptorSet`, the change also removes all `DescriptorSet` related API from public `gfx` interface.
The Vulkan implementation now passes all test cases, but it still have two issues:
1. The PushConstant setting is not correct, this is because we don't seem to be able to get correct reflection data about the size of push constants for an entry-point.
2. The `shader-toy` example can't run on Vulkan, because it currently sets nullptr to `Texture` bindings, and this change doesn't properly handle setting resource to null in `ShaderObject`s yet. If we can use the `nullDescriptor` feature on vulkan, this implementation will be simple. However we still want to decide whether we want to use a Vulkan 1.2 feature for this.
* Fix up
|
|
* Remove old code paths from render-test
Historically, the `render-test` tool was using three different code paths:
* One based on `gfx` and manual (non-reflection-based) parameter setting, used for OpenGL, D3D11, D3D12, and Vulkan
* One for CPU that used reflection-based parameter setting but shared no code with the first
* One for CUDA that used reflection-based parameter setting and shared some, but not all, code with the CPU path
Recently we've updated `render-test` to include a fourth option:
* Using `gfx` and the "shader object" system it exposes for a unified reflection-based parameter-setting system taht works across OpenGL, D3D11, D3D12, Vulkan, CUDA, and CPU
This change removes the first three options and leaves only the single unified path. A sa result, a bunch of code in `render-test` is no longer needed, and the codebase no longer relies on things like the `IDescriptorSet`-related APIs in `gfx`.
Several existing tests had to be disabled to make this change possible. Those tests will need to be audited and either re-enabled once we fix issues in the shader object system, or permanently removed if they don't test stuff we intend to support in the long run (e.g., global-scope type parameters, which aren't a clear necessity).
* fixup: CUDA detection logic
|
|
|
|
* Add a CPU renderer implementation
This change adds a CPU back-end to `gfx` and ensures that most of our existing CPU tests pass when using it.
Detailed notes:
* Most of the CPU renderer implementation is copy-pasted from the CUDA case, so they share a lot of similar logic
* The main addition to the CPU renderer is a semi-complete implementation of host-memory textures. The logic here handles all the main shapes (Buffer, 1D, 2D, 3D, Cube) and all the currently-supported `Format`s that are sample-able as-is (no D24S8). The implementation is not intended to be fast, and it currently only does nearest-neighbor sampling, but otherwise it tries to avoid cutting too many corners and should be ar reasonable starting point for a more complete (but not performance-oriented) implementation.
* Refactored the CPU prelude `IRWTexture` interface to inherit from `ITexture`, since in most cases a single type will end up implementing both. It might be worth it to collapse it all down to a single interface later.
* Changed the CPU prelude `ITexture`/`IRWTexture` interface so that it takes both a pointer *and* a size for output arguments. This change seems necessary to allow a shader variable declared as a `Texture2D<float>` to fetch a single `float` when the underlying texture might be using RGBA32F.
* Added to the `IComponentType` public API so that we can query a "host callable" for an entry point and not just a binary.
* Turned off the `-shaderobj` flag on two tests that weren't yet compatible with shader objects but still had the flag left in on the path (since previously the CPU path always used the non-`gfx` non-shader-object logic anyway)
* Disabled one test (`dynamic-dispatch-11`) that relied on the `ConstantBuffer<IInterface>` idiom that we know we are planning to chagne soon anyway.
* Made a few changes to the CUDA path to bring it into line with what I added for the CPU path. These were mostly bug fixes around indexing logic for sub-objects and resources.
* fixup
|
|
* Change representation of initial data for textures
Before this change, initial data for a texture has been provided with the `ITextureResource::Data` type, where a call to `IDevice::createTexture()` would take zero or one `Data` and, if present, use it to initialize all the subresources of a texture.
The organization of `Data` was not actually quite how its own documentation comment described it (the implementations didn't agree with the comment), and while it aggressively factored out redundancies (e.g., only storing the stride for each mip level once, instead of once per subresource for large arrays), the result was that setting up a `Data` correcty was a bit confusing.
This change makes the initial data for a texture using a `SubresourceData` type that is almost identical to what D3D11 uses, so that developers are more likely to be comfortable filling it in. All of the existing implementations were easily adapted to use the new type, so it seems like a net win.
Note: Both Vulkan and D3D11 do away with the idea of initializing a texture with data as part of allocating it, and we might eventually want to do the same given the complexity that this system entails. The main reason to preserve this detail is for better compatibility with D3D11, where immutable textures/buffers need to have their data specified at creation time. It seems good to preserve the ability to have immutable resources on target APIs where this distinction could affect performance (e.g., immutable resources do not need state/transition tracking on APIs like D3D11).
* fixup: CUDA
|
|
* Swapchain resize
* Fix.
|
|
* Refactor `gfx` to surface `CommandBuffer` interface.
* Fixes.
* Fix code review issues, and make vulkan runnable on devices without VK_EXT_extended_dynamic_states.
* Update solution files
* Move out-of-date examples to examples/experimental
Co-authored-by: Yong He <yhe@nvidia.com>
|
|
* Explicit swapchain interface in `gfx`.
* Correctly return nullptr when `IRenderer` creation failed.
* Fix crashes on CUDA tests.
* Cleanups.
|
|
* Make gfx library visible to external user.
* Fixup
|
|
This change kind of rolls together two different simplifications:
1. The `createShaderObject()` shouldn't really need to take an `IShaderObjectLayout` because it could just take the `slang::TypeLayoutReflection` instead and create the shader-object layout behind the scenes.
2. For that matter, it needn't take a `slang::TypeLayoutReflection` either, becaues it could just take a `slang::TypeReflection` and query the layout of that type behind the scenes.
The combination of these two changes means:
* `IShaderObjectLayout` is gone from the public API, as is `createShaderObjectLayout()`
* `createShaderObject()` directly takes a `slang::TypeReflection` and allocates a shader object of that type
The result is simpler and more streamlined application code.
Note that under the hood the implementation still has shader-object layouts, using the `ShaderObjectLayoutBase` class. A few locations had to change to use `RefPtr`s instead of `ComPtr`s now that the class is no longer a public COM-lite API type.
The hope is that this change makes it easier to allocate/cache layouts for things like specialized types "under the hood," as is needed to implement parameter setting for static specialization.
|
|
* Add `SampleGrad` overload for lod clamp.
* Fix gfx to run the test on vulkan.
* Whitespace change to trigger CI build
* remove presentFrame call in render-test
Co-authored-by: Yong He <yhe@nvidia.com>
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Fix bugs with m_features on Dx12 and gl.
Fix issue about GFX_NVAPI availability.
* Fix handling of SLANG_E_NOT_AVAILABLE on renderer startup.
* Clarify comment.
* Improve comment.
|
|
|
|
|
|
|
|
|
|
* Make `gfx` compile to a DLL.
* Fix cuda
* Fix cuda build
* Bug gl screen capture bug.
|
|
This change converts a large number of our existing tests to use the `ShaderObject` support that was added to the `gfx` layer.
In many cases, tests were just updated to pass `-shaderobj` and the result Just Worked.
In other cases, a `name` attribute had to be added to one or more `TEST_INPUT` lines.
For tests that did not work with shader objects "out of the box," I spent a little bit of time trying to get them work, but fell back to letting those tests run in the older mode.
Future changes to the infrastructure will be needed to get those additional tests working in the new path.
Along with the changes to test files, the following implementation changes were made to get additional tests working:
* Because the shader object mode uses explicit register bindings (from reflection), the hacky logic that was offseting `u` registers for D3D12 based on the number of render targets gets disabled (by another hack).
* The "flat" reflection information coming from Slang was not correctly reporting "binding ranges" for things that consumed only uniform data (which would be everything on CUDA/CPU), so it was refactored to properly include binding ranges for anything where the type of the field/variable implied a binding range should be created (even if the `LayoutResourceKind` was `::Uniform`).
* A few fixes were made to the CUDA implementation of `Renderer`, in order to get additional tests up and running. Most of these changes had to do with texture bindings, which hadn't really been tested previously.
In addition, a few changes were made that were attempts at getting more tests working, but didn't actually help. These could be dropped if requested:
* As a quality-of-life feature (not being used) the `object` style of `TEST_INPUT` line is upgraded to support inferring the type to use from the type of the input being set.
* Any `object` shader input lines get ignored in non-shader-object mode.
|
|
* COM-ify all slang-gfx interfaces.
|
|
* Make `gfx::Renderer` a COM interface.
This is a first step towards making the `gfx` library expose a COM compatible DLL interface. Remaining classes will come as separate PRs.
* Fixup project files
* Fix calling conventions
* Make gfx::create*Renderer() functions increase ref count by 1
* Make renderer createFunc return via out parameter
|
|
* Implements CUDA renderer in gfx.
* Revert unnecessary change.
* Revert unnecessary changes.
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|
|
* Move ShaderObject to be under renderer interface.
* Make `create*PipelineState` take `const PipelineStateDesc&`.
* Move ShaderCursor implementation to a cpp file
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Move reflection to reflection-api.
* Slight reorg to pull out potentially Slang internal functions from the reflection API impls.
* Remove visual studio projects
* Fix for slang-binaries copy.
* Add the visual studio projects in build/visual-studio
* Remove miniz project.
* Differentiate the linePath from the filePath.
* Improve comment in premake5.lua + to kick of CI.
* Kick CI.
|
|
* Add shader object parameter binding to renderer_test.
* remove multiple-definitions.hlsl
* Fix cuda implementation.
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Mangling/module name extraction for GenericDecl
* Add comment on SerialFilter to explain re-enabling Stmt.
* Support setting up SyntaxDecl when reconstructed after deserialization.
* Improvements to setup SyntaxDecl.
* Fix typo so can read compressed SourceLocs.
* Fix issue with SourceManger.
* Simple test for serializing out stdlib and reading back in.
* Fix calling convention.
* Add override to StdLib impls.
* Fix typo.
* Apply testing to an actual compute test when using load-stdlib
Make -load/compile-stdlib processable by Slang
Move out testing into util into TestToolUtil so can be shared.
* Slightly more concise setup of session.
* Fix some errors introduced with session handling.
* Made setup for compile same across slangc and slangc-tool.
|
|
* Use integer RTTI/witness handles in existential tuples.
* Fix clang error.
* Fix IR serialization to use 16bits for opcode.
* Undo accidental comment change.
* Use variable length encoding for opcode.
* Fix compile error.
* Fixing issues
* Fix code review issues.
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Fix handling of access modifiers inside type definition.
* Fix access problem for AST node.
Make dumping produce a single function with switch, to potentially make available without Dump specific access.
* WIP on serialization design doc.
* Remove project references to previously generated files.
* More docs on serialization design.
* Improve serialization documentation.
Remove unused function from IRSerialReader.
* Small fixes around naming. Remove long comment from slang-serialize.h - as covered in serialization.md
* Remove long comment in slang-serialize.h as covered in serialization.md
* More information about doing replacements on read for AST and problems surrounding.
* Typo fix.
* Spelling fixes.
* Value serialize.
* Value types with inheritence.
* Use value reflection serial conversion for more AST types
* Use automatic serialization on more of AST.
* Get the types via decltype, simplifies what the extractor has to do.
* Update the serialization.md for the value serialization.
* Small doc improvements.
* Update project.
* Remove ImportExternalDecl type
Added addImportSymbol and ImportSymbol type
Fixed bug in container which meant it wouldn't read back AST module
* Because of change of how imports and handled, store objects as SerialPointers.
* First pass symbol lookup from mangled names.
* Cache current module looked up from mangled name.
* Fix SourceLoc bug.
Improve comments.
* Added diagnostic on mangled symbol not being found
* Fix typo.
* WIP serializing stdlib.
* WIP serializing stdlib in.
* Fix problem serializing arrays that hold data that is already serialized.
* Remove clash of names in MagicTypeModifier.
* Make conversion from char to String explicit.
Fix reference count issue with SerialReader.
* Add code to save/load stdlib.
* Use return code to avoid warning - SerialContainerUtil::write(module, options, &stream))
* Make all String numeric ctors explicit.
Added isChar to UnownedStringSlice.
Added operator== for UnownedStringSlice to String to avoid need to convert to String and allocate.
* Add error check to readAllText.
* tabs -> spaces on String.h
* tab -> spaces String.cpp
* Remove msg for StringBuilder, just build inplace for exceptions.
* Check SerialClasses - for name clashes.
Renamed Modifier::name as Modifier::keywordName
* Handling of extensions when deserializing AST - updating the moduleDecl->mapTypeToCandidateExtensions
Co-authored-by: Tim Foley <tim.foley.is@gmail.com>
|
|
|
|
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|
|
* Support shader parameters that are an array of existential type.
* Rename to getFirstNonExistentialValueCategory
Co-authored-by: Yong He <yhe@nvidia.com>
|
|
* Allow unspecialized existential shader parameters (dynamic dispatch).
* Fixes.
* Fixes
* disable cuda test
|
|
* Support dynamic existential shader parameters in render-test
* Fix linux build error.
* Fixes.
* Fix code review issues.
* Fix gcc error.
* More fixes.
* More fixes.
|
|
* First pass at incorporating nvapi into test harness.
* D3d12 Atomic Float Add via NVAPI working
* Dx12 atomic float appears to work.
* Atomic float add on Dx12.
* Added atomic64 feature addition to vk.
Fix correct output for atomic-float-byte-address.slang
* Disable atomic float failing tests.
* Upgraded VK headers.
* Detect atomic float availability on VK.
* Try to get test working for in64 atomic.
* Made HLSL prelude controlled via the render-test requirements.
* Added -enable-nvapi to premake.
* Fix D3D12Renderer when NVAPI is not available.
* Small improvements to VKRenderer.
* Improve atomic documentation in target-compatibility.md.
* Fixed NVAPI working on D3D12.
* Test for specific NVAPI features.
* Remove requiredFeatures from Renderer::Desc as was ignored. Tried to document more around nvapiExtnSlot.
* Readded requiredFeatures to Renderer::Desc
* Improve comments in the tests.
|
|
* First pass at incorporating nvapi into test harness.
* D3d12 Atomic Float Add via NVAPI working
* Dx12 atomic float appears to work.
* Atomic float add on Dx12.
* Added atomic64 feature addition to vk.
Fix correct output for atomic-float-byte-address.slang
* Disable atomic float failing tests.
* Upgraded VK headers.
* Detect atomic float availability on VK.
* Try to get test working for in64 atomic.
* Made HLSL prelude controlled via the render-test requirements.
* Added -enable-nvapi to premake.
* Fix D3D12Renderer when NVAPI is not available.
* Small improvements to VKRenderer.
* Improve atomic documentation in target-compatibility.md.
|
|
Entry point `uniform` parameters were a feature of the original Cg and HLSL, but have not been used much in production shader code. One of our goals on Slang is to reduce the (ab)use of the global scope, so bringing entry point `uniform` parameters up to a greater level of usability is an important goal.
Some policy choices about how global vs. entry-point `uniform` parameters behave have already been made, that shape decisions looking forward:
* For DXBC/DXIL, it makes the most sense to follow the lead of fxc/dxc, by treating entry point `uniform` parameters as a kind of syntax sugar for global shader parameters. Any parameters of "ordinary" types are bundles up into an implicit constant buffer, and all the resources (including the implicit constant buffer) are assigned `register`s just as for globals. It is up to the application to decide how to bind those parameters via a root signature (using root descriptors, root constants, descriptor tables, local vs. global root signature, etc.)
* For CPU, it makes sense to pass global vs. entry-point parameters as two different pointers, although the details of what we do for CPU are the least constrained across all current targets.
* For CUDA compute, it makes the most sense to map global shader parameters to `__constant__` global data, and entry-point `uniform` parameters to kernel parameters. This choice ensures that the signature of a kernel when translated from Slang->CUDA follows the Principle of Least Surprise, at the cost of making entry-point vs. global parameters be passed via different mechanisms.
* For OptiX ray tracing, it makes sense to expand on the precedent from CUDA compute: pass global parameters via global `__constant__` data (as is already expected by OptiX for whole-launch parameters), and pass entry-point `uniform` parameters via the "shader record." This establishes a precedent that for ray-tracing shaders, global-scope parameters map to the "global root signature" concept from DXR, while entry-point `uniform` parameters map to a "local root signature" or "shader record."
* For Vulkan ray tracing, the precedent from OptiX then argues that entry-point `uniform` parameters should map to the Vulkan "shader record" concept (and thus cannot support things like resource types).
* The remaining interesting case is what to do for non-ray-tracing shaders on Vulkan.
The dev team agrees that the most reasonable choice to make for non-ray-tracing Vulkan shaders is to map entry-point `uniform` parameters to "push constants." In particular, this makes it easy to express the case of a compute kernel with direct parameters of ordinary/value types in the way that will be implemented most efficiently.
The big picture is then that a kernel like:
```hlsl
void computeMain(uniform float someValue) { ... }
```
will map to output GLSL like:
```glsl
layout(push_constant)
uniform
{
float someValue;
} U;
void main() { ... }
```
If the user really wanted a constant-buffer binding to be created instead, they can easily change their input to make the buffer explicit:
```hlsl
struct Params { float someValue; }
void computeMain(uniform ConstantBuffer<Params> params) { ... }
```
(Forcing the user to be explicit about the desire for a buffer here creates a nice symmetry between Vulkan and CUDA; in the first case the user sets up the data in host memory and passes it to the GPU by copy, while in the second case the user must allocate and set up a device-memory buffer for the data. This symmetry extends to D3D if the application chooses to map entry-point `uniform` parameters to root constants.)
This change implements logic in the "parameter binding" part of the Slang compiler to make sure that entry-point `uniform` parameters are wrapped up in a push-constant buffer rather than an ordinary constant buffer for non-ray-tracing shaders on Vulkan (and in a shader record "buffer" for the ray-tracing case).
The majority of the actual work was in adding support for root/push constants to the test framework and the graphics API abstraction it uses. To be clear about that support:
* Root constant ranges are (perhaps confusingly) treated as a new kind of "slot" that can appear on a descriptor set. This choice ensures that the implicit numbering of registers/spaces used by the back-ends can account for these ranges correctly.
* The `TEST_INPUT` lines are extended to allow a `root_constants` case that behaves more or less like `cbuffer`
* The CPU and CUDA paths can treat a `root_constants` input identically to a `cbuffer`. They already allocate the actual buffers based on reflection, and just use `cbuffer` as a directive that causes bytes to be copied in.
* On D3D12 and Vulkan, a descriptor set allocates a `List<char>` to hold the bytes of root constant data assigned into it, and these bytes are flushed to the command list when the table is actually bound (usually right before rendering).
* On D3D11, a descriptor set treats a root constant range more or less like a constant buffer range (with a single buffer), except that it also automatically allocates a buffer to hold the data. Assigning "root constant" data automatically copies it into that buffer.
The small number of tests that used entry-point `uniform` parameters of ordinary types were updated to use the new `root_constant` input type, and the bugs that surfaced were fixed.
A new test to confirm that entry-point `uniform` parameters map to the shader record for VK ray tracing was added.
An important but technically unrelated change is the removal of the `DescriptorSetImpl::Binding` type and related function from the Vulkan implementation of `Renderer`. That type was created to ensure that objects that are bound into a descriptor set don't get released while the descriptor set is still alive, but the implementation relied on a complicated linear search to check for existing bindings, which could create a performance issue for descriptor sets that include large arrays of descriptors. The new implementation makes use of the approach already present in the various `Renderer` implementations (including the Vulkan one) for assigning ranges in a descriptor set a flat/linear index for where their pertinent data is to be bound. As a result, the Vulkan `DescriptorSetImpl` now uses a single flat array of `RefPtr`s to track bound objects, and has no need for linear search when binding.
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
The Big Picture
===============
Given input Slang code like:
```hlsl
Texture2D gA;
[shader("compute")]
void kernelFunc(uniform Texture2D b, uint3 tid : SV_DispatchThreadID)
{ ... }
```
the existing CUDA code generation strategy would always generate a kernel with a signature like:
```c++
struct GlobalParams { Texture2D gA; }
struct EntryPointParams { Texture2D b; }
extern "C" __global__
void kernelFunc(EntryPointParams* entryPointParams, GlobalParams* globalParams)
{ ... }
```
This choice was consistent with the conventions of the CPU kernel target, and shares the advantage that it is easy for the user to data-drive the logic for filling in parameters and then invoking a kernel.
However, the approach outlined above has two serious problems when used for CUDA kernels:
* First, it defies the programmer's expectation about what an "equivalent" CUDA kernel signature would be, which makes it awkward for a developer to invoke this kernel from CUDA C++ host code (especially in the context of an app that might also run hand-written CUDA kernels).
* Second, the performance of this approach suffers because every access to a global or entry point parameter turns into a load from global memory. In contrast, a typical hand-written CUDA kernel passes its parameters via an implementation-specific path that (for current CUDA platforms) seems to be equivalent to `__constant__` memory in performance.
This change alters the convention so that the Slang compiler takes the code from the top of this message and translates it into something like:
```c++
struct GlobalParams { Texture2D gA; }
__constant__ GlobalParams SLANG_globalParams;
extern "C" __global__
void kernelFunc( Texture2D b )
{ ... }
```
This translation alleviates both problems with the current translation:
* The signature of the generated CUDA kernel function is as close to that of the original as is possible (we had to eliminate the `SV_*`-semantic varying inputs), and should directly match what the programmer would expect in common cases.
* Entry-point parameters are passed via CUDA kernel parameters, and should thus match in performance. Global parameters are passed via a variable in `__constant__` memory, and thus should also perform as well as possible/expected.
Detailed Changes
================
* Disable the `collectEntryPointUniformParams` pass for CUDA, so that entry-point `uniform` parameters are *not* bundles into a single `struct` and/or `ConstantBuffer`.
* When targeting CUDA, disable the logic for generating an entry-point parameter for passing in the global shader parameter(s)
* Allow `CLikeSourceEmitter` subclasses to override the name generated for entry-point symbols, and use this to add the required prefix for each OptiX kernel type when translating a ray-tracing kernel.
* Add logic to emit "parameter groups" in a specialized way for CUDA (this is the same approach that allows us to generate `cbufffer { ... }` declarations for fxc). A global-scope parameter group will turn into a global `__constant__` variable called `SLANG_globalParams` (that name becomes part of the ABI for Slang-compiled shaders).
* Update the logic in `render-test` for loading and invoking CUDA kernels to handle the new policy.
The last bullet there merits expansion, since it is indicative of the work a client using Slang would have to go through to use our generated kernels with the new policy:
* When loading a CUDA module with one or more kernels, we also use `cuModuleGetGlobal` to query the address of the `SLANG_globalParams` symbol in that CUDA module. That pointer needs to be used when setting global parameter values to be used by kernels in that CUDA odule.
* Because our existing `BindPoint` logic for CUDA always sets up parameter data in GPU memory, we end up having to copy the entry-point parameter data from GPU memory to host memory. This step would ideally be skipped in a codebase that understands the correct policy, but it is a bit unfortunate that it is no longer trivially correct for an application to store all parameter data in GPU memory.
* Before invoking the kernel, we need to use a `cudaMemcpyAsync` to copy from the prepared GPU memory for global parameters over to the `SLANG_globalParams` symbol associated with the kernel to be invoked. Because this operations is issued on the same CUDA stream as the kernel call, it is guaranteed to not overlap with GPU kernel execution.
* When invoking the kernel, we take advantage of the seldom-used `CU_LAUNCH_PARAM_BUFFER_POINTER` facility to specify a contiguous memory region with all the entry-point parameters in it instead of passing each entry-point parameter separately. Given Slang reflection it is also possible to query the offset of each entry-point parameter in the buffer, so we could invoke the kernel in the traditional fashion as well. The choice here is up to the application.
Caveats
=======
* This is a breaking change, and any subsequent release will need to reflect that fact. Any customers who rely on Slang's current CUDA codegen strategy are likely to be surprised by this change, and I don't see an easy way to give them a more gentle transition.
* This change does *not* remove the logic that introduces a `KernelContext` type for code that requires it. That means that things like `static` global variables can continue to work on CUDA for now, but we know that those are not going to be something we can support in the long-term with separate compilation.
* While the policy implemented in this change is a reasonable default, it is still not going to perfectly match expecations for some developers. In particular, some developers who are familiar with both D3D and CUDA will likely wonder why a global `cbuffer` in Slang translates to a global-memory pointer in the output CUDA instead of one global `__constant__` variable per `cbuffer`. A more detailed alternate translation would generate a distinct global `__constant__` variable for each top-level constant buffer or parameter block. We may need to refine the translation even more based on feedback from users who care about how we handle global-scope parameters.
* Recent changes in Slang have broken the logic that handles the OptiX "shader record" as an alternative mechanism for passing entry-point parameters. In order to get any level of OptiX support up and running we will have to change the IR passes that run on CUDA kernels to actually run the "collection" of `uniform` parameters for ray tracing stages, and then to replace references to the resulting parameter with a call to the function to access the shader record.
* The use of `SLANG_globalParams` here works well enough in the case of whole-program compilation; every `CUmodule` ends up with (zero or) one parameter with this name, and an application can just hard-code it. As a mechanism it wouldn't work in the presence of separately-compiled modules that might introduce their own global parameters (including cases like constant lookup tables that really want to be at the global scope). An alternative approach would have Slang generate output PTX for each module, where a module has an optional global symbol for its own global-scope parameters (with a mangled name that is based on the module name), and then a linked CUDA binary has all of those distinct symbols. Such an approach would be compatible with module-at-a-time reflection and parameter binding, but would lead to another breaking change down the line for code that switches to `SLANG_globalParams`.
|