slang.git - Making it easier to work with shaders

	Commit message (Collapse)	Author	Age
*	Various Fixes to gfx, reflection and emit. (#1867)	Yong He	2021-06-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Various Fixes to gfx, reflection and emit. - Fix GLSL emit to properly output `bitsTo` functions for `IRBitCast` insts. - Add line directive mode setting for `ISession`. - Extend `TypeLayout::getElementStride` to handle `VectorType` case. - Fix `IDevice::readBufferResource` 's D3D12 implementation to copy only the requested bytes out. - Fix `render-test` to use the `ISession` from `gfx` instead of creating its own `ISession` to make sure `gfx` and `render-test` agree on WitnessTable and RTTI IDs. - Extend `render-test` to support filling vector and matrix values in the new `set x = ...` TEST_INPUT syntax. - Add a `dynamic-dispatch-15` test case to make sure packing / unpacking works correctly across all targets, and to make sure render-test's RTTI/WitnessTable ID filling logic is correct for non-trivial cases. * Remove default-major test * Fix cyclic reference in `ExtendedTypeLayout`. * Move `lineDirectiveMode` setting to `TargetDesc`. Add `structureSize` to `TargetDesc` and `SessionDesc` for future binary compatibility. * Cleanup. Co-authored-by: Yong He <yhe@nvidia.com>
*	Rework shader object specialization control interface. (#1857)	Yong He	2021-05-25
\|
*	Allow overriding specialization args via `IShaderObject`. (#1854)	Yong He	2021-05-25
\| \| \| \| \| \| \|	* Allow overriding specialization args via `IShaderObject`. * Fixes. Co-authored-by: T. Foley <tfoleyNV@users.noreply.github.com>
*	Improvements in -X support (#1852)	jsmall-nvidia	2021-05-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * Added SourceLoc handling for command line parsing. * Fix typo in debug. * Fix issue around the DiagnosticSink used in options parsing not having a writer available - by having DiagnosticSink parenting. * Small rename for clarity. * WIP extracting command line args for downstream tools. * Unit tests/bug fixes around extracting args. * Use DownstreamArgs in the EndToEndCompileRequest * Passing downstream compiler options downstream. * Fix issue with endToEndReq being nullptr. * Fix issue with diagnostics number change. * Small improvements to how the source line is displayed if it's too long. Default to 120, as suggested in previous review. * Make render test use x-args parsing and CommandArgReader. * Added missing diagnostics. * More DownstreamArgs to linkage so can be seen by 'components'. Added dxc-x-arg test. * Used combination of name and args instead of two Lists, which whilst equivalent was perhaps a little confusing. * Added documentation for -X support. * Added test for x-args parsing diagnostic. Improved diagnostic with list of known names. * Fix issues from merge. * Fix lookup for -matrix-layout-column-major in render test. * Remove commented out line.
*	[gfx] Support StructuredBuffer<IInterface>. (#1851)	Yong He	2021-05-21
\| \| \|	Co-authored-by: T. Foley <tfoleyNV@users.noreply.github.com>
*	`gfx` DebugCallback and debug layer. (#1822)	Yong He	2021-04-29
\| \| \|	* `gfx` DebugCallback and debug layer.
*	Remove resource `Usage` from `gfx` interface. (#1813)	Yong He	2021-04-24
\| \| \| \| \| \| \| \| \| \| \| \| \|	* Fix `model-viewer` crash when using Vulkan. Fixing an issue in shader object layout creation for to make sure a correct descriptor set layout is calculated for types that need an implicit constant buffer. * Fix formatting. * Fixes. * Fix memory leak in vulkan. * Remove resource `Usage` from `gfx` interface.
*	Improve robustness of gfx lifetime management. (#1788)	Yong He	2021-04-08
\| \| \| \| \| \| \| \| \|	* Improve robustness of gfx lifetime management. * fix clang error * fix clang error * Fix clang warning
*	Transient root shader object. (#1782)	Yong He	2021-04-05
\|
*	Refactor D3D12 renderer root signature creation (#1779)	Tim Foley	2021-04-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change originated as an attempt to re-enable a test case, but it has ended up disabling more tests (for good reasons) than it re-enables. The main change here is a significant overhaul of the way that the D3D12 render path extracts information from the Slang reflection API to produce a root signature. There were also some supporting fixes in the reflection information to make sure it returns what the D3D12 back-end needed. The big picture here is that the D3D12 path now uses the descriptor ranges stored in the reflection data more or less directly. It still needs to use register/space offset information queried via the "old" reflection API, but it only does so at the top level now, for the program and entry points themselves. All other layout information is derived directly from what Slang provides. Smaller changes: * The "flat" reflection API was expanded to include `getBindingRangeDescriptorRangeCount()` which was clearly missing. * The "flat" reflection results for a constant buffer or parameter block that didn't contain any uniform data and was mapped to a plain constant buffer needed to be fixed up. That logic is still way to subtle to be trusted. * Several additional tests were disabled that relied on static specialization, global/entry-point generi type parameters, structured buffers of interfaces or other features we don't officially support with shader objects right now. All of the affected tests were somehow passing by sheer luck and because they often passed in specialization arguments via explicit `TEST_INPUT` lines. * The `inteface-shader-param` test is re-enabled now that we can properly describe its input with the new `set` mode on `TEST_INPUT` * `ShaderCursor::getElement()` can now be used on structure types (in addition to arrays) to support by-index access to fields * The `TEST_INPUT` system was expanded to support both by-name and by-index setting of structure fields for aggregates * The `TEST_INPUT` system was expanded to allow an `out` prefix to mark parts of an expression as outputs on a `set` lines * The `TEST_INPUT` system was expanded so that anything that would be allowed on a `TEST_INPUT` line by itself (like `ubuffer(...)`) can now be used as a sub-expression on a `set` line Co-authored-by: Yong He <yonghe@outlook.com>
*	`gfx` explicit transient resource management. (#1774)	Yong He	2021-03-31
\|
*	Add a streamlined syntax for TEST_INPUT lines (#1768)	Tim Foley	2021-03-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change allows the `TEST_INPUT` syntax used by `render-test` to support aggregate values with a single input line more easily. The test writer can now use a syntax like: ``` //TEST_INPUT:set someVar = 3.0 ``` Input lines that start with the `set` keyword will now use a simpler `dst = src` format (instead of `dst:name=src` as the existing syntax used). The right-hand side expression can include: * Numeric literals, both integer and floating-point (currently only supporting 32-bit scalar types; we could fix this later) * Arrays, consisting of zero or more comma-separated expressions inside `[]` * Aggregates, consisting of zero or more comma-separated "fields" inside `{}`. A field can either be `name: <expr>` or just `<expr>` * Objects, which can be written as either `new SomeType{ <fields> }` or `new{ <fields> }` in the case where the type is know-able from context With this approach is should be possible to support almost arbitrary-type inputs on a single line. For now, I have used this support to re-enable an existing test that had been disabled due to lack of support for setting up arrays of objects. Major things left to do: * The new syntax doesn't support the existing cases we had for `Texture2D`, etc. Those should probably be supported but I'd like to find a way to do it without duplicating the parsing logic (ideally the value cases from the existing code should Just Work in the new model) * There is no support right now for non-32-bit scalar types * It would be good if this support (and the shader cursor system) supported treating vectors like aggregates * The actual value-setting logic doesn't currently handle aggregates without field names, so `{ a:0, b:1 }` will work but `{ 0, 1 }` will parse but fail when it comes time to set values * While this approach lets complicated values be set with a single line, that isn't always what a user will want to do: in the future we should provide a way to break up an aggregate value over multiple lines that is consistent with this approach * Once we port all of the relvant tests over, it would be great to drop the `set` prefix and have these lines look as simple and conventional as possible
*	Clean up render-test handling of input (#1766)	Tim Foley	2021-03-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The original goal of this change was to streamline the `TEST_INPUT` system by eliminating options that are no longer relevant once we have eliminated the non-shader-object execution paths. The result is more or less a re-implementation/refactor of the logic around how input is parsed and represented, that tries to set things up for a more general sytem going forward. The main changes isthat the `ShaderInputLayout` no longer tracks a simple flat list of `ShaderInputLayoutEntry` (that is a kind of pseudo-union of the various buffer/texture/value cases), and it instead uses a hierarchical representation composed of `RefObject`-derived classes to represent "values." There are several "simple" cases of values * Textures * Samplers * Uniform/ordinary data (`uniform`) * Buffers composed of uniform/ordinary data (`ubuffer`) Then there are composed/aggregate values that nest other values: * An aggregate value is a set of fields which are name/value pairs. It can be used to fill in a structure, for example. * An array value is a list of values for the elements of an array. It can be used to fill out an array-of-textures parameter, for example. * A combined texture/sampler value is a pair of a texture value and a sampler value (easy enough) * An object holds an optional type name for a shader object to allocate (it defaults to the type that is "under" the current shader cursor when binding), and a nested value that describes how to fill in the contents of that object Finally there are cases of values that are just syntactic sugar: * A `cbuffer` is just shorthand for creating an object value with a nested uniform/ordinary data value The big idea with this recursive structure is that it gives us a way to handle more arbitrary data types with name-based binding. Supporting this new capability requires changes to both how input layouts get parsed, and also how they get bound into shader objects. On the parsing side, things have been refactored a bit so that parsing isn't a single monolithic routine. The refactor also tries to make it so that the various options on an input item (e.g., the `size=...` option for textures) are only supported on the relevant type of entry (so you can't specify as many useless options that will be ignored). The bigger change to parsing is that it now supports a hierarchical structure, where certain input elements like `begin_array` can push a new "parent" value onto a stack, and subsequent `TEST_INPUT` lines will be parsed as children of that item until a matching `end` item. This approach means that we can now in principle describe arbitrary hierarchical structures as part of test input without endlessly increasing the complexity of invididual `TEST_INPUT` lines. On the binding side, we now have a central recursive operation called `assign(ShaderCursor, ShaderInputLayout::ValPtr)` that assigns from a parsed `ShaderInputLayout` value to a particular cursor. That operation can then recurse on the fields/elements/contents of whatever the cursor points to. Major open directions: * With this change it is still necessary to use `uniform` entries to set things like individual integers or `float`s and that is a little silly. It would be good to have some streamlines cases for setting individual scalar values. * Further, once we have a hierarchical representation of the values for `TEST_INPUT` lines, it becomes clear that we really ought to move to a format more like `TEST_INPUT: dstLocation = srcValue;` where `srcValue` is some kind of hierarchial expression grammar. Refactoring things in this way should make the binding logic even more clear and easy to understand. The refactored parser should make parsing hierarchical expressions easier to do in the future (even if it uses the push/pop model for now) * One detailed note is that the representation of buffers in this change is kind of a compromise. Just as an "object" value is a thin wrapper around a recursively-contained value for its "content" it seems clear that a buffer could be represented as a wrapper around a content value that could include hierarchical aggregates/objects instead of just flat binary data (this would be important for things like a buffer over a structure type that lays out different on different targets). The main problem right now with changing the representation is actually needing to compute the size of a buffer based on its content, so that can/should be addressed in a subsequent change. Details: * The base `RenderTestApp` class and the `ShaderObjectRenderTestApp` classes have been merged, since the hierarchy no longer serves any purpose. * Disabled the tess that rely on `StructuredBuffer<IWhatever>` because they aren't really supported by our current shader object implementation * Replaced used of `Uniform` and `root_constants` in `TEST_INPUT` lines with just `uniform` * Removed a bunch of uses of `stride` from `cbuffer` inputs, where it wasn't really correct/meaningful * Added the `copyBuffer()` operation to VK/D3D renderers, along with some missing `Usage` cases to support it. * Made `ShaderCursor` handle the logic to look up a name in the entry points of a root shader object, rather than just having that logic in `render-test`. (We probably need to make a clear design choice on this issue)
*	Remove old code paths from render-test (#1760)	Tim Foley	2021-03-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Remove old code paths from render-test Historically, the `render-test` tool was using three different code paths: * One based on `gfx` and manual (non-reflection-based) parameter setting, used for OpenGL, D3D11, D3D12, and Vulkan * One for CPU that used reflection-based parameter setting but shared no code with the first * One for CUDA that used reflection-based parameter setting and shared some, but not all, code with the CPU path Recently we've updated `render-test` to include a fourth option: * Using `gfx` and the "shader object" system it exposes for a unified reflection-based parameter-setting system taht works across OpenGL, D3D11, D3D12, Vulkan, CUDA, and CPU This change removes the first three options and leaves only the single unified path. A sa result, a bunch of code in `render-test` is no longer needed, and the codebase no longer relies on things like the `IDescriptorSet`-related APIs in `gfx`. Several existing tests had to be disabled to make this change possible. Those tests will need to be audited and either re-enabled once we fix issues in the shader object system, or permanently removed if they don't test stuff we intend to support in the long run (e.g., global-scope type parameters, which aren't a clear necessity). * fixup: CUDA detection logic
*	Add a CPU renderer implementation (#1750)	Tim Foley	2021-03-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add a CPU renderer implementation This change adds a CPU back-end to `gfx` and ensures that most of our existing CPU tests pass when using it. Detailed notes: * Most of the CPU renderer implementation is copy-pasted from the CUDA case, so they share a lot of similar logic * The main addition to the CPU renderer is a semi-complete implementation of host-memory textures. The logic here handles all the main shapes (Buffer, 1D, 2D, 3D, Cube) and all the currently-supported `Format`s that are sample-able as-is (no D24S8). The implementation is not intended to be fast, and it currently only does nearest-neighbor sampling, but otherwise it tries to avoid cutting too many corners and should be ar reasonable starting point for a more complete (but not performance-oriented) implementation. * Refactored the CPU prelude `IRWTexture` interface to inherit from `ITexture`, since in most cases a single type will end up implementing both. It might be worth it to collapse it all down to a single interface later. * Changed the CPU prelude `ITexture`/`IRWTexture` interface so that it takes both a pointer and a size for output arguments. This change seems necessary to allow a shader variable declared as a `Texture2D<float>` to fetch a single `float` when the underlying texture might be using RGBA32F. * Added to the `IComponentType` public API so that we can query a "host callable" for an entry point and not just a binary. * Turned off the `-shaderobj` flag on two tests that weren't yet compatible with shader objects but still had the flag left in on the path (since previously the CPU path always used the non-`gfx` non-shader-object logic anyway) * Disabled one test (`dynamic-dispatch-11`) that relied on the `ConstantBuffer<IInterface>` idiom that we know we are planning to chagne soon anyway. * Made a few changes to the CUDA path to bring it into line with what I added for the CPU path. These were mostly bug fixes around indexing logic for sub-objects and resources. * fixup
*	Swapchain resize and rename to `IDevice` (#1741)	Yong He	2021-03-10
\| \| \| \| \|	* Swapchain resize * Fix.
*	Refactor `gfx` to surface `CommandBuffer` interface. (#1735)	Yong He	2021-03-04
\| \| \| \| \| \| \| \| \| \| \| \| \|	* Refactor `gfx` to surface `CommandBuffer` interface. * Fixes. * Fix code review issues, and make vulkan runnable on devices without VK_EXT_extended_dynamic_states. * Update solution files * Move out-of-date examples to examples/experimental Co-authored-by: Yong He <yhe@nvidia.com>
*	Explicit swapchain interface in `gfx`. (#1726)	Yong He	2021-02-24
\| \| \| \| \| \| \| \| \|	* Explicit swapchain interface in `gfx`. * Correctly return nullptr when `IRenderer` creation failed. * Fix crashes on CUDA tests. * Cleanups.
*	Make gfx library visible to external user. (#1719)	Yong He	2021-02-19
\| \| \| \| \|	* Make gfx library visible to external user. * Fixup
*	Streamline shader object creation (#1717)	Tim Foley	2021-02-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change kind of rolls together two different simplifications: 1. The `createShaderObject()` shouldn't really need to take an `IShaderObjectLayout` because it could just take the `slang::TypeLayoutReflection` instead and create the shader-object layout behind the scenes. 2. For that matter, it needn't take a `slang::TypeLayoutReflection` either, becaues it could just take a `slang::TypeReflection` and query the layout of that type behind the scenes. The combination of these two changes means: * `IShaderObjectLayout` is gone from the public API, as is `createShaderObjectLayout()` * `createShaderObject()` directly takes a `slang::TypeReflection` and allocates a shader object of that type The result is simpler and more streamlined application code. Note that under the hood the implementation still has shader-object layouts, using the `ShaderObjectLayoutBase` class. A few locations had to change to use `RefPtr`s instead of `ComPtr`s now that the class is no longer a public COM-lite API type. The hope is that this change makes it easier to allocate/cache layouts for things like specialized types "under the hood," as is needed to implement parameter setting for static specialization.
*	Add `SampleGrad` overload for lod clamp. (#1711)	Yong He	2021-02-17
\| \| \| \| \| \| \| \| \| \| \| \|	* Add `SampleGrad` overload for lod clamp. * Fix gfx to run the test on vulkan. * Whitespace change to trigger CI build * remove presentFrame call in render-test Co-authored-by: Yong He <yhe@nvidia.com> Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
*	DX12 & NVAPI fixes (#1695)	jsmall-nvidia	2021-02-08
\| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * Fix bugs with m_features on Dx12 and gl. Fix issue about GFX_NVAPI availability. * Fix handling of SLANG_E_NOT_AVAILABLE on renderer startup. * Clarify comment. * Improve comment.
*	[gfx] Shader-object driven shader compilation. (#1688)	Yong He	2021-02-04
\|
*	Make own a slang session. (#1678)	Yong He	2021-01-27
\|
*	Integrate reflection more deeply into gfx layer (#1677)	Tim Foley	2021-01-26
\|
*	Make `ShaderCursor` no longer depend on core. (#1661)	Yong He	2021-01-19
\|
*	Make `gfx` compile to a DLL. (#1660)	Yong He	2021-01-17
\| \| \| \| \| \| \| \| \|	* Make `gfx` compile to a DLL. * Fix cuda * Fix cuda build * Bug gl screen capture bug.
*	Convert more tests to use shader objects (#1659)	Tim Foley	2021-01-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change converts a large number of our existing tests to use the `ShaderObject` support that was added to the `gfx` layer. In many cases, tests were just updated to pass `-shaderobj` and the result Just Worked. In other cases, a `name` attribute had to be added to one or more `TEST_INPUT` lines. For tests that did not work with shader objects "out of the box," I spent a little bit of time trying to get them work, but fell back to letting those tests run in the older mode. Future changes to the infrastructure will be needed to get those additional tests working in the new path. Along with the changes to test files, the following implementation changes were made to get additional tests working: * Because the shader object mode uses explicit register bindings (from reflection), the hacky logic that was offseting `u` registers for D3D12 based on the number of render targets gets disabled (by another hack). * The "flat" reflection information coming from Slang was not correctly reporting "binding ranges" for things that consumed only uniform data (which would be everything on CUDA/CPU), so it was refactored to properly include binding ranges for anything where the type of the field/variable implied a binding range should be created (even if the `LayoutResourceKind` was `::Uniform`). * A few fixes were made to the CUDA implementation of `Renderer`, in order to get additional tests up and running. Most of these changes had to do with texture bindings, which hadn't really been tested previously. In addition, a few changes were made that were attempts at getting more tests working, but didn't actually help. These could be dropped if requested: * As a quality-of-life feature (not being used) the `object` style of `TEST_INPUT` line is upgraded to support inferring the type to use from the type of the input being set. * Any `object` shader input lines get ignored in non-shader-object mode.
*	COM-ify all slang-gfx interfaces. (#1656)	Yong He	2021-01-14
\| \| \|	* COM-ify all slang-gfx interfaces.
*	Make `gfx::Renderer` a COM interface. (#1653)	Yong He	2021-01-11
\| \| \| \| \| \| \| \| \| \| \| \| \|	* Make `gfx::Renderer` a COM interface. This is a first step towards making the `gfx` library expose a COM compatible DLL interface. Remaining classes will come as separate PRs. * Fixup project files * Fix calling conventions * Make gfx::createRenderer() functions increase ref count by 1 Make renderer createFunc return via out parameter
*	Implements CUDA renderer in gfx. (#1637)	Yong He	2020-12-11
\| \| \| \| \| \| \| \| \|	* Implements CUDA renderer in gfx. * Revert unnecessary change. * Revert unnecessary changes. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
*	Move ShaderObject to be under renderer interface. (#1633)	Yong He	2020-12-10
\| \| \| \| \| \| \|	* Move ShaderObject to be under renderer interface. * Make `createPipelineState` take `const PipelineStateDesc&`. Move ShaderCursor implementation to a cpp file
*	Add shader object parameter binding to renderer_test. (#1622)	Yong He	2020-12-03
\| \| \| \| \| \| \| \| \|	* Add shader object parameter binding to renderer_test. * remove multiple-definitions.hlsl * Fix cuda implementation. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
*	Test for serializing out and reading back Stdlib (#1605)	jsmall-nvidia	2020-11-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * Mangling/module name extraction for GenericDecl * Add comment on SerialFilter to explain re-enabling Stmt. * Support setting up SyntaxDecl when reconstructed after deserialization. * Improvements to setup SyntaxDecl. * Fix typo so can read compressed SourceLocs. * Fix issue with SourceManger. * Simple test for serializing out stdlib and reading back in. * Fix calling convention. * Add override to StdLib impls. * Fix typo. * Apply testing to an actual compute test when using load-stdlib Make -load/compile-stdlib processable by Slang Move out testing into util into TestToolUtil so can be shared. * Slightly more concise setup of session. * Fix some errors introduced with session handling. * Made setup for compile same across slangc and slangc-tool.
*	Support CUDA bindless texture in dynamic dispatch code. (#1575)	Yong He	2020-10-09
\|
*	Support dynamic existential shader parameters in render-test (#1525)	Yong He	2020-09-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Support dynamic existential shader parameters in render-test * Fix linux build error. * Fixes. * Fix code review issues. * Fix gcc error. * More fixes. * More fixes.
*	NVAPI improvements (#1512)	jsmall-nvidia	2020-08-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* First pass at incorporating nvapi into test harness. * D3d12 Atomic Float Add via NVAPI working * Dx12 atomic float appears to work. * Atomic float add on Dx12. * Added atomic64 feature addition to vk. Fix correct output for atomic-float-byte-address.slang * Disable atomic float failing tests. * Upgraded VK headers. * Detect atomic float availability on VK. * Try to get test working for in64 atomic. * Made HLSL prelude controlled via the render-test requirements. * Added -enable-nvapi to premake. * Fix D3D12Renderer when NVAPI is not available. * Small improvements to VKRenderer. * Improve atomic documentation in target-compatibility.md. * Fixed NVAPI working on D3D12. * Test for specific NVAPI features. * Remove requiredFeatures from Renderer::Desc as was ignored. Tried to document more around nvapiExtnSlot. * Readded requiredFeatures to Renderer::Desc * Improve comments in the tests.
*	Vulkan update/NVAPI support (#1511)	jsmall-nvidia	2020-08-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* First pass at incorporating nvapi into test harness. * D3d12 Atomic Float Add via NVAPI working * Dx12 atomic float appears to work. * Atomic float add on Dx12. * Added atomic64 feature addition to vk. Fix correct output for atomic-float-byte-address.slang * Disable atomic float failing tests. * Upgraded VK headers. * Detect atomic float availability on VK. * Try to get test working for in64 atomic. * Made HLSL prelude controlled via the render-test requirements. * Added -enable-nvapi to premake. * Fix D3D12Renderer when NVAPI is not available. * Small improvements to VKRenderer. * Improve atomic documentation in target-compatibility.md.
*	Fix CUDA build of render-test (#1316)	Tim Foley	2020-04-10
\| \| \|	The CUDA build of the render-test tool had been broken in a fixup change to #1307 (which was ostensibly adding features for the CUDA path). The fix is a simple one-liner.
*	Initial work to support OptiX output for ray tracing shaders (#1307)	Tim Foley	2020-04-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Initial work to support OptiX output for ray tracing shaders This change represents in-progress work toward allowing Slang/HLSL ray-tracing shaders to be cross-compiled for execution on top of OptiX. The work as it exists here is incomplete, but the changes are incremental and should not disturb existing supported use cases. One major unresolved issue in this work is that the OptiX SDK does not appear to set an environment variable Changes include: * Modified the premake script to support new options for adding OptiX to the build. Right now the default path to the OptiX SDK is hard-coded because the installer doesn't seem to set an environment variable. We will want to update that to have a reasonable default path for both Windows and Unix-y platforms in a later chance. * I ran the premake generator on the project since I added new options, which resulted in a bunch of diffs to the Visual Studio project files that are unrelated to this change. Many of the diffs come from previous edits that added files using only the Visual Studio IDE rather than by re-running premake, so it is arguably better to have the checked-in project files more accurately reflect the generated files used for CI builds. * The "downstream compiler" abstraction was extended to have an explicit notion of the kind of pipeline that shaders are being compiled for (e.g., compute vs. rasterization vs. ray tracing). This option is used to tell the NVRTC case when it needs to include the OptiX SDK headers in the search path for shader compilation (and also when it should add a `#define` to make the prelude pull in OptiX). This code again uses a hard-coded default path for the OptiX SDK; we will need to modify that to have a better discovery approach and also to support an API or command-line override. * One note for the future is that instead of passing down a "pipeline type" we could instead pass down the list/set of stages for the kernels being compiled, and the OptiX support could be enabled whenever there is any ray tracing entry point present in a module. That approach would allow mixing RT and compute kernels during downstream compilation. We will need to revisit these choices when we start supporting code generation for multiple entry points at a time. * The CUDA emit logic is currently mostly unchanged. The biggest difference is that when emitting a ray-tracing entry point we prefix the name of the generated `__global__` function with a marker for its stage type, as required by the OptiX runtime (e.g., a `__raygen__` prefix is required on all ray-generation entry points). * The `Renderer` abstraction had a bare minimum of changes made to be able to understand that ray-tracing pipelines exist, and also that some APIs will require the name of each entry point along with its binary data in order to create a program. * The `ShaderCompileRequest` type was updated so that only a single "source" is supported (rather than distinct source for each entry point), and also the entry points have been turned into a single list where each entry identifies its stage instead of a fixed list of fields for the supported entry-point types. * The CUDA compute path had a lot of code added to support execution for the new ray-tracing pipeline type. The logic is mostly derived from the `optixHello` example in the OptiX SDK, and at present only supports running a single ray-generation shader with no parameters. The code here is not intended to be ready for use, but represents a signficiant amount of learning-by-doing. * The `slang-support.cpp` file in `render-test` was updated so that instead of having separate compilation logic for compute vs. rasterization shaders (which would mean adding a third path for ray tracing), there is now a single flow to the code that works for all pipeline types and any kind of entry points. * Implicit in the new code is dropping support for the way GLSL was being compiled for pass-through render tests, which means pass-through GLSL render tests will no longer work. It seems like we didn't have any of those to begin with, though, so it is no great loss. * Also implicit are some new invariants about how shaders without known/default entry points need to be handled. For example, the ray tracing case intentionally does not fill in entry points on the `ShaderCompileRequest` and instead fully relies on the Slang compiler's support for discovering and enumerating entry points via reflection. As a consequence of those edits the `-no-default-entry-point` flag on `render-test` is probably not working, but it seems like we don't have any test cases that use that flag anyway. Given the seemingly breaking changes in those last two bullets, I was surprised to find that all our current tests seem to pass with this change. If there are things that I'm missing, I hope they will come up in review. * fixup: issues from review and CI * Some issues noted during the review process (e.g., a missing `break`) * Fix logic for render tests with `-no-default-entry-point`. I had somehow missed that we had tests reliant on that flag. This required a bit of refactoring to pass down the relevant flag (luckily the function in question was already being passed most of what was in `Options`, so that just passing that in directly actually simplifies the call sites a bit. * There was a missing line of code to actually add the default compute entry points to the compile request. I think this was a problem that slipped in as part of some pre-PR refactoring/cleanup changes that I failed to re-test.
*	CUDA version handling (#1301)	jsmall-nvidia	2020-03-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* render feature for CUDA compute model. * Use SemanticVersion type. * Enable CUDA wave tests that require CUDA SM 7.0. Provide mechanism for DownstreamCompiler to specify version numbers. * Enabled wave-equality.slang * Make CUDA SM version major version not just a single digit. * Fix assert. * DownstreamCompiler::Version -> CapabilityVersion
*	Better diagnostics on failure on CUDA. (#1288)	jsmall-nvidia	2020-03-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Better diagnostics on failure on CUDA. * Catch exceptions in render-test * * Added ability to disable reporting on CUDA failures * Stopped using exception for reporting (just write to StdWriter::out() * Removed CUDAResult type * Don't set arch type on nvrtc to see if fixes CI issues. * Try compute_30 on CUDA. * Added ability to IGNORE_ a test DIsabled rw-texture-simple and texture-get-dimensions * Disable tests that require CUDA SM7.0 Use DISABLE_ prefix to disable tests. * Disable signalUnexpectedError doing printf.
*	Fixes to make all CPU compute shaders work on CUDA (#1211)	jsmall-nvidia	2020-02-08
\| \| \| \| \| \| \| \| \| \| \| \| \|	* Launch CUDA test taking into account dispatch size. * Enable isCPUOnly hack to work on CUDA. * Rename 'isCPUOnly' hack to 'onlyCPULikeBinding'. * Add $T special type. Support SampleLevel on CUDA. * Fix typo.
*	Feature/test for double behavior (#1186)	jsmall-nvidia	2020-01-29
\| \| \| \| \| \| \| \| \| \| \| \|	* Split out binding writing. * Pass in the entry type. * Take into account output type with -output-using-type Added GPULikeBindRoot Added dxbc-double-problem test. * Add the dxbc-double-problem test.
*	Synthesizing CUDA tests (#1183)	jsmall-nvidia	2020-01-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* When using setUniform clamp the amount of data written to the buffer size. * CUDA implement StructuredBuffer/ByteAddressBuffer as pointer/count as is on CPU. Allow bounds check to zero index. Update docs. * Synthesize tests. * Fix bug in CUDA output. * Fixing more tests to run on CUDA. * Added BaseType for layout of Vector and Matrix - as they are held as int32_t vector array types. * Enable unbound array support on CUDA. * Added unsized array support for CUDA documentation.
*	Slang -> CUDA kernel runs correctly in test infrastructure (#1167)	jsmall-nvidia	2020-01-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* First pass at BindLocation. * Added BindSet::init - for initializing with two input constant buffers. Needs better name, and perhaps should be another class. * Fix handling of constant buffer stripping. Improved initialization. * Trying to generalize BindLocation a little more. Split out CPULikeBindRoot. * More work to make BindLocation et al work with non uniform bindings. * Added parsing to a location. * WIP: Trying to get CPU working with BindLocation. * Describe problem of knowing the type of the reference point in the binding table. * More ideas on getBindings fix. * Remove BindSet as member of BindLocation. * Added BindLocation::Invalid * Made BindLocation able to be key in hash * Use BindLocation for bindings on BindingSet. * Added cuda and nvrtc categories to test infrastructure. Disabled CUDA synthetic tests by default. Fixed such that all tests now produce something in BindLocation style. * Use m_userIndex instead of m_userData on Resource. Move the binding setup out of cpu-compute-util (as no longer CPU specific) * Removed CPUBinding - used BindLocation/BindSet instead. Fixed some bugs around indexOf around uniform indirection. * Renamed BindSet::Resource -> BindSet::Value. * Document BindLocation. * Fixes for Clang/GCC Improve invariant requirement handling when constructing from BindPoints. * WIP: First attempt to run CUDA kernel. * Fix some issues around doing CUDA kernel launch. * Fix issues around use of cudaMemCpy . * Better cuda runtime error checking mechanism. * Fixed bug in passing parameters to cuda kernel launch. Simplified initialisation of context. * WIP: Fix CUDA runtime issues. * Add explicit CUDA synchronize so failures don't appear on implicit ones. * Fix problem emitting non shared variable on CUDA. * Fix some typos in CUDA layout. Use just a pointer for now for CUDA StucturedBuffer. * Arg order for CUDA launch was wrong. * First compute kernel runs on CUDA.
*	Bind Location (#1166)	jsmall-nvidia	2020-01-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* First pass at BindLocation. * Added BindSet::init - for initializing with two input constant buffers. Needs better name, and perhaps should be another class. * Fix handling of constant buffer stripping. Improved initialization. * Trying to generalize BindLocation a little more. Split out CPULikeBindRoot. * More work to make BindLocation et al work with non uniform bindings. * Added parsing to a location. * WIP: Trying to get CPU working with BindLocation. * Describe problem of knowing the type of the reference point in the binding table. * More ideas on getBindings fix. * Remove BindSet as member of BindLocation. * Added BindLocation::Invalid * Made BindLocation able to be key in hash * Use BindLocation for bindings on BindingSet. * Added cuda and nvrtc categories to test infrastructure. Disabled CUDA synthetic tests by default. Fixed such that all tests now produce something in BindLocation style. * Use m_userIndex instead of m_userData on Resource. Move the binding setup out of cpu-compute-util (as no longer CPU specific) * Removed CPUBinding - used BindLocation/BindSet instead. Fixed some bugs around indexOf around uniform indirection. * Renamed BindSet::Resource -> BindSet::Value. * Document BindLocation. * Fixes for Clang/GCC Improve invariant requirement handling when constructing from BindPoints.
*	Setup of runtime cuda device (#1162)	jsmall-nvidia	2020-01-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* CUDA generated first test compiles. * WIP on enabling CUDA in render-test. * Detect CUDA_PATH environmental variable to build build cuda support into render-test. Added WIP cuda-compute-util.cpp/h Added CUDA as a renderer type. * Fix libraries needed for cuda in premake. * Added -enable-cuda premake option. Defaults to false. * Creates CUDA device, loads PTX and finds entry point. * Fix some erroneous cruft from slang-cuda-prelude.h
*	Remove support for explicit register/binding syntax on TEST_INPUT (#1132)	Tim Foley	2019-11-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The `TEST_INPUT` facility allows textual Slang test cases to provide two kinds of information to the `render-test` tool: 1. Information on what shader inputs exist 2. Information on what values/objects to bind into those shader inputs Under the first category of information, there exists supporting for attaching a `dxbinding(...)` annotation to a `TEST_INPUT` which seemingly indicates what HLSL `register` the input uses. There is a similar `glbinding(...)` annotation, used for OpenGL and Vulkan. It turns out that these annotations were, in practice, completely ignored and had no bearing on how `render-test` allocates or bindings graphics API objects. There was some amount of code attempting to validate that explicit registers/bindings were being set appropriately, but the actual values were being ignored. The visible consequence of the `dxbinding` and `glbinding` annotations being ignored is issue #1036: the order of `TEST_INPUT` lines was de facto determining the registers/bindings that were being used by `render-test`. This change simply removes the placebo features and strips things down to what is implemented in practice: the `TEST_INPUT` lines do not need target-API-specific binding/register numbers, because their order in the file implicitly defines them. I added logic to the parsing of `TEST_INPUT` lines to make sure I got an error message on any leftover annotations, and went ahead and systematicaly deleted all of the placebo annotations from our test cases. If we decide to make `TEST_INPUT` lines not depend on order of declaration in the future, we can build it up as a new and better considered feature. The main alternative I considered was to keep the annotations in place, and change `render-test` and the `gfx` abstraction layer to properly respect them, but that path actually creates much more opportunity for breakage (since every single test case would suddenly be specifying its root signature / pipeline layout via a different path using data that has never been tested). The approach in this change has the benefit of giving me high confidence that all the test cases continue to work just as they had before.
*	Simple test profiling (#1062)	jsmall-nvidia	2019-09-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* First pass support for performance profiling * Test across all elements * Fix bug - sourceContents is not used, should use rawSource. * * Add ability to get prelude from API. * Allow specifying source language for render-test * Made it possible to compile a test input file as C++ * Special handling for reflection * Added C++ impl to performance-profile.slang * Remove some clang warnings. * Output profile timings on appveyor and other TC. * Remove passing around of StdWriters (can use global). Small comment improvements.