| Age | Commit message (Collapse) | Author |
|
If the user has a derived `struct` type:
```hlsl
struct Base { int b = 1; }
struct Derived : Base { int d = 2; }
```
Then it is still reasonable for them to want to use initializer lists when declaring variables using the `Derived` type:
```hlsl
Derived x = {};
Derived y = { 7, 8 };
```
This change implements two missing pieces of functionality in the Slang compiler to allow this case:
* First, when the front-end semantic checks are applied to an initializer list, if the type being initialized is a derived `struct` type it always expects to find initialization arguments for its base type before those for its fields.
* Second, when lowering an initializer-list expression from the AST to the IR, the compiler expects the first argument in the list to be the initial value for the base field (if any). This also applies to default-initialization of fields/variables.
This change slightly entangles front-end logic with the logic for how struct inheritance is lowered to the IR, but the behavior is unlikely to confuse users who expect C++-like layout.
It is worth noting that with this change it should be possible to initialize the base type using either a nested initializer list or flat arguments:
```hlsl
struct BigBase { int x; int y; int z; }
struct BigDerived : BigBase { int w; }
BigDerived a = { {1,2,3}, 4 };
BigDerived b = { 1, 2, 3, 4 };
```
This behavior should Just Work because of the existing C-like rules for initializer lists where an aggregate can be initialized by either a `{}`-enclosed block or distinct values for its leaf fields.
|
|
During lowering from AST to IR, the Slang compiler translates code that uses `struct` inheritance:
```hlsl
struct Base { int a; }
struct Derived : Base {}
```
into code where the inheritance relationship is "witnessed" by a simple field:
```hlsl
struct Base { int a; }
struct Derived { Base __anonymous_field__; }
```
The underlying bug here is that the `__anonymous_field__` that the compiler generated during IR lowering was not being given any linkage decorations (no mangled name). As a result, if multiple separately-compiled modules all access that field they could disagree on its identity as an IR instruction. This could lead to output code being generated where the declaration of `__anonymous_field__` uses one IR instruction, but accesses use another.
This change includes a fix for the issue, and a test that serves as a reproducer for the original problem.
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* WIP JSONWriter/JSONParser.
* Checking different Layout styles for JSON.
* Add slang-json-parser.h/.cpp
|
|
(#1855)
Co-authored-by: T. Foley <tfoleyNV@users.noreply.github.com>
Co-authored-by: Yong He <yonghe@outlook.com>
Co-authored-by: jsmall-nvidia <jsmall@nvidia.com>
|
|
The basic bug here was that `enum` types with an explicit tag type:
enum Color : int32_t { ... }
would have an `InheritanceDecl` implying that `Color` inherits from
`int32_t`. The problem is that this is *not* actually an inheritance
relationship, since a `Color` needs to be explicitly cast to/from an
`int32_t`.
Various parts of the compiler currently treat this case like real
inheritance, and as a result the operations taht would apply to an
`int32_t` end up applying to a `Color` as well. This particularly leads
to an ambiguity between applying the `==` operator, because it has
overloads for both the `__EnumType` and `__Builtin{something}`
interfaces.
The fix here is to explicitly exclude the `InheritanceDecl` that
represents an enumeration tag type when considering declared subtype
relationships. A more complete version of this fix would need to go
through all places in the code where `InheritanceDecl`s are used and
make sure that any places using them for true inheritnace relationships
ignore those that represent an enumeration tag type.
(An alternative option would be to use a distinct kind of `Decl` to
represent the tag-type relationship, perhaps even going so far as to
modifying the type of the relevant AST node as part of semantic
checking)
This change includes a regression test for the way this bug surfaced in
user code.
Co-authored-by: jsmall-nvidia <jsmall@nvidia.com>
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* WIP Json lexer.
* Check JSON Lex with unit test
* Add JSON escaping/unescaping of strings.
* Big fix encoding/decoding.
* Fix typo in JSON diagnostics.
* Fix typo.
* Better float testing.
|
|
|
|
* Allow overriding specialization args via `IShaderObject`.
* Fixes.
Co-authored-by: T. Foley <tfoleyNV@users.noreply.github.com>
|
|
* OptiX ray payload can now be read from and written to using the two payload register pointer method
* changing op to more descriptive name
* fixup: comment change to re-trigger CI
Co-authored-by: T. Foley <tfoleyNV@users.noreply.github.com>
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Added SourceLoc handling for command line parsing.
* Fix typo in debug.
* Fix issue around the DiagnosticSink used in options parsing not having a writer available - by having DiagnosticSink parenting.
* Small rename for clarity.
* WIP extracting command line args for downstream tools.
* Unit tests/bug fixes around extracting args.
* Use DownstreamArgs in the EndToEndCompileRequest
* Passing downstream compiler options downstream.
* Fix issue with endToEndReq being nullptr.
* Fix issue with diagnostics number change.
* Small improvements to how the source line is displayed if it's too long.
Default to 120, as suggested in previous review.
* Make render test use x-args parsing and CommandArgReader.
* Added missing diagnostics.
* More DownstreamArgs to linkage so can be seen by 'components'.
Added dxc-x-arg test.
* Used combination of name and args instead of two Lists, which whilst equivalent was perhaps a little confusing.
* Added documentation for -X support.
* Added test for x-args parsing diagnostic. Improved diagnostic with list of known names.
* Fix issues from merge.
* Fix lookup for -matrix-layout-column-major in render test.
* Remove commented out line.
|
|
Co-authored-by: T. Foley <tfoleyNV@users.noreply.github.com>
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Added SourceLoc handling for command line parsing.
* Fix typo in debug.
* Fix issue around the DiagnosticSink used in options parsing not having a writer available - by having DiagnosticSink parenting.
* Small rename for clarity.
* WIP extracting command line args for downstream tools.
* Unit tests/bug fixes around extracting args.
* Use DownstreamArgs in the EndToEndCompileRequest
* Passing downstream compiler options downstream.
* Fix issue with endToEndReq being nullptr.
* Fix issue with diagnostics number change.
* Small improvements to how the source line is displayed if it's too long.
Default to 120, as suggested in previous review.
Co-authored-by: T. Foley <tfoleyNV@users.noreply.github.com>
|
|
* Overhaul the preprocessor
The old Slang preprocessor was based on a simple mental model that tried to unify two parts of macro expansion:
* Scanning for macro invocations in a sequence of tokens
* Producing the expanded tokens for a macro expansion by substituting arguments into its body
The basic was that substitution of macro arguments into a macro definition is superficially similar to top-level macro expansion, just with an environment where the macro arguments act like `#define`s for the corresponding parameter names. That approach was "clever" and could conceivably have been extended to include a lot of advanced preprocessor features (e.g., a preprocessor-level `lambda` would be easy to support!), but it was basically impossible to make it correctly handle all the corner cases of the full C/C++ preprocessor.
The fundamental problem with the old approach was that it conflated the two parts of expansion listed above into one implementation, while the various special cases of the C/C++ preprocessor rely on treating the two cases very differently. The new approach here (which is somewhere between a refactor and a full rewrite of the preprocessor) changes things up in a few key ways:
* The abstraction still cares a lot about streams of tokens, but it now treats the top level streams (`InputFile`s) as fairly different from the lower-level streams (`InputStream`s)
* Macro expansion is handled as a dedicated type of stream that wraps another stream. This allows macro expansion to be applied to anything, and supports cases where multiple rounds of macro expansion are required by the spec.
* Macro *invocations* and the substitution of their arguments are now handled by a completely new system.
* Macro arguments are no longer treated as if they were `#define`s
* The macro body/definition is analyzed at definition time to detect various kinds of issues, and to derive a list of "ops" that make it easier to "play back" the definition at substitution time
* Token pasting and stringizing are now only handled in macro definitions (rather than being allowed anywhere), and their use cases are restricted to only those that make sense (e.g., you can't stringize anythign except a macro parameter, because anything else wouldn't make sense)
The key new types here are the `ExpansionInputStream` which handles scanning for macro invocations, and the `MacroInvocation` type, which handles playing back the macro body with substitutions.
The `ExpansionInputStream` is the easier of the two to understand. By refactoring it to use a single token of lookahead, the one major detail it had to deal with before (abandoning expansion of a function-like macro if the macro name was not followed by `(`) is significantly easier to manage.
The more subtle part is the `MacroInvocation` type, and most of the complexity there is around handling of token pasting, and the fact that either or both of the operands to a token paste might be empty.
Many of the test cases that exposed the problems in the preprocessor have been moved from `current-bugs` to `preprocessor` since they now work correctly.
* debugging: enable extractor command line dump
* fixup
* fixup
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Added SourceLoc handling for command line parsing.
* Fix typo in debug.
* Fix issue around the DiagnosticSink used in options parsing not having a writer available - by having DiagnosticSink parenting.
* Small rename for clarity.
Co-authored-by: T. Foley <tfoleyNV@users.noreply.github.com>
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* WIP Fxc as downstream compiler.
* First pass FXC downstream compiler working.
* GCC compile fix.
* Fix FXC parsing issue.
* Special case filesystem access.
* Use StringUtil getSlice.
* Fix isses with not emitting source for FXC.
* WIP on DXC.
* Small fixes for DXBC handling.
* Removed DXC from ParseDiagnosticUtil (can use generic)
Try to improve output for notes from DXC.
* FIrst pass of Glslang as DownstreamCompiler
* Fix some problems with parsing for glslang replacement.
* Add slang-glslang-compiler.cpp/.h
* Fix downstream for spir-v output.
* dissassemble -> disassemble
* Fix typo and improve some naming/comments.
* Remove getSharedLibrary from DownstreamCompiler
* Removed some no longer used diagnostics.
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Fix for writing to RWTexture with half types on CUDA.
* CUDA half functionality doc updates.
* First pass support for sust.p RWTexture format conversion on write.
* Tidy up implementation of $C.
Made clamping mode #define able.
* A simple test for RWTexture CUDA format conversion.
* Add support for float2 and float4.
* WIP conversion testing.
* Use $E to fix byte addressing in X in CUDA.
* Do not scale when accessing via _convert versions of surface functions.
* Revert to previous test.
* Test with half/float convert write/read.
* More broad half->float read conversion testing.
* Improve documentation around half and RWTexture conversion.
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Fix for writing to RWTexture with half types on CUDA.
* CUDA half functionality doc updates.
* First pass support for sust.p RWTexture format conversion on write.
* Tidy up implementation of $C.
Made clamping mode #define able.
* A simple test for RWTexture CUDA format conversion.
* Use $E to fix byte addressing in X in CUDA.
* Do not scale when accessing via _convert versions of surface functions.
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Fix for writing to RWTexture with half types on CUDA.
* CUDA half functionality doc updates.
* First pass support for sust.p RWTexture format conversion on write.
* Tidy up implementation of $C.
Made clamping mode #define able.
* A simple test for RWTexture CUDA format conversion.
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* WIP Fxc as downstream compiler.
* First pass FXC downstream compiler working.
* GCC compile fix.
* Fix FXC parsing issue.
* Special case filesystem access.
* Use StringUtil getSlice.
* Fix isses with not emitting source for FXC.
* WIP on DXC.
* Small fixes for DXBC handling.
* Removed DXC from ParseDiagnosticUtil (can use generic)
Try to improve output for notes from DXC.
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* WIP Fxc as downstream compiler.
* First pass FXC downstream compiler working.
* GCC compile fix.
* Fix FXC parsing issue.
* Special case filesystem access.
* Use StringUtil getSlice.
* Fix isses with not emitting source for FXC.
* Small fixes for DXBC handling.
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Fix for issue where threading KernelContext was not working on C++ test when there were multiple invocations.
* Improve test for context threading.
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Fix for writing to RWTexture with half types on CUDA.
* CUDA half functionality doc updates.
|
|
The `OverloadedExpr` type didn't provide a default value for its field:
Name* name;
This led to a null-pointer crash in the logic that deals with synthesizing interface requirements because it creates an `OverloadedExpr` but doesn't initialize the field.
This change makes two fixes:
1. The logic in the synthesis path actually initializes `name` so that it can feed into any downstream error messages
2. The `OverloadedExpr` declaration now includes an initial value for `name` so that it will at least be null instead of garbage if we slip up again
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Split out StringEscapeUtil.
* Added StringEscapeUtil.
* Fix typo in unix quoting type.
* Small comment improvements.
* Try to fix linux linking issue.
* Fix typo.
* Attempt to fix linux link issue.
* Update VS proj even though nothing really changed.
* Fix another typo issue.
* Fix for windows issue.
Fixed bug.
* Make separate Utils for escaping.
* Fix typo.
* Split out into StringEscapeHandler.
* Windows shell does handle removing quotes (so remove code to remove them).
* Handle unescaping if not initiating using the shell.
* Slight improvement around shell like decoding.
* Simplify command extraction.
* Add shared-library category type.
* Fix bug in command extraction.
* Typo in transcendental category.
* Enable unit-test on in smoke test category.
* Make parsing failing output as a failing test.
* Fixes for transcendental tests. Disable tests that do not work.
* Changed category parsing.
* Removed the TestResult parameter from _gatherTestsForFile.
Made testsList only output.
* Remove testing if all tests were disabled.
* Make args of CommandLine always unescaped.
* Add category.
* Don't need escaping on unix/linux.
* Remove some no longer used functions.
* Add requireSMVersion to CUDAExtensionTracker.
* half-calc.slang now works for CUDA.
* bit-cast-16-bit works on CUDA.
* WIP handling of CUDA vector<half> types.
* Half swizzle CUDA.
* Half vector test.
* Fix swizzle half bug.
* Fix compilation issue with narrowing to Index.
* Add unary ops.
* Add some vector scalar maths ops.
* Add half vector conversions for CUDA.
* Fix erroneous comment.
* Support for half comparisons.
* First pass test for half compare.
* Fix bug in CUDA specialized emit control.
Updated tests to have pre and post inc/dec.
* Removed unneeded parts of the cuda prelude.
* Half structured buffer works on CUDA.
* Added name lookup for Gfx::Format
* Support half texture type in test system.
* Test for half reading on CUDA.
* Add half formats to Vk and D3D utils.
* Fix getAt for CUDA - where there might not be a .x member in a vector.
* Template specialization for half surface access works.
* Half RWTexture support.
* Test for half RWTexture access.
* Update half-rw-texture test.
* Remove test function from CUDA prelude.
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Split out StringEscapeUtil.
* Added StringEscapeUtil.
* Fix typo in unix quoting type.
* Small comment improvements.
* Try to fix linux linking issue.
* Fix typo.
* Attempt to fix linux link issue.
* Update VS proj even though nothing really changed.
* Fix another typo issue.
* Fix for windows issue.
Fixed bug.
* Make separate Utils for escaping.
* Fix typo.
* Split out into StringEscapeHandler.
* Windows shell does handle removing quotes (so remove code to remove them).
* Handle unescaping if not initiating using the shell.
* Slight improvement around shell like decoding.
* Simplify command extraction.
* Add shared-library category type.
* Fix bug in command extraction.
* Typo in transcendental category.
* Enable unit-test on in smoke test category.
* Make parsing failing output as a failing test.
* Fixes for transcendental tests. Disable tests that do not work.
* Changed category parsing.
* Removed the TestResult parameter from _gatherTestsForFile.
Made testsList only output.
* Remove testing if all tests were disabled.
* Make args of CommandLine always unescaped.
* Add category.
* Don't need escaping on unix/linux.
* Remove some no longer used functions.
* Add requireSMVersion to CUDAExtensionTracker.
* half-calc.slang now works for CUDA.
* bit-cast-16-bit works on CUDA.
* WIP handling of CUDA vector<half> types.
* Half swizzle CUDA.
* Half vector test.
* Fix swizzle half bug.
* Fix compilation issue with narrowing to Index.
* Add unary ops.
* Add some vector scalar maths ops.
* Add half vector conversions for CUDA.
* Fix erroneous comment.
* Support for half comparisons.
* First pass test for half compare.
* Fix bug in CUDA specialized emit control.
Updated tests to have pre and post inc/dec.
* Removed unneeded parts of the cuda prelude.
* Half structured buffer works on CUDA.
* Added name lookup for Gfx::Format
* Support half texture type in test system.
* Test for half reading on CUDA.
* Add half formats to Vk and D3D utils.
* Fix getAt for CUDA - where there might not be a .x member in a vector.
|
|
Introduction
============
Several of our target platforms share a concept of "opaque" types, including resources (`Texture2D`) and samplers (`SamplerState`), which are restricted in how they can be used. GLSL and SPIR-V place very severe restrictions, in that opaque types cannot be used for the type of:
* (mutable) local variables
* (mutable) global variables
* structure fields
* Function result/return
* `out` or `inout` parameters
The HLSL language allows all of these cases, but with the practical caveat that the compiler front-end must be able to statically analyze how opaque types have been used and "optimize away" all of the above cases. For example, it is legal to have a local variable of an opaque type, but at any point where the variable gets used it must be statically known which top-level shader parameter the variable refers to.
Existing Work
=============
In the Slang compiler we need to implement our own passes to detect these "illegal" uses of opaque types and legalize them. The work is basically broken into two distinct steps:
* The existing `legalizeResourceTypes()` pass detects illegal types (e.g., a `struct` that has a field of type `Texture2D`) and replaces them with legal types, sometimes by splitting apart declarations (e.g., a parameter using such a `struct` type gets split into multiple parameters). At a high level, we can think of this as "exposing" opaque types so that they are not hidden inside of nested structures.
* Next, the `specializeResourceOutputs()` pass detects calls to functions that output opaque types (whether by the function return value of `out` / `inout` parameters). The pass analyzes the body of such functions, and tries to isolate the logic that determines their resource-type outputs and hoise that logic into call sites (so that the opaque-type outputs can then be eliminated).
This Change
===========
One important missing case was that the type legalization step was incapable of legalizing types that appear in the result/return type of functions. The existing logic would simply diagnose an internal/unimplemented error if it ecountered a non-simple type in the return position.
At a high-level, supporting this case seems simple enough. Given a function signature like:
```
struct Things { int a; Texture2D b; }
Things myFunc(int x) { ... }
```
we want to split the result type into an "ordinary" result type and then `out` parameters for any opaque-type fields:
```
struct Things_Legal { int a; }
Things_Legal myFunc(int x, out Texture2D result_b) { ... };
```
Similarly, at a call site to a function like this:
```
Things t = myFunc(99);
```
we split the function result into ordinary and opaque-type parts, and pass the latter as `out` parameters:
```
Texture2D t_b;
Things_Legal t = myFunc(99, /*out*/ t_b);
```
The main place where things get tricky is when dealing with `return` sites within the body of a function that needs legalization:
```
Things myFunc(int x) {
...
Things things = ...;
...
return things;
}
```
In theory the answer is simple: a `return` translates into writes to the `out` parameters for any opaque-type data, followed by a return of the ordinary-type part:
```
Things_Legal myFunc(int x, out Texture2D result_b) {
...
Things_Legal things = ...;
Texture2D things_b = ...;
...
result_b = things_b;
return things;
}
```
The sticking point here is that this step requires tracking data between the legalization of the parameter list for `myFunc` and legalization of the `return`s in its body, so that we can identify the `result_b` parameter to be able to write to it. The existing type legalization pass was not built with the idea that such communication is commonly needed; it assumes that each instruction can be legalized in isolation, so long as dependencies are respected.
This change adds logic such that the `legalizeFunc()` step sets up a data structure that it used to represent information about how a function (and its parameter list) got legalized, so that the logic for a `return` can make use of that legalized information. Right now the information we track consists of just the list of parameters that were introduced to represent a return/result type.
Testing
=======
In order to confirm what features do/don't work, I added a set of tests that cover a cross-product of opaque type use cases:
* The opaque type can be used in the function result type, an `out` parameter, or an `inout` parameter
* The opaque type can be used "directly" or nested inside a `struct`.
These tests are helpful to make sure we handle the most important cases, but it is worth noting that the coverage is still lacking in that we do not sufficiently test all the options for what the function body might do. An opaque-type function result could be derived from many different sources:
* It could be a global shader parameter
* It could be an `in` or `inout` parameter of the function itself
* It could be wrapped up in one or more structure types
* It could be wrapped up in one or more array types (such that the output of specialization needs to pass around array indices)
* It could involve use of the type as a local variable (including passing it into other functions with result/`out`/`inout` outputs of opaque types)
This change makes it so that we can handle the simplest cases involving result/return types with a wrapper `struct`, and adds test cases that confirm we handle several other cases for `out` and `inout` parameters. Gaining confidence that we cover all the cases that arise in practical shaders will require more work over following changes.
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Split out StringEscapeUtil.
* Added StringEscapeUtil.
* Fix typo in unix quoting type.
* Small comment improvements.
* Try to fix linux linking issue.
* Fix typo.
* Attempt to fix linux link issue.
* Update VS proj even though nothing really changed.
* Fix another typo issue.
* Fix for windows issue.
Fixed bug.
* Make separate Utils for escaping.
* Fix typo.
* Split out into StringEscapeHandler.
* Windows shell does handle removing quotes (so remove code to remove them).
* Handle unescaping if not initiating using the shell.
* Slight improvement around shell like decoding.
* Simplify command extraction.
* Add shared-library category type.
* Fix bug in command extraction.
* Typo in transcendental category.
* Enable unit-test on in smoke test category.
* Make parsing failing output as a failing test.
* Fixes for transcendental tests. Disable tests that do not work.
* Changed category parsing.
* Removed the TestResult parameter from _gatherTestsForFile.
Made testsList only output.
* Remove testing if all tests were disabled.
* Make args of CommandLine always unescaped.
* Add category.
* Don't need escaping on unix/linux.
* Remove some no longer used functions.
* Add requireSMVersion to CUDAExtensionTracker.
* half-calc.slang now works for CUDA.
* bit-cast-16-bit works on CUDA.
* WIP handling of CUDA vector<half> types.
* Half swizzle CUDA.
* Half vector test.
* Fix swizzle half bug.
* Fix compilation issue with narrowing to Index.
* Add unary ops.
* Add some vector scalar maths ops.
* Add half vector conversions for CUDA.
* Fix erroneous comment.
* Support for half comparisons.
* First pass test for half compare.
* Fix bug in CUDA specialized emit control.
Updated tests to have pre and post inc/dec.
* Removed unneeded parts of the cuda prelude.
* Half structured buffer works on CUDA.
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|
|
|
|
* enabling command line compiler to output PTX with multiple entry points.
* adding some simple optix intrinsics to slang
* Now handling the kIROp_RaytracingAccelerationStructureType for CUDA. Source seems to generate correctly, but accels are always optimized out of resulting PTX due to no trace calls
* fixing unnecessary diff with master
* allowing unhandled untyped buffers to fall through to super::calcTypeName
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Split out StringEscapeUtil.
* Added StringEscapeUtil.
* Fix typo in unix quoting type.
* Small comment improvements.
* Try to fix linux linking issue.
* Fix typo.
* Attempt to fix linux link issue.
* Update VS proj even though nothing really changed.
* Fix another typo issue.
* Fix for windows issue.
Fixed bug.
* Make separate Utils for escaping.
* Fix typo.
* Split out into StringEscapeHandler.
* Windows shell does handle removing quotes (so remove code to remove them).
* Handle unescaping if not initiating using the shell.
* Slight improvement around shell like decoding.
* Simplify command extraction.
* Add shared-library category type.
* Fix bug in command extraction.
* Typo in transcendental category.
* Enable unit-test on in smoke test category.
* Make parsing failing output as a failing test.
* Fixes for transcendental tests. Disable tests that do not work.
* Changed category parsing.
* Removed the TestResult parameter from _gatherTestsForFile.
Made testsList only output.
* Remove testing if all tests were disabled.
* Make args of CommandLine always unescaped.
* Add category.
* Don't need escaping on unix/linux.
* Remove some no longer used functions.
* Add requireSMVersion to CUDAExtensionTracker.
* half-calc.slang now works for CUDA.
* bit-cast-16-bit works on CUDA.
* WIP handling of CUDA vector<half> types.
* Half swizzle CUDA.
* Half vector test.
* Fix swizzle half bug.
* Fix compilation issue with narrowing to Index.
* Add unary ops.
* Add some vector scalar maths ops.
* Add half vector conversions for CUDA.
* Fix erroneous comment.
|
|
* Cleanup work on D3D12 shader object static specialization
This builds on PR #1829 in a small way (because that PR adds `getStride()` to Slang type layout reflection). The only relevant changes here are in the `render-d3d12.cpp` file.
The basic idea here is to clean up the D3D12 path to be more in line with the cleanups made for D3D11 and Vulkan. The way that D3D12 shader parameter binding goes through a root signature means that some of the details that were required for those APIs (in particular, tracking both "primary" and "pending" offsets during multiple steps) are not required for the critical-path binding stuff on D3D12.
There is some subtlety to the handling of the "ordinary" data buffer in the `bindAsValue()` case that I don't like, and that I'm not 100% confident I've gotten right. We may find that we have to revisit that logic as we add more tests.
* fixup
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Split out StringEscapeUtil.
* Added StringEscapeUtil.
* Fix typo in unix quoting type.
* Small comment improvements.
* Try to fix linux linking issue.
* Fix typo.
* Attempt to fix linux link issue.
* Update VS proj even though nothing really changed.
* Fix another typo issue.
* Fix for windows issue.
Fixed bug.
* Make separate Utils for escaping.
* Fix typo.
* Split out into StringEscapeHandler.
* Windows shell does handle removing quotes (so remove code to remove them).
* Handle unescaping if not initiating using the shell.
* Slight improvement around shell like decoding.
* Simplify command extraction.
* Add shared-library category type.
* Fix bug in command extraction.
* Typo in transcendental category.
* Enable unit-test on in smoke test category.
* Make parsing failing output as a failing test.
* Fixes for transcendental tests. Disable tests that do not work.
* Changed category parsing.
* Removed the TestResult parameter from _gatherTestsForFile.
Made testsList only output.
* Remove testing if all tests were disabled.
* Make args of CommandLine always unescaped.
* Add category.
* Don't need escaping on unix/linux.
* Remove some no longer used functions.
* Add requireSMVersion to CUDAExtensionTracker.
* half-calc.slang now works for CUDA.
* bit-cast-16-bit works on CUDA.
* WIP handling of CUDA vector<half> types.
* Half swizzle CUDA.
* Half vector test.
* Fix swizzle half bug.
* Fix compilation issue with narrowing to Index.
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|
|
This change applies to the case where we have a sub-object range with more than one object in it (so arrays of constant buffers, parameter blocks, or existentials). It is worth noting that we don't really seem to have any tests covering this case right now, so it is entirely possible that things are busted already, and/or that this change doesn't work the way I think it does.
When we go to actually bind the state from a sub-object range into the pipeline, we care about:
* The offset to where the first object should be written
* The "stride" between consecutive objects
We were already capturing the offset information from Slang reflection data, and the same was true for the part of the offset that pertains to "pending" ordinary data. For other cases, though, we were manually incrementing the offset by values computed manually in `gfx` for Vulkan, and we were just skipping the offsetting step entirely for D3D11.
With this change we extract more complete stride information from the Slang reflection data and use that instead of ad hoc computations. Hopefully this data is correct, and if it isn't we can consider whether it needs fixing at the Slang reflection level rather than in `gfx`.
This change doesn't apply to the OpenGL back-end (which hasn't been updated to match the new approach to static specialization at all) or to the D3D12 back end (which has been updated a bit, but probably needs other cleanup work to be done first to bring it in line).
|
|
* Update gfx back-ends to handle static specialization
The main goal here is to make the D3D11, D3D12 and Vulkan back-ends support static specialization of interface types in the case where the data for the type won't "fit" in the pre-allocated space for existential values. This includes all cases where the concrete type being specialized to has resources/samplers/etc., as well as any cases where its ordinary/uniform data exceeds the space available.
(Note that the CPU and CUDA targets don't need this work since they can (in theory) support arbitrary-size data in the fixed-size existential payload by using pointer indirection. Actually supporting indirection in those cases should be a distinct change)
The Slang compiler already performs layout for programs that have this kind of data that doesn't "fit," and it lays them out using an idea of "pending" type layouts. Basically, a type that contains some amount of specialized interface-type fields will produce both a "primary" type layout that just covers the data for the unspecialized case, as well as "pending" type layout that describes the layout for all the extra data needed by specialization.
When laying out a `ConstantBuffer<X>` or `ParameterBlocK<X>` ("CB" or "PB"), the front-end will try to place as much of that "pending" data into the layout of the buffer/block itself as is possible. That means that both CBs and PBs will be able to allocate trailing bytes for any ordinary data in the "pending" layout. PBs will be able to allocate any trailing resources/samplers into their layout, but for CBs they will spill out to be part of the pending layout for the buffer itself.
In order for the back-ends to properly handle pending data, they need to *either* assume the exact layout rules used by the front-end and try to reproduce them (e.g., by iterating over binding ranges and sub-objects in the exact same order that front-end layout would enumerate them), *or* they need to respect the reflection information produced by the front-end. This change takes the latter approach, trying to make only minimal assumptions about the layout rules being used. This choice is motivated by wanting to decouple the `gfx` implementation from the compiler front-end, especially insofar as this work has made me question whether the current layout rules are the best ones possible.
A common theme across all the implementations is to have a fixed-size type that can represent "binding offsets" for the chosen back-end. The offset type has fields that depend on the API-specific way bindings are indexed; e.g., for D3D11 it has offsets for CBV, SRV, UAV, and sampler bindings. This fixed-size offset type can be filled in based on Slang reflecton information, and then used to compute derived offsets with just a few add operations.
The simple offset type for each API is then extended to produce an offset type that includes both the offsets for "primary" data and also the offsets for "pending" data. Most logic that traffics in offsets doesn't have to know about this more complicated representation.
Making consistent use of these offsets required that I pretty much rewrite the logic that actually applies shader objects to the API state. Doing so might be lowering the efficiency of the system in the near term, but the increase in clarity was important for getting the work done, and it seems like it will also be important if/when we start trying to perform special-case optimizations around root and entry-point parameter setting.
While there are many API-specific differences, we can identify a repeated pattern where many steps, whether applying parameters to the pipeline stage or constructing signatures / layouts, can be broken down into three main operations on `ShaderObject`s or their layouts:
* `*AsValue()` is the core operation, and is the one used for the `ExistentialValue` case most of the time. It ignores the ordinary data in the object, and instead processes all nested binding ranges (for resources/smaplers) and sub-objects.
* `*AsConstantBuffer()` handles the `ConstntBuffer<X>` case, by dealing with the implicit buffer for ordinary data (if it is needed) and then delegates to the `*AsValue()` case.
* `*AsParameterBlock()` handles the `ParameterBlock<X>` case, by allocating/preparing/etc. any descriptor tables/sets that would be required for the current object/layout and then delegating to `*AsConstantBuffer()` to do the rest
The idea is that by having the parameter block case delegate to the constant buffer case, which delegates to the value/existential case, we can streamline a lot of the logic so that it doesn't seem quite as full of special cases.
Note: When preparing this pull request I spent a reasonable amount of time trying to clean up the D3D11 and Vulkan implementations, so they are probably the easiest to read and understand when it comes to the new code. Doing the cleanup work also helped to work out some weird corner case bugs/issues. In contrast, the D3D12 path hasn't had as much attention given to cleanliness and comments, so it really needs some attention down the line to get things into a state that is easier to understand.
* fixup: remove debugging code spotted in review
|
|
|
|
|
|
* Add gfx user's guide.
* Add getting started chapter in gfx-guide
* Fixes
* Fix
* Polishing doc template
|
|
* `gfx` DebugCallback and debug layer.
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Split out StringEscapeUtil.
* Added StringEscapeUtil.
* Fix typo in unix quoting type.
* Small comment improvements.
* Try to fix linux linking issue.
* Fix typo.
* Attempt to fix linux link issue.
* Update VS proj even though nothing really changed.
* Fix another typo issue.
* Fix for windows issue.
Fixed bug.
* Make separate Utils for escaping.
* Fix typo.
* Split out into StringEscapeHandler.
* Windows shell does handle removing quotes (so remove code to remove them).
* Handle unescaping if not initiating using the shell.
* Slight improvement around shell like decoding.
* Simplify command extraction.
* Add shared-library category type.
* Fix bug in command extraction.
* Typo in transcendental category.
* Enable unit-test on in smoke test category.
* Make parsing failing output as a failing test.
* Fixes for transcendental tests. Disable tests that do not work.
* Changed category parsing.
* Removed the TestResult parameter from _gatherTestsForFile.
Made testsList only output.
* Remove testing if all tests were disabled.
* Make args of CommandLine always unescaped.
* Add category.
* Don't need escaping on unix/linux.
* Remove some no longer used functions.
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Split out StringEscapeUtil.
* Added StringEscapeUtil.
* Fix typo in unix quoting type.
* Small comment improvements.
* Try to fix linux linking issue.
* Fix typo.
* Attempt to fix linux link issue.
* Update VS proj even though nothing really changed.
* Fix another typo issue.
* Fix for windows issue.
Fixed bug.
* Make separate Utils for escaping.
* Fix typo.
* Split out into StringEscapeHandler.
* Windows shell does handle removing quotes (so remove code to remove them).
* Handle unescaping if not initiating using the shell.
* Slight improvement around shell like decoding.
* Simplify command extraction.
* Add shared-library category type.
* Fix bug in command extraction.
* Typo in transcendental category.
* Enable unit-test on in smoke test category.
* Make parsing failing output as a failing test.
* Fixes for transcendental tests. Disable tests that do not work.
* Changed category parsing.
* Removed the TestResult parameter from _gatherTestsForFile.
Made testsList only output.
* Remove testing if all tests were disabled.
* Fix typo.
* Disable path canonical test on linux because CI issue.
|
|
|
|
Co-authored-by: jsmall-nvidia <jsmall@nvidia.com>
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Some test around matrix layout.
* A test for problem with C++ code output.
* Default should be column major CPU/CUDA tests confused this.
* Added column-major test
* Small fixes around tabs/comments
* Diagnostic problem for init of vector type with inappropriate params.
* Test demonstrating inconsistency between GPU and 'CPU-like' non square matrices.
* Added column major non square test.
* Remove vector mismatch - because ambiguity is arguably reasonable because float can silently promote to a vector.
* Small typo fixes for non-square-column-major.slang
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Update matrix documentation.
* Small fixes.
* Some small fixes.
* Fixes and improvements to matrix doc.
* Small fixes.
* Additional matrix doc layout clarification.
|
|
|
|
|
|
|
|
|
|
|
|
|