| Age | Commit message (Collapse) | Author |
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Fix memory leak on ScopedAllocation.
Fix off by one bug on UIntSet
|
|
* Make witness and RTTI handles lower to `uint2`.
And enable some dynamic dispatch tests on D3D/VK.
* Bug fixes.
|
|
Overview
========
Prior to this change, we had two different code generation strategies for interface/existential types in Slang, that didn't always play nicely together:
* The "legacy" static specialization approach could handle plugging in an arbitrary concrete type for an existential type parameter (including types with resources, etc.), but wouldn't work well with things like a `StructuredBuffer<>` of an interface type, and requires somewhat counter-intuitive layout rules to make work.
* The new dynamic dispatch approach produces simpler, more easily understood layouts by assuming that values of interface type can fit into a fixed number of bytes. The tradeoff there is that it cannot handle types that include resources (only POD types).
The goal of this change is to make it so that the two strategies can co-exist. In particular, in cases where a shader is amenable to both static specialization and dynamic dispatch, the type layouts should agree.
In order to make the type layouts agree, we:
* Declare that *all* values of existential type reserve storage according to the dynamic-dispatch rules (so 16 bytes for the RTTI and witness-table information, plus whatever bytes are needed to story "any value" of a conforming type).
* Then we modify the "legacy" layout rules so that if a value of concrete type can fit in the reserved "any value" space for a given interface, then it is laid out there exactly like the dynamic dispatch rules would do. Otherwise, we fall back to the previous legacy rules (since we don't need to agree with the dynamic-dispatch layout on types that can't be used with dynamic dispatch).
Details
=======
* Renamed `ExistentialBox` to `BoundInterfaceType` to better clarify how it relates to `BindExistentialsType`
* Unconditionally apply the `lowerGenerics` pass during emit, since it is now responsible for aspects of the lowering of existential types when specialization is used.
* Made IR type layout take the target into account, so that the layout of resource types can vary by target (e.g., being POD on some targets, and invalid on others)
* Cleaned up some issues around using global shader parameters as the "key" for their layout information in the global-scope layout (only comes up when there are global-scope `uniform` parameters)
* Made there be a default any-value size (16) instead of making it be an error to leave out. This was the simplest option; we could try to go back to having an error, but we'd need to only issue it if we are sure a type/interface is being used with dynamic dispatch, since static dispatch doesn't have to obey the restrictions.
* Changed lowering of existential types to tuples so that bound interfaces where the concrete type won't fit use a "pseudo-pointer" instead of an "any-value" to hold the payload
* Changed IR type legalization to handle the "pseudo-pointer" case and apply layout information from an interface type over to the payload part when static specialization was used.
* Changed some details of how witness tables were being lowered, so that we didn't have to create "proxy" witness tables for the constraints on associated types (just use the actual requirement entries we generate)
* Changed witness tables so that they know the subtype doing the conforming
* Added logic so that we don't generate pack/unpack logic and witness table wrapper functions for types that are incompatible with any-value/dynamic dispatch for a given interface.
* Changed the core AST-level type layout logic to use the dynamic-dispatch layout in case things fit, and the legacy static specialization case when things don't (while also reserving space for the dynamic-dispatch fields)
* Changed a bunch of test cases for static specialization to properly use the new layout (which introduces new buffers in some cases, and moves data around in others).
Future Work
===========
The experience of trying to reconcile our older way of handling interface-type specialization with our newer model (that supports dynamic dispatch) makes it clear that we really need to make similar changes to our handling of generic type parameters on entry points and at the global scope.
A future change should make it so that a global type parameter is lowered with a type layout similar to a value parameter of interface type, including the RTTI and witness-table pieces, and just leaving out the "any value" piece. A similar translation strategy should apply to entry-point generic parameters (mirroring how we lower generic functions for dynamic dispatch already), and value specialization parameters.
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* WIP FileSystem refactor.
* Made loadFile load the file in binary mode.
* Fixed some comments.
Fixed typo in RelativePath - not used 'fixedPath'.
|
|
* Fix constant folding in attributes
* remove unnecessary change
* remove unnecessary change
* remove unnecessary change
* Fixed circular checking issue.
* cleanup
* more cleanup
* minimize diff
* minimize diff
* minimize diff
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Mangling/module name extraction for GenericDecl
* Add comment on SerialFilter to explain re-enabling Stmt.
* Support setting up SyntaxDecl when reconstructed after deserialization.
* Improvements to setup SyntaxDecl.
* Fix typo so can read compressed SourceLocs.
* Fix issue with SourceManger.
* Simple test for serializing out stdlib and reading back in.
* Fix calling convention.
* Add override to StdLib impls.
* Fix typo.
* Apply testing to an actual compute test when using load-stdlib
Make -load/compile-stdlib processable by Slang
Move out testing into util into TestToolUtil so can be shared.
* Slightly more concise setup of session.
* Fix some errors introduced with session handling.
* Made setup for compile same across slangc and slangc-tool.
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Mangling/module name extraction for GenericDecl
* Add comment on SerialFilter to explain re-enabling Stmt.
* Support setting up SyntaxDecl when reconstructed after deserialization.
* Improvements to setup SyntaxDecl.
* Fix typo so can read compressed SourceLocs.
* Fix issue with SourceManger.
|
|
* Fix VS2017 Warnings
* Update slang-visitor.h
Co-authored-by: jsmall-nvidia <jsmall@nvidia.com>
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Improve diagnostic for token pasting.
* Token paste location test.
* Output include hierarchy.
* WIP on includes hierarchy.
* Improved include hierarchy output - to handle source files without tokens.
Improved test case.
* Small comment improvements.
Fixed a typo with not returning a reference.
* Slight simplification of the ViewInitiatingHierarchy, by adding GetOrAddValue to Dictionary.
* Remove the need for ViewInitiatingHierarchy type.
* Improve output of path in diagnostic for includes hierarchy.
* Remove comment in diagnostic for token-paste-location.slang
* Update command line docs to include `-output-includes`
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
* Use integer RTTI/witness handles in existential tuples.
* Fix clang error.
* Fix IR serialization to use 16bits for opcode.
* Undo accidental comment change.
* Use variable length encoding for opcode.
* Fix compile error.
* Fixing issues
* Fix code review issues.
|
|
* Fix IR serialization to use 16bits for opcode.
* Undo accidental comment change.
* Use variable length encoding for opcode.
* Fixing issues
|
|
|
|
* Specialize witness table lookups.
* Remove generated files from vcxproj
* Fix call to generic interface methods.
|
|
The existing type legalization logic worked as a single preorder pass over the IR tree. This could create problems in cases where an instruction might be processed before one of its operands (e.g., a function that references a global shader parameter is processed before that parameter).
This change makes it so that type legalization uses a work list, and only adds instructions to the work list once their parent, type, and operands have been processed. As a result, we should be able to guarantee that an instruction will only be processed once all of its operands have been.
One wrinkle here is that in the current IR it is possible to end up with a cycle of uses for global-scope instructions, specifically around interface types and their list of requirements. This change includes a short-term kludge to break those cycles and allow the pass to complete.
As it stands, this is simply a refactoring pass and no new functionality is introduced. The changes are necessary to unblock work in a feature branch that depends on type legalization being more robust against IR that might use an unexpected ordering.
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Fix handling of access modifiers inside type definition.
* Fix access problem for AST node.
Make dumping produce a single function with switch, to potentially make available without Dump specific access.
* WIP on serialization design doc.
* Remove project references to previously generated files.
* More docs on serialization design.
* Improve serialization documentation.
Remove unused function from IRSerialReader.
* Small fixes around naming. Remove long comment from slang-serialize.h - as covered in serialization.md
* Remove long comment in slang-serialize.h as covered in serialization.md
* More information about doing replacements on read for AST and problems surrounding.
* Typo fix.
* Spelling fixes.
* Value serialize.
* Value types with inheritence.
* Use value reflection serial conversion for more AST types
* Use automatic serialization on more of AST.
* Get the types via decltype, simplifies what the extractor has to do.
* Update the serialization.md for the value serialization.
* Small doc improvements.
* Update project.
* Remove ImportExternalDecl type
Added addImportSymbol and ImportSymbol type
Fixed bug in container which meant it wouldn't read back AST module
* Because of change of how imports and handled, store objects as SerialPointers.
* First pass symbol lookup from mangled names.
* Cache current module looked up from mangled name.
* Fix SourceLoc bug.
Improve comments.
* Added diagnostic on mangled symbol not being found
* Fix typo.
* WIP serializing stdlib.
* WIP serializing stdlib in.
* Fix problem serializing arrays that hold data that is already serialized.
* Remove clash of names in MagicTypeModifier.
* Make conversion from char to String explicit.
Fix reference count issue with SerialReader.
* Add code to save/load stdlib.
* Use return code to avoid warning - SerialContainerUtil::write(module, options, &stream))
* Make all String numeric ctors explicit.
Added isChar to UnownedStringSlice.
Added operator== for UnownedStringSlice to String to avoid need to convert to String and allocate.
* Add error check to readAllText.
* tabs -> spaces on String.h
* tab -> spaces String.cpp
* Remove msg for StringBuilder, just build inplace for exceptions.
* Check SerialClasses - for name clashes.
Renamed Modifier::name as Modifier::keywordName
* Handling of extensions when deserializing AST - updating the moduleDecl->mapTypeToCandidateExtensions
Co-authored-by: Tim Foley <tim.foley.is@gmail.com>
|
|
The Slang IR builder has a notion of "hoistable" instructions, which are basically those instructions that represent a pure side-effect-free operation on their operands, and which can and should be deduplicated. Most types are "hoistable" instructions.
In order to make deduplication of hoistable instructions work, we need to emit them at the right location. Consider if we had:
```hlsl
void myFunc<T>(...)
{
if(someCondition) {
vector<T, 4> a = ...;
...
} else {
vector<T, 4> b = ...;
}
}
```
The IR instruction that represents `vector<T,4>` can't be inserted at the global scope, because then the parameter `T` would not be visible to it. That instruction also shouldn't be inserted into the same block that declares `a`, because then the instruction itself wouldn't be visible at the point where `b` is declared.
The IR builder already has logic to pick the right parent instruction. In the example given, the IR instruction for `vector<T,4>` should be inserted into the body of the IR generic, but outside of the IR function that represents `myFunc`.
The problem this change fixes is that while the logic was picking the *parent* for a hoistable instruction correctly, it wasn't putting much care into pick the insertion *location*. The existing strategy amounted to:
* If the IR builder was set with an insertion location inside the chosen parent, then use that insertion location
* Otherwise, insert at the end of the chosen parent
Neither of those options is perfect. Either could lead to an instruction being inserted after one of its uses, and the second option could even lead to a type being inserted *after* the `return` instruction in a function/generic, which violates another structural invariant of our IR (that every block must end with a terminator, and terminators must only appear at the end of blocks).
This change updates the rules as follows:
* If the type of the instruction being created, or any of its operands are in the chosen parent, then insert immediately after whichever of those instructions is last in that parent.
* Otherwise, insert before the first non-decoration, non-parameter child of the chosen parent
The combined effect of these two rules is now that we insert any hoistable instruction as early as we can in its parent, without violating the structural validity rules.
(One small exception to these rules is that if the parent is the module then we don't worry about ordering and just insert at the end, since order-of-declaration isn't significant at module scope in our IR)
All of our existing tests work with this new behavior, although there could conceivably be future cases that lead to complicated breakage. For example, if a pass looks at the first "ordinary" instruction in a block and saves it to use as an insertion point for parameter, and then proceeds to manipulate code in the block before going back and inserting parameters at the chosen location, there is a chance that a hoistable instruction might have been inserted before the chosen insertion point, leading to a parameter being inserted after an ordinary instruction. In general, though, code that works like that would already be playing a dangerous game in that it is manipulating instructions in a block while assuming the first instruction will remain fixed.
This change is currently just a refactor, but the underlying issue surfaced as a bug when I made other changes in a feature branch.
|
|
Co-authored-by: Tim Foley <tim.foley.is@gmail.com>
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Fix handling of access modifiers inside type definition.
* Fix access problem for AST node.
Make dumping produce a single function with switch, to potentially make available without Dump specific access.
* WIP on serialization design doc.
* Remove project references to previously generated files.
* More docs on serialization design.
* Improve serialization documentation.
Remove unused function from IRSerialReader.
* Small fixes around naming. Remove long comment from slang-serialize.h - as covered in serialization.md
* Remove long comment in slang-serialize.h as covered in serialization.md
* More information about doing replacements on read for AST and problems surrounding.
* Typo fix.
* Spelling fixes.
* Value serialize.
* Value types with inheritence.
* Use value reflection serial conversion for more AST types
* Use automatic serialization on more of AST.
* Get the types via decltype, simplifies what the extractor has to do.
* Update the serialization.md for the value serialization.
* Small doc improvements.
* Update project.
* Remove ImportExternalDecl type
Added addImportSymbol and ImportSymbol type
Fixed bug in container which meant it wouldn't read back AST module
* Because of change of how imports and handled, store objects as SerialPointers.
* First pass symbol lookup from mangled names.
* Cache current module looked up from mangled name.
* Fix SourceLoc bug.
Improve comments.
* Added diagnostic on mangled symbol not being found
* Fix typo.
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|
|
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Fix handling of access modifiers inside type definition.
* Fix access problem for AST node.
Make dumping produce a single function with switch, to potentially make available without Dump specific access.
* WIP on serialization design doc.
* Remove project references to previously generated files.
* More docs on serialization design.
* Improve serialization documentation.
Remove unused function from IRSerialReader.
* Small fixes around naming. Remove long comment from slang-serialize.h - as covered in serialization.md
* Remove long comment in slang-serialize.h as covered in serialization.md
* More information about doing replacements on read for AST and problems surrounding.
* Typo fix.
* Spelling fixes.
* Value serialize.
* Value types with inheritence.
* Use value reflection serial conversion for more AST types
* Use automatic serialization on more of AST.
* Get the types via decltype, simplifies what the extractor has to do.
* Update the serialization.md for the value serialization.
* Small doc improvements.
* Update project.
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* WIP on serialization design doc.
* More docs on serialization design.
* Improve serialization documentation.
Remove unused function from IRSerialReader.
* Small fixes around naming. Remove long comment from slang-serialize.h - as covered in serialization.md
* Remove long comment in slang-serialize.h as covered in serialization.md
* More information about doing replacements on read for AST and problems surrounding.
* Typo fix.
* Spelling fixes.
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Fix handling of access modifiers inside type definition.
* Fix access problem for AST node.
Make dumping produce a single function with switch, to potentially make available without Dump specific access.
* Remove project references to previously generated files.
|
|
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Added CharUtil.
Added TypeSet to extractor.
First pass at being able to specify all headers for multiple output headers.
* Fix includes for new C++ extractor convension.
Update premake5 to use new extractor mechanisms.
* Small improvements around StringUtil.
* Split out NameConventionUtil.
* Use a 'convert' to convert between convention types.
* Fix output of build message for C++ extractor.
Improve NameConventionUtil interface.
* Improve comments.
* Fix warning on gcc.
* Fix clang warning.
* Fix some typos in NameConventionUtil.
* Small fix to premake5.lua
* Fix generated includes.
* Remove m_reflectType as no longer applicable with TypeSet.
* Fix .gitignore for slang-generated-* files.
Added getConvention to determine convention from slice.
Add versions of split and convert that infer the from convention
* Fix typo in spliting camel.
* LineWhitespace -> HorizontalWhitespace
* Improve CharUtil comments.
|
|
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Make AST serialization types, marker include _AST_. Ie SLANG_CLASS -> SLANG_AST_CLASS and SLANG_ABSTRACT_CLASS -> SLANG_ABSTRACT_AST_CLASS
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Ascii mode not always set on FileStream.
Remove this-> if not needed.
Simplify setting of m_fileAccess.
* Fix typo.
* Fix typo.
* Clear up default FileAccess calculation.
* Convert tabs to spaces.
* Small naming improvements in FileStream::seek.
|
|
over (#1580)
* #include an absolute path didn't work - because paths were taken to always be relative.
* Access the members iteration in _ensureAllDeclsRec via indices to avoid a change in the array invalidating the list.
* Fix another iterator of members in SemanticVisitor
* Slight improvements to comments - main purpose is to kick a new build.
|
|
The basic problem here is that when a function has multiple declarations with matching signatures (e.g., a forward declaration and then a later definition with a body), the IR lowering logic would lower all declarations whenever the first one was encountered, but then would only register an IR value as the lowered version of the first declaration. Other matching declarations would then run the risk of being lowered again, and in the case where they included features like loops with break/continue labels, that would create the risk of keys getting inserted into certain dictionaries more than one, leading to exceptions.
This change ensures that when lowering a function that has multiple matching declarations to IR, we register an IR value for all of those declarations and not just the first.
I have added a test case that leads to a crash without this change, to ensure that we don't introduce a regression down the line.
|
|
This change adds a single new entry point to our reflection API that allows an application to query the `TypeLayout` that represents the global-scope shader parameters. This can be used by the application in order to detect when the global parameters have required allocation of a default constant buffer, or simply to unify the handling of the global scope with handling of other kinds of parameters.
|
|
|
|
Co-authored-by: Yong He <yhe@nvidia.com>
|
|
Co-authored-by: Yong He <yhe@nvidia.com>
|
|
* First pass at generalizing serializer.
* Split out ReflectClassInfo
* Use the general ReflectClassInfo
* Fix some typos in debug generalized serialization.
* Add calculation of classIds.
Make distinct addCopy/add on SerialClasses.
* Write up of more generalized serialization
* WIP to transition from ASTSerialReader/Writer etc to generalized SerialReader/Writer and associated types.
* Improvements to SerialExtraObjects.
Keep RefObjects in scope in factory
* Compiles with Serial refactor - doesn't quite work yet.
* First pass serialization appears to work with refector.
* Split out type info for general slang types.
* Split out slang-serialize-misc-type-info.h
* DebugSerialData -> SerialSourecLocData
DebugSerialReader -> SerialSourceLocReader
DebugSerialWriter -> SerialSourceLocWriter
* Remove unused template that only compiles on VS.
* Fix warning around unused function on non-VS.
* Improve output of type names that are in scopes in C++ extractor.
Update premake5.lua to run generation for RefObject derived types.
* C++ extractor working on RefObject type.
* Split out serialization functionality that spans different types into slang-serialization-factory.cpp/.h
Put AST type info into header.
Removed RefObjectSerialSubType - use RefObjectType
Add filtering for RefObject derived types
Remove construction and filteringhacks.
* Set up field serialization for SerialRefObject derived types.
* Fix template problem compiling on Clang/Gcc
* Work in progress to make Value types work.
* Added slang-value-reflect.cpp
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Added [__requiresNVAPI] to functions that need nvapi support.
* Added support for InterlockedExchangeU64
Added exchange-int64-byte-address-buffer test
Fixed typo in cas-int64-byte-address-buffer test
* Improve comment around NVAPI usage in hlsl.meta.slang
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Added [__requiresNVAPI] to functions that need nvapi support.
|
|
* Small fixes for CUDA code emit
* Add a CUDA translation to `GroupMemoryBarrierWithWaveSync()`. We map this to `__syncwarp()` for CUDA (with no mask, implying a full-warp sync).
* Consistently use `SLANG_PRELUDE_ASSERT` for assertions introduced in code emit (rather than just using the bare `assert(...)` function, which is not included by our CUDA prelude by default)
* Add a new `SLANG_CUDA_STRUCTURED_BUFFER_NO_COUNT` flag to the CUDA prelude that allows the `count` field to be omitted from `(RW)StructuredBuffer<T>`. This is a bit of a hacky because the computed layouts will still assume the `count` field is present, but this feature is required by at least one client application for now. A better long-term fix will take more time to design and implement.
* fixup: CUDA prelude code fix for pedantic compilers
Co-authored-by: Tim Foley <tim.foley.is@gmail.com>
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
|
|
* Specialize exsitentials parameters in struct fields.
* Cleanup.
* Handle partial existential parameter type specialization.
Co-authored-by: Yong He <yhe@nvidia.com>
|
|
* Specialize exsitentials parameters in struct fields.
* Cleanup.
Co-authored-by: Yong He <yhe@nvidia.com>
|
|
* First pass at generalizing serializer.
* Split out ReflectClassInfo
* Use the general ReflectClassInfo
* Fix some typos in debug generalized serialization.
* Add calculation of classIds.
Make distinct addCopy/add on SerialClasses.
* Write up of more generalized serialization
* WIP to transition from ASTSerialReader/Writer etc to generalized SerialReader/Writer and associated types.
* Improvements to SerialExtraObjects.
Keep RefObjects in scope in factory
* Compiles with Serial refactor - doesn't quite work yet.
* First pass serialization appears to work with refector.
* Split out type info for general slang types.
* Split out slang-serialize-misc-type-info.h
* DebugSerialData -> SerialSourecLocData
DebugSerialReader -> SerialSourceLocReader
DebugSerialWriter -> SerialSourceLocWriter
* Remove unused template that only compiles on VS.
* Fix warning around unused function on non-VS.
|
|
* Add API for whole program compilation.
This change exposes a new target flag: `SLANG_TARGET_FLAG_GENERATE_WHOLE_PROGRAM` that can be set on a target with `spSetTargetFlags`. When this flag is set, `spCompile` function generates target code for the entire input module instead of just the specified entrypoints. The resulting code will include all the entrypoints defined in the input source.
The resulting whole program code can be retrieved with two new functions: `spGetTargetCodeBlob` and `spGetTargetHostCallable`.
This change also cleans up the unnecessary `entryPointIndices` parameter of `TargetProgram::getOrCreateWholeProgramResult`, and modifies the `cpu-hello-world` example to make use of the new whole-program compilation API to simplify its logic.
* Update comments.
|
|
* Enable default cpp prelude.
* Print the "#include" line as a normal source if the file does not exist.
* Bug fix
* Fix.
* Fix c++ prelude header.
* Remove unnecessary fopen call.
|
|
Based on review feedback from #1556, this change updates the Slang preprocessor so that it is no longer coupled to policy details from higher levels of the software stack. In particular, the preprocessor used to:
* Deal with updating the list of file paths that a `Module` depends on.
* (As of #1556) detect NVAPI-related macro definitions and use them to construct an AST-level `Modifier` attached to the `ModuleDecl`.
This change introduces a callback interface where the `Preprocessor` calls out to a `PreprocessorHandler` at certain points during execution, allowing the handler to introduce custom logic that suits a particular high-level use case.
This change also removes the dependence of the preprocessor on the `Linkage`, because in practice only a small number of its sub-objects were needed. As a convenience, a wrapper function that takes a `Linkage` was left in place so that the existing call sites didn't have to change very much.
|
|
While working on #1557, it became clear that something was going wrong when using `*ByteAddressBuffer.Load<T>` to load a vector type on GLSL/SPIR-V targets.
The root problem was that the IR-level layout logic (which computes the "natural" layout of a type) had not yet been extended to handle vectors. The fix is simple enough, but it highlights the fact that we probably need to go ahead and "complete" that layout logic sooner or later.
This change includes a test case that covers the behavior added here, as well as the case that #1557 fixes. Unfortunately, due to CI system limitations, the HLSL/dxc part of the test is not yet enabled.
|
|
In some cases, functionality is available as either a GLSL extension for Vulkan/SPIR-V, or through the NVAPI system for D3D. This situation creates complications because while GLSL extensions are generally all supported by the open-source glslang compiler (which we can bundle and ship), NVAPI operations are exposed through a specific header (`nvHLSLExtns.h`) that ships as part of the NVAPI SDK.
When a user wants to explicitly use NVAPI-provided operations in their shader code, there are no major complications for Slang; the user sets up their include paths, `#include`s the relevant header, calls functions in it, and lets Slang deal with the details of compilation.
The challenge for Slang arises when we want to provide a cross-platform interface in our standard library (e.g., the `RWByteAddressBuffer.InterlockedAddF32` method that was recently added) that uses either a GLSL extension (when compiling for Vulkan/SPIR-V) or an NVAPI (when compiling to DXBC or DXIL). In that case, the code *generated* by Slang now has a dependency on NVAPI, and we need to somehow emit a `#include` directive that pulls it in when invoking fxc or dxc. Because we do not (and seemingly cannot) bundle the NVAPI header with the compiler, we have to rely on ther user to have it available and to somehow communicate to Slang where it is.
Exposing portable routines that sometimes use NVAPI currently creates two main challenges:
1. The user is forced to interact with the "prelude" mechanism in the compiler, which allows the programmer to define code in a given target language that gets prepended to the Slang-generated code. While the prelude mechanism is powerful, it is also hard for users to integrate into their workflow, and our experience so far is that users want something that Just Works.
2. If the user writes code that uses some of our abstract operations that layer on NVAPI *and* they also want to use NVAPI explicitly, they end up with two copies of the NVAPI header (one included by the Slang front-end, and another included by the downstream fxc/dxc compiler). This puts the user in the situation of (a) having to ensure that they set the defines like `NV_SHADER_EXTN_SLOT` consistently both when invoking Slang and when adding their prelude, and (b) even if they do make the definitions consistent, they run into the problem that fxc/dxc complain about overlapping register bindings on the two copies of the `g_NvidiaExt` global shader paraemter that the NVAPI header declares.
This change attempts to resolve both issues by adding a lot of "do what I mean" logic to the compiler to try to ease things in the common case. In particular:
1. The user no longer needs to use the "prelude" mechanism when using NVAPI. The compiler now embeds a default prelude for HLSL output, which will `#include` the NVAPI header if and only if the generated code needs NVAPI access because of portable standard library routines that were used.
2. The user can mix-and-match explicit NVAPI use and stdlib functions that compile to use NVAPI. The register/space to be used by NVAPI when included via prelude is now set based on whatever the user set via the preprocessor so that it should automatically be consistent between both cases. Furthermore, the code we emit for the declaration of `g_NvidiaExt` when compiling explicit NVAPI use is set up to be conditional, so that it is skipped in the case where the prelude will pull in its own declaration of that parameter.
The way all this is achieved involves a lot of moving pieces:
* We now have an HLSL prelude, which mostly just serves to `#include "nvHLSLExtns.h"` in the case where NVAPI support is needed downstream.
* Standard library operations that require NVAPI for their implementation on HLSL include a new `[__requiresNVAPI]` attribute.
* The preprocessor has been extended so that after tokenizing an input file it looks up the NVAPI-relevant macros in the resulting environment, and if they are set it attached a modifier (`NVAPISlotModifier1) to the AST `ModuleDecl` that is based on their values. Logic is added to detect if multiple input files specify values for the macros in ways that conflict.
* The semantic checking step is extended so that it detects the "magic" NVAPI declarations (the `g_NvidiaExt` paramter and the `NvShaderExtnStruct` type that it uses) and attaches a modifier to them so that they can be identified as such in later steps.
* Parameter binding is extended to collect a list of the AST modifiers that reflect NVAPI binding, and to reserve the relevant register(s) so that ordinary user-defined parameters cannot conflict with them.
* IR lowering translates the three new AST modifiers related to NVAPI over to IR equivalents.
* IR linking is extended to make sure that it clones any `IRNVAPISlotDecoration`s attached to the input modules. The pass intentionally does not care where the modifiers came from; it just collects them all and leaves it to downstream code to sort out what they mean.
* Emit logic is extended to have a notion of "prelude directives" which are preprocessor directives that should come *before* the prelude in the generated code, because they can impact the way that the prelude compiles. This is done so that we don't have to introduce ad hoc logic for each downstream compiler to set any relevant `-D` flags (e.g., both fxc and dxc would need to duplicate such logic for NVAPI support).
* The HLSL source emitter is extended to track whether it emits any operations that require NVAPI support.
* The HLSL source emitter is extended to emit prelude directives based on whether NVAPI is needed and, if it is, to also set the register and space that NVAPI should use based on what was stored in the decoration(s) on the IR module.
* The HLSL source emitter is extended so that it detects global instructions that represent "magic" NVAPI constructs , and emit them as conditional definitions so that they are skipped when NVAPI is included via the prelude.
* The handling of requires capabilities during emit logic was cleaned up a bit so that more logic is shared across targets, and also so that the same logic is used both when emitting a function declaration/definition and when emitting a call to an instrinsic function (which won't get declared/defined).
|
|
This problem is only visible when
* using `RWByteAddressBuffer.Load<V>`,
* when `V` is `vector<T,N>`,
* and `V` is a non-32-bit type
In such a case, the Slang compiler generates output HLSL like:
someBuffer.Load<vector<T,N>>(someOffset);
and dxc balks because it fails to parse the `>>` as closing the generics, and instead parses it as a shift operator.
The solution here is simple: add a space before the closing `>` when emitting a `.Load<T>` operation.
Note that this change does not come with a test fix *yet* because writing the test case exposed a more complicated issue with GLSL codegen for this same scenario. This change includes the simple single-byte fix, to unblock users while we work on a fix for the GLSL case (and that fix will include the test coverage).
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Improve comments.
* Small comment improvement.
|
|
* Enable all dynamic dispatch tests on CUDA.
* Fix expected cross-compile test results.
|
|
* Test if blob is returned.
* Rename serialize files so can be grouped.
* StringRepresentationCache -> SerialStringTable
* Split out SerialStringTable from slang-serialize-ir
* First pass at reorganizing serialization/containers. Remain some issues about debug info.
* Fix bug in calculating sourceloc.
* Improve calcFixSourceLoc
* Make allocations for payload RiffContainer align to at least 8 bytes. This is important for read, if the payload can contain 8 byte aligned data. Note this has no effect on Riff file format alignment rules.
* Improve comments around RiffContainer and alignment.
* Remove SerialStringTable, can just use StringSlicePool instead.
* Add flags to control what is output in SerialContainer.
Turn off AST output for obfuscated code.
Lazily create astClasses when doing write container serialization.
* Typo fix for Clang/Linux.
* Fixes that came out of review
* TranslationUnit -> Module
* TargetModule -> TargetComponent
* PAYLOAD_MIN_ALIGNMENT -> kPayloadMinAlignment
|