| Age | Commit message (Collapse) | Author |
|
|
|
* Added GLSL_460 if ray tracing is used on fragment shader.
Moved GLSL specific setup init function.
* Split out _requireRayTracing method.
|
|
* Small improvements to documentation and code around DiagnosticSink
* Made methods/functions in slang-syntax.h be lowerCamel
Removed some commented out source (was placed elsewhere in code)
* Making AST related methods and function lowerCamel.
Made IsLeftValue -> isLeftValue.
|
|
* Synthesize "active mask" for CUDA
The Big Picture
===============
The most important change here is to `hlsl.meta.slang`, where the declaration of `WaveGetActiveMask()` is changed so that instead of mapping to `__activemask()` on CUDA (which is semantically incorrect) it maps to a dedicated IR instruction.
The other `WaveActive*()` intrinsics that make use of the implicit "active mask" concept had already been changed in #1336 so that they explicitly translate to call the equivalent `WaveMask*()` intrinsic with the result of `WaveGetActiveMask()`. As a result, all of the `WaveActive*()` functions are now no different from a user-defined function that uses `WaveGetActiveMask()`.
The bulk of the work in this change goes into an IR pass to replace the new instruction for getting the active mask gets replaced with appropriately computed values before we generate output CUDA code. That work is in `slang-ir-synthesize-active-mask.{h,cpp}`.
Utilities
=========
There are a few pieces of code that were helpful in writing the main pass but that can be explained separately:
* IR instructions were added corresponding to the Slang `WaveMaskBallot()` and `WaveMaskMatch()` functions, which map to the CUDA `__ballot_sync()` and `__match_any_sync()` operations, respectively. These are only implemented for the CUDA target because they are only being generated as part of our CUDA-only pass.
* The `IRDominatorTree` type was updated to make it a bit more robust in the presence of unreachable blocks in the CFG. It is possible that the same ends could be achieved more efficiently by folding the corner cases into the main logic, but I went ahead and made things very explicit for now.
* I added an `IREdge` utility type to better encapsulate the way that certain code operating on the predecessors/successors of an `IRBlock` were using an `IRUse*` to represent a control-flow edge. The `IREdge` type makes the logic of those operations more explicit. A future change should proably change it so that `IRBlock::getPredecessors()` and `getSuccessors()` are instead `getIncomingEdges()` and `getOutgoingEdges()` and work as iterators over `IREdge` values, given the way that the predecessor and successor lists today can contain duplicates.
* Using the above `IREdge` type, the logic for detecting and break critical edges was broken down into something that is a bit more clear (I hope), and that also factors out the breaking of an edge (by inserting a block along it) into a reusable subroutine.
The Main Pass
=============
The implementation of the new pass is in `slang-ir-synthesize-active-mask.cpp`, and that file attempts to include enough comments to make the logic clear. A brief summary for the benefit of the commit history:
* The first order of business is to identify functions that need to have the active mask value piped into them, and to add an additional parameter to them so that the active mask is passed down explicitly. Call sites are adjusted to pass down the active mask which can then result in new functions being identified as needing the active mask.
* The next challenge is for a function that uses the active mask, to compute the active mask value to use in each basic block. The entry block can easily use the active mask value that was passed in, while other blocks need more work.
* When doing a conditional branch, we can compute the new mask for the block we branch to as a function of the existing mask and the branch condition. E.g., the value `WaveMaskBallot(existingMask, condition)` can be used as the mask for the "then" block of an `if` statement.
* When control flow paths need to "reconverge" at a point after a structured control-flow statement, we need to insert logic to synchronize and re-build the mask that will execute after the statement, while also excluding any lanes/threads that exited the statement in other ways (e.g., an early `return` from the function).
The explanation here is fairly hand-wavy, but the actual pass uses much more crisp definitions, so the code itself should be inspected if you care about the details.
Tests
=====
The tests for the new feature are all under `tests/hlsl-intrinsic/active-mask/`. Most of them stress a single control-flow construct (`if`, `switch`, or loop) and write out the value of `WaveGetActiveMask()` at various points in the code.
In practice, our definition of the active mask doesn't always agree with what D3D/Vulkan implementations seem to produce in practice, and as a result a certain amount of effort has gone into adding tweaks to the tests that force them to produce the expected output on existing graphics APIs. These tweaks usually amount to introducing conditional branches that aren't actually conditional in practice (the branch condition is always `true` or always `false` at runtime), in order to trick some simplistic analysis approaches that downstream compilers seem to employ.
One test case currently fails on our CUDA target (`switch-trivial-fallthrough.slang`) and has been disabled. This is an expected failure, because making it produce the expected value requires a bit of detailed/careful coding that would add a lot of additional complexity to this change. It seemed better to leave that as future work.
Future Work
===========
* As discussed under "Tests" above, the handling of simple `switch` statements in the current pass is incomplete.
* There's an entire can of worms to be dealt with around the handling of fall-through for `switch`.
* The current work also doesn't handle `discard` statements, which is unimportant right now (CUDA doesn't have fragment shaders), but might matter if we decide to synthesize masks for other targets. Similar work would probably be needed if we ever have `throw` or other non-local control flow that crosses function boundaries.
* An important optimization opportunity is being left on the floor in this change. When block that comes "after" a structured control-flow region (which is encoded explicitly in Slang IR and SPIR-V) post-dominates the entry block of the region, then we know that the active mask when exiting the region must be the same as the mask when entering the region, and there is no need to insert explicit code to cause "re-convergence." This should be addressed in a follow-on change once we add code to Slang for computing a post-dominator tree from a function CFG.
* Related to the above, the decision-making around whether a basic block "needs" the active mask is perhaps too conservative, since it decides that any block that precedes one needing the active mask also needs it. This isn't true in cases where the active mask for a merge block can be inferred by post-dominance (as described above), so that the blocks that branch to it don't need to compute an active mask at all.
* If/when we extend the CPU target to support these operations (along with SIMD code generation, I assume), we will also need to synthesize an active mask on that platform, but the approach taken here (which pretty much relies on support for CUDA "cooperative groups") wouldn't seem to apply in the SIMD case.
* Similarly, the approach taken to computing the active mask here requires a new enough CUDA SM architecture version to support explicit cooperative groups. If we want to run on older CUDA-supporting architectures, we will need a new and potentially very different strategy.
* Because the new pass here changes the signature of functions that require the active mask (and not those that don't), it creates possible problems for generating code that uses dynamic dispatch (via function pointers). In principle, we need to know at a call site whether or not the callee uses the active mask. There are multiple possible solutions to this problem, and they'd need to be worked through before we can make the implicit active mask and dynamic dispatch be mutually compatible.
* Related to changing function signatures: no effort is made in this pass to clean up the IR type of the functions it modifies, so there could technically be mismatches between the IR type of a function and its actual signature. If/when this causes problems for downstream passes we probably need to do some cleanup.
* fixup: backslash-escaped lines
I did some "ASCII art" sorts of diagrams to explain cases in the CFG, and some of those diagrams used backslash (`\`) characters as the last character on the line, causing them to count as escaped newlines for C/C++.
The gcc compiler apparently balked at those lines, since they made some of the single-line comments into multi-line comments.
I solved the problem by adding a terminating column of `|` characters at the end of each line that was part of an ASCII art diagram.
* fixup: typos
Co-authored-by: jsmall-nvidia <jsmall@nvidia.com>
|
|
* Add support for generic load/store on byte-addressed buffers
Introduction
============
The HLSL `*ByteAddressBuffer` types originaly only supported loading/storing `uint` values or vectors of the same, using `Load`/`Load2`/`Load3`/`Load4` or `Store`/`Store2`/`Store3`/`Store4`. More recent versions of dxc have added support for generic `Load<T>` and `Store<T>`, which adds a two main pieces of functionality for users.
The first and more fundamental feature is that `T` can be a type that isn't 32 bits in size (or a vector with elements of such a type), thus exposing a capability that is difficult or impossible to emulate on top of 32-bit load/store (depending on what guarantees `*StructuredBuffer` makes about the atomicity of loads/stores).
The secondary benefit of having a generic `Load<T>` and `Store<T>` is that it becomes possible to load/store types like `float` without manual bit-casting, and also becomes possible to load/store `struct` types so long as all the fields are loadable/storable.
This change adds generic `Load<T>` and `Store<T>` to the Slang standard library definition of byte-address buffers, and tries to bring those same benefits to as many targets as possible. In particular, the secondary benefits become available on all targets, including DXBC: byte-address buffers can be used to directly load/store types other than `uint`, including user-defined `struct` types, so long as all of the fields of those types can be loaded/stored.
The ability to load/store non-32-bit types depends on target capabilities, and so is only available where direct support for those types is available. For 16-bit types like `half` this includes both Vulkan and D3D12 DXIL with appropriate extensions or shader models.
The implementation is somewhat involved, so I will try to explain the pieces here.
Standard Library
================
The changes to the Slang standard library in `hlsl.meta.slang` are pretty simple. We add new `Load<T>` and `Store<T>` generic methods to `*ByteAddressBuffer`, and route them through to a new IR opcode.
Right now the generic `Load<T>` and `Store<T>` do *not* place any constraints on the type `T`, although in practice they should only work when `T` is a fixed-size type that only contains "first class"
uniform/ordinary data (so no resources, unless the target makes resource types first class). Our front-end checking cannot currently represent first-class-ness and validate it (nor can it represent fixed-size-ness), so these gaps will have to do for now.
Rather than directly translate `Load<T>` or `Store<T>` calls into a single instruction, we instead bottleneck them through internal-use-only subroutines. The design choice here is intended to ensure that for some large user-defined type like `MassiveMaterialStruct` we only emit code for loading all of its fields *once* in the output HLSL/GLSL rather than once per load site. While downstream compilers are likely to inline all of this logic anyway, we are doing what we can to avoid generating bloated code.
Emit and C++/CUDA
=================
Over in `slang-emit-c-like.cpp` we translate the new ops into output code in a straightforward way. A call like `obj.Load<Foo>(offset)` will eventually output as a call like `obj.Load<Foo>(offset)` in the generated code, by default.
For the CPU C++ and CUDA C++ codegen paths, this is enough to make a workable implementation, and we add suitable templated `Load<T>` and `Store<T>` declarations to the prelude for those targets.
Legalization
============
For targets like DXBC and GLSL there is no way to emit a load operation for an aggregate type like a `struct`, so we introduce a legalization pass on the IR that will translate our byte-address-buffer load/store ops into multiple ops that are legal for the target.
Scalarization
-------------
The big picture here is easy enough to understand: when we see a load of a `struct` type from a byte-address buffer, we translate that into loads for each of the fields, and then assemble a new `struct` value from the results. We do similar things for arrays, matrices, and optionally for vectors (depending on the target).
Bit Casting
-----------
After scalarization alone, we might have a load of a `float` or a `float3` that isn't legal for D3D11/DXBC, but that *would* be legal if we just loaded a `uint` or `uint3` and then bit-casted it. The legalization pass thus includes an option to allow for loads/stores to be translated to operate on a same-size unsigned integer type and then to bit-cast.
To make this work actually usable, I had to add some more details to the implementation of the bit-cast op during HLSL emit and, more importantly, I had to customize the way that the byte-address buffer load/store ops get emitted to HLSL so that it prefers to use the existing operations like `Load`/`Load2`/`Load3`/`Load4` instead of the generic one, whenever operating on `uint`s or vectors of `uint`.
Translation to Structured Buffers
---------------------------------
Even after scalarizing all byte-address-buffer loads/stores, we still have a problem for GLSL targets, because a single global `buffer` declaration used to back a byte-address buffer can only have a single element type (currently always `uint`), so the granularity of loads/stores it can express is fixed at declaration time. If we want to load a `half` from a byte-address buffer, we need a dedicated `buffer` declaration in the output GLSL with an element type of `half`.
The solution we employ here is to translate all byte-address buffer loads into "equivalent" structured-buffer ops when targetting GLSL. We add logic to find the underlying global shader parameter that was used for a load/store and introduce a new structured-buffer parameter with the desired element type (e.g., `half`) and then rewrite the load/store op to use that buffer instead. We copy layout information from the original buffer to the new one, so that in the output GLSL all the various `buffer`s will use a single `binding` and thus alias "for free."
We don't want to create a new global buffer for every load/store, so we try to cache these "equivalent" structured buffers as best as we can. For the caching I ended up needing a pair to use as a key, so I tweaked the `KeyValuePair<K,V>` type in `core` so that it could actually work for that purpose.
Because we are working at the level of IR instructions instead of stdlib functions at this work I had to add new IR opcodes to represent structured-buffer load/store that only (currently) apply to GLSL.
Layout
======
In order to translate a load/store of a `struct` type into per-field load/store we need a way to access layout information for the types of the fields. Previously layout information has been an AST-level concern that then gets passed down to the IR only when needed and only on global parameters, so layout information isn't always available in cases like this, at the actual load/store point.
As an expedient move for now I've introduced a dedicated module that does IR-level layout and caches its results on the IR types themselves. This approach *only* supports the "natural" layout of a type, and thus is usable for structured buffers and byte-address buffers (or general pointer load/store on targets that support it), but which is *not* usable for things like constant buffer layout.
We've known for a while that the Right Way to do layout going forward is to have an IR-based layout system, and this could either be seen as a first step toward it, or else as a gross short-term hack. YMMV.
Details
=======
The GLSL "extension tracker" stuff around type support needed to be tweaked to recognize that types like `int16_t` aren't actually available by default. I switched it from using a "black list" of unavailable types at initialization time over to using a "white list" of types that are known to always be available without any extensions.
Tests
=====
There are two tests checked in here: one for the basic case of a `struct` type that has fields that should all be natively loadable, and one that stresses 16-bit types. Each test uses both load and store operations.
Future Directions
=================
Right now we translate vector load/store to GLSL as load/store of individual scalars, which means the assumed alignment is just that of the scalars (consistent with HLSL byte-address buffer rules). We could conceivably introduce some controls to allow outputting the vector load/store ops more directly to GLSL (e.g., declaring a `buffer` of `float4`s), which might enable more efficient load/store based on the alignment rules for `buffer`s.
The IR layout work has a number of rough edges, but the most worrying is probably the assumption that all matrices are laid out in row-major order. Slang really needs an overhaul of its handling of matrices and matrix layout, so I don't know if we can do much better in the near term.
At some point the IR-based layout system needs to be reconciled with our current AST-base layout, and we need to figure out how "natural" layout and the currently computed layouts co-exist (in particular, we need to make sure that the IR-based layout and the existing layout logic for structured buffers will agree). This probably needs to come along once we have moved the core layout logic to operate on IR types instead of AST types (a change we keep talking about).
As part of this work I had to touch the implementation of bit-casting for HLSL, and it seems like that logic has some serious gaps. We really ought to consider a separate legalization pass that can turn IR bitcast instructions into the separate ops that a target actually supports so that we can implement `uint64_t`<->`double` and other conersions that are technically achievable, but which are hard to express in HLSL today.
* fixup: missing files
|
|
* Fix typo in stdlib around WaveGetLaneIndex and WaveGetLaneCount
* Reorder emit so #extensions come before layout
* Added wave-get-lane-index.slang test.
|
|
The only big catch that I ran into with this batch was that I found the `float.getPi()` function was being emitted to the output GLSL even when that function wasn't being used. This seems to have been a latent problem in the earlier PR, but was only surfaced in the tests once a Slang->GLSL test started using another intrinsic that led to the `float : __BuiltinFloatingPointType` witness table being live in the IR.
The fix for the gotcha here was to add a late IR pass that basically empties out all witness tables in the IR, so that functions that are only referenced by witness tables can then be removed as dead code. This pass is something we should *not* apply if/when we start supporting real dynamic dispatch through witness tables, but that is a problem to be solved on another day.
The remaining tricky pieces of this change were:
* Needed to remember to mark functions as target intrinsics on HLSL and/or GLSL as appropriate (hopefully I caught all the cases) so they don't get emitted as source there.
* The `msad4` function in HLSL is very poorly documented, so filling in its definition was tricky. I made my best effort based on how it is described on MSDN, but it is likely that if anybody wants to rely on this function they will need us to vet our results with some tests.
|
|
* WIP add support for __spirv_version .
* Added IRRequireSPIRVVersionDecoration
* SPIR-V version passed to glslang.
Enable VK wave tests.
Split ExtensionTracker out, so can be cast and used externally to emit.
Added SourceResult.
* Fix warning on Clang.
* Missing hlsl.meta.h
* Refactor communication/parsing of __spirv_version with glslang.
* Fix some debug typos.
Be more precise in handling of substring handling.
* Make glslang forwards and backwards binary compatible.
* Small comment improvements.
* Added slang-spirv-target-info.h/cpp
* Fix for major/minor on gcc.
* Another fix for gcc/clang.
* VS projects include slang-spirv-target-info.h/cpp
* Removed SPIRVTargetInfo
Added SemanticVersion.
Don't bother with passing a target to glslang. Should be separate from 'version'.
* Renamed slang-emit-glsl-extension-tracker.cpp/.h -> slang-glsl-extension-tracker.cpp/.h
Fixed some VS project issues.
* Fix a comment.
* Added slang-semantic-version.cpp/.h
* Added slang-glsl-extension-tracker.cpp/.h
* Added split that can check for input has all been parsed.
* Fix problem on x86 win build.
|
|
* Added support for Targets to TypeTextUtil.
* Made Function names 'get' and 'find' instead of 'as' in TypeTextUtil.
|
|
When using row-major layout (via command-line or API option), the following sort of declaration:
```hlsl
StructuredBuffer<float4x4> gBuffer;
... gBuffer[i] ...
```
Generates unexpected results when compiled to DXBC via fxc or DXIL via dxc, because the fxc/dxc compilers do not respect the matrix layout mode in this specific case (a structured buffer of matrices). Instead, they always use column-major layout, even if row-major was requested by the user.
A user can work around this behavior by wrapping the matrix in a `struct`:
```hlsl
struct Wrapper { float4x4 wrapped; }
SturcturedBuffer<Wrapper> gBuffer;
... gBuffer[i].wrapped ...
```
This change simply automates that workaround when compiling for an HLSL-based downstream compiler, so that we get the same behavior across all our backends.
The change adds a test case to confirm the behavior across multiple targets, but it turns out we also had a test checked in that confirmed the buggy (or at least surprising) fxc/dxc behavior, so that one had its baselines changed and can work as a regression test for this fix as well.
|
|
* When using setUniform clamp the amount of data written to the buffer size.
* CUDA implement StructuredBuffer/ByteAddressBuffer as pointer/count as is on CPU.
Allow bounds check to zero index.
Update docs.
* Synthesize tests.
* Fix bug in CUDA output.
* Fixing more tests to run on CUDA.
* Added BaseType for layout of Vector and Matrix - as they are held as int32_t vector array types.
* Enable unbound array support on CUDA.
* Added unsized array support for CUDA documentation.
|
|
* WIP use IRTypeSet in CPPSourceEmitter - doesn't work because of a cloning issue, causing a crash on exit.
* Fix destruction of module issue for IRTypeSet usage in CPPEmitter.
* Fix out definition emitting ordering that was removed.
* Disable cuda output test.
|
|
* CPPCompiler -> DownstreamCompiler
* Added DownstreamCompileResult to start abstraction such that we don't need files.
* * Split out slang-blob.cpp
* Made CompileResult hold a DownstreamCompileResult - for access to binary or ISlangSharedLibrary
* Keep temporary files in scope.
* Add a hash to the hex dump stream.
* Move all file tracking into DownstreamCompiler.
* WIP support for nvrtc.
* WIP: Adding support for nvrtc compiler.
Adding enum types, wiring up the nvrtc into slang.
* Fix remaining CPPCompiler references.
* Fix order issue on target string matching.
* Use ISlangSharedLibrary for nvrtc.
* Use DownstreamCompiler for nvrtc.
* WIP first pass at compilation win nvrtc.
* Added testing if file is on file system into CommandLineDownstreamCompiler.
Added sourceContentsPath.
* Make test cuda-compile.cu work by just compiling not comparing output.
* Genearlize DownstreamCompiler usage.
* Fix warning on clang.
* Remove CompilerType from DownstreamCompiler.
* Use DownstreamCompiler interface for all compilers.
NOTE for FXC, DXC and GLSLANG this doesn't mean using 'compile' - it's still extracting functions from shared library.
* Replace DownstreamCompiler::SourceType -> SlangSourceLanguage
* Replace _canCompile with something data driven.
* Fix compiling on gcc/clang for DownstreamCompiler.
* Moved some text conversions into DownstreamCompiler.
* Fix problem on non-vc builds with not having return on locateCompilers for VS.
* Change so no warning for code not reachable on locateCompilers for vs.
* WIP: CUDA code generation - currently just using CPU layout and HLSL.
* emitXXXForEntryPoint -> emitEntryPointSource
emitSourceForEntryPoint -> emitEntryPointSourceFromIR
Fix up generating cuda to get PTX.
* WIP emitting cuda for IR.
* Small improvements to CUDA ouput.
* Disable the CUDA emit test, as output not currently compilable.
|
|
* CPPCompiler -> DownstreamCompiler
* Added DownstreamCompileResult to start abstraction such that we don't need files.
* * Split out slang-blob.cpp
* Made CompileResult hold a DownstreamCompileResult - for access to binary or ISlangSharedLibrary
* Keep temporary files in scope.
* Add a hash to the hex dump stream.
* Move all file tracking into DownstreamCompiler.
* WIP support for nvrtc.
* WIP: Adding support for nvrtc compiler.
Adding enum types, wiring up the nvrtc into slang.
* Fix remaining CPPCompiler references.
* Fix order issue on target string matching.
* Use ISlangSharedLibrary for nvrtc.
* Use DownstreamCompiler for nvrtc.
* WIP first pass at compilation win nvrtc.
* Added testing if file is on file system into CommandLineDownstreamCompiler.
Added sourceContentsPath.
* Make test cuda-compile.cu work by just compiling not comparing output.
* Genearlize DownstreamCompiler usage.
* Fix warning on clang.
* Remove CompilerType from DownstreamCompiler.
* Use DownstreamCompiler interface for all compilers.
NOTE for FXC, DXC and GLSLANG this doesn't mean using 'compile' - it's still extracting functions from shared library.
* Fix compiling on gcc/clang for DownstreamCompiler.
* Fix problem on non-vc builds with not having return on locateCompilers for VS.
* Change so no warning for code not reachable on locateCompilers for vs.
|
|
* CPPCompiler -> DownstreamCompiler
* Added DownstreamCompileResult to start abstraction such that we don't need files.
* * Split out slang-blob.cpp
* Made CompileResult hold a DownstreamCompileResult - for access to binary or ISlangSharedLibrary
* Keep temporary files in scope.
* Add a hash to the hex dump stream.
* Move all file tracking into DownstreamCompiler.
* WIP support for nvrtc.
* WIP: Adding support for nvrtc compiler.
Adding enum types, wiring up the nvrtc into slang.
* Fix remaining CPPCompiler references.
* Fix order issue on target string matching.
* Use ISlangSharedLibrary for nvrtc.
* Use DownstreamCompiler for nvrtc.
* WIP first pass at compilation win nvrtc.
* Added testing if file is on file system into CommandLineDownstreamCompiler.
Added sourceContentsPath.
* Make test cuda-compile.cu work by just compiling not comparing output.
* Fix warning on clang.
|
|
* CPPCompiler -> DownstreamCompiler
* Added DownstreamCompileResult to start abstraction such that we don't need files.
* * Split out slang-blob.cpp
* Made CompileResult hold a DownstreamCompileResult - for access to binary or ISlangSharedLibrary
* Keep temporary files in scope.
* Add a hash to the hex dump stream.
* Move all file tracking into DownstreamCompiler.
|
|
* WIP setting downstream compiler.
* Setting default downstream compiler for a source type.
|
|
* Initial work on direct emission of SPIR-V
This change adds a first vertical slice of support for emitting SPIR-V code directly from the Slang IR, instead of generating it indirectly via GLSL.
This work isn't usable for anything valuable right now; the goal is just to get something checked in that we can incrementally extend over time.
When invoking `slangc`, the `-emit-spirv-directly` option can be used to turn on the new code path.
I have not bothered to add an equivalent API option, because this flag is only intended to be used for testing in the immediate future.
The existing `emitEntryPoint()` function has become `emitEntryPointSource()` to more accurately reflect its role in a world where we can also emit entry points to a binary format.
Much of the logic that was inside `emitEntryPoint()` had to do with linking and then optimizing/transforming Slang IR code to get it ready for emission on a particular target.
This logic has been factored into a new `linkAndOptimizeIR()` function that can be shared between the path that emits source and the new one that emits SPIR-V.
The meat of the change is then the `emitSPIRVFromIR()` function in `slang-emit-spirv.cpp`, which is called *after* all the optimizations and transformations have been applied to the Slang IR to get it ready.
Rather than repeat myself here, I will try to make the comments in `slang-emit-spirv.cpp` usable as documentation of the approach being taken.
Smaller notes:
* I've included a test case that compares `slangc` output directly to expected SPIR-V. This is perhaps not an ideal plan for how to test SPIR-V emission going forward, but it suffices for now.
* The `external/` directory needed to be added to the include dirs for the `slang` project so that the new code can depend on the SPIR-V header.
* In `slang-ir-link`, the direct SPIR-V generation path means that we now link with a target of SPIR-V instead of GLSL. In principle this can be used to ensure that appropriate variants of intrinsics are selected based on the knowledge that we are emitting SPIR-V. In practice, that isn't being used at all.
* Fixup: path for SPIR-V headers
While working on this PR I used a copy of `spirv.h` that I placed into the repository tree manually, but since I started the work we ended up with SPIR-V headers in our tree anyway, albeit at a different path.
This change tries to fix things up so that my code uses the headers that were already placed in the repository.
* fixup; 64-bit build issue
* fixup: typo fixes based on review
|
|
* If obfuscate is enabled do strip on Layout
* Add option to keep insts that have layout decoration (else DCE strips layout)
* Add NameHint back in lowering - as strip now correctly removes. We may want NameHints in some stages even with obfuscation (for error messages in IR passes), as long as they are removed appropriately at the end
|
|
* Added RiffReadHelper
* Move type to fourCC in Chunk simplifies some code.
* Make MemoryArena able to track external blocks.
Allow ownership of Data to vary.
Changed IR serialization to use moved allocations to avoid copies.
As it turns out all of the array writes could use unowned data, but doing so requires the IRData to stay in scope longer than IRSerialData, which it does at the moment - but perhaps needs better naming or a control for the feature.
* Write out slang-module container.
* WIP on -r option.
Loading modules - with -r.
* Making the serialized-module run (without using imported module).
* Split compiling module from the test.
* Separate module compilation with a function working.
* Remove serialization test as not used.
* Fix warning on gcc.
* Updated test to have types across module boundary.
* Allow entry point declaration.
A test that tries to build with just an entry point declaration and a module.
* Try to make link work with multiple modules.
* Multi module linking first pass working.
* Multi module test working with -module-name option
* Added feature to repro manifest of approximation of command line that was used.
* Use isDefinition - for determining to add decorations to entry point lowering.
* Added support for repo-file-system.h
More precise control of CacheFileSystem.
Allow RelativeFileSystem to strip paths optionally.
Use canonical paths in PathInfo cache.
Fix bug in -D options for command line output of StateSerailizeUtil
* Add missing slang-options.h
* Fix bug in bit slang-state-serialize.cpp with bit removal.
* Added documentation around -repro-file-system
Added spLoadReproAsFileSystem function.
* Fix warning.
* spAddLibraryReference
* * Add support for slang-lib extension
* Container output when using -no-codegen option
* Use the m_containerFormat to determine if the module container is constructed.
Store the result in a blob. This allows for potential access via the API.
Write the blob if a filename is set.
Use m_ prefix for container variables.
* Added spGetContainerCode.
Made spGetCompileRequestCode work.
* * Put obfuscateCode on linkage
* Remove obfuscation from variable names - as can be achieved by either stripping and/or removing NameHintDecorations at lowering
* Remove name hints being added during lowering
* Add stripping of SourceLoc location in strip phase
* Hashing of linkage import/export names.
* Do final strip in emitEntryPoint, removes any remaining SourceLoc.
|
|
* Strip IR after front-end steps are done
The main feature of this change is to unconditonally strip out the `IRHighLevelDeclDecoration`s in an IR module once the "mandatory" IR passes in the front end have run. This ensures that later IR passes (e.g., code emission) *cannot* rely on AST-level information to get their job done.
Since I was already writing a pass to remove some instructions at the end of the front-end passes, I went ahead and also made the `-obfuscate` flag apply to the front-end IR generation by causing it to strip `IRNameHintDecoration`s while it is doing the other stripping. With this, the main identifying information left in IR modules (other than semantics and entry-point names) is mangled name strings for imported/exported symbols.
A few other things got changes along the way:
* Removed the `.expected` file for one of the tests, where that file seemingly shouldn't have been checked in at all.
* Updated the signature of the DCE pass both so that it doesn't require a back-end compile request (it wasn't using it anyway), and so that it takes some options to decide whether to keep symbols marked `[export(...)]` alive (the front-end wants to keep these, while back-end passes currently need to be able to eliminate them).
* Moved the `obfuscateCode` flag from the back-end compile request to the base class shared between front- and back-end requests, and updated the options and repro logic to set both as needed. An obvious improvement in the future would be to have the front- and back-end requests share these settings by referencing a single common object in the end-to-end case, rather than each having their own copy.
* Removed logic that was keeping layout instructions alive in DCE, even if they weren't used. This seems to have been a vestige of an intermediate step between AST and IR layout.
* fixup: add the new files
|
|
* Initial work on representing layout at IR level
This change starts the process of making the back-end of the compiler independent of the AST-level layout information (`TypeLayout`, `VarLayout`, etc.) so that it instead only relies on layout information that is embedded into IR modules. This brings us incrementally closer to a world in which the back-end could be run without the AST-level structures even existing (e.g., for an application that just wants to ship IR without any AST information for IP protection, while still supporting some amount of linking and specialization).
The main parts of the change are:
* There is a bunch of incidental churn related to specifying entry points by index instead of the `EntryPoint` object for certain operations. This ends up being a better choice because we can use the index to look up side-band information about the entry point that might not be stored on the `EntryPoint` object itself. In particular...
* We expand the `ComponentType` interface to support looking up the mangled name of an entry point by index. In common cases (no generic/interface specialization) this would be the same as asking the `EntryPoint` for its mangled name, but in cases where we have specialized a generic entry point, the mangled name would include speicalization arguments that are only available on the `SpecializedComponentType` that wraps the entry point. This part of the change isn't ideal and there might be a better solution waiting to be invented. Note that we store mangled entry point names as strings rather than using `DeclRef`s because that ensures that the information could be serialized and deserialized without a dependence on the AST.
* The `TargetProgram` type (which represents binding a specific `ComponentType` for a shader program to a specific `TargetRequest` that represents the target platform) is expanded to include an `IRModule` that represents layout information, in addition to the AST-level `ProgramLayout` it already contained. We create both of these objects at the same time (on-demand) to simplify the overall flow (so that any code that triggers creation of the AST-level layout will also ensure that the IR-level layout exists).
* A bunch of code in the emit passes that was passing down layout-related objects has been eliminated. It appears that most of those objects weren't actually being used, so this is just a cleanup, but it helps ensure that the back-end steps are "clean" and don't depend on the AST-level information. The one big exception here is that the emit logic needs to know the stage for the entry point being emitted (to deal with one wrinkle in translating DXR to VKRT).
* A big change (actually introduced by @jsmall-nvidia in a branch that this change copied and then built from) is to introduce some more explicit IR instructions to represent layout information, notably an `IRTypeLayout` and an `IRVarLayout`. For now these objects still reference their AST equivalents, but the separation gives us an incremental path to move information from the AST-level objects over to the IR ones. This work includes logic in `IRBuilder` to construct the IR-level layout objects from the AST-level ones on-demand, so that the existing code paths that try to attach AST-level layout will continue to work for now.
* Because layout information is now embedded in the IR, the `slang-ir-link.cpp` logic loses a lot of cases that used to deal with attaching AST-level layout objects to IR-level instructions during the linking process. Instead, the linker now assumes that one (or more) of the input IR modules will have layout information associated with it, and the linker makes sure to copy layout decorations (and the instructions they reference) from the input IR module(s) to the output using its more ordinary mechanisms.
* Inside `slang-lower-to-ir.cpp`, we add logic to construct an IR module in a `TargetProgram` that simply references the global shader parameters, entry points, etc. and attaches IR layout decorations to them. This is akin to the existing pass in the same file that constructs IR to represent specialization information, and both of these passes share infrastructure with the main AST->IR lowering pass. Eventually, it is expected that this pass will encompass more of the logic for copying AST-level layout information over to IR-level equivalents.
* One small wrinkle with this change was that the output for an HLSL generation test case changed some of its `#line` directives. The old code was actually more inaccurate than the new, so this change just updated the baseline. It also added some logic in the linker to make sure that when an IR instruction has multiple definitions, we try to pick up a source location from any of them, in case the "main" one somehow didn't get a location.
* Another small fix was that the key/value map in `StructTypeLayout` for mapping fields/members to their layouts was keyed on `Decl*` when it really should have been `VarDeclBase*`.
This change should in principle be a pure refactoring with no functionality changes, so no new tests were added. It is unfortunately also a change that has a high probability of breaking at least *some* client code, so we may want to be defensive and mark this with a new major version number (well, a new *minor* version number since we are pre-`1.0`) to give us some room for releasing hotfixes to the old version if needed.
* fixup: infinite recursion bug detected by clang
* fixup: remove commented-out code
|
|
* Added setDownstreamCompilerPrelude
Renamed setPassThroughPath to setDownstreamCompilerPath.
Fixed tests.
Added prelude directory & code to TestToolUtil to setup default preludes for testing/command line apis.
* Fix merge problem
* Remove hacks to make prelude work by adding a search path as no longer needed with 'user prelude'.
* Split up prelude into scalar intrinsics, and types.
Use slang.h for main header.
slang-cpp-prelude.h can now just include what it needs (relative to prelude directory) and define the few remaining things/work arounds.
* Fix typo.
|
|
* Expanded prelude for some other resource types. Disable C++ output for ParameterGroup.
* WIP: Layout for CPU.
* Fixes to CPU layout.
* WIP: The uniform is output, but the variable definition is not.
* WIP: Entry point parameters to global scope in C++.
Handling of resource types (in so far as outputting)
* Some discussion of ABI and different input types.
* WIP: More C++ support around resource types.
* WIP: Split up variables into different structures on emit.
* WIP: Emitting C++ with wrapping up of 'Context'
* WIP: C++ code has access to semantic values.
Wrap in struct so can use method calls to pass shared state.
Disable legalizeResourceTypes and legalizeExistentialTypeLayout
* Fix structured buffer layout for CPU.
* Remove testing/handling of global uniforms on CPU path.
Typo fix.
Changed CPU tests to use new CPU calling convention.
* Check globals are working. Initalize context to zero globals.
* Order the global parameters for C++ ouput by their layout.
Note - that layout isn't quite working correctly because the StructuredBuffer<int> the int seems to be consuming uniform space.
* Work around for reflection not having all data needed for layout ordering for C++ code.
* Output constant buffers as pointers.
* Entry point parameters accessed through pointer to struct.
* WIP: Layout for CPU is reasonable for test case.
* Only output 'f' after float literal if type marks as a float.
* Cast construction works on C++.
* Made IntrinsicOp::ConvertConstruct to make intent clearer.
* C++ handling construction from scalar.
Handle access of a scalar with .x.
Check default initialization.
* Comment about need for split of kIROp_construct.
Release build works.
* Added support from constructVectorFromScalar to C/C++ target.
* Handling of in/out in C/C++.
* First pass documentation CPU support.
* Improvements to C++/C slang code generation documentation.
* Small doc change to include need for mechansim to specify cpp compiler path.
* Better handling of swizzling - allow swizzling a scalar into a vector.
|
|
* Revise new COM-lite API
This change revises the "COM-lite" API that was recently introduced to try to streamline it and introduce some missing central/base concepts.
The central new abstraction in the API is the notion of a "component type," which is a unit of shader code composition. A component type can have:
* IR code for some number of functions/types/etc.
* Zero or more global shader parameters
* Zero or more "entry point" functions at which execution can start
* Zero or more "specialization" parameters (types or values that must be filled in before kernel code can be generated)
* Zero or more "requirements" (dependencies on other component types that must be satisfied before kernel code can be generated)
Both individual compiled modules, and validated entry points are then examples of component types, and we additionally define a few services that apply to all component types:
* We can take N component types and compose them to create a new component type that combines their code, shader parameters, entry points, and specialization parameters. A composed component type may also include requirements from the sub-component types, but it is also possible that by composing thing we satisfy requirements (if `A` requires `B`, and we compose `A` and `B`, then the requirement is now satisfied, and doesn't appear on the composite).
* We can take a component type with N specialization parameters, and specialize it by giving N compatible specialization arguments. The result of specialization is a new component type with zero specialization parameters. Under the right circumstances the specialzed component type will be layout compatible with the unspecialized one.
* One more example that isn't exposed in the public API today is that we can take a component with requirements and "complete" it by automatically composing it with component types that satisfy those requirements. This can be seen as a kind of linking step that pulls together the transitive closure of dependencies.
* We can query the layout for the shader parameters and entry points of a component type, for a specific target.
* We can query compiled kernel code for an entry point in a component type (for a specific target). This only works for component types with zero specialization parameters and zero requirements.
The idea is that by giving users a fairly general algebra of operations on component types, they can compose final programs in ways that meet their requirements. For example, it becomes possible to incrementally "grow" a component type to represent the global root signature for ray tracing shaders as new entry points are added, in such a way that it always stays layout-compatible with kernels that have already been compiled.
Much of the implementation work here is in implementing the unifying component type abstraction, and in particular re-writing code that used to assume a program consisted of a flat list of modules and entry points to work with a hierarchical representation that reflects the underlying algebra (e.g., with types to represent composite and specialized component types).
There's also a hidden "legacy" case of a component type to deal with some legacy compiler behaviors that can't be directly modeled on top of the simple algebra with modules and entry points.
This API is by no means feature-complete or fully developed. It is expected that we will flesh it out more when bringing up application code (e.g., Falcor) on top of the revamped API.
One notable thing that went away in this change is explicit support for "entry point groups" and notions of local root signatures (especially the Falcor-specific handling of the `shared` keyword, which a previous change turned into an explicitly supported feature). With the new "building blocks" approach, it should be possible for a DXR application to deal with local root signatures as a matter of policy (on top of the API we provide). If/when we need to provide some kind of emulation of local root signatures for Vulkan (and/or if Vulkan is extended with an explicit notion of local root signatures), we might need to revisit this choice.
* Fix debug build
There was invalid code inside an `assert()`, so the release build didn't catch it.
* fixup: warnings
* fixup: more warnings-as-errors
* fixup: review notes
* fixup: use component type visitors in place of dynamic casting
|
|
This change adds back a little bit of explicit support for global constants in the IR, after a previous change completely removed the existing `IRGlobalConstant` node type.
The new `IRGlobalConstant` is *not* a parent instruction, and doesn't function at all like the old one.
Instead it is effectively a simple instruction that takes zero or one operands:
* The zero-operand case represents a constant with unknown value. This would usually come from another module, and thus would have an `[import(...)]` linkage decoration, so that after linking it resolves to a constant with a known value.
* In the one-operand case, the single operand represents the value of the constant, so that the operation semantically behaves like an identity function. It exists just to give decorations something to "attach" to, so that a global constant with a value can have, e.g., an `[export(...)]` decoration to establish linkage.
The IR lowering pass was updated to create the new node type to wrap any global constants. For now we do this both for global `static const` variables and function-scope `static const`, although the latter doesn't really need the extra indirection.
The IR linking logic was extended to handle linking of global constants akin to how other global instructions are handled. The new logic is mostly boilerplate, and it is likely that a refactor of the linking logic would eliminate the need for this kind of per-instruction-opcode handling of IR instructions that can have linkage.
A custom pass was added that is intended to be run right after linking (it could arguably be folded into `linkIR()`, but I thought it was safer to keep each pass as small as possible). This pass replaces any `IRGlobalConstant` that has a value (operand) with that value, so that global constants should be eliminated after the linking step. This ensures that downstream optimization/transformation passes don't have to deal with the possibility of global constants.
Almost all the existing passes would Just Work if global constants were left in the IR. The two big exceptions are:
* Anything that relies on testing `IRInst*` identity as a way to test for things having the same value would break, since a global constant is a distinct `IRInst*` from its value.
* The type legalization pass doesn't handle `IRGlobalConstant` instructions with non-simple types. This could be added if we ever wanted it, but it seemed silly to write this code now if it would always be dead (and thus untested).
I went ahead and updated the emit logic to handle an `IRGlobalConstant`s that still existing in the IR module at emit time, since the amount of code required was small so that being robust to that case seemed safest (e.g., in case we ever want to have a path that emits code directly while skipping some/all of our IR transformation passes).
There should be no visible changes to the functionality of the compiler with this change, but it should help make IR dumps from the front-end more clear/explicit (since each constant will be a distinct instruction with its own name), and paves the way for supporting proper cross-module linkage of constants.
|
|
* WIP: Emitting Cpp
* Added HLSLType instead of using IRInst - because they don't seem to be deduped.
* Removed need for lexer to take a String.
Added mechansim to lookup intrinsic functions on C++.
* A c/c++ cross compilation test.
* WIP Cpp output using cloning and slang types.
* More work to generate mul funcs.
* WIP: Outputting some simple C++.
* Expose findOrEmitHoistableInst to IRBuilder to aid cloning,
* Simplification for checking for BasicTypes.
Test infrastructure compiles output C++ code.
* Dot and mat/vec multiplication output.
* First pass at swizzling.
* First support for binary ops.
* Builtin binary and unary functions.
* Any and all.
* WIP adding support for other functions.
Added code to generate function signature.
* Add scalar functions to slang-cpp-prelude.h
* Support for most built in operations.
* Tested first ternary.
* Checking the emitting of corner cases functions - normalize, length, any, all, normalize, reflect.
* Check asfloat etc work.
* Fmod support.
* WIP Array handling in C++.
* First stage in being able to handl arbitrary type output for CLikeSourceEmitter
* Removed Handler/Emitter split - so can implement more easily complex type naming.
* Array passing by value first pass.
* Rename Array -> FixedArray
* Outputs structs in C++.
* Emit the thread config.
* Dimension -> TypeDimension
* SpecializedOperation -> SpecializedIntrinsic
Operation -> IntrinsicOp
Use shared impl of isNominalOp
Commented use of m_uniqueModule etc.
* Add code to test slang->cpp when compiled doesn't have errors. Does so by building shared library and exporting the entry point.
* Fix linux clang/gcc compile error about override not being specified.
* Make sure c-cross-compile is run on linux targets/smoke.
* Remove c-cross-compile.slang from smoke.
* Fix running tests/cross-compile/c-cross-compile.slang on Ubuntu 16.04
* Only add -std=c++11 for C++ source.
|
|
* Start exposing a new COM-lite API
This change is mostly about exposing a new API to the Slang compiler that allows more fine-grained control over the compilation flow. The basic concepts in the new API are:
* An `IGlobalSession` is the granularity at which we load/parse the Slang stdlib, and therefore gives applications a way to amortize startup cost for the library across multiple compiles. This is a concept that might be able to go away in a future version of Slang.
* An `ISession` owns all the code that gets loaded/compiled/generated. Any `import`ed modules are shared across everything in a session (we don't re-parse/-check the code when we see another `import` for the same module). Any generic- or interface-based code in the session can be specialized using types from the same session (but not necessarily across sessions).
* An `IModule` is the unit of code loading and scoping. It doesn't expose any API in this change, but would be the right scope for looking up types or entry points by name.
* An `IProgram` is a "linked" combination of modules and entry points from which code can be generated and reflection information queried.
This change re-uses the existing reflection API types, rather than introduce a new API that duplicates that functionality. That will probably change in a future revision.
There are two major pieces of functionality added here that aren't related to the new API:
* We now have an API concept of "entry point groups" which are one or more entry points that are intended to be used together so that they need to have non-overlapping parameters. For now this is being used to handle "hit groups" and local root signatures for ray tracing, but I'm not sure this is a concept we will keep in the long run.
* We have a very special-case (client-application-specific) flag that ascribes special meaning to the `shared` keyword, so that it can be attached to global parameters to indicate that they are actually to be part of the local root signature rather than the global one for DXR.
None of the API design (including naming) here is finalized; the only reason to check in the changes at this point to avoid having a long-running branch that leads to merge pain. Clients should *not* try to depend on the new API just yet, since it is still a work in progress.
* fixup: clang warning
* fixup: try to detect clang C++11 support
* fixup
* fixup
* fixup
* fixup
* fixup: review feedback
|
|
* * Added SourceStyle to CLikeSourceEmitter, to limit cases to actual target types.
* Made Impl methods _ prefixed
* Small tidyup
* * SourceStream -> SourceWriter
* use slang-emit- prefix on SourceWriter file
* * Remove EmitContext -> merge into CLikeSourceEmitter
* slang-c-like-source-emitter -> slang-emit-source.cpp
* ExtensionUsageTracker -> GLSLExtensionTracker
slang-extension-usage-tracker.cpp/.h -> slang-emit-glsl-extension-tracker.cpp/.h
* emit-source.cpp.h -> emit-c-like.cpp/.h
* Small fix to move where some _ prefixed functions are declared in CLikeSourceEmitter.
* * CLikeSourceEmitter::CInfo -> Desc
* Functions to get and find CodeGenTarget by name
* Split out empty language impls
* Create an impl based on SourceStyle
* * CodeGenTarget conversion to and from string
* Move HLSL specific functions to HLSLEmitSource.
* Emitting texture and image types.
* Move move GLSL specific functionality to GLSLSourceEmitter
* Split more out of slang-emit-c-like
* Refactor more out of slang-emit-c-like
* * tryEmitIRInstExprImpl(IRInst* inst, IREmitMode mode, const EmitOpInfo& inOuterPrec)
* Fix bug around output of uintBitsToFloat
* More work refactoring out target specifics from slang-emit-c-like
* Move functions that are only implemented once in GLSL impl into their Impl method.
* Move rate qualification out of slang-emit-c-like
* * Added getEmitOpForOp - allows for table usage so different ops can be dealt with the same way
* Moved vector comparison to slang-emit-glsl
*
* * Use EmitOpInfo to control output in slang-emit-c-like.cpp for unary ops
* Move more functionality from CLikeSourceEmitter to HLSLSourceEmitter
* Make output of parameters implementaion specific.
* Extracted interpolation modifiers.
* Remove IR from methods that don't need them.
* Remove IR from method names.
* Refactor handling of output of types - to make the impls implement the full path without lots of cases for specific impls
* Add variable declaration modifiers and matrix layout to larget specific in slang-emit.
* Make target specific internal functions _ prefixed.
|
|
(#974)
* * Added SourceStyle to CLikeSourceEmitter, to limit cases to actual target types.
* Made Impl methods _ prefixed
* Small tidyup
* * SourceStream -> SourceWriter
* use slang-emit- prefix on SourceWriter file
* * Remove EmitContext -> merge into CLikeSourceEmitter
* slang-c-like-source-emitter -> slang-emit-source.cpp
* ExtensionUsageTracker -> GLSLExtensionTracker
slang-extension-usage-tracker.cpp/.h -> slang-emit-glsl-extension-tracker.cpp/.h
* emit-source.cpp.h -> emit-c-like.cpp/.h
* Small fix to move where some _ prefixed functions are declared in CLikeSourceEmitter.
|
|
* Prefixing source files in source/slang with slang-
* Prefix source in source/slang with slang- prefix.
* Rename core source files with slang- prefix.
* Update project files.
* Fix problems from automatic merge.
|