| Age | Commit message (Collapse) | Author |
|
* Initial refactor
* Refactor passes tests
* Removed Differential Bottom references from the IR side
|
|
* Clean up type checking of higher order expressions.
* Replace `goto` with `break` to pacify clang.
* Fix.
* Fixes.
* Fix more tests.
* Fix lowerWitnessTable parameter error.
* Exclude attributes from ast printing.
Co-authored-by: Yong He <yhe@nvidia.com>
|
|
* Add [ForwardDerivativeOf] attribute.
* Fix handling around phi nodes.
* Fixes.
* Remove IR opcode for ForwardDerivativeOfDecoration.
Co-authored-by: Yong He <yhe@nvidia.com>
|
|
* Rework differential conformance dictionary checking.
* Revert space changes.
Co-authored-by: Yong He <yhe@nvidia.com>
|
|
Co-authored-by: Yong He <yhe@nvidia.com>
|
|
* Modified the new type system to support generic differentiable types and added support for differentiating overloaded functions.
* Changed a few asserts to release asserts to avoid unreferenced variable errors
* Fixed a naming issue with TypeWitnessBreadcumb::Flavor::Decl
* Added logic to avoid tracking differentiable types if the module does not use auto-diff or define differentiable types.
* Moved the auto-diff passes to after the specialization step, added a more complex generics test
* Added a generics stress test and fixed AST-side logic. IR side needs some more work
* Added differential getter and setter logic, fixed multiple issues with DifferentiableTypeDictionary, added support for loops and conditions
* Changed differential getters to use pointer types, added getter type checking
* Fixed some bugs related to diff type registration and differential getters
* Removed some superfluous code
* Removed some more unused code.
* Fixed an issue with witness substitution
* Minor fix
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* WIP with hierarchical enums.
* Some small fixes and improvements around artifact desc related types.
* Improvements around hierarchical enum.
* Fixes to get Artifact types refactor to be able to execute tests.
* Attempt to better categorize PTX.
* Work around for potentially unused function warning.
* Typo fix.
* Simplify Artifact header.
* Small improvements around Artifact kind/payload/style.
* Added IDestroyable/ICastable
* Add IArtifactList.
* First impl of IArtifactUtil.
* Use the ICastable interface for IArtifactRepresentation.
* Added IArtifactRepresentation & IArtifactAssociated.
* Add SLANG_OVERRIDE to avoid gcc/clang warning.
* Fix calling convention issue on win32.
* Fix missing SLANG_OVERRIDE.
* First attempt at file abstraction around Artifact.
* Added creation of lock file.
* Move functionality for determining file paths to the IArtifactUtil.
Add casting to ICastable.
* Added some casting/finding mechanisms.
* Simplify IArtifact interface, and use Items for file reps.
* Fix problem with libraries on DXIL.
* Split out ArtifactRepresentation.
* Move ArtifactDesc functionality to ArtifactDescUtil. ArtifactInfoUtil becomes ArtifactDescUtil.
* Split implementations from the interfaces for Artifact.
* Use TypeTextUtil for target name outputting.
* Add artifact impls.
* Add ICastableList
* Added UnknownCastableAdapter
* Make ISlangSharedLibrary derive from ICastable, and remain backwards compatible with slang-llvm.
* Refactor Representation on Artifact.
* Make our ISlangBlobs also derive from ICastable.
Make ISlangBlob atomic ref counted.
* Fix typo.
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* WIP with hierarchical enums.
* Some small fixes and improvements around artifact desc related types.
* Improvements around hierarchical enum.
* Fixes to get Artifact types refactor to be able to execute tests.
* Attempt to better categorize PTX.
* Work around for potentially unused function warning.
* Typo fix.
* Simplify Artifact header.
* Small improvements around Artifact kind/payload/style.
* Added IDestroyable/ICastable
* Add IArtifactList.
* First impl of IArtifactUtil.
* Use the ICastable interface for IArtifactRepresentation.
* Added IArtifactRepresentation & IArtifactAssociated.
* Add SLANG_OVERRIDE to avoid gcc/clang warning.
* Fix calling convention issue on win32.
* Fix missing SLANG_OVERRIDE.
|
|
only process referenced functions. (#2309)
* Added JVPTranscriber to handle differentiation of load, store, var, param and return instructions, as well as conversion of data and function types
* Changed class names to be more in line with convention. Added correct type checking for __jvp() and verified that simple calls with only loads and stores are processed correctly
* Added logic to differentiate basic arithmetic and literals inside IRConstruct and fixed the way parameters are differentiated
* Replaced some SLANG_UNEXPECTED macro uses with diagnostics instead
* Added work-list-based on-demand generation of derivative functions
* Fixed up a couple of TODOs
* Added attribute [__custom_jvp(f)] to assign a custom derivative function to a declaration
* Added a test for CustomJVPAttribute on a redeclaration of an imported function
* Moving arithmetic test to new folder
* Moving arithmetic test to new folder (2)
* Added missing test module
* Fixed a minor note
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Use TerminatedUnownedStringSlice for literals in output C++.
* Remove Escape/Unescape functions used in slang-token-reader.cpp
Add target type of 'host-cpp' etc to map to the target types.
* Fix some corner cases around string encoding.
* Added unit test for string escaping.
Fixed some assorted escaping bugs.
* Updated test output.
* Added decode test.
* Stop using hex output, to get around 'greedy' aspect. Use octal instead.
|
|
Co-authored-by: Yong He <yhe@nvidia.com>
Co-authored-by: Theresa Foley <10618364+tangent-vector@users.noreply.github.com>
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Add support for HLSL `export`.
* Test for using `export` keyword.
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Compile to a dxil library.
* Added CompileProduct.
* Support handling of ModuleLibrary.
* CacheBehavior -> Cache
* Use CompileProduct for -r references.
* CompileProduct -> Artifact.
* Determining an artifact type on binding.
* Determine binary linkability.
* Added Artifact::exists.
* Added ArtifactKeep.
* Small fixes.
* Small improvements to Artifact.
* Add zip extension.
* Fix some comments.
* Fix multiple adding of PublicDecoration.
Make public output export for DXIL/lib.
Add checking for simpleDecorations such that only added once.
* Use 'whole program' to identify library build.
* Move slang-artifact into compiler-core.
* Split out Keep free functions.
* Artifact::Keep -> ArtifactKeep.
* Handle libraries as artifacts.
* Add -target dxil so test infrastructure knows it needs DXC.
* Linking working in DXC.
* Improve handling around emit for 'export'.
* Add comment around Artifact name.
* Render test working with linking.
* Improvements around Artifact handling.
* Add ArtifactPayloadInfo.
* Small tidy up around artifact.
* Split out code to get info about Artifacts into artifact-info.cpp/.h
* IArtifact interface and IArtifactInstance interface.
* Fix small issues.
* Fix compilation warning issue.
* Fix missing SLANG_OVERRIDE.
* Small fixes to make compilation work on Visual Studio 2022.
* Small improvements to Artifact interface/naming.
* Added Desc with each element in IArchive to allow more flexibility in usage.
* Fix clang warning issue.
* Add ArtifactPayload::Diagnostics
* More discussion around IArtifact usage.
* Re-add slang-artifact.h which was removed during merge.
* Fix typo identified in review.
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Compile to a dxil library.
* Added CompileProduct.
* Support handling of ModuleLibrary.
* CacheBehavior -> Cache
* Use CompileProduct for -r references.
* CompileProduct -> Artifact.
* Determining an artifact type on binding.
* Determine binary linkability.
* Added Artifact::exists.
* Added ArtifactKeep.
* Small fixes.
* Small improvements to Artifact.
* Add zip extension.
* Fix some comments.
* Fix multiple adding of PublicDecoration.
Make public output export for DXIL/lib.
Add checking for simpleDecorations such that only added once.
* Use 'whole program' to identify library build.
* Move slang-artifact into compiler-core.
* Split out Keep free functions.
* Artifact::Keep -> ArtifactKeep.
* Handle libraries as artifacts.
* Add -target dxil so test infrastructure knows it needs DXC.
* Linking working in DXC.
* Improve handling around emit for 'export'.
* Add comment around Artifact name.
* Render test working with linking.
Co-authored-by: Theresa Foley <10618364+tangent-vector@users.noreply.github.com>
|
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Compile to a dxil library.
* Added CompileProduct.
* Support handling of ModuleLibrary.
* CacheBehavior -> Cache
* Use CompileProduct for -r references.
* CompileProduct -> Artifact.
* Determining an artifact type on binding.
* Determine binary linkability.
* Added Artifact::exists.
* Added ArtifactKeep.
* Small fixes.
* Small improvements to Artifact.
* Add zip extension.
* Fix some comments.
|
|
An earlier refactoring pass over the compiler codebase split the
type that had been called `CompileRequest` into three distinct
pieces:
* `FrontEndCompileRequest` which was supposed to own state and
options related to running the compiler front end and producing
IR + reflection (e.g., what translation units and source
files/strings are included).
* `BackEndCompileRequest` which was supposed to own state and options
related to running the compiler back end to translate the IR
for a `ComponentType` (program) into output code. (Note that the
`BackEndCompileRequest` was conceived of as orthogonal to the
`TargetRequest`s, which store per-target and target-specific
options.)
* `EndToEndCompileRequest` which was an umbrella object that owns
separate front-end and back-end requests, plus any state that is
only relevant when doing a true end-to-end compile (such as the
kinds of compiles initiated with `slangc`). As originally conceived,
the only state that this type was supposed to own was stuff related
to "pass-through" compilation, as well as state related to writing
of generated code to output files.
That refactoring work was very useful at the time, because it allowed
us to "scrub" the back end compilation steps to remove all
dependencies on front-end and AST state (this was important for our
goals of enabling linking and codegen from serialized Slang IR).
At this point, however, it is clear that the hierarchy that was built
up serves very little purpose:
* The `BackEndCompileRequest` type is only used in two places:
* As part of an `EndToEndCompileRequest`, where the settings on
the `BackEndCompileRequest` can be configured, but only through
the `EndToEndCompileRequest`
* As part of on-demand code generation through the `IComponentType`
APIs. In this case, the settings stored on the
`BackEndCompileRequest` are not accessible to the application
at all, and will always use their default values, so that
instantiating a "request" object doesn't really make any sense.
* The `FrontEndCompileRequest` type has a similar situation:
* Front-end compilation as part of an `EndToEndCompileRequest`
supports user configuration of `FrontEndCompileRequest` settings,
but only through the `EndToEndCompileRequest`
* Front-end compilation triggered by an `import` or a `loadModule()`
call does not support user configuration of settings at all. It
will always derive all relevant settings from thsoe on the
session ("linkage").
In addition, subsequent changes have been made to the compiler that
show a bit of a "code smell" and/or forward-looking worries for this
decomposition:
* In some cases we've had to add the same setting to multiple types
in the breakdown (front-end, back-end, end-to-end, linkage, target,
etc.) which makes it harder for us to validate that all the possible
mixtures of state work correctly.
* Related to the above, in some cases we have manual logic that copies
state from one of the objects in the breakdown to another, in order
to ensure that the user's intention is actually followed.
* As a forward-looking concern, it seems that developers have sometimes
added new configuration options and state to places that don't really
make sense according to the rationale of the original decomposition
(e.g., we probably don't want to have a lot of state that is only
available via end-to-end requests, given that the API structure is
meant to push users *away* from end-to-end compiles).
As a result of all of the above, I've been planning a large refactor
with the following big-picture goals:
* Eliminate `BackEndCompileRequest`
* Move all relevant state/options from the back-end request to
the end-to-end request, since that is the only place they could
be set anyway.
* Introduce a transient "context" type to be used for the duration
of code generation that serves the main functions that back-end
requests really served in the codebase
* Make `EndToEndCompileRequest` be a subclass of
`FrontEndCompileRequest`
* Consider addding a transient "context" type for front-end
compiles that can be used in `import`-like cases rather than
needing a full front-end request object. If this works, then
eliminate `FrontEndCompileRequest` and be back to world with
just a single `CompileRequest` type
* Move *all* compiler configuration options to a distinct type (named
something like `CompilerConfig` or `CompilerOptions` or whatever)
which stores setting as key-value pairs, and has a notion of
"inheritance" such that one configuration can extend or build on top
of another. Make all the relevant types use this catch-all structure
instead of redundantly storing flags in many places.
This change deals with the first of those bullets: removeal of
`BackEndCompileRequest`. The addition of the `CodeGenContext` type is
perhaps an unncessary additional step, but making that change helps
clean up a bunch of the code related to per-target code generation,
so I think it is the right choice.
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
Co-authored-by: Yong He <yhe@nvidia.com>
|
|
* Cleanup refactoring work around the IR builder
We have some long-term goals for the IR that require a more centralized and disciplined set of rules for how IR instructions get created/emitted. I had been working on trying to set things up so that all IR instruction creation goes through a single bottleneck point, but the non-trivial work in that branch was getting drowned out by the sheer volume of cleanup and refactoring changes. This change tries to pull together several of the more important cleanups.
The big pieces are:
* `IRBuilder` and `SharedIRBuilder` now protect their data members and rely on users to initialize them more directly via constructor of an `init()` method. This change affects a *bunch* of sites where `IRBuilder`s were created. I changed use sites to use the constructors whenever possible, and to use `init()` in cases where we had longer-lived builders that needed to be initialized multiple times.
* The insertion location for the `IRBuilder` now uses an encapsulated type called `IRInsertLoc`. This new type can replace what used to be just two `IRInst*` fields in the builder, and also covers some new functionality (if we ever want to take advantage of it). Very little client code cares about this change, but it is still a nice cleanup in terms of making things more explicit.
* The creation of an `IRModule` has been moded *out* of `IRBuilder`, because in practice we `IRBuilder` always wants to be associated with a pre-existing `IRModule` at creation time (via its `SharedIRBuilder`). There is now an `IRModule::create()` operation instead. This required changing the sequencing at many `IRModule` creation sites, since most had been contriving to make an `IRBuilder` first. There were also several cleanups because code had been carelessly using non-reference-counted pointers for `IRModule`s in ways that broke now that `IRModule::create()` always returns a `RefPtr`.
* The core operations to actually allocate memory for IR instructions were moved into `IRModule` (since they interact with the memory pool that the module owns). These *were* called `createEmptyInst()` but have been renamed into `_allocateInst()`. In principle these seem like they should only be needed to be called by the `IRBuilder`, but in practice they are also needed by the IR deserialization logic.
* A few core operations for emitting IR instructions that were associted with `IRBuilder` were moved to actually be methods on `IRBuilder`. First is `_findOrEmitConstant` which is the primary bottleneck for creating simple scalar constant values. Another is `_createInst` (formerly part of the templated `createInstImpl` along with `createInstWithSizeImpl`) which is the main bottleneck for allocation and initialization of any instruction other than a constant (well, the `IRModuleInst` is the other exception...). Finally, there is also `_maybeSetSourceLoc()`, which is obvious to scope inside the `IRBuilder` once it is protecting the source-location info.
Notes:
* The `minSizeInBytes` parameter to `_createInst()` might not actually be needed at all. At this point any `IRInst` subtypes that need data allocated for things other than their operands already get created manually via `_allocateInst` or `_findOrEmitConstant`, so I *think* we could remove that part. I will handle that in a subsequent cleanup if it turns out to be the case.
* There is one IR pass (`slang-ir-string-hash.cpp`) that is using manual `_allocateInst()` instead of going through an `IRBuilder`. It could be easily cleaned up to not do so (and I will probably make that change down the line), but for now I wanted to avoid doing anything that wasn't close to pure refactoring if I could.
* At this point in our design an `IRBuilder` is a very lightweight thing - it basically just owns the insertion location plus a source location to write into instructions. A lot of our code currently treats `IRBuilder`s like they are expensive and/or need to be re-used (which leads to them being used in more mutable/stateful ways). It is quite likely that as we clean up other aspects of the implementation of IR creation/emission we can make `IRBuilder` use feel more lightweight in ways that can streamline and simplify code.
* The next step for this work is to identify the different paths that eventually lead to `_createInst()` being called, and unify them at a single bottleneck operation that can own the decisions around when to create an instruction vs. when to re-use an existing one (rather than those decisions being baked into the various `IRBuilder` subroutines that create instructions of the various subtypes).
* fixup: gcc/clang C++ spec details
|
|
|
|
* Add an accessor for IRInst opcode
This main changing is renaming `IRInst::op` over to `IRInst::m_op` and then adds an accessor `IRInst::getOp()` to read it. The rest of the changes are just changing use sites to `getOp` (or to `m_op` in the limited cases where we write to it).
This work is in anticipation of a future change that might need to store an extra bit in the same field as the opcode. It seemed better to do this massive refactoring as a separate PR.
* fixup
|
|
* Add first steps toward a "capability" system
We already have cases in the stdlib where we mark declarations as being specific to certain targets, e.g.:
```
// My ordinary function to add two numbers.
// Works everywhere.
//
void myFunc(int a, int b) { return a + b; }
// On the "coolgpu" target, we can use a secret intrinsic
// that adds numbers even faster!
//
__specialized_for_target(coolgpu)
void myFunc(int a, int b) { return __secretIntrinsic(a, b); }
```
The existing logic for dealing with these modifiers (`__specialized_for_target` and `__target_intrinsic`) was almost entirely string-based. We would turn the chosen compilation target into a string, and then use that to try and search for the "best" definition of a function at a few steps:
* During IR linking, we always pick one definition of an `[import]`ed function, and that definition will be the one with the "best" target-specialization modifier (if any)
* During final code generation, we always look up the "best" target-intrinsic modifier, and use it as the template for the code we output.
This change preserves the basic flow there, but replaces the ad hoc string-based logic with something a bit more principled, in terms of a new `CapabilitySet` type.
A `CapabilitySet` represents a set of zero or more atomic features (here represented as `CapabilityAtom`s). What a `CapabilitySet` means depends on how and where it is used:
* A compilation target implies a `CapabilitySet` where the contents of the set are the features the target *supports*.
* A `CapabilitySet` attached to a declaration (or a modifier on that declaration) describes a set of feature that declaration *requires*.
The current implementation of `CapabilitySet` is wasteful and inefficient, but that is something we can iterate on over time.
In practice, most of the current code only ever uses capability sets that are either empty (because they represent a function with no specific requirements) or singleton (because they represent asingle atomic capability like "is a GLSL target," "is an HLSL target," etc.).
The main goal here was to put in the skeleton of a new system, including some of the features it might need down the line, and then to leave changes that eventually use the greater flexibility for later. Eventually, the capability system should encompass:
* Differences between shader model versions, GLSL versions, SPIR-V versions, etc. (currently tracked with other modifiers)
* Optional extensions, and functions that are made available only with certain extensions (currently tracked with other modifiers)
* Front-end checking that the call graph of a program doesn't violate any capability-requirements (e.g., having a GLSL+HLSL portable function call a GLSL-only subroutine)
* Hypothetically we can also try to fold stage-specific (vertex-only, fragment-only, etc.) functions into this system, but doing so would require more linker cleverness if we allow overloading on stages (since we might have to clone a caller if it calls through to a callee with multiple stage-specific versions)
One important complication that the system has to deal with just because of the "do what I mean" nature of the current compiler is that somethings a current Slang user might compile for target X and specify version N, but then use a function that actually requires version N+1 of that target. Currently the Slang compiler silently "upgrades" the version(s) used by user code in these cases, because it is often what users want in cross-compilation scenarios.
Dealing with the "silent upgrade" situation requires us to be a little careful and sometimes pick a "best" capability set that doesn't appear to be supported on our target. Refining that system and potentially getting rid of the "do what I mean" behavior over time could be a goal for future changes.
* fixup: handle case where value is incompatible during linking
|
|
* Make witness and RTTI handles lower to `uint2`.
And enable some dynamic dispatch tests on D3D/VK.
* Bug fixes.
|
|
Overview
========
Prior to this change, we had two different code generation strategies for interface/existential types in Slang, that didn't always play nicely together:
* The "legacy" static specialization approach could handle plugging in an arbitrary concrete type for an existential type parameter (including types with resources, etc.), but wouldn't work well with things like a `StructuredBuffer<>` of an interface type, and requires somewhat counter-intuitive layout rules to make work.
* The new dynamic dispatch approach produces simpler, more easily understood layouts by assuming that values of interface type can fit into a fixed number of bytes. The tradeoff there is that it cannot handle types that include resources (only POD types).
The goal of this change is to make it so that the two strategies can co-exist. In particular, in cases where a shader is amenable to both static specialization and dynamic dispatch, the type layouts should agree.
In order to make the type layouts agree, we:
* Declare that *all* values of existential type reserve storage according to the dynamic-dispatch rules (so 16 bytes for the RTTI and witness-table information, plus whatever bytes are needed to story "any value" of a conforming type).
* Then we modify the "legacy" layout rules so that if a value of concrete type can fit in the reserved "any value" space for a given interface, then it is laid out there exactly like the dynamic dispatch rules would do. Otherwise, we fall back to the previous legacy rules (since we don't need to agree with the dynamic-dispatch layout on types that can't be used with dynamic dispatch).
Details
=======
* Renamed `ExistentialBox` to `BoundInterfaceType` to better clarify how it relates to `BindExistentialsType`
* Unconditionally apply the `lowerGenerics` pass during emit, since it is now responsible for aspects of the lowering of existential types when specialization is used.
* Made IR type layout take the target into account, so that the layout of resource types can vary by target (e.g., being POD on some targets, and invalid on others)
* Cleaned up some issues around using global shader parameters as the "key" for their layout information in the global-scope layout (only comes up when there are global-scope `uniform` parameters)
* Made there be a default any-value size (16) instead of making it be an error to leave out. This was the simplest option; we could try to go back to having an error, but we'd need to only issue it if we are sure a type/interface is being used with dynamic dispatch, since static dispatch doesn't have to obey the restrictions.
* Changed lowering of existential types to tuples so that bound interfaces where the concrete type won't fit use a "pseudo-pointer" instead of an "any-value" to hold the payload
* Changed IR type legalization to handle the "pseudo-pointer" case and apply layout information from an interface type over to the payload part when static specialization was used.
* Changed some details of how witness tables were being lowered, so that we didn't have to create "proxy" witness tables for the constraints on associated types (just use the actual requirement entries we generate)
* Changed witness tables so that they know the subtype doing the conforming
* Added logic so that we don't generate pack/unpack logic and witness table wrapper functions for types that are incompatible with any-value/dynamic dispatch for a given interface.
* Changed the core AST-level type layout logic to use the dynamic-dispatch layout in case things fit, and the legacy static specialization case when things don't (while also reserving space for the dynamic-dispatch fields)
* Changed a bunch of test cases for static specialization to properly use the new layout (which introduces new buffers in some cases, and moves data around in others).
Future Work
===========
The experience of trying to reconcile our older way of handling interface-type specialization with our newer model (that supports dynamic dispatch) makes it clear that we really need to make similar changes to our handling of generic type parameters on entry points and at the global scope.
A future change should make it so that a global type parameter is lowered with a type layout similar to a value parameter of interface type, including the RTTI and witness-table pieces, and just leaving out the "any value" piece. A similar translation strategy should apply to entry-point generic parameters (mirroring how we lower generic functions for dynamic dispatch already), and value specialization parameters.
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
In some cases, functionality is available as either a GLSL extension for Vulkan/SPIR-V, or through the NVAPI system for D3D. This situation creates complications because while GLSL extensions are generally all supported by the open-source glslang compiler (which we can bundle and ship), NVAPI operations are exposed through a specific header (`nvHLSLExtns.h`) that ships as part of the NVAPI SDK.
When a user wants to explicitly use NVAPI-provided operations in their shader code, there are no major complications for Slang; the user sets up their include paths, `#include`s the relevant header, calls functions in it, and lets Slang deal with the details of compilation.
The challenge for Slang arises when we want to provide a cross-platform interface in our standard library (e.g., the `RWByteAddressBuffer.InterlockedAddF32` method that was recently added) that uses either a GLSL extension (when compiling for Vulkan/SPIR-V) or an NVAPI (when compiling to DXBC or DXIL). In that case, the code *generated* by Slang now has a dependency on NVAPI, and we need to somehow emit a `#include` directive that pulls it in when invoking fxc or dxc. Because we do not (and seemingly cannot) bundle the NVAPI header with the compiler, we have to rely on ther user to have it available and to somehow communicate to Slang where it is.
Exposing portable routines that sometimes use NVAPI currently creates two main challenges:
1. The user is forced to interact with the "prelude" mechanism in the compiler, which allows the programmer to define code in a given target language that gets prepended to the Slang-generated code. While the prelude mechanism is powerful, it is also hard for users to integrate into their workflow, and our experience so far is that users want something that Just Works.
2. If the user writes code that uses some of our abstract operations that layer on NVAPI *and* they also want to use NVAPI explicitly, they end up with two copies of the NVAPI header (one included by the Slang front-end, and another included by the downstream fxc/dxc compiler). This puts the user in the situation of (a) having to ensure that they set the defines like `NV_SHADER_EXTN_SLOT` consistently both when invoking Slang and when adding their prelude, and (b) even if they do make the definitions consistent, they run into the problem that fxc/dxc complain about overlapping register bindings on the two copies of the `g_NvidiaExt` global shader paraemter that the NVAPI header declares.
This change attempts to resolve both issues by adding a lot of "do what I mean" logic to the compiler to try to ease things in the common case. In particular:
1. The user no longer needs to use the "prelude" mechanism when using NVAPI. The compiler now embeds a default prelude for HLSL output, which will `#include` the NVAPI header if and only if the generated code needs NVAPI access because of portable standard library routines that were used.
2. The user can mix-and-match explicit NVAPI use and stdlib functions that compile to use NVAPI. The register/space to be used by NVAPI when included via prelude is now set based on whatever the user set via the preprocessor so that it should automatically be consistent between both cases. Furthermore, the code we emit for the declaration of `g_NvidiaExt` when compiling explicit NVAPI use is set up to be conditional, so that it is skipped in the case where the prelude will pull in its own declaration of that parameter.
The way all this is achieved involves a lot of moving pieces:
* We now have an HLSL prelude, which mostly just serves to `#include "nvHLSLExtns.h"` in the case where NVAPI support is needed downstream.
* Standard library operations that require NVAPI for their implementation on HLSL include a new `[__requiresNVAPI]` attribute.
* The preprocessor has been extended so that after tokenizing an input file it looks up the NVAPI-relevant macros in the resulting environment, and if they are set it attached a modifier (`NVAPISlotModifier1) to the AST `ModuleDecl` that is based on their values. Logic is added to detect if multiple input files specify values for the macros in ways that conflict.
* The semantic checking step is extended so that it detects the "magic" NVAPI declarations (the `g_NvidiaExt` paramter and the `NvShaderExtnStruct` type that it uses) and attaches a modifier to them so that they can be identified as such in later steps.
* Parameter binding is extended to collect a list of the AST modifiers that reflect NVAPI binding, and to reserve the relevant register(s) so that ordinary user-defined parameters cannot conflict with them.
* IR lowering translates the three new AST modifiers related to NVAPI over to IR equivalents.
* IR linking is extended to make sure that it clones any `IRNVAPISlotDecoration`s attached to the input modules. The pass intentionally does not care where the modifiers came from; it just collects them all and leaves it to downstream code to sort out what they mean.
* Emit logic is extended to have a notion of "prelude directives" which are preprocessor directives that should come *before* the prelude in the generated code, because they can impact the way that the prelude compiles. This is done so that we don't have to introduce ad hoc logic for each downstream compiler to set any relevant `-D` flags (e.g., both fxc and dxc would need to duplicate such logic for NVAPI support).
* The HLSL source emitter is extended to track whether it emits any operations that require NVAPI support.
* The HLSL source emitter is extended to emit prelude directives based on whether NVAPI is needed and, if it is, to also set the register and space that NVAPI should use based on what was stored in the decoration(s) on the IR module.
* The HLSL source emitter is extended so that it detects global instructions that represent "magic" NVAPI constructs , and emit them as conditional definitions so that they are skipped when NVAPI is included via the prelude.
* The handling of requires capabilities during emit logic was cleaned up a bit so that more logic is shared across targets, and also so that the same logic is used both when emitting a function declaration/definition and when emitting a call to an instrinsic function (which won't get declared/defined).
|
|
* Enable all dynamic dispatch tests on CUDA.
* Fix expected cross-compile test results.
|
|
* Allow existential types in `StructuredBuffer` element type.
* Handle StructuredBuffer.Load/.Consume methods
* Clean up unnecessary changes
* Code cleanup
* Update test comment
|
|
Most people agree that it is a Good Thing when compilers are deterministic: the exact same input bits produce the exact same output bits every time the compiler is run. Bonus points are awarded if the results are independent of the platform the compiler was compiled for and run on.
One of the easiest kinds of nondeterminism to have sneak into a compiler is for it to produce the "same" code inside functions, but sometimes emits functions or other global symbols in a different order from run to run. Right now, the Slang compiler has some of this kind of nondeterminism.
The main way (but not necessarily the only way) that a compiler ends up producing output with a different ordering across runs is by iterating over the contents of a hash-based container (in our codebase, a `Dictionary` or `HashSet`), where the keys make use of pointers. Most operating systems intentionally try to randomize the address space of processes across runs (as a security feature), so that exact pointer values are not stable across runs, and thus hash value are not stable across runs, and thus the ordering of entries is not stable across runs.
This change identifies a few cases of iterating over dictionaries or sets that could have produced output non-determinism:
* The `HLSLIntrinsicSet` was using a `Dictionary` to store intrinsics that had been referenced, and would later produce a linear list of those intrinsics based on their order in the dictionary.
* The `WitnessTable`s produced by the front-end stored a `Dictionary` or requirements, and lowering from AST->IR was iterating over that dictionary to ensure that everythign got emitted.
* The `SharedSemanticsContext` was tracking a `HashSet` of modules that were imported into scope (so that their `extension`s should be visible), and an iteration over that list was used when producing candidate extensions during lookup. This case is unlikely to cause any nondeterminism in final output, but could lead to nondeterministic ordering in diagnostic messages for ambiguous reference/overload cases.
* The IR linker maintains a `Dictionary` of symbols based on their mangled names, and iterates over it in code that clones all witness tables into the linked IR whether or not they are referenced.
For most of these cases the fix is simple:
* Keep both a `Dictionary`/`HashSet` and a `List` of the appropriate type
* Whenever adding to the hash-based container also add to the list
* Whenever iterating, iterate over the list
In the final case of the IR linker, the relevant code was marked with a `TODO` comment noting that it shouldn't actually be needed, so I simply dropped it and the change doesn't seem to break any of our tests. I've been fairly confident that code wasn't needed for a while.
This change isn't exactly elegant, and a better long term solution might be to introduce two new types, `OrderedDictionary` and `OrderedSet`, which are similar to `Dictionary` and `HashSet` except that they guarantee a deterministic order of enumeration of their contents, based on insertion order.
(Note that a `SortedDictionary` and/or `SortedSet` that use something like a binary tree to produce a "determinsitc" sorted order wouldn't actually help here, because sorting entries by pointer values wouldn't solve the underlying problem that the pointer values aren't stable across runs)
I've chosen to avoid adding new types to `core` in the interest of making the change as small as possible. If we all agree that new types are warranted, it should be easy to clean up these use cases.
Testing this change is difficult, because we can't produce a reliable test to rule out nondeterminism. I have done best-effort testing by hand by crafting shaders that show output nondeterminism, and then compiling them both with and without these changes.
|
|
* GPU Foreach Loop
This PR introduces the completed GPU foreach loop and updates the
heterogeneous-hello-world example to use it. This PR builds on the
previous introduction of the GPU Foreach loop parsing and semantic
checking PR (#1482) by introducing IR lowering and emmitting. THe new
feature can be used by having a GPU_Foreach loop interacting with a
named non-CPP entry point, and using the -heterogeneous flag.
* Fix to path
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|
|
* Multiple Entry Point Backend
This PR introduces changes to the IR linking, emitting, and options for
multiple entry points. Specifically, this PR updates several locations
to support a (potentially empty) list of entry points, adding list infrastructure and looping over entry points as appropriate.
* Formatting change
* Updated unknown target case to not require an entry point
* Formatting and list consts updates
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|
|
* Adding support for global uniform shader parameters
This change adds support for Slang programmers to declare shader parameters of "ordinary" types at global scope:
```hlsl
uniform float gScaleFactor;
void main() { ... *= gScaleFactor; ... }
```
The generated HLSL/GLSL/DXIL/SPIR-V output will be something along the lines of:
```hlsl
struct GlobalParams
{
float gScaleFactor;
}
cbuffer globalParams
{
GlobalParams globalParams;
}
void main() { ... *= globalParams.gScaleFactor; ... }
```
The binding information used for the implicit `globalParams` constant buffer will be determined by the existing implicit parameter binding logic (which already had support for this kind of transformation).
The reason this change is being pursued right now is because it is one step toward removing the implicit `KernelContext` type that is used to wrap the generated code for our CPU and CUDA C++ targets. Handling global-scope parameters of ordinary type requires an IR pass that synthesizes the `GlobalParams` structure type above, and that step ends up removing the need for the similar `UniformState` structure that was being used in the CPU/CUDA emit logic.
A more detailed guide to the changes included follows:
* The diagnostic for a global-scope variable that is implicitly a shader parameter was kept, but changed to a warning. Users can opt out of the warning by decorating their parameter as a `uniform` (since that keyword is already being used to mark entry-point parameters that should be treated as uniform shader parameters).
* To simplify the task of finding the global shader parameters, the `CLikeSourceEmitter` type has been given an `m_irModule` member. The previous emit logic for `UniformState` was having to do a roundabout solution involving the `EmitAction`s to deal with not having direct access to the module.
* Removed a few dead declarations in the emit logic (related to a much earlier point where emit was based on the AST instead of the IR).
* Made the computation of type names in C++ emit take into account `ConstantBuffer<T>` and `ParameterBlock<T>`. As far as I can tell, these were being handled with some special-case hacks in the emit logic instead of being supported more fundamentally. It might actually be good to pass these through as `ConstantBuffer<T>` and `ParameterBlock<T>` in the C++ output, and allow the prelude to customize their translation (defaulting to defining them as `T*`).
* Removed the special-case C++ emit logic for references to global shader parameters. There are now at most two global shader parameters to deal with, and the default emit logic (referring to them by name) does the Right Thing.
* Changed the handling of entry points for C++ (both CPU and CUDA) so that it handles the bundled-up shader paameters for the global and entry-point scopes the same way. The main complication here is OptiX, where parameter data is passed very differently than it is for CUDA compute kernels.
* Reverted changes to `ir-entry-point-uniforms` that had made its logic depend on the compilation target. The parameter binding logic was already responsible for deciding if a given target needed to wrap up its entry-point parameters in a constant buffer, and the IR pass was respecting that layout information. The current workaround had been removing the `ConstantBuffer<T>` indirection from this IR pass for CPU/CUDA, but then reintroducing the same indirection later on in the emit step.
* Added an explicit IR pass with the task of collecting global-scope parameters of uniform/ordinary type and packaging them up into a `struct`, and then optionally packaging that `struct` up in a constant buffer. This pass bases its decisions on the IR layout information that was already computed, so it should match whatever policy choices were made at the layout level.
* Changed the "key" operand on IR `struct` layout information to not assume an `IRStructKey`. The problem here is that the global scope gets a `StructTypeLayout` to represent its members, and this is convenient (rather than having to always special-case logic that handles the global scope), but the "fields" of that struct are global variables which do not have `IRStructKey`s associated with them. The simplest solution is to use the variables themselves as the keys, which required removing the assumption in the IR encoding.
* Updated the IR layout process to compute a layout for the global scope of an entire program, and to attach that to the `IRModule` via a decoration. Updated the IR linking process to carry through that decoration to the linked output. This is necessary so that the IR pass that transforms global parameters can access the global-scope layout information.
An important concern with this approach is that the contents and layout of the monolithic `GlobalParams` structure depends on the exact set of modules that were linked (and the order in which they were specified, in some cases). This isn't really a new thing with this change, but it becomes more important as we start to think of how to generalize things to better support separate compilation and linking.
There are changes that can (and should) be made to the way that IR layouts are computed for programs (e.g., so that we compute layout per-module and then combine them rather than as a whole-program step). In this case, the problem of forming the combined/linked global layout can be moved down the IR level and not be reliant on AST-level information.
Just changing the way layout and linking interact would not change the fundamental problem that global shader parameters as they currently exist in Slang/HLSL/GLSL are not readily compatible with true separate compilation. We either need to find a solution strategy that we can apply to allow existing shaders to work with separate compilation *or* we need to incrementally work toward removing support for global-scope shader parameters in favor of explicit entry-point parameters in all cases.
* fixup: missing files
* fixup: comment the new code
|
|
This PR introduces support for the public modifier for functions. This
keyword allows labelled functions to be written to the compiled without
having a link to an entry point. The goal of this change is to help
support heterogeneous design of Slang by permitting C++ code to interact
with CPU slang functions.
Internally, this PR adds the public decoration to the IR and defines a
lowering from the public modifier in the AST to this decoration.
Additionally, the Keep Alive decoration is added to any public modifier
being lowered, which prevents DCE from eliminating functions labelled
with the public keyword.
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|
|
(#1420)
* Fix handling of UniformState from #1396
* * Fix bug in slang-dxc-support where it didn't get the source path correctly
* Make entryPointIndices const List<Int>&
|
|
* Backend for Multiple Entry Points
Introduces the basic backend on the compiler for zero or more entry
points. Entry points have been extended to lists for several functions,
with loopFunctions have been extended to take in entry points and
indices as appropriate, to allow for multiple entry points once the
frontend is expanded. Several functions are currently being assumed to
have a single entry point for simplicity and provide a work in progress
commit.
* Progress on debugging fixes
* Tests passing
* Refactored emitEntryPoints
* Updated lists to be by constant reference
* Fixes to formatting
* Refactoring updates for the compiler
* Fix for compilation errors
* Reformatting
* More reformatting
* Moved struct around to help with compilation
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|
|
|
|
|
|
-Lower interfaces into actual `IRInterfaceType` insts.
-Lower `DeclRef<AssocTypeDecl>` into `IRAssociatedType`
-Generate proper IRType for generic functions.
-Add a test case exercising dynamic dispatching a generic static function through an associated type.
-Bug fixes for the test case.
|
|
IRWitnessTable values (#1387)
* Generate IRType for interfaces, and use them as the type of IRWitnessTable values.
This results the following IR for the included test case:
```
[export("_S3tu010IInterface7Computep1pii")]
let %1 : _ = key
[export("_ST3tu010IInterface")]
[nameHint("IInterface")]
interface %IInterface : _(%1);
[export("_S3tu04Impl7Computep1pii")]
[nameHint("Impl.Compute")]
func %Implx5FCompute : Func(Int, Int)
{
block %2(
[nameHint("inVal")]
param %inVal : Int):
let %3 : Int = mul(%inVal, %inVal)
return_val(%3)
}
[export("_SW3tu04Impl3tu010IInterface")]
witness_table %4 : %IInterface
{
witness_table_entry(%1,%Implx5FCompute)
}
```
* Fixes per code review comments.
Moved interface type reference in IRWitnessTable from their type to operand[0].
* Fix typo in comment.
|
|
* CPPCompiler -> DownstreamCompiler
* Added DownstreamCompileResult to start abstraction such that we don't need files.
* * Split out slang-blob.cpp
* Made CompileResult hold a DownstreamCompileResult - for access to binary or ISlangSharedLibrary
* Keep temporary files in scope.
* Add a hash to the hex dump stream.
* Move all file tracking into DownstreamCompiler.
* WIP support for nvrtc.
* WIP: Adding support for nvrtc compiler.
Adding enum types, wiring up the nvrtc into slang.
* Fix remaining CPPCompiler references.
* Fix order issue on target string matching.
* Use ISlangSharedLibrary for nvrtc.
* Use DownstreamCompiler for nvrtc.
* WIP first pass at compilation win nvrtc.
* Added testing if file is on file system into CommandLineDownstreamCompiler.
Added sourceContentsPath.
* Make test cuda-compile.cu work by just compiling not comparing output.
* Genearlize DownstreamCompiler usage.
* Fix warning on clang.
* Remove CompilerType from DownstreamCompiler.
* Use DownstreamCompiler interface for all compilers.
NOTE for FXC, DXC and GLSLANG this doesn't mean using 'compile' - it's still extracting functions from shared library.
* Replace DownstreamCompiler::SourceType -> SlangSourceLanguage
* Replace _canCompile with something data driven.
* Fix compiling on gcc/clang for DownstreamCompiler.
* Moved some text conversions into DownstreamCompiler.
* Fix problem on non-vc builds with not having return on locateCompilers for VS.
* Change so no warning for code not reachable on locateCompilers for vs.
* WIP: CUDA code generation - currently just using CPU layout and HLSL.
* emitXXXForEntryPoint -> emitEntryPointSource
emitSourceForEntryPoint -> emitEntryPointSourceFromIR
Fix up generating cuda to get PTX.
* WIP emitting cuda for IR.
* Small improvements to CUDA ouput.
* Disable the CUDA emit test, as output not currently compilable.
|
|
* * Added ConstArrayView
* Made StringSlicePool have styles
* Remove point about strings not having terminating 0 (they do), and restriction around ""
* spCalcStringHash -> spComputeStringHash
* Small code improvements.
Closer to coding conventions.
* Fix small bug with Empty adding c string.
* Fix typo in assert.
* Fix ArrayView compiling issue on gcc/clang.
* Remove tabs.
* Improve comments around StringSlicePool.
Simplify getting the added slices.
|
|
* WIP getStringHash
* Have a use.
* Add slang-string-hash.h/.cpp
* Use StringSlicePool for holding strings for StringHash.
Add outputBuffer to string-literal-hash.slang so value can be tested.
Ignore the GlobalHashedStringLiterals instruction on emit.
* Add all the hashed string literals to ProgramLayout.
* Add reflection support for hashed string literals to reflection test.
* Fix string literal hash test.
* Small fixes to pass test suite.
* Fix issue in serialization where IRUse is not correctly initialized.
* Fix problem initializing IRUse for string hash pass.
Remove hack from slang-ir-specialize - specially handling if user is not null.
* * Use shared builder when replacing getStringHash
* Comments for functions in slang-ir-string-hash
* Do not allow zero length string literals. Could be allowed, but doing so would require StringSlicePool to have a special case (or some other mechanism)
|
|
* Initial work for "global generic value parameters"
The main new feature here is support for the `__generic_value_param` keyword, which introduces a *global generic value parameter*.
For example:
__generic_value_param kOffset : uint = 0;
This declaration introduces a global generic value parameter `kOffset` of type `uint` that has a nominal default value of zero.
The broad strokes of how this feature was added are as follows:
* A new `GlobalGenericValueParamDecl` AST node type is introduces in `slang-decl-defs.h`
* A new `parseGlobalGenericValueParamDecl` subroutine is added to `slang-parser.cpp`, and is added to the list of declaration cases as the callback for the `__generic_value_param` name.
* Cases for `GlobalGenericValueParamDecl` are added to the declaration checking passes in `slang-check-decl.cpp`, mirroring what is done for other variable declaration cases.
* A case for `GlobalGenericValueParamDecl` is aded to the `Module::_collectShaderParams` function, so that it is recognized as a kind of specialization parameter. This introduces a specialization parameter of flavor `SpecializationParam::Flavor::GenericValue` (which was already defined before this change, although it was unused).
* A case for `SpecializationParam::Flavor::GenericValue` is added in `Module::_validateSpecializationArgsImpl` to check that a specialization argument represents a compile-time-constant value (not a type).
* A case for `GlobalGenericValueParmDecl` is introduced in `slang-lower-to-ir.cpp` that introduces a global generic parameter in the IR
* The `IRBuilder` is extended to support creating `IRGlobalGenericParam`s for the distinct cases of type, witness-table, and value parameters. The same IR instruction type/opcode is used for all cases, and only the type of the IR instruction differs.
* The existing mechanisms for lowering specialization arguments to the IR, and doing specialization on the IR itself Just Work with global generic value parameters since they already support value parameters on explicit generic declarations.
That's the santized version of things, but there were also a bunch of cleanups and tweaks required along the way:
* The `SpecializationParam` type was extended to also track a `SourceLoc` to help in diagnostic messages, which meant some churn in the code that collects specialization parameters.
* The `_extractSpecializationArgs` function is tweaked to support any kind of "term" as a specialization argument (either a type or a value).
* To allow *parsing* specialization arguments that can't possibly be types (e.g., integer literals) we replace the existing `parseTypeString` routine with `parseTermString` and then in `parseTermFromSourceFile` call through to a general case of expression parsing (which can also parse types) rather than only parsing types directly.
* Right before doing back-end code generation, we check if the program we are going to emit has remaining (unspecialized) parameters, in which case we emit a diagnostic message for the parameters that haven't been specialized rather than go on to emit code that will fail to compile downstream.
* Within the `render-test` tool we collapse down the arrays that held both "generic" and "existential" specialization arguments, so that we just have *global* and *entry-point* specialization argument lists. This mirrors how Slang has worked internally for a while, but the difference hasn't been important to the test tool because no tests currently mix generic and existential specialization. The logic for parsing `TEST_INPUT` lines has been streamlined down to just the global and entry-point cases, but the pre-existing keywords are still allowed so that I don't have to tweak any test cases.
There are several significant caveats for this feature, which mean that it isn't really ready for users to hammer on just yet:
* There is no support for `Val`s of anything but integers, so there is no way to meaningfully have a generic value param with a type other than `int` or `uint`.
* We allow for a default-value expression on global generic parameters, but do not actually make use of that value for anything (e.g., to allow a programmer to omit specialization arguments), nor check that it meets the constraints of being compile-time constant.
* Global generic value parameters are *not* currently being treated the same as explicit generic parameters in terms of how they can be used for things like array sizes or other things that require constants. This will probably be relaxed at some point, but allowing a global generic to be used to size an array creates questions around layout.
* The IR optimization passes in Slang currently won't eliminate entire blocks of code based on constant values, so using a global generic value parameter to enable/disable features will *not* currently lead to us outputting drastically different HLSL or GLSL. That said, we expect most downstream compilers to be able to handle an `if(0)` well.
* Fix regression for tagged union types
The change that made specialization arguments be parsed as "terms" first, and then coerced to types meant that any special-case logic that is specific to the parsing of types would be bypassed and thus not apply.
Most of that special-case logic isn't wanted for specialization arguments, since it pertains to cases were we want to, e.g, declare a `struct` type while also declaring a variable of that type.
The one special case that *is* useful is the `__TaggedUnion(...)` syntax, which is the only way to introduce a tagged union type right now.
In order to get that case working again, all I had to do was register the existing logic for parsing `__TaggedUnion` as an expression keyword with the right callback, and the existing logic in expression parsing kicks in (that logic was already handling expression keywords like `this` and `true`).
I left in the existing logic for handling `__TaggedUnion` directly where types get parsed, rather than try to unify things.
A better long-term fix is to make the base case for type parsing route into `parseAtomicExpr` so that the two paths share the core logic.
That change should probably come as its own refactoring/cleanup, because it creates the potential for some subtle breakage.
* fixup: typo
|
|
* Initial work on direct emission of SPIR-V
This change adds a first vertical slice of support for emitting SPIR-V code directly from the Slang IR, instead of generating it indirectly via GLSL.
This work isn't usable for anything valuable right now; the goal is just to get something checked in that we can incrementally extend over time.
When invoking `slangc`, the `-emit-spirv-directly` option can be used to turn on the new code path.
I have not bothered to add an equivalent API option, because this flag is only intended to be used for testing in the immediate future.
The existing `emitEntryPoint()` function has become `emitEntryPointSource()` to more accurately reflect its role in a world where we can also emit entry points to a binary format.
Much of the logic that was inside `emitEntryPoint()` had to do with linking and then optimizing/transforming Slang IR code to get it ready for emission on a particular target.
This logic has been factored into a new `linkAndOptimizeIR()` function that can be shared between the path that emits source and the new one that emits SPIR-V.
The meat of the change is then the `emitSPIRVFromIR()` function in `slang-emit-spirv.cpp`, which is called *after* all the optimizations and transformations have been applied to the Slang IR to get it ready.
Rather than repeat myself here, I will try to make the comments in `slang-emit-spirv.cpp` usable as documentation of the approach being taken.
Smaller notes:
* I've included a test case that compares `slangc` output directly to expected SPIR-V. This is perhaps not an ideal plan for how to test SPIR-V emission going forward, but it suffices for now.
* The `external/` directory needed to be added to the include dirs for the `slang` project so that the new code can depend on the SPIR-V header.
* In `slang-ir-link`, the direct SPIR-V generation path means that we now link with a target of SPIR-V instead of GLSL. In principle this can be used to ensure that appropriate variants of intrinsics are selected based on the knowledge that we are emitting SPIR-V. In practice, that isn't being used at all.
* Fixup: path for SPIR-V headers
While working on this PR I used a copy of `spirv.h` that I placed into the repository tree manually, but since I started the work we ended up with SPIR-V headers in our tree anyway, albeit at a different path.
This change tries to fix things up so that my code uses the headers that were already placed in the repository.
* fixup; 64-bit build issue
* fixup: typo fixes based on review
|
|
* Added RiffReadHelper
* Move type to fourCC in Chunk simplifies some code.
* Make MemoryArena able to track external blocks.
Allow ownership of Data to vary.
Changed IR serialization to use moved allocations to avoid copies.
As it turns out all of the array writes could use unowned data, but doing so requires the IRData to stay in scope longer than IRSerialData, which it does at the moment - but perhaps needs better naming or a control for the feature.
* Write out slang-module container.
* WIP on -r option.
Loading modules - with -r.
* Making the serialized-module run (without using imported module).
* Split compiling module from the test.
* Separate module compilation with a function working.
* Remove serialization test as not used.
* Fix warning on gcc.
* Updated test to have types across module boundary.
* Allow entry point declaration.
A test that tries to build with just an entry point declaration and a module.
* Try to make link work with multiple modules.
* Multi module linking first pass working.
* Multi module test working with -module-name option
* Added feature to repro manifest of approximation of command line that was used.
* Use isDefinition - for determining to add decorations to entry point lowering.
* Added support for repo-file-system.h
More precise control of CacheFileSystem.
Allow RelativeFileSystem to strip paths optionally.
Use canonical paths in PathInfo cache.
Fix bug in -D options for command line output of StateSerailizeUtil
* Add missing slang-options.h
* Fix bug in bit slang-state-serialize.cpp with bit removal.
* Added documentation around -repro-file-system
Added spLoadReproAsFileSystem function.
* Fix warning.
* spAddLibraryReference
* * Add support for slang-lib extension
* Container output when using -no-codegen option
* Use the m_containerFormat to determine if the module container is constructed.
Store the result in a blob. This allows for potential access via the API.
Write the blob if a filename is set.
Use m_ prefix for container variables.
* Added spGetContainerCode.
Made spGetCompileRequestCode work.
* * Put obfuscateCode on linkage
* Remove obfuscation from variable names - as can be achieved by either stripping and/or removing NameHintDecorations at lowering
* Remove name hints being added during lowering
* Add stripping of SourceLoc location in strip phase
* Hashing of linkage import/export names.
* Do final strip in emitEntryPoint, removes any remaining SourceLoc.
* Support for [__extern] to mark struct/function that are defined elsewhere.
* Allow adding extern to any decl.
* Use ExternAtrtibute to apply import decoration, rather than use an ir extern decoration.
* Added a test for [__extern]
* Improved comment around [__extern]
|
|
* Added RiffReadHelper
* Move type to fourCC in Chunk simplifies some code.
* Make MemoryArena able to track external blocks.
Allow ownership of Data to vary.
Changed IR serialization to use moved allocations to avoid copies.
As it turns out all of the array writes could use unowned data, but doing so requires the IRData to stay in scope longer than IRSerialData, which it does at the moment - but perhaps needs better naming or a control for the feature.
* Write out slang-module container.
* WIP on -r option.
Loading modules - with -r.
* Making the serialized-module run (without using imported module).
* Split compiling module from the test.
* Separate module compilation with a function working.
* Remove serialization test as not used.
* Fix warning on gcc.
* Updated test to have types across module boundary.
* Allow entry point declaration.
A test that tries to build with just an entry point declaration and a module.
* Try to make link work with multiple modules.
* Multi module linking first pass working.
* Multi module test working with -module-name option
* Added feature to repro manifest of approximation of command line that was used.
* Use isDefinition - for determining to add decorations to entry point lowering.
* Added support for repo-file-system.h
More precise control of CacheFileSystem.
Allow RelativeFileSystem to strip paths optionally.
Use canonical paths in PathInfo cache.
Fix bug in -D options for command line output of StateSerailizeUtil
* Add missing slang-options.h
* Fix bug in bit slang-state-serialize.cpp with bit removal.
* Added documentation around -repro-file-system
Added spLoadReproAsFileSystem function.
* Fix warning.
* spAddLibraryReference
* * Add support for slang-lib extension
* Container output when using -no-codegen option
* Use the m_containerFormat to determine if the module container is constructed.
Store the result in a blob. This allows for potential access via the API.
Write the blob if a filename is set.
Use m_ prefix for container variables.
* Added spGetContainerCode.
Made spGetCompileRequestCode work.
* * Put obfuscateCode on linkage
* Remove obfuscation from variable names - as can be achieved by either stripping and/or removing NameHintDecorations at lowering
* Remove name hints being added during lowering
* Add stripping of SourceLoc location in strip phase
* Hashing of linkage import/export names.
* Do final strip in emitEntryPoint, removes any remaining SourceLoc.
|
|
* Added RiffReadHelper
* Move type to fourCC in Chunk simplifies some code.
* Make MemoryArena able to track external blocks.
Allow ownership of Data to vary.
Changed IR serialization to use moved allocations to avoid copies.
As it turns out all of the array writes could use unowned data, but doing so requires the IRData to stay in scope longer than IRSerialData, which it does at the moment - but perhaps needs better naming or a control for the feature.
* Write out slang-module container.
* WIP on -r option.
Loading modules - with -r.
* Making the serialized-module run (without using imported module).
* Split compiling module from the test.
* Separate module compilation with a function working.
* Remove serialization test as not used.
* Fix warning on gcc.
* Updated test to have types across module boundary.
* Allow entry point declaration.
A test that tries to build with just an entry point declaration and a module.
* Try to make link work with multiple modules.
* Multi module linking first pass working.
* Multi module test working with -module-name option
* Use isDefinition - for determining to add decorations to entry point lowering.
|
|
* Added RiffReadHelper
* Move type to fourCC in Chunk simplifies some code.
* Make MemoryArena able to track external blocks.
Allow ownership of Data to vary.
Changed IR serialization to use moved allocations to avoid copies.
As it turns out all of the array writes could use unowned data, but doing so requires the IRData to stay in scope longer than IRSerialData, which it does at the moment - but perhaps needs better naming or a control for the feature.
* Write out slang-module container.
* WIP on -r option.
Loading modules - with -r.
* Making the serialized-module run (without using imported module).
* Split compiling module from the test.
* Separate module compilation with a function working.
* Remove serialization test as not used.
* Fix warning on gcc.
* Updated test to have types across module boundary.
|
|
This change builds on previous work that moves toward a more IR-based representation of layout.
Those steps added some instructions for representing layout in the IR (initially just proxies for the AST layout objects), and an explicit lowering pass that could build a target-specific IR module that binds parameters and entry points to layout information.
This change aims to complete that work, in the sense that the IR representation of layout is now self-contained and does not rely on having pointers back into the AST-level representation.
Achieving this requires two main kinds of work:
1. Update any code that used layout information derived from the IR (most notably all the `slang-emit-*` code) to use the new IR representation and its accessors.
2. Update any code that *constructs* layouts using information derived from the IR to construct IR layouts instead.
The biggest new infrastructure feature in this change is support for "attributes" in the IR (I'd welcome feedback on the naming).
An attribute can either be thought of like key/value arguments that can be added to certain instructions to encode optional data, or alternatively like a decoration that is referenced as an operand instead of a child.
The value of attributes over decorations is that they can affect the hash/identity of an instruction (which decorations can't), while the advantage of decorations is that they can easily be added/removed over the lifetime of an instruction (which attributes can't).
We mostly use them here to represent operands that are logically optional.
Once attributes are available, the encoding of layout information into the IR is mostly straightforward:
* An `IRVarLayout` has a fixed operand for its type layout, and can accept a few different attributes
* Zero or more `IRVarOffsetAttr`s that specify the offset of the variable for a given resource kind. These are equivalent to the `VarLayout::ResourceInfo`s at the AST level.
* An optional `IRUserSemanticAttr` and `IRSystemValueSemanticAttr` to represent the (possibly derived) semantic of a varying input/output parameter.
* An option `IRStageAttr` to represent the known stage for a parameter.
* An `IREntryPointLayout` has a var layout for the entry point parameters (logically grouped in to a struct) and another var layout for the result parameter.
* There is a small type hierarchy rooted at `IRTypeLayout` where each subtype can add fixed operands and attributes that are expected to appear. It also supports `IRTypeSizeAttr`s that serve a similar role to the `IRVarOffsetAttr`s.
* Structure types maintain the mapping of fields to their var layouts using `IRStructFieldLayoutAttr`s.
With the encoding in place, most of the changes in category (1) (code that just *uses* rather than *creates* layouts) was straightforward. The biggest different beyond name changes was that everything needs to be fetched using accessors instead of bare fields. It would have been possible to stage this commit and make the diffs smaller by first introducing mandatory acessors to the AST layout types.
The changes in category (2) were more involved. There were a lot of places in the existing code where a `TypeLayout` or `VarLayout` would be created, and then initialized piecemeal over several lines of code (and sometimes even across functions). Because of the way that layouts need to support many optional properties, it did not seem practical to just have monolithic factory functions that took all the options as arguments, so I instead opted for a builder approach.
The builders for `IRVarLayout` and `IREntryPointLayout` are both straightforward, and honestly there is no realy need for a builder for entry point layouts right now, but I was trying to future-proof in case we decidd to add some optional attributes to them.
The builders for type layouts are more involved because of the inheritance hierarchy. Each concrete sub-type of type layout needs to define its own builder type that customizes the opcode, operands, and attributes of the final instruction.
The refactoring that had to go into this change was a nice excuse to clean up a few ugly warts in the AST layout code that were largely there to support IR use cases. While this change adds a lot of new infrastructure code to the IR, most of the client code has stayed the same or gotten simpler.
One annoying wart that remains with this change is the notion of an "offset element type layout" for parameter group types. That idea was added to deal with a legacy feature in the reflection API that we realized was a mistake, but unfortunately having that "offset" layout handy made writing a few other pieces of code simpler so that there are use cases of the feature even in the IR. Removing those uses is do-able, but requires careful refactoring so it is best left to a follow-on change.
Another thing that could be considered for a follow-on change is how much information should be specified when constructing a `Builder` for an IR type layout, and how much should be allowed to be specified statefully/piecemeal. It would be nice to force all the required operands to be specified up front, but `IRParameterGroupTypeLayout::Builder` doesn't currently work that way because so much of the client code that needs it involved a lot of stateful setting and would need to be refactored heavily to provide the necessary information up front.
|
|
* Initial work on representing layout at IR level
This change starts the process of making the back-end of the compiler independent of the AST-level layout information (`TypeLayout`, `VarLayout`, etc.) so that it instead only relies on layout information that is embedded into IR modules. This brings us incrementally closer to a world in which the back-end could be run without the AST-level structures even existing (e.g., for an application that just wants to ship IR without any AST information for IP protection, while still supporting some amount of linking and specialization).
The main parts of the change are:
* There is a bunch of incidental churn related to specifying entry points by index instead of the `EntryPoint` object for certain operations. This ends up being a better choice because we can use the index to look up side-band information about the entry point that might not be stored on the `EntryPoint` object itself. In particular...
* We expand the `ComponentType` interface to support looking up the mangled name of an entry point by index. In common cases (no generic/interface specialization) this would be the same as asking the `EntryPoint` for its mangled name, but in cases where we have specialized a generic entry point, the mangled name would include speicalization arguments that are only available on the `SpecializedComponentType` that wraps the entry point. This part of the change isn't ideal and there might be a better solution waiting to be invented. Note that we store mangled entry point names as strings rather than using `DeclRef`s because that ensures that the information could be serialized and deserialized without a dependence on the AST.
* The `TargetProgram` type (which represents binding a specific `ComponentType` for a shader program to a specific `TargetRequest` that represents the target platform) is expanded to include an `IRModule` that represents layout information, in addition to the AST-level `ProgramLayout` it already contained. We create both of these objects at the same time (on-demand) to simplify the overall flow (so that any code that triggers creation of the AST-level layout will also ensure that the IR-level layout exists).
* A bunch of code in the emit passes that was passing down layout-related objects has been eliminated. It appears that most of those objects weren't actually being used, so this is just a cleanup, but it helps ensure that the back-end steps are "clean" and don't depend on the AST-level information. The one big exception here is that the emit logic needs to know the stage for the entry point being emitted (to deal with one wrinkle in translating DXR to VKRT).
* A big change (actually introduced by @jsmall-nvidia in a branch that this change copied and then built from) is to introduce some more explicit IR instructions to represent layout information, notably an `IRTypeLayout` and an `IRVarLayout`. For now these objects still reference their AST equivalents, but the separation gives us an incremental path to move information from the AST-level objects over to the IR ones. This work includes logic in `IRBuilder` to construct the IR-level layout objects from the AST-level ones on-demand, so that the existing code paths that try to attach AST-level layout will continue to work for now.
* Because layout information is now embedded in the IR, the `slang-ir-link.cpp` logic loses a lot of cases that used to deal with attaching AST-level layout objects to IR-level instructions during the linking process. Instead, the linker now assumes that one (or more) of the input IR modules will have layout information associated with it, and the linker makes sure to copy layout decorations (and the instructions they reference) from the input IR module(s) to the output using its more ordinary mechanisms.
* Inside `slang-lower-to-ir.cpp`, we add logic to construct an IR module in a `TargetProgram` that simply references the global shader parameters, entry points, etc. and attaches IR layout decorations to them. This is akin to the existing pass in the same file that constructs IR to represent specialization information, and both of these passes share infrastructure with the main AST->IR lowering pass. Eventually, it is expected that this pass will encompass more of the logic for copying AST-level layout information over to IR-level equivalents.
* One small wrinkle with this change was that the output for an HLSL generation test case changed some of its `#line` directives. The old code was actually more inaccurate than the new, so this change just updated the baseline. It also added some logic in the linker to make sure that when an IR instruction has multiple definitions, we try to pick up a source location from any of them, in case the "main" one somehow didn't get a location.
* Another small fix was that the key/value map in `StructTypeLayout` for mapping fields/members to their layouts was keyed on `Decl*` when it really should have been `VarDeclBase*`.
This change should in principle be a pure refactoring with no functionality changes, so no new tests were added. It is unfortunately also a change that has a high probability of breaking at least *some* client code, so we may want to be defensive and mark this with a new major version number (well, a new *minor* version number since we are pre-`1.0`) to give us some room for releasing hotfixes to the old version if needed.
* fixup: infinite recursion bug detected by clang
* fixup: remove commented-out code
|
|
* Revise new COM-lite API
This change revises the "COM-lite" API that was recently introduced to try to streamline it and introduce some missing central/base concepts.
The central new abstraction in the API is the notion of a "component type," which is a unit of shader code composition. A component type can have:
* IR code for some number of functions/types/etc.
* Zero or more global shader parameters
* Zero or more "entry point" functions at which execution can start
* Zero or more "specialization" parameters (types or values that must be filled in before kernel code can be generated)
* Zero or more "requirements" (dependencies on other component types that must be satisfied before kernel code can be generated)
Both individual compiled modules, and validated entry points are then examples of component types, and we additionally define a few services that apply to all component types:
* We can take N component types and compose them to create a new component type that combines their code, shader parameters, entry points, and specialization parameters. A composed component type may also include requirements from the sub-component types, but it is also possible that by composing thing we satisfy requirements (if `A` requires `B`, and we compose `A` and `B`, then the requirement is now satisfied, and doesn't appear on the composite).
* We can take a component type with N specialization parameters, and specialize it by giving N compatible specialization arguments. The result of specialization is a new component type with zero specialization parameters. Under the right circumstances the specialzed component type will be layout compatible with the unspecialized one.
* One more example that isn't exposed in the public API today is that we can take a component with requirements and "complete" it by automatically composing it with component types that satisfy those requirements. This can be seen as a kind of linking step that pulls together the transitive closure of dependencies.
* We can query the layout for the shader parameters and entry points of a component type, for a specific target.
* We can query compiled kernel code for an entry point in a component type (for a specific target). This only works for component types with zero specialization parameters and zero requirements.
The idea is that by giving users a fairly general algebra of operations on component types, they can compose final programs in ways that meet their requirements. For example, it becomes possible to incrementally "grow" a component type to represent the global root signature for ray tracing shaders as new entry points are added, in such a way that it always stays layout-compatible with kernels that have already been compiled.
Much of the implementation work here is in implementing the unifying component type abstraction, and in particular re-writing code that used to assume a program consisted of a flat list of modules and entry points to work with a hierarchical representation that reflects the underlying algebra (e.g., with types to represent composite and specialized component types).
There's also a hidden "legacy" case of a component type to deal with some legacy compiler behaviors that can't be directly modeled on top of the simple algebra with modules and entry points.
This API is by no means feature-complete or fully developed. It is expected that we will flesh it out more when bringing up application code (e.g., Falcor) on top of the revamped API.
One notable thing that went away in this change is explicit support for "entry point groups" and notions of local root signatures (especially the Falcor-specific handling of the `shared` keyword, which a previous change turned into an explicitly supported feature). With the new "building blocks" approach, it should be possible for a DXR application to deal with local root signatures as a matter of policy (on top of the API we provide). If/when we need to provide some kind of emulation of local root signatures for Vulkan (and/or if Vulkan is extended with an explicit notion of local root signatures), we might need to revisit this choice.
* Fix debug build
There was invalid code inside an `assert()`, so the release build didn't catch it.
* fixup: warnings
* fixup: more warnings-as-errors
* fixup: review notes
* fixup: use component type visitors in place of dynamic casting
|