| Age | Commit message (Collapse) | Author |
|
The syntax for this is a placeholder for now, since we will probably want to migrate to whatever gets decided on for dxc. To declare that some data should be part of the "shader record" use `layout(shaderRecordNV)` to mirror the GLSL raytracing extension:
```hlsl
layout(shaderRecordNV)
cbuffer MyShaderRecord
{
float4 someColor;
uint someValue;
}
```
The intention (not enforced) is that an application would map `MyShaderRecord` to "root constants" in the "local root signature" when compiling for DXR, while the output code in GLSL will always map to the shader record in Vulkan raytracing:
```glsl
layout(shaderRecordNV)
buffer MyShaderRecord
{
float4 someColor;
uint someValue;
};
```
This change does *not* support declaring a global value of `struct` type with `layout(shaderRecordNV)` (or a `ParameterBlock` with the modifiers, although that would be a nice-to-have feature) and it does *not* support having the contents of the shader record be mutable (even if GLSL/Vulkan allows it). Those can/should be added in future changes.
In terms of implementation, this closely mirrors the way that `layout(push_constant)` buffers were being handled, where the data inside the `ConstantBuffer<X>` (the value of type `X`) gets laid out using ordinary rules (and consuming ordinary `UNIFORM` storage, while the buffer itself is given a different layout resource to reflect that fact that it does not consume a VK `binding` any more, but a different conceptual resource.
Note: an alternative design here (that might actually be preferrable) would be to have both push-constant and shader-record buffers be handled as alternative aliases for `ConstantBuffer` (or maybe `ParameterBlock`) so that you have, e.g.:
```hlsl
PushConstantBuffer<X> myPushConstants;
ShaderRecord<Y> myShaderRecord;
```
This alternative design avoids API-specific decorations on the declarations, and reflects the intent of the programmer very directly, even when they are compiling for a target like D3D that doesn't reflect these choices at the IL level (it could still be exposed through the Slang reflection API).
|
|
The change here is that the logical that used to be controlled by `-parameter-blocks-use-register-spaces` is now turned on unconditionally, meaning that a `ParameterBlock<X>` will get its own register `space` by default when targetting D3D Shader Model 5.1 and later.
I had originally made this feature optional because I wasn't sure whether Shader Model 5.1 should default to using register spaces or not, because D3D 11 doesn't support spaces at the API level and MSDN documetnation made it sound like SM5.1 was available for D3D11 as well. Subsequent reading has led me to understand that MSDN is wrong on this front, and SM5.1 and later are D3D12-only, so it is always safe to use spaces.
The new logic is now that we automatically use spaces for parameter blocks any time it is possible (SM5.1+ and any Vulkan target), and otherwise fall back to not using spaces (SM5.0 and earlier).
I updated a reflection test case that was covering parameter blocks to confirm the output differs between SM5.0 and 5.1.
|
|
spSessionCheckCompileTargetSupport. (#728)
* Improved return codes from spSessionCheckCompileTargetSupport
|
|
* First pass support for early depth stencil.
* Add a simple test to check if output has attributes.
* Use cross compilation to test [earlydepthstencil] on glsl.
* If target is dxil, use dxc to test against.
Add hlsl to test earlydepthstencil against.
* * Added spSessionHasCompileTargetSupport
* Made slang-test use spSessionHasCompileTargetSupport to ignore tests that cannot run
|
|
* Add support for unbounded arrays as shader parameters
With this change, Slang shaders can use unbounded-size arrays as parameters, e.g.:
```hlsl
Texture2D t[] : register(t3, space2);
SamplerState s[];
```
As shown in the above example, Slang supports both explicit `register` declarations on unbounded-size arrays and also implicit binding.
When doing automatic parmaeter binding, Slang will allocate a full register space to an unbounded-size array of textures/smaplers, starting at register zero.
Note that for the Vulkan target, an array of descriptors of any size (including unbounded size) consumes only a single `bindign`, so much of this logic is specific to D3D targets.
Details on the changes made:
* The single biggest change is a new `LayoutSize` type that is used to store a value that can either be a finite unsigned integer or a dedicated "infinite" value (which is stored as the all-bits-set `-1` value). This is used in places where a size could either be a finite value or an "unbounded" value, to both try to make standard math robust against the infinite case, and also to force code to deal with both the finite and infinite cases more explicitly when they care about the difference.
* The public API was documented so that unbounded-size arrays report their size as `-1`. We should probably change this function to return a signed value instead of `size_t`, but that would technically be a source-breaking change, so we want to make sure we stage it appropriately.
* The code that invokes fxc was updated so that it passes the appropriate flag to enable unbounded arrays of descriptors. I haven't looked yet at whether dxc needs such a flag, so there may need to be a follow-on change to add that.
* The logic in the `UsedRanges::Add` method for tracking what registers have been claimed was rewritten because the previous version had some subtle bugs. The new version includes more detailed comments that attempt to explain why I think the new logic works.
* The top-level logic for auto-assigning bindings to parameters has been overhauled to deal with the fact that a parameter that needs "infinite" amounts of a resource should be claiming a full register space for those resources instead. Whenever a parameter allocates any register spaces we want them all to be contiguous, so we have a loop that counts the requirements and allocates the spaces before we go along and dole them out.
* When computing the layout for an array type, we need to carefully deal with unbounded-size arrays. In the case of an unbounded array of a "simple" resource type (e.g., `Texture2D[]`), we opt to expose the type layout as consuming an infinite number of the appropriate register, while in the case of a complex type (say, a `struct` with two texture fields), we need to instead allocate whole spaces for those fields. The logic here is more subtle than I would like, and interacts with the existing code that "adjusts" the element type of an array in order to make standard indexing math Just Work.
* Similarly, when a `struct` type has unbounded-array fields, then we need to transform any field with infinite register requirements to instead consume a space in the resulting aggregate type. This case is comparatively easier than the array case.
* The test case for unbounded arrays covers both explicit and implicit bindings, and also the case of an unbounded array over a `struct` type (it does not cover the case of a `struct` contianing unbounded arrays, so that will need to be added later). For this test we are both validation the output reflection data and that we produce the same code as fxc (with explicit bindings in the fxc case).
* The reflection test app was modified to use the new API contract and detect when a parameter consumes `SLANG_UNBOUNDED_SIZE` resources.
* Fixup: ensure unbounded size is defined at right bit width
|
|
* Add callable shader support for Vulkan ray tracing
This change extends the previous work to update Vulkan ray tracing support for the finished `GL_NV_ray_tracing` spec.
One of the features missing in the experimental extension that was added to the final spec is "callable shaders," which allow ray tracing shaders to call other shaders as general-purpose subroutines.
Most of the implementation work here mirrors what was done for the `TraceRay()` function to map it to `traceNV()`.
We map the generic `CallShader<P>` function to the non-generic `executeCallableNV`, with a payload identifier that indicates a specific global variable of type `P` (the global variable being generated from a `static` local in `CallShader`). A new modifier is added to identify the payload structure, and the parameter binding/layout logic introduces a new resource kind for callable-shader payload data (where previously the logic had assumed ray and callable payloads should use the same resource kind).
Two test shaders are included: one for the callable shader (`callable.slang`) and one for a ray generation shader that calls it (`callable-caller.slang`). Just for kicks, the payload data type is defined in a shared file so that we can be sure the two agree (trying to emulate what might be good practice, and ensure that ray tracing support works together with other Slang mechanisms).
* Typo fix: assocaited->associated
One instance was found in review, but I went ahead and fixed a bunch since I seem to make this typo a lot.
* Typo fix: defintiion->definition
|
|
* * Added ISlangSharedLibraryLoader and ISlangSharedLibrary
* Implemented default implementations
* Added slang API function to get/set the ISlangSharedLibraryLoader on the session
* Put function caching onto the Session - so that if the loader is chaged, its easy to reset the shared libraries, and functions
* Run premake.
* Fix problem with setting null, would cause an unnecessary function/shared lib flush.
* * Unload SharedLibrary when DefaultSharedLibrary is deleted.
* Make SharedLibrary handle unload safely if already unloaded.
* Refactor SharedLibrary, such that it becomes a utility class - simplifying it's semantics.
* Simplified ISlangSharedLibrary such that doesn't have unload and isLoaded so easier to implement.
Use updated SharedLibrary impl.
* Disable aarch64 on windows
* Premake windows files without aarch64 build.
* Moved slang-shared-library to core (so can be used in code outside of main slang)
Fixed problem in premake5 where on windows projects were incorrectly constructed
* Allowed RefObject to base class of com types
Added ConfigurableSharedLibraryLoader
Added -dxc-path -fxc-path -glslang-path
Fix problem with dxc-path not honoring it's path when loading dxil
* Added documentation for command line control of dll loading paths.
* Remove some tabbing issues.
* Change name of include guard.
|
|
This change adds an API function and command line options for controlling the default floating-point behavior for a target, with options for "fast" and "precise" computation.
The "precise" option gets mapped to the "IEEE strictness" mode in `fxc` and `dxc` (there is currently no equivalent option for glslang that I could find).
|
|
* Rework command-line options handling for entry points and targets
Overview:
* The biggest functionality change is that the implicit ordering constraints when multiple `-entry` options are reversed: any `-stage` option affects the `-entry` to its *left* instead of to its *right* as it used to. This is technically a breaking change, but I expect most users aren't using this feature.
* The options parsing tries to handle profile versions and stages as distinct data (rather than using the combined `Profile` type all over), and treats a `-profile` option that specifies both a profile version and a stage (e.g., `-profile ps_5_0`) as if it were sugar for both a `-profile` and a `-stage` (e.g., `-profile sm_5_0 -stage fragment`).
* We now technically handle multiple `-target` options in one invocation of `-slangc`, but do not advertise that fact in the documentation because it might be confusing for users. Similar to the relationship between `-stage` and `-entry`, any `-profile` option affects the most recent `-target` option unless there is only one `-target`.
* The logic for associating `-o` options with corresponding entry points and targets has been beefed up. The rule is that a `-o` option for a compiled kernel binds to the entry point to its left, unless there is only one entry point (just like for `-stage`). The associated target for a `-o` option is found via a search, however, because otherwise it would be impossible to specify `-o` options for both SPIR-V and DXIL in one pass.
* The handling of output paths for entry points in the internal compiler structures was changed, because previously it could only handle one output path per entry point (even when there are multiple targets). The new logic builds up a per-target mapping from an entry point to its desired output path (if any).
Details:
* Support for formatting profile versions, stages, and compile targets (formats) was added to diagnostic printing, so that we can make better error messages. This is fairly ad hoc, and it would be nice to have all of the string<->enum stuff be more data-driven throughout the codebase.
* Test cases were added for (almost) all of the error conditions in the current options validation. The main one that is missing is around specifying an `-entry` option before any source file when compiling multiple files. This is because the test runner is putting the source file name first on the command line automatically, so we can't reproduce that case.
* Several reflection-related tests now reflect entry points where they didn't before, because the logic for detecting when to infer a default `main` entry point have been made more loose
* On the dxc path, beefed up the handling of mapping from Slang `Profile`s to the coresponding string to use when invoking dxc.
* A bunch of tests cases were in violation of the newly imposed rules, so those needed to be cleaned up.
* There were also a bunch of test cases that had accidentally gotten "disabled" at some point because there were comparing output from `slangc` both with and without a `-pass-through` option, but that meant that any errors in command-line parsing produced the *same* error output in both the Slang and pass-through cases. This change updates `slang-test` to always expect a successful run for these tests, and then manually updates or disables the various test cases that are affected.
* When merging the updated test for matrix layout mode, I found that the new command-line logic was failing to propagate a matrix layout mode passed to `render-test` into the compiler. This was because the `-matrix-layout*` options were implemented as per-target, but the target was being set by API while the option came in via command line (passed through the API). It seems like we want matrix layout mode to be a global option anyway (rather than per-target), so I made that change here.
* Add missing expected output files
* A 64-bit fix
* Remove commented-out code noted in review
|
|
* First pass at caching file system.
* default-file-system -> slang-file-system
fix problem with location("build.linux") confusing windows build for now.
* Added CompressedResult
Fix problem in Result construction with it being unsigned
* Add support for Path simplification.
* Testing for Path::Simplify.
* Refactored CacheFileSystem - automatically handles ISlangFileSystem or ISlangFileSystemExt appropriately.
Removed WrapFileSystem - because wasn't possible to emulate some of the behavior if just loadFile is implemented.
Split out StringBlob - so that no need to convert between ISlangBlob and String repeatidly.
* Remove unwanted code in ~CompileRequest
|
|
* Added getPathType to ISlangFileSystemExt.
This is needed so that when searching for a file it's existance can be tested without loading the file. On some platforms a getCanonicalPath can do this - but depending on how getCanonicalPath is implemented, it may not do. This test is made after the relative path is produced before finding the canonical path.
* Test for importing along search path.
* Added comment to explain the issue around WrapFileSystem impl of getPathType.
* Make search path use / not \
|
|
* Refactor of path handling.
* Added PathInfo
* Changed ISlangFileSystem - such that has separate concepts of reading a file, getting a relative path and getting a canonical path
* Added support for getting a canonical path for windows/linux
* Made maps/testing around canonicalPaths
* User output remains around 'foundPath' - which is the same as before
* Small improvements around PathInfo
* Added a type and make constructors to make clear the different 'path' uses
* Fixed bug in findViewRecursively
* Checking and reporting for ignored #pragma once.
* Removed SLANG_PATH_TYPE_NONE as doesn't serve any useful purpose.
* Improve comments in slang.h aroung ISlangFileSystem
* Remove the need for <windows.h> in slang-io.cpp
* Ran premake5.
* Improvements and fixes around PathInfo.
* Fix typo on linix GetCanonical
* Make the ISlangFileSystem the same as before, and ISlangFileSystem contain the new methods.
Internally it always uses the ISlangFileSystemExt, and will wrap a ISlangFileSystem with WrapFileSystem, if it is determined (via queryInterface) that it doesn't implement the full interface.
|
|
* Move to newer glslang
* Support cross-compilation of ray tracing shaders to Vulkan
This change allows HLSL shaders authored for DirectX Raytracing (DXR) to be cross-compiled to run with the experimental `GL_NVX_raytracing` extension (aka "VKRay").
* The GLSL extension spec is marked as experimental, so that any shaders written using this support should be ready for breaking changes when the spec is finalized.
* "Callable shaders" are not exposed throug the GLSL extension, so this feature of DXR will not be cross-compiled.
* The experimental Vulkan raytracing extension does not have an equivalent to DXR's "local root signature" concept. This does not visibly impact shader translation (because the local/global root signature mapping is handled outside of the HLSL code), but in practice it means that applications which rely on local root signatures on their DXR path will not be able to use the translation in this change as-is; more work will be needed.
The simplest part of the implementation was to go into the Slang standard library and start adding GLSL translations for the various DXR operations.
In some cases, like mapping `IgnoreHit()` to `ignoreIntersectionNVX()` this is almost trivial.
The various functions to query system-provided values (e.g., `RayTMin()`) were also easy, with the only gotcha being that they map to variables rather than function calls in GLSL, and our handling of `__target_intrinsic` assumes that a bare identifier represents a replacement function name, and not a full expression, so we have to wrap these definitions in parentheses.
The tricky operations are then `TraceRay<P>()` and `ReportHit<A>()`, because these two are generics/templates in HLSL.
GLSL doesn't support generics, even for "standard library" functions, so the raytracing extension implements a slightly complex workaround: the matching operations `traceNVX()` and `reportIntersectionNVX()` pass the payload/attributes argument data via a global variable.
That is, shader code for the GLSL extensions writes to the global variable and then calls the intrinsic function.
The linkage between the call site and the global is established by a modifier keyword (`rayPayloadNVX` and `hitAttributeNVX`, respectively) and in the case of ray payload also uses `location` number to identify which payload global to use (since a single shader can trace rays with multiple payload types).
Our translation strategy in Slang tries to leverage standard language mechanisms instead of special-case logic.
For example, to translate the `ReportHit<A>()` function, we provide both a default declaration that will work for HLSL (where the operation is built-in with the signature given), and a *definition* marked with the `__specialized_for_target(glsl)` modifier.
The GLSL definition declares a function `static` variable that will fill the role of the required global, and then does what the GLSL spec requires: assigns to the global, and then calls the `reportIntersectionNVX` builtin (which we declare as a separate builtin).
Our ordinary lowering process will turn that `static` variable into an ordinary global in the IR, and the `[__vulkanHitAttributes]` attribute on the variable will be emitted as `hitAttributeNVX` in the output.
There is no additional cross-compilation logic in Slang specific to `ReportHit<A>()` - the target-specific definition in the standard library Just Works.
The case for `TraceRay<P>()` is a bit more complicated, simply because the GLSL `traceNVX()` function needs to be passed the `location` for the payload global.
We implement the payload global as a function-`static` variable, with the knowledge that every unique specialization of `TraceRay<P>()` will generate a unique global variable of type `P` to implement our function-`static` variable.
We then add a slightly magical builtin function `__rayPayloadLocation()` that can map such a variable to its generated `location`; the logic for this is implemented in `emit.cpp` and described below.
We also changed the `RayDesc` and `BuiltinTriangleIntersectionAttributes` types from "magic" intrinsic types over to ordinary types (because the GLSL output needs to declare them as ordinary `struct` types).
This ends up removing some cases in the AST and IR type representations.
By itself this change would break HLSL emit, because in that case the types really are intrinsic.
We added a `__target_intrinsic` modifier to these types to make them intrinsic for HLSL, and then updated the downstream passes to handle the notion of target-intrinsic types.
The logic for binding/layout of entry point inputs and outputs was updated so that raytracing stages don't follow the default logic for varying input/output parameters.
This is because the input/output parameters of a raytracing entry point aren't really "varying" in the same sense as those in the rasterization pipeline.
In particular, the SPIR-V model for raytracing input and output treats "ray payload" and "hit attributes" parameters as being in a distinct storage class from `in` or `out` parameters.
We also detect cases where a ray tracing stage declares inputs/outputs that it shouldn't have. This logic could conceivably be extended to other stages (e.g., to give an error on a compute shader with user-defined varying input/output).
The type layout logic added cases for handling raytracing payload and hit-attribute data, but this is currently just a stub implementation that follows the same logic as for varying `in` and `out` parameters (it cannot give meaningful byte sizes/offsets right now).
To my knowledge the GLSL spec doesn't currently specify anything about layout, and I haven't read the DXR spec language carefully enough to know what it says about layout.
A future change should update the layout logic to allow for byte-based layout of ray payloads, etc. so that we can query this information via reflection.
The GLSL legalization logic in `ir.cpp` was updated to factor out the per-entry-point-parameter code into its own function, and then that function was updated to special-case the input/output of a ray-tracing shader.
While for rasterization stages we typically want to take the user-declared input/output and "scalarize" it for use in GLSL (in part to deal with language limitations, and in part to tease system values apart from user-defined input/output), the GLSL spec for raytracing requires payload and hit attribute parameters to be declared as single variables. There is also the issue that even for an `in out` parameter, a ray payload parameter should only turn into a single global, whereas the handling for varying `in out` parameters generates both an `in` and an `out` global for the GLSL case.
Other than the handling of entry point parameters, the GLSL legalization pass doesn't need to do anything special for ray tracing shaders.
The trickiest change in the `emit.cpp` logic is that we now generate `location`s for ray payload arguments (the outgoing from a `TraceRay()` call) on demand during code generation.
This is a bit hacky, and it would be nice to handle it as a separate pass on the IR rather than clutter up the emit logic, but this approach was expedient.
Basically, any of the global variables that got generated from the `static` declarations in the standard library implementation of `TraceRay()` will trigger the logic to assign them a `location`.
The logic for emitting intrinsic operations added a few new `$`-based escape sequences. The `$XP` case handles emitting the location of a generated ray payload variable; this is how we emit the matching location at the site where we call `traceNVX`. The `$XT` case emits the appropriate translation for `RayTCurrent()` in HLSL, because it maps to something different depending on the target stage.
All of the test cases here consist of a pair of an HLSL/Slang shader written to the DXR spec, plus a matching GLSL shader for a baseline.
The GLSL shaders are carefully designed so that when fed into glslang they will produce the same SPIR-V as our cross-compilation process.
This kind of testing is quite fragile, but it seems to be the best we can do until our testing framework code supports *both* DXR and VKRay.
A bunch of the core changes ended up being blocked on issues in the rest of the compiler, so some additional features go implemented or fixed along the way:
The first big wall this work ran into was that the `__specialized_for_target` modifier hasn't actually been working correctly for a while.
It turns out that for the one function that is using it, `saturate()`, we have been outputting the workaround GLSL function in *all* cases (including for HLSL output) rather than only on GLSL targets.
The problem here is that for a generic function with a `__specialized_for_target` modifier or a `__target_intrinsic` modifier, the IR-level decoration will end up attached to the `IRFunc` instruction nested in the `IRGeneric`, but the logic for comparing IR declarations to see which is more specialized (via `getTargetSpecializationLevel()`) was looking only at decorations on the top-level value (the generic).
The quick (hacky) fix here is to make `getTargetSpecializationLevel()` try to look at the return value of a generic rather than the generic itself, so that it can see the decorations that indicate target-specific functions.
A more refined fix would be to attach target-specificity decorations to the outer-most generic (to simplify the "linking" logic).
The only reason not to fold that into the current fix is that the `__target_intrinsic` modifier currently serves double-duty as a marker of target specialization *and* information to drive emit logic. The latter (the emit-related stuff) currently needs to live on the `IRFunc`, and moving it to the generic could easily break a lot of code.
This needs more work in a follow-on fix, but for now target specialization should again be working.
The other big gotcha that the simple "just use the standard library" strategy ran into was that function-`static` variables weren't actually implemented yet, and in particular function-`static` variables inside of generic functions required some careful coding.
The logic in `lower-to-ir.cpp` has this `emitOuterGenerics()` function that is supposed to take a declaration that might be nested inside of zero or more levels of AST generics, and emit corresponding IR generics for all those levels.
This is needed because two different AST functions nested inside a single generic `struct` declaration should turn into distinct `IRFunc`s nested in distinct `IRGeneric`s.
The tricky bit to making that all work is that the same AST-level generic type parameter will then map to *different* IR-level instructions (the parameters of distinct `IRGeneric`s) when lowering each function.
The existing logic handled this in an idiomatic way by making "sub-builders" and "sub-contexts."
This change refactors some of the repeated logic into a `NestedContext` type to help simplify the pattern, and applies it consistently throughout the `lower-to-ir.cpp` file.
Besides that cleanup, the major change is `lowerFunctionStaticVarDecl` which, unsurprisingly, handles lower of function-`static` variables to IR globals.
The careful handling of nested contexts here is needed because if we are in the middle of lowering a generic function, then a `static` variable should turn into its *own* `IRGeneric` wrapping an `IRGlobalVar`. The body of the function should refer to the global variable by specializing the global variable's `IRGeneric` to the parameters of the *functions* `IRGeneric`. This tricky detail is handled by `defaultSpecializeOuterGenerics`.
An additional subtlety not actually required for this raytracing work (and thus not properly tested right now) is handling function-`static` variables with initializers.
These can't just be lowered to globals with initializers, because HLSL follows the C rule that function-`static` variables are initialized when the declaration statement is first executed (and this could be visible in the presence of side-effects).
The lowering strategy here translates any `static` variable with an initializer into *two* globals: one for the actual storage, plus a second `bool` variable to track whether it has been initialized yet.
There are some opportunities to optimize this case, especially for `static const` data, but that will need to wait for future changes.
We've slowly been shifting away from the model where a user thinks of a "profile" as including both a stage and a feature level.
Instead, the user should think about selecting a profile that only describes a feature level (e.g., `sm_6_1`, `glsl_450`, etc.), and then separately specifying a stage (`vertex`, `raygeneration, etc.) for each entry point.
The challenge here is that the command-line processing still only had a single `-profile` switch, and no way to specify the stage.
Adding the `-stage` option was relatively easy, but making it work with the existing validation logic for command-line arguments was tricky, because of the complex model that `slangc` supports for compiling multiple entry points in a single pass.
* In `slang.h` add new reflection parameter categories for ray payloads and hit attributes, as part of entry point input/output signatures.
* A previous change already updated our copy of glslang to one that supports the `GL_NVX_raytracing` extension, so in `slang-glslang.cpp` we just needed to map Slang's `enum` values for the raytracing stage names to their equivalents in the glslang code.
* Moved the logic for looking up a stage by name (`findStageByName()`) out of `check.cpp` and into `compiler.cpp`, with a declaration in `profile.h`
* Added a `$z` suffix to the GLSL translation of `Texture*.SampleLevel()`, to handle cases where the texture element type is not a 4-component vector. Note that this fix should actually be applied to *all* these texture-sampling operations, but I didn't want to add a bunch of changes that are (clearly) not being tested right now.
* The layout logic for entry points was updated to correctly skip producing a `TypeLayout` for an entry point result of type `void`, which meant that the related emit logic now needs to guard against a null value for the result layout.
* In `ir.cpp`, dump decorations on every instruction instead of just selected ones, so that our IR dump output is more complete.
* Added a command-line `-line-directive-mode` option so that we can easily turn off `#line` directives in the output when debugging. Not all cases where plumbed through because the `none` case is realistically the most important.
* Parser was fixed to properly initialize parent links for "scope" declarations used for statements, so that we can walk backwards from a function-scope variable (including a `static`) and see the outer function/generics/etc.
* Added GLSL 460 profile, since it is required for ray tracing. Also updated the logic for computing the "effective" profile to use to recognize that GLSL raytracing stages require GLSL 460.
* Added some conventional ray-tracing shader suffixes to the handling in `slang-test`. This code isn't actually used, but was relevant when I started by copy-pasting some existing VKRay shaders as the starting point for my testing.
* Fixup: typos
|
|
* * Change the layout of IROp such that 'main' IROps are 0-x.
* Removed MANUAL_RANGE instuction types, as no longer needed.
* Work in prog on optimizing.
* * Constant time lookup for IROpInfo
* Refactor and document a little more the IROp layout
* Mark ops that use 'other' bits
* Fix typo in definition of kIROpFlag_UseOther
* First pass at working out serialization structure.
* Work in progress on ir-serialize
* Storing strings in IRSerialInfo
Split out IRSerialInfo from the IRSerializer - to make more explicit what is actually saved.
* First pass at serializing out data.
* First pass at serialize reading.
* Fix riff fourcc mark order.
* First pass at reconstructing IRInst / IRDecoration from serialized data.
* Handling of TextureBaseType
* Deserializing of constants.
* Small changes around ir serialization.
* Changed StringIndex indexing to not be an offset into the m_strings array, but an index into strings in order. Doing so makes cache lookup much faster, and makes the 'indicies' themselves smaller and therefore more compressible.
* Removed the need for m_arena in IRSerialWriter. Previously it's purpose was to store the string contents that were being used to lookup UnownedStringSlice.
Now we keep the StringRepresentation in scope and reference that, and so don't need the copy.
* Don't need to construct the IRModuleInst as is created and set on createModule call.
* Remove test code for testing serialization.
* Fix problem with release build in ir-serialize causing warning.
* Use SLANG_OFFSET_OF for offsets in non pod classes to avoid gcc/clang warning.
Give storage to integral static variables to avoid linkage problems with gcc/clang.
* Fix warnings under x86 win32 debug.
|
|
The main change here is to fill out the `BaseType` enumeration so that it covers the full range of 8/16/32/64-bit signed and unsigned integers, as well as 16/32/64-bit floating-point numbers, and then propagate that completion through various places in the code.
More details:
* The current `half`, `float`, `double`, `int`, and `uint` types are still the default names for their types, so things like `float16_t` and `int32_t` were added as `typedef`s.
* We still need to generate the full gamut of vector/matrix `typedef`s for the new types, so that things like `float16_t4x3` will work (yes, I know that is ugly as sin, but that's the HLSL syntax...).
* A few pieces of dead code from earlier in the compiler's life got removed, since I did a find-in-files for `BaseType::` and tried to either update or delete every site.
* A few call sites that were enumerating integer base types in an ad-hoc fashion were changed to use a single `isIntegerBaseType()` function that I added in `check.cpp`
* When compiling with dxc for shader model 6.2 and up, we enable the compiler's support for native 16-bit types via a flag.
* The public API enumeration for reflection of scalar types added cases for 8- and 16-bit integers (it already exposed the other cases we need)
* The lexer was updated to be extremely liberal in what kinds of suffixes it allows on literals. I also removed the logic that was treating, e.g., `0f` as a floating-point literal (it doesn't seem to be the right behavior). That would now be an integer literal with an invalid suffix.
* The logic in the parser that applies types to literals was updated to handle a few more cases: `LL` and `ULL` for 64-bit integers, and `H` for 16-bit floats.
* The mangling logic needed to be updated to handle the new cases, and I consolidated the handling of those types in their front-end and IR forms.
* Removed the explicit `BasicExpressionType::ToString` logic, since all basic types are `DeclRefType`s in the front end, and we can just print them out as such.
* As a bit of a gross hack, fudged the conversion costs so that `int` to `int64_t` conversion is a bit more costly. The problem there is that given an operation like `int(0) + uint(0)`, the best applicable candidates ended up being `+(uint,uint)` and `+(int64_t,int64_t)` because the cost of a single `int`-to-`uint` conversion was the same as the sum of the cost of an `int`-to-`int64_t` and a `uint`-to-`int64_t`. A better long-term fix here is to completely change our overload resolution strategy, but that is obviously way too big to squeeze into this change.
* Type layout computation was updated to handle all the new types and give them their natural size/alignment. Note that this does *not* work for down-level HLSL where `half` is treated as a synonym for `float`. It also doesn't deal with the fact that many of these types aren't actually allowed in constant buffers for certain shader models. A future change should work to add error messages for unsupported stuff during type layout (or just make the types themselves require support for certain capabilities)
|
|
* First pass at MemoryArena.
* First pass at RandomGenerator.
* Extract TestContext into external source file.
* Fix warning on printf.
* Use enum classes for Test enums.
OutputMode -> TestOutputMode.
* First pass at FreeList unit test.
* Auto registering tests.
Improvements to RandomGenerator.
* Remove the need for unitTest headers - cos can use registering.
* Added unitTest for MemoryArena.
* Do unit tests.
* Fix typo.
|
|
The original goal here was to bring up a second example program: `model-viewer`.
While the existing `hello-world` example is enough to get somebody up to speed with the basics of the Slang API (as a drop-in replacement for `D3DCompile` or similar), it doesn't really show any of the big-picture stuff that Slang is meant to enable.
There wasn't any use of D3D12/Vulkan descriptor tables/sets, and there wasn't any use of interfaces, generics, or `ParameterBlock`s in the shader code.
The `model-viewer` example addresses these issues. Its shader code involves generics, interfaces, and multiple `ParameterBlock`s, and the host-side code demonstrates a few key things for working with Slang:
* There is an application-level abstraction for parameter blocks, that combines the graphics-API descriptor set object with Slang type information
* There is a shader cache layer used to look up an appropriate variant of a rendering effect by using parameter block types to "plug in" global type variables
* There is a clear separation between the phases of compilation: a first phase that does semantic checking and enables reflection-based allocation of graphics API objects, followed by one or more code generation passes for specialized kernels.
This example is certainly not perfect, and it will need to be revamped more going forward. In particular:
* The output picture is ugly as sin. We need a plan for how to get this to load better content, perhaps even popping up an error message to note that the required input data isn't present in the basic repository.
* The shader code is too simplistic. There isn't any real material variety, and the `IMaterial` abstraction is completely wrong.
* The use of parameter blocks is facile because there are no resource parameters right now. Fixing that will likely expose issues around interfacing with Slang's reflection API.
* The whole example exposes the issue that Slang's current APIs aren't really designed for the benefit of two-phase compilation (since our many client application has been stuck on one-phase compilation).
* Global type parameters are actually a Bad Idea that we only did for compatibility with existing codebases. We should not be showing them off in an example of the Right Way to use Slang, but the language support for type parameters on entry points is still not complete.
Of course, the majority of the changes here are *not* inside the example applications, and instead involve a major overhaul of the `Renderer` abstraction that is used for both tests and examples. The main thrust of the change is to make the abstraction layer be closer to the D3D12/Vulkan model than to a D3D11-style model. This is important for the `model-viewer` example, since it aspires to show how Slang can be incorporated into a renderer that targets a modern API. The most important bit is actually the use of descriptor sets and "pipeline layouts" a la Vulkan, since without these Slang's `ParameterBlock` abstraction won't make a lot of sense.
Implementation of the abstraction for the various APIs has very much been on an as-needed basis. The current implementation is just enough for the two examples to work, plus enough to get all the tests to pass in both debug and release builds on Windows.
A big missing feature in the API abstraction right now is memory lifetime management. The code had been trending toward something D3D11-like where a constant buffer could be mapped per-frame with the implementation doing behind-the-scenes allocation for targets like D3D12/Vulkan. I'd like to shift more toward a model of just exposing "transient" allocations that are only valid for one frame, because these are more representation of how an efficient renderer for next-generation APIs will work. That transition isn't actually complete, though, so there are problems with the existing examples where `hello-world` is actually scribbling into memory that the GPU might still be using, while `model-viewer` is doing full-on heavy-weight allocations on a per-frame basis with no real concern for the performance implications.
All together, there are a lot of things here that need more work, but this branch has been way too long-lived already, and so I'd like to get this checked in as long as all the tests pass.
|
|
* * Make spCompile return SlangResult
* Make spProcessCommandLineArguments return SlangResult (and not internally exit)
* Remove calls to exit()
* Fix typos
* Make all output from spProcessCommandLineArguments get sent to diagnostic sink.
|
|
* Added Result definitions to the slang.h
* Removed slang-result.h and added slang-com-helper.h
* Move slang-com-ptr.h to be publically available.
* Add SLANG_IUNKNOWN macros to simplify implementing interfaces.
Use the SLANG_IUNKNOWN macros to in slang.c
* Removed slang-defines.h added outstanding defines to slang.h
|
|
* Add support for "blobs" and a file-system callback
The most obvious change here is that the Slang header now includes a few COM-style interfaces that can be used for communication between the application and compiler. In order to support the declaration of COM-like interfaces, several platform-detection macros were lifted out of `slang-defines.h` and into the public `slang.h` header. As it exists right now, this change makes the Slang API C++-only, but a C-compatible version can be defined later with the help of lots of macros (and/or something like an IDL compiler).
The two big interfaces introduced are:
* The `ISlangBlob` interface, which is compatible with `ID3DBlob`, `IDxcBlob`, etc. This is used to pass ownership of source/compiled code across the API boundary without copies. New versions of various entry points have been added to allow passing blobs: e.g., `spAddTranslationUnitSourceBlob` and `spGetEntryPointCodeBlob`.
* The `ISlangFileSystem` interface, which is used to allow applications to intercept any attempt by the Slang compiler to load a file (input source files, include files, etc.). This is *not* the same as the `IDxcIncludeHandler` interface, because it assumes UTF-8 encoded path names, instead of the 16-bit encoding that dxc/Windows prefer. It is also not very similar to `ID3DInclude` as used by fxc, because this callback interface is *not* responsible for handling the search through include paths, etc. - it is just a file-system abstraction layer.
Internally, a few different parts of the compiler were changed to either store data in blob form all the time, or to be able to synthesize a blob on-demand. Because our internal `String` type is a reference-counted copy-on-write type, using a `SlangStringBlob` to hold string data should achieve transfer of ownership back to the application without extraneous copies. There is plenty of room to clean up the architecture of some of these internal pieces if they *know* that their data will end up in a blob.
The existing Slang testing doesn't touch any of the APIs introduced here, so they can only confirm that existing functionality hasn't been broken. The new ability to return code blobs has been tested by integration of that feature into Falcor, but there has been zero testing of the ability to pass *in* source code as blobs, and the ability to hook file loading. Future changes will need to add test coverage for the new features.
* fixup: define SLANG_NO_THROW for non-Windows builds
* fixup: header copy-paste error caught by clang/gcc
* Cleanup: return reference-counted objects via output parameters
Returning a reference-counted object through the API as a raw pointer creates challenges.
The "obvious" answer is that the returned pointer should have an added reference (it is returned at "+1"), and the caller is responsible for releasing that reference. This makes sense when using raw pointers on the calling side:
```c++
IFoo* foo = spGetFoo(...);
...
foo->Release();
```
However, as soon as smart pointers start getting involved (to handle releasing reference counts when we are done with things), the picture gets more complicated:
```c++
MySmartPtr<IFoo> foo = spGetFoo(...);
...
```
The intention of code like that is that `foo` gets released when the smart pointer goes out of scope, but this probably doesn't happen with most smart pointer implementations. If the `MySmartPtr` constructor that takes a raw pointer retains it, then the destructor will only release *that* reference, and so the object will leak.
It is possible that the user will have a smart pointer type where the constructor that takes a raw pointer doesn't retain it, but in general such types introduce the potential for errors of their own, and no matter what the Slang API shouldn't go in assuming any particular policy.
This change makes it so that any reference-counted objects that are logically returned from a call are returned through output pointers. This design makes the leak-free cases easy (enough) to implement with raw pointers or smart pointers:
```c++
// raw pointer
IFoo* foo = nullptr;
spGetFoo(..., &foo);
...
foo->Release();
// smart pointer
MySmartPtr<IFoo> foo;
spGetFoo(..., foo.writeableRef());
...
```
The only assumption here is that any COM smart-pointer type needs to provide an operation like `writableRef` that is suitable for using that pointer as an output parameter. Given that COM *loves* output parameters, this seems like a safe assumption (at the very least, anybody who interacts with COM would be used to this convention).
Future changes might introduce inline convenience methods for various operations that return results more directly, possibly by introducing a minimal smart-pointer type in the `slang.h` header (without prescribing that clients must use it...).
* fixup: another error caught by gcc/clang
|
|
* Add options to control matrix layout rules
Up to this point, the Slang compiler has assumed that the default matrix layout conventions for the target API will be used.
This means column-major layout for D3D, and *row major* layout for GL/Vulkan (note that while GL/Vulkan describe the default as "column major" there is an implicit swap of "row" and "column" when mapping HLSL conventions to GLSL).
This commit introduces two main changes:
1. The default layout convention is switched to column-major on all targets, to ensure that D3D and GL/Vulkan can easily be driven by the same application logic. I would prefer to make the default be row-major (because this is the "obvious" convention for matrices), but I don't want to deviate from the defaults in existing HLSL compilers.
2. Command-line and API options are introduced for setting the matrix layout convention to use (by default) for each code generation target. It is still possible for explicit qualifiers like `row_major` to change the layout from within shader code.
I also added an API to query the matrix layout convention that was used for a type layout (which should be of the `SLANG_TYPE_KIND_MATRIX` kind), but this isn't yet exercised.
I added a reflection test case to make sure that the offsets/sizes we compute for matrix-type fields are appropriately modified by the flag that gets passed in.
In a future change we could possibly switch the default convention to row-major, if we also changed our testing to match, since there are currently not many clients to be adversely impacted by the change.
* Fixup: silence 64-bit build warning
|
|
* Generate Visual Studio projects using Premake
This change adds a `premake5.lua` file that allows us to generate our Visual Studio solution using Premake 5 (https://premake.github.io/).
The existing Visual Studio solution/projects are now replaced with the Premake-generated ones, and project contributors will be expected to update these by running premake after adding/removing files.
I have *not* changed the Linux `Makefile` build at all, because that file is also used for things like running our tests, so that clobbering it with a premake-generated `Makefile` would break our continuous testing.
Hopefully future changes can switch to a generated `Makefile` and perhaps even add an XCode project as well.
Notes:
* The `build/slang-build.props` file is no longer needed/used, so it has been removed.
* The `slang-eval-test` test fixture wasn't following our naming conventions for its directory path, so it was updated to streamline the Premake build configuration work. This required changes to the `Makefile` as well
* Some seemingly unncessary preprocessor definitions that were specified for `core` and `slang-glslang` have been dropped. We will see if anything breaks from that.
* Possible fixup for Premake vpath issue
Premake's `vpath` feature seems to be nondeterministic about the order it applies filters (because Lua isn't deterministic about the order of entries in a key/value table), and as a result we can end up in a weird case where it decides that a `foo.cpp.h` file matches the `**.cpp` filter (I'm not sure why) before it tests against the `**.h` filter.
This change uses an (undocumented) Premake facility to set `vpath` using a list of singleton tables, which seems to fix the order in which things get tested.
* Remove support for "single-file" build of Slang
The `hello` example was the only bit of code that uses the "single-file" way of building Slang, and this had already run up against limitations of the Visual Studio compilers in its Debug|x64 build.
Rather than mess with Premake to make it pass through the `/bigobj` linker flag that is needed to work around the issue, it makes more sense just to stop using/supporting the feature since we wouldn't want users to depend on it anyway (our documentation no longer refers to it).
While I was at it I went ahead and made sure that the `SLANG_DYNAMIC` flag doesn't need to be set manually, so that instead there is a non-default `SLANG_STATIC` option (not that we have a static-library build of Slang at the moment).
|
|
This code is currently not used by anything, but I wanted to check in a first pass at an implementation of dominator tree construction so that we don't have to keep avoiding implementing algorithms that rely on having dominator information available.
The algorithm used to construct the dominator tree is taken from "A Simple, Fast Dominance Algorithm" by Keith D. Cooper, Timothy J. Harvey, and Ken Kennedy.
This is not the "best" algorithm in terms of asymptotic performance, but it is among the simplest algorithms for computing a dominator tree that still outperforms naive iterative set-based methods.
The actual data structure and API for the dominator tree has a bit of "cleverness" in it to try to make the common queries reasonably fast (e.g., you can check whether A dominates B in constant time).
My hope is that even if we implement a more advanced algorithm for constructing the dominator tree, we can retain compatibility with passes that might make use of this API.
Because no code is currently using this logic, I have done only minimal testing by stepping through this code and validating the results on paper for some very small CFGs.
More serious testing/debugging may need to wait until we have an optimization pass that needs the dominator tree we compute here.
One open question I have is how best to introduce traditional unit testing into Slang, since this is an example of code that would benefit greatly from being unit tested.
|
|
* Fix bug when subscripting a type that must be split (#396)
The logic was creating a `PairPseudoExpr` as part of a subscript (`operator[]`) operation, but neglecting to fill in its `pairInfo` field, which led to a null-pointer crash further along.
* Allow writes to UAV textures (#416)
Work on #415
This issue is already fixed in the `v0.10.*` line, but I'm back-porting the fix to `v0.9.*`.
The issue here was that the stdlib declarations for texture types were only including the `get` accessor for subscript operations, even if the texture was write-able.
I've also included the fixes for other subscript accessors in the stdlib (notably that `OutputPatch<T>` is readable, but not writable, despite what the name seems to imply).
* Fix infinite loop in semantic parsing (#424)
The code for parsing semantics was looking for a fixed set of tokens to terminate a semantic list, rather than assuming that whenever you don't see a `:` ahead, you probably are done with semantics. This meant that you could get into an infinite loop just with simple mistakes like leaving out a `;`.
This change fixes the parser to note infinite loop in this case, and adds a test case to verify the fix.
* Expose HLSL `shared` modifier through reflection. (#436)
This is a request from Falcor, because the `shared` modifier can be used as a hint to optimize the grouping of parameters for binding. The intention is that `shared` marks shader parameters (including parameter blocks) that will us the same values across many draw calls (e.g., per-frame data, as opposed to per-model or per-instance).
The mechanism I'm using here is to provide a general reflection API for exposing the `Modifier`s already attached to declarations. While the only modifier exposed is `shared`, and the only modifier information being exposed is presence/absence, this interface could be extended down the line.
|
|
* Add support for DirectX Raytracing (DXR)
This is an initial pass to add support to Slang for the shader stages introduced by DirectX Raytracing (DXR).
* Add declarations for DXR intrinsic types and functions to the Slang standard library. The way our compilation works, these will then get propagated through the IR as intrinsics and get spit back out again as-is during HLSL code emission.
* Declare the DXR-related stages. This is the main work that affects the compiler's C++ implementation rather than being something we can add via the standard library today.
* Switch around the encoding of the `Profile` type so that the stage is in the low bits, allowing API users to pass an ordinary `SlangStage` to operations that expect a `SlangProfileID`.
- This represents a direction I'd like to push in long term, where the user specifies stage and "feature level" separately rather than using composite profiles like `vs_6_0`. The introduction of these new stages seems like a good point to try and make a clean break here and not introduce, e.g., `rgs_6_1` for ray generatin shaders.
* Upgrade "effective profile" computation so that it advances the required version based on the specified stage (e.g., DXR stages seem to require at least shader model 6.1).
- This is a bit of a kludge overall, but ideally we don't want a typical user to have to think about "feature level" stuff much at all. The ideal workflow is that they just hand us a source file and we work out entry points and their required feature levels in the compiler (and let the user query it when we are done). Until we implement that for real, stopgaps like this are required.
Overall these are relatively small changes for supporting some major new API behavior. Slang's design helps out here, by allowing a lot of things to be specified in the stdlib (including generic intrinsic functions), but some of this is also owed to the DXIL-influenced design of DXR - e.g., the use of global functions in place of `SV_*` semantics.
* fixup: typos
* Fixup: use `pixel` instead of `fragment` as primary stage name
This is to match HLSL conventions when generating output code, even if the Slang project officially favors the more correct term "fragment shader."
|
|
The main practical change here is that things that used to be `IRValue`s, like literals, are now being expressed as instructions in the global scope.
In order to validate that things are actually being handled correctly, this change introduces an explicit "validation" pass that can be run on the IR to check for different invariants (although it doesn't check many of the important ones right now). I've left the validation pass turned off by default, but with a command-line flag to enable it. We may want to make it be on by default in debug builds, just to keep us honest. The main invariant for the moment is that when on IR instruction is used as an operand to another, it had better come from the same IR module.
Some of the existing passes were violating this rule, in particular when it came to cloning of witness tables related to global generic parameter substitution. Those features can in theory be handled better now by allowing `specialize` instructions at other scopes, but I didn't want to over-complicate this change, so I make just enough fixes to ensure that these steps always clone witness tables they get from the "symbols" on an IR specialization context. In order for this to work when recursively specializing, I had to ensure that the logic for generic specialization had a notion of a "parent" specialization context that it would fall back to to perform cloning when necessary.
This change keeps the logic that was caching and re-using the instructions for literal values within a module, but adds some logic that isn't really being tested right now for picking the right parent instruction to insert a constant instruction into. This logic doesn't trigger right now because all of the cases we are using it on have zero operands (and so they always get "hoisted" to the global scope), but eventually for things like types we want to be able to support instructions with operands (e.g., `vector<float, 4>`) and handle the case where some of those operands come from different scopes (e.g., when nested inside a generic).
The final change here is mostly cosmetic: the `IRBuilder` is now more abstract about where insertion occurs: it tracks a single `IRParentInst` to insert into, and then an optional `IRInst` to insert before. In the common case, that parent is an `IRBlock`, but it could conceivably also be the global scope, or a witness table, etc. Use sites where we used to change those fields directly now use distinct methods `setInsertInto(parent)` and `setInsertBefore(inst)` which capture the two cases we care about. Accessors are also defined to extract the current block (if the current parent is a block), and the current "function" (global value with code, if the current parent is a global value with code, or a block inside one).
With this work in place, it should be possible for a follow-on change to start putting `specialize` instructions at the global scope and thus clean up some of the on-the-fly specialization work. This work should also help with some of the requirements around a distinct IR-level type system and more explicit generics.
|
|
Pull BaseType, TextureFlavor and SamplerStateFlavor enums and helper functions into a shared file "type-system-shared.h".
|
|
These changes are related to getting a first Slang geometry shader to translate to GLSL. There are some unrelated cross-compilation fixes in here as well.
* Add direct support to shader parameter layout for GS output streams, so that they are reflected as a container type
* Fix the declarations of the `SampleCmp` methods; they should always return `float`, independent of the nominal element type of the texture.
* Fix up our handling of `__target_intrinsic` modifiers, so that we are a little bit more careful in how we detect something as being just a simple name replacement (e.g., `__target_intrinsic(glsl, "foo")` should make us output `foo(original, args, here)`) vs. a custom expression (e.g., `__target_intrinsic(glsl, "bar+1")` should output `bar+1` and not use any arguments, even without any `$` substitutions).
* Don't emit the `[unroll]` modifier when outputting GLSL. Eventually we need to fully unroll loops for GLSL output anyway.
* Inspect th entry point parameter list (from the layout information) when emitting a GS, so that we can write out the correct `layout` modifiers for input primitive type and output primitive topology.
* Add a new case to `ScalarizedVal` to handle cases where an HLSL system value needs to map to a GLSL built-in variable with a slightly different type (e.g., `SV_RenderTargetArrayIndex` is a `uint` while `gl_Layer` is an `int`). For now this is only hanlding trivial cases (where a direct cast can achieve the result we want), but eventually it might need to handle things like conversion between arrays and vectors.
* This is mostly just the infrastructure for the feature, and the actual enumeration of the correct types for all the system values is still to be done.
* Handle a few more cases in assignment between `ScalarizedVal`. In particular, deal with cases where `materializeValue` is called on a tuple that has an array type, so that we need to construct the individual array elements.
* Add translation for GS output stream `Append()` and `RestartStrip()`
* Note that the translation of `Append()` seems to ignore its argument; this is because we desugar the operation during legalization for GLSL (see next item)
* When legalizing for GLSL, detect an entry point parameter that is a GS stream, and translate it into `out` variables for its element type, and then rewrite any calls to `Append()` in the body of the entry point to be preceded by assignment to those variables. This works in tandem with the above translation of HLSL `Append()` calls into GLSL `EmitVertex()` calls.
* We are detecting calls to `Append()` in a slightly hacky way, by looking at decorations on the callee to make sure that it is a function that is determined to translate to `EmitVertex()`.
* Right now we aren't handling calls to `Append()` in other functions. It wouldn't be hard in principle to walk all the functions in the module and apply the translation (assuming we don't want to start supporting multiple output streams), but this wouldn't handle the passing of the GS output stream between functions. (This points out that there is a need for an additional type legalization pass that desugars away parameters of types that aren't actually meaningful on the target).
|
|
* Initial work on validating "constexpr"-ness in IR
The underlying issue here is that certain operations in the target shading languages constrain their operands to be compile-time constants. A notable example is the optional texel offset parameter to the `Texture2D.Sample` operation.
When calling these operations in GLSL, the user is required to pass a "constant expression," and any variables in that expression must therefore be marked with the `const` qualifier (and themselves be initialized with constant expressions). Any GLSL output we generate must of course respect these rules.
When calling these operations in HLSL, the user is not so constrained. Instead, they can pass an arbitrary expression, which may involve ordinary variables with no particular markup, and then the compiler is responsible for determining if the actual value after simplification works out to be a constant. In some cases, the requirement that a value be constant might actually trigger things like loop unrolling. Also, it is okay to use a function parameter to determine such a constant expression, as long as the argument turns out to be a constant at all call sites.
The way we have decided to tackle these challenges in Slang is that we we propagate a notion of `constexpr`-ness through the IR. This is currently being tackled in `ir-constexpr.cpp` with a combination of forward and backward iterative dataflow:
* When the operands to an instruction are all `constexpr`, and the opcode is one we believe can be constant-folded, then we infer that the instruction *can* be evaluated as `constexpr`
* When instruction is required to be `constexpr`, then we infer that all of its operands are also required to be `constexpr`.
If this process ever infers that a function parameter is required to be `constexpr`, then we might have to continue propagation at all the call sites to that function.
If after all the propagation is done, there are any cases where an instruction is *required* to be `constexpr`, but it *can't* be `constexpr` (we weren't able to infer `constexpr`-ness for its operands), then we issue an error.
This implementation encodes the idea of `constexpr`-ness in the IR as part of the type system, using a simplified notion of rates. This change adds a `RateQualifiedType` that can represent `@R T`, and then introduces a `ConstExprRate` that can be used for `R`. Many accessors for the type information on IR nodes were updated to distinguish when one wants the "full" type of an IR value (which might include rate information) vs. just the "data" type.
A `constexpr` qualifier was added in the front-end, and is being used to decorate the texel offset parameter for `Texture2D.Sample`. Lowering from AST to IR looks for this qalifier and infers when a function parameter must be typed as `@ConstExpr T` instead of just `T`.
There are lots of limitations and gotchas in the implementation so far:
* The `@ConstExpr` rate is the only one added in this change, but it seems clear that the conceptual `ThreadGroup` rate that was added to represent `groupshared` should probably get folded into the representation.
* I'm not 100% pleased with how many places in the IR I have to special-case for rate-qualified types. At the same type, pulling out rate as a distinct field on `IRValue` would probably require that we pay attention to rate everywhere.
* I've added a test case to show that we can issue errors when users fail to provide a constant expression for the texel offset, but the actual error message isn't great because it doesn't indicate *why* a constant expression was required. Realistically the "initial IR" should contain a few more decorations we can use to relate error conditions back to the original code (even if this is in a side-band structure).
* I've added a test case that is supposed to show that we can back-propagate `constexpr`-ness to local variables, and I've manually confirmed that it works for Vulkan/SPIR-V output, but the level of Vulkan support in `render_test` today means I can't enable the test for check-in.
* While I'm attempting to propagate `@ConstExpr` information from callees to callers, I haven't implemented any logic to specialize callee functions based on values at call sites.
* In a similar vein, there is no handling of control-flow dependence in the current code. If we infer that a phi (block parameter) needs to be `@ConstExpr`, then it isn't actually enough to require that the inputs to the phi (arguments from predecessor blocks) are all `@ConstExpr` because we also need any control-flow decisions that pick which incoming edge we take to be `@ConstExpr` as well.
* As a practical matter, implicit propagation of `@ConstExpr` from a function body to a function parameter should only be allowed for functions that are "local" to a module. Any function that might be accessed from outside of a module should really have had its `@ConstExpr` parameter marked manually, and our pass should validate that they follow their own rules. Right now we have no kind of visibility (`public` vs `private`) system, so I'm kind of ignoring this issue.
While that is a lot of gaps, this is also just enough code to get the Falcor MultiPassPostProcess example working, so I'm inclined to get it checked in.
* Fixup: missing expected output for test
* Fixup: disable test that relies on [unroll] for now
|
|
1, make IRModule class own a memory pool for all IR object allocations
2. For now, we allow IR objects to own other (externally) heap allocated objects, such as String, List and RefPtrs by tracking all IR objects that has been allocated for the IRModule in a list named `IRModule::irObjectsToFree`. and call destructor for all these objects upon the destruction of the IRModule. In the long term, we should eliminate the use of all these externally allocated types in IR system and get rid of this tracking and explicit destructor calls.
3. remove non-generic `createValueImpl` functions and retain only generic versions in IRBulider so we can properly call the constructor of the IR types to set up virtual tables correctly for destructor dispatching.
4. add `MemoryPool` class for allocation of the IR objects.
5. Make sure we are disposing IRSpecContexts when we are done with the specialized IR module.
6. Add `_CrtDumpMemoryLeaks()` calls to check memory leaks upon destruction of a Slang session. If we are to support multiple sessions at a time, this call should probably be replaced with the more advanced MemoryState versions of the memory leak checker.
|
|
* Re-define deprecated compile flags
By including these flags in the header file, with a value of zero, we can allow some existing code to compile even after the major changes to the implementation.
* The `SLANG_COMPILE_FLAG_NO_CHECKING` option will effectively be ignored, since checking is always enabled.
* The `SLANG_COMPILE_FLAG_SPLIT_MIXED_TYPES` option will now act as if it is always enabled (and indeed some of the code has been relying on this flag being set always).
* Make subscript operators writable for writable textures
This even had a `TODO` comment saying that we needed to fix it, and now I'm seeing semantic checking failures because we didn't define these and so we find assignment to non l-values.
* Fix definitions of any() and all() intrinsics
These should always return a scalar `bool` value, but they were being defined wrong in two ways:
1. They were using their generic type parameter `T` in the return type
2. They were returning a vector in the vector case, and a matrix in the matrix case.
This change just alters the return type to be `bool` in all cases.
* Fix bug in SSA construction
When eliminating a trivial phi node, it is possible that the phi is still recorded as the "latest" value for a local variable in its block.
When later code queries that value from the block (which can happen whenever another block looks up a variable in its predecessors), it would get the old phi and not the replacement value.
I simply added a loop that checks if the value we look up is a phi that got replaced, and then continues with the replacement value (which might itself be a phi...). A more advanced solution might try to get clever and have the map itself hold `IRUse` values so that we can replace them seamlessly.
* Simplify IR control flow representation
This change gets rid of various special-case operations for conditional and unconditional branches, and instead requires emit logic to recognize when a direct branch is targetting a `break` or `continue` label.
The new approach here isn't perfect, but it seems beter than what we had before, because it can actually work in the presence of control-flow optimizations (including our current critical-edge-splitting step).
* Load from groupshared isn't groupshared
When loading from a `groupshared` variable, the resulting temporary shouldn't have the `groupshared` qualifier on it.
This might eventually need to generalize to a better understanding of storage modifiers in the IR, but I don't really want to deal with that right now.
* Don't emit references to typedefs in output code
Now that we are using the IR for all codegen, we shouldn't be dealing with surface-level things like `typedef` declarations in the output code; just use the type that was being referred to in the first place.
* Fix floating-point literal printing for IR
The IR was calling `emit()` instead of `Emit()` (we really need to normalize our convention here), and was implicitly invoking a default constructor on `String` that takes a `double` (that constructor should really be marked `explicit`), and which doesn't meet our requirements for printing floating-point values.
* Fix error when importing module that doesn't parse
We already added a case to bail out if semantic checking fails, but neglected to add a case if there is an error during parsing of a module to be imported.
Note: this logic doesn't correctly register the module as being loaded (but still in error), so users could see multiple error messages if there are multiple `import`s for the same module.
* Improve error message for overload resolution failure
- Drop debugging info from the candidate printing
- Add cases to print `double` and `half` types properly
* Fixup: switch loopTest to ifElse in expected IR output
|
|
* Generate SSA form for IR functions
The basic idea here is simple: in the front-end after we have lowered the AST to initial IR we will apply a set of "mandatory" optimization passes. The first of these is to attempt to translate the all functions into SSA form so that they are amenable to subsequent dataflow optimizations. Eventually, the mandatory optimization passes would include diagnostic passes that make sure variables aren't used when undefined, etc.
Just doing basic SSA generation already cleans up a lot of the messiness in our IR today, because constructs that used to involve many local variables can now be handled via SSA temporaries.
The implementation of SSA generation is in `ir-ssa.cpp`, and it follows the approach of Braun et al.'s "Simple and Efficient Construction of Static Single Assignment Form." I used this instead of the more well-known Cytron et al. algorithm because Braun's algorith mis very simple to code, and does not require auxiliary analyses to generate the dominance frontier.
The main wrinkle in our SSA representation right now is that instead of using ordinary phi nodes, we instead allow basic blocks to have parameters, where predecessor blocks pass in different parameter values. This encodes information equivalent to traditional phi nodes, but has two (small) benefits:
1. There is no fixed relationship between the order of phi operands and predecessor blocks, so we don't have to worry about breaking the phis when we alter the order in which predecessors are stored. This is important for us because predecessors are being stored implicitly.
2. It is easy to operationalize a "branch with arguments" either when lowering to other languages, or when interpreting the IR. A branch with arguments is implemented as a sequence of stores from the arguments to the parameters of the target block (very similar to a call), followed by a jump to the block.
Relevant to the above, this change also adds an interface for enumerating the predecessors or successors of a block in our CFG. Rather than use an auxliary structure, we directly use the information already encoded in the IR:
* The sucessors of a block are the target label operands of its terminator instruction. In our IR this is a contiguous range of `IRUse`s, possible with a stride (to account for the way `switch` interleaves values and blocks).
* The predecessors of a block are a subset of the uses of the block's value. Specifically, they are any uses that are on a terminator instruction, and within the range of values that represent the successor list of that instruction.
One important limitation of the "blocks with arguments" model for handling phis is that it is really only convenient to stash extra arguments on an unconditional terminator instruction. This change works around this prob lem by breaking any "critical edges" - edges between a block with multiple successors and one with multiple predecessors. We assume that "phi" nodes will only ever be needed on a block with multiple predecessors, and because critical edges are broken, each of these predecessors will then have only a single successor, so its branch instruction can handle the extra arguments.
This change introduces a notion of an "undefined" instruction in the IR. This is handled as an instruction rather than a value because I anticipate that we will want to distinguish different undefined values when it comes time to start issuing error messages (those messages will need to point to the variable that was used when undefined).
* Fix expected test output.
Another change was merged that enabled the `glsl-parameter-blocks` test, and its output is affected by our IR optimization work.
|
|
The basic change is simple: remove support for all code generation paths other than the IR.
There is a lot of vestigial code left, but the main logic in `ast-legalize.*` is gone.
Doing this breaks a *lot* of tests, for various reasons:
- We can no longer guarantee exactly matching DXBC or SPIR-V output after things pass through out IR
- Many builtins don't have matching versions defined for GLSL output via IR (even when they had versions defined via the earlier approach that worked with the AST)
- A lot of code creates intermediate values of opaque types in the IR, which turn into opaque-type temporaries that aren't allowed (this breaks many GLSL tests, but also some HLSL)
I implemented some small fixes for issues that I could get working in the time I had, but most of the above are larger than made sense to fix in this commit.
For now I'm disabling the tests that cause problems, but we will need to make a concerted effort to get things working on this new substrate if we are going to make good on our goals.
|
|
* Remove support for the -no-checking flag
Fixes #381
Fixes #383
Work on #382
- No longer expose flag through API (`SLANG_COMPILE_FLAG_NO_CHECKING`) and command-line (`-no-checking`) options
- Remove all logic in `check.cpp` that was withholding diagnostics (including errors) when the no-checking mode was enabled
- Remove `HiddenImplicitCastExpr`, which was only created to support no-checking mode (it represented an implicit cast that our checking through was needed, but couldn't emit because it might be wrong)
- Remove logic for storing function bodies as raw token lists when checking is turned off. I'm leaving in the `UnparsedStmt` AST node in case we ever need/want to lazily parse and check function bodies down the line.
- Remove a few of the code-generation paths we had to contend with, but keep the comment about them in place.
- Remove GLSL-based tests that can't meaningfully work with the new approach.
- Fix other tests that used a GLSL baseline so that their GLSL compiles with `-pass-through glslang` instead of invoking `slang` with the `-no-checking` flag.
- Remove tests that were explicitly added to test the "rewriter + IR" path, since that is no longer supported.
There is more cleanup that can be done here, now that we know that AST-based rewrite and IR will never co-exist, but it is probably easier to deal with that as part of removing the AST-based rewrite path.
We've lost some test coverage here, but actually not too much if we consider that we are dropping GLSL input anyway.
* Fixup: test runner was mis-counting ignored tests
* Fixup: turn on dumping on test failure under Travis
* Fixup: enable extensions in Linux build of glslang
|
|
Added two API functions:
1. `spReflection_FindTypeByName`, which returns a DeclRefType to the struct type with the given name. The function finds from all loaded modules in a `CompileRequest` for a decl with the given name, construct a `Type` object and cache it in `CompileRequest::types` dictionary. The subsequent calls to `spReflection_FindTypeByName` with the same name will simply returned the cached Type objects.
2. `spReflection_GetTypeLayout`, which returns a `TypeLayout` for a given `Type`. This function creates and caches the `TypeLayout` in the `TargetRequest` object that is used to create the `ProgramLayout`.
|
|
* no-codegen compile flag and global generics reflection
1. Add SLANG_COMPILE_FLAG_NO_CODEGEN (-no-codegen) compiler flag to skip code generation stage, so that a shader that uses global generic type parmameters can be parsed, checked and introspected without knowing the final specialization.
2. Add reflection API to query for global generic type parameters, global parameters of generic type, and the generic type parameter index related to a global generic parameter.
3. Add a reflection test case for global generic type parameters.
* add expected result for global-type-params test case.
* fix reflection json output.
* fix branch condition errors
* fix expected result for global-type-params.slang
* fix expected test case output
|
|
Fixes #307
This ends up being a major overhaul over how type layout computation is structured and exposed.
The big problems all arise around cases where both the "container" for a parameter block or CB, and the "element" type both use the same kind of resource.
E.g., if you define a CB with a texture in it, then in Vulkan both the CB and the texture use the same kind of resource, and so if you query the CB's resource usage it will just tell you it uses two descriptor-table slots, but nothing more than that.
Similar confusion still arises in the HLSL case, when a CB with a texture in it reports its parameter category as "mixed" so that a user might query for a category they didn't mean to. There were also cases in the existing code where a parameter block might expose *both* a register-space usage and another concrete resource type, which isn't right.
The most important changes here are:
- A `ParameterGroupTypeLayout` now has a more refined internal structure, consisting of:
- A `containerTypeLayout`, which represents the resource usage of the buffer/block itself (e.g., if a constant buffer had to be allocated)
- An `elementVarLayout` which stores the offsets that need to be applied to get from the `VarLayout` for an instance of this parameter-group type to the offsets of its elements. The `TypeLayout` for this variable layout should be the "raw" type of the block/CB element.
- The `offsetElementTypeLayout` (formerly just `elementTypeLayout`) which represents the element type, but in the case of a `struct` element type, will have fields offset similar to the `elementVarLayout`. This is what all the old code used to use, so we need to keep it for compatibility.
- When doing reflection on a `ParameterGroupTypeLayout`, we now only report the resource usage of the `containerTypeLayout`. This is technically a potentially breaking change in the public API, but I don't think Falcor will mind, since they actually want something closer to this behavior.
- Add a new public API for querying the element variable layout of a parameter block of constant buffer. This could be used by savvy applications to fold the handling of CB element offsetting into some notion of a "reflection path." This would be required for applications that want to handle CBs or parameter blocks where the element type is *not* a `struct` type.
- Remove old logic for applying an offset when creating a type layout for constant buffer element, and instead perform offsetting more uniformly later, by constructing the `offsetElementTypeLayout` from the `rawElementTypeLayout`. This is useful both because we want to keep both (the "raw" type layout becomes the type layout of the `elementVarLayout`), and also because we can decide later whether we even want to allocate a CB register for a buffer, based on whether it actually contains any uniform data.
- Fix cases where we might end up with a parameter block type reporting both that it uses a whole register space (and thus should not expose the resource usage of the container/element type) *and* a constant-buffer register/slot. The latter should be hidden inside the regsiter space.
- Clean up the `spReflectionParameter_GetBinding{Index,Space}` functions to just route to `spReflectionVariableLayout_Get{Offset,Space}`, using the "default" category of the parameter
- Try to make the `GetSpace` query take into account cases where a variable also has an explicit `RegisterSpace` allocation.
- This probably still needs some cleanup, since ideally we'd just move things into the `space` field on the `ReosurceInfo` and have an invariant that a variable *either* has a `RegisterSpace` allocation, or it has other resource infos, but never both...
- Add some ad-hoc logic so that if the user queries for a binding index/space using a parameter category that doesn't actually apply (e.g., they query for a D3D `t` register when using Vulkan), we can optionally remap it to the resource type they "probably" meant. This is a mess of Do What I Mean code, but it is also what our users want right now.
- Fix various bits of emit logic so that if a parameter block has a register space/set allocated to it, we properly output that as part of the binding information for it.
- This is another thing that might be cleaned up if we rationale the way that things get split during legalization.
- Add a GLSL case for emitting a parameter block variable as a `cbuffer`.
|
|
* Cleanups to `ParameterBlock<T>` behavior.
These add some more realistic tests using the `ParameterBlock<T>` support, and show that it can work with the "rewriter" mode.
Unfortunately, this code does *not* currently work with the rewriter + the IR at once. That will need to be fixed in a follow-on change, because I now see that the root problem is pretty ugly.
* cleanup
|
|
* Make AST and IR share type legalization code
A previous change already made it so that the AST-to-AST lowering/legalization pass could work together with IR-based lowering of `import`ed code, but that change didn't take into account the case where a function written in the AST needed to call an IR function and pass in a type that required legalization.
Both the IR-based and AST-based passes had their own approaches to type legalization, that mostly agreed on the desired output, but they ended up creating their own representations for legalized types which would mean that for a function call the caller and callee might end up legalizing the parameter list to use different types.
This change tries to fix this issue (and adds a new test case that relies on the fix) by massively overhauling the AST-based legalization pass so that it uses the same type legalization code as the IR. The shared code has been moved out into `legalize-types.{h,cpp}`.
Notes:
- I eliminated the `FilteredTupleType` type, since it was starting to cause code duplication in a lot of places. Instead, type legalization just creates new `struct` types to represent the result of filtering.
- One big consequence of this is that the `LegalType::pair` case needs to remember for each field in the original type which field (if any) in the new `struct` type it maps to
- A big source of complexity (and probably bugs) in this code is trying to figure out how to parent these new `struct` definitions effectively. A good follow-on change would be something that outputs declarations on-demand during the AST emit logic (as we do for the IR), just to avoid some of this song and dance.
- The old AST type legalization had a notion of both a "tuple" type and a "varying tuple" type. The "tuple" case was quite complex, and combined behavior currently handled by `LegalType::pair` (for splitting into ordinary and special sides) and `LegalType::tuple` (for holding multiple distinct elements to represent the fields of an aggregate). The "varying tuple" case was closer to `LegalType::tuple`, so I tried to just re-use the existing logic for that too. The one place this potentially gets messy is in `reifyTuple()`.
- The messiest bit of handling the "varying tuple" concept (which is used for GLSL shader inputs/outputs since they have to be scalarized) is that when passing them as function arguments we need to reify the tuple back into a structured value. Because the `LegalExpr` hierarchy doesn't have type information, but constructing a value of the "original" type requires such information, things get a little messy.
- I did *not* try to deal with any of the logic related to handling system inputs/outputs for cross-compilation purposes. Of course, the long-term goal is that any actual cross-compilation is handled via the IR, but this change can't afford to break the AST-based path just yet. As a result, there is still quite a bit of complexity in the handling of assignment, to deal with cases where "fixups" are required.
* fixup: bad code in macro, not caught by Visual Studio compiler
* fixup: more stuff missed by VS compiler
* fixup: VS continutes to miss stuff in UNREACHABLE_RETURN
|
|
Fixes #301
The problem here is that if you have input GLSL code like:
```glsl
// example.vs
in vec3 pos;
```
and:
```glsl
// example.fs
in vec3 worldPos;
```
Then both `pos` and `worldPos` are reflected as global variables (parameters of the *program*), which both get bound to "varying input" resources, but there is no way to tell through the API that `pos` is a vertex parameter while `worldPos` is a fragment one.
The original request in issue #301 was to expose parameters like this not as a global variables, but rather as parameters of the entry point in their specific file. That is, treat it as if the user had written, e.g.:
```glsl
// example.vs
void vsMain(in vec3 pos) { ... }
```
Doing that would unify the GLSL and HLSL/Slang cases a bit, but would require the Slang reflection API to lie about the structure of code the user wrote. At a more basic level, that would have been hard to implement because the current reflection API just exposes the underlying AST, and the AST *needs* to leave `pos` at the global scope so that when we go and spit GLSL back out we retain the original structure.
This PR implements a more simplistic solution, where the user is allowed to query the stage that a varying parameter "belongs" to. For right now I'm only enabling this to work for varying parameters (but it doesn't care if they are entry-point or global-scope varyings). Despite what I said on #301, this should work for both the top-level parameter's variable layout, *and* any variable layouts for fields within its type reflection.
In terms of implementation, I took the simple but wasteful route: every `VarLayout` now has a `stage` field that is by default initialized to `SLANG_STAGE_NONE`. When collecting varying parameters, I take advantage of the fact that everything bottlenecks through `processEntryPointParameter()` which takes an `EntryPointParameterState` so that I can set the `VarLayout::stage` field for any varying parameter in one place.
While I was making this change, I also did a bit of cleanup so that the "official" names for the varying parameter categories are `VARYING_INPUT` and `VARYING_OUTPUT`, with `VERTEX_INPUT` and `FRAGMENT_OUTPUT` being "deprecated" in principle. I didn't do the bulk rename inside the codebase yet.
|
|
* Rename `lower.{h,cpp}` to `ast-legalize.{h,cpp}`
This pass isn't really performing lowering akin to `lower-to-ir.{h,cpp}` so the file name is misleading.
By renaming this pass we emphasize its role as an AST-related pass.
Also update the comment at the top of `ast-legalize.h` to reflect the intended purpose of this pass in a world where we have the IR up and running.
* Allow `import` as an alias for `__import`
The use of double underscores to mark our new syntax has so far had two purposes:
1. It helps identify syntax that isn't meant to be exposed to users in its current form (e.g., `__generic` gets a double underscore because we want users to have a more pleasant surface syntax for generics that they write). This rationale doesn't apply to `__import`, which is a major language feature that users need to interact with.
2. It helps avoid the problem where the compiler treats something as a keyword that isn't supposed to be reserved in HLSL/GLSL and so causes existing user code to fail to parse (e.g., when the user tries to write a function called `import`). This no longer matters because we look up almost all of our keywords using the existing lexical scoping in the language (so the user can shadow almost any keyword with a local declaration).
So, neither of the original two reasons applies to `__import`, and it makes sense to expose it as `import`.
Doing so is a one-line change.
|
|
* Add support for global generic parameters
(In-progress work)
This commit include:
1. Update Slang API to allow specification of generic type arguments in an `EntryPointRequest`
2. Add parsing of `__generic_param` construct, which becomes a GlobalGenericParamDecl, contains members of `GenericTypeConstraintDecl`.
3. Semantics checking will check whether the provided type arguments conform to the interfaces as defined by the generic parameter, and store SubtypeWitness values in the EntryPointRequest, which will be used by `specializeIRForEntryPoint` when generating final IR.
4. Add a new type of substitution - `GlobalGenericParamSubstitution` for subsittuting references to `__generic_param` decls or to its member `GenericTypeConsraintDecl` with the actual type argument or witness tables.
5. Update `IRSpecContext` to apply `GlobalGenericParamSubstitution` when specializing the IR for an EntryPointRequest.
6. Update `render-test` to take additional `type` inputs, which specifies the type arguments to substitute into the global `__generic_param` types.
This commit does not include ProgramLayout specialization.
* IR: pass through `[unroll]` attribute (#284)
The initial lowering was adding an `IRLoopControlDecoration` to the instruction at the head of a loop, but this was getting dropped when the IR gets cloned for a particular entry point.
The fix was simply to add a case for loop-control decorations to `cloneDecoration`.
* fix warnings
* IR: support `CompileTimeForStmt` (#286)
This statement type is a bit of a hack, to support loops that *must* be unrolled.
The AST-to-AST pass handles them by cloning the AST for the loop body N times, and it was easy enough to do the same thing for the IR: emit the instructions for the body N times.
The only thing that requires a bit of care is that now we might see the same variable declarations multiple times, so we need to play it safe and overwrite existing entries in our map from declarations to their IR values.
Of course a better answer long-term would be to do the actual unrolling in the IR. This is especially true because we might some day want to support compile-time/must-unroll loops in functions, where the loop counter comes in as a parameter (but must still be compile-time-constant at every call site).
* Add support for global generic parameters
(In-progress work)
This commit include:
1. Update Slang API to allow specification of generic type arguments in an `EntryPointRequest`
2. Add parsing of `__generic_param` construct, which becomes a GlobalGenericParamDecl, contains members of `GenericTypeConstraintDecl`.
3. Semantics checking will check whether the provided type arguments conform to the interfaces as defined by the generic parameter, and store SubtypeWitness values in the EntryPointRequest, which will be used by `specializeIRForEntryPoint` when generating final IR.
4. Add a new type of substitution - `GlobalGenericParamSubstitution` for subsittuting references to `__generic_param` decls or to its member `GenericTypeConsraintDecl` with the actual type argument or witness tables.
5. Update `IRSpecContext` to apply `GlobalGenericParamSubstitution` when specializing the IR for an EntryPointRequest.
6. Update `render-test` to take additional `type` inputs, which specifies the type arguments to substitute into the global `__generic_param` types.
progress on parameter binding
* Add a more contrived test case for specializing parameter bindings
* update render-test to align buffers to 256 bytes (to get rid of D3D complains on minimal buffer size).
* adding one more test case for parameter binding specialization.
* Cleanup according to @tfoleyNV 's suggestions.
* fix a bug introduced in the cleanup
|
|
* Don't auto-enable IR use for compute tests
The `COMPARE_COMPUTE` and `COMPARE_RENDER_COMPUTE` test fixtures were set up to always enable the `-use-ir` flag on Slang, which precludes having any tests that confirm functionality on the old non-IR path (which is still required by our main customer).
This change adds the `-xslang -use-ir` flags explicitly to any compute test cases that left them out, and makes the fixture no longer add it by default.
* Continue building out parameter block support
The initial front-end logic for parameter blocks was already added, but they are still missing a bunch of functionality. This change addresses some of the known issues:
- Bug fix: don't try to emit HLSL `register` bindings for variables that consume whole register spaces/sets
- Overhaul type layout logic so that it can make decisions based on a given code generation target (currently passed in as a `TargetRequest`), which allows us to decide whether or not a parameter block should get its own register set on a per-target basis.
- Always use a register space/set for Vulkan
- Never use a register space/set for HLSL SM 5.0 and lower
- By default, don't use register spaces/sets for HLSL output
- Add a command-line flag and some "target flags" to enable register-space usage for D3D targets
- Hackily add initial support for parameter blocks in the AST-to-AST path
- This just blindly lowers `ParameterBlock<T>` to `T`, which shouldn't quite work
- A more complete overhaul will probably need to wait until the AST-to-AST legalization is changed to use the `LegalType`s from the IR legalization pass.
- Add a compute-based test case to actually run code using parameter blocks
- This file runs test cases both with and without the IR
|
|
This is currently only useful for `struct` types.
I implemented a special-case exception so that the auto-generated `struct` types used for `cbuffer` members don't show their internal name.
I did *not* implement any logic to avoid returning the name `vector` for a vector type, etc., since they are all `DeclRefType`s and it seemed easiest to just let the user access information they can't really use.
|
|
* Rename existing ParameterBlock to ParameterGroup
We are planning to add a new `ParameterBlock<T>` type, which maps to the notion of a "parameter block" as used in the Spire research work.
Unfortunately, the compiler codebase already uses the term `ParameterBlock` as catch-all to encompass all of HLSL `cbuffer`/`tbuffer` and GLSL `uniform`/`buffer`/`in`/`out` blocks (all of which are lexical `{}`-enclosed blocks that define parameters...).
This change instead renames all of the existing concepts over to `ParameterGroup`, which isn't an ideal name, but at least doesn't directly overlap the new terminology or any existing terminology.
The new `ParameterBlockType` case will probably be a subclass of `ParameterGroupType`, since it is a logical extension of the underlying concept.
* Add Shader Model 5.1 profiles
The HLSL `register(..., space0)` syntax is only allowed on "SM5.1" and later profiles (which is supported by the newer version of `d3dcompiler_47.dll` that comes with the Win10 SDK, but not the older version of `d3dcompiler_47.dll` - good luck figuring out which you have!).
This change adds those profiles to our master list of profiles, and nothing else.
* First pass at support for `ParameterBlock<T>`
- Add the type declaration in stdlib
- Add a special case of `ParameterGroupType` for parameter blocks
- Handle parameter blocks in type layout (currently handling them identically to constant buffers for now, which isn't going to be right in the long term)
- Add an IR pass that basically replaces `ParameterBlock<T>` with `T`
- Eventually this should replace it with either `T` or `ConstantBuffer<T>`, depending on whether the layout that was computed required a constant buffer to hold any "free" uniforms
- Add first stab at an IR pass to "scalarize" global variables using aggregate types with resources inside.
- This currently only applies to global variables, so it won't handle things passed through functions, or used as local variables
- It also only supports cases where the references to the original variable are always references to its fields, and not the whole value itself
- Add a single test case that technically passes with this level of support, but probably isn't very representative of what we need from the feature
* Fold parameter-block desugaring into a more complete "type legalization" pass
The basic problem that was arising is that once you desugar `ParameterBlock<T>` into `T`, you then need todeal with splitting `T` into its constituent fields if it contains any resource types.
Handling those transformations by following the usual use-def chains wasn't really helping, because you might need systematic rewriting that can really only be handled bottom-up.
This change adds a new pass that is intended to perform multiple kinds of type "legalization" at once:
- It will turn `ParameterBlock<T>` into `T`
- It may at some point also convert `ConstantBuffer<T>` into `T` as well
- It will turn an value of an aggregate type that contains resources into N different values (one per field)
- As a result of this, it will also deal with AOS-to-SOA conversion of these types
Legalization is applied to *every* function/instruction/value, so that it can make large-scale changes that would be tough to manage with a work list.
This pass needs to be run *after* generics have been fully specialized, so that we know we are always dealing with fully concrete types, so that their legalization for a given target is completely known.
This is still work in progress; there's more to be done to get this working with all our test cases, and finish the remaining `ParameterBlock<T>` work.
* Improve binding/layout information when using parameter blocks
- When doing type layout for a parameter block, don't include the resources consumed by the element type in the resource usage for the parameter block
- Note that this is pretty much identical to how a `ConstantBuffer<T>` does not report any `LayoutResourceKind::Uniform` usage, except that `ParameterBlock<T>` is *also* going to hide underlying texture/sampler reigster usage
- The one exception here is that any nested items that use up entire `space`s or `set`s those need to be exposed in the resource usage of the parent (I don't have a test for this)
- When type legalization needs to scalarize things, it must propagate layout information down to the new leaf variables. In general, the register/index for a new leaf parameter should be the sum of the offsets for all of the parent variables along the "chain" from the original variable down to the leaf (we aren't dealing with arrays here just yet).
- When type legalization decides to eliminate a pointer(-like) type (e.g., desugar `ParameterBlock<T>` over to `T`), actually deal with that in terms of the `LegalVal`s created, so that we can know to turn a `load` into a no-op when applied to a value that got indirection removed.
- Hack up the "complex" parameter-block test so that it actually passes (the big hack here is that the HLSL baseline is using names that are generated by the IR, and are unlikely to be stable as we add/remove transformations).
- Note: I can't make these be compute tests right now, because regsiter spaces/sets are a feature of D3D12/Vulkan, and our test runner isn't using those APIs.
|
|
- Add shader model 6.0, 6.1, and 6.2 targets
- Add DXIL and DXIL assembly as output formats
- Add header for DXC API to `external/`
- Add `dxc-support.cpp` that wraps usage of the API
- Add `-pass-through dxc` option, equivalent to what we have for `fxc`
Notes:
* This does *not* include any logic to add `dxcompiler.dll` to our build process; that is way out of scope for the build complexity I'm ready to deal with
* For right now, the use of `dxcompiler.dll` is hard-coded, and it must be discoverable in the current executable's search path; options to customize can come later
* The `-pass-through` option is kind of silly because the code doesn't actually pay attention to the value (just whether it is set). If you set it to `fxc` but ask for DXIL, we pass through `dxc` anyway.
|
|
This is functionality required to support a Falcor bug fix.
Most of the code to compute the right semantic name/index for a parameter was already present.
This change adds:
- Storage for semantic name/index on every `VarLayout`
- Note: this is wasteful and should be optimized later
- A public API to query the semantic name/index
- The contract is that this API returns `NULL` if the parameter had no semantic
- A bit of work in `parameter-binding.cpp` to attach semantics to varying input/output when traversing varying parameters.
- Note: this is intentionally set up so that it associates semantics even with non-leaf parameters, so that an API user can query the semantic of a `struct` parameter and know that its members will be assigned sequential semantic indices from its starting value.
- Support for dumping this information in reflection tests
One notable thing that I did *not* change here is that the reflection test fixture doesn't report information on the output of an entry point, even though it really should. That should be fixed in a separate change, though, because it would affect many of the expected outputs.
|
|
The big addition here is that the Slang "bytecode" is no longer treated as just a "code generation target" (`CodeGenTarget`) akin to DX bytecode (DXBC) or SPIR-V, but instead is a `ContainerFormat` that can be used to emit all the results of a compile request (well, currently just the IR-as-BC, but the intention is there).
Getting to this goal involved some prior checkins that eliminated bogus "targets" that weren't really akin to SPIR-V or DXBC: `-target slang-ir-asm` and `-target reflection-json`. Those targets were really in place to support testing, and so they've been made more explicit testing/debug options.
This change eliminates `-target slang-ir` and instead tries to allow the user to specify `-o foo.slang-module` as an output file name, that indicates the intention to output a "container" file that will wrap up all the generated code.
I've also gone ahead and generalized the existing `-target` option so that we are actually building up a *list* of code generation targets. This is largely just a cleanup, since it forces code to be more aware of when it is doing something target-specific vs. target independent. For example, reflection layout information lives on a requested target, and not on the compile request as a whole, and similarly output code is per-target, per-entry-point.
As a cleanup, I eliminated support for per-translation-unit output. This was vestigial code from back when I used to try and do HLSL generation for a whole translation unit instead of per-entry-point (which turned out to be a lot of complexity for little gain), and it was only being used in the `hello` example and the `render-test` test fixture - in both cases fixing it up was easy enough. I've stubbed out the old `spGetTranslationUnitSource` API, but haven't removed it yet.
|
|
* Get rid of the `-slang-ir-asm` target
This is really only useful for debugging, so I've replaced the functionality with a `-dump-ir` command line option (which dump's the IR for an entry point before doing codegen).
* fixup: use HLSL target, not DXBC, so test can run on Linux
|
|
Move reflection JSON generation into separate test fixture
|