| Age | Commit message (Collapse) | Author |
|
* Move to newer glslang
* Support cross-compilation of ray tracing shaders to Vulkan
This change allows HLSL shaders authored for DirectX Raytracing (DXR) to be cross-compiled to run with the experimental `GL_NVX_raytracing` extension (aka "VKRay").
* The GLSL extension spec is marked as experimental, so that any shaders written using this support should be ready for breaking changes when the spec is finalized.
* "Callable shaders" are not exposed throug the GLSL extension, so this feature of DXR will not be cross-compiled.
* The experimental Vulkan raytracing extension does not have an equivalent to DXR's "local root signature" concept. This does not visibly impact shader translation (because the local/global root signature mapping is handled outside of the HLSL code), but in practice it means that applications which rely on local root signatures on their DXR path will not be able to use the translation in this change as-is; more work will be needed.
The simplest part of the implementation was to go into the Slang standard library and start adding GLSL translations for the various DXR operations.
In some cases, like mapping `IgnoreHit()` to `ignoreIntersectionNVX()` this is almost trivial.
The various functions to query system-provided values (e.g., `RayTMin()`) were also easy, with the only gotcha being that they map to variables rather than function calls in GLSL, and our handling of `__target_intrinsic` assumes that a bare identifier represents a replacement function name, and not a full expression, so we have to wrap these definitions in parentheses.
The tricky operations are then `TraceRay<P>()` and `ReportHit<A>()`, because these two are generics/templates in HLSL.
GLSL doesn't support generics, even for "standard library" functions, so the raytracing extension implements a slightly complex workaround: the matching operations `traceNVX()` and `reportIntersectionNVX()` pass the payload/attributes argument data via a global variable.
That is, shader code for the GLSL extensions writes to the global variable and then calls the intrinsic function.
The linkage between the call site and the global is established by a modifier keyword (`rayPayloadNVX` and `hitAttributeNVX`, respectively) and in the case of ray payload also uses `location` number to identify which payload global to use (since a single shader can trace rays with multiple payload types).
Our translation strategy in Slang tries to leverage standard language mechanisms instead of special-case logic.
For example, to translate the `ReportHit<A>()` function, we provide both a default declaration that will work for HLSL (where the operation is built-in with the signature given), and a *definition* marked with the `__specialized_for_target(glsl)` modifier.
The GLSL definition declares a function `static` variable that will fill the role of the required global, and then does what the GLSL spec requires: assigns to the global, and then calls the `reportIntersectionNVX` builtin (which we declare as a separate builtin).
Our ordinary lowering process will turn that `static` variable into an ordinary global in the IR, and the `[__vulkanHitAttributes]` attribute on the variable will be emitted as `hitAttributeNVX` in the output.
There is no additional cross-compilation logic in Slang specific to `ReportHit<A>()` - the target-specific definition in the standard library Just Works.
The case for `TraceRay<P>()` is a bit more complicated, simply because the GLSL `traceNVX()` function needs to be passed the `location` for the payload global.
We implement the payload global as a function-`static` variable, with the knowledge that every unique specialization of `TraceRay<P>()` will generate a unique global variable of type `P` to implement our function-`static` variable.
We then add a slightly magical builtin function `__rayPayloadLocation()` that can map such a variable to its generated `location`; the logic for this is implemented in `emit.cpp` and described below.
We also changed the `RayDesc` and `BuiltinTriangleIntersectionAttributes` types from "magic" intrinsic types over to ordinary types (because the GLSL output needs to declare them as ordinary `struct` types).
This ends up removing some cases in the AST and IR type representations.
By itself this change would break HLSL emit, because in that case the types really are intrinsic.
We added a `__target_intrinsic` modifier to these types to make them intrinsic for HLSL, and then updated the downstream passes to handle the notion of target-intrinsic types.
The logic for binding/layout of entry point inputs and outputs was updated so that raytracing stages don't follow the default logic for varying input/output parameters.
This is because the input/output parameters of a raytracing entry point aren't really "varying" in the same sense as those in the rasterization pipeline.
In particular, the SPIR-V model for raytracing input and output treats "ray payload" and "hit attributes" parameters as being in a distinct storage class from `in` or `out` parameters.
We also detect cases where a ray tracing stage declares inputs/outputs that it shouldn't have. This logic could conceivably be extended to other stages (e.g., to give an error on a compute shader with user-defined varying input/output).
The type layout logic added cases for handling raytracing payload and hit-attribute data, but this is currently just a stub implementation that follows the same logic as for varying `in` and `out` parameters (it cannot give meaningful byte sizes/offsets right now).
To my knowledge the GLSL spec doesn't currently specify anything about layout, and I haven't read the DXR spec language carefully enough to know what it says about layout.
A future change should update the layout logic to allow for byte-based layout of ray payloads, etc. so that we can query this information via reflection.
The GLSL legalization logic in `ir.cpp` was updated to factor out the per-entry-point-parameter code into its own function, and then that function was updated to special-case the input/output of a ray-tracing shader.
While for rasterization stages we typically want to take the user-declared input/output and "scalarize" it for use in GLSL (in part to deal with language limitations, and in part to tease system values apart from user-defined input/output), the GLSL spec for raytracing requires payload and hit attribute parameters to be declared as single variables. There is also the issue that even for an `in out` parameter, a ray payload parameter should only turn into a single global, whereas the handling for varying `in out` parameters generates both an `in` and an `out` global for the GLSL case.
Other than the handling of entry point parameters, the GLSL legalization pass doesn't need to do anything special for ray tracing shaders.
The trickiest change in the `emit.cpp` logic is that we now generate `location`s for ray payload arguments (the outgoing from a `TraceRay()` call) on demand during code generation.
This is a bit hacky, and it would be nice to handle it as a separate pass on the IR rather than clutter up the emit logic, but this approach was expedient.
Basically, any of the global variables that got generated from the `static` declarations in the standard library implementation of `TraceRay()` will trigger the logic to assign them a `location`.
The logic for emitting intrinsic operations added a few new `$`-based escape sequences. The `$XP` case handles emitting the location of a generated ray payload variable; this is how we emit the matching location at the site where we call `traceNVX`. The `$XT` case emits the appropriate translation for `RayTCurrent()` in HLSL, because it maps to something different depending on the target stage.
All of the test cases here consist of a pair of an HLSL/Slang shader written to the DXR spec, plus a matching GLSL shader for a baseline.
The GLSL shaders are carefully designed so that when fed into glslang they will produce the same SPIR-V as our cross-compilation process.
This kind of testing is quite fragile, but it seems to be the best we can do until our testing framework code supports *both* DXR and VKRay.
A bunch of the core changes ended up being blocked on issues in the rest of the compiler, so some additional features go implemented or fixed along the way:
The first big wall this work ran into was that the `__specialized_for_target` modifier hasn't actually been working correctly for a while.
It turns out that for the one function that is using it, `saturate()`, we have been outputting the workaround GLSL function in *all* cases (including for HLSL output) rather than only on GLSL targets.
The problem here is that for a generic function with a `__specialized_for_target` modifier or a `__target_intrinsic` modifier, the IR-level decoration will end up attached to the `IRFunc` instruction nested in the `IRGeneric`, but the logic for comparing IR declarations to see which is more specialized (via `getTargetSpecializationLevel()`) was looking only at decorations on the top-level value (the generic).
The quick (hacky) fix here is to make `getTargetSpecializationLevel()` try to look at the return value of a generic rather than the generic itself, so that it can see the decorations that indicate target-specific functions.
A more refined fix would be to attach target-specificity decorations to the outer-most generic (to simplify the "linking" logic).
The only reason not to fold that into the current fix is that the `__target_intrinsic` modifier currently serves double-duty as a marker of target specialization *and* information to drive emit logic. The latter (the emit-related stuff) currently needs to live on the `IRFunc`, and moving it to the generic could easily break a lot of code.
This needs more work in a follow-on fix, but for now target specialization should again be working.
The other big gotcha that the simple "just use the standard library" strategy ran into was that function-`static` variables weren't actually implemented yet, and in particular function-`static` variables inside of generic functions required some careful coding.
The logic in `lower-to-ir.cpp` has this `emitOuterGenerics()` function that is supposed to take a declaration that might be nested inside of zero or more levels of AST generics, and emit corresponding IR generics for all those levels.
This is needed because two different AST functions nested inside a single generic `struct` declaration should turn into distinct `IRFunc`s nested in distinct `IRGeneric`s.
The tricky bit to making that all work is that the same AST-level generic type parameter will then map to *different* IR-level instructions (the parameters of distinct `IRGeneric`s) when lowering each function.
The existing logic handled this in an idiomatic way by making "sub-builders" and "sub-contexts."
This change refactors some of the repeated logic into a `NestedContext` type to help simplify the pattern, and applies it consistently throughout the `lower-to-ir.cpp` file.
Besides that cleanup, the major change is `lowerFunctionStaticVarDecl` which, unsurprisingly, handles lower of function-`static` variables to IR globals.
The careful handling of nested contexts here is needed because if we are in the middle of lowering a generic function, then a `static` variable should turn into its *own* `IRGeneric` wrapping an `IRGlobalVar`. The body of the function should refer to the global variable by specializing the global variable's `IRGeneric` to the parameters of the *functions* `IRGeneric`. This tricky detail is handled by `defaultSpecializeOuterGenerics`.
An additional subtlety not actually required for this raytracing work (and thus not properly tested right now) is handling function-`static` variables with initializers.
These can't just be lowered to globals with initializers, because HLSL follows the C rule that function-`static` variables are initialized when the declaration statement is first executed (and this could be visible in the presence of side-effects).
The lowering strategy here translates any `static` variable with an initializer into *two* globals: one for the actual storage, plus a second `bool` variable to track whether it has been initialized yet.
There are some opportunities to optimize this case, especially for `static const` data, but that will need to wait for future changes.
We've slowly been shifting away from the model where a user thinks of a "profile" as including both a stage and a feature level.
Instead, the user should think about selecting a profile that only describes a feature level (e.g., `sm_6_1`, `glsl_450`, etc.), and then separately specifying a stage (`vertex`, `raygeneration, etc.) for each entry point.
The challenge here is that the command-line processing still only had a single `-profile` switch, and no way to specify the stage.
Adding the `-stage` option was relatively easy, but making it work with the existing validation logic for command-line arguments was tricky, because of the complex model that `slangc` supports for compiling multiple entry points in a single pass.
* In `slang.h` add new reflection parameter categories for ray payloads and hit attributes, as part of entry point input/output signatures.
* A previous change already updated our copy of glslang to one that supports the `GL_NVX_raytracing` extension, so in `slang-glslang.cpp` we just needed to map Slang's `enum` values for the raytracing stage names to their equivalents in the glslang code.
* Moved the logic for looking up a stage by name (`findStageByName()`) out of `check.cpp` and into `compiler.cpp`, with a declaration in `profile.h`
* Added a `$z` suffix to the GLSL translation of `Texture*.SampleLevel()`, to handle cases where the texture element type is not a 4-component vector. Note that this fix should actually be applied to *all* these texture-sampling operations, but I didn't want to add a bunch of changes that are (clearly) not being tested right now.
* The layout logic for entry points was updated to correctly skip producing a `TypeLayout` for an entry point result of type `void`, which meant that the related emit logic now needs to guard against a null value for the result layout.
* In `ir.cpp`, dump decorations on every instruction instead of just selected ones, so that our IR dump output is more complete.
* Added a command-line `-line-directive-mode` option so that we can easily turn off `#line` directives in the output when debugging. Not all cases where plumbed through because the `none` case is realistically the most important.
* Parser was fixed to properly initialize parent links for "scope" declarations used for statements, so that we can walk backwards from a function-scope variable (including a `static`) and see the outer function/generics/etc.
* Added GLSL 460 profile, since it is required for ray tracing. Also updated the logic for computing the "effective" profile to use to recognize that GLSL raytracing stages require GLSL 460.
* Added some conventional ray-tracing shader suffixes to the handling in `slang-test`. This code isn't actually used, but was relevant when I started by copy-pasting some existing VKRay shaders as the starting point for my testing.
* Fixup: typos
|
|
* * Change the layout of IROp such that 'main' IROps are 0-x.
* Removed MANUAL_RANGE instuction types, as no longer needed.
* Work in prog on optimizing.
* * Constant time lookup for IROpInfo
* Refactor and document a little more the IROp layout
* Mark ops that use 'other' bits
* Fix typo in definition of kIROpFlag_UseOther
* First pass at working out serialization structure.
* Work in progress on ir-serialize
* Storing strings in IRSerialInfo
Split out IRSerialInfo from the IRSerializer - to make more explicit what is actually saved.
* First pass at serializing out data.
* First pass at serialize reading.
* Fix riff fourcc mark order.
* First pass at reconstructing IRInst / IRDecoration from serialized data.
* Handling of TextureBaseType
* Deserializing of constants.
* Small changes around ir serialization.
* Changed StringIndex indexing to not be an offset into the m_strings array, but an index into strings in order. Doing so makes cache lookup much faster, and makes the 'indicies' themselves smaller and therefore more compressible.
* Removed the need for m_arena in IRSerialWriter. Previously it's purpose was to store the string contents that were being used to lookup UnownedStringSlice.
Now we keep the StringRepresentation in scope and reference that, and so don't need the copy.
* Don't need to construct the IRModuleInst as is created and set on createModule call.
* Remove test code for testing serialization.
* Fix problem with release build in ir-serialize causing warning.
* Use SLANG_OFFSET_OF for offsets in non pod classes to avoid gcc/clang warning.
Give storage to integral static variables to avoid linkage problems with gcc/clang.
* Fix warnings under x86 win32 debug.
|
|
* * Change the layout of IROp such that 'main' IROps are 0-x.
* Removed MANUAL_RANGE instuction types, as no longer needed.
* Work in prog on optimizing.
* * Constant time lookup for IROpInfo
* Refactor and document a little more the IROp layout
* Mark ops that use 'other' bits
* Fix typo in definition of kIROpFlag_UseOther
|
|
* * Added support for strings in IR with IRStringLit - with storage of chars after it
* Added kIRDecorationOp_Transitory - can be used for detecting instructions constructed on stack
* Made IRConstant hashing work off type
* Fix comment that is out of date about how an instruction is determines to hold a transitory string.
|
|
* * Remove dispose from IRInst
* Use MemoryArena instead of MemoryPool
* Make all IRInst not require Dtor - by having ref counted array store ptrs that need freeing
* Increase block size - typically compilation is 2Mb of IR space(!)
* Fix issues around StringRepresentation::equal because null has special meaning.
* Don't bother to construct as String to compare StringRepresentation, just used UnownedStringSlice.
* Added fromLiteral support to UnownedStringSlice and use instead of strlen version.
* Use more conventional way to test StringRepresentation against a String.
* Fix gcc/clang template problem with cast.
|
|
* Fix typo OuptutTopologyAttribute -> OutputTopologyAttribute
First pass support for handing tesselation shaders - domain and hull.
* Added attribute PatchConstantFuncAttribute
* Added visitHLSLPatchType(HLSLPatchType* type) such that the patch type template parameters are handled
* Added IRNotePatchConstantFunc - such that the patch constant function is referenced within IR
* Added support for outputing typical tesselation attributes (although minimal validation is performed)
* Added findFunctionDeclByName
* Small improvements to diagnostic.
* Improved diagnostics and checking for geometry shader attributes.
* Added diagnostic if patchconstantfunc is not found
Handle assert failure when outputing a domain shader alone and therefore attr->patchConstantFuncDecl is not set.
* Simple script tess.hlsl to test out domain/hull shaders.
* Added url for where hull shader attributes are defined.
* Fix unsigned/signed comparison warning.
* Restore removal of fix in "Improve generic argument inference for builtins (#598)"
* Update tessellation test case to compare against fxc
The test was previously comparing against fixed expected DXBC output, but this caused problems when the test runner tried to execute the test on Linux (where there is no fxc to invoke...), and would also be a potential source of problems down the road if different users run using different builds of fxc.
The simple solution here is to convert the test to compare against fxc output generated on the fly. That test type is already filtered out on non-Windows builds, so it eliminates the portability issue (in a crude way).
I also changed the test to compile both entry points in one compiler invocation, just to streamline things into fewer distinct tests.
* Eliminate unnecessary call to `lowerFuncDecl`
In a very obscure case this could cause a bug, if the patch-constant function had somehow already been lowered (because it was called somewhere else in the code).
The call should not be needed because `ensureDecl` will lower a declaration on-demand if required, so eliminating it causes no problems for code that wouldn't be in that extreme corner case.
|
|
Fixes #581
This change adds a new parameter passing mode `__ref` to exist alongisde `in`, `out`, and `inout`.
The `__ref` modifier indicates true by-reference parameter passing (whereas `inout` is copy-in-copy-out).
This is not intended to be something that users interact with directly, but rather a low-level feature that lets us provide a correct signature for the `Interlocked*()` operations in the standard library.
Most of the support for passing what are logically addresses around already exists in the IR, so the majority of the work here is just in introducing the new type `Ref<T>` and then using it appropriately when lowering `__ref` parameters/arguments to the IR.
|
|
* render-test should not fail on HLSL compiler *warnings*
The logic in `render-test` that invokes `D3DCompile` was causing a test to fail if it produced any warnings (not just if compilation fails).
Warning output can be dealt with by the test runner, since it will compare output between runs anyway, and it is useful to be able to run something through `render-test` that compiles with warnings.
* Be more careful about deleting IR instructions
There was an `IRInst::deallocate()` method that had a precondition that the instruction should already be removed from its parent and clear out all its operands before calling, but it wasn't checking this and the few call sites weren't doing things right either.
I consolidated things on `IRInst::removeAndDeallocate()` which does all the things: removes from the parent, clear out operands, and then deallocates.
I also made sure to clear out the type operand.
This clears up some crashing issues where passes were removing instructions but those instructions would still show up as users of other instructions.
* Don't emit bitwise not for non-Boolean types
It seems like the logic in `emit.cpp` messed things up and decided that `Not` (the IR instruction that is equivalent to `!` in the AST) should emit as `!` for Boolean types and `~` for other types, but this makes no sense (e.g., `~(a & 1)` is very different from `!(a & 1)`, even when interpreted as a condition).
It seems like this logic was intended for the `BitNot` case, where `~a` and `!a` are actually equivalent for Boolean values (but a target language might not like `~a` on `bool` values).
Maybe the original plan was that the `Not` instruction should only apply to Boolean values in the first place, and that other values should be converted to `bool` (or a vector of `bool`) before applying `Not`, but even in that case the emit logic makes no sense.
This caused an actual problem for one of my test cases, so it was important to fix it now.
* Fix issue with cached resolution for overoaded operators
The basic problem was that the lookup logic was forming a key based on the *first* definition it found for the overloaded operator, but that means that when processing a prefix `++a` call we might look up the *postfix* definition of `operator++` and decide to use its opcode as the key.
This "fixes" the logic by looking for the first definition with a "compatible" definition (e.g., a `__prefix` function if we are checking a `PrefixExpr`), and then uses its opcode.
A better fix in the long run would be to make the cache just be keyed on the operator name and the "fixity" of the expression (prefix, postfix, or infix).
* Introduce an intermediate structured control-flow representation
The code previously used a single function called `emitIRStmtsForBlocks` in `emit.cpp` that would take a logical sub-graph of the CFG and emit it as high-level statements.
It would do this by recognizing operations like coniditional branches that it could turn into high-level `if` statements, etc.
The main problem with this function was that it mixed together the logic for how we restructure the program with the logic for how we emit high-level code from that structure.
This change splits those two parts of the algorithm by introducing an intermediate data structure: a tree of `Region`s, which represent single-entry regions of the CFG.
There are subclasses of `Region` corresponding to various structured control-flow constructs, and then a leaf case that wraps a single `IRBlock`.
The new function `generateRegionsForIRBlocks()` (in `ir-restructure.cpp`) now handles the restructuring work, by building one or more `Region`s to represent a sub-graph, while `emitRegion()` handles emitting HLSL/GLSL source code from a region.
Splitting things in this way opens up some opportunities for future changes:
* We can expand the set of IR control-flow constructs allowed, so long as we can still generate structure `Region`s from them, without having to mess with the emit logic (e.g., we could start to support multi-level `break` by introducing temporaries as needed). In the limit we can generate our `Region`s using something like the "Relooper" algorithm.
* We can emit to other representations while retaining the same control-flow restructuring support. E.g., if we drop the structured information from the IR, then emitting to SPIR-V for Vulkan would require us to use the strucured control-flow information from these `Region`s.
* We can do analysis that needs to understand `Region` structure. This is relevant to issue #569, which was what prompted me to start on this work. Now that we have a representation of the nesting of `Region`s, we can use it to reason about visibility of values between blocks.
During development of this change I ran into a gotcha, in that I had been assuming each IR block would map to a single `Region`, forgetting that our current lowering of "continue clauses" in `for` loops leads to them being duplicated. The `Region` representation handles this by having a linked-list struct mapping IR blocks to the `SimpleRegion`s that represent them. I added a test case that includes a `for` loop with a continue clause that is reached along multiple paths just to make sure that we continue to support that case.
The compiler output should not change as a result of this work; this is supposed to be a pure refactoring change.
* Add a pass to resolve scoping issues in generated code
Fixes #569
The basic problem arises because the structured control flow that we output in high-level HLSL/GLSL doesn't match the "scoping" rules of an SSA IR.
In particular, SSA says that a value can be used in any block that is dominated by the definition, but in the presence of `break` and `continue` statements it is easy to construct cases where a block dominates something that is not in its scope for structured control flow. Consider:
```hlsl
for(;;) {
int a = xyz;
if(a) { int b = a; break; }
int c = a;
}
int d = b;
```
This program is invalid as HLSL, because the variable `b` is referenced outside of its scope, but if we look at the CFG for this function, it is clear that the block that computes `b` dominated the block that computes `d`. IR optimizations can easily create code like this, so we need to be ready for it.
The previous change added an explicit `Region` structure to represent the structured control flow that we re-form out of the IR, and this change adds a pass that exploits the structuring information to detect cases like the above and introduce temporaries to fix the scoping issue. For example, the pass would change the earlier code block into something like:
```hlsl
int tmp;
for(;;) {
int a = xyz;
if(a) { int b = a; tmp = b; break; }
int c = a;
}
int d = tmp;
```
That is, we introduce a new `tmp` variable at a scope "above" both the definition and use of `b`, and then we copy `b` into that temporary right where it is computed, and then use the temporary instead of the original `b` at the use site.
A few details that came up during the implementation:
* Downstream compilers may get confused by code like the above, and complain that `tmp` may be used before it is initialized, even though the very definition of dominators in a CFG means we don't have to worry about it. Still, I introduced some one-off code to initialize the temporaries just to silence spurious warnings coming from fxc.
* We need to be careful not to apply this logic to "phi nodes" (the parameters of basic blocks) since they will already be turned into temporaries by the emit logic, and trying to introduce temporaries with this pass led to broken code (I still need to investigate why). It may be that a future version of this pass should also take the code out of SSA form, so that we can introduce both kinds of temporaries in a single pass (and maybe eliminate some unnecessary variables by doing basic register allocation).
There is another transformation that could fix some issues of this kind, by moving code out of a structured control-flow construct and to the "join point" after it. For example, we could turn our loop from the start of this commit message into:
```hlsl
for(;;) {
int a = xyz;
if(a) { break; }
int c = a;
}
int b = a;
int d = b;
```
Moving the definition of `b` to after the loop is possible because there is no way to get out of the loop without executing that code anyway. Now the scoping issue for `d`'s use of `b` has gone away, but of course we've introduced a *new* scoping issue for `a`, when it gets used by `b`.
Adding a pass to re-arrange control flow like this could reduce the cases where we have to apply the current pass, but it wouldn't eliminate them entirely. That means such a pass can be deferred to future work.
This change includes a test case the reproduces the original issue, so that we can confirm the fix works.
|
|
(#573)
Fixes #568
The problem occurs when an entry point declares multiple `out` parameters:
```hlsl
void myVS( out float4 a : A, out float4 b : B )
{
...
a = whatever;
b = somethingElse;
...
if(done)
{
return; // explicit return
}
...
// implicit return
}
```
Slang translates code like this by introducing a GLSL global `out` parameter for each of `a` and `b`, rewriting the logic inside the entry point to use a local temporary instead of the real parameters, and then assigning from the locals to the globals at every `return` site:
```glsl
out vec4 g_a;
out vec4 g_b;
void main()
{
// insertion location (1)
vec4 t_a;
vec4 t_b;
...
t_a = whatever;
t_b = somethingElse;
...
if(done)
{
// insertion location(2)
g_a = t_a;
g_b = t_b;
return; // explicit return
}
...
// insertion location (3)
g_a = t_a;
g_b = t_b;
// implicit return
}
```
Note that there are three different places (for this example) where code gets inserted to make the translation work. We insert declarations of local variables at the top of the function, and then insert the copy from the temporariesto the globals at each `return` site (implicit or explicit).
The bug in this case was that the pass was setting the insertion location to (1) outside of the loop for parameters, so that when it was done with `a` and moved on to `b`, it would end up inseting the temporary `t_b` at the last location used (location (3) in this example), and this would result in invalid code, because `t_b` gets used before it is declared.
This bug has been around for a while, but it has largely been masked by the fact that so few shaders use multiple `out` parameters, and also because Slang's SSA-ification pass would often be able to eliminate the local variable anyway, so that the bug never bites the user. The reason it surfaced now for a user shader was because we introduced `swizzledStore`, which currently inhibits SSA-ification, so that some temporaries that used to get eliminated are now retained so that they can break things.
The fix in this case is small: we use the existing `IRBuilder` only for insertions at location (1) and construct a new builder on the fly for all the insertions at `return` sites. I have not included a test case yet, because our end-to-end Vulkan testing is not yet ready, so this may regress again in the future.
|
|
* Cleanups around behavior when the compiler fails
* Add another case where we try to `noteInternalErrorLoc()` if an exception in thrown. This one is the in the logic for emitting an IR instruciton. This could be improved by adding another layer at the function level (as a catch-all for instructions with no location), but something is better than nothing.
* Change a bunch of `assert()`s over to `SLANG_ASSERT()`s, so that we can theoretically take more control over them (e.g., make release builds with asserts enabled)
* Some other small cleanups around the assertions we perform.
In the survey I made, I didn't really see many obvious "smoking gun" cases where we could produce a significantly better error message for some of the unimplemented/unexpected paths, other than to actually implement the missing functionality.
* fixup
|
|
The emit logic already had an idea of when an instruction should be "folded" it its use site(s), and this change just expands on that logic to try to be more aggressive. The basic idea is that instead of outputting this:
```hlsl
float4 _S3 = a_0 + b_0;
float4 _S4 = c_0 * _S3;
d_0 = _S4;
```
we can hopefully output something like this:
```hlsl
d_0 = c_0 * (a_0 + b_0);
```
The way this works is that after dealing with the various special cases that decide an instruction `I` must/cannot be folded in, we look and see if it has the following properites:
* `I` has no side effects
* `I` has a single user, `U`
* `I` and `U` are in the same block (and `I` comes before `U` in that block)
* for every instruction `X` between `I` and `U` (exclusive), `X` has no side effects
If all of these conditions are true, then `I` can be folded in as a sub-expression when we emit `U`.
This change doesn't affect most of our test output, but there is still a single test with SPIR-V output that we compare against a GLSL baseline, and so that baseline had to be modified to match the GLSL we now generate.
Similar to #547, this change is not meant to provide a complete solution, but rather to take a concrete but low-risk step toward improving our output. Opportunities to improve the results further include:
* We can/should ensure that when outputting sub-expressions we keep extra parentheses to a minimum. The old logic for emitting from an AST had support for "unparsing" expressions with minimal parentheses, and we should try to do the same. This can be error-prone, because omitting parentheses can lead to silent failures, so it must be done carefully.
* We could try to be more aggressive about detecting what operations might have side effects. The most interesting case is function calls, where we should try to check if the callee is a function known to be side-effect-free. We could start by annotating most builtin functions with an attribute/decoration that indicates freedom from side effects. Deriving this attribute for user functions could be interesting, but we'd have to be careful since "nontermination" is technically a side effect.
* We could try to be more aggressive about determining what side effects in instructions `X` are "safe" for the instruction `I` to move across. For example, if `I` is a load from variable `a` and `X` is a store to variable `b`, then that would seem to be safe. This starts to get into issues of instruction scheduling, though, and that is probably beyond what we want Slang to be doing.
|
|
The basic idea here is that when lowering to the IR, the front-end will attach a "name hint" to the IR instruction(s) that represent a given declaration, and then the passes that work on the IR will try to preserve and propagate those names, and then finally the emit logic will use them in place of mangled or unique names when available.
This change does *not* try to deal with the issues that arise when we try to use those variable names in the output without any modification (e.g., handling cases where they might clash with keywords or builtins in the target language). Instead, it tries to establish baseline behavior for propagating through names, so that a later change can concentrate on the issue of using those names exactly when it is legal to do so.
In order to avoid issues around the name "hints" causing problems we take two main steps:
1. We "scrub" each name to reduce it down to the allowed set of identifier characters in C-like languages, and then ensure that it doesn't do things that would be illegal in some downstream languages (e.g., consecutive underscores are not allowed in GLSL) or could clash with Slang's mangled names. This process isn't guaranteed to give distinct results for distinct inputs (it isn't a mangling scheme, after all).
2. We generate a unique ID for each occurence of a given name and always use that as a suffix. This means that even if a name happens to overlap with a keyword (if you somehow have a variable named `do`), we will still add a suffix that makes it not a problem (we'd output `do_0` which is fine).
The logic for generating these names is mostly straightforward. For simple variables, we use their given name directly, while for other declarations we try to form a name that includes their parent declaration (e.g. `SomeType.someMethod`).
Various IR passes need to propagate or preserve this information. The most interesting is type legalization, when we take a variable with an aggregate type and split some of the fields out into their own variables. In that case we generate "dotted" names like `someVar.someTexture` and rely on the emit logic to turn that into `someVar_someTexture`.
During SSA generation, if we are promoting a variable to SSA temporaries, we will try to propagate the name of the variable over to the temporaries (unless they already have a name from some other place). The same applies to block parameters ("phi nodes").
Many of the test changes need their expected output to be updated for this change. Luckily in most cases the output has gotten easier to understand.
|
|
This was a known issue in our IR representation, which was now biting a user. The basic problem is that in code like the following:
```hlsl
RWStructureBuffer<float4> buffer;
...
buffer[index].xz = value;
```
we ideally want to be able to reproduce the original HLSL code exactly, but that requires directly encoding the way that this code writes to two elements of a vector, but not the others.
The currently lowering strategy we had produced IR something like:
```hlsl
float4 tmp = buffer[index];
tmp.xz = value;
buffer[index] = tmp;
```
That transformation might seem valid, but it has some big problems:
* It generates UAV reads that are not needed, which could impact performance
* It performs read-modify-write operations on memory that the programmer didn't explicitly write, which could create data races
The fix here is somewhat obvious: if the "base" of a swizzle operation on a left-hand side resolves to a pointer in our IR, then we can output a "swizzled store" instead of the read-modify-write dance. We currently keep the read-modify-write around since it is potentially needed as a fallback in the general case.
Along the way I also tried to make sure that we handle the case where we have a swizzle of a swizzle on the left-hand side:
```hlsl
buffer[index].xz.y = value;
```
That code should behave the same as `buffer[index].z = value`. I am currently detecting and cleaning up this logic in the lowering path for `SwizzleExpr`, because that is the only place in the lowering logic that "swizzled l-values" currently get created.
|
|
The problem here arises when multiple entry points are compiled in one pass. Each entry point has its own arguments for global generic parameters, and leads to us emitting a `bindGlobalGenericParameter(p, val)`. But once the first entry point's substitutions are applied, the second entry point's code gives `bindGlobalGenericParameter(val, val)` and the compiler crashes (in debug builds) because `val` is not a global generic parameter.
This change just applies a quick fix. If we see `bindGlobalGenericParameter(x,y)` during specialization, and `x` is not a global generic parameter, then we skip it.
The right long-term fix is to change the compiler's representation of global generic arguments so that they live on a `CompileRequest` instead of an `EntryPointRequest`. That is a more significant change (with impact on the public API), so I'm inclined to leave it as a cleanup for another day (given that no customers are using global generic parameters today).
|
|
Fixes #519
The code was leaving out the `default` label from the successor list,
which would break any passes that require an accurate CFG (with the big
one right now being the SSA-formation pass).
|
|
The cross-compilation logic for geometry shaders would look through the user's entry point for calls like `someStream.Emit<X>(val)` and turn that into `outputGlobals = val; EmitVertex();`.
It was recognizing the `Emit()` calls by looking at the callee in all `call` instructions and seeing if it was registered to lower to GLSL as `EmitVertex()`. The logic was try to look "through" `specialize` instructions (to deal with the `<X>` bit in the call above), but this wasn't updated for the new IR encoding where the first operand to a `specialize` is the generic being specialized, and not the function nested inside it.
The fix here is to properly look through both `specialize` instructions and generics. This is kind of a gross operation and we've done things like it in a few places, so it might be something we try to extract into a utility function in the future.
|
|
The basic problem was that the lowering logic was constructing (more or less) `Ptr<@GroupShared X>` instead of `@GroupShared Ptr<X>`.
There were also problems with passes not propagating through rates that should have been (e.g., legalization).
I've added a test case to actually validate `groupshared` support.
|
|
* Introduce an IR-level type system
Up to this point, the Slang IR has used the front-end type system to represent types in the IR.
As a result (but ultimately more importantly) the IR representation of generics and specialization has used AST-level concepts embedded in the IR.
For example, to express the specialization of `vector<T,N>` to a concrete type `float` for `T`, we needed an IR operation that could represent the specialization, with operands that somehow represented the type argument `float`.
The whole thing was very complicated.
The big idea of this change is to introduce a new representation in which types in the IR are just ordinary instructions, so that using them as operands makes sense. The hierarchy of IR types closely mirrors the AST-side hierarchy for now, and that will probably be something we should maintain going forward.
In order to make these changes work, though, I also had to do major overhauls of things like the way substitutions are performed, how we check interface conformances, the way lookup through interface types is done, etc. etc. This is a big change, and unfortunately any attempt to summarize it in the commit message wouldn't do it justice.
* Fix 64-bit build warning
* Fix up some clang warnings/errors
|
|
* Don't emit interpolation modifiers on GLSL fields
The previous change that started passing through interpolation modifiers didn't account for the fact that the GLSL grammar doesn't allow interpolation modifiers on `struct` fields, so we shouldn't emit them in that case (and our legalization strategy for GLSL guarantees that varying input will never use a `struct` type anyway).
* Try to handle SV_Position semantic on GS input
HLSL allows `SV` semantics to be used for ordinary inter-stage dataflow in some cases (e.g., a VS can output `SV_Position` and it can then be read from a GS).
GLSL allows similar things with `gl_Position`, but there are some wrinkles.
One fix here is to correctly identify that `gl_FragCoord` should only be used as the translation of `SV_Position` for a fragment shader input (and not in the general case of *any* input).
The other "fix" here is a kludge to handle the fact that the right translation for a GS input is not to an array called `gl_Position`, but to the syntax `gl_in[index].gl_Position` (array-of-structs style). I am doing this by attaching a custom decoration to the global variable we create for `gl_Position` and then intercepting it during code emit. I'm not proud of this, and would like to do something better given time.
* Fix GLSL output for matrix-scalar multiplication
The output logic was assuming that any use of `operator*` in the input code that yields a value of matrix type must be translated to a `matrixCompMult()` call in GLSL, but this should really only apply if both of the *operands* are matrices (not just based on the result type). As a result matrix-times-scalar operations were emitting a call to `matrixCompMult()` and GLSL was complaining because it won't implicitly promote scalars to matrices.
|
|
* Prevent duplicating global values during specialization.
* Fixup checking code
|
|
The main practical change here is that things that used to be `IRValue`s, like literals, are now being expressed as instructions in the global scope.
In order to validate that things are actually being handled correctly, this change introduces an explicit "validation" pass that can be run on the IR to check for different invariants (although it doesn't check many of the important ones right now). I've left the validation pass turned off by default, but with a command-line flag to enable it. We may want to make it be on by default in debug builds, just to keep us honest. The main invariant for the moment is that when on IR instruction is used as an operand to another, it had better come from the same IR module.
Some of the existing passes were violating this rule, in particular when it came to cloning of witness tables related to global generic parameter substitution. Those features can in theory be handled better now by allowing `specialize` instructions at other scopes, but I didn't want to over-complicate this change, so I make just enough fixes to ensure that these steps always clone witness tables they get from the "symbols" on an IR specialization context. In order for this to work when recursively specializing, I had to ensure that the logic for generic specialization had a notion of a "parent" specialization context that it would fall back to to perform cloning when necessary.
This change keeps the logic that was caching and re-using the instructions for literal values within a module, but adds some logic that isn't really being tested right now for picking the right parent instruction to insert a constant instruction into. This logic doesn't trigger right now because all of the cases we are using it on have zero operands (and so they always get "hoisted" to the global scope), but eventually for things like types we want to be able to support instructions with operands (e.g., `vector<float, 4>`) and handle the case where some of those operands come from different scopes (e.g., when nested inside a generic).
The final change here is mostly cosmetic: the `IRBuilder` is now more abstract about where insertion occurs: it tracks a single `IRParentInst` to insert into, and then an optional `IRInst` to insert before. In the common case, that parent is an `IRBlock`, but it could conceivably also be the global scope, or a witness table, etc. Use sites where we used to change those fields directly now use distinct methods `setInsertInto(parent)` and `setInsertBefore(inst)` which capture the two cases we care about. Accessors are also defined to extract the current block (if the current parent is a block), and the current "function" (global value with code, if the current parent is a global value with code, or a block inside one).
With this work in place, it should be possible for a follow-on change to start putting `specialize` instructions at the global scope and thus clean up some of the on-the-fly specialization work. This work should also help with some of the requirements around a distinct IR-level type system and more explicit generics.
|
|
* IR: "everything is an instruction"
This change tries to streamline the representation of the IR in the following ways:
* Every IR value is an instruction (there is no `IRValue` type any more)
* All IR values that can contain other values share a single base (`IRParentInstruction`)
* Dynamic casts to specific IR instruction types can be accomplished with a new `as<Type>(inst)` operation, that uses the IR opcode to implement casts.
The biggest change in terms of number of lines is getting rid of `IRValue`. The diff here could probably be smaller if I'd just done `typedef IRInst IRValue;`.
Along the way I also renamed the `getArg`/`getArgs`/`getArgCount` combination over to `getOperand`/`getOperands`/`getOperandCount` to avoid being confusing when we have something like a `call` instruction where the "arguments" of the call don't line up with the operands of the instruction.
I also tried to clean up the representation of lists of child instructions to try to make it easier to iterate over them with C++ range-based `for` loops. Developers still need to be careful about mutating the contents of a block while iterating over it in this fashion (e.g. if you remove the "current" element, the iteration will end prematurely).
Probably the thorniest change here is that parameters are now just represented as the first N instructions in a block, which means:
* We need to perform a linear search to find the end of the parameter list. This is probably not often a problem, because usually you would be iterating over the parameters anyway, and that will be linear in the number of parameters.
* Algorithms that iterate over a block either need to ignore parameters, treat parameters just like other instructions, or somehow cleave the list into the range of parameters, and the range of "ordinary" instructions (which involves the same linear search above).
* When inserting into a block, we need to be careful not to insert instructions at invalid locations (e.g., insert a temporary before the parameters, or insert a parameter in the middle of the code). I can't pretend that I've handled the details of that here. (This is no different than having to make the same adjustments for phi nodes in a typical SSA representation)
* One possible future-proof approach is to implement a pass that sorts the instructions in a block so that parameters always come first. That would let us implement passes without caring about this detail, and then clean up right before any pass that cares about the relative order of parameters and other instructions.
The current change is missing any work to make literals and other instructions that used to be `IRValue`s properly nest inside of their parent module. Right now these instructions are just left unparented, and may actually end up being shared between distinct modules. Fixing that will need a follow-up change. The biggest challenge there is that it introduces instructions at the global scope that aren't `IRGlobalValue`s.
This change doesn't try to take advantage of any of the new flexibility (e.g., by nesting `specialize` instructions inside of witness tables). The goal is to do exactly what we were doing before, just with a different representation.
* Warning fix
|
|
These changes are related to getting a first Slang geometry shader to translate to GLSL. There are some unrelated cross-compilation fixes in here as well.
* Add direct support to shader parameter layout for GS output streams, so that they are reflected as a container type
* Fix the declarations of the `SampleCmp` methods; they should always return `float`, independent of the nominal element type of the texture.
* Fix up our handling of `__target_intrinsic` modifiers, so that we are a little bit more careful in how we detect something as being just a simple name replacement (e.g., `__target_intrinsic(glsl, "foo")` should make us output `foo(original, args, here)`) vs. a custom expression (e.g., `__target_intrinsic(glsl, "bar+1")` should output `bar+1` and not use any arguments, even without any `$` substitutions).
* Don't emit the `[unroll]` modifier when outputting GLSL. Eventually we need to fully unroll loops for GLSL output anyway.
* Inspect th entry point parameter list (from the layout information) when emitting a GS, so that we can write out the correct `layout` modifiers for input primitive type and output primitive topology.
* Add a new case to `ScalarizedVal` to handle cases where an HLSL system value needs to map to a GLSL built-in variable with a slightly different type (e.g., `SV_RenderTargetArrayIndex` is a `uint` while `gl_Layer` is an `int`). For now this is only hanlding trivial cases (where a direct cast can achieve the result we want), but eventually it might need to handle things like conversion between arrays and vectors.
* This is mostly just the infrastructure for the feature, and the actual enumeration of the correct types for all the system values is still to be done.
* Handle a few more cases in assignment between `ScalarizedVal`. In particular, deal with cases where `materializeValue` is called on a tuple that has an array type, so that we need to construct the individual array elements.
* Add translation for GS output stream `Append()` and `RestartStrip()`
* Note that the translation of `Append()` seems to ignore its argument; this is because we desugar the operation during legalization for GLSL (see next item)
* When legalizing for GLSL, detect an entry point parameter that is a GS stream, and translate it into `out` variables for its element type, and then rewrite any calls to `Append()` in the body of the entry point to be preceded by assignment to those variables. This works in tandem with the above translation of HLSL `Append()` calls into GLSL `EmitVertex()` calls.
* We are detecting calls to `Append()` in a slightly hacky way, by looking at decorations on the callee to make sure that it is a function that is determined to translate to `EmitVertex()`.
* Right now we aren't handling calls to `Append()` in other functions. It wouldn't be hard in principle to walk all the functions in the module and apply the translation (assuming we don't want to start supporting multiple output streams), but this wouldn't handle the passing of the GS output stream between functions. (This points out that there is a need for an additional type legalization pass that desugars away parameters of types that aren't actually meaningful on the target).
|
|
This allows us to get rid of `IRGlobalValue::dispose()`.
|
|
* Initial work on validating "constexpr"-ness in IR
The underlying issue here is that certain operations in the target shading languages constrain their operands to be compile-time constants. A notable example is the optional texel offset parameter to the `Texture2D.Sample` operation.
When calling these operations in GLSL, the user is required to pass a "constant expression," and any variables in that expression must therefore be marked with the `const` qualifier (and themselves be initialized with constant expressions). Any GLSL output we generate must of course respect these rules.
When calling these operations in HLSL, the user is not so constrained. Instead, they can pass an arbitrary expression, which may involve ordinary variables with no particular markup, and then the compiler is responsible for determining if the actual value after simplification works out to be a constant. In some cases, the requirement that a value be constant might actually trigger things like loop unrolling. Also, it is okay to use a function parameter to determine such a constant expression, as long as the argument turns out to be a constant at all call sites.
The way we have decided to tackle these challenges in Slang is that we we propagate a notion of `constexpr`-ness through the IR. This is currently being tackled in `ir-constexpr.cpp` with a combination of forward and backward iterative dataflow:
* When the operands to an instruction are all `constexpr`, and the opcode is one we believe can be constant-folded, then we infer that the instruction *can* be evaluated as `constexpr`
* When instruction is required to be `constexpr`, then we infer that all of its operands are also required to be `constexpr`.
If this process ever infers that a function parameter is required to be `constexpr`, then we might have to continue propagation at all the call sites to that function.
If after all the propagation is done, there are any cases where an instruction is *required* to be `constexpr`, but it *can't* be `constexpr` (we weren't able to infer `constexpr`-ness for its operands), then we issue an error.
This implementation encodes the idea of `constexpr`-ness in the IR as part of the type system, using a simplified notion of rates. This change adds a `RateQualifiedType` that can represent `@R T`, and then introduces a `ConstExprRate` that can be used for `R`. Many accessors for the type information on IR nodes were updated to distinguish when one wants the "full" type of an IR value (which might include rate information) vs. just the "data" type.
A `constexpr` qualifier was added in the front-end, and is being used to decorate the texel offset parameter for `Texture2D.Sample`. Lowering from AST to IR looks for this qalifier and infers when a function parameter must be typed as `@ConstExpr T` instead of just `T`.
There are lots of limitations and gotchas in the implementation so far:
* The `@ConstExpr` rate is the only one added in this change, but it seems clear that the conceptual `ThreadGroup` rate that was added to represent `groupshared` should probably get folded into the representation.
* I'm not 100% pleased with how many places in the IR I have to special-case for rate-qualified types. At the same type, pulling out rate as a distinct field on `IRValue` would probably require that we pay attention to rate everywhere.
* I've added a test case to show that we can issue errors when users fail to provide a constant expression for the texel offset, but the actual error message isn't great because it doesn't indicate *why* a constant expression was required. Realistically the "initial IR" should contain a few more decorations we can use to relate error conditions back to the original code (even if this is in a side-band structure).
* I've added a test case that is supposed to show that we can back-propagate `constexpr`-ness to local variables, and I've manually confirmed that it works for Vulkan/SPIR-V output, but the level of Vulkan support in `render_test` today means I can't enable the test for check-in.
* While I'm attempting to propagate `@ConstExpr` information from callees to callers, I haven't implemented any logic to specialize callee functions based on values at call sites.
* In a similar vein, there is no handling of control-flow dependence in the current code. If we infer that a phi (block parameter) needs to be `@ConstExpr`, then it isn't actually enough to require that the inputs to the phi (arguments from predecessor blocks) are all `@ConstExpr` because we also need any control-flow decisions that pick which incoming edge we take to be `@ConstExpr` as well.
* As a practical matter, implicit propagation of `@ConstExpr` from a function body to a function parameter should only be allowed for functions that are "local" to a module. Any function that might be accessed from outside of a module should really have had its `@ConstExpr` parameter marked manually, and our pass should validate that they follow their own rules. Right now we have no kind of visibility (`public` vs `private`) system, so I'm kind of ignoring this issue.
While that is a lot of gaps, this is also just enough code to get the Falcor MultiPassPostProcess example working, so I'm inclined to get it checked in.
* Fixup: missing expected output for test
* Fixup: disable test that relies on [unroll] for now
|
|
1. reorder destruction order of several key classes to avoid using deleted IR objects when destroying Types
2. remove Session::canonicalTypes and make each Type own a RefPtr to the canonicalType, to allow types to be destroyed along with each IRModule it belongs to.
|
|
1, make IRModule class own a memory pool for all IR object allocations
2. For now, we allow IR objects to own other (externally) heap allocated objects, such as String, List and RefPtrs by tracking all IR objects that has been allocated for the IRModule in a list named `IRModule::irObjectsToFree`. and call destructor for all these objects upon the destruction of the IRModule. In the long term, we should eliminate the use of all these externally allocated types in IR system and get rid of this tracking and explicit destructor calls.
3. remove non-generic `createValueImpl` functions and retain only generic versions in IRBulider so we can properly call the constructor of the IR types to set up virtual tables correctly for destructor dispatching.
4. add `MemoryPool` class for allocation of the IR objects.
5. Make sure we are disposing IRSpecContexts when we are done with the specialized IR module.
6. Add `_CrtDumpMemoryLeaks()` calls to check memory leaks upon destruction of a Slang session. If we are to support multiple sessions at a time, this call should probably be replaced with the more advanced MemoryState versions of the memory leak checker.
|
|
The approach here isn't ideal. We already have a pass that transforms HLSL varying input/output types into GLSL global variables. This change makes it so that when those inputs/outputs have system-value semantics, we generate a global variable declaration with the appropriate `gl_*` name (leaving the type the same for now). Later, when emitting code, we just skip emitting declarations for declarations with mangled names that start with `gl_*`.
A more complete implementation will be needed later on, which handles cases where the translation requires types to be changed (so that conversion code needs to be inserted).
|
|
* Fix bugs around IR legalization of GLSL input/output
- Add case to handle assignment of one `ScalarizedVal::Flavor::address` to another (still need to make sure we are handling all the possible cases there)
- Revamp logic for creating global variable declarations for varying inputs/outputs.
- Actually handle creating array declarations (not sure if binding locations will be correct)
- Properly deal with offsetting of locations for nested fields
- Only create varying input/output layout information as needed for the separate `in` and `out` variables we create to represent a single HLSL `inout` varying
* During SSA generation, recursively remove trivial phis
This is actually written up in the original paper I used as a reference, but I hadn't implemented the case yet.
When you eliminate one phi as trivial (because its only operands were itself and at most one other value), you might find that another phi becomes trivial (because it had this phi as an operand, but now it will have the other value...).
The one thing that made any of this tricky is that our "phi" nodes are really block parameters, and thus they don't technically have operands (`IRUse`s). The `IRUse`s for each phi were being tracked in a separate array, and had their `user` field set to null.
With this change, I set their `user` to be the corresponding `IRParam` for the phi (and that means I changed `IRParam` to inherit from `IRUser` even though it shouldn't really be required).
* Re-build SSA form after specialization/legalization
The main reason to do this is that legalization might scalarize types, and thus might allow us to clean up resource-type local variables that we were not able to clean up when they were part of an aggregate.
Note: we shouldn't really need to do this, because the front-end should actually be guaranteeing that types that include resources are used in "safe" ways, but we currently don't have the analyses required to support that.
* Give an error message if we get GLSL input
The API and command-line interface still recognize and nominally support GLSL input files, because they need to be supported in the "pass-through" mode.
This change just adds an error message if we encounter a GLSL input file in anything other than "pass-through" mode.
|
|
The basic problem here is that when unlinking an `IRUse` from the linked list of uses, there were several cases where I was failing to set the `prevLink` field of the next node to match the `prevLink` field of the node being removed. That doesn't show up when walking the linked list of uses forward, but it breaks it whenever you have subsequent unlinking operations.
This change fixes the bugs of that kind I could find, and also adds a debug validation method to try to avoid breaking it again. I also made more access to `IRUse` go through accessor methods rather than using fields directly, to try to avoid this kind of error. I stopped short of making anything `private`, because I tend to find that it creates more hassles than it avoids.
A few other fixes along the way:
- Made the `List<T>` type default-initialize elements when you resize it. I hadn't realized we weren't doing that.
- Add a standalone `dumpIR(IRGlobalValue*)` so help when debugging issues.
|
|
* Basic IR support for `static const` globals
Our strategy for lowering global *variables* can fall back to putting their initialization into a function, but that isn't really appropriate for global constants (it also isn't appropriate for arrays, but we'll need to deal with that seaprately).
This change adds a distinct case for global constants (rather than treating them as variables), and forces the emission logic to always emit them as a single expression.
Doing this makes assumptions about how the IR for these constants gets emitted (and what optimziations might do to it).
In order to make things work, I had to switch the handling of initializer-list expressions to not be lowered via temporaries and mutation (since that isn't a good fit for reverting to a single expression).
I've added a single test case to ensure that this works in the simplest scenario. My next priority will be to see if this unblocks my work in Falcor.
* Fixup: bug fixes
|
|
* Re-define deprecated compile flags
By including these flags in the header file, with a value of zero, we can allow some existing code to compile even after the major changes to the implementation.
* The `SLANG_COMPILE_FLAG_NO_CHECKING` option will effectively be ignored, since checking is always enabled.
* The `SLANG_COMPILE_FLAG_SPLIT_MIXED_TYPES` option will now act as if it is always enabled (and indeed some of the code has been relying on this flag being set always).
* Make subscript operators writable for writable textures
This even had a `TODO` comment saying that we needed to fix it, and now I'm seeing semantic checking failures because we didn't define these and so we find assignment to non l-values.
* Fix definitions of any() and all() intrinsics
These should always return a scalar `bool` value, but they were being defined wrong in two ways:
1. They were using their generic type parameter `T` in the return type
2. They were returning a vector in the vector case, and a matrix in the matrix case.
This change just alters the return type to be `bool` in all cases.
* Fix bug in SSA construction
When eliminating a trivial phi node, it is possible that the phi is still recorded as the "latest" value for a local variable in its block.
When later code queries that value from the block (which can happen whenever another block looks up a variable in its predecessors), it would get the old phi and not the replacement value.
I simply added a loop that checks if the value we look up is a phi that got replaced, and then continues with the replacement value (which might itself be a phi...). A more advanced solution might try to get clever and have the map itself hold `IRUse` values so that we can replace them seamlessly.
* Simplify IR control flow representation
This change gets rid of various special-case operations for conditional and unconditional branches, and instead requires emit logic to recognize when a direct branch is targetting a `break` or `continue` label.
The new approach here isn't perfect, but it seems beter than what we had before, because it can actually work in the presence of control-flow optimizations (including our current critical-edge-splitting step).
* Load from groupshared isn't groupshared
When loading from a `groupshared` variable, the resulting temporary shouldn't have the `groupshared` qualifier on it.
This might eventually need to generalize to a better understanding of storage modifiers in the IR, but I don't really want to deal with that right now.
* Don't emit references to typedefs in output code
Now that we are using the IR for all codegen, we shouldn't be dealing with surface-level things like `typedef` declarations in the output code; just use the type that was being referred to in the first place.
* Fix floating-point literal printing for IR
The IR was calling `emit()` instead of `Emit()` (we really need to normalize our convention here), and was implicitly invoking a default constructor on `String` that takes a `double` (that constructor should really be marked `explicit`), and which doesn't meet our requirements for printing floating-point values.
* Fix error when importing module that doesn't parse
We already added a case to bail out if semantic checking fails, but neglected to add a case if there is an error during parsing of a module to be imported.
Note: this logic doesn't correctly register the module as being loaded (but still in error), so users could see multiple error messages if there are multiple `import`s for the same module.
* Improve error message for overload resolution failure
- Drop debugging info from the candidate printing
- Add cases to print `double` and `half` types properly
* Fixup: switch loopTest to ifElse in expected IR output
|
|
* Generate SSA form for IR functions
The basic idea here is simple: in the front-end after we have lowered the AST to initial IR we will apply a set of "mandatory" optimization passes. The first of these is to attempt to translate the all functions into SSA form so that they are amenable to subsequent dataflow optimizations. Eventually, the mandatory optimization passes would include diagnostic passes that make sure variables aren't used when undefined, etc.
Just doing basic SSA generation already cleans up a lot of the messiness in our IR today, because constructs that used to involve many local variables can now be handled via SSA temporaries.
The implementation of SSA generation is in `ir-ssa.cpp`, and it follows the approach of Braun et al.'s "Simple and Efficient Construction of Static Single Assignment Form." I used this instead of the more well-known Cytron et al. algorithm because Braun's algorith mis very simple to code, and does not require auxiliary analyses to generate the dominance frontier.
The main wrinkle in our SSA representation right now is that instead of using ordinary phi nodes, we instead allow basic blocks to have parameters, where predecessor blocks pass in different parameter values. This encodes information equivalent to traditional phi nodes, but has two (small) benefits:
1. There is no fixed relationship between the order of phi operands and predecessor blocks, so we don't have to worry about breaking the phis when we alter the order in which predecessors are stored. This is important for us because predecessors are being stored implicitly.
2. It is easy to operationalize a "branch with arguments" either when lowering to other languages, or when interpreting the IR. A branch with arguments is implemented as a sequence of stores from the arguments to the parameters of the target block (very similar to a call), followed by a jump to the block.
Relevant to the above, this change also adds an interface for enumerating the predecessors or successors of a block in our CFG. Rather than use an auxliary structure, we directly use the information already encoded in the IR:
* The sucessors of a block are the target label operands of its terminator instruction. In our IR this is a contiguous range of `IRUse`s, possible with a stride (to account for the way `switch` interleaves values and blocks).
* The predecessors of a block are a subset of the uses of the block's value. Specifically, they are any uses that are on a terminator instruction, and within the range of values that represent the successor list of that instruction.
One important limitation of the "blocks with arguments" model for handling phis is that it is really only convenient to stash extra arguments on an unconditional terminator instruction. This change works around this prob lem by breaking any "critical edges" - edges between a block with multiple successors and one with multiple predecessors. We assume that "phi" nodes will only ever be needed on a block with multiple predecessors, and because critical edges are broken, each of these predecessors will then have only a single successor, so its branch instruction can handle the extra arguments.
This change introduces a notion of an "undefined" instruction in the IR. This is handled as an instruction rather than a value because I anticipate that we will want to distinguish different undefined values when it comes time to start issuing error messages (those messages will need to point to the variable that was used when undefined).
* Fix expected test output.
Another change was merged that enabled the `glsl-parameter-blocks` test, and its output is affected by our IR optimization work.
|
|
The standard library already has a bunch of these decorations, since they were added to support Slang->Vulkan codegen on the AST-to-AST path. This change makes the IR code generator able to exploit the modifiers so that we pick up a bunch of Vulkan support "for free" in the short term.
The basic change is in `lower-to-ir.cpp` where we copy over any `TargetIntrinsicModifier`s to become `IRTargetIntrinsicDecoration`s with the same information. We then need a bit of logic in `ir.cpp` to make sure we clone them as needed.
The core work of using the modifiers is in `emit.cpp`, where I basically just copy-pasted the existing logic that applied in the AST path (all the AST-related code there is dead, and we should clean it up soon).
The big change that comes with this logic is that when dealing with a member function, the numbering of the argument used in the intrinsic definition string changes, so that `$0` refers to the base object (whereas before the base object was looked up via the base expression of a `MemberExpr` used for the function). This requires a bunch of the definitions in the library to be updated; hopefully I caught them all.
For kicks, I've re-enabled a cross-compilation test just to confirm that we are generating valid SPIR-V for code that performs texture-fetch operations. I don't expect us to keep that test enabled as-is in the long term, though, because it would be much better to instead use render-test to do the same thing. Alas, beefing up the Vulkan support in render-test is an outstanding work item, and I didn't want to pollute this change with more work along those lines.
|
|
If we don't find the generic we expect in the first pass during IR specialization, then we check for the special case where we are trying to specialize something from a generic extension, using the type being extended.
We assume that the generic parameter lists match (that part is the huge hack), and collect the arguments as if they were for the extension instead of the type.
This will break when/if we ever have generic extensions with parameter lists that don't match the type being extended.
|
|
`lookup_witness_table` instruction. (#376)
|
|
1. allow spReflection_FindTypeByName to accept arbitrary type expression string
2. allow const int generic value to be used as expression value, and as array size
3. various bug fixes in witness table specialization / function cloning during specializeIRForEntryPoint to avoid creating duplicate global values, not copying the right definition of a function from the other module, not cloning witness tables that are required by specializeGenerics etc.
|
|
- support overloaded generic function. this involves adding a new expression type, `OverloadedExpr2` to hold the candidate expressions for the generic function decl being referenced.
- make BitNot a normal IROp instead of an IRPseudoOp
- make sure we clone the decorations of parameters when cloning ir functions
- propagate geometry shader entry point attributes (`[maxvertexcount]` and `[instance]`) through HLSL emit
- IR emit: handle geometry shader entry-point parameter decorations, such as 'triangle'.
- IR emit: treat geometry shader stream output typed ir value as `should fold into use`.
|
|
1. prevent cyclic lookups when an interface inherits transitively from itself.
2. in `createGlobalGenericParamSubstitution`, create a default substitution for the base type declref before using it to lookup the witness table.
|
|
|
|
This commit is a bunch of quick hacks to get transitive interfaces to work. The idea is for each concrete type we create one giant witness table that contains entries for all the transitively reachable interface requirements, and then create one copy of that witness table for each interface it implements.
`DoLocalLookupImpl` now also looks up in inherited interface decles when looking up for a symbol in an interface decl.
When visiting `InheritanceDecl` in `lower-to-ir`, create copies of the giant witness table for each transitively inherited interface, so that these witness tables can be found later when the IR is specialized.
Re-enable the `copy all witness tables` hack in `specializeIRForEntryPoint` to ensure those transitive witness tables are copied over.
|
|
|
|
fixes #362
|
|
This commit changes the type of `DeclRefBase::substitutions` from `RefPtr<Substitutions>` to `SubstitutionSet`, which is a new type defined as following:
```
struct SubstitutionSet
{
RefPtr<GenericSubstitution> genericSubstitutions;
RefPtr<ThisTypeSubstitution> thisTypeSubstitution;
RefPtr<GlobalGenericParamSubstitution> globalGenParamSubstitutions;
}
```
This change get rid of most helper functions to retreive the substitution of a certain type, as well as surgery operations to insert a `ThisTypeSubstitution` or `GlobalGenericTypeSubstittuion` at top or bottom of the substitution chain. It also simplies type comparison when certain type of substitution should not be considered as part of type definition.
|
|
|
|
* fix #353
* move validateEntryPoint to after all entrypoints has been checked
* bug fix: DeclRefType::SubstituteImpl should change ioDiff
* bug fix: generic resource usage should have count of 1 instead of 0.
* update test case
|
|
fixes #325
This commit includes following changes:
1. Including a default DeclaredSubtypeWitness argument when creating a default GenericSubstitution for a DeclRefType, so that the witness argument can be successfully replaced with an actual witness table after specialization. (check,cpp)
2. Not emitting full mangled name for struct field members. Since the declref of the member access instruction do not include necessary generic substitutions for its parent generic parameters, so the mangled names of the declaration site and use site mismatches. Instead we just emit the original name for struct fields. (emit.cpp)
3. Allow IRWitnessTable to represent a generic witness table for generic structs. Adds necessary fields to IRWitnessTable for generic specialization. For now, the user field of the IRUse is not used and is nullptr. (ir-inst.h)
4. Make IRProxyVal use an IRUse instead of an IRValue*, so that an IRValue referenced by IRProxyVal (as a substitution argument) can be managed by the def-use chain for easy replacement. This is used for specializing witness tables. (ir.cpp, ir.h)
5. Add a `String dumpIRFunc(IRFunc*)` function for debugging.
6. Add name mangling for generic / specialized witness tables (mangle.cpp)
7. improved natvis file for inspecting witness tables.
8. Add specialization of witness tables:
1) `findWitnessTable` will simply return the specialize IRInst for a generic witness table.
2) make `cloneSubstitutionArg` call `cloneValue` to clone the argument instead of calling `context->maybeCloneValue`, so we can make use of the cloned value lookup machanism to directly return the specialized witness table (which is done when we process the `specialize` instruction on the generic witness table before process the decl ref).
3) bug fix: the argument in ir.cpp:3338 should be `newArg` instead of `arg`.
4) add `specializeWitnessTable` function to specailize a generic witness table. It clones the witness table, and recursively calls `getSpecailizedFunc` for the witness table entries.
5) make `specailizeGenerics` function also handle the case when an operand of the `specialize` instruction is a witness table. We will call `specializeWitnessTable` here and replace the `specialize` instruction with the specialized witness table. The replacement mechanism based on IR def-use chain works here because we have already make IRProxyVal a part of the def-use chain.
9. Add two more test cases for nested generics with constraints. (generic-list.slang and generic-struct-with-constraint.slang)
|
|
Fixes #318
Most of the required support was actually in place, so this is just a bunch of fixes:
- Detect when we are in "full IR" mode, so that we can always emit `struct` declarations with their mangled named (which will produce different names for different specializations, since we emit decl-refs)
- Carefully exclude builtin types from this for now. We'll need a more complete solution for mapping HLSL/Slang builtin types to their GLSL equivalents soon.
- Skip emitting types referenced by generic IR functions, since they might not be usable.
- Also fix things up so that we emit types used in the initializer for any global variables.
- Fix bug in generic specialization where we specialize the same function more than once, with different type arguments. We were crashing on a `Dictionary::Add` call where the key already exists from a previous specialization attempt.
- Fix name-mangling logic so that when outputting a possibly-specialized generic it looks for the outer-most `GenericSubstitution` rather than just the first one in the list. This is to handle the way that we insert other substitutions willy-nilly in places where they realistically don't belong. :(
All of these changes together allow us to pass a slightly modified (more advanced) version of the test case posted to #318.
|
|
* Fix: try to improve float literal formatting
An earlier change switched to using the C++ stdlib to format floats, but I neglected to check that it would correctly print values that happen to be integral with a decimal place (e.g., print `1.0` instead of just `1`). That causes some obscure cases to fail (e.g., when you write `1.0.xxxx` in HLSL, and we print out `1.xxxx`).
I've tried to resolve the issue by using the "fixed" output format, but that may also create problems for extremely large or small values (which really need to use scientific notation). However, this at least puts us back where we were before...
* Add support for source locations to the IR
- Add a `sourceLocation` field to all `IRValue`s
- Add some logic to associate locations with operations while lowering (using a slightly "clever" RAII approach)
- Make sure that when cloning instructions, we also clone the location
- Make the emit logic use the existing support for source locations when emitting code from the IR
A nice cleanup to this work would be to have the source locations for IR values not be hyper-specific about both line and column; it is probably enough to just emit on the correct line.
|
|
* Cleanups to `ParameterBlock<T>` behavior.
These add some more realistic tests using the `ParameterBlock<T>` support, and show that it can work with the "rewriter" mode.
Unfortunately, this code does *not* currently work with the rewriter + the IR at once. That will need to be fixed in a follow-on change, because I now see that the root problem is pretty ugly.
* cleanup
|