| Age | Commit message (Collapse) | Author |
|
* Only do scrubbing if needed. When allocating content try to limit size (with scrubbing each token takes up 1k), now it's 16 bytes min size.
* Don't allocate for every call to write on the CallbackWriter - use the m_appendBuffer.
* Don't allocate memory for CallbackWriter use m_appendBuffer.
* Use UnownedStringSlice for suffix output for parsing float/int literals.
Fix typo in invalidFloatingPointLiteralSuffix
* Using memory arena to hold tokens that are not in SourceManager.
* Improve comment on lexing.
* Make UnownedStringSlice allocation simpler on SourceManager.
* Fix error on gcc around UnownedStringSlice - because VC converted string + UnownedStringSlice automatically into a String.
* Fix generateName needing concat string for gcc.
* When constructing a Token in parseAttributeName - because it's a Identifier, we have to set the Name.
* Remove translation through String on getIntrinsicOp
* Make func-cbuffer-param disablable with -exclude compatibility-issue
* Move memory leak in render-test.
* From review - can just use "?:" instead of performing a concat.
|
|
* Refactor several IR passes
This change takes some IR passes that lived together in `ir.cpp` and moves them into their own files to improve clarity.
In most cases these were passes introduced early in the life of the IR, so that it didn't seem like a big deal to have them all in one file, but now that `ir.cpp` has grown unwieldly this seems like an important cleanup to make.
To give a quick rundown of the passes involved:
* The IR "linking" step has been pulled out to `ir-link.{h,cpp}`. This code for this pass is pretty much identical to what was in `ir.cpp`, and no attempt has been made to clean up or refactor it in the current change.
* The GLSL legalization step has been pulled out to `ir-glsl-legalize.{h,cpp}`. This used to be invoked directly from the linking step, but has been made a new top-level pass invoked from `emit.cpp`. Just like with the linking, the code in the new file is just a copy-paste of what was in `ir.cpp`, and no attempt at cleanup has been made. Also note that it might be a good idea to move this pass later in the overall sequence, but this PR doesn't attempt to do that as it could change results.
* The generic specialization step has been pulled out to `ir-specialize.{h,cpp}`. The file name does not explicitly reference *generic* specialization because I anticipate this pass having to perform other kinds of specialization as well. The code in this case amounts to a heavy cleanup/refactoring pass and thus deserves careful scrutiny. The reason for the cleanup is that the generic specialization step used to be part of the "linking" step long ago, and continued to share infrastructure with it long after that stopped making sense. The newly cleaned up pass has much simpler logic that should be easy enough to follow from the comments.
* In order to reduce code dulication, the IR "cloning" part of the `ir-specialize-resources.{h,cpp}` pass was pulled into its own files (`ir-clone.{h,cpp}`) that both the generic specialization step and the resource-based specialization step now share.
The remaining changes then pertain to deleting a bunch of code out of `ir.cpp` and adding the new files to the build.
The only test that needed updating was `vkray/raygen`, where some subtle ordering change in the refactored generic specialization logic has lead to the relative order of the specialized `TraceRay` and `saturate` functions beind reversed.
* fixup: typo in assert
* fixup: typos in comments
|
|
* Fix output comparison for compute tests
There was some vestigial logic there that was first doing a string-based comparison of actual/expected output, and then falling back to a path that parsed the expected output as a float, then converted that to an integer, then printed that integer in hex, and did the comparison with the result of that conversion.
I'm not even clear on what that code was trying to accomplish, but its main effect was allowing a test failure to slide by unnoticed becaues somehow an all-zeroes actual output file was matching an expected output file with no zeros. My understanding is that it went something like this:
* The first line of expected output was `A` (as in hexidecimal for the decimal integer `10`), and the first line of actual output was `0`.
* The `StringToFloat` function was failing on the input string `"A"` and returned `0.0` to indicate failure (rather than reporting any kind of error)
* We then converted the `0.0` to integer `0` and converted it to a base-16 string `"0"`
* The comparison to the actual output passed, and then a careless early exit in the comparison loop meant that a full test would pass as soon as one line of output passed according to this "second change" logic.
This change removes the broken code in the test runner since nothing was relying on it, other than the one broken test case we wanted to fix anyway.
* Fix the declarations of byte-address buffer methods for Vulkan
The HLSL `ByteAddressBuffer` and `RWByteAddressBuffer` types have methods `Load` and `Store` that take *byte* offsets from the start of the buffer, but we translate them into GLSL that uses `uint[]` array, so that indexing into that array will be off by a factor of four.
Somehow the code for mutable byte address buffers was written to add 4, 8, and 12 bytes to the base offset of a vector to get to its subsequent components, but I never thought about the implications this would have for the base address itself.
This change includes the following fixes:
* Any place in the translation of a byte-address `Load` or `Store` method that was using the address/offset value has been changed to use `$1 / 4` instead of `$1`.
* The offsets of 4, 8, and 12 have been changed to 1, 2, and 3 since they are now being added to an *index* instead of a byte offset
* The `GetDimensions` methods have introduced a factor of `* 4` to account for the fact that they need to return a byte size and not a count of elements.
With this change the existing `byte-address-buffer` test now produces the desired output under Vulkan.
|
|
* First step toward supporting use of interfaces as existential types
Traditional generics involve universal quantification. E.g., a declaration like:
```
void drive<T : IVehicle>(T vehicle);
```
indicates for *for all* types `T` that implement the `IVehicle` interface, the `drive()` function is available.
In contrast, whend directly using an interface type like:
```
IVehicle v = ...;
v.doSomething();
```
we only know that there *exists* some concrete type (we could call it `E`) such that `v` refers to a value of type `E`, and `E` implements the `IVehicle` interface. In order to perform an operation like `v.doSomething()` we need to "open" the existential value so that we can look at the concrete type and how it implements the `IVehicle.doSomething` requirement.
This change adds a very explicit representation of existentials to Slang's IR. An operation like `e = makeExistential(v, w)` creates a value of some existential type (interfaces being our only existential types for now), by wrapping a concrete value `v` (the type of `v` can be seen as an implicit operand) and a witness table `w` showing that the type of `v` implements the requirements of the chosen interface type.
In turn, opening of an existential is handled with operations `extractExistential{Value|Type|WitnessTable}` which pull the corresponding piece of information out of a value of existential type (which somewhere in the code had to have been created with `makeExistential`).
The change includes a trivial simplification pass that can detect cases where an `extractExistential*` operation is applied direclty to a `makeExistential` operation, so that there is only one possible result that could be extracted. This allows for simplification of existential types used in trivial ways for local variables (this is mostly so I can check in a functional test, rather than to actually support useful code involving interfaces right now).
The logic in the semantic checking phase of the compiler is comparatively more complex.
When we are about to perform member lookup given an expression like `obj.member` we will first check if `obj` has an existential type, and if it does we will construct a suitable local context in which we extract the value, type, and witness table from the existential (these all become explicit AST expression nodes), and then use the extracted value as the base of the lookup operation.
The nature of existential values is that two different values with the same existential (interface) type could wrap concrete values with differnt types, so that we need to carefully refer only to the extracted type/value/witness-table of specific *values*. We handle this right now by conceptually moving the existential-type value into a local variable (by introducing a `LetExpr` that amounts to `let v = <init> in <body>`) and then require that the extract expressions must refer to the (immutable) variable declaration from which they are extracting a value.
(Eventually we should expand this so that when using an immutable local variable of existential type we just use that variable as-is rather than introduce a new temporary)
A simple test case is included that uses an interface type in an almost trivial way for a local variable; this test can be run and produces the expected results.
A more complex test case that passes an existential into a function is included, but left disabled because a more aggressive simplification approach is required to generate working code from it.
* Add missing file for expected test output
* Fixups for merge from top-of-tree
|
|
* Specialize away resource-type function parameters
Work on #397.
Introduction
------------
Suppose a user writes a function that takes a resource type as a parameter:
```hlsl
float4 getThing(RWStructuredBuffer<float4> buffer, int index)
{
return buffer[index];
}
```
This function creates challenges when generating code for GLSL-based targets, because a global shader parameter of type `RWStructuredBuffer`:
```hlsl
RWStructuredBuffer<float4> gBuffer;
```
translates to a global GLSL `buffer` declaration:
```hlsl
buffer _S0
{
float4 _data[];
} gBuffer;
```
There is no equivalent to that `buffer` declaration that can be used in function parameter position, and it is illegal in GLSL to pass `gBuffer` into a function.
(Aside: yes, we could in principle translate a function parameter like `RWStructuredBuffer<float4> buffer` to `float4 buffer[]`, but that will not in turn generalize to arrays of structured buffers; it is a dead-end strategy)
The solution employed by many shader compilers is to "inline everything" to eliminate the need for parameters of resource types, and then rely on dataflow optimization to eliminate locals of resource types. This strategy can of course lead to an increase in code size, and it also means that call stacks are lost when doing step-through debugging. Another serious issue is that an "early `return`" from a function can turn into the equivalent of a multi-level `break` when inlined, and not all of our targets support multi-level `break`.
The solution implemented in this change works around some, but not all, of the problems with full inlining.
The approach here generates specialized versions of a function like `getThing`, adapted to the actual arguments provided at different call sites.
Thus if we have code like:
```hlsl
RWStructuredBuffer<float4> gA;
RWStructuredBuffer<float4> gB[10];
...
getThing(gA, x);
getThing(gA, y);
getThing(gB[someVal], z);
```
we will generate two specializations of `getThing`: one specialized for the `buffer` parameter being `gA` and the other for `gB`:
```hlsl
float4 getThing_gA(int index) { return gA[index]; }
float4 getThing_gB(int _val, int index) { return gB[_val][index]; }
```
and the call sites will change to match:
```hlsl
getThing_gA(x);
getThing_gA(y);
getThing_gB(someVal, z);
```
Note how in the case where the argument being passed in was obtained by indexing into an array of resources, the callee is specialized to the identity of the global shader parameter (`gB`), and now accepts a new parameter to indicate the array index into it.
While this description motivates the change based on GLSL output, the same basic issue can arise for other targets.
For example, while current HLSL has added the `ConstantBuffer<T>` type, it is not supported on older targets, and it turns out that even dxc does not allow functions to have `ConstantBuffer<T>` parameters.
Longer-term, we will likely need to do even more aggressive specialization both in order to generate SPIR-V output directly, and also to deal with function that have return values or `out` parameters of resource types.
Implementation
--------------
The meat of the change is in `ir-specialize-resources.{h,cpp}`, where we have a pass that looks at all call sites (`IRCall` instructions) in the program, and attempts to replace them with calls to specialized functions, where the specializations are generated on-demand.
The code in this pass is heavily commented, so hopefully it serves to explain itself all right.
After specialization is complete, we may still have functions like the original `getThing` that will produce invalid code when emitted as GLSL, so we need a way to make sure they don't appear in the output.
To date we've had some very ad hoc approaches for ignoring IR constructs that we don't want to affect emitted code, but this change goes ahead and adds a more real dead code elimination (DCE) pass in `ir-dce.{h,cpp}`.
This pass follows a straightforward approach of tagging instructions that are "live" and then propagating liveness through the whole program, before making a single pass to delete anything that isn't live.
When I first added the DCE pass it eliminated *everything* because there were no "roots" for liveness.
I solved this for now by adding a new decoration, `IREntryPointDecoration`, to mark shader entry points in the IR which should always be live (as should anything they depend on).
A secondary problem that arose was that for GLSL ray tracing shaders it is possible for the incoming/outgoing payload or attributes parameters to be unused, but eliminating them as dead would change the signature of a shader an potential break the rules for how ray tracing programs communicate.
I added a very simple `IRDependsOnDecoration` that allows one IR instruction to keep another alive *as if* it used it, without actually using it.
There's also a fixup in the IR dumping logic where I was forgetting to store anything in the mapping from instruction to their names, so that the name of an instruction was getting incremented each time it was referenced.
Testing
-------
There are three different tests added as part of this change:
* The `compute/func-resource-param` test covers the basic `RWStructuredBuffer` case above, which we expect to work fine for D3D11/12, but fail for Vulkan without specialization.
* The `cross-compile/func-resource-param-array` test covers the case where we don't just have one resource, but an array of them. This is not an end-to-end compute test primarily because our `render-test` application doesn't yet handle arrays of resources correctly in its binding logic.
* The `compute/func-cbuffer-param` test covers the case of a function with a `ConstantBuffer<T>` parameter, which requires specialization to become valid for any of our targets.
* fixup: warnings/errors from other compilers
* fixup: typos and cleanup
* fixup: typos
|
|
Before this change, global shader parameters were represented in the IR as just being ordinary global variables.
The only indication that a particular global represented a parameter was when it got a layotu attached to it (as part of back-end processing), and we've had a number of bugs related to layouts being dropped so that what should have been a shader parameter turned into an ordinary global variable in the output.
This change is more strongly motivated by the fact that making shader parameters look like globals means that we cannot easily reason about their value when doing IR transformations.
If we see two `load`s from the same global variable can we assume they yield the same value?
In the general case we cannot, and this means that any transformation that wants to rely on the fact that an input `Texture2D` shader parameter can't actually change over the life of the program needs to do extra work.
The fix here is to introduce a new kind of IR instruction that represents a global shader parameter directly (not a pointer to it as a global would), at which point there isn't even such a notion as a "load" from the parameter, since it represents the value directly.
In several cases logic that used to apply to global variables in case they were shader parameters (by looking for a layout) is now moved to apply to these global parameters.
The biggest source of issues in this change was that switching from pointers to plain values to represent these shader parameters stresses different cases in type legalization. I also had to deal with the case of legalization for GLSL where we actually *do* need global shader parameters that are writable (since varying output goes in the global scope), but in that case I borrowed the use of pointer-like `Out<...>` and `InOut<...>` types to represent that intent, which we were already using for function parameters representing outputs.
A few tests started failing because the changes lead to a slightly different order of code emission, which in some HLSL tests resulted in a function parameter named `s` getting emitted before a global parameter named `s`, leading to the latter getting the name `s_1` instead of `s_0`.
A few SPIR-V tests started failing because the new approach means that we no longer end up performing a load from all varying input parameters at the start of `main` and instead reference the varying inputs directly. The resulting code is more idomatic, but it differed from the baselines for those tests.
|
|
* Move mangled name out of IRGlobalValue
Previously the `IRGlobalValue` type was used as a root for all IR instructions that can have "linkage," in the sense that a definition in one module can satisfy a use in another module.
The mangled symbol name was stored in state directly on each `IRGlobalValue`, which created some complications, and also forced IR instructions that wanted to support linkage to wedge into the hierarchy at that specific point.
This change moves the mangled name out into a decoration: either an `IRImportDecoration` or an `IRExportDecoration`, both of which inherit from `IRLinkageDecoration` which exposes the mangled name.
This change has a few benefits:
* We can now have any kind of instruction be exported/imported, without having to inherit from `IRGlobalValue`. This could potentially let `IRStructType` and `IRWitnessTable` be simplified to just have operand lists instead of dummy chldren as they do today.
* We can now easily have "global values" like functions that explicitly *don't* get linkage, instead of using a null or empty mangled name as a marker.
* We can use the exact opcode on a linkage decoration to distinguish imports from exports, which could be used to more accurately resolve symbols during the linking step.
Other than adding the decorations and making sure that AST->IR lowering adds them, the main changes here are around any code that used `IRGlobalValue`. Variables and parameters of type `IRGlobalValue*` were changed to `IRInst*` easily, so the main challenge was around code that *casts* to `IRGlobalValue*.
In cases where a cast to `IRGlobalValue` also performed a test for the mangled name being non-null/non-empty, we simply switched the code to check for the presence of an `IRLinkageDecoration`, since that is the new way of indicating a value with linakge.
Most of the serious complications arose in `ir.cpp` around the "linking"/target-specialization and generic specialization steps.
The "linking" logic was checking for `IRGlobalValue` to opt into some more complicated cloning logic, and just checking for a linkage decoration here wasn't sufficient since the front-end *does* produce global values without linkage in some cases (e.g., for a function-`static` variable we produce a global variable without linkage). This logic was updated to just check for the cases that used to amount to `IRGlobalValue`s directly by opcode. It might be simpler in the short term to have kept `IRGlobalValue` around to make the existing casts Just Work, but I'm confident that this logic could actually be rewritten for much greater clarity and simplicity and that is the better way forward.
The generic specialization logic was using some really messy code to generate a new mangled name to represent the specialized symbol, and then searching for an existing match for that name.
The original idea there was that an IR module could include "pre-specialized" versions of certain generics to speed up back-end compilation by eliminating the need to specialize in some cases, but this feature has never been implemented so the overhead here is just a waste.
Instead, I moved generic specialization to use a simpler dictionary to map the operands to a `specialize` instruction over to the resulting specialized value.
This allows for some simplifications in the name mangling logic, because it no longer needs to figure out how to produce mangled names from IR instructions representing types/values.
As part of this change I also overhauled the IR emit logic to produce cleaner output by default, borrowing some of the ideas from the logic in `emit.cpp`. IR values are now automatically given names based on their "name hint" decoration, if any, to make the code easier to follow, and I also made it so that types and literals get collapsed into their use sites in a new "simplified" IR dump mode (which is currently the default, with no way to opt into the other mode without tweaking the code). The resulting IR dumps are much nicer to look at, but as a result the one test that involves IR dumping (`ir/string-literal`) doesn't really test what it used to.
One weird issue that came up during testing is that the `transitive-interface` test had previously been producing output that made no sense (that is, the expected output file wasn't really sensible), and somehow these changes were altering its behavior. Changing the test to use `int` values instead of `float` was enough to make the output be what I'd expect, and hand inspection of generating DXBC has me convinced we were compiling the `float` case correctly too. There appears to be some issue around tests with floating-point outputs that we should investigate.
* fixup: C++ declaration order
|
|
* Make a test case use IR serialization
* Make all IR instructions usable as parents
This makes it so that every `IRInst` has the list of children that used to be on `IRParentInst` and eliminates `IRParentInst`.
Most places in the code were only checking against `IRParentInst` so that they could know whether there were child instructions to iterate over.
This change bloats the size of every instruction by two pointers, but we hope to be able to eliminate that overhead with a better encoding later.
* Change IR decorations to be instructions.
The main change here is that `IRDecoration` now inherits from `IRInst`, and `IRInst` now has a single linked list that holds both decorations *and* children.
At each point where code used to loop over `getChildren()` on an `IRInst`, I checked whether it made sense to leave the operation as processing just the children, or if it should process both decorations and children.
The thorniest bit was making sure the logic for inserting an instruction into a parent is correct. For the most part, once IR code is built all insertions are explicitly before/after another instruction, so the ordering can't get messed up. The sticking point is any code that does an explicit `insertAtStart` or `insertAtEnd`, but I surveyed those to make sure they are correct in context, and I also made all insertions bottleneck through one routine that does a better job of asserting the preconditions than what was there before. We may still want a "smart" insertion function at some point so that if somebody does `someDecoration->insertAtEnd(someInst)` the decoration intelligently goes to the end of the decoration list, and not the entire decorations-and-children list.
All of the existing decoration types were refactored to provide accessors for their operands, rather than directly exposing fields. In most cases the operands are required to be `IRConstant` nodes of fixed types. Not all of these types need to be kept around in the new approach, but they were left in so that as much existing code as possible can be kept working.
The `IRBuilder` was extended with factory functions to make the various decoration types and attach them.
All the fields in concrete decorations that were using `StringRepresentation` or `Name` pointers are now using IR-level string operands which provide their value as an `UnownedStringSlice`, so logic that was working with those decoration values needed to be updated here and there. I also needed to add the logic to clone string-literal values to the IR cloning pass, since they are now being used in almost every piece of code.
A new type of constant IR instruction for literal pointers was added, to handle the cases where an IR decoration needs an operand that is a raw AST-level pointer. These are even being serialized, although we obviously should not rely on them to round-trip through serialization in the future. Ideally, a follow-on change should add a cleanup pass where we remove any decorations from a module that shouldn't be allowed in the serialized code.
The biggest overall cleanup is in the serialization logic, where a lot of code just disappears because it can process the raw "decorations and children" list as the logical children of an IR instruction. The only special cases left are literals (which seem like they will always need special-casing) and global values (because they have a mangled name, which we plan to move into a decoration).
One other example of a simplification made possible by this change: the `IRNotePatchConstantFunc` instruction was implemented as an instruction only because it couldn't be encoded as a decoration at the time (it needed to have an operand that referenced an IR function).
The IR dumping logic was also updated (which meant a change to the `ir/string-literal` test) to try to make it print out all decorations a bit more systematically now that they are encoded like other instructions. The formatting isn't quite perfect, but it is good enough to be able to read what is going on.
I didn't include updates to the validation logic to ensure that decorations are being added in ways that follow the invariants, but that would be a nice thing to add next.
* fixup: 64-bit issues
* fixup: forward declaration issues
|
|
* Change how buffers are emitted
This is a change with a lot of pieces, which can't always be separated out cleanly. I'm going to walk through them in what I hope is a logical order.
The main goal of this change was to allow arrays of structured buffers to translate to Vulkan. Consider two declarations of structured buffers in HLSL/Slang:
```hlsl
StructuredBuffer<X> single;
StructuredBuffer<Y> multiple[10];
```
The current translation logic was handling `single` by translating it into an *unnamed* GLSL `buffer` block like:
```glsl
layout(std430)
buffer _S1
{
X single[];
};
```
That syntax allows an expression like `single[i]` in Slang to be translated simply as `single[i]` in GLSL.
But that naive translating doesn't work for `multiple`, since we need to declare a array of blocks in GLSL, which requires giving the whole thing a name:
```glsl
layout(std430)
buffer _S2
{
Y _data[];
} multiple[10];
```
Now a reference to `multiple[i][j]` in Slang needs to become `multiple[i]._data[j]` in GLSL.
To avoid having way too many special cases around single structured buffers vs. arrays, it makes sense to allows emit things in the latter form, so that we instead lower `single` as:
```glsl
layout(std430)
buffer _S1
{
X _data[];
} single;
```
So that now a reference to `single[i]` becomes `single._data[i]` in GLSL.
Most of that can be handled in the standard library translation of the structured buffer indexing operations.
The only wrinkle there is that there were some *old* special-case instructions in the IR intended to handle buffer load/store operations (these were added back when I was trying to keep the "VM" path working). These aren't really needed to have structured-buffer operations work; they can be handled as ordinary functions as far as the stdlib is concerned. I removed the old instructions.
Along the way, it became clear that a few other cases follow the same pattern. Byte-addressed buffers are an obvious case. We were lowering HLSL/Slang:
```hlsl
ByteAddressBuffer b;
...
uint x = b.Load(0);
```
to GLSL like:
```glsl
layout(std430)
buffer _S1
{
uint b[];
};
...
uint x = b[0];
```
That logic would fail for arrays the same way that the structured buffer case was failing. The fix is the same: use named `buffer` blocks and then introduce an explicit `_data` field:
```glsl
layout(std430)
buffer _S1
{
uint _data[];
} b;
...
uint x = b._data[0];
```
Just like with structured buffers, all of the VK translation for operations on byte-addressed buffers can be implemented directly in teh stdlib, so once the emit logic was changed it was just a matter of adding `._data` to a bunch of VK tranlsations.
It turns out that arrays of constant buffers have more or less the same problem, and furthermore we have some problems with any code that directly uses the modern HLSL `ConstantBuffer<T>` type.
Note: the emit logic around constant buffers sometimes refers to "parameter groups" because that is being used in the compiler as a catch-all term for constant buffers, texture buffers, and parameter blocks.
The existing code was going out of its way to reproduce the way that constant buffer declarations are implicitly referenced in HLSL:
```hlsl
cbuffer C { float f; }
...
float tmp = f; // No reference to `C` here
```
This can be seen in the emit logic with the `isDerefBaseImplicit` function, which is used to take the internal IR representation for a reference to `f` (which is closer to the expression `(*C).f` or `C->f`) and leave off any reference to `C` so that we emit just `f`.
That kind of logic just flat out doesn't work in some important cases. Arrays of constant buffers are a clear one:
```hlsl
ConstantBuffer<X> cbArray[3];
...
X x = cbArray[0];
```
There is no way to translate that to an ordinary `cbuffer` declaration at all. The same problem can be created without arrays, though:
```hlsl
ConstantBuffer<X> singleCB;
...
X x = singleCB;
```
The current strategy for translating constant buffers was translating `singleCB` into a `cbuffer` declaration that reproduced the fields of `X` as its members, which just wouldn't work:
```hlsl
cbuffer singleCB
{
float f; // field of `X`
}
...
X x = singleCB; // ERROR: there is nothing named `singleCB` in this HLSL
```
The new strategy is more consistent. We still generate a `cbuffer` declaration for a single constant buffer, but we always give it a single field of the chosen element type:
```hlsl
cbuffer singleCB
{
X singleCB;
}
...
X x = singleCB; // this works fine!
```
And in the array case we generate code that uses the explicit `ConstantBuffer<T>` type:
```hlsl
ConstantBuffer<X> cbArray[3];
...
X x = cbArray[0];
```
The GLSL output is more complicated because unlike with HLSL there is no implicit conversion from a uniform block to its element type (there is no notion of an element type). The array case thus needs a `_data` field similar to what we do for structured buffers:
```glsl
layout(std140)
uniform _S3
{
X _data;
} cbArray[3];
...
X x = cbArray[0]._data;
```
And then the non-array case needs to have a similar `_data` field for consistency:
```glsl
layout(std140)
uniform _S1
{
X _data;
} singleCB;
...
X x = singleCB._data;
```
This is handled by inserting the necessary reference to `_data` whenever we dereference a constant buffer, either as part of a load instruction (loading from the whole CB as a pointer), or an `IRFieldAddress` instruction which forms a pointer into the CB (e.g., `&(singleCB->f)` becomes `singleCB._data.f`).
The current emit logic handles `ParameterBlock<X>` differently from `ConstantBuffer<X>`, but really only to allow parameter blocks to be explicitly named in the output, while constant buffers were left implicit by default. Thus the only difference was a legacy one (from back when trying to exactly reproduce the HLSL text we got as input was considered an important goal), and the new approach to emitting constant buffers would get rid of it.
I removed the separate logic for emitting `ParameterBlock<X>` and just let the handling for constant buffers deal with it.
Note that any resource types inside of a `ParameterBlock<X>` would have been moved out as part of legalization, so that a parameter block is 100% equivalent to a constant buffer when it comes time to emit code.
Unsurprisingly, changing the way we generate HLSL and GLSL output for all these buffer types meant that any tests that were directly comparing the output of `slangc` against `fxc`, `dxc`, or `glslang` broke.
The basic approach to fixing the breakage in GLSL tests was to update the GLSL baseline to reflect the new output startegy. In some cases I used macros to name the various `_S<digits>` temporaries so that future renaming will hopefully be easier (it would be great if we auto-generated temporary names with a bit more context). There was one GLSL test (`tests/bugs/vk-structured-buffer-binding`) that was using raw GLSL expected output, and this was changed to use a GLSL baseline to generate SPIR-V for comparison.
For HLSL tests we were sometimes running the same input file through `slangc` and `fxc`/`dxc`, and in these cases I macro-ized the various `cbuffer` declarations to generate different declarations depending on the compiler.
I completely dropped the tests coming from the D3D SDK because they aren't providing much coverage, and updating them would change them so far from the original code that the purported benefit (using a body of existing shaders) would be lost.
I also dropped the explicit matrix layout qualifiers in the `matrix-layout` test because the new output strategy breaks those for GLSL (you can't put matrix layout qualifiers on `struct` fields, and now the body of every constant buffer is inside a `struct`). This isn't as big of a loss as it seems, because our handling of those qualifiers wasn't really right to begin with. Slang users should only be setting the matrix layout mode globally (and we should probably switch to error out on the explicit qualifiers for now).
The other thing that got dropped is tests involving `packoffset` modifiers.
Slang already warns that it doesn't support these, and the way they were used in the test cases is actually misleading. For the binding/layout-related tests, the goal was to show that Slang reproduces the same layout as fxc, in which case explicitly enforcing a layout via `packoffset` seems like cheating (are we sure we enforced the layout fxc would have produced?). The real reason was that Slang used to emit explicit `packoffset` on *every* field of a `cbuffer` it would output, because of an `fxc` bug where you couldn't use `register` on textures/samplers declared inside a `cbuffer` unless *every* field in the `cbuffer` used a `register` or `packoffset` modifier. Slang hasn't required that behavior in a while because it now splits textures and samplers, and the one test case where we needed `packoffset` to work around the `fxc` bug in the baseline HLSL has been macro-ified even more to work around the bug.
The amount of churn in the test cases is unfortunate, but it continues to point at the weakness of any testing strategy that checks for exact equivalent between Slang's output and that of other compilers. We need to keep working to replace these tests with better alternatives.
In `check.cpp` there is logic to perform implicit dereferencing, so that if you write `obj.f` where `obj` is a `ConstantBuffer<X>` (or some other "pointer-like" type) and `f` is a field in `X`, then this effectively translates as `(*obj).f`. That is, we dereference the value of type `ConstantBuffer<X>` to get a value of type `X`, and then refer to the field of the `X` value.
There was a problem where the logic to insert that kind of implicit dereference operation was using a reference (`auto& type = ...`) for the type of the expression being dereferenced, and then clobbering it. This would mean that an expression of type `ConstantBuffer<X>` would have its type overwritten to be just `X` and then codegen would break later on.
I'm not sure how we haven't run into that before.
The `array-of-buffers` test case was added to confirm that we now support arrays of constant, structured, and byte-address buffers for both DXIL and SPIR-V output.
Okay, so that was a lot of stuff, but hopefully it is clear how this all works to make the output of the compiler more consistent and explicit, while also supporting the required new functionality.
* fixup: review feedback
|
|
Fixes #730
The most basic of these already had translations added, and the Slang approach to cross-compilation made adding the new ones almost trivial. I also took the time to confirm that using "operator comma" for the sequencing (since the single HLSL function call needs to expand to a sequence of GLSL function calls) works fine with glslang.
I added a test case to confirm that all of these operations can compile down to SPIR-V.
I am not personally confident that these translations are perfect, but they are based on what was [linked](https://anteru.net/blog/2016/mapping-between-hlsl-and-glsl/index.html) from #730. The one difference is where that blog post was listing `memoryBarrierImage, memoryBarrierImage` I assumed they meant `memoryBarrierImage, memoryBarrierBuffer`.
|
|
The syntax for this is a placeholder for now, since we will probably want to migrate to whatever gets decided on for dxc. To declare that some data should be part of the "shader record" use `layout(shaderRecordNV)` to mirror the GLSL raytracing extension:
```hlsl
layout(shaderRecordNV)
cbuffer MyShaderRecord
{
float4 someColor;
uint someValue;
}
```
The intention (not enforced) is that an application would map `MyShaderRecord` to "root constants" in the "local root signature" when compiling for DXR, while the output code in GLSL will always map to the shader record in Vulkan raytracing:
```glsl
layout(shaderRecordNV)
buffer MyShaderRecord
{
float4 someColor;
uint someValue;
};
```
This change does *not* support declaring a global value of `struct` type with `layout(shaderRecordNV)` (or a `ParameterBlock` with the modifiers, although that would be a nice-to-have feature) and it does *not* support having the contents of the shader record be mutable (even if GLSL/Vulkan allows it). Those can/should be added in future changes.
In terms of implementation, this closely mirrors the way that `layout(push_constant)` buffers were being handled, where the data inside the `ConstantBuffer<X>` (the value of type `X`) gets laid out using ordinary rules (and consuming ordinary `UNIFORM` storage, while the buffer itself is given a different layout resource to reflect that fact that it does not consume a VK `binding` any more, but a different conceptual resource.
Note: an alternative design here (that might actually be preferrable) would be to have both push-constant and shader-record buffers be handled as alternative aliases for `ConstantBuffer` (or maybe `ParameterBlock`) so that you have, e.g.:
```hlsl
PushConstantBuffer<X> myPushConstants;
ShaderRecord<Y> myShaderRecord;
```
This alternative design avoids API-specific decorations on the declarations, and reflects the intent of the programmer very directly, even when they are compiling for a target like D3D that doesn't reflect these choices at the IL level (it could still be exposed through the Slang reflection API).
|
|
* Don't look at VK bindings when compiling for D3D and vice versa
The compiler had been looking at all the modifiers on a declaration when piecing together binding information, whether or not those modifiers should apply on the chosen target API. This was working in practice because the "layout resource kinds" used by each API target were disjoint, for the most part.
This change ensures that we don't even look at modifiers that don't apply on the chosen target, and furthermore adds a new warning that applies if the user is compiling a shader with explicit `register` bindings for Vulkan, if there are no corresponding `[[vk::binding(...)]]` attributes (under the assumption that if they want to be explicit in one case, they probably want to be explicit in all cases).
* Allow explicit space/set bindings on parameter blocks
The syntax for the D3D case is to specify a `space` in a `register` modifier, without any other register class:
```hlsl
ParameterBlock<X> myBlock : regsiter(space999);
```
In the Vulkan case, the user must apply the `[[vk::binding(...)]]` attribute and is expected to use a `binding` of zero:
```hlsl
[[vk::binding(0,999)]]
ParameterBlock<X> myBlock;
```
This change includes a reflection test for the new capability (where we also confirm that it produces the expected output when compared with fxc), and a test for the diagnostic messages when the user messes up bindings for Vulkan.
The implementation itself is fairly straightforward, since the compiler already treats registe spaces/sets as a resource that parameters can consume directly.
Note: the test case for explicit parameter block space/set bindings includes some commented out code that lead to a compiler crash. I would like to fix the underlying issue, but it seemed sensible to keep the bug fix out of a change like this that is adding functionality.
|
|
The change here is that the logical that used to be controlled by `-parameter-blocks-use-register-spaces` is now turned on unconditionally, meaning that a `ParameterBlock<X>` will get its own register `space` by default when targetting D3D Shader Model 5.1 and later.
I had originally made this feature optional because I wasn't sure whether Shader Model 5.1 should default to using register spaces or not, because D3D 11 doesn't support spaces at the API level and MSDN documetnation made it sound like SM5.1 was available for D3D11 as well. Subsequent reading has led me to understand that MSDN is wrong on this front, and SM5.1 and later are D3D12-only, so it is always safe to use spaces.
The new logic is now that we automatically use spaces for parameter blocks any time it is possible (SM5.1+ and any Vulkan target), and otherwise fall back to not using spaces (SM5.0 and earlier).
I updated a reflection test case that was covering parameter blocks to confirm the output differs between SM5.0 and 5.1.
|
|
The `Type` infrastructure uses a class hierarchy, but blindly `dynamic_cast`ing to a desired case doesn't always give the expected result, because a `Type` could represent a `typedef` (a `NamedExpressionType`) that itself resolves to, e.g, a vector type (a `VectorExpressionType`). In that case a `dynamic_cast<VectorExpressionType*>(someType)` would fail, even though the type logically represents a vector. The `Type::As<T>()` method is designed to handle this case, by "looking through" simple `typedef`s to get at the real definition of a type.
The fix in this case is to use `Type::As<T>()` at various points in the reflection code (`reflection.cpp`) instead of `dynamic_cast`.
This problem surfaced with a `StructuredBuffer<float2>` not reflecting correctly, because the element type (`float2`) is actually a `typedef` (for `vector<float,2>`), so I've included a test case that stresses that case. Getting the right output in the test required tweaking the `slang-reflection-test` tool to produce additional output for resource types (currently narrowed down to only affect structured buffers to avoid large diffs in expected test outputs).
|
|
The `globallycoherent` modifier indicates that resource might be read or written by threads outside of the current thread group, so that any memory barriers that affect it should guarantee coherency at the global memory scope, and not just thread-group scope. The equivalent GLSL modifier appears to be `coherent`.
This change adds the front-end modifier, transforms it into an IR-level decoration during lowering, and then checks for the modifier during code emit.
Note: this logic may not behave correctly when `globallycoherent` is added to a field in a `struct`, since the modifier would then need to be propagated to any variables created during type legalization. Checking up on that is left to future work.
Note: it isn't entirely clear if `globallycoherent` should be treated as a declaration modifier or a type modifier. The point is moot for now because Slang doesn't have any support for type modifiers, but when we get around to that we will need to make a decision.
|
|
* First pass support for early depth stencil.
* Add a simple test to check if output has attributes.
* Use cross compilation to test [earlydepthstencil] on glsl.
* If target is dxil, use dxc to test against.
Add hlsl to test earlydepthstencil against.
* * Added spSessionHasCompileTargetSupport
* Made slang-test use spSessionHasCompileTargetSupport to ignore tests that cannot run
|
|
* Add support for unbounded arrays as shader parameters
With this change, Slang shaders can use unbounded-size arrays as parameters, e.g.:
```hlsl
Texture2D t[] : register(t3, space2);
SamplerState s[];
```
As shown in the above example, Slang supports both explicit `register` declarations on unbounded-size arrays and also implicit binding.
When doing automatic parmaeter binding, Slang will allocate a full register space to an unbounded-size array of textures/smaplers, starting at register zero.
Note that for the Vulkan target, an array of descriptors of any size (including unbounded size) consumes only a single `bindign`, so much of this logic is specific to D3D targets.
Details on the changes made:
* The single biggest change is a new `LayoutSize` type that is used to store a value that can either be a finite unsigned integer or a dedicated "infinite" value (which is stored as the all-bits-set `-1` value). This is used in places where a size could either be a finite value or an "unbounded" value, to both try to make standard math robust against the infinite case, and also to force code to deal with both the finite and infinite cases more explicitly when they care about the difference.
* The public API was documented so that unbounded-size arrays report their size as `-1`. We should probably change this function to return a signed value instead of `size_t`, but that would technically be a source-breaking change, so we want to make sure we stage it appropriately.
* The code that invokes fxc was updated so that it passes the appropriate flag to enable unbounded arrays of descriptors. I haven't looked yet at whether dxc needs such a flag, so there may need to be a follow-on change to add that.
* The logic in the `UsedRanges::Add` method for tracking what registers have been claimed was rewritten because the previous version had some subtle bugs. The new version includes more detailed comments that attempt to explain why I think the new logic works.
* The top-level logic for auto-assigning bindings to parameters has been overhauled to deal with the fact that a parameter that needs "infinite" amounts of a resource should be claiming a full register space for those resources instead. Whenever a parameter allocates any register spaces we want them all to be contiguous, so we have a loop that counts the requirements and allocates the spaces before we go along and dole them out.
* When computing the layout for an array type, we need to carefully deal with unbounded-size arrays. In the case of an unbounded array of a "simple" resource type (e.g., `Texture2D[]`), we opt to expose the type layout as consuming an infinite number of the appropriate register, while in the case of a complex type (say, a `struct` with two texture fields), we need to instead allocate whole spaces for those fields. The logic here is more subtle than I would like, and interacts with the existing code that "adjusts" the element type of an array in order to make standard indexing math Just Work.
* Similarly, when a `struct` type has unbounded-array fields, then we need to transform any field with infinite register requirements to instead consume a space in the resulting aggregate type. This case is comparatively easier than the array case.
* The test case for unbounded arrays covers both explicit and implicit bindings, and also the case of an unbounded array over a `struct` type (it does not cover the case of a `struct` contianing unbounded arrays, so that will need to be added later). For this test we are both validation the output reflection data and that we produce the same code as fxc (with explicit bindings in the fxc case).
* The reflection test app was modified to use the new API contract and detect when a parameter consumes `SLANG_UNBOUNDED_SIZE` resources.
* Fixup: ensure unbounded size is defined at right bit width
|
|
The `Texture2D.Load()` operation takes a `uint3` of coordinates, with the `.xy` part holding the texel coordinates and the `.z` part holding a mip level.
In contrast, the `RWTexture2D.Load()` operation only takes a `uint2`.
This isn't clearly specified on MSDN, so Slang failed to get the declaration right.
This change fixes it so that we only add the extra coordinate for the mip level on non-multisample, read-only texture types (the previous code only checked for multisample-ness).
I also changed the logic that outputs swizzles to extract the coordinates and mip level so that it only applies when there is a mip level being passed through (this code should never actually be applied, though, because we shouldn't be generating `texelFetch` for RW texures anyway...).
One final change that sneask in here is making the `offset` parameter for one of the load-with-offset cases correctly use the base coordinate count for the texture type (e.g., 2D even for `Texture2DArray`). That is an unrelated fix, but I thought I'd sneak it in here rather than do a separate PR.
|
|
* Add Vulkan cross-compilation for byte-address buffers
This covers `ByteAddressBuffer`, `RWByteAddressBuffer`, and `RasterizerOrderedByteAddressBuffer`. A declaration of any of these types translates to a GLSL `buffer` declaration with a single `uint` array of data. Most of the methods on these types then have straightforward translations to operations on the array. The overall translation is similar to what was already being done for structured buffers.
While implementing GLSL translation for the various atomic (`Interlocked*`) methods, I discovered that some of these included declarations that aren't actually included in HLSL. I cleaned these up, including in the declarations of the global `Interlocked*` functions.
The test case that is included here covers only the most basic functionality: `Load`, `Load2`, `Load4` and `Store`. We should try to back-fill tests for the remaining methods when we have time.
Two large caveats with this work:
1. We don't handle arrays of byte-address buffers, just as we don't handle arrays of structured buffers. That will take additional work.
2. We don't handle byte-address (or structured) buffers being passed as function parameters, since the parameter would need to be declared as a bare `uint[]` array.
* Fixup: don't lump raytracing acceleration structures in with buffers
Raytracing acceleration structures share a common base class with byte-address buffers (they are both buffer resources without a specific element type), and I was mistakently matching on this base class in an attempt to have a catch-all that applied to all byte-address buffers.
The fix here was to add a distinct base class for all byte-address buffers and catch that instead.
* Fixup: typos
|
|
* Fix output of binding of structured buffer on GLSL.
* Added test to check vk binding is coming thru.
* Fix closethit binding inconsistency.
|
|
* Add callable shader support for Vulkan ray tracing
This change extends the previous work to update Vulkan ray tracing support for the finished `GL_NV_ray_tracing` spec.
One of the features missing in the experimental extension that was added to the final spec is "callable shaders," which allow ray tracing shaders to call other shaders as general-purpose subroutines.
Most of the implementation work here mirrors what was done for the `TraceRay()` function to map it to `traceNV()`.
We map the generic `CallShader<P>` function to the non-generic `executeCallableNV`, with a payload identifier that indicates a specific global variable of type `P` (the global variable being generated from a `static` local in `CallShader`). A new modifier is added to identify the payload structure, and the parameter binding/layout logic introduces a new resource kind for callable-shader payload data (where previously the logic had assumed ray and callable payloads should use the same resource kind).
Two test shaders are included: one for the callable shader (`callable.slang`) and one for a ray generation shader that calls it (`callable-caller.slang`). Just for kicks, the payload data type is defined in a shared file so that we can be sure the two agree (trying to emulate what might be good practice, and ensure that ray tracing support works together with other Slang mechanisms).
* Typo fix: assocaited->associated
One instance was found in review, but I went ahead and fixed a bunch since I seem to make this typo a lot.
* Typo fix: defintiion->definition
|
|
* Update version of glslang used
* Update VK raytracing support for final extension spec
A lot of this change is just plain renaming: The `NVX` suffixes become just `NV`, and the extension name changes from `GL_NVX_raytracing` to `GL_NV_ray_tracing`.
The Slang standard library and the GLSL baselines for the tests are consistently updated.
The other detail is that the final spec requires the "payload" identifier in a `traceNV()` call to be a compile-time constant, which means it cannot be defined as a local variable first, as in:
```glsl
int payloadID = 0;
traceNV(..., payloadID); // ERROR
```
In terms of how the original support was implemented, the payload ID is being computed via a special builtin function that maps each global GLSL payload variable to a unique ID. There are a few ways we could try to resolve the problem here:
1. We could aspire to put our equivalent of the `constexpr` modifier on the output of the function, so that the GLSL variable gets declared `const` and thus fits the GLSL rules for a constant expression.
2. We could introduce a pass to replace the payload-location instructions with literal integers.
3. We could use a special-purpose instruction instead of a builtin function call, and have that instruction indicate that it doesn't have side effects (so it can be folded into the call site)
4. We could somehow mark the builtin function as not having side effects.
We choose option (4) simply because it provides a feature that could have other applications. This change adds a `[__readNone]` attribute that can be applied to function declarations to express a promise on the part of the programmer that the given function has no side effects and computes its result strictly from the bits of its input arguments (and not things they point to, etc.). This mirrors an equivalent function attribute in LLVM.
We mark the function that computes a ray payload location with this attribute, and propagate the attribute through the layers of the IR, so that when the emit logic asks if an operation has side effects (to see if it can be folded into the arguments of a subsequent expression), we get an affirmative response.
This change should get all of the features that were present in the experiemntal `NVX` extension working with the final extension spec. It does not address callable shaders, which will come as a subsequent change.
|
|
|
|
These calls translate to uses of the `nonuniformEXT` qualifier introduced by the `GL_EXT_nonuniform_qualifier` extension.
The standard library changes in this case are straightforward uses of existing compiler mechanisms. The test case is one of the less pleasant ones where we compare SPIR-V output against SPIR-V generated from a hand-coded GLSL baseline. This is a case where a simpler test type that just checks for specific textual matches in the output (and not whole files) would be better, but that is out of scope for this change.
|
|
* Fix a precedence bug in code emit
Given code like the following:
```hlsl
float a = ...;
float3 b = pow(a, 2.0);
float3 c = b.xyz;
```
There is an implicit cast from `float` to `float3` in the computation of `b`, that Slang will always make explicit in the output.
Slang will also tend to pull the computation of `b` into the next expression if it has no other use sites in the same function.
When it does, the compiler was failing to parenthesize the result correctly, and yielded (more or less):
```hlsl
float a = ...;
float3 c = (float3) pow(a,2.0).xyz;
```
As you can see, the swizzle ended up attached to the `pow()` call instead of the cast, and the downstream compiler luckily complained that we couldn't apply an `.xyz` swizzle to a scalar value.
This change adds the missing parentheses-insertion logic for that case of emitting a cast expression, so that we instead get:
```hlsl
float a = ...;
float3 c = ((float3) pow(a,2.0)).xyz;
```
I added a test case to catch this specific issue, but there is of course no guarantee that we haven't missed other cases in the emit logic. This is why I held out so long on getting to the "why so many parentheses?" complaints...
* remove commented-out code from test program
|
|
This code path hadn't been used, and it had a crash due to not inserting the basic blocks it created (for initializing the variable) into the parent function. The fix adds a bit more smarts to the `IRBuilder` to help with inserting basic blocks into the flow of a function.
The actual user issue was around `static const` declarations, and it is clear that the code is incorrectly treating a function local `static const` as if it were just `static`. That will need to be fixed in another change.
|
|
* Ongoing serialization for full debug work.
* Use StringRepresentationCache and StringSlicePool for serialization.
* Removed some older path handling for serialization which had some wrong underlying assumptions.
* Builds with refactored use of SubStringPool in ir-serialize.
* Removed prohibitedCategories because not used anywhere.
* Add category 'compatibility-issue'
* Remove work in progress on debug serialization.
|
|
* Rework command-line options handling for entry points and targets
Overview:
* The biggest functionality change is that the implicit ordering constraints when multiple `-entry` options are reversed: any `-stage` option affects the `-entry` to its *left* instead of to its *right* as it used to. This is technically a breaking change, but I expect most users aren't using this feature.
* The options parsing tries to handle profile versions and stages as distinct data (rather than using the combined `Profile` type all over), and treats a `-profile` option that specifies both a profile version and a stage (e.g., `-profile ps_5_0`) as if it were sugar for both a `-profile` and a `-stage` (e.g., `-profile sm_5_0 -stage fragment`).
* We now technically handle multiple `-target` options in one invocation of `-slangc`, but do not advertise that fact in the documentation because it might be confusing for users. Similar to the relationship between `-stage` and `-entry`, any `-profile` option affects the most recent `-target` option unless there is only one `-target`.
* The logic for associating `-o` options with corresponding entry points and targets has been beefed up. The rule is that a `-o` option for a compiled kernel binds to the entry point to its left, unless there is only one entry point (just like for `-stage`). The associated target for a `-o` option is found via a search, however, because otherwise it would be impossible to specify `-o` options for both SPIR-V and DXIL in one pass.
* The handling of output paths for entry points in the internal compiler structures was changed, because previously it could only handle one output path per entry point (even when there are multiple targets). The new logic builds up a per-target mapping from an entry point to its desired output path (if any).
Details:
* Support for formatting profile versions, stages, and compile targets (formats) was added to diagnostic printing, so that we can make better error messages. This is fairly ad hoc, and it would be nice to have all of the string<->enum stuff be more data-driven throughout the codebase.
* Test cases were added for (almost) all of the error conditions in the current options validation. The main one that is missing is around specifying an `-entry` option before any source file when compiling multiple files. This is because the test runner is putting the source file name first on the command line automatically, so we can't reproduce that case.
* Several reflection-related tests now reflect entry points where they didn't before, because the logic for detecting when to infer a default `main` entry point have been made more loose
* On the dxc path, beefed up the handling of mapping from Slang `Profile`s to the coresponding string to use when invoking dxc.
* A bunch of tests cases were in violation of the newly imposed rules, so those needed to be cleaned up.
* There were also a bunch of test cases that had accidentally gotten "disabled" at some point because there were comparing output from `slangc` both with and without a `-pass-through` option, but that meant that any errors in command-line parsing produced the *same* error output in both the Slang and pass-through cases. This change updates `slang-test` to always expect a successful run for these tests, and then manually updates or disables the various test cases that are affected.
* When merging the updated test for matrix layout mode, I found that the new command-line logic was failing to propagate a matrix layout mode passed to `render-test` into the compiler. This was because the `-matrix-layout*` options were implemented as per-target, but the target was being set by API while the option came in via command line (passed through the API). It seems like we want matrix layout mode to be a global option anyway (rather than per-target), so I made that change here.
* Add missing expected output files
* A 64-bit fix
* Remove commented-out code noted in review
|
|
The Slang compiler allows the default matrix layout convention (row-major vs. column-major) to be specified via the command line or API.
When generating output HLSL, Slang emits a `#pragma pack_matrix` directive for the chosen default convention, so that a user can generate plain HLSL output and still have it encode their desired defaults.
The problem that has arisen is that many released versions of dxc (including those in the most recent Windows SDK at this time) *ignore* the `#pragma pack_matrix` directive (the feature has since been added to top-of-tree dxc).
The main fix here is to instead pass the `-Zpr` option in to dxc when invoking it if the row-major (non-default) convention is requested.
This will solve the problem for clients that use Slang to generate DXIL, but not for clients who use Slang to generate plain HLSL that they then pass into dxc (those clients are assumed to be able to work around the problem for themselves).
In order to test the change, I added a test that fills a constant buffer with sequential integers, and then reads out the rows/columns of an `int3x4` matrix with both row- and column-major layout, as well as an integer placed *after* the matrix, so we can see the offset it was given.
The `render-test` application did not yet support generating code via dxc/DXIL, so I added an option for that.
This ends up assuming that anybody who is running the D3D12 tests will also have a version of dxc available.
|
|
* First pass at caching file system.
* default-file-system -> slang-file-system
fix problem with location("build.linux") confusing windows build for now.
* Added CompressedResult
Fix problem in Result construction with it being unsigned
* Add support for Path simplification.
* Testing for Path::Simplify.
* Refactored CacheFileSystem - automatically handles ISlangFileSystem or ISlangFileSystemExt appropriately.
Removed WrapFileSystem - because wasn't possible to emulate some of the behavior if just loadFile is implemented.
Split out StringBlob - so that no need to convert between ISlangBlob and String repeatidly.
* Remove unwanted code in ~CompileRequest
|
|
The basic problem was that the front-end was generating code that used a `uint` vector for the coordinates, while GLSL requires an `int` vector.
Without support for implicit type conversions, this leads to GLSL compilation failure.
The fix here is to insert the type conversion as late as possible (during GLSL emit). This isn't a pretty solution, but it is the easiest one to implement in the current compiler.
A more forward-looking approach would be to support "force inline" functions in the stdlib, so that we can implement the conversion logic in a stdlib implementation specialized for the Vulkan/GLSL target.
At the moment, everything to do with image atomics is all sleight of hand anyway, so making it incrementally messier isn't a bit hit.
|
|
* Added getPathType to ISlangFileSystemExt.
This is needed so that when searching for a file it's existance can be tested without loading the file. On some platforms a getCanonicalPath can do this - but depending on how getCanonicalPath is implemented, it may not do. This test is made after the relative path is produced before finding the canonical path.
* Test for importing along search path.
* Added comment to explain the issue around WrapFileSystem impl of getPathType.
* Make search path use / not \
|
|
This change allows an interface to include `static` methods as requirements, so that types that conform to the interface will need to satisfy the requirement with a `static` method.
The essence of the check is simple: when checking that a method satisfies a requirement, we enforce that both are `static` or both are non-`static`.
Making that simple change and adding a test change broke a few other places in the compiler that this change tries to fix. The main fix is to handle cases where we might look up an "effectively static" member of a type through an instance, and to make sure that we replace the instance-based lookup with type-based lookup. There was already logic along these lines in `lower-to-ir.cpp`, so this change centralizes it in `check.cpp` where it seems to logically belong.
|
|
* Refactor of path handling.
* Added PathInfo
* Changed ISlangFileSystem - such that has separate concepts of reading a file, getting a relative path and getting a canonical path
* Added support for getting a canonical path for windows/linux
* Made maps/testing around canonicalPaths
* User output remains around 'foundPath' - which is the same as before
* Small improvements around PathInfo
* Added a type and make constructors to make clear the different 'path' uses
* Fixed bug in findViewRecursively
* Checking and reporting for ignored #pragma once.
* Removed SLANG_PATH_TYPE_NONE as doesn't serve any useful purpose.
* Improve comments in slang.h aroung ISlangFileSystem
* Remove the need for <windows.h> in slang-io.cpp
* Ran premake5.
* Improvements and fixes around PathInfo.
* Fix typo on linix GetCanonical
* Make the ISlangFileSystem the same as before, and ISlangFileSystem contain the new methods.
Internally it always uses the ISlangFileSystemExt, and will wrap a ISlangFileSystem with WrapFileSystem, if it is determined (via queryInterface) that it doesn't implement the full interface.
|
|
* Add a warning on missing return, and initial SCCP pass
The user-visible feature added here is a diagnostic for functions with non-`void` return type where control flow might fall off the end. This *sounds* like a trivial diagnostic to add as part of the front-end AST checking, but that can run afoul of really basic stuff like:
```hlsl
int thisFunctionisOkay(int a)
{
while(true)
{
if(a > 10) return a;
a = a*2 + 1;
}
// no return here!
}
```
This function "obviously" doesn't need to have a `return` statement at the end there, but realizing this fact relies on the compiler to understand that the `while(true)` loop can't exit normally, and doesn't contain any `break` statement. One can write "obvious" examples that need more and more complex analysis to rule out.
The answer Slang uses for stuff like this is to do the analysis at the IR level right after initial code generation (this would be before serialization, BTW, so that attached `IRHighLevelDeclDecoration`s can be used).
When lowering the AST to the IR, we always emit a `missingReturn` instruction (a subtype of `IRUnreachable`) at the end of its body if it isn't already terminated. The IR analysis pass to detect missing `return` statements is then as simple as just walking through all the functions in the module and making sure they don't contain `missingReturn` instructions.
For that simple pass to work, we first need to make some effort to remove dead blocks that control flow can never reach. This change adds a very basic initial implementation of Spare Conditional Constant Propagation (SCCP), which is a well-known SSA optimization that combines constant propagation over SSA form with dead code elimination over a CFG to achieve optimizations that are not possible with either optimization along.
For the moment, we don't actually implement any constant *folding* as part of the SCCP pass, so we can eliminate the dead block in a case like the function above (and those in the test case added in this change), but will not catch things like a `while(0 < 1)` loop. Handling more "obvious" cases like that is left for future work.
* fixup: warning on unreachable code
* Handle case where user of an inst isn't in same function/code
The code as assuming any instruction in the SSA work list has to come from the function/code being processed, but this misses the case where an instruction in a generic has a use inside the function that the generic produces.
This change adds code to guard against that case.
|
|
* Fix error when one constant is defined equal to another
Fixes #666
When a user declares one constant (usually a `static const` variable) to be exactly equal to another by name:
```hlsl
static const a = 999;
static const b = a;
```
Then the IR-level representation of `b` is an `IRGlobalConstant` whose value expression is just a pointer to the definition of `a`.
The logic in `emitIRGlobalConstantInitializer()` was trying to always call `emitIRInstExpr` to emit the value of the constant as an expression, but that function only handles complex/compound expressions and not the case of simple named values (e.g., constants like `a`).
The intention is for code to call `emitIROperand()` instead, and let it decide whether to emit an expression or a named reference using its own decision-making. The `IRGlobalConstant` case really just wants to pass in the "mode" flag it uses to influence that decision-making, but shouldn't be working around it.
This change just replaces the `emitIRInstExp()` call with `emitIROpernad()` and adds a test case to confirm that this fixes the reported problem.
* Fixups for bugs in previous change
The first problem was that certain instruction ops were being special-cased to opt out of "folding" into expressions *before* we make the universal check to always fold when inside an initializer for a global constant.
The second problem is that the `emitIROperand()` logic was always putting expressions around sub-expressions, which breaks parsing when the sub-expression is an initializer list (`{...}`). This fixup is pretty much a hack, but will be something we can remove once we don't emit unncessary parentheses overall, which is a better fix.
|
|
By default, when writing a "method" (aka "member function") in Slang, the `this` parameter is implicitly an `in` parameter. So this:
```hlsl
struct Foo
{
int state;
int getState() { return state; }
void setState(int s) { state = s; }
};
```
is desugared into something like this:
```hlsl
struct Foo { int state };
int Foo_getState(Foo this) { return this.state; }
// BAD:
void Foo_setState(Foo this, int s) { this.state = s; }
```
That "setter" doesn't really do what was intended. It modifies a local copy of type `Foo`, because `in` parameters in HLSL represent by-value copy-in semantics, and are mutable in the body the function. Slang was updated to give a static error on the original code to catch this kind of mistake (so that `this` parameters are unlike ordinary function parameters, and no longer mutable).
Of course, sometimes users *want* a mutable `this` parameter. Rather than make a mutable `this` the default (there are arguments both for and against this), this change adds a new attribute `[mutating]` that can be put on a method (member function) to indicate that its `this` parameter should be an `in out` parameter:
```hlsl
[mutating] void setState(int s) { state = s; }
```
The above will translate to, more or less:
```hlsl
void Foo_setState(inout Foo this, int s) { this.state = s; }
```
One added detail is that `[mutating]` can also be used on interface requirements, with the same semantics. A `[mutating]` requirement can be satisfied with a `[mutating]` or non-`[mutating]` method, while a non-`[mutating]` requirement can't be satisfied with a `[mutating]` method (the call sites would not expect mutation to happen).
The design of `[mutating]` here is heavily influenced by the equivalent `mutating` keyword in Swift.
Notes on the implementation:
* Adding the new attribute was straightforward using the existing support, but I had to change around where attributes get checked in the overall sequencing of static checks, because attributes were being checked *after* function bodies, but with this change I need to look at semantically-checked attributes to determine the mutability of `this`
* The check to restrict it so that `[mutating]` methods cannot satisfy non-`[mutating]` requirements was easy to add, but it points out the fact that there is a huge TODO comment where the actual checking of method *signatures* is supposed to happen. That is a bug waiting to bite users and needs to be fixed!
* While we had special-case logic to detect attempts to modify state accessed through an immutable `this` (e.g., `this.state = s`), that logic didn't trigger when the mutation happened through a function/operator call (e.g., `this.state += s`), so this change factors out the validation logic for that case and calls through to it from both the assignment and `out` argument cases.
* The error message for the special-case check was updated to note that the user could apply `[mutating]` to their function declaration to get rid of the error.
* The semantic checking logic for an explicit `this` expression was already walking up through the scopes (created during parsing) and looking for a scope that represents an outer type declaration that `this` might be referring to. We simply extend it to note when it passes through the scope for a function or similar declaration (`FunctionDeclBase`) and check for the `[mutating]` attribute. If the attribute is seen, it returns a mutable `this` expression, and otherwise leaves it immutable.
* The IR lowering logic then needed to be updated so that when adding an IR-level parameter to represent `this`, it gives it the appropriate "direction" based on the attributes of the function declaration being lowered. The rest of the IR logic works as-is, because it will treat `this` just like an other parameter (whether it is `in` or `inout`).
* This biggest chunk of work was the "implicit `this`" case, because ordinary name lookup may resolve an expression like `state` into `this.state`, so that the `this` expression comes out of "thin air." To handle this case, I extended the structure of the "breadcrumbs" that come along with a lookup result (the breadcrumbs are used for any case where a single identifier like `state` needs to be embellished to a more complex expression as a result of lookup), so that it can identify whether a `Breadcrumb::Kind::This` node comes from a `[mutating]` context or not. Similar to the logic for an explicit `this`, we handle this by noting when we pass through a `FunctionDeclBase` when moving up through scopes, and look for the `[mutating]` attribute on it. The rest of the work was just plumbing the additional state through.
|
|
* Move to newer glslang
* Support cross-compilation of ray tracing shaders to Vulkan
This change allows HLSL shaders authored for DirectX Raytracing (DXR) to be cross-compiled to run with the experimental `GL_NVX_raytracing` extension (aka "VKRay").
* The GLSL extension spec is marked as experimental, so that any shaders written using this support should be ready for breaking changes when the spec is finalized.
* "Callable shaders" are not exposed throug the GLSL extension, so this feature of DXR will not be cross-compiled.
* The experimental Vulkan raytracing extension does not have an equivalent to DXR's "local root signature" concept. This does not visibly impact shader translation (because the local/global root signature mapping is handled outside of the HLSL code), but in practice it means that applications which rely on local root signatures on their DXR path will not be able to use the translation in this change as-is; more work will be needed.
The simplest part of the implementation was to go into the Slang standard library and start adding GLSL translations for the various DXR operations.
In some cases, like mapping `IgnoreHit()` to `ignoreIntersectionNVX()` this is almost trivial.
The various functions to query system-provided values (e.g., `RayTMin()`) were also easy, with the only gotcha being that they map to variables rather than function calls in GLSL, and our handling of `__target_intrinsic` assumes that a bare identifier represents a replacement function name, and not a full expression, so we have to wrap these definitions in parentheses.
The tricky operations are then `TraceRay<P>()` and `ReportHit<A>()`, because these two are generics/templates in HLSL.
GLSL doesn't support generics, even for "standard library" functions, so the raytracing extension implements a slightly complex workaround: the matching operations `traceNVX()` and `reportIntersectionNVX()` pass the payload/attributes argument data via a global variable.
That is, shader code for the GLSL extensions writes to the global variable and then calls the intrinsic function.
The linkage between the call site and the global is established by a modifier keyword (`rayPayloadNVX` and `hitAttributeNVX`, respectively) and in the case of ray payload also uses `location` number to identify which payload global to use (since a single shader can trace rays with multiple payload types).
Our translation strategy in Slang tries to leverage standard language mechanisms instead of special-case logic.
For example, to translate the `ReportHit<A>()` function, we provide both a default declaration that will work for HLSL (where the operation is built-in with the signature given), and a *definition* marked with the `__specialized_for_target(glsl)` modifier.
The GLSL definition declares a function `static` variable that will fill the role of the required global, and then does what the GLSL spec requires: assigns to the global, and then calls the `reportIntersectionNVX` builtin (which we declare as a separate builtin).
Our ordinary lowering process will turn that `static` variable into an ordinary global in the IR, and the `[__vulkanHitAttributes]` attribute on the variable will be emitted as `hitAttributeNVX` in the output.
There is no additional cross-compilation logic in Slang specific to `ReportHit<A>()` - the target-specific definition in the standard library Just Works.
The case for `TraceRay<P>()` is a bit more complicated, simply because the GLSL `traceNVX()` function needs to be passed the `location` for the payload global.
We implement the payload global as a function-`static` variable, with the knowledge that every unique specialization of `TraceRay<P>()` will generate a unique global variable of type `P` to implement our function-`static` variable.
We then add a slightly magical builtin function `__rayPayloadLocation()` that can map such a variable to its generated `location`; the logic for this is implemented in `emit.cpp` and described below.
We also changed the `RayDesc` and `BuiltinTriangleIntersectionAttributes` types from "magic" intrinsic types over to ordinary types (because the GLSL output needs to declare them as ordinary `struct` types).
This ends up removing some cases in the AST and IR type representations.
By itself this change would break HLSL emit, because in that case the types really are intrinsic.
We added a `__target_intrinsic` modifier to these types to make them intrinsic for HLSL, and then updated the downstream passes to handle the notion of target-intrinsic types.
The logic for binding/layout of entry point inputs and outputs was updated so that raytracing stages don't follow the default logic for varying input/output parameters.
This is because the input/output parameters of a raytracing entry point aren't really "varying" in the same sense as those in the rasterization pipeline.
In particular, the SPIR-V model for raytracing input and output treats "ray payload" and "hit attributes" parameters as being in a distinct storage class from `in` or `out` parameters.
We also detect cases where a ray tracing stage declares inputs/outputs that it shouldn't have. This logic could conceivably be extended to other stages (e.g., to give an error on a compute shader with user-defined varying input/output).
The type layout logic added cases for handling raytracing payload and hit-attribute data, but this is currently just a stub implementation that follows the same logic as for varying `in` and `out` parameters (it cannot give meaningful byte sizes/offsets right now).
To my knowledge the GLSL spec doesn't currently specify anything about layout, and I haven't read the DXR spec language carefully enough to know what it says about layout.
A future change should update the layout logic to allow for byte-based layout of ray payloads, etc. so that we can query this information via reflection.
The GLSL legalization logic in `ir.cpp` was updated to factor out the per-entry-point-parameter code into its own function, and then that function was updated to special-case the input/output of a ray-tracing shader.
While for rasterization stages we typically want to take the user-declared input/output and "scalarize" it for use in GLSL (in part to deal with language limitations, and in part to tease system values apart from user-defined input/output), the GLSL spec for raytracing requires payload and hit attribute parameters to be declared as single variables. There is also the issue that even for an `in out` parameter, a ray payload parameter should only turn into a single global, whereas the handling for varying `in out` parameters generates both an `in` and an `out` global for the GLSL case.
Other than the handling of entry point parameters, the GLSL legalization pass doesn't need to do anything special for ray tracing shaders.
The trickiest change in the `emit.cpp` logic is that we now generate `location`s for ray payload arguments (the outgoing from a `TraceRay()` call) on demand during code generation.
This is a bit hacky, and it would be nice to handle it as a separate pass on the IR rather than clutter up the emit logic, but this approach was expedient.
Basically, any of the global variables that got generated from the `static` declarations in the standard library implementation of `TraceRay()` will trigger the logic to assign them a `location`.
The logic for emitting intrinsic operations added a few new `$`-based escape sequences. The `$XP` case handles emitting the location of a generated ray payload variable; this is how we emit the matching location at the site where we call `traceNVX`. The `$XT` case emits the appropriate translation for `RayTCurrent()` in HLSL, because it maps to something different depending on the target stage.
All of the test cases here consist of a pair of an HLSL/Slang shader written to the DXR spec, plus a matching GLSL shader for a baseline.
The GLSL shaders are carefully designed so that when fed into glslang they will produce the same SPIR-V as our cross-compilation process.
This kind of testing is quite fragile, but it seems to be the best we can do until our testing framework code supports *both* DXR and VKRay.
A bunch of the core changes ended up being blocked on issues in the rest of the compiler, so some additional features go implemented or fixed along the way:
The first big wall this work ran into was that the `__specialized_for_target` modifier hasn't actually been working correctly for a while.
It turns out that for the one function that is using it, `saturate()`, we have been outputting the workaround GLSL function in *all* cases (including for HLSL output) rather than only on GLSL targets.
The problem here is that for a generic function with a `__specialized_for_target` modifier or a `__target_intrinsic` modifier, the IR-level decoration will end up attached to the `IRFunc` instruction nested in the `IRGeneric`, but the logic for comparing IR declarations to see which is more specialized (via `getTargetSpecializationLevel()`) was looking only at decorations on the top-level value (the generic).
The quick (hacky) fix here is to make `getTargetSpecializationLevel()` try to look at the return value of a generic rather than the generic itself, so that it can see the decorations that indicate target-specific functions.
A more refined fix would be to attach target-specificity decorations to the outer-most generic (to simplify the "linking" logic).
The only reason not to fold that into the current fix is that the `__target_intrinsic` modifier currently serves double-duty as a marker of target specialization *and* information to drive emit logic. The latter (the emit-related stuff) currently needs to live on the `IRFunc`, and moving it to the generic could easily break a lot of code.
This needs more work in a follow-on fix, but for now target specialization should again be working.
The other big gotcha that the simple "just use the standard library" strategy ran into was that function-`static` variables weren't actually implemented yet, and in particular function-`static` variables inside of generic functions required some careful coding.
The logic in `lower-to-ir.cpp` has this `emitOuterGenerics()` function that is supposed to take a declaration that might be nested inside of zero or more levels of AST generics, and emit corresponding IR generics for all those levels.
This is needed because two different AST functions nested inside a single generic `struct` declaration should turn into distinct `IRFunc`s nested in distinct `IRGeneric`s.
The tricky bit to making that all work is that the same AST-level generic type parameter will then map to *different* IR-level instructions (the parameters of distinct `IRGeneric`s) when lowering each function.
The existing logic handled this in an idiomatic way by making "sub-builders" and "sub-contexts."
This change refactors some of the repeated logic into a `NestedContext` type to help simplify the pattern, and applies it consistently throughout the `lower-to-ir.cpp` file.
Besides that cleanup, the major change is `lowerFunctionStaticVarDecl` which, unsurprisingly, handles lower of function-`static` variables to IR globals.
The careful handling of nested contexts here is needed because if we are in the middle of lowering a generic function, then a `static` variable should turn into its *own* `IRGeneric` wrapping an `IRGlobalVar`. The body of the function should refer to the global variable by specializing the global variable's `IRGeneric` to the parameters of the *functions* `IRGeneric`. This tricky detail is handled by `defaultSpecializeOuterGenerics`.
An additional subtlety not actually required for this raytracing work (and thus not properly tested right now) is handling function-`static` variables with initializers.
These can't just be lowered to globals with initializers, because HLSL follows the C rule that function-`static` variables are initialized when the declaration statement is first executed (and this could be visible in the presence of side-effects).
The lowering strategy here translates any `static` variable with an initializer into *two* globals: one for the actual storage, plus a second `bool` variable to track whether it has been initialized yet.
There are some opportunities to optimize this case, especially for `static const` data, but that will need to wait for future changes.
We've slowly been shifting away from the model where a user thinks of a "profile" as including both a stage and a feature level.
Instead, the user should think about selecting a profile that only describes a feature level (e.g., `sm_6_1`, `glsl_450`, etc.), and then separately specifying a stage (`vertex`, `raygeneration, etc.) for each entry point.
The challenge here is that the command-line processing still only had a single `-profile` switch, and no way to specify the stage.
Adding the `-stage` option was relatively easy, but making it work with the existing validation logic for command-line arguments was tricky, because of the complex model that `slangc` supports for compiling multiple entry points in a single pass.
* In `slang.h` add new reflection parameter categories for ray payloads and hit attributes, as part of entry point input/output signatures.
* A previous change already updated our copy of glslang to one that supports the `GL_NVX_raytracing` extension, so in `slang-glslang.cpp` we just needed to map Slang's `enum` values for the raytracing stage names to their equivalents in the glslang code.
* Moved the logic for looking up a stage by name (`findStageByName()`) out of `check.cpp` and into `compiler.cpp`, with a declaration in `profile.h`
* Added a `$z` suffix to the GLSL translation of `Texture*.SampleLevel()`, to handle cases where the texture element type is not a 4-component vector. Note that this fix should actually be applied to *all* these texture-sampling operations, but I didn't want to add a bunch of changes that are (clearly) not being tested right now.
* The layout logic for entry points was updated to correctly skip producing a `TypeLayout` for an entry point result of type `void`, which meant that the related emit logic now needs to guard against a null value for the result layout.
* In `ir.cpp`, dump decorations on every instruction instead of just selected ones, so that our IR dump output is more complete.
* Added a command-line `-line-directive-mode` option so that we can easily turn off `#line` directives in the output when debugging. Not all cases where plumbed through because the `none` case is realistically the most important.
* Parser was fixed to properly initialize parent links for "scope" declarations used for statements, so that we can walk backwards from a function-scope variable (including a `static`) and see the outer function/generics/etc.
* Added GLSL 460 profile, since it is required for ray tracing. Also updated the logic for computing the "effective" profile to use to recognize that GLSL raytracing stages require GLSL 460.
* Added some conventional ray-tracing shader suffixes to the handling in `slang-test`. This code isn't actually used, but was relevant when I started by copy-pasting some existing VKRay shaders as the starting point for my testing.
* Fixup: typos
|
|
* * Remove the need for IRHighLevelDecoration in Emit
* Use the IRLayoutDecoration for GeometryShaderPrimitiveTypeModifier
* Fixing problems with comparison due to naming differences with slang/fxc.
|
|
* Update glslang version
* Fix build for new glslang
The latest glslang required a few changes to our manual build for their code (because we are *not* taking a dependency on CMake).
* Rebuild project files using premake, which picks up a few files added to glslang, but also a few diffs in Slang's own project files in cases where they were edited manually instead of using premake.
* Fix up the declaration our our device limits (which are inentionally set to *not* limit what code passes through our glslang), because the underlying structure definition in glslang has changed. This is a kludgy bit of glslang's design, but it doesn't make sense for us to invest in a more serious workaround.
* Remove the "hack sampler" workaround
When the `GL_KHR_vulkan_glsl` spec was introduced to allow GLSL to be compiled for Vulkan SPIR-V, it made an annoying mistake by leaving a few builtins as taking `sampler2D`, etc. when the equivalent SPIR-V operations only require a `texture2D`, etc. The relevant builtins are:
* `textureSize`
* `textureQueryLevels`
* `textureSamples`
* `texelFetch`
* `texelFetchOffset`
This means that shader code that wanted to use those operations needed to conspire to have a `sampler` handy so they could write, e.g.:
```glsl
vec4 val = texelFetch(sampler2D(myTexture, someRandomSampler), p, lod);
```
when what they really wanted was this:
```glsl
vec4 val = texelFetch(myTexture, p, lod);
```
That is annoying but probably something each to work around for a GLSL programmer, but when cross-compiling from HLSL, you might have an operation like:
```hlsl
float4 val = myTexure.Load(p);
```
in which case a cross-compiler needs to manufacture a sampler out of thin air. If the shader happened to use a sampler for something else you could snag that, but in the worse case you had to cross-compile to GLSL that declared a new sampler.
Slang did this by declaring a sampler called `SLANG_hack_samplerForTexelFetch` (because `texelFetch` is the operation that first surfaced the issue). For complex reasons we *always* define this sampler, even if we turn out not to need it in a particular output kernel. This choice has a bunch of annoying consequences:
* There is *always* a sampler defined in descriptor set zero, because that's where we put the hack sampler, so a user-defined parameter block always has a set number of 1 or greater (see #646).
* The hack sampler shows up in reflection output because users need to size their descriptor sets appropriately to pass along this sampler that won't actually be used if they don't want to get debug spew from the validation layers.
We filed an issue on glslang about this problem, and eventually some kind folks from the gamedev community (who also saw the same problem) defined an extension spec (`GL_EXT_samplerless_texture_functions`) to fix the underlying issue and contributed a patch to glslang to make it support that extension.
This change just backs the hack out of Slang now that we have a glslang version that supports the extension to get past the defect in the original GLSL-for-Vulkan definition. Besides yanking out the code for the hack, we also change the relevant builtins to declare that they require this new GLSL extension (so that we properly request it from glslang when the builtins are used), and fix some reflection test cases that exposed the existence of the "hack sampler."
* Fixup: syntax error in stdlib generator files
* Remove more code for hack sampler
There was logic to ensure we always have a "default" register space/set when cross-compiling, because the hack sampler would need it. This is no longer necessary once we remove the hack sampler.
* Fix expected test output.
Fixing the root cause of issue #646 means that one of our test cases that tickles that issue now produces different output (luckily it can now be used as a regression test for the issue).
|
|
* * Added support for strings in IR with IRStringLit - with storage of chars after it
* Added kIRDecorationOp_Transitory - can be used for detecting instructions constructed on stack
* Made IRConstant hashing work off type
* Fix comment that is out of date about how an instruction is determines to hold a transitory string.
|
|
This can mask an error when the user either typos a macro name when writing a conditional, or (as was the case for the user who pointed out this issue) they mistakenly assume that a `#define` in an `import`ed file has been made visible to them.
This change just adds the warning in the obvious place, with a test code to ensure it triggers.
|
|
* Improve diagnostic messages for function redefinition
The front-end was using internal "not implemented" errors instead of friendly user-facing errors to handle:
* Redefinition of a function (same signature and both have bodies)
* Multiple function declarations/definitions with the same parameter signature, but differnet return types
This change simply turns both of these into reasonably friendly errors that explain what went wrong and point to the previous definition/declaration as appropriate.
* Add support for detecting #pragma directives and handling them
The logic here mirrors what was set up for preprocessor directives, just for "sub-directives" in this case.
The only case here is the default one, which now reports a warning for directives we don't understand.
* Add basic support for #pragma once
Fixes #494
The approach here is simplistic in the extreme. When we see a `#pragma once` directive, we put the current file path (the location of the `#pragma` directive, as reported by our source manager) into a set, and then any paths in that set are ignored by subsequent `#include` directives.
This should work for simple cases of `#pragma once`, but it is likely to fail in a variety of cases because our filesystem layer currently makes no attempt to normalize/canonicalize paths. Improving the robustness of the solution is left to future work.
This change includes a simple test case to confirm that a second `#include` of a file with a `#pragma once` is successfully ignored.
|
|
* Support for attributed [[vk::push_constant]] and [[push_constant]]. Can also use layout(push_constant).
* Fix test so matches the expected output.
* Add expected output to binding-push-constant-gl.hlsl
* Trivial change to force travis rebuild to test the gcc linux build really has a problem.
|
|
Fixes #627
The front-end has support for `RasterizerOrderedBuffer` and `RasterizerOrderedTexture*`, but left out support for:
* `RasterizerOrderedByteAddressBuffer`
* `RasterizerOrderedStructuredBuffer`
[Nitpick: these tyeps are all amazingly annoying to type. It is easy to want to write `RasterOrdered` instead of the bulkier `RasterizerOrdered`, and almost everybody does in casual speech. There's already the issue of wanting to type `StructureBuffer` (a buffer of structures) instead of `StructuredBuffer` (a buffer that is... structured?). Then you have `ByteAddressBuffer` which is just adding to the confusion because it is nominally a "byte addressable" buffer (so that `ByteAddressedBuffer` would actually make sense), but then actually *isn't* byte addressable in practice.]
There were a few `TODO` comments related to this already, and this change was mostly a matter of doing a find-in-files for `RWByteAddressBuffer` and `RWStructuredBuffer` and adding matching `RasterizerOrdered` cases.
The test I added just checks that these types make it through the front-end, and doesn't do any actual confirmation that they work as intended. It is worth noting that the handling of ordering in GLSL/VK is different from in HLSL ("pixel shader interlock" instead of "rasterizer ordered views"), so coming up with a cross-compilation story would need to be a later step.
|
|
* Typo fix, and added dxc to command line documentation.
* Fix small typos.
Added support for Scope to lexer.
Fix bug in Token ctor.
* Add support for attribute names that are scoped.
* Added GLSLBindingAttribute. Make binding work through core.met.slang.
* Allow [[gl::binding(binding, set)]]
[[vk::binding(binding,set)]]
|
|
* Fix typo OuptutTopologyAttribute -> OutputTopologyAttribute
First pass support for handing tesselation shaders - domain and hull.
* Added attribute PatchConstantFuncAttribute
* Added visitHLSLPatchType(HLSLPatchType* type) such that the patch type template parameters are handled
* Added IRNotePatchConstantFunc - such that the patch constant function is referenced within IR
* Added support for outputing typical tesselation attributes (although minimal validation is performed)
* Added findFunctionDeclByName
* Small improvements to diagnostic.
* Improved diagnostics and checking for geometry shader attributes.
* Added diagnostic if patchconstantfunc is not found
Handle assert failure when outputing a domain shader alone and therefore attr->patchConstantFuncDecl is not set.
* Simple script tess.hlsl to test out domain/hull shaders.
* Added url for where hull shader attributes are defined.
* Fix unsigned/signed comparison warning.
* Restore removal of fix in "Improve generic argument inference for builtins (#598)"
* Update tessellation test case to compare against fxc
The test was previously comparing against fixed expected DXBC output, but this caused problems when the test runner tried to execute the test on Linux (where there is no fxc to invoke...), and would also be a potential source of problems down the road if different users run using different builds of fxc.
The simple solution here is to convert the test to compare against fxc output generated on the fly. That test type is already filtered out on non-Windows builds, so it eliminates the portability issue (in a crude way).
I also changed the test to compile both entry points in one compiler invocation, just to streamline things into fewer distinct tests.
* Eliminate unnecessary call to `lowerFuncDecl`
In a very obscure case this could cause a bug, if the patch-constant function had somehow already been lowered (because it was called somewhere else in the code).
The call should not be needed because `ensureDecl` will lower a declaration on-demand if required, so eliminating it causes no problems for code that wouldn't be in that extreme corner case.
|
|
Fixes #487
The basic problem here is that the user writes something like:
```hlsl
float invSqrt2 = 1 / sqrt(2);
```
In this case the user knows that `sqrt()` is only defined for floating-point types, so they expect this to compile something like:
```hlsl
float invSqrt2 = float(1) / sqrt(float(2));
```
The challenge this creates for the Slang compiler is that we use generics to streamline our declarations of all the builtins, so that the scalar `sqrt()` function is actually declared as:
```hlsl
T sqrt<T:__BuiltinFloatingPointType>(T value);
```
The `__BuiltinFloatingPointType` is an `interface` defined as part of the standard library, such that only built-in floating-point types conform to it (that is, `half`, `float`, and `double`).
When generic argument inference applies to a call like `sqrt(2)`, we see an argument of type `int`, and try to infer `T=int`, which leads to a failure because `int` does not conform to `__BuiltinFloatingPointType`.
The point where this currently fails in in the logic to "join" two types for inference, which is supposed to pick the best type that can represent both of two input types. E.g., a join between `float` and `int3` would be `float3`, since both of those types can convert to it, and it is the "minimal" type with that property.
So, the goal here is simple: we want a "join" between `int` and `__BuiltinFloatingPointType` to yield the `float` type. The way we handle that in this change is to special case the join of a basic scalar type and an interface, by enumerating all the basic scalar types, filtering them for ones that support the chosen interface and can be implicitly converted from the argument type, and then picking the "best" of them (the comments in the code explain what "best" means in this context).
The technique used here could be generalized in the future to deal with user-defined types or more cases, but that would risk slowing down overload resolution even more, which is already the most expensive part of our semantic checking pass.
A test case has been added for the specific case of `sqrt()` applied to an `int` argument.
|
|
Slang `enum` declarations will always be scoped, e.g.:
```hlsl
enum Color
{
Red,
Green = 2,
Blue,
}
Color c = Color.Red; // Not just `Red`
```
A user can write `enum class` as a placebo for now (to ease sharing of headers with C++).
Slang does not currently support the `::` operator for static member lookup, so it must be `Color.Green` and not `Color::Green`. Support for `::` as an alternate syntax could be added later if there is strong user demand.
An `enum` type can have a declared "tag type" using syntax like C++ `enum class`:
```hlsl
enum MyThings : uint
{
First = 0,
// ...
}
```
The `enum` cases will store their values using that type. An `enum` that doesn't declare a tag type will use the type `int` by default.
Enum cases are assigned values just like in C/C++: cases can have explicit values, but otherwise default to one more than the previous case, or zero for the first case.
All `enum` types will automatically conform to a standard-library `interface` called `__EnumType`, which is used so that basic operators like equality testing can be defined generically for all `enum` types.
This change only adds one operator at first (the `==` comparison), but other should be added later.
An `enum` case needs to be explicitly converted to an integer where needed (e.g., `int(Color.Red)`).
This is implemented by having the main integer types (`int` and `uint`) support built-in initializers that can work for *any* `enum` type (or rather, anything conforming to `__EnumType`).
Eventually these will be restricted so that an `enum` type can only be converted to its associated tag type.
IR code generation completely eliminates `enum` types and their cases.
The `enum` type will be replaced with its tag type, and the cases will be replaced with the tag values.
Currently this could leave some mess in the IR where cast operations are applied between values that actually have the same type.
|
|
* Fix atomic operations on RWBuffer
An earlier change added support for passing true pointers to `__ref` parameters to fix the global `Interlocked*()` functions when applied to `groupshared` variables or `RWStructureBuffer<T>` elements.
That change didn't apply to `RWBuffer<T>` or `RWTexture2D<T>`, etc. because those types had so far only declared `get` and `set` accessors, but not any `ref` accessors (which return a pointer).
The main fixes here are:
* Add `ref` accessors to the subscript oeprations on the `RW*` resource types
* Adjust the logic for emitting calls to subscript accessors so that we don't get quite as eager about invoking a `ref` accessor, and instead try to invoke just a `get` or `set` accessor when these will suffice. This is important for Vulkan cross-compilation, where we don't yet support the semantics of our `ref` accessors.
* Add a test case for atomics on a `RWBuffer`
* Fix up `render-test` so that we can specify a format for a buffer resource, which allows us to use things other than `*StructuredBuffer` and `*ByteAddressBuffer`. The work there is probably not complete; I just did what I could to get the test working.
* A bunch of files got whitespace edits thanks to the fact that I'm using editorconfig and others on the project seemingly arent...
* fixup: remove ifdefed-out code
|