| Commit message (Collapse) | Author | Age |
| |
|
|
|
|
|
|
|
|
|
|
| |
* Prefixing source files in source/slang with slang-
* Prefix source in source/slang with slang- prefix.
* Rename core source files with slang- prefix.
* Update project files.
* Fix problems from automatic merge.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* A few changes required for application adoption of interface-type parameters
There are a few small changes here that are all related in that they arose from trying to integrate support for specialization via global interface-type shader parameters into a real application.
Allow querying the "pending" layout via reflection API
------------------------------------------------------
The naming here isn't ideal, and could probably use a round of "bikeshedding" to arrive at something better, but the basic idea is that when you have a type like:
```
struct MyStuff
{
int a;
IFoo foo;
int b;
}
```
the fields `a` and `b` get allocated space directly in the "primary" layout for `MyStuff` (at offsets 0 and 4, with `sizeof(MyStuff) == 8`), but the `foo` field can't be allocated space until we know what concrete type will get plugged in there.
If we have a concrete type in mind:
```
struct Bar : IFoo { int bar; }
```
then we can know how much space the `foo` field will take up, but we still can't allocate it space directly in `MyStuff`, because we already decided that `sizeof(MyStuff) == 8`.
Now imagine we place some `MyStuff` values into constant buffers:
```
cbuffer X {
MyStuff x;
}
cbuffer Y {
MyStuff y;
float4 z;
}
```
In each case we know that we want to place the `MyStuff::foo` field at the end of the containing constant buffer so that it doesn't disrupt the layout of the existing fields. But that means that the offset of `MyStuff::foo` relative to the start of the `MyStuff` isn't fixed, because of unrelated fields like `z` that need to get in between.
In our layout code, we handle this by having a notion of a "pending" layout. Once we know how `MyStuff::foo` will be specialized, we can compute both a "primary" and a "pending" layout for `MyStuff`, which basically treats it as if it were two distinct types:
```
struct MyStuff_Primary
{
int a;
int b;
}
struct MyStuff_Pending
{
Bar foo;
}
```
Layout for an aggregate type like the `X` or `Y` constant buffer then proceeds by computing an aggregate primary layout and an aggregate pending layout, and then finally a constant buffer or parameter block "flushes" all or part of the pending data by appending it to the primary data to get the final layout.
What all this means is that a type like `MyStuff` will have two different layouts (a default one for the primary data and a "pending" one for any specialized interface-type fields), and a variable like `Y::y` will also have two variable layouts that specify offsets (one set of offsets for its primary part, and one set of offsets for its pending part).
In order to handle interface-type fields with these layout rules, an application needs a way to query the "pending" part of a type or variable layout, which luckily gives it back just another type/variable layout. The API change here is minimal, although actually exploiting the new API correctly in application code could prove challenging.
Allow creating of explicitly specialized types
----------------------------------------------
This feature isn't actually implemented all the way through the compiler (I just needed enough to make the API calls go through), but I've added support for specializing a type that has interface-type fields through the reflection API. This maps to an `ExistentialSpecializedType` in the AST, and I'm lowering it to the IR as a `BindExistentialsType`, although that isn't 100% correct for the future.
This feature will require a future PR to actually flesh out the implementation work, but I'll wait until that is the sticking point on the application side before I do that.
Introduce a tiny `Hasher` abstraction
-------------------------------------
While implementing all the boilerplate for a new `Type` subclass (we really need to reduce that work...), I got fed up with how we do hash-code computation and introduced a small utility `Hasher` type that is intended to wrap up the idiom of combining hashes. For now this isn't a major change, but in the future I'd like to expand on the design a bit to clean up some of the warts around how we handle hashing:
* The `Hasher` implementation can and should switch from maintaining a single `HashCode` as its state to something that contains a more complete state (larger than the hash code) and just hashes new bytes into that state as it goes. This should make it possible to implement a `Hasher` for more serious hash functions, whether MD5, CityHash, or whatever we decide is good default.
* Things that are hashable shouldn't have a `getHashCode()` method, but instead should have something like a `hashInto(Hasher&)` method. This change would have the dual benefits that (1) a composite type can easily hash all the fields that contribute to its identity into the hasher with minimal fuss/boilerplate, and (2) the hashes for composite types will be of higher quality because they can exploit all the bits of the hasher's state to combine the fields, instead of restricting each sub-field to just the bits in a hash code.
We should be able to incrementally improve the quality of our design there over future changes, but for now it probably isn't a critical priority.
Fixes for legalization of existential types
-------------------------------------------
There were some missing cases in the handling of type legalization, such that a global interface-type shader parameter that got specialized to a type that contains *only* resource-type fields would cause a crash in the legalization step.
I added a test for this case, and then made `ir-legalize-types.cpp` account for this case (the code to handle it ias a bit of a kludge, and shows that the `declareVars()` routine there is getting to a level of complexity that is worrying.
* fixup: review feedback
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* List made members m_
Tweaked types to closer match conventions.
* Use asserts for checking conditions on List.
Other small improvements.
* List<T>.Count() -> getSize()
* List<T>
Add -> add
First -> getFirst
Last -> getLast
RemoveLast -> removeLast
ReleaseBuffer -> detachBuffer
GetArrayView -> getArrayView
* List<T>::
AddRange -> addRange
Capacity -> getCapacity
Insert -> insert
InsertRange -> insertRange
AddRange -> addRange
RemoveRange -> removeRange
RemoveAt -> removeAt
Remove -> remove
Reverse -> reverse
FastRemove -> fastRemove
FastRemoveAt -> fastRemoveAt
Clear -> clear
* List<T>
FreeBuffer -> _deallocateBuffer
Free -> clearAndDeallocate
SwapWith -> swapWith
* List<T>
SetSize -> setSize
Reserve -> reserve
GrowToSize growToSize
* UnsafeShrinkToSize -> unsafeShrinkToSize
Compress -> compress
FindLast -> findLastIndex
FindLast -> findLastIndex
Simplify Contains
* List<T>
Removed m_allocator (wasn't used)
Swap -> swapElements
Sort -> sort
Contains -> contains
ForEach -> forEach
QuickSort -> quickSort
InsertionSort -> insertionSort
BinarySearch -> binarySearch
Max -> calcMax
Min -> calcMin
* Initializer::Initialize -> initialize
List<T>::
Allocate -> _allocate
Init -> _init
IndexOf -> indexOf
* * Put #include <assert.h> in common.h, and remove unneeded inclusions
* Small refactor of ArrayView - remove stride as not used
* getSize -> getCount
setSize -> setCount
unsafeShrinkToSize->unsafeShrinkToCount
growToSize -> growToCount
m_size -> m_count
* Some tidy up around Allocator.
* Use Index type on List.
* Refactor of IntSet.
First tentative look at using Index.
* Made Index an Int
Did preliminary fixes.
Made String use Index.
* Partial refactor of String.
* String::Buffer -> getBuffer
ToWString -> toWString
* Small improvements to String.
String::
Buffer() -> getBuffer()
Equals() -> equals
* Try to use Index where appropriate.
* Fix warnings on windows x86 builds.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Add better control over image formats for GLSL/SPIR-V targets
Currently Slang emits GLSL code assuming all R/W images need to have explicit formats, and thus we try to infer a format from the element type of the image.
E.g., given a `RWTexture2D<half4>` we might infer that a qualifier of `layout(rgba16f)` should be used.
This strategy has two notable shortcomings:
* Sometimes the user will want a format that doesn't match an existing HLSL type. E.g., if they want the equivalent of `layout(r11f_g11f_b10f)`, then what should they put in their `RWTexture2D<...>` to make the inference do what they need?
* Sometimes the user knows that they don't need to specify a format *at all*, because using the `GL_EXT_shader_image_load_formatted` extension, they can still perform non-atomic load/store on images with no format specified in the SPIR-V.
This change adds two features directed at these challenges.
First, we add an explicit `[format(...)]` attribute that can be used to specify an explicit image format, including ones that don't match any HLSL type.
An example of using this new attribute is:
```hlsl
[format("r11f_g11f_b10f")]
RWTexture2D<float3> myImage;
```
For simplicity in initial bring-up, the new formats all use the same naming as formats in GLSL (this should make it easy for a programmer who knows what they expect to get in the GLSL output). We can change the naming convention for formats at a later time, so long as we keep these existing names in as a compatibility feature.
Note that this is *not* given a `vk::` prefix since the attribute should signal the programmer's intent to provide an image with that format on *all* targets (although only some targets might act on that information).
Also note that the attribute takes a string (`[format("rgba8")`) instead of a bare identifier (`[format(rgba8)]`) because this is consistent with the existing convention for attributes in HLSL.
When `[format(...)]` is left off, the default compiler behavior will still be to infer a format, but this behavior can be overidden for a single image using an explicit format of `"unknown"`:
```hlsl
[format("unknown")]
RWTexture2D<float4> mysteryMachine;
```
The second new feature is that if a user knows they are coding for a GPU that supports the `"unknown"` format in all non-atomic cases, then they can opt into making that the default for images without an explicit `[format(...)]`, using the new `-default-image-format-unknown` command-line option for `slangc`.
The new test case included with this change confirms that we correctly see the explicit formats in the output GLSL and *no* formats for images without explicit `[format(...)]` when using the new command-line option. The test stresses images declared at global scope, in parameter blocks, and in entry-point parameter lists, to try and make sure that all the relevant IR passes in the compiler preserve the format information.
* fixup: missing file
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Allow plugging in types with resources for interface parameters
The key feature enabled by this change is that you can take a shader declared with interface-type parameters:
```hlsl
ConstantBuffer<ILight> gLight;
float4 myShader(IMaterial material, ...)
{ ... }
```
and specialize its interface-type parameters to concrete type that can contain resources like textures, samplers, etc.
The hard part of doing this layout is that we need to support signatures that include a mix of interface and non-interface types. Imagine this contrived example:
```hlsl
float4 myShader(
Texture2D diffuseMap,
ILight light,
Texture2D specularMap)
{ ... }
```
We end up wanting `diffuseMap` to get `register(t0)` and `specularMap` to get `register(t1)`, so that they have the same location no matter what we plug in for `light`.
But if we plug in a concrete type for `light` that needs a texture register, we need to allocate it *somewhere*.
We handle this by having the `TypeLayout` for `light` come back with a "primary" type layout that doesn't have any texture registers, but with a "pending" type layout that includes the texture register requirements of whatever concrete type we plug in.
This split between "primary" and "pending" layout then needs to work its way up the hierarchy, so that an aggregate `struct` type with a mix of interface and non-interface fields (recursively), needs to compute an aggregate "primary type layout" and an aggregate "pending type layout," and then each field needs to be able to compute its offset in the primary/pending layout of the aggregate.
A large chunk of the work in this PR is then just implementing the split between primary and pending data, and ensuring that layouts are computed appropriately.
The next catch is that when a "parameter group" (either a parameter block or constant buffer) contains one or more values of interface type, then we can allow the parameter group to "mask" some of the resource usage of the concrete types we plug in, but others "bleed through."
For example, if we have:
```hlsl
struct MyStuff { float3 color; ILight light; }
ConstantBuffer<MyStuff> myStuff;
struct SpotLight { float3 position; Texture2D shadowMap; }
``
If we plug in the `SpotLight` type for `myStuff.light`, then the `float3` data for the light can be "masked" by the fact that we have a constant buffer (we can just allocate the `float3` `position` right after `color`), but the `Texture2D` needed for `shadowMap` needs to "bleed through" and become "pending" data for the `myStuff` shader parameter.
Adding support for that detail more or less required a full rewrite of the logic for allocating parameter group type layouts.
The next detail is that when we go to legalize a declaration like the `myStuff` buffer, we will end up with something like:
```hlsl
struct MyStuff_stripped { float3 color; }
struct Wrapped
{
MyStuff_stripped primary;
SpotLight pending;
}
ConstantBuffer<Wrapped> myStuff;
```
This "wrapped" version of the buffer type more accurately reflects the layout we need/want for the uniform/ordinary data, but in order to further legalize it and pull out the resource-type fields like `shadowMap` we need to have accurate layout information, and the problem is that layout information for the original buffer can't apply to this new "wrapped" buffer.
The last major piece of this change is logic that runs during existential type legalization to compute new layouts for "wrapped" buffers like these that embeds correct offset/binding/register information for any resources nested inside them. A key challenge in that code is that existential legalization needs to erase any "pending" data from the program entirely, so that offset information that used to be relatie to the "pending" part of a surrounding type now needs to be relative to the primary part.
The work here may not be 100% complete for all scenarios, but it does well enough on the new and existing tests that I want to checkpoint it. Note that a few other tests have had their output changed, but in all cases I've reviewed the diffs and determined that the change in observable behavior is consistent with what we intened Slang's behavior to be.
Note that there is still one major piece of support for interface-type parameters that is missing here, and which might force us to revisit some of the decisions in this code: we don't properly support user-defined `struct` types with interface-type fields.
* fixup: typos
|
| |
|
|
| |
The `tuple` case in `getPointedToType` was failing to add the elements it computed to the output tuple type.
This isn't triggered in any of our test cases, but was caught by some work I was doing in another branch.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before type legalization we might have code like: (using pseudo-Slang-IR):
struct P { ... Texture2D<float>[] t; }
global_param p : ParameterBlock<P>;
...
// p.t[someIndex].Load(...);
//
let ptrToArrayOfTextures = getFieldPtr(p, "t") : Ptr<Texture2D<float>[]>;
let ptrToTexture = getElementPtr(ptrToArrayOfTextures, someIndex) : Ptr<Texture2D<float>>;
let texture = load(ptrToTexture) : Texture2D<float>;
let result = call(loadFunc, texture, ...) : float;
Legalization needs to move the `t` array there out of the `p` parameter block, so the global declarations become something like:
struct P_Ordinary { ... }; // no more "t" field
global_param p_ordinary : ParameterBlock<p_ordinary);
global_param p_t : Texture2D<float>[];
In terms of the code to access `p.t[someIndex]` the problem is that `p_t` has one less level of indirection than `p.t` had. We solve this in the type legalization pass using "pseudo-types" and "pseudo-values," where one of the cases is `implicitIndirect` which holds a value of type `T`, but indicates that it should act like a value of type `T*`.
We then use some basic rules for dealing with `implicitIndirect` values, such as:
load(implicitDeref(x)) : T => x : T
getFieldPtr(implicitDeref(s), f) => implicitDeref(getField(s, f))
getElementPtr(implicitDeref(a), i) => implicitDeref(getElement(a, i))
The bug here was that for the `getFieldPtr` and `getElementPtr` cases, we weren't computing the type of the `getField` or `getElement` instruction correctly. We were copying the type from the `getFieldPtr` or `getElementPtr` operation over directly, but those will be *pointer* types and we need the type of whatever they point to.
Once the types are fixed, we can properly generate legalized IR for `p.t[someIndex].Load(...) that looks like:
let arrayOfTextures = p_t : Texture2D<float>[];
let texture = getElement(arrayOfTextures, someIndex) : Texture2D<float>;
let result = call(loadFunc, texture, ...) : float;
The old was giving the `texture` intermediate a type of `Ptr<Texure2D<float>>`. That didn't actually trip up too many things, because we mostly just went on to emit code from something with slightly incorrect types for intermediates that never show up in the generated HLSL/GLSL.
Where this caused a problem is for some of the intrinsic function definitions for the GLSL/Vulkan back-end, because those do things that inspect operand types. In particular the `$z` opcode in our intrinsic function strings triggers logic that looks at a texture operand, and uses its type to try to find the appropriate swizzle to get from a 4-component vector to the appropriate type for the operation (e.g., for a load from a `Texture2D<float>` we need to swizzle with `.x` to get a single scalar out of the matching GLSL texture fetch operation).
The main fix in this change is thus to make `getElementPtr` and `getFieldPtr` legalization properly account for the fact that when switching to `getElement` or `getField` we need a result type that is the "pointee" of the original result.
There was already logic to extract the pointed-to type from a pointer in `ir-specialize.cpp`, so I extracted that to a re-usable function in the IR as `tryGetPointedToType` (returns null if the type isn't actually a pointer).
This logic needed to be extended for type legalization, to deal with the various "pseudo-type" cases.
There is another fix in this change which is marking the `NonUniformResourceIndex` function as `[__readNone]`, which enables it to be more aggressively folded into use sites. Without that fix, we risk emitting code like:
```glsl
int tmp = nonUniformEXT(someIndex);
vec4 result = texelFetch(arrayOfTextures[tmp], ...);
```
The problem with that code is that (at least by my reading of the spec), assigning to the variable `tmp` that isn't declared with the `nonUniformEXT` qualifier effectively loses that qualifier, and drivers are free to assume that `tmp` is uniform when used to index into `arrayOfTextures`.
Marking the `NonUniformResourceIndex` function as `[__readNone]` indicates that it has no side effects, which should mean that our emit logic no longer needs to emit it was its own line of code to be safe.
The effects of this change are confirmed by both the new test case added, and the existing `non-uniform-indexing` test.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Improve support for interfaces as shader parameters
This change adds two main things over the existing support:
1. It is now possible to plug in concrete types that actually contain (uniform/ordinary) fields for the existential type parameters introduced by interface-type shader parameters. The `interface-shader-param2.slang` test shows that this works.
2. There is a limited amount of support for doing correct layout computation and generating output code that matches that layout, so that interface and ordinary-type fields can be interleaved to a limited extent. The `interface-shader-param3.slang` test confirms this behavior.
There are several moving pieces in the change.
* When it comes to terminology, we try to draw a more clear distinction between existial type parameters/arguments and existential/object value parametes/arguments. A simple way to look at it is that an `IFoo[3]` shader parameter introduces a single existential type parameter (so that a concrete type argument like `SomeThing` can be plugged in for the `IFoo`) but introduces three existential object/value parameters (to represent the concrete values for the array elements).
* At the IR level, we support a few new operations. A `BindExistentialsType` can take a type that is not itself an interface/existential type but which depends on interfaces/existentials (e.g., `ConstantBuffer<IFoo>`) and plug in the concrete types to be used for its existential type slots.
* Then a `wrapExistentials` instruction can take a type with all the existentials plugged in (possibly by `BindExistentialsType`) and wrap it into a value of the existential-using type (e.g., turn `ConstantBuffer<SomeThing>` into a `ConstantBuffer<IFoo>`).
* The IR passes for doing generic/existential specialization have been updated to be able to desugar uses of these new operations just enough so that a `ConstantBuffer<IFoo>` can be used.
* When we specialize an IR parameter of an interface type like `IFoo` based on a concrete type `SomeThing`, we turn the parameter into an `ExistentialBox<SomeThing>` to reflect the fact that we are conceptually referring to `SomeThing` indirectly (it shouldn't be factored into the layout of its surrounding type).
* Parameter binding was updated so that it passes along the bound existential type arguments in a `Program` or `EntryPoint` to type layout, so that we can take them into account. The type layout code needs to do a little work to pass the appropriate range of arguments along to sub-fields when computing layout for aggregate types.
* Type layout was updated to have a notion of "pending" items, which represent the concrete types of data that are logically being referenced by existential value slots. The basic idea is that these values aren't included in the layout of a type by default, but then they get "flushed" to come after all the non-existential-related data in a constant buffer, parameter block, etc.
* The logic for computing a parameter group (`ConstantBuffer` or `ParameterBlock`) layout was updated to always "flush" the pending items on the element type of the group, so that the resource usage of specialized existential slots would be taken into account.
* The type legalization pass has been adapted so that we can derive two different passes from it. One does resource-type legalization (which is all that the original pass did). The new pass uses the same basic machinery to legalize `ExistentialBox<T>` types by moving them out of their containing type(s), and then turning them into ordinary variables/parameters of type `T`.
Big things missing from this change include:
- Nothing is making sure that "pending" items at the global or entry-point level will get proper registers/bindings allocated to them. For the uniform case, all that matters in the current compiler is that we declare them in the right order in the output HLSL/GLSL, but for resources to be supported we will need to compute this layout information and start associating it with the existential/interface-type fields.
- Nothing is being done to support `BindExistentials<S, ...>` where `S` is a `struct` type that might have existential-type fields (or nested fields...). Eventually we need to desugar a type like this into a fresh `struct` type that has the same field keys as `S`, but with fields replaced by suitable `BindExistentials` as needed. (The hard part of this would seem to be computing which slots go to which fields). As a practial matter, this missing feature means that interface-type members of `cbuffer` declarations won't work.
The current tests carefully avoid both of these problems. They don't declare any buffer/texture fields in the concrete types, and they don't make use of `cbuffer` declarations or `ConstantBuffer`s over structure types with interface-type fields.
* fixup: add override to methods
* fixup: typos
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Replace dynamicCast with as where does not change behavior (ie not Type derived).
Use free function where scoping is clear.
* Replace uses of dynamicCast with as when there is no difference in behavior.
* Remove the IsXXXX methods from Type.
* Don't have separate smart pointer to store canonicalType on Type.
* Simplify Slang.FilteredMemberRefList.Adjust, such does the cast directly.
* Use free as where appropriate.
* Use free function version of casts where appropriate.
* Fix text in casting.md
* Fix typos in decl-refs.md
* Remove the uses of free function as on RefDecl.
Add 'canAs' to RefDecl as a way to test if a cast is possible.
Moved 'as' into RefDeclBase.
* Use 'is' to test for as cast on smart pointers.
Fix small scope issue.
* * Cache stringType and enumTypeType on the Session
* Make DeclRefType::Create return a RefPtr
* Make casting of result use the *method* .as (cos using free function would mean objects being wrongly destroyed)
* Make results from createInstance ref'd to avoid possible leaks.
* Fix typo in template parameter for is on RefPtr.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Made dynamicCast a free function.
* Replace As with as or dynamicCast depending on if it is a type.
* Fix problem with using non smart pointer cast.
* Removed legacy asXXXX methods.
* Remove As from Type.
* Removed As from Qual type -> made coercable into Type*, such that can just use free 'as'.
* Remove left over QualType::As() impl.
* Remove As from SyntaxNodeBase.
* Made as for instructions implemented by dynamicCast.
* Replace As on DeclRef. Use the global as<> to do the cast.
* Add const safe versions of dynamicCast and as for IRInst
|
| |
|
|
|
|
|
|
|
|
|
|
| |
* Fixup handling of empty structs in function return types and parameters.
* Bug fix in legalizeFunc()
* More comprehensive empty struct test
* Fix `legalizeFieldExtract` for empty struct field.
* Add empty struct handling for construct inst
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Specialize away resource-type function parameters
Work on #397.
Introduction
------------
Suppose a user writes a function that takes a resource type as a parameter:
```hlsl
float4 getThing(RWStructuredBuffer<float4> buffer, int index)
{
return buffer[index];
}
```
This function creates challenges when generating code for GLSL-based targets, because a global shader parameter of type `RWStructuredBuffer`:
```hlsl
RWStructuredBuffer<float4> gBuffer;
```
translates to a global GLSL `buffer` declaration:
```hlsl
buffer _S0
{
float4 _data[];
} gBuffer;
```
There is no equivalent to that `buffer` declaration that can be used in function parameter position, and it is illegal in GLSL to pass `gBuffer` into a function.
(Aside: yes, we could in principle translate a function parameter like `RWStructuredBuffer<float4> buffer` to `float4 buffer[]`, but that will not in turn generalize to arrays of structured buffers; it is a dead-end strategy)
The solution employed by many shader compilers is to "inline everything" to eliminate the need for parameters of resource types, and then rely on dataflow optimization to eliminate locals of resource types. This strategy can of course lead to an increase in code size, and it also means that call stacks are lost when doing step-through debugging. Another serious issue is that an "early `return`" from a function can turn into the equivalent of a multi-level `break` when inlined, and not all of our targets support multi-level `break`.
The solution implemented in this change works around some, but not all, of the problems with full inlining.
The approach here generates specialized versions of a function like `getThing`, adapted to the actual arguments provided at different call sites.
Thus if we have code like:
```hlsl
RWStructuredBuffer<float4> gA;
RWStructuredBuffer<float4> gB[10];
...
getThing(gA, x);
getThing(gA, y);
getThing(gB[someVal], z);
```
we will generate two specializations of `getThing`: one specialized for the `buffer` parameter being `gA` and the other for `gB`:
```hlsl
float4 getThing_gA(int index) { return gA[index]; }
float4 getThing_gB(int _val, int index) { return gB[_val][index]; }
```
and the call sites will change to match:
```hlsl
getThing_gA(x);
getThing_gA(y);
getThing_gB(someVal, z);
```
Note how in the case where the argument being passed in was obtained by indexing into an array of resources, the callee is specialized to the identity of the global shader parameter (`gB`), and now accepts a new parameter to indicate the array index into it.
While this description motivates the change based on GLSL output, the same basic issue can arise for other targets.
For example, while current HLSL has added the `ConstantBuffer<T>` type, it is not supported on older targets, and it turns out that even dxc does not allow functions to have `ConstantBuffer<T>` parameters.
Longer-term, we will likely need to do even more aggressive specialization both in order to generate SPIR-V output directly, and also to deal with function that have return values or `out` parameters of resource types.
Implementation
--------------
The meat of the change is in `ir-specialize-resources.{h,cpp}`, where we have a pass that looks at all call sites (`IRCall` instructions) in the program, and attempts to replace them with calls to specialized functions, where the specializations are generated on-demand.
The code in this pass is heavily commented, so hopefully it serves to explain itself all right.
After specialization is complete, we may still have functions like the original `getThing` that will produce invalid code when emitted as GLSL, so we need a way to make sure they don't appear in the output.
To date we've had some very ad hoc approaches for ignoring IR constructs that we don't want to affect emitted code, but this change goes ahead and adds a more real dead code elimination (DCE) pass in `ir-dce.{h,cpp}`.
This pass follows a straightforward approach of tagging instructions that are "live" and then propagating liveness through the whole program, before making a single pass to delete anything that isn't live.
When I first added the DCE pass it eliminated *everything* because there were no "roots" for liveness.
I solved this for now by adding a new decoration, `IREntryPointDecoration`, to mark shader entry points in the IR which should always be live (as should anything they depend on).
A secondary problem that arose was that for GLSL ray tracing shaders it is possible for the incoming/outgoing payload or attributes parameters to be unused, but eliminating them as dead would change the signature of a shader an potential break the rules for how ray tracing programs communicate.
I added a very simple `IRDependsOnDecoration` that allows one IR instruction to keep another alive *as if* it used it, without actually using it.
There's also a fixup in the IR dumping logic where I was forgetting to store anything in the mapping from instruction to their names, so that the name of an instruction was getting incremented each time it was referenced.
Testing
-------
There are three different tests added as part of this change:
* The `compute/func-resource-param` test covers the basic `RWStructuredBuffer` case above, which we expect to work fine for D3D11/12, but fail for Vulkan without specialization.
* The `cross-compile/func-resource-param-array` test covers the case where we don't just have one resource, but an array of them. This is not an end-to-end compute test primarily because our `render-test` application doesn't yet handle arrays of resources correctly in its binding logic.
* The `compute/func-cbuffer-param` test covers the case of a function with a `ConstantBuffer<T>` parameter, which requires specialization to become valid for any of our targets.
* fixup: warnings/errors from other compilers
* fixup: typos and cleanup
* fixup: typos
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before this change, global shader parameters were represented in the IR as just being ordinary global variables.
The only indication that a particular global represented a parameter was when it got a layotu attached to it (as part of back-end processing), and we've had a number of bugs related to layouts being dropped so that what should have been a shader parameter turned into an ordinary global variable in the output.
This change is more strongly motivated by the fact that making shader parameters look like globals means that we cannot easily reason about their value when doing IR transformations.
If we see two `load`s from the same global variable can we assume they yield the same value?
In the general case we cannot, and this means that any transformation that wants to rely on the fact that an input `Texture2D` shader parameter can't actually change over the life of the program needs to do extra work.
The fix here is to introduce a new kind of IR instruction that represents a global shader parameter directly (not a pointer to it as a global would), at which point there isn't even such a notion as a "load" from the parameter, since it represents the value directly.
In several cases logic that used to apply to global variables in case they were shader parameters (by looking for a layout) is now moved to apply to these global parameters.
The biggest source of issues in this change was that switching from pointers to plain values to represent these shader parameters stresses different cases in type legalization. I also had to deal with the case of legalization for GLSL where we actually *do* need global shader parameters that are writable (since varying output goes in the global scope), but in that case I borrowed the use of pointer-like `Out<...>` and `InOut<...>` types to represent that intent, which we were already using for function parameters representing outputs.
A few tests started failing because the changes lead to a slightly different order of code emission, which in some HLSL tests resulted in a function parameter named `s` getting emitted before a global parameter named `s`, leading to the latter getting the name `s_1` instead of `s_0`.
A few SPIR-V tests started failing because the new approach means that we no longer end up performing a load from all varying input parameters at the start of `main` and instead reference the varying inputs directly. The resulting code is more idomatic, but it differed from the baselines for those tests.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Move mangled name out of IRGlobalValue
Previously the `IRGlobalValue` type was used as a root for all IR instructions that can have "linkage," in the sense that a definition in one module can satisfy a use in another module.
The mangled symbol name was stored in state directly on each `IRGlobalValue`, which created some complications, and also forced IR instructions that wanted to support linkage to wedge into the hierarchy at that specific point.
This change moves the mangled name out into a decoration: either an `IRImportDecoration` or an `IRExportDecoration`, both of which inherit from `IRLinkageDecoration` which exposes the mangled name.
This change has a few benefits:
* We can now have any kind of instruction be exported/imported, without having to inherit from `IRGlobalValue`. This could potentially let `IRStructType` and `IRWitnessTable` be simplified to just have operand lists instead of dummy chldren as they do today.
* We can now easily have "global values" like functions that explicitly *don't* get linkage, instead of using a null or empty mangled name as a marker.
* We can use the exact opcode on a linkage decoration to distinguish imports from exports, which could be used to more accurately resolve symbols during the linking step.
Other than adding the decorations and making sure that AST->IR lowering adds them, the main changes here are around any code that used `IRGlobalValue`. Variables and parameters of type `IRGlobalValue*` were changed to `IRInst*` easily, so the main challenge was around code that *casts* to `IRGlobalValue*.
In cases where a cast to `IRGlobalValue` also performed a test for the mangled name being non-null/non-empty, we simply switched the code to check for the presence of an `IRLinkageDecoration`, since that is the new way of indicating a value with linakge.
Most of the serious complications arose in `ir.cpp` around the "linking"/target-specialization and generic specialization steps.
The "linking" logic was checking for `IRGlobalValue` to opt into some more complicated cloning logic, and just checking for a linkage decoration here wasn't sufficient since the front-end *does* produce global values without linkage in some cases (e.g., for a function-`static` variable we produce a global variable without linkage). This logic was updated to just check for the cases that used to amount to `IRGlobalValue`s directly by opcode. It might be simpler in the short term to have kept `IRGlobalValue` around to make the existing casts Just Work, but I'm confident that this logic could actually be rewritten for much greater clarity and simplicity and that is the better way forward.
The generic specialization logic was using some really messy code to generate a new mangled name to represent the specialized symbol, and then searching for an existing match for that name.
The original idea there was that an IR module could include "pre-specialized" versions of certain generics to speed up back-end compilation by eliminating the need to specialize in some cases, but this feature has never been implemented so the overhead here is just a waste.
Instead, I moved generic specialization to use a simpler dictionary to map the operands to a `specialize` instruction over to the resulting specialized value.
This allows for some simplifications in the name mangling logic, because it no longer needs to figure out how to produce mangled names from IR instructions representing types/values.
As part of this change I also overhauled the IR emit logic to produce cleaner output by default, borrowing some of the ideas from the logic in `emit.cpp`. IR values are now automatically given names based on their "name hint" decoration, if any, to make the code easier to follow, and I also made it so that types and literals get collapsed into their use sites in a new "simplified" IR dump mode (which is currently the default, with no way to opt into the other mode without tweaking the code). The resulting IR dumps are much nicer to look at, but as a result the one test that involves IR dumping (`ir/string-literal`) doesn't really test what it used to.
One weird issue that came up during testing is that the `transitive-interface` test had previously been producing output that made no sense (that is, the expected output file wasn't really sensible), and somehow these changes were altering its behavior. Changing the test to use `int` values instead of `float` was enough to make the output be what I'd expect, and hand inspection of generating DXBC has me convinced we were compiling the `float` case correctly too. There appears to be some issue around tests with floating-point outputs that we should investigate.
* fixup: C++ declaration order
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Make a test case use IR serialization
* Make all IR instructions usable as parents
This makes it so that every `IRInst` has the list of children that used to be on `IRParentInst` and eliminates `IRParentInst`.
Most places in the code were only checking against `IRParentInst` so that they could know whether there were child instructions to iterate over.
This change bloats the size of every instruction by two pointers, but we hope to be able to eliminate that overhead with a better encoding later.
* Change IR decorations to be instructions.
The main change here is that `IRDecoration` now inherits from `IRInst`, and `IRInst` now has a single linked list that holds both decorations *and* children.
At each point where code used to loop over `getChildren()` on an `IRInst`, I checked whether it made sense to leave the operation as processing just the children, or if it should process both decorations and children.
The thorniest bit was making sure the logic for inserting an instruction into a parent is correct. For the most part, once IR code is built all insertions are explicitly before/after another instruction, so the ordering can't get messed up. The sticking point is any code that does an explicit `insertAtStart` or `insertAtEnd`, but I surveyed those to make sure they are correct in context, and I also made all insertions bottleneck through one routine that does a better job of asserting the preconditions than what was there before. We may still want a "smart" insertion function at some point so that if somebody does `someDecoration->insertAtEnd(someInst)` the decoration intelligently goes to the end of the decoration list, and not the entire decorations-and-children list.
All of the existing decoration types were refactored to provide accessors for their operands, rather than directly exposing fields. In most cases the operands are required to be `IRConstant` nodes of fixed types. Not all of these types need to be kept around in the new approach, but they were left in so that as much existing code as possible can be kept working.
The `IRBuilder` was extended with factory functions to make the various decoration types and attach them.
All the fields in concrete decorations that were using `StringRepresentation` or `Name` pointers are now using IR-level string operands which provide their value as an `UnownedStringSlice`, so logic that was working with those decoration values needed to be updated here and there. I also needed to add the logic to clone string-literal values to the IR cloning pass, since they are now being used in almost every piece of code.
A new type of constant IR instruction for literal pointers was added, to handle the cases where an IR decoration needs an operand that is a raw AST-level pointer. These are even being serialized, although we obviously should not rely on them to round-trip through serialization in the future. Ideally, a follow-on change should add a cleanup pass where we remove any decorations from a module that shouldn't be allowed in the serialized code.
The biggest overall cleanup is in the serialization logic, where a lot of code just disappears because it can process the raw "decorations and children" list as the logical children of an IR instruction. The only special cases left are literals (which seem like they will always need special-casing) and global values (because they have a mangled name, which we plan to move into a decoration).
One other example of a simplification made possible by this change: the `IRNotePatchConstantFunc` instruction was implemented as an instruction only because it couldn't be encoded as a decoration at the time (it needed to have an operand that referenced an IR function).
The IR dumping logic was also updated (which meant a change to the `ir/string-literal` test) to try to make it print out all decorations a bit more systematically now that they are encoded like other instructions. The formatting isn't quite perfect, but it is good enough to be able to read what is going on.
I didn't include updates to the validation logic to ensure that decorations are being added in ways that follow the invariants, but that would be a nice thing to add next.
* fixup: 64-bit issues
* fixup: forward declaration issues
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Premake work in progress for linux.
* Added dump function.
* Remove examples on linux
Small warning fix.
* * Don't build render-test on linux
* Removed work around virtual destructor warning, and just used virtual dtor for simplicity
* Git ignore obj directories
* Fix premake working on windows.
* * Fix sprintf_s functions
* Make generates arg parsing more robust
* Added FloatIntUnion to avoid type punning/strong aliasing issues, and repeated union definitions.
* Work around problems building on linux with getClass claiming a strict aliasing issue.
* Fix for targetBlock appearing potentiall used unintialized to gcc.
* Linux slang link options -fPIC to make dll.
* Add -fPIC to build options on linux.
* Add -ldl for linux on slang.
* Fixes to try and get premake working with .so on linux.
* Make core compile with -fPIC
* Try to fix linux linking with --no-as-needed before -ldl
* Add rpath back.
* Remove render-gl from linux build.
* Re-add location for linux.
* Don't include <malloc.h> except on windows.
* Remove unused line to fix warning on osx.
* Remove ambiguity on OSX for operator <<.
* Fixing ambiguity with operator overloading and Int types for OSX.
* Fix ambiguity around UInt and operator
* Fix ambiguity of UInt conversion for OSX.
* Added UnambiguousInt and UnambiguousUInt to make it easier to work around OSX integer coercion for UInt/Int types.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* * Remove dispose from IRInst
* Use MemoryArena instead of MemoryPool
* Make all IRInst not require Dtor - by having ref counted array store ptrs that need freeing
* Increase block size - typically compilation is 2Mb of IR space(!)
* Fix issues around StringRepresentation::equal because null has special meaning.
* Don't bother to construct as String to compare StringRepresentation, just used UnownedStringSlice.
* Added fromLiteral support to UnownedStringSlice and use instead of strlen version.
* Use more conventional way to test StringRepresentation against a String.
* Fix gcc/clang template problem with cast.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
PR #577 tries to eliminate empty `struct` types by replacing them with a `LegalType::tuple` with zero elements, but this seems to run into problems in some cases, where we end up trying to match up `::none` values with empty `::tuple`s.
An alternative way to handle this is to never create empty `LegalType::tuple`s (and the same for `LegalVal::tuple`), and instead create `LegalType::none` and `LegalVal::none`. PR #577 avoided this because there were various cases in the legalization logic that didn't robustly handle `LegalType::Flavor::none`.
This PR thus includes two main changes:
1. Construct a `::none` type when we have an empty `struct` type.
2. Survery all places that handle the `::tuple` case and extend them to handle the `::none` case if it was missing.
This fixes an issue filed in Falcor's internal GitLab as number 424.
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* render-test should not fail on HLSL compiler *warnings*
The logic in `render-test` that invokes `D3DCompile` was causing a test to fail if it produced any warnings (not just if compilation fails).
Warning output can be dealt with by the test runner, since it will compare output between runs anyway, and it is useful to be able to run something through `render-test` that compiles with warnings.
* Be more careful about deleting IR instructions
There was an `IRInst::deallocate()` method that had a precondition that the instruction should already be removed from its parent and clear out all its operands before calling, but it wasn't checking this and the few call sites weren't doing things right either.
I consolidated things on `IRInst::removeAndDeallocate()` which does all the things: removes from the parent, clear out operands, and then deallocates.
I also made sure to clear out the type operand.
This clears up some crashing issues where passes were removing instructions but those instructions would still show up as users of other instructions.
* Don't emit bitwise not for non-Boolean types
It seems like the logic in `emit.cpp` messed things up and decided that `Not` (the IR instruction that is equivalent to `!` in the AST) should emit as `!` for Boolean types and `~` for other types, but this makes no sense (e.g., `~(a & 1)` is very different from `!(a & 1)`, even when interpreted as a condition).
It seems like this logic was intended for the `BitNot` case, where `~a` and `!a` are actually equivalent for Boolean values (but a target language might not like `~a` on `bool` values).
Maybe the original plan was that the `Not` instruction should only apply to Boolean values in the first place, and that other values should be converted to `bool` (or a vector of `bool`) before applying `Not`, but even in that case the emit logic makes no sense.
This caused an actual problem for one of my test cases, so it was important to fix it now.
* Fix issue with cached resolution for overoaded operators
The basic problem was that the lookup logic was forming a key based on the *first* definition it found for the overloaded operator, but that means that when processing a prefix `++a` call we might look up the *postfix* definition of `operator++` and decide to use its opcode as the key.
This "fixes" the logic by looking for the first definition with a "compatible" definition (e.g., a `__prefix` function if we are checking a `PrefixExpr`), and then uses its opcode.
A better fix in the long run would be to make the cache just be keyed on the operator name and the "fixity" of the expression (prefix, postfix, or infix).
* Introduce an intermediate structured control-flow representation
The code previously used a single function called `emitIRStmtsForBlocks` in `emit.cpp` that would take a logical sub-graph of the CFG and emit it as high-level statements.
It would do this by recognizing operations like coniditional branches that it could turn into high-level `if` statements, etc.
The main problem with this function was that it mixed together the logic for how we restructure the program with the logic for how we emit high-level code from that structure.
This change splits those two parts of the algorithm by introducing an intermediate data structure: a tree of `Region`s, which represent single-entry regions of the CFG.
There are subclasses of `Region` corresponding to various structured control-flow constructs, and then a leaf case that wraps a single `IRBlock`.
The new function `generateRegionsForIRBlocks()` (in `ir-restructure.cpp`) now handles the restructuring work, by building one or more `Region`s to represent a sub-graph, while `emitRegion()` handles emitting HLSL/GLSL source code from a region.
Splitting things in this way opens up some opportunities for future changes:
* We can expand the set of IR control-flow constructs allowed, so long as we can still generate structure `Region`s from them, without having to mess with the emit logic (e.g., we could start to support multi-level `break` by introducing temporaries as needed). In the limit we can generate our `Region`s using something like the "Relooper" algorithm.
* We can emit to other representations while retaining the same control-flow restructuring support. E.g., if we drop the structured information from the IR, then emitting to SPIR-V for Vulkan would require us to use the strucured control-flow information from these `Region`s.
* We can do analysis that needs to understand `Region` structure. This is relevant to issue #569, which was what prompted me to start on this work. Now that we have a representation of the nesting of `Region`s, we can use it to reason about visibility of values between blocks.
During development of this change I ran into a gotcha, in that I had been assuming each IR block would map to a single `Region`, forgetting that our current lowering of "continue clauses" in `for` loops leads to them being duplicated. The `Region` representation handles this by having a linked-list struct mapping IR blocks to the `SimpleRegion`s that represent them. I added a test case that includes a `for` loop with a continue clause that is reached along multiple paths just to make sure that we continue to support that case.
The compiler output should not change as a result of this work; this is supposed to be a pure refactoring change.
* Add a pass to resolve scoping issues in generated code
Fixes #569
The basic problem arises because the structured control flow that we output in high-level HLSL/GLSL doesn't match the "scoping" rules of an SSA IR.
In particular, SSA says that a value can be used in any block that is dominated by the definition, but in the presence of `break` and `continue` statements it is easy to construct cases where a block dominates something that is not in its scope for structured control flow. Consider:
```hlsl
for(;;) {
int a = xyz;
if(a) { int b = a; break; }
int c = a;
}
int d = b;
```
This program is invalid as HLSL, because the variable `b` is referenced outside of its scope, but if we look at the CFG for this function, it is clear that the block that computes `b` dominated the block that computes `d`. IR optimizations can easily create code like this, so we need to be ready for it.
The previous change added an explicit `Region` structure to represent the structured control flow that we re-form out of the IR, and this change adds a pass that exploits the structuring information to detect cases like the above and introduce temporaries to fix the scoping issue. For example, the pass would change the earlier code block into something like:
```hlsl
int tmp;
for(;;) {
int a = xyz;
if(a) { int b = a; tmp = b; break; }
int c = a;
}
int d = tmp;
```
That is, we introduce a new `tmp` variable at a scope "above" both the definition and use of `b`, and then we copy `b` into that temporary right where it is computed, and then use the temporary instead of the original `b` at the use site.
A few details that came up during the implementation:
* Downstream compilers may get confused by code like the above, and complain that `tmp` may be used before it is initialized, even though the very definition of dominators in a CFG means we don't have to worry about it. Still, I introduced some one-off code to initialize the temporaries just to silence spurious warnings coming from fxc.
* We need to be careful not to apply this logic to "phi nodes" (the parameters of basic blocks) since they will already be turned into temporaries by the emit logic, and trying to introduce temporaries with this pass led to broken code (I still need to investigate why). It may be that a future version of this pass should also take the code out of SSA form, so that we can introduce both kinds of temporaries in a single pass (and maybe eliminate some unnecessary variables by doing basic register allocation).
There is another transformation that could fix some issues of this kind, by moving code out of a structured control-flow construct and to the "join point" after it. For example, we could turn our loop from the start of this commit message into:
```hlsl
for(;;) {
int a = xyz;
if(a) { break; }
int c = a;
}
int b = a;
int d = b;
```
Moving the definition of `b` to after the loop is possible because there is no way to get out of the loop without executing that code anyway. Now the scoping issue for `d`'s use of `b` has gone away, but of course we've introduced a *new* scoping issue for `a`, when it gets used by `b`.
Adding a pass to re-arrange control flow like this could reduce the cases where we have to apply the current pass, but it wouldn't eliminate them entirely. That means such a pass can be deferred to future work.
This change includes a test case the reproduces the original issue, so that we can confirm the fix works.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes #566
The basic problem here is that the front-end translates a structure initializer-list expression into a `makeStruct` instruction (with one argument per field), but the IR type legalization logic wasn't handling the case where a `makeStruct` is used to construct a struct value that needs to get split by legalization.
The implementation is relatively straightforward, and like the other cases of instruction legalization for compound types, it follows the shape of the `LegalType`/`LegalVal` cases. The one interesting bit is that we need to be a bit careful and filter the single argument list for `makeStruct` into two in the case where we generate a "pair" type for something that has both "ordinary" and "special" (resource) fields. Luckily the `PairInfo` data that was generated by type legalization has exactly the information we need (by design).
This change does not address several issues that could be handled in follow-on changes:
* The `makeArray` instruction will face similar issues if it is applied to a type that requires legalization: we'd need to turn an array of `LegalVal`s into a bunch of distinct arrays.
* The error message when we hit the unimplemented case here isn't great. Ideally we should provide the line number of the instruction that fails in an error message when legalization fails.
This change tries to focus narrowly on the bug at hand, and leave these issues for later changes.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Cleanups around behavior when the compiler fails
* Add another case where we try to `noteInternalErrorLoc()` if an exception in thrown. This one is the in the logic for emitting an IR instruciton. This could be improved by adding another layer at the function level (as a catch-all for instructions with no location), but something is better than nothing.
* Change a bunch of `assert()`s over to `SLANG_ASSERT()`s, so that we can theoretically take more control over them (e.g., make release builds with asserts enabled)
* Some other small cleanups around the assertions we perform.
In the survey I made, I didn't really see many obvious "smoking gun" cases where we could produce a significantly better error message for some of the unimplemented/unexpected paths, other than to actually implement the missing functionality.
* fixup
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The basic idea here is that when lowering to the IR, the front-end will attach a "name hint" to the IR instruction(s) that represent a given declaration, and then the passes that work on the IR will try to preserve and propagate those names, and then finally the emit logic will use them in place of mangled or unique names when available.
This change does *not* try to deal with the issues that arise when we try to use those variable names in the output without any modification (e.g., handling cases where they might clash with keywords or builtins in the target language). Instead, it tries to establish baseline behavior for propagating through names, so that a later change can concentrate on the issue of using those names exactly when it is legal to do so.
In order to avoid issues around the name "hints" causing problems we take two main steps:
1. We "scrub" each name to reduce it down to the allowed set of identifier characters in C-like languages, and then ensure that it doesn't do things that would be illegal in some downstream languages (e.g., consecutive underscores are not allowed in GLSL) or could clash with Slang's mangled names. This process isn't guaranteed to give distinct results for distinct inputs (it isn't a mangling scheme, after all).
2. We generate a unique ID for each occurence of a given name and always use that as a suffix. This means that even if a name happens to overlap with a keyword (if you somehow have a variable named `do`), we will still add a suffix that makes it not a problem (we'd output `do_0` which is fine).
The logic for generating these names is mostly straightforward. For simple variables, we use their given name directly, while for other declarations we try to form a name that includes their parent declaration (e.g. `SomeType.someMethod`).
Various IR passes need to propagate or preserve this information. The most interesting is type legalization, when we take a variable with an aggregate type and split some of the fields out into their own variables. In that case we generate "dotted" names like `someVar.someTexture` and rely on the emit logic to turn that into `someVar_someTexture`.
During SSA generation, if we are promoting a variable to SSA temporaries, we will try to propagate the name of the variable over to the temporaries (unless they already have a name from some other place). The same applies to block parameters ("phi nodes").
Many of the test changes need their expected output to be updated for this change. Luckily in most cases the output has gotten easier to understand.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Improve SSA promotion for arrays and structs
Fixes #518
The existing SSA pass would only handle `load(v)` and `store(v,...)`
where `v` is the variable instruction, and would bail out if `v` was
used as an operand in any other fashion.
The new pass adds support for `load(ac)` where `ac` is an "access chain"
with a gramar like:
ac :: v
| getElementPtr(ac, ...)
| getFieldAddress(ac, ...)
What this means in practical terms is that we can promote a local
variable of array or structure type to an SSA temporary even if there
are loads of individual elements/fields, as along as any *assignment* to
the variable assigns the whole thing.
I've added a test case to confirm that this change fixes passing of
arrays as function parameters for Vulkan.
* Fixup: disable test on Vulkan because render-test isn't ready
This is a fix for Vulkan, but I don't think our testing setup is ready
for it.
* Fixup: error in unreachable return case, caught by clang
* Fixups based on testing
These are fixes found when testing the original changes against the user code that originated the bug report.
* `emit.cpp`: Make sure to handle array-of-texture types when deciding whether to declare a temporary as a local variable in GLSL output
* `ir-legalize-types.cpp`: Make a not of a source of validation failures that we need to clean up sooner or later (just not in scope for this bug fix change).
* `ir-ssa.cpp`:
* When checking if something is an access chain with a promotable var at the end, make sure the recursive case recurses into the "access chain" logic instead of the leaf case
* Add some assertions to guard the assumption that any access chain we apply has been scheduled for removal
* Correctly emit an element *extract* instead of getting an element *address* when promoting an element access into an array being promoted
* Eliminate a wrapper routine that was setting up an `IRBuilder` and use the one from the block being processed in the SSA pass (since it was set up for stuff just like this)
* `ir-validate.cpp`
* Add a hack to avoid validation failures when running IR validation on the stdlib code. This case triggers for an initializer (`__init`) declaration inside an interface, since the logical "return type" is the interface type itself, which has no representation at the IR level and thus yields a null result type in a `FuncType` instruction.
|
| |
|
| |
The code was handling the "get field address" opcode (which takes a pointer to a struct and returns a pointer to a field), but didn't have a case for values. This was just an oversight.
|
| |
|
|
|
|
| |
The basic problem was that the lowering logic was constructing (more or less) `Ptr<@GroupShared X>` instead of `@GroupShared Ptr<X>`.
There were also problems with passes not propagating through rates that should have been (e.g., legalization).
I've added a test case to actually validate `groupshared` support.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Introduce an IR-level type system
Up to this point, the Slang IR has used the front-end type system to represent types in the IR.
As a result (but ultimately more importantly) the IR representation of generics and specialization has used AST-level concepts embedded in the IR.
For example, to express the specialization of `vector<T,N>` to a concrete type `float` for `T`, we needed an IR operation that could represent the specialization, with operands that somehow represented the type argument `float`.
The whole thing was very complicated.
The big idea of this change is to introduce a new representation in which types in the IR are just ordinary instructions, so that using them as operands makes sense. The hierarchy of IR types closely mirrors the AST-side hierarchy for now, and that will probably be something we should maintain going forward.
In order to make these changes work, though, I also had to do major overhauls of things like the way substitutions are performed, how we check interface conformances, the way lookup through interface types is done, etc. etc. This is a big change, and unfortunately any attempt to summarize it in the commit message wouldn't do it justice.
* Fix 64-bit build warning
* Fix up some clang warnings/errors
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
The main practical change here is that things that used to be `IRValue`s, like literals, are now being expressed as instructions in the global scope.
In order to validate that things are actually being handled correctly, this change introduces an explicit "validation" pass that can be run on the IR to check for different invariants (although it doesn't check many of the important ones right now). I've left the validation pass turned off by default, but with a command-line flag to enable it. We may want to make it be on by default in debug builds, just to keep us honest. The main invariant for the moment is that when on IR instruction is used as an operand to another, it had better come from the same IR module.
Some of the existing passes were violating this rule, in particular when it came to cloning of witness tables related to global generic parameter substitution. Those features can in theory be handled better now by allowing `specialize` instructions at other scopes, but I didn't want to over-complicate this change, so I make just enough fixes to ensure that these steps always clone witness tables they get from the "symbols" on an IR specialization context. In order for this to work when recursively specializing, I had to ensure that the logic for generic specialization had a notion of a "parent" specialization context that it would fall back to to perform cloning when necessary.
This change keeps the logic that was caching and re-using the instructions for literal values within a module, but adds some logic that isn't really being tested right now for picking the right parent instruction to insert a constant instruction into. This logic doesn't trigger right now because all of the cases we are using it on have zero operands (and so they always get "hoisted" to the global scope), but eventually for things like types we want to be able to support instructions with operands (e.g., `vector<float, 4>`) and handle the case where some of those operands come from different scopes (e.g., when nested inside a generic).
The final change here is mostly cosmetic: the `IRBuilder` is now more abstract about where insertion occurs: it tracks a single `IRParentInst` to insert into, and then an optional `IRInst` to insert before. In the common case, that parent is an `IRBlock`, but it could conceivably also be the global scope, or a witness table, etc. Use sites where we used to change those fields directly now use distinct methods `setInsertInto(parent)` and `setInsertBefore(inst)` which capture the two cases we care about. Accessors are also defined to extract the current block (if the current parent is a block), and the current "function" (global value with code, if the current parent is a global value with code, or a block inside one).
With this work in place, it should be possible for a follow-on change to start putting `specialize` instructions at the global scope and thus clean up some of the on-the-fly specialization work. This work should also help with some of the requirements around a distinct IR-level type system and more explicit generics.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* IR: "everything is an instruction"
This change tries to streamline the representation of the IR in the following ways:
* Every IR value is an instruction (there is no `IRValue` type any more)
* All IR values that can contain other values share a single base (`IRParentInstruction`)
* Dynamic casts to specific IR instruction types can be accomplished with a new `as<Type>(inst)` operation, that uses the IR opcode to implement casts.
The biggest change in terms of number of lines is getting rid of `IRValue`. The diff here could probably be smaller if I'd just done `typedef IRInst IRValue;`.
Along the way I also renamed the `getArg`/`getArgs`/`getArgCount` combination over to `getOperand`/`getOperands`/`getOperandCount` to avoid being confusing when we have something like a `call` instruction where the "arguments" of the call don't line up with the operands of the instruction.
I also tried to clean up the representation of lists of child instructions to try to make it easier to iterate over them with C++ range-based `for` loops. Developers still need to be careful about mutating the contents of a block while iterating over it in this fashion (e.g. if you remove the "current" element, the iteration will end prematurely).
Probably the thorniest change here is that parameters are now just represented as the first N instructions in a block, which means:
* We need to perform a linear search to find the end of the parameter list. This is probably not often a problem, because usually you would be iterating over the parameters anyway, and that will be linear in the number of parameters.
* Algorithms that iterate over a block either need to ignore parameters, treat parameters just like other instructions, or somehow cleave the list into the range of parameters, and the range of "ordinary" instructions (which involves the same linear search above).
* When inserting into a block, we need to be careful not to insert instructions at invalid locations (e.g., insert a temporary before the parameters, or insert a parameter in the middle of the code). I can't pretend that I've handled the details of that here. (This is no different than having to make the same adjustments for phi nodes in a typical SSA representation)
* One possible future-proof approach is to implement a pass that sorts the instructions in a block so that parameters always come first. That would let us implement passes without caring about this detail, and then clean up right before any pass that cares about the relative order of parameters and other instructions.
The current change is missing any work to make literals and other instructions that used to be `IRValue`s properly nest inside of their parent module. Right now these instructions are just left unparented, and may actually end up being shared between distinct modules. Fixing that will need a follow-up change. The biggest challenge there is that it introduces instructions at the global scope that aren't `IRGlobalValue`s.
This change doesn't try to take advantage of any of the new flexibility (e.g., by nesting `specialize` instructions inside of witness tables). The goal is to do exactly what we were doing before, just with a different representation.
* Warning fix
|
| |
|
|
| |
This allows us to get rid of `IRGlobalValue::dispose()`.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Initial work on validating "constexpr"-ness in IR
The underlying issue here is that certain operations in the target shading languages constrain their operands to be compile-time constants. A notable example is the optional texel offset parameter to the `Texture2D.Sample` operation.
When calling these operations in GLSL, the user is required to pass a "constant expression," and any variables in that expression must therefore be marked with the `const` qualifier (and themselves be initialized with constant expressions). Any GLSL output we generate must of course respect these rules.
When calling these operations in HLSL, the user is not so constrained. Instead, they can pass an arbitrary expression, which may involve ordinary variables with no particular markup, and then the compiler is responsible for determining if the actual value after simplification works out to be a constant. In some cases, the requirement that a value be constant might actually trigger things like loop unrolling. Also, it is okay to use a function parameter to determine such a constant expression, as long as the argument turns out to be a constant at all call sites.
The way we have decided to tackle these challenges in Slang is that we we propagate a notion of `constexpr`-ness through the IR. This is currently being tackled in `ir-constexpr.cpp` with a combination of forward and backward iterative dataflow:
* When the operands to an instruction are all `constexpr`, and the opcode is one we believe can be constant-folded, then we infer that the instruction *can* be evaluated as `constexpr`
* When instruction is required to be `constexpr`, then we infer that all of its operands are also required to be `constexpr`.
If this process ever infers that a function parameter is required to be `constexpr`, then we might have to continue propagation at all the call sites to that function.
If after all the propagation is done, there are any cases where an instruction is *required* to be `constexpr`, but it *can't* be `constexpr` (we weren't able to infer `constexpr`-ness for its operands), then we issue an error.
This implementation encodes the idea of `constexpr`-ness in the IR as part of the type system, using a simplified notion of rates. This change adds a `RateQualifiedType` that can represent `@R T`, and then introduces a `ConstExprRate` that can be used for `R`. Many accessors for the type information on IR nodes were updated to distinguish when one wants the "full" type of an IR value (which might include rate information) vs. just the "data" type.
A `constexpr` qualifier was added in the front-end, and is being used to decorate the texel offset parameter for `Texture2D.Sample`. Lowering from AST to IR looks for this qalifier and infers when a function parameter must be typed as `@ConstExpr T` instead of just `T`.
There are lots of limitations and gotchas in the implementation so far:
* The `@ConstExpr` rate is the only one added in this change, but it seems clear that the conceptual `ThreadGroup` rate that was added to represent `groupshared` should probably get folded into the representation.
* I'm not 100% pleased with how many places in the IR I have to special-case for rate-qualified types. At the same type, pulling out rate as a distinct field on `IRValue` would probably require that we pay attention to rate everywhere.
* I've added a test case to show that we can issue errors when users fail to provide a constant expression for the texel offset, but the actual error message isn't great because it doesn't indicate *why* a constant expression was required. Realistically the "initial IR" should contain a few more decorations we can use to relate error conditions back to the original code (even if this is in a side-band structure).
* I've added a test case that is supposed to show that we can back-propagate `constexpr`-ness to local variables, and I've manually confirmed that it works for Vulkan/SPIR-V output, but the level of Vulkan support in `render_test` today means I can't enable the test for check-in.
* While I'm attempting to propagate `@ConstExpr` information from callees to callers, I haven't implemented any logic to specialize callee functions based on values at call sites.
* In a similar vein, there is no handling of control-flow dependence in the current code. If we infer that a phi (block parameter) needs to be `@ConstExpr`, then it isn't actually enough to require that the inputs to the phi (arguments from predecessor blocks) are all `@ConstExpr` because we also need any control-flow decisions that pick which incoming edge we take to be `@ConstExpr` as well.
* As a practical matter, implicit propagation of `@ConstExpr` from a function body to a function parameter should only be allowed for functions that are "local" to a module. Any function that might be accessed from outside of a module should really have had its `@ConstExpr` parameter marked manually, and our pass should validate that they follow their own rules. Right now we have no kind of visibility (`public` vs `private`) system, so I'm kind of ignoring this issue.
While that is a lot of gaps, this is also just enough code to get the Falcor MultiPassPostProcess example working, so I'm inclined to get it checked in.
* Fixup: missing expected output for test
* Fixup: disable test that relies on [unroll] for now
|
| |
|
|
|
|
|
|
|
|
|
| |
The basic problem here is that when unlinking an `IRUse` from the linked list of uses, there were several cases where I was failing to set the `prevLink` field of the next node to match the `prevLink` field of the node being removed. That doesn't show up when walking the linked list of uses forward, but it breaks it whenever you have subsequent unlinking operations.
This change fixes the bugs of that kind I could find, and also adds a debug validation method to try to avoid breaking it again. I also made more access to `IRUse` go through accessor methods rather than using fields directly, to try to avoid this kind of error. I stopped short of making anything `private`, because I tend to find that it creates more hassles than it avoids.
A few other fixes along the way:
- Made the `List<T>` type default-initialize elements when you resize it. I hadn't realized we weren't doing that.
- Add a standalone `dumpIR(IRGlobalValue*)` so help when debugging issues.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Issue error when shader parameter type doesn't match between translation units
We currently allow users to compile different translation units at the same time for the vertex and fragment shader, and those different translation units might contain distinct declarations for the "same" parameter. Furthermore, those declarations might use distinct declaratins of the "same" type.
We currently bind those as if they were one parameter, which means we assume they have types that are actually equivalent. Unfortunately, things break when that isn't the case.
This change adds error messages when parameter declarations that are determined to be the "same" don't have types that are either equiavlent or match "structurally" (same names, same fields, and field types match recursively).
Ideally most users won't see these errors, because they will only submit single files to the compiler. Eventually we may actually decide to require that case and simplify our logic.
* Support types with layout coming from a "matching" type
Because of how the front-end performs parameter binding, we can end up in cases where a variable is given a `TypeLayout` based on a "matching" type from another translation unit (because the "same" shader parmaeter got declared in multiple TUs).
This change tries to support that use case by avoiding absolute field decl-refs in the representation used for type legalization, so that we don't end up in a situation where we look up field layout based on information that doesn't match.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Basic IR support for `static const` globals
Our strategy for lowering global *variables* can fall back to putting their initialization into a function, but that isn't really appropriate for global constants (it also isn't appropriate for arrays, but we'll need to deal with that seaprately).
This change adds a distinct case for global constants (rather than treating them as variables), and forces the emission logic to always emit them as a single expression.
Doing this makes assumptions about how the IR for these constants gets emitted (and what optimziations might do to it).
In order to make things work, I had to switch the handling of initializer-list expressions to not be lowered via temporaries and mutation (since that isn't a good fit for reverting to a single expression).
I've added a single test case to ensure that this works in the simplest scenario. My next priority will be to see if this unblocks my work in Falcor.
* Fixup: bug fixes
|
| |
|
|
|
|
|
|
| |
Previously, all legalizations of a generic type would use the name of the original decl for the "ordinary" part of things, and this would lead to collisions because the names didn't include the mangled generic arguments.
This is now fixed by storing the mangled name of the original inside of `struct` declarations created for legalization, and using those names instead.
Also adds support for `getElementPtr` instructions when doing IR type legalization.
Also tries to make a `DeclRefType` convert to a string using the underlying `DeclRef`. This doesn't help because `DeclRef::toString` doesn't actually include generic arguments either.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Work on getting rewriter + IR playing nice together.
There are a few different changes here, with the goal of improving the interaction between the "rewriter" code generation approach and the new IR and type legalization code.
The main changes are:
- Add a new pass that occurs before the AST legalization pass, which walks the (used) AST declarations and tries to discover (1) which declarations need to be specialized/lowered via the IR, and (2) which declarations need to be included in the resulting AST module.
- AST-based legalization now uses the generated list when in "rewriter" mode, so that we should be working around issues that users were seeing with types not getting emitted.
- TODO: we still need an equivalent fixup in the case of non-"rewriter" emit, so this may still be a problem for `.slang` files.
- IR type legalization now precedes AST legalization, so that we can record information on how any IR global values got legalized (e.g., if they got split). Then AST legalization includes logic to reconstruct suitable tuple expressions to reference a split global.
- When emitting using IR + AST, we walk all of the declarations that we decided belonged to the IR, but which were subsequently referenced in the AST, to make sure they get output (this would include `struct` types that are declared in a file compiled via IR, but never used in IR-based code).
The rewriter+IR use case still doesn't *quite* work, but the logic for walking the AST in a pre-pass ends up being needed/useful to fix some pure rewriter bugs, so I'm getting this checked in sooner rather than later.
* Fixup: walk arguments to generic declaration reference
The gotcha here is that the code for walking the AST would walk a line of code like:
SomeType a;
and know to traverse the declaration of `SomeType`, but if it saw a line of code like:
ParameterBlock<SomeType> b;
it would traverse the declaration of `ParameterBlock`, but fail to visit that of `SomeType`.
|
| |
|
|
|
|
|
|
|
|
| |
* Cleanups to `ParameterBlock<T>` behavior.
These add some more realistic tests using the `ParameterBlock<T>` support, and show that it can work with the "rewriter" mode.
Unfortunately, this code does *not* currently work with the rewriter + the IR at once. That will need to be fixed in a follow-on change, because I now see that the root problem is pretty ugly.
* cleanup
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Make AST and IR share type legalization code
A previous change already made it so that the AST-to-AST lowering/legalization pass could work together with IR-based lowering of `import`ed code, but that change didn't take into account the case where a function written in the AST needed to call an IR function and pass in a type that required legalization.
Both the IR-based and AST-based passes had their own approaches to type legalization, that mostly agreed on the desired output, but they ended up creating their own representations for legalized types which would mean that for a function call the caller and callee might end up legalizing the parameter list to use different types.
This change tries to fix this issue (and adds a new test case that relies on the fix) by massively overhauling the AST-based legalization pass so that it uses the same type legalization code as the IR. The shared code has been moved out into `legalize-types.{h,cpp}`.
Notes:
- I eliminated the `FilteredTupleType` type, since it was starting to cause code duplication in a lot of places. Instead, type legalization just creates new `struct` types to represent the result of filtering.
- One big consequence of this is that the `LegalType::pair` case needs to remember for each field in the original type which field (if any) in the new `struct` type it maps to
- A big source of complexity (and probably bugs) in this code is trying to figure out how to parent these new `struct` definitions effectively. A good follow-on change would be something that outputs declarations on-demand during the AST emit logic (as we do for the IR), just to avoid some of this song and dance.
- The old AST type legalization had a notion of both a "tuple" type and a "varying tuple" type. The "tuple" case was quite complex, and combined behavior currently handled by `LegalType::pair` (for splitting into ordinary and special sides) and `LegalType::tuple` (for holding multiple distinct elements to represent the fields of an aggregate). The "varying tuple" case was closer to `LegalType::tuple`, so I tried to just re-use the existing logic for that too. The one place this potentially gets messy is in `reifyTuple()`.
- The messiest bit of handling the "varying tuple" concept (which is used for GLSL shader inputs/outputs since they have to be scalarized) is that when passing them as function arguments we need to reify the tuple back into a structured value. Because the `LegalExpr` hierarchy doesn't have type information, but constructing a value of the "original" type requires such information, things get a little messy.
- I did *not* try to deal with any of the logic related to handling system inputs/outputs for cross-compilation purposes. Of course, the long-term goal is that any actual cross-compilation is handled via the IR, but this change can't afford to break the AST-based path just yet. As a result, there is still quite a bit of complexity in the handling of assignment, to deal with cases where "fixups" are required.
* fixup: bad code in macro, not caught by Visual Studio compiler
* fixup: more stuff missed by VS compiler
* fixup: VS continutes to miss stuff in UNREACHABLE_RETURN
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes #301
The problem here is that if you have input GLSL code like:
```glsl
// example.vs
in vec3 pos;
```
and:
```glsl
// example.fs
in vec3 worldPos;
```
Then both `pos` and `worldPos` are reflected as global variables (parameters of the *program*), which both get bound to "varying input" resources, but there is no way to tell through the API that `pos` is a vertex parameter while `worldPos` is a fragment one.
The original request in issue #301 was to expose parameters like this not as a global variables, but rather as parameters of the entry point in their specific file. That is, treat it as if the user had written, e.g.:
```glsl
// example.vs
void vsMain(in vec3 pos) { ... }
```
Doing that would unify the GLSL and HLSL/Slang cases a bit, but would require the Slang reflection API to lie about the structure of code the user wrote. At a more basic level, that would have been hard to implement because the current reflection API just exposes the underlying AST, and the AST *needs* to leave `pos` at the global scope so that when we go and spit GLSL back out we retain the original structure.
This PR implements a more simplistic solution, where the user is allowed to query the stage that a varying parameter "belongs" to. For right now I'm only enabling this to work for varying parameters (but it doesn't care if they are entry-point or global-scope varyings). Despite what I said on #301, this should work for both the top-level parameter's variable layout, *and* any variable layouts for fields within its type reflection.
In terms of implementation, I took the simple but wasteful route: every `VarLayout` now has a `stage` field that is by default initialized to `SLANG_STAGE_NONE`. When collecting varying parameters, I take advantage of the fact that everything bottlenecks through `processEntryPointParameter()` which takes an `EntryPointParameterState` so that I can set the `VarLayout::stage` field for any varying parameter in one place.
While I was making this change, I also did a bit of cleanup so that the "official" names for the varying parameter categories are `VARYING_INPUT` and `VARYING_OUTPUT`, with `VERTEX_INPUT` and `FRAGMENT_OUTPUT` being "deprecated" in principle. I didn't do the bulk rename inside the codebase yet.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Revise type legalization so it can handle constant buffers
The existing legalization approach with "tuples" can handle scalarizing a `struct` type with resource-type fields in it, but it had several big gaps. The most notable is that given a type that mixes uniform and resource fields, we can't just blindly scalarize things:
```
struct P {
float4 a;
float4 b;
Texture2D t;
};
cbuffer C
{
P gParam[8];
};
```
The existing code was completely ignoring the declaration of `gParam` inside `C`, but even if we fixed that issue, we'd get something like:
```
cbuffer C
{
float4 gParam_a[8];
float4 gParam_b[8];
};
Texture2D gParam_t[8];
```
In this case we've completely changed the layout of the uniform buffer, by switching from AOS to SOA.
Even if we could get the type layout logic and the IR to agree on this, it would be a surprise to users, and "principle of least surprise" should be a big deal on a project with as many moving parts as ours.
The right thing to do is to have legalization create a "stripped" version of the original type `P` and use that:
```
struct P_stripped {
float4 a;
float4 b;
};
cbuffer C
{
P_stripped gParam[8];
};
Texture2D gParam_t[8];
```
Then at a call site, this:
```
foo(gParam);
```
becomes:
```
foo(gParam, gParam_t);
```
This is exactly how the current AST-to-AST legalization handles mixed uniform and resource types, but the way it does it involves some annoying kludges:
- That pass has a notion of a "tuple" similar to our legalization, but every tuple has an optional "primary" entry for all the uniform data, plus tuple elements for the resources, and a given field may be represented on one side, the other, or both. It makes the code for handling tuples very messy.
- That pass does the "stripping" of types by actually marking up the AST declarations (this is okay because it is constructing a new AST as it goes), so that when they get emitted certain fields don't actually show up. That is, we fix the problem with type `P` by actually *modifying* the user's declaration of `P`. That seems out of bounds for the IR.
This change fixes the problem in our IR type legalization while trying to avoid the problems of the AST-to-AST pass by using two new ideas:
1. We add a new case for `LegalType` (and `LegalVal`) that is a "pair" type, where a pair consists of both an "ordinary" type (for uniform data) and a "special" type (for resource data). E.g., after legalization, the type for `C` (which can be over-simplified to `ConstantBuffer<P>` for our purposes), will be a `LegalType::pair` where the ordinary side is `ConstantBuffer<P_stripped>` and the special side is a tuple containing the `Texture2D` field.
2. We add a new (and annoyingly hacky) AST-level type called `FilteredTupleType` which is semantically a sort of tuple type (it holds a list of elements, and the elements have their own types), but which remembers an "original type" that it was created from, and for each element remembers the field of the original type that it corresponds to. This is used to construct a type like `P_stripped` as an actual AST-level structural type.
The core logic for legalizing an aggregate type had to get more complicated just because of the new pair case, so there is now a `TupleTypeBuilder` that asists with taking an aggregate type, processing its fields, and then picking the right `LegalType` representation for the result.
Other smaller changes:
- Made the legalization logic actually legalize `PtrType<T>`. E.g., if `T` legalizes to a tuple, we need to construct a tuple of pointer types. The same exact thing needs to be applied to arrays, and any other generic type that should "distribute over" pairs/tuples.
- Made the legalization logic actually legalize `ConstantBuffer<T>` and similar. The basic idea there is if `T` maps to a pair, we wrap `ConstantBuffer<...>` around the ordinary side, and `implicitDeref` around the special side.
- Removed a bunch of `#ifdef`ed-out code from the end of `ir-legalize-types.cpp`. That was code from my first attempt at legalization that failed miserably (trying to do it via local changes and a work list instead of a global rewrite pass), but it had some code I wanted to reference when writing the version that actually got checked in (should have deleted the code earlier, though).
- Added a bunch of cases for `LegalType::none` (and the `LegalVal` equivalent) that helped simplify the logic fo the `pair` case by allowing me to *always* dispatch to both the "ordinary" and "special" sides, even if they might not actually be present.
- Renamed `TupleType` and `TupleVal` over to `TuplePseudoType` and `TuplePseudoval` to recognize the fact that we might actually need/want *real* tuples in the type system, to go along with these fake ones (that need to be optimized away).
The biggest doubt I have about this change is the whole `FilteredTupleType` thing; it seems like an obviously contrived type to add to the front-end type system that really only solves IR-level problems. A cleaner approach might have been to just add a plain old `TupleType` to the front-end type system (and initially I started with that), and then have yet another `LegalType`/`LegalVal` case that handles mapping from the fields of the original type to the numbered tuple elements.
I expect we'll actually want to make that change in the future (especially if we ever add true tuples to the front-end), but for right now I let myself be swayed by the desire to have these stripped/filtered types get names that explain their provenance ("where they came from") to make our output code more debuggable. The way I've done it is probably overkill, though, and we need a much more complete effort on the readability and debuggability of our output before anything like that is worth worrying about.
* Fixup: typo
* Fixup: fix output of "non-mangled" names for test cases
- Make sure to attach high-level decls to variables created as part of type legalization
- Also, try to share more of the code between the different cases of variables
- Fix up `parameter-blocks` test case that was passing `-no-mangle` but expecting mangled names in the output
- Fix up `multiple-parameter-blocks` to not rely on `-no-mangle` for now, because it would lead to two global variables with the same name (need to fix that underlying issue eventually).
- Also fix name generation logic so that we only use "original" names (from high-level decls) specifically when the `-no-mangle` flag is on, and otherwise use IR-level names.
* Fix: handle constant buffers better in render-test
- Don't request both CB and SRV usage for buffers, since that is illegal
- Also, don't try to create an SRV when user requested a CB (since the required usage flag won't be there)
- Record the input buffer type on the `D3DBinding` for a buffer, and use that to tell us when to bind a CB instead of SRV/UAV
- Fix expected output for `cbuffer-legalize` test now that we are actually feeding it correct cbuffer dta.
|
| |
|
|
|
|
|
|
|
|
|
| |
This commit fixes issue #275
This commit includes following changes:
1. legalize function parameter IRParam instructions
2. legalize function parameter types in IRFuncType
3. legalize call sites (IRCall) with proper arguments
4. legalize local vars that has a mixed resource type.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Don't auto-enable IR use for compute tests
The `COMPARE_COMPUTE` and `COMPARE_RENDER_COMPUTE` test fixtures were set up to always enable the `-use-ir` flag on Slang, which precludes having any tests that confirm functionality on the old non-IR path (which is still required by our main customer).
This change adds the `-xslang -use-ir` flags explicitly to any compute test cases that left them out, and makes the fixture no longer add it by default.
* Continue building out parameter block support
The initial front-end logic for parameter blocks was already added, but they are still missing a bunch of functionality. This change addresses some of the known issues:
- Bug fix: don't try to emit HLSL `register` bindings for variables that consume whole register spaces/sets
- Overhaul type layout logic so that it can make decisions based on a given code generation target (currently passed in as a `TargetRequest`), which allows us to decide whether or not a parameter block should get its own register set on a per-target basis.
- Always use a register space/set for Vulkan
- Never use a register space/set for HLSL SM 5.0 and lower
- By default, don't use register spaces/sets for HLSL output
- Add a command-line flag and some "target flags" to enable register-space usage for D3D targets
- Hackily add initial support for parameter blocks in the AST-to-AST path
- This just blindly lowers `ParameterBlock<T>` to `T`, which shouldn't quite work
- A more complete overhaul will probably need to wait until the AST-to-AST legalization is changed to use the `LegalType`s from the IR legalization pass.
- Add a compute-based test case to actually run code using parameter blocks
- This file runs test cases both with and without the IR
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* improve diagnostic messages and prevent fatal errors from crashing the compiler.
* fix top level exception catching.
* spelling fix
* change wording of invalidSwizzleExpr diagnostic
* add speculative GenericsApp expr parsing
* add new test case of cascading generics call.
* Fixing bugs in compiling cascaded generic function calls.
Add implementation of DeclaredSubTypeWitness::SubstituteImpl()
This is not needed by the type checker, but needed by IR specialization. When input source contains cascading generic function call, the arguments to `specialize` instruction is currently represented as a substitution. The arg values of this subsittution can be a `DeclaredSubTypeWitness` when a generic function uses one of its generic parameter to specialize another generic function. When the top level generics function is being specialized, this substitution argument, which is a `DeclaredSubTypeWitness`, needs to be substituted with the witness that used to specialize the top level function in the specialized specialize instruction as well.
* add a test case for cascading generic function call.
* parser bug fix
* fixes #255
* add test case for issue #255
* Generate missing `specialize` instruction when calling a generic method from an interface constraint.
When calling a generic method via an interface, we should be generating the following ir:
...
f = lookup_interface_method(...)
f_s = specailize(f, declRef)
...
This commit fixes this `emitFuncRef` function to emit the needed `specialize` instruction.
* fixes #260
This fix follows the second apporach in the disucssion. It generated mangled name for specialized functions by appending new substitution type names to the original mangled name.
* Disabling removing and re-inserting specailized functions in getSpecalizeFunc()
I am not sure why it is needed, it seems HLSL and GLSL backends are generating forward declarations anyways, so the order of functions in IRModule shouldn't matter.
* cleanup and complete test cases.
* fix warnings
|
|
|
* Rename existing ParameterBlock to ParameterGroup
We are planning to add a new `ParameterBlock<T>` type, which maps to the notion of a "parameter block" as used in the Spire research work.
Unfortunately, the compiler codebase already uses the term `ParameterBlock` as catch-all to encompass all of HLSL `cbuffer`/`tbuffer` and GLSL `uniform`/`buffer`/`in`/`out` blocks (all of which are lexical `{}`-enclosed blocks that define parameters...).
This change instead renames all of the existing concepts over to `ParameterGroup`, which isn't an ideal name, but at least doesn't directly overlap the new terminology or any existing terminology.
The new `ParameterBlockType` case will probably be a subclass of `ParameterGroupType`, since it is a logical extension of the underlying concept.
* Add Shader Model 5.1 profiles
The HLSL `register(..., space0)` syntax is only allowed on "SM5.1" and later profiles (which is supported by the newer version of `d3dcompiler_47.dll` that comes with the Win10 SDK, but not the older version of `d3dcompiler_47.dll` - good luck figuring out which you have!).
This change adds those profiles to our master list of profiles, and nothing else.
* First pass at support for `ParameterBlock<T>`
- Add the type declaration in stdlib
- Add a special case of `ParameterGroupType` for parameter blocks
- Handle parameter blocks in type layout (currently handling them identically to constant buffers for now, which isn't going to be right in the long term)
- Add an IR pass that basically replaces `ParameterBlock<T>` with `T`
- Eventually this should replace it with either `T` or `ConstantBuffer<T>`, depending on whether the layout that was computed required a constant buffer to hold any "free" uniforms
- Add first stab at an IR pass to "scalarize" global variables using aggregate types with resources inside.
- This currently only applies to global variables, so it won't handle things passed through functions, or used as local variables
- It also only supports cases where the references to the original variable are always references to its fields, and not the whole value itself
- Add a single test case that technically passes with this level of support, but probably isn't very representative of what we need from the feature
* Fold parameter-block desugaring into a more complete "type legalization" pass
The basic problem that was arising is that once you desugar `ParameterBlock<T>` into `T`, you then need todeal with splitting `T` into its constituent fields if it contains any resource types.
Handling those transformations by following the usual use-def chains wasn't really helping, because you might need systematic rewriting that can really only be handled bottom-up.
This change adds a new pass that is intended to perform multiple kinds of type "legalization" at once:
- It will turn `ParameterBlock<T>` into `T`
- It may at some point also convert `ConstantBuffer<T>` into `T` as well
- It will turn an value of an aggregate type that contains resources into N different values (one per field)
- As a result of this, it will also deal with AOS-to-SOA conversion of these types
Legalization is applied to *every* function/instruction/value, so that it can make large-scale changes that would be tough to manage with a work list.
This pass needs to be run *after* generics have been fully specialized, so that we know we are always dealing with fully concrete types, so that their legalization for a given target is completely known.
This is still work in progress; there's more to be done to get this working with all our test cases, and finish the remaining `ParameterBlock<T>` work.
* Improve binding/layout information when using parameter blocks
- When doing type layout for a parameter block, don't include the resources consumed by the element type in the resource usage for the parameter block
- Note that this is pretty much identical to how a `ConstantBuffer<T>` does not report any `LayoutResourceKind::Uniform` usage, except that `ParameterBlock<T>` is *also* going to hide underlying texture/sampler reigster usage
- The one exception here is that any nested items that use up entire `space`s or `set`s those need to be exposed in the resource usage of the parent (I don't have a test for this)
- When type legalization needs to scalarize things, it must propagate layout information down to the new leaf variables. In general, the register/index for a new leaf parameter should be the sum of the offsets for all of the parent variables along the "chain" from the original variable down to the leaf (we aren't dealing with arrays here just yet).
- When type legalization decides to eliminate a pointer(-like) type (e.g., desugar `ParameterBlock<T>` over to `T`), actually deal with that in terms of the `LegalVal`s created, so that we can know to turn a `load` into a no-op when applied to a value that got indirection removed.
- Hack up the "complex" parameter-block test so that it actually passes (the big hack here is that the HLSL baseline is using names that are generated by the IR, and are unlikely to be stable as we add/remove transformations).
- Note: I can't make these be compute tests right now, because regsiter spaces/sets are a feature of D3D12/Vulkan, and our test runner isn't using those APIs.
|