| Age | Commit message (Collapse) | Author |
|
|
|
|
|
The big change here is that the ability to contain basic blocks with instructions in them has been hoisted from `IRFunc` into a new base type `IRGlobalValueWithCode` shared with `IRGlobalVar`.
The basic blocks of a global variable define initialization logic for it; they can be looked at like a function that returns the initial value.
Places in the IR that used to assume functions contain all the code need to be updated, but so far I only handled the cloning step.
The emit logic currently handles an initializer for a global variable by outputting its logic as a separate function, and then having the variable call that function to initialize itself. This should be cleaned up over time so that we generate an ordinary expression whenever possible.
I also made the emit logic correctly label any global variable without a layout (that is, any that don't represent a shader parameter) as `static` so that the downstream HLSL compiler sees them as variables rather than parameters.
|
|
1. simplify RoundUpToAlignment()
2. add new a render-compute test case to cover the situation where the entry-point interface (parameter/return types of an entry-point function) is dependent on the global generic type.
3. initial fixes to get this test case to compile (but is not producing correct HLSL output yet)
|
|
These were already being handled a little bit, by lowering an `out T` or `inout T` function parameter in the AST to a function parameter with type `T*` in the IR, and then emiting explicit loads/stores.
The HLSL emit logic, however, couldn't tell the difference between an `out` parameter, an `inout`, or a true pointer (if we ever needed to support them...).
The intention (not fully implemented) was that we'd use a hierarchy of types rooted at `PtrTypeBase`:
- `PtrTypeBase`
- `Ptr`: "real" pointers in the C/C++ sense
- `OutTypeBase`: pointers used to represent by-reference parameter passing
- `OutType`: IR level type for an `out` parameter
- `InOutType`: IR level type for an `inout` or `in out` parameter
Actually implementing this involved:
- Adding a bit more flexibility to the `Session::getPtrType` logic to allow for creating any of the concrete types above
- Making the `lower-to-ir` logic create the right type for function parameters (instead of just using `PtrType`)
- Making the HLSL emit logic check for the `OutType` and `InOutType` cases rather than just `PtrType`
- Changing a bunch of small places in the code so that they use `PtrTypeBase` instead of `PtrType` when they should handle any of the above cases, and also make a few places check for `OutTypeBase` instead of `PtrType` or `PtrTypeBase`, when they are really trying to capture by-reference parameters
- Add a test case that uses all of the different cases we care about (without these fixes, this test case generates errors from fxc because of variables being used before being initialized, becaues parameters get declared `out` that should be `inout`).
A minor point here is that we are playing a bit fast and loose right now because the IR does not actually enforce any type checks. From the standpoint of the front end, `Ptr<T>`, `Out<T>`, and `InOut<T>` are all unrelated types (each is just a `struct` declared in `core.meta.slang`), but this doesn't really matter because none of these are types our current users are explicitly using.
In the IR it makes perfect sense to allow `Out<T>` or `InOut<T>` as the operand of a `load` or `store` instruction (and ditto for `getFieldAddr`, etc.) - there instructions just apply to any `PtrTypeBase`. The place where this potentially gets tricky is whether an `Out<T>` can be used where a `Ptr<T>` is expected, or vice vers (e.g., can I just pass my local variable's pointer directly to an `Out<T>` function parameter?
I'm going to ignore these issues for now, since the code currently works for our test case.
|
|
* Add support for global generic parameters
(In-progress work)
This commit include:
1. Update Slang API to allow specification of generic type arguments in an `EntryPointRequest`
2. Add parsing of `__generic_param` construct, which becomes a GlobalGenericParamDecl, contains members of `GenericTypeConstraintDecl`.
3. Semantics checking will check whether the provided type arguments conform to the interfaces as defined by the generic parameter, and store SubtypeWitness values in the EntryPointRequest, which will be used by `specializeIRForEntryPoint` when generating final IR.
4. Add a new type of substitution - `GlobalGenericParamSubstitution` for subsittuting references to `__generic_param` decls or to its member `GenericTypeConsraintDecl` with the actual type argument or witness tables.
5. Update `IRSpecContext` to apply `GlobalGenericParamSubstitution` when specializing the IR for an EntryPointRequest.
6. Update `render-test` to take additional `type` inputs, which specifies the type arguments to substitute into the global `__generic_param` types.
This commit does not include ProgramLayout specialization.
* IR: pass through `[unroll]` attribute (#284)
The initial lowering was adding an `IRLoopControlDecoration` to the instruction at the head of a loop, but this was getting dropped when the IR gets cloned for a particular entry point.
The fix was simply to add a case for loop-control decorations to `cloneDecoration`.
* fix warnings
* IR: support `CompileTimeForStmt` (#286)
This statement type is a bit of a hack, to support loops that *must* be unrolled.
The AST-to-AST pass handles them by cloning the AST for the loop body N times, and it was easy enough to do the same thing for the IR: emit the instructions for the body N times.
The only thing that requires a bit of care is that now we might see the same variable declarations multiple times, so we need to play it safe and overwrite existing entries in our map from declarations to their IR values.
Of course a better answer long-term would be to do the actual unrolling in the IR. This is especially true because we might some day want to support compile-time/must-unroll loops in functions, where the loop counter comes in as a parameter (but must still be compile-time-constant at every call site).
* Add support for global generic parameters
(In-progress work)
This commit include:
1. Update Slang API to allow specification of generic type arguments in an `EntryPointRequest`
2. Add parsing of `__generic_param` construct, which becomes a GlobalGenericParamDecl, contains members of `GenericTypeConstraintDecl`.
3. Semantics checking will check whether the provided type arguments conform to the interfaces as defined by the generic parameter, and store SubtypeWitness values in the EntryPointRequest, which will be used by `specializeIRForEntryPoint` when generating final IR.
4. Add a new type of substitution - `GlobalGenericParamSubstitution` for subsittuting references to `__generic_param` decls or to its member `GenericTypeConsraintDecl` with the actual type argument or witness tables.
5. Update `IRSpecContext` to apply `GlobalGenericParamSubstitution` when specializing the IR for an EntryPointRequest.
6. Update `render-test` to take additional `type` inputs, which specifies the type arguments to substitute into the global `__generic_param` types.
progress on parameter binding
* Add a more contrived test case for specializing parameter bindings
* update render-test to align buffers to 256 bytes (to get rid of D3D complains on minimal buffer size).
* adding one more test case for parameter binding specialization.
* Cleanup according to @tfoleyNV 's suggestions.
* fix a bug introduced in the cleanup
|
|
The initial lowering was adding an `IRLoopControlDecoration` to the instruction at the head of a loop, but this was getting dropped when the IR gets cloned for a particular entry point.
The fix was simply to add a case for loop-control decorations to `cloneDecoration`.
|
|
* Revise type legalization so it can handle constant buffers
The existing legalization approach with "tuples" can handle scalarizing a `struct` type with resource-type fields in it, but it had several big gaps. The most notable is that given a type that mixes uniform and resource fields, we can't just blindly scalarize things:
```
struct P {
float4 a;
float4 b;
Texture2D t;
};
cbuffer C
{
P gParam[8];
};
```
The existing code was completely ignoring the declaration of `gParam` inside `C`, but even if we fixed that issue, we'd get something like:
```
cbuffer C
{
float4 gParam_a[8];
float4 gParam_b[8];
};
Texture2D gParam_t[8];
```
In this case we've completely changed the layout of the uniform buffer, by switching from AOS to SOA.
Even if we could get the type layout logic and the IR to agree on this, it would be a surprise to users, and "principle of least surprise" should be a big deal on a project with as many moving parts as ours.
The right thing to do is to have legalization create a "stripped" version of the original type `P` and use that:
```
struct P_stripped {
float4 a;
float4 b;
};
cbuffer C
{
P_stripped gParam[8];
};
Texture2D gParam_t[8];
```
Then at a call site, this:
```
foo(gParam);
```
becomes:
```
foo(gParam, gParam_t);
```
This is exactly how the current AST-to-AST legalization handles mixed uniform and resource types, but the way it does it involves some annoying kludges:
- That pass has a notion of a "tuple" similar to our legalization, but every tuple has an optional "primary" entry for all the uniform data, plus tuple elements for the resources, and a given field may be represented on one side, the other, or both. It makes the code for handling tuples very messy.
- That pass does the "stripping" of types by actually marking up the AST declarations (this is okay because it is constructing a new AST as it goes), so that when they get emitted certain fields don't actually show up. That is, we fix the problem with type `P` by actually *modifying* the user's declaration of `P`. That seems out of bounds for the IR.
This change fixes the problem in our IR type legalization while trying to avoid the problems of the AST-to-AST pass by using two new ideas:
1. We add a new case for `LegalType` (and `LegalVal`) that is a "pair" type, where a pair consists of both an "ordinary" type (for uniform data) and a "special" type (for resource data). E.g., after legalization, the type for `C` (which can be over-simplified to `ConstantBuffer<P>` for our purposes), will be a `LegalType::pair` where the ordinary side is `ConstantBuffer<P_stripped>` and the special side is a tuple containing the `Texture2D` field.
2. We add a new (and annoyingly hacky) AST-level type called `FilteredTupleType` which is semantically a sort of tuple type (it holds a list of elements, and the elements have their own types), but which remembers an "original type" that it was created from, and for each element remembers the field of the original type that it corresponds to. This is used to construct a type like `P_stripped` as an actual AST-level structural type.
The core logic for legalizing an aggregate type had to get more complicated just because of the new pair case, so there is now a `TupleTypeBuilder` that asists with taking an aggregate type, processing its fields, and then picking the right `LegalType` representation for the result.
Other smaller changes:
- Made the legalization logic actually legalize `PtrType<T>`. E.g., if `T` legalizes to a tuple, we need to construct a tuple of pointer types. The same exact thing needs to be applied to arrays, and any other generic type that should "distribute over" pairs/tuples.
- Made the legalization logic actually legalize `ConstantBuffer<T>` and similar. The basic idea there is if `T` maps to a pair, we wrap `ConstantBuffer<...>` around the ordinary side, and `implicitDeref` around the special side.
- Removed a bunch of `#ifdef`ed-out code from the end of `ir-legalize-types.cpp`. That was code from my first attempt at legalization that failed miserably (trying to do it via local changes and a work list instead of a global rewrite pass), but it had some code I wanted to reference when writing the version that actually got checked in (should have deleted the code earlier, though).
- Added a bunch of cases for `LegalType::none` (and the `LegalVal` equivalent) that helped simplify the logic fo the `pair` case by allowing me to *always* dispatch to both the "ordinary" and "special" sides, even if they might not actually be present.
- Renamed `TupleType` and `TupleVal` over to `TuplePseudoType` and `TuplePseudoval` to recognize the fact that we might actually need/want *real* tuples in the type system, to go along with these fake ones (that need to be optimized away).
The biggest doubt I have about this change is the whole `FilteredTupleType` thing; it seems like an obviously contrived type to add to the front-end type system that really only solves IR-level problems. A cleaner approach might have been to just add a plain old `TupleType` to the front-end type system (and initially I started with that), and then have yet another `LegalType`/`LegalVal` case that handles mapping from the fields of the original type to the numbered tuple elements.
I expect we'll actually want to make that change in the future (especially if we ever add true tuples to the front-end), but for right now I let myself be swayed by the desire to have these stripped/filtered types get names that explain their provenance ("where they came from") to make our output code more debuggable. The way I've done it is probably overkill, though, and we need a much more complete effort on the readability and debuggability of our output before anything like that is worth worrying about.
* Fixup: typo
* Fixup: fix output of "non-mangled" names for test cases
- Make sure to attach high-level decls to variables created as part of type legalization
- Also, try to share more of the code between the different cases of variables
- Fix up `parameter-blocks` test case that was passing `-no-mangle` but expecting mangled names in the output
- Fix up `multiple-parameter-blocks` to not rely on `-no-mangle` for now, because it would lead to two global variables with the same name (need to fix that underlying issue eventually).
- Also fix name generation logic so that we only use "original" names (from high-level decls) specifically when the `-no-mangle` flag is on, and otherwise use IR-level names.
* Fix: handle constant buffers better in render-test
- Don't request both CB and SRV usage for buffers, since that is illegal
- Also, don't try to create an SRV when user requested a CB (since the required usage flag won't be there)
- Record the input buffer type on the `D3DBinding` for a buffer, and use that to tell us when to bind a CB instead of SRV/UAV
- Fix expected output for `cbuffer-legalize` test now that we are actually feeding it correct cbuffer dta.
|
|
- Change function mangling so we use `p<parameterCount>p` instead of just `p<parameterCount>` to avoid the parameter count running into digits at the start of a mangled type name and tripping up the un-mangling logic.
- We really need to step back at some point and define our mangling scheme a bit more carefully, especially if we are going to keep going down this road where un-mangling things is important for generating HLSL output.
- Also allow the unmangling logic to unmangle a few more cases of generic parameters, so that it can skip over them to get to the parameter count of the underlying function.
- Add a notion of an `unreachable` instruction to the IR, and emit it as the terminator (if needed) at the end of the last block for a function with a non-void return type.
- This does *not* implement any logic to emit a diagnostic if the `unreachable` turns out to be potentially reachable
- Fix a bug in IR specialization of generics where we can't create two different specializations of the same function, because both get registered in the same hash map
With all these fixes, testing in Falcor modified to use the full Slang compiler and IR for all HLSL/Slang:
- The UI and text rendering shaders yield HLSL that compiles without error; no idea if they actually *work*
- The ModelViewer shaders yield HLSL, but there are some issues (looks like type legalization isn't applying to stuff inside constant buffers)
|
|
* IR: add support for `switch` statements
Fixes #273
This is just something we hadn't gotten to yet on the IR. The actual design of the instruction is unsurprising (once you take into consideration the requirement for structured control flow). A `switch` instruction takes the form:
switch <condition> <breakLabel> <defaultLabel> [<caseVal> <caseLabel>]*
Where `condition` is the value to switch on, `breakLabel` is the "join point" after the original `switch` statement, `defaultLabel` is where to go if the value doesn't match any case, and each pair of `caseVal` and `caseLabel` is what to do on a particular value. It is required that `caseVal` be a literal, but this isn't currently being enforced in the IR (the front-end should be making a check and constant-folding the case labels).
For structured control flow, we also make the assumption that the cases are in order: cases with the same label must be grouped together, and any case that falls through to another must come right before it.
Given this representation, the emit logic can reconstruct a `switch` statement with relative ease, given the machinery we already have. It makes sure to group together case values with the same label (again, assuming they are contiguous), and will insert the `default:` label in with whatever group it belongs to.
Actually emitting code for a `switch` statement seems superficially simple, until you realize that a complete implementation needs to handle stuff like "Duff's Device." The current implementation makes the assumption that all `case` and `default` statements are directly nested under a `switch`, and that there is no way for control flow to enter a case except by the `switch` itself, or fall-through.
In order to facilitate the grouping of cases in the IR-to-HLSL emit logic, the AST-to-IR lowering logic tries to detect cases where there are multiple `case`s in a row, and emit only a single label for them.
One big/annoying gotcha is that we don't properly handle the case where a `default:` case has a non-trivial fall-throguh to another case. That seems fine for now since HLSL doesn't support fall-through anyway, but it probably needs to get detected somewhere in the Slang compiler (e.g., maybe we should add a diagnostic pass over the IR that detects target-specific problems like that and emits errors).
* IR: Add support for empty statements.
- Add empty statement in `lower-to-ir.cpp`
- Go ahead and eliminate the statement catch-all and explicitly enumerate the cases we don't support
- Fix up parser for block statements so that it doesn't leave a null statement as the body of a `{}`
- Add an empty statement to one of the cases for the `switch` test, to ensure we are testing empty statements
|
|
- Add definition of `discard` instruction
- A `discard` is a terminator instruction, just like `returnVoid`
- Lower `DiscardStmt` in AST to a `discard` instruction in the IR
- Emit `discard` instruction as a `discard;` statement when emitting HLSL/GLSL
- Add a test case using the "graphics compute" mode that tests discard. The test writes to one entry in a UAV before doing a conditional (always true at runtime) discard, and then writes to another entry; we expect to see the results of the first write, but not the second.
|
|
* improve diagnostic messages and prevent fatal errors from crashing the compiler.
* fix top level exception catching.
* spelling fix
* change wording of invalidSwizzleExpr diagnostic
* add speculative GenericsApp expr parsing
* add new test case of cascading generics call.
* Fixing bugs in compiling cascaded generic function calls.
Add implementation of DeclaredSubTypeWitness::SubstituteImpl()
This is not needed by the type checker, but needed by IR specialization. When input source contains cascading generic function call, the arguments to `specialize` instruction is currently represented as a substitution. The arg values of this subsittution can be a `DeclaredSubTypeWitness` when a generic function uses one of its generic parameter to specialize another generic function. When the top level generics function is being specialized, this substitution argument, which is a `DeclaredSubTypeWitness`, needs to be substituted with the witness that used to specialize the top level function in the specialized specialize instruction as well.
* add a test case for cascading generic function call.
* parser bug fix
* fixes #255
* add test case for issue #255
* Generate missing `specialize` instruction when calling a generic method from an interface constraint.
When calling a generic method via an interface, we should be generating the following ir:
...
f = lookup_interface_method(...)
f_s = specailize(f, declRef)
...
This commit fixes this `emitFuncRef` function to emit the needed `specialize` instruction.
* fixes #260
This fix follows the second apporach in the disucssion. It generated mangled name for specialized functions by appending new substitution type names to the original mangled name.
* Disabling removing and re-inserting specailized functions in getSpecalizeFunc()
I am not sure why it is needed, it seems HLSL and GLSL backends are generating forward declarations anyways, so the order of functions in IRModule shouldn't matter.
* cleanup and complete test cases.
* fix warnings
|
|
The original code is handling the issue where a call site might be specializing a generic function, so it has a `DeclRef` that represents what it wants to specialize, but the callee is actually a different overload of the same generic function (e.g., a target-specific overload) and so we need to construct a set of substitutions that are equivalent (same arguments), but point to different `GenericDecl`s.
That code was making some bad assumptions, though:
1. It assumed that the substitutions list would always start with a generic substitution (no longer true with `ThisTypeSubstitution`.
2. It assumed that only the top-most substitution would need to be translated. This assumption is probably safe for now, but it could break down if we ever introduced an ability for a type to be re-opened to introduce new (target-specific) overloads of its members.
The new approach goes ahead and does a deep copy of the substitition list (but a shallow copy of the arguments), and only copies the generic substititions for now.
|
|
* Rename existing ParameterBlock to ParameterGroup
We are planning to add a new `ParameterBlock<T>` type, which maps to the notion of a "parameter block" as used in the Spire research work.
Unfortunately, the compiler codebase already uses the term `ParameterBlock` as catch-all to encompass all of HLSL `cbuffer`/`tbuffer` and GLSL `uniform`/`buffer`/`in`/`out` blocks (all of which are lexical `{}`-enclosed blocks that define parameters...).
This change instead renames all of the existing concepts over to `ParameterGroup`, which isn't an ideal name, but at least doesn't directly overlap the new terminology or any existing terminology.
The new `ParameterBlockType` case will probably be a subclass of `ParameterGroupType`, since it is a logical extension of the underlying concept.
* Add Shader Model 5.1 profiles
The HLSL `register(..., space0)` syntax is only allowed on "SM5.1" and later profiles (which is supported by the newer version of `d3dcompiler_47.dll` that comes with the Win10 SDK, but not the older version of `d3dcompiler_47.dll` - good luck figuring out which you have!).
This change adds those profiles to our master list of profiles, and nothing else.
* First pass at support for `ParameterBlock<T>`
- Add the type declaration in stdlib
- Add a special case of `ParameterGroupType` for parameter blocks
- Handle parameter blocks in type layout (currently handling them identically to constant buffers for now, which isn't going to be right in the long term)
- Add an IR pass that basically replaces `ParameterBlock<T>` with `T`
- Eventually this should replace it with either `T` or `ConstantBuffer<T>`, depending on whether the layout that was computed required a constant buffer to hold any "free" uniforms
- Add first stab at an IR pass to "scalarize" global variables using aggregate types with resources inside.
- This currently only applies to global variables, so it won't handle things passed through functions, or used as local variables
- It also only supports cases where the references to the original variable are always references to its fields, and not the whole value itself
- Add a single test case that technically passes with this level of support, but probably isn't very representative of what we need from the feature
* Fold parameter-block desugaring into a more complete "type legalization" pass
The basic problem that was arising is that once you desugar `ParameterBlock<T>` into `T`, you then need todeal with splitting `T` into its constituent fields if it contains any resource types.
Handling those transformations by following the usual use-def chains wasn't really helping, because you might need systematic rewriting that can really only be handled bottom-up.
This change adds a new pass that is intended to perform multiple kinds of type "legalization" at once:
- It will turn `ParameterBlock<T>` into `T`
- It may at some point also convert `ConstantBuffer<T>` into `T` as well
- It will turn an value of an aggregate type that contains resources into N different values (one per field)
- As a result of this, it will also deal with AOS-to-SOA conversion of these types
Legalization is applied to *every* function/instruction/value, so that it can make large-scale changes that would be tough to manage with a work list.
This pass needs to be run *after* generics have been fully specialized, so that we know we are always dealing with fully concrete types, so that their legalization for a given target is completely known.
This is still work in progress; there's more to be done to get this working with all our test cases, and finish the remaining `ParameterBlock<T>` work.
* Improve binding/layout information when using parameter blocks
- When doing type layout for a parameter block, don't include the resources consumed by the element type in the resource usage for the parameter block
- Note that this is pretty much identical to how a `ConstantBuffer<T>` does not report any `LayoutResourceKind::Uniform` usage, except that `ParameterBlock<T>` is *also* going to hide underlying texture/sampler reigster usage
- The one exception here is that any nested items that use up entire `space`s or `set`s those need to be exposed in the resource usage of the parent (I don't have a test for this)
- When type legalization needs to scalarize things, it must propagate layout information down to the new leaf variables. In general, the register/index for a new leaf parameter should be the sum of the offsets for all of the parent variables along the "chain" from the original variable down to the leaf (we aren't dealing with arrays here just yet).
- When type legalization decides to eliminate a pointer(-like) type (e.g., desugar `ParameterBlock<T>` over to `T`), actually deal with that in terms of the `LegalVal`s created, so that we can know to turn a `load` into a no-op when applied to a value that got indirection removed.
- Hack up the "complex" parameter-block test so that it actually passes (the big hack here is that the HLSL baseline is using names that are generated by the IR, and are unlikely to be stable as we add/remove transformations).
- Note: I can't make these be compute tests right now, because regsiter spaces/sets are a feature of D3D12/Vulkan, and our test runner isn't using those APIs.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
This change includes a lot of infrastructure work, but the main point is to allow code like the following:
```
// define an interface
interface Helper { float help(); }
// define a generic function that uses the interface
float test<T : Helper>( T t ) { return t.help(); }
// define a type that implements the interface
struct A : Helper { float help() { return 1.0 } }
// define an ordinary function that calls the
// generic function with a concrete type:
float doIt()
{
A a;
return test<A>(a);
}
```
Getting this to generate valid code involves a lot of steps. This change includes the initial version of all of these steps, but leaves a lot of gaps where more complete implementation is required.
The changes include:
- Member lookup on types has been centralized, and now handles the case where the type we are looking for a member in is a generic parameter (e.g., given `t.help()` we can now look up `help` in `Helper` by knowing that `t` is a `T` and `T` conforms to `Helper`).
- There is an obvious cleanup still to be done here where the same exact logic should be used to look up available "constructor" declarations inside a type when the type is used like a function.
- Add a notion of subtype constraint "wittnesses" to the type system. When a generic is declared as taking `<T : Helper>` it really takes two generic parameters: the type `T` and a proof that `T` conforms to `Helper`. The actual arguments to a generic will then include both the type argument and a suitable witness argument (both type-level values).
- As it stands right now, a witness wraps a `DeclRef` to the declaration that represents the appropriate subtype relationship. So if we have `struct A : Helper`, that `: Helper` part turns into an `InheritanceDecl` member, and a reference to that member can serve as a witness to the fact that `A` conforms to `Helper`.
- Make explicit generic application `G<A,B>` synthesize the additional arguments that represent conformances required by the generic.
- This does *not* yet deal with the case where a generic is implicitly specialized as part of an ordinary call `G(a,b)`
- A bug fix to not auto-specialize generics during lookup. The problem here was related to an attempted fix of an earlier issue.
During checking of a method nested in a generic type, we were running into problems where `DeclRefType::create()` was getting called on an un-specialized reference to `vector`, and this was leading to a crash when the code looked for the arguments for the generic. This was worked around by having name lookup automatically specialize any generics it runs into while going through lookup contexts.
That choice creates the problem that in a generic method like this:
```
void test<T>(T val) { ... }
```
any reference to `val` inside the body of `test` will end up getting specialized so that it is effectively `test<T>::val`, when that isn't really needed.
- Add front-end logic to check that when a type claims to conform to an interface it actually must provide the methods required by the interface. The checking process goes ahead and builds a front-end "witness table" that maps declarations in the interface being conformed to over to their concrete implementations for the type.
- At the moment the checking is completely broken and bad: it assumes that *any* member with the right name is an appropriate declaration to satisfy a requirement. That obviously needs to be fixed.
- Add an explicit operation to the IR for lookup of methods: `lookup_interface_method(w, r)` where `w` is a reference to the "witness" value and `r` is an `IRDeclRef` for the member we want to look up.
- Add an explicit notion of witness tables to the IR. These end up being the IR representation of an `InheritanceDecl` in a type, and they are generated by enumerating the members that satisfy the interface requirements (which were handily already enumerated by the front-end checking). The witness table is an explicit IR value, and so it will be referenced/used at the site where conformance is being exploited (e.g., as part of a `specialize` call), so it should be safe to eliminate witness tables that are unused (since they represent conformances that aren't actually exploited). Similarly, the entries in a witness table are uses of the functions that implement interface methods, and so keep those live.
- In order to implement the above, I did a bit of a cleanup pass on the IR representation so that there is an `IRUser` base that `IRInst` inherits from, so that we can have users of values that aren't instructions.
- One annoying thing is that because of how types and generics are handled in the IR, we needed a way to have a type-level `Val` that wraps an IR-level value: e.g., to allow an IR-level witness table to be used as one of the arguments for specialization of a generic. The design I chose here is to have a "proxy" `Val` subclass (`IRProxyVal`) that wraps an `IRValue*`. These should only ever appear as part of types and `DeclRef`s that are used by the IR.
- One annoying bit here is that an IR value might then have a use that is not manifest in the set of IR instructions, and instead only appears as part of a type somewhere.
- I'm not 100% happy with this design, but it seems like we'd have to tackle similar issues if/when we eventually allow functions to have `constexpr` or `@Constant` parameters
- Make generic specialization also propagate witness table arguments through to their use sites (this is mostly just the existing substitution machinery, once we have `IRProxyVal`), and then include logic to specialize `lookup_interface_method` instructions when their first operand is a concrete witness table.
All of this work allows a single limited test using generics with constraints to pass, but more work is needed to make the solution robust.
|
|
|
|
inputs for running test shaders with arbitrary parameter definitions.
This commit contains the parser of the resource input definition.
|
|
* Fix up emission of shader parameter semantics when using IR
- Make sure to propagate entry point parameter layouts down to IR parameters when doing the initial cloning to form target-specific IR
- When layout information is present on an IR node, prefer to use that over the original high-level declaration for outputting semantics in final HLSL
- Fix up test runner to generate `.actual` files when running compute tests, in cases where the `render-test` application errors out (e.g., because of a Slang compilation error)
- Add a first test of generics functionality, to show that they generate valid code through the IR
- Right now this test is *not* using any "interesting" operations on the type parameter, so this is not a test that can confirm that interface constraints work
* fixup: skip compute tests when running on Linux
|
|
There was already a pass in place that transformed parameters and results of an entry-point function into global variables for GLSL, but this pass would just turn a `struct`-type parameter into a `struct`-type global, which has two problems:
- The standard GLSL language doesn't seem to allow `struct` types as vertex shader inputs or fragment shader outputs.
- If there are any members in such a `struct` that represent "system value" inputs or outputs, then these would need to be transformed into the equivalent `gl_*` variables.
This change adds a more complete scalarization process that applies to inputs/outputs during the legalization pass. In order to support this there is a little bit of a data strcuture for abstracting over tuples of values (this same idiom is used in a few other places, so perhaps the implementation could be done once and shared?).
System values are current handled in a painfully ad hoc (and incomplete) fashion during code emit. We need to come up with a better solution for mapping HLSL `SV_*` semantics over to `gl_*` variables.
In some cases this mapping might introduce more code than we can easily deal with during emit time, so it probably needs to be handled back at the IR level.
This implementation has many gaps, but it appears to be enough to get teh `render/cross-compile-entry-point` test working with IR-based cross-compilation.
|
|
There are two big changes here:
- Add logic during the initial IR cloning pass for an entry point + target that tries to pick the best possible version of any target-overloaded function. This allows us to pick the intrinsic version of `saturate()` when compiling for HLSL output, but then pick the non-intrinsic version (that is implemented in terms of `clamp()`) when targetting GLSL.
- Add an initial specialization pass that tries to deal with generics. This required some fixing work to IR generation, so that we correctly generate explicit operations to specialize a generic for specific types (this is currently implemented as a `specialize` instruction that takes the generic to specialize plus a declaration-reference that represents the specialized form). With that work in place, we can scan for `specialize` instructions inside of non-generic functions, and use them to trigger generation of specialized code. We rely on the name-mangling scheme to help us find pre-existing specializations when possible.
There are also a bunch of cleanups encountered along the way:
- Don't use the explicit `layout(offset=...)` for uniforms, because it isn't supported by all current drivers. For now we will just assume that our layout rules compute the same values that the driver would for un-marked-up code. We can come back later and try to implement a workaround in the cases where this doesn't apply (e.g., by re-running the layout logic as part of emission, and dropping layout modifiers from variables that don't need explicit layout).
- Fix some issues in IR dump printing so that we print function declarations more nicely.
- Testing: print out failing pixel when image-diff fails
|
|
The big addition here is that the Slang "bytecode" is no longer treated as just a "code generation target" (`CodeGenTarget`) akin to DX bytecode (DXBC) or SPIR-V, but instead is a `ContainerFormat` that can be used to emit all the results of a compile request (well, currently just the IR-as-BC, but the intention is there).
Getting to this goal involved some prior checkins that eliminated bogus "targets" that weren't really akin to SPIR-V or DXBC: `-target slang-ir-asm` and `-target reflection-json`. Those targets were really in place to support testing, and so they've been made more explicit testing/debug options.
This change eliminates `-target slang-ir` and instead tries to allow the user to specify `-o foo.slang-module` as an output file name, that indicates the intention to output a "container" file that will wrap up all the generated code.
I've also gone ahead and generalized the existing `-target` option so that we are actually building up a *list* of code generation targets. This is largely just a cleanup, since it forces code to be more aware of when it is doing something target-specific vs. target independent. For example, reflection layout information lives on a requested target, and not on the compile request as a whole, and similarly output code is per-target, per-entry-point.
As a cleanup, I eliminated support for per-translation-unit output. This was vestigial code from back when I used to try and do HLSL generation for a whole translation unit instead of per-entry-point (which turned out to be a lot of complexity for little gain), and it was only being used in the `hello` example and the `render-test` test fixture - in both cases fixing it up was easy enough. I've stubbed out the old `spGetTranslationUnitSource` API, but haven't removed it yet.
|
|
* Get rid of the `-slang-ir-asm` target
This is really only useful for debugging, so I've replaced the functionality with a `-dump-ir` command line option (which dump's the IR for an entry point before doing codegen).
* fixup: use HLSL target, not DXBC, so test can run on Linux
|
|
When outputting GLSL from a Slang or HLSL entry point, we need to translate any parameters or results of an entry-point function into global declarations of `in` or `out` parameters, as needed by GLSL.
This change adds these transformations at the IR level, so that they don't need to complicate the emit logic.
More detailed changes:
- Make `render0` test use IR. It passes out of the box.
- Fix test runner to not always dump diffs on failures
I accidentally initialized an option to `true` instead of `false` when working on debugging the Travis CI failures.
- Special-case output for component-wise multiplication to handle GLSL `matrixCompMul()`
- Handle GLSL vs. HLSL output for calls to `mul()`
- Output proper `layout(std140)` on GLSL constant buffer declarations
- Require appropriate GLSL extension when emitting explicit `layout(offset = ...)` on constant buffer members
- TODO: Need to avoid requiring this extension in cases where the offsets are what would be computed anyway.
Realistically, should probably be emitting code with explicit padding, etc. to guarantee layouts.
- Add an IR-based pass to translate entry point functions by eliminating their input/output parameters and replacing them with global variables.
- Demangle names when calling target intrinsics
The lowering to the IR will turn a call like `sin(foo)` into a call to a function declaration with a mangled name like `_S3sin...`. This works fine when the user is calling their own functions, since the name mangling will apply to both the definition and use sites, but for builtin functions it obviously isn't what we want.
This change makes it so that we demangle the name of an instrinsic function just enough so that we can extract the original simple name, and make a call using that.
These changes do nor provide 100% of what we need when translating to GLSL, so the `cross-compile-entry-point` test *still* hasn't been flipped over to use the IR (even though that is the test case I've been using to develop these changes).
|
|
* IR: overhaul IR design/implementation
Closes #192
Closes #188
This is a major overhaul of how the IR is implemented, with the primary goal of just using the AST-level type representation as the IR's type representation, rather than inventing an entire shadow set of types (as captured in issue #192).
One consequence of this choice is that types in the IR are no longer explicit "instructions" and are not represented as ordinary operands (so a bunch of `+ 1` cases end up going away when enumerating ordinary operands).
Along the way I also got rid of the embedded IDs in the IR (issue #188) because this wasn't too hard to deal with at the same time.
Another related change was to split the `IRValue` and `IRInst` cases, so that there are values that are not also instructions. Non-instruction values are now used to represent literals, references to declarations, and would eventually be used for an `undef` value if we need one. IR functions, global variables, and basic blocks are all values (because they can appear as operands), but not instructions.
The main benefit of this approach is that the top-level structure of a bytecode (BC) module is much simpler to understand and walk, and BC-level types are represented much more directly (such that we could conceivably use them for reflection soon).
* fixup: 64-bit build fix
* fixup: try to silence clang's pedantic dependent-type errors
* fixup: bug in VM loading of constants
|
|
None of these changes are made "live" at the moment. I'm just trying to get them checked in to avoid divering too far from `master` at any point during development.
- Add basic emit logic to produce GLSL from the IR in a few cases (the existing IR emit logic was ad hoc and HLSL-specific)
- When lowering a function declaration, walk up its chain of parent declarations to collect additional parameters as needed
- When lowering a call, make sure to add generic arguments that come from the declaration reference being called
- Attach a "mangled name" to symbols when lowering, so that we can eventually use that name to resolve things for linkage.
- After the above work, I had to apply some fixups to make sure that generic arguments *don't* get added when the user is calling an `__intrinsic_op` function, since those should map 1-to-1 down to instructions with just their ordinary parameter list.
A big open question right now is whether I should continue to represent the generic arguments as just part of the ordinary argument list for a function, or split them out into separate `applyGeneric` and `apply` steps.
A strongly related question is whether a declaration with generic parameters should lower into a single declaration, or one declaration nested inside an outer generic declaration.
A good future step at this point would be to eliminate a lot of the `__intrinsic_op` stuff in favor of having the builtin functions include their own definitions, which might be in terms of a new expression-level construct for writing inline IR operations. This can't be done until the existing AST-to-AST path is no longer needed for cross-compilation purposes.
More immediate next steps here:
- We need a way to round-trip calls to external declaration that get handled by this mangled-name logic. Basically, if we are asked to output HLSL and we see a call to `_S...GetDimensions...(float4, t, a, ...)` we need to be able to walk the mangled name and get back to `t.getDimensions(a, ...)` without a whole lot of manual definitions to make things round-trip.
- In the other case, where a declaration isn't built-in for the chosen target, we need to be able to load a module of target-specific definitions (which will somehow map back to symbols with certain mangled names) and then look these up (by mangled name) and then load/link/inline them into the user's IR to satisfy requirements in their code.
|
|
At a high level, this commit adds two things:
1. A "bytecode" format for serializing Slang IR instructions and related structure (functions, "registers")
2. A virtual machine that can load and then execute code in that bytecode format.
The reason for kicking off this work right now is that we *need* a way to run tests on Slang code generation that doesn't rely on having a GPU present (given that our CI runs on VM instances without GPUs), nor on textual comparison to the output of other compilers. With these features I've implemented a slapdash `slang-eval-test` test fixture that can run a (trivial) compute shader to very our compilation flow through to bytecode.
Some key design constraints/challenges:
- The bytecode format should be "position independent" so that a user can just load a blob of data and then inspect it without having to deserialize into another format, allocate memory, etc. Eventually the bytecode format might be a replacement for out current reflection API (we used to base reflection off a similar format, but the cost/benefit wasn't there at the time and we switched to just using the AST).
- The VM should be able to execute bytecode functions without doing any per-operation translation, JIT, etc. (translation of more coarse-grained symbols is okay). For now the VM is just being used to run tests, but eventually I'd like it to be viable for:
- Running Slang-based code in the context of the compiler itself. This starts with stuff like constant-folding in the front-end, but could expand to more general metaprogramming features.
- Running Slang-based ocde within a runtime application (e.g., a game engine) that wants to be able to run things like "parameter shader" code, or even just evaluate compute-like code on CPU (e.g., when supporting particles on both CPU and GPU).
- Finally, the bytecode format should ideally be able to round-trip back to the IR without unacceptable loss of information. This requirement and the previous one play off of each other, because things like a traditional SSA phi operation is ugly when you have to actually *execute* it. This doesn't matter right now when we don't have SSA yet, but it might be part of the decision-making here.
The actual implementation is centralized in `bytecode.{h,cpp}` and `vm.{h.cpp}`.
Big picture notes:
- The space of opcodes is shared between IR and bytecode (BC), with the hope that this makes translation of operations between the two easy.
- The actual bytecode instruction stream relies on a variable-length encoding for integer values, including opcodes and operand numbers, so that the common case is single-byte encoding.
- In the long term I intend to have a rule that if you use a single-byte encoding for an opcode, then all operands are required to use single-byte encodings too. Operations that need multi-byte operands would then be forced to use a multi-byte encoding of the op, and would be sent down a slower path in the interpeter.
- The "bytecode"'s outer structure is based on ordinary data structures linked with pointers, but they are "relative pointers" so the actual structure is position-independent.
- There are two main kinds of operands: registers and "constants." An operand is a signed integer where non-negatie values indicate registers (with `index == operandVal`) and negative values indicate constants (with `index == ~operandVal`).
- Registers are stored in the "stack frame" for a VM function call, and each has a fixed offset based on the size of the type and those that come before it. Conceptually, registers are allowed to overlap if they aren't live at the same time, and we manage this with a simple stack model: every register is supposed to identify the register that comes directly before it (this isn't implemented yet).
- "Constants" are more realistically a representation of "captured" values, but they are currently also how constants come in. Basically we can use a compact range of indices in the bytecode for a function, and each of these indices indirectly refers to some value in the next outer scope.
- The actual encoding of bytecode instructions right now is largely ad-hoc and very wasteful (we encode the type on everything, and we also encode everything as if it had varargs).
- In some cases, an instruction needs to know the types of the values involved (e.g., because it needs to load an array element, which means copying a number of bytes based on the size). The way the VM works we have types attached to our registers, so we currently get sneaky and look at those types in some ops. Longer term is makes sense to encode the required type info directly in the BC.
- There's a whole lot of hand-waving going on with how the actual top-level bytecode module gets loaded, because of the way we currently treat the top-level module as an instruction stream in the IR. This means that we try to represent the loaded module as a "stack frame" for a call to the module as a function, but that approach as serious problems, and isn't realistically what we want to do.
|
|
* IR: handle control flow constructs
This change includes a bunch of fixes and additions to the IR path:
- `slang-ir-assembly` is now a valid output target (so we can use it for testing)
- This uses what used to be the IR "dumping" logic, revamped to support much prettier output.
- A future change will need to add back support for less prettified output to use when actually debugging
- IR generation for `for` loops and `if` statements is supported
- HLSL output from the above control flow constructs is implemented
- Revamped the handling of l-values, and in particular work on compound ops like `+=`
- Add basic IR support for `groupshared` variables
- Add basic IR support for storing compute thread-group size
- Output semantics on entry point parameters
- This uses the AST structures to find semantics, so its still needs work
- Pass through loop unroll flags
- This is required to match `fxc` output, at least until we implement
unrolling ourselves.
* Fixup: 64-bit build issues.
* fixup for merge
|
|
- Add support for matrix types in IR/codegen
- Add support for basic indexing operations in IR/codegen
|
|
The main interesting change here is around support for lowering of calls to "subscript" operations (what a C++ programmer would think of as `operator[]`).
An important infrastructure change here was to add an explicit AST-node representation for a "static member expression" which we use whenever a member is looked up in a type as opposed to a value. The implementation of this probably isn't robust yet, but it turns out to be important to be able to tell such cases apart.
|
|
This is a bug that already existed in the IR code, but wasn't getting triggered on existing test cases, and only seems to affect the 64-bit target right now.
The "key" value being constructed to cache and re-use constants during IR generation wasn't actually being fully initialized, and so garbage values in it could cause the lookup of an existing value to fail where it should succeed.
|
|
The code previously had an enumerated type for "intrinsic" operations, and allowed functions to be marked `__intrinsic_op(...)` to indicate the operation they map to.
The nature of the IR meant that each of these intrinsic ops had to have a corresponding IR opcode, but the `enum` types weren't the same.
This change cleans things up a bit by deciding that the `__intrinsic_op(...)` modifier names an actual IR opcode, and so the `IntrinsicOp` enum is gone.
The biggest source of complexity here is that there are certain operations that need to be "intrinsic"-ish for the purposes of the current AST-based translation path, because we need them to round-trip from source to AST and back.
Right now this is being handled by defining a bunch of "pseudo-ops" which can be used in the `__intrinsic_op` modifier, but which are *not* meant to be represented in the IR.
Currently I don't actually handle this during IR generation.
In the long run, once we are using IR for everything that needs cross-compilation, we should be able to eliminate the pseudo-ops in favor of just having these be ordinary (inline) functions defined in the stdlib (e.g., the `+=` operator can just have a direct definition).
There was a second category of modifier that gets a little caught up in this, which is the `__intrinsic` modifier, which got used in two ways:
1. A function marked `__intrinsic(glsl, ...)` had what I call a "target intrinsic" modifier, which specified how to lower it for a specific target (e.g., GLSL).
2. A function just marked `__intrinsic` was supposed to be a marker for "this function shouldn't be emitted in the output, because the implementation is expected to be provided"
The latter category of function should really be an `__intrinsic_op`, so I translated all those uses. I added a tiny bit of sugar so that `__intrinsic_op` without an explicit opcode will look up an opcode based on the name of the function being called, so that an operation like `sin` can automatically be plumbed through to an equivalent IR op. (The first category is a stopgap for the AST-based cross-compilation, and will hopefully be replaced by something better as we get the IR-based path working).
Getting the switch from `__intrinsic` to `__intrinsic_op` working required shuffling around some code in `emit.cpp` that handles looking up those modifiers and emitting builtin operations appropriately during cross-compilation.
Depending on where we go with things, a possible extension of this approach is to allow multiple operands to `__intrinsic_op` so that the first specifies the opcode, and then the rest are literal arguments to specify "sub-ops." This could help us handle stuff like texture-fetch operations without an explosion in the number of opcodes. I still need to think about whether this is a good idea or not.
|
|
This gets us far enough that we can convert a single test case to use the IR, under the new `-use-ir` flag.
Getting this merged into mainline will at least ensure that we keep the IR path working in a minimal fashion, even when we have to add functionality the existing AST-based path
There is definitely some clutter here from keeping both IR-based and AST-based translation around, but I don't want to have a long-lived branch for the IR that gets further and further away from the `master` branch that is actually getting used and tested.
Summary of changes:
- Add pointer types and basic `load` operation to be able to handle variable declarations
- Add basic `call` instruction type
- Add simple address math for field reference in l-value
- Always add IR for referenced decls to global scope
- Add notion of "intrinsic" type modifier, which maps a type declaration directly to an IR opcode (plus optional literal operands to handle things like texture/sampler flavor)
- Improve printing of IR instructions, types, operands
- Add constant-buffer type to IR
- Allow any instruction to be detected as "should be folded into use sites" and use this to tag things of constant-buffer type
- Also add logic for implicit base on member expressions, to handle references to `cbuffer` members
- Add connection back to original decl to IR variables (including global shader parameters...)
- Use reflection name instead of true name when emitting HLSL from IR (so that we can match HLSL output)
- Make IR include decorations for type layout
- Re-use existing emit logic for HLSL semantics to output `register` semantics for IR-based code
- Make IR-based codegen be an option we can enable from the command line
- It still isn't on by default (it can barely manage a trivial shader), but it seems better to enable it always instead of putting it under an `#ifdef`
- Fix up how we check for intrinsic operations suring AST-based cross compilation so that adding new intrinsic ops for the IR won't break codegen.
|
|
The terminology here is similar to SPIR-V. For right now the only decoration exposed is a fairly brute-force one that just points back to a high-level declaration so that we can look up info on it that might affect how we print output.
|
|
Previously, a `StructType` was an ordinary instruction that took a variable number of types are operands, representing the types of fields.
This ends up being inconvenient for a few reasons:
- To add decorations to the fields, you'd end up having to decorate the struct type instead (SPIR-V has this problem)
- You need to compute field indices during lowering, when you might prefer to defer that until later
- The get/set field operations now need an index, which needs to be an explicit operand, which means a magic numeric literal floating around to represent the index
The new approach fixes for the first two of these, and at least makes the last one a bit nicer.
A `StructDecl` is now a parent instruction, and its sub-instructions represent the fields of the type - each field is an explicit instruction of type `StructField`.
The operation to extract a field takes a direct reference the struct field, so everything is quite explicit.
|
|
- Change IR instructions to just hold an integer opcode instead of a pointer to the "info" structure
- Externalize definition of IR instructions to a header file, and use the "X macro" approach to allow generating different definitions
- Add notion of function types to the IR, so that we can easily query the result type of a function
- Add some convenience accesors to allow walking the IR in a strongly-typed manner (e.g., iterate over the parameters of a function)
- TODO: these should really be changed to assert the type of things, as least in debug builds
- Add very basic logic to `emit.cpp` so that it can walk the generated IR and start printing it back as HLSL
- This isn't meant to be usable as-is, but it is a step toward where we need to go
|
|
- Make all instructions store their argument count for now, so we can iterate over them easily.
- Longer term we might try to optimize for space because the common case is that the operand count is known, but keeping it simpler seems better for now
- Split apart the creation of an instruction from adding it to a parent
- Use the above capability to make sure that we add a function to its parent *after* all the parameter/result type emission has occured.
- Perform simple value numbering for types during IR creation
- This logic also tries to pick a good parent for any type instructions, so that types don't get created local to a function unless they really need to
- Create all constants at global scope, and re-use when values are identical
|
|
With this change, basic generation of IR works for a trivial shader, and there is some basic support for dumping the generated IR in an assembly-like format.
As with the other IR change, the use of the IR is statically disabled for now, so that existing users won't be affected.
|
|
Right now none of this is hooked up, but I want to get things checked in incrementally rather than have along long-lived branches.
- Added placeholder declarations for IR representation of instructions, basic blocks, etc.
- Start adding a `lower-to-ir` pass to translate from AST representation to IR
Again: none of this is functional, so it shouldn't mess with existing users of the compiler.
|