| Age | Commit message (Collapse) | Author |
|
The basic idea here is that when lowering to the IR, the front-end will attach a "name hint" to the IR instruction(s) that represent a given declaration, and then the passes that work on the IR will try to preserve and propagate those names, and then finally the emit logic will use them in place of mangled or unique names when available.
This change does *not* try to deal with the issues that arise when we try to use those variable names in the output without any modification (e.g., handling cases where they might clash with keywords or builtins in the target language). Instead, it tries to establish baseline behavior for propagating through names, so that a later change can concentrate on the issue of using those names exactly when it is legal to do so.
In order to avoid issues around the name "hints" causing problems we take two main steps:
1. We "scrub" each name to reduce it down to the allowed set of identifier characters in C-like languages, and then ensure that it doesn't do things that would be illegal in some downstream languages (e.g., consecutive underscores are not allowed in GLSL) or could clash with Slang's mangled names. This process isn't guaranteed to give distinct results for distinct inputs (it isn't a mangling scheme, after all).
2. We generate a unique ID for each occurence of a given name and always use that as a suffix. This means that even if a name happens to overlap with a keyword (if you somehow have a variable named `do`), we will still add a suffix that makes it not a problem (we'd output `do_0` which is fine).
The logic for generating these names is mostly straightforward. For simple variables, we use their given name directly, while for other declarations we try to form a name that includes their parent declaration (e.g. `SomeType.someMethod`).
Various IR passes need to propagate or preserve this information. The most interesting is type legalization, when we take a variable with an aggregate type and split some of the fields out into their own variables. In that case we generate "dotted" names like `someVar.someTexture` and rely on the emit logic to turn that into `someVar_someTexture`.
During SSA generation, if we are promoting a variable to SSA temporaries, we will try to propagate the name of the variable over to the temporaries (unless they already have a name from some other place). The same applies to block parameters ("phi nodes").
Many of the test changes need their expected output to be updated for this change. Luckily in most cases the output has gotten easier to understand.
|
|
This was a known issue in our IR representation, which was now biting a user. The basic problem is that in code like the following:
```hlsl
RWStructureBuffer<float4> buffer;
...
buffer[index].xz = value;
```
we ideally want to be able to reproduce the original HLSL code exactly, but that requires directly encoding the way that this code writes to two elements of a vector, but not the others.
The currently lowering strategy we had produced IR something like:
```hlsl
float4 tmp = buffer[index];
tmp.xz = value;
buffer[index] = tmp;
```
That transformation might seem valid, but it has some big problems:
* It generates UAV reads that are not needed, which could impact performance
* It performs read-modify-write operations on memory that the programmer didn't explicitly write, which could create data races
The fix here is somewhat obvious: if the "base" of a swizzle operation on a left-hand side resolves to a pointer in our IR, then we can output a "swizzled store" instead of the read-modify-write dance. We currently keep the read-modify-write around since it is potentially needed as a fallback in the general case.
Along the way I also tried to make sure that we handle the case where we have a swizzle of a swizzle on the left-hand side:
```hlsl
buffer[index].xz.y = value;
```
That code should behave the same as `buffer[index].z = value`. I am currently detecting and cleaning up this logic in the lowering path for `SwizzleExpr`, because that is the only place in the lowering logic that "swizzled l-values" currently get created.
|
|
Fixes #527
There were a few problem cases for the IR emit logic. The most obvious, which came up in #527 is that a function body with multiple `return` statements would generate invalid code:
```hlsl
int foo()
{
return 1;
int x = 2;
return x;
}
```
In that case the IR for `foo` would have a single block that has two `return` instructions, which is invalid.
Another case that seems to be arising more often, but that had less obvious consequences was when one arm of an `if` statement ends in a `return`:
```hlsl
if(a)
{
return b;
}
else
{
int c = 0;
}
int d = 0;
```
In that case, the `return` instruction for `return b` would be followed by a branch to the end of the `if` (the `int d = 0;` line), because that would be the normal control flow without the early `return`.
The fix implemented here is to have the IR lowering logic be a bit more careful on two fronts:
1. When emitting a branch, check if the block we are emitting into has already been terminated, and if so just don't emit the branch (since we are logically at an unreachable point in the CFG.
2. Whenever we are about to emit code for a (non-empty) statement, ensure that the current block being build is unterminated. If the current block is terminated, then start a new one.
Case (2) will only matter when there is unreachable code (e.g., in the function `foo()`, the declaration of `x` and the second `return` can never be reached), so I added a warning in that case, and included a test case that triggers the new warning (with a function like `foo()` above).
|
|
* Improve messages when compilation is aborted.
Make sure to include the information from any `Slang::Exception` that was thrown, so that the poor user can at least point us at our own message string from an assertion failure.
This doesn't provide them line-number information in their code or the Slang codebase, so there is still work to be done in making the compiler more friendly about this stuff.
* When aborting compilation, try to note what source location we were working on
This is handled by having exception handlers on the stack at key bottleneck points in semantic checking and IR generation, which can then emit a diagnostic to note what we were working on when things failed.
This is not intended to be an indiciation to the user that their code is at fault for a compiler crash (it is always our fault), but might give them a chance to work around whatever bug is blocking them.
|
|
* There was a simple typo where we were emitting `RaytracingAccelerationStructureType` instead of `RaytracingAccelerationStructure`
* The IR lowering logic was failing to handle types with an `__intrinsic_type` modifier (which maps them to a single IR opcode) that weren't in one of the various special cases. I added a catch-all case to the handling of `DeclRefType`. This notably affected the `RayDesc` type.
* Even if we lower `RayDesc` to an intrinsic type, we still need to lower its *fields* too, and these were getting emitted with mangled names (as would happen for any user-defined fields). The solution I implemented was to allow for fields to have `__target_intrinsic` modifiers in the stdlib, to specify the un-mangled name they should use on each target.
I'm not 100% happy with this solution, because it seems odd to have `RayDesc` be an intrinsic type, but then to also have field keys used in `getField` instructions as if it were an ordinary `struct`. It seems like a better solution would be to have it lower to an IR `struct`, just with an appropriate modifier.
|
|
The basic problem was that the lowering logic was constructing (more or less) `Ptr<@GroupShared X>` instead of `@GroupShared Ptr<X>`.
There were also problems with passes not propagating through rates that should have been (e.g., legalization).
I've added a test case to actually validate `groupshared` support.
|
|
* Introduce an IR-level type system
Up to this point, the Slang IR has used the front-end type system to represent types in the IR.
As a result (but ultimately more importantly) the IR representation of generics and specialization has used AST-level concepts embedded in the IR.
For example, to express the specialization of `vector<T,N>` to a concrete type `float` for `T`, we needed an IR operation that could represent the specialization, with operands that somehow represented the type argument `float`.
The whole thing was very complicated.
The big idea of this change is to introduce a new representation in which types in the IR are just ordinary instructions, so that using them as operands makes sense. The hierarchy of IR types closely mirrors the AST-side hierarchy for now, and that will probably be something we should maintain going forward.
In order to make these changes work, though, I also had to do major overhauls of things like the way substitutions are performed, how we check interface conformances, the way lookup through interface types is done, etc. etc. This is a big change, and unfortunately any attempt to summarize it in the commit message wouldn't do it justice.
* Fix 64-bit build warning
* Fix up some clang warnings/errors
|
|
Fixes #471
|
|
Fixes #61
When lowering from AST to IR, if a call site doesn't supply an argument expression for each of the parameters to the callee, then use the default value expressions (stored as the "initializer" of the parameter decl) for each omitted parameter. This relies on the front-end to have already checked the call site for validity.
Along the way I also cleaned up some of the checking of parameter declarations so that it is more like the checking of ordinary variable declarations (although the code is not yet shared). I also cleaned out some dead cases in the lowering logic for when we don't actually have a declaration available for a callee (these would only matter if we supported functions as first-class values).
I added a simple test case to confirm that call sites both with and without the optional parameter work as expected.
The strategy in this change is extremely simplistic, and might only be appropriate for default parameter value expressions that are compile-time constants (which should be the 99% case). This may require a major overhaul if we decide to handle default parameter values differently (e.g., by generating extra functions to ensure that the separate compilation story is what we want).
Another issue that could change a lot of this logic would be if we start to support by-name parameters at call sites, since we could no longer assume that the argument and parameter lists align one-to-one (with the argument list possibly being shorter). Any work to add more flexible argument passing conventions would need to build a suitable structure to map from arguments to parameters, or vice-versa.
|
|
* Typo
* Add [shader(...)] and clean up some literal handling
* Add supporting for validating the `[shader(...)]` attribute, by checking that its argument is a string literal that names a known shader stage.
* Split the `ConstantExpr` class into distinct subclasses rooted at `LiteralExpr`, so we have `BoolLiteralExpr`, `IntegerLiteralExpr`, `FloatingPointLiteralExpr`, and `StringLiteralExpr`
* Add a `String` type to the stdlib, to be used as the type of a string literal.
This change allows code using `[shader(...)]` to be accepted by the front-end again, but it does nothing about emitting it in final HLSL.
* Allow entry points to be specified via [shader(...)]
Before this change, the compiler would track a list of `EntryPointRequest` objects, based on what the suer specified via API and/or command-line options. Each entry point request would get matched up with an AST `FuncDecl` as part of semantic checking, and then the back end steps (layout, codegen, etc.) would work from that information.
This change makes the compiler modal, in that it can *either* continue to use an explicit list of entry point requests (this is the mode when the list is non-empty), or it can rely on user-supplied attributes on entry point functions to drive codegen (this is the mode when the list is empty).
User-specified `[shader(...)]` attributes are processed at the same place where the association from `EntryPointRequest`s to `FuncDecl`s would otherwise be made, and basically does the same thing in the opposite direction: looks for `FuncDecl`s with the appropriate attribute and synthesizes an `EntryPointRequest` for them.
Subsequent processing should ideally not know where a given `EntryPointRequest` came from, and should handle both methods of specifying the entry points equivalently.
One design choice that might not make immediate sense is that we do *not* process a function as an entry point (applying further validation, etc.) just because it has a `[shader(...)]` modifier, unless we are in the appropriate mode (which in this case is the mode where the user didn't specify their own entry points via API or command line). This is to handle cases where the user wants to explicitly compile only one entry point, so that they (1) don't want us to spend time validating code they don't care about, (2) don't want do get output they don't expect, and (3) might actually be presenting us with code that violates the language rules due to a combination of `#define`s in effect (e.g., they might have a `[shader("vertex")]` function that transitively executes a `discard` because of how the preprocessor was configured, but they don't care because they are compiling a fragment entry point). This decision might be something we revisit over time.
As part of this work, I had to add some logic to pick a "profile version" to use for a combination of a target and stage (because when you specify `[shader("vertex")]` the compiler can't tell if you want `vs_5_0`, `vs_5_1`, etc.). This isn't really complete right now, because something like `-target dxbc` *also* doesn't determine a profile, so there is a bit of a kludge at present. We need to figure out a good long-term plan here, which might involve keeping target format, feature level/version, and pipeline stage as truly orthogonal concepts, rather than conflating them. That would involve more work in the API and command-line layers to de-compose things when the user specifies, e.g., `vs_5_1`, but might make downstream logic easier to manage.
* Emit [shader(...)] attribute on entry point for SM 6.1 and later
This should help ensure that the output from Slang can be compiled with dxc `lib_*` profiles.
* Fix warning
|
|
This commit contains two small bug fixes:
1. In `specializeProgramLayout`, we cannot assume the resourceInfo entries in a varLayout and its corresponding type layout has the same order. Should use `FindResourceLayout`.
2. When generating ir for a switch statement, make sure to remove the breakLabel from the shared context when done. For some reason if a switch statement is being lowered twice, the Dictionary::Add method will complain the statement key already exists.
|
|
The existing code parsed all of the square-bracket `[attributes]` into `HLSLUncheckedAttribute`, and then went on to hand-convert some of them to specialized subclasses of `HLSLAttribute`. When attributes didn't check, they were left as-is, and no error message was issued, because at the time the compiler was focused on accepting arbitrary input.
This change greatly overhauls the handling of `[attributes]`. Attributes are now declared in the stdlib, with declarations like:
```hlsl
__attributeTarget(LoopStmt)
attribute_syntax [unroll(count: int = 0)] : UnrollAttribute;
```
In this syntax, the `unroll` part is giving the attribute name (the `[]` are just for flavor, to make the declaration look like a use site; we could drop it if we don't like the clutter), the `count` is a parameter of the attribute, which we expect to be of type `int`, and which has a default value of `0` if unspecified.
The `: UnrollAttribute` part specifies the meta-level C++ class that will implement this attribute (and corresponds to a class in `modifier-defs.h`). This syntax is similar to our current `syntax` declarations. I'm starting to think we should change it to something like a `__meta_class(UnrollAttribute)` modifier, and then use that uniformly across all cases (e.g., also replacing the curreent `__magic_type(Foo)` syntax).
The `__attributeTarget(LoopStmt)` is a modifier that specifies the meta-level C++ class for syntax that this attribute is allowed to attach to. It is legal to have more than one of these.
Attributes continue to be parsed in an unchecked form, so that we don't tie up semantic analysis and parsing more than necessary. During checking, we look up the attribute name in the current scope, and then replace the unchecked attribute with a more specific one *if* the checking passes.
Checking proceeds in generic and attribute-specific phases. The generic phase includes checking the number of arguments against those specified in the attribute declaration (I don't currently check types, or handle default arguments), and then checking that at least one `__attributeTarget(...)` modifier applies to the syntax node being modified.
The attribute-specific phase then applies to the specialized C++ subclass of `Attribute`, and does the actual checking right now (e.g., that step is responsible for actually type-checking things at present). This can obviously be improved over time.
With this support I went ahead and added declarations for all the HLSL attributes I could find documented on MSDN. I also added a provisional declaration for the `[shader(...)]` attribute that has been added to dxc, but which is not yet documented.
One important detail here is that lookup of attribute names needs to be done carefully, so that we don't let, e.g., local variables shadow an attribute declaration:
```hlsl
int unroll = 5;
// This attribute should *not* get confused by the local variable `unroll`
[unroll] for(...) { .. }
```
The lookup logic already has a notion of a `LookupMask` that can be used to filter declarations out of the result. In this change I surfaced that mask through the main lookup API (rather than requiring a second pass to "refine" lookup results), and made is so that the default lookup mask does *not* include attributes, while an explicit mask can be used to look up *only* attributes.
(An alternatie design we discussed was to follow the approach of C# and have the declaration of an attribute like `[unroll]` actually be `unrollAttribute`, with a suffix. I decided not to follow that approach for now because it seemed like printing good error messages in that case could require us to carefully trim the `Attribute` suffix off of names at times, and using the existing mask behavior seemed simpler.)
To verify that the shadowing behavior is indeed correct, I modified the `loop-unroll.slang` test case.
Smaller notes:
* Removed the `HLSL` prefix from several of the C++ attribute classes
* Made sure to actually validate the modifiers on statements
* Special-cased checking for `ParamDecl` with a null type, because I'm re-using `ParamDecl` for attribute parameters, but can't give a concrete type to some of them right now
* Deleting some old, dead emit-from-AST logic around attributes, rather than try to "fix" code that doesn't run (a more complete scrub of that code is still needed)
* Fixed AST inheritance hierarchy so that a `Modifier` is a `SyntaxNode` rather than a `SyntaxNodeBase`. I have *no* idea why we have both of those, and we need to clean that up soon.
|
|
The main practical change here is that things that used to be `IRValue`s, like literals, are now being expressed as instructions in the global scope.
In order to validate that things are actually being handled correctly, this change introduces an explicit "validation" pass that can be run on the IR to check for different invariants (although it doesn't check many of the important ones right now). I've left the validation pass turned off by default, but with a command-line flag to enable it. We may want to make it be on by default in debug builds, just to keep us honest. The main invariant for the moment is that when on IR instruction is used as an operand to another, it had better come from the same IR module.
Some of the existing passes were violating this rule, in particular when it came to cloning of witness tables related to global generic parameter substitution. Those features can in theory be handled better now by allowing `specialize` instructions at other scopes, but I didn't want to over-complicate this change, so I make just enough fixes to ensure that these steps always clone witness tables they get from the "symbols" on an IR specialization context. In order for this to work when recursively specializing, I had to ensure that the logic for generic specialization had a notion of a "parent" specialization context that it would fall back to to perform cloning when necessary.
This change keeps the logic that was caching and re-using the instructions for literal values within a module, but adds some logic that isn't really being tested right now for picking the right parent instruction to insert a constant instruction into. This logic doesn't trigger right now because all of the cases we are using it on have zero operands (and so they always get "hoisted" to the global scope), but eventually for things like types we want to be able to support instructions with operands (e.g., `vector<float, 4>`) and handle the case where some of those operands come from different scopes (e.g., when nested inside a generic).
The final change here is mostly cosmetic: the `IRBuilder` is now more abstract about where insertion occurs: it tracks a single `IRParentInst` to insert into, and then an optional `IRInst` to insert before. In the common case, that parent is an `IRBlock`, but it could conceivably also be the global scope, or a witness table, etc. Use sites where we used to change those fields directly now use distinct methods `setInsertInto(parent)` and `setInsertBefore(inst)` which capture the two cases we care about. Accessors are also defined to extract the current block (if the current parent is a block), and the current "function" (global value with code, if the current parent is a global value with code, or a block inside one).
With this work in place, it should be possible for a follow-on change to start putting `specialize` instructions at the global scope and thus clean up some of the on-the-fly specialization work. This work should also help with some of the requirements around a distinct IR-level type system and more explicit generics.
|
|
* IR: "everything is an instruction"
This change tries to streamline the representation of the IR in the following ways:
* Every IR value is an instruction (there is no `IRValue` type any more)
* All IR values that can contain other values share a single base (`IRParentInstruction`)
* Dynamic casts to specific IR instruction types can be accomplished with a new `as<Type>(inst)` operation, that uses the IR opcode to implement casts.
The biggest change in terms of number of lines is getting rid of `IRValue`. The diff here could probably be smaller if I'd just done `typedef IRInst IRValue;`.
Along the way I also renamed the `getArg`/`getArgs`/`getArgCount` combination over to `getOperand`/`getOperands`/`getOperandCount` to avoid being confusing when we have something like a `call` instruction where the "arguments" of the call don't line up with the operands of the instruction.
I also tried to clean up the representation of lists of child instructions to try to make it easier to iterate over them with C++ range-based `for` loops. Developers still need to be careful about mutating the contents of a block while iterating over it in this fashion (e.g. if you remove the "current" element, the iteration will end prematurely).
Probably the thorniest change here is that parameters are now just represented as the first N instructions in a block, which means:
* We need to perform a linear search to find the end of the parameter list. This is probably not often a problem, because usually you would be iterating over the parameters anyway, and that will be linear in the number of parameters.
* Algorithms that iterate over a block either need to ignore parameters, treat parameters just like other instructions, or somehow cleave the list into the range of parameters, and the range of "ordinary" instructions (which involves the same linear search above).
* When inserting into a block, we need to be careful not to insert instructions at invalid locations (e.g., insert a temporary before the parameters, or insert a parameter in the middle of the code). I can't pretend that I've handled the details of that here. (This is no different than having to make the same adjustments for phi nodes in a typical SSA representation)
* One possible future-proof approach is to implement a pass that sorts the instructions in a block so that parameters always come first. That would let us implement passes without caring about this detail, and then clean up right before any pass that cares about the relative order of parameters and other instructions.
The current change is missing any work to make literals and other instructions that used to be `IRValue`s properly nest inside of their parent module. Right now these instructions are just left unparented, and may actually end up being shared between distinct modules. Fixing that will need a follow-up change. The biggest challenge there is that it introduces instructions at the global scope that aren't `IRGlobalValue`s.
This change doesn't try to take advantage of any of the new flexibility (e.g., by nesting `specialize` instructions inside of witness tables). The goal is to do exactly what we were doing before, just with a different representation.
* Warning fix
|
|
This allows us to get rid of `IRGlobalValue::dispose()`.
|
|
* Initial work on validating "constexpr"-ness in IR
The underlying issue here is that certain operations in the target shading languages constrain their operands to be compile-time constants. A notable example is the optional texel offset parameter to the `Texture2D.Sample` operation.
When calling these operations in GLSL, the user is required to pass a "constant expression," and any variables in that expression must therefore be marked with the `const` qualifier (and themselves be initialized with constant expressions). Any GLSL output we generate must of course respect these rules.
When calling these operations in HLSL, the user is not so constrained. Instead, they can pass an arbitrary expression, which may involve ordinary variables with no particular markup, and then the compiler is responsible for determining if the actual value after simplification works out to be a constant. In some cases, the requirement that a value be constant might actually trigger things like loop unrolling. Also, it is okay to use a function parameter to determine such a constant expression, as long as the argument turns out to be a constant at all call sites.
The way we have decided to tackle these challenges in Slang is that we we propagate a notion of `constexpr`-ness through the IR. This is currently being tackled in `ir-constexpr.cpp` with a combination of forward and backward iterative dataflow:
* When the operands to an instruction are all `constexpr`, and the opcode is one we believe can be constant-folded, then we infer that the instruction *can* be evaluated as `constexpr`
* When instruction is required to be `constexpr`, then we infer that all of its operands are also required to be `constexpr`.
If this process ever infers that a function parameter is required to be `constexpr`, then we might have to continue propagation at all the call sites to that function.
If after all the propagation is done, there are any cases where an instruction is *required* to be `constexpr`, but it *can't* be `constexpr` (we weren't able to infer `constexpr`-ness for its operands), then we issue an error.
This implementation encodes the idea of `constexpr`-ness in the IR as part of the type system, using a simplified notion of rates. This change adds a `RateQualifiedType` that can represent `@R T`, and then introduces a `ConstExprRate` that can be used for `R`. Many accessors for the type information on IR nodes were updated to distinguish when one wants the "full" type of an IR value (which might include rate information) vs. just the "data" type.
A `constexpr` qualifier was added in the front-end, and is being used to decorate the texel offset parameter for `Texture2D.Sample`. Lowering from AST to IR looks for this qalifier and infers when a function parameter must be typed as `@ConstExpr T` instead of just `T`.
There are lots of limitations and gotchas in the implementation so far:
* The `@ConstExpr` rate is the only one added in this change, but it seems clear that the conceptual `ThreadGroup` rate that was added to represent `groupshared` should probably get folded into the representation.
* I'm not 100% pleased with how many places in the IR I have to special-case for rate-qualified types. At the same type, pulling out rate as a distinct field on `IRValue` would probably require that we pay attention to rate everywhere.
* I've added a test case to show that we can issue errors when users fail to provide a constant expression for the texel offset, but the actual error message isn't great because it doesn't indicate *why* a constant expression was required. Realistically the "initial IR" should contain a few more decorations we can use to relate error conditions back to the original code (even if this is in a side-band structure).
* I've added a test case that is supposed to show that we can back-propagate `constexpr`-ness to local variables, and I've manually confirmed that it works for Vulkan/SPIR-V output, but the level of Vulkan support in `render_test` today means I can't enable the test for check-in.
* While I'm attempting to propagate `@ConstExpr` information from callees to callers, I haven't implemented any logic to specialize callee functions based on values at call sites.
* In a similar vein, there is no handling of control-flow dependence in the current code. If we infer that a phi (block parameter) needs to be `@ConstExpr`, then it isn't actually enough to require that the inputs to the phi (arguments from predecessor blocks) are all `@ConstExpr` because we also need any control-flow decisions that pick which incoming edge we take to be `@ConstExpr` as well.
* As a practical matter, implicit propagation of `@ConstExpr` from a function body to a function parameter should only be allowed for functions that are "local" to a module. Any function that might be accessed from outside of a module should really have had its `@ConstExpr` parameter marked manually, and our pass should validate that they follow their own rules. Right now we have no kind of visibility (`public` vs `private`) system, so I'm kind of ignoring this issue.
While that is a lot of gaps, this is also just enough code to get the Falcor MultiPassPostProcess example working, so I'm inclined to get it checked in.
* Fixup: missing expected output for test
* Fixup: disable test that relies on [unroll] for now
|
|
getSpecailizedMangledName to work properly.
|
|
* Basic IR support for `static const` globals
Our strategy for lowering global *variables* can fall back to putting their initialization into a function, but that isn't really appropriate for global constants (it also isn't appropriate for arrays, but we'll need to deal with that seaprately).
This change adds a distinct case for global constants (rather than treating them as variables), and forces the emission logic to always emit them as a single expression.
Doing this makes assumptions about how the IR for these constants gets emitted (and what optimziations might do to it).
In order to make things work, I had to switch the handling of initializer-list expressions to not be lowered via temporaries and mutation (since that isn't a good fit for reverting to a single expression).
I've added a single test case to ensure that this works in the simplest scenario. My next priority will be to see if this unblocks my work in Falcor.
* Fixup: bug fixes
|
|
* Generate SSA form for IR functions
The basic idea here is simple: in the front-end after we have lowered the AST to initial IR we will apply a set of "mandatory" optimization passes. The first of these is to attempt to translate the all functions into SSA form so that they are amenable to subsequent dataflow optimizations. Eventually, the mandatory optimization passes would include diagnostic passes that make sure variables aren't used when undefined, etc.
Just doing basic SSA generation already cleans up a lot of the messiness in our IR today, because constructs that used to involve many local variables can now be handled via SSA temporaries.
The implementation of SSA generation is in `ir-ssa.cpp`, and it follows the approach of Braun et al.'s "Simple and Efficient Construction of Static Single Assignment Form." I used this instead of the more well-known Cytron et al. algorithm because Braun's algorith mis very simple to code, and does not require auxiliary analyses to generate the dominance frontier.
The main wrinkle in our SSA representation right now is that instead of using ordinary phi nodes, we instead allow basic blocks to have parameters, where predecessor blocks pass in different parameter values. This encodes information equivalent to traditional phi nodes, but has two (small) benefits:
1. There is no fixed relationship between the order of phi operands and predecessor blocks, so we don't have to worry about breaking the phis when we alter the order in which predecessors are stored. This is important for us because predecessors are being stored implicitly.
2. It is easy to operationalize a "branch with arguments" either when lowering to other languages, or when interpreting the IR. A branch with arguments is implemented as a sequence of stores from the arguments to the parameters of the target block (very similar to a call), followed by a jump to the block.
Relevant to the above, this change also adds an interface for enumerating the predecessors or successors of a block in our CFG. Rather than use an auxliary structure, we directly use the information already encoded in the IR:
* The sucessors of a block are the target label operands of its terminator instruction. In our IR this is a contiguous range of `IRUse`s, possible with a stride (to account for the way `switch` interleaves values and blocks).
* The predecessors of a block are a subset of the uses of the block's value. Specifically, they are any uses that are on a terminator instruction, and within the range of values that represent the successor list of that instruction.
One important limitation of the "blocks with arguments" model for handling phis is that it is really only convenient to stash extra arguments on an unconditional terminator instruction. This change works around this prob lem by breaking any "critical edges" - edges between a block with multiple successors and one with multiple predecessors. We assume that "phi" nodes will only ever be needed on a block with multiple predecessors, and because critical edges are broken, each of these predecessors will then have only a single successor, so its branch instruction can handle the extra arguments.
This change introduces a notion of an "undefined" instruction in the IR. This is handled as an instruction rather than a value because I anticipate that we will want to distinguish different undefined values when it comes time to start issuing error messages (those messages will need to point to the variable that was used when undefined).
* Fix expected test output.
Another change was merged that enabled the `glsl-parameter-blocks` test, and its output is affected by our IR optimization work.
|
|
The standard library already has a bunch of these decorations, since they were added to support Slang->Vulkan codegen on the AST-to-AST path. This change makes the IR code generator able to exploit the modifiers so that we pick up a bunch of Vulkan support "for free" in the short term.
The basic change is in `lower-to-ir.cpp` where we copy over any `TargetIntrinsicModifier`s to become `IRTargetIntrinsicDecoration`s with the same information. We then need a bit of logic in `ir.cpp` to make sure we clone them as needed.
The core work of using the modifiers is in `emit.cpp`, where I basically just copy-pasted the existing logic that applied in the AST path (all the AST-related code there is dead, and we should clean it up soon).
The big change that comes with this logic is that when dealing with a member function, the numbering of the argument used in the intrinsic definition string changes, so that `$0` refers to the base object (whereas before the base object was looked up via the base expression of a `MemberExpr` used for the function). This requires a bunch of the definitions in the library to be updated; hopefully I caught them all.
For kicks, I've re-enabled a cross-compilation test just to confirm that we are generating valid SPIR-V for code that performs texture-fetch operations. I don't expect us to keep that test enabled as-is in the long term, though, because it would be much better to instead use render-test to do the same thing. Alas, beefing up the Vulkan support in render-test is an outstanding work item, and I didn't want to pollute this change with more work along those lines.
|
|
The basic change is simple: remove support for all code generation paths other than the IR.
There is a lot of vestigial code left, but the main logic in `ast-legalize.*` is gone.
Doing this breaks a *lot* of tests, for various reasons:
- We can no longer guarantee exactly matching DXBC or SPIR-V output after things pass through out IR
- Many builtins don't have matching versions defined for GLSL output via IR (even when they had versions defined via the earlier approach that worked with the AST)
- A lot of code creates intermediate values of opaque types in the IR, which turn into opaque-type temporaries that aren't allowed (this breaks many GLSL tests, but also some HLSL)
I implemented some small fixes for issues that I could get working in the time I had, but most of the above are larger than made sense to fix in this commit.
For now I'm disabling the tests that cause problems, but we will need to make a concerted effort to get things working on this new substrate if we are going to make good on our goals.
|
|
- Don't drop specializations on a method when adding it to requirement dictionary
- Handle extension declarations under a generic when emitting to IR
|
|
1. allow spReflection_FindTypeByName to accept arbitrary type expression string
2. allow const int generic value to be used as expression value, and as array size
3. various bug fixes in witness table specialization / function cloning during specializeIRForEntryPoint to avoid creating duplicate global values, not copying the right definition of a function from the other module, not cloning witness tables that are required by specializeGenerics etc.
|
|
- support overloaded generic function. this involves adding a new expression type, `OverloadedExpr2` to hold the candidate expressions for the generic function decl being referenced.
- make BitNot a normal IROp instead of an IRPseudoOp
- make sure we clone the decorations of parameters when cloning ir functions
- propagate geometry shader entry point attributes (`[maxvertexcount]` and `[instance]`) through HLSL emit
- IR emit: handle geometry shader entry-point parameter decorations, such as 'triangle'.
- IR emit: treat geometry shader stream output typed ir value as `should fold into use`.
|
|
This commit is a bunch of quick hacks to get transitive interfaces to work. The idea is for each concrete type we create one giant witness table that contains entries for all the transitively reachable interface requirements, and then create one copy of that witness table for each interface it implements.
`DoLocalLookupImpl` now also looks up in inherited interface decles when looking up for a symbol in an interface decl.
When visiting `InheritanceDecl` in `lower-to-ir`, create copies of the giant witness table for each transitively inherited interface, so that these witness tables can be found later when the IR is specialized.
Re-enable the `copy all witness tables` hack in `specializeIRForEntryPoint` to ensure those transitive witness tables are copied over.
|
|
Also support the scenario that the extension declares conformance to interface I, and a method M in I is already supported by the base implementation.
|
|
fixes #362
|
|
This commit changes the type of `DeclRefBase::substitutions` from `RefPtr<Substitutions>` to `SubstitutionSet`, which is a new type defined as following:
```
struct SubstitutionSet
{
RefPtr<GenericSubstitution> genericSubstitutions;
RefPtr<ThisTypeSubstitution> thisTypeSubstitution;
RefPtr<GlobalGenericParamSubstitution> globalGenParamSubstitutions;
}
```
This change get rid of most helper functions to retreive the substitution of a certain type, as well as surgery operations to insert a `ThisTypeSubstitution` or `GlobalGenericTypeSubstittuion` at top or bottom of the substitution chain. It also simplies type comparison when certain type of substitution should not be considered as part of type definition.
|
|
|
|
fixes #325
This commit includes following changes:
1. Including a default DeclaredSubtypeWitness argument when creating a default GenericSubstitution for a DeclRefType, so that the witness argument can be successfully replaced with an actual witness table after specialization. (check,cpp)
2. Not emitting full mangled name for struct field members. Since the declref of the member access instruction do not include necessary generic substitutions for its parent generic parameters, so the mangled names of the declaration site and use site mismatches. Instead we just emit the original name for struct fields. (emit.cpp)
3. Allow IRWitnessTable to represent a generic witness table for generic structs. Adds necessary fields to IRWitnessTable for generic specialization. For now, the user field of the IRUse is not used and is nullptr. (ir-inst.h)
4. Make IRProxyVal use an IRUse instead of an IRValue*, so that an IRValue referenced by IRProxyVal (as a substitution argument) can be managed by the def-use chain for easy replacement. This is used for specializing witness tables. (ir.cpp, ir.h)
5. Add a `String dumpIRFunc(IRFunc*)` function for debugging.
6. Add name mangling for generic / specialized witness tables (mangle.cpp)
7. improved natvis file for inspecting witness tables.
8. Add specialization of witness tables:
1) `findWitnessTable` will simply return the specialize IRInst for a generic witness table.
2) make `cloneSubstitutionArg` call `cloneValue` to clone the argument instead of calling `context->maybeCloneValue`, so we can make use of the cloned value lookup machanism to directly return the specialized witness table (which is done when we process the `specialize` instruction on the generic witness table before process the decl ref).
3) bug fix: the argument in ir.cpp:3338 should be `newArg` instead of `arg`.
4) add `specializeWitnessTable` function to specailize a generic witness table. It clones the witness table, and recursively calls `getSpecailizedFunc` for the witness table entries.
5) make `specailizeGenerics` function also handle the case when an operand of the `specialize` instruction is a witness table. We will call `specializeWitnessTable` here and replace the `specialize` instruction with the specialized witness table. The replacement mechanism based on IR def-use chain works here because we have already make IRProxyVal a part of the def-use chain.
9. Add two more test cases for nested generics with constraints. (generic-list.slang and generic-struct-with-constraint.slang)
|
|
* IR: fixes for subscript accessors
Fixes #320
This is a bunch of fixes for handling of `__subscript` operations on builtin types (notably `RWStructuredBuffer` and `StructuredBuffer` at this point).
- Automatically add a `GetterDecl` to any subscript decalratio was declithout any accessors. This avoids hitting a null- dereference in the emit logic.
- Add a notion of a `RefAccessor` (declared with `ref`) as a peer to getters and setters. The idea is that a `ref` accessor returns a pointer to the element data, so that it can be used for both getting and setting values. This is closer to the behavior of `RWStructuredBuffer` element access in HLSL.
- Fixes for dealing with "access chains" where there might be a combination of a subscript (where the is a `get` and `set` but no `ref`) and member access, so that we have to read the base value into a temp, modify it, and then write it back.
- This logic is still a bit of a mess, so we will eventually want to take a more consistent pass over this to deal with how we "materialize" values for setters.
- Update `RWStructuredBuffer` to have a `ref` accessor, and then fix up the IR tests to handle the new opcode that I added for it.
- Note: I didn't handle this as an intrinsic simply because the `tests/ir/*` tests aren't really set up to handle builtins with ugly mangled names.
* Fixup: type error in VM for buffer element ref
I was using the result type of the op as the element type for computing the element address, but the result type is a pointer to the real element type.
This caused test failures on 64-bit platforms, where the stride of the buffer in the `ir/factorial` test needs to be 4.
The fix is to assume the result type is a pointer, and extract the pointed-to type out of that.
|
|
* Fix: try to improve float literal formatting
An earlier change switched to using the C++ stdlib to format floats, but I neglected to check that it would correctly print values that happen to be integral with a decimal place (e.g., print `1.0` instead of just `1`). That causes some obscure cases to fail (e.g., when you write `1.0.xxxx` in HLSL, and we print out `1.xxxx`).
I've tried to resolve the issue by using the "fixed" output format, but that may also create problems for extremely large or small values (which really need to use scientific notation). However, this at least puts us back where we were before...
* Add support for source locations to the IR
- Add a `sourceLocation` field to all `IRValue`s
- Add some logic to associate locations with operations while lowering (using a slightly "clever" RAII approach)
- Make sure that when cloning instructions, we also clone the location
- Make the emit logic use the existing support for source locations when emitting code from the IR
A nice cleanup to this work would be to have the source locations for IR values not be hyper-specific about both line and column; it is probably enough to just emit on the correct line.
|
|
* Support simple generics syntax.
This commit enables simpler generics syntax, e.g.
T test<T>(T arg) {}
or
struct Gen<T>{T x;};
* Support simple generics syntax.
This commit enables simpler generics syntax, e.g.
T test<T>(T arg) {}
or
struct Gen<T>{T x;};
* add expected test result for compute/generics-syntax.slang
|
|
* Make AST and IR share type legalization code
A previous change already made it so that the AST-to-AST lowering/legalization pass could work together with IR-based lowering of `import`ed code, but that change didn't take into account the case where a function written in the AST needed to call an IR function and pass in a type that required legalization.
Both the IR-based and AST-based passes had their own approaches to type legalization, that mostly agreed on the desired output, but they ended up creating their own representations for legalized types which would mean that for a function call the caller and callee might end up legalizing the parameter list to use different types.
This change tries to fix this issue (and adds a new test case that relies on the fix) by massively overhauling the AST-based legalization pass so that it uses the same type legalization code as the IR. The shared code has been moved out into `legalize-types.{h,cpp}`.
Notes:
- I eliminated the `FilteredTupleType` type, since it was starting to cause code duplication in a lot of places. Instead, type legalization just creates new `struct` types to represent the result of filtering.
- One big consequence of this is that the `LegalType::pair` case needs to remember for each field in the original type which field (if any) in the new `struct` type it maps to
- A big source of complexity (and probably bugs) in this code is trying to figure out how to parent these new `struct` definitions effectively. A good follow-on change would be something that outputs declarations on-demand during the AST emit logic (as we do for the IR), just to avoid some of this song and dance.
- The old AST type legalization had a notion of both a "tuple" type and a "varying tuple" type. The "tuple" case was quite complex, and combined behavior currently handled by `LegalType::pair` (for splitting into ordinary and special sides) and `LegalType::tuple` (for holding multiple distinct elements to represent the fields of an aggregate). The "varying tuple" case was closer to `LegalType::tuple`, so I tried to just re-use the existing logic for that too. The one place this potentially gets messy is in `reifyTuple()`.
- The messiest bit of handling the "varying tuple" concept (which is used for GLSL shader inputs/outputs since they have to be scalarized) is that when passing them as function arguments we need to reify the tuple back into a structured value. Because the `LegalExpr` hierarchy doesn't have type information, but constructing a value of the "original" type requires such information, things get a little messy.
- I did *not* try to deal with any of the logic related to handling system inputs/outputs for cross-compilation purposes. Of course, the long-term goal is that any actual cross-compilation is handled via the IR, but this change can't afford to break the AST-based path just yet. As a result, there is still quite a bit of complexity in the handling of assignment, to deal with cases where "fixups" are required.
* fixup: bad code in macro, not caught by Visual Studio compiler
* fixup: more stuff missed by VS compiler
* fixup: VS continutes to miss stuff in UNREACHABLE_RETURN
|
|
The basic idea here is that for each module that gets loaded via `import`, we should also generate the initial IR for the declarations in that module at the time it gets loaded.
Furthermore, when we generate initial IR for a module, we will only generate IR *declarations* (not *definitions*) for any functions/variables in modules it imports.
Later, when cloning IR to begin code generation for an entry point, we will effectively "link" all of the loadedm modules together, so that a given global value can get its definition from any of the IR modules present.
- Change the `loadedModulesList` and related data structures to hold a new `LoadedModule` type, instead of just the AST (and then have a `LoadedModule` own both the AST and the IR module)
- Share some logic between the `import` and `#import` cases, so that we always try to generate IR for modules we load.
- Make sure that IR generation always gets skipped if the command-line flags tell us not to use the IR.
- A few small fixups for cases that didn't arise in IR lowering so far, but come up when we try to actually generate IR for things like the stdlib.
There are some notable gaps in this work right now:
- The stdlib modules are exempted from this behavior; we always generate IR for stdlib functions in any user module that calls them. This is just a workaround for the fact that the stdlib modules don't show up in the list of imported modules right now.
- We don't currently have logic that does the "linking" step for global variables like we do for functions. We really need to look up the symbols with the same mangled name, and favor any one of them that has a definition (if there is one)
- Similarly, the handling of witness tables is incomplete. During initial IR generation, we should probably be generating empty witness tables for any conformances that were declared in other modules (but are being used locally in this module), and then the "linking" step should favor non-empty witness tables over empty ones.
Still, all the test cases pass with the code like this, and this seems like an important step in the right direction.
|
|
The big change here is that the ability to contain basic blocks with instructions in them has been hoisted from `IRFunc` into a new base type `IRGlobalValueWithCode` shared with `IRGlobalVar`.
The basic blocks of a global variable define initialization logic for it; they can be looked at like a function that returns the initial value.
Places in the IR that used to assume functions contain all the code need to be updated, but so far I only handled the cloning step.
The emit logic currently handles an initializer for a global variable by outputting its logic as a separate function, and then having the variable call that function to initialize itself. This should be cleaned up over time so that we generate an ordinary expression whenever possible.
I also made the emit logic correctly label any global variable without a layout (that is, any that don't represent a shader parameter) as `static` so that the downstream HLSL compiler sees them as variables rather than parameters.
|
|
* IR: add lowering for initializer list expressions
This is relatively straightforward in the easy cases, because the front-end will have already type-checked the elements of the initializer list, and attached an appropriate type to the overall expression.
Notes:
- We are assuming in this code that if the user provides a "flattened" initializer list when dealing with nested aggregates, then the front-end is responsible for grouping things up apprporiately (this is not actually implemented in the front-end today).
- I have only handled arrays and `struct` types here, so uses of initializer lists for anything else will fail.
- I have not tried to handle the common HLSL idiom of using `{0}` as a way to default-initialize things, even when their first field is not compatible with the expression `0`
- I have not implemented support for default-initializing fields/elements beyond those for which explicit initializers were provided. This can be addressed as a follow-on change.
This change is one clear place where the front-end lowering logic could potentially be made much cleaner using a "destination-driven" code generation strategy. For example, given the following code
```hlsl
struct A { int a0; a1; };
struct B { A b0; A b1; };
struct C { B c0; B c1; };
// ...
C c = { { { 0, 1 }, {2, 3}, }, /* ... */ };
```
Our current code generator will end up allocating local variables for 1 instance of `C`, two instances of `B`, and four instances of `C`, for over 3x the allocation that would be done by a good destination-driven code generator. Yes, later optimization passes should be able to clean up the waste, but avoiding the waste from the start should result in faster compiles and also easier debugging (since intermediate IR won't be as messy in general).
* Fixup: try to appease clang compiler
|
|
These were already being handled a little bit, by lowering an `out T` or `inout T` function parameter in the AST to a function parameter with type `T*` in the IR, and then emiting explicit loads/stores.
The HLSL emit logic, however, couldn't tell the difference between an `out` parameter, an `inout`, or a true pointer (if we ever needed to support them...).
The intention (not fully implemented) was that we'd use a hierarchy of types rooted at `PtrTypeBase`:
- `PtrTypeBase`
- `Ptr`: "real" pointers in the C/C++ sense
- `OutTypeBase`: pointers used to represent by-reference parameter passing
- `OutType`: IR level type for an `out` parameter
- `InOutType`: IR level type for an `inout` or `in out` parameter
Actually implementing this involved:
- Adding a bit more flexibility to the `Session::getPtrType` logic to allow for creating any of the concrete types above
- Making the `lower-to-ir` logic create the right type for function parameters (instead of just using `PtrType`)
- Making the HLSL emit logic check for the `OutType` and `InOutType` cases rather than just `PtrType`
- Changing a bunch of small places in the code so that they use `PtrTypeBase` instead of `PtrType` when they should handle any of the above cases, and also make a few places check for `OutTypeBase` instead of `PtrType` or `PtrTypeBase`, when they are really trying to capture by-reference parameters
- Add a test case that uses all of the different cases we care about (without these fixes, this test case generates errors from fxc because of variables being used before being initialized, becaues parameters get declared `out` that should be `inout`).
A minor point here is that we are playing a bit fast and loose right now because the IR does not actually enforce any type checks. From the standpoint of the front end, `Ptr<T>`, `Out<T>`, and `InOut<T>` are all unrelated types (each is just a `struct` declared in `core.meta.slang`), but this doesn't really matter because none of these are types our current users are explicitly using.
In the IR it makes perfect sense to allow `Out<T>` or `InOut<T>` as the operand of a `load` or `store` instruction (and ditto for `getFieldAddr`, etc.) - there instructions just apply to any `PtrTypeBase`. The place where this potentially gets tricky is whether an `Out<T>` can be used where a `Ptr<T>` is expected, or vice vers (e.g., can I just pass my local variable's pointer directly to an `Out<T>` function parameter?
I'm going to ignore these issues for now, since the code currently works for our test case.
|
|
* Add support for global generic parameters
(In-progress work)
This commit include:
1. Update Slang API to allow specification of generic type arguments in an `EntryPointRequest`
2. Add parsing of `__generic_param` construct, which becomes a GlobalGenericParamDecl, contains members of `GenericTypeConstraintDecl`.
3. Semantics checking will check whether the provided type arguments conform to the interfaces as defined by the generic parameter, and store SubtypeWitness values in the EntryPointRequest, which will be used by `specializeIRForEntryPoint` when generating final IR.
4. Add a new type of substitution - `GlobalGenericParamSubstitution` for subsittuting references to `__generic_param` decls or to its member `GenericTypeConsraintDecl` with the actual type argument or witness tables.
5. Update `IRSpecContext` to apply `GlobalGenericParamSubstitution` when specializing the IR for an EntryPointRequest.
6. Update `render-test` to take additional `type` inputs, which specifies the type arguments to substitute into the global `__generic_param` types.
This commit does not include ProgramLayout specialization.
* IR: pass through `[unroll]` attribute (#284)
The initial lowering was adding an `IRLoopControlDecoration` to the instruction at the head of a loop, but this was getting dropped when the IR gets cloned for a particular entry point.
The fix was simply to add a case for loop-control decorations to `cloneDecoration`.
* fix warnings
* IR: support `CompileTimeForStmt` (#286)
This statement type is a bit of a hack, to support loops that *must* be unrolled.
The AST-to-AST pass handles them by cloning the AST for the loop body N times, and it was easy enough to do the same thing for the IR: emit the instructions for the body N times.
The only thing that requires a bit of care is that now we might see the same variable declarations multiple times, so we need to play it safe and overwrite existing entries in our map from declarations to their IR values.
Of course a better answer long-term would be to do the actual unrolling in the IR. This is especially true because we might some day want to support compile-time/must-unroll loops in functions, where the loop counter comes in as a parameter (but must still be compile-time-constant at every call site).
* Add support for global generic parameters
(In-progress work)
This commit include:
1. Update Slang API to allow specification of generic type arguments in an `EntryPointRequest`
2. Add parsing of `__generic_param` construct, which becomes a GlobalGenericParamDecl, contains members of `GenericTypeConstraintDecl`.
3. Semantics checking will check whether the provided type arguments conform to the interfaces as defined by the generic parameter, and store SubtypeWitness values in the EntryPointRequest, which will be used by `specializeIRForEntryPoint` when generating final IR.
4. Add a new type of substitution - `GlobalGenericParamSubstitution` for subsittuting references to `__generic_param` decls or to its member `GenericTypeConsraintDecl` with the actual type argument or witness tables.
5. Update `IRSpecContext` to apply `GlobalGenericParamSubstitution` when specializing the IR for an EntryPointRequest.
6. Update `render-test` to take additional `type` inputs, which specifies the type arguments to substitute into the global `__generic_param` types.
progress on parameter binding
* Add a more contrived test case for specializing parameter bindings
* update render-test to align buffers to 256 bytes (to get rid of D3D complains on minimal buffer size).
* adding one more test case for parameter binding specialization.
* Cleanup according to @tfoleyNV 's suggestions.
* fix a bug introduced in the cleanup
|
|
This statement type is a bit of a hack, to support loops that *must* be unrolled.
The AST-to-AST pass handles them by cloning the AST for the loop body N times, and it was easy enough to do the same thing for the IR: emit the instructions for the body N times.
The only thing that requires a bit of care is that now we might see the same variable declarations multiple times, so we need to play it safe and overwrite existing entries in our map from declarations to their IR values.
Of course a better answer long-term would be to do the actual unrolling in the IR. This is especially true because we might some day want to support compile-time/must-unroll loops in functions, where the loop counter comes in as a parameter (but must still be compile-time-constant at every call site).
|
|
- Change function mangling so we use `p<parameterCount>p` instead of just `p<parameterCount>` to avoid the parameter count running into digits at the start of a mangled type name and tripping up the un-mangling logic.
- We really need to step back at some point and define our mangling scheme a bit more carefully, especially if we are going to keep going down this road where un-mangling things is important for generating HLSL output.
- Also allow the unmangling logic to unmangle a few more cases of generic parameters, so that it can skip over them to get to the parameter count of the underlying function.
- Add a notion of an `unreachable` instruction to the IR, and emit it as the terminator (if needed) at the end of the last block for a function with a non-void return type.
- This does *not* implement any logic to emit a diagnostic if the `unreachable` turns out to be potentially reachable
- Fix a bug in IR specialization of generics where we can't create two different specializations of the same function, because both get registered in the same hash map
With all these fixes, testing in Falcor modified to use the full Slang compiler and IR for all HLSL/Slang:
- The UI and text rendering shaders yield HLSL that compiles without error; no idea if they actually *work*
- The ModelViewer shaders yield HLSL, but there are some issues (looks like type legalization isn't applying to stuff inside constant buffers)
|
|
* IR: add support for `switch` statements
Fixes #273
This is just something we hadn't gotten to yet on the IR. The actual design of the instruction is unsurprising (once you take into consideration the requirement for structured control flow). A `switch` instruction takes the form:
switch <condition> <breakLabel> <defaultLabel> [<caseVal> <caseLabel>]*
Where `condition` is the value to switch on, `breakLabel` is the "join point" after the original `switch` statement, `defaultLabel` is where to go if the value doesn't match any case, and each pair of `caseVal` and `caseLabel` is what to do on a particular value. It is required that `caseVal` be a literal, but this isn't currently being enforced in the IR (the front-end should be making a check and constant-folding the case labels).
For structured control flow, we also make the assumption that the cases are in order: cases with the same label must be grouped together, and any case that falls through to another must come right before it.
Given this representation, the emit logic can reconstruct a `switch` statement with relative ease, given the machinery we already have. It makes sure to group together case values with the same label (again, assuming they are contiguous), and will insert the `default:` label in with whatever group it belongs to.
Actually emitting code for a `switch` statement seems superficially simple, until you realize that a complete implementation needs to handle stuff like "Duff's Device." The current implementation makes the assumption that all `case` and `default` statements are directly nested under a `switch`, and that there is no way for control flow to enter a case except by the `switch` itself, or fall-through.
In order to facilitate the grouping of cases in the IR-to-HLSL emit logic, the AST-to-IR lowering logic tries to detect cases where there are multiple `case`s in a row, and emit only a single label for them.
One big/annoying gotcha is that we don't properly handle the case where a `default:` case has a non-trivial fall-throguh to another case. That seems fine for now since HLSL doesn't support fall-through anyway, but it probably needs to get detected somewhere in the Slang compiler (e.g., maybe we should add a diagnostic pass over the IR that detects target-specific problems like that and emits errors).
* IR: Add support for empty statements.
- Add empty statement in `lower-to-ir.cpp`
- Go ahead and eliminate the statement catch-all and explicitly enumerate the cases we don't support
- Fix up parser for block statements so that it doesn't leave a null statement as the body of a `{}`
- Add an empty statement to one of the cases for the `switch` test, to ensure we are testing empty statements
|
|
* IR: Add support for break and continue statements
The front-end is already doing the work of connecting this statements to their "parent" statement, so we just needed to build a map from the `Stmt*` to the corresponding `IRBlock*`s to use for break/continue when outputting any loop statement, and then look up in the map for the branch target when outputting a break/continue.
When we get around to adding `switch` statements, the same pattern should work just fine.
I also added support for `do/while` statements in IR codegen, and made sure to exercise those in one of the test cases I added.
There is also an unrelated IR codegen fix for when there is a "bound subscript" on the RHS of an assignment.
* IR: fix handling of do/while and continue
Thanks to @csyonghe for pointing out my mistake in the earlier commit.
I implemented `continue` for `do/while` loops incorrectly, branching to the head of the loop instead of the loop test.
I'll try to blame this mistake on the fact that I never use `do/while` loops because I think they are awful. :)
The fix for that issue wasn't too bad (see `lower-to-ir.cpp`) but it surfaces a much more serious issue: I wasn't actually implementing `continue` correctly *at all* when it comes to generating HLSL/GLSL from the IR (I can't easily make an excuse for that one).
The basic issue at the heart of this is that given an input statement like:
```
for(int ii = 0; ii < N; ii = doSomething(ii))
{ ... }
```
The continue clause (`ii = doSomething(ii)`) could expand into many instructions (across multiple blocks, if we inline), and there is in general no guarantee when we are done that we can package up that code as an expression and spit out a new `for` loop (the same basic argument applies to a `do { ... } while(someComplexExpression())`.
So, if we assume that in general we have to generate a full *statement* for the `continue` clause, what can we emit?
- We could try to "outline" the continue code into its own function, so that we can call it from an expression. That could work, but has high implementation complexity.
- We could introduce additional `bool` variables for control flow, outputting something like:
```
bool useContinueBlock = false;
for(;;) {
if(useContinueBlock) { <CONTINUE CODE>; }
useContinueBlock = true;
<LOOP TEST>
<LOOP BODY>
}
```
This works but user might balk at the extra variable we introduce.
- We could duplicate the code at each continue site. That is, we emit the loop as:
```
for(;;) {
<LOOP TEST>
<LOOP BODY>
<CONTINUE CODE>
}
```
but then whenever we'd like to emit `continue;` we instead emit `{ <CONTINUE CODE>; continue; }`.
This doesn't introduce any extra variables, but it causes code duplication (limited, if we don't have too many `continue` sites, and the continue clause is small - which are the common cases).
When I was initially working on the IR codegen I picked that last option just because it is what `fxc` seems to do, but I neglected to actually *implement* the special-case codegen for a `continue` instruction. This change addresses that (see `emit.cpp`).
Finally, once things were fixed the `continue` test case produced the results Yong told me to expect, but it also produced a warning from the downstream HLSL compiler ("hey, your loop doesn't ever actually *loop*!"), so I reworked the test back to one that actually loops (but still tests `continue`).
As a final aside in this essay of a commit message: the current IR representation of control flow uses special-case instructions for various cases of unconditional branch (and two variations on `if`), but these are not strictly necessary, and a future change will hopefully clean it up. The biggest catch in doing that is that it will require the IR->source codegen to carefully track which blocks represent which kinds of branch targets in context (e.g., you can't assume that a `continue` that nees the special handling above will appear as a distinct kind of instruction).
|
|
- Add definition of `discard` instruction
- A `discard` is a terminator instruction, just like `returnVoid`
- Lower `DiscardStmt` in AST to a `discard` instruction in the IR
- Emit `discard` instruction as a `discard;` statement when emitting HLSL/GLSL
- Add a test case using the "graphics compute" mode that tests discard. The test writes to one entry in a UAV before doing a conditional (always true at runtime) discard, and then writes to another entry; we expect to see the results of the first write, but not the second.
|
|
* improve diagnostic messages and prevent fatal errors from crashing the compiler.
* fix top level exception catching.
* spelling fix
* change wording of invalidSwizzleExpr diagnostic
* add speculative GenericsApp expr parsing
* add new test case of cascading generics call.
* Fixing bugs in compiling cascaded generic function calls.
Add implementation of DeclaredSubTypeWitness::SubstituteImpl()
This is not needed by the type checker, but needed by IR specialization. When input source contains cascading generic function call, the arguments to `specialize` instruction is currently represented as a substitution. The arg values of this subsittution can be a `DeclaredSubTypeWitness` when a generic function uses one of its generic parameter to specialize another generic function. When the top level generics function is being specialized, this substitution argument, which is a `DeclaredSubTypeWitness`, needs to be substituted with the witness that used to specialize the top level function in the specialized specialize instruction as well.
* add a test case for cascading generic function call.
* parser bug fix
* fixes #255
* add test case for issue #255
* Generate missing `specialize` instruction when calling a generic method from an interface constraint.
When calling a generic method via an interface, we should be generating the following ir:
...
f = lookup_interface_method(...)
f_s = specailize(f, declRef)
...
This commit fixes this `emitFuncRef` function to emit the needed `specialize` instruction.
* fixes #260
This fix follows the second apporach in the disucssion. It generated mangled name for specialized functions by appending new substitution type names to the original mangled name.
* Disabling removing and re-inserting specailized functions in getSpecalizeFunc()
I am not sure why it is needed, it seems HLSL and GLSL backends are generating forward declarations anyways, so the order of functions in IRModule shouldn't matter.
* cleanup and complete test cases.
* fix warnings
|
|
- During IR emit, treat a "select" expression (`?:` operator) like any other `InvokeExpr`, since it will have an `__intrinsic_op` modifier attached to turn it into a `select` instruction.
- During HLSL/GLSL emit from IR, turn a `select` instruction into a `?:` expression
- Also add support for the `neg` instruction during HLSL/GLSL emit
Note that right now we are assuming HLSL semantics for `?:` where it does not short-circuit. Correctly handling the GLSL case would require going back to special-case codegen for `SelectExpr`, but we can cross that bridge when we come to it.
|
|
|
|
|
|
|
|
|