slang.git - Making it easier to work with shaders

Age	Commit message (Collapse)	Author
2018-03-03	IR: next phase of "everything is an instruction" (#433)	Tim Foley
	The main practical change here is that things that used to be `IRValue`s, like literals, are now being expressed as instructions in the global scope. In order to validate that things are actually being handled correctly, this change introduces an explicit "validation" pass that can be run on the IR to check for different invariants (although it doesn't check many of the important ones right now). I've left the validation pass turned off by default, but with a command-line flag to enable it. We may want to make it be on by default in debug builds, just to keep us honest. The main invariant for the moment is that when on IR instruction is used as an operand to another, it had better come from the same IR module. Some of the existing passes were violating this rule, in particular when it came to cloning of witness tables related to global generic parameter substitution. Those features can in theory be handled better now by allowing `specialize` instructions at other scopes, but I didn't want to over-complicate this change, so I make just enough fixes to ensure that these steps always clone witness tables they get from the "symbols" on an IR specialization context. In order for this to work when recursively specializing, I had to ensure that the logic for generic specialization had a notion of a "parent" specialization context that it would fall back to to perform cloning when necessary. This change keeps the logic that was caching and re-using the instructions for literal values within a module, but adds some logic that isn't really being tested right now for picking the right parent instruction to insert a constant instruction into. This logic doesn't trigger right now because all of the cases we are using it on have zero operands (and so they always get "hoisted" to the global scope), but eventually for things like types we want to be able to support instructions with operands (e.g., `vector<float, 4>`) and handle the case where some of those operands come from different scopes (e.g., when nested inside a generic). The final change here is mostly cosmetic: the `IRBuilder` is now more abstract about where insertion occurs: it tracks a single `IRParentInst` to insert into, and then an optional `IRInst` to insert before. In the common case, that parent is an `IRBlock`, but it could conceivably also be the global scope, or a witness table, etc. Use sites where we used to change those fields directly now use distinct methods `setInsertInto(parent)` and `setInsertBefore(inst)` which capture the two cases we care about. Accessors are also defined to extract the current block (if the current parent is a block), and the current "function" (global value with code, if the current parent is a global value with code, or a block inside one). With this work in place, it should be possible for a follow-on change to start putting `specialize` instructions at the global scope and thus clean up some of the on-the-fly specialization work. This work should also help with some of the requirements around a distinct IR-level type system and more explicit generics.
2018-03-01	IR: "everything is an instruction" (#432)	Tim Foley
	* IR: "everything is an instruction" This change tries to streamline the representation of the IR in the following ways: * Every IR value is an instruction (there is no `IRValue` type any more) * All IR values that can contain other values share a single base (`IRParentInstruction`) * Dynamic casts to specific IR instruction types can be accomplished with a new `as<Type>(inst)` operation, that uses the IR opcode to implement casts. The biggest change in terms of number of lines is getting rid of `IRValue`. The diff here could probably be smaller if I'd just done `typedef IRInst IRValue;`. Along the way I also renamed the `getArg`/`getArgs`/`getArgCount` combination over to `getOperand`/`getOperands`/`getOperandCount` to avoid being confusing when we have something like a `call` instruction where the "arguments" of the call don't line up with the operands of the instruction. I also tried to clean up the representation of lists of child instructions to try to make it easier to iterate over them with C++ range-based `for` loops. Developers still need to be careful about mutating the contents of a block while iterating over it in this fashion (e.g. if you remove the "current" element, the iteration will end prematurely). Probably the thorniest change here is that parameters are now just represented as the first N instructions in a block, which means: * We need to perform a linear search to find the end of the parameter list. This is probably not often a problem, because usually you would be iterating over the parameters anyway, and that will be linear in the number of parameters. * Algorithms that iterate over a block either need to ignore parameters, treat parameters just like other instructions, or somehow cleave the list into the range of parameters, and the range of "ordinary" instructions (which involves the same linear search above). * When inserting into a block, we need to be careful not to insert instructions at invalid locations (e.g., insert a temporary before the parameters, or insert a parameter in the middle of the code). I can't pretend that I've handled the details of that here. (This is no different than having to make the same adjustments for phi nodes in a typical SSA representation) * One possible future-proof approach is to implement a pass that sorts the instructions in a block so that parameters always come first. That would let us implement passes without caring about this detail, and then clean up right before any pass that cares about the relative order of parameters and other instructions. The current change is missing any work to make literals and other instructions that used to be `IRValue`s properly nest inside of their parent module. Right now these instructions are just left unparented, and may actually end up being shared between distinct modules. Fixing that will need a follow-up change. The biggest challenge there is that it introduces instructions at the global scope that aren't `IRGlobalValue`s. This change doesn't try to take advantage of any of the new flexibility (e.g., by nesting `specialize` instructions inside of witness tables). The goal is to do exactly what we were doing before, just with a different representation. * Warning fix
2018-02-23	Initial support for cross-compilation of geometry shaders to GLSL (#423)	Tim Foley
	These changes are related to getting a first Slang geometry shader to translate to GLSL. There are some unrelated cross-compilation fixes in here as well. * Add direct support to shader parameter layout for GS output streams, so that they are reflected as a container type * Fix the declarations of the `SampleCmp` methods; they should always return `float`, independent of the nominal element type of the texture. * Fix up our handling of `__target_intrinsic` modifiers, so that we are a little bit more careful in how we detect something as being just a simple name replacement (e.g., `__target_intrinsic(glsl, "foo")` should make us output `foo(original, args, here)`) vs. a custom expression (e.g., `__target_intrinsic(glsl, "bar+1")` should output `bar+1` and not use any arguments, even without any `$` substitutions). * Don't emit the `[unroll]` modifier when outputting GLSL. Eventually we need to fully unroll loops for GLSL output anyway. * Inspect th entry point parameter list (from the layout information) when emitting a GS, so that we can write out the correct `layout` modifiers for input primitive type and output primitive topology. * Add a new case to `ScalarizedVal` to handle cases where an HLSL system value needs to map to a GLSL built-in variable with a slightly different type (e.g., `SV_RenderTargetArrayIndex` is a `uint` while `gl_Layer` is an `int`). For now this is only hanlding trivial cases (where a direct cast can achieve the result we want), but eventually it might need to handle things like conversion between arrays and vectors. * This is mostly just the infrastructure for the feature, and the actual enumeration of the correct types for all the system values is still to be done. * Handle a few more cases in assignment between `ScalarizedVal`. In particular, deal with cases where `materializeValue` is called on a tuple that has an array type, so that we need to construct the individual array elements. * Add translation for GS output stream `Append()` and `RestartStrip()` * Note that the translation of `Append()` seems to ignore its argument; this is because we desugar the operation during legalization for GLSL (see next item) * When legalizing for GLSL, detect an entry point parameter that is a GS stream, and translate it into `out` variables for its element type, and then rewrite any calls to `Append()` in the body of the entry point to be preceded by assignment to those variables. This works in tandem with the above translation of HLSL `Append()` calls into GLSL `EmitVertex()` calls. * We are detecting calls to `Append()` in a slightly hacky way, by looking at decorations on the callee to make sure that it is a function that is determined to translate to `EmitVertex()`. * Right now we aren't handling calls to `Append()` in other functions. It wouldn't be hard in principle to walk all the functions in the module and apply the translation (assuming we don't want to start supporting multiple output streams), but this wouldn't handle the passing of the GS output stream between functions. (This points out that there is a need for an additional type legalization pass that desugars away parameters of types that aren't actually meaningful on the target).
2018-02-22	Make `IRGlobalValue::mangledName` a `Name*`	Yong He
	This allows us to get rid of `IRGlobalValue::dispose()`.
2018-02-22	Initial work on validating "constexpr"-ness in IR (#420)	Tim Foley
	* Initial work on validating "constexpr"-ness in IR The underlying issue here is that certain operations in the target shading languages constrain their operands to be compile-time constants. A notable example is the optional texel offset parameter to the `Texture2D.Sample` operation. When calling these operations in GLSL, the user is required to pass a "constant expression," and any variables in that expression must therefore be marked with the `const` qualifier (and themselves be initialized with constant expressions). Any GLSL output we generate must of course respect these rules. When calling these operations in HLSL, the user is not so constrained. Instead, they can pass an arbitrary expression, which may involve ordinary variables with no particular markup, and then the compiler is responsible for determining if the actual value after simplification works out to be a constant. In some cases, the requirement that a value be constant might actually trigger things like loop unrolling. Also, it is okay to use a function parameter to determine such a constant expression, as long as the argument turns out to be a constant at all call sites. The way we have decided to tackle these challenges in Slang is that we we propagate a notion of `constexpr`-ness through the IR. This is currently being tackled in `ir-constexpr.cpp` with a combination of forward and backward iterative dataflow: * When the operands to an instruction are all `constexpr`, and the opcode is one we believe can be constant-folded, then we infer that the instruction can be evaluated as `constexpr` * When instruction is required to be `constexpr`, then we infer that all of its operands are also required to be `constexpr`. If this process ever infers that a function parameter is required to be `constexpr`, then we might have to continue propagation at all the call sites to that function. If after all the propagation is done, there are any cases where an instruction is required to be `constexpr`, but it can't be `constexpr` (we weren't able to infer `constexpr`-ness for its operands), then we issue an error. This implementation encodes the idea of `constexpr`-ness in the IR as part of the type system, using a simplified notion of rates. This change adds a `RateQualifiedType` that can represent `@R T`, and then introduces a `ConstExprRate` that can be used for `R`. Many accessors for the type information on IR nodes were updated to distinguish when one wants the "full" type of an IR value (which might include rate information) vs. just the "data" type. A `constexpr` qualifier was added in the front-end, and is being used to decorate the texel offset parameter for `Texture2D.Sample`. Lowering from AST to IR looks for this qalifier and infers when a function parameter must be typed as `@ConstExpr T` instead of just `T`. There are lots of limitations and gotchas in the implementation so far: * The `@ConstExpr` rate is the only one added in this change, but it seems clear that the conceptual `ThreadGroup` rate that was added to represent `groupshared` should probably get folded into the representation. * I'm not 100% pleased with how many places in the IR I have to special-case for rate-qualified types. At the same type, pulling out rate as a distinct field on `IRValue` would probably require that we pay attention to rate everywhere. * I've added a test case to show that we can issue errors when users fail to provide a constant expression for the texel offset, but the actual error message isn't great because it doesn't indicate why a constant expression was required. Realistically the "initial IR" should contain a few more decorations we can use to relate error conditions back to the original code (even if this is in a side-band structure). * I've added a test case that is supposed to show that we can back-propagate `constexpr`-ness to local variables, and I've manually confirmed that it works for Vulkan/SPIR-V output, but the level of Vulkan support in `render_test` today means I can't enable the test for check-in. * While I'm attempting to propagate `@ConstExpr` information from callees to callers, I haven't implemented any logic to specialize callee functions based on values at call sites. * In a similar vein, there is no handling of control-flow dependence in the current code. If we infer that a phi (block parameter) needs to be `@ConstExpr`, then it isn't actually enough to require that the inputs to the phi (arguments from predecessor blocks) are all `@ConstExpr` because we also need any control-flow decisions that pick which incoming edge we take to be `@ConstExpr` as well. * As a practical matter, implicit propagation of `@ConstExpr` from a function body to a function parameter should only be allowed for functions that are "local" to a module. Any function that might be accessed from outside of a module should really have had its `@ConstExpr` parameter marked manually, and our pass should validate that they follow their own rules. Right now we have no kind of visibility (`public` vs `private`) system, so I'm kind of ignoring this issue. While that is a lot of gaps, this is also just enough code to get the Falcor MultiPassPostProcess example working, so I'm inclined to get it checked in. * Fixup: missing expected output for test * Fixup: disable test that relies on [unroll] for now
2018-02-19	more to fixing memory leaks	Yong He
	1. reorder destruction order of several key classes to avoid using deleted IR objects when destroying Types 2. remove Session::canonicalTypes and make each Type own a RefPtr to the canonicalType, to allow types to be destroyed along with each IRModule it belongs to.
2018-02-19	Fix IR memory leaks.	Yong He
	1, make IRModule class own a memory pool for all IR object allocations 2. For now, we allow IR objects to own other (externally) heap allocated objects, such as String, List and RefPtrs by tracking all IR objects that has been allocated for the IRModule in a list named `IRModule::irObjectsToFree`. and call destructor for all these objects upon the destruction of the IRModule. In the long term, we should eliminate the use of all these externally allocated types in IR system and get rid of this tracking and explicit destructor calls. 3. remove non-generic `createValueImpl` functions and retain only generic versions in IRBulider so we can properly call the constructor of the IR types to set up virtual tables correctly for destructor dispatching. 4. add `MemoryPool` class for allocation of the IR objects. 5. Make sure we are disposing IRSpecContexts when we are done with the specialized IR module. 6. Add `_CrtDumpMemoryLeaks()` calls to check memory leaks upon destruction of a Slang session. If we are to support multiple sessions at a time, this call should probably be replaced with the more advanced MemoryState versions of the memory leak checker.
2018-02-16	Implement IR-level translation of system values for GLSL (#413)	Tim Foley
	The approach here isn't ideal. We already have a pass that transforms HLSL varying input/output types into GLSL global variables. This change makes it so that when those inputs/outputs have system-value semantics, we generate a global variable declaration with the appropriate `gl_` name (leaving the type the same for now). Later, when emitting code, we just skip emitting declarations for declarations with mangled names that start with `gl_`. A more complete implementation will be needed later on, which handles cases where the translation requires types to be changed (so that conversion code needs to be inserted).
2018-02-16	IR/Vulkan fixes (#412)	Tim Foley
	* Fix bugs around IR legalization of GLSL input/output - Add case to handle assignment of one `ScalarizedVal::Flavor::address` to another (still need to make sure we are handling all the possible cases there) - Revamp logic for creating global variable declarations for varying inputs/outputs. - Actually handle creating array declarations (not sure if binding locations will be correct) - Properly deal with offsetting of locations for nested fields - Only create varying input/output layout information as needed for the separate `in` and `out` variables we create to represent a single HLSL `inout` varying * During SSA generation, recursively remove trivial phis This is actually written up in the original paper I used as a reference, but I hadn't implemented the case yet. When you eliminate one phi as trivial (because its only operands were itself and at most one other value), you might find that another phi becomes trivial (because it had this phi as an operand, but now it will have the other value...). The one thing that made any of this tricky is that our "phi" nodes are really block parameters, and thus they don't technically have operands (`IRUse`s). The `IRUse`s for each phi were being tracked in a separate array, and had their `user` field set to null. With this change, I set their `user` to be the corresponding `IRParam` for the phi (and that means I changed `IRParam` to inherit from `IRUser` even though it shouldn't really be required). * Re-build SSA form after specialization/legalization The main reason to do this is that legalization might scalarize types, and thus might allow us to clean up resource-type local variables that we were not able to clean up when they were part of an aggregate. Note: we shouldn't really need to do this, because the front-end should actually be guaranteeing that types that include resources are used in "safe" ways, but we currently don't have the analyses required to support that. * Give an error message if we get GLSL input The API and command-line interface still recognize and nominally support GLSL input files, because they need to be supported in the "pass-through" mode. This change just adds an error message if we encounter a GLSL input file in anything other than "pass-through" mode.
2018-02-13	Fix a bug in IR use-def information (#406)	Tim Foley
	The basic problem here is that when unlinking an `IRUse` from the linked list of uses, there were several cases where I was failing to set the `prevLink` field of the next node to match the `prevLink` field of the node being removed. That doesn't show up when walking the linked list of uses forward, but it breaks it whenever you have subsequent unlinking operations. This change fixes the bugs of that kind I could find, and also adds a debug validation method to try to avoid breaking it again. I also made more access to `IRUse` go through accessor methods rather than using fields directly, to try to avoid this kind of error. I stopped short of making anything `private`, because I tend to find that it creates more hassles than it avoids. A few other fixes along the way: - Made the `List<T>` type default-initialize elements when you resize it. I hadn't realized we weren't doing that. - Add a standalone `dumpIR(IRGlobalValue*)` so help when debugging issues.
2018-02-08	Basic IR support for `static const` globals (#404)	Tim Foley
	* Basic IR support for `static const` globals Our strategy for lowering global variables can fall back to putting their initialization into a function, but that isn't really appropriate for global constants (it also isn't appropriate for arrays, but we'll need to deal with that seaprately). This change adds a distinct case for global constants (rather than treating them as variables), and forces the emission logic to always emit them as a single expression. Doing this makes assumptions about how the IR for these constants gets emitted (and what optimziations might do to it). In order to make things work, I had to switch the handling of initializer-list expressions to not be lowered via temporaries and mutation (since that isn't a good fit for reverting to a single expression). I've added a single test case to ensure that this works in the simplest scenario. My next priority will be to see if this unblocks my work in Falcor. * Fixup: bug fixes
2018-02-08	Falcor fixes (#402)	Tim Foley
	* Re-define deprecated compile flags By including these flags in the header file, with a value of zero, we can allow some existing code to compile even after the major changes to the implementation. * The `SLANG_COMPILE_FLAG_NO_CHECKING` option will effectively be ignored, since checking is always enabled. * The `SLANG_COMPILE_FLAG_SPLIT_MIXED_TYPES` option will now act as if it is always enabled (and indeed some of the code has been relying on this flag being set always). * Make subscript operators writable for writable textures This even had a `TODO` comment saying that we needed to fix it, and now I'm seeing semantic checking failures because we didn't define these and so we find assignment to non l-values. * Fix definitions of any() and all() intrinsics These should always return a scalar `bool` value, but they were being defined wrong in two ways: 1. They were using their generic type parameter `T` in the return type 2. They were returning a vector in the vector case, and a matrix in the matrix case. This change just alters the return type to be `bool` in all cases. * Fix bug in SSA construction When eliminating a trivial phi node, it is possible that the phi is still recorded as the "latest" value for a local variable in its block. When later code queries that value from the block (which can happen whenever another block looks up a variable in its predecessors), it would get the old phi and not the replacement value. I simply added a loop that checks if the value we look up is a phi that got replaced, and then continues with the replacement value (which might itself be a phi...). A more advanced solution might try to get clever and have the map itself hold `IRUse` values so that we can replace them seamlessly. * Simplify IR control flow representation This change gets rid of various special-case operations for conditional and unconditional branches, and instead requires emit logic to recognize when a direct branch is targetting a `break` or `continue` label. The new approach here isn't perfect, but it seems beter than what we had before, because it can actually work in the presence of control-flow optimizations (including our current critical-edge-splitting step). * Load from groupshared isn't groupshared When loading from a `groupshared` variable, the resulting temporary shouldn't have the `groupshared` qualifier on it. This might eventually need to generalize to a better understanding of storage modifiers in the IR, but I don't really want to deal with that right now. * Don't emit references to typedefs in output code Now that we are using the IR for all codegen, we shouldn't be dealing with surface-level things like `typedef` declarations in the output code; just use the type that was being referred to in the first place. * Fix floating-point literal printing for IR The IR was calling `emit()` instead of `Emit()` (we really need to normalize our convention here), and was implicitly invoking a default constructor on `String` that takes a `double` (that constructor should really be marked `explicit`), and which doesn't meet our requirements for printing floating-point values. * Fix error when importing module that doesn't parse We already added a case to bail out if semantic checking fails, but neglected to add a case if there is an error during parsing of a module to be imported. Note: this logic doesn't correctly register the module as being loaded (but still in error), so users could see multiple error messages if there are multiple `import`s for the same module. * Improve error message for overload resolution failure - Drop debugging info from the candidate printing - Add cases to print `double` and `half` types properly * Fixup: switch loopTest to ifElse in expected IR output
2018-02-07	Generate SSA form for IR functions (#400)	Tim Foley
	* Generate SSA form for IR functions The basic idea here is simple: in the front-end after we have lowered the AST to initial IR we will apply a set of "mandatory" optimization passes. The first of these is to attempt to translate the all functions into SSA form so that they are amenable to subsequent dataflow optimizations. Eventually, the mandatory optimization passes would include diagnostic passes that make sure variables aren't used when undefined, etc. Just doing basic SSA generation already cleans up a lot of the messiness in our IR today, because constructs that used to involve many local variables can now be handled via SSA temporaries. The implementation of SSA generation is in `ir-ssa.cpp`, and it follows the approach of Braun et al.'s "Simple and Efficient Construction of Static Single Assignment Form." I used this instead of the more well-known Cytron et al. algorithm because Braun's algorith mis very simple to code, and does not require auxiliary analyses to generate the dominance frontier. The main wrinkle in our SSA representation right now is that instead of using ordinary phi nodes, we instead allow basic blocks to have parameters, where predecessor blocks pass in different parameter values. This encodes information equivalent to traditional phi nodes, but has two (small) benefits: 1. There is no fixed relationship between the order of phi operands and predecessor blocks, so we don't have to worry about breaking the phis when we alter the order in which predecessors are stored. This is important for us because predecessors are being stored implicitly. 2. It is easy to operationalize a "branch with arguments" either when lowering to other languages, or when interpreting the IR. A branch with arguments is implemented as a sequence of stores from the arguments to the parameters of the target block (very similar to a call), followed by a jump to the block. Relevant to the above, this change also adds an interface for enumerating the predecessors or successors of a block in our CFG. Rather than use an auxliary structure, we directly use the information already encoded in the IR: * The sucessors of a block are the target label operands of its terminator instruction. In our IR this is a contiguous range of `IRUse`s, possible with a stride (to account for the way `switch` interleaves values and blocks). * The predecessors of a block are a subset of the uses of the block's value. Specifically, they are any uses that are on a terminator instruction, and within the range of values that represent the successor list of that instruction. One important limitation of the "blocks with arguments" model for handling phis is that it is really only convenient to stash extra arguments on an unconditional terminator instruction. This change works around this prob lem by breaking any "critical edges" - edges between a block with multiple successors and one with multiple predecessors. We assume that "phi" nodes will only ever be needed on a block with multiple predecessors, and because critical edges are broken, each of these predecessors will then have only a single successor, so its branch instruction can handle the extra arguments. This change introduces a notion of an "undefined" instruction in the IR. This is handled as an instruction rather than a value because I anticipate that we will want to distinguish different undefined values when it comes time to start issuing error messages (those messages will need to point to the variable that was used when undefined). * Fix expected test output. Another change was merged that enabled the `glsl-parameter-blocks` test, and its output is affected by our IR optimization work.
2018-02-07	Support __target_intrinsic modifiers in IR codegen (#401)	Tim Foley
	The standard library already has a bunch of these decorations, since they were added to support Slang->Vulkan codegen on the AST-to-AST path. This change makes the IR code generator able to exploit the modifiers so that we pick up a bunch of Vulkan support "for free" in the short term. The basic change is in `lower-to-ir.cpp` where we copy over any `TargetIntrinsicModifier`s to become `IRTargetIntrinsicDecoration`s with the same information. We then need a bit of logic in `ir.cpp` to make sure we clone them as needed. The core work of using the modifiers is in `emit.cpp`, where I basically just copy-pasted the existing logic that applied in the AST path (all the AST-related code there is dead, and we should clean it up soon). The big change that comes with this logic is that when dealing with a member function, the numbering of the argument used in the intrinsic definition string changes, so that `$0` refers to the base object (whereas before the base object was looked up via the base expression of a `MemberExpr` used for the function). This requires a bunch of the definitions in the library to be updated; hopefully I caught them all. For kicks, I've re-enabled a cross-compilation test just to confirm that we are generating valid SPIR-V for code that performs texture-fetch operations. I don't expect us to keep that test enabled as-is in the long term, though, because it would be much better to instead use render-test to do the same thing. Alas, beefing up the Vulkan support in render-test is an outstanding work item, and I didn't want to pollute this change with more work along those lines.
2018-01-21	A hacky fix for specializing methods from extensions	Tim Foley
	If we don't find the generic we expect in the first pass during IR specialization, then we check for the special case where we are trying to specialize something from a generic extension, using the type being extended. We assume that the generic parameter lists match (that part is the huge hack), and collect the arguments as if they were for the extension instead of the type. This will break when/if we ever have generic extensions with parameter lists that don't match the type being extended.
2018-01-21	specialize witness tables when needed when specializing ↵	Yong He
	`lookup_witness_table` instruction. (#376)
2018-01-21	Improvements and bug fixes for global type parameters	Yong He
	1. allow spReflection_FindTypeByName to accept arbitrary type expression string 2. allow const int generic value to be used as expression value, and as array size 3. various bug fixes in witness table specialization / function cloning during specializeIRForEntryPoint to avoid creating duplicate global values, not copying the right definition of a function from the other module, not cloning witness tables that are required by specializeGenerics etc.
2018-01-17	All compiler fixes to get ir branch work with falcor feature demo.	Yong He
	- support overloaded generic function. this involves adding a new expression type, `OverloadedExpr2` to hold the candidate expressions for the generic function decl being referenced. - make BitNot a normal IROp instead of an IRPseudoOp - make sure we clone the decorations of parameters when cloning ir functions - propagate geometry shader entry point attributes (`[maxvertexcount]` and `[instance]`) through HLSL emit - IR emit: handle geometry shader entry-point parameter decorations, such as 'triangle'. - IR emit: treat geometry shader stream output typed ir value as `should fold into use`.
2018-01-16	bug fixes to get falcor example shader code to compile.	Yong He
	1. prevent cyclic lookups when an interface inherits transitively from itself. 2. in `createGlobalGenericParamSubstitution`, create a default substitution for the base type declref before using it to lookup the witness table.
2018-01-15	cleanup debug code	Yong He

2018-01-15	Support transitive interfaces	Yong He
	This commit is a bunch of quick hacks to get transitive interfaces to work. The idea is for each concrete type we create one giant witness table that contains entries for all the transitively reachable interface requirements, and then create one copy of that witness table for each interface it implements. `DoLocalLookupImpl` now also looks up in inherited interface decles when looking up for a symbol in an interface decl. When visiting `InheritanceDecl` in `lower-to-ir`, create copies of the giant witness table for each transitively inherited interface, so that these witness tables can be found later when the IR is specialized. Re-enable the `copy all witness tables` hack in `specializeIRForEntryPoint` to ensure those transitive witness tables are copied over.
2018-01-13	remove out-of-date changes	Yong He

2018-01-12	Support nested generics	Yong He
	fixes #362
2018-01-12	Refactor substitution representation in DeclRefBase (#363)	Yong He
	This commit changes the type of `DeclRefBase::substitutions` from `RefPtr<Substitutions>` to `SubstitutionSet`, which is a new type defined as following: ``` struct SubstitutionSet { RefPtr<GenericSubstitution> genericSubstitutions; RefPtr<ThisTypeSubstitution> thisTypeSubstitution; RefPtr<GlobalGenericParamSubstitution> globalGenParamSubstitutions; } ``` This change get rid of most helper functions to retreive the substitution of a certain type, as well as surgery operations to insert a `ThisTypeSubstitution` or `GlobalGenericTypeSubstittuion` at top or bottom of the substitution chain. It also simplies type comparison when certain type of substitution should not be considered as part of type definition.
2018-01-09	bruteforce implementation of witness table resolution for associated (#358)	Yong He

2018-01-04	Bug fixes for Slang integration (#356)	Yong He
	* fix #353 * move validateEntryPoint to after all entrypoints has been checked * bug fix: DeclRefType::SubstituteImpl should change ioDiff * bug fix: generic resource usage should have count of 1 instead of 0. * update test case
2017-12-27	Support nested generic types (e.g. L<T<S>>)	Yong He
	fixes #325 This commit includes following changes: 1. Including a default DeclaredSubtypeWitness argument when creating a default GenericSubstitution for a DeclRefType, so that the witness argument can be successfully replaced with an actual witness table after specialization. (check,cpp) 2. Not emitting full mangled name for struct field members. Since the declref of the member access instruction do not include necessary generic substitutions for its parent generic parameters, so the mangled names of the declaration site and use site mismatches. Instead we just emit the original name for struct fields. (emit.cpp) 3. Allow IRWitnessTable to represent a generic witness table for generic structs. Adds necessary fields to IRWitnessTable for generic specialization. For now, the user field of the IRUse is not used and is nullptr. (ir-inst.h) 4. Make IRProxyVal use an IRUse instead of an IRValue, so that an IRValue referenced by IRProxyVal (as a substitution argument) can be managed by the def-use chain for easy replacement. This is used for specializing witness tables. (ir.cpp, ir.h) 5. Add a `String dumpIRFunc(IRFunc)` function for debugging. 6. Add name mangling for generic / specialized witness tables (mangle.cpp) 7. improved natvis file for inspecting witness tables. 8. Add specialization of witness tables: 1) `findWitnessTable` will simply return the specialize IRInst for a generic witness table. 2) make `cloneSubstitutionArg` call `cloneValue` to clone the argument instead of calling `context->maybeCloneValue`, so we can make use of the cloned value lookup machanism to directly return the specialized witness table (which is done when we process the `specialize` instruction on the generic witness table before process the decl ref). 3) bug fix: the argument in ir.cpp:3338 should be `newArg` instead of `arg`. 4) add `specializeWitnessTable` function to specailize a generic witness table. It clones the witness table, and recursively calls `getSpecailizedFunc` for the witness table entries. 5) make `specailizeGenerics` function also handle the case when an operand of the `specialize` instruction is a witness table. We will call `specializeWitnessTable` here and replace the `specialize` instruction with the specialized witness table. The replacement mechanism based on IR def-use chain works here because we have already make IRProxyVal a part of the def-use chain. 9. Add two more test cases for nested generics with constraints. (generic-list.slang and generic-struct-with-constraint.slang)
2017-12-21	Support generic `struct` types during IR-based emit	Tim Foley
	Fixes #318 Most of the required support was actually in place, so this is just a bunch of fixes: - Detect when we are in "full IR" mode, so that we can always emit `struct` declarations with their mangled named (which will produce different names for different specializations, since we emit decl-refs) - Carefully exclude builtin types from this for now. We'll need a more complete solution for mapping HLSL/Slang builtin types to their GLSL equivalents soon. - Skip emitting types referenced by generic IR functions, since they might not be usable. - Also fix things up so that we emit types used in the initializer for any global variables. - Fix bug in generic specialization where we specialize the same function more than once, with different type arguments. We were crashing on a `Dictionary::Add` call where the key already exists from a previous specialization attempt. - Fix name-mangling logic so that when outputting a possibly-specialized generic it looks for the outer-most `GenericSubstitution` rather than just the first one in the list. This is to handle the way that we insert other substitutions willy-nilly in places where they realistically don't belong. :( All of these changes together allow us to pass a slightly modified (more advanced) version of the test case posted to #318.
2017-12-20	More fixes for Falcor IR support (#317)	Tim Foley
	* Fix: try to improve float literal formatting An earlier change switched to using the C++ stdlib to format floats, but I neglected to check that it would correctly print values that happen to be integral with a decimal place (e.g., print `1.0` instead of just `1`). That causes some obscure cases to fail (e.g., when you write `1.0.xxxx` in HLSL, and we print out `1.xxxx`). I've tried to resolve the issue by using the "fixed" output format, but that may also create problems for extremely large or small values (which really need to use scientific notation). However, this at least puts us back where we were before... * Add support for source locations to the IR - Add a `sourceLocation` field to all `IRValue`s - Add some logic to associate locations with operations while lowering (using a slightly "clever" RAII approach) - Make sure that when cloning instructions, we also clone the location - Make the emit logic use the existing support for source locations when emitting code from the IR A nice cleanup to this work would be to have the source locations for IR values not be hyper-specific about both line and column; it is probably enough to just emit on the correct line.
2017-12-08	Cleanups to `ParameterBlock<T>` behavior. (#304)	Tim Foley
	* Cleanups to `ParameterBlock<T>` behavior. These add some more realistic tests using the `ParameterBlock<T>` support, and show that it can work with the "rewriter" mode. Unfortunately, this code does not currently work with the rewriter + the IR at once. That will need to be fixed in a follow-on change, because I now see that the root problem is pretty ugly. * cleanup
2017-12-06	Make AST and IR share type legalization code (#303)	Tim Foley
	* Make AST and IR share type legalization code A previous change already made it so that the AST-to-AST lowering/legalization pass could work together with IR-based lowering of `import`ed code, but that change didn't take into account the case where a function written in the AST needed to call an IR function and pass in a type that required legalization. Both the IR-based and AST-based passes had their own approaches to type legalization, that mostly agreed on the desired output, but they ended up creating their own representations for legalized types which would mean that for a function call the caller and callee might end up legalizing the parameter list to use different types. This change tries to fix this issue (and adds a new test case that relies on the fix) by massively overhauling the AST-based legalization pass so that it uses the same type legalization code as the IR. The shared code has been moved out into `legalize-types.{h,cpp}`. Notes: - I eliminated the `FilteredTupleType` type, since it was starting to cause code duplication in a lot of places. Instead, type legalization just creates new `struct` types to represent the result of filtering. - One big consequence of this is that the `LegalType::pair` case needs to remember for each field in the original type which field (if any) in the new `struct` type it maps to - A big source of complexity (and probably bugs) in this code is trying to figure out how to parent these new `struct` definitions effectively. A good follow-on change would be something that outputs declarations on-demand during the AST emit logic (as we do for the IR), just to avoid some of this song and dance. - The old AST type legalization had a notion of both a "tuple" type and a "varying tuple" type. The "tuple" case was quite complex, and combined behavior currently handled by `LegalType::pair` (for splitting into ordinary and special sides) and `LegalType::tuple` (for holding multiple distinct elements to represent the fields of an aggregate). The "varying tuple" case was closer to `LegalType::tuple`, so I tried to just re-use the existing logic for that too. The one place this potentially gets messy is in `reifyTuple()`. - The messiest bit of handling the "varying tuple" concept (which is used for GLSL shader inputs/outputs since they have to be scalarized) is that when passing them as function arguments we need to reify the tuple back into a structured value. Because the `LegalExpr` hierarchy doesn't have type information, but constructing a value of the "original" type requires such information, things get a little messy. - I did not try to deal with any of the logic related to handling system inputs/outputs for cross-compilation purposes. Of course, the long-term goal is that any actual cross-compilation is handled via the IR, but this change can't afford to break the AST-based path just yet. As a result, there is still quite a bit of complexity in the handling of assignment, to deal with cases where "fixups" are required. * fixup: bad code in macro, not caught by Visual Studio compiler * fixup: more stuff missed by VS compiler * fixup: VS continutes to miss stuff in UNREACHABLE_RETURN
2017-11-28	Enable HLSL/GLSL "rewrite" + IR-based Slang codegen (#300)	Tim Foley
	The big picture here is that the AST-to-AST pass in `ast-legalize` will now detect when a declaration being referenced comes from an `import`ed module, and (if IR codegen is enabled), it will trigger cloning of the IR for the chosen symbol into an IR module that will sit alongside the legalized AST. Then, during HLSL/GLSL code emit, we emit all the IR-based code first, and then the AST-based code. Whenever the AST code references a symbol that was lowered via IR (we keep track of these) we emit the mangled name of the IR symbol. Notes/details: - A lot of the logic for cloning IR symbols referenced by the AST matches the same logic that would clone them for completely IR-based codegen, so I tried to hoist out the common logic and share it (e.g., so that we apply the same guaranteed transformations in both cases). This required basically rewriting the logic in `emit.cpp` that decomposed the various cases. - There is a new compute test case added to test this functionality. `tests/compute/rewriter.hlsl` confirms that we can use the `-no-checking` mode for the HLSL code, but still make use of a library of Slang code that employs generics, etc. - Adding this test case required adding a new compute test mode that invokes `render-test` with the `-hlsl-rewrite` flag. - It turns out that the existing `tests/render/cross-compile0.hlsl` test should have been using this functionality already. It was opting into the use of the IR via `-use-ir`, and the `render-test` application already tries to set `-no-checking` for non-Slang input languages by default. Fixing the code path this test triggers means that it is now a second test of rewriter+IR codegen. - The `translateDeclRef` logic in `ast-legalize.cpp` seemed sloppy in places, and would potentially clone declarations, when declaration references were desired. I tried to clean a bit of this up, so some call sites are now changed. - This change tries to clean up some work around cloning of global values - All global value kinds (not just functions) now go through the logic of trying to pick a "best" definition, so that they can be used when we are linking multiple modules - The logic for registering cloned values has been unified a bit, so that clients always pass in an `IROriginalValuesForClone` that either wraps a single value (maybe just null), or an `IRSpecSymbol` that gives a list of values to regsiter the new value as a clone for. - I made one piece of code that was cloning witness tables as part of generic specializations not* register a clone. I think this is correct because we may specialize the same generic multiple ways, so registering any values we clone is not the right idea, but I might be missing something... - I also reorganized this logic so that it would be easier to clone a global value when we only know its mangled name (which is the case when it is the AST that triggers cloning) - I made sure that when loading a module via `import`, the translation unit for the new module copies the `-use-ir` flag from the overall compile request, if it is present (otherwise we wouldn't generate IR for loaded modules at all... oops). - Note that `getSpecializedGlobalValueForDeclRef()`, which is the main routine used by the AST legalization to trigger cloning of an IR value does not currently handle declaration references that require specialization. - This change does not deal with trying to unify the type legalization logic between the AST-to-AST rewriter and the IR-based codegen, so if you call an imported function with types that require legalization, Bad Things are expected to happen right now.
2017-11-28	Generate IR per-module for loaded modules (#299)	Tim Foley
	The basic idea here is that for each module that gets loaded via `import`, we should also generate the initial IR for the declarations in that module at the time it gets loaded. Furthermore, when we generate initial IR for a module, we will only generate IR declarations (not definitions) for any functions/variables in modules it imports. Later, when cloning IR to begin code generation for an entry point, we will effectively "link" all of the loadedm modules together, so that a given global value can get its definition from any of the IR modules present. - Change the `loadedModulesList` and related data structures to hold a new `LoadedModule` type, instead of just the AST (and then have a `LoadedModule` own both the AST and the IR module) - Share some logic between the `import` and `#import` cases, so that we always try to generate IR for modules we load. - Make sure that IR generation always gets skipped if the command-line flags tell us not to use the IR. - A few small fixups for cases that didn't arise in IR lowering so far, but come up when we try to actually generate IR for things like the stdlib. There are some notable gaps in this work right now: - The stdlib modules are exempted from this behavior; we always generate IR for stdlib functions in any user module that calls them. This is just a workaround for the fact that the stdlib modules don't show up in the list of imported modules right now. - We don't currently have logic that does the "linking" step for global variables like we do for functions. We really need to look up the symbols with the same mangled name, and favor any one of them that has a definition (if there is one) - Similarly, the handling of witness tables is incomplete. During initial IR generation, we should probably be generating empty witness tables for any conformances that were declared in other modules (but are being used locally in this module), and then the "linking" step should favor non-empty witness tables over empty ones. Still, all the test cases pass with the code like this, and this seems like an important step in the right direction.
2017-11-24	Fix substitution mechanism to remove special cases for global params (#297)	Yong He
	Add a new function: `substituteSubstitutions(Substitutions * substHead, Substitutions subst, int * ioDiff)` This function substitutes the type arguments referenced in a linked list of substitutions headed at `substHead` using the substitutions specified by `subst`. If the linked list `substHead` does not contain `GlobalGenericParamSubstitution` entries, they will be added to the bottom (outter most) of the linked list. Note that this function should be called when `substHead` is known to be the head of substitution linked list because the existance of `GlobalGenericPaaramSubstitution` is detected assuming the linked lists starts at `substHead`. If a substitution that is not the head of a substitution linked list is passed in, duplicate `GlobalGenericParamSubstitution`s could be appended to the linked list. This means that this function should not be called in places like `GenericSubstitution::SubstitutionImpl()` for its outer substitutions, because `outer` is obviously not the head of the linked list. Instead, use this function to substitution the substitution lists of `DeclRef` etc. instead of calling `declRef.substitutions->SubstituteImpl()` where the head to the linked list is known as a member of that class. With this function, IRSpecContext::maybeCloneType() is simplified down to `originalType->Substitute(subst)` Updates `DeclRefBase::SubstituteImpl` and `DeclRefType::SubstituteImpl` to call `substituteSubstitutions` instead of making direct `substitutions->SubstituteImpl` call. Providing actual implementation of `GlobalGenericParamSubstitution::SubstituteImpl` instead of just returning `this` to deal with potential situations where a true substitution is needed.
2017-11-21	Merge branch 'master' into generic-param-fix	Yong He

2017-11-21	Add logic to propagate GlobalGenericParamSubstitution	Yong He

2017-11-20	IR: support global variable with initializers (#294)	Tim Foley
	The big change here is that the ability to contain basic blocks with instructions in them has been hoisted from `IRFunc` into a new base type `IRGlobalValueWithCode` shared with `IRGlobalVar`. The basic blocks of a global variable define initialization logic for it; they can be looked at like a function that returns the initial value. Places in the IR that used to assume functions contain all the code need to be updated, but so far I only handled the cloning step. The emit logic currently handles an initializer for a global variable by outputting its logic as a separate function, and then having the variable call that function to initialize itself. This should be cleaned up over time so that we generate an ordinary expression whenever possible. I also made the emit logic correctly label any global variable without a layout (that is, any that don't represent a shader parameter) as `static` so that the downstream HLSL compiler sees them as variables rather than parameters.
2017-11-20	fixup global generic parameters	Yong He
	1. simplify RoundUpToAlignment() 2. add new a render-compute test case to cover the situation where the entry-point interface (parameter/return types of an entry-point function) is dependent on the global generic type. 3. initial fixes to get this test case to compile (but is not producing correct HLSL output yet)
2017-11-17	IR: Add support for `out` and `inout` parameters (#289)	Tim Foley
	These were already being handled a little bit, by lowering an `out T` or `inout T` function parameter in the AST to a function parameter with type `T*` in the IR, and then emiting explicit loads/stores. The HLSL emit logic, however, couldn't tell the difference between an `out` parameter, an `inout`, or a true pointer (if we ever needed to support them...). The intention (not fully implemented) was that we'd use a hierarchy of types rooted at `PtrTypeBase`: - `PtrTypeBase` - `Ptr`: "real" pointers in the C/C++ sense - `OutTypeBase`: pointers used to represent by-reference parameter passing - `OutType`: IR level type for an `out` parameter - `InOutType`: IR level type for an `inout` or `in out` parameter Actually implementing this involved: - Adding a bit more flexibility to the `Session::getPtrType` logic to allow for creating any of the concrete types above - Making the `lower-to-ir` logic create the right type for function parameters (instead of just using `PtrType`) - Making the HLSL emit logic check for the `OutType` and `InOutType` cases rather than just `PtrType` - Changing a bunch of small places in the code so that they use `PtrTypeBase` instead of `PtrType` when they should handle any of the above cases, and also make a few places check for `OutTypeBase` instead of `PtrType` or `PtrTypeBase`, when they are really trying to capture by-reference parameters - Add a test case that uses all of the different cases we care about (without these fixes, this test case generates errors from fxc because of variables being used before being initialized, becaues parameters get declared `out` that should be `inout`). A minor point here is that we are playing a bit fast and loose right now because the IR does not actually enforce any type checks. From the standpoint of the front end, `Ptr<T>`, `Out<T>`, and `InOut<T>` are all unrelated types (each is just a `struct` declared in `core.meta.slang`), but this doesn't really matter because none of these are types our current users are explicitly using. In the IR it makes perfect sense to allow `Out<T>` or `InOut<T>` as the operand of a `load` or `store` instruction (and ditto for `getFieldAddr`, etc.) - there instructions just apply to any `PtrTypeBase`. The place where this potentially gets tricky is whether an `Out<T>` can be used where a `Ptr<T>` is expected, or vice vers (e.g., can I just pass my local variable's pointer directly to an `Out<T>` function parameter? I'm going to ignore these issues for now, since the code currently works for our test case.
2017-11-17	Add support for global generic parameters (#285)	Yong He
	* Add support for global generic parameters (In-progress work) This commit include: 1. Update Slang API to allow specification of generic type arguments in an `EntryPointRequest` 2. Add parsing of `__generic_param` construct, which becomes a GlobalGenericParamDecl, contains members of `GenericTypeConstraintDecl`. 3. Semantics checking will check whether the provided type arguments conform to the interfaces as defined by the generic parameter, and store SubtypeWitness values in the EntryPointRequest, which will be used by `specializeIRForEntryPoint` when generating final IR. 4. Add a new type of substitution - `GlobalGenericParamSubstitution` for subsittuting references to `__generic_param` decls or to its member `GenericTypeConsraintDecl` with the actual type argument or witness tables. 5. Update `IRSpecContext` to apply `GlobalGenericParamSubstitution` when specializing the IR for an EntryPointRequest. 6. Update `render-test` to take additional `type` inputs, which specifies the type arguments to substitute into the global `__generic_param` types. This commit does not include ProgramLayout specialization. * IR: pass through `[unroll]` attribute (#284) The initial lowering was adding an `IRLoopControlDecoration` to the instruction at the head of a loop, but this was getting dropped when the IR gets cloned for a particular entry point. The fix was simply to add a case for loop-control decorations to `cloneDecoration`. * fix warnings * IR: support `CompileTimeForStmt` (#286) This statement type is a bit of a hack, to support loops that must be unrolled. The AST-to-AST pass handles them by cloning the AST for the loop body N times, and it was easy enough to do the same thing for the IR: emit the instructions for the body N times. The only thing that requires a bit of care is that now we might see the same variable declarations multiple times, so we need to play it safe and overwrite existing entries in our map from declarations to their IR values. Of course a better answer long-term would be to do the actual unrolling in the IR. This is especially true because we might some day want to support compile-time/must-unroll loops in functions, where the loop counter comes in as a parameter (but must still be compile-time-constant at every call site). * Add support for global generic parameters (In-progress work) This commit include: 1. Update Slang API to allow specification of generic type arguments in an `EntryPointRequest` 2. Add parsing of `__generic_param` construct, which becomes a GlobalGenericParamDecl, contains members of `GenericTypeConstraintDecl`. 3. Semantics checking will check whether the provided type arguments conform to the interfaces as defined by the generic parameter, and store SubtypeWitness values in the EntryPointRequest, which will be used by `specializeIRForEntryPoint` when generating final IR. 4. Add a new type of substitution - `GlobalGenericParamSubstitution` for subsittuting references to `__generic_param` decls or to its member `GenericTypeConsraintDecl` with the actual type argument or witness tables. 5. Update `IRSpecContext` to apply `GlobalGenericParamSubstitution` when specializing the IR for an EntryPointRequest. 6. Update `render-test` to take additional `type` inputs, which specifies the type arguments to substitute into the global `__generic_param` types. progress on parameter binding * Add a more contrived test case for specializing parameter bindings * update render-test to align buffers to 256 bytes (to get rid of D3D complains on minimal buffer size). * adding one more test case for parameter binding specialization. * Cleanup according to @tfoleyNV 's suggestions. * fix a bug introduced in the cleanup
2017-11-16	IR: pass through `[unroll]` attribute (#284)	Tim Foley
	The initial lowering was adding an `IRLoopControlDecoration` to the instruction at the head of a loop, but this was getting dropped when the IR gets cloned for a particular entry point. The fix was simply to add a case for loop-control decorations to `cloneDecoration`.
2017-11-16	Revise type legalization so it can handle constant buffers (#282)	Tim Foley
	* Revise type legalization so it can handle constant buffers The existing legalization approach with "tuples" can handle scalarizing a `struct` type with resource-type fields in it, but it had several big gaps. The most notable is that given a type that mixes uniform and resource fields, we can't just blindly scalarize things: ``` struct P { float4 a; float4 b; Texture2D t; }; cbuffer C { P gParam[8]; }; ``` The existing code was completely ignoring the declaration of `gParam` inside `C`, but even if we fixed that issue, we'd get something like: ``` cbuffer C { float4 gParam_a[8]; float4 gParam_b[8]; }; Texture2D gParam_t[8]; ``` In this case we've completely changed the layout of the uniform buffer, by switching from AOS to SOA. Even if we could get the type layout logic and the IR to agree on this, it would be a surprise to users, and "principle of least surprise" should be a big deal on a project with as many moving parts as ours. The right thing to do is to have legalization create a "stripped" version of the original type `P` and use that: ``` struct P_stripped { float4 a; float4 b; }; cbuffer C { P_stripped gParam[8]; }; Texture2D gParam_t[8]; ``` Then at a call site, this: ``` foo(gParam); ``` becomes: ``` foo(gParam, gParam_t); ``` This is exactly how the current AST-to-AST legalization handles mixed uniform and resource types, but the way it does it involves some annoying kludges: - That pass has a notion of a "tuple" similar to our legalization, but every tuple has an optional "primary" entry for all the uniform data, plus tuple elements for the resources, and a given field may be represented on one side, the other, or both. It makes the code for handling tuples very messy. - That pass does the "stripping" of types by actually marking up the AST declarations (this is okay because it is constructing a new AST as it goes), so that when they get emitted certain fields don't actually show up. That is, we fix the problem with type `P` by actually modifying the user's declaration of `P`. That seems out of bounds for the IR. This change fixes the problem in our IR type legalization while trying to avoid the problems of the AST-to-AST pass by using two new ideas: 1. We add a new case for `LegalType` (and `LegalVal`) that is a "pair" type, where a pair consists of both an "ordinary" type (for uniform data) and a "special" type (for resource data). E.g., after legalization, the type for `C` (which can be over-simplified to `ConstantBuffer<P>` for our purposes), will be a `LegalType::pair` where the ordinary side is `ConstantBuffer<P_stripped>` and the special side is a tuple containing the `Texture2D` field. 2. We add a new (and annoyingly hacky) AST-level type called `FilteredTupleType` which is semantically a sort of tuple type (it holds a list of elements, and the elements have their own types), but which remembers an "original type" that it was created from, and for each element remembers the field of the original type that it corresponds to. This is used to construct a type like `P_stripped` as an actual AST-level structural type. The core logic for legalizing an aggregate type had to get more complicated just because of the new pair case, so there is now a `TupleTypeBuilder` that asists with taking an aggregate type, processing its fields, and then picking the right `LegalType` representation for the result. Other smaller changes: - Made the legalization logic actually legalize `PtrType<T>`. E.g., if `T` legalizes to a tuple, we need to construct a tuple of pointer types. The same exact thing needs to be applied to arrays, and any other generic type that should "distribute over" pairs/tuples. - Made the legalization logic actually legalize `ConstantBuffer<T>` and similar. The basic idea there is if `T` maps to a pair, we wrap `ConstantBuffer<...>` around the ordinary side, and `implicitDeref` around the special side. - Removed a bunch of `#ifdef`ed-out code from the end of `ir-legalize-types.cpp`. That was code from my first attempt at legalization that failed miserably (trying to do it via local changes and a work list instead of a global rewrite pass), but it had some code I wanted to reference when writing the version that actually got checked in (should have deleted the code earlier, though). - Added a bunch of cases for `LegalType::none` (and the `LegalVal` equivalent) that helped simplify the logic fo the `pair` case by allowing me to always dispatch to both the "ordinary" and "special" sides, even if they might not actually be present. - Renamed `TupleType` and `TupleVal` over to `TuplePseudoType` and `TuplePseudoval` to recognize the fact that we might actually need/want real tuples in the type system, to go along with these fake ones (that need to be optimized away). The biggest doubt I have about this change is the whole `FilteredTupleType` thing; it seems like an obviously contrived type to add to the front-end type system that really only solves IR-level problems. A cleaner approach might have been to just add a plain old `TupleType` to the front-end type system (and initially I started with that), and then have yet another `LegalType`/`LegalVal` case that handles mapping from the fields of the original type to the numbered tuple elements. I expect we'll actually want to make that change in the future (especially if we ever add true tuples to the front-end), but for right now I let myself be swayed by the desire to have these stripped/filtered types get names that explain their provenance ("where they came from") to make our output code more debuggable. The way I've done it is probably overkill, though, and we need a much more complete effort on the readability and debuggability of our output before anything like that is worth worrying about. * Fixup: typo * Fixup: fix output of "non-mangled" names for test cases - Make sure to attach high-level decls to variables created as part of type legalization - Also, try to share more of the code between the different cases of variables - Fix up `parameter-blocks` test case that was passing `-no-mangle` but expecting mangled names in the output - Fix up `multiple-parameter-blocks` to not rely on `-no-mangle` for now, because it would lead to two global variables with the same name (need to fix that underlying issue eventually). - Also fix name generation logic so that we only use "original" names (from high-level decls) specifically when the `-no-mangle` flag is on, and otherwise use IR-level names. * Fix: handle constant buffers better in render-test - Don't request both CB and SRV usage for buffers, since that is illegal - Also, don't try to create an SRV when user requested a CB (since the required usage flag won't be there) - Record the input buffer type on the `D3DBinding` for a buffer, and use that to tell us when to bind a CB instead of SRV/UAV - Fix expected output for `cbuffer-legalize` test now that we are actually feeding it correct cbuffer dta.
2017-11-15	Various IR fixes for Falcor (#280)	Tim Foley
	- Change function mangling so we use `p<parameterCount>p` instead of just `p<parameterCount>` to avoid the parameter count running into digits at the start of a mangled type name and tripping up the un-mangling logic. - We really need to step back at some point and define our mangling scheme a bit more carefully, especially if we are going to keep going down this road where un-mangling things is important for generating HLSL output. - Also allow the unmangling logic to unmangle a few more cases of generic parameters, so that it can skip over them to get to the parameter count of the underlying function. - Add a notion of an `unreachable` instruction to the IR, and emit it as the terminator (if needed) at the end of the last block for a function with a non-void return type. - This does not implement any logic to emit a diagnostic if the `unreachable` turns out to be potentially reachable - Fix a bug in IR specialization of generics where we can't create two different specializations of the same function, because both get registered in the same hash map With all these fixes, testing in Falcor modified to use the full Slang compiler and IR for all HLSL/Slang: - The UI and text rendering shaders yield HLSL that compiles without error; no idea if they actually work - The ModelViewer shaders yield HLSL, but there are some issues (looks like type legalization isn't applying to stuff inside constant buffers)
2017-11-14	IR: add support for `switch` statements (#278)	Tim Foley
	* IR: add support for `switch` statements Fixes #273 This is just something we hadn't gotten to yet on the IR. The actual design of the instruction is unsurprising (once you take into consideration the requirement for structured control flow). A `switch` instruction takes the form: switch <condition> <breakLabel> <defaultLabel> [<caseVal> <caseLabel>]* Where `condition` is the value to switch on, `breakLabel` is the "join point" after the original `switch` statement, `defaultLabel` is where to go if the value doesn't match any case, and each pair of `caseVal` and `caseLabel` is what to do on a particular value. It is required that `caseVal` be a literal, but this isn't currently being enforced in the IR (the front-end should be making a check and constant-folding the case labels). For structured control flow, we also make the assumption that the cases are in order: cases with the same label must be grouped together, and any case that falls through to another must come right before it. Given this representation, the emit logic can reconstruct a `switch` statement with relative ease, given the machinery we already have. It makes sure to group together case values with the same label (again, assuming they are contiguous), and will insert the `default:` label in with whatever group it belongs to. Actually emitting code for a `switch` statement seems superficially simple, until you realize that a complete implementation needs to handle stuff like "Duff's Device." The current implementation makes the assumption that all `case` and `default` statements are directly nested under a `switch`, and that there is no way for control flow to enter a case except by the `switch` itself, or fall-through. In order to facilitate the grouping of cases in the IR-to-HLSL emit logic, the AST-to-IR lowering logic tries to detect cases where there are multiple `case`s in a row, and emit only a single label for them. One big/annoying gotcha is that we don't properly handle the case where a `default:` case has a non-trivial fall-throguh to another case. That seems fine for now since HLSL doesn't support fall-through anyway, but it probably needs to get detected somewhere in the Slang compiler (e.g., maybe we should add a diagnostic pass over the IR that detects target-specific problems like that and emits errors). * IR: Add support for empty statements. - Add empty statement in `lower-to-ir.cpp` - Go ahead and eliminate the statement catch-all and explicitly enumerate the cases we don't support - Fix up parser for block statements so that it doesn't leave a null statement as the body of a `{}` - Add an empty statement to one of the cases for the `switch` test, to ensure we are testing empty statements
2017-11-07	IR: add support for `discard` statement (#261)	Tim Foley
	- Add definition of `discard` instruction - A `discard` is a terminator instruction, just like `returnVoid` - Lower `DiscardStmt` in AST to a `discard` instruction in the IR - Emit `discard` instruction as a `discard;` statement when emitting HLSL/GLSL - Add a test case using the "graphics compute" mode that tests discard. The test writes to one entry in a UAV before doing a conditional (always true at runtime) discard, and then writes to another entry; we expect to see the results of the first write, but not the second.
2017-11-07	Support generic interface methods (#251)	Yong He
	* improve diagnostic messages and prevent fatal errors from crashing the compiler. * fix top level exception catching. * spelling fix * change wording of invalidSwizzleExpr diagnostic * add speculative GenericsApp expr parsing * add new test case of cascading generics call. * Fixing bugs in compiling cascaded generic function calls. Add implementation of DeclaredSubTypeWitness::SubstituteImpl() This is not needed by the type checker, but needed by IR specialization. When input source contains cascading generic function call, the arguments to `specialize` instruction is currently represented as a substitution. The arg values of this subsittution can be a `DeclaredSubTypeWitness` when a generic function uses one of its generic parameter to specialize another generic function. When the top level generics function is being specialized, this substitution argument, which is a `DeclaredSubTypeWitness`, needs to be substituted with the witness that used to specialize the top level function in the specialized specialize instruction as well. * add a test case for cascading generic function call. * parser bug fix * fixes #255 * add test case for issue #255 * Generate missing `specialize` instruction when calling a generic method from an interface constraint. When calling a generic method via an interface, we should be generating the following ir: ... f = lookup_interface_method(...) f_s = specailize(f, declRef) ... This commit fixes this `emitFuncRef` function to emit the needed `specialize` instruction. * fixes #260 This fix follows the second apporach in the disucssion. It generated mangled name for specialized functions by appending new substitution type names to the original mangled name. * Disabling removing and re-inserting specailized functions in getSpecalizeFunc() I am not sure why it is needed, it seems HLSL and GLSL backends are generating forward declarations anyways, so the order of functions in IRModule shouldn't matter. * cleanup and complete test cases. * fix warnings
2017-11-07	Handle "ThisType" subsitutions when specialization generics in the IR	Tim Foley
	The original code is handling the issue where a call site might be specializing a generic function, so it has a `DeclRef` that represents what it wants to specialize, but the callee is actually a different overload of the same generic function (e.g., a target-specific overload) and so we need to construct a set of substitutions that are equivalent (same arguments), but point to different `GenericDecl`s. That code was making some bad assumptions, though: 1. It assumed that the substitutions list would always start with a generic substitution (no longer true with `ThisTypeSubstitution`. 2. It assumed that only the top-most substitution would need to be translated. This assumption is probably safe for now, but it could break down if we ever introduced an ability for a type to be re-opened to introduce new (target-specific) overloads of its members. The new approach goes ahead and does a deep copy of the substitition list (but a shallow copy of the arguments), and only copies the generic substititions for now.
2017-11-06	Parameter blocks (#245)	Tim Foley
	* Rename existing ParameterBlock to ParameterGroup We are planning to add a new `ParameterBlock<T>` type, which maps to the notion of a "parameter block" as used in the Spire research work. Unfortunately, the compiler codebase already uses the term `ParameterBlock` as catch-all to encompass all of HLSL `cbuffer`/`tbuffer` and GLSL `uniform`/`buffer`/`in`/`out` blocks (all of which are lexical `{}`-enclosed blocks that define parameters...). This change instead renames all of the existing concepts over to `ParameterGroup`, which isn't an ideal name, but at least doesn't directly overlap the new terminology or any existing terminology. The new `ParameterBlockType` case will probably be a subclass of `ParameterGroupType`, since it is a logical extension of the underlying concept. * Add Shader Model 5.1 profiles The HLSL `register(..., space0)` syntax is only allowed on "SM5.1" and later profiles (which is supported by the newer version of `d3dcompiler_47.dll` that comes with the Win10 SDK, but not the older version of `d3dcompiler_47.dll` - good luck figuring out which you have!). This change adds those profiles to our master list of profiles, and nothing else. * First pass at support for `ParameterBlock<T>` - Add the type declaration in stdlib - Add a special case of `ParameterGroupType` for parameter blocks - Handle parameter blocks in type layout (currently handling them identically to constant buffers for now, which isn't going to be right in the long term) - Add an IR pass that basically replaces `ParameterBlock<T>` with `T` - Eventually this should replace it with either `T` or `ConstantBuffer<T>`, depending on whether the layout that was computed required a constant buffer to hold any "free" uniforms - Add first stab at an IR pass to "scalarize" global variables using aggregate types with resources inside. - This currently only applies to global variables, so it won't handle things passed through functions, or used as local variables - It also only supports cases where the references to the original variable are always references to its fields, and not the whole value itself - Add a single test case that technically passes with this level of support, but probably isn't very representative of what we need from the feature * Fold parameter-block desugaring into a more complete "type legalization" pass The basic problem that was arising is that once you desugar `ParameterBlock<T>` into `T`, you then need todeal with splitting `T` into its constituent fields if it contains any resource types. Handling those transformations by following the usual use-def chains wasn't really helping, because you might need systematic rewriting that can really only be handled bottom-up. This change adds a new pass that is intended to perform multiple kinds of type "legalization" at once: - It will turn `ParameterBlock<T>` into `T` - It may at some point also convert `ConstantBuffer<T>` into `T` as well - It will turn an value of an aggregate type that contains resources into N different values (one per field) - As a result of this, it will also deal with AOS-to-SOA conversion of these types Legalization is applied to every function/instruction/value, so that it can make large-scale changes that would be tough to manage with a work list. This pass needs to be run after generics have been fully specialized, so that we know we are always dealing with fully concrete types, so that their legalization for a given target is completely known. This is still work in progress; there's more to be done to get this working with all our test cases, and finish the remaining `ParameterBlock<T>` work. * Improve binding/layout information when using parameter blocks - When doing type layout for a parameter block, don't include the resources consumed by the element type in the resource usage for the parameter block - Note that this is pretty much identical to how a `ConstantBuffer<T>` does not report any `LayoutResourceKind::Uniform` usage, except that `ParameterBlock<T>` is also going to hide underlying texture/sampler reigster usage - The one exception here is that any nested items that use up entire `space`s or `set`s those need to be exposed in the resource usage of the parent (I don't have a test for this) - When type legalization needs to scalarize things, it must propagate layout information down to the new leaf variables. In general, the register/index for a new leaf parameter should be the sum of the offsets for all of the parent variables along the "chain" from the original variable down to the leaf (we aren't dealing with arrays here just yet). - When type legalization decides to eliminate a pointer(-like) type (e.g., desugar `ParameterBlock<T>` over to `T`), actually deal with that in terms of the `LegalVal`s created, so that we can know to turn a `load` into a no-op when applied to a value that got indirection removed. - Hack up the "complex" parameter-block test so that it actually passes (the big hack here is that the HLSL baseline is using names that are generated by the IR, and are unlikely to be stable as we add/remove transformations). - Note: I can't make these be compute tests right now, because regsiter spaces/sets are a feature of D3D12/Vulkan, and our test runner isn't using those APIs.
2017-11-05	small cleanups	Yong He

2017-11-04	Merge remote-tracking branch 'refs/remotes/official/master'	Yong He