slang.git - Making it easier to work with shaders

	Commit message (Collapse)	Author	Age
*	Fix a bug in IR use-def information (#406)	Tim Foley	2018-02-13
\| \| \| \| \| \| \| \| \| \| \|	The basic problem here is that when unlinking an `IRUse` from the linked list of uses, there were several cases where I was failing to set the `prevLink` field of the next node to match the `prevLink` field of the node being removed. That doesn't show up when walking the linked list of uses forward, but it breaks it whenever you have subsequent unlinking operations. This change fixes the bugs of that kind I could find, and also adds a debug validation method to try to avoid breaking it again. I also made more access to `IRUse` go through accessor methods rather than using fields directly, to try to avoid this kind of error. I stopped short of making anything `private`, because I tend to find that it creates more hassles than it avoids. A few other fixes along the way: - Made the `List<T>` type default-initialize elements when you resize it. I hadn't realized we weren't doing that. - Add a standalone `dumpIR(IRGlobalValue*)` so help when debugging issues.
*	Basic IR support for `static const` globals (#404)	Tim Foley	2018-02-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Basic IR support for `static const` globals Our strategy for lowering global variables can fall back to putting their initialization into a function, but that isn't really appropriate for global constants (it also isn't appropriate for arrays, but we'll need to deal with that seaprately). This change adds a distinct case for global constants (rather than treating them as variables), and forces the emission logic to always emit them as a single expression. Doing this makes assumptions about how the IR for these constants gets emitted (and what optimziations might do to it). In order to make things work, I had to switch the handling of initializer-list expressions to not be lowered via temporaries and mutation (since that isn't a good fit for reverting to a single expression). I've added a single test case to ensure that this works in the simplest scenario. My next priority will be to see if this unblocks my work in Falcor. * Fixup: bug fixes
*	Falcor fixes (#402)	Tim Foley	2018-02-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Re-define deprecated compile flags By including these flags in the header file, with a value of zero, we can allow some existing code to compile even after the major changes to the implementation. * The `SLANG_COMPILE_FLAG_NO_CHECKING` option will effectively be ignored, since checking is always enabled. * The `SLANG_COMPILE_FLAG_SPLIT_MIXED_TYPES` option will now act as if it is always enabled (and indeed some of the code has been relying on this flag being set always). * Make subscript operators writable for writable textures This even had a `TODO` comment saying that we needed to fix it, and now I'm seeing semantic checking failures because we didn't define these and so we find assignment to non l-values. * Fix definitions of any() and all() intrinsics These should always return a scalar `bool` value, but they were being defined wrong in two ways: 1. They were using their generic type parameter `T` in the return type 2. They were returning a vector in the vector case, and a matrix in the matrix case. This change just alters the return type to be `bool` in all cases. * Fix bug in SSA construction When eliminating a trivial phi node, it is possible that the phi is still recorded as the "latest" value for a local variable in its block. When later code queries that value from the block (which can happen whenever another block looks up a variable in its predecessors), it would get the old phi and not the replacement value. I simply added a loop that checks if the value we look up is a phi that got replaced, and then continues with the replacement value (which might itself be a phi...). A more advanced solution might try to get clever and have the map itself hold `IRUse` values so that we can replace them seamlessly. * Simplify IR control flow representation This change gets rid of various special-case operations for conditional and unconditional branches, and instead requires emit logic to recognize when a direct branch is targetting a `break` or `continue` label. The new approach here isn't perfect, but it seems beter than what we had before, because it can actually work in the presence of control-flow optimizations (including our current critical-edge-splitting step). * Load from groupshared isn't groupshared When loading from a `groupshared` variable, the resulting temporary shouldn't have the `groupshared` qualifier on it. This might eventually need to generalize to a better understanding of storage modifiers in the IR, but I don't really want to deal with that right now. * Don't emit references to typedefs in output code Now that we are using the IR for all codegen, we shouldn't be dealing with surface-level things like `typedef` declarations in the output code; just use the type that was being referred to in the first place. * Fix floating-point literal printing for IR The IR was calling `emit()` instead of `Emit()` (we really need to normalize our convention here), and was implicitly invoking a default constructor on `String` that takes a `double` (that constructor should really be marked `explicit`), and which doesn't meet our requirements for printing floating-point values. * Fix error when importing module that doesn't parse We already added a case to bail out if semantic checking fails, but neglected to add a case if there is an error during parsing of a module to be imported. Note: this logic doesn't correctly register the module as being loaded (but still in error), so users could see multiple error messages if there are multiple `import`s for the same module. * Improve error message for overload resolution failure - Drop debugging info from the candidate printing - Add cases to print `double` and `half` types properly * Fixup: switch loopTest to ifElse in expected IR output
*	Generate SSA form for IR functions (#400)	Tim Foley	2018-02-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Generate SSA form for IR functions The basic idea here is simple: in the front-end after we have lowered the AST to initial IR we will apply a set of "mandatory" optimization passes. The first of these is to attempt to translate the all functions into SSA form so that they are amenable to subsequent dataflow optimizations. Eventually, the mandatory optimization passes would include diagnostic passes that make sure variables aren't used when undefined, etc. Just doing basic SSA generation already cleans up a lot of the messiness in our IR today, because constructs that used to involve many local variables can now be handled via SSA temporaries. The implementation of SSA generation is in `ir-ssa.cpp`, and it follows the approach of Braun et al.'s "Simple and Efficient Construction of Static Single Assignment Form." I used this instead of the more well-known Cytron et al. algorithm because Braun's algorith mis very simple to code, and does not require auxiliary analyses to generate the dominance frontier. The main wrinkle in our SSA representation right now is that instead of using ordinary phi nodes, we instead allow basic blocks to have parameters, where predecessor blocks pass in different parameter values. This encodes information equivalent to traditional phi nodes, but has two (small) benefits: 1. There is no fixed relationship between the order of phi operands and predecessor blocks, so we don't have to worry about breaking the phis when we alter the order in which predecessors are stored. This is important for us because predecessors are being stored implicitly. 2. It is easy to operationalize a "branch with arguments" either when lowering to other languages, or when interpreting the IR. A branch with arguments is implemented as a sequence of stores from the arguments to the parameters of the target block (very similar to a call), followed by a jump to the block. Relevant to the above, this change also adds an interface for enumerating the predecessors or successors of a block in our CFG. Rather than use an auxliary structure, we directly use the information already encoded in the IR: * The sucessors of a block are the target label operands of its terminator instruction. In our IR this is a contiguous range of `IRUse`s, possible with a stride (to account for the way `switch` interleaves values and blocks). * The predecessors of a block are a subset of the uses of the block's value. Specifically, they are any uses that are on a terminator instruction, and within the range of values that represent the successor list of that instruction. One important limitation of the "blocks with arguments" model for handling phis is that it is really only convenient to stash extra arguments on an unconditional terminator instruction. This change works around this prob lem by breaking any "critical edges" - edges between a block with multiple successors and one with multiple predecessors. We assume that "phi" nodes will only ever be needed on a block with multiple predecessors, and because critical edges are broken, each of these predecessors will then have only a single successor, so its branch instruction can handle the extra arguments. This change introduces a notion of an "undefined" instruction in the IR. This is handled as an instruction rather than a value because I anticipate that we will want to distinguish different undefined values when it comes time to start issuing error messages (those messages will need to point to the variable that was used when undefined). * Fix expected test output. Another change was merged that enabled the `glsl-parameter-blocks` test, and its output is affected by our IR optimization work.
*	Support __target_intrinsic modifiers in IR codegen (#401)	Tim Foley	2018-02-07
\| \| \| \| \| \| \| \| \| \| \|	The standard library already has a bunch of these decorations, since they were added to support Slang->Vulkan codegen on the AST-to-AST path. This change makes the IR code generator able to exploit the modifiers so that we pick up a bunch of Vulkan support "for free" in the short term. The basic change is in `lower-to-ir.cpp` where we copy over any `TargetIntrinsicModifier`s to become `IRTargetIntrinsicDecoration`s with the same information. We then need a bit of logic in `ir.cpp` to make sure we clone them as needed. The core work of using the modifiers is in `emit.cpp`, where I basically just copy-pasted the existing logic that applied in the AST path (all the AST-related code there is dead, and we should clean it up soon). The big change that comes with this logic is that when dealing with a member function, the numbering of the argument used in the intrinsic definition string changes, so that `$0` refers to the base object (whereas before the base object was looked up via the base expression of a `MemberExpr` used for the function). This requires a bunch of the definitions in the library to be updated; hopefully I caught them all. For kicks, I've re-enabled a cross-compilation test just to confirm that we are generating valid SPIR-V for code that performs texture-fetch operations. I don't expect us to keep that test enabled as-is in the long term, though, because it would be much better to instead use render-test to do the same thing. Alas, beefing up the Vulkan support in render-test is an outstanding work item, and I didn't want to pollute this change with more work along those lines.
*	Remove non-IR codegen paths (#398)	Tim Foley	2018-02-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The basic change is simple: remove support for all code generation paths other than the IR. There is a lot of vestigial code left, but the main logic in `ast-legalize.` is gone. Doing this breaks a lot* of tests, for various reasons: - We can no longer guarantee exactly matching DXBC or SPIR-V output after things pass through out IR - Many builtins don't have matching versions defined for GLSL output via IR (even when they had versions defined via the earlier approach that worked with the AST) - A lot of code creates intermediate values of opaque types in the IR, which turn into opaque-type temporaries that aren't allowed (this breaks many GLSL tests, but also some HLSL) I implemented some small fixes for issues that I could get working in the time I had, but most of the above are larger than made sense to fix in this commit. For now I'm disabling the tests that cause problems, but we will need to make a concerted effort to get things working on this new substrate if we are going to make good on our goals.
*	Remove support for the -no-checking flag (#392)	Tim Foley	2018-02-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Remove support for the -no-checking flag Fixes #381 Fixes #383 Work on #382 - No longer expose flag through API (`SLANG_COMPILE_FLAG_NO_CHECKING`) and command-line (`-no-checking`) options - Remove all logic in `check.cpp` that was withholding diagnostics (including errors) when the no-checking mode was enabled - Remove `HiddenImplicitCastExpr`, which was only created to support no-checking mode (it represented an implicit cast that our checking through was needed, but couldn't emit because it might be wrong) - Remove logic for storing function bodies as raw token lists when checking is turned off. I'm leaving in the `UnparsedStmt` AST node in case we ever need/want to lazily parse and check function bodies down the line. - Remove a few of the code-generation paths we had to contend with, but keep the comment about them in place. - Remove GLSL-based tests that can't meaningfully work with the new approach. - Fix other tests that used a GLSL baseline so that their GLSL compiles with `-pass-through glslang` instead of invoking `slang` with the `-no-checking` flag. - Remove tests that were explicitly added to test the "rewriter + IR" path, since that is no longer supported. There is more cleanup that can be done here, now that we know that AST-based rewrite and IR will never co-exist, but it is probably easier to deal with that as part of removing the AST-based rewrite path. We've lost some test coverage here, but actually not too much if we consider that we are dropping GLSL input anyway. * Fixup: test runner was mis-counting ignored tests * Fixup: turn on dumping on test failure under Travis * Fixup: enable extensions in Linux build of glslang
*	Initial work on getting render-test to support vulkan (#391)	Tim Foley	2018-02-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Basic fixes to gets some Vulkan GLSL out of the IR path We haven't been paying much attention to the Vulkan output from the IR path, but that needs to change ASAP. This commit really just implements quick fixes, without concern for whether they are a good fit in the long term. - Add some more mappings from D3D `SV_` semantics to built-in GLSL variables, and stop redeclaring those built-in variables in our output GLSL. - Add custom output logic for HLSL `StructuredBuffer<T>` types, so that they emit as `buffer` declarations with an unsized array inside. This has some real limitations: - What if the user passes the type into a function? The parameter should be typed as an (unsized) array, and not a buffer. - What happens if we have an array of structured buffers? We need to declare an array of blocks (which GLSL allows), but this changes the GLSL we should emit when indexing. - Customize the way that we emit entry point attributes (e.g., `[numthread(...)]`) to also support outputting equivalent GLSL `layout` qualifiers. In many of these cases, a better fix might involve doing more of this work in the IR as part of legalization (e.g., we already have a pass that deals with varying input/output for GLSL, so that should probalby be responsible for swapping the `SV_` to `gl_`, especially in cases where the types don't match perfectly across langauges). * Start adding Vulkan support to render-test - Add both Vulkan and D3D12 as nominally supported back-ends - Add a git submodule to pull in the Vulkan SDK dependencies - I don't want our users to have to install it manually, since the SDK is huge - Checking in the binaries to our main repository seems like a bad idea, but my hope is that we can prune the bloat using a subodule with the `shallow` cloning option - Implement enough logic for the Vulkan back-end to get a single test passing on Vulkan * Fix warning * Fixup: disable new compute tests for Linux * Fixup: ignore Vulkan tests on AppVeyor * Dynamically load Vulkan implementation Rather than statically link to the Vulkan library, we will dynamically load all of the required functions. This removes the need to have the stub libs involved at all. * Remove vulkan submodule I had set up a `vulkan` submodule to pull in the headers and stub libs, but now that we are going to dynamically load all the symbols anyway, the stub lib binaries aren't needed and we can just commit the headers. * Add Vulkan headers to external/
*	Improvements and bug fixes for global type parameters	Yong He	2018-01-21
\| \| \| \| \| \|	1. allow spReflection_FindTypeByName to accept arbitrary type expression string 2. allow const int generic value to be used as expression value, and as array size 3. various bug fixes in witness table specialization / function cloning during specializeIRForEntryPoint to avoid creating duplicate global values, not copying the right definition of a function from the other module, not cloning witness tables that are required by specializeGenerics etc.
*	cleanup unused code.	Yong He	2018-01-17
\|
*	All compiler fixes to get ir branch work with falcor feature demo.	Yong He	2018-01-17
\| \| \| \| \| \| \| \| \|	- support overloaded generic function. this involves adding a new expression type, `OverloadedExpr2` to hold the candidate expressions for the generic function decl being referenced. - make BitNot a normal IROp instead of an IRPseudoOp - make sure we clone the decorations of parameters when cloning ir functions - propagate geometry shader entry point attributes (`[maxvertexcount]` and `[instance]`) through HLSL emit - IR emit: handle geometry shader entry-point parameter decorations, such as 'triangle'. - IR emit: treat geometry shader stream output typed ir value as `should fold into use`.
*	Support nested generics	Yong He	2018-01-12
\| \| \| \|	fixes #362
*	Refactor substitution representation in DeclRefBase (#363)	Yong He	2018-01-12
\| \| \| \| \| \| \| \| \| \| \| \|	This commit changes the type of `DeclRefBase::substitutions` from `RefPtr<Substitutions>` to `SubstitutionSet`, which is a new type defined as following: ``` struct SubstitutionSet { RefPtr<GenericSubstitution> genericSubstitutions; RefPtr<ThisTypeSubstitution> thisTypeSubstitution; RefPtr<GlobalGenericParamSubstitution> globalGenParamSubstitutions; } ``` This change get rid of most helper functions to retreive the substitution of a certain type, as well as surgery operations to insert a `ThisTypeSubstitution` or `GlobalGenericTypeSubstittuion` at top or bottom of the substitution chain. It also simplies type comparison when certain type of substitution should not be considered as part of type definition.
*	Fix struct decl order again (#348)	Tim Foley	2018-01-03
\| \| \| \| \| \| \| \| \| \| \| \| \|	* Add core.natvis file to Slang DLL build It seems that when `slang.dll` gets loaded into a user's project, the debugger is able to pick up the custom visualizers implemented in `slang.natvis` (which is directly added to the DLL project) but not `core.natvis` (which is added to a static library project that the DLL project references). Adding `core.natvis` to the DLL project directly seems to resolve this and greatly improve the debugging experience when in user code. * Bug fix: emit type of CB before CB when using IR The problem here was the logic for emitting types used by an IR declaration before the declaration. I refactored it to share logic between variables with initializers and functions, but in doing so failed to have an ordinary variable (which includes constant buffers) ensure that its own type was emitted before the variable. This is a one-line fix.
*	Support nested generic types (e.g. L<T<S>>)	Yong He	2017-12-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	fixes #325 This commit includes following changes: 1. Including a default DeclaredSubtypeWitness argument when creating a default GenericSubstitution for a DeclRefType, so that the witness argument can be successfully replaced with an actual witness table after specialization. (check,cpp) 2. Not emitting full mangled name for struct field members. Since the declref of the member access instruction do not include necessary generic substitutions for its parent generic parameters, so the mangled names of the declaration site and use site mismatches. Instead we just emit the original name for struct fields. (emit.cpp) 3. Allow IRWitnessTable to represent a generic witness table for generic structs. Adds necessary fields to IRWitnessTable for generic specialization. For now, the user field of the IRUse is not used and is nullptr. (ir-inst.h) 4. Make IRProxyVal use an IRUse instead of an IRValue, so that an IRValue referenced by IRProxyVal (as a substitution argument) can be managed by the def-use chain for easy replacement. This is used for specializing witness tables. (ir.cpp, ir.h) 5. Add a `String dumpIRFunc(IRFunc)` function for debugging. 6. Add name mangling for generic / specialized witness tables (mangle.cpp) 7. improved natvis file for inspecting witness tables. 8. Add specialization of witness tables: 1) `findWitnessTable` will simply return the specialize IRInst for a generic witness table. 2) make `cloneSubstitutionArg` call `cloneValue` to clone the argument instead of calling `context->maybeCloneValue`, so we can make use of the cloned value lookup machanism to directly return the specialized witness table (which is done when we process the `specialize` instruction on the generic witness table before process the decl ref). 3) bug fix: the argument in ir.cpp:3338 should be `newArg` instead of `arg`. 4) add `specializeWitnessTable` function to specailize a generic witness table. It clones the witness table, and recursively calls `getSpecailizedFunc` for the witness table entries. 5) make `specailizeGenerics` function also handle the case when an operand of the `specialize` instruction is a witness table. We will call `specializeWitnessTable` here and replace the `specialize` instruction with the specialized witness table. The replacement mechanism based on IR def-use chain works here because we have already make IRProxyVal a part of the def-use chain. 9. Add two more test cases for nested generics with constraints. (generic-list.slang and generic-struct-with-constraint.slang)
*	Support generic `struct` types during IR-based emit	Tim Foley	2017-12-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes #318 Most of the required support was actually in place, so this is just a bunch of fixes: - Detect when we are in "full IR" mode, so that we can always emit `struct` declarations with their mangled named (which will produce different names for different specializations, since we emit decl-refs) - Carefully exclude builtin types from this for now. We'll need a more complete solution for mapping HLSL/Slang builtin types to their GLSL equivalents soon. - Skip emitting types referenced by generic IR functions, since they might not be usable. - Also fix things up so that we emit types used in the initializer for any global variables. - Fix bug in generic specialization where we specialize the same function more than once, with different type arguments. We were crashing on a `Dictionary::Add` call where the key already exists from a previous specialization attempt. - Fix name-mangling logic so that when outputting a possibly-specialized generic it looks for the outer-most `GenericSubstitution` rather than just the first one in the list. This is to handle the way that we insert other substitutions willy-nilly in places where they realistically don't belong. :( All of these changes together allow us to pass a slightly modified (more advanced) version of the test case posted to #318.
*	IR: fixes for subscript accessors (#322)	Tim Foley	2017-12-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* IR: fixes for subscript accessors Fixes #320 This is a bunch of fixes for handling of `__subscript` operations on builtin types (notably `RWStructuredBuffer` and `StructuredBuffer` at this point). - Automatically add a `GetterDecl` to any subscript decalratio was declithout any accessors. This avoids hitting a null- dereference in the emit logic. - Add a notion of a `RefAccessor` (declared with `ref`) as a peer to getters and setters. The idea is that a `ref` accessor returns a pointer to the element data, so that it can be used for both getting and setting values. This is closer to the behavior of `RWStructuredBuffer` element access in HLSL. - Fixes for dealing with "access chains" where there might be a combination of a subscript (where the is a `get` and `set` but no `ref`) and member access, so that we have to read the base value into a temp, modify it, and then write it back. - This logic is still a bit of a mess, so we will eventually want to take a more consistent pass over this to deal with how we "materialize" values for setters. - Update `RWStructuredBuffer` to have a `ref` accessor, and then fix up the IR tests to handle the new opcode that I added for it. - Note: I didn't handle this as an intrinsic simply because the `tests/ir/` tests aren't really set up to handle builtins with ugly mangled names. Fixup: type error in VM for buffer element ref I was using the result type of the op as the element type for computing the element address, but the result type is a pointer to the real element type. This caused test failures on 64-bit platforms, where the stride of the buffer in the `ir/factorial` test needs to be 4. The fix is to assume the result type is a pointer, and extract the pointed-to type out of that.
*	More fixes for Falcor IR support (#317)	Tim Foley	2017-12-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Fix: try to improve float literal formatting An earlier change switched to using the C++ stdlib to format floats, but I neglected to check that it would correctly print values that happen to be integral with a decimal place (e.g., print `1.0` instead of just `1`). That causes some obscure cases to fail (e.g., when you write `1.0.xxxx` in HLSL, and we print out `1.xxxx`). I've tried to resolve the issue by using the "fixed" output format, but that may also create problems for extremely large or small values (which really need to use scientific notation). However, this at least puts us back where we were before... * Add support for source locations to the IR - Add a `sourceLocation` field to all `IRValue`s - Add some logic to associate locations with operations while lowering (using a slightly "clever" RAII approach) - Make sure that when cloning instructions, we also clone the location - Make the emit logic use the existing support for source locations when emitting code from the IR A nice cleanup to this work would be to have the source locations for IR values not be hyper-specific about both line and column; it is probably enough to just emit on the correct line.
*	Fix up parameter block emit for mixed rewriter+IR (#316)	Tim Foley	2017-12-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Fix up parameter block emit for mixed rewriter+IR The basic problem here arose when a parameter block was declared in code that will be lowered via IR, but is referenced from user code that will use the AST-based lowering pass. The type legalization would split up the parameter block, and the AST would refer to the new variables, but it would fail to recognize when those variables represent constant buffers, and thus should get implicit-dereference behavior. The fix here was just to propagate type information through when creating new AST (pseudo-)expressions to represent IR declarations that were split. A more complete fix down the road should try to make sure that both the expression emit logic and the declaration-emit logic agree on whether a particular declaration of a parameter block or constant buffer needs to support the "implicit dereference" case or not. I'm leaving that to future work. With this change, two test cases that had been disabled now pass. * Fixup: don't do implicit deref for `ParameterBlock` values Right now the rules for `ParameterBlock` and all other uniform praameter groups needs to be different, but the `isImplicitBaseExpr` logic wasn't taking that into account.
*	Fix floating-point literal emit to be locale-independent. (#315)	Tim Foley	2017-12-19
\| \| \| \| \| \| \| \| \| \| \| \| \|	There was a bug that arose in the context of Falcor, because: - Slang uses `sprintf` to format floating-point values when outputting HLSL/GLSL source - Falcor (or a library it uses) does the equivalent of `setlocale(LC_ALL, "")` to set the global locale for the process based on the user's environment variables This led to a floating-point literal of `0.5` getting printed as `0,5f`, with a comma for the decimal point. This then gets consumed by `glslang` which (luckily for us) complains that `5f` is not a valid floating-point literal in their language (since it has neither decimal point nor exponent). The most expedient fix in this case was to switch from using C `sprintf` for formatting floating-point numbers over to using the C++ `<stdio>` implementation, which allows the locale to be set per-stream so that we don't have to rely on (or potentially disrupt) the global locale set by an application using Slang. Longer term if we have the time/resources to do the Right Thing, it would be best to implement our own printing logic for floating-point numbers, since we would eventually need/want to support arbitrary-precision numbers for literals.
*	Work on getting rewriter + IR playing nice together. (#314)	Tim Foley	2017-12-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Work on getting rewriter + IR playing nice together. There are a few different changes here, with the goal of improving the interaction between the "rewriter" code generation approach and the new IR and type legalization code. The main changes are: - Add a new pass that occurs before the AST legalization pass, which walks the (used) AST declarations and tries to discover (1) which declarations need to be specialized/lowered via the IR, and (2) which declarations need to be included in the resulting AST module. - AST-based legalization now uses the generated list when in "rewriter" mode, so that we should be working around issues that users were seeing with types not getting emitted. - TODO: we still need an equivalent fixup in the case of non-"rewriter" emit, so this may still be a problem for `.slang` files. - IR type legalization now precedes AST legalization, so that we can record information on how any IR global values got legalized (e.g., if they got split). Then AST legalization includes logic to reconstruct suitable tuple expressions to reference a split global. - When emitting using IR + AST, we walk all of the declarations that we decided belonged to the IR, but which were subsequently referenced in the AST, to make sure they get output (this would include `struct` types that are declared in a file compiled via IR, but never used in IR-based code). The rewriter+IR use case still doesn't quite work, but the logic for walking the AST in a pre-pass ends up being needed/useful to fix some pure rewriter bugs, so I'm getting this checked in sooner rather than later. * Fixup: walk arguments to generic declaration reference The gotcha here is that the code for walking the AST would walk a line of code like: SomeType a; and know to traverse the declaration of `SomeType`, but if it saw a line of code like: ParameterBlock<SomeType> b; it would traverse the declaration of `ParameterBlock`, but fail to visit that of `SomeType`.
*	More fixups for Vulkan parameter block bindings (#309)	Tim Foley	2017-12-14
\| \| \| \| \| \| \| \| \|	I'm adding a small cross-compilation test to try to make sure that we are testing the binding generation for GLSL output. We probably still need a more complex test that uses multiple blocks, plus variables not in a block. The big changes here are: - Change the `containerTypeLayout` field to a `containerVarLayout` in the `ParameterGroupTypeLayout`, so that we can store the base offsets for the fields in a uniform fashion (even though these will all be zero). - Switch the emit logic to carefully use either the container or element var layout depending on what they are emitting bindings for. This involved adding something akin to the "reflection path" notion that Falcor has to use, but only for the emit step.
*	Fix parameter block binding for Vulkan (#308)	Tim Foley	2017-12-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes #307 This ends up being a major overhaul over how type layout computation is structured and exposed. The big problems all arise around cases where both the "container" for a parameter block or CB, and the "element" type both use the same kind of resource. E.g., if you define a CB with a texture in it, then in Vulkan both the CB and the texture use the same kind of resource, and so if you query the CB's resource usage it will just tell you it uses two descriptor-table slots, but nothing more than that. Similar confusion still arises in the HLSL case, when a CB with a texture in it reports its parameter category as "mixed" so that a user might query for a category they didn't mean to. There were also cases in the existing code where a parameter block might expose both a register-space usage and another concrete resource type, which isn't right. The most important changes here are: - A `ParameterGroupTypeLayout` now has a more refined internal structure, consisting of: - A `containerTypeLayout`, which represents the resource usage of the buffer/block itself (e.g., if a constant buffer had to be allocated) - An `elementVarLayout` which stores the offsets that need to be applied to get from the `VarLayout` for an instance of this parameter-group type to the offsets of its elements. The `TypeLayout` for this variable layout should be the "raw" type of the block/CB element. - The `offsetElementTypeLayout` (formerly just `elementTypeLayout`) which represents the element type, but in the case of a `struct` element type, will have fields offset similar to the `elementVarLayout`. This is what all the old code used to use, so we need to keep it for compatibility. - When doing reflection on a `ParameterGroupTypeLayout`, we now only report the resource usage of the `containerTypeLayout`. This is technically a potentially breaking change in the public API, but I don't think Falcor will mind, since they actually want something closer to this behavior. - Add a new public API for querying the element variable layout of a parameter block of constant buffer. This could be used by savvy applications to fold the handling of CB element offsetting into some notion of a "reflection path." This would be required for applications that want to handle CBs or parameter blocks where the element type is not a `struct` type. - Remove old logic for applying an offset when creating a type layout for constant buffer element, and instead perform offsetting more uniformly later, by constructing the `offsetElementTypeLayout` from the `rawElementTypeLayout`. This is useful both because we want to keep both (the "raw" type layout becomes the type layout of the `elementVarLayout`), and also because we can decide later whether we even want to allocate a CB register for a buffer, based on whether it actually contains any uniform data. - Fix cases where we might end up with a parameter block type reporting both that it uses a whole register space (and thus should not expose the resource usage of the container/element type) and a constant-buffer register/slot. The latter should be hidden inside the regsiter space. - Clean up the `spReflectionParameter_GetBinding{Index,Space}` functions to just route to `spReflectionVariableLayout_Get{Offset,Space}`, using the "default" category of the parameter - Try to make the `GetSpace` query take into account cases where a variable also has an explicit `RegisterSpace` allocation. - This probably still needs some cleanup, since ideally we'd just move things into the `space` field on the `ReosurceInfo` and have an invariant that a variable either has a `RegisterSpace` allocation, or it has other resource infos, but never both... - Add some ad-hoc logic so that if the user queries for a binding index/space using a parameter category that doesn't actually apply (e.g., they query for a D3D `t` register when using Vulkan), we can optionally remap it to the resource type they "probably" meant. This is a mess of Do What I Mean code, but it is also what our users want right now. - Fix various bits of emit logic so that if a parameter block has a register space/set allocated to it, we properly output that as part of the binding information for it. - This is another thing that might be cleaned up if we rationale the way that things get split during legalization. - Add a GLSL case for emitting a parameter block variable as a `cbuffer`.
*	Cleanups to `ParameterBlock<T>` behavior. (#304)	Tim Foley	2017-12-08
\| \| \| \| \| \| \| \| \| \|	* Cleanups to `ParameterBlock<T>` behavior. These add some more realistic tests using the `ParameterBlock<T>` support, and show that it can work with the "rewriter" mode. Unfortunately, this code does not currently work with the rewriter + the IR at once. That will need to be fixed in a follow-on change, because I now see that the root problem is pretty ugly. * cleanup
*	Make AST and IR share type legalization code (#303)	Tim Foley	2017-12-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Make AST and IR share type legalization code A previous change already made it so that the AST-to-AST lowering/legalization pass could work together with IR-based lowering of `import`ed code, but that change didn't take into account the case where a function written in the AST needed to call an IR function and pass in a type that required legalization. Both the IR-based and AST-based passes had their own approaches to type legalization, that mostly agreed on the desired output, but they ended up creating their own representations for legalized types which would mean that for a function call the caller and callee might end up legalizing the parameter list to use different types. This change tries to fix this issue (and adds a new test case that relies on the fix) by massively overhauling the AST-based legalization pass so that it uses the same type legalization code as the IR. The shared code has been moved out into `legalize-types.{h,cpp}`. Notes: - I eliminated the `FilteredTupleType` type, since it was starting to cause code duplication in a lot of places. Instead, type legalization just creates new `struct` types to represent the result of filtering. - One big consequence of this is that the `LegalType::pair` case needs to remember for each field in the original type which field (if any) in the new `struct` type it maps to - A big source of complexity (and probably bugs) in this code is trying to figure out how to parent these new `struct` definitions effectively. A good follow-on change would be something that outputs declarations on-demand during the AST emit logic (as we do for the IR), just to avoid some of this song and dance. - The old AST type legalization had a notion of both a "tuple" type and a "varying tuple" type. The "tuple" case was quite complex, and combined behavior currently handled by `LegalType::pair` (for splitting into ordinary and special sides) and `LegalType::tuple` (for holding multiple distinct elements to represent the fields of an aggregate). The "varying tuple" case was closer to `LegalType::tuple`, so I tried to just re-use the existing logic for that too. The one place this potentially gets messy is in `reifyTuple()`. - The messiest bit of handling the "varying tuple" concept (which is used for GLSL shader inputs/outputs since they have to be scalarized) is that when passing them as function arguments we need to reify the tuple back into a structured value. Because the `LegalExpr` hierarchy doesn't have type information, but constructing a value of the "original" type requires such information, things get a little messy. - I did not try to deal with any of the logic related to handling system inputs/outputs for cross-compilation purposes. Of course, the long-term goal is that any actual cross-compilation is handled via the IR, but this change can't afford to break the AST-based path just yet. As a result, there is still quite a bit of complexity in the handling of assignment, to deal with cases where "fixups" are required. * fixup: bad code in macro, not caught by Visual Studio compiler * fixup: more stuff missed by VS compiler * fixup: VS continutes to miss stuff in UNREACHABLE_RETURN
*	Enable HLSL/GLSL "rewrite" + IR-based Slang codegen (#300)	Tim Foley	2017-11-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The big picture here is that the AST-to-AST pass in `ast-legalize` will now detect when a declaration being referenced comes from an `import`ed module, and (if IR codegen is enabled), it will trigger cloning of the IR for the chosen symbol into an IR module that will sit alongside the legalized AST. Then, during HLSL/GLSL code emit, we emit all the IR-based code first, and then the AST-based code. Whenever the AST code references a symbol that was lowered via IR (we keep track of these) we emit the mangled name of the IR symbol. Notes/details: - A lot of the logic for cloning IR symbols referenced by the AST matches the same logic that would clone them for completely IR-based codegen, so I tried to hoist out the common logic and share it (e.g., so that we apply the same guaranteed transformations in both cases). This required basically rewriting the logic in `emit.cpp` that decomposed the various cases. - There is a new compute test case added to test this functionality. `tests/compute/rewriter.hlsl` confirms that we can use the `-no-checking` mode for the HLSL code, but still make use of a library of Slang code that employs generics, etc. - Adding this test case required adding a new compute test mode that invokes `render-test` with the `-hlsl-rewrite` flag. - It turns out that the existing `tests/render/cross-compile0.hlsl` test should have been using this functionality already. It was opting into the use of the IR via `-use-ir`, and the `render-test` application already tries to set `-no-checking` for non-Slang input languages by default. Fixing the code path this test triggers means that it is now a second test of rewriter+IR codegen. - The `translateDeclRef` logic in `ast-legalize.cpp` seemed sloppy in places, and would potentially clone declarations, when declaration references were desired. I tried to clean a bit of this up, so some call sites are now changed. - This change tries to clean up some work around cloning of global values - All global value kinds (not just functions) now go through the logic of trying to pick a "best" definition, so that they can be used when we are linking multiple modules - The logic for registering cloned values has been unified a bit, so that clients always pass in an `IROriginalValuesForClone` that either wraps a single value (maybe just null), or an `IRSpecSymbol` that gives a list of values to regsiter the new value as a clone for. - I made one piece of code that was cloning witness tables as part of generic specializations not* register a clone. I think this is correct because we may specialize the same generic multiple ways, so registering any values we clone is not the right idea, but I might be missing something... - I also reorganized this logic so that it would be easier to clone a global value when we only know its mangled name (which is the case when it is the AST that triggers cloning) - I made sure that when loading a module via `import`, the translation unit for the new module copies the `-use-ir` flag from the overall compile request, if it is present (otherwise we wouldn't generate IR for loaded modules at all... oops). - Note that `getSpecializedGlobalValueForDeclRef()`, which is the main routine used by the AST legalization to trigger cloning of an IR value does not currently handle declaration references that require specialization. - This change does not deal with trying to unify the type legalization logic between the AST-to-AST rewriter and the IR-based codegen, so if you call an imported function with types that require legalization, Bad Things are expected to happen right now.
*	Cleanups (#298)	Tim Foley	2017-11-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Rename `lower.{h,cpp}` to `ast-legalize.{h,cpp}` This pass isn't really performing lowering akin to `lower-to-ir.{h,cpp}` so the file name is misleading. By renaming this pass we emphasize its role as an AST-related pass. Also update the comment at the top of `ast-legalize.h` to reflect the intended purpose of this pass in a world where we have the IR up and running. * Allow `import` as an alias for `__import` The use of double underscores to mark our new syntax has so far had two purposes: 1. It helps identify syntax that isn't meant to be exposed to users in its current form (e.g., `__generic` gets a double underscore because we want users to have a more pleasant surface syntax for generics that they write). This rationale doesn't apply to `__import`, which is a major language feature that users need to interact with. 2. It helps avoid the problem where the compiler treats something as a keyword that isn't supposed to be reserved in HLSL/GLSL and so causes existing user code to fail to parse (e.g., when the user tries to write a function called `import`). This no longer matters because we look up almost all of our keywords using the existing lexical scoping in the language (so the user can shadow almost any keyword with a local declaration). So, neither of the original two reasons applies to `__import`, and it makes sense to expose it as `import`. Doing so is a one-line change.
*	Fix emitting of loop attributes for HLSL pass-through (#296)	Tim Foley	2017-11-22
\| \| \| \| \|	Fixes #295. The code previously had a white list of attributes that it passed through, implemented in `emit.cpp` in an ad hoc fashion. The fix here is to just pass through whatever attributes the user wrote, and then let the downstream compiler diagnose if any of them are errorneous.
*	IR: support global variable with initializers (#294)	Tim Foley	2017-11-20
\| \| \| \| \| \| \| \| \| \|	The big change here is that the ability to contain basic blocks with instructions in them has been hoisted from `IRFunc` into a new base type `IRGlobalValueWithCode` shared with `IRGlobalVar`. The basic blocks of a global variable define initialization logic for it; they can be looked at like a function that returns the initial value. Places in the IR that used to assume functions contain all the code need to be updated, but so far I only handled the cloning step. The emit logic currently handles an initializer for a global variable by outputting its logic as a separate function, and then having the variable call that function to initialize itself. This should be cleaned up over time so that we generate an ordinary expression whenever possible. I also made the emit logic correctly label any global variable without a layout (that is, any that don't represent a shader parameter) as `static` so that the downstream HLSL compiler sees them as variables rather than parameters.
*	IR: Add support for `out` and `inout` parameters (#289)	Tim Foley	2017-11-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	These were already being handled a little bit, by lowering an `out T` or `inout T` function parameter in the AST to a function parameter with type `T*` in the IR, and then emiting explicit loads/stores. The HLSL emit logic, however, couldn't tell the difference between an `out` parameter, an `inout`, or a true pointer (if we ever needed to support them...). The intention (not fully implemented) was that we'd use a hierarchy of types rooted at `PtrTypeBase`: - `PtrTypeBase` - `Ptr`: "real" pointers in the C/C++ sense - `OutTypeBase`: pointers used to represent by-reference parameter passing - `OutType`: IR level type for an `out` parameter - `InOutType`: IR level type for an `inout` or `in out` parameter Actually implementing this involved: - Adding a bit more flexibility to the `Session::getPtrType` logic to allow for creating any of the concrete types above - Making the `lower-to-ir` logic create the right type for function parameters (instead of just using `PtrType`) - Making the HLSL emit logic check for the `OutType` and `InOutType` cases rather than just `PtrType` - Changing a bunch of small places in the code so that they use `PtrTypeBase` instead of `PtrType` when they should handle any of the above cases, and also make a few places check for `OutTypeBase` instead of `PtrType` or `PtrTypeBase`, when they are really trying to capture by-reference parameters - Add a test case that uses all of the different cases we care about (without these fixes, this test case generates errors from fxc because of variables being used before being initialized, becaues parameters get declared `out` that should be `inout`). A minor point here is that we are playing a bit fast and loose right now because the IR does not actually enforce any type checks. From the standpoint of the front end, `Ptr<T>`, `Out<T>`, and `InOut<T>` are all unrelated types (each is just a `struct` declared in `core.meta.slang`), but this doesn't really matter because none of these are types our current users are explicitly using. In the IR it makes perfect sense to allow `Out<T>` or `InOut<T>` as the operand of a `load` or `store` instruction (and ditto for `getFieldAddr`, etc.) - there instructions just apply to any `PtrTypeBase`. The place where this potentially gets tricky is whether an `Out<T>` can be used where a `Ptr<T>` is expected, or vice vers (e.g., can I just pass my local variable's pointer directly to an `Out<T>` function parameter? I'm going to ignore these issues for now, since the code currently works for our test case.
*	Add support for global generic parameters (#285)	Yong He	2017-11-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add support for global generic parameters (In-progress work) This commit include: 1. Update Slang API to allow specification of generic type arguments in an `EntryPointRequest` 2. Add parsing of `__generic_param` construct, which becomes a GlobalGenericParamDecl, contains members of `GenericTypeConstraintDecl`. 3. Semantics checking will check whether the provided type arguments conform to the interfaces as defined by the generic parameter, and store SubtypeWitness values in the EntryPointRequest, which will be used by `specializeIRForEntryPoint` when generating final IR. 4. Add a new type of substitution - `GlobalGenericParamSubstitution` for subsittuting references to `__generic_param` decls or to its member `GenericTypeConsraintDecl` with the actual type argument or witness tables. 5. Update `IRSpecContext` to apply `GlobalGenericParamSubstitution` when specializing the IR for an EntryPointRequest. 6. Update `render-test` to take additional `type` inputs, which specifies the type arguments to substitute into the global `__generic_param` types. This commit does not include ProgramLayout specialization. * IR: pass through `[unroll]` attribute (#284) The initial lowering was adding an `IRLoopControlDecoration` to the instruction at the head of a loop, but this was getting dropped when the IR gets cloned for a particular entry point. The fix was simply to add a case for loop-control decorations to `cloneDecoration`. * fix warnings * IR: support `CompileTimeForStmt` (#286) This statement type is a bit of a hack, to support loops that must be unrolled. The AST-to-AST pass handles them by cloning the AST for the loop body N times, and it was easy enough to do the same thing for the IR: emit the instructions for the body N times. The only thing that requires a bit of care is that now we might see the same variable declarations multiple times, so we need to play it safe and overwrite existing entries in our map from declarations to their IR values. Of course a better answer long-term would be to do the actual unrolling in the IR. This is especially true because we might some day want to support compile-time/must-unroll loops in functions, where the loop counter comes in as a parameter (but must still be compile-time-constant at every call site). * Add support for global generic parameters (In-progress work) This commit include: 1. Update Slang API to allow specification of generic type arguments in an `EntryPointRequest` 2. Add parsing of `__generic_param` construct, which becomes a GlobalGenericParamDecl, contains members of `GenericTypeConstraintDecl`. 3. Semantics checking will check whether the provided type arguments conform to the interfaces as defined by the generic parameter, and store SubtypeWitness values in the EntryPointRequest, which will be used by `specializeIRForEntryPoint` when generating final IR. 4. Add a new type of substitution - `GlobalGenericParamSubstitution` for subsittuting references to `__generic_param` decls or to its member `GenericTypeConsraintDecl` with the actual type argument or witness tables. 5. Update `IRSpecContext` to apply `GlobalGenericParamSubstitution` when specializing the IR for an EntryPointRequest. 6. Update `render-test` to take additional `type` inputs, which specifies the type arguments to substitute into the global `__generic_param` types. progress on parameter binding * Add a more contrived test case for specializing parameter bindings * update render-test to align buffers to 256 bytes (to get rid of D3D complains on minimal buffer size). * adding one more test case for parameter binding specialization. * Cleanup according to @tfoleyNV 's suggestions. * fix a bug introduced in the cleanup
*	Revise type legalization so it can handle constant buffers (#282)	Tim Foley	2017-11-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Revise type legalization so it can handle constant buffers The existing legalization approach with "tuples" can handle scalarizing a `struct` type with resource-type fields in it, but it had several big gaps. The most notable is that given a type that mixes uniform and resource fields, we can't just blindly scalarize things: ``` struct P { float4 a; float4 b; Texture2D t; }; cbuffer C { P gParam[8]; }; ``` The existing code was completely ignoring the declaration of `gParam` inside `C`, but even if we fixed that issue, we'd get something like: ``` cbuffer C { float4 gParam_a[8]; float4 gParam_b[8]; }; Texture2D gParam_t[8]; ``` In this case we've completely changed the layout of the uniform buffer, by switching from AOS to SOA. Even if we could get the type layout logic and the IR to agree on this, it would be a surprise to users, and "principle of least surprise" should be a big deal on a project with as many moving parts as ours. The right thing to do is to have legalization create a "stripped" version of the original type `P` and use that: ``` struct P_stripped { float4 a; float4 b; }; cbuffer C { P_stripped gParam[8]; }; Texture2D gParam_t[8]; ``` Then at a call site, this: ``` foo(gParam); ``` becomes: ``` foo(gParam, gParam_t); ``` This is exactly how the current AST-to-AST legalization handles mixed uniform and resource types, but the way it does it involves some annoying kludges: - That pass has a notion of a "tuple" similar to our legalization, but every tuple has an optional "primary" entry for all the uniform data, plus tuple elements for the resources, and a given field may be represented on one side, the other, or both. It makes the code for handling tuples very messy. - That pass does the "stripping" of types by actually marking up the AST declarations (this is okay because it is constructing a new AST as it goes), so that when they get emitted certain fields don't actually show up. That is, we fix the problem with type `P` by actually modifying the user's declaration of `P`. That seems out of bounds for the IR. This change fixes the problem in our IR type legalization while trying to avoid the problems of the AST-to-AST pass by using two new ideas: 1. We add a new case for `LegalType` (and `LegalVal`) that is a "pair" type, where a pair consists of both an "ordinary" type (for uniform data) and a "special" type (for resource data). E.g., after legalization, the type for `C` (which can be over-simplified to `ConstantBuffer<P>` for our purposes), will be a `LegalType::pair` where the ordinary side is `ConstantBuffer<P_stripped>` and the special side is a tuple containing the `Texture2D` field. 2. We add a new (and annoyingly hacky) AST-level type called `FilteredTupleType` which is semantically a sort of tuple type (it holds a list of elements, and the elements have their own types), but which remembers an "original type" that it was created from, and for each element remembers the field of the original type that it corresponds to. This is used to construct a type like `P_stripped` as an actual AST-level structural type. The core logic for legalizing an aggregate type had to get more complicated just because of the new pair case, so there is now a `TupleTypeBuilder` that asists with taking an aggregate type, processing its fields, and then picking the right `LegalType` representation for the result. Other smaller changes: - Made the legalization logic actually legalize `PtrType<T>`. E.g., if `T` legalizes to a tuple, we need to construct a tuple of pointer types. The same exact thing needs to be applied to arrays, and any other generic type that should "distribute over" pairs/tuples. - Made the legalization logic actually legalize `ConstantBuffer<T>` and similar. The basic idea there is if `T` maps to a pair, we wrap `ConstantBuffer<...>` around the ordinary side, and `implicitDeref` around the special side. - Removed a bunch of `#ifdef`ed-out code from the end of `ir-legalize-types.cpp`. That was code from my first attempt at legalization that failed miserably (trying to do it via local changes and a work list instead of a global rewrite pass), but it had some code I wanted to reference when writing the version that actually got checked in (should have deleted the code earlier, though). - Added a bunch of cases for `LegalType::none` (and the `LegalVal` equivalent) that helped simplify the logic fo the `pair` case by allowing me to always dispatch to both the "ordinary" and "special" sides, even if they might not actually be present. - Renamed `TupleType` and `TupleVal` over to `TuplePseudoType` and `TuplePseudoval` to recognize the fact that we might actually need/want real tuples in the type system, to go along with these fake ones (that need to be optimized away). The biggest doubt I have about this change is the whole `FilteredTupleType` thing; it seems like an obviously contrived type to add to the front-end type system that really only solves IR-level problems. A cleaner approach might have been to just add a plain old `TupleType` to the front-end type system (and initially I started with that), and then have yet another `LegalType`/`LegalVal` case that handles mapping from the fields of the original type to the numbered tuple elements. I expect we'll actually want to make that change in the future (especially if we ever add true tuples to the front-end), but for right now I let myself be swayed by the desire to have these stripped/filtered types get names that explain their provenance ("where they came from") to make our output code more debuggable. The way I've done it is probably overkill, though, and we need a much more complete effort on the readability and debuggability of our output before anything like that is worth worrying about. * Fixup: typo * Fixup: fix output of "non-mangled" names for test cases - Make sure to attach high-level decls to variables created as part of type legalization - Also, try to share more of the code between the different cases of variables - Fix up `parameter-blocks` test case that was passing `-no-mangle` but expecting mangled names in the output - Fix up `multiple-parameter-blocks` to not rely on `-no-mangle` for now, because it would lead to two global variables with the same name (need to fix that underlying issue eventually). - Also fix name generation logic so that we only use "original" names (from high-level decls) specifically when the `-no-mangle` flag is on, and otherwise use IR-level names. * Fix: handle constant buffers better in render-test - Don't request both CB and SRV usage for buffers, since that is illegal - Also, don't try to create an SRV when user requested a CB (since the required usage flag won't be there) - Record the input buffer type on the `D3DBinding` for a buffer, and use that to tell us when to bind a CB instead of SRV/UAV - Fix expected output for `cbuffer-legalize` test now that we are actually feeding it correct cbuffer dta.
*	Various IR fixes for Falcor (#280)	Tim Foley	2017-11-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Change function mangling so we use `p<parameterCount>p` instead of just `p<parameterCount>` to avoid the parameter count running into digits at the start of a mangled type name and tripping up the un-mangling logic. - We really need to step back at some point and define our mangling scheme a bit more carefully, especially if we are going to keep going down this road where un-mangling things is important for generating HLSL output. - Also allow the unmangling logic to unmangle a few more cases of generic parameters, so that it can skip over them to get to the parameter count of the underlying function. - Add a notion of an `unreachable` instruction to the IR, and emit it as the terminator (if needed) at the end of the last block for a function with a non-void return type. - This does not implement any logic to emit a diagnostic if the `unreachable` turns out to be potentially reachable - Fix a bug in IR specialization of generics where we can't create two different specializations of the same function, because both get registered in the same hash map With all these fixes, testing in Falcor modified to use the full Slang compiler and IR for all HLSL/Slang: - The UI and text rendering shaders yield HLSL that compiles without error; no idea if they actually work - The ModelViewer shaders yield HLSL, but there are some issues (looks like type legalization isn't applying to stuff inside constant buffers)
*	IR: add support for `switch` statements (#278)	Tim Foley	2017-11-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* IR: add support for `switch` statements Fixes #273 This is just something we hadn't gotten to yet on the IR. The actual design of the instruction is unsurprising (once you take into consideration the requirement for structured control flow). A `switch` instruction takes the form: switch <condition> <breakLabel> <defaultLabel> [<caseVal> <caseLabel>]* Where `condition` is the value to switch on, `breakLabel` is the "join point" after the original `switch` statement, `defaultLabel` is where to go if the value doesn't match any case, and each pair of `caseVal` and `caseLabel` is what to do on a particular value. It is required that `caseVal` be a literal, but this isn't currently being enforced in the IR (the front-end should be making a check and constant-folding the case labels). For structured control flow, we also make the assumption that the cases are in order: cases with the same label must be grouped together, and any case that falls through to another must come right before it. Given this representation, the emit logic can reconstruct a `switch` statement with relative ease, given the machinery we already have. It makes sure to group together case values with the same label (again, assuming they are contiguous), and will insert the `default:` label in with whatever group it belongs to. Actually emitting code for a `switch` statement seems superficially simple, until you realize that a complete implementation needs to handle stuff like "Duff's Device." The current implementation makes the assumption that all `case` and `default` statements are directly nested under a `switch`, and that there is no way for control flow to enter a case except by the `switch` itself, or fall-through. In order to facilitate the grouping of cases in the IR-to-HLSL emit logic, the AST-to-IR lowering logic tries to detect cases where there are multiple `case`s in a row, and emit only a single label for them. One big/annoying gotcha is that we don't properly handle the case where a `default:` case has a non-trivial fall-throguh to another case. That seems fine for now since HLSL doesn't support fall-through anyway, but it probably needs to get detected somewhere in the Slang compiler (e.g., maybe we should add a diagnostic pass over the IR that detects target-specific problems like that and emits errors). * IR: Add support for empty statements. - Add empty statement in `lower-to-ir.cpp` - Go ahead and eliminate the statement catch-all and explicitly enumerate the cases we don't support - Fix up parser for block statements so that it doesn't leave a null statement as the body of a `{}` - Add an empty statement to one of the cases for the `switch` test, to ensure we are testing empty statements
*	Parameter block work (#276)	Tim Foley	2017-11-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Don't auto-enable IR use for compute tests The `COMPARE_COMPUTE` and `COMPARE_RENDER_COMPUTE` test fixtures were set up to always enable the `-use-ir` flag on Slang, which precludes having any tests that confirm functionality on the old non-IR path (which is still required by our main customer). This change adds the `-xslang -use-ir` flags explicitly to any compute test cases that left them out, and makes the fixture no longer add it by default. * Continue building out parameter block support The initial front-end logic for parameter blocks was already added, but they are still missing a bunch of functionality. This change addresses some of the known issues: - Bug fix: don't try to emit HLSL `register` bindings for variables that consume whole register spaces/sets - Overhaul type layout logic so that it can make decisions based on a given code generation target (currently passed in as a `TargetRequest`), which allows us to decide whether or not a parameter block should get its own register set on a per-target basis. - Always use a register space/set for Vulkan - Never use a register space/set for HLSL SM 5.0 and lower - By default, don't use register spaces/sets for HLSL output - Add a command-line flag and some "target flags" to enable register-space usage for D3D targets - Hackily add initial support for parameter blocks in the AST-to-AST path - This just blindly lowers `ParameterBlock<T>` to `T`, which shouldn't quite work - A more complete overhaul will probably need to wait until the AST-to-AST legalization is changed to use the `LegalType`s from the IR legalization pass. - Add a compute-based test case to actually run code using parameter blocks - This file runs test cases both with and without the IR
*	IR: Add support for break and continue statements (#272)	Tim Foley	2017-11-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* IR: Add support for break and continue statements The front-end is already doing the work of connecting this statements to their "parent" statement, so we just needed to build a map from the `Stmt` to the corresponding `IRBlock`s to use for break/continue when outputting any loop statement, and then look up in the map for the branch target when outputting a break/continue. When we get around to adding `switch` statements, the same pattern should work just fine. I also added support for `do/while` statements in IR codegen, and made sure to exercise those in one of the test cases I added. There is also an unrelated IR codegen fix for when there is a "bound subscript" on the RHS of an assignment. * IR: fix handling of do/while and continue Thanks to @csyonghe for pointing out my mistake in the earlier commit. I implemented `continue` for `do/while` loops incorrectly, branching to the head of the loop instead of the loop test. I'll try to blame this mistake on the fact that I never use `do/while` loops because I think they are awful. :) The fix for that issue wasn't too bad (see `lower-to-ir.cpp`) but it surfaces a much more serious issue: I wasn't actually implementing `continue` correctly at all when it comes to generating HLSL/GLSL from the IR (I can't easily make an excuse for that one). The basic issue at the heart of this is that given an input statement like: ``` for(int ii = 0; ii < N; ii = doSomething(ii)) { ... } ``` The continue clause (`ii = doSomething(ii)`) could expand into many instructions (across multiple blocks, if we inline), and there is in general no guarantee when we are done that we can package up that code as an expression and spit out a new `for` loop (the same basic argument applies to a `do { ... } while(someComplexExpression())`. So, if we assume that in general we have to generate a full statement for the `continue` clause, what can we emit? - We could try to "outline" the continue code into its own function, so that we can call it from an expression. That could work, but has high implementation complexity. - We could introduce additional `bool` variables for control flow, outputting something like: ``` bool useContinueBlock = false; for(;;) { if(useContinueBlock) { <CONTINUE CODE>; } useContinueBlock = true; <LOOP TEST> <LOOP BODY> } ``` This works but user might balk at the extra variable we introduce. - We could duplicate the code at each continue site. That is, we emit the loop as: ``` for(;;) { <LOOP TEST> <LOOP BODY> <CONTINUE CODE> } ``` but then whenever we'd like to emit `continue;` we instead emit `{ <CONTINUE CODE>; continue; }`. This doesn't introduce any extra variables, but it causes code duplication (limited, if we don't have too many `continue` sites, and the continue clause is small - which are the common cases). When I was initially working on the IR codegen I picked that last option just because it is what `fxc` seems to do, but I neglected to actually implement the special-case codegen for a `continue` instruction. This change addresses that (see `emit.cpp`). Finally, once things were fixed the `continue` test case produced the results Yong told me to expect, but it also produced a warning from the downstream HLSL compiler ("hey, your loop doesn't ever actually loop!"), so I reworked the test back to one that actually loops (but still tests `continue`). As a final aside in this essay of a commit message: the current IR representation of control flow uses special-case instructions for various cases of unconditional branch (and two variations on `if`), but these are not strictly necessary, and a future change will hopefully clean it up. The biggest catch in doing that is that it will require the IR->source codegen to carefully track which blocks represent which kinds of branch targets in context (e.g., you can't assume that a `continue` that nees the special handling above will appear as a distinct kind of instruction).
*	IR: add support for `discard` statement (#261)	Tim Foley	2017-11-07
\| \| \| \| \| \| \| \| \| \|	- Add definition of `discard` instruction - A `discard` is a terminator instruction, just like `returnVoid` - Lower `DiscardStmt` in AST to a `discard` instruction in the IR - Emit `discard` instruction as a `discard;` statement when emitting HLSL/GLSL - Add a test case using the "graphics compute" mode that tests discard. The test writes to one entry in a UAV before doing a conditional (always true at runtime) discard, and then writes to another entry; we expect to see the results of the first write, but not the second.
*	Support generic interface methods (#251)	Yong He	2017-11-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* improve diagnostic messages and prevent fatal errors from crashing the compiler. * fix top level exception catching. * spelling fix * change wording of invalidSwizzleExpr diagnostic * add speculative GenericsApp expr parsing * add new test case of cascading generics call. * Fixing bugs in compiling cascaded generic function calls. Add implementation of DeclaredSubTypeWitness::SubstituteImpl() This is not needed by the type checker, but needed by IR specialization. When input source contains cascading generic function call, the arguments to `specialize` instruction is currently represented as a substitution. The arg values of this subsittution can be a `DeclaredSubTypeWitness` when a generic function uses one of its generic parameter to specialize another generic function. When the top level generics function is being specialized, this substitution argument, which is a `DeclaredSubTypeWitness`, needs to be substituted with the witness that used to specialize the top level function in the specialized specialize instruction as well. * add a test case for cascading generic function call. * parser bug fix * fixes #255 * add test case for issue #255 * Generate missing `specialize` instruction when calling a generic method from an interface constraint. When calling a generic method via an interface, we should be generating the following ir: ... f = lookup_interface_method(...) f_s = specailize(f, declRef) ... This commit fixes this `emitFuncRef` function to emit the needed `specialize` instruction. * fixes #260 This fix follows the second apporach in the disucssion. It generated mangled name for specialized functions by appending new substitution type names to the original mangled name. * Disabling removing and re-inserting specailized functions in getSpecalizeFunc() I am not sure why it is needed, it seems HLSL and GLSL backends are generating forward declarations anyways, so the order of functions in IRModule shouldn't matter. * cleanup and complete test cases. * fix warnings
*	Merge branch 'master' into falcor-integration-work	Tim Foley	2017-11-07
\|\
\| *	IR: support for select and negate (#257)	Tim Foley	2017-11-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- During IR emit, treat a "select" expression (`?:` operator) like any other `InvokeExpr`, since it will have an `__intrinsic_op` modifier attached to turn it into a `select` instruction. - During HLSL/GLSL emit from IR, turn a `select` instruction into a `?:` expression - Also add support for the `neg` instruction during HLSL/GLSL emit Note that right now we are assuming HLSL semantics for `?:` where it does not short-circuit. Correctly handling the GLSL case would require going back to special-case codegen for `SelectExpr`, but we can cross that bridge when we come to it.
* \|	Emit pointer-type parameters as out params	Tim Foley	2017-11-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The IR encodes `out` and `in out` function parameters as pointer types, so the emit logic needs to handle it. We had code to handle translation of pointers types into `out` declarations for function declarations but weren't handling it for function definitions. This change unifies the logic so that it is shared by function definitions and decalrations. This change does not deal with the following issues that need to be addressed sometime soon-ish: - We currently always translate pointers into `out`, even if they should be `in out`. This is obviously wrong. - If/when we eventually have targets that support true pointers (e.g., CUDA, NVIDIA OpenGL, etc.) we'll need a way to tell the difference between an `in` pointer parameter, and an `out` parameter. Both of these issues are meant to be addressed by having a few special cases of pointer types, for the `out` and `in out` cases, and only translating those (not all pointers). We need to plumb those through the IR more completely, but I'm not dealing with that here.
* \|	Fix for emitting subscript calls in HLSL/GLSL	Tim Foley	2017-11-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The old approach was relying on an `__intrinsic_op` modifier to tell us we need to do something special with an `InvokeExpr`, but a previous change removed a bunch of those modifiers. Instead, we will now check for calls to subscript declarations as part of the normal flow of emitting any call, similar to what is done for constructor calls already. Eventually we should be able to eliminate the special case in the `__intrinsic_op` path, but I'm holding off on that because the AST emit logic can probably be cleaned up a lot once it doesn't have to be used for cross-compilation as well.
* \|	Try to fix up IR emit for subscript calls	Tim Foley	2017-11-07
\| \| \| \| \| \| \| \| \| \| \| \|	This code isn't especially useful right now since most of the important subscripts are still special-cased with `__intrinsic_op`, but the idea is that if we de-mangle an intrinsic operation's name and see it is called `operator[]` then we are probably calling a subscript, and should emit an appropriate expression. Aside: this change has pointed out to me that our current name mangling isn't properly handling non-alphanumeric characters, so we'll be in trouble as soon as we have non-intrinsic subscripts, operators, etc.
* \|	Fixes for name mangling/demangling	Tim Foley	2017-11-07
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The source of a lot of these changes is that our current strategy for dealing with "builtin" operations when emitting HLSL from the IR is to de-mangle the mangled name of an operation, and then emit HLSL code for a function call to an operation with that de-mangled name. This change introduces a few fixups for that work: - It adds support for parsing the mangled names of generics (specialized and unspecialized) - It adds logic for detecting when the operation being invoked is a member function - This is currently a bit ugly, since we compare the number of actual arguments we have in the IR against the number of parameters declared for the callee, and if they don't match we assume we have an extra `this` argument. On the mangling side, we add (hacky) support for mangling a function name when its types involve generic parameters, e.g.: ``` __generic<T, let N : int> T length(vector<T,N> v); ``` In this case the mangled name of the function needs to include a mangling for the type `vector<T,N>` which means it also needs to include a mangling for `N`. The reason I describe this support as "hacky" is because we really shouldn't be reproducing the names `T` or `N` in the mangled symbol name. By doing so we make it so that a user changing the name of a generic parameter would break (IR) binary compatibility with existing code that was separately compiled. I've included comments in the code about a better way to handle this, but it isn't a priorit right now since binary compatibility isn't something meaningful until we start emitting usable bytecode modules.
*	Parameter blocks (#245)	Tim Foley	2017-11-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Rename existing ParameterBlock to ParameterGroup We are planning to add a new `ParameterBlock<T>` type, which maps to the notion of a "parameter block" as used in the Spire research work. Unfortunately, the compiler codebase already uses the term `ParameterBlock` as catch-all to encompass all of HLSL `cbuffer`/`tbuffer` and GLSL `uniform`/`buffer`/`in`/`out` blocks (all of which are lexical `{}`-enclosed blocks that define parameters...). This change instead renames all of the existing concepts over to `ParameterGroup`, which isn't an ideal name, but at least doesn't directly overlap the new terminology or any existing terminology. The new `ParameterBlockType` case will probably be a subclass of `ParameterGroupType`, since it is a logical extension of the underlying concept. * Add Shader Model 5.1 profiles The HLSL `register(..., space0)` syntax is only allowed on "SM5.1" and later profiles (which is supported by the newer version of `d3dcompiler_47.dll` that comes with the Win10 SDK, but not the older version of `d3dcompiler_47.dll` - good luck figuring out which you have!). This change adds those profiles to our master list of profiles, and nothing else. * First pass at support for `ParameterBlock<T>` - Add the type declaration in stdlib - Add a special case of `ParameterGroupType` for parameter blocks - Handle parameter blocks in type layout (currently handling them identically to constant buffers for now, which isn't going to be right in the long term) - Add an IR pass that basically replaces `ParameterBlock<T>` with `T` - Eventually this should replace it with either `T` or `ConstantBuffer<T>`, depending on whether the layout that was computed required a constant buffer to hold any "free" uniforms - Add first stab at an IR pass to "scalarize" global variables using aggregate types with resources inside. - This currently only applies to global variables, so it won't handle things passed through functions, or used as local variables - It also only supports cases where the references to the original variable are always references to its fields, and not the whole value itself - Add a single test case that technically passes with this level of support, but probably isn't very representative of what we need from the feature * Fold parameter-block desugaring into a more complete "type legalization" pass The basic problem that was arising is that once you desugar `ParameterBlock<T>` into `T`, you then need todeal with splitting `T` into its constituent fields if it contains any resource types. Handling those transformations by following the usual use-def chains wasn't really helping, because you might need systematic rewriting that can really only be handled bottom-up. This change adds a new pass that is intended to perform multiple kinds of type "legalization" at once: - It will turn `ParameterBlock<T>` into `T` - It may at some point also convert `ConstantBuffer<T>` into `T` as well - It will turn an value of an aggregate type that contains resources into N different values (one per field) - As a result of this, it will also deal with AOS-to-SOA conversion of these types Legalization is applied to every function/instruction/value, so that it can make large-scale changes that would be tough to manage with a work list. This pass needs to be run after generics have been fully specialized, so that we know we are always dealing with fully concrete types, so that their legalization for a given target is completely known. This is still work in progress; there's more to be done to get this working with all our test cases, and finish the remaining `ParameterBlock<T>` work. * Improve binding/layout information when using parameter blocks - When doing type layout for a parameter block, don't include the resources consumed by the element type in the resource usage for the parameter block - Note that this is pretty much identical to how a `ConstantBuffer<T>` does not report any `LayoutResourceKind::Uniform` usage, except that `ParameterBlock<T>` is also going to hide underlying texture/sampler reigster usage - The one exception here is that any nested items that use up entire `space`s or `set`s those need to be exposed in the resource usage of the parent (I don't have a test for this) - When type legalization needs to scalarize things, it must propagate layout information down to the new leaf variables. In general, the register/index for a new leaf parameter should be the sum of the offsets for all of the parent variables along the "chain" from the original variable down to the leaf (we aren't dealing with arrays here just yet). - When type legalization decides to eliminate a pointer(-like) type (e.g., desugar `ParameterBlock<T>` over to `T`), actually deal with that in terms of the `LegalVal`s created, so that we can know to turn a `load` into a no-op when applied to a value that got indirection removed. - Hack up the "complex" parameter-block test so that it actually passes (the big hack here is that the HLSL baseline is using names that are generated by the IR, and are unlikely to be stable as we add/remove transformations). - Note: I can't make these be compute tests right now, because regsiter spaces/sets are a feature of D3D12/Vulkan, and our test runner isn't using those APIs.
*	cleanup useless code	Yong He	2017-11-04
\|
*	fix warnings	Yong He	2017-11-04
\|
*	merge with fixWarnings branch	Yong He	2017-11-04
\|\
\| *	fixed all warnings	Yong He	2017-11-04
\| \|
\| *	fix all unreachable code warnings	Yong He	2017-11-04
\| \|