| Age | Commit message (Collapse) | Author |
|
* Extractor builds without any reference to syntax (as it will be helping to produce this!).
* Change macros to include the super class.
* Added indexOf(const UnownedSubString& in) to UnownedSubString.
Refactored extractor
* Output a macro for each type with the extracted info - can be used during injection in class
* Simplify the header file - as can get super type and last from macro now
* Store the 'origin' of a definition
* Some small tidy ups to the extractor.
* Improve comments on the extractor options.
* Made CPPExtractor own SourceOrigins
* Small fixes around SourceOrigin.
* Small tidy up around macroOrign
|
|
* Fixes for IR generics
There are a few different fixes going on here (and a single test that covers all of them).
1. Fix optionality of trailing semicolon for `struct`s
======================================================
We have logic in the parser that tries to make a trailing `;` on a `struct` declaration optional. That logic is a bit subtle and couild potentially break non-idiomatic HLSL input, so we try to only trigger it for files written in Slang (and not HLSL). For command-line `slangc` this is based on the file extension (`.slang` vs. `.hlsl`), and for the API it is based on the user-specified language.
The missing piece here was that the path for handling `import`ed code was *not* setting the source language of imported files at all, and so those files were not getting opted into the Slang-specific behavior. As a result, `import`ed code couldn't leave off the semicolon.
2. Fix generic code involving empty `interface`s
================================================
We have logic that tries to only specialize "definitions," but the definition-vs-declaration distinction at the IR level has historically been slippery. One corner case was that a witness table for an interface with no methods would always be considered a declaration, because it was empty. The notion of what is/isn't a definition has been made more nuanced so that it amounts to two main points:
* If something is decorated as `[import(...)]`, it is not a definition
* If something is a generic/func (a declaration that should have a body), and it has no body, it is a declaration
Otherwise we consider anything a definition, which means that non-`[import(...)]` witness tables are now definitions whether or not they have anything in them.
3. Fix IR lowering for members of generic types
===============================================
The IR lowering logic was trying to be a little careful in how it recurisvely emitted "all" `Decl`s to IR code. In particular, we don't want to recurse into things like function parameters, local variables, etc. since those can never be directly referenced by external code (they don't have linkage).
The existing logic was basically emitting everything at global scope, and then only recursing into (non-generic) type declarations. This created a problem where a method declared inside a generic `struct` would not be emitted to the IR for its own module at all *unless* it happened to be called by other code in the same module.
The fix here was to also recurse into the inner declaration of `GenericDecl`s. I also made the code recurse into any `AggTypeDeclBase` instead of just `AggTypeDecl`s, which means that members in `extension` declarations should not properly be emitted to the IR.
Conclusion
==========
These fixes should clear up some (but not all) cases where we might emit an `/* unhandled */` into output HLSL/GLSL. A future change will need to make that path a hard error and then clean up the remaining cases.
* fixup: tabs->spaces
|
|
The main feature visible to the stdlib here is the `[__unsafeForceInlineEarly]` attribute, which can be attached to a function definition and forces calls to that function to be inlined immediately after initial IR lowering.
The pass is implemented in `slang-ir-inline.{h,cpp}` and currently only handles the completely trivial case of a function with no control flow that ends with a single `return`. The lack of support for any other cases motivates the `__unsafe` prefix on the attribute.
In order to test that the pass works, I modified the "comma operator" in the standard library to be defined directly (rather than relying on special-case handling in IR lowering), and then added a test that uses that operator to make sure it generates code as expected. The compute version of the test confirms that we generate semantically correct code for the operator, while the SPIR-V cross-compilation test confirms that our output matches GLSL where the comma operator has been inlined, rather than turned into a subroutine.
Notes for the future:
* With this change it should be possible (in principle) to redefine all the compound operators in the stdlib to instead be ordinary functions with the new attribute, removing the need for `slang-compound-intrinsics.h`.
* Once the compound intrinsics are defined in the stdlib, it should be easier/possible to start making built-in operators like `+` be ordinary functions from the standpoint of the IR
* The attribute and pass here could be extended to include an alternative inlining attribute that happens later in compilation (after linking) but otherwise works the same. This could in theory be used for functions where we don't want to inline the definition into generated IR, but still want to inline things berfore generating final HlSL/GLSL/whatever.
* The inlining pass itself could be generalized to work for less trivial functions pretty easily; for the most part it would just mean "splitting" the block with the call site and then inserting clones of the blocks in the callee. Any `return` instructions in the clone would become unconditional branches (with arguments) to the block after the call (which would get a parameter to represent the returned value).
* The "hard" part for such an inlining pass would be handling cases where the control flow that results from inlining can't be handled by our later restructuring passes. The long-term fix there is to implement something like the "relooper" algorithm to restructure control flow as required for specific targets.
|
|
This change continues the work already started in moving the definitions of many built-in functions to the standard library.
The main focus in this change was reducing the number of operations that had to be special-cased on the CPU and CUDA targets by making sure that the scalar cases of built-in functions map to the proper names in the prelude (e.g., `F32_sin()`) via the ordinary `__target_intrinsic` mechanism. In some cases this cleanup meant that special-case logic that was constructing definitions for those functions using C++ code could be scrapped.
Additional changes made along the way:
* A few scalar functions that were missing in the CPU/CUDA preludes got added: `round`, hyperbolic trigonometric functions, `frexp`, `modf`, and `fma`
* The floating-point `min()` and `max()` definitions in the preludes were changed to use intrinsic operations on the target (which are likely to follow IEEE semantics, while our definitions did not)
* For the CUDA target, many of the functions had their names translated during code emit from, e.g., `sin` to `sinf`. This change makes the CUDA target more closely match the C++/CPU target in using names like `F32_sin` consistently.
* For the CUDA target, a few additional functions have intrinsics that don't exist (portably) on CPU: `sincos()` and `rsqrt()`.
* For the Slang stdlib definitions to work, a new `$P` replacement was defined for `__targert_intrinsic` that expands to a type based on the first operand of the function (e.g., `F32` for `float`).
* I removed the dedicated opcodes for matrix-matrix, matrix-vector, and vector-matrix multiplication, and instead turned them into ordinary functions with definitions and `__target_intrinsic` modifiers to map them appropriately for HLSL and GLSL. This is realistically how we would have implemented these if we'd had `__target_intrinsic` from the start.
Notes about possible follow-on work:
* The `ldexp` function is still left in the Slang stdlib because it has to account for a floating-point exponent and the `math.h` version only handles integers for the exponent. It is possible that we can/should define another overload for `ldexp` (and `frexp`) that uses an integer for exponent, and then have that one be a built-in on CPU/CUDA, with the HLSL `frexp` being defined in the stdlib to delegate to the correct `frexp` for those targets.
* The `firstbithigh` and related functions are missing for our CPU and CUDA targets, and will need to be added. It is worth nothing that `firstbithigh` apparently has some very odd functionality around signed integer arguments (which are supported, despite MSDN being unclear on that point). General cleanup will be required for those functions.
* Maxing the various matrix and vector products no longer be intrinsic ops might affect how we emit code for them as sub-expressions (both whether we fold them into use sites and how we parenthize them). This doesn't seem to affect any of our existing tests, but we could consider marking these functions with `[__readNone]` to ensure they can be folded, and then also adding whatever modifier(s) we might invent to control precdence and parentheses insertion during emit.
|
|
|
|
We currently represent the `groupshared` qualifier as a kind of "rate" at the IR level (where a rate can qualify a type to indicate the frequency/rate at which a value is stored and/or computed). This means that when computing the type that a pointer points to, we need to handle both, e.g., `Ptr<Int>` and `@GroupShared Ptr<Int>`.
The logic that was trying to handle the rate-qualified case when deducing the "pointee" type of a pointer was somehow written incorrectly, and was querying `getDataType()` on an `IRRateQualifiedType` which is asking for the type of the type itself (null in this case), rather than `getValueType()` which gets the `T` part from a rate-qualified type `@R T`.
Somehow none of our tests were hitting this case, and I'm not immediately clear on how to write a targeted reproducer for this, since the problem arose as a debug-only assertion failure in a user shader with thousands of lines.
|
|
* WIP: 64 literal diagnostic and truncation.
* Improve how integer truncation is handled/supported.
Added literal-int64.slang test.
Set a suffix on all literals.
Fixed problem on C++ based targets where l suffix was not the same as int() cast. So on C++ derived emitters, int() is used instead of l suffix to have same behavior across targets.
* Add literal diagnostic testing.
* Allow lexer to lex - in front of literals.
* Fix lexing and converting int literal with -.
* Too large small values of floats become inf.
Handling writing inf types out on different targets.
Add function to deterimine if a float literals kind.
* Roll back the support of lexer lexing negative literals.
* Fixed tests broken because of diagnostics numbers.
Improved _isFinite
* Fix compilation on linux.
* Fix problem with abs on linux - use Math::Abs.
* Fix typo.
* * Improve warnings for float literals zeroed
* Improved 64 bit type documentation
* Handle half
* Improved comments
* Fixed tests broken
* Use capital letters for suffixes.
* Make default behavior on outputting a int literal that is an 'int32_t' is cast (not suffix) to avoid platform inconsistencies.
Improve documentation for 64 bit types.
Make tests cover material in docs.
* Fixed tests.
* Rename FloatKind::Normal -> Finite
* Fix half zero check.
|
|
When using row-major layout (via command-line or API option), the following sort of declaration:
```hlsl
StructuredBuffer<float4x4> gBuffer;
... gBuffer[i] ...
```
Generates unexpected results when compiled to DXBC via fxc or DXIL via dxc, because the fxc/dxc compilers do not respect the matrix layout mode in this specific case (a structured buffer of matrices). Instead, they always use column-major layout, even if row-major was requested by the user.
A user can work around this behavior by wrapping the matrix in a `struct`:
```hlsl
struct Wrapper { float4x4 wrapped; }
SturcturedBuffer<Wrapper> gBuffer;
... gBuffer[i].wrapped ...
```
This change simply automates that workaround when compiling for an HLSL-based downstream compiler, so that we get the same behavior across all our backends.
The change adds a test case to confirm the behavior across multiple targets, but it turns out we also had a test checked in that confirmed the buggy (or at least surprising) fxc/dxc behavior, so that one had its baselines changed and can work as a regression test for this fix as well.
|
|
* Clean up the concept of "pseudo ops"
Built-in functions in the Slang standard library can be marked with `__intrinsic_op(...)` to indicate that they should not lower to functions in the IR, and that instead call sites to those functions should be translated directly to the IR.
There are two cases where `__intrinsic_op(...)` gets used:
1. In the case where the argument to `__intrinsic_op(...)` is an actual IR instruction opcode, the IR lowering logic directly translates a call into an instruction with the given opcode. The arguments to the call become the operands of the instruction.
2. In the case where the argument to `__intrinsic_op(...)` is one of a set of "pseudo" instruction opcodes, the IR lowering logic directly handles the lowering to IR with dedicated code. The operands to the call might be handled differently depending on the kind of operation.
The compound operators like `+=` are the most important example of these "pseudo" instructions.
It doesn't make sense to handle them as true function calls (although that would work semantically), nor does it make sense to have a single IR instruction with such complicated semantics.
An earlier version of the compiler used the same enumeration for both the true IR instruction opcodes and these "pseudo" opcodes, with the simple constraint that the pseudo opcodes were all negative while the real opcodes were positive. That design got changed up over a few refactorings, and because there was never a good explanation in the code itself of what "pseudo" opcodes were, we eventually ended up in a place where the in-memory and serialized IR encodings included logic to try to deal with the possibility of these "pseudo" opcodes, even though the entire design of the lowering pass meant that they'd never appear in generated IR.
This change tries to clean up the mess in a few ways:
* The terminology is now that these are "compound" intrinsic ops, to differentiate them from the more common case of intrinsic ops that map one-to-one to IR instructions.
* The declaration of the compound intrinsic ops is no longer in a file related to the IR, and doesn't use the `IR` naming prefix, so somebody looking at the IR opcodes cannot become confused and think the compound ops are allowed there.
* The IR encoding in memory and when serialized is updated to not account for or worry about the possibility of "pseudo" ops.
* The compound ops are declared in such a way that ensures their enumerant values are all negative, so that they are yet again trivially disjoint from the true IR opcodes.
A more drastic change might have split `__intrinsic_op` into two different modifier types: one for the trivial single-instruction case and one for the compound case.
Doing this would make the change more invasive, though, because there are places in the meta-code that generates the standard library that intentionally handle both single-instruction and compound ops (because built-in operators can translate to either case).
* fixup: missing file
* cleanups based on review feedback
|
|
* Initial work for "global generic value parameters"
The main new feature here is support for the `__generic_value_param` keyword, which introduces a *global generic value parameter*.
For example:
__generic_value_param kOffset : uint = 0;
This declaration introduces a global generic value parameter `kOffset` of type `uint` that has a nominal default value of zero.
The broad strokes of how this feature was added are as follows:
* A new `GlobalGenericValueParamDecl` AST node type is introduces in `slang-decl-defs.h`
* A new `parseGlobalGenericValueParamDecl` subroutine is added to `slang-parser.cpp`, and is added to the list of declaration cases as the callback for the `__generic_value_param` name.
* Cases for `GlobalGenericValueParamDecl` are added to the declaration checking passes in `slang-check-decl.cpp`, mirroring what is done for other variable declaration cases.
* A case for `GlobalGenericValueParamDecl` is aded to the `Module::_collectShaderParams` function, so that it is recognized as a kind of specialization parameter. This introduces a specialization parameter of flavor `SpecializationParam::Flavor::GenericValue` (which was already defined before this change, although it was unused).
* A case for `SpecializationParam::Flavor::GenericValue` is added in `Module::_validateSpecializationArgsImpl` to check that a specialization argument represents a compile-time-constant value (not a type).
* A case for `GlobalGenericValueParmDecl` is introduced in `slang-lower-to-ir.cpp` that introduces a global generic parameter in the IR
* The `IRBuilder` is extended to support creating `IRGlobalGenericParam`s for the distinct cases of type, witness-table, and value parameters. The same IR instruction type/opcode is used for all cases, and only the type of the IR instruction differs.
* The existing mechanisms for lowering specialization arguments to the IR, and doing specialization on the IR itself Just Work with global generic value parameters since they already support value parameters on explicit generic declarations.
That's the santized version of things, but there were also a bunch of cleanups and tweaks required along the way:
* The `SpecializationParam` type was extended to also track a `SourceLoc` to help in diagnostic messages, which meant some churn in the code that collects specialization parameters.
* The `_extractSpecializationArgs` function is tweaked to support any kind of "term" as a specialization argument (either a type or a value).
* To allow *parsing* specialization arguments that can't possibly be types (e.g., integer literals) we replace the existing `parseTypeString` routine with `parseTermString` and then in `parseTermFromSourceFile` call through to a general case of expression parsing (which can also parse types) rather than only parsing types directly.
* Right before doing back-end code generation, we check if the program we are going to emit has remaining (unspecialized) parameters, in which case we emit a diagnostic message for the parameters that haven't been specialized rather than go on to emit code that will fail to compile downstream.
* Within the `render-test` tool we collapse down the arrays that held both "generic" and "existential" specialization arguments, so that we just have *global* and *entry-point* specialization argument lists. This mirrors how Slang has worked internally for a while, but the difference hasn't been important to the test tool because no tests currently mix generic and existential specialization. The logic for parsing `TEST_INPUT` lines has been streamlined down to just the global and entry-point cases, but the pre-existing keywords are still allowed so that I don't have to tweak any test cases.
There are several significant caveats for this feature, which mean that it isn't really ready for users to hammer on just yet:
* There is no support for `Val`s of anything but integers, so there is no way to meaningfully have a generic value param with a type other than `int` or `uint`.
* We allow for a default-value expression on global generic parameters, but do not actually make use of that value for anything (e.g., to allow a programmer to omit specialization arguments), nor check that it meets the constraints of being compile-time constant.
* Global generic value parameters are *not* currently being treated the same as explicit generic parameters in terms of how they can be used for things like array sizes or other things that require constants. This will probably be relaxed at some point, but allowing a global generic to be used to size an array creates questions around layout.
* The IR optimization passes in Slang currently won't eliminate entire blocks of code based on constant values, so using a global generic value parameter to enable/disable features will *not* currently lead to us outputting drastically different HLSL or GLSL. That said, we expect most downstream compilers to be able to handle an `if(0)` well.
* Fix regression for tagged union types
The change that made specialization arguments be parsed as "terms" first, and then coerced to types meant that any special-case logic that is specific to the parsing of types would be bypassed and thus not apply.
Most of that special-case logic isn't wanted for specialization arguments, since it pertains to cases were we want to, e.g, declare a `struct` type while also declaring a variable of that type.
The one special case that *is* useful is the `__TaggedUnion(...)` syntax, which is the only way to introduce a tagged union type right now.
In order to get that case working again, all I had to do was register the existing logic for parsing `__TaggedUnion` as an expression keyword with the right callback, and the existing logic in expression parsing kicks in (that logic was already handling expression keywords like `this` and `true`).
I left in the existing logic for handling `__TaggedUnion` directly where types get parsed, rather than try to unify things.
A better long-term fix is to make the base case for type parsing route into `parseAtomicExpr` so that the two paths share the core logic.
That change should probably come as its own refactoring/cleanup, because it creates the potential for some subtle breakage.
* fixup: typo
|
|
* Strip IR after front-end steps are done
The main feature of this change is to unconditonally strip out the `IRHighLevelDeclDecoration`s in an IR module once the "mandatory" IR passes in the front end have run. This ensures that later IR passes (e.g., code emission) *cannot* rely on AST-level information to get their job done.
Since I was already writing a pass to remove some instructions at the end of the front-end passes, I went ahead and also made the `-obfuscate` flag apply to the front-end IR generation by causing it to strip `IRNameHintDecoration`s while it is doing the other stripping. With this, the main identifying information left in IR modules (other than semantics and entry-point names) is mangled name strings for imported/exported symbols.
A few other things got changes along the way:
* Removed the `.expected` file for one of the tests, where that file seemingly shouldn't have been checked in at all.
* Updated the signature of the DCE pass both so that it doesn't require a back-end compile request (it wasn't using it anyway), and so that it takes some options to decide whether to keep symbols marked `[export(...)]` alive (the front-end wants to keep these, while back-end passes currently need to be able to eliminate them).
* Moved the `obfuscateCode` flag from the back-end compile request to the base class shared between front- and back-end requests, and updated the options and repro logic to set both as needed. An obvious improvement in the future would be to have the front- and back-end requests share these settings by referencing a single common object in the end-to-end case, rather than each having their own copy.
* Removed logic that was keeping layout instructions alive in DCE, even if they weren't used. This seems to have been a vestige of an intermediate step between AST and IR layout.
* fixup: add the new files
|
|
This change builds on previous work that moves toward a more IR-based representation of layout.
Those steps added some instructions for representing layout in the IR (initially just proxies for the AST layout objects), and an explicit lowering pass that could build a target-specific IR module that binds parameters and entry points to layout information.
This change aims to complete that work, in the sense that the IR representation of layout is now self-contained and does not rely on having pointers back into the AST-level representation.
Achieving this requires two main kinds of work:
1. Update any code that used layout information derived from the IR (most notably all the `slang-emit-*` code) to use the new IR representation and its accessors.
2. Update any code that *constructs* layouts using information derived from the IR to construct IR layouts instead.
The biggest new infrastructure feature in this change is support for "attributes" in the IR (I'd welcome feedback on the naming).
An attribute can either be thought of like key/value arguments that can be added to certain instructions to encode optional data, or alternatively like a decoration that is referenced as an operand instead of a child.
The value of attributes over decorations is that they can affect the hash/identity of an instruction (which decorations can't), while the advantage of decorations is that they can easily be added/removed over the lifetime of an instruction (which attributes can't).
We mostly use them here to represent operands that are logically optional.
Once attributes are available, the encoding of layout information into the IR is mostly straightforward:
* An `IRVarLayout` has a fixed operand for its type layout, and can accept a few different attributes
* Zero or more `IRVarOffsetAttr`s that specify the offset of the variable for a given resource kind. These are equivalent to the `VarLayout::ResourceInfo`s at the AST level.
* An optional `IRUserSemanticAttr` and `IRSystemValueSemanticAttr` to represent the (possibly derived) semantic of a varying input/output parameter.
* An option `IRStageAttr` to represent the known stage for a parameter.
* An `IREntryPointLayout` has a var layout for the entry point parameters (logically grouped in to a struct) and another var layout for the result parameter.
* There is a small type hierarchy rooted at `IRTypeLayout` where each subtype can add fixed operands and attributes that are expected to appear. It also supports `IRTypeSizeAttr`s that serve a similar role to the `IRVarOffsetAttr`s.
* Structure types maintain the mapping of fields to their var layouts using `IRStructFieldLayoutAttr`s.
With the encoding in place, most of the changes in category (1) (code that just *uses* rather than *creates* layouts) was straightforward. The biggest different beyond name changes was that everything needs to be fetched using accessors instead of bare fields. It would have been possible to stage this commit and make the diffs smaller by first introducing mandatory acessors to the AST layout types.
The changes in category (2) were more involved. There were a lot of places in the existing code where a `TypeLayout` or `VarLayout` would be created, and then initialized piecemeal over several lines of code (and sometimes even across functions). Because of the way that layouts need to support many optional properties, it did not seem practical to just have monolithic factory functions that took all the options as arguments, so I instead opted for a builder approach.
The builders for `IRVarLayout` and `IREntryPointLayout` are both straightforward, and honestly there is no realy need for a builder for entry point layouts right now, but I was trying to future-proof in case we decidd to add some optional attributes to them.
The builders for type layouts are more involved because of the inheritance hierarchy. Each concrete sub-type of type layout needs to define its own builder type that customizes the opcode, operands, and attributes of the final instruction.
The refactoring that had to go into this change was a nice excuse to clean up a few ugly warts in the AST layout code that were largely there to support IR use cases. While this change adds a lot of new infrastructure code to the IR, most of the client code has stayed the same or gotten simpler.
One annoying wart that remains with this change is the notion of an "offset element type layout" for parameter group types. That idea was added to deal with a legacy feature in the reflection API that we realized was a mistake, but unfortunately having that "offset" layout handy made writing a few other pieces of code simpler so that there are use cases of the feature even in the IR. Removing those uses is do-able, but requires careful refactoring so it is best left to a follow-on change.
Another thing that could be considered for a follow-on change is how much information should be specified when constructing a `Builder` for an IR type layout, and how much should be allowed to be specified statefully/piecemeal. It would be nice to force all the required operands to be specified up front, but `IRParameterGroupTypeLayout::Builder` doesn't currently work that way because so much of the client code that needs it involved a lot of stateful setting and would need to be refactored heavily to provide the necessary information up front.
|
|
* Initial work on representing layout at IR level
This change starts the process of making the back-end of the compiler independent of the AST-level layout information (`TypeLayout`, `VarLayout`, etc.) so that it instead only relies on layout information that is embedded into IR modules. This brings us incrementally closer to a world in which the back-end could be run without the AST-level structures even existing (e.g., for an application that just wants to ship IR without any AST information for IP protection, while still supporting some amount of linking and specialization).
The main parts of the change are:
* There is a bunch of incidental churn related to specifying entry points by index instead of the `EntryPoint` object for certain operations. This ends up being a better choice because we can use the index to look up side-band information about the entry point that might not be stored on the `EntryPoint` object itself. In particular...
* We expand the `ComponentType` interface to support looking up the mangled name of an entry point by index. In common cases (no generic/interface specialization) this would be the same as asking the `EntryPoint` for its mangled name, but in cases where we have specialized a generic entry point, the mangled name would include speicalization arguments that are only available on the `SpecializedComponentType` that wraps the entry point. This part of the change isn't ideal and there might be a better solution waiting to be invented. Note that we store mangled entry point names as strings rather than using `DeclRef`s because that ensures that the information could be serialized and deserialized without a dependence on the AST.
* The `TargetProgram` type (which represents binding a specific `ComponentType` for a shader program to a specific `TargetRequest` that represents the target platform) is expanded to include an `IRModule` that represents layout information, in addition to the AST-level `ProgramLayout` it already contained. We create both of these objects at the same time (on-demand) to simplify the overall flow (so that any code that triggers creation of the AST-level layout will also ensure that the IR-level layout exists).
* A bunch of code in the emit passes that was passing down layout-related objects has been eliminated. It appears that most of those objects weren't actually being used, so this is just a cleanup, but it helps ensure that the back-end steps are "clean" and don't depend on the AST-level information. The one big exception here is that the emit logic needs to know the stage for the entry point being emitted (to deal with one wrinkle in translating DXR to VKRT).
* A big change (actually introduced by @jsmall-nvidia in a branch that this change copied and then built from) is to introduce some more explicit IR instructions to represent layout information, notably an `IRTypeLayout` and an `IRVarLayout`. For now these objects still reference their AST equivalents, but the separation gives us an incremental path to move information from the AST-level objects over to the IR ones. This work includes logic in `IRBuilder` to construct the IR-level layout objects from the AST-level ones on-demand, so that the existing code paths that try to attach AST-level layout will continue to work for now.
* Because layout information is now embedded in the IR, the `slang-ir-link.cpp` logic loses a lot of cases that used to deal with attaching AST-level layout objects to IR-level instructions during the linking process. Instead, the linker now assumes that one (or more) of the input IR modules will have layout information associated with it, and the linker makes sure to copy layout decorations (and the instructions they reference) from the input IR module(s) to the output using its more ordinary mechanisms.
* Inside `slang-lower-to-ir.cpp`, we add logic to construct an IR module in a `TargetProgram` that simply references the global shader parameters, entry points, etc. and attaches IR layout decorations to them. This is akin to the existing pass in the same file that constructs IR to represent specialization information, and both of these passes share infrastructure with the main AST->IR lowering pass. Eventually, it is expected that this pass will encompass more of the logic for copying AST-level layout information over to IR-level equivalents.
* One small wrinkle with this change was that the output for an HLSL generation test case changed some of its `#line` directives. The old code was actually more inaccurate than the new, so this change just updated the baseline. It also added some logic in the linker to make sure that when an IR instruction has multiple definitions, we try to pick up a source location from any of them, in case the "main" one somehow didn't get a location.
* Another small fix was that the key/value map in `StructTypeLayout` for mapping fields/members to their layouts was keyed on `Decl*` when it really should have been `VarDeclBase*`.
This change should in principle be a pure refactoring with no functionality changes, so no new tests were added. It is unfortunately also a change that has a high probability of breaking at least *some* client code, so we may want to be defensive and mark this with a new major version number (well, a new *minor* version number since we are pre-`1.0`) to give us some room for releasing hotfixes to the old version if needed.
* fixup: infinite recursion bug detected by clang
* fixup: remove commented-out code
|
|
Work on #1059
The `%` operator in the Slang implementation had several issues, and this change tries to address some of them:
* Renamed most occurences of "mod" describing this operator to be "rem" for "remainder" to better match its semantics in HLSL
* Split the operator into distinct integer and floating-point variants (`IRem` and `FRem`) to simplify having different codegen for the two
* Added floating-point variants of `operator%` and `operator%=` to the stdlib.
* Added custom C++ codegen for `kIROp_FRem` such that it maps to the standard C/C++ `remainder()` function
* Added custom GLSL codegen so that `kIROp_FRem` maps to the GLSL `mod()` function (which isn't correct...)
* Added a test case to confirm that D3D11, D3D12, and CPU targets all agree on the definition of floating-point `%`
* Fixed `render-test-tool` to allow a negative integer in a `data=...` specification. This didn't end up being used in the final test, but still seems like a good fix.
* Added a customized baseline for the Vulkan flavor of that test to confirm that we are *not* compiling correctly to SPIR-V just yet
Addressing the correctness of the output for GLSL/SPIR-V will have to come as a later change given that the operation we want is not exposed directly by unextended GLSL.
|
|
* Add support for '=' when defining a name in test.
* Add support for double intrinsics.
* Add support for asdouble
Add findOrAddInst - used instead of findOrEmitHoistableInst, for nominal instructions.
Support cloning of string literals.
C++ working on more compute tests.
* Constant buffer support in reflection.
Fixed debugging into source for generated C++.
buffer-layout.slang works.
* Added cpu test result.
* Remove some commented out code.
Comment on next fixes.
* Improvements to reflection CPU code.
* C++ working with ByteAddressBuffer.
* Enabled more compute tests for CPU.
* Enabled more compute tests on CPU.
Added support for [] style access to a vector.
* Enabled more CPU compute tests.
* Handling of buffer-type-splitting.slang
Named buffers can be paths to resources
* Fix some warnings, remove some dead code.
* Fix problem with verification of number of operands for asuint/asint as they can have 1 or 3 operands. asdouble takes 2.
* Fix handling in MemoryArena around aligned allocations. That _allocateAlignedFromNewBlock assumed the block allocated has the aligment that was requested and so did not correct the start address.
|
|
* Revise new COM-lite API
This change revises the "COM-lite" API that was recently introduced to try to streamline it and introduce some missing central/base concepts.
The central new abstraction in the API is the notion of a "component type," which is a unit of shader code composition. A component type can have:
* IR code for some number of functions/types/etc.
* Zero or more global shader parameters
* Zero or more "entry point" functions at which execution can start
* Zero or more "specialization" parameters (types or values that must be filled in before kernel code can be generated)
* Zero or more "requirements" (dependencies on other component types that must be satisfied before kernel code can be generated)
Both individual compiled modules, and validated entry points are then examples of component types, and we additionally define a few services that apply to all component types:
* We can take N component types and compose them to create a new component type that combines their code, shader parameters, entry points, and specialization parameters. A composed component type may also include requirements from the sub-component types, but it is also possible that by composing thing we satisfy requirements (if `A` requires `B`, and we compose `A` and `B`, then the requirement is now satisfied, and doesn't appear on the composite).
* We can take a component type with N specialization parameters, and specialize it by giving N compatible specialization arguments. The result of specialization is a new component type with zero specialization parameters. Under the right circumstances the specialzed component type will be layout compatible with the unspecialized one.
* One more example that isn't exposed in the public API today is that we can take a component with requirements and "complete" it by automatically composing it with component types that satisfy those requirements. This can be seen as a kind of linking step that pulls together the transitive closure of dependencies.
* We can query the layout for the shader parameters and entry points of a component type, for a specific target.
* We can query compiled kernel code for an entry point in a component type (for a specific target). This only works for component types with zero specialization parameters and zero requirements.
The idea is that by giving users a fairly general algebra of operations on component types, they can compose final programs in ways that meet their requirements. For example, it becomes possible to incrementally "grow" a component type to represent the global root signature for ray tracing shaders as new entry points are added, in such a way that it always stays layout-compatible with kernels that have already been compiled.
Much of the implementation work here is in implementing the unifying component type abstraction, and in particular re-writing code that used to assume a program consisted of a flat list of modules and entry points to work with a hierarchical representation that reflects the underlying algebra (e.g., with types to represent composite and specialized component types).
There's also a hidden "legacy" case of a component type to deal with some legacy compiler behaviors that can't be directly modeled on top of the simple algebra with modules and entry points.
This API is by no means feature-complete or fully developed. It is expected that we will flesh it out more when bringing up application code (e.g., Falcor) on top of the revamped API.
One notable thing that went away in this change is explicit support for "entry point groups" and notions of local root signatures (especially the Falcor-specific handling of the `shared` keyword, which a previous change turned into an explicitly supported feature). With the new "building blocks" approach, it should be possible for a DXR application to deal with local root signatures as a matter of policy (on top of the API we provide). If/when we need to provide some kind of emulation of local root signatures for Vulkan (and/or if Vulkan is extended with an explicit notion of local root signatures), we might need to revisit this choice.
* Fix debug build
There was invalid code inside an `assert()`, so the release build didn't catch it.
* fixup: warnings
* fixup: more warnings-as-errors
* fixup: review notes
* fixup: use component type visitors in place of dynamic casting
|
|
This change adds back a little bit of explicit support for global constants in the IR, after a previous change completely removed the existing `IRGlobalConstant` node type.
The new `IRGlobalConstant` is *not* a parent instruction, and doesn't function at all like the old one.
Instead it is effectively a simple instruction that takes zero or one operands:
* The zero-operand case represents a constant with unknown value. This would usually come from another module, and thus would have an `[import(...)]` linkage decoration, so that after linking it resolves to a constant with a known value.
* In the one-operand case, the single operand represents the value of the constant, so that the operation semantically behaves like an identity function. It exists just to give decorations something to "attach" to, so that a global constant with a value can have, e.g., an `[export(...)]` decoration to establish linkage.
The IR lowering pass was updated to create the new node type to wrap any global constants. For now we do this both for global `static const` variables and function-scope `static const`, although the latter doesn't really need the extra indirection.
The IR linking logic was extended to handle linking of global constants akin to how other global instructions are handled. The new logic is mostly boilerplate, and it is likely that a refactor of the linking logic would eliminate the need for this kind of per-instruction-opcode handling of IR instructions that can have linkage.
A custom pass was added that is intended to be run right after linking (it could arguably be folded into `linkIR()`, but I thought it was safer to keep each pass as small as possible). This pass replaces any `IRGlobalConstant` that has a value (operand) with that value, so that global constants should be eliminated after the linking step. This ensures that downstream optimization/transformation passes don't have to deal with the possibility of global constants.
Almost all the existing passes would Just Work if global constants were left in the IR. The two big exceptions are:
* Anything that relies on testing `IRInst*` identity as a way to test for things having the same value would break, since a global constant is a distinct `IRInst*` from its value.
* The type legalization pass doesn't handle `IRGlobalConstant` instructions with non-simple types. This could be added if we ever wanted it, but it seemed silly to write this code now if it would always be dead (and thus untested).
I went ahead and updated the emit logic to handle an `IRGlobalConstant`s that still existing in the IR module at emit time, since the amount of code required was small so that being robust to that case seemed safest (e.g., in case we ever want to have a path that emits code directly while skipping some/all of our IR transformation passes).
There should be no visible changes to the functionality of the compiler with this change, but it should help make IR dumps from the front-end more clear/explicit (since each constant will be a distinct instruction with its own name), and paves the way for supporting proper cross-module linkage of constants.
|
|
Before this change, global and function-scope `static const` declarations were represented as instructions of type `IRGlobalConstant`, which was represented similarly to an `IRGlobalVar`: with a "body" block of instructions that compute/return the initial value.
This representation inhibited optimizations (because a reference to a global constant would not in general be replaced with a reference to its value), and also caused problems for resource type legalization because the logic for type legalization did not (and still does not) handle initializers on globals (so global *variables* that contain resource types are still unsupported).
The change here is simple at the high level: we get rid of `IRGlobalConstant` and instead handle global-scope constants as "ordinary" instructions at the global scope. E.g., if we have a declaration like:
static const int a[] = { ... }
that will be represented in the IR as a `makeArray` instruction at the global scope, referencing other global-scope instructions that represent the values in the array.
This simple choice addresses both of the main limitations. A `static const` variable of integer/float/whatever type is now represented as just a reference to the given IR value and thus enables all the same optimizations. When a `static const` variable uses a type with resources, the existing legalization logic (which can handle most of the "ordinary" instructions already) applies.
Another secondary benefit of this approach is that the hacky `IREmitMode` enumeration is no longer needed to help us special-case source code emit for `static const` variables.
Beyond just removing `IRGlobalConstant`, and updating the lowering logic to use the initializer direclty, the main change here is to the emit logic to make it properly handle "ordinary" instructions that might appear at global scope.
One open issue with this change, that could be addressed in a follow-up change, is that "extern" global constants that need to be imported from another module (but which might not have a known value when the current module is compiled) aren't supported - we don't have a way to put a linkage decoration on them. A future change might re-introduce global constants as a distinct IR instruction type that just references the value as an operand (if it is available). We would then need to replace references to an IR constant with references to its value right after linking.
|
|
* WIP: Emitting Cpp
* Added HLSLType instead of using IRInst - because they don't seem to be deduped.
* Removed need for lexer to take a String.
Added mechansim to lookup intrinsic functions on C++.
* A c/c++ cross compilation test.
* WIP Cpp output using cloning and slang types.
* More work to generate mul funcs.
* WIP: Outputting some simple C++.
* Expose findOrEmitHoistableInst to IRBuilder to aid cloning,
* Simplification for checking for BasicTypes.
Test infrastructure compiles output C++ code.
* Dot and mat/vec multiplication output.
* First pass at swizzling.
* First support for binary ops.
* Builtin binary and unary functions.
* Any and all.
* WIP adding support for other functions.
Added code to generate function signature.
* Add scalar functions to slang-cpp-prelude.h
* Support for most built in operations.
* Tested first ternary.
* Checking the emitting of corner cases functions - normalize, length, any, all, normalize, reflect.
* Check asfloat etc work.
* Fmod support.
* WIP Array handling in C++.
* First stage in being able to handl arbitrary type output for CLikeSourceEmitter
* Removed Handler/Emitter split - so can implement more easily complex type naming.
* Array passing by value first pass.
* Rename Array -> FixedArray
* Outputs structs in C++.
* Emit the thread config.
* Dimension -> TypeDimension
* SpecializedOperation -> SpecializedIntrinsic
Operation -> IntrinsicOp
Use shared impl of isNominalOp
Commented use of m_uniqueModule etc.
* Add code to test slang->cpp when compiled doesn't have errors. Does so by building shared library and exporting the entry point.
* Fix linux clang/gcc compile error about override not being specified.
* Make sure c-cross-compile is run on linux targets/smoke.
* Remove c-cross-compile.slang from smoke.
* Fix running tests/cross-compile/c-cross-compile.slang on Ubuntu 16.04
* Only add -std=c++11 for C++ source.
|
|
* Prefixing source files in source/slang with slang-
* Prefix source in source/slang with slang- prefix.
* Rename core source files with slang- prefix.
* Update project files.
* Fix problems from automatic merge.
|