| Age | Commit message (Collapse) | Author |
|
* Change handling of strings for HLSL/GLSL targets
This change switches our handling of string literals and `getStringHash` to something that is more streamlined at the cost of potentially being less general/flexible.
* `String` is now allowed as a parameter type in user-defined functions
* `getStringHash` is now allowed to apply to `String`-type values that aren't literals
* The list of strings in an IR module is now generated during IR lowering as part of lowering a string literal expression, rather than being defined by recursively walking the IR of the module looking for `getStringHash` calls. The public API still refers to these as "hashed" strings, but they are realistically now "static strings."
* When emitting code for HLSL/GLSL, the `String` type emits as `int`, and `getStringHash(x)` emits as `x`.
In terms of implementation, the choice was whether to translate `String` over to `int` in an explicit IR pass, or to lump it into the emit pass. While adding the logic to emit clutters up an already complicated step, it is ultimately much easier to make the change there than to write a clean IR pass to eliminate all `String` use.
Note that other targets that can handle a more full-featured `String` type are *not* addressed by this change and thus do not support `String` at all. It may be woth emitting `String` as `const char*` on those targets, and emitting string literals directly, but the `getStringHash` function would need to be implemented in the "prelude" then, and we probably want to pick a well-known/-documented hash algorithm before we go that far.
This change also brings along some some clean-ups to the `gpu-printing` example, since it can now take advantage of the new functionality of `String`.
* Fix up tests for new string handling
* Add global string literal list to string-literal test (since we now list *all* static string literals and not just those passed to `getStringHash`)
* Disable `getStringHash` test on CPU, since we don't have a working `String` on that platform right now (only HLSL/GLSL)
Co-authored-by: Tim Foley <tim.foley.is@gmail.com>
|
|
* WIP getStringHash
* Have a use.
* Add slang-string-hash.h/.cpp
* Use StringSlicePool for holding strings for StringHash.
Add outputBuffer to string-literal-hash.slang so value can be tested.
Ignore the GlobalHashedStringLiterals instruction on emit.
* Add all the hashed string literals to ProgramLayout.
* Add reflection support for hashed string literals to reflection test.
* Fix string literal hash test.
* Small fixes to pass test suite.
* Fix issue in serialization where IRUse is not correctly initialized.
* Fix problem initializing IRUse for string hash pass.
Remove hack from slang-ir-specialize - specially handling if user is not null.
* * Use shared builder when replacing getStringHash
* Comments for functions in slang-ir-string-hash
* Do not allow zero length string literals. Could be allowed, but doing so would require StringSlicePool to have a special case (or some other mechanism)
|
|
* Use name hint on EntryPoint naming.
* Placed the entry point name on the EntryPointDecoration.
|
|
* Split out EntryPointParamDecoration.
* Add profile to EntryPointDecoration.
* WIP for GS handling for GLSL.
* WIP for StreamOut GLSL
* Fixed GLSL geometry output.
* Clean up - remove unneeded/commented out code from the entry point change.
* Use Op nums to identify GeometryTypeDecorations (as opposed to contained enum).
|
|
* IROutputControlPointsDecoration
* IROutputTopologyDecoration
* IRPartitioningDecoration
* IRDomainDecoration
* Use IRPatchConstantDecoration alone for hlsl output.
* IRMaxVertexCountDecoration
* IRInstanceDecoration
* Removed _emitHLSLAttributeSingleString and _emitHLSLAttributeSingleInt
Removed GLSLBindingAttribute and just use NumThreadsAttribute
* Added IRNumThreadsDecoration.
* Added IRNumThreadsDecoration
* Fix build problem on x86.
Improve diagnostic text based on review.
|
|
* Split front- and back-ends
This change is a major refactor of several of the types that provide the behind-the-scenes implementation of the public C API.
The goal of this refactor is primarily to allow for future API services that let the user operate both the front- and back-ends of the compiler in a more complex fashion.
For example, as user should be able to compile a bunch of source code into modules, look up types, functions, etc. in those modules, specialize generic types/functions to the types they've looked up, and then finally request target code to be gernerated for specialized entry points.
The back-end code generation they trigger should re-use the front-end compilation work (parsing, semantic checking, IR generation) that was already performed.
The most visible change is that `CompileRequest` has been split up into several smaller types that take responsibility for parts of what it did:
* The `Linkage` type owns the storage for `import`ed modules, and well as the `TargetRequest`s that represent code-generation targets. The intention is that an application could use a single `Linkage` for the duration of its runtime (so long as it was okay with the memory usage), so that each `import`ed module only gets loaded once. For now, this type needs to manage the search paths, file system, and source manager, because of its responsibility for loading files.
* A `FrontEndCompileRequest` owns the stuff related to parsing, semantic checking, and initial IR generation. This most notably includes the `TranslationUnitRequest`s and the `FrontEndEntryPointRequest`s (which used to be just `EntryPointRequest`s). It's main job is to produce AST and IR modules for each translation unit, and to find and validate the entry points. The front-end request does *not* interact with generic arguments for global or entry-point generic parameters.
* The main output of both `import` operations and front-end translation units is the `Module` type, which is just a simple container for both the AST module (to service the reflection/layout APIs, and also for semantic checking of code that `import`s the module) and the IR module (for linking and code generation). This type captures the commonalities between the old `LoadedModule` (which is now just an alias for `Module`) and `TranslationUnitRequest` (which now owns a `Module`).
* The secondary output of front-end compilation is a `Program`, which comprises a list of referenced `Module`s and validated `EntryPoint`s that will be used together. Layout and code generation both need a `Program` to tell them what modules and entry points will be used together (we don't want to just code-gen everythin that has ever been loaded into the linakge). The `Program`s created by the front-end do not include generic arguments, so they may provide incomplete layout information and/or be unsuitable for code generation.
* A `BackEndCompileRequest` owns stuff related to turning a `Program` into output kernels for the targets of a `Linkage`. Most of the data it owns beyond the `Program` to be compiled is minor, so this is a good candidate for demotion from a heap-allocated object to just a `struct` of options that gets passed around.
* The `CompileRequestBase` type is an attempt to wrap up the common functionality of both front-end and back-end compile requests. Most of it is just exposing the availability of a linkage and `DiagnosticSink`, so this type is a good candidate for subsequent removal. The main interesting thing it has is the flags related to dumping and validation of IR, so there is probably a good refactoring still to be made around deciding how options should be handled going forward.
* Behind the scenes, the `Program` type is set up to handle some level of on-line compilation and layout work. The `Program` knows the `Linkage` it belongs to, and allows for a `TargetProgram` to be looked up based on a specific `TargetRequest`. A `TargetProgram` then allows layout information and compiled kernel code to be asked for on-demand, in order to support eventual "live" compilation scenarios.
* The `EndToEndCompileRequest` type is a composition/coordination type that replaces the old `CompileRequest` in a way that uses the services of the various other types. It owns a few pieces of state that only make sense in the context of an end-to-end compile (e.g., there is really no way to "pass through" code when the front- and back-ends are run separately) or a command-line compile (everything to do with specifying output paths for files is really just for the benefit of `slangc`, and might even be moved there over time).
* One important detail is that the `EndToEndCompilRequest` owns all of the string-based generic arguments for both global and entry-point generic parameters. The logic in `check.cpp` for dealing with those arguments has been heavily refactored to separate out the parsings steps that are specific to end-to-end compilation with string-based type arguments, and the semantic checking steps that result in a specialized `Program` (which can be exposed through new APIs that aren't tied to end-to-end compilation).
It is perhaps not surprising that this change had a lot of consequences, so I'll briefly run over some of the main categories of changes required:
* I changed the way that global generic arguments are passed via API (use `spSetGlobalGenericArgs` instead of the generic arguments for `spAddEntryPointEx`, which are not just for entry-point generics), which has been a change that we've needed for a long time. This is technically a breaking API change, although we should have very few client applications that care about it.
* A bunch of places that used to take "big" objects like `CompileRequest` now just take the sub-pieces they care about (e.g., a function might have only needed a `Linkage` and a `DiagnosticSink`). This makes many subroutines or "context" struct types more generally useful, at the cost of taking more parameters.
* In a few cases the conceptually clean separation of the layers breaks down (often for edge-case or compatibility features), and so we may pass along additional objects that are allowed to be null, but are used when present. A big example of this is how the back-end code generation routines accept an `EndToEndCompileRequest` that is optional, and only used to check whether "pass through" compilation is needed. We should probably look into cleaning this kind of logic up over time so that we don't need to violate the apparent separation of phases of compilation.
* In cases where separation of layers was being broken for the sake of GLSL features, I went ahead and ripped them out, since all of that should be dead code anyway.
* In many cases I increased the encapsulation of data in the core types to help track down use sites and make sure they are following invariants better.
* In cases where code was doing, e.g., `context->shared->compileRequest->session->getThing()` I have tried to introduce convenience routines so that the usage site is just `context->getThing()` to improve encapsulation and allow changes to be made more easily going forward.
* The `noteInternalErrorLoc` functionality was moved off of the compile request and into `DiagnosticSink`, since that is the one type you can rely on having around when you want to note an internal error. We may consider going forward if (and how) it should reset the counter used for noting locations on internal errors.
* A few APIs now take `DiagnosticSink*` arguments where they didn't before, and as a result some public APIs need to create `DiagnosticSink`s to pass in, before going ahead and ignoring the messages. In the future there should be variations of these APIs that accept an `ISlangBlob**` parameter for the output.
* fixup: missing include for compilers with accurate template checking (non-VS)
* fixup: review feedback
|
|
* Specialize away resource-type function parameters
Work on #397.
Introduction
------------
Suppose a user writes a function that takes a resource type as a parameter:
```hlsl
float4 getThing(RWStructuredBuffer<float4> buffer, int index)
{
return buffer[index];
}
```
This function creates challenges when generating code for GLSL-based targets, because a global shader parameter of type `RWStructuredBuffer`:
```hlsl
RWStructuredBuffer<float4> gBuffer;
```
translates to a global GLSL `buffer` declaration:
```hlsl
buffer _S0
{
float4 _data[];
} gBuffer;
```
There is no equivalent to that `buffer` declaration that can be used in function parameter position, and it is illegal in GLSL to pass `gBuffer` into a function.
(Aside: yes, we could in principle translate a function parameter like `RWStructuredBuffer<float4> buffer` to `float4 buffer[]`, but that will not in turn generalize to arrays of structured buffers; it is a dead-end strategy)
The solution employed by many shader compilers is to "inline everything" to eliminate the need for parameters of resource types, and then rely on dataflow optimization to eliminate locals of resource types. This strategy can of course lead to an increase in code size, and it also means that call stacks are lost when doing step-through debugging. Another serious issue is that an "early `return`" from a function can turn into the equivalent of a multi-level `break` when inlined, and not all of our targets support multi-level `break`.
The solution implemented in this change works around some, but not all, of the problems with full inlining.
The approach here generates specialized versions of a function like `getThing`, adapted to the actual arguments provided at different call sites.
Thus if we have code like:
```hlsl
RWStructuredBuffer<float4> gA;
RWStructuredBuffer<float4> gB[10];
...
getThing(gA, x);
getThing(gA, y);
getThing(gB[someVal], z);
```
we will generate two specializations of `getThing`: one specialized for the `buffer` parameter being `gA` and the other for `gB`:
```hlsl
float4 getThing_gA(int index) { return gA[index]; }
float4 getThing_gB(int _val, int index) { return gB[_val][index]; }
```
and the call sites will change to match:
```hlsl
getThing_gA(x);
getThing_gA(y);
getThing_gB(someVal, z);
```
Note how in the case where the argument being passed in was obtained by indexing into an array of resources, the callee is specialized to the identity of the global shader parameter (`gB`), and now accepts a new parameter to indicate the array index into it.
While this description motivates the change based on GLSL output, the same basic issue can arise for other targets.
For example, while current HLSL has added the `ConstantBuffer<T>` type, it is not supported on older targets, and it turns out that even dxc does not allow functions to have `ConstantBuffer<T>` parameters.
Longer-term, we will likely need to do even more aggressive specialization both in order to generate SPIR-V output directly, and also to deal with function that have return values or `out` parameters of resource types.
Implementation
--------------
The meat of the change is in `ir-specialize-resources.{h,cpp}`, where we have a pass that looks at all call sites (`IRCall` instructions) in the program, and attempts to replace them with calls to specialized functions, where the specializations are generated on-demand.
The code in this pass is heavily commented, so hopefully it serves to explain itself all right.
After specialization is complete, we may still have functions like the original `getThing` that will produce invalid code when emitted as GLSL, so we need a way to make sure they don't appear in the output.
To date we've had some very ad hoc approaches for ignoring IR constructs that we don't want to affect emitted code, but this change goes ahead and adds a more real dead code elimination (DCE) pass in `ir-dce.{h,cpp}`.
This pass follows a straightforward approach of tagging instructions that are "live" and then propagating liveness through the whole program, before making a single pass to delete anything that isn't live.
When I first added the DCE pass it eliminated *everything* because there were no "roots" for liveness.
I solved this for now by adding a new decoration, `IREntryPointDecoration`, to mark shader entry points in the IR which should always be live (as should anything they depend on).
A secondary problem that arose was that for GLSL ray tracing shaders it is possible for the incoming/outgoing payload or attributes parameters to be unused, but eliminating them as dead would change the signature of a shader an potential break the rules for how ray tracing programs communicate.
I added a very simple `IRDependsOnDecoration` that allows one IR instruction to keep another alive *as if* it used it, without actually using it.
There's also a fixup in the IR dumping logic where I was forgetting to store anything in the mapping from instruction to their names, so that the name of an instruction was getting incremented each time it was referenced.
Testing
-------
There are three different tests added as part of this change:
* The `compute/func-resource-param` test covers the basic `RWStructuredBuffer` case above, which we expect to work fine for D3D11/12, but fail for Vulkan without specialization.
* The `cross-compile/func-resource-param-array` test covers the case where we don't just have one resource, but an array of them. This is not an end-to-end compute test primarily because our `render-test` application doesn't yet handle arrays of resources correctly in its binding logic.
* The `compute/func-cbuffer-param` test covers the case of a function with a `ConstantBuffer<T>` parameter, which requires specialization to become valid for any of our targets.
* fixup: warnings/errors from other compilers
* fixup: typos and cleanup
* fixup: typos
|
|
* Move mangled name out of IRGlobalValue
Previously the `IRGlobalValue` type was used as a root for all IR instructions that can have "linkage," in the sense that a definition in one module can satisfy a use in another module.
The mangled symbol name was stored in state directly on each `IRGlobalValue`, which created some complications, and also forced IR instructions that wanted to support linkage to wedge into the hierarchy at that specific point.
This change moves the mangled name out into a decoration: either an `IRImportDecoration` or an `IRExportDecoration`, both of which inherit from `IRLinkageDecoration` which exposes the mangled name.
This change has a few benefits:
* We can now have any kind of instruction be exported/imported, without having to inherit from `IRGlobalValue`. This could potentially let `IRStructType` and `IRWitnessTable` be simplified to just have operand lists instead of dummy chldren as they do today.
* We can now easily have "global values" like functions that explicitly *don't* get linkage, instead of using a null or empty mangled name as a marker.
* We can use the exact opcode on a linkage decoration to distinguish imports from exports, which could be used to more accurately resolve symbols during the linking step.
Other than adding the decorations and making sure that AST->IR lowering adds them, the main changes here are around any code that used `IRGlobalValue`. Variables and parameters of type `IRGlobalValue*` were changed to `IRInst*` easily, so the main challenge was around code that *casts* to `IRGlobalValue*.
In cases where a cast to `IRGlobalValue` also performed a test for the mangled name being non-null/non-empty, we simply switched the code to check for the presence of an `IRLinkageDecoration`, since that is the new way of indicating a value with linakge.
Most of the serious complications arose in `ir.cpp` around the "linking"/target-specialization and generic specialization steps.
The "linking" logic was checking for `IRGlobalValue` to opt into some more complicated cloning logic, and just checking for a linkage decoration here wasn't sufficient since the front-end *does* produce global values without linkage in some cases (e.g., for a function-`static` variable we produce a global variable without linkage). This logic was updated to just check for the cases that used to amount to `IRGlobalValue`s directly by opcode. It might be simpler in the short term to have kept `IRGlobalValue` around to make the existing casts Just Work, but I'm confident that this logic could actually be rewritten for much greater clarity and simplicity and that is the better way forward.
The generic specialization logic was using some really messy code to generate a new mangled name to represent the specialized symbol, and then searching for an existing match for that name.
The original idea there was that an IR module could include "pre-specialized" versions of certain generics to speed up back-end compilation by eliminating the need to specialize in some cases, but this feature has never been implemented so the overhead here is just a waste.
Instead, I moved generic specialization to use a simpler dictionary to map the operands to a `specialize` instruction over to the resulting specialized value.
This allows for some simplifications in the name mangling logic, because it no longer needs to figure out how to produce mangled names from IR instructions representing types/values.
As part of this change I also overhauled the IR emit logic to produce cleaner output by default, borrowing some of the ideas from the logic in `emit.cpp`. IR values are now automatically given names based on their "name hint" decoration, if any, to make the code easier to follow, and I also made it so that types and literals get collapsed into their use sites in a new "simplified" IR dump mode (which is currently the default, with no way to opt into the other mode without tweaking the code). The resulting IR dumps are much nicer to look at, but as a result the one test that involves IR dumping (`ir/string-literal`) doesn't really test what it used to.
One weird issue that came up during testing is that the `transitive-interface` test had previously been producing output that made no sense (that is, the expected output file wasn't really sensible), and somehow these changes were altering its behavior. Changing the test to use `int` values instead of `float` was enough to make the output be what I'd expect, and hand inspection of generating DXBC has me convinced we were compiling the `float` case correctly too. There appears to be some issue around tests with floating-point outputs that we should investigate.
* fixup: C++ declaration order
|
|
* Make a test case use IR serialization
* Make all IR instructions usable as parents
This makes it so that every `IRInst` has the list of children that used to be on `IRParentInst` and eliminates `IRParentInst`.
Most places in the code were only checking against `IRParentInst` so that they could know whether there were child instructions to iterate over.
This change bloats the size of every instruction by two pointers, but we hope to be able to eliminate that overhead with a better encoding later.
* Change IR decorations to be instructions.
The main change here is that `IRDecoration` now inherits from `IRInst`, and `IRInst` now has a single linked list that holds both decorations *and* children.
At each point where code used to loop over `getChildren()` on an `IRInst`, I checked whether it made sense to leave the operation as processing just the children, or if it should process both decorations and children.
The thorniest bit was making sure the logic for inserting an instruction into a parent is correct. For the most part, once IR code is built all insertions are explicitly before/after another instruction, so the ordering can't get messed up. The sticking point is any code that does an explicit `insertAtStart` or `insertAtEnd`, but I surveyed those to make sure they are correct in context, and I also made all insertions bottleneck through one routine that does a better job of asserting the preconditions than what was there before. We may still want a "smart" insertion function at some point so that if somebody does `someDecoration->insertAtEnd(someInst)` the decoration intelligently goes to the end of the decoration list, and not the entire decorations-and-children list.
All of the existing decoration types were refactored to provide accessors for their operands, rather than directly exposing fields. In most cases the operands are required to be `IRConstant` nodes of fixed types. Not all of these types need to be kept around in the new approach, but they were left in so that as much existing code as possible can be kept working.
The `IRBuilder` was extended with factory functions to make the various decoration types and attach them.
All the fields in concrete decorations that were using `StringRepresentation` or `Name` pointers are now using IR-level string operands which provide their value as an `UnownedStringSlice`, so logic that was working with those decoration values needed to be updated here and there. I also needed to add the logic to clone string-literal values to the IR cloning pass, since they are now being used in almost every piece of code.
A new type of constant IR instruction for literal pointers was added, to handle the cases where an IR decoration needs an operand that is a raw AST-level pointer. These are even being serialized, although we obviously should not rely on them to round-trip through serialization in the future. Ideally, a follow-on change should add a cleanup pass where we remove any decorations from a module that shouldn't be allowed in the serialized code.
The biggest overall cleanup is in the serialization logic, where a lot of code just disappears because it can process the raw "decorations and children" list as the logical children of an IR instruction. The only special cases left are literals (which seem like they will always need special-casing) and global values (because they have a mangled name, which we plan to move into a decoration).
One other example of a simplification made possible by this change: the `IRNotePatchConstantFunc` instruction was implemented as an instruction only because it couldn't be encoded as a decoration at the time (it needed to have an operand that referenced an IR function).
The IR dumping logic was also updated (which meant a change to the `ir/string-literal` test) to try to make it print out all decorations a bit more systematically now that they are encoded like other instructions. The formatting isn't quite perfect, but it is good enough to be able to read what is going on.
I didn't include updates to the validation logic to ensure that decorations are being added in ways that follow the invariants, but that would be a nice thing to add next.
* fixup: 64-bit issues
* fixup: forward declaration issues
|
|
* Rework command-line options handling for entry points and targets
Overview:
* The biggest functionality change is that the implicit ordering constraints when multiple `-entry` options are reversed: any `-stage` option affects the `-entry` to its *left* instead of to its *right* as it used to. This is technically a breaking change, but I expect most users aren't using this feature.
* The options parsing tries to handle profile versions and stages as distinct data (rather than using the combined `Profile` type all over), and treats a `-profile` option that specifies both a profile version and a stage (e.g., `-profile ps_5_0`) as if it were sugar for both a `-profile` and a `-stage` (e.g., `-profile sm_5_0 -stage fragment`).
* We now technically handle multiple `-target` options in one invocation of `-slangc`, but do not advertise that fact in the documentation because it might be confusing for users. Similar to the relationship between `-stage` and `-entry`, any `-profile` option affects the most recent `-target` option unless there is only one `-target`.
* The logic for associating `-o` options with corresponding entry points and targets has been beefed up. The rule is that a `-o` option for a compiled kernel binds to the entry point to its left, unless there is only one entry point (just like for `-stage`). The associated target for a `-o` option is found via a search, however, because otherwise it would be impossible to specify `-o` options for both SPIR-V and DXIL in one pass.
* The handling of output paths for entry points in the internal compiler structures was changed, because previously it could only handle one output path per entry point (even when there are multiple targets). The new logic builds up a per-target mapping from an entry point to its desired output path (if any).
Details:
* Support for formatting profile versions, stages, and compile targets (formats) was added to diagnostic printing, so that we can make better error messages. This is fairly ad hoc, and it would be nice to have all of the string<->enum stuff be more data-driven throughout the codebase.
* Test cases were added for (almost) all of the error conditions in the current options validation. The main one that is missing is around specifying an `-entry` option before any source file when compiling multiple files. This is because the test runner is putting the source file name first on the command line automatically, so we can't reproduce that case.
* Several reflection-related tests now reflect entry points where they didn't before, because the logic for detecting when to infer a default `main` entry point have been made more loose
* On the dxc path, beefed up the handling of mapping from Slang `Profile`s to the coresponding string to use when invoking dxc.
* A bunch of tests cases were in violation of the newly imposed rules, so those needed to be cleaned up.
* There were also a bunch of test cases that had accidentally gotten "disabled" at some point because there were comparing output from `slangc` both with and without a `-pass-through` option, but that meant that any errors in command-line parsing produced the *same* error output in both the Slang and pass-through cases. This change updates `slang-test` to always expect a successful run for these tests, and then manually updates or disables the various test cases that are affected.
* When merging the updated test for matrix layout mode, I found that the new command-line logic was failing to propagate a matrix layout mode passed to `render-test` into the compiler. This was because the `-matrix-layout*` options were implemented as per-target, but the target was being set by API while the option came in via command line (passed through the API). It seems like we want matrix layout mode to be a global option anyway (rather than per-target), so I made that change here.
* Add missing expected output files
* A 64-bit fix
* Remove commented-out code noted in review
|
|
* * Added support for strings in IR with IRStringLit - with storage of chars after it
* Added kIRDecorationOp_Transitory - can be used for detecting instructions constructed on stack
* Made IRConstant hashing work off type
* Fix comment that is out of date about how an instruction is determines to hold a transitory string.
|
|
* Introduce an IR-level type system
Up to this point, the Slang IR has used the front-end type system to represent types in the IR.
As a result (but ultimately more importantly) the IR representation of generics and specialization has used AST-level concepts embedded in the IR.
For example, to express the specialization of `vector<T,N>` to a concrete type `float` for `T`, we needed an IR operation that could represent the specialization, with operands that somehow represented the type argument `float`.
The whole thing was very complicated.
The big idea of this change is to introduce a new representation in which types in the IR are just ordinary instructions, so that using them as operands makes sense. The hierarchy of IR types closely mirrors the AST-side hierarchy for now, and that will probably be something we should maintain going forward.
In order to make these changes work, though, I also had to do major overhauls of things like the way substitutions are performed, how we check interface conformances, the way lookup through interface types is done, etc. etc. This is a big change, and unfortunately any attempt to summarize it in the commit message wouldn't do it justice.
* Fix 64-bit build warning
* Fix up some clang warnings/errors
|
|
The main practical change here is that things that used to be `IRValue`s, like literals, are now being expressed as instructions in the global scope.
In order to validate that things are actually being handled correctly, this change introduces an explicit "validation" pass that can be run on the IR to check for different invariants (although it doesn't check many of the important ones right now). I've left the validation pass turned off by default, but with a command-line flag to enable it. We may want to make it be on by default in debug builds, just to keep us honest. The main invariant for the moment is that when on IR instruction is used as an operand to another, it had better come from the same IR module.
Some of the existing passes were violating this rule, in particular when it came to cloning of witness tables related to global generic parameter substitution. Those features can in theory be handled better now by allowing `specialize` instructions at other scopes, but I didn't want to over-complicate this change, so I make just enough fixes to ensure that these steps always clone witness tables they get from the "symbols" on an IR specialization context. In order for this to work when recursively specializing, I had to ensure that the logic for generic specialization had a notion of a "parent" specialization context that it would fall back to to perform cloning when necessary.
This change keeps the logic that was caching and re-using the instructions for literal values within a module, but adds some logic that isn't really being tested right now for picking the right parent instruction to insert a constant instruction into. This logic doesn't trigger right now because all of the cases we are using it on have zero operands (and so they always get "hoisted" to the global scope), but eventually for things like types we want to be able to support instructions with operands (e.g., `vector<float, 4>`) and handle the case where some of those operands come from different scopes (e.g., when nested inside a generic).
The final change here is mostly cosmetic: the `IRBuilder` is now more abstract about where insertion occurs: it tracks a single `IRParentInst` to insert into, and then an optional `IRInst` to insert before. In the common case, that parent is an `IRBlock`, but it could conceivably also be the global scope, or a witness table, etc. Use sites where we used to change those fields directly now use distinct methods `setInsertInto(parent)` and `setInsertBefore(inst)` which capture the two cases we care about. Accessors are also defined to extract the current block (if the current parent is a block), and the current "function" (global value with code, if the current parent is a global value with code, or a block inside one).
With this work in place, it should be possible for a follow-on change to start putting `specialize` instructions at the global scope and thus clean up some of the on-the-fly specialization work. This work should also help with some of the requirements around a distinct IR-level type system and more explicit generics.
|
|
* Initial work on validating "constexpr"-ness in IR
The underlying issue here is that certain operations in the target shading languages constrain their operands to be compile-time constants. A notable example is the optional texel offset parameter to the `Texture2D.Sample` operation.
When calling these operations in GLSL, the user is required to pass a "constant expression," and any variables in that expression must therefore be marked with the `const` qualifier (and themselves be initialized with constant expressions). Any GLSL output we generate must of course respect these rules.
When calling these operations in HLSL, the user is not so constrained. Instead, they can pass an arbitrary expression, which may involve ordinary variables with no particular markup, and then the compiler is responsible for determining if the actual value after simplification works out to be a constant. In some cases, the requirement that a value be constant might actually trigger things like loop unrolling. Also, it is okay to use a function parameter to determine such a constant expression, as long as the argument turns out to be a constant at all call sites.
The way we have decided to tackle these challenges in Slang is that we we propagate a notion of `constexpr`-ness through the IR. This is currently being tackled in `ir-constexpr.cpp` with a combination of forward and backward iterative dataflow:
* When the operands to an instruction are all `constexpr`, and the opcode is one we believe can be constant-folded, then we infer that the instruction *can* be evaluated as `constexpr`
* When instruction is required to be `constexpr`, then we infer that all of its operands are also required to be `constexpr`.
If this process ever infers that a function parameter is required to be `constexpr`, then we might have to continue propagation at all the call sites to that function.
If after all the propagation is done, there are any cases where an instruction is *required* to be `constexpr`, but it *can't* be `constexpr` (we weren't able to infer `constexpr`-ness for its operands), then we issue an error.
This implementation encodes the idea of `constexpr`-ness in the IR as part of the type system, using a simplified notion of rates. This change adds a `RateQualifiedType` that can represent `@R T`, and then introduces a `ConstExprRate` that can be used for `R`. Many accessors for the type information on IR nodes were updated to distinguish when one wants the "full" type of an IR value (which might include rate information) vs. just the "data" type.
A `constexpr` qualifier was added in the front-end, and is being used to decorate the texel offset parameter for `Texture2D.Sample`. Lowering from AST to IR looks for this qalifier and infers when a function parameter must be typed as `@ConstExpr T` instead of just `T`.
There are lots of limitations and gotchas in the implementation so far:
* The `@ConstExpr` rate is the only one added in this change, but it seems clear that the conceptual `ThreadGroup` rate that was added to represent `groupshared` should probably get folded into the representation.
* I'm not 100% pleased with how many places in the IR I have to special-case for rate-qualified types. At the same type, pulling out rate as a distinct field on `IRValue` would probably require that we pay attention to rate everywhere.
* I've added a test case to show that we can issue errors when users fail to provide a constant expression for the texel offset, but the actual error message isn't great because it doesn't indicate *why* a constant expression was required. Realistically the "initial IR" should contain a few more decorations we can use to relate error conditions back to the original code (even if this is in a side-band structure).
* I've added a test case that is supposed to show that we can back-propagate `constexpr`-ness to local variables, and I've manually confirmed that it works for Vulkan/SPIR-V output, but the level of Vulkan support in `render_test` today means I can't enable the test for check-in.
* While I'm attempting to propagate `@ConstExpr` information from callees to callers, I haven't implemented any logic to specialize callee functions based on values at call sites.
* In a similar vein, there is no handling of control-flow dependence in the current code. If we infer that a phi (block parameter) needs to be `@ConstExpr`, then it isn't actually enough to require that the inputs to the phi (arguments from predecessor blocks) are all `@ConstExpr` because we also need any control-flow decisions that pick which incoming edge we take to be `@ConstExpr` as well.
* As a practical matter, implicit propagation of `@ConstExpr` from a function body to a function parameter should only be allowed for functions that are "local" to a module. Any function that might be accessed from outside of a module should really have had its `@ConstExpr` parameter marked manually, and our pass should validate that they follow their own rules. Right now we have no kind of visibility (`public` vs `private`) system, so I'm kind of ignoring this issue.
While that is a lot of gaps, this is also just enough code to get the Falcor MultiPassPostProcess example working, so I'm inclined to get it checked in.
* Fixup: missing expected output for test
* Fixup: disable test that relies on [unroll] for now
|
|
* Re-define deprecated compile flags
By including these flags in the header file, with a value of zero, we can allow some existing code to compile even after the major changes to the implementation.
* The `SLANG_COMPILE_FLAG_NO_CHECKING` option will effectively be ignored, since checking is always enabled.
* The `SLANG_COMPILE_FLAG_SPLIT_MIXED_TYPES` option will now act as if it is always enabled (and indeed some of the code has been relying on this flag being set always).
* Make subscript operators writable for writable textures
This even had a `TODO` comment saying that we needed to fix it, and now I'm seeing semantic checking failures because we didn't define these and so we find assignment to non l-values.
* Fix definitions of any() and all() intrinsics
These should always return a scalar `bool` value, but they were being defined wrong in two ways:
1. They were using their generic type parameter `T` in the return type
2. They were returning a vector in the vector case, and a matrix in the matrix case.
This change just alters the return type to be `bool` in all cases.
* Fix bug in SSA construction
When eliminating a trivial phi node, it is possible that the phi is still recorded as the "latest" value for a local variable in its block.
When later code queries that value from the block (which can happen whenever another block looks up a variable in its predecessors), it would get the old phi and not the replacement value.
I simply added a loop that checks if the value we look up is a phi that got replaced, and then continues with the replacement value (which might itself be a phi...). A more advanced solution might try to get clever and have the map itself hold `IRUse` values so that we can replace them seamlessly.
* Simplify IR control flow representation
This change gets rid of various special-case operations for conditional and unconditional branches, and instead requires emit logic to recognize when a direct branch is targetting a `break` or `continue` label.
The new approach here isn't perfect, but it seems beter than what we had before, because it can actually work in the presence of control-flow optimizations (including our current critical-edge-splitting step).
* Load from groupshared isn't groupshared
When loading from a `groupshared` variable, the resulting temporary shouldn't have the `groupshared` qualifier on it.
This might eventually need to generalize to a better understanding of storage modifiers in the IR, but I don't really want to deal with that right now.
* Don't emit references to typedefs in output code
Now that we are using the IR for all codegen, we shouldn't be dealing with surface-level things like `typedef` declarations in the output code; just use the type that was being referred to in the first place.
* Fix floating-point literal printing for IR
The IR was calling `emit()` instead of `Emit()` (we really need to normalize our convention here), and was implicitly invoking a default constructor on `String` that takes a `double` (that constructor should really be marked `explicit`), and which doesn't meet our requirements for printing floating-point values.
* Fix error when importing module that doesn't parse
We already added a case to bail out if semantic checking fails, but neglected to add a case if there is an error during parsing of a module to be imported.
Note: this logic doesn't correctly register the module as being loaded (but still in error), so users could see multiple error messages if there are multiple `import`s for the same module.
* Improve error message for overload resolution failure
- Drop debugging info from the candidate printing
- Add cases to print `double` and `half` types properly
* Fixup: switch loopTest to ifElse in expected IR output
|
|
* Generate SSA form for IR functions
The basic idea here is simple: in the front-end after we have lowered the AST to initial IR we will apply a set of "mandatory" optimization passes. The first of these is to attempt to translate the all functions into SSA form so that they are amenable to subsequent dataflow optimizations. Eventually, the mandatory optimization passes would include diagnostic passes that make sure variables aren't used when undefined, etc.
Just doing basic SSA generation already cleans up a lot of the messiness in our IR today, because constructs that used to involve many local variables can now be handled via SSA temporaries.
The implementation of SSA generation is in `ir-ssa.cpp`, and it follows the approach of Braun et al.'s "Simple and Efficient Construction of Static Single Assignment Form." I used this instead of the more well-known Cytron et al. algorithm because Braun's algorith mis very simple to code, and does not require auxiliary analyses to generate the dominance frontier.
The main wrinkle in our SSA representation right now is that instead of using ordinary phi nodes, we instead allow basic blocks to have parameters, where predecessor blocks pass in different parameter values. This encodes information equivalent to traditional phi nodes, but has two (small) benefits:
1. There is no fixed relationship between the order of phi operands and predecessor blocks, so we don't have to worry about breaking the phis when we alter the order in which predecessors are stored. This is important for us because predecessors are being stored implicitly.
2. It is easy to operationalize a "branch with arguments" either when lowering to other languages, or when interpreting the IR. A branch with arguments is implemented as a sequence of stores from the arguments to the parameters of the target block (very similar to a call), followed by a jump to the block.
Relevant to the above, this change also adds an interface for enumerating the predecessors or successors of a block in our CFG. Rather than use an auxliary structure, we directly use the information already encoded in the IR:
* The sucessors of a block are the target label operands of its terminator instruction. In our IR this is a contiguous range of `IRUse`s, possible with a stride (to account for the way `switch` interleaves values and blocks).
* The predecessors of a block are a subset of the uses of the block's value. Specifically, they are any uses that are on a terminator instruction, and within the range of values that represent the successor list of that instruction.
One important limitation of the "blocks with arguments" model for handling phis is that it is really only convenient to stash extra arguments on an unconditional terminator instruction. This change works around this prob lem by breaking any "critical edges" - edges between a block with multiple successors and one with multiple predecessors. We assume that "phi" nodes will only ever be needed on a block with multiple predecessors, and because critical edges are broken, each of these predecessors will then have only a single successor, so its branch instruction can handle the extra arguments.
This change introduces a notion of an "undefined" instruction in the IR. This is handled as an instruction rather than a value because I anticipate that we will want to distinguish different undefined values when it comes time to start issuing error messages (those messages will need to point to the variable that was used when undefined).
* Fix expected test output.
Another change was merged that enabled the `glsl-parameter-blocks` test, and its output is affected by our IR optimization work.
|
|
The basic change is simple: remove support for all code generation paths other than the IR.
There is a lot of vestigial code left, but the main logic in `ast-legalize.*` is gone.
Doing this breaks a *lot* of tests, for various reasons:
- We can no longer guarantee exactly matching DXBC or SPIR-V output after things pass through out IR
- Many builtins don't have matching versions defined for GLSL output via IR (even when they had versions defined via the earlier approach that worked with the AST)
- A lot of code creates intermediate values of opaque types in the IR, which turn into opaque-type temporaries that aren't allowed (this breaks many GLSL tests, but also some HLSL)
I implemented some small fixes for issues that I could get working in the time I had, but most of the above are larger than made sense to fix in this commit.
For now I'm disabling the tests that cause problems, but we will need to make a concerted effort to get things working on this new substrate if we are going to make good on our goals.
|
|
* IR: fixes for subscript accessors
Fixes #320
This is a bunch of fixes for handling of `__subscript` operations on builtin types (notably `RWStructuredBuffer` and `StructuredBuffer` at this point).
- Automatically add a `GetterDecl` to any subscript decalratio was declithout any accessors. This avoids hitting a null- dereference in the emit logic.
- Add a notion of a `RefAccessor` (declared with `ref`) as a peer to getters and setters. The idea is that a `ref` accessor returns a pointer to the element data, so that it can be used for both getting and setting values. This is closer to the behavior of `RWStructuredBuffer` element access in HLSL.
- Fixes for dealing with "access chains" where there might be a combination of a subscript (where the is a `get` and `set` but no `ref`) and member access, so that we have to read the base value into a temp, modify it, and then write it back.
- This logic is still a bit of a mess, so we will eventually want to take a more consistent pass over this to deal with how we "materialize" values for setters.
- Update `RWStructuredBuffer` to have a `ref` accessor, and then fix up the IR tests to handle the new opcode that I added for it.
- Note: I didn't handle this as an intrinsic simply because the `tests/ir/*` tests aren't really set up to handle builtins with ugly mangled names.
* Fixup: type error in VM for buffer element ref
I was using the result type of the op as the element type for computing the element address, but the result type is a pointer to the real element type.
This caused test failures on 64-bit platforms, where the stride of the buffer in the `ir/factorial` test needs to be 4.
The fix is to assume the result type is a pointer, and extract the pointed-to type out of that.
|
|
- Change function mangling so we use `p<parameterCount>p` instead of just `p<parameterCount>` to avoid the parameter count running into digits at the start of a mangled type name and tripping up the un-mangling logic.
- We really need to step back at some point and define our mangling scheme a bit more carefully, especially if we are going to keep going down this road where un-mangling things is important for generating HLSL output.
- Also allow the unmangling logic to unmangle a few more cases of generic parameters, so that it can skip over them to get to the parameter count of the underlying function.
- Add a notion of an `unreachable` instruction to the IR, and emit it as the terminator (if needed) at the end of the last block for a function with a non-void return type.
- This does *not* implement any logic to emit a diagnostic if the `unreachable` turns out to be potentially reachable
- Fix a bug in IR specialization of generics where we can't create two different specializations of the same function, because both get registered in the same hash map
With all these fixes, testing in Falcor modified to use the full Slang compiler and IR for all HLSL/Slang:
- The UI and text rendering shaders yield HLSL that compiles without error; no idea if they actually *work*
- The ModelViewer shaders yield HLSL, but there are some issues (looks like type legalization isn't applying to stuff inside constant buffers)
|
|
The test case had previously been calling `GroupBarrierWithGroupSync` as if it was a special-cased instruction, but now it is just calling it as an ordinary (intrinsic) function.
I haven't removed the now-useless instruction, but it would be a good cleanup to go through and eliminate all the instruction cases we aren't using in the near future.
|
|
This change includes a lot of infrastructure work, but the main point is to allow code like the following:
```
// define an interface
interface Helper { float help(); }
// define a generic function that uses the interface
float test<T : Helper>( T t ) { return t.help(); }
// define a type that implements the interface
struct A : Helper { float help() { return 1.0 } }
// define an ordinary function that calls the
// generic function with a concrete type:
float doIt()
{
A a;
return test<A>(a);
}
```
Getting this to generate valid code involves a lot of steps. This change includes the initial version of all of these steps, but leaves a lot of gaps where more complete implementation is required.
The changes include:
- Member lookup on types has been centralized, and now handles the case where the type we are looking for a member in is a generic parameter (e.g., given `t.help()` we can now look up `help` in `Helper` by knowing that `t` is a `T` and `T` conforms to `Helper`).
- There is an obvious cleanup still to be done here where the same exact logic should be used to look up available "constructor" declarations inside a type when the type is used like a function.
- Add a notion of subtype constraint "wittnesses" to the type system. When a generic is declared as taking `<T : Helper>` it really takes two generic parameters: the type `T` and a proof that `T` conforms to `Helper`. The actual arguments to a generic will then include both the type argument and a suitable witness argument (both type-level values).
- As it stands right now, a witness wraps a `DeclRef` to the declaration that represents the appropriate subtype relationship. So if we have `struct A : Helper`, that `: Helper` part turns into an `InheritanceDecl` member, and a reference to that member can serve as a witness to the fact that `A` conforms to `Helper`.
- Make explicit generic application `G<A,B>` synthesize the additional arguments that represent conformances required by the generic.
- This does *not* yet deal with the case where a generic is implicitly specialized as part of an ordinary call `G(a,b)`
- A bug fix to not auto-specialize generics during lookup. The problem here was related to an attempted fix of an earlier issue.
During checking of a method nested in a generic type, we were running into problems where `DeclRefType::create()` was getting called on an un-specialized reference to `vector`, and this was leading to a crash when the code looked for the arguments for the generic. This was worked around by having name lookup automatically specialize any generics it runs into while going through lookup contexts.
That choice creates the problem that in a generic method like this:
```
void test<T>(T val) { ... }
```
any reference to `val` inside the body of `test` will end up getting specialized so that it is effectively `test<T>::val`, when that isn't really needed.
- Add front-end logic to check that when a type claims to conform to an interface it actually must provide the methods required by the interface. The checking process goes ahead and builds a front-end "witness table" that maps declarations in the interface being conformed to over to their concrete implementations for the type.
- At the moment the checking is completely broken and bad: it assumes that *any* member with the right name is an appropriate declaration to satisfy a requirement. That obviously needs to be fixed.
- Add an explicit operation to the IR for lookup of methods: `lookup_interface_method(w, r)` where `w` is a reference to the "witness" value and `r` is an `IRDeclRef` for the member we want to look up.
- Add an explicit notion of witness tables to the IR. These end up being the IR representation of an `InheritanceDecl` in a type, and they are generated by enumerating the members that satisfy the interface requirements (which were handily already enumerated by the front-end checking). The witness table is an explicit IR value, and so it will be referenced/used at the site where conformance is being exploited (e.g., as part of a `specialize` call), so it should be safe to eliminate witness tables that are unused (since they represent conformances that aren't actually exploited). Similarly, the entries in a witness table are uses of the functions that implement interface methods, and so keep those live.
- In order to implement the above, I did a bit of a cleanup pass on the IR representation so that there is an `IRUser` base that `IRInst` inherits from, so that we can have users of values that aren't instructions.
- One annoying thing is that because of how types and generics are handled in the IR, we needed a way to have a type-level `Val` that wraps an IR-level value: e.g., to allow an IR-level witness table to be used as one of the arguments for specialization of a generic. The design I chose here is to have a "proxy" `Val` subclass (`IRProxyVal`) that wraps an `IRValue*`. These should only ever appear as part of types and `DeclRef`s that are used by the IR.
- One annoying bit here is that an IR value might then have a use that is not manifest in the set of IR instructions, and instead only appears as part of a type somewhere.
- I'm not 100% happy with this design, but it seems like we'd have to tackle similar issues if/when we eventually allow functions to have `constexpr` or `@Constant` parameters
- Make generic specialization also propagate witness table arguments through to their use sites (this is mostly just the existing substitution machinery, once we have `IRProxyVal`), and then include logic to specialize `lookup_interface_method` instructions when their first operand is a concrete witness table.
All of this work allows a single limited test using generics with constraints to pass, but more work is needed to make the solution robust.
|
|
There are two big changes here:
- Add logic during the initial IR cloning pass for an entry point + target that tries to pick the best possible version of any target-overloaded function. This allows us to pick the intrinsic version of `saturate()` when compiling for HLSL output, but then pick the non-intrinsic version (that is implemented in terms of `clamp()`) when targetting GLSL.
- Add an initial specialization pass that tries to deal with generics. This required some fixing work to IR generation, so that we correctly generate explicit operations to specialize a generic for specific types (this is currently implemented as a `specialize` instruction that takes the generic to specialize plus a declaration-reference that represents the specialized form). With that work in place, we can scan for `specialize` instructions inside of non-generic functions, and use them to trigger generation of specialized code. We rely on the name-mangling scheme to help us find pre-existing specializations when possible.
There are also a bunch of cleanups encountered along the way:
- Don't use the explicit `layout(offset=...)` for uniforms, because it isn't supported by all current drivers. For now we will just assume that our layout rules compute the same values that the driver would for un-marked-up code. We can come back later and try to implement a workaround in the cases where this doesn't apply (e.g., by re-running the layout logic as part of emission, and dropping layout modifiers from variables that don't need explicit layout).
- Fix some issues in IR dump printing so that we print function declarations more nicely.
- Testing: print out failing pixel when image-diff fails
|
|
The big addition here is that the Slang "bytecode" is no longer treated as just a "code generation target" (`CodeGenTarget`) akin to DX bytecode (DXBC) or SPIR-V, but instead is a `ContainerFormat` that can be used to emit all the results of a compile request (well, currently just the IR-as-BC, but the intention is there).
Getting to this goal involved some prior checkins that eliminated bogus "targets" that weren't really akin to SPIR-V or DXBC: `-target slang-ir-asm` and `-target reflection-json`. Those targets were really in place to support testing, and so they've been made more explicit testing/debug options.
This change eliminates `-target slang-ir` and instead tries to allow the user to specify `-o foo.slang-module` as an output file name, that indicates the intention to output a "container" file that will wrap up all the generated code.
I've also gone ahead and generalized the existing `-target` option so that we are actually building up a *list* of code generation targets. This is largely just a cleanup, since it forces code to be more aware of when it is doing something target-specific vs. target independent. For example, reflection layout information lives on a requested target, and not on the compile request as a whole, and similarly output code is per-target, per-entry-point.
As a cleanup, I eliminated support for per-translation-unit output. This was vestigial code from back when I used to try and do HLSL generation for a whole translation unit instead of per-entry-point (which turned out to be a lot of complexity for little gain), and it was only being used in the `hello` example and the `render-test` test fixture - in both cases fixing it up was easy enough. I've stubbed out the old `spGetTranslationUnitSource` API, but haven't removed it yet.
|
|
* Get rid of the `-slang-ir-asm` target
This is really only useful for debugging, so I've replaced the functionality with a `-dump-ir` command line option (which dump's the IR for an entry point before doing codegen).
* fixup: use HLSL target, not DXBC, so test can run on Linux
|
|
* IR: overhaul IR design/implementation
Closes #192
Closes #188
This is a major overhaul of how the IR is implemented, with the primary goal of just using the AST-level type representation as the IR's type representation, rather than inventing an entire shadow set of types (as captured in issue #192).
One consequence of this choice is that types in the IR are no longer explicit "instructions" and are not represented as ordinary operands (so a bunch of `+ 1` cases end up going away when enumerating ordinary operands).
Along the way I also got rid of the embedded IDs in the IR (issue #188) because this wasn't too hard to deal with at the same time.
Another related change was to split the `IRValue` and `IRInst` cases, so that there are values that are not also instructions. Non-instruction values are now used to represent literals, references to declarations, and would eventually be used for an `undef` value if we need one. IR functions, global variables, and basic blocks are all values (because they can appear as operands), but not instructions.
The main benefit of this approach is that the top-level structure of a bytecode (BC) module is much simpler to understand and walk, and BC-level types are represented much more directly (such that we could conceivably use them for reflection soon).
* fixup: 64-bit build fix
* fixup: try to silence clang's pedantic dependent-type errors
* fixup: bug in VM loading of constants
|
|
None of these changes are made "live" at the moment. I'm just trying to get them checked in to avoid divering too far from `master` at any point during development.
- Add basic emit logic to produce GLSL from the IR in a few cases (the existing IR emit logic was ad hoc and HLSL-specific)
- When lowering a function declaration, walk up its chain of parent declarations to collect additional parameters as needed
- When lowering a call, make sure to add generic arguments that come from the declaration reference being called
- Attach a "mangled name" to symbols when lowering, so that we can eventually use that name to resolve things for linkage.
- After the above work, I had to apply some fixups to make sure that generic arguments *don't* get added when the user is calling an `__intrinsic_op` function, since those should map 1-to-1 down to instructions with just their ordinary parameter list.
A big open question right now is whether I should continue to represent the generic arguments as just part of the ordinary argument list for a function, or split them out into separate `applyGeneric` and `apply` steps.
A strongly related question is whether a declaration with generic parameters should lower into a single declaration, or one declaration nested inside an outer generic declaration.
A good future step at this point would be to eliminate a lot of the `__intrinsic_op` stuff in favor of having the builtin functions include their own definitions, which might be in terms of a new expression-level construct for writing inline IR operations. This can't be done until the existing AST-to-AST path is no longer needed for cross-compilation purposes.
More immediate next steps here:
- We need a way to round-trip calls to external declaration that get handled by this mangled-name logic. Basically, if we are asked to output HLSL and we see a call to `_S...GetDimensions...(float4, t, a, ...)` we need to be able to walk the mangled name and get back to `t.getDimensions(a, ...)` without a whole lot of manual definitions to make things round-trip.
- In the other case, where a declaration isn't built-in for the chosen target, we need to be able to load a module of target-specific definitions (which will somehow map back to symbols with certain mangled names) and then look these up (by mangled name) and then load/link/inline them into the user's IR to satisfy requirements in their code.
|
|
At a high level, this commit adds two things:
1. A "bytecode" format for serializing Slang IR instructions and related structure (functions, "registers")
2. A virtual machine that can load and then execute code in that bytecode format.
The reason for kicking off this work right now is that we *need* a way to run tests on Slang code generation that doesn't rely on having a GPU present (given that our CI runs on VM instances without GPUs), nor on textual comparison to the output of other compilers. With these features I've implemented a slapdash `slang-eval-test` test fixture that can run a (trivial) compute shader to very our compilation flow through to bytecode.
Some key design constraints/challenges:
- The bytecode format should be "position independent" so that a user can just load a blob of data and then inspect it without having to deserialize into another format, allocate memory, etc. Eventually the bytecode format might be a replacement for out current reflection API (we used to base reflection off a similar format, but the cost/benefit wasn't there at the time and we switched to just using the AST).
- The VM should be able to execute bytecode functions without doing any per-operation translation, JIT, etc. (translation of more coarse-grained symbols is okay). For now the VM is just being used to run tests, but eventually I'd like it to be viable for:
- Running Slang-based code in the context of the compiler itself. This starts with stuff like constant-folding in the front-end, but could expand to more general metaprogramming features.
- Running Slang-based ocde within a runtime application (e.g., a game engine) that wants to be able to run things like "parameter shader" code, or even just evaluate compute-like code on CPU (e.g., when supporting particles on both CPU and GPU).
- Finally, the bytecode format should ideally be able to round-trip back to the IR without unacceptable loss of information. This requirement and the previous one play off of each other, because things like a traditional SSA phi operation is ugly when you have to actually *execute* it. This doesn't matter right now when we don't have SSA yet, but it might be part of the decision-making here.
The actual implementation is centralized in `bytecode.{h,cpp}` and `vm.{h.cpp}`.
Big picture notes:
- The space of opcodes is shared between IR and bytecode (BC), with the hope that this makes translation of operations between the two easy.
- The actual bytecode instruction stream relies on a variable-length encoding for integer values, including opcodes and operand numbers, so that the common case is single-byte encoding.
- In the long term I intend to have a rule that if you use a single-byte encoding for an opcode, then all operands are required to use single-byte encodings too. Operations that need multi-byte operands would then be forced to use a multi-byte encoding of the op, and would be sent down a slower path in the interpeter.
- The "bytecode"'s outer structure is based on ordinary data structures linked with pointers, but they are "relative pointers" so the actual structure is position-independent.
- There are two main kinds of operands: registers and "constants." An operand is a signed integer where non-negatie values indicate registers (with `index == operandVal`) and negative values indicate constants (with `index == ~operandVal`).
- Registers are stored in the "stack frame" for a VM function call, and each has a fixed offset based on the size of the type and those that come before it. Conceptually, registers are allowed to overlap if they aren't live at the same time, and we manage this with a simple stack model: every register is supposed to identify the register that comes directly before it (this isn't implemented yet).
- "Constants" are more realistically a representation of "captured" values, but they are currently also how constants come in. Basically we can use a compact range of indices in the bytecode for a function, and each of these indices indirectly refers to some value in the next outer scope.
- The actual encoding of bytecode instructions right now is largely ad-hoc and very wasteful (we encode the type on everything, and we also encode everything as if it had varargs).
- In some cases, an instruction needs to know the types of the values involved (e.g., because it needs to load an array element, which means copying a number of bytes based on the size). The way the VM works we have types attached to our registers, so we currently get sneaky and look at those types in some ops. Longer term is makes sense to encode the required type info directly in the BC.
- There's a whole lot of hand-waving going on with how the actual top-level bytecode module gets loaded, because of the way we currently treat the top-level module as an instruction stream in the IR. This means that we try to represent the loaded module as a "stack frame" for a call to the module as a function, but that approach as serious problems, and isn't realistically what we want to do.
|
|
* IR: handle control flow constructs
This change includes a bunch of fixes and additions to the IR path:
- `slang-ir-assembly` is now a valid output target (so we can use it for testing)
- This uses what used to be the IR "dumping" logic, revamped to support much prettier output.
- A future change will need to add back support for less prettified output to use when actually debugging
- IR generation for `for` loops and `if` statements is supported
- HLSL output from the above control flow constructs is implemented
- Revamped the handling of l-values, and in particular work on compound ops like `+=`
- Add basic IR support for `groupshared` variables
- Add basic IR support for storing compute thread-group size
- Output semantics on entry point parameters
- This uses the AST structures to find semantics, so its still needs work
- Pass through loop unroll flags
- This is required to match `fxc` output, at least until we implement
unrolling ourselves.
* Fixup: 64-bit build issues.
* fixup for merge
|