<feed xmlns='http://www.w3.org/2005/Atom'>
<title>slang.git/source/slang/ir-inst-defs.h, branch master</title>
<subtitle>Making it easier to work with shaders</subtitle>
<id>https://git.yummers.dev/slang.git/atom?h=master</id>
<link rel='self' href='https://git.yummers.dev/slang.git/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/'/>
<updated>2019-05-31T21:20:37+00:00</updated>
<entry>
<title>Use slang- prefix on slang compiler and core source (#973)</title>
<updated>2019-05-31T21:20:37+00:00</updated>
<author>
<name>jsmall-nvidia</name>
<email>jsmall@nvidia.com</email>
</author>
<published>2019-05-31T21:20:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=6cbc3929a54d37bd23cb5efa8e3320ba02f78b2f'/>
<id>urn:sha1:6cbc3929a54d37bd23cb5efa8e3320ba02f78b2f</id>
<content type='text'>
* Prefixing source files in source/slang with slang-

* Prefix source in source/slang with slang- prefix.

* Rename core source files with slang- prefix.

* Update project files.

* Fix problems from automatic merge.
</content>
</entry>
<entry>
<title>Initial support for the `precise` keyword (#958)</title>
<updated>2019-04-29T16:31:25+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2019-04-29T16:31:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=ded340beb4b5197b559626acc39920abb2d39e77'/>
<id>urn:sha1:ded340beb4b5197b559626acc39920abb2d39e77</id>
<content type='text'>
Fixes #858

The `precise` keyword exists in both HLSL and GLSL and when applied to a variable declaration is supposed to indicate that all computations that contribute to the value of that variable should not be altered based on "fast-math" optimizations. The main examples are that separate multiply and add operations should not be turned into fused multiply-add (fma) operations, and that operations cannot ignore the possibility of infinity or not-a-number values (e.g., by assuming that `x * 0.0f` is always `0.0f`).

(Aside: it is possible that my understanding of what the semantics of `precise` are in HLSL and GLSL is imperfect so that either the GLSL variant isn't sufficient to provide the semantics of the HLSL keyword, or that the definition of "all computations that contribute" to a value isn't actually correct. We may need to revise this implementation based on subsequent learnings.)

The basic idea here is to turn the AST `precise` keyword into a `[precise]` decoration in the IR and then emit that as a `precise` keyword again in the output.

The main catch is that whereas most of our existing IR decorations apply to things like global shader parameters or `struct` members that usually stick around for the duration of compilation, `[precise]` will get slapped on local variables that will often get optimized away by our SSA pass. There are two ways a variable can get eliminated/replaced during the SSA pass:

1. A use of the variable can be replaced with an ordinary instruction that computes its value.
2. A use of the variable can be replaced with a reference to a "phi node" that will take on the appropriate value based on control flow.

These two cases already had logic to copy a "name hint" decoration from the variable over to an instruction that will replace it, and I simply extended them to also propagate over a `[precise]` decoration.

The test case added with this change intentionally constructs a case where `[precise]` needs to be propagated over to an SSA "phi node" in order to generate correct output code.

The other gotcha is that we can emit variable declarations in various places in `emit.cpp`, and all of these needed to handle `[precise]`. Not only do we have actually local variables (`IRVar`), but we also have SSA phi nodes (`IRParam`), and then there are cases where an intermediate computation (an ordinary instruction) should be `[precise]` and thus we need to emit it as a temporary (not folding it into its use sites) and make sure that the temporary itself gets the `precise` keyword.

I have manually confirmed that in the output SPIR-V, this change results in the `NoContraction` SPIR-V decoration being added to the relevant operations, and the output DXBC contains a multiply and an add in place of a multiply-add. The output DXIL does not show any obvious changes due to `precise`, although the exact order and operands of the math instructions emitted does differ when `precise` is added/removed. In all cases the output is equivalent to hand-written HLSL/GLSL with a `precise`-qualified local variable.</content>
</entry>
<entry>
<title>Add better control over image formats for GLSL/SPIR-V targets (#939)</title>
<updated>2019-04-08T18:09:03+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2019-04-08T18:09:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=dc54f1dd1b694b087816857a791e9d37dc25de6d'/>
<id>urn:sha1:dc54f1dd1b694b087816857a791e9d37dc25de6d</id>
<content type='text'>
* Add better control over image formats for GLSL/SPIR-V targets

Currently Slang emits GLSL code assuming all R/W images need to have explicit formats, and thus we try to infer a format from the element type of the image.
E.g., given a `RWTexture2D&lt;half4&gt;` we might infer that a qualifier of `layout(rgba16f)` should be used.

This strategy has two notable shortcomings:

* Sometimes the user will want a format that doesn't match an existing HLSL type. E.g., if they want the equivalent of `layout(r11f_g11f_b10f)`, then what should they put in their `RWTexture2D&lt;...&gt;` to make the inference do what they need?

* Sometimes the user knows that they don't need to specify a format *at all*, because using the `GL_EXT_shader_image_load_formatted` extension, they can still perform non-atomic load/store on images with no format specified in the SPIR-V.

This change adds two features directed at these challenges.

First, we add an explicit `[format(...)]` attribute that can be used to specify an explicit image format, including ones that don't match any HLSL type.
An example of using this new attribute is:

```hlsl
[format("r11f_g11f_b10f")]
RWTexture2D&lt;float3&gt; myImage;
```

For simplicity in initial bring-up, the new formats all use the same naming as formats in GLSL (this should make it easy for a programmer who knows what they expect to get in the GLSL output). We can change the naming convention for formats at a later time, so long as we keep these existing names in as a compatibility feature.

Note that this is *not* given a `vk::` prefix since the attribute should signal the programmer's intent to provide an image with that format on *all* targets (although only some targets might act on that information).

Also note that the attribute takes a string (`[format("rgba8")`) instead of a bare identifier (`[format(rgba8)]`) because this is consistent with the existing convention for attributes in HLSL.

When `[format(...)]` is left off, the default compiler behavior will still be to infer a format, but this behavior can be overidden for a single image using an explicit format of `"unknown"`:

```hlsl
[format("unknown")]
RWTexture2D&lt;float4&gt; mysteryMachine;
```

The second new feature is that if a user knows they are coding for a GPU that supports the `"unknown"` format in all non-atomic cases, then they can opt into making that the default for images without an explicit `[format(...)]`, using the new `-default-image-format-unknown` command-line option for `slangc`.

The new test case included with this change confirms that we correctly see the explicit formats in the output GLSL and *no* formats for images without explicit `[format(...)]` when using the new command-line option. The test stresses images declared at global scope, in parameter blocks, and in entry-point parameter lists, to try and make sure that all the relevant IR passes in the compiler preserve the format information.

* fixup: missing file
</content>
</entry>
<entry>
<title>Improve support for interfaces as shader parameters (#886)</title>
<updated>2019-03-09T00:24:02+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2019-03-09T00:24:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=4f94dd46a2d885e570814dd14a5e46f8e0814802'/>
<id>urn:sha1:4f94dd46a2d885e570814dd14a5e46f8e0814802</id>
<content type='text'>
* Improve support for interfaces as shader parameters

This change adds two main things over the existing support:

1. It is now possible to plug in concrete types that actually contain (uniform/ordinary) fields for the existential type parameters introduced by interface-type shader parameters. The `interface-shader-param2.slang` test shows that this works.

2. There is a limited amount of support for doing correct layout computation and generating output code that matches that layout, so that interface and ordinary-type fields can be interleaved to a limited extent. The `interface-shader-param3.slang` test confirms this behavior.

There are several moving pieces in the change.

* When it comes to terminology, we try to draw a more clear distinction between existial type parameters/arguments and existential/object value parametes/arguments. A simple way to look at it is that an `IFoo[3]` shader parameter introduces a single existential type parameter (so that a concrete type argument like `SomeThing` can be plugged in for the `IFoo`) but introduces three existential object/value parameters (to represent the concrete values for the array elements).

* At the IR level, we support a few new operations. A `BindExistentialsType` can take a type that is not itself an interface/existential type but which depends on interfaces/existentials (e.g., `ConstantBuffer&lt;IFoo&gt;`) and plug in the concrete types to be used for its existential type slots.

* Then a `wrapExistentials` instruction can take a type with all the existentials plugged in (possibly by `BindExistentialsType`) and wrap it into a value of the existential-using type (e.g., turn `ConstantBuffer&lt;SomeThing&gt;` into a `ConstantBuffer&lt;IFoo&gt;`).

* The IR passes for doing generic/existential specialization have been updated to be able to desugar uses of these new operations just enough so that a `ConstantBuffer&lt;IFoo&gt;` can be used.

* When we specialize an IR parameter of an interface type like `IFoo` based on a concrete type `SomeThing`, we turn the parameter into an `ExistentialBox&lt;SomeThing&gt;` to reflect the fact that we are conceptually referring to `SomeThing` indirectly (it shouldn't be factored into the layout of its surrounding type).

* Parameter binding was updated so that it passes along the bound existential type arguments in a `Program` or `EntryPoint` to type layout, so that we can take them into account. The type layout code needs to do a little work to pass the appropriate range of arguments along to sub-fields when computing layout for aggregate types.

* Type layout was updated to have a notion of "pending" items, which represent the concrete types of data that are logically being referenced by existential value slots. The basic idea is that these values aren't included in the layout of a type by default, but then they get "flushed" to come after all the non-existential-related data in a constant buffer, parameter block, etc.

* The logic for computing a parameter group (`ConstantBuffer` or `ParameterBlock`) layout was updated to always "flush" the pending items on the element type of the group, so that the resource usage of specialized existential slots would be taken into account.

* The type legalization pass has been adapted so that we can derive two different passes from it. One does resource-type legalization (which is all that the original pass did). The new pass uses the same basic machinery to legalize `ExistentialBox&lt;T&gt;` types by moving them out of their containing type(s), and then turning them into ordinary variables/parameters of type `T`.

Big things missing from this change include:

- Nothing is making sure that "pending" items at the global or entry-point level will get proper registers/bindings allocated to them. For the uniform case, all that matters in the current compiler is that we declare them in the right order in the output HLSL/GLSL, but for resources to be supported we will need to compute this layout information and start associating it with the existential/interface-type fields.

- Nothing is being done to support `BindExistentials&lt;S, ...&gt;` where `S` is a `struct` type that might have existential-type fields (or nested fields...). Eventually we need to desugar a type like this into a fresh `struct` type that has the same field keys as `S`, but with fields replaced by suitable `BindExistentials` as needed. (The hard part of this would seem to be computing which slots go to which fields). As a practial matter, this missing feature means that interface-type members of `cbuffer` declarations won't work.

The current tests carefully avoid both of these problems. They don't declare any buffer/texture fields in the concrete types, and they don't make use of `cbuffer` declarations or `ConstantBuffer`s over structure types with interface-type fields.

* fixup: add override to methods

* fixup: typos
</content>
</entry>
<entry>
<title>First steps toward supporting interface-type parameters on shaders (#852)</title>
<updated>2019-02-19T19:46:05+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2019-02-19T19:46:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=32135c5bdfb4d387f8227742a2d2fd555898aca8'/>
<id>urn:sha1:32135c5bdfb4d387f8227742a2d2fd555898aca8</id>
<content type='text'>
* First steps toward supporting interface-type parameters on shaders

What's New
----------

From the perspective of a user, the main thing this change adds is the ability to declare top-level shader parameters (either at global scope, or in an entry-point parameter list) with interface types. For example, the following becomes possible:

```hlsl
// Define an interface to modify values
interface IModifier { float4 modify(float4 val); }

// Define some concrete implementations
struct Doubler : IModifier
{
    float4 modify(float4 val) { return val + val; }
}
struct Squarer : IModifier { ... }

// Define a global shader parameter of interface type
IModifier gGlobalModifier;

// Define an entry point with an interface-type `uniform` parameter
void myShader(
    unifrom IModifier entryPointModifier,
            float4    inColor  : COLOR,
        out float4    outColor : SV_Target)
{
    // Use the interface-type parameters to compute things
    float4 color = inColor;
    color = gGlobalModifier.modify(color);
    color = entryPointModifier.modify(color);
    outColor = color;
}
```

The user can specialize that shader by specifying the concrete types to use for global and entry-point parameters of interface types (e.g., plugging in `Doubler` for `gGlobalModifier` and `Squarer` for `entryPointModifier`).

The "plugging in" process is done in terms of a concept of both global and local "existential slots" which are a new `LayoutResourceKind` that represents the holes where concrete types need to be plugged in for existential/interface types.

In simple cases like the above, each interface-type parameter will yield a single existential slot in either the global or entry-point parameter layout. Users can query the start slot and number of slots for each shader parameter, just like they would for any other resource that a parameter can consume. Before generating specialized code, the user plugs in the name of the concrete type they would like to use for each slot using `spSetTypeNameForGlobalExistentialSlot` and/or `spSetTypeNameForEntryPointExistentialSlot`.

There are some major limitations to the implementation in this first change:

* Parameters must be of interface type (e.g., `IFoo`) and not an array (`IFoo[3]`), or buffer (`ConstantBuffer&lt;IFoo&gt;`) over an interface type. Similarly, `struct` types with interface-type fields still don't work.

* The work on interface-type function parameters still doesn't include support for `out` or `inout` parameters, nor for functions that return interface types (that isn't technically related to this change, but affects its usefullness).

* No work is being done to correctly lay out shader parameters once the concrete types for existential slots are known, so that this change really only works when the concrete type that gets plugged in is empty.

These limitations are severe enough that this feature isn't really usable as implemented in this change, and this merely represents a stepping stone toward a more complete implementation.

Implementation
--------------

The API side of thing largely mirrors what was already done to support passing strings for the type names to use for global/entry-point generic arguments, so there should be no major surprises there.

The logic in `check.cpp` computes the list of existential slots when creating unspecialized `Program`s and `EntryPoint`s (this is logically the "front end" of the compiler), and then checks the supplied argument types against what is expected in each slot when creating specialized `Program`s and `EntryPoint`s. This again mirrors how generic arguments are handled.

Type layout was extended to compute the number of existential slots that a type consumes, and will thus automatically assign ranges of slots to top-level and entry-point shader parameters in the same way it already allocates `register`s and `binding`s. The big missing feature is the ability to specialize a layout to account for the concrete types plugged into the existential-type slots.

IR generation for specialized programs and entry points was slightly extended so that it attaches information about the concrete types plugged into the existential slots, and the witness tables that show how they conform to the interface for that slot. The linking step needed some small tweaks to make sure that information gets copied over to the target-specific program when we start code generation.

The meat of the IR-level work is in `ir-bind-existentials.cpp`, which takes the information that was placed in the IR module by the generation/linking steps and uses it to rewrite shader parameters. For example, if there is a shader parameter `p` of type `IModifier`, and the corresponding existential slot has the type `Doubler` in it, we will rewrite the parameter to have type `Doubler`, and rewrite any uses of `p` to instead use `makeExistential(p, /*witness that Doubler conforms to IModifier*/)`.

Once the replacement is done on the parameters, the existing work for specializing existential-based code when the input type(s) are known kicks in and does the rest.

Testing
-------

A single compute test is added to validate that this feature works. It is narrowly tailored to not require any of the features not supported by the initial implementation (e.g., all of the concrete types used have no members).

The test case *does* include use of an associated type through one of these existential-type parameters, which has exposed a subtle bug in how "opening" of existential values is implemented in the front-end. Rather than fix the underlying problem, I cleaned up the code in the front-end to special case when the existential value being opened is a variable bound with `let`, to directly use a reference to that variable rather than introduce a temporary. Similarly, in the IR generation step, I added an optimization to make variables declared with `let` skip introducing an IR-level variable and just use the SSA value of their initializer directly instead.

* fixup: missing files

* fixup: incorrect type for unreachable return

* fixup: actually comment ir-bind-existentials.cpp
</content>
</entry>
<entry>
<title>Split front- and back-ends (#846)</title>
<updated>2019-02-15T17:08:19+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2019-02-15T17:08:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=a3fd4e2bc40cfc77db953b14744c30e7a18e7c1d'/>
<id>urn:sha1:a3fd4e2bc40cfc77db953b14744c30e7a18e7c1d</id>
<content type='text'>
* Split front- and back-ends

This change is a major refactor of several of the types that provide the behind-the-scenes implementation of the public C API.
The goal of this refactor is primarily to allow for future API services that let the user operate both the front- and back-ends of the compiler in a more complex fashion.
For example, as user should be able to compile a bunch of source code into modules, look up types, functions, etc. in those modules, specialize generic types/functions to the types they've looked up, and then finally request target code to be gernerated for specialized entry points.
The back-end code generation they trigger should re-use the front-end compilation work (parsing, semantic checking, IR generation) that was already performed.

The most visible change is that `CompileRequest` has been split up into several smaller types that take responsibility for parts of what it did:

* The `Linkage` type owns the storage for `import`ed modules, and well as the `TargetRequest`s that represent code-generation targets. The intention is that an application could use a single `Linkage` for the duration of its runtime (so long as it was okay with the memory usage), so that each `import`ed module only gets loaded once. For now, this type needs to manage the search paths, file system, and source manager, because of its responsibility for loading files.

* A `FrontEndCompileRequest` owns the stuff related to parsing, semantic checking, and initial IR generation. This most notably includes the `TranslationUnitRequest`s and the `FrontEndEntryPointRequest`s (which used to be just `EntryPointRequest`s). It's main job is to produce AST and IR modules for each translation unit, and to find and validate the entry points. The front-end request does *not* interact with generic arguments for global or entry-point generic parameters.

* The main output of both `import` operations and front-end translation units is the `Module` type, which is just a simple container for both the AST module (to service the reflection/layout APIs, and also for semantic checking of code that `import`s the module) and the IR module (for linking and code generation). This type captures the commonalities between the old `LoadedModule` (which is now just an alias for `Module`) and `TranslationUnitRequest` (which now owns a `Module`).

* The secondary output of front-end compilation is a `Program`, which comprises a list of referenced `Module`s and validated `EntryPoint`s that will be used together. Layout and code generation both need a `Program` to tell them what modules and entry points will be used together (we don't want to just code-gen everythin that has ever been loaded into the linakge). The `Program`s created by the front-end do not include generic arguments, so they may provide incomplete layout information and/or be unsuitable for code generation.

* A `BackEndCompileRequest` owns stuff related to turning a `Program` into output kernels for the targets of a `Linkage`. Most of the data it owns beyond the `Program` to be compiled is minor, so this is a good candidate for demotion from a heap-allocated object to just a `struct` of options that gets passed around.

* The `CompileRequestBase` type is an attempt to wrap up the common functionality of both front-end and back-end compile requests. Most of it is just exposing the availability of a linkage and `DiagnosticSink`, so this type is a good candidate for subsequent removal. The main interesting thing it has is the flags related to dumping and validation of IR, so there is probably a good refactoring still to be made around deciding how options should be handled going forward.

* Behind the scenes, the `Program` type is set up to handle some level of on-line compilation and layout work. The `Program` knows the `Linkage` it belongs to, and allows for a `TargetProgram` to be looked up based on a specific `TargetRequest`. A `TargetProgram` then allows layout information and compiled kernel code to be asked for on-demand, in order to support eventual "live" compilation scenarios.

* The `EndToEndCompileRequest` type is a composition/coordination type that replaces the old `CompileRequest` in a way that uses the services of the various other types. It owns a few pieces of state that only make sense in the context of an end-to-end compile (e.g., there is really no way to "pass through" code when the front- and back-ends are run separately) or a command-line compile (everything to do with specifying output paths for files is really just for the benefit of `slangc`, and might even be moved there over time).

* One important detail is that the `EndToEndCompilRequest` owns all of the string-based generic arguments for both global and entry-point generic parameters. The logic in `check.cpp` for dealing with those arguments has been heavily refactored to separate out the parsings steps that are specific to end-to-end compilation with string-based type arguments, and the semantic checking  steps that result in a specialized `Program` (which can be exposed through new APIs that aren't tied to end-to-end compilation).

It is perhaps not surprising that this change had a lot of consequences, so I'll briefly run over some of the main categories of changes required:

* I changed the way that global generic arguments are passed via API (use `spSetGlobalGenericArgs` instead of the generic arguments for `spAddEntryPointEx`, which are not just for entry-point generics), which has been a change that we've needed for a long time. This is technically a breaking API change, although we should have very few client applications that care about it.

* A bunch of places that used to take "big" objects like `CompileRequest` now just take the sub-pieces they care about (e.g., a function might have only needed a `Linkage` and a `DiagnosticSink`). This makes many subroutines or "context" struct types more generally useful, at the cost of taking more parameters.

* In a few cases the conceptually clean separation of the layers breaks down (often for edge-case or compatibility features), and so we may pass along additional objects that are allowed to be null, but are used when present. A big example of this is how the back-end code generation routines accept an `EndToEndCompileRequest` that is optional, and only used to check whether "pass through" compilation is needed. We should probably look into cleaning this kind of logic up over time so that we don't need to violate the apparent separation of phases of compilation.

* In cases where separation of layers was being broken for the sake of GLSL features, I went ahead and ripped them out, since all of that should be dead code anyway.

* In many cases I increased the encapsulation of data in the core types to help track down use sites and make sure they are following invariants better.

* In cases where code was doing, e.g., `context-&gt;shared-&gt;compileRequest-&gt;session-&gt;getThing()` I have tried to introduce convenience routines so that the usage site is just `context-&gt;getThing()` to improve encapsulation and allow changes to be made more easily going forward.

* The `noteInternalErrorLoc` functionality was moved off of the compile request and into `DiagnosticSink`, since that is the one type you can rely on having around when you want to note an internal error. We may consider going forward if (and how) it should reset the counter used for noting locations on internal errors.

* A few APIs now take `DiagnosticSink*` arguments where they didn't before, and as a result some public APIs need to create `DiagnosticSink`s to pass in, before going ahead and ignoring the messages. In the future there should be variations of these APIs that accept an `ISlangBlob**` parameter for the output.

* fixup: missing include for compilers with accurate template checking (non-VS)

* fixup: review feedback
</content>
</entry>
<entry>
<title>Initial support for dynamic dispatch using "tagged union" types (#772)</title>
<updated>2019-01-16T20:48:11+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2019-01-16T20:48:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=aedf61784606406c090302efd8b7ac668ac997fc'/>
<id>urn:sha1:aedf61784606406c090302efd8b7ac668ac997fc</id>
<content type='text'>
* Initial support for dynamic dispatch using "tagged union" types

Suppose a user declares some generic shader code, like the following:

```hlsl
interface IFrobnicator { ... }
type_param T : IFrobincator;
ParameterBlock&lt;T : IFrobnicator&gt; gFrobnicator;
...
gFrobincator.frobnicate(value);
```

and then they have some concrete implementations of the required interface:

```hlsl
struct A : IFrobnicator { ... }
struct B : IFrobnicator { ... }
```

The current Slang compiler allows them to generate  distinct compiled kernels for the case of `T=A` and the case of `T=B`. This means that the decision of which implementation to use must be made at or before the time when a shader gets bound in the application.

This change adds a new ability where the Slang compiler can generate code to handle the case where `T` might be *either* `A` or `B`, and which case it is will be determined dynamically at runtime. This means a single compiled kernel can handle both cases, and the decision about which code path to run can be made any time before the shader executes.

This new option is supported by defining a *tagged union* type. Via the API, the user specifies that `T` should be specialized to `__TaggedUnion(A,B)` (the double underscore indicates that this is an experimental and unsupported feature at present). We refer to the types `A` and `B` here as the "case" types of the tagged union. Conceptually, the compiler synthesizes a type something like:

```hlsl
struct TU { union { A a; B b; } payload; uint tag; }
```

The user can then allocate a constant buffer to hold their tagged union type, and when they pick a concrete type to use (say `B`), they fill in the first `sizeof(B)` bytes of their buffer with data describing a `B` instance, and then set the `tag` field to the appopriate 0-based index of the case type they chose (in this case the `B` case gets the tag value `1`).

Actually implementing tagged unions takes a few main steps:

* Type parsing was extended to special-case `__TaggedUnion` as a contextual keyword. This is really only intended to be used when parsing types from the API or command-line, and Bad Things are likely to happen if a user ever puts it directly in their code. Eventually construction of tagged unions should be an API feature and not part of the language syntax.

* Semantic checking was extended to recognize that a tagged union like `__TaggedUnion(A,B)` shoud support an interface like `IFrobnicator` whenever all of the case types suport it, as long as the interface is "safe" for use with tagged unions (which means it doesn't use a few of the advancd langauge features like associated types).

* The IR was extended with instructions to represent tagged union types and to extract their tag and the payload for the different cases as needed.

* IR generation was extended to synthesize implementations of interface methods for any interface that a tagged union needs to support. Right now the implementation is simplistic and only handles simple method requirements, which it does by emitting a `switch` instruction to pick between the different cases.

* A new IR pass was introduced to "desugar" any tagged union types used in the code. The downstream HLSL and GLSL compilers don't support `union`s, so we have to instead emit a tagged union as a "bag of bits" and implement loading the data for particular cases from it manually.

* Final code emit mostly Just Works after the above steps, but we had to introduce an explicit IR instruction for bit-casting to handle the output of the desugaring pass.

There are a bunch of gaps and caveats in this implementation, but that seems reasonable for something that is an experimental feature. The various `TODO` comments and assertion failures in unimplemented cases are intended, so that this work can be checked in even if it isn't feature-complete.

* fixup: missing files

* fixup: typos
</content>
</entry>
<entry>
<title>Improve handling of {} initializer list expressions (#778)</title>
<updated>2019-01-16T19:50:14+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2019-01-16T19:50:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=8e47a3802d4d74eb11620f147ef5b29b8e931d35'/>
<id>urn:sha1:8e47a3802d4d74eb11620f147ef5b29b8e931d35</id>
<content type='text'>
Fixes #775

It was reported (in #775) that Slang doesn't handle initializer-list syntax when initializing matrix variables. When starting on a fix for that it became apparent that the time was right to fix two broad issues in the compiler's current handling of `{}`-enclosed initializer lists.

The first issue was that the front-end checking of initializer lists wasn't handling the C-style behavior where an initializer list can either contain nested `{}`-enclosed lists for sub-arrays/-structures, or directly contain "leaf" values for initializing those aggregates. For example, the following two variable declarations ought to be equivalent:

```hlsl
int4 a[] = { {1, 2, 3, 4}, {5, 6, 7, 8} };
int4 b[] = {  1, 2, 3, 4,   5, 6, 7, 8 };
```

Getting this distinction right is important because we want to support initializing a matrix either from a list of vectors for its rows, or a list of scalars for its elements (in row-major order).

The front-end semantic checking logic for initializer lists was revamped so that it conceptually tries to "read" an expression of a desired type from the initializer list, and decides at each step whether to consume a single expression by coercing it to the desired type, or to recursively read multiple sub-values to construct the type as an aggregate. The logic for deciding between direct vs aggregate initialization could potentially use some tweaking, but luckily it should always handle the case where users introduce explicit `{}`-enclosed sub-lists to make their intention clear, so that existing Slang code should continue to work as before.

The second issue was that initializers without the expected number of elements weren't implemented in code generation, so they would lead to internal compiler errors. This change revamps the codegen logic for initializer lists so that it can synthesize default values for fields/elements that were left out during initialization. This includes an attempt to support default initialization of `struct` fields based on explicitly written initialization expressions.</content>
</entry>
<entry>
<title>First step toward supporting use of interfaces as existential types (#716)</title>
<updated>2018-12-18T03:26:32+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2018-12-18T03:26:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=583c72af28d2dde5c564d5b56d3c5eb4ae4844f6'/>
<id>urn:sha1:583c72af28d2dde5c564d5b56d3c5eb4ae4844f6</id>
<content type='text'>
* First step toward supporting use of interfaces as existential types

Traditional generics involve universal quantification. E.g., a declaration like:

```
void drive&lt;T : IVehicle&gt;(T vehicle);
```

indicates for *for all* types `T` that implement the `IVehicle` interface, the `drive()` function is available.

In contrast, whend directly using an interface type like:

```
IVehicle v = ...;
v.doSomething();
```

we only know that there *exists* some concrete type (we could call it `E`) such that `v` refers to a value of type `E`, and `E` implements the `IVehicle` interface. In order to perform an operation like `v.doSomething()` we need to "open" the existential value so that we can look at the concrete type and how it implements the `IVehicle.doSomething` requirement.

This change adds a very explicit representation of existentials to Slang's IR. An operation like `e = makeExistential(v, w)` creates a value of some existential type (interfaces being our only existential types for now), by wrapping a concrete value `v` (the type of `v` can be seen as an implicit operand) and a witness table `w` showing that the type of `v` implements the requirements of the chosen interface type.

In turn, opening of an existential is handled with operations `extractExistential{Value|Type|WitnessTable}` which pull the corresponding piece of information out of a value of existential type (which somewhere in the code had to have been created with `makeExistential`).

The change includes a trivial simplification pass that can detect cases where an `extractExistential*` operation is applied direclty to a `makeExistential` operation, so that there is only one possible result that could be extracted. This allows for simplification of existential types used in trivial ways for local variables (this is mostly so I can check in a functional test, rather than to actually support useful code involving interfaces right now).

The logic in the semantic checking phase of the compiler is comparatively more complex.
When we are about to perform member lookup given an expression like `obj.member` we will first check if `obj` has an existential type, and if it does we will construct a suitable local context in which we extract the value, type, and witness table from the existential (these all become explicit AST expression nodes), and then use the extracted value as the base of the lookup operation.

The nature of existential values is that two different values with the same existential (interface) type could wrap concrete values with differnt types, so that we need to carefully refer only to the extracted type/value/witness-table of specific *values*. We handle this right now by conceptually moving the existential-type value into a local variable (by introducing a `LetExpr` that amounts to `let v = &lt;init&gt; in &lt;body&gt;`) and then require that the extract expressions must refer to the (immutable) variable declaration from which they are extracting a value.

(Eventually we should expand this so that when using an immutable local variable of existential type we just use that variable as-is rather than introduce a new temporary)

A simple test case is included that uses an interface type in an almost trivial way for a local variable; this test can be run and produces the expected results.
A more complex test case that passes an existential into a function is included, but left disabled because a more aggressive simplification approach is required to generate working code from it.

* Add missing file for expected test output

* Fixups for merge from top-of-tree
</content>
</entry>
<entry>
<title>Specialize away resource-type function parameters (#759)</title>
<updated>2018-12-17T22:48:48+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2018-12-17T22:48:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=3a02c590afdd2624b2c729e989ada9393d708f75'/>
<id>urn:sha1:3a02c590afdd2624b2c729e989ada9393d708f75</id>
<content type='text'>
* Specialize away resource-type function parameters

Work on #397.

Introduction
------------

Suppose a user writes a function that takes a resource type as a parameter:

```hlsl
float4 getThing(RWStructuredBuffer&lt;float4&gt; buffer, int index)
{
    return buffer[index];
}
```

This function creates challenges when generating code for GLSL-based targets, because a global shader parameter of type `RWStructuredBuffer`:

```hlsl
RWStructuredBuffer&lt;float4&gt; gBuffer;
```

translates to a global GLSL `buffer` declaration:

```hlsl
buffer _S0
{
    float4 _data[];
} gBuffer;
```

There is no equivalent to that `buffer` declaration that can be used in function parameter position, and it is illegal in GLSL to pass `gBuffer` into a function.

(Aside: yes, we could in principle translate a function parameter like `RWStructuredBuffer&lt;float4&gt; buffer` to `float4 buffer[]`, but that will not in turn generalize to arrays of structured buffers; it is a dead-end strategy)

The solution employed by many shader compilers is to "inline everything" to eliminate the need for parameters of resource types, and then rely on dataflow optimization to eliminate locals of resource types. This strategy can of course lead to an increase in code size, and it also means that call stacks are lost when doing step-through debugging. Another serious issue is that an "early `return`" from a function can turn into the equivalent of a multi-level `break` when inlined, and not all of our targets support multi-level `break`.

The solution implemented in this change works around some, but not all, of the problems with full inlining.
The approach here generates specialized versions of a function like `getThing`, adapted to the actual arguments provided at different call sites.
Thus if we have code like:

```hlsl
RWStructuredBuffer&lt;float4&gt; gA;
RWStructuredBuffer&lt;float4&gt; gB[10];
...
getThing(gA, x);
getThing(gA, y);
getThing(gB[someVal], z);
```

we will generate two specializations of `getThing`: one specialized for the `buffer` parameter being `gA` and the other for `gB`:

```hlsl
float4 getThing_gA(int index) { return gA[index]; }
float4 getThing_gB(int _val, int index) { return gB[_val][index]; }
```

and the call sites will change to match:

```hlsl
getThing_gA(x);
getThing_gA(y);
getThing_gB(someVal, z);
```

Note how in the case where the argument being passed in was obtained by indexing into an array of resources, the callee is specialized to the identity of the global shader parameter (`gB`), and now accepts a new parameter to indicate the array index into it.

While this description motivates the change based on GLSL output, the same basic issue can arise for other targets.
For example, while current HLSL has added the `ConstantBuffer&lt;T&gt;` type, it is not supported on older targets, and it turns out that even dxc does not allow functions to have `ConstantBuffer&lt;T&gt;` parameters.
Longer-term, we will likely need to do even more aggressive specialization both in order to generate SPIR-V output directly, and also to deal with function that have return values or `out` parameters of resource types.

Implementation
--------------

The meat of the change is in `ir-specialize-resources.{h,cpp}`, where we have a pass that looks at all call sites (`IRCall` instructions) in the program, and attempts to replace them with calls to specialized functions, where the specializations are generated on-demand.
The code in this pass is heavily commented, so hopefully it serves to explain itself all right.

After specialization is complete, we may still have functions like the original `getThing` that will produce invalid code when emitted as GLSL, so we need a way to make sure they don't appear in the output.
To date we've had some very ad hoc approaches for ignoring IR constructs that we don't want to affect emitted code, but this change goes ahead and adds a more real dead code elimination (DCE) pass in `ir-dce.{h,cpp}`.
This pass follows a straightforward approach of tagging instructions that are "live" and then propagating liveness through the whole program, before making a single pass to delete anything that isn't live.

When I first added the DCE pass it eliminated *everything* because there were no "roots" for liveness.
I solved this for now by adding a new decoration, `IREntryPointDecoration`, to mark shader entry points in the IR which should always be live (as should anything they depend on).

A secondary problem that arose was that for GLSL ray tracing shaders it is possible for the incoming/outgoing payload or attributes parameters to be unused, but eliminating them as dead would change the signature of a shader an potential break the rules for how ray tracing programs communicate.
I added a very simple `IRDependsOnDecoration` that allows one IR instruction to keep another alive *as if* it used it, without actually using it.

There's also a fixup in the IR dumping logic where I was forgetting to store anything in the mapping from instruction to their names, so that the name of an instruction was getting incremented each time it was referenced.

Testing
-------

There are three different tests added as part of this change:

* The `compute/func-resource-param` test covers the basic `RWStructuredBuffer` case above, which we expect to work fine for D3D11/12, but fail for Vulkan without specialization.

* The `cross-compile/func-resource-param-array` test covers the case where we don't just have one resource, but an array of them. This is not an end-to-end compute test primarily because our `render-test` application doesn't yet handle arrays of resources correctly in its binding logic.

* The `compute/func-cbuffer-param` test covers the case of a function with a `ConstantBuffer&lt;T&gt;` parameter, which requires specialization to become valid for any of our targets.

* fixup: warnings/errors from other compilers

* fixup: typos and cleanup

* fixup: typos
</content>
</entry>
</feed>
