<feed xmlns='http://www.w3.org/2005/Atom'>
<title>slang.git/source/slang/type-defs.h, branch master</title>
<subtitle>Making it easier to work with shaders</subtitle>
<id>https://git.yummers.dev/slang.git/atom?h=master</id>
<link rel='self' href='https://git.yummers.dev/slang.git/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/'/>
<updated>2019-05-31T21:20:37+00:00</updated>
<entry>
<title>Use slang- prefix on slang compiler and core source (#973)</title>
<updated>2019-05-31T21:20:37+00:00</updated>
<author>
<name>jsmall-nvidia</name>
<email>jsmall@nvidia.com</email>
</author>
<published>2019-05-31T21:20:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=6cbc3929a54d37bd23cb5efa8e3320ba02f78b2f'/>
<id>urn:sha1:6cbc3929a54d37bd23cb5efa8e3320ba02f78b2f</id>
<content type='text'>
* Prefixing source files in source/slang with slang-

* Prefix source in source/slang with slang- prefix.

* Rename core source files with slang- prefix.

* Update project files.

* Fix problems from automatic merge.
</content>
</entry>
<entry>
<title>Changes required for application adoption of interface-type parameters (#963)</title>
<updated>2019-05-20T17:40:38+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2019-05-20T17:40:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=71e35b6822b9e2846e129a888774d45a5e0827da'/>
<id>urn:sha1:71e35b6822b9e2846e129a888774d45a5e0827da</id>
<content type='text'>
* A few changes required for application adoption of interface-type parameters

There are a few small changes here that are all related in that they arose from trying to integrate support for specialization via global interface-type shader parameters into a real application.

Allow querying the "pending" layout via reflection API
------------------------------------------------------

The naming here isn't ideal, and could probably use a round of "bikeshedding" to arrive at something better, but the basic idea is that when you have a type like:

```
struct MyStuff
{
    int a;
    IFoo foo;
    int b;
}
```

the fields `a` and `b` get allocated space directly in the "primary" layout for `MyStuff` (at offsets 0 and 4, with `sizeof(MyStuff) == 8`), but the `foo` field can't be allocated space until we know what concrete type will get plugged in there.

If we have a concrete type in mind:

```
struct Bar : IFoo { int bar; }
```

then we can know how much space the `foo` field will take up, but we still can't allocate it space directly in `MyStuff`, because we already decided that `sizeof(MyStuff) == 8`.

Now imagine we place some `MyStuff` values into constant buffers:

```
cbuffer X {
    MyStuff x;
}

cbuffer Y {
    MyStuff y;
    float4 z;
}
```

In each case we know that we want to place the `MyStuff::foo` field at the end of the containing constant buffer so that it doesn't disrupt the layout of the existing fields. But that means that the offset of `MyStuff::foo` relative to the start of the `MyStuff` isn't fixed, because of unrelated fields like `z` that need to get in between.

In our layout code, we handle this by having a notion of a "pending" layout. Once we know how `MyStuff::foo` will be specialized, we can compute both a "primary" and a "pending" layout for `MyStuff`, which basically treats it as if it were two distinct types:

```
struct MyStuff_Primary
{
    int a;
    int b;
}

struct MyStuff_Pending
{
    Bar foo;
}
```

Layout for an aggregate type like the `X` or `Y` constant buffer then proceeds by computing an aggregate primary layout and an aggregate pending layout, and then finally a constant buffer or parameter block "flushes" all or part of the pending data by appending it to the primary data to get the final layout.

What all this means is that a type like `MyStuff` will have two different layouts (a default one for the primary data and a "pending" one for any specialized interface-type fields), and a variable like `Y::y` will also have two variable layouts that specify offsets (one set of offsets for its primary part, and one set of offsets for its pending part).

In order to handle interface-type fields with these layout rules, an application needs a way to query the "pending" part of a type or variable layout, which luckily gives it back just another type/variable layout. The API change here is minimal, although actually exploiting the new API correctly in application code could prove challenging.

Allow creating of explicitly specialized types
----------------------------------------------

This feature isn't actually implemented all the way through the compiler (I just needed enough to make the API calls go through), but I've added support for specializing a type that has interface-type fields through the reflection API. This maps to an `ExistentialSpecializedType` in the AST, and I'm lowering it to the IR as a `BindExistentialsType`, although that isn't 100% correct for the future.

This feature will require a future PR to actually flesh out the implementation work, but I'll wait until that is the sticking point on the application side before I do that.

Introduce a tiny `Hasher` abstraction
-------------------------------------

While implementing all the boilerplate for a new `Type` subclass (we really need to reduce that work...), I got fed up with how we do hash-code computation and introduced a small utility `Hasher` type that is intended to wrap up the idiom of combining hashes. For now this isn't a major change, but in the future I'd like to expand on the design a bit to clean up some of the warts around how we handle hashing:

* The `Hasher` implementation can and should switch from maintaining a single `HashCode` as its state to something that contains a more complete state (larger than the hash code) and just hashes new bytes into that state as it goes. This should make it possible to implement a `Hasher` for more serious hash functions, whether MD5, CityHash, or whatever we decide is good default.

* Things that are hashable shouldn't have a `getHashCode()` method, but instead should have something like a `hashInto(Hasher&amp;)` method. This change would have the dual benefits that (1) a composite type can easily hash all the fields that contribute to its identity into the hasher with minimal fuss/boilerplate, and (2) the hashes for composite types will be of higher quality because they can exploit all the bits of the hasher's state to combine the fields, instead of restricting each sub-field to just the bits in a hash code.

We should be able to incrementally improve the quality of our design there over future changes, but for now it probably isn't a critical priority.

Fixes for legalization of existential types
-------------------------------------------

There were some missing cases in the handling of type legalization, such that a global interface-type shader parameter that got specialized to a type that contains *only* resource-type fields would cause a crash in the legalization step.

I added a test for this case, and then made `ir-legalize-types.cpp` account for this case (the code to handle it ias a bit of a kludge, and shows that the `declareVars()` routine there is getting to a level of complexity that is worrying.

* fixup: review feedback
</content>
</entry>
<entry>
<title>String/List closer to conventions, and use Index type (#959)</title>
<updated>2019-04-29T21:03:46+00:00</updated>
<author>
<name>jsmall-nvidia</name>
<email>jsmall@nvidia.com</email>
</author>
<published>2019-04-29T21:03:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=4880789e3003441732cca4471091563f36531635'/>
<id>urn:sha1:4880789e3003441732cca4471091563f36531635</id>
<content type='text'>
* List made members m_
Tweaked types to closer match conventions.

* Use asserts for checking conditions on List.
Other small improvements.

* List&lt;T&gt;.Count() -&gt; getSize()

* List&lt;T&gt;
Add -&gt; add
First -&gt; getFirst
Last -&gt; getLast
RemoveLast -&gt; removeLast
ReleaseBuffer -&gt; detachBuffer
GetArrayView -&gt; getArrayView

* List&lt;T&gt;::
AddRange -&gt; addRange
Capacity -&gt; getCapacity
Insert -&gt; insert
InsertRange -&gt; insertRange
AddRange -&gt; addRange
RemoveRange -&gt; removeRange
RemoveAt -&gt; removeAt
Remove -&gt; remove
Reverse -&gt; reverse
FastRemove -&gt; fastRemove
FastRemoveAt -&gt; fastRemoveAt
Clear -&gt; clear

* List&lt;T&gt;
FreeBuffer -&gt; _deallocateBuffer
Free -&gt; clearAndDeallocate
SwapWith -&gt; swapWith

* List&lt;T&gt;
SetSize -&gt; setSize
Reserve -&gt; reserve
GrowToSize growToSize

* UnsafeShrinkToSize -&gt; unsafeShrinkToSize
Compress -&gt; compress
FindLast -&gt; findLastIndex
FindLast -&gt; findLastIndex
Simplify Contains

* List&lt;T&gt;
Removed m_allocator (wasn't used)
Swap -&gt; swapElements
Sort -&gt; sort
Contains -&gt; contains
ForEach -&gt; forEach
QuickSort -&gt; quickSort
InsertionSort -&gt; insertionSort
BinarySearch -&gt; binarySearch

Max -&gt; calcMax
Min -&gt; calcMin

* Initializer::Initialize -&gt; initialize
List&lt;T&gt;::
Allocate -&gt; _allocate
Init -&gt; _init
IndexOf -&gt; indexOf

* * Put #include &lt;assert.h&gt; in common.h, and remove unneeded inclusions
* Small refactor of ArrayView - remove stride as not used

* getSize -&gt; getCount
setSize -&gt; setCount
unsafeShrinkToSize-&gt;unsafeShrinkToCount
growToSize -&gt; growToCount
m_size -&gt; m_count

* Some tidy up around Allocator.

* Use Index type on List.

* Refactor of IntSet.
First tentative look at using Index.

* Made Index an Int
Did preliminary fixes.
Made String use Index.

* Partial refactor of String.

* String::Buffer -&gt; getBuffer
ToWString -&gt; toWString

* Small improvements to String.
String::
Buffer() -&gt; getBuffer()
Equals() -&gt; equals

* Try to use Index where appropriate.

* Fix warnings on windows x86 builds.
</content>
</entry>
<entry>
<title>Feature/as refactor review (#821)</title>
<updated>2019-02-02T16:58:54+00:00</updated>
<author>
<name>jsmall-nvidia</name>
<email>jsmall@nvidia.com</email>
</author>
<published>2019-02-02T16:58:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=3726194fbe3da234eb30b6371e5b4ab1ea388f93'/>
<id>urn:sha1:3726194fbe3da234eb30b6371e5b4ab1ea388f93</id>
<content type='text'>
* Replace dynamicCast with as where does not change behavior (ie not Type derived).
Use free function where scoping is clear.

* Replace uses of dynamicCast with as when there is no difference in behavior.

* Remove the IsXXXX methods from Type.

* Don't have separate smart pointer to store canonicalType on Type.

* Simplify Slang.FilteredMemberRefList.Adjust, such does the cast directly.

* Use free as where appropriate.

* Use free function version of casts where appropriate.

* Fix text in casting.md

* Fix typos in decl-refs.md

* Remove the uses of free function as on RefDecl.
Add 'canAs' to RefDecl as a way to test if a cast is possible.
Moved 'as' into RefDeclBase.

* Use 'is' to test for as cast on smart pointers.
Fix small scope issue.

* * Cache stringType and enumTypeType on the Session
* Make DeclRefType::Create return a RefPtr
* Make casting of result use the *method* .as  (cos using free function would mean objects being wrongly destroyed)
* Make results from createInstance ref'd to avoid possible leaks.

* Fix typo in template parameter for is on RefPtr.
</content>
</entry>
<entry>
<title>Initial support for dynamic dispatch using "tagged union" types (#772)</title>
<updated>2019-01-16T20:48:11+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2019-01-16T20:48:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=aedf61784606406c090302efd8b7ac668ac997fc'/>
<id>urn:sha1:aedf61784606406c090302efd8b7ac668ac997fc</id>
<content type='text'>
* Initial support for dynamic dispatch using "tagged union" types

Suppose a user declares some generic shader code, like the following:

```hlsl
interface IFrobnicator { ... }
type_param T : IFrobincator;
ParameterBlock&lt;T : IFrobnicator&gt; gFrobnicator;
...
gFrobincator.frobnicate(value);
```

and then they have some concrete implementations of the required interface:

```hlsl
struct A : IFrobnicator { ... }
struct B : IFrobnicator { ... }
```

The current Slang compiler allows them to generate  distinct compiled kernels for the case of `T=A` and the case of `T=B`. This means that the decision of which implementation to use must be made at or before the time when a shader gets bound in the application.

This change adds a new ability where the Slang compiler can generate code to handle the case where `T` might be *either* `A` or `B`, and which case it is will be determined dynamically at runtime. This means a single compiled kernel can handle both cases, and the decision about which code path to run can be made any time before the shader executes.

This new option is supported by defining a *tagged union* type. Via the API, the user specifies that `T` should be specialized to `__TaggedUnion(A,B)` (the double underscore indicates that this is an experimental and unsupported feature at present). We refer to the types `A` and `B` here as the "case" types of the tagged union. Conceptually, the compiler synthesizes a type something like:

```hlsl
struct TU { union { A a; B b; } payload; uint tag; }
```

The user can then allocate a constant buffer to hold their tagged union type, and when they pick a concrete type to use (say `B`), they fill in the first `sizeof(B)` bytes of their buffer with data describing a `B` instance, and then set the `tag` field to the appopriate 0-based index of the case type they chose (in this case the `B` case gets the tag value `1`).

Actually implementing tagged unions takes a few main steps:

* Type parsing was extended to special-case `__TaggedUnion` as a contextual keyword. This is really only intended to be used when parsing types from the API or command-line, and Bad Things are likely to happen if a user ever puts it directly in their code. Eventually construction of tagged unions should be an API feature and not part of the language syntax.

* Semantic checking was extended to recognize that a tagged union like `__TaggedUnion(A,B)` shoud support an interface like `IFrobnicator` whenever all of the case types suport it, as long as the interface is "safe" for use with tagged unions (which means it doesn't use a few of the advancd langauge features like associated types).

* The IR was extended with instructions to represent tagged union types and to extract their tag and the payload for the different cases as needed.

* IR generation was extended to synthesize implementations of interface methods for any interface that a tagged union needs to support. Right now the implementation is simplistic and only handles simple method requirements, which it does by emitting a `switch` instruction to pick between the different cases.

* A new IR pass was introduced to "desugar" any tagged union types used in the code. The downstream HLSL and GLSL compilers don't support `union`s, so we have to instead emit a tagged union as a "bag of bits" and implement loading the data for particular cases from it manually.

* Final code emit mostly Just Works after the above steps, but we had to introduce an explicit IR instruction for bit-casting to handle the output of the desugaring pass.

There are a bunch of gaps and caveats in this implementation, but that seems reasonable for something that is an experimental feature. The various `TODO` comments and assertion failures in unimplemented cases are intended, so that this work can be checked in even if it isn't feature-complete.

* fixup: missing files

* fixup: typos
</content>
</entry>
<entry>
<title>Improve handling of {} initializer list expressions (#778)</title>
<updated>2019-01-16T19:50:14+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2019-01-16T19:50:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=8e47a3802d4d74eb11620f147ef5b29b8e931d35'/>
<id>urn:sha1:8e47a3802d4d74eb11620f147ef5b29b8e931d35</id>
<content type='text'>
Fixes #775

It was reported (in #775) that Slang doesn't handle initializer-list syntax when initializing matrix variables. When starting on a fix for that it became apparent that the time was right to fix two broad issues in the compiler's current handling of `{}`-enclosed initializer lists.

The first issue was that the front-end checking of initializer lists wasn't handling the C-style behavior where an initializer list can either contain nested `{}`-enclosed lists for sub-arrays/-structures, or directly contain "leaf" values for initializing those aggregates. For example, the following two variable declarations ought to be equivalent:

```hlsl
int4 a[] = { {1, 2, 3, 4}, {5, 6, 7, 8} };
int4 b[] = {  1, 2, 3, 4,   5, 6, 7, 8 };
```

Getting this distinction right is important because we want to support initializing a matrix either from a list of vectors for its rows, or a list of scalars for its elements (in row-major order).

The front-end semantic checking logic for initializer lists was revamped so that it conceptually tries to "read" an expression of a desired type from the initializer list, and decides at each step whether to consume a single expression by coercing it to the desired type, or to recursively read multiple sub-values to construct the type as an aggregate. The logic for deciding between direct vs aggregate initialization could potentially use some tweaking, but luckily it should always handle the case where users introduce explicit `{}`-enclosed sub-lists to make their intention clear, so that existing Slang code should continue to work as before.

The second issue was that initializers without the expected number of elements weren't implemented in code generation, so they would lead to internal compiler errors. This change revamps the codegen logic for initializer lists so that it can synthesize default values for fields/elements that were left out during initialization. This includes an attempt to support default initialization of `struct` fields based on explicitly written initialization expressions.</content>
</entry>
<entry>
<title>First step toward supporting use of interfaces as existential types (#716)</title>
<updated>2018-12-18T03:26:32+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2018-12-18T03:26:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=583c72af28d2dde5c564d5b56d3c5eb4ae4844f6'/>
<id>urn:sha1:583c72af28d2dde5c564d5b56d3c5eb4ae4844f6</id>
<content type='text'>
* First step toward supporting use of interfaces as existential types

Traditional generics involve universal quantification. E.g., a declaration like:

```
void drive&lt;T : IVehicle&gt;(T vehicle);
```

indicates for *for all* types `T` that implement the `IVehicle` interface, the `drive()` function is available.

In contrast, whend directly using an interface type like:

```
IVehicle v = ...;
v.doSomething();
```

we only know that there *exists* some concrete type (we could call it `E`) such that `v` refers to a value of type `E`, and `E` implements the `IVehicle` interface. In order to perform an operation like `v.doSomething()` we need to "open" the existential value so that we can look at the concrete type and how it implements the `IVehicle.doSomething` requirement.

This change adds a very explicit representation of existentials to Slang's IR. An operation like `e = makeExistential(v, w)` creates a value of some existential type (interfaces being our only existential types for now), by wrapping a concrete value `v` (the type of `v` can be seen as an implicit operand) and a witness table `w` showing that the type of `v` implements the requirements of the chosen interface type.

In turn, opening of an existential is handled with operations `extractExistential{Value|Type|WitnessTable}` which pull the corresponding piece of information out of a value of existential type (which somewhere in the code had to have been created with `makeExistential`).

The change includes a trivial simplification pass that can detect cases where an `extractExistential*` operation is applied direclty to a `makeExistential` operation, so that there is only one possible result that could be extracted. This allows for simplification of existential types used in trivial ways for local variables (this is mostly so I can check in a functional test, rather than to actually support useful code involving interfaces right now).

The logic in the semantic checking phase of the compiler is comparatively more complex.
When we are about to perform member lookup given an expression like `obj.member` we will first check if `obj` has an existential type, and if it does we will construct a suitable local context in which we extract the value, type, and witness table from the existential (these all become explicit AST expression nodes), and then use the extracted value as the base of the lookup operation.

The nature of existential values is that two different values with the same existential (interface) type could wrap concrete values with differnt types, so that we need to carefully refer only to the extracted type/value/witness-table of specific *values*. We handle this right now by conceptually moving the existential-type value into a local variable (by introducing a `LetExpr` that amounts to `let v = &lt;init&gt; in &lt;body&gt;`) and then require that the extract expressions must refer to the (immutable) variable declaration from which they are extracting a value.

(Eventually we should expand this so that when using an immutable local variable of existential type we just use that variable as-is rather than introduce a new temporary)

A simple test case is included that uses an interface type in an almost trivial way for a local variable; this test can be run and produces the expected results.
A more complex test case that passes an existential into a function is included, but left disabled because a more aggressive simplification approach is required to generate working code from it.

* Add missing file for expected test output

* Fixups for merge from top-of-tree
</content>
</entry>
<entry>
<title> Support cross-compilation of ray tracing shaders to Vulkan (#663)</title>
<updated>2018-10-04T23:05:23+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2018-10-04T23:05:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=4cb2a19ef192424c0a4eb205a773624563222383'/>
<id>urn:sha1:4cb2a19ef192424c0a4eb205a773624563222383</id>
<content type='text'>
* Move to newer glslang

* Support cross-compilation of ray tracing shaders to Vulkan

This change allows HLSL shaders authored for DirectX Raytracing (DXR) to be cross-compiled to run with the experimental `GL_NVX_raytracing` extension (aka "VKRay").

* The GLSL extension spec is marked as experimental, so that any shaders written using this support should be ready for breaking changes when the spec is finalized.
* "Callable shaders" are not exposed throug the GLSL extension, so this feature of DXR will not be cross-compiled.
* The experimental Vulkan raytracing extension does not have an equivalent to DXR's "local root signature" concept. This does not visibly impact shader translation (because the local/global root signature mapping is handled outside of the HLSL code), but in practice it means that applications which rely on local root signatures on their DXR path will not be able to use the translation in this change as-is; more work will be needed.

The simplest part of the implementation was to go into the Slang standard library and start adding GLSL translations for the various DXR operations.
In some cases, like mapping `IgnoreHit()` to `ignoreIntersectionNVX()` this is almost trivial.
The various functions to query system-provided values (e.g., `RayTMin()`) were also easy, with the only gotcha being that they map to variables rather than function calls in GLSL, and our handling of `__target_intrinsic` assumes that a bare identifier represents a replacement function name, and not a full expression, so we have to wrap these definitions in parentheses.

The tricky operations are then `TraceRay&lt;P&gt;()` and `ReportHit&lt;A&gt;()`, because these two are generics/templates in HLSL.
GLSL doesn't support generics, even for "standard library" functions, so the raytracing extension implements a slightly complex workaround: the matching operations `traceNVX()` and `reportIntersectionNVX()` pass the payload/attributes argument data via a global variable.
That is, shader code for the GLSL extensions writes to the global variable and then calls the intrinsic function.
The linkage between the call site and the global is established by a modifier keyword (`rayPayloadNVX` and `hitAttributeNVX`, respectively) and in the case of ray payload also uses `location` number to identify which payload global to use (since a single shader can trace rays with multiple payload types).

Our translation strategy in Slang tries to leverage standard language mechanisms instead of special-case logic.
For example, to translate the `ReportHit&lt;A&gt;()` function, we provide both a default declaration that will work for HLSL (where the operation is built-in with the signature given), and a *definition* marked with the `__specialized_for_target(glsl)` modifier.
The GLSL definition declares a function `static` variable that will fill the role of the required global, and then does what the GLSL spec requires: assigns to the global, and then calls the `reportIntersectionNVX` builtin (which we declare as a separate builtin).
Our ordinary lowering process will turn that `static` variable into an ordinary global in the IR, and the `[__vulkanHitAttributes]` attribute on the variable will be emitted as `hitAttributeNVX` in the output.
There is no additional cross-compilation logic in Slang specific to `ReportHit&lt;A&gt;()` - the target-specific definition in the standard library Just Works.

The case for `TraceRay&lt;P&gt;()` is a bit more complicated, simply because the GLSL `traceNVX()` function needs to be passed the `location` for the payload global.
We implement the payload global as a function-`static` variable, with the knowledge that every unique specialization of `TraceRay&lt;P&gt;()` will generate a unique global variable of type `P` to implement our function-`static` variable.
We then add a slightly magical builtin function `__rayPayloadLocation()` that can map such a variable to its generated `location`; the logic for this is implemented in `emit.cpp` and described below.

We also changed the `RayDesc` and `BuiltinTriangleIntersectionAttributes` types from "magic" intrinsic types over to ordinary types (because the GLSL output needs to declare them as ordinary `struct` types).
This ends up removing some cases in the AST and IR type representations.
By itself this change would break HLSL emit, because in that case the types really are intrinsic.
We added a `__target_intrinsic` modifier to these types to make them intrinsic for HLSL, and then updated the downstream passes to handle the notion of target-intrinsic types.

The logic for binding/layout of entry point inputs and outputs was updated so that raytracing stages don't follow the default logic for varying input/output parameters.
This is because the input/output parameters of a raytracing entry point aren't really "varying" in the same sense as those in the rasterization pipeline.
In particular, the SPIR-V model for raytracing input and output treats "ray payload" and "hit attributes" parameters as being in a distinct storage class from `in` or `out` parameters.

We also detect cases where a ray tracing stage declares inputs/outputs that it shouldn't have. This logic could conceivably be extended to other stages (e.g., to give an error on a compute shader with user-defined varying input/output).

The type layout logic added cases for handling raytracing payload and hit-attribute data, but this is currently just a stub implementation that follows the same logic as for varying `in` and `out` parameters (it cannot give meaningful byte sizes/offsets right now).
To my knowledge the GLSL spec doesn't currently specify anything about layout, and I haven't read the DXR spec language carefully enough to know what it says about layout.
A future change should update the layout logic to allow for byte-based layout of ray payloads, etc. so that we can query this information via reflection.

The GLSL legalization logic in `ir.cpp` was updated to factor out the per-entry-point-parameter code into its own function, and then that function was updated to special-case the input/output of a ray-tracing shader.

While for rasterization stages we typically want to take the user-declared input/output and "scalarize" it for use in GLSL (in part to deal with language limitations, and in part to tease system values apart from user-defined input/output), the GLSL spec for raytracing requires payload and hit attribute parameters to be declared as single variables. There is also the issue that even for an `in out` parameter, a ray payload parameter should only turn into a single global, whereas the handling for varying `in out` parameters generates both an `in` and an `out` global for the GLSL case.

Other than the handling of entry point parameters, the GLSL legalization pass doesn't need to do anything special for ray tracing shaders.

The trickiest change in the `emit.cpp` logic is that we now generate `location`s for ray payload arguments (the outgoing from a `TraceRay()` call) on demand during code generation.
This is a bit hacky, and it would be nice to handle it as a separate pass on the IR rather than clutter up the emit logic, but this approach was expedient.
Basically, any of the global variables that got generated from the `static` declarations in the standard library implementation of `TraceRay()` will trigger the logic to assign them a `location`.

The logic for emitting intrinsic operations added a few new `$`-based escape sequences. The `$XP` case handles emitting the location of a generated ray payload variable; this is how we emit the matching location at the site where we call `traceNVX`. The `$XT` case emits the appropriate translation for `RayTCurrent()` in HLSL, because it maps to something different depending on the target stage.

All of the test cases here consist of a pair of an HLSL/Slang shader written to the DXR spec, plus a matching GLSL shader for a baseline.
The GLSL shaders are carefully designed so that when fed into glslang they will produce the same SPIR-V as our cross-compilation process.
This kind of testing is quite fragile, but it seems to be the best we can do until our testing framework code supports *both* DXR and VKRay.

A bunch of the core changes ended up being blocked on issues in the rest of the compiler, so some additional features go implemented or fixed along the way:

The first big wall this work ran into was that the `__specialized_for_target` modifier hasn't actually been working correctly for a while.
It turns out that for the one function that is using it, `saturate()`, we have been outputting the workaround GLSL function in *all* cases (including for HLSL output) rather than only on GLSL targets.

The problem here is that for a generic function with a `__specialized_for_target` modifier or a `__target_intrinsic` modifier, the IR-level decoration will end up attached to the `IRFunc` instruction nested in the `IRGeneric`, but the logic for comparing IR declarations to see which is more specialized (via `getTargetSpecializationLevel()`) was looking only at decorations on the top-level value (the generic).

The quick (hacky) fix here is to make `getTargetSpecializationLevel()` try to look at the return value of a generic rather than the generic itself, so that it can see the decorations that indicate target-specific functions.
A more refined fix would be to attach target-specificity decorations to the outer-most generic (to simplify the "linking" logic).
The only reason not to fold that into the current fix is that the `__target_intrinsic` modifier currently serves double-duty as a marker of target specialization *and* information to drive emit logic. The latter (the emit-related stuff) currently needs to live on the `IRFunc`, and moving it to the generic could easily break a lot of code.

This needs more work in a follow-on fix, but for now target specialization should again be working.

The other big gotcha that the simple "just use the standard library" strategy ran into was that function-`static` variables weren't actually implemented yet, and in particular function-`static` variables inside of generic functions required some careful coding.

The logic in `lower-to-ir.cpp` has this `emitOuterGenerics()` function that is supposed to take a declaration that might be nested inside of zero or more levels of AST generics, and emit corresponding IR generics for all those levels.
This is needed because two different AST functions nested inside a single generic `struct` declaration should turn into distinct `IRFunc`s nested in distinct `IRGeneric`s.
The tricky bit to making that all work is that the same AST-level generic type parameter will then map to *different* IR-level instructions (the parameters of distinct `IRGeneric`s) when lowering each function.
The existing logic handled this in an idiomatic way by making "sub-builders" and "sub-contexts."
This change refactors some of the repeated logic into a `NestedContext` type to help simplify the pattern, and applies it consistently throughout the `lower-to-ir.cpp` file.

Besides that cleanup, the major change is `lowerFunctionStaticVarDecl` which, unsurprisingly, handles lower of function-`static` variables to IR globals.
The careful handling of nested contexts here is needed because if we are in the middle of lowering a generic function, then a `static` variable should turn into its *own* `IRGeneric` wrapping an `IRGlobalVar`. The body of the function should refer to the global variable by specializing the global variable's `IRGeneric` to the parameters of the *functions* `IRGeneric`. This tricky detail is handled by `defaultSpecializeOuterGenerics`.

An additional subtlety not actually required for this raytracing work (and thus not properly tested right now) is handling function-`static` variables with initializers.
These can't just be lowered to globals with initializers, because HLSL follows the C rule that function-`static` variables are initialized when the declaration statement is first executed (and this could be visible in the presence of side-effects).
The lowering strategy here translates any `static` variable with an initializer into *two* globals: one for the actual storage, plus a second `bool` variable to track whether it has been initialized yet.
There are some opportunities to optimize this case, especially for `static const` data, but that will need to wait for future changes.

We've slowly been shifting away from the model where a user thinks of a "profile" as including both a stage and a feature level.
Instead, the user should think about selecting a profile that only describes a feature level (e.g., `sm_6_1`, `glsl_450`, etc.), and then separately specifying a stage (`vertex`, `raygeneration, etc.) for each entry point.
The challenge here is that the command-line processing still only had a single `-profile` switch, and no way to specify the stage.

Adding the `-stage` option was relatively easy, but making it work with the existing validation logic for command-line arguments was tricky, because of the complex model that `slangc` supports for compiling multiple entry points in a single pass.

* In `slang.h` add new reflection parameter categories for ray payloads and hit attributes, as part of entry point input/output signatures.

* A previous change already updated our copy of glslang to one that supports the `GL_NVX_raytracing` extension, so in `slang-glslang.cpp` we just needed to map Slang's `enum` values for the raytracing stage names to their equivalents in the glslang code.

* Moved the logic for looking up a stage by name (`findStageByName()`) out of `check.cpp` and into `compiler.cpp`, with a declaration in `profile.h`

* Added a `$z` suffix to the GLSL translation of `Texture*.SampleLevel()`, to handle cases where the texture element type is not a 4-component vector. Note that this fix should actually be applied to *all* these texture-sampling operations, but I didn't want to add a bunch of changes that are (clearly) not being tested right now.

* The layout logic for entry points was updated to correctly skip producing a `TypeLayout` for an entry point result of type `void`, which meant that the related emit logic now needs to guard against a null value for the result layout.

* In `ir.cpp`, dump decorations on every instruction instead of just selected ones, so that our IR dump output is more complete.

* Added a command-line `-line-directive-mode` option so that we can easily turn off `#line` directives in the output when debugging. Not all cases where plumbed through because the `none` case is realistically the most important.

* Parser was fixed to properly initialize parent links for "scope" declarations used for statements, so that we can walk backwards from a function-scope variable (including a `static`) and see the outer function/generics/etc.

* Added GLSL 460 profile, since it is required for ray tracing. Also updated the logic for computing the "effective" profile to use to recognize that GLSL raytracing stages require GLSL 460.

* Added some conventional ray-tracing shader suffixes to the handling in `slang-test`. This code isn't actually used, but was relevant when I started by copy-pasting some existing VKRay shaders as the starting point for my testing.

* Fixup: typos
</content>
</entry>
<entry>
<title>Improve support for non-32-bit types. (#643)</title>
<updated>2018-09-20T15:14:25+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2018-09-20T15:14:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=738bcb82d0327c463625ee6fcdf14b52e766cedc'/>
<id>urn:sha1:738bcb82d0327c463625ee6fcdf14b52e766cedc</id>
<content type='text'>
The main change here is to fill out the `BaseType` enumeration so that it covers the full range of 8/16/32/64-bit signed and unsigned integers, as well as 16/32/64-bit floating-point numbers, and then propagate that completion through various places in the code.

More details:

* The current `half`, `float`, `double`, `int`, and `uint` types are still the default names for their types, so things like `float16_t` and `int32_t` were added as `typedef`s.

* We still need to generate the full gamut of vector/matrix `typedef`s for the new types, so that things like `float16_t4x3` will work (yes, I know that is ugly as sin, but that's the HLSL syntax...).

* A few pieces of dead code from earlier in the compiler's life got removed, since I did a find-in-files for `BaseType::` and tried to either update or delete every site.

* A few call sites that were enumerating integer base types in an ad-hoc fashion were changed to use a single `isIntegerBaseType()` function that I added in `check.cpp`

* When compiling with dxc for shader model 6.2 and up, we enable the compiler's support for native 16-bit types via a flag.

* The public API enumeration for reflection of scalar types added cases for 8- and 16-bit integers (it already exposed the other cases we need)

* The lexer was updated to be extremely liberal in what kinds of suffixes it allows on literals. I also removed the logic that was treating, e.g., `0f` as a floating-point literal (it doesn't seem to be the right behavior). That would now be an integer literal with an invalid suffix.

* The logic in the parser that applies types to literals was updated to handle a few more cases: `LL` and `ULL` for 64-bit integers, and `H` for 16-bit floats.

* The mangling logic needed to be updated to handle the new cases, and I consolidated the handling of those types in their front-end and IR forms.

* Removed the explicit `BasicExpressionType::ToString` logic, since all basic types are `DeclRefType`s in the front end, and we can just print them out as such.

* As a bit of a gross hack, fudged the conversion costs so that `int` to `int64_t` conversion is a bit more costly. The problem there is that given an operation like `int(0) + uint(0)`, the best applicable candidates ended up being `+(uint,uint)` and `+(int64_t,int64_t)` because the cost of a single `int`-to-`uint` conversion was the same as the sum of the cost of an `int`-to-`int64_t` and a `uint`-to-`int64_t`. A better long-term fix here is to completely change our overload resolution strategy, but that is obviously way too big to squeeze into this change.

* Type layout computation was updated to handle all the new types and give them their natural size/alignment. Note that this does *not* work for down-level HLSL where `half` is treated as a synonym for `float`. It also doesn't deal with the fact that many of these types aren't actually allowed in constant buffers for certain shader models. A future change should work to add error messages for unsupported stuff during type layout (or just make the types themselves require support for certain capabilities)</content>
</entry>
<entry>
<title>Add support for more RasterizerOrdered types (#628)</title>
<updated>2018-08-21T15:40:25+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2018-08-21T15:40:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=0ce1131ba12f777fbaa40004e0e3e7af89ccf4f0'/>
<id>urn:sha1:0ce1131ba12f777fbaa40004e0e3e7af89ccf4f0</id>
<content type='text'>
Fixes #627

The front-end has support for `RasterizerOrderedBuffer` and `RasterizerOrderedTexture*`, but left out support for:

* `RasterizerOrderedByteAddressBuffer`
* `RasterizerOrderedStructuredBuffer`

[Nitpick: these tyeps are all amazingly annoying to type. It is easy to want to write `RasterOrdered` instead of the bulkier `RasterizerOrdered`, and almost everybody does in casual speech. There's already the issue of wanting to type `StructureBuffer` (a buffer of structures) instead of `StructuredBuffer` (a buffer that is... structured?). Then you have `ByteAddressBuffer` which is just adding to the confusion because it is nominally a "byte addressable" buffer (so that `ByteAddressedBuffer` would actually make sense), but then actually *isn't* byte addressable in practice.]

There were a few `TODO` comments related to this already, and this change was mostly a matter of doing a find-in-files for `RWByteAddressBuffer` and `RWStructuredBuffer` and adding matching `RasterizerOrdered` cases.

The test I added just checks that these types make it through the front-end, and doesn't do any actual confirmation that they work as intended. It is worth noting that the handling of ordering in GLSL/VK is different from in HLSL ("pixel shader interlock" instead of "rasterizer ordered views"), so coming up with a cross-compilation story would need to be a later step.</content>
</entry>
</feed>
