<feed xmlns='http://www.w3.org/2005/Atom'>
<title>slang.git/source/slang/core.meta.slang.h, branch master</title>
<subtitle>Making it easier to work with shaders</subtitle>
<id>https://git.yummers.dev/slang.git/atom?h=master</id>
<link rel='self' href='https://git.yummers.dev/slang.git/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/'/>
<updated>2020-03-06T22:15:58+00:00</updated>
<entry>
<title>Remove generated header files (#1264)</title>
<updated>2020-03-06T22:15:58+00:00</updated>
<author>
<name>jsmall-nvidia</name>
<email>jsmall@nvidia.com</email>
</author>
<published>2020-03-06T22:15:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=cdee13466080b737ed18c148e36af75898285ed6'/>
<id>urn:sha1:cdee13466080b737ed18c148e36af75898285ed6</id>
<content type='text'>
* Update slang-binaries to a version with SPIR-V version support.

* Support vector and matrix Wave intrinsics on Vulkan.
Added wave-vector.slang test.
Added wave-diverge.slang test.
Added support for more Wave intrinsics on Vulkan.

* Test out Wave intrinsic support for matrices.

* Remove matrix GLSL intrinsics (not available in GLSL).
Fix some typos.

* Remove generated Slang headers.</content>
</entry>
<entry>
<title>Expand range of definitions that can be moved into stdlib (#1259)</title>
<updated>2020-03-06T19:37:36+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2020-03-06T19:37:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=18be2d81fd2740d3f0c06fc407cff1702b93d468'/>
<id>urn:sha1:18be2d81fd2740d3f0c06fc407cff1702b93d468</id>
<content type='text'>
The actual definitions that got moved into the stdlib here are pretty few:

* `clip()`
* `cross()`
* `ddx()`, `ddy()`, etc.
* `degrees()`
* `distance()`
* `dot()`
* `faceforward()`

The meat of the change is the infrastructure required to support these new declarations:

* Generic versions of the standard operators (e.g., `operator+`) were added that are generic for a type `T` that implements the matching `__Builtin`-prefixed interface. An open question is whether we can now drop the non-generic versions in favor of just having these generic operators.

* A `__BuiltinLogicalType` interface was added to capture the commonality between integers and `bool`.

* `__BuiltinArithmeticType` was extended so that implementations must support initialization from an `int`.

* `__BuiltinFloatingPointType` was extended to require an accessor that returns the value of pi for the given type, and the concrete floating-point types were extended to provide definitions of this value.

* It turns out that our logic for checking if two functions have the same signature (and should thus count as redeclarations/redefinitions) wasn't taking generic constraints into account at all. That was fixed with a stopgap solution that checks if the generic constraints are pairwise identical, but I didn't implement the more "correct" fix that would require canonicalizing the constraints.

* When doing overload resolution and considering potential callees, logic was added so that a non-generic candidate should always be selected over a generic one (generally the Right Thing to do), and also so that a generic candidate with fewer parameters will be selected over one with more (an approximation of the much more complicated rule we'd ideally have).

* The formatting of declarations/overloads for "ambiguous overload" errors was fleshed out to include more context (the "kind" of declaration where appropriate, and the return type for function declarations), and to properly space things when printing specializations of operator overloads whose names end with `&lt;` (so that we print `func &lt; &lt;int&gt;(int, int)` instead of `func &lt;&lt;int&gt;(int, int)`).

* The core lookup routines were heavily refactored and reorganized so that all lookups funnel through a common set of routines, and every path handles all the nuances of inheritance, extensions, etc.

* Because of the refactoring to lookup logic, the semantic checking logic related to checking if a type conforms to an interface was updated to be driven based on the `Type` that is supposed to be conforming, rather than a `DeclRef` to the type's declaration. This allows it to use the type-based lookup entry point and eliminates one special-case entry point for lookup.
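
The new candidate-ranking rule for overload resolution can be sketched in a few lines. This is a minimal Python model, not Slang's C++ implementation: `Candidate` and its fields are hypothetical, and "fewer parameters" is interpreted here as fewer generic parameters, which is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    # Hypothetical stand-in for an overload candidate.
    name: str
    generic_param_count: int  # 0 means the candidate is not generic

def rank_key(c: Candidate) -> tuple:
    # False sorts before True, so non-generic candidates rank first;
    # among generic candidates, fewer generic parameters rank earlier.
    return (c.generic_param_count != 0, c.generic_param_count)

def pick_best(candidates: list) -> list:
    # Every candidate tied for the best rank; more than one survivor
    # corresponds to an "ambiguous overload" error.
    best = min(rank_key(c) for c in candidates)
    return [c for c in candidates if rank_key(c) == best]
```

For example, given a concrete candidate and two generic ones, only the concrete candidate survives, and two generic candidates with equal parameter counts remain ambiguous.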

In addition to the various core changes, this change also refactors some of the existing stdlib code to favor writing more things in actual Slang syntax, and less in C++ code that uses `StringBuilder` to construct the Slang syntax. There is a lot more that could be done along those lines, but even pushing this far shows that the way `slang-generate` separates meta-level C++ code from Slang code isn't ideal, so a revamp of the generator is probably needed before I continue pushing.

One surprising casualty of the refactoring of lookup is that we no longer have the `lookedUpDecls` field in `LookupResult`. That field probably didn't belong there anyway, but the role it served was important. The idea of `lookedUpDecls` was to avoid looking up in the same interface more than once in cases where a type might have a "diamond" inheritance pattern. Removing that field doesn't appear to affect correctness of any of our existing tests, but by adding a specific test for "diamond" inheritance I could see that the refactoring introduced a regression and made looking up a member inherited along multiple paths ambiguous.
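
The diamond regression, and the tie-breaking fix described in the following paragraph, can be modeled with a small Python sketch. It is hypothetical: `BASES`, `MEMBERS`, and the tuple "witness path" are illustrative stand-ins for Slang's declarations, `DeclRef`s, and subtype witnesses, not its real data structures.

```python
# Hypothetical type graph: A : B : D and A : C : D (a "diamond").
BASES = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
MEMBERS = {"D": ["m"]}  # member "m" is declared only on D

def lookup(type_name, member, path=()):
    # Collect (declaring type, member, witness path) along every inheritance path.
    path = path + (type_name,)
    results = []
    if member in MEMBERS.get(type_name, []):
        results.append((type_name, member, path))
    for base in BASES[type_name]:
        results.extend(lookup(base, member, path))
    return results

def dedupe(results):
    # Treat results naming the same declaration as identical even when their
    # witness paths differ, and arbitrarily keep the first one found.
    seen = {}
    for decl_type, member, path in results:
        seen.setdefault((decl_type, member), path)
    return [(t, m, p) for (t, m), p in seen.items()]
```

Without `dedupe`, looking up `m` on `A` yields two results that differ only in their witness path, which is exactly the spurious ambiguity the diamond test exposed.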

Rather than add back `lookedUpDecls` I went for a simpler (but arguably even hackier) solution where when ranking candidates from a `LookupResult` we check for identical `DeclRef`s and arbitrarily favor one over the other. One complication that arises here is that when comparing `DeclRef`s inherited along different paths they might have a `ThisTypeSubstitution` for the same type, but with different subtype witnesses (because different inheritance paths could lead to different transitive subtype witnesses: e.g., `A : B : D` and `A : C : D`).</content>
</entry>
<entry>
<title>Move definitions of simple vector/matrix builtins to stdlib. (#1247)</title>
<updated>2020-03-03T19:49:40+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2020-03-03T19:49:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=0f1f4a42df4efd32b80fd2b01f3893435e47e980'/>
<id>urn:sha1:0f1f4a42df4efd32b80fd2b01f3893435e47e980</id>
<content type='text'>
Some of the functions declared in the Slang standard library are built in on some targets (almost always the case for HLSL) but aren't available on other targets (often the case for GLSL, CUDA, and CPU). To date, the CUDA and CPU targets have worked around this issue by synthesizing definitions of the missing functions on the fly as part of output code generation, at the cost of some amount of code complexity in the emit pass.

This change adds definitions inside the stdlib itself for a large number of built-in HLSL functions that act element-wise over both vectors and matrices (e.g., `sin()`, `sqrt()`, etc.), and changes the CPU/CUDA codegen path to *not* synthesize C++ code for those functions (instead relying on code generated from the Slang definitions).

The element-wise vector/matrix function bodies are being defined using macros in the stdlib, so that we can more easily swap out the definitions en masse if we find an implementation strategy we like better. This could involve defining special-case syntax just for vector/matrix "map" operations that can lower directly to the IR and theoretically generate cleaner code after specialization is complete.
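
The element-wise pattern those macros expand to can be illustrated in Python, representing a vector as a list and a matrix as a list of rows. `map_elementwise` is a hypothetical helper modeling what a vector/matrix "map" operation would do; it is not the stdlib macro itself.

```python
import math

def map_elementwise(f, value):
    # Apply a scalar function to a scalar, a vector (list),
    # or a matrix (list of rows) by recursing into lists.
    if isinstance(value, list):
        return [map_elementwise(f, v) for v in value]
    return f(value)

def sqrt_any(value):
    # Only the scalar case needs a real implementation;
    # the vector and matrix cases fall out of the map.
    return map_elementwise(math.sqrt, value)
```

Swapping the body of `map_elementwise` changes every function built on it at once, which mirrors the motivation for defining the stdlib bodies through macros.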

As a byproduct of this change, the matrix versions of these functions should in principle now be available to GLSL (GLSL only defines vector versions of functions like `sin()`, and leaves out matrix ones). No testing has been done to confirm this fix.

In some cases builtins were being declared with multiple declarations to split out the HLSL and GLSL cases, and this change tries to unify these as much as possible into single declarations to keep the stdlib as small as possible.

Two functions -- `sincos()` and `saturate()` -- were simple enough that their full definitions could be given in the stdlib so that even the scalar cases wouldn't need to be synthesized, so the corresponding enumerants were removed in `slang-hlsl-intrinsic-set.h`. In the case of `saturate()` the pre-existing definition used for GLSL codegen could have been used for CPU/CUDA all along.
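
Both functions reduce to operations that already exist, which is why they can be written directly in the stdlib. The following Python stand-ins model their semantics only; the actual Slang declarations use generic types and, for `sincos()`, `out` parameters.

```python
import math

def saturate(x: float) -> float:
    # HLSL saturate(): clamp x to [0, 1] (NaN handling is not modeled here).
    return min(max(x, 0.0), 1.0)

def sincos(x: float) -> tuple:
    # HLSL sincos() writes sin(x) and cos(x) to two out parameters;
    # this sketch returns them as a pair instead.
    return math.sin(x), math.cos(x)
```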

Some functions that can and should be defined in the stdlib later have been given commented-out bodies as an outline of what to add. Most of these functions cannot be implemented directly in the stdlib today because basic operations like `operator+` are currently not defined for `T : __BuiltinArithmeticType`, etc. Adding such declarations should be straightforward, but brings risks of creating unexpected breakage, so it seemed best to leave for a future change.

This change does not try to address making vector or matrix versions of builtin functions that map to single `IROp`s, because the existing mechanisms for target-based specialization, etc., do not apply to such cases. In the future we will either have to make those operations into ordinary functions (eliminating many `IROp`s) so that stdlib definitions can apply, or add an explicit IR pass to deal with legalizing vector/matrix ops for targets that don't support them natively. The right path for this is not yet clear, so this change doesn't wade into it.

This change does not touch the `Wave*` functions added in Shader Model 6, despite many of these having vector/matrix versions that could benefit from the same default mapping. It is expected that these functions will have GLSL/Vulkan translation added soon, and it probably makes sense to know what cases are directly supported on Vulkan before adding the hand-written definitions.

Because of the limitations on what could be ported into the stdlib, it is not yet possible to remove any of the infrastructure for synthesizing builtin function definitions in the CPU and CUDA back-ends.</content>
</entry>
<entry>
<title>Additional Wave Intrinsic Support (#1252)</title>
<updated>2020-03-02T21:18:20+00:00</updated>
<author>
<name>jsmall-nvidia</name>
<email>jsmall@nvidia.com</email>
</author>
<published>2020-03-02T21:18:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=8899c149b05def1cce626ea649012c4c974861de'/>
<id>urn:sha1:8899c149b05def1cce626ea649012c4c974861de</id>
<content type='text'>
* Test for some wave intrinsics.
More wave intrinsic support on CUDA.

* Use shfl_xor_sync.

* Improvements around wave intrinsics.
Fix built-in integer types so that they belong to `__BuiltinIntegerType`.

* Improvements and fixes around Wave intrinsics.

* Added WaveIsFirstLane test.
No longer use `__wavemask_lt`, as it appears not to be available as an intrinsic.

* Small fixes to CUDA prelude.

* Add wave-active-product test.
Handle the special case for arbitrary sums.

* Used macros to implement CUDA wave intrinsics.
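
Using `shfl_xor_sync` points to a butterfly ("xor shuffle") reduction. As a hedged illustration only, the Python below simulates such a reduction over a full 32-lane warp with every lane active; it is not the CUDA prelude's code and does not handle partially active warps.

```python
import operator

WARP_SIZE = 32

def shfl_xor(values, lane_mask):
    # Simulates __shfl_xor_sync with every lane active:
    # lane i receives the value held by lane (i XOR lane_mask).
    return [values[lane ^ lane_mask] for lane in range(len(values))]

def wave_active_reduce(values, op):
    # Butterfly reduction: after log2(WARP_SIZE) steps every lane
    # holds the combination of all lanes' values.
    if len(values) != WARP_SIZE:
        raise ValueError("expected one value per lane")
    offset = WARP_SIZE // 2
    while offset:
        partner = shfl_xor(values, offset)
        values = [op(a, b) for a, b in zip(values, partner)]
        offset //= 2
    return values
```

With `operator.add` this models a `WaveActiveSum`-style reduction, and with `operator.mul` a `WaveActiveProduct`-style one.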
</content>
</entry>
<entry>
<title>Support for RWTexture types on CPU and CUDA (#1243)</title>
<updated>2020-02-26T21:13:41+00:00</updated>
<author>
<name>jsmall-nvidia</name>
<email>jsmall@nvidia.com</email>
</author>
<published>2020-02-26T21:13:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=7bce066cfc51296a538c7a7d325133d60e352494'/>
<id>urn:sha1:7bce066cfc51296a538c7a7d325133d60e352494</id>
<content type='text'>
* Added FloatTextureData as a mechanism to enable CPU based Texture writes.

* Add [] RWTexture access for CPU.

* Fixed rw-texture-simple.slang.expected.txt

* WIP: CUDA stdlib has support for [] surface access.

* Made IRWTexture class able to take different locations.
Texture2D access on CUDA now works.

* Fix bug in outputting UniformState: padding was missing.
Support RWTexture with array. Support RWTexture3D.

* Use `*` for locations of read-only textures, so only an `ITexture` interface is needed.

* Fix the application of set/get for subscripted Texture types on CUDA.
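
Subscript (`[]`) access on a writable texture splits into a "get" path for reads and a "set" path for writes. A hypothetical Python model of that split for a CPU-side texture follows; `CpuRWTexture2D` and its row-major storage are illustrative assumptions, not the classes in the CPU prelude.

```python
class CpuRWTexture2D:
    # Hypothetical CPU-side 2D texture with subscript read/write access.

    def __init__(self, width: int, height: int, fill: float = 0.0):
        self.width = width
        self.height = height
        # Row-major storage: texel (x, y) lives at index y * width + x.
        self.data = [fill] * (width * height)

    def _offset(self, location) -> int:
        x, y = location
        if x not in range(self.width) or y not in range(self.height):
            raise IndexError(f"texel {location} is out of bounds")
        return y * self.width + x

    def __getitem__(self, location):
        # "get" path: read one texel.
        return self.data[self._offset(location)]

    def __setitem__(self, location, value):
        # "set" path: write one texel.
        self.data[self._offset(location)] = value
```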
</content>
</entry>
<entry>
<title>WIP on RWTexture types on CUDA/CPU (#1234)</title>
<updated>2020-02-20T23:24:00+00:00</updated>
<author>
<name>jsmall-nvidia</name>
<email>jsmall@nvidia.com</email>
</author>
<published>2020-02-20T23:24:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=1f401d04e32c6feaeb35243ea5bfc2b14520344b'/>
<id>urn:sha1:1f401d04e32c6feaeb35243ea5bfc2b14520344b</id>
<content type='text'>
* CUDA support for array of resources.

* * Add support for Texture2DArray on CPU
* Expand texture-simple.slang to test Texture2DArray

* Reorganise CUDAComputeUtil to split out createTextureResource.

* Add TextureCubeArray support for CPU/CUDA targets.

* Pulled out CUDAResource
Renamed derived classes to reflect that change.

* Creation of SurfObject type.

* Functions to return read/write access for simplifying future additions.

* WIP for RWTexture access on CPU/CUDA.

* CUsurfObject cannot have mips.

* Ability to set number of mips on test data.
Preliminary support for CUsurfObject and RWTexture1D on CUDA.
CUDA docs improvements.

* Fix typo.
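
Supporting `Texture2DArray` and `TextureCubeArray` on the CPU comes down to addressing texels by slice. A minimal Python sketch, assuming tightly packed, row-major slices (an assumption; the real CPU texture code may lay data out differently), plus the standard D3D/Vulkan convention that each cube in a cube array occupies six consecutive layers:

```python
def layered_offset(x, y, layer, width, height):
    # Row-major offset of texel (x, y) in slice `layer`,
    # assuming slices are stored back to back with no padding.
    return (layer * height + y) * width + x

def cube_array_layer(cube_index, face):
    # Faces are ordered +X, -X, +Y, -Y, +Z, -Z within each cube.
    if face not in range(6):
        raise ValueError("face must be in 0..5")
    return cube_index * 6 + face
```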
</content>
</entry>
<entry>
<title>First pass Texture Array support on CUDA/CPU (#1225)</title>
<updated>2020-02-18T19:14:16+00:00</updated>
<author>
<name>jsmall-nvidia</name>
<email>jsmall@nvidia.com</email>
</author>
<published>2020-02-18T19:14:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=8ee39e08c48a315163fe1850dbb12ca292020d4d'/>
<id>urn:sha1:8ee39e08c48a315163fe1850dbb12ca292020d4d</id>
<content type='text'>
* Add cubemap support.

* Add CUDA fence intrinsics.

* Added Gather for CUDA.

* Use the CUDA driver API as much as possible.

* * Support 1D texture on CPU
* WIP on 1D texture on CUDA
* Added simplified texture test

* Fix test.

* Improve texture-simple tests.

* * Add CPU support for 3d textures
* Add support for mip maps to CUDA
* Disable warnings in nvrtc
* Update CUDA docs

* WIP on 3d texture support.

* Add support for 3d textures for CPU and CUDA.

* CPU and CUDA support for cube maps.

* Add CPU support for Texture1DArray.

* Support CUDA Layered/Array type in meta library.
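
For the mip-map support above, the usual D3D/Vulkan convention is that each mip level halves every axis of the previous level, rounding down and clamping to 1. A small hypothetical Python helper (not part of the test framework) that enumerates a full chain:

```python
def mip_chain_sizes(width, height, depth=1):
    # Each level halves every axis (rounding down), clamped to 1,
    # until the 1x1x1 level is reached.
    sizes = [(width, height, depth)]
    while sizes[-1] != (1, 1, 1):
        w, h, d = sizes[-1]
        sizes.append((max(1, w // 2), max(1, h // 2), max(1, d // 2)))
    return sizes
```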
</content>
</entry>
<entry>
<title>Feature/cuda coverage (#1223)</title>
<updated>2020-02-14T20:06:35+00:00</updated>
<author>
<name>jsmall-nvidia</name>
<email>jsmall@nvidia.com</email>
</author>
<published>2020-02-14T20:06:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=2c097545eaa324a91a035327abad2e8b4fa60469'/>
<id>urn:sha1:2c097545eaa324a91a035327abad2e8b4fa60469</id>
<content type='text'>
* Add cubemap support.

* Add CUDA fence intrinsics.

* Added Gather for CUDA.

* Use the CUDA driver API as much as possible.

* * Support 1D texture on CPU
* WIP on 1D texture on CUDA
* Added simplified texture test

* Fix test.

* Improve texture-simple tests.

Co-authored-by: Tim Foley &lt;tfoleyNV@users.noreply.github.com&gt;
</content>
</entry>
<entry>
<title>Add attributes to enable dual-source blending on Vulkan (#1210)</title>
<updated>2020-02-10T18:25:29+00:00</updated>
<author>
<name>Tim Foley</name>
<email>tfoleyNV@users.noreply.github.com</email>
</author>
<published>2020-02-10T18:25:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=60dfb62e638a06ebdcef27138b63033b828ec2ef'/>
<id>urn:sha1:60dfb62e638a06ebdcef27138b63033b828ec2ef</id>
<content type='text'>
This change adds support for the `[[vk::location(...)]]` and `[[vk::index(...)]]` attributes, which can be used together to mark up shader outputs for dual-source blending on Vulkan. HLSL/Slang code like the following:

```hlsl
struct Output
{
    [[vk::location(0)]]
    float4 a : SV_Target0;

    [[vk::location(0), vk::index(1)]]
    float4 b : SV_Target1;
};

[shader("fragment")]
Output main(...) { ... }
```

can be used to set up dual-source blending on both D3D and Vulkan APIs. The output GLSL for the above will look something like:

```glsl
layout(location = 0)            out vec4 a;
layout(location = 0, index = 1) out vec4 b;

void main() { ... }
```

The more or less straightforward parts of this change were:

* Added new `attribute_syntax` declarations to the stdlib, for `[[vk::location(...)]]` and `[[vk::index(...)]]`

* Added new AST node types for the new attribute cases, sharing a base class so that argument checking can be shared

* Added checks for the arguments to the new attributes in `slang-check-modifier.cpp` (eventually this kind of logic shouldn't be needed for new attributes)

* Updated GLSL emit logic so that it treats the `index`/`space` parts of a variable layout as the `location`/`index` for varying parameters.

* Updated GLSL legalization so that when it translates entry-point parameters into globals (and scalarizes structures) it handles both a binding index and space for the parameters.

* Added a cross-compilation test case to verify that the basics of the feature work
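
The emit-side reinterpretation (a layout's index becomes the GLSL `location`, its space becomes the GLSL `index`) can be summarized with a hypothetical Python helper that reproduces the qualifiers from the GLSL example above. Omitting `index` when it is 0 is a choice made here for brevity; emitting `index = 0` would also be valid GLSL.

```python
def glsl_varying_layout(index, space=0):
    # Layout index -> GLSL location; layout space -> GLSL index.
    if space:
        return f"layout(location = {index}, index = {space})"
    return f"layout(location = {index})"
```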

The remaining work is all in `slang-parameter-binding.cpp`.

This change also includes some work that isn't strictly related to the feature (and could be reverted if it causes problems) on the detection and handling of fragment shader outputs with `SV_Target` semantics. The basic changes (which could be backed out and then merged separately) are:

* Made the special-case `SV_Target` logic only trigger for fragment shaders (that is the only place where `SV_Target` should appear, but we weren't guarding against it)

* Made the logic to reserve a `u&lt;N&gt;` register for `SV_Target&lt;N&gt;` only trigger for D3D Shader Model 5.0 and below (since it is not required for SM 5.1 and up). This could be a breaking change for some users, but that seems unlikely.

* Fixed one test case that relied on the behavior of reserving `u0` for `SV_Target0` even though it was a SM6.0 test.

* Also added more comments to the system-value handling logic.
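
The two guards described above combine into a single condition. A hypothetical Python predicate, with the shader model given as a `(major, minor)` tuple:

```python
def reserves_uav_register(stage: str, shader_model: tuple) -> bool:
    # Only fragment-shader SV_Target outputs reserve a u register,
    # and only for Shader Model 5.0 and below (tuples compare element-wise).
    return stage == "fragment" and (5, 0) >= shader_model
```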

The more interesting changes come up starting in `processEntryPointVaryingParameterDecl()`. The basic issue is that we have so far only supported implicit layout for varying parameters on GLSL/Vulkan, but the `[[vk::location(...)]]` attribute is a form of explicit layout annotation. Rather than try to kludge something that only works in narrow cases, I instead opted to try to fix things more generally.

In `processEntryPointVaryingParameterDecl()` we now check for the `location` and `index` attributes when we are on "Khronos" targets (Vulkan/OpenGL/GLSL) and immediately add them to the variable layout being constructed if they are found. There is nothing in this logic specific to fragment-shader outputs, so this feature now applies to any varying input/output on Khronos targets.

Allowing explicit layouts creates the potential for mixing implicit and explicit layout. For example, consider:

```hlsl
struct Output
{
    float4 color : COLOR;
    [[vk::location(0)]] float3 normal : NORMAL;
};
```

What `location` should `color` get? Should this code be an error? There are two cases where this conundrum can come up: when working with `struct` types used for varying parameters, and the entry-point parameter list itself.

For the varying `struct` case we currently make an expedient choice. Fields with either implicit or explicit layout are handled appropriately, but the logic does not account for mixing the two. Instead, at the end of layout for the `struct` we issue an error if there was a mix of implicit and explicit layout (since the results are unlikely to be valid).
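
A minimal Python model of this expedient approach, assuming each field occupies exactly one location (a simplification; real fields such as matrices can span several). `layout_varying_struct` is hypothetical, not Slang's code.

```python
def layout_varying_struct(fields):
    # fields: list of (name, explicit_location or None), in declaration order.
    locations = {}
    next_location = 0
    saw_implicit = saw_explicit = False
    for name, explicit_location in fields:
        if explicit_location is None:
            # Implicit: take the next sequential location.
            saw_implicit = True
            locations[name] = next_location
            next_location += 1
        else:
            # Explicit: keep the annotated location.
            saw_explicit = True
            locations[name] = explicit_location
    # Checked only after all fields are laid out: mixing the two styles
    # could silently assign the same location twice.
    if saw_implicit and saw_explicit:
        raise ValueError("struct mixes implicit and explicit varying layout")
    return locations
```

The `Output` example above would trip the error: `color` implicitly takes location 0, which `normal` also claims explicitly.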

For the entry point varying parameter case, things were already using a `ScopeLayoutBuilder` type (that encapsulates some logic shared between entry-point and global parameters). The entry-point-specific bits were moved out into a `SimpleScopeLayoutBuilder` and it was updated so that rather than assuming all parameters use implicit layout it does a two-phase layout approach similar to what we use for the global scope:

* First all parameters are enumerated to collect explicit bindings and mark certain ranges as "used"

* Next the parameters are enumerated again and those without explicit bindings get allocated space using a "first fit" algorithm
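
The two-phase approach can be sketched in Python. This is a hypothetical model of the algorithm described above, with each parameter's size measured in locations and a plain set standing in for the builder's "used" ranges; it is not the actual `SimpleScopeLayoutBuilder` code and does not diagnose overlapping explicit bindings.

```python
def two_phase_layout(params):
    # params: list of (name, explicit_location or None, size_in_locations).
    used = set()
    locations = {}
    # Phase 1: honor explicit bindings and mark their ranges as used.
    for name, explicit, size in params:
        if explicit is not None:
            locations[name] = explicit
            used.update(range(explicit, explicit + size))
    # Phase 2: first fit; give each implicit parameter the lowest
    # starting location whose whole range is still free.
    for name, explicit, size in params:
        if explicit is None:
            start = 0
            while any(loc in used for loc in range(start, start + size)):
                start += 1
            locations[name] = start
            used.update(range(start, start + size))
    return locations
```

Because explicit bindings are collected first, an implicit parameter declared before an explicit one can still fill a gap below it instead of colliding with it.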

In principle we could extend the two-phase approach to apply to `struct` types as well, but that would be best saved for a future refactoring of some of this parameter binding logic, since I would like to exploit more of the opportunities for sharing code across the uniform/varying and struct/entry-point/global cases.

Because the point where entry-point parameters get their offsets assigned has moved, some of the logic that removes varying parameter usage (and other things that shouldn't "leak" out of an entry point) also had to move to a different point in the entry-point layout process.

While adding these various pieces does not quite enable us to support explicit bindings on entry point parameters (e.g., putting `uniform Texture2D t : register(t0)` in an entry point parameter list) or in `struct` types (e.g., explicit `packoffset` annotations on fields), it starts to provide some of the infrastructure that we'd need in order to support those cases.</content>
</entry>
<entry>
<title>Fixes to make all CPU compute shaders work on CUDA (#1211)</title>
<updated>2020-02-08T16:19:31+00:00</updated>
<author>
<name>jsmall-nvidia</name>
<email>jsmall@nvidia.com</email>
</author>
<published>2020-02-08T16:19:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/slang.git/commit/?id=0eed0125fa5e5f425d546efdc2b284b09ffc2785'/>
<id>urn:sha1:0eed0125fa5e5f425d546efdc2b284b09ffc2785</id>
<content type='text'>
* Launch CUDA test taking into account dispatch size.

* Enable isCPUOnly hack to work on CUDA.

* Rename 'isCPUOnly' hack to 'onlyCPULikeBinding'.

* Add $T special type.
Support SampleLevel on CUDA.

* Fix typo.
</content>
</entry>
</feed>
