| Age | Commit message (Collapse) | Author |
|
* * Remove UniformState and UniformEntryPointParams types
* Put all output C++ source in an anonymous namespace
* If SLANG_PRELUDE_NAMESPACE is set, make what it defines available in generated file.
* Fix signature issue in performance-profile.slang
* Context -> KernelContext to avoid ambiguity.
* Fix issues around dynamic dispatch and anonymous namespace.
* Fix typo.
|
|
* Add IR pass to lower generics into ordinary functions.
* Fix project files
* Emit dynamic C++ code for simple generics and witness tables.
Fixes #1386.
* Remove -dump-ir flag.
* Fixups.
|
|
IRWitnessTable values (#1387)
* Generate IRType for interfaces, and use them as the type of IRWitnessTable values.
This results the following IR for the included test case:
```
[export("_S3tu010IInterface7Computep1pii")]
let %1 : _ = key
[export("_ST3tu010IInterface")]
[nameHint("IInterface")]
interface %IInterface : _(%1);
[export("_S3tu04Impl7Computep1pii")]
[nameHint("Impl.Compute")]
func %Implx5FCompute : Func(Int, Int)
{
block %2(
[nameHint("inVal")]
param %inVal : Int):
let %3 : Int = mul(%inVal, %inVal)
return_val(%3)
}
[export("_SW3tu04Impl3tu010IInterface")]
witness_table %4 : %IInterface
{
witness_table_entry(%1,%Implx5FCompute)
}
```
* Fixes per code review comments.
Moved interface type reference in IRWitnessTable from their type to operand[0].
* Fix typo in comment.
|
|
Fixes #1377
|
|
* Use the original value in the test.
Run test on VK.
* Added RWBuffer and Buffer types to C++ prelude.
* Add vk to atomics.slang tests
* Update target-compatibility around atomics.
When tests disabled in atomics-buffer.slang explained why.
* tabs -> spaces.
* Small docs improvement.
|
|
* Fix CUDA output of a static const array if values are all literals.
* Fix bug in Convert definition.
* Output makeArray such that is deconstructed on CUDA to fill in based on what the target type is. Tries to expand such that there are no function calls so that static const global scope definitions work.
* Fix unbounded-array-of-array-syntax.slang to work correctly on CUDA.
* Remove tabs.
* Check works with static const vector/matrix.
* Fix typo in type comparison.
* Shorten _areEquivalent test.
* Rename _emitInitializerList. Some small comment fixes.
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|
|
(#1318)
TL;DR: This is a tweak the rules for layout that only affects a corner case for people who actually use `interface`-type shader parameters (which for now is just our own test cases). The tweaked rules seem like they make it easier to write the application code for interfacing with Slang, but even if we change our minds later the risk here should be low (again: nobody is using this stuff right now).
Slang already has a rule that a constant buffer that contains no ordinary/uniform data doesn't actually allocate a constant buffer `binding`/`register`:
struct A { float4 x; Texture2D y; } // has uniform/ordinary data
struct B { Texture2D u; SamplerState v; } // has none
ConstantBuffer<A> gA; // gets a constant buffer register/binding
ConstantBuffer<B> gB; // does not
There is similar logic for `ParameterBlock`, where the feature makes more sense. A user would be somewhat surprised if they declared a parmaeter block with a texture and a sampler in it, but then the generating code reserved Vulkan `binding=0` for a constant buffer they never asked for. The behavior in the case of a plain `ConstantBuffer` is chosen to be consistent with the parameter block case.
(Aside: all of this is a non-issue for targets with direct support for pointers, like CUDA and CPU. On those platforms a constant buffer or parameter block always translates to a pointer to the contained data.)
Now, suppose the user declares a constant buffer with an interface type in it:
interface IFoo { ... }
ConstantBuffer<IFoo> gBuffer;
When the layout logic sees the declaration of `gBuffer` it doesn't yet know what type will be plugged in as `IFoo` there. Will it contain uniform/ordinary data, such that a constant buffer is needed?
The existing logic in the type layout step implemented a complicated rule that amounted to:
* A `ConstantBuffer` or `cbuffer` that only contains `interface`/existential-type data will *not* be allocated a constant buffer `register`/`binding` during the initial layout process (on unspecialized code). That means that any resources declared after it will take the next consecutive `register`/`binding` without leaving any "gap" for the `ConstantBuffer` variable.
* After specialization (e.g., when we know that `Thing` should be plugged in for `IFoo`), if we discover that there is uniform/ordinary data in `Thing` then we will allocate a constant buffer `register`/`binding` for the `ConstantBuffer`, but that register/binding will necessarily come *after* any `register`s/`binding`s that were allocated to parameters during the first pass.
* Parameter blocks were intended to work the same when when it comes to whether or not they allocate a default `space`/`set`, but that logic appears to not have worked as intended.
These rules make some logical sense: a `ConstantBuffer` declaration only pays for what the element type actually needs, and if that changes due to specialization then the new resource allocation comes after the unspecialized resources (so that the locations of unspecialized parameters are stable across specializations).
The problem is that in practice it is almost impossible to write client application code that uses the Slang reflection API and makes reasonable choices in the presence of these rules. A general-purpose `ShaderObject` abstraction in application code ends up having to deal with multiple possible states that an object could be in:
1. An object where the element type `E` contains no uniform/ordinary data, and no interface/existential fields, so a constant buffer doesn't need to be allocated or bound.
2. An object where the element type `E` contains no uniform/ordinary data, but has interace/existential fields, with two sub-cases:
a. When no values bound to interface/existential fields use uniform/ordinary dat, then the parent object must not bind a buffer
b. When the type of value bound to an interface/existential field uses uniform/ordinary data, then the parent object needs to have a buffer allocated, and bind it.
3. When the element type `E` contains uniform/ordinary data, then a buffer should be allocated and bound (although its size/contents may change as interface/existential fields get re-bound)
Needing to deal with a possible shift between cases (2a) and (2b) based on what gets bound at runtime is a mess, and it is important to note that even though both (2a) and (3) require a buffer to be bound, the rules about *where* the buffer gets bound aren't consistent (so that the application needs to undrestand the distinction between "primary" and "pending" data in a type layout).
This change introduces a different rule, which seems to be more complicated to explain, but actually seems to simplify things for the application:
* A `ConstantBuffer` or `cbuffer` that only contains `interface`/existential-type data always has a constant buffer `register`/`binding` allocated for it "just in case."
* If after specialization there is any uniform/ordinary data, then that will use the buffer `register`/`binding` that was already allocated (that's easy enough).
* If after speciazliation there *isn't* any uniform/ordinary data, then the generated HLSL/GLSL shader code won't declare a buffer, but the `register`/`binding` is still claimed.
* A `ParameterBlock` behaves equivalently, so that if it contains any `interface`/existential fields, then it will always allocate a `space`/`set` "just in case"
The effect of these rules is to streamline the cases that an application needs to deal with down to two:
1. If the element type `E` of a shader object contains no uniform/ordinary or interface/existential fields, then no buffer needs to be allocated or bound
2. If the element type `E` contains *any* uniform/ordinary or interface/existential fields, then it is always safe to allocate and bind a buffer (even in the cases where it might be ignored).
Furthermore, the reflection data for the constant buffer `register`/`binding` becomes consistent in case (2), so that the application can always expect to find it in the same way.
|
|
* Fold prefix operators if they prefix an int literal.
* Make test case a bit more convoluted.
* Remove ++ and -- as not appropriate for folding of literals.
* Set output buffer name.
|
|
* Improve performance of building members dictionary by adding when needed.
* Fix unbounded-array-of-array-syntax.slang, that DISABLE_TEST now uses up an index. Use IGNORE_TEST.
* Improve variable name.
Small improvements.
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|
|
The Slang compiler was bit by a known issue when translating from SSA form back to straight-line code. Give code like the following:
int x = 0;
int y = 1;
while(...)
{
...
int t = x;
x = y;
y = t;
}
...
The SSA construction pass will eliminate the temporary `t` and yield code something like:
br(b, 0, 1);
block b(param x : Int, param y : Int):
...
br(b, y, x);
The loop-dependent variables have become parameters of the loop block, and the branchs to the top of the loop pass the appropriate values for the next iteration (e.g., the jump that starts the loop sends in `0` and `1`).
The problem comes up when translating the back-edge the continues the loop out of SSA form. Our generated code will re-introduce temporaries for `x` and `y`:
int x;
int y;
// jump into loop becomes:
x = 0;
y = 1;
for(;;)
{
...
// back-edge becomes
x = y;
y = x;
continue;
}
The problem there is that we've naively translated a branch like `br(b, <a>, <b>)` into `x = <a>; y = <b>;` but that doesn't work correctly in the case where `<b>` is `x`, because we will have already clobbered the value of `x` with `<a>`.
The simplest fix is to introduce a temporary (just like the input code had), and generate:
// back-edge becomes
int t = x;
x = y;
y = t;
This change modifies the `emitPhiVarAssignments()` function so that it detects bad cases like the above and emits temporaries to work around the problem. A new test case is included that produced incorrect output before the change, and now produces the expected results.
A secondary change is folded in here that tries to guard against a more subtle version of the problem:
for(...)
{
...
int x1 = x + 1;
int y1 = y + 1;
x = y1;
y = x1;
}
In this more complicated case, each of `x` and `y` is being assigned to a value derived from the other, but neither is being set using a block parameter directly, so the changes to `emitPhiVarAssignments()` do not apply.
The problem in this case would be if the `shouldFoldInstIntoUseSites()` logic decided to fold the computation of `x1` or `y1` into the branch instruction, resulting in:
x = y + 1;
y = x + 1;
which would again violate the semantics of the original code, because now there is an assignment to `x` before the computation of `x + 1`.
Right now it seems impossible to force this case to arise in practice, due to implementation details in how we generate IR code for loops. In particular, the block that computes the `x+1` and `y+1` values is currently always distinct from the block that branches back to the top of the loop, and we do not allow "folding" of sub-expressions from different blocks. It is possible, however, that future changes to the compiler could change the form of the IR we generate and make it possible for this problem to arise.
The right fix for this issue would be to say that we should introduce a temporary for any branch argument that "involves" a block parameter (whether directly using it or using it as a sub-expression). Unfortunately, the ad hoc approach we use for folding sub-expressions today means that testing if an operand "involves" something would be both expensive and unwieldy.
A more expedient fix is to disallow *all* folding of sub-expressions into unconditional branch instructions (the ones that can pass arguments to the target block), which is what I ended up implementing in this change. Making that defensive change alters the GLSL we output for some of our cross-compilation tests, in a way that required me to update the baseline/gold GLSL.
A better long-term fix for this whole space of issues would be to have the "de-SSA" operation be something we do explicitly on the IR. Such an IR pass would still need to be careful about the first issue addressed in this change, but the second one should (in principle) be a non-issue given that our emit/folding logic already handles code with explicit mutable local variables correctly.
|
|
* Add unroll support for CUDA, and preliminary for C++.
Document [unroll] support.
* Fix loop-unroll to run on CPU, and test on CPU and elsewhere.
Fix bug in emitting loop unroll condition.
* Improved comment.
* Added support for vk/glsl loop unrolling.
|
|
* Better diagnostics on failure on CUDA.
* Catch exceptions in render-test
* * Added ability to disable reporting on CUDA failures
* Stopped using exception for reporting (just write to StdWriter::out()
* Removed CUDAResult type
* Don't set arch type on nvrtc to see if fixes CI issues.
* Try compute_30 on CUDA.
* Added ability to IGNORE_ a test
DIsabled rw-texture-simple and texture-get-dimensions
* Disable tests that require CUDA SM7.0
Use DISABLE_ prefix to disable tests.
* Disable signalUnexpectedError doing printf.
|
|
* Added CPU support for GetDimensions on C++/CPU target.
Added texture-get-dimension.slang test
* Fix some typos.
* Update CUDA docs.
* Fix output of GetDimensions on glsl when has an array.
Disabled VK - because VK renderer doesn't support createTextureView
* Fix typo.
* Fix typo.
* Fix bad-operator-call diagnostics output.
|
|
The main feature visible to the stdlib here is the `[__unsafeForceInlineEarly]` attribute, which can be attached to a function definition and forces calls to that function to be inlined immediately after initial IR lowering.
The pass is implemented in `slang-ir-inline.{h,cpp}` and currently only handles the completely trivial case of a function with no control flow that ends with a single `return`. The lack of support for any other cases motivates the `__unsafe` prefix on the attribute.
In order to test that the pass works, I modified the "comma operator" in the standard library to be defined directly (rather than relying on special-case handling in IR lowering), and then added a test that uses that operator to make sure it generates code as expected. The compute version of the test confirms that we generate semantically correct code for the operator, while the SPIR-V cross-compilation test confirms that our output matches GLSL where the comma operator has been inlined, rather than turned into a subroutine.
Notes for the future:
* With this change it should be possible (in principle) to redefine all the compound operators in the stdlib to instead be ordinary functions with the new attribute, removing the need for `slang-compound-intrinsics.h`.
* Once the compound intrinsics are defined in the stdlib, it should be easier/possible to start making built-in operators like `+` be ordinary functions from the standpoint of the IR
* The attribute and pass here could be extended to include an alternative inlining attribute that happens later in compilation (after linking) but otherwise works the same. This could in theory be used for functions where we don't want to inline the definition into generated IR, but still want to inline things berfore generating final HlSL/GLSL/whatever.
* The inlining pass itself could be generalized to work for less trivial functions pretty easily; for the most part it would just mean "splitting" the block with the call site and then inserting clones of the blocks in the callee. Any `return` instructions in the clone would become unconditional branches (with arguments) to the block after the call (which would get a parameter to represent the returned value).
* The "hard" part for such an inlining pass would be handling cases where the control flow that results from inlining can't be handled by our later restructuring passes. The long-term fix there is to implement something like the "relooper" algorithm to restructure control flow as required for specific targets.
|
|
* Added FloatTextureData as a mechanism to enable CPU based Texture writes.
* Add [] RWTexture access for CPU.
* Fixed rw-texture-simple.slang.expected.txt
* WIP: CUDA stdlib has support for [] surface access.
* Made IRWTexture class able to take different locations.
Doing a Texture2d access on CUDA works.
* Fix bug in outputing UniformState - was missing out padding.
Support RWTexture with array. Support RWTexture3D.
* Use * for locations for read only textures, so only need a ITexture interface.
* Fix problem around application of set/get for CUDA on subscript Texture types.
|
|
Within the context of an aggregate type (or an `extension` of one), the programmer can use `this` to refer to the "current" instance of the surrounding type, but there is no easy way to utter the name of the type itself. This is especially relevant inside of an `interface`, where the type of `this` isn't actually the `interface` type, but rather a placeholder for the as-yet-unknown concrete type that will implement the interface.
This change adds a keyword `This` that works similarly to `this`, but names the current *type* instead of the current instance. It can be used to declare things like binary methods or factory functions in an interface:
```
interface IBasicMathType
{
This absoluteValue();
This sumWith(This left);
}
T doSomeMath<T:IBasicMathType>(T value)
{
return value.sumWith(value.absoluteValue());
}
```
The `This` type is consistent with the type named `Self` in Rust and Swift (where Rust/Swift use `self` instead of `this`). Other names could be considered (e.g., `ThisType`) if we find that users don't like the name in this change.
|
|
This change makes it so that for a suitable type `MyType`, a variable declaration like:
MyType v;
is treated as if it were written:
MyType v = MyType();
The definition of "suitable" here is that `MyType` needs to have an available `__init` declaration that can be invoked with zero arguments. I've added a test to confirm that the new behavior works in this specific case.
There are a bunch of caveats to the feature as it stands today:
* Just because `MyType` has a zero-parameter `__init`, that doesn't mean an array type like `MyType[10]` does, so arrays currently remain uninitialized by default. Fixing this gap requires careful consideration because some, but not all, array types should be default-initializable.
* The change here should mean that a `struct` type with a field like `MyType f;` should count as having a default initial-value expression for that field, but I haven't confirmed that.
* Even if a `struct` provides initial values for all its fields (e.g., `struct S { float f = 0; }`), that doesn't mean it has a default `__init` right now, so those `struct` types will still be left uninitialized by default. Converging all this behavior is still TBD.
Just to be clear: there is no provision or plan in Slang to support destructors, RAII, copy constructors, move constructors, overloaded assignment operations, or any other features that buy heavily into the C++ model of how construction and destruction of values gets done.
In fact, I'm not even 100% sure I like having this change in place at all, and I think we should reserve the right to revert it and say that only specific stdlib types get to opt in to default initialization along these lines.
|
|
* CUDA support for array of resources.
* * Add support for Texture2DArray on CPU
* Expand texture-simple.slang to test Texture2DArray
* Reorganise CUDAComputeUtil to split out createTextureResource.
* Add TextureCubeArray support for CPU/CUDA targets.
* Pulled out CUDAResource
Renamed derived classes to reflect that change.
* Creation of SurfObject type.
* Functions to return read/write access for simplifying future additions.
* WIP for RWTexture access on CPU/CUDA.
* CUsurfObject cannot have mips.
* Ability to set number of mips on test data.
Preliminary support for CUsurfObj and RWTexture1D on CUDA.
CUDA docs improvements.
* Fix typo.
|
|
The basic idea is that the user can write:
```hlsl
struct MyThing
{
int a;
float b;
__init(int x, float y)
{
a = x;
b = y;
}
}
```
and after that point, they can create an intstance of their `MyThing` type as simply as `MyThing(123, 4.56f)`.
There was already a large amount of infrastructure laying around that is shared between ininitializers and ordinary functions, so enabling this feature mostly amounted to tying up some loose ends:
* In the parser, make sure to properly push/pop the scope for an `__init` (or `__subscript`) declaration, so parameters would be visible to the body
* In semantic checking, make sure that declaration "header" checking properly bottlenecks all the function-like cases into a base routine
* In semantic checking, make sure that the logic for checking function bodies applies to every `FunctionDeclBase` with a body, and not just `FuncDecl`s
* Update semeantic checking for statements to allow for any `FunctionDeclBase` as the parent declaration, not just a `FuncDecl`
* In lookup, treat the `this` parameter of an `__init` (well, not actually a *parameter* in this case) as being mutable, just like for a `[mutating]` method
* In IR codegen, don't just assume that all `__init`s are intrinsics, and narrow the scope of that hack to just `__init`s without bodies
* In IR codegen, detect when we are emitting an IR function for an `__init`, and in that case create a local variable to represent the `this` value, and implicitly return that value at the end of the body.
From that point on the rest of the compiler Just Works and IR codegen doesn't have to think of an `__init` as being any different than if the user had declared a `static MyThing make(...)` function.
Caveats:
* C++ users might like to use that naming convention (so `MyThing` as the name instead of `__init`). We can consider that later.
* Everybody else might prefer a keyword other than `__init` (e.g., just `init` as in Swift), but I'm keeping this as a "preview" feature for now, rather than something officially supported
* Early `return`s from the body of an `__init` aren't going to work right now.
* There is currently no provision for automatically synthesizing initializers for `struct` types based on their fields. This seems like a reasonable direction to take in the future.
* There is no provision for routing `{}`-based initializer lists over to initializer calls. The two syntaxes probably need to be unified at some point so that doing `MyType x = { a, b, c }` and `let x = MyType(a, b, c)` are semantically equivalent.
It is possible that as a byproduct of this change user-defined `__subscript`s might Just Work, but I am guessing there will still be loose ends on that front as well, so I will refrain from looking into that feature until we have a use case that calls for it.
|
|
* CUDA support for array of resources.
* * Add support for Texture2DArray on CPU
* Expand texture-simple.slang to test Texture2DArray
* Reorganise CUDAComputeUtil to split out createTextureResource.
* Add TextureCubeArray support for CPU/CUDA targets.
|
|
* Add cubemap support.
* Add CUDA fence instrinsics.
* Added Gather for CUDA.
* Use the CUDA driver API as much as possible.
* * Support 1D texture on CPU
* WIP on 1D texture on CUDA
* Added simplified texture test
* Fix test.
* Improve texture-simple tests.
* * Add CPU support for 3d textures
* Add support for mip maps to CUDA
* Disable warnings in nvrtc
* Update CUDA docs
* WIP on 3d texture support.
* Add support for 3d textures for CPU and CUDA.
* CPU and CUDA support for cube maps.
* Add CPU support for Texture1DArray.
* Support CUDA Layered/Array type in meta library.
|
|
* Add cubemap support.
* Add CUDA fence instrinsics.
* Added Gather for CUDA.
* Use the CUDA driver API as much as possible.
* * Support 1D texture on CPU
* WIP on 1D texture on CUDA
* Added simplified texture test
* Fix test.
* Improve texture-simple tests.
* * Add CPU support for 3d textures
* Add support for mip maps to CUDA
* Disable warnings in nvrtc
* Update CUDA docs
* WIP on 3d texture support.
* Add support for 3d textures for CPU and CUDA.
|
|
* Add cubemap support.
* Add CUDA fence instrinsics.
* Added Gather for CUDA.
* Use the CUDA driver API as much as possible.
* * Support 1D texture on CPU
* WIP on 1D texture on CUDA
* Added simplified texture test
* Fix test.
* Improve texture-simple tests.
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|
|
* Added 'truncate' for fixing floats, for floats near the max value (as opposed to making infinite).
Put AreNearlyEqual into Math
* Test for ::make static method.
|
|
* * Improved fastRemoveAt
* Fixed off by one bug
* Fixed const safeness with List<>
* Made List begin and end const safe.
* Revert to previous RefPtr usage.
* Fix bug with casting.
* Tabs -> spaces.
Small fixes/improvements to List.
* Improve comment on List.
* Group shared/atomic test works on CUDA.
* * Enabled CUDA tests for atomics tests
* Enabled DX12 test for atomics-buffer.slang
Not clear just yet how to implement that for CUDA - it will work with StructuredBuffer.
* hasContent -> isNonEmpty
* Remove unneeded comment.
|
|
* Launch CUDA test taking into account dispatch size.
* Enable isCPUOnly hack to work on CUDA.
* Rename 'isCPUOnly' hack to 'onlyCPULikeBinding'.
* Add $T special type.
Support SampleLevel on CUDA.
* Fix typo.
|
|
* WIP: 64 literal diagnostic and truncation.
* Improve how integer truncation is handled/supported.
Added literal-int64.slang test.
Set a suffix on all literals.
Fixed problem on C++ based targets where l suffix was not the same as int() cast. So on C++ derived emitters, int() is used instead of l suffix to have same behavior across targets.
* Add literal diagnostic testing.
* Allow lexer to lex - in front of literals.
* Fix lexing and converting int literal with -.
* Too large small values of floats become inf.
Handling writing inf types out on different targets.
Add function to deterimine if a float literals kind.
* Roll back the support of lexer lexing negative literals.
* Fixed tests broken because of diagnostics numbers.
Improved _isFinite
* Fix compilation on linux.
* Fix problem with abs on linux - use Math::Abs.
* Fix typo.
* * Improve warnings for float literals zeroed
* Improved 64 bit type documentation
* Handle half
* Improved comments
* Fixed tests broken
* Use capital letters for suffixes.
* Make default behavior on outputting a int literal that is an 'int32_t' is cast (not suffix) to avoid platform inconsistencies.
Improve documentation for 64 bit types.
Make tests cover material in docs.
* Fixed tests.
* Rename FloatKind::Normal -> Finite
* Fix half zero check.
|
|
When using row-major layout (via command-line or API option), the following sort of declaration:
```hlsl
StructuredBuffer<float4x4> gBuffer;
... gBuffer[i] ...
```
Generates unexpected results when compiled to DXBC via fxc or DXIL via dxc, because the fxc/dxc compilers do not respect the matrix layout mode in this specific case (a structured buffer of matrices). Instead, they always use column-major layout, even if row-major was requested by the user.
A user can work around this behavior by wrapping the matrix in a `struct`:
```hlsl
struct Wrapper { float4x4 wrapped; }
SturcturedBuffer<Wrapper> gBuffer;
... gBuffer[i].wrapped ...
```
This change simply automates that workaround when compiling for an HLSL-based downstream compiler, so that we get the same behavior across all our backends.
The change adds a test case to confirm the behavior across multiple targets, but it turns out we also had a test checked in that confirmed the buggy (or at least surprising) fxc/dxc behavior, so that one had its baselines changed and can work as a regression test for this fix as well.
|
|
* When using setUniform clamp the amount of data written to the buffer size.
* CUDA implement StructuredBuffer/ByteAddressBuffer as pointer/count as is on CPU.
Allow bounds check to zero index.
Update docs.
* Synthesize tests.
* Fix bug in CUDA output.
* Fixing more tests to run on CUDA.
* Added BaseType for layout of Vector and Matrix - as they are held as int32_t vector array types.
* Enable unbound array support on CUDA.
* Added unsized array support for CUDA documentation.
|
|
* Added hlsl-intrinsic test folder.
Enabled ceil as works across targets.
* log10 support.
* Fix float % on CPU/CUDA to match HLSL which is fmod (not fremainder).
* Added log10 tests back to scalar-float.slang
* Don't add the ( for $Sx - it's clearer what's going on without it.
* Works on CUDA/CPU. Problem with asint/asuint do not seem to be found.
* Only asuint exists for double.
* Support countbits on CUDA and C++.
* Fix typo in C++ population count.
* First pass at int vector intrinsic tests.
* Swizzle for int.
* Bit cast tests on CUDA.
* Fix warning on gcc.
* Fix bit-cast-double execution on CUDA.
* scalar-int test working on gcc release.
|
|
* Added hlsl-intrinsic test folder.
Enabled ceil as works across targets.
* log10 support.
* Fix float % on CPU/CUDA to match HLSL which is fmod (not fremainder).
* Added log10 tests back to scalar-float.slang
* Don't add the ( for $Sx - it's clearer what's going on without it.
|
|
* Add test result for compile-to-cuda
* Add RAII for some CUDA types to simplify usage.
* First pass handling of some instrinsics on CUDA (for example transcendentals)
* CUDA working with built in intrinsics.
* Add missing CUDA prelude intrinsics.
* CUDA matches CPU output on simple-cross-compile.slang
* First pass at hlsl-scalar-float-intrinsic.slang test.
* Fix smoothstep impl on CUDA and CPU.
* Fixed step intrinsic on CUDA/CPU.
* Added operator[] to Matrix for C++, to allow row access.
Needs a fix for CUDA.
* Fixed warning on clang build.
|
|
The logic for invoking methods (member functions) in `slang-lower-to-ir.cpp` was failing to take into account whether the callee was `[mutating]` or not. Instead, it would always lower the `base` expression in something like `base.f(...)` as an r-value expression, consistent with a non-`[mutating]` method.
The incorrect code generation strategy somehow turned out to work in many cases, but it broke in cases where a `[mutating]` method was called on an `inout` parameter. E.g., in this code:
```hlsl
struct Stuff { [mutating] void doThing() { ... } }
void broken(inout Stuff s)
{
s.doThing();
}
```
The `broken` function would fail to write back the value mutated by `doThing` to its `s` parameter before returning.
The crux of the fix here is inside `visitInvokeExpr()`. Instead of directly calling `lowerRValueExpr` on the base expression of a method/member-function call, we instead compute the "direction" of the `this` parameter in the callee, and use that to emit the argument expression appropriately.
In order to enable that change, there are several refactorings included:
* The existing `ParameterDirection` and `getParameterDirection()` calls were lifted out from the declaration visitor to the global scope, so that they could be shared between lowering of functions and their call sites.
* The logic for determining the "direction" of a `this` parameter was factored out of `collectParameterLists()` into its own `getThisParamDirection()` subroutine (again so that functions and call sites can share matching logic).
* The logic for turning an AST expression used as a call argument into IR argument(s)* was pulled out into its own `addCallArgsForParam` *and* was refactored to rely on a `ParameterDirection` instead of directly inspecting the modifiers on a `ParamDecl`. This allows the function to be used for ordinary/direct arguments and the `this` argument, and also ensures that the caller and callee will agree on the direction of parameters.
Fixing the way that `[mutating]` methods are called actually broke some test cases, specifically in the cases where a `[mutating]` method was being called on a value with an interface-constrained generic type:
```hlsl
interface IThing { [mutating] void doStuff(); }
void myFunc<T : IThing>(inout T thing)
{
thing.doStuff();
}
```
Our argument passing for `inout` parameters currently requires that we make a temp copy of `thing` into a local, and then pass that local as argument for the `inout` parameter, before copying back. The issue that arose was that a simple version of the logic uses the type of the `base` expression in `base.someMethod(...)` as the type of the local variable, but for an interface method call the base expression will have been cast to the interface type (we effectively have `((IThing) thing).doStuff()`.
The fix here was to query the this type through the member function we are calling, and to share that logic between the function-call and function-declaration cases, to try and make sure they match, which meant even more logic got hoisted out of the declaration-emission logic and to the top level.
Note: This change does *not* clean up any other clarity or performance concerns around `out` and `inout` parameters; it is only focused on correctness.
|
|
* Support conversion from int/uint to enum types
The basic feature here is tiny, and is summarized in the code added to the stdlib:
```
extension __EnumType
{
__init(int val);
__init(uint val);
}
```
The front-end already makes all `enum` types implicitly conform to `__EnumType` behind the scenes, and this `extension` makes it so that all such types inherit some initializers (`__init` declarations, aka. "constructors") that take `int` and `uint`.
(Note: right now all `__init` declarations in Slang are assumed to be implemented as intrinsics using `kIROp_Construct`. This obviously needs to change some day, especially so that we can support user-defined initializers.)
Actually making this *work* required a bit of fleshing out pieces of the compiler that had previously been a bit ad hoc to be a bit more "correct." Most of the rest of this description is focused on those details, since the main feature is not itself very exciting.
When overload resolution sees an attempt to "call" a type (e.g., `MyType(3.0)`) it needs to add appropriate overload candidates for the initializers in that type, which may take different numbers and types of parameters. The existing code for handling this case was using an ad hoc approach to try to enumerate the initializer declarations to consider, which might be found via inheritance, `extension` declarations, etc.
In practice, the ad hoc logic for looking up initializers was just doing a subset of the work that already goes into doing member lookup. Changing the code so that it effectively does lookup for `MyType.__init` allows us to look up initializers in a way that is consistent with any other case of member lookup. Generalizing this lookup step brings us one step closer to being able to go from an `enum` type `E` to an initializer defined on an `extension` of an `interface` that `E` conforms to.
One casualty of using the ordinary lookup logic for initializers is that we used to pass the type being constructed down into the logic that enumerated the initializers, which made it easier to short-circuit the part of overload resolution that usually asks "what type does this candidate return."
It might seem "obvious" that an initializer/constructor on type `Foo` should return a value of type `Foo`, but that isn't necessarily true.
Consider the `__BuiltinFloatingPointType` interface, which requires all the built-in floating-point types (`float`, `double`, `half`) to have an initializer that can take a `float`.
If we call that interface in a generic context for `T : __BuiltinFloatingPointType`, then we want to treat that initializer as returning `T` and not `__BuiltinFloatingPointType`.
Without the ad hoc logic in initializer overload resolution, this is the exact problem that surfaced for the stdlib definition of `clamp`.
The solution to the "what type does an initializer return" problem was to introduce a notion of a `ThisType`, which refers to the type of `this` in the body of an interface.
More generally, we will eventually want to have the keyword `This` be the type-level equivalent of `this`, and be usable inside any type.
The `calcThisType` function introduced here computes a reasonable `Type` to represent the value of `This` within a given declaration.
Inside of concrete type it refers to the type itself, while in an `interface` it will always be a `ThisType`.
The existing `ThisTypeSubstitution`s, previously only applied to associated types, now apply to `ThisType`s as well, in the same situations.
The next roadblock for making the simple declarations for `__EnumType` work was that the lookup logic was only doing lookup through inheritance relationships when the type being looked up in was an `interface`.
The logic in play was reasonable: if you are doing lookup in a type `T` that inherits from `IFoo`, then why bother looking for `IFoo::bar` when there must be a `T::bar` if `T` actually implements the interface?
The catch in this case is that `IFoo::bar` might not be a requirement of `IFoo`, but rather a concrete method added via an `extension`, in which case `T` need not have its own concrete `bar`.
The simple/obvious fix here was to make the lookup logic always include inherited members, even when looking up through a concrete type.
Of course, if we allow lookup to see `IFoo::bar` when looking up on `T`, then we have the problem that both `T::bar` and `IFoo::bar` show up in the lookup results, and potentially lead to an "ambiguous overload" error.
This problem arises for any interface rquirement (so both methods and associated types right now).
In order to get around it, I added a somewhat grungy check for comparing overload candidates (during overload resolution) or `LookupResultItem`s (during resolution of simple overloaded identifiers) that considers a member of a concrete type as automatically "better" than a member of an interface.
The Right Way to solve this problem in the long run requires some more subtlety, but for now this check should Just Work.
One final wrinkle is that due to our IR lowering pass being a bit overzealous, we currently end up trying to emit IR for those new `__init` declarations, which ends up causing us to try and emit IR for a `ThisType`.
That is a case that will require some subtlty to handle correctly down the line, for for now we do the expedient thing and emit the `ThisType` for `IFoo` as `IFoo` itself, which is not especially correct, but doesn't matter since the concrete initializer won't ever be called.
* testing: add more debug output to Unix process launch function
* testing: increase timeout when running command-line tests
|
|
|
|
The `TEST_INPUT` facility allows textual Slang test cases to provide two kinds of information to the `render-test` tool:
1. Information on what shader inputs exist
2. Information on what values/objects to bind into those shader inputs
Under the first category of information, there exists supporting for attaching a `dxbinding(...)` annotation to a `TEST_INPUT` which seemingly indicates what HLSL `register` the input uses. There is a similar `glbinding(...)` annotation, used for OpenGL and Vulkan.
It turns out that these annotations were, in practice, completely ignored and had no bearing on how `render-test` allocates or bindings graphics API objects. There was some amount of code attempting to validate that explicit registers/bindings were being set appropriately, but the actual values were being ignored.
The visible consequence of the `dxbinding` and `glbinding` annotations being ignored is issue #1036: the order of `TEST_INPUT` lines was *de facto* determining the registers/bindings that were being used by `render-test`.
This change simply removes the placebo features and strips things down to what is implemented in practice: the `TEST_INPUT` lines do not need target-API-specific binding/register numbers, because their order in the file implicitly defines them.
I added logic to the parsing of `TEST_INPUT` lines to make sure I got an error message on any leftover annotations, and went ahead and systematicaly deleted all of the placebo annotations from our test cases.
If we decide to make `TEST_INPUT` lines *not* depend on order of declaration in the future, we can build it up as a new and better considered feature.
The main alternative I considered was to keep the annotations in place, and change `render-test` and the `gfx` abstraction layer to properly respect them, but that path actually creates much more opportunity for breakage (since every single test case would suddenly be specifying its root signature / pipeline layout via a different path using data that has never been tested). The approach in this change has the benefit of giving me high confidence that all the test cases continue to work just as they had before.
|
|
* Initial work for "global generic value parameters"
The main new feature here is support for the `__generic_value_param` keyword, which introduces a *global generic value parameter*.
For example:
__generic_value_param kOffset : uint = 0;
This declaration introduces a global generic value parameter `kOffset` of type `uint` that has a nominal default value of zero.
The broad strokes of how this feature was added are as follows:
* A new `GlobalGenericValueParamDecl` AST node type is introduces in `slang-decl-defs.h`
* A new `parseGlobalGenericValueParamDecl` subroutine is added to `slang-parser.cpp`, and is added to the list of declaration cases as the callback for the `__generic_value_param` name.
* Cases for `GlobalGenericValueParamDecl` are added to the declaration checking passes in `slang-check-decl.cpp`, mirroring what is done for other variable declaration cases.
* A case for `GlobalGenericValueParamDecl` is aded to the `Module::_collectShaderParams` function, so that it is recognized as a kind of specialization parameter. This introduces a specialization parameter of flavor `SpecializationParam::Flavor::GenericValue` (which was already defined before this change, although it was unused).
* A case for `SpecializationParam::Flavor::GenericValue` is added in `Module::_validateSpecializationArgsImpl` to check that a specialization argument represents a compile-time-constant value (not a type).
* A case for `GlobalGenericValueParmDecl` is introduced in `slang-lower-to-ir.cpp` that introduces a global generic parameter in the IR
* The `IRBuilder` is extended to support creating `IRGlobalGenericParam`s for the distinct cases of type, witness-table, and value parameters. The same IR instruction type/opcode is used for all cases, and only the type of the IR instruction differs.
* The existing mechanisms for lowering specialization arguments to the IR, and doing specialization on the IR itself Just Work with global generic value parameters since they already support value parameters on explicit generic declarations.
That's the santized version of things, but there were also a bunch of cleanups and tweaks required along the way:
* The `SpecializationParam` type was extended to also track a `SourceLoc` to help in diagnostic messages, which meant some churn in the code that collects specialization parameters.
* The `_extractSpecializationArgs` function is tweaked to support any kind of "term" as a specialization argument (either a type or a value).
* To allow *parsing* specialization arguments that can't possibly be types (e.g., integer literals) we replace the existing `parseTypeString` routine with `parseTermString` and then in `parseTermFromSourceFile` call through to a general case of expression parsing (which can also parse types) rather than only parsing types directly.
* Right before doing back-end code generation, we check if the program we are going to emit has remaining (unspecialized) parameters, in which case we emit a diagnostic message for the parameters that haven't been specialized rather than go on to emit code that will fail to compile downstream.
* Within the `render-test` tool we collapse down the arrays that held both "generic" and "existential" specialization arguments, so that we just have *global* and *entry-point* specialization argument lists. This mirrors how Slang has worked internally for a while, but the difference hasn't been important to the test tool because no tests currently mix generic and existential specialization. The logic for parsing `TEST_INPUT` lines has been streamlined down to just the global and entry-point cases, but the pre-existing keywords are still allowed so that I don't have to tweak any test cases.
There are several significant caveats for this feature, which mean that it isn't really ready for users to hammer on just yet:
* There is no support for `Val`s of anything but integers, so there is no way to meaningfully have a generic value param with a type other than `int` or `uint`.
* We allow for a default-value expression on global generic parameters, but do not actually make use of that value for anything (e.g., to allow a programmer to omit specialization arguments), nor check that it meets the constraints of being compile-time constant.
* Global generic value parameters are *not* currently being treated the same as explicit generic parameters in terms of how they can be used for things like array sizes or other things that require constants. This will probably be relaxed at some point, but allowing a global generic to be used to size an array creates questions around layout.
* The IR optimization passes in Slang currently won't eliminate entire blocks of code based on constant values, so using a global generic value parameter to enable/disable features will *not* currently lead to us outputting drastically different HLSL or GLSL. That said, we expect most downstream compilers to be able to handle an `if(0)` well.
* Fix regression for tagged union types
The change that made specialization arguments be parsed as "terms" first, and then coerced to types meant that any special-case logic that is specific to the parsing of types would be bypassed and thus not apply.
Most of that special-case logic isn't wanted for specialization arguments, since it pertains to cases were we want to, e.g, declare a `struct` type while also declaring a variable of that type.
The one special case that *is* useful is the `__TaggedUnion(...)` syntax, which is the only way to introduce a tagged union type right now.
In order to get that case working again, all I had to do was register the existing logic for parsing `__TaggedUnion` as an expression keyword with the right callback, and the existing logic in expression parsing kicks in (that logic was already handling expression keywords like `this` and `true`).
I left in the existing logic for handling `__TaggedUnion` directly where types get parsed, rather than try to unify things.
A better long-term fix is to make the base case for type parsing route into `parseAtomicExpr` so that the two paths share the core logic.
That change should probably come as its own refactoring/cleanup, because it creates the potential for some subtle breakage.
* fixup: typo
|
|
* WIP on serialize/save state.
* Relative string encoding.
* Added RelativeContainer unit test.
Split out RelativeContainer into core.
* Fix bug in RelativeString encoding.
* More work around relative container.
* Fix checks.
* Use RelativeBase for safe access.
Use malloc/free/realloc instead of List.
* Add natvis support for relative types.
* Setting up of state (not includes) writing of repro state.
* Capture after spCompile.
* Writing SourceFile and file system files.
Added -dump-repo
* First pass at loading state.
* First pass at reading repro.
* Small optimization around Safe32Ptr
* Refactor how repro data is stored - to make saving off the files more simple, by having all all backed by 'files'.
Make file loading always set up PathInfo so we get uniqueIdentifier info.
* Generate unique file names.
* Added RelativeFileSystem
Added saveFile to ISlangFileSystemExt and implemented for interfaces
Added mechanism to save of files (and manifest)
* Added ability to replace files in repo with directory holding their contents.
* Add support for entry points.
* Fix problem compiling on linux.
* Added SIMPLE_EX option, where everything on command line must be specified.
* Fix typo in unit test for relative container.
* Fix another typo in unit test for RelativeContainer.
* Fix small bugs.
* Fix release unused variable issue in slang-state-serialize.cpp
* Fix checking for SIMPLE_EX in testing, else broke COMMAND_LINE_SIMPLE.
* Fix warnings on 32 bit debug build.
* Added import-subdir-search-path-repro.slang test. Although disabled for now as writes to root of slang project.
* Remove wrong version of import-subdir-search-path-repro.slang
* Added import-subdir-search-path-repro.slang
|
|
* Initial work on representing layout at IR level
This change starts the process of making the back-end of the compiler independent of the AST-level layout information (`TypeLayout`, `VarLayout`, etc.) so that it instead only relies on layout information that is embedded into IR modules. This brings us incrementally closer to a world in which the back-end could be run without the AST-level structures even existing (e.g., for an application that just wants to ship IR without any AST information for IP protection, while still supporting some amount of linking and specialization).
The main parts of the change are:
* There is a bunch of incidental churn related to specifying entry points by index instead of the `EntryPoint` object for certain operations. This ends up being a better choice because we can use the index to look up side-band information about the entry point that might not be stored on the `EntryPoint` object itself. In particular...
* We expand the `ComponentType` interface to support looking up the mangled name of an entry point by index. In common cases (no generic/interface specialization) this would be the same as asking the `EntryPoint` for its mangled name, but in cases where we have specialized a generic entry point, the mangled name would include speicalization arguments that are only available on the `SpecializedComponentType` that wraps the entry point. This part of the change isn't ideal and there might be a better solution waiting to be invented. Note that we store mangled entry point names as strings rather than using `DeclRef`s because that ensures that the information could be serialized and deserialized without a dependence on the AST.
* The `TargetProgram` type (which represents binding a specific `ComponentType` for a shader program to a specific `TargetRequest` that represents the target platform) is expanded to include an `IRModule` that represents layout information, in addition to the AST-level `ProgramLayout` it already contained. We create both of these objects at the same time (on-demand) to simplify the overall flow (so that any code that triggers creation of the AST-level layout will also ensure that the IR-level layout exists).
* A bunch of code in the emit passes that was passing down layout-related objects has been eliminated. It appears that most of those objects weren't actually being used, so this is just a cleanup, but it helps ensure that the back-end steps are "clean" and don't depend on the AST-level information. The one big exception here is that the emit logic needs to know the stage for the entry point being emitted (to deal with one wrinkle in translating DXR to VKRT).
* A big change (actually introduced by @jsmall-nvidia in a branch that this change copied and then built from) is to introduce some more explicit IR instructions to represent layout information, notably an `IRTypeLayout` and an `IRVarLayout`. For now these objects still reference their AST equivalents, but the separation gives us an incremental path to move information from the AST-level objects over to the IR ones. This work includes logic in `IRBuilder` to construct the IR-level layout objects from the AST-level ones on-demand, so that the existing code paths that try to attach AST-level layout will continue to work for now.
* Because layout information is now embedded in the IR, the `slang-ir-link.cpp` logic loses a lot of cases that used to deal with attaching AST-level layout objects to IR-level instructions during the linking process. Instead, the linker now assumes that one (or more) of the input IR modules will have layout information associated with it, and the linker makes sure to copy layout decorations (and the instructions they reference) from the input IR module(s) to the output using its more ordinary mechanisms.
* Inside `slang-lower-to-ir.cpp`, we add logic to construct an IR module in a `TargetProgram` that simply references the global shader parameters, entry points, etc. and attaches IR layout decorations to them. This is akin to the existing pass in the same file that constructs IR to represent specialization information, and both of these passes share infrastructure with the main AST->IR lowering pass. Eventually, it is expected that this pass will encompass more of the logic for copying AST-level layout information over to IR-level equivalents.
* One small wrinkle with this change was that the output for an HLSL generation test case changed some of its `#line` directives. The old code was actually more inaccurate than the new, so this change just updated the baseline. It also added some logic in the linker to make sure that when an IR instruction has multiple definitions, we try to pick up a source location from any of them, in case the "main" one somehow didn't get a location.
* Another small fix was that the key/value map in `StructTypeLayout` for mapping fields/members to their layouts was keyed on `Decl*` when it really should have been `VarDeclBase*`.
This change should in principle be a pure refactoring with no functionality changes, so no new tests were added. It is unfortunately also a change that has a high probability of breaking at least *some* client code, so we may want to be defensive and mark this with a new major version number (well, a new *minor* version number since we are pre-`1.0`) to give us some room for releasing hotfixes to the old version if needed.
* fixup: infinite recursion bug detected by clang
* fixup: remove commented-out code
|
|
* Simple testing of unbounded array of array on GPU.
* Fix problem on CPU targets around NonUniformResourceIndex
Use the unbounded-array-of-array-syntax test for CPU and GPU tests.
|
|
* WIP: Unsized arrays on CPU.
* unbounded-array-of-array working on CPU.
* Test that has an unbounded array of array directly (ie without wrapping with ParameterBlock). Test works on CPU.
* Remove some left over comments.
* Added documention on unsized array usage on CPU targets.
|
|
* WIP: Unsized arrays on CPU.
* unbounded-array-of-array working on CPU.
* Remove some left over comments.
|
|
* First pass support for performance profiling
* Test across all elements
* Fix bug - sourceContents is not used, should use rawSource.
* * Add ability to get prelude from API.
* Allow specifying source language for render-test
* Made it possible to compile a test input file as C++
* Special handling for reflection
* Added C++ impl to performance-profile.slang
* Remove some clang warnings.
* Output profile timings on appveyor and other TC.
* Remove passing around of StdWriters (can use global).
Small comment improvements.
|
|
results. (#1061)
|
|
Work on #1059
The `%` operator in the Slang implementation had several issues, and this change tries to address some of them:
* Renamed most occurences of "mod" describing this operator to be "rem" for "remainder" to better match its semantics in HLSL
* Split the operator into distinct integer and floating-point variants (`IRem` and `FRem`) to simplify having different codegen for the two
* Added floating-point variants of `operator%` and `operator%=` to the stdlib.
* Added custom C++ codegen for `kIROp_FRem` such that it maps to the standard C/C++ `remainder()` function
* Added custom GLSL codegen so that `kIROp_FRem` maps to the GLSL `mod()` function (which isn't correct...)
* Added a test case to confirm that D3D11, D3D12, and CPU targets all agree on the definition of floating-point `%`
* Fixed `render-test-tool` to allow a negative integer in a `data=...` specification. This didn't end up being used in the final test, but still seems like a good fix.
* Added a customized baseline for the Vulkan flavor of that test to confirm that we are *not* compiling correctly to SPIR-V just yet
Addressing the correctness of the output for GLSL/SPIR-V will have to come as a later change given that the operation we want is not exposed directly by unextended GLSL.
|
|
* WIP: Improving CPU performance/ABI
* Optionally output code on CPU for groupThreadID and groupID.
* Added ability to set compute dispatch size on command line for render-test.
Dispatch compute tests taking into account dispatch size.
Added test for semantics are working.
* Test using GroupRange.
* Fix problem with adding \n for externa diagnostic - to do it if there isn't a \n at the end. Change the ouput order (put result before) so last value is diagnostic string.
|
|
* First pass of render-test refactor.
* Make window construction a function that can choose an implementation.
* Remove OpenGL as currently has windows dependency.
* Disable Vulkan as Renderer impl has dependency on windows.
* Pass Window in as parameter of 'update'.
* Add win-window.cpp as was missing.
* Fix warning on windows about signs during comparison.
* * Added mechanism to add random arrays as buffer inputs and select type
* Improved RenderGenerator to generate more types, and to be more careful around int32 ranges.
* Added support for security checks (for Visual Studio C++)
* Disable Execption handling being on by default when compiling kernels
* Added a 'Group' version of the entry point that will evaluate all threads in a group in a single call. In test code use this method if available.
* Added -compile-arg to be able to pass arguments to the compile within render-test
* Add documention for the _Group execution feature.
* Fix some typos in cpu-target.md
|
|
* Updated docs to reflect ParameterBlock support
* Fixed CPU binding to handle ParameterBlocks
* Updated parameter-block.slang to be able to work as a CPU test
|
|
* * Made entry point parameters a separate entry point
* Made CPUMemoryBinding work with entry point parameters/initialize constant buffers
* Added isCPUOnly to bindings, because entry point parameters do not layout like constant buffer
* entry-point-uniform.slang works on CPU
* EntryPointParams -> UniformEntryPointParams
Updated CPU documentation.
* Update cpu-target.md to removed completed issues.
* Only allocate CPU buffers if the size is > 0.
Small update to cpu-target doc.
|
|
* WIP: Memory binding.
* WIP for binding.
* Fix handling of writing to constant buffer.
* Fix bug in handling indices.
|