| Commit message (Collapse) | Author | Age |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR implements `Access.Immutable` to allow pointers to immutable
data.
The new type `ImmutablePtr<T>` is defined as an alias of `Ptr<T,
Address.Immutable>`.
By forming a immutable pointer, the programmer is conveying to the
compiler that the data at the pointer address will never change during
the execution of the current program. Therefore loads from immutable
pointers can be deduplicated by the compiler, and will translate to
`__ldg` when generating code for CUDA.
The SPIRV backend is not changed in this PR, since the current SPIRV
spec makes it very difficult to specify loads from immutable address
without generating tons of wrappers and boilerplate type declarations.
We would like to see the spec evolved a bit to around its support of
`NonWritable` physical storage pointers or immutable loads before we
attempt to express such immutability in SPIRV. For now we simply emit
ordinary pointers and loads when generating spirv.
---------
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
|
| |
|
|
|
|
|
|
| |
happen. (#8653)
This allows us to further cleanup unnecessary copies in the target code
we generate.
Part of effort of #8652.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We insert field initialization logic at the beginning of every ctor in
`synthesizeCtorBody`, but then immediately inserts another round of
initialization again for explicit ctors in `maybeInsertDefaultInitExpr`,
both called from `SemanticsDeclBodyVisitor::visitAggTypeDecl` right next
to each other.
The fix is to remove `maybeInsertDefaultInitExpr`.
This change also enhances the address aliasing analysis, so that for the
following case:
```
this->member1 = 0;
this->member2 = 0;
this->member1 = param;
```
We can still remove the first assignment to `this->member1` despite
seeing `this->member2=0`, since it is easy to know that `this->member2`
cannot alias with `this->member1`.
Closes #8600.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(#8547)
This allows us to specialize functions whose argument is a sub element
of a constant buffer, instead of being only applicable to entire buffer
element. Closes #8421.
This change also implements a proper heuristic to determine when to
specialize the calls and defer the buffer loads.
This PR addresses a pathological case exposed in
`slangpy\slangpy\benchmarks\test_benchmark_tensor.py`, which used to
take 27ms to finish, and now takes 1.25ms.
For example, given:
```
struct Bottom
{
float bigArray[1024];
[mutating]
void setVal(int index, float value) { bigArray[index] = value; }
}
struct Root
{
Bottom top[2];
[mutating]
void setTopVal(int x, int y, float value)
{
top[x].setVal(y, value);
}
}
RWStructuredBuffer<Root> sb;
[shader("compute")]
[numthreads(1, 1, 1)]
void compute_main(uint3 tid: SV_DispatchThreadID)
{
sb[0].setTopVal(1, 2, 100.0f);
}
```
We are now able to specialize the call to `setTopVal` into:
```
void compute_main(uint3 tid: SV_DispatchThreadID)
{
setTopVal_specialized(0, 1, 2, 100.0f);
}
void setTopVal_specialized(int sbIdx, int x, int y, float value)
{
Bottom_setVal_specialized(sbIdx, x, y, value);
}
void Bottom_setVal_specialized(int sbIdx, int x, int y, float value)
{
sb[sbIdx].top[x].bigArray[y] = value;
}
```
And get rid of all unnecessary loads. Achieving this requires a
combination of function call specialization and buffer-load-defer pass.
The buffer-load-defer pass has been completely rewritten to be more
correct and avoid introducing redundant loads.
This PR also adds tests to make sure pointers, bindless handles, and
loads from structured buffer or constant buffers works as expected.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
packing/unpacking. (#8526)
Part of the effort to improve the performance of generated SPIRV code.
The existing lower-buffer-element-type pass works by loading the entire
buffer element content from memory, and translate it to logical type
stored in a local variable at the earliest reference of a buffer handle.
This means that is can generate inefficient code that reads more than
necessary.
Consider this example:
```
struct BigStruct { bool values[1024]; }
ConstantBuffer<BigStruct> cb;
void test(BigStruct v)
{
if (v.values[0]) { printf("ok"); }
}
[numthreads(1,1,1)]
void computeMain()
{
test(cb);
}
```
In IR, the `computeMain` function before lower-buffer-element-type pass
is something like following:
```
func test:
%v = param : BigStruct
%barr = fieldExtract(%v, "values")
%element = elementExtract(%barr, 0)
... // uses %element
func computeMain:
%v = load(cb)
call %test %v
```
The existing lower-buffer-element-type pass will rewrite the bool array
in `BigStruct` into `int` array so it is legal in SPIRV. However, it
does so by inserting the translation on the first `load` of the constant
buffer:
```
struct BigStruct_std430 {
int values[1024];
}
var cb : ConstantBuffer<BigStruct_std430>;
func computeMain:
%tmpVar : var<BigStruct>
call %unpackStorage(%tmpVar, cb)
%v : BigStruct = load %tmpVar
call %test %v
```
This means that the entire array will be loaded and translated to int,
before calling `test`, which only uses one element. It turns out that
the downstream compiler isn't always able to optimize out this
inefficient translation/copy.
This PR completely rewrites the way buffer-element-type lowering is
handled to avoid producing this inefficient code. It works in two parts:
first we turn on the `transformParamsToConstRef` pass for SPIRV target
as well, so we will translate the `test` function to take the `v`
parameter as `constref`. The second part is a redesigned
buffer-element-type pass that defers the storage-type to logical-type
translation until a value is actually used by a `load` instruction.
In this example, after `transformParamsToConstRef`, the IR is:
```
func test:
%v = param : ConstRef<BigStruct>
%barr = fieldAddr(%v, "values")
%elementPtr = elementAddr(%barr, 0)
%element = load(%elementPtr)
... // uses %element
func computeMain:
call %test %cb
```
The new `buffer-element-type-lowering` pass will take this IR, and
insert translation at latest possible time across the entire call graph,
and translate the IR into:
```
func test:
%v = param : ConstRef<BigStruct_std430>
%barr = fieldAddr(%v, "values")
%elementPtr : ptr<int> = elementAddr(%barr, 0)
%element_int = load(%elementPtr)
%element = cast(%element_int) : %bool
... // uses %element
func computeMain:
call %test %cb
```
In this new IR, there is no longer a load and conversion of the entire
array.
See new comment in `slang-ir-lower-buffer-element-type.cpp` for more
details of how the pass works.
This PR also address many other issues surfaced by turning on
`transformParamsToConstRef` pass on SPIRV backend.
---------
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
|
| |
|
|
|
|
|
|
|
| |
Some tests are now passing and are enabled.
Other tests are still failing, but are given comments categorizing the failures.
Tests in the 'Not supported in WGSL' category are also removed from the expected failures
list. (Though they are still kept disabled for WebGPU, of course.)
This closes #5519.
|
| |
|
|
|
|
|
|
|
|
|
| |
targeting WGSL (#5483)
Structured buffer types translate to array types in the WGSL emitter.
WGSL doesn't allow passing runtime-sized arrays to functions.
Similarly for pointers to texture handles.
Also, structured buffers (runtime-sized arrays) cannot be returned in WGSL.
This closes issue #5228, issue #5278 and issue #5288 by enabling specialized functions
to be generated in these cases, in order to work around these constraints.
|
| | |
|
| |
|
|
|
| |
solves: `func-resource-result-simple.slang`, related to #4291
when getting the address space of a function return we now check the return address of the inst if the return type is a non pointer type.
|
| | |
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Warning on bool to float conversion.
* Fix test cases.
* Improve.
* LanguageServer: don't show constant value for non constant variables.
* Fix tests.
* Fix warnings in tests.
Co-authored-by: Yong He <yhe@nvidia.com>
|
| |
|
|
| |
`ImageSubscript` for GLSL (#2146)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change converts a large number of our existing tests to use the `ShaderObject` support that was added to the `gfx` layer.
In many cases, tests were just updated to pass `-shaderobj` and the result Just Worked.
In other cases, a `name` attribute had to be added to one or more `TEST_INPUT` lines.
For tests that did not work with shader objects "out of the box," I spent a little bit of time trying to get them work, but fell back to letting those tests run in the older mode.
Future changes to the infrastructure will be needed to get those additional tests working in the new path.
Along with the changes to test files, the following implementation changes were made to get additional tests working:
* Because the shader object mode uses explicit register bindings (from reflection), the hacky logic that was offseting `u` registers for D3D12 based on the number of render targets gets disabled (by another hack).
* The "flat" reflection information coming from Slang was not correctly reporting "binding ranges" for things that consumed only uniform data (which would be everything on CUDA/CPU), so it was refactored to properly include binding ranges for anything where the type of the field/variable implied a binding range should be created (even if the `LayoutResourceKind` was `::Uniform`).
* A few fixes were made to the CUDA implementation of `Renderer`, in order to get additional tests up and running. Most of these changes had to do with texture bindings, which hadn't really been tested previously.
In addition, a few changes were made that were attempts at getting more tests working, but didn't actually help. These could be dropped if requested:
* As a quality-of-life feature (not being used) the `object` style of `TEST_INPUT` line is upgraded to support inferring the type to use from the type of the input being set.
* Any `object` shader input lines get ignored in non-shader-object mode.
|
|
|
A long-standing problem for the Slang implementation has been that some targets (notably GLSL/SPIR-V) do not support treating resources (textures, buffers, samplers, etc.) as first-class types. Resource types on such platforms are restricted so that they may not be used as the type of:
1. fields of aggregate types (`struct`s)
2. local variables
3. function results or `out`/`inout` parameters
Issue (1) is handled by our "type legalization" pass today, by splitting aggregates that contain resources into separate fields/variables/parameters. Issue (2) is worked around by putting code into SSA form and promoting local variables to SSA temporaries when possible; the net result is that many local variables of texture type are eliminated (that pass is not perfect, though, and it is possible for users to get errors when it doesn't fully clean up local variables of texture type).
Issue (3) is a much more complicated matter, and it is what this change is concerned with.
A typical solution to issue (3) is to simply inline all of the code in a program, at which point function results and `out`/`inout` parameters will no longer exist to cause problems. We reject such solutions for two reasons. First, there are limitations on control-flow structure in HLSL/GLSL/SPIR-V that mean they cannot express certain programs after inlining has been performed. Second, and more importantly, the philosophy of the Slang compiler is to perform as little duplication of code as possible, so that we do not accidentally contribute to binary size bloat.
Instead, this change tackles the problem of functions that output resource types by adding a new specialization pass. The pass detects functions that ought to be specialized (because they have resource-type outputs), and inspects their bodies to see if the values they output have a predicatable structure that can be replicated outside of the function body. The same logic that inspects the function body also rewrites (a copy of) the function to not have the offending outputs. Finally, all the call sites to a function that is rewritten in this way also get rewritten so that instead of using output values from the function itself, they reproduce the expected output value(s) in their own code.
The pass as presented here is intentionally limited in the scope of what it can optimize away (and the test case only touches on that specific functionality). The goal is to get a basic version of this pass in place and evaluated, and then to expand on its functionality incrementally over time.
|