| Commit message (Collapse) | Author | Age |
| ... | |
| |
|
|
|
| |
closes https://github.com/shader-slang/slang/issues/2410
Co-authored-by: Yong He <yonghe@outlook.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Warning on bool to float conversion.
* Fix test cases.
* Improve.
* LanguageServer: don't show constant value for non constant variables.
* Fix tests.
* Fix warnings in tests.
Co-authored-by: Yong He <yhe@nvidia.com>
|
| |
|
| |
See https://github.com/shader-slang/slang/issues/2213
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Use IR pass to eliminate phi nodes
"Phi nodes" are one of the key contrivances that makes SSA (Static
Single Assignment) form work. Because SSA is so great for compiler
IRs, we kind of need to deal with phi nodes, but they also get in
the way because they don't have a direct analog in most lower-level
machine ISAs or execution models, nor in most of the high-level
languages a transpiler wants to emit. As a result a compiler like
ours needs to be able to eliminate the phi nodes from a program as
part of generating output code.
(For any clever people noting that SPIR-V supports phi nodes
directly: yes, it does. It doesn't need to and it probably *shouldn't*.
Anybody involved in the decision-making knows my reasoning, and
anybody else should feel free to ask me if they want the lecture.
Anyway...)
The basic idea of elimiating phi nodes is simple enough. We replace
each phi node with a temporary variable. Uses of the phi use values
loaded from the temporary. The operation of the phi itself
(assigning a value based on the branch taken) amounts to an assignment
into the temporary.
Previously, the Slang compiler dealt with phi nodes very late in
the process of generating code: in the middle of emitting strings
of source code in a high-level language like HLSL or GLSL. Doing the
work that late in compilation has two big drawbacks:
1. Our ability to emit clean and/or optimal code is limited because
we may not be able to make certain changes to the IR, or because we
cannot make use of additional information like a dominator tree that
might be available at other points in compilation.
2. Any other IR passes that relate to temporary variables won't be
able to see the variables that we generate for phi nodes. This could
raise issues with correctness (e.g., if we want to compute live-range
information for *all* temporary variables), or performance (we have no
way to run additional IR optimization passes after phis are eliminated).
This change addresses these problems by making the elimination of
phi nodes an explicit IR pass. Additional optimizations can easily be
run after this pass (although we'd need to be careful not to run
passes that could end up introducing new phis). The pass makes use
of the information available to it to try to produce code that will
emit to "clean" HLSL/GLSL.
The core of the pass is in `slang-ir-eliminate-phis.cpp`, and is
heavily commented, so I won't describe the approach in detail here.
There are two related issues that came up, though:
First, it turned out that our emit logic for local variables (`IRVar`
instructions) wasn't using the function we'd defined named `emitVar()`.
One worrying consequence of that oversight was that the `precise`
modifier would impact generated HLSL/GLSL for variables that turned
into SSA values (including phi nodes), but *not* for local variables
that had not been SSA'd (or that had been SSA'd and then de-SSA'd).
This change also fixes that bug; it is unclear how widespread the
impact of the original issue might be.
Second, generating explicit IR temporaries for phi nodes exposed a
pre-existing bug in the `slang-ir-restructure-scoping` pass. That pass
basically detects cases where we have an instruction `I` with a use
`U` such that the use follows the rules of SSA form ("def dominates
use," meaning `I` dominations `U`), but does not follow the more
restrictive scoping rules of high-level-language output (where a value
computed "inside" a loop is not automatically visible to code outside
the loop just because it dominates that code). That pass did not
correctly account for the case where `I` was a temporary variable.
It seems that case could not arise before now because we didn't have
any passes that would move `var`, `load`, or `store` operations out
of the basic block they started in. The fix for that pass was relatively
simple, and will make the whole thing more robust in case we add more
aggressive optimizations later.
* fixup: expected test output
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Work to mitigate SPIR-V bloat
SPIR-V is not an especially compact format, but some patterns in how Slang generates code and then runs it through `spirv-opt` lead to many redundant field-by-field copy operations being emitted. This change attempts to address some of the resulting bloat from the Slang side of things.
Note: experimentation shows that the bloat is less pronounced when running either *no* SPIR-V optimizations or *full* SPIR-V optimizations, so it is also likely that the bloat should be addressed by changing which `spirv-opt` passes the Slang compiler runs in default (`-O1`) builds. Such changes should come as a distinct pull request.
This change primarily does two things:
First, the code generation strategy for passing arguments to `out` and `inout` parameters has been changed. In the past, the compiler would *always* copy the argument value into a temporary, then pass the address of the temporary, and then write back the value after the call. The new code generation strategy attempts to identify when an argument value already has a simple address in memory and passes that address directly when possible. This eliminates many copy operations that occur before/after calls to functions with `out`/`inout` parameters.
Second, we introduce an IR optimization pass that detects call sites where the entire contents of a buffer (usually a constant buffer) is being passed to a callee function, such that many bytes are loaded and then passed even if only very few are used in the callee. The pass moves the load operations from the caller to a specialized version of the the callee where possible (e.g., when the constant buffer in question is a global shader parameter). Doing this eliminates another major category of copies.
Notes:
* The IR lowering logic is complicated by the fact that several kinds of l-values (values that are usable as the desitnation of assignment, or for `out`/`inout` arguments) are not actually addressable. An easy example is a non-contiguous swizzle like `v.xwz` on a `float4`, where the value occupies 12 bytes, but not 12 consecutive bytes with a single address. There are many more corner cases like that and the IR lowering pass carries a lot of complexity to deal with them. A more systematic overhaul is due some time soon.
* The IR representation of `out` and `inout` parameters deserves some careful scrutiny when making these kinds of changes. The official semantics of `inout` in HLSL has been "copy in copy out" (and `out` is just "copy out") which is observably different from any solution that passes in the address of an l-value directly. By making this change we are saying that Slang's semantics are not precisely those of legacy HLSL, and that our semantics for `inout` parameters are closer to those of `inout` in Swift or of a mutable borrow in Rust. In the Swift case the implementation can freely pass the underlying storage of an l-value or the address of a temporary, and valid programs may not observe the different. It is thus illegal to observe the value in a storage local while a mutation to that location is "in flight." All of this is way more detailed and technical than 99% of Slang users will ever care about, but importantly it gives us semantic cover to eliminate these copies in the IR, and also to emit output C++ code that implements `out` and `inout` as by-reference parameter passing.
* There was an exsting generic pass for specializing functions based on call sites that uses a "template method" style of pattern to customize its behavior. That pass needed to be generalized to handle this use case because it had previously operated on the assumption that the "desire" to specialize a callee function must be driven by the parameter declarations of that function, and not on the argument values passed in. The code has been slightly refactored to allow the policy for specialization to consider both parameters and arguments.
* Unsurprisingly, a bunch of the GLSL (and thus SPIR-V) generated has changed with this work, so several baseline `.slang.glsl` files needed to be updated.
* This change is incomplete in that it does not address broader cases of buffer loads, including both partial loads from constant buffers (just loading one field, but a field that uses a "large" structure type), and loads from multi-element buffers (a lot from a structured buffer where the element type is "large"). The main question in each of those cases is how to define how "large" a structure needs to be before we decide to try and sink loads into callee functions like this. In the worst case, sinking loads in this way may actually create *more* memory traffic (because the same values get loaded in multiple callee functions).
* fixup: run premake
* fixup: typo
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* #include an absolute path didn't work - because paths were taken to always be relative.
* Fix issue with with SLANG_ENABLE_GLSLANG_SUPPORT
* Update expected output from glslang-error.glsl
* Fix bug in glsl dissassembly.
* Make ExtensionTracker available even if source is not emitted.
* Only explicitly set extension tracker based on capability bits, if we are in pass through.
* Small simplification of invoke sourceEmit.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* #include an absolute path didn't work - because paths were taken to always be relative.
* WIP Fxc as downstream compiler.
* First pass FXC downstream compiler working.
* GCC compile fix.
* Fix FXC parsing issue.
* Special case filesystem access.
* Use StringUtil getSlice.
* Fix isses with not emitting source for FXC.
* WIP on DXC.
* Small fixes for DXBC handling.
* Removed DXC from ParseDiagnosticUtil (can use generic)
Try to improve output for notes from DXC.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* #include an absolute path didn't work - because paths were taken to always be relative.
* WIP Fxc as downstream compiler.
* First pass FXC downstream compiler working.
* GCC compile fix.
* Fix FXC parsing issue.
* Special case filesystem access.
* Use StringUtil getSlice.
* Fix isses with not emitting source for FXC.
* Small fixes for DXBC handling.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change converts a large number of our existing tests to use the `ShaderObject` support that was added to the `gfx` layer.
In many cases, tests were just updated to pass `-shaderobj` and the result Just Worked.
In other cases, a `name` attribute had to be added to one or more `TEST_INPUT` lines.
For tests that did not work with shader objects "out of the box," I spent a little bit of time trying to get them work, but fell back to letting those tests run in the older mode.
Future changes to the infrastructure will be needed to get those additional tests working in the new path.
Along with the changes to test files, the following implementation changes were made to get additional tests working:
* Because the shader object mode uses explicit register bindings (from reflection), the hacky logic that was offseting `u` registers for D3D12 based on the number of render targets gets disabled (by another hack).
* The "flat" reflection information coming from Slang was not correctly reporting "binding ranges" for things that consumed only uniform data (which would be everything on CUDA/CPU), so it was refactored to properly include binding ranges for anything where the type of the field/variable implied a binding range should be created (even if the `LayoutResourceKind` was `::Uniform`).
* A few fixes were made to the CUDA implementation of `Renderer`, in order to get additional tests up and running. Most of these changes had to do with texture bindings, which hadn't really been tested previously.
In addition, a few changes were made that were attempts at getting more tests working, but didn't actually help. These could be dropped if requested:
* As a quality-of-life feature (not being used) the `object` style of `TEST_INPUT` line is upgraded to support inferring the type to use from the type of the input being set.
* Any `object` shader input lines get ignored in non-shader-object mode.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Add basic GLSL support for SV_Barycentrics
This change allows for fragment shader varying inputs marked with the `SV_Barycentrics` semantic to be mapped to GLSL code using the `gl_BaryCoordNV` builtin variable (from he `GL_NV_fragment_shader_barycentric` extension).
This is the simplest possible change to get the functionality up and running, and it leaves out many things that could be desired in a more feature-complete version of the feature later:
* There is no support for alternative extensions that provide similar functionality. Selection of which extension to favor could eventually be based on the "capability" work that has been put in place.
* There is no attempt made to check that the input has the expected type (or to coerce it if it doesn't), so for now this is only going to be guaranteed to work for a `float3` input.
* This change does not expose the `pervertexNV` qualifier added in the `GL_NV_fragment_shader_barycentric` extension, which can be used by a shader to access the uninterpolated vertex inputs.
The last issue is an important one, since the HLSL `GetAttributeAtVertex` function seems to be defiend to work with *any* incoming varying parameter that was marked with `nointerpolation`. When we have a `nointerpolation` input, it would seem that we need to know whether it will be used with `GetAttributeAtVertex` (in which case it should be declared as a `pervertexNV` array input in GLSL) or not (in which case it should be declared as a `nointerpolation` input, without an array).
* fixup: missing file
|
| |
|
|
|
| |
* Enable all dynamic dispatch tests on CUDA.
* Fix expected cross-compile test results.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Adding support for global uniform shader parameters
This change adds support for Slang programmers to declare shader parameters of "ordinary" types at global scope:
```hlsl
uniform float gScaleFactor;
void main() { ... *= gScaleFactor; ... }
```
The generated HLSL/GLSL/DXIL/SPIR-V output will be something along the lines of:
```hlsl
struct GlobalParams
{
float gScaleFactor;
}
cbuffer globalParams
{
GlobalParams globalParams;
}
void main() { ... *= globalParams.gScaleFactor; ... }
```
The binding information used for the implicit `globalParams` constant buffer will be determined by the existing implicit parameter binding logic (which already had support for this kind of transformation).
The reason this change is being pursued right now is because it is one step toward removing the implicit `KernelContext` type that is used to wrap the generated code for our CPU and CUDA C++ targets. Handling global-scope parameters of ordinary type requires an IR pass that synthesizes the `GlobalParams` structure type above, and that step ends up removing the need for the similar `UniformState` structure that was being used in the CPU/CUDA emit logic.
A more detailed guide to the changes included follows:
* The diagnostic for a global-scope variable that is implicitly a shader parameter was kept, but changed to a warning. Users can opt out of the warning by decorating their parameter as a `uniform` (since that keyword is already being used to mark entry-point parameters that should be treated as uniform shader parameters).
* To simplify the task of finding the global shader parameters, the `CLikeSourceEmitter` type has been given an `m_irModule` member. The previous emit logic for `UniformState` was having to do a roundabout solution involving the `EmitAction`s to deal with not having direct access to the module.
* Removed a few dead declarations in the emit logic (related to a much earlier point where emit was based on the AST instead of the IR).
* Made the computation of type names in C++ emit take into account `ConstantBuffer<T>` and `ParameterBlock<T>`. As far as I can tell, these were being handled with some special-case hacks in the emit logic instead of being supported more fundamentally. It might actually be good to pass these through as `ConstantBuffer<T>` and `ParameterBlock<T>` in the C++ output, and allow the prelude to customize their translation (defaulting to defining them as `T*`).
* Removed the special-case C++ emit logic for references to global shader parameters. There are now at most two global shader parameters to deal with, and the default emit logic (referring to them by name) does the Right Thing.
* Changed the handling of entry points for C++ (both CPU and CUDA) so that it handles the bundled-up shader paameters for the global and entry-point scopes the same way. The main complication here is OptiX, where parameter data is passed very differently than it is for CUDA compute kernels.
* Reverted changes to `ir-entry-point-uniforms` that had made its logic depend on the compilation target. The parameter binding logic was already responsible for deciding if a given target needed to wrap up its entry-point parameters in a constant buffer, and the IR pass was respecting that layout information. The current workaround had been removing the `ConstantBuffer<T>` indirection from this IR pass for CPU/CUDA, but then reintroducing the same indirection later on in the emit step.
* Added an explicit IR pass with the task of collecting global-scope parameters of uniform/ordinary type and packaging them up into a `struct`, and then optionally packaging that `struct` up in a constant buffer. This pass bases its decisions on the IR layout information that was already computed, so it should match whatever policy choices were made at the layout level.
* Changed the "key" operand on IR `struct` layout information to not assume an `IRStructKey`. The problem here is that the global scope gets a `StructTypeLayout` to represent its members, and this is convenient (rather than having to always special-case logic that handles the global scope), but the "fields" of that struct are global variables which do not have `IRStructKey`s associated with them. The simplest solution is to use the variables themselves as the keys, which required removing the assumption in the IR encoding.
* Updated the IR layout process to compute a layout for the global scope of an entire program, and to attach that to the `IRModule` via a decoration. Updated the IR linking process to carry through that decoration to the linked output. This is necessary so that the IR pass that transforms global parameters can access the global-scope layout information.
An important concern with this approach is that the contents and layout of the monolithic `GlobalParams` structure depends on the exact set of modules that were linked (and the order in which they were specified, in some cases). This isn't really a new thing with this change, but it becomes more important as we start to think of how to generalize things to better support separate compilation and linking.
There are changes that can (and should) be made to the way that IR layouts are computed for programs (e.g., so that we compute layout per-module and then combine them rather than as a whole-program step). In this case, the problem of forming the combined/linked global layout can be moved down the IR level and not be reliant on AST-level information.
Just changing the way layout and linking interact would not change the fundamental problem that global shader parameters as they currently exist in Slang/HLSL/GLSL are not readily compatible with true separate compilation. We either need to find a solution strategy that we can apply to allow existing shaders to work with separate compilation *or* we need to incrementally work toward removing support for global-scope shader parameters in favor of explicit entry-point parameters in all cases.
* fixup: missing files
* fixup: comment the new code
|
| | |
|
| | |
|
| | |
|
| |
|
|
|
| |
The existing code was assuming `fmod()` was available as a builtin in GLSL, which isn't true. It also isn't possible to translate the HLSL `fmod()` to the GLSL `mod()` since the two have slightly different semantics.
This change introduces a definition for `fmod(x,y)` that amounts to `x - y*trunc(x/y)` which should agree with the HLSL version except in corner cases (e.g., there are some cases where the HLSL version returns `-0` and this one will return `0`).
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Improve GLSL coverage of boolean binary ops
This change ensures that the `&&`, `||`, `&`, `|`, and `^` apply correctly to vectors of `bool` values when targetting GLSL.
Most of the changes are in the GLSL emit path, where the IR instructions for these operators are bottlenecked through a small set of helper routines to cover the different cases. In general:
* The vector variants of the operations are implemented by casting to `uint` vectors, performing bitwise ops, then casting back
* The scalar variants are handled by conveting the bitwise operations to their equivalent logical operator (the one interesting case there is bitwise `^` where the equivalent logical operation on `bool` is `!=`)
This change makes it clear that our IR really shouldn't have distinct opcodes for logical vs. bitwise and/or/xor, and instead should just have a single family of operations where the behavior differs based on the type of the operand. That is already *de facto* the way things work (a user can always write `&`, `|` and `^` and expect them to work on `bool` and vectors of `bool`), so that the GLSL output path has to deal with the overlap. Having two sets of IR ops here actually makes for more code instead of less.
* Fixups: review feedback and test ! operator
|
| |
|
|
|
| |
These ended up being additional cases where we needed to use an explicit loop over components in the stdlib in order to produce valid GLSL output, but the existing declarations weren't doing it.
I added a very minimal cross-compilation test just to confirm that we generate valid SPIR-V for code using `f16tof32()`.
|
| |
|
|
|
| |
* Specialized handling for comparison of dxc output that ignores line/column numbers.
* Simplify areAllEqualWithSplit.
|
| |
|
| |
I'm not sure how this slipped in, but I know that I missed this when testing all my recent PRs because I end up havign a bunch of random not-ready-to-commit repro tests in my source tree which means I always get at least *some* test failures and have to scan them for the ones that are real. Somehow I have had a blind spot for this one.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The Slang compiler was bit by a known issue when translating from SSA form back to straight-line code. Give code like the following:
int x = 0;
int y = 1;
while(...)
{
...
int t = x;
x = y;
y = t;
}
...
The SSA construction pass will eliminate the temporary `t` and yield code something like:
br(b, 0, 1);
block b(param x : Int, param y : Int):
...
br(b, y, x);
The loop-dependent variables have become parameters of the loop block, and the branchs to the top of the loop pass the appropriate values for the next iteration (e.g., the jump that starts the loop sends in `0` and `1`).
The problem comes up when translating the back-edge the continues the loop out of SSA form. Our generated code will re-introduce temporaries for `x` and `y`:
int x;
int y;
// jump into loop becomes:
x = 0;
y = 1;
for(;;)
{
...
// back-edge becomes
x = y;
y = x;
continue;
}
The problem there is that we've naively translated a branch like `br(b, <a>, <b>)` into `x = <a>; y = <b>;` but that doesn't work correctly in the case where `<b>` is `x`, because we will have already clobbered the value of `x` with `<a>`.
The simplest fix is to introduce a temporary (just like the input code had), and generate:
// back-edge becomes
int t = x;
x = y;
y = t;
This change modifies the `emitPhiVarAssignments()` function so that it detects bad cases like the above and emits temporaries to work around the problem. A new test case is included that produced incorrect output before the change, and now produces the expected results.
A secondary change is folded in here that tries to guard against a more subtle version of the problem:
for(...)
{
...
int x1 = x + 1;
int y1 = y + 1;
x = y1;
y = x1;
}
In this more complicated case, each of `x` and `y` is being assigned to a value derived from the other, but neither is being set using a block parameter directly, so the changes to `emitPhiVarAssignments()` do not apply.
The problem in this case would be if the `shouldFoldInstIntoUseSites()` logic decided to fold the computation of `x1` or `y1` into the branch instruction, resulting in:
x = y + 1;
y = x + 1;
which would again violate the semantics of the original code, because now there is an assignment to `x` before the computation of `x + 1`.
Right now it seems impossible to force this case to arise in practice, due to implementation details in how we generate IR code for loops. In particular, the block that computes the `x+1` and `y+1` values is currently always distinct from the block that branches back to the top of the loop, and we do not allow "folding" of sub-expressions from different blocks. It is possible, however, that future changes to the compiler could change the form of the IR we generate and make it possible for this problem to arise.
The right fix for this issue would be to say that we should introduce a temporary for any branch argument that "involves" a block parameter (whether directly using it or using it as a sub-expression). Unfortunately, the ad hoc approach we use for folding sub-expressions today means that testing if an operand "involves" something would be both expensive and unwieldy.
A more expedient fix is to disallow *all* folding of sub-expressions into unconditional branch instructions (the ones that can pass arguments to the target block), which is what I ended up implementing in this change. Making that defensive change alters the GLSL we output for some of our cross-compilation tests, in a way that required me to update the baseline/gold GLSL.
A better long-term fix for this whole space of issues would be to have the "de-SSA" operation be something we do explicitly on the IR. Such an IR pass would still need to be careful about the first issue addressed in this change, but the second one should (in principle) be a non-issue given that our emit/folding logic already handles code with explicit mutable local variables correctly.
|
| |
|
|
|
|
|
|
|
| |
There were two overlapping issues here:
1. We always mapped `SV_Coverage` to `gl_SampleMask`, even though `gl_SampleMaskIn` is the correct built-in variable to use for an input varying.
2. We treated `gl_SampleMask` like it was a scalar shader input, when it and `gl_SampleMaskIn` are actually arrays of indeterminate size (as a byproduct of trying to future-proof for implementations that might support hundreds or thousands of samples per pixel...)
The fix here is simple: map to either `gl_SampleMask[0]` or `gl_SampleMaskIn[0]` as appropriate. I suppose that this approach doesn't handle the possibility of eventually supporting >32 samples per pixel by having something like `uint2 coverage : SV_Coverage`, but I think we can cross that bridge when we come to it.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a shader only uses `ParameterBlock`s plus a single buffer for root constants:
```hlsl
ParameterBlock<A> a;
ParameterBlock<B> b;
[[vk::push_constant]] cbuffer Stuff { ... }
```
we expect the push-constant buffer should not affect the `space` allocated to the parameter blocks (so `a` should get `space=0`).
This behavior wasn't being implemented correctly in `slang-parameter-binding.cpp`. There was logic to ignore certain resource kinds in entry-point parameter lists when computing whether a default space is needed, but the equivalent logic for the global scope only considered parameters that consuem whole register spaces/sets.
This change shuffles the code around and makes sure it considers a global push-constant buffer as *not* needing a default space/set.
Note that this change will have no impact on D3D targets, where `Stuff` above would always get put in `space0` because for D3D targets a push-constant buffer is no different from any other constant buffer in terms of register/space allocation.
One unrelated point that this change brings to mind is the `[[vk::push_constant]]` should ideally also be allowed to apply to an entry point (where it would modify the default/implicit constant buffer). In fact, it could be argued that push-constant allocation should be the *default* for (non-RT) entry point `uniform` parameters (while `[[vk::shader_record]]` should be the default for RT entry point `uniform` parameters).
|
| |
|
|
|
|
|
|
|
|
| |
We had been translating an `SV_PrimitiveID` input in a shader over to `gl_PrimitiveID` in GLSL.
That translation seemed to work just fine for users, so we thought it was correct.
It turns out that `gl_PrimitiveID` is the correct GLSL for a primitive ID input in every stage *except* a geometry shader.
In a geometry shader, `gl_PrimitiveID` is a primitive ID *output*, and if you want the input case you have to write `gl_PrimitiveIDIn` (note the `In` suffix).
This change sets aside my bewilderment at the above long enough to implement a workaround in the GLSL legalization step.
I also modified our current geometry shader cross compilation test to make use of an input primitive ID.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change adds support for the `[[vk::location(...)]]` and `[[vk::index(...)]]` attributes, which can be used together to mark up shader outputs for dual-source blending on Vulkan. HLSL/Slang code like the following:
```hlsl
struct Output
{
[[vk::location(0)]]
float4 a : SV_Target0;
[[vk::location(0), vk::index(1)]]
float4 b : SV_Target1;
}
[shader("fragment")]
Output main(...) { ...}
```
can be used to set up dual-source blending on both D3D and Vulkan APIs. The output GLSL for the above will look something like:
```glsl
layout(location = 0) out vec4 a;
layout(location = 0, index = 1) out vec4 b;
void main() { ... }
```
The more or less straightforward parts of this change were:
* Added new `attribute_syntax` declarations to the stdlib, for `[[vk::location(...)]]` and `[[vk::index(...)]]`
* Added new AST node types for the new attribute cases, sharing a base class so that argument checking can be shared
* Added checks for the arguments to the new attributes in `slang-check-modifier.cpp` (eventually this kind of logic shouldn't be needed for new attributes)
* Updated GLSL emit logic so that it treats the `index`/`space` parts of a variable layout as the `location`/`index` for varying parameters.
* Updated GLSL legalization so that when it translates entry-point parameters into globals (and scalarizes structures) it handles both a binding index and space for the parameters.
* Added a cross-compilation test case to verify that the basics of the feature work
The remaining work is all in `slang-parameter-binding.cpp`.
There is some work that isn't technically related to this change (and which could be reverted if it causes problems), around the detection and handling of fragment shader outputs with `SV_Target` semantics. The basic changes (which could be backed out and then merged separately) are:
* Made the special-case `SV_Target` logic only trigger for fragment shaders (that is the only place where `SV_Target` should appear, but we weren't guarding against it)
* Made the logic to reserve a `u<N>` register for `SV_Target<N>` only trigger for D3D Shader Model 5.0 and below (since it is not required for SM 5.1 and up). This could be a breaking change for some users, but that seems unlikely.
* Fixed one test case that relied on the behavior of reserving `u0` for `SV_Target0` even though it was a SM6.0 test.
* Also added more comments to the system-value handling logic.
The more interesting changes come up starting in `processEntryPointVaryingParameterDecl()`. The basic issue is that we have so far only supported implicit layout for varying parameters on GLSL/Vulkan, but the `[[vk::location(...)]]` attribute is a form of explicit layout annotation. Rather than try to kludge something that only works in narrow cases, I instead opted to try to fix things more generally.
In `processEntryPointVaryingParameterDecl()` we now check for the `location` and `index` attributes when we are on "Khronos" targets (Vulkan/OpenGL/GLSL) and immediately add them to the variable layout being constructed if they are found. There is nothing in this logic specific to fragment-shader outputs, so this feature now applies to any varying input/output on Khronos targets.
Allowing explicit layouts creates the potential for mixing implicit and explicit layout. For example, consider:
```hlsl
struct Output
{
float4 color : COLOR;
[[vk::location(0)]] float3 normal : NORMAL;
}
```
What `location` should `color` get? Should this code be an error? There are two cases where this conundrum can come up: when working with `struct` types used for varying parameters, and the entry-point parameter list itself.
For the varying `struct` case we currently make an expedient choice. We handle fields with both implicit or explicit layotu with appropriate logic, but logic that doesn't account for the case of mixing the two. Then at the end of layout for the `struct` we issue an error if there was a mix of implicit and explicit layout (such that our results aren't likely to be valid).
For the entry point varying parameter case, things were already using a `ScopeLayoutBuilder` type (that encapsulates some logic shared between entry-point and global parameters). The entry-point-specific bits were moved out into a `SimpleScopeLayoutBuilder` and it was updated so that rather than assuming all parameters use implicit layout it does a two-phase layout approach similar to what we use for the global scope:
* First all parameters are enumerated to collect explicit bindings and mark certain ranges as "used"
* Next the parameters are enumerated again and those without explicit bindings get allocated space using a "first fit" algorithm
In principle we could extend the two-phase approach to apply to `struct` types as well, but that would be best saved for a future refactoring of some of this parameter binding logic, since I would like to exploit more of the opportunities for sharing code across the uniform/varying and struct/entry-point/global cases.
By moving the point where entry point parameters get their offsets assigned, it was necessary to move around some of the logic that removes varying parameter usage (and other things that shouldn't "leak" out of an entry point) to a different point in the entry point layout process.
While adding these various pieces does not quite enable us to support explicit bindings on entry point parameters (e.g., putting `uniform Texture2D t : register(t0)` in an entry point parameter list) or in `struct` types (e.g., explicit `packoffset` annotations on fields), it starts to provide some of the infrastructure that we'd need in order to support those cases.
|
| |
|
|
|
| |
This appears to have been an oversight in the work that added support for `imageStore` as well as atomics when writing to `RWTexture*` and friends. The HLSL/Slang `RWBuffer` type maps to GLSL as an `imageBuffer`, which is effectively just another case of writable texture image (bonus points to anybody who can explain to me the meaningful distinction between an `imageBuffer` and an `image1D`).
This change copies the handling of subscript access (`operator[]`) from textures over to buffers, and adds a test case to confirm that the new handling works for the simple case of setting a buffer element.
|
| |
|
|
|
|
|
|
|
|
| |
* When checking if an instruction can be folded, take into account if it's called by a target intrinsic, because if it is we need to check if the parameter is accessed multiple times to see if it's worth allowing to fold.
* Tidy up code around folding/target intrinsics.
* Fix texture-load.slang .
* Fix typo in assert.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Add test result for compile-to-cuda
* Add RAII for some CUDA types to simplify usage.
* First pass handling of some instrinsics on CUDA (for example transcendentals)
* CUDA working with built in intrinsics.
* Add missing CUDA prelude intrinsics.
* CUDA matches CPU output on simple-cross-compile.slang
* First pass at hlsl-scalar-float-intrinsic.slang test.
* Fix smoothstep impl on CUDA and CPU.
* Fixed step intrinsic on CUDA/CPU.
* Added operator[] to Matrix for C++, to allow row access.
Needs a fix for CUDA.
* Fixed warning on clang build.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change adds some new entry points to the reflection API to cover corner cases that a majority of applications won't care about.
These are most likely to come up for users who want to make a complete copy of the Slang reflection information into a data format of their own design.
All of the information is stuff that we already computed as part of layout, and just hadn't exposed:
* Alignment information for type layouts. This is only useful for ordinary/uniform data; in all other cases alignment is always one. Even for uniform/ordinary data, it is unlikely that any application would actually make use of it.
* Layout information for the result of an entry point function. This would be useful for applications that need to enumerate the varying outputs (user- or system-defined) of a shader. Having information available for `out` parameters but not the function result was inconsistent.
* The "element type" of a parameter block type (e.g., going from `ParameterBlock<X>` to `X`). This seems to have been an oversight since `ConstantBuffer<X>` appears to have been implemented, and the case for a type *layout* was handled.
* The "container" variable layout for a parameter block or constant buffer. It took a while for us to arrive at the current representation of layout for parameter groups, and most client code continues to use the original API that requires us to generated kludged "do what I mean" data. However, if we don't expose the more useful new representation fully, there is no way for users to take advantage of it!
The reflection test tool has been updated to print the new information where it makes sense, which provides us some level of coverage for the new code.
Unfortunately, this led to some cascading changes:
* First, a bunch of the tests had their output changed since they include new information. That's the easy bit.
* Next, the "container" and "element" var layouts don't actually have names (because there is no actual variable underlying them), which means that the code to emit variable names in the JSON dump needed to be condition.
* Making the `"name"` output conditional messed up a lot of the delicate logic that had been dealing with when to emit commas for the output JSON (JSON uses commas as separators, and doesn't allow trailing commas). I added a bit of new infrastructure to make it simple(-ish) to track when a comma actually needs to be output.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The `TEST_INPUT` facility allows textual Slang test cases to provide two kinds of information to the `render-test` tool:
1. Information on what shader inputs exist
2. Information on what values/objects to bind into those shader inputs
Under the first category of information, there exists supporting for attaching a `dxbinding(...)` annotation to a `TEST_INPUT` which seemingly indicates what HLSL `register` the input uses. There is a similar `glbinding(...)` annotation, used for OpenGL and Vulkan.
It turns out that these annotations were, in practice, completely ignored and had no bearing on how `render-test` allocates or bindings graphics API objects. There was some amount of code attempting to validate that explicit registers/bindings were being set appropriately, but the actual values were being ignored.
The visible consequence of the `dxbinding` and `glbinding` annotations being ignored is issue #1036: the order of `TEST_INPUT` lines was *de facto* determining the registers/bindings that were being used by `render-test`.
This change simply removes the placebo features and strips things down to what is implemented in practice: the `TEST_INPUT` lines do not need target-API-specific binding/register numbers, because their order in the file implicitly defines them.
I added logic to the parsing of `TEST_INPUT` lines to make sure I got an error message on any leftover annotations, and went ahead and systematicaly deleted all of the placebo annotations from our test cases.
If we decide to make `TEST_INPUT` lines *not* depend on order of declaration in the future, we can build it up as a new and better considered feature.
The main alternative I considered was to keep the annotations in place, and change `render-test` and the `gfx` abstraction layer to properly respect them, but that path actually creates much more opportunity for breakage (since every single test case would suddenly be specifying its root signature / pipeline layout via a different path using data that has never been tested). The approach in this change has the benefit of giving me high confidence that all the test cases continue to work just as they had before.
|
| |
|
|
|
|
| |
* Fix GetDimensions for glsl.
* Add test for Load on RWStructuredBuffer as part of GetDimension.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* WIP: Improving CPU performance/ABI
* Optionally output code on CPU for groupThreadID and groupID.
* Added ability to set compute dispatch size on command line for render-test.
Dispatch compute tests taking into account dispatch size.
Added test for semantics are working.
* Test using GroupRange.
* Fix problem with adding \n for externa diagnostic - to do it if there isn't a \n at the end. Change the ouput order (put result before) so last value is diagnostic string.
* Made GroupRange the default exposed CPU ABI entry point style.
Removed CPU_EXECUTE test style -as tested via the now cross platform render-test
* Split out execution from setup for execution to improve perf.
* For better code coverage/testing test all styles of CPU compute entry point.
* Improve documentation for ABI changes for CPU code.
Add 'expecting' to error message from review.
* Fix small typos.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Try to make x86 builds on x86 platforms (not the default for the os).
* Use c style include for stdint.h cos not found on x86 linux.
* Simplified x86 issue for linux.
* Fix typo.
* Remove the need for the shared-library category.
* Disable CPU tests on linux x86.
* Fix typo.
* Named test requirement methods so overloading not confusing (around flags, and SlangPassThroughType which are both 'int')
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Added setDownstreamCompilerPrelude
Renamed setPassThroughPath to setDownstreamCompilerPath.
Fixed tests.
Added prelude directory & code to TestToolUtil to setup default preludes for testing/command line apis.
* Fix merge problem
* Remove hacks to make prelude work by adding a search path as no longer needed with 'user prelude'.
* Split up prelude into scalar intrinsics, and types.
Use slang.h for main header.
slang-cpp-prelude.h can now just include what it needs (relative to prelude directory) and define the few remaining things/work arounds.
* Fix typo.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* * Simplify some of test code around CPPCompiler
* Test using 'callable' with pass-through
* Small cpu doc improvements
* Improvements to Clang output parsing.
* Remove temporary file (base filename) .
* Improve handling of external errors - handle severity.
* On error dumping out to 'actual' file for runCPPCompilerCompile.
* Small fixes.
Set the source language type correctly for pass thru.
* Remove warning for test for clang backend c
* Preliminary work around making render-test compute potentiall work with CPU.
Made ShaderCompiler -> a stateless ShaderCompilerUtil.
Means we don't require a Renderer interface to do shader compilation.
* Refactor such that CPU test can take place in without Window or Renderer.
* Hack to look for prelude in source file directory.
Fix bug returning the SharedLibrary for HostCallable.
* Compute test running on CPU.
* Need the prelude currently in same directly as test.
* Hack to remove warning - that then produces an error on appveyor build.
Disable running render CPU test on non-windows.
* Improve handling of disabling CPU tests on linux.
* Added bit-cast.slang working on CPU.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* First pass support for compiling to a loaded shared library.
* Improve documentation for cpu target.
* Removed the SLANG_COMPILE_FLAG_LOAD_SHARED_LIBRARY flag.
Use the SLANG_HOST_CALLABLE code target
Document mechanism.
* Fix typo in cpp-resource.slang
In test code if the target is 'callable' we don't need to compile (indeed there is no source file).
* Small refactor using CommandLineCPPCompiler as base class to implement VisualStudioCPPCompiler and GCCCPPCompiler.
* Improvements around CPPCompiler.
Mechanism to know products produced.
Cleaning up products after execution.
* Fix multiple definition of 'SourceType'
|
| |
|
|
|
|
| |
* Added CPU_REFLECTION test option - that has two versions of the reflection output depending on ptr size.
* Added 'shared-library' test category. This category is disabled on CI targets that have issues.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Expanded prelude for some other resource types. Disable C++ output for ParameterGroup.
* WIP: Layout for CPU.
* Fixes to CPU layout.
* WIP: The uniform is output, but the variable definition is not.
* WIP: Entry point parameters to global scope in C++.
Handling of resource types (in so far as outputting)
* Some discussion of ABI and different input types.
* WIP: More C++ support around resource types.
* WIP: Split up variables into different structures on emit.
* WIP: Emitting C++ with wrapping up of 'Context'
* WIP: C++ code has access to semantic values.
Wrap in struct so can use method calls to pass shared state.
Disable legalizeResourceTypes and legalizeExistentialTypeLayout
* Fix structured buffer layout for CPU.
* Remove testing/handling of global uniforms on CPU path.
Typo fix.
Changed CPU tests to use new CPU calling convention.
* Check globals are working. Initalize context to zero globals.
* Order the global parameters for C++ ouput by their layout.
Note - that layout isn't quite working correctly because the StructuredBuffer<int> the int seems to be consuming uniform space.
* Work around for reflection not having all data needed for layout ordering for C++ code.
* Output constant buffers as pointers.
* Entry point parameters accessed through pointer to struct.
* WIP: Layout for CPU is reasonable for test case.
* Only output 'f' after float literal if type marks as a float.
* Cast construction works on C++.
* Made IntrinsicOp::ConvertConstruct to make intent clearer.
* C++ handling construction from scalar.
Handle access of a scalar with .x.
Check default initialization.
* Comment about need for split of kIROp_construct.
Release build works.
* Added support from constructVectorFromScalar to C/C++ target.
* Handling of in/out in C/C++.
* First pass documentation CPU support.
* Improvements to C++/C slang code generation documentation.
* Small doc change to include need for mechansim to specify cpp compiler path.
* Better handling of swizzling - allow swizzling a scalar into a vector.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* WIP: Adding support for C/C++ compilation to slang API.
* Removed BackEndType in test harness -> use SlangPassThrough to identify backends
Only require stage for targets that require it.
Detection of all different backends.
* Windows/Unix create temporary filename.
* WIP: Output CPU binaries.
* Added a pass-through c/c++ test.
* Compile C++/C and store in temporary file.
* Read the binary back into memory.
* Set debug info and optimization flags for C/C++.
Make the CPPCompiler debug/optimization levels match slangs.
* Handling of include paths and math precision.
* Dumping c++/c source and exe/shared library.
* Put hex dump into own util.
* End to end pass through c compilation test.
* WIP: Simple execute test working on Linux/Unix.
* Fix typo on linux.
* WIP: To compile slang to cpp shared library. Report backend compiler errors.
* Compiles slang -> cpp and loads as shared library.
* Fix problem on c-cross-compile test because prelude is now included with <> quotes.
* Run slang generated cpp code - using hard coded data.
* Added cpp-execute-simple, and test output.
* Fix warning that broke win32 build.
* Fix compilation problem on osx.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* WIP: Emitting Cpp
* Added HLSLType instead of using IRInst - because they don't seem to be deduped.
* Removed need for lexer to take a String.
Added mechansim to lookup intrinsic functions on C++.
* A c/c++ cross compilation test.
* WIP Cpp output using cloning and slang types.
* More work to generate mul funcs.
* WIP: Outputting some simple C++.
* Expose findOrEmitHoistableInst to IRBuilder to aid cloning,
* Simplification for checking for BasicTypes.
Test infrastructure compiles output C++ code.
* Dot and mat/vec multiplication output.
* First pass at swizzling.
* First support for binary ops.
* Builtin binary and unary functions.
* Any and all.
* WIP adding support for other functions.
Added code to generate function signature.
* Add scalar functions to slang-cpp-prelude.h
* Support for most built in operations.
* Tested first ternary.
* Checking the emitting of corner cases functions - normalize, length, any, all, normalize, reflect.
* Check asfloat etc work.
* Fmod support.
* WIP Array handling in C++.
* First stage in being able to handl arbitrary type output for CLikeSourceEmitter
* Removed Handler/Emitter split - so can implement more easily complex type naming.
* Array passing by value first pass.
* Rename Array -> FixedArray
* Outputs structs in C++.
* Emit the thread config.
* Dimension -> TypeDimension
* SpecializedOperation -> SpecializedIntrinsic
Operation -> IntrinsicOp
Use shared impl of isNominalOp
Commented use of m_uniqueModule etc.
* Add code to test slang->cpp when compiled doesn't have errors. Does so by building shared library and exporting the entry point.
* Fix linux clang/gcc compile error about override not being specified.
* Make sure c-cross-compile is run on linux targets/smoke.
* Remove c-cross-compile.slang from smoke.
* Fix running tests/cross-compile/c-cross-compile.slang on Ubuntu 16.04
* Only add -std=c++11 for C++ source.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Removed the need for VisualStudio specific CPPCompiler
Improved the version parsing for gcc/clang
Removed need for slang-unix-cpp-compiler-util.cpp/.h
Remove binary before compiling in the compile c tests
* Moved VisualStudio calcArgs into CPPCompilerUtil - as code is not windows specific.
* Set up compile time version for gcc and clang
* Fix compilation on OSX - use remove instead of unlink for file deletion.
* On OSX - clang uses different string format.
* Removed /bin/sh invoking as not required for OSX.
* First pass working testing with shared libraries.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Work in progress to be able to invoke VS from within code.
* First pass at windows version of refactor of OSProcessSpawner
* Closer to getting VS path lookup working.
* Make OSString assignable/ctor able
* Work out program files directory directly, so don't have to expand %%.
* WIP: Improve handling of process spawning.
* Add support for splitting input by line.
* * Correctly locates visual studio install
* Added functionality to invoke vs via cmd
* Add option to execute the command line.
* Handle in ProcessUtil for windows -> WinHandle.
* Rename files slang-win-visual-studio-util.cpp/.h and slang-process-util.h
* First pass at unix/linux version of ProcessUtil.
* Fix reading Visual Studio path from the registry.
* Get compiling on linux with.
* Fix vcvarsall.bat name
* Use ProcessUtil to execute external code.
* Remove OSProcessSpawner.
* Remove includes for "os.h" where no longer needed.
* Fix tabbing issue in premake5.lua
Remove test code from slang-test-main.cpp
* Fix premake4.lua tabbing issue.
* Small fixes to slang-process-util.h
Init ExecuteResult on Win execute.
* Improve comments.
* Fix bug in StringUtil::calcLines - with oddly terminated source input being able to read past end.
Make slang-generate use StringUtil over it's own impl.
* Fix off by one bug in working out Visual Studio version.
* Fix bug in calculating Visual Studio Version
* Fix compilation on linux with string parameter being passed to messageFormat.
* Remove erroneous use of kOSError codes - use Result.
* First effort to generate standard compiler options.
* Initial efforts in compiling source code in test framework for VisualStudio.
* Testing compiling c code on VisualStudio on Windows.
* Fix warning on linux.
* Fix clang on linux warning (and therefore failing) returning a StringBuilder as String.
* Disable return-std-move on clang.
* CommandLine arguments are now tagged if they are escaped or not. That it is the clients responsibility to escape command lines that cannot be automatically escaped.
* Add checks on unix/linux that command line args are all unescaped.
* WIP getting runtime GCC to work.
* First pass compiler working on unix-like targets.
* Enable c-compile.c test on 'smoke'.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Work in progress to be able to invoke VS from within code.
* First pass at windows version of refactor of OSProcessSpawner
* Closer to getting VS path lookup working.
* Make OSString assignable/ctor able
* Work out program files directory directly, so don't have to expand %%.
* WIP: Improve handling of process spawning.
* Add support for splitting input by line.
* * Correctly locates visual studio install
* Added functionality to invoke vs via cmd
* Add option to execute the command line.
* Handle in ProcessUtil for windows -> WinHandle.
* Rename files slang-win-visual-studio-util.cpp/.h and slang-process-util.h
* First pass at unix/linux version of ProcessUtil.
* Fix reading Visual Studio path from the registry.
* Get compiling on linux with.
* Fix vcvarsall.bat name
* Use ProcessUtil to execute external code.
* Remove OSProcessSpawner.
* Remove includes for "os.h" where no longer needed.
* Fix tabbing issue in premake5.lua
Remove test code from slang-test-main.cpp
* Fix premake4.lua tabbing issue.
* Small fixes to slang-process-util.h
Init ExecuteResult on Win execute.
* Improve comments.
* Fix bug in StringUtil::calcLines - with oddly terminated source input being able to read past end.
Make slang-generate use StringUtil over it's own impl.
* Fix off by one bug in working out Visual Studio version.
* Fix bug in calculating Visual Studio Version
* Fix compilation on linux with string parameter being passed to messageFormat.
* Remove erroneous use of kOSError codes - use Result.
* First effort to generate standard compiler options.
* Initial efforts in compiling source code in test framework for VisualStudio.
* Testing compiling c code on VisualStudio on Windows.
* Fix warning on linux.
* Fix clang on linux warning (and therefore failing) returning a StringBuilder as String.
* Disable return-std-move on clang.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* WIP: Setting up C/Cpp source compilation targets.
* WIP: Emitting C/CPP.
* WIP: Split out SourceSink, and use it for source output on emit.
* SourceSink -> SourceStream
* * Made SourceStream use m_ prefixing of members.
* Make all methods use lower camel
* Removed methods from SourceStream interface that are not used externally (use _ prefixing)
* Improvements to documentation
* EmitContext is now effectively empty, so just use SharedEmitContext as EmitContext.
* SharedEmitContext -> EmitContext
* Methods to LowerCamel in emit.cpp
* Split out EmitContext and ExtensionUsageTracker into separate files.
* Split out EmitVisitor into slang-c-like-source-emitter files.
* EmitVisitor -> CLikeSourceEmitter
* Tidy up around CLikeSourceEmitter - simplify header.
* Small tidy up - removing repeated comments that are in header.
* Remove EmitContext paramter threading.
* Small tidy up.
Use prefixed macros for slang-c-like-source-emitter.h
* Small tidy up in slang-c-like-source-emitter.cpp
* First pass at splitting out UnmangleContext.
* MangledNameParser -> MangledLexer.
* WIP making EmitOp (EOp) enum available outside of cpp
* Generating EmitOpInfo from macro.
* Split out emit precedence handling.
Don't use kOp_ style anymore, just use an array indexed by EmitOp.
* Disable C simple test for now.
* Keep g++/clang happy with token pasting.
* Fix win32 narrowing warning.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Translate .Load() to imageLoad() for Vulkan
We were already emitting calls to `imageLoad()` and `imageStore` when a `RWTexture*` was used with `operator[]`:
```hlsl
RWTexture2D<float> myTex;
...
float value = myTex[xy]; // becomes an imageLoad
myTex[xy] = value; // becomes an imageStore
```
However, we were *not* correctly handling the translation of an explicit `.Load()` operation:
```hlsl
float value = myTex.Load(xy);
```
The `.Load()` operation was being translated to a GLSL `texelFetch` as it would be a for a `Texture2D`, and not to an `imageLoad()` as would make sense for a `RWTexture2D` (which becomes a GLSL `image2D`).
This fix is confined to the stdlib, and is mostly a matter of emitting either `texelFetch` or `imageLoad` as the GLSL function name depending on the "access" of the resource type. It is messy code, but straightforward.
One extra detail was that there had been logic to emit a `, 0` argument in the `texelFetch` calls in the non-read-only case, because `texelFetch` usualy requires an explicit mip-level argument and `.Load()` on a `RWTexture*` doesn't recieve an LOD parameter. This is a non-issue now that we are calling `imageLoad()` instead, because `imageLoad` doesn't need/want the extra argument.
* fixup: change test baseline based on recent GLSL output changes
* fixup: review feedback
|
| |
|
|
|
|
|
|
|
|
|
| |
* Specify glsl semantic format - such that conversions are possible from hlsl sematics.
* Comment improvements. Give appropriate type in glsl for sv_tessfactor. Note that sv_tessfactor is not functional though.
* Work in progress for comparison of types.
* * Fix type comparison issues around the hash.
* Fix tests whos output changed with use of isTypeEqual
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes #858
The `precise` keyword exists in both HLSL and GLSL and when applied to a variable declaration is supposed to indicate that all computations that contribute to the value of that variable should not be altered based on "fast-math" optimizations. The main examples are that separate multiply and add operations should not be turned into fused multiply-add (fma) operations, and that operations cannot ignore the possibility of infinity or not-a-number values (e.g., by assuming that `x * 0.0f` is always `0.0f`).
(Aside: it is possible that my understanding of what the semantics of `precise` are in HLSL and GLSL is imperfect so that either the GLSL variant isn't sufficient to provide the semantics of the HLSL keyword, or that the definition of "all computations that contribute" to a value isn't actually correct. We may need to revise this implementation based on subsequent learnings.)
The basic idea here is to turn the AST `precise` keyword into a `[precise]` decoration in the IR and then emit that as a `precise` keyword again in the output.
The main catch is that whereas most of our existing IR decorations apply to things like global shader parameters or `struct` members that usually stick around for the duration of compilation, `[precise]` will get slapped on local variables that will often get optimized away by our SSA pass. There are two ways a variable can get eliminated/replaced during the SSA pass:
1. A use of the variable can be replaced with an ordinary instruction that computes its value.
2. A use of the variable can be replaced with a reference to a "phi node" that will take on the appropriate value based on control flow.
These two cases already had logic to copy a "name hint" decoration from the variable over to an instruction that will replace it, and I simply extended them to also propagate over a `[precise]` decoration.
The test case added with this change intentionally constructs a case where `[precise]` needs to be propagated over to an SSA "phi node" in order to generate correct output code.
The other gotcha is that we can emit variable declarations in various places in `emit.cpp`, and all of these needed to handle `[precise]`. Not only do we have actually local variables (`IRVar`), but we also have SSA phi nodes (`IRParam`), and then there are cases where an intermediate computation (an ordinary instruction) should be `[precise]` and thus we need to emit it as a temporary (not folding it into its use sites) and make sure that the temporary itself gets the `precise` keyword.
I have manually confirmed that in the output SPIR-V, this change results in the `NoContraction` SPIR-V decoration being added to the relevant operations, and the output DXBC contains a multiply and an add in place of a multiply-add. The output DXIL does not show any obvious changes due to `precise`, although the exact order and operands of the math instructions emitted does differ when `precise` is added/removed. In all cases the output is equivalent to hand-written HLSL/GLSL with a `precise`-qualified local variable.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Add better control over image formats for GLSL/SPIR-V targets
Currently Slang emits GLSL code assuming all R/W images need to have explicit formats, and thus we try to infer a format from the element type of the image.
E.g., given a `RWTexture2D<half4>` we might infer that a qualifier of `layout(rgba16f)` should be used.
This strategy has two notable shortcomings:
* Sometimes the user will want a format that doesn't match an existing HLSL type. E.g., if they want the equivalent of `layout(r11f_g11f_b10f)`, then what should they put in their `RWTexture2D<...>` to make the inference do what they need?
* Sometimes the user knows that they don't need to specify a format *at all*, because using the `GL_EXT_shader_image_load_formatted` extension, they can still perform non-atomic load/store on images with no format specified in the SPIR-V.
This change adds two features directed at these challenges.
First, we add an explicit `[format(...)]` attribute that can be used to specify an explicit image format, including ones that don't match any HLSL type.
An example of using this new attribute is:
```hlsl
[format("r11f_g11f_b10f")]
RWTexture2D<float3> myImage;
```
For simplicity in initial bring-up, the new formats all use the same naming as formats in GLSL (this should make it easy for a programmer who knows what they expect to get in the GLSL output). We can change the naming convention for formats at a later time, so long as we keep these existing names in as a compatibility feature.
Note that this is *not* given a `vk::` prefix since the attribute should signal the programmer's intent to provide an image with that format on *all* targets (although only some targets might act on that information).
Also note that the attribute takes a string (`[format("rgba8")`) instead of a bare identifier (`[format(rgba8)]`) because this is consistent with the existing convention for attributes in HLSL.
When `[format(...)]` is left off, the default compiler behavior will still be to infer a format, but this behavior can be overidden for a single image using an explicit format of `"unknown"`:
```hlsl
[format("unknown")]
RWTexture2D<float4> mysteryMachine;
```
The second new feature is that if a user knows they are coding for a GPU that supports the `"unknown"` format in all non-atomic cases, then they can opt into making that the default for images without an explicit `[format(...)]`, using the new `-default-image-format-unknown` command-line option for `slangc`.
The new test case included with this change confirms that we correctly see the explicit formats in the output GLSL and *no* formats for images without explicit `[format(...)]` when using the new command-line option. The test stresses images declared at global scope, in parameter blocks, and in entry-point parameter lists, to try and make sure that all the relevant IR passes in the compiler preserve the format information.
* fixup: missing file
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before type legalization we might have code like: (using pseudo-Slang-IR):
struct P { ... Texture2D<float>[] t; }
global_param p : ParameterBlock<P>;
...
// p.t[someIndex].Load(...);
//
let ptrToArrayOfTextures = getFieldPtr(p, "t") : Ptr<Texture2D<float>[]>;
let ptrToTexture = getElementPtr(ptrToArrayOfTextures, someIndex) : Ptr<Texture2D<float>>;
let texture = load(ptrToTexture) : Texture2D<float>;
let result = call(loadFunc, texture, ...) : float;
Legalization needs to move the `t` array there out of the `p` parameter block, so the global declarations become something like:
struct P_Ordinary { ... }; // no more "t" field
global_param p_ordinary : ParameterBlock<p_ordinary);
global_param p_t : Texture2D<float>[];
In terms of the code to access `p.t[someIndex]` the problem is that `p_t` has one less level of indirection than `p.t` had. We solve this in the type legalization pass using "pseudo-types" and "pseudo-values," where one of the cases is `implicitIndirect` which holds a value of type `T`, but indicates that it should act like a value of type `T*`.
We then use some basic rules for dealing with `implicitIndirect` values, such as:
load(implicitDeref(x)) : T => x : T
getFieldPtr(implicitDeref(s), f) => implicitDeref(getField(s, f))
getElementPtr(implicitDeref(a), i) => implicitDeref(getElement(a, i))
The bug here was that for the `getFieldPtr` and `getElementPtr` cases, we weren't computing the type of the `getField` or `getElement` instruction correctly. We were copying the type from the `getFieldPtr` or `getElementPtr` operation over directly, but those will be *pointer* types and we need the type of whatever they point to.
Once the types are fixed, we can properly generate legalized IR for `p.t[someIndex].Load(...) that looks like:
let arrayOfTextures = p_t : Texture2D<float>[];
let texture = getElement(arrayOfTextures, someIndex) : Texture2D<float>;
let result = call(loadFunc, texture, ...) : float;
The old was giving the `texture` intermediate a type of `Ptr<Texure2D<float>>`. That didn't actually trip up too many things, because we mostly just went on to emit code from something with slightly incorrect types for intermediates that never show up in the generated HLSL/GLSL.
Where this caused a problem is for some of the intrinsic function definitions for the GLSL/Vulkan back-end, because those do things that inspect operand types. In particular the `$z` opcode in our intrinsic function strings triggers logic that looks at a texture operand, and uses its type to try to find the appropriate swizzle to get from a 4-component vector to the appropriate type for the operation (e.g., for a load from a `Texture2D<float>` we need to swizzle with `.x` to get a single scalar out of the matching GLSL texture fetch operation).
The main fix in this change is thus to make `getElementPtr` and `getFieldPtr` legalization properly account for the fact that when switching to `getElement` or `getField` we need a result type that is the "pointee" of the original result.
There was already logic to extract the pointed-to type from a pointer in `ir-specialize.cpp`, so I extracted that to a re-usable function in the IR as `tryGetPointedToType` (returns null if the type isn't actually a pointer).
This logic needed to be extended for type legalization, to deal with the various "pseudo-type" cases.
There is another fix in this change which is marking the `NonUniformResourceIndex` function as `[__readNone]`, which enables it to be more aggressively folded into use sites. Without that fix, we risk emitting code like:
```glsl
int tmp = nonUniformEXT(someIndex);
vec4 result = texelFetch(arrayOfTextures[tmp], ...);
```
The problem with that code is that (at least by my reading of the spec), assigning to the variable `tmp` that isn't declared with the `nonUniformEXT` qualifier effectively loses that qualifier, and drivers are free to assume that `tmp` is uniform when used to index into `arrayOfTextures`.
Marking the `NonUniformResourceIndex` function as `[__readNone]` indicates that it has no side effects, which should mean that our emit logic no longer needs to emit it was its own line of code to be safe.
The effects of this change are confirmed by both the new test case added, and the existing `non-uniform-indexing` test.
|
| | |
|