slang.git/tests/bindings/glsl-parameter-blocks.slang.glsl, branch master

[SPIRV] Support `globallycoherent` and `[vk::index()]`. (#3488)

2024-01-24T23:36:49+00:00

* [SPIRV] Support `globallycoherent` modifier. * Fix. * Disable executable cooperative vector tests. * Update expected failure. * [SPIRV] Emit varying output index decoration. * Add test. * Update tests. * Fix test. * Emit `SpvExecutionModeEarlyFragmentTests`. * Lower `StructuredBuffer`. * Support globallycoherent on ByteAddressBuffer. --------- Co-authored-by: Yong He

SPIR-V image operations (#3163)

2023-09-05T15:26:59+00:00

* Add __truncate and __sampledType for spirv_asm Allows some texture tests to start passing * add __isVector Currently unused * Add 1-vector legalization pass (WIP) * Add capabilities for image types * neaten instruction dumping * add 1-vector test * Add a couple of cases to vec1 legalization * Remove texture tests from expected failures * comment * regenerate vs projects * Remove redundant define from synchapi emulation * refactoring image methods * All sample functions refactored * Remove incorrect glsl intrinsics Partially addresses https://github.com/shader-slang/slang/issues/3174 * __subscript image ops via writing funcs * Extract texture struct writing from core.meta.slang * Abstract out cuda intrinsic * Remove erroneous call to opDecorateIndex * spirv asm IR utils * Correct position of loads for SPIR-V asm inst operands * Raise constructors to global scope during spir-v legalization * Correct snippet output * Implement most texture sampling ops for SPIR-V * Legalize 1-vectors for glsl too * Make SPIR-V inst operands non-hoistable * Better 1-vector legalization * Put textures in ptrs for spirv * insert missing break * Add vec1 legalization test * Add some missing pieces to slang-ir-insts * Greatly neaten vec1 legalization * a * Neaten vec1 legalization * Add image read and write intrinsics for spir-v * Squash warnings * regenerate vs projects * Drop redundant guards * Drop 5 tests from expected failure list * Inst numbering changes to cross compile tests * vec1 legalization tests only on vk * Correct location of asm op emit * Inline constant in spirv-asm * Correct signedness for lane in wave intrinsics * Extract element from float1 for cuda * squash warnings * Neaten spirv-emit * dedupe more capabilities * warnings * neaten assert * comments * comments

Various dxc/fxc compatibility fixes. (#2863)

2023-05-03T03:29:38+00:00

* Various dxc/fxc compatibility fixes. * Cleanup. * Fix test cases. * Fix comments. --------- Co-authored-by: Yong He

Detect and deduplicate read-only resource access. (#2680)

2023-02-27T23:18:07+00:00

* Detect and deduplicate read-only resource access. * Fix tests. * Fix tests. --------- Co-authored-by: Yong He

Change how buffers are emitted (#741)

2018-12-07T21:31:06+00:00

* Change how buffers are emitted

This is a change with a lot of pieces, which can't always be separated out cleanly. I'm going to walk through them in what I hope is a logical order.

The main goal of this change was to allow arrays of structured buffers to translate to Vulkan. Consider two declarations of structured buffers in HLSL/Slang:

```hlsl
StructuredBuffer<X> single;
StructuredBuffer<Y> multiple[10];
```

The current translation logic was handling `single` by translating it into an *unnamed* GLSL `buffer` block like:

```glsl
layout(std430) buffer _S1
{
    X single[];
};
```

That syntax allows an expression like `single[i]` in Slang to be translated simply as `single[i]` in GLSL. But that naive translation doesn't work for `multiple`, since we need to declare an array of blocks in GLSL, which requires giving the whole thing a name:

```glsl
layout(std430) buffer _S2
{
    Y _data[];
} multiple[10];
```

Now a reference to `multiple[i][j]` in Slang needs to become `multiple[i]._data[j]` in GLSL. To avoid having way too many special cases around single structured buffers vs. arrays, it makes sense to always emit things in the latter form, so that we instead lower `single` as:

```glsl
layout(std430) buffer _S1
{
    X _data[];
} single;
```

So that now a reference to `single[i]` becomes `single._data[i]` in GLSL. Most of that can be handled in the standard library translation of the structured buffer indexing operations. The only wrinkle there is that there were some *old* special-case instructions in the IR intended to handle buffer load/store operations (these were added back when I was trying to keep the "VM" path working). These aren't really needed to have structured-buffer operations work; they can be handled as ordinary functions as far as the stdlib is concerned. I removed the old instructions.

Along the way, it became clear that a few other cases follow the same pattern. Byte-addressed buffers are an obvious case.
We were lowering HLSL/Slang:

```hlsl
ByteAddressBuffer b;
...
uint x = b.Load(0);
```

to GLSL like:

```glsl
layout(std430) buffer _S1
{
    uint b[];
};
...
uint x = b[0];
```

That logic would fail for arrays the same way that the structured buffer case was failing. The fix is the same: use named `buffer` blocks and then introduce an explicit `_data` field:

```glsl
layout(std430) buffer _S1
{
    uint _data[];
} b;
...
uint x = b._data[0];
```

Just like with structured buffers, all of the VK translation for operations on byte-addressed buffers can be implemented directly in the stdlib, so once the emit logic was changed it was just a matter of adding `._data` to a bunch of VK translations.

It turns out that arrays of constant buffers have more or less the same problem, and furthermore we have some problems with any code that directly uses the modern HLSL `ConstantBuffer<T>` type.

Note: the emit logic around constant buffers sometimes refers to "parameter groups" because that is being used in the compiler as a catch-all term for constant buffers, texture buffers, and parameter blocks.

The existing code was going out of its way to reproduce the way that constant buffer declarations are implicitly referenced in HLSL:

```hlsl
cbuffer C
{
    float f;
}
...
float tmp = f; // No reference to `C` here
```

This can be seen in the emit logic with the `isDerefBaseImplicit` function, which is used to take the internal IR representation for a reference to `f` (which is closer to the expression `(*C).f` or `C->f`) and leave off any reference to `C` so that we emit just `f`.

That kind of logic just flat out doesn't work in some important cases. Arrays of constant buffers are a clear one:

```hlsl
ConstantBuffer<X> cbArray[3];
...
X x = cbArray[0];
```

There is no way to translate that to an ordinary `cbuffer` declaration at all. The same problem can be created without arrays, though:

```hlsl
ConstantBuffer<X> singleCB;
...
X x = singleCB;
```

The current strategy for translating constant buffers was translating `singleCB` into a `cbuffer` declaration that reproduced the fields of `X` as its members, which just wouldn't work:

```hlsl
cbuffer singleCB
{
    float f; // field of `X`
}
...
X x = singleCB; // ERROR: there is nothing named `singleCB` in this HLSL
```

The new strategy is more consistent. We still generate a `cbuffer` declaration for a single constant buffer, but we always give it a single field of the chosen element type:

```hlsl
cbuffer singleCB
{
    X singleCB;
}
...
X x = singleCB; // this works fine!
```

And in the array case we generate code that uses the explicit `ConstantBuffer<X>` type:

```hlsl
ConstantBuffer<X> cbArray[3];
...
X x = cbArray[0];
```

The GLSL output is more complicated because unlike with HLSL there is no implicit conversion from a uniform block to its element type (there is no notion of an element type). The array case thus needs a `_data` field similar to what we do for structured buffers:

```glsl
layout(std140) uniform _S3
{
    X _data;
} cbArray[3];
...
X x = cbArray[0]._data;
```

And then the non-array case needs to have a similar `_data` field for consistency:

```glsl
layout(std140) uniform _S1
{
    X _data;
} singleCB;
...
X x = singleCB._data;
```

This is handled by inserting the necessary reference to `_data` whenever we dereference a constant buffer, either as part of a load instruction (loading from the whole CB as a pointer), or an `IRFieldAddress` instruction which forms a pointer into the CB (e.g., `&(singleCB->f)` becomes `singleCB._data.f`).

The current emit logic handles `ParameterBlock` differently from `ConstantBuffer`, but really only to allow parameter blocks to be explicitly named in the output, while constant buffers were left implicit by default.
Thus the only difference was a legacy one (from back when trying to exactly reproduce the HLSL text we got as input was considered an important goal), and the new approach to emitting constant buffers would get rid of it. I removed the separate logic for emitting `ParameterBlock` and just let the handling for constant buffers deal with it. Note that any resource types inside of a `ParameterBlock` would have been moved out as part of legalization, so that a parameter block is 100% equivalent to a constant buffer when it comes time to emit code.

Unsurprisingly, changing the way we generate HLSL and GLSL output for all these buffer types meant that any tests that were directly comparing the output of `slangc` against `fxc`, `dxc`, or `glslang` broke.

The basic approach to fixing the breakage in GLSL tests was to update the GLSL baseline to reflect the new output strategy. In some cases I used macros to name the various `_S` temporaries so that future renaming will hopefully be easier (it would be great if we auto-generated temporary names with a bit more context). There was one GLSL test (`tests/bugs/vk-structured-buffer-binding`) that was using raw GLSL expected output, and this was changed to use a GLSL baseline to generate SPIR-V for comparison.

For HLSL tests we were sometimes running the same input file through `slangc` and `fxc`/`dxc`, and in these cases I macro-ized the various `cbuffer` declarations to generate different declarations depending on the compiler.

I completely dropped the tests coming from the D3D SDK because they aren't providing much coverage, and updating them would change them so far from the original code that the purported benefit (using a body of existing shaders) would be lost.

I also dropped the explicit matrix layout qualifiers in the `matrix-layout` test because the new output strategy breaks those for GLSL (you can't put matrix layout qualifiers on `struct` fields, and now the body of every constant buffer is inside a `struct`).
This isn't as big of a loss as it seems, because our handling of those qualifiers wasn't really right to begin with. Slang users should only be setting the matrix layout mode globally (and we should probably switch to error out on the explicit qualifiers for now).

The other thing that got dropped is tests involving `packoffset` modifiers. Slang already warns that it doesn't support these, and the way they were used in the test cases is actually misleading. For the binding/layout-related tests, the goal was to show that Slang reproduces the same layout as fxc, in which case explicitly enforcing a layout via `packoffset` seems like cheating (are we sure we enforced the layout fxc would have produced?). The real reason was that Slang used to emit explicit `packoffset` on *every* field of a `cbuffer` it would output, because of an `fxc` bug where you couldn't use `register` on textures/samplers declared inside a `cbuffer` unless *every* field in the `cbuffer` used a `register` or `packoffset` modifier. Slang hasn't required that behavior in a while because it now splits textures and samplers, and the one test case where we needed `packoffset` to work around the `fxc` bug in the baseline HLSL has been macro-ified even more to work around the bug.

The amount of churn in the test cases is unfortunate, but it continues to point at the weakness of any testing strategy that checks for exact equivalence between Slang's output and that of other compilers. We need to keep working to replace these tests with better alternatives.

In `check.cpp` there is logic to perform implicit dereferencing, so that if you write `obj.f` where `obj` is a `ConstantBuffer<X>` (or some other "pointer-like" type) and `f` is a field in `X`, then this effectively translates as `(*obj).f`. That is, we dereference the value of type `ConstantBuffer<X>` to get a value of type `X`, and then refer to the field of the `X` value.
There was a problem where the logic to insert that kind of implicit dereference operation was using a reference (`auto& type = ...`) for the type of the expression being dereferenced, and then clobbering it. This would mean that an expression of type `ConstantBuffer<X>` would have its type overwritten to be just `X` and then codegen would break later on. I'm not sure how we haven't run into that before.

The `array-of-buffers` test case was added to confirm that we now support arrays of constant, structured, and byte-address buffers for both DXIL and SPIR-V output.

Okay, so that was a lot of stuff, but hopefully it is clear how this all works to make the output of the compiler more consistent and explicit, while also supporting the required new functionality.

* fixup: review feedback
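
The named-block strategy from the buffer-emission commit above can be illustrated with a small sketch. This is not Slang's emit code (which is C++); the helper names `emit_buffer_block` and `emit_element_ref` are hypothetical, and the sketch only reproduces the output shapes quoted in the commit message: every structured buffer becomes a named `buffer` block with a `_data` field, and every element access goes through `._data`.

```python
from typing import Optional


def emit_buffer_block(block_name: str, elem_type: str, var_name: str,
                      array_size: Optional[int] = None) -> str:
    """Return a named GLSL `buffer` block with an explicit `_data` field.

    Hypothetical helper: a single buffer and an array of buffers share
    one code path; only the optional `[N]` suffix differs.
    """
    suffix = f"[{array_size}]" if array_size is not None else ""
    return (f"layout(std430) buffer {block_name}\n"
            "{\n"
            f"    {elem_type} _data[];\n"
            f"}} {var_name}{suffix};")


def emit_element_ref(buffer_expr: str, index_expr: str) -> str:
    """Translate a Slang-style `buf[i]` into GLSL `buf._data[i]`."""
    return f"{buffer_expr}._data[{index_expr}]"


print(emit_buffer_block("_S1", "X", "single"))
print(emit_buffer_block("_S2", "Y", "multiple", 10))
print(emit_element_ref("single", "i"))
print(emit_element_ref("multiple[i]", "j"))
```

Because the single and array cases differ only in the declaration suffix, the indexing translation needs no special case, which is the consistency argument the commit message makes.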

Remove the "hack sampler" workaround (#648)

2018-09-21T18:12:23+00:00

* Update glslang version

* Fix build for new glslang

The latest glslang required a few changes to our manual build for their code (because we are *not* taking a dependency on CMake).

* Rebuild project files using premake, which picks up a few files added to glslang, but also a few diffs in Slang's own project files in cases where they were edited manually instead of using premake.

* Fix up the declaration of our device limits (which are intentionally set to *not* limit what code passes through our glslang), because the underlying structure definition in glslang has changed. This is a kludgy bit of glslang's design, but it doesn't make sense for us to invest in a more serious workaround.

* Remove the "hack sampler" workaround

When the `GL_KHR_vulkan_glsl` spec was introduced to allow GLSL to be compiled for Vulkan SPIR-V, it made an annoying mistake by leaving a few builtins as taking `sampler2D`, etc. when the equivalent SPIR-V operations only require a `texture2D`, etc. The relevant builtins are:

* `textureSize`
* `textureQueryLevels`
* `textureSamples`
* `texelFetch`
* `texelFetchOffset`

This means that shader code that wanted to use those operations needed to conspire to have a `sampler` handy so they could write, e.g.:

```glsl
vec4 val = texelFetch(sampler2D(myTexture, someRandomSampler), p, lod);
```

when what they really wanted was this:

```glsl
vec4 val = texelFetch(myTexture, p, lod);
```

That is annoying but probably something easy to work around for a GLSL programmer, but when cross-compiling from HLSL, you might have an operation like:

```hlsl
float4 val = myTexture.Load(p);
```

in which case a cross-compiler needs to manufacture a sampler out of thin air. If the shader happened to use a sampler for something else you could snag that, but in the worst case you had to cross-compile to GLSL that declared a new sampler.
Slang did this by declaring a sampler called `SLANG_hack_samplerForTexelFetch` (because `texelFetch` is the operation that first surfaced the issue). For complex reasons we *always* define this sampler, even if we turn out not to need it in a particular output kernel.

This choice has a bunch of annoying consequences:

* There is *always* a sampler defined in descriptor set zero, because that's where we put the hack sampler, so a user-defined parameter block always has a set number of 1 or greater (see #646).

* The hack sampler shows up in reflection output because users need to size their descriptor sets appropriately to pass along this sampler that won't actually be used if they don't want to get debug spew from the validation layers.

We filed an issue on glslang about this problem, and eventually some kind folks from the gamedev community (who also saw the same problem) defined an extension spec (`GL_EXT_samplerless_texture_functions`) to fix the underlying issue and contributed a patch to glslang to make it support that extension.

This change just backs the hack out of Slang now that we have a glslang version that supports the extension to get past the defect in the original GLSL-for-Vulkan definition. Besides yanking out the code for the hack, we also change the relevant builtins to declare that they require this new GLSL extension (so that we properly request it from glslang when the builtins are used), and fix some reflection test cases that exposed the existence of the "hack sampler."

* Fixup: syntax error in stdlib generator files

* Remove more code for hack sampler

There was logic to ensure we always have a "default" register space/set when cross-compiling, because the hack sampler would need it. This is no longer necessary once we remove the hack sampler.

* Fix expected test output.
Fixing the root cause of issue #646 means that one of our test cases that tickles that issue now produces different output (luckily it can now be used as a regression test for the issue).
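
The idea that builtins declare the extension they require, so the extension is requested only when such a builtin is used, might be sketched as follows. This is an assumption-laden illustration rather than Slang's stdlib mechanism: the table `SAMPLERLESS_BUILTINS` and the function `required_extension_lines` are hypothetical names, and only the builtin list and the extension name come from the commit message.

```python
# Builtins that the commit message lists as accepting a bare texture
# once GL_EXT_samplerless_texture_functions is enabled.
SAMPLERLESS_BUILTINS = {
    "textureSize",
    "textureQueryLevels",
    "textureSamples",
    "texelFetch",
    "texelFetchOffset",
}
SAMPLERLESS_EXT = "GL_EXT_samplerless_texture_functions"


def required_extension_lines(builtins_used):
    """Return the GLSL `#extension` lines needed for the builtins used.

    Hypothetical helper: the extension is requested only if at least one
    of the samplerless builtins actually appears in the output.
    """
    extensions = set()
    for name in builtins_used:
        if name in SAMPLERLESS_BUILTINS:
            extensions.add(SAMPLERLESS_EXT)
    return [f"#extension {ext} : require" for ext in sorted(extensions)]


print(required_extension_lines(["texelFetch", "texture"]))
print(required_extension_lines(["texture"]))
```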

Allow more complex compound expressions when emitting from IR (#552)

2018-05-04T19:01:30+00:00

The emit logic already had an idea of when an instruction should be "folded" into its use site(s), and this change just expands on that logic to try to be more aggressive.

The basic idea is that instead of outputting this:

```hlsl
float4 _S3 = a_0 + b_0;
float4 _S4 = c_0 * _S3;
d_0 = _S4;
```

we can hopefully output something like this:

```hlsl
d_0 = c_0 * (a_0 + b_0);
```

The way this works is that after dealing with the various special cases that decide an instruction `I` must/cannot be folded in, we look and see if it has the following properties:

* `I` has no side effects
* `I` has a single user, `U`
* `I` and `U` are in the same block (and `I` comes before `U` in that block)
* for every instruction `X` between `I` and `U` (exclusive), `X` has no side effects

If all of these conditions are true, then `I` can be folded in as a sub-expression when we emit `U`.

This change doesn't affect most of our test output, but there is still a single test with SPIR-V output that we compare against a GLSL baseline, and so that baseline had to be modified to match the GLSL we now generate.

Similar to #547, this change is not meant to provide a complete solution, but rather to take a concrete but low-risk step toward improving our output. Opportunities to improve the results further include:

* We can/should ensure that when outputting sub-expressions we keep extra parentheses to a minimum. The old logic for emitting from an AST had support for "unparsing" expressions with minimal parentheses, and we should try to do the same. This can be error-prone, because omitting parentheses can lead to silent failures, so it must be done carefully.

* We could try to be more aggressive about detecting what operations might have side effects. The most interesting case is function calls, where we should try to check if the callee is a function known to be side-effect-free. We could start by annotating most builtin functions with an attribute/decoration that indicates freedom from side effects.
Deriving this attribute for user functions could be interesting, but we'd have to be careful since "nontermination" is technically a side effect.

* We could try to be more aggressive about determining what side effects in instructions `X` are "safe" for the instruction `I` to move across. For example, if `I` is a load from variable `a` and `X` is a store to variable `b`, then that would seem to be safe. This starts to get into issues of instruction scheduling, though, and that is probably beyond what we want Slang to be doing.
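
The four folding conditions listed in this commit can be written down as a small predicate. The sketch below is a toy model, not Slang's emit logic: the `Inst` class, its `has_side_effects` flag, and the `can_fold` function are hypothetical, instructions refer to operands by name, and the special cases that force or forbid folding are left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # compare instructions by identity, not by value
class Inst:
    name: str
    operands: List[str] = field(default_factory=list)
    has_side_effects: bool = False


def can_fold(inst: Inst, block: List[Inst], all_blocks: List[List[Inst]]) -> bool:
    """Check the four conditions under which `inst` may be folded into its user."""
    # 1. `inst` has no side effects.
    if inst.has_side_effects:
        return False
    # 2. `inst` has a single user.
    users = [u for b in all_blocks for u in b if inst.name in u.operands]
    if len(users) != 1:
        return False
    user = users[0]
    # 3. `inst` and its user are in the same block, with `inst` first.
    if inst not in block or user not in block:
        return False
    i, u = block.index(inst), block.index(user)
    if i >= u:
        return False
    # 4. Nothing strictly between them has side effects.
    return all(not x.has_side_effects for x in block[i + 1:u])


# The example from the commit message: _S3 = a_0 + b_0; _S4 = c_0 * _S3; d_0 = _S4
add = Inst("_S3", ["a_0", "b_0"])
mul = Inst("_S4", ["c_0", "_S3"])
store = Inst("store_d_0", ["d_0", "_S4"], has_side_effects=True)
block = [add, mul, store]
print(can_fold(add, block, [block]), can_fold(mul, block, [block]))
```

In this model both `_S3` and `_S4` are foldable, which corresponds to emitting the single statement `d_0 = c_0 * (a_0 + b_0);`.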

Pass through original names for most declarations (#547)

2018-05-03T23:34:49+00:00

The basic idea here is that when lowering to the IR, the front-end will attach a "name hint" to the IR instruction(s) that represent a given declaration, and then the passes that work on the IR will try to preserve and propagate those names, and then finally the emit logic will use them in place of mangled or unique names when available.

This change does *not* try to deal with the issues that arise when we try to use those variable names in the output without any modification (e.g., handling cases where they might clash with keywords or builtins in the target language). Instead, it tries to establish baseline behavior for propagating through names, so that a later change can concentrate on the issue of using those names exactly when it is legal to do so.

In order to avoid issues around the name "hints" causing problems we take two main steps:

1. We "scrub" each name to reduce it down to the allowed set of identifier characters in C-like languages, and then ensure that it doesn't do things that would be illegal in some downstream languages (e.g., consecutive underscores are not allowed in GLSL) or could clash with Slang's mangled names. This process isn't guaranteed to give distinct results for distinct inputs (it isn't a mangling scheme, after all).

2. We generate a unique ID for each occurrence of a given name and always use that as a suffix. This means that even if a name happens to overlap with a keyword (if you somehow have a variable named `do`), we will still add a suffix that makes it not a problem (we'd output `do_0` which is fine).

The logic for generating these names is mostly straightforward. For simple variables, we use their given name directly, while for other declarations we try to form a name that includes their parent declaration (e.g. `SomeType.someMethod`).

Various IR passes need to propagate or preserve this information.
The most interesting is type legalization, when we take a variable with an aggregate type and split some of the fields out into their own variables. In that case we generate "dotted" names like `someVar.someTexture` and rely on the emit logic to turn that into `someVar_someTexture`.

During SSA generation, if we are promoting a variable to SSA temporaries, we will try to propagate the name of the variable over to the temporaries (unless they already have a name from some other place). The same applies to block parameters ("phi nodes").

Many of the tests needed their expected output to be updated for this change. Luckily in most cases the output has gotten easier to understand.
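
The two steps described in this commit (scrub the name hint, then always append a unique numeric suffix) could look roughly like the sketch below. The exact scrubbing rules Slang applies are not spelled out in the message, so the rules here are assumptions: non-identifier characters become underscores, runs of underscores are collapsed, trailing underscores are dropped, and an empty or digit-leading result gets a `v` prefix. `scrub_name` and `NameAllocator` are hypothetical names, and the sketch does not try to avoid clashes with Slang's mangled names.

```python
import re
from collections import defaultdict


def scrub_name(hint: str) -> str:
    """Reduce a name hint to identifier characters (assumed rules)."""
    s = re.sub(r"[^A-Za-z0-9_]", "_", hint)  # e.g. '.' in dotted names
    s = re.sub(r"_+", "_", s)                # GLSL forbids consecutive underscores
    s = s.rstrip("_")                        # so the "_<id>" suffix cannot create "__"
    if not s or s[0].isdigit():
        s = "v" + s                          # identifiers cannot start with a digit
    return s


class NameAllocator:
    """Append a per-name counter so every emitted name is unique."""

    def __init__(self) -> None:
        self._counts = defaultdict(int)

    def unique(self, hint: str) -> str:
        base = scrub_name(hint)
        n = self._counts[base]
        self._counts[base] += 1
        return f"{base}_{n}"


names = NameAllocator()
print(names.unique("do"))                   # keyword-like hint still gets a suffix
print(names.unique("someVar.someTexture"))  # dotted name from type legalization
print(names.unique("do"))
```

As the commit message notes, scrubbing alone is not injective; in this sketch uniqueness comes only from the counter, which is keyed on the scrubbed base name.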

Introduce an IR-level type system (#481)

2018-04-11T23:18:29+00:00

* Introduce an IR-level type system

Up to this point, the Slang IR has used the front-end type system to represent types in the IR. As a result (but ultimately more importantly) the IR representation of generics and specialization has used AST-level concepts embedded in the IR. For example, to express the specialization of `vector` to a concrete type `float` for `T`, we needed an IR operation that could represent the specialization, with operands that somehow represented the type argument `float`. The whole thing was very complicated.

The big idea of this change is to introduce a new representation in which types in the IR are just ordinary instructions, so that using them as operands makes sense. The hierarchy of IR types closely mirrors the AST-side hierarchy for now, and that will probably be something we should maintain going forward.

In order to make these changes work, though, I also had to do major overhauls of things like the way substitutions are performed, how we check interface conformances, the way lookup through interface types is done, etc. etc.

This is a big change, and unfortunately any attempt to summarize it in the commit message wouldn't do it justice.

* Fix 64-bit build warning

* Fix up some clang warnings/errors

Generate SSA form for IR functions (#400)

2018-02-07T22:37:37+00:00

* Generate SSA form for IR functions

The basic idea here is simple: in the front-end after we have lowered the AST to initial IR we will apply a set of "mandatory" optimization passes. The first of these is to attempt to translate all functions into SSA form so that they are amenable to subsequent dataflow optimizations. Eventually, the mandatory optimization passes would include diagnostic passes that make sure variables aren't used when undefined, etc.

Just doing basic SSA generation already cleans up a lot of the messiness in our IR today, because constructs that used to involve many local variables can now be handled via SSA temporaries.

The implementation of SSA generation is in `ir-ssa.cpp`, and it follows the approach of Braun et al.'s "Simple and Efficient Construction of Static Single Assignment Form." I used this instead of the more well-known Cytron et al. algorithm because Braun's algorithm is very simple to code, and does not require auxiliary analyses to generate the dominance frontier.

The main wrinkle in our SSA representation right now is that instead of using ordinary phi nodes, we instead allow basic blocks to have parameters, where predecessor blocks pass in different parameter values. This encodes information equivalent to traditional phi nodes, but has two (small) benefits:

1. There is no fixed relationship between the order of phi operands and predecessor blocks, so we don't have to worry about breaking the phis when we alter the order in which predecessors are stored. This is important for us because predecessors are being stored implicitly.

2. It is easy to operationalize a "branch with arguments" either when lowering to other languages, or when interpreting the IR. A branch with arguments is implemented as a sequence of stores from the arguments to the parameters of the target block (very similar to a call), followed by a jump to the block.
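
The "branch with arguments" model described above can be illustrated with a toy interpreter. This is only a sketch under simplifying assumptions, not the Slang IR: blocks are modeled as `(params, body)` pairs, `body` is a Python callable that returns either `("jump", target, args)` or `("return", value)`, and the names `run` and `blocks` are hypothetical. Arguments are attached only to unconditional jumps; the conditional branch in `head` targets parameterless blocks.

```python
def run(blocks, entry, entry_args):
    """Interpret a CFG whose blocks take parameters instead of phi nodes."""
    env = {}
    target, args = entry, entry_args
    while True:
        params, body = blocks[target]
        # A branch with arguments: store each argument into the matching
        # parameter of the target block, then "jump" by running its body.
        env.update(zip(params, args))
        result = body(env)
        if result[0] == "return":
            return result[1]
        _, target, args = result


# Sums 0 + 1 + ... + (n - 1); `i` and `acc` play the role of phi nodes.
blocks = {
    "entry": (["n"], lambda e: ("jump", "head", [0, 0])),
    "head": (["i", "acc"],
             lambda e: ("jump", "body", []) if e["i"] < e["n"]
             else ("jump", "done", [])),
    "body": ([], lambda e: ("jump", "head", [e["i"] + 1, e["acc"] + e["i"]])),
    "done": ([], lambda e: ("return", e["acc"])),
}

print(run(blocks, "entry", [5]))
```

Note that the argument list is fully evaluated before any parameter is overwritten, so the stores behave like a parallel copy.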
Relevant to the above, this change also adds an interface for enumerating the predecessors or successors of a block in our CFG. Rather than use an auxiliary structure, we directly use the information already encoded in the IR:

* The successors of a block are the target label operands of its terminator instruction. In our IR this is a contiguous range of `IRUse`s, possibly with a stride (to account for the way `switch` interleaves values and blocks).

* The predecessors of a block are a subset of the uses of the block's value. Specifically, they are any uses that are on a terminator instruction, and within the range of values that represent the successor list of that instruction.

One important limitation of the "blocks with arguments" model for handling phis is that it is really only convenient to stash extra arguments on an unconditional terminator instruction. This change works around this problem by breaking any "critical edges" - edges between a block with multiple successors and one with multiple predecessors. We assume that "phi" nodes will only ever be needed on a block with multiple predecessors, and because critical edges are broken, each of these predecessors will then have only a single successor, so its branch instruction can handle the extra arguments.

This change introduces a notion of an "undefined" instruction in the IR. This is handled as an instruction rather than a value because I anticipate that we will want to distinguish different undefined values when it comes time to start issuing error messages (those messages will need to point to the variable that was used when undefined).

* Fix expected test output.

Another change was merged that enabled the `glsl-parameter-blocks` test, and its output is affected by our IR optimization work.
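
Critical-edge breaking, as described in this commit, can be sketched on a plain successor map. This is an illustrative model rather than the Slang implementation: the CFG is assumed to be a dict from block name to an ordered list of successor names, and `split_critical_edges` and the generated block names are hypothetical.

```python
def split_critical_edges(succs):
    """Insert an empty block on every edge from a multi-successor block
    to a multi-predecessor block, and return the new successor map."""
    # Count incoming edges per block.
    pred_count = {}
    for block, targets in succs.items():
        for t in targets:
            pred_count[t] = pred_count.get(t, 0) + 1

    result = {block: list(targets) for block, targets in succs.items()}
    counter = 0
    for block, targets in succs.items():
        if len(targets) < 2:
            continue  # only blocks with multiple successors start critical edges
        for k, t in enumerate(targets):
            if pred_count.get(t, 0) < 2:
                continue  # target has a single predecessor: edge is not critical
            new_block = f"{block}_to_{t}_{counter}"
            counter += 1
            result[block][k] = new_block  # redirect the edge through the new block
            result[new_block] = [t]       # new block jumps unconditionally to t
    return result


# A branches to B or C, and B also jumps to C, so the edge A -> C is critical.
cfg = {"A": ["B", "C"], "B": ["C"], "C": []}
print(split_critical_edges(cfg))
```

After splitting, every predecessor of the multi-predecessor block `C` ends in an unconditional jump, which is where the extra block arguments can be attached.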