summaryrefslogtreecommitdiffstats
path: root/source/slang/slang-ast-dump.cpp
Commit message (Collapse)AuthorAge
* Mediate access to ContainerDecl members (#7242)Theresa Foley2025-06-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most of what this change does is straightforward: take all the places in the code that used to operate directly on `ContainerDecl::members` and related fields, and instead have them call into a smaller set of accessor methods defined on `ContainerDecl`. The primary motivation for making this change is that in order to implement on-demand loading of members from serialized AST modules, we need a way to identify and intercept the "demand" for those members. On-demand loading benefits from having all accesses to the members of a `ContainerDecl` be as narrow as possible. If a part of the code only need a member at a specific index, it should say so. If it only needs access to members with a specific name, or a given subclass of `Decl`, then it should say so. A secondary motivation for this change is that there have recently been several changes that added complexity and special cases by introducing code that operated on (and *mutated*) the member list of a container decl in ways that the existing code had never done before. Any code that mutates the member list of a `ContainerDecl` needs to be sure to not disrupt the invariants that the lookup acceleration structures currently rely on. One of the recent changes added a declaration-to-index map to the set of acceleration structures (with different validation/invalidation behavior than the others...) while other recent changes would remove or insert declarations in ways that could change the indices of other declarations in the same container. It is not clear if any of these pieces of code were aware of the others, and the invariants that might be expected or broken along the way. This change bottlenecks the vast majority of accesses to the members of a `ContainerDecl` through the following operations: * Getting a `List` of all of the direct member declarations of a container * Get the number of direct member declarations, and accessing them by index. * Looking up the list of direct member declarations with a given name. * Adding a new direct member declaration to the end of the list. Some other operations are layered on top of those (e.g., getting a list of all the direct member declarations of a given C++ class). These layered operations are still centralized on the `ContainerDecl`, with the intention that we *can* change them to be non-layered implementations if we ever need to for performance (e.g., by building a lookup structure for finding member declarations by their type). The exceptional cases of access/mutation on the direct members of a `ContainerDecl` have also been encapsulated, but rather than expose what would risk appearing like general-purpose accessors (e.g., `removeDecl(d)`, `setDecl(index)`, etc.), these operations have been explicitly named after the specific use case that they serve in the codebase today, to discourage others from using them for more kinds of operations we'd rather not support. These operations have also been given parameter signatures that match their use cases, to make it so that even somebody determined to abuse them would have to invent suitable arguments out of thin air. In the case of the declaration-to-index mapping, this change eliminates that acceleration structure, in favor or slightly more complicated (and possibly inefficient, yes) code at the use site. Over time, it would be good to closely scrutinize each of the use cases that requires more complicated interaction with the members of a `ContainerDecl`, to see whether any of them can be reframed in terms of the more basic operations, or if there is some clean abstraction we can introduce to make operations that mutate the member list feel like... hacky.
* Generalize serialization system used for AST (#7126)Theresa Foley2025-05-21
| | | | | | | | | | | | | | | This change takes the new approach to serialization that was used for the AST and generalizes it in a few ways: * The new approach is no longer tangled up with the RIFF format. The serialization system supports multiple different implementations of the underlying format. The existing RIFF format is now supported as one back-end, but support for others will follow in subsequent changes. * The new approach is no longer deeply specialized to AST serialization. The old code had things like serialization for `List`s and `Dictionary`s, but it was embedded inside the `AST{Encoding|Decoding}Context`, and thus couldn't be leveraged for other serialization tasks. This change factors out a completely AST-independent `Serializer` implementation, with an `ASTSerializer` layered on top of it to provide the additional context needed. * There is less duplication of code between reading and writing of serialized data. The old code had both the `ASTEncodingContext` and `ASTDecodingContext`, with serialization logic for most types being implemented in both, but with the constraint that those implementations needed to be kept in sync to avoid serialization-related runtime failures. A key property of the revamped approach is that a single `serialize()` method for a type implements both the reading and writing directions of serialization.
* A new approach to AST serialization (#6854)Theresa Foley2025-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * A new approach to AST serialization This change completely overhauls the way that AST nodes are being serialized, and the offline source-code generation steps that enable that serialization. In practice, this ends up being a complete overhaul of the way that *modules* are being serialized (not just the AST part), although things like the serialization format for the Slang IR and for source locations are not affected. The rest of this commit message is broken down in to sections, in an attempt to help guide anybody looking at the code in how to make sense of all the changes. The Old C++ Extractor --------------------- AST serialization used to be driven by information scraped using the `slang-cpp-extractor` tool, which did an ad hoc parse of the C++ declarations of the AST node types and then generated a set of "X macros" that could be for macro-based code generation within the rest of the compiler. While the existing approach was functional, it wasn't easy to understand or maintain, and it has been getting in the way of forward progress on other features we'd like to work on in the language and compiler. This change removes the `slang-cpp-extractor` tool entirely. Marking Up the AST Declarations ------------------------------- The most notable change that contributors to the compiler may notice is the large number of invocations of a macro `FIDDLE()` on the declarations of the AST node types. The basic idea is that only declarations (namespaces, types, fields) that are preceded by `FIDDLE()` are visible to the code generator tool. So if somebody is working with the AST and wondering why a new node type isn't working, or why a field they added isn't being serialized correctly, it is probably because they need to add `FIDDLE()` in front of it. Generating the Boilerplate Code ------------------------------- The file `slang-ast-boilerplate.cpp` provides a good example of how the information extracted from the marked-up AST declarations gets used. In that file, the `FIDDLE TEMPLATE` construct is used to generate type information for each of the AST node types. Similar logic is used in `slang-ast-forward-declarations.h` to generate the declaration of the `ASTNodeType` enumeration, and forward-declare all the AST node classes. For many parts of the code, simply including that file replaces the need for the old `slang-generated-*.h` files. Replacing Visitors and Related Logic ------------------------------------ The old visitor types for the AST used the macros that were generated by `slang-cpp-extractor`, so something new was needed to replace them. The same goes for the `SLANG_AST_NODE_VIRTUAL_CALL` macros. The core of the solution implemented here is in `slang-ast-dispatch.h`. Given a "dispatchable" AST node type (say, `Expr`), a call like: ``` ASTNodeDispatcher<Expr,R>(expr, [&](auto e) { return doSomething(e); }) ``` is an expression of type `R`, which does the equivalent of something like: ``` switch(expr->getTag()) { case ASTNodeType::VarExpr: return doSomething(static_cast<VarExpr*>(expr)); // ... } ``` The `SLANG_AST_NODE_VIRTUAL_CALL` macro is now implemented in terms of `ASTNodeDispatcher`. The implementation of the visitor types is more involved. The code in this change retains some of the macro names from the original version, just to try and make the parallels more clear. The visitor types are all implemented on top of the `ASTNodeDispatcher` approach, and use `FIDDLE TEMPLATE` to generate all the boilerplate `visit*()` method declarations. Refactoring of `Linkage` Module Loading --------------------------------------- Needing to revisit all the places where modules get deserialized made it clear that there is a lot of complexity and apparent duplication in the core routines on the `Linkage` that get used for loading modules. This change tries to clean up some of that logic, but it is worth noting that there are two legacy features that get in the way of making things as clean as they should be: * The `LoadedModuleDictionary` type that gets passed around a lot exists entirely to handle the corner case where somebody uses the Slang API to perform a compilation with multiple `TranslationUnitRequest`s in the same `FrontEndCompileRequest`, and one of the translation units `import`s the module defined by another of the translation units. * There are a lot of special-case behaviors and routines entirely there to support the `ModuleLibrary` feature, although that feature should be considered deprecated (or at least subject to getting entirely re-designed down the line). The basic idea of the cleanup is that all of the (non-deprecated) ways load a module from a serialized binary, or compile one from source should now bottleneck through `loadModuleImpl`, which then bifurcates into `loadSourceModuleImpl` for the compilation case and `loadBinaryModuleImpl` for the deserialization case. High-Level Serialization Approach --------------------------------- The old serialization logic used the [RIFF](https://en.wikipedia.org/wiki/Resource_Interchange_File_Format) format to encode the high-level structure of things, and this change retains that usage (and actually doubles down on the RIFF usage). The old serialization system relied on the idea that for any given type `Foo` that wants to support serialization, there should be something like a `SerialFooData` type in C++, that can represent the state of a `Foo`, and then the actual serialization applied to that `SerialFooData`. This means that in most cases there are four pieces of code written: * During serialization: * Copying the data of a `Foo` in memory over to a `SerialFooData` in memory * Writing the state of a `SerialFooData` into the serialized data stream * During deserialization: * Reading the state of a `SerialFooData` from a serialized data stream * Copying the data of the `SerialFooData` in memory over to a `Foo` The new logic gets rid of the intermediate `SerialFooData`. In the serialization direction, we take a `Foo` and write it to the `RIFFContainer` directly, or using some other utilities layered on top of it. In the deserialization direction, we have additional flexibility. Given a `RIFFContainer::Chunk*` that represents a serialized `Foo`, we often navigate through the in-memory representation of the RIFF data to get to the parts of the serialized value that we actually want/need, without needing to deserialize the entire `Foo`. To support this kind of operation, this change introduces a few helper types like `ContainerChunkRef` an `ModuleChunkRef`, that are little more than typed wrappers around a `RIFFContainer::Chunk*`. The Module "Container" Part --------------------------- A serialized `Module` is encoded as a RIFF chunk, using logic in `slang-serialize-container.cpp` - both before and after this change. This change reorganizes a lot of the code in that file, to account for the way that eliminating the intermediate `SerialContainerData` type streamlines the overall task of writing out the parts of the module. In the deserialization logic... there isn't really much to do in `slang-serialize-container.cpp`. Most of the logic in `slang.cpp` and `slang-module-library.cpp` that pertains to deserializing modules uses the `ModuleChunkRef`-based approach, and simply extracts the pieces of the serialized module that it needs. The Actual Serialization of the AST ----------------------------------- The actual AST serialization logic is in `slang-serialize-ast.cpp`. The basic approach in both the writing and reading directions is: * Use the `FIDDLE TEMPLATE` system to generate a set of functions, one for each AST node type, that recursively invoke the read/write logic on each field of that node (after recursively invoking the case for its direct superclass) * Use the `ASTNodeDispatcher` system to dispatch out to those functions whene reading or writing anything derived from `NodeBase` * For now, handle all types *not* derived from `NodeBase` by hand. There's a lot of room for improvement around that last item: it should be just as easy to generate the serialization and deserialization logic for other types that don't inherit from `NodeBase`, but the current change tries to err on the side of making the logic as explicit and simplistic as possible, rather than trying to get too clever too soon. The actual serialization *format* used for the AST is almost comically simplistic: the code uses hierarchical RIFF chunks to emulate a JSON-like structure. This is a very wasteful representation (e.g., a `bool` or a null pointer each take up *8 bytes*), but the goal for now is to start with the simplest thing that could possibly work, and only add more cleverness once we are sure it won't get in the way of important future improvements (like lazy/on-demand deserialization or IR and AST, to improve compiler startup times). The files `slang-serialize.{h,cpp}` have been co-opted to define a new pair of types `Encoder` and `Decoder` that are used for a more-or-less stream-oriented way or reading or writing RIFF chunks for the JSON-like structure. Almost everything related to the actual AST serialization could do with a cleanup pass, and some time spent on picking good/better names for everything. Smaller Stuff ------------- * Cleaned up a lot of code that was using bare `ASTNodeType` or the extractor's `ReflectClassInfo` type to consistently use `SyntaxClass`. * Fixed an apparent bug in how the destination-driven code genarator was handling `TryExpr`s * Fixed an apparent bug in how the GLSL legalization pass was handling translation of certain `SV_*` semantics. * format code * fixup: template errors caught by non-VS compilers * format code * fixup: more template errors * fixup: more stuff VS didn't catch * fixup: it's amazing VS doesn't catch these... * fixup: yet more template stuff VS ignores * fixup: more VS template nonsense * fixup: unreachable return macro usage * fixup: more unreacable returns * fixup: unused parameter * fixup: strict aliasing * fixup: allow missing entry point list chunk * fixup: wasm build script * fixup: AST changes since this PR was created --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com> Co-authored-by: Yong He <yonghe@outlook.com>
* Fix lowering of associated types in generic interfaces (#6600)Sai Praveen Bangaru2025-03-15
| | | | | | | | | | | * Fix lowering of associated types in generic interfaces. * Update diff-assoctype-generic-interface.slang * Fix-up lowering of differentiable witnesses for implicit ops * Update slang-ir-autodiff-transcriber-base.cpp * Fix issue with differentiating type-packs
* Use two-stage parsing to disambiguate generic app and comparison. (#6281)Yong He2025-02-05
| | | | | | | * Use two-stage parsing to disambiguate generic app and comparison. * Typo fix. * Update doc.
* Lower varying parameters as pointers instead of SSA values. (#5919)Yong He2025-01-07
| | | | | * Add executable test on matrix-typed vertex input. * Fix emit logic of matrix layout qualifier. * Pass fragment shader varying input by constref to allow EvaluateAttributeAtCentroid etc. to be implemented correctly.
* Move switch statement bodies to their own lines (#5493)Ellie Hermaszewska2024-11-05
| | | | | | | | | * Move switch statement bodies to their own lines * format --------- Co-authored-by: Yong He <yonghe@outlook.com>
* formatEllie Hermaszewska2024-10-29
| | | | | | | * format * Minor test fixes * enable checking cpp format in ci
* Enable emscripten builds to compile slang.dll to WebAssembly. (#5131)Anders Leino2024-09-25
| | | | | | | | | | | | | | | | | | | | | | | | | | * Compile fixes for Wasm The issues are all are due to 'long' types being 32 bits on WASM. - class members redeclared errors - << with StringBuilder and unsigned long is ambiguous This helps to address issue #5115. * Use the host executable suffix for generators Since the generators are run at build-time, we should not use CMAKE_EXECUTABLE_SUFFIX, which is the suffix for the target platform. Instead, define CMAKE_HOST_EXECUTABLE_SUFFIX as appropriate, and use that suffix instead. This helps to address issue #5115. * Add support for Wasm as a platform This helps to address issue #5115. * Add emscripten build This closes #5115.
* Tuple swizzling, concat, comparison and `countof`. (#4856)Yong He2024-08-19
| | | | | | | | | | | * Tuple swizzling and element access. * Update proposal status. * Cleanup. * Fix merrge error. * Address review.
* Capabilities System, CapabilitySet Logic Overhaul (#4145)ArielG-NV2024-05-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Capabilities System, Backing Logic Overhaul Fixes #4015 Problems to address: 1. Currently the capabilities system spends anywhere from 25-50% of compile time on the CapabilityVisitor. Most of this time is spent on join logic: 1. Finding abstract atoms 2. Comparing list1<->list2. This should and can be made significantly faster. 2. Error system does not produce errors with auxiliary information. This will require a partial redesign to provide more useful semantic information for debugging. What was addressed: 1. Array backed `CapabilityConjunctionSet` was replaced in-favor for a `UIntSet` backed `CapabilityTargetSets`. The design is described below. Design: * `CapabilityTargetSets` is a `Dictionary<targetAtom, CapabilityTargetSet>`. This is not an array for 2 reasons: 1. Easy to figure out which target is missing between two `CapabilityTargetSets` 2. To statically allocate an array requires the preprocessor to manually annotate which Capability is a target and link that Capability to an index. This means a dictionary is required for lookup regardless of implementation. * `CapabilityTargetSet` is an intermediate representation of all capabilities for a singular `target` atom (`glsl`, `hlsl`, `metal`, ...). This structure contains a dictionary to all stage specific capability sets for fast lookup of stage capabilities supported by a `CapabilitySet` for a `target` atom. This reduces number of sets searched. * `CapabilityStageSet` is an intermediate representation of all capabilities for a singular `stage` atom (`vertex`, `fragment`, ...). This structure holds all disjoint capability sets for a `stage`. A disjoint set is rare, but may exist in some scenarios (as an example): `{glsl, EXT_GL_FOO}{glsl, _GLSL_130, _GLSL_150}`. This reduces the number of sets searched. * `UIntSet` is the main reason for the redesign for better performance and memory usage. All set operations only require a few operations, making all set logic trivial and with minimal cost to run. All algorithms were modified to focus around `UIntSet` operations. 2. Errors * Semantic information are now better linked to the calling function to provide a connection of function<->function_body for when saving semantic information for errors. * Missing targets now print errors much like other error code by finding code which could be a cause of incompatibility. What is missing: 1. Add non naive support for non-stage specific capabilities such as `{hlsl, _sm_5_0}`. Currently non stage specific targets emulate the behavior through assigning such capabilities to every stage: `{hlsl, _sm_5_0, vertex} {hlsl, _sm_5_0, fragment}...`. Removal of this behavior would remove redundant shader stage sets being made at construction time (~80% of new implementation runtime). This is an addition, not an overhaul. 2. Optionally: `UIntSet` should be modified to support SIMD operations for significantly faster operations. This is not required immediately since `UIntSet` is already not a performance constraint. Notes: * UIntSet had implementation bugs which were fixed in this PR. * The old capabilities system had bugs which were fixed in this PR when transforming to the new implementation. * fix .natvis debug view * Small optimizations I found while working on the addition the AST building pass looks like so now: 1% = ~capabilitySet 2% = capabilitySet() 1.5% capabilitySet::unionWith() 0.8% capabilitySet::join() 1.5% auxillary info for debugging ~0.5-1% extra visitor overhead ~5% total for the visitor ~6.5% for total runtime costs * fix caps which were wrong but worked * push minor syntax fix (still looking for why other tests fail) * perf & bug fixes 1. did not properly remake isBetterForTarget for this->empty case with that as Invalid. This is best case in this senario. 2. Remade seralizer for stdlib generation. Faster (more direct) & cleaner code. NOTE: did not address review comments * fix glsl.meta caps error * fixing findBest logic again & UIntSet wrapper findBest was not checking for 'more specialized' targets & was element counter was flawed * faster getElements algorithm + natvis for UIntSet + wrong warning * type incompatability of bitscanForward implementations * try to fix warnings again * remove ptr for clang intrinsic * add missing header * ifdef to allow clang compile * compiler hackery to fix up platform/type independent operations * bracket * fix MSVC error * missing template * change types out again * changes to fix compiling * adjustment to parameter for Clang/GCC * added iterator to delay processing all atomSets of a CapabilitySet * add a few missing consts's * ensure we never have more than 1 disjointSet Added a wrapper + assert + union functionality to all possible disjoint sets. This was done in favor of a removal of the LinkedList for 2 reasons: 1. We still need 0-1 set functionality. 2. Might as well keep the code, just disallow the problematic functionality. * address review comments non linked-list refactor review comments addressed; add doc comments + remove redundant code * comments + remove isValid for bool operator * push removal of linkedlist for capabilities * add missing break * address review comments minor adjustments of syntax * push a fix to the `CapabilitySet({shader, missing target})` code * quality + error 1. add iterator to UIntSet 2. do not specialize target_switch if profile is derived from case (GLSL_150 is not compatable with GLSL_400) * fix target_switch erroring + temporarily remove UIntSet::Interator temporarily remove UIntSet::Interator. It will be added after, testing code on CI first so I can multi-task fixing the UIntSet Iterator * fix the UIntSet iterator * Revert "fix the UIntSet iterator" temporarily to pull from master * add metal error as per texture.slang (took a while I realize this was why things were breaking, likely should adjust errors to reflect this) * Rework UIntSet to have a template for output type This is done so it is reasonable to debug the iterator output and not just dealing with messy int's Fix problems with the iterators implemented + invalid capabilities handling * removed incorrect `__target_switch` capability barycentric was being used with anticipation of `profile glsl450`, this does not expand into `GL_EXT_fragment_shader_barycentric`, this instead caused an error which is hidden during cross-compile. * remove some uses of getElements * remove undeclared_stage for now * remove redundant code associated with `undeclared_stage` * remove unused variable * address review specifically to note removed static in a thread dangerous scope. Now using a `const static` for read only (thread safe) which precompile steps generate * move GLSL_150 capdef change to sm_4_1 (more accurate) * address most review comments did not address: https://github.com/shader-slang/slang/pull/4145#discussion_r1602256776 * revert incorrect code review suggestion * push changes for all code review suggestions
* Add host shared library target. (#4098)Yong He2024-05-03
| | | | | | | | | | | | | * Add host shared library target. * Attempt fix. * Fix warnings. * try fix. * Fix test. * Fix.
* Initial pass to add capability declarations to stdlib intrinsics. (#3912)ArielG-NV2024-04-19
|
* Implement raytracing extension(s); resolves #3560 for GLSL & SPIR-V targets ↵ArielG-NV2024-03-15
| | | | | | | | | (#3675) The following PR implements raytracing extensions (GLSL_EXT_ray_tracing, GLSL_EXT_ray_query, GLSL_NV_shader_invocation_reorder & GLSL_NV_ray_tracing_motion_blur); for GLSL & SPIR-V targets. Fully implements all functions, built-in variables, & syntax; resolves #3560 for GLSL & SPIR-V Targets. notes of worth: * __rayPayloadFromLocation, __rayAttributeFromLocation, and __rayCallableFromLocation, were added as SPIR-V Intrinsics to refer to location's of raytracing objects in SPIR-V for when using GLSL syntax.
* Partially implement shader_subgroup extension(s); Partially resolves #3548 ↵ArielG-NV2024-02-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (#3580) * Partially Implement with tests, functions and built-in variables apart of GL_KHR_shader_subgroup; Partially resolves #3548 Partially Implement with tests, functions and built-in variables apart of GL_KHR_shader_subgroup; Partially resolves #3548 GL_KHR_shader_subgroup implemented based on https://github.com/KhronosGroup/GLSL/blob/main/extensions/khr/GL_KHR_shader_subgroup.txt Implementation is broken down into seperate glsl extensions due to the ***large differences*** in implementation of each section, and functionality/testing. GL_KHR_shader_subgroup_basic{ **Partially implemented** Implementation: * All 9 built-in variables have been stubbed without proper value; implementation is still required for these system variables; related to #411. * Functions were reimplemented despite nearly mirrored HLSL functions due to: * hlsl.meta implementations targetting workgroups rather than a warp/wave/subgroup: * `__syncwarp` vs `__syncthreads` * `SubgroupMemory` vs `WorkgroupMemory` * etc. * hlsl.meta implementations target broader SPIR-V memory targets to block on: * ImageMemory|UniformMemory versus SPIR-V specifying barriers for ImageMemory and seperately an option for UniformMemory * `subgroupElect` for CUDA has a different implementation than `WaveIsFirstLane`, this is because spec states that `subgroupElect()` only returns the lowest active gl_SubgroupInvocationID; therefore we are supposed to fetch the current active mask even if some invocations are turned off by branches Testing: tests for the variable -- `tests/glsl/shader-subgroup-built-in-variables.slang` * these tests do not test functionality since not implemented yet tests for the functions -- `tests/glsl/shader-subgroup-basic.slang` * concurrency is tested for using SubgroupMemory, UniformMemory through attempting to create a GPU side race condition with writing and reading memory * due to testing tools avaible there are no tests for ImageMemory * subgroupElect is tested to return invocation #0, the lowest invocation that will always run; wave size is 32, therefore #0 is always active and will always be the elected invocation. } GL_KHR_shader_subgroup_vote{ **Fully implemented** Implementation: * 3/3 functions are using the hlsl.meta implementation Testing: `tests/glsl/shader-subgroup-vote.slang` * Testing each a positive (returns true) and negative (returns false) test case to ensure vote results are correct } GL_KHR_shader_subgroup_ballot{ **Partially implemented** Implementation: There are 10/10 functions that are implemented: * 3 are using hlsl.meta implementation * 7 are using new implementations -- only support GLSL, SPIR-V, HLSL, CUDA * These implementations do not exist in hlsl.meta, so they were added * `subgroupInverseBallot` lacks an analog function to call; this feature was emulated: * in CUDA through knowing waves are 32bit and lanes are 0 indexed, this implys that ` (ballotResult >> YOUR_INVOCATION) & 1` checks if your invocation is active, for example, `(0b11001 >> 3) & 1` would mean that only invocation 5, 4, and 1 is active, 3 would mean `YOUR_INVOCATION` is the fourth invocation in the subgroup. `(0b11001>>3) & 1` would return true since your bit is toggled and evaluates to `0b11 & 0b1` * in HLSL through testing if the wave count is 32 or less (use the same logic as CUDA in this case); else find the index `YOUR_INVOCATION` corrisponds with where each vector has 32bits (32 waves); avoid division in the process. then run the same algorithm cuda employs. * `subgroupBallotBitExtract` is logically the same as `subgroupInverseBallot` * 5 implementations do not have a CUDA, HLSL, and CPP imlementation yet (subgroupBallotFindMSB, subgroupBallotFindLSB, subgroupBallotExclusiveBitCount, subgroupBallotInclusiveBitCount, subgroupBallotBitCount) due to being out of scope for the commit Testing: `tests/glsl/shader-subgroup-ballot.slang` * the function tests for an expected value of each ballot function; tests try inputting larger than 32 toggled bits as function parameters to ensure the implementation correctly identifies values up to a maximum of the subgroup invocation count as per extension specification (otherwise the functionality is fairly trivial to test) } GL_KHR_shader_subgroup_arithmetic{ **Partially implemented** Implementation: * There are 21 functions to implement: * 14 functions are using the hlsl.meta implementation * 7 functions are new implementations -- only implemented for GLSL and SPIR-V * GLSL & SPIR-V both use their related functions, no emulation required * CUDA, CPP, HLSL are out of scope for the commit Testing: `tests/glsl/shader-subgroup-arithmetic.slang` * all tests silently kill the shader; outputted GLSL was checked, could not see an issue * these tests only check basic functionality and correctness of all functions implemented; not an exaustive test [further continued in "Other notes of worthy" at end of commit] } GL_KHR_shader_subgroup_shuffle{ **Partially implemented** Implementation: * There are 2 functions to implement: * 1 function is using the existing hlsl.meta implmentation * 1 function is using a new implmentation (subgroupShuffleXor) -- only implmented for GLSL & SPIR-V * GLSL & SPIR-V both use their related functions, no emulation required Testing: `tests/glsl/shader-subgroup-shuffle.slang` * these tests only check basic functionality and correctness of all functions implemented; not an exaustive test [further continued in "Other notes of worthy" at end of commit] * tests fail with cpp due to `kIROp_WaveGetActiveMask` failing to be called } GL_KHR_shader_subgroup_shuffle_relative{ **Partially implemented** Implementation: * There are 2 functions to implement: * all 2 functions are using a new implmentation -- only implmented for GLSL & SPIR-V * GLSL & SPIR-V both use their related functions, no emulation required Testing: `tests/glsl/shader-subgroup-shuffle-relative.slang` * these tests only check basic functionality and correctness of all functions implemented; not an exaustive test [further continued in "Other notes of worthy" at end of commit] } GL_KHR_shader_subgroup_clustered{ **Partially implemented** Implementation: * There are 7 functions to implement: * all 7 functions are using a new implmentation -- only implmented for GLSL & SPIR-V * GLSL & SPIR-V both use their related functions, no emulation required Testing: `tests/glsl/shader-subgroup-shuffle-clustered.slang` * these tests only check basic functionality and correctness of all functions implemented; not an exaustive test [further continued in "Other notes of worthy" at end of commit] } GL_KHR_shader_subgroup_quad{ **Partially implemented** Implementation: * There are 4 functions to implement: * all 4 functions are using hlsl.meta implmentations -- only implemented for GLSL & SPIR-V & HLSL Testing: `tests/glsl/shader-subgroup-shuffle-quad.slang` * these tests only check basic functionality and correctness of all functions implemented; not an exaustive test [further continued in "Other notes of worthy" at end of commit] } --------- Failing tests and why: Note: due to system variables not being implemented largly for CUDA and CPP, these tests will fail (#3 and #4){ tests/glsl/shader-subgroup-arithmetic.slang.3 tests/glsl/shader-subgroup-arithmetic.slang.4 tests/glsl/shader-subgroup-ballot.slang.4 tests/glsl/shader-subgroup-basic.slang.3 tests/glsl/shader-subgroup-basic.slang.4 tests/glsl/shader-subgroup-quad.slang.3 tests/glsl/shader-subgroup-quad.slang.4 tests/glsl/shader-subgroup-vote.slang.3 tests/glsl/shader-subgroup-vote.slang.4 } Note: due to kIROp_WaveGetActiveMask not being loaded for cpp the following test will fail{ tests/glsl/shader-subgroup-shuffle.slang.4 } Note: due to a unknown silent error the following will fail [could not spot an error in the generated glsl and spir-v]{ tests/glsl/shader-subgroup-arithmetic.slang.5 (vk) tests/glsl/shader-subgroup-arithmetic.slang.6 (vk) } Other notes of worthy:{ * only a few types are checked currently in tests due to equality templates not allowing freely casting to int/uint, meaning to test types en-mass is not trivial and will most likley be completly replaced once templates can cast & check equality more freely. * did not implement vector types for any functions that may use them (mostly in reference to SPIR-V, since many may accept scalar or vector inputs); applicable to subgroup-shuffle, subgroup-shuffle-relative, subgroup-arithmetic, subgroup-shuffle, subgroup_clustered, subgroup_quad * did not implement checks for half floats * CUDA, CPP, HLSL implementations were largly out of scope and if not implemented, this is due to the implementation not being trivial } Random fixes encountered:{ * hlsl.meta incorrectly sets `OpCapability` as `GroupNonUniformBallot` when the `OpCapability` should be `GroupNonUniformVote`; this is as per SPIR-V spec for all SPIR-V calls used in `GL_KHR_shader_subgroup_vote`: https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpGroupNonUniformAll } * added vector types and tests; Partially Implement with tests, functions and built-in variables apart of GL_KHR_shader_subgroup; Partially resolves #3548 GL_KHR_shader_subgroup implemented based on https://github.com/KhronosGroup/GLSL/blob/main/extensions/khr/GL_KHR_shader_subgroup.txt GL_KHR_shader_subgroup_* & GLSL ref: * https://github.com/KhronosGroup/GLSL/blob/main/extensions/khr/GL_KHR_shader_subgroup.txt * https://www.khronos.org/blog/vulkan-subgroup-tutorial * https://www.khronos.org/assets/uploads/developers/library/2018-vulkan-devday/06-subgroups.pdf HLSL ref: * https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-intrinsic-functions * https://github.com/Microsoft/DirectXShaderCompiler/wiki/Wave-Intrinsics CUDA ref: * https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html SPIR-V ref: * https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_memory_semantics_id Implementation is broken down into seperate glsl extensions due to the ***large differences*** in implementation of each section, and functionality/testing. GL_KHR_shader_subgroup_basic{ **Partially implemented** Implementation: * All 9 built-in variables have been stubbed without proper value; implementation is still required for these system variables; related to #411. * Functions were reimplemented despite nearly mirrored HLSL functions due to: * hlsl.meta implementations targetting workgroups rather than a warp/wave/subgroup: * `__syncwarp` vs `__syncthreads` * `SubgroupMemory` vs `WorkgroupMemory` * etc. * hlsl.meta implementations target broader SPIR-V memory targets to block on: * ImageMemory|UniformMemory versus SPIR-V specifying barriers for ImageMemory and seperately an option for UniformMemory * `subgroupElect` for CUDA has a different implementation than `WaveIsFirstLane`, this is because spec states that `subgroupElect()` only returns the lowest active gl_SubgroupInvocationID; therefore we are supposed to fetch the current active mask even if some invocations are turned off by branches Testing: tests for the variable -- `tests/glsl/shader-subgroup-built-in-variables.slang` * these tests do not test functionality since not implemented yet tests for the functions -- `tests/glsl/shader-subgroup-basic.slang` * concurrency is tested for using SubgroupMemory, UniformMemory through attempting to create a GPU side race condition with writing and reading memory * due to testing tools avaible there are no tests for ImageMemory * subgroupElect is tested to return invocation #0, the lowest invocation that will always run; wave size is 32, therefore #0 is always active and will always be the elected invocation. } GL_KHR_shader_subgroup_vote{ **Fully implemented** Implementation: * 3/3 functions are using the hlsl.meta implementation Testing: `tests/glsl/shader-subgroup-vote.slang` * Testing each a positive (returns true) and negative (returns false) test case to ensure vote results are correct } GL_KHR_shader_subgroup_ballot{ **Partially implemented** Implementation: There are 10/10 functions that are implemented: * 3 are using hlsl.meta implementation * 7 are using new implementations -- only support GLSL, SPIR-V, HLSL, CUDA * These implementations do not exist in hlsl.meta, so they were added * `subgroupInverseBallot` lacks an analog function to call; this feature was emulated: * in CUDA through knowing waves are 32bit and lanes are 0 indexed, this implys that ` (ballotResult >> YOUR_INVOCATION) & 1` checks if your invocation is active, for example, `(0b11001 >> 3) & 1` would mean that only invocation 5, 4, and 1 is active, 3 would mean `YOUR_INVOCATION` is the fourth invocation in the subgroup. `(0b11001>>3) & 1` would return true since your bit is toggled and evaluates to `0b11 & 0b1` * in HLSL through testing if the wave count is 32 or less (use the same logic as CUDA in this case); else find the index `YOUR_INVOCATION` corrisponds with where each vector has 32bits (32 waves); avoid division in the process. then run the same algorithm cuda employs. * `subgroupBallotBitExtract` is logically the same as `subgroupInverseBallot` * 5 implementations do not have a CUDA, HLSL, and CPP imlementation yet (subgroupBallotFindMSB, subgroupBallotFindLSB, subgroupBallotExclusiveBitCount, subgroupBallotInclusiveBitCount, subgroupBallotBitCount) due to being out of scope for the commit Testing: `tests/glsl/shader-subgroup-ballot.slang` * the function tests for an expected value of each ballot function; tests try inputting larger than 32 toggled bits as function parameters to ensure the implementation correctly identifies values up to a maximum of the subgroup invocation count as per extension specification (otherwise the functionality is fairly trivial to test) } GL_KHR_shader_subgroup_arithmetic{ **Partially implemented** Implementation: * There are 21 functions to implement: * 14 functions are using the hlsl.meta implementation * 7 functions are new implementations -- only implemented for GLSL and SPIR-V * GLSL & SPIR-V both use their related functions, no emulation required * CUDA, CPP, HLSL are out of scope for the commit Testing: `tests/glsl/shader-subgroup-arithmetic.slang` * all tests silently kill the shader; outputted GLSL was checked, could not see an issue * these tests only check basic functionality and correctness of all functions implemented; [further continued in "Other notes of worthy" at end of commit] } GL_KHR_shader_subgroup_shuffle{ **Partially implemented** Implementation: * There are 2 functions to implement: * 1 function is using the existing hlsl.meta implmentation * 1 function is using a new implmentation (subgroupShuffleXor) -- only implmented for GLSL & SPIR-V * GLSL & SPIR-V both use their related functions, no emulation required Testing: `tests/glsl/shader-subgroup-shuffle.slang` * these tests only check basic functionality and correctness of all functions implemented; [further continued in "Other notes of worthy" at end of commit] * tests fail with cpp due to `kIROp_WaveGetActiveMask` failing to be called } GL_KHR_shader_subgroup_shuffle_relative{ **Partially implemented** Implementation: * There are 2 functions to implement: * all 2 functions are using a new implmentation -- only implmented for GLSL & SPIR-V * GLSL & SPIR-V both use their related functions, no emulation required Testing: `tests/glsl/shader-subgroup-shuffle-relative.slang` * these tests only check basic functionality and correctness of all functions implemented; [further continued in "Other notes of worthy" at end of commit] } GL_KHR_shader_subgroup_clustered{ **Partially implemented** Implementation: * There are 7 functions to implement: * all 7 functions are using a new implmentation -- only implmented for GLSL & SPIR-V * GLSL & SPIR-V both use their related functions, no emulation required Testing: `tests/glsl/shader-subgroup-shuffle-clustered.slang` * these tests only check basic functionality and correctness of all functions implemented; [further continued in "Other notes of worthy" at end of commit] } GL_KHR_shader_subgroup_quad{ **Partially implemented** Implementation: * There are 4 functions to implement: * all 4 functions are using hlsl.meta implmentations -- only implemented for GLSL & SPIR-V & HLSL Testing: `tests/glsl/shader-subgroup-shuffle-quad.slang` * these tests only check basic functionality and correctness of all functions implemented; [further continued in "Other notes of worthy" at end of commit] } --------- Failing tests and why: Note: test numbers are assuming none of the existing tests are toggled off Note: due to system variables not being implemented largly for CUDA and CPP, these tests will fail (#3 and #4){ tests/glsl/shader-subgroup-arithmetic.slang.3 tests/glsl/shader-subgroup-arithmetic.slang.4 tests/glsl/shader-subgroup-ballot.slang.4 tests/glsl/shader-subgroup-basic.slang.3 tests/glsl/shader-subgroup-basic.slang.4 tests/glsl/shader-subgroup-quad.slang.3 tests/glsl/shader-subgroup-quad.slang.4 tests/glsl/shader-subgroup-vote.slang.3 tests/glsl/shader-subgroup-vote.slang.4 } Note: due to kIROp_WaveGetActiveMask not being loaded for cpp the following test will fail{ tests/glsl/shader-subgroup-shuffle.slang.4 tests/glsl/shader-subgroup-shuffle-relative.slang.4 tests/glsl/shader-subgroup-basic.slang.4 } Note: due to a unknown silent error the following will fail [could not spot an error in the generated glsl and spir-v]{ tests/glsl/shader-subgroup-arithmetic.slang.5 (vk) tests/glsl/shader-subgroup-arithmetic.slang.6 (vk) } Other notes of worthy:{ * only a few types are checked currently in arithmetic test; this is due to the test silently failing, meaning I can't actually test anything implemented * did not implement checks for half floats * CUDA, CPP, HLSL implementations were largly out of scope and not implemented, this is due to the implementation being non trivial for many functions } Random fixes encountered:{ * hlsl.meta incorrectly sets `OpCapability` as `GroupNonUniformBallot` when the `OpCapability` should be `GroupNonUniformVote`; this is as per SPIR-V spec for all SPIR-V calls used in `GL_KHR_shader_subgroup_vote`: https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpGroupNonUniformAll } * Partially Implement with tests, functions and built-in variables apart of GL_KHR_shader_subgroup; Partially resolves #3548 Partially Implement with tests, functions and built-in variables apart of GL_KHR_shader_subgroup; Partially resolves #3548 GL_KHR_shader_subgroup implemented based on https://github.com/KhronosGroup/GLSL/blob/main/extensions/khr/GL_KHR_shader_subgroup.txt GL_KHR_shader_subgroup_* & GLSL ref: * https://github.com/KhronosGroup/GLSL/blob/main/extensions/khr/GL_KHR_shader_subgroup.txt * https://www.khronos.org/blog/vulkan-subgroup-tutorial * https://www.khronos.org/assets/uploads/developers/library/2018-vulkan-devday/06-subgroups.pdf HLSL ref: * https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-intrinsic-functions * https://github.com/Microsoft/DirectXShaderCompiler/wiki/Wave-Intrinsics CUDA ref: * https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html SPIR-V ref: * https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_memory_semantics_id Implementation is broken down into seperate glsl extensions due to the ***large differences*** in implementation of each section, and functionality/testing. GL_KHR_shader_subgroup_basic{ **Partially implemented** Implementation: * All 9 built-in variables have been stubbed without proper value; implementation is still required for these system variables; related to #411. * Functions were reimplemented despite nearly mirrored HLSL functions due to: * hlsl.meta implementations targetting workgroups rather than a warp/wave/subgroup: * `__syncwarp` vs `__syncthreads` * `SubgroupMemory` vs `WorkgroupMemory` * etc. * hlsl.meta implementations target broader SPIR-V memory targets to block on: * ImageMemory|UniformMemory versus SPIR-V specifying barriers for ImageMemory and seperately an option for UniformMemory * `subgroupElect` for CUDA has a different implementation than `WaveIsFirstLane`, this is because spec states that `subgroupElect()` only returns the lowest active gl_SubgroupInvocationID; therefore we are supposed to fetch the current active mask even if some invocations are turned off by branches Testing: tests for the variable -- `tests/glsl/shader-subgroup-built-in-variables.slang` * these tests do not test functionality since not implemented yet tests for the functions -- `tests/glsl/shader-subgroup-basic.slang` * concurrency is tested for using SubgroupMemory, UniformMemory through attempting to create a GPU side race condition with writing and reading memory * due to testing tools avaible there are no tests for ImageMemory * subgroupElect is tested to return invocation #0, the lowest invocation that will always run; wave size is 32, therefore #0 is always active and will always be the elected invocation. } GL_KHR_shader_subgroup_vote{ **Fully implemented** Implementation: * 3/3 functions are using the hlsl.meta implementation Testing: `tests/glsl/shader-subgroup-vote.slang` * Testing each a positive (returns true) and negative (returns false) test case to ensure vote results are correct } GL_KHR_shader_subgroup_ballot{ **Partially implemented** Implementation: There are 10/10 functions that are implemented: * 3 are using hlsl.meta implementation * 7 are using new implementations -- only support GLSL, SPIR-V, HLSL, CUDA * These implementations do not exist in hlsl.meta, so they were added * `subgroupInverseBallot` lacks an analog function to call; this feature was emulated: * in CUDA through knowing waves are 32bit and lanes are 0 indexed, this implys that ` (ballotResult >> YOUR_INVOCATION) & 1` checks if your invocation is active, for example, `(0b11001 >> 3) & 1` would mean that only invocation 5, 4, and 1 is active, 3 would mean `YOUR_INVOCATION` is the fourth invocation in the subgroup. `(0b11001>>3) & 1` would return true since your bit is toggled and evaluates to `0b11 & 0b1` * in HLSL through testing if the wave count is 32 or less (use the same logic as CUDA in this case); else find the index `YOUR_INVOCATION` corrisponds with where each vector has 32bits (32 waves); avoid division in the process. then run the same algorithm cuda employs. * `subgroupBallotBitExtract` is logically the same as `subgroupInverseBallot` * 5 implementations do not have a CUDA, HLSL, and CPP imlementation yet (subgroupBallotFindMSB, subgroupBallotFindLSB, subgroupBallotExclusiveBitCount, subgroupBallotInclusiveBitCount, subgroupBallotBitCount) due to being out of scope for the commit Testing: `tests/glsl/shader-subgroup-ballot.slang` * the function tests for an expected value of each ballot function; tests try inputting larger than 32 toggled bits as function parameters to ensure the implementation correctly identifies values up to a maximum of the subgroup invocation count as per extension specification (otherwise the functionality is fairly trivial to test) } GL_KHR_shader_subgroup_arithmetic{ **Partially implemented** Implementation: * There are 21 functions to implement: * 14 functions are using the hlsl.meta implementation * 7 functions are new implementations -- only implemented for GLSL and SPIR-V * GLSL & SPIR-V both use their related functions, no emulation required * CUDA, CPP, HLSL are out of scope for the commit Testing: `tests/glsl/shader-subgroup-arithmetic.slang` * all tests silently kill the shader; outputted GLSL was checked, could not see an issue * these tests only check basic functionality and correctness of all functions implemented; [further continued in "Other notes of worthy" at end of commit] } GL_KHR_shader_subgroup_shuffle{ **Partially implemented** Implementation: * There are 2 functions to implement: * 1 function is using the existing hlsl.meta implmentation * 1 function is using a new implmentation (subgroupShuffleXor) -- only implmented for GLSL & SPIR-V * GLSL & SPIR-V both use their related functions, no emulation required Testing: `tests/glsl/shader-subgroup-shuffle.slang` * these tests only check basic functionality and correctness of all functions implemented; [further continued in "Other notes of worthy" at end of commit] * tests fail with cpp due to `kIROp_WaveGetActiveMask` failing to be called } GL_KHR_shader_subgroup_shuffle_relative{ **Partially implemented** Implementation: * There are 2 functions to implement: * all 2 functions are using a new implmentation -- only implmented for GLSL & SPIR-V * GLSL & SPIR-V both use their related functions, no emulation required Testing: `tests/glsl/shader-subgroup-shuffle-relative.slang` * these tests only check basic functionality and correctness of all functions implemented; [further continued in "Other notes of worthy" at end of commit] } GL_KHR_shader_subgroup_clustered{ **Partially implemented** Implementation: * There are 7 functions to implement: * all 7 functions are using a new implmentation -- only implmented for GLSL & SPIR-V * GLSL & SPIR-V both use their related functions, no emulation required Testing: `tests/glsl/shader-subgroup-shuffle-clustered.slang` * these tests only check basic functionality and correctness of all functions implemented; [further continued in "Other notes of worthy" at end of commit] } GL_KHR_shader_subgroup_quad{ **Partially implemented** Implementation: * There are 4 functions to implement: * all 4 functions are using hlsl.meta implmentations -- only implemented for GLSL & SPIR-V & HLSL Testing: `tests/glsl/shader-subgroup-shuffle-quad.slang` * these tests only check basic functionality and correctness of all functions implemented; [further continued in "Other notes of worthy" at end of commit] } --------- Failing tests and why: Note: test numbers are assuming none of the existing tests are toggled off Note: due to system variables not being implemented largly for CUDA and CPP, these tests will fail (#3 and #4){ tests/glsl/shader-subgroup-arithmetic.slang.3 tests/glsl/shader-subgroup-arithmetic.slang.4 tests/glsl/shader-subgroup-ballot.slang.4 tests/glsl/shader-subgroup-basic.slang.3 tests/glsl/shader-subgroup-basic.slang.4 tests/glsl/shader-subgroup-quad.slang.3 tests/glsl/shader-subgroup-quad.slang.4 tests/glsl/shader-subgroup-vote.slang.3 tests/glsl/shader-subgroup-vote.slang.4 } Note: due to kIROp_WaveGetActiveMask not being loaded for cpp the following test will fail{ tests/glsl/shader-subgroup-shuffle.slang.4 tests/glsl/shader-subgroup-shuffle-relative.slang.4 tests/glsl/shader-subgroup-basic.slang.4 } Other notes of worthy:{ * added preamble function and macros for implementing subgroup functionality (and tests) to make it possible to iterate on the functionality with reasonable effort in the future * CUDA, CPP, HLSL implementations were largly out of scope and not implemented, this is due to the implementation being non trivial for many functions * doubles cause a silent crash on most subgroup functions tested (silent shader hang) * __requireGLSLExtension does not work as intended inside glsl.meta; as a result half, int16, int64 int8, all are ommited from testing } Random fixes encountered:{ * hlsl.meta incorrectly sets `OpCapability` as `GroupNonUniformBallot` when the `OpCapability` should be `GroupNonUniformVote`; this is as per SPIR-V spec for all SPIR-V calls used in `GL_KHR_shader_subgroup_vote`: https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpGroupNonUniformAll * hlsl.meta incorrectly uses for WaveMaskPrefixBitOr (SPIR-V) OpGroupNonUniformBitwiseAnd intead of OpGroupNonUniformBitwiseOr; this was fixed } * redesign tests under suggestions that they should be smaller, more maintainable, and test the most amount of data reasonabley possible (balance with fast iterations); optional double testing varying parameter testing most tests chain results now * fix missing impl and merge conflict resolutions * reundant test code cleanup and organization move tests to proper location (glsl-intrinsic) clean up redundant code (input buffers) * add missing logical operands support (and remove hlsl/cuda code reuse due to the functional differences) under all And, Or, Xor ops redesign tests to conform to a better testing paradigm * testing code style change to not use white space as a toggle for tests * provided crash reason for doubles (intel iris gpu's crash in glsl with doubles due to missing support in device caps [as per vulkan validation layer) uncommented the `__requireGLSLExtension` code so once it is fixed int16/8/64/half wil work with subgroup not requiring future intervention * fixing some vk validation layer errors (OpMemoryBarrier, Shuffle operations) modified style of tests; removed redundancy (extra code that does nothing); fixed some incorrect run targets; added error reasons for all encountered problems (and if needed, a #define/#if toggle) * remove comments of important tests inplace of #define over the broken feature of extended shader_subgroup types * removed macros inside glsl.meta removed erroneous __target_switch to directly call hlsl.meta function added elaboration on the problem with __requireGLSLExtension changed WaveMaskPrefixBit[or|and|xor] to support the expected type of <int> only as per `HLSL Shader Model 6.5` specs removed "precision highp" since it does not affect tests * changes some hlsl.meta functions used to be more appropriate (as per suggested) WaveMask -> WaveActive.* WaveMaskPrefix.* -> WavePrefix.* remove __target_switch case's for unimplemented case's of intrinsics fix _getLaneId() being removed from some regex used earlier * fix usage of __target_intrinsic instead of __intrinsic_asm; silently would cause only arguments to be emmitted as return changed usage of `__requireGLSLExtension` because now it causes a crash from the missing intrinsic (instead of a silent error) * fix shader subgroup extended types support for GLSL and SPIR-V: 1. seperate intrinsic/__requireGLSL generating functionality of shader_subgroup_preamble into child function calls due to otherwise `__requireGLSLExtension` being ignored if the calling function of shader_subgroup_preamble calls an `__intrinsic_asm` 2. fixed HLSL.meta logic for wave operations (Add, Mul, exclusiveAdd, exclusiveMul) to no longer cast the input type T into a uint due to cost-of-op & crash. * Int8_t bit casted into uint32_t crashed the compiler. As per SPIR-V spec, OpGroupNonUniformI.* work on uint and int types meaning the function has no need to cast to a unit. 3. removed erroneous __target_switch for subgroupShuffle * 1. ignore tests gracefully 2. remove un-needed SPIRV capability specifying (with OpCapability) 3. clean up structure of typeRequireChecks_shader_subgroup_GLSL 4. explain why HLSL/CUDA are not targeted for shader-subgroup-arithmetic.slang * syntax changes + `property` declaration fix + builtin var glsl implementation + changed incorrect HLSL.meta assumptions (#1)`property` declaration as *non member* implementation change/fix (all of the changes to `slang-lower-to-ir.cpp`) using (#1), implemented subgroup builtin's for GLSL/SPIR-V; did not implement built'ins completly for HLSL/CUDA due to non trivial implementations. CPP has no implementation due to missing support of system values changed some incorrect HLSL.meta subgroup implementation assumptions of type usage (bit casting 8bit->32bit, wrong capabilities causing errors) dumping ast crash with spir-v when using builtin's fixed by adding the `builtin` spirv case (all of the changes to `slang-ast-dump.cpp`) [ForceInline] addition to functions missing it return instead of spirv_asm when empty blocks are used * syntax & organization of tests adjustment (specifically how if'def's are managed) * figuring out where ci fails * figuring out where ci fails -- testing with enclusive & regular * testing CI with exclusive, regular, inclusive * remove unneeded white space test CI inconsistency issues further with arithmetic.slang * testing if the ci run fails due to some timeout/recovery issue * split up arithmetic tests and push to test with CI --------- Co-authored-by: Yong He <yonghe@outlook.com>
* Support link time type specialization. (#3604)Yong He2024-02-20
|
* Capability type checking. (#3530)Yong He2024-02-02
| | | | | | | | | * Capability type checking. * Fix. --------- Co-authored-by: Yong He <yhe@nvidia.com>
* Add ConstBufferPointer::subscript. (#3415)Yong He2023-12-15
| | | Co-authored-by: Yong He <yhe@nvidia.com>
* Unify stdlib `Texture` types into one generic type. (#3327)Yong He2023-11-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Unify Texture types in stdlib into 1 generic type. * Fixes. * Fix. * Fixes. * Fix reflection. * Fix binding reflection. * Add gather intrinsics. * Fix gather intrinsics. * Fix texture type toText. * Fix intrinsic. * fix cuda intrinsic. * Fix project files. * cleanup. * Fix. * Fix. * Fix sampler feedback test. * Fix getDimension intrinsics. * Fix spirv sample image intrinsics. * Fix test. * Fix GLSL intrinsic. * Cleanup. --------- Co-authored-by: Yong He <yhe@nvidia.com>
* Allow bitwise or expressions and numeric literals in spirv_asm blocks (#3157)Ellie Hermaszewska2023-08-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Add -spirv-core-grammar option to load alternate spirv defs Also embed a version to use by default * Use perfect hash for spv op lookup * Neaten perfect hash embedding * Refactor spirv grammar lookup in preperation for more kinds of lookups * Load spirv capability list from spec * Add all SPIR-V enums to lookup table * regenerate vs projects * appease msvc * Use string slices for spir-v core grammar lookups * wiggle * comment * Add OpInfo for spv ops * regenerate vs projects * Embed op names * Add min/max operand counts and enum categories to spirv info * neaten * Operand kinds for spirv ops * Store and embed all information relating to spirv enums and qualifiers * Use SPIR-V spec to position instructions in spirv_asm blocks * Neaten spir-v info embedding * Neaten perfect hash embedding * Add assignment syntax to spirv_asm snippets * Better errors for spirv_asm parser * Add warning for too many operands in spirv asm * squash warnings * neaten * test wiggle * Lookup enums for spirv * Put OpCapability and OpExtension in the correct place for spirv_asm blocks * Tests for OpCapability and OpExtension * ci wiggle * Add expected failure * Allow raising immediate values to constant ids where necessary in spirv_asm blocks * Allow bitwise or expressions and numeric literals in spirv_asm blocks * test numeric literals * Fix memory issues. * fix. --------- Co-authored-by: Yong He <yonghe@outlook.com>
* Initial version of spirv_asm block (#3151)Ellie Hermaszewska2023-08-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Initial version of spirv_asm block * Correct indentation of parent instruction dumping * neater dumping for spirv_asm instructions * Add $$ DollarDollar token * Allow passing addresses to spirv_asm blocks * spirv OpUndef * String literals in spirv asm * OpName for spirv_asm ids * Correct failure in lower spirv_asm * correct position for spirv_asm idents * comment correct * several more tests for spirv_asm blocks * Fill out some unimplemented functions for spirv_asm expressions --------- Co-authored-by: Yong He <yonghe@outlook.com>
* Use ankerl/unordered_dense as a hashmap implementation (#3036)Ellie Hermaszewska2023-08-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Correct namespace for getClockFrequency * missing const * Add missing assignment operator * Remove unused variables * Return correct modified variable * Use stable hash code for file system identity * terse static_assert * Structured binding for map iteration * Make (==) and getHashCode const on many structs * Add ConstIterator for LinkedList * Replace uses of ItemProxy::getValue with Dictionary::at * Extract list of loads from gradientsMap before updating it * Const correctness in type layout * Add unordered_dense hashmap submodule * Use wyhash or getHashCode in slang-hash.h * refactor slang-hash.h * Use ankerl/unordered_dense as a hashmap implementation Notable changes: - The subscript operator returns a reference directly to the value, rather than a lazy ItemProxy (pair of dict pointer and key) slang-profile time (95% over 10 runs): - Before: 6.3913906 (±0.0746) - After: 5.9276123 (±0.0964) * 64 bit hash for strings So they have the same hash as char buffers with the same contents * Narrowing warnings for gcc to match msvc * revert back to c++17 * Correct c++ version for msvc * Use path to unordered_dense which keeps tests happy * Do not assign to and read from map in same expression * Remove redundant map operations in primal-hoist * Split out stable hash functions into slang-stable-hash.h * 64 bit hash by default * regenerate vs projects * Correct return type from HashSetBase::getCount() * correct width for call to Dictionary::reserve * Use stable hash for obfuscated module ids * Signed int for reserve * clearer variable naming * Parameterize Dictionary on hash and equality functors * Allow heterogenous lookup for Dictionary * missing const * Use set over operator[] in some places * Remove unused function * s/at/getValue
* Redesign `DeclRef` and systematic `Val` deduplication (#3049)Yong He2023-08-04
| | | | | | | | | | | | | | | | | | | | | | | * Redesign DeclRef + Deduplicate Val. * Update project files * Fix warning. * Fix. * Fix. * Remove `Val::_equalsImplOverride`. * Rmove `Val::_getHashCodeOverride`. * Remove `semanticVisitor` param from `resolve`. * Cleanups. --------- Co-authored-by: Yong He <yhe@nvidia.com>
* Make DeclRefBase a Val, and DeclRef<T> a helper class. (#2967)Yong He2023-07-07
| | | | | | | | | | | | | | | | | | | | | * Make DeclRefBase a Val, and DeclRef<T> a helper class. * Fixes. * Workaround gcc parser issue. * Revert NodeOperand change. * Fix. * Fix clang incomplete class complains. * Fix code review. * Small cleanups and improvements. --------- Co-authored-by: Yong He <yhe@nvidia.com>
* Embed stdlib documentation to AST. (#2851)Yong He2023-04-27
| | | | | | | | | * Embed stdlib documentation to AST. * Extract documentation for attributes. --------- Co-authored-by: Yong He <yhe@nvidia.com>
* Fix most of the disabled warnings on gcc/clang (#2839)Ellie Hermaszewska2023-04-26
|
* StringBuilder to lowerCamel (#2840)jsmall-nvidia2023-04-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * #include an absolute path didn't work - because paths were taken to always be relative. * WIP lowerCamel Dictionary. * WIP more lowerCamel fixes for Dictionary. * Add/Remove/Clear * GetValue/Contains * Fix tabs in dictionary. Count -> getCount * Fix fields with caps. * Key -> key Value -> value Use m_ for members where appropriate. Use lowerCamel in linked list. * Some small fixes/improvements to Dictionary. * Kick CI. * Small tidy on String. * Append -> append * ToString -> toString ProduceString -> produceString * Small fixes. * StringToXXX -> stringToXXX * Fix typo introduced by Append -> append. * Made intToAscii do reversal at the end. --------- Co-authored-by: Yong He <yonghe@outlook.com>
* Dictionary using lowerCamel (#2835)jsmall-nvidia2023-04-25
| | | | | | | | | | | | | | | | | | | | | | | | | * #include an absolute path didn't work - because paths were taken to always be relative. * WIP lowerCamel Dictionary. * WIP more lowerCamel fixes for Dictionary. * Add/Remove/Clear * GetValue/Contains * Fix tabs in dictionary. Count -> getCount * Fix fields with caps. * Key -> key Value -> value Use m_ for members where appropriate. Use lowerCamel in linked list. * Some small fixes/improvements to Dictionary. * Kick CI.
* Rework differential conformance dictionary checking. (#2483)Yong He2022-11-02
| | | | | | | * Rework differential conformance dictionary checking. * Revert space changes. Co-authored-by: Yong He <yhe@nvidia.com>
* Auto synthesis of IDifferntial interface methods. (#2469)Yong He2022-10-27
| | | | | | | * Auto synthesis of IDifferntial interface methods. * Add comments. Co-authored-by: Yong He <yhe@nvidia.com>
* Auto synthesis of Differential type (#2466)Yong He2022-10-26
|
* New language feature: basic error handling. (#2253)Yong He2022-06-01
| | | | | | | | | * New language feature: basic error handling. * Fix. * Fix `tryCall` encoding according to code review. Co-authored-by: Yong He <yhe@nvidia.com>
* Remove GlobalGenericParamSubstitution (#1684)Tim Foley2021-02-02
| | | | | | | The `GlobalGenericParamSubsitution` class used to be used to represent the mapping of global-scope generic parameters to their concrete arguments, so that we could make use of those concrete arguments for things like layout. That representation caused a lot of pain for other parts of the compiler, though, because everything that dealt with `Substitution`s needed to account for the possibility of global-generic-param subsitutions even if they logically could not occur in most parts of the compiler. We have since moved to a model where the values for global-scope generic parameters are stored in a single explicit global structure that is used by both layout computation and IR lowering. There is no actual code that construct `GlobalGenericParamSubstitution`s from scratch any more, so all of the support code for them was actually unused. This change removes all the unused code, and shows that the tests still pass without it (even the tests that use global-scope generic parameters).
* C++ extractor fix for access modifiers (#1586)jsmall-nvidia2020-10-23
| | | | | | | | | | * #include an absolute path didn't work - because paths were taken to always be relative. * Fix handling of access modifiers inside type definition. * Fix access problem for AST node. Make dumping produce a single function with switch, to potentially make available without Dump specific access. * Remove project references to previously generated files.
* Single pass C++ extraction (#1583)jsmall-nvidia2020-10-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * #include an absolute path didn't work - because paths were taken to always be relative. * Added CharUtil. Added TypeSet to extractor. First pass at being able to specify all headers for multiple output headers. * Fix includes for new C++ extractor convension. Update premake5 to use new extractor mechanisms. * Small improvements around StringUtil. * Split out NameConventionUtil. * Use a 'convert' to convert between convention types. * Fix output of build message for C++ extractor. Improve NameConventionUtil interface. * Improve comments. * Fix warning on gcc. * Fix clang warning. * Fix some typos in NameConventionUtil. * Small fix to premake5.lua * Fix generated includes. * Remove m_reflectType as no longer applicable with TypeSet. * Fix .gitignore for slang-generated-* files. Added getConvention to determine convention from slice. Add versions of split and convert that infer the from convention * Fix typo in spliting camel. * LineWhitespace -> HorizontalWhitespace * Improve CharUtil comments.
* Generalizing Serialization (#1563)jsmall-nvidia2020-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * First pass at generalizing serializer. * Split out ReflectClassInfo * Use the general ReflectClassInfo * Fix some typos in debug generalized serialization. * Add calculation of classIds. Make distinct addCopy/add on SerialClasses. * Write up of more generalized serialization * WIP to transition from ASTSerialReader/Writer etc to generalized SerialReader/Writer and associated types. * Improvements to SerialExtraObjects. Keep RefObjects in scope in factory * Compiles with Serial refactor - doesn't quite work yet. * First pass serialization appears to work with refector. * Split out type info for general slang types. * Split out slang-serialize-misc-type-info.h * DebugSerialData -> SerialSourecLocData DebugSerialReader -> SerialSourceLocReader DebugSerialWriter -> SerialSourceLocWriter * Remove unused template that only compiles on VS. * Fix warning around unused function on non-VS.
* AST Serialization in Modules (#1524)jsmall-nvidia2020-08-31
| | | | | | | | | * First pass at filter for AST serial writing. * Serialization of AST for modules. * Removed some commented out source. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
* First pass support for Sampler Feedback (#1470)jsmall-nvidia2020-08-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Add the Feedback texture types. Depreciate SLANG_RESOURCE_EXT_SHAPE_MASK. * Starting point to test sampler feedback. * WIP on FeedbackSampler. * Use __target_intrinsic to override the output of sampler feedback types. * Use newer generic syntax for FeedbackTexture. * Reflects Feedback type. * SLANG_TYPE_KIND_TEXTURE_FEEDBACK -> SLANG_TYPE_KIND_FEEDBACK * Added reflection test. * Reneable issue with generics in sampler-feedback-basic.slang * Add methods to FeedbackTexture2D/Array. Make test cover test cases. * Sampler feedback produces DXC code. * Disabled Sampler feedback test - as requires newer version of DXC. * Fix bug in reflection tool output. * Fix problem with direct-spirv-emit.slang.expected due to update to glslang. * Fix direct-spirv-emit.slang * Use SLANG_RESOURCE_EXT_SHAPE_MASK again * Make Feedback be emitted as a textue type prefix. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
* AST serialize improvements (#1412)jsmall-nvidia2020-06-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Try to fix problem with C++ extractor concating tokens producing an erroneous result. * Improve naming/comments around C++ extractor fix. * Another small improvement around space concating when outputing token list. * Handle some more special cases for consecutive tokens for C++ extractor concat of tokens. * WIP AST serialization. * Comment out so compile works. * More work on AST serialization. * WIP AST serialize. * WIP AST Serialization - handling more types. * WIP: Compiles but not all types are converted, as not all List element types are handled. * Compiles with array types. * Finish off AST serialization of remaining types. * Remove ComputedLayoutModifier and TupleVarModifier. * Add fields to ASTSerialClass type. * Construct AST type layout. * AST Serialization working for writing to ASTSerialWriter. * Removed call to ASTSerialization::selfTest in session creation. * Fixes for gcc. * Diagnostics handling - better handling of dashify. * Improve comment around DiagnosticLookup. * Updated VS project. * Write out as a Stream, taking into account alignment. * First pass at serializing in AST. * Added support for deserializing arrays. * Small bug fixes. * Fix problem calculating layout. Split out loading on entries. * Fix typo in AST conversion. * Add some flags to control AST dumping. * Fix bug from a typo. * Special case handling of Name* in AST serialization. * Special case handling of Token lexemes, make Names on read. * Documentation on AST serialization. * ASTSerialTestUtil - put AST testing functions. Fix typo that broke compilation. * Fix typo.
* ASTNodes use MemoryArena (#1376)jsmall-nvidia2020-06-05
| | | | | | | | | | | | | | | | | | | | | | | | | | * Add a ASTBuilder to a Module Only construct on valid ASTBuilder (was being called on nullptr on occassion) * Add nodes to ASTBuilder. * Compiles with RefPtr removed from AST node types. * Initialize all AST node pointer variables in headers to nullptr; * Initialize AST node variables as nullptr. Make ASTBuilder keep a ref on node types. Make SyntaxParseCallback returns a NodeBase * Don't release canonicalType on dtor (managed by ASTBuilder). * Give ASTBuilders a name and id, to help in debugging. For now destroy the session TypeCache, to stop it holding things released when the compile request destroys ASTBuilders. * Moved the TypeCheckingCache over to Linkage from Session. * NodeBase no longer derived from RefObject. * Only add/dtor nodes that need destruction. First pass compile on linux.
* Working matrix swizzle (#1354)Dietrich Geisler2020-06-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Working matrix swizzle. Supports one and zero indexing and multiple elements. Performs semantic checking of the swizzle. Matrix swizzles are transformed into a vector of indexing operations during lowering to the IR. This change does not handle matrix swizzle as lvalues. * Renaming * Added missing semicolon * Initialize variable for gcc * Added the expect file for diagnostics * Matrix swizzle updated per PR feedback * Stylistic fix * Formatting fixes * Fix compiling with AST change. Change indentation. Co-authored-by: jsmall-nvidia <jsmall@nvidia.com>
* NodeBase types constructed with astNodeType member set (#1363)jsmall-nvidia2020-05-29
| | | | | | | | * Maked Substituions derived from NodeBase * * Add astNodeTYpe field to NodeBase * Make Substitutions derived from NodeBase * Make all construction through ASTBuilder * Make getClassInfo non virtual (just uses the astNodeType)
* Feature/ast syntax standard (#1360)jsmall-nvidia2020-05-29
| | | | | | | | | * Small improvements to documentation and code around DiagnosticSink * Made methods/functions in slang-syntax.h be lowerCamel Removed some commented out source (was placed elsewhere in code) * Making AST related methods and function lowerCamel. Made IsLeftValue -> isLeftValue.
* AST dump improvements (#1350)jsmall-nvidia2020-05-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Add support for parsing array types to C++ extractor. * C++ extractor looks for 'balanced tokens'. Use for extracting array suffixes. * First pass at field dumping. * Update project for field dumping. * WIP AST Dumper. * More AST dump compiling. * Fix bug in StringSlicePool where it doesn't use the copy of the UnownedStringSlice in the map. * Add support for SLANG_RELFECTED and SLANG_UNREFLECTED More AST dump support. * Support for hierarchical dumping/flat dumping. Use SourceWriter to dump. * Add -dump-ast command line option. * Add fixes to VS project to incude AST dump. * Fix compilation on gcc. * Add fix for type ambiguity issue on x86 VS. * Fixes from merge of reducing Token size. * Fix comment about using SourceWriter. * Improvement AST dumping around * Pointers (write as hex) * Scope * Turn off some unneeded fields in AST hierarchy * Only output the initial module in full.
* AST dumping via C++ Extractor reflection (#1348)jsmall-nvidia2020-05-20
* Add support for parsing array types to C++ extractor. * C++ extractor looks for 'balanced tokens'. Use for extracting array suffixes. * First pass at field dumping. * Update project for field dumping. * WIP AST Dumper. * More AST dump compiling. * Fix bug in StringSlicePool where it doesn't use the copy of the UnownedStringSlice in the map. * Add support for SLANG_RELFECTED and SLANG_UNREFLECTED More AST dump support. * Support for hierarchical dumping/flat dumping. Use SourceWriter to dump. * Add -dump-ast command line option. * Add fixes to VS project to incude AST dump. * Fix compilation on gcc. * Add fix for type ambiguity issue on x86 VS. * Fixes from merge of reducing Token size. * Fix comment about using SourceWriter.