summaryrefslogtreecommitdiffstats
path: root/source/slang/slang-ir-redundancy-removal.cpp
Commit message (Collapse)AuthorAge
* Immutable access qualifier for pointers and use `__ldg` on cuda. (#8710)Yong He2025-10-16
| | | | | | | | | | | | | | | | | | | | | | | | This PR implements `Access.Immutable` to allow pointers to immutable data. The new type `ImmutablePtr<T>` is defined as an alias of `Ptr<T, Address.Immutable>`. By forming a immutable pointer, the programmer is conveying to the compiler that the data at the pointer address will never change during the execution of the current program. Therefore loads from immutable pointers can be deduplicated by the compiler, and will translate to `__ldg` when generating code for CUDA. The SPIRV backend is not changed in this PR, since the current SPIRV spec makes it very difficult to specify loads from immutable address without generating tons of wrappers and boilerplate type declarations. We would like to see the spec evolved a bit to around its support of `NonWritable` physical storage pointers or immutable loads before we attempt to express such immutability in SPIRV. For now we simply emit ordinary pointers and loads when generating spirv. --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
* Addition of `Load`/`Store` coherent operations (#8395)16-Bit-Dog2025-10-10
| | | | | | | | | | | | | | | | | | | | | | | | Fixes: https://github.com/shader-slang/slang/issues/7634 Duplicate of PR https://github.com/shader-slang/slang/pull/8052 Primary Changes: * Added `storeCoherent` and `loadCoherent` for coherent load/store via pointers. This is backed by `IRMemoryScopeAttr` which is an `IRAttr` attached to `IRLoad` and `IRStore` * Logic in `source\slang\slang-emit-spirv.cpp` for load/store emitting has been reworked to be less messy and more maintainable * Add to `hlsl.meta.slang` coop vector and coop matrix coherent load/store operations Secondary Changes: * Added a missing load/store test for coop matrix: `tests\cooperative-matrix\load-store-pointer.slang` --------- Co-authored-by: ArielG-NV <aglasroth@nvidia.com> Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com> Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com> Co-authored-by: Nathan V. Morrical <natemorrical@gmail.com>
* Rename some symbols related to pointers types (#8592)Theresa Foley2025-10-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Note that while this change touched a large numer of files, there are no changes to functionality being made here. The only things being done are renaming various symbols and, in a few cases, updating or adding comments for consistency with the new names. The core of the naming changes are: * Most things named to refer to `OutType` (e.g., `IROutType`, `IRBuilder::getOutType()`, etc.) have been consistently renamed to refer to `OutParamType`, to emphasize that the relevant AST/IR node types are only intended for use to represent `out` parameters. * The same change as described above for `OutType` is also made for `RefType`, which becomes `RefParamType` in most cases. One mess that this exposes is the way that the `ExplicitRef<T>` type in the core module currently lowers to `IRRefParamType`. This change sticks to the rule of not making functional changes, so that mess is left as-is for now. * Names referring to `InOutType` have been changed to instead refer to `BorrowInOutType`. The intention with this naming change is to emphasize that the Slang rules for `inout` are semantically those of a borrow (or at least our interpretation of what a borrow means). * Names referring to `ConstRefType` have been changed to instead refer to `BorrowInType`. This change starts work on clarifying that the existing `__constref` modifier was never intended to be a read-only analogue of `__ref`, and instead is the input-only analogue of `inout`. * The `ParameterDirection` enum type has been changed to `ParamPassingMode`, to reflect the fact that the concept of "direction" fails to capture what is actually being encoded, particularly once we have modes beyond simple `in`/`out`/`inout`. While this change does not alter behavior in any case (the user-exposed Slang language is unchanged), it is intended to set up subsequence changes that will work to make the handling of these types in the compiler more nuanced and correct. Breaking this part of the change out separately is primarily motivated by a desire to minimize the effort for reviewers. --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
* Rewriting the lower-buffer-element-type pass to avoid unnecessary ↵Yong He2025-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | packing/unpacking. (#8526) Part of the effort to improve the performance of generated SPIRV code. The existing lower-buffer-element-type pass works by loading the entire buffer element content from memory, and translate it to logical type stored in a local variable at the earliest reference of a buffer handle. This means that is can generate inefficient code that reads more than necessary. Consider this example: ``` struct BigStruct { bool values[1024]; } ConstantBuffer<BigStruct> cb; void test(BigStruct v) { if (v.values[0]) { printf("ok"); } } [numthreads(1,1,1)] void computeMain() { test(cb); } ``` In IR, the `computeMain` function before lower-buffer-element-type pass is something like following: ``` func test: %v = param : BigStruct %barr = fieldExtract(%v, "values") %element = elementExtract(%barr, 0) ... // uses %element func computeMain: %v = load(cb) call %test %v ``` The existing lower-buffer-element-type pass will rewrite the bool array in `BigStruct` into `int` array so it is legal in SPIRV. However, it does so by inserting the translation on the first `load` of the constant buffer: ``` struct BigStruct_std430 { int values[1024]; } var cb : ConstantBuffer<BigStruct_std430>; func computeMain: %tmpVar : var<BigStruct> call %unpackStorage(%tmpVar, cb) %v : BigStruct = load %tmpVar call %test %v ``` This means that the entire array will be loaded and translated to int, before calling `test`, which only uses one element. It turns out that the downstream compiler isn't always able to optimize out this inefficient translation/copy. This PR completely rewrites the way buffer-element-type lowering is handled to avoid producing this inefficient code. It works in two parts: first we turn on the `transformParamsToConstRef` pass for SPIRV target as well, so we will translate the `test` function to take the `v` parameter as `constref`. The second part is a redesigned buffer-element-type pass that defers the storage-type to logical-type translation until a value is actually used by a `load` instruction. In this example, after `transformParamsToConstRef`, the IR is: ``` func test: %v = param : ConstRef<BigStruct> %barr = fieldAddr(%v, "values") %elementPtr = elementAddr(%barr, 0) %element = load(%elementPtr) ... // uses %element func computeMain: call %test %cb ``` The new `buffer-element-type-lowering` pass will take this IR, and insert translation at latest possible time across the entire call graph, and translate the IR into: ``` func test: %v = param : ConstRef<BigStruct_std430> %barr = fieldAddr(%v, "values") %elementPtr : ptr<int> = elementAddr(%barr, 0) %element_int = load(%elementPtr) %element = cast(%element_int) : %bool ... // uses %element func computeMain: call %test %cb ``` In this new IR, there is no longer a load and conversion of the entire array. See new comment in `slang-ir-lower-buffer-element-type.cpp` for more details of how the pass works. This PR also address many other issues surfaced by turning on `transformParamsToConstRef` pass on SPIRV backend. --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
* Remove unnecessary Load and Store pair (#8433)Jay Kwak2025-09-24
| | | | | | | | | | | | | | | | | | | | This commit removes unnecessary Load and Store pairs in IR. When the IR is like ``` let %1 = var let %2 = load(%ptr) store(%1 %2) ``` This PR will replace all uses of %1 with %ptr. And the load and store instructions will be removed. But I found that there can be cases where %2 might be still used later in other IRs. For these cases, the removal of load instruction relies on DCE. --------- Co-authored-by: slangbot <ellieh+slangbot@nvidia.com>
* Initial copy elision pass regression fix (#8126)ArielG-NV2025-08-08
| | | | | | | | | | | | | | when we have `GetElementPtr -> load -> GetElement` in our use-chain, the final `GetElement` may use the `load` as a `Index`, not a base. This is a non-issue with `getFieldExtract` since a field is a StructKey. We will still add this check to ensure no bugs down the line. --------- Co-authored-by: Harsh Aggarwal <haaggarwal@nvidia.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: slangbot <ellieh+slangbot@nvidia.com> Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
* Initial copy elision pass (#8042)ArielG-NV2025-08-07
| | | | | | | | | | | | | | | | | | | | | | Fixes #7574 Changes: * Add an initial (fairly simple) optimization pass which is able to eliminate redundant copies. * Our current existing optimizer passes remove redundant load/store very robustly, this pass will focus on other cases of copy elimination * Primary approach is to make all functions which are `in T` and `T` is trivial to copy into a `__constref T`. We then (depending on scenario) manually insert a variable+load if a pass-by-reference is not possible; otherwise we pass by `constref`. * Added optimizations to eliminate redundant code which causes `constref` to fail to compile --------- Co-authored-by: Harsh Aggarwal <haaggarwal@nvidia.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: slangbot <ellieh+slangbot@nvidia.com> Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
* Add flag to hoist instructions (#6740)jarcherNV2025-04-11
| | | | | | | | | This fixes issue #6654 Only hoist instructions that are optimized by prepareFuncForForwardDiff. Add flag hoistLoopInvariantInsts to IRSimplificationOptions and set this to true only if called from prepareFuncForForwardDiff, then only hoist if the flag is set. Additionally, do not hoist loops if they only have a single trivial iteration.
* Fix loop hoisting logic in redundancy pass. (#5836)Yong He2024-12-11
| | | | | * Fix fast single iteration loop test in redundancy pass. * Fix.
* Move switch statement bodies to their own lines (#5493)Ellie Hermaszewska2024-11-05
| | | | | | | | | * Move switch statement bodies to their own lines * format --------- Co-authored-by: Yong He <yonghe@outlook.com>
* Precompiled SPIR-V import support (#5048)cheneym22024-10-29
| | | | | | | | | | | | | | | | | | | | * Precompiled SPIR-V import support Adds appropriate linkage and function declaration syntax for SPIR-V functions that are declared, to be imported from another SPIR-V module. Unlike DXIL, stripping the Slang IR for a function down to a declaration requires retaining a block of parameters, as the function declaration must be emitted to SPIR-V with the same parameters as a definition. Because that thwarts the logic in Slang to tell the difference between a declaration and definition, and explicit decoration is introduced to explicitly mark functions which need to be treated as declarations during emit phase. Fixes #4992 Co-authored-by: Yong He <yonghe@outlook.com>
* formatEllie Hermaszewska2024-10-29
| | | | | | | * format * Minor test fixes * enable checking cpp format in ci
* Initial -embed-spirv support (#4974)cheneym22024-09-05
| | | | | | | | | | | | | | | | | | | | | | | | | | * Initial -embed-spirv support Add support for SPIR-V precompilation using the framework established for DXIL. Work on #4883 * SLANG_UNUSED * Add linkage attributes to exported spirv functions * Combine DXIL and SPIRV paths * Whitespace fix * Merge remaining precompiled spirv/dxil paths * Change inst accessors to return codegentarget * Add unit test for precompiled spirv --------- Co-authored-by: Yong He <yonghe@outlook.com>
* Support mixture of precompiled and non-precompiled modules (#4860)cheneym22024-08-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Support mixture of precompiled and non-precompiled modules This changes the implementation of precompile DXIL modules to accept combinations of modules with precompiled DXIL, ones without, and ones with a mixture of precompiled DXIL and Slang IR. During precompilation, module IR is analyzed to find public functions which appear to be capable of being compiled as HLSL, and those functions are given a HLSLExport decoration, ensuring they are emitted as HLSL and preserved in the precompiled DXIL blob. The IR for those functions is then tagged with a new decoration AvailableInDXIL, which marks that their implementation is present in the embedded DXIL blob. The DXIL blob is attached to the IR as before, inside a EmbeddedDXIL BlobLit instruction. The logic that determines whether or not functions should be precompiled to DXIL is a placeholder at this point, returning true always. A subsequent change will add selection criteria. During module linking, the full module IR is available, as well as the optional EmbeddedDXIL blob. The IR for functions implemented by the blob are tagged with AvailableInDXIL in the module IR. After linking the IR for all modules to program level IR, the IR for the functions marked AvailableInDXIL are deleted from the linked IR, prior to emitting HLSL and compiling linking the result. This change also changes the point of time when the module IR is checked for EmbeddedDXIL blobs. Instead of happening at load time as before, it happens during immediately before final linking, meaning that the blob does not need to be independently stored with the module separate from the IR as was done previously. Work on #4792 * Clean up debug prints * Call isSimpleHLSLDataType stub * Address feedback on precompiled dxil support Allow for IR filtering both before and after linking. Only mark AvailableInDXIL those functions which pass both filtering stages. Functions are corrlated using mangled function names. Rather than delete functions entirely when linking with libraries that include precompiled DXIL, instead convert the IR function definitions to declarations by gutting them, removing child blocks. * Use artifact metadata and name list instead of linkedir hack * Use String instead of UnownedStringSlice * Update tests * Renaming * Minor edits * Don't fully remove functions post-link * Unexport before collecting metadata
* Tuple swizzling, concat, comparison and `countof`. (#4856)Yong He2024-08-19
| | | | | | | | | | | * Tuple swizzling and element access. * Update proposal status. * Cleanup. * Fix merrge error. * Address review.
* Prevent hoisting of non-hoistable instructions in non-function values with ↵Ellie Hermaszewska2024-06-01
| | | | code (#4250)
* SPIRV compiler performance fixes. (#3258)Yong He2023-10-04
| | | | | | | | | | | | | | | * SPIRV compiler performance fixes. * Cleanup. * update project files * Cleanup debug code. * Make redundancy removal non-recursive. --------- Co-authored-by: Yong He <yhe@nvidia.com>
* Do not move movable insts in fuse-satcoop (#3221)Ellie Hermaszewska2023-09-21
| | | | | | | | | * Do not move movable insts in fuse-satcoop * Add case for IRCall in isMovableInst --------- Co-authored-by: Yong He <yonghe@outlook.com>
* Disable code motion for expensive insts (call & div) (#3042)Sai Praveen Bangaru2023-08-03
| | | | | | | * Disable code motion for expensive insts (call & div) The current redundancy removal pass does not consider control-flow within loops and as a result can sometimes move dynamic dispatch code outside their switch blocks, if they are nested in a single-iter-loop. * Update liveness.slang.expected
* Simplify Lookup and improve compiler performance. (#2996)Yong He2023-07-18
| | | | | | | | | | | | | | | | | | | | | | | | | * Simplify lookup. * Various bug fixes. * Report type dictionary size in perf benchmark. * Remove type duplication. * increase initial dict size. * Bug fix. * Fix bugs. * Fixup. * Revert type legalization looping. * Fix specialization pass. --------- Co-authored-by: Yong He <yhe@nvidia.com>
* Pool inst worklists and hashsets to avoid rehashing. (#2982)Yong He2023-07-12
| | | Co-authored-by: Yong He <yhe@nvidia.com>
* Use scratchData on `IRInst` to replace HashSets. (#2978)Yong He2023-07-12
| | | | | | | | | | | | | | | * Use scratchData on `IRInst` to replace HashSets. * Update test results. * Initialize scratchData. * Update autodiff documentation. * Use enum instead of bool. --------- Co-authored-by: Yong He <yhe@nvidia.com>
* SSA Register Allocation improvements. (#2857)Yong He2023-04-28
| | | | | | | | | | | * SSA Register Allocation improvements. * Fix. * Rename `Use`->`UseOrPseudoUse`. --------- Co-authored-by: Yong He <yhe@nvidia.com>
* Fix most of the disabled warnings on gcc/clang (#2839)Ellie Hermaszewska2023-04-26
|
* Dictionary using lowerCamel (#2835)jsmall-nvidia2023-04-25
| | | | | | | | | | | | | | | | | | | | | | | | | * #include an absolute path didn't work - because paths were taken to always be relative. * WIP lowerCamel Dictionary. * WIP more lowerCamel fixes for Dictionary. * Add/Remove/Clear * GetValue/Contains * Fix tabs in dictionary. Count -> getCount * Fix fields with caps. * Key -> key Value -> value Use m_ for members where appropriate. Use lowerCamel in linked list. * Some small fixes/improvements to Dictionary. * Kick CI.
* Refactor checkpointing policy and availability pass. (#2826)Yong He2023-04-21
| | | Co-authored-by: Yong He <yhe@nvidia.com>
* Fix optimization pass not converging. (#2725)Yong He2023-03-23
| | | | | | | | | | | * Fix optimization pass not converging. * Fix. * Fix tests. --------- Co-authored-by: Yong He <yhe@nvidia.com>
* Detect and deduplicate read-only resource access. (#2680)Yong He2023-02-27
| | | | | | | | | | | * Detect and deduplicate read-only resource access. * Fix tests. * Fix tests. --------- Co-authored-by: Yong He <yhe@nvidia.com>
* More control flow simplifications. (#2673)Yong He2023-02-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * More control flow and Phi param simplifications. * Fix. * Fix gcc error. * Fix. * More IR cleanup. * Fix bug in phi param dce + ifelse simplify. * Propagate and DCE side-effect-free functions. * Enhance CFG simplifcation to remove loops with no side effects. * Fix. * Fixes. * Fix tests. Add [__AlwaysFoldIntoUseSite] for rayPayloadLocation. * More cleanup. * Fixes. * Fix. --------- Co-authored-by: Yong He <yhe@nvidia.com>
* Overhaul global inst deduplication and cpp/cuda backend. (#2654)Yong He2023-02-16
| | | | | | | | | * Overhaul global inst deduplication and cpp/cuda backend. * Update IR documentation. --------- Co-authored-by: Yong He <yhe@nvidia.com>
* Arithmetic simplifications and more IR clean up logic. (#2632)Yong He2023-02-07
|
* Unify UpdateField and UpdateElement with access chain. (#2611)Yong He2023-01-25
| | | | | | | * Unify UpdateField and UpdateElement with access chain. * Fix warnings. Co-authored-by: Yong He <yhe@nvidia.com>
* Reimplement address elimination. (#2605)Yong He2023-01-24
| | | | | | | | | * Reimplement address elimination pass. * Fix error. * Update test references. Co-authored-by: Yong He <yhe@nvidia.com>
* Full address insts elimination for backward autodiff. (#2604)Yong He2023-01-23
Co-authored-by: Yong He <yhe@nvidia.com>