summaryrefslogtreecommitdiff
path: root/source/slang/slang-ir-metal-legalize.cpp
AgeCommit message (Collapse)Author
2025-09-30Enhance buffer load specialization pass to specialize past field extracts. ↵Yong He
(#8547) This allows us to specialize functions whose argument is a sub element of a constant buffer, instead of being only applicable to entire buffer element. Closes #8421. This change also implements a proper heuristic to determine when to specialize the calls and defer the buffer loads. This PR addresses a pathological case exposed in `slangpy\slangpy\benchmarks\test_benchmark_tensor.py`, which used to take 27ms to finish, and now takes 1.25ms. For example, given: ``` struct Bottom { float bigArray[1024]; [mutating] void setVal(int index, float value) { bigArray[index] = value; } } struct Root { Bottom top[2]; [mutating] void setTopVal(int x, int y, float value) { top[x].setVal(y, value); } } RWStructuredBuffer<Root> sb; [shader("compute")] [numthreads(1, 1, 1)] void compute_main(uint3 tid: SV_DispatchThreadID) { sb[0].setTopVal(1, 2, 100.0f); } ``` We are now able to specialize the call to `setTopVal` into: ``` void compute_main(uint3 tid: SV_DispatchThreadID) { setTopVal_specialized(0, 1, 2, 100.0f); } void setTopVal_specialized(int sbIdx, int x, int y, float value) { Bottom_setVal_specialized(sbIdx, x, y, value); } void Bottom_setVal_specialized(int sbIdx, int x, int y, float value) { sb[sbIdx].top[x].bigArray[y] = value; } ``` And get rid of all unnecessary loads. Achieving this requires a combination of function call specialization and buffer-load-defer pass. The buffer-load-defer pass has been completely rewritten to be more correct and avoid introducing redundant loads. This PR also adds tests to make sure pointers, bindless handles, and loads from structured buffer or constant buffers works as expected.
2025-09-30Rewriting the lower-buffer-element-type pass to avoid unnecessary ↵Yong He
packing/unpacking. (#8526) Part of the effort to improve the performance of generated SPIRV code. The existing lower-buffer-element-type pass works by loading the entire buffer element content from memory, and translate it to logical type stored in a local variable at the earliest reference of a buffer handle. This means that is can generate inefficient code that reads more than necessary. Consider this example: ``` struct BigStruct { bool values[1024]; } ConstantBuffer<BigStruct> cb; void test(BigStruct v) { if (v.values[0]) { printf("ok"); } } [numthreads(1,1,1)] void computeMain() { test(cb); } ``` In IR, the `computeMain` function before lower-buffer-element-type pass is something like following: ``` func test: %v = param : BigStruct %barr = fieldExtract(%v, "values") %element = elementExtract(%barr, 0) ... // uses %element func computeMain: %v = load(cb) call %test %v ``` The existing lower-buffer-element-type pass will rewrite the bool array in `BigStruct` into `int` array so it is legal in SPIRV. However, it does so by inserting the translation on the first `load` of the constant buffer: ``` struct BigStruct_std430 { int values[1024]; } var cb : ConstantBuffer<BigStruct_std430>; func computeMain: %tmpVar : var<BigStruct> call %unpackStorage(%tmpVar, cb) %v : BigStruct = load %tmpVar call %test %v ``` This means that the entire array will be loaded and translated to int, before calling `test`, which only uses one element. It turns out that the downstream compiler isn't always able to optimize out this inefficient translation/copy. This PR completely rewrites the way buffer-element-type lowering is handled to avoid producing this inefficient code. It works in two parts: first we turn on the `transformParamsToConstRef` pass for SPIRV target as well, so we will translate the `test` function to take the `v` parameter as `constref`. The second part is a redesigned buffer-element-type pass that defers the storage-type to logical-type translation until a value is actually used by a `load` instruction. In this example, after `transformParamsToConstRef`, the IR is: ``` func test: %v = param : ConstRef<BigStruct> %barr = fieldAddr(%v, "values") %elementPtr = elementAddr(%barr, 0) %element = load(%elementPtr) ... // uses %element func computeMain: call %test %cb ``` The new `buffer-element-type-lowering` pass will take this IR, and insert translation at latest possible time across the entire call graph, and translate the IR into: ``` func test: %v = param : ConstRef<BigStruct_std430> %barr = fieldAddr(%v, "values") %elementPtr : ptr<int> = elementAddr(%barr, 0) %element_int = load(%elementPtr) %element = cast(%element_int) : %bool ... // uses %element func computeMain: call %test %cb ``` In this new IR, there is no longer a load and conversion of the entire array. See new comment in `slang-ir-lower-buffer-element-type.cpp` for more details of how the pass works. This PR also address many other issues surfaced by turning on `transformParamsToConstRef` pass on SPIRV backend. --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-09-17Diagnostic for metal ref mesh output assignment (#8365)James Helferty (NVIDIA)
When slang detects assignment to a mesh output reference on metal, generate a diagnostic message. (Metal mesh shader outputs must be assigned via 'set' instead of 'ref'.) Fixes #7498
2025-07-11Fix metal segfault by check vectorValue before accessing (#7688)Gangzheng Tong
* Check vectorValue before accessing * Fix metal segfault by using IRElementExtract for general vector handling Address review comments by replacing IRMakeVector-specific code with IRElementExtract to handle any vector instruction type (IRIntCast, etc). This makes the code more robust and fixes cases where float2(1,2) creates IRIntCast instead of IRMakeVector. Co-authored-by: Yong He <csyonghe@users.noreply.github.com> * Fix sign comparison warning in metal legalize Cast originalElementCount->getValue() to UInt to avoid comparison between signed and unsigned integers. Co-authored-by: Yong He <csyonghe@users.noreply.github.com> --------- Co-authored-by: Yong He <yonghe@outlook.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Yong He <csyonghe@users.noreply.github.com>
2025-04-14Fix matrix division by scalar for Metal and WGSL targets (#6752)Darren Wihandi
* Fix matrix division by scalar for Metal and WGSL targets * Add tests * Minor fix * Fix compilation error * Convert to multiplication for WGSL * Minor cleanup --------- Co-authored-by: Yong He <yonghe@outlook.com>
2025-01-24Fix depth texture sampling on Metal. (#6168)Yong He
2025-01-15Reuse code for Metal and WGSL entry point legalization (#6063)Darren Wihandi
* Refactor to reuse common for metal and wgsl entry point legalization * refactor system val work item * refactor simplify user names * clean up fix semantic field of struct * improve code layout * split wgsl/metal to seperate classes and cleanup * remove extra includes * remove dead code comments * minor cleanup * squash merge from master and resolve conflict * apply metal spec const thread count changes * Revert "apply metal spec const thread count changes" This reverts commit c42d707fd25ee0328598650d3235cd2322810ccc. * Revert "squash merge from master and resolve conflict" This reverts commit 06db88ef7001bdfe93fb23af35af0d026b255dee. * Merge remote-tracking branch 'origin/master' * apply metal spec const thread count changes * Revert "apply metal spec const thread count changes" This reverts commit 3b9e6f53cee2e6076ac2b7a0d015a1ed2cbbd627. * Revert "Merge remote-tracking branch 'origin/master'" This reverts commit 99869d573a46dadeb24445405f5a1e37a8e03d0d. * apply metal spec const thread count changes --------- Co-authored-by: Yong He <yonghe@outlook.com>
2025-01-14Implement specialization constant support in numthreads / local_size (#5963)Julius Ikkala
* Allow using specialization constants in numthreads attribute * Add support for GLSL local_size_x_id syntax * Fix overeager specialization constant parsing * Add diagnostics for specialization constant numthreads * Remove unused variable * Fix local_size_x_id not finding existing specialization constant * Allow materializeGetWorkGroupSize to reference specialization constants * Use SpvOpExecutionModeId for modes that require it * Cleanup specialization constant numthreads code * Add tests for specialization constant work group sizes * Fix implicit Slang::Int -> int32_t cast * Fix querying thread group size in reflection API --------- Co-authored-by: Yong He <yonghe@outlook.com>
2025-01-10WGSL: Convert signed vector shift amounts to unsigned (#6023)Anders Leino
* WGSL: Fixes for signed shift amounts - Handle the case of vector shift amounts - Closes #5985 - Move handling of scalar case from emit to legalization - Add tests for bitshifts. * Move the binary operator legalization function to a common place * Metal: Legalize binary operations Closes #6029. * Fix Metal filecheck test The int shift amounts are now converted to unsigned. * format code --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com> Co-authored-by: Yong He <yonghe@outlook.com>
2025-01-07Lower varying parameters as pointers instead of SSA values. (#5919)Yong He
* Add executable test on matrix-typed vertex input. * Fix emit logic of matrix layout qualifier. * Pass fragment shader varying input by constref to allow EvaluateAttributeAtCentroid etc. to be implemented correctly.
2024-12-19Add base vertex and base instance system values (#5918)Darren Wihandi
* Add base vertex and base instance system values * Fixed incorrect stage in tests
2024-11-05Move switch statement bodies to their own lines (#5493)Ellie Hermaszewska
* Move switch statement bodies to their own lines * format --------- Co-authored-by: Yong He <yonghe@outlook.com>
2024-10-29formatEllie Hermaszewska
* format * Minor test fixes * enable checking cpp format in ci
2024-10-09Metal: Texture write fix (#4952)Dynamitos
2024-08-29Fix typo SV_DomainLsocation (#4960)Jay Kwak
* Fix typo SV_DomainLsocation * Fix CI failures --------- Co-authored-by: Yong He <yonghe@outlook.com>
2024-08-28Metal: Fix metal primitive_id semantic (#4951)Dynamitos
2024-08-28Metal: Mesh Shaders (#4280)Dynamitos
* Metal: mesh shading skeleton * Metal: fixing mesh payload * Metal: improving mesh shader indices output * Metal: Implementing conditional mesh output set * Metal: Trying to not break other backends * Metal: trying to fix mesh output set * Metal: Fixing MeshOutputSet usages * Metal: Fixing vertex and primitive semantics * Metal: Fixing code style * Metal: Fixed hlsl indices set * Fixed HLSL mesh output set disappearing and GLSL mesh output crashing * Metal: Adjusting task test matching --------- Co-authored-by: Yong He <yonghe@outlook.com>
2024-08-19Remove using SpvStorageClass values casted into AddressSpace values (#4861)Ellie Hermaszewska
* Remove using SpvStorageClass values casted into AddressSpace values Also removes support for specific storage classes in __target_intrinsic snippets * remove SLANG_RETURN_NEVER macro * squash warnings * Make nonexhaustive switch statement error on gcc * Add SLANG_EXHAUSTIVE_SWITCH_BEGIN/END macros --------- Co-authored-by: Yong He <yonghe@outlook.com>
2024-07-10Specialize address space during spirv legalization. (#4600)Yong He
* Specialize address space during spirv legalization. * Fix. * Fix building doc. * Fix cmake. * Update assert.
2024-07-10Fixes to Metal Input parameters and Output value input/output semantics (#4536)ArielG-NV
* initial change to test with CI for CPU/CUDA errors * Fixes to Metal Input parameters and Output values Note: 1. Flattening a struct is the process of making a struct have 0 struct/class members. Changes: 1. Separated `legalizeSystemValueParameters`. This was done to make it easier to run `legalizeSystemValue` 1 system-value at a time to simplify logic. This change is optional and can be undone if not preferred. 2. Wrap everything inside a Metal legalization context. This was done since it simplifies a lot of logic and will be required for #4375 3. Created `convertSystemValueSemanticNameToEnum` and expanded the existing System-Value Enum system. This allows (sometimes) faster comparisons and helps prepare code for porting into `slang-ir-legalize-varying-params.cpp` (#4375) 4. Added a more dynamic `legalizeSystemValue` system so more than 2 types can be targeted for legalization. This is required to legalize `output`. There is still no preference for any converted type, the first valid type will be converted to. 5. Flatten all input(`flattenInputParameters`)/output(part of `wrapReturnValueInStruct`) structs and assign semantics accordingly. 6. Semantics when legalized have no specific logic other than to: 1. avoid overlapping semantics 2. Prefer assigning explicit semantics specified by a user. 7. Fixed some issue with incorrect output semantics if not a fragment stage (when there are not any assigned semantics) * change metallib test to the correct metal test * comment code & cleanup -- Did not address all review Added comments for clarity + cleaned up some odd areas which were messy * Add comment to `fixFieldSemanticsOfFlatStruct` I found `fixFieldSemanticsOfFlatStruct` to still be confusing at a cursory glance. Added comments to make the function be more understandable. * white space * Address review comments 1. Fix semantic propegation. 2. Fix how we map struct fields of the flat struct to struct. This is specifically important for if reusing the same struct twice since struct member info is not unique per struct instance used. * Fix semantic legalization by adding TreeMap Add TreeMap to allow proper sorted-object data iteration. * Fix some compile issues * try to fix gcc compile error * compile error * fix logic bug in treeMap iterator next-semantic setter * fix vsproject filters * filter file syntax error * remove need of a context to make copies stable * Rename treemap to the more appropriate name of "treeset", adjust code comments accordingly. * remove custom type `TreeSet` and use `std::set` * remove TreeMap fully --------- Co-authored-by: Yong He <yonghe@outlook.com>
2024-06-13[Metal] Support SV_TargetN. (#4390)Yong He
* [Metal] Support SV_TargetN. * Fix.
2024-06-13Implement for metal `SV_GroupIndex` (#4385)ArielG-NV
* Implement for metal `SV_GroupIndex` 1. If we don't have `sv_GroupThreadId` available we create one using `SV_GroupIndex`s location data. 2. We emit code emulating `sv_GroupThreadId` from the same logic that CUDA/CPP uses. * address most review comments Addressed all but two: [1](https://github.com/shader-slang/slang/pull/4385#discussion_r1639058473) and [2](https://github.com/shader-slang/slang/pull/4385#issuecomment-2166934855) I want to enable tests and be sure there is no bugs using CI before I redesign the code so I have a working fallback. * address comment, enable tests enable now functioning tests due to `SV_GroupIndex` working with metal * syntax error with groupThreadID search did `= param` instead of `= i.param` * add `sv_groupid` for test + test fixes * disable test that won't work regardless
2024-06-12Fix emit logic for getElementPtr. (#4362)Yong He
* Fix emit logic for getElementPtr. * Legalize `getElementPtr(vector, id)` for metal. * Fix compiler error. * Fix warnings. * Fix test. * Fix.
2024-06-08Metal system value overhaul. (#4308)Yong He
2024-06-02Metal Task Shader payload (#4238)Dynamitos
2024-05-22Fix all Clang-14 warnings (#4203)ArielG-NV
* fix all Clang-14 warnings * remove a clang-14 warning fix because it is a MSVC warning...
2024-05-08Metal: propagate and specialize address space. (#4137)Yong He
2024-04-30Metal: Vertex/Fragment builtin and layouts. (#4044)Yong He
* Metal: Vertex/Fragment builtin and layouts. * Fix. * Fix test. * Emit user semantic on vertex/fragment attributes.