summaryrefslogtreecommitdiff
path: root/source/slang/slang-emit.cpp
AgeCommit message (Collapse)Author
2025-10-16Immutable access qualifier for pointers and use `__ldg` on cuda. (#8710)Yong He
This PR implements `Access.Immutable` to allow pointers to immutable data. The new type `ImmutablePtr<T>` is defined as an alias of `Ptr<T, Address.Immutable>`. By forming a immutable pointer, the programmer is conveying to the compiler that the data at the pointer address will never change during the execution of the current program. Therefore loads from immutable pointers can be deduplicated by the compiler, and will translate to `__ldg` when generating code for CUDA. The SPIRV backend is not changed in this PR, since the current SPIRV spec makes it very difficult to specify loads from immutable address without generating tons of wrappers and boilerplate type declarations. We would like to see the spec evolved a bit to around its support of `NonWritable` physical storage pointers or immutable loads before we attempt to express such immutability in SPIRV. For now we simply emit ordinary pointers and loads when generating spirv. --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-10-09Improve perf with `-separate-debug-info` (#8670)Jay Kwak
When Slang form a new spirv code without the debug info, List container had to reserve the memory space before adding items in it. This improves the given repro test time from 56 minutes to 6 minutes.
2025-10-10Defer `IRCastStorageToLogicalDeref` in lowerBufferElementType pass. (#8668)Yong He
Fix a regression on metal test. In `lowerBufferElementTypeToStorageType` pass, not only we want to defer an argument that is `CastStorageToLogical` to the callee, but also apply the same defer logic to `CastStorageToLogicalDeref` as well. Because `CastStorageToLogicalDeref` will appear as argumnet if `lowerBufferElementTypeToStorageType` is run before we apply the `in->borrow` transformation pass, which is the case for metal parameter block legalization.
2025-10-10Small fix to buffer load specialization pass to allow more specialization to ↵Yong He
happen. (#8653) This allows us to further cleanup unnecessary copies in the target code we generate. Part of effort of #8652.
2025-10-03Fix legalization crash when processing metal parameter blocks. (#8591)Yong He
Closes #7606. When Slang compile for a bindful target, we will run the resource type legalization pass to hoist resource typed struct fields outside of the struct type and define them as global parameters and passing them around via dedicated function parameters. When we compile for a bindless target, we don't run this pass. However, Metal is a hybrid bindful and bindless target. We need to run type legalization for the constant buffer, but skip type legalization for parameter block. The previous attempt to support this behavior is to hack the type legalization pass to return `LegalVal::simple` when it sees a `ParameterBlock<T>`. However, whenever the code is accessing `parameterBlock.someNestedField`, the type of the nested field may get a `LegalType::tuple`, and now we will run into inconsistent scenarios where we have a `LegalVal::simple` on the operand val, and but the legalization logic is expecting that val to be a `LegalType::tuple`. This breaks a lot of assumptions and invariants in the type legalization pass, resulting unstable/fragile behavior. To systematically solve this problem, this change generalizes the existing legalize buffer element type pass to translate `ParameterBlock<Texture2D>` (and similar cases) to `ParameterBlock<Texture2D.Handle>`. So that such parameter block will always be legalized to `LegalType:::simple` during type legalization, and we will never run into any inconsistent cases. This allowed us to get rid of the hacky logic in the type legalization pass to try to workaround the inconsistencies.
2025-09-30Enhance buffer load specialization pass to specialize past field extracts. ↵Yong He
(#8547) This allows us to specialize functions whose argument is a sub element of a constant buffer, instead of being only applicable to entire buffer element. Closes #8421. This change also implements a proper heuristic to determine when to specialize the calls and defer the buffer loads. This PR addresses a pathological case exposed in `slangpy\slangpy\benchmarks\test_benchmark_tensor.py`, which used to take 27ms to finish, and now takes 1.25ms. For example, given: ``` struct Bottom { float bigArray[1024]; [mutating] void setVal(int index, float value) { bigArray[index] = value; } } struct Root { Bottom top[2]; [mutating] void setTopVal(int x, int y, float value) { top[x].setVal(y, value); } } RWStructuredBuffer<Root> sb; [shader("compute")] [numthreads(1, 1, 1)] void compute_main(uint3 tid: SV_DispatchThreadID) { sb[0].setTopVal(1, 2, 100.0f); } ``` We are now able to specialize the call to `setTopVal` into: ``` void compute_main(uint3 tid: SV_DispatchThreadID) { setTopVal_specialized(0, 1, 2, 100.0f); } void setTopVal_specialized(int sbIdx, int x, int y, float value) { Bottom_setVal_specialized(sbIdx, x, y, value); } void Bottom_setVal_specialized(int sbIdx, int x, int y, float value) { sb[sbIdx].top[x].bigArray[y] = value; } ``` And get rid of all unnecessary loads. Achieving this requires a combination of function call specialization and buffer-load-defer pass. The buffer-load-defer pass has been completely rewritten to be more correct and avoid introducing redundant loads. This PR also adds tests to make sure pointers, bindless handles, and loads from structured buffer or constant buffers works as expected.
2025-09-30Rewriting the lower-buffer-element-type pass to avoid unnecessary ↵Yong He
packing/unpacking. (#8526) Part of the effort to improve the performance of generated SPIRV code. The existing lower-buffer-element-type pass works by loading the entire buffer element content from memory, and translate it to logical type stored in a local variable at the earliest reference of a buffer handle. This means that is can generate inefficient code that reads more than necessary. Consider this example: ``` struct BigStruct { bool values[1024]; } ConstantBuffer<BigStruct> cb; void test(BigStruct v) { if (v.values[0]) { printf("ok"); } } [numthreads(1,1,1)] void computeMain() { test(cb); } ``` In IR, the `computeMain` function before lower-buffer-element-type pass is something like following: ``` func test: %v = param : BigStruct %barr = fieldExtract(%v, "values") %element = elementExtract(%barr, 0) ... // uses %element func computeMain: %v = load(cb) call %test %v ``` The existing lower-buffer-element-type pass will rewrite the bool array in `BigStruct` into `int` array so it is legal in SPIRV. However, it does so by inserting the translation on the first `load` of the constant buffer: ``` struct BigStruct_std430 { int values[1024]; } var cb : ConstantBuffer<BigStruct_std430>; func computeMain: %tmpVar : var<BigStruct> call %unpackStorage(%tmpVar, cb) %v : BigStruct = load %tmpVar call %test %v ``` This means that the entire array will be loaded and translated to int, before calling `test`, which only uses one element. It turns out that the downstream compiler isn't always able to optimize out this inefficient translation/copy. This PR completely rewrites the way buffer-element-type lowering is handled to avoid producing this inefficient code. It works in two parts: first we turn on the `transformParamsToConstRef` pass for SPIRV target as well, so we will translate the `test` function to take the `v` parameter as `constref`. The second part is a redesigned buffer-element-type pass that defers the storage-type to logical-type translation until a value is actually used by a `load` instruction. In this example, after `transformParamsToConstRef`, the IR is: ``` func test: %v = param : ConstRef<BigStruct> %barr = fieldAddr(%v, "values") %elementPtr = elementAddr(%barr, 0) %element = load(%elementPtr) ... // uses %element func computeMain: call %test %cb ``` The new `buffer-element-type-lowering` pass will take this IR, and insert translation at latest possible time across the entire call graph, and translate the IR into: ``` func test: %v = param : ConstRef<BigStruct_std430> %barr = fieldAddr(%v, "values") %elementPtr : ptr<int> = elementAddr(%barr, 0) %element_int = load(%elementPtr) %element = cast(%element_int) : %bool ... // uses %element func computeMain: call %test %cb ``` In this new IR, there is no longer a load and conversion of the entire array. See new comment in `slang-ir-lower-buffer-element-type.cpp` for more details of how the pass works. This PR also address many other issues surfaced by turning on `transformParamsToConstRef` pass on SPIRV backend. --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-09-18Fix CUDA global variable initialization with constructor calls (#8340)Harsh Aggarwal (NVIDIA)
Fix CUDA global variable initialization with constructor calls Resolves CUDA compilation failure where global variables with struct constructor initialization generated illegal `__device__` variable runtime initialization. **Problem:** ```cuda // Generated invalid CUDA code: __device__ static const Stuff_0 gStuff_0 = Stuff_x24init_0(args...); // Error: "dynamic initialization is not supported for a __device__ variable" Root Cause Discovered: Through extensive debugging, found that moveGlobalVarInitializationToEntryPoints pass only handled kIROp_GlobalVar instructions, but global constants with constructor calls appeared as kIROp_Call instructions at module scope. Solution: 1. IR Pipeline Fix: Extended moveGlobalVarInitializationToEntryPoints to detect and transform module-level constructor calls into proper global variables with entry-point initialization 2. Field Access Fix: Enhanced kIROp_FieldExtract logic to emit correct -> syntax for pointer types and address-of operations 3. Constructor Emission: Added CUDA-specific handling for constructor calls Architecture: - Transforms let %gStuff = call %Constructor(...) into kernel context initialization - Moves runtime initialization from global scope to entry-point execution - Follows CUDA best practices for global state management Files: - source/slang/slang-ir-explicit-global-init.cpp: Extended IR transformation pass - source/slang/slang-emit-c-like.cpp: Enhanced field access and foldable value logic - source/slang/slang-emit-cuda.cpp: Added CUDA-specific field extraction handling Result: // Now generates proper CUDA code: struct KernelContext_0 { Stuff_0 gStuff_1; }; // Runtime initialization in entry point: kernelContext_1.gStuff_1 = constructor_call(); Fixes: tests/compute/type-legalize-global-with-init.slang
2025-09-10Fix segfault in SPIR-V header processing in SpirvInstructionHelper (#8428)Gangzheng Tong
The `SpirvInstructionHelper::loadBlob()` method could segfault when calling `m_headerWords.addRange()` if the SPIR-V blob contained insufficient data for the required 5-word header. To reproduce, run ``` ./build/Debug/bin/slangc.exe tests/modules/environment.slang -o tests/modules/environment.slang-module -target spirv -separate-debug-info (0): error 57004: output SPIR-V contains no exported symbols. Please make sure to specify at least one entrypoint. Segmentation fault ``` The error is expected, but the `Segmentation fault` is not. This PR adds the check to ensure the SPIR-V blob has at least `SPV_INDEX_INSTRUCTION_START * sizeof(SpvWord)` bytes (20 bytes minimum) before attempting to process the header words. Related to: https://github.com/shader-slang/slang/issues/7547
2025-09-10CUDA: Fix compiler crash with unsized array field - nonuniformres-as-… (#8380)Harsh Aggarwal (NVIDIA)
…function-parameter.slang #8315 Root Cause: CUDA compilation crashed with `assert failure: !seenFinalUnsizedArrayField` because unsized arrays like `RWStructuredBuffer<uint> globalBuffer[]` were not the final field in generated parameter structs, violating the layout constraint in slang-ir-layout.cpp. Fix: Extended `collectGlobalUniformParameters` to automatically reorder struct fields for CUDA targets - regular fields first, unsized arrays last. Other targets preserve original order. Impact: - Enables CUDA support for nonuniform resource indexing as function parameters - Zero impact on existing GLSL/HLSL/SPIRV targets - Automatic handling - no manual parameter reordering required Files: slang-emit.cpp, slang-ir-collect-global-uniforms.cpp/.h, test file --------- Co-authored-by: slangbot <ellieh+slangbot@nvidia.com> Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com> Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com>
2025-09-03Diagnose on structured buffers containing resources (#8222)Ellie Hermaszewska
closes https://github.com/shader-slang/slang/issues/3313
2025-08-07Initial copy elision pass (#8042)ArielG-NV
Fixes #7574 Changes: * Add an initial (fairly simple) optimization pass which is able to eliminate redundant copies. * Our current existing optimizer passes remove redundant load/store very robustly, this pass will focus on other cases of copy elimination * Primary approach is to make all functions which are `in T` and `T` is trivial to copy into a `__constref T`. We then (depending on scenario) manually insert a variable+load if a pass-by-reference is not possible; otherwise we pass by `constref`. * Added optimizations to eliminate redundant code which causes `constref` to fail to compile --------- Co-authored-by: Harsh Aggarwal <haaggarwal@nvidia.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: slangbot <ellieh+slangbot@nvidia.com> Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-07-30Lowering unsupported matrix types for GLSL/WGSL/Metal targets (#7936)venkataram-nv
* Add emit cases for WGSL and GLSL * Fix compilation warnings Modify short cutting test to reflect change in emit logic Lower matrix for metal as well Add emit matrix logic for metal Fix compiler warning Brace initializer for lowered matrices Fix compiler warnings * Tests for metal * Fix mult, any, and determinant * Fix matrix-matrix multiplication * Fix mat mul to be element-wise * Fix compiler warning * Move makeMatrix to legalization * Move unary and binary arithmetic operator lowering to legalization * Remove emit logic and move final comparison operators to legalization * Handle vector/matrix negation for WGSL * Restore older SPIR-V emit logic * Address PR comments * Revert to zero minus for negation * format code --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-07-29Detect uses of uninitialized resource fields (#7962)Ellie Hermaszewska
Closes https://github.com/shader-slang/slang/issues/3386
2025-07-18Lower int/uint/bool matrices to arrays for SPIRV (#7687)venkataram-nv
* Add tests for expected behaviour * Allow matrix types in logical or/and * Legalize int/bool matrix types and construction with makeMatrix * Legalize uint matrices and operations * Limit testing to only SPIRV * Better tests for int and bool * Add test for uint * Remove GLSL tests * Remove old test for diagnosing int matrices * Emit SPIRV directly in tests * format code * Address PR comments * Improve testing * Address PR comments * format code * Add tests for matrix intrinsic operations * Move matrix lowering to dedicated legalization pass * Fix compiler warning * Remove signal again * Reorder matrix and vector legalization * Fix formatting * Add shift and comparison tests --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-07-18Add bounds checking for out-of-bounds array access with constant indices (#7814)Copilot
* Initial plan * Implement out-of-bounds array access checking Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com> * Add tests and format code for array bounds checking Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com> * Move checkForOutOfBoundAccess to separate file and refactor using InstPassBase Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com> * Optimize out-of-bounds checker to use single IR traversal Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com> * Fix DiagnosticSink forward declaration from struct to class Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com> * Update 0-array-1 test to use runtime indices to avoid bounds checking diagnostic Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com> * Use SV_DispatchThreadID for truly runtime array access in 0-array-1 test Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com>
2025-07-16Strip debug info only for the SpvOpExtInst of debug import type (#7779)Gangzheng Tong
2025-07-02[HLSL, SPIRV_1_3] Hoist OpSelect returning a composite into `if`/`else` (#7594)ArielG-NV
* emit var and hoist out OpSelect if Composite * cleanup comment * address review check for version in spv context use phi node instead of using var move inst's using a list (not in-place modification) * format code --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-07-02Defer immutable buffer loads when emitting spirv. (#7579)Yong He
* Defer immutable buffer loads when emitting spirv. * Fix. * Fix. * Fix. * Fix tests. * Fix test.
2025-07-01extend fiddle to allow custom lua splices in more places (#7559)Ellie Hermaszewska
* Add fkYAML submodule * Generate slang-ir-inst-defs.h from slang-ir-inst-defs.yaml * generate ir-inst-defs.h * neaten things * neaten inst def parser * add rapidyaml submodule * remove fkyaml * remove fkyaml submodule * remove use of ir-inst-defs.h * format and warnings * fix wasm build * tidy * remove rapidyaml * Extend fiddle to allow custom splices in more places * Use lua to describe ir insts * fix * neaten * neaten * neaten * spelling * neaten * comment comment out assert * merge
2025-07-01Add arguments for controlling floating point denormal mode (#7461)aidanfnv
* Implement -fp-denorm-mode slangc arg * Split fp-denorm-mode into 3 args for fp16/32/64 * Remove redundant option categories * Use emitInst for multiple of the same OpExecutionMode * Fix formatting * Remove -denorm any * Re-add option categories * emitinst for ftz * Use enums for type text * Remove extra categories again * Add tests for denorm mode * Move denorm mode to post linking * format code (#8) Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com> * regenerate command line reference (#9) Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com> * Clean up tests * Fix option text * format code (#10) Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com> * Add tests for "any" mode * Return "any" enum if option not set * Simplify emission logic * Add support for generic entrypoints * Move denorm modes to end of CompilerOptionName enum * format code (#11) Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com> * Move new enum members to before CountOf * Add not checks to tests, fix generic test, add functionality tests * Rename denorm to fpDenormal * Clean up functional test * Rename denorm test dir * Fix formatting, regenerate cmdline ref * Fold simple tests into functional tests, add more dxil checks * Remove no-op DX tests, make tests more consistent * Disable VK functionality tests that will fail on the CI configs * Fix formatting * Add comments to disabled tests explaining why --------- Co-authored-by: slangbot <ellieh+slangbot@nvidia.com> Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-06-12Fix intermittent debug failures with Debug build (#7369)Jay Kwak
This PR replaces enable/disable style C function calls with C++ RAII style code. In debug build, when an assertion failed in between enable and disable functions, an exception is thrown and the disable function is not called. RAII style code is safer for an exception
2025-06-06Add command line option for separate debug info (#7178)jarcherNV
* Add command line option for separate debug info Add command line arg -separate-debug-info which, if provided, produces both a .spv and a .dbg.spv file. The .dbg.spv file contains full debug info and the .spv file has all debug info stripped out. Also add a DebugBuildIdentifier instruction to store a unique hash in both the output files, so they can be more easily matched together. A matching API is provided to allow using the Slang API to retrieve a base and debug SPIRV as well as the debug build identifier string.
2025-06-04Add legalization for 0-sized arrays. (#7327)Yong He
* Add legalization for 0-sized arrays. * Allow 0-sized arrays in the front-end. * More tests. * Add `Conditional<T, hasValue>` type to core module. * Update toc. * Fix wording. * Update test.
2025-06-04Make interface types non c-style in Slang2026. (#7260)Yong He
* Make interface types non c-style. * Make Optional<T> work with autodiff and existential types. * Fix. * patch behind slang 2026. * Fix warnings. * cleanup. * Fix tests. * Fix. * Fix com interface lowering. * Add comment to test. * regenerate command line reference * Add test for passing `none` to autodiff function. * Fix recording of `getDynamicObjectRTTIBytes`. * Fix nested Optional types. --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-05-23Implement throw & catch statements (#6916)Julius Ikkala
* Implement throw statement It already existed in the IR, so only parsing, checking and lowering was missing. * Initial catch implementation Likely very broken. * Error out when catch() isn't last in scope * Prevent accessing variables from scope preceding catch As those may actually not be available at that point. * Add IError and use it in Result type lowering * Add diagnostic tests * Allow caught throws in non-throw functions * Fix catch propagating between functions & SPIR-V merge issue * Add test for non-trivial error types * Fix MSVC build * Fix invalid value type from Result lowering * Also lower error handling in templates * Lower result types only after specialization * Attempt to disambiguate error enums by witness table * Revert matching by witness, types should be distinct too * Don't assert valueField when getting Result's error value It may not exist if the function returns void, but getting the error value is still legitimate. * Update tests for new error numbers & get rid of expected.txt * Change catch lowering to resemble breaking a loop ... To make SPIR-V happy. * Fix dead catch blocks and invalid cached dominator tree * More SPIR-V adjustment * Lower catch as two nested loops * Add defer interaction test and revert broken defer changes * Fix enum type when throwing literals * Cleanup and bikeshedding * Document error handling mechanism * Fix table of contents * Use boolean tag in Result<T, E> * Use anyValue storage for Result<T,E> * Remove IError * Fix formatting * Eradicate success values from docs and tests * Use parseModernParamDecl for catch parameter * Implement do-catch syntax * Implement catch-all * Fix formatting * Fix marshalling native calls that throw --------- Co-authored-by: Yong He <yonghe@outlook.com>
2025-05-17Fix: Preserve inout param modifications with OptiX IgnoreHit() (#6956)Harsh Aggarwal (NVIDIA)
* Add noreturn attribute to IgnoreHit * Revert "Add noreturn attribute to IgnoreHit" This reverts commit 3cf2354dada71b9a8713b08f3a2e261de4aabfa4. * Fix: Preserve inout param modifications with OptiX IgnoreHit() Issue #6326 identified that in OptiX, when using IgnoreHit() (which maps to the "noreturn" optixIgnoreIntersection() intrinsic), any modifications made to 'inout' parameters within the shader would be lost. This was due to IgnoreHit() preventing the execution of the copy-back operation from the temporary variable (used to implement 'inout' semantics) to the original parameter. This commit introduces a new IR pass, 'undoParameterCopy', specifically for CUDA/OptiX targets to address this. The pass operates as follows: 1. Identifies temporary IR variables created for 'inout' parameters, which are now decorated with 'TempCallArgVarDecoration'. 2. Maps these temporary variables back to their original parameter storage (e.g., the OptiX payload pointer). 3. Replaces all uses of the temporary variable directly with the original parameter pointer. 4. Removes the temporary variable declaration and its initializing store (which copied from the original parameter to the temporary). By transforming the IR to operate directly on the original parameter storage before any potential call to IgnoreHit(), this fix ensures that all modifications are preserved, correctly resolving issue #6326. The pass is integrated into the compilation flow for relevant targets. * Refactor(IR): Optimize GetOptiXRayPayloadPtr for better DCE/CSE To allow for more effective dead code elimination (DCE) and common subexpression elimination (CSE) of `getOptiXRayPayloadPtr` instructions, this commit: - Marks `kIROp_GetOptiXRayPayloadPtr` as side-effect-free within `IRInst::mightHaveSideEffects` (in `slang-ir.cpp`). - Flags `GetOptiXRayPayloadPtr` as `HOISTABLE` in its definition within `slang-ir-inst-defs.h`. This addresses scenarios where multiple, potentially redundant, calls to `getOptiXRayPayloadPtr` might appear in the IR, allowing optimizers to produce cleaner and potentially more efficient code for OptiX targets. This change supports efforts to refine IR handling for ray-tracing shader stages. * Remove debugging code * Refactor UndoParameterCopyVisitor for improved performance - Optimized IR traversal by combining multiple passes into a single scan - Removed unnecessary dictionary, immediately replace uses when a temp var is found - Reduced duplicate code paths by checking for both temp vars and redundant stores in one loop - Better handling of the 'changed' flag to ensure DCE only runs when needed - Results in fewer instruction traversals and improved efficiency for large functions * Add Test
2025-05-14Error out on invalid vector sizes (#7076)Darren Wihandi
* Error out on invalid vector sizes * Remove unnecessary include * Fix incorrect assert * Add test
2025-05-10Add debug information for slang inling (#6621)Mukund Keshava
2025-05-03Add IREnumType to distinguish enums from ints and each other (#6973)Julius Ikkala
* Add IREnumType to distinguish enums from ints and each other * Add issue example as test * format code * Add expected test output * Fix peephole optimization hanging No idea why this PR triggered this, but there seems to have been a clear bug here anyway, so may just as well fix it now. * Move enum lowering later * Add linkage decoration to enum type * Use filecheck-buffer instead of expected.txt * Fix comment * Make enum casts actually use IR enum casts They were all BuiltinCasts by accident * Lower enum type before VM * Deal with rate-qualified types in enum cast * Allow any value marshalling for enum types * Handle new enum instructions in a couple more switches * Fix formatting --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-04-28Add Slang Byte Code generation and interpreter. (#6896)Yong He
* Add Slang Byte Code generation and interpreter. * Fix compile issues. * format code * More compile fix. * Fix clang issue. * Fix more clang issues. * Another clang fix. * Fix clang issues. * Fix another clang issue. * Fix wasm build. * Update building.md * Fix test-server. * Fix compile error. * Fix bug. --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-04-09void field rework (#6739)kaizhangNV
* void field rework * move void cleanup pass earlier
2025-04-04Add a loop analysis step to infer the exit values of loop phi parameters. ↵Sai Praveen Bangaru
(#6696) * Initial loop analysis pass * More changes for a single-pass implication propagation * Update slang-ir-autodiff-loop-analysis.cpp * Cleanup + new system for loop analysis * Fixup bugs in loop analysis * Remove some relation types to simplify the analysis. Add test * Remove unused * Address comments * Fix issue with continue loops * Update reverse-loop-exit-value-inference-1.slang * Update reverse-continue-loop.slang
2025-04-02Metal remove void field (#6725)kaizhangNV
* Reapply "Eliminate empty struct on metal target (#6603)" (#6711) This reverts commit bc9dc6557fc0cc3a4c0c2ff27e636940e361cf5d. * Remove argument in make_struct call corresponding to void field This is a follow-up of #6543, where we leave the VoidType field as it in make_struct call during legalization pass. So during cleaning_void IR pass, when we remove "VoidType" from struct, we will have to also clean up the argument corresponding to the "VoidType" field.
2025-03-31Revert "Eliminate empty struct on metal target (#6603)" (#6711)Jay Kwak
This reverts commit b3deec2001ea34e20e9a6af8ddf5cf3866cafac0.
2025-03-26Eliminate empty struct on metal target (#6603)kaizhangNV
* Eliminate empty struct on metal target Close 6573. We previously disabled the type legalization for ParameterBlock on Metal, but Metal doesn't allow empty struct in the argument buffer which is mapped from ParameterBlock, so we will need legalizeEmptyTypes on Metal target. * update test * update function name
2025-03-21Emit errors for missing returns on unsupported targets (#6633)Darren Wihandi
* initial wip * more WIP * preserve old lower behavior * remove unnecessary includes * add test * add no target case in test * fix broken test --------- Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com>
2025-03-17Add error diagnostic vectors and matrices with disallowed element types (#6596)Anders Leino
- Add the diagnostic messages, and code to emit them - Add some tests This helps to address issue #6183.
2025-03-13Add mesh shader output topology checks (#6592)Darren Wihandi
* initial wip * more wip * add test * add unexpected for invalid target * fixups and improve error message * fixups and improve error message * remove incorrect comment --------- Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com>
2025-03-06Update SPIRV-Tools and fix new validation errors. (#6511)Yong He
* Update SPIRV-Tools and fix new validation errors. * Implement pointers for glsl target. * Reworked packStorage/unpackStorage code gen to operate on pointers rather than values.
2025-03-05Support SPIR-V deferred linking option (#6500)cheneym2
The new option "SkipDownstreamLinking" will defer final downstream IR linking to the user application. This option only has an effect if there are modules that were precompiled to the target IR using precompileForTarget(). Until now, the default behavior for SPIR-V was to use deferred linking, and the default behavior for DXIL was to use immediate/internal linking in Slang. This change only affects the SPIR-V behavior such that both deferred and non-deferred linking is supported based on the new option. To support the non-deferred option, Slang will internally call into SPIRV-Tools-link to reconstitute a complete SPIR-V shader program when necessary (due to modules having been precompiled to target IR). Otherwise, if SkipDownstreamLinking is enabled, the shader returned by e.g. getTargetCode() or getEntryPointCode() may have import linkage to the SPIR-V embedded in the constituent modules. Closes #4994 Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-02-28Add WaveGetLane* support for Metal and WGSL (#6371)Darren Wihandi
* support WaveGetLane* for WGSL and Metal * update test and glsl support * address review comments and fix metal test * add missing pragma guard * update test * Revert "update test" This reverts commit f2b97e91c29de154190710580c343bd0764aedbb. * update failing glsl metal test and added new test * make hlsl and glsl outputs similar * update test * disable tests for Metal and cleanup * comment fix * add expected failures * correct expected failures list * remove expected failure * add tests to expected failure --------- Co-authored-by: Yong He <yonghe@outlook.com>
2025-02-23Improve performance when compiling small shaders. (#6396)Yong He
Improve performance when compiling small shaders. Avoid copying witness table entries that are not getting used during linking. Avoid copying auto-diff related decorations and derivative functions during linking, if the user modules doesn't use autodiff. Cache operator overload resolution results on global session, so each new Session doesn't need to repetitively run through overload resolution from scratch.
2025-02-07Use and() and or() functions for logical-AND and OR (#6310)Jay Kwak
* Use and() and or() functions for logical-AND and OR With this commit, Slang will emit function calls to `and()` and `or()` for the logical-AND and logical-OR when the operands are non-scalar and the target profile is SM6.0 and above. This is required change from SM6.0. For WGSL, there is no operator overloadings of `&&` and `||` when the operands are non-scalar. Unlike HLSL, WGSL also don't have `and()` nor `or()`. Alternatively, we can use `select()`.
2025-02-06Support stage_switch. (#6311)Yong He
* Support stage_switch. * Update proposal status. * Fix gl_InstanceID. * Fix.
2025-02-02Add support for WGSL subgroup operations (#6213)Darren Wihandi
* initial work * more work * more work on glsl intrinsics * add subgroup broadcast for glsl * wip add wgsl extension tracking * enable tests, enable extensions and added some todos * format and warning fixes * fix wgsl extension tracker --------- Co-authored-by: Yong He <yonghe@outlook.com>
2025-01-31Respect per-target debug options (#6193)Anders Leino
* Base compiler options for targets on target-specific compiler options Before this change, the target compiler options were based on the linkage-wide compiler options, which where later again inherited from (basically a no-op). With this change, the target-specific compiler options are added first, and then the linkage-wide comnpiler options are inherited from. * Remove debug instructions if target-specific setting is NONE This helps to address #6092. * Make sure the linkage debug info level is sufficient for each target This closes #6092.
2025-01-30Support cooperative vector (#6223)Jay Kwak
* Support cooperative vector without Vulkan-header update Adding a Slang support for cooperative vector. But this commit doesn't have Vulkan-header update.
2025-01-22Add validation for destination of atomic operations (#6093)Anders Leino
* Reimplement the GLSL atomic* functions in terms of __intrinsic_op Many of these functions map directly to atomic IR instructions. The functions taking atomic_uint are left as they are. This helps to address #5989, since the destination pointer type validation can then be written only for the atomic IR instructions. * Add validation for atomic operations Diagnose an error if the destination of the atomic operation is not appropriate, where appropriate means it's either: - 'groupshared' - from a device buffer This closes #5989. * Add tests for GLSL atomics destination validation Attempting to use the GLSL atomic functions on destinations that are neither groupshared nor from a device buffer should fail with the following error: error 41403: cannot perform atomic operation because destination is neither groupshared nor from a device buffer. * Validate atomic operations after address space specialization Address space specialization for SPIR-V is not done as part of `linkAndOptimizeIR`, as it is for e.g. Metal, so opt out and add a separate call for SPIR-V. * Allow unchecked in/inout parameters for non-SPIRV targets * Clean up callees left without uses during address space specialziation * format code --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com> Co-authored-by: Yong He <yonghe@outlook.com>
2025-01-22Remove unnecessary parameters from Metal entry point signature (#6131)Darren Wihandi
* fix metal entry point global params * address review comments, cleanup and test * remove dead code * undo accidental change * address review comments and cleanup * minor fix and cleanup --------- Co-authored-by: Yong He <yonghe@outlook.com>