path: root/source
AgeCommit message (Collapse)Author
2025-10-16Immutable access qualifier for pointers and use `__ldg` on cuda. (#8710)Yong He
This PR implements `Access.Immutable` to allow pointers to immutable data. The new type `ImmutablePtr<T>` is defined as an alias of `Ptr<T, Access.Immutable>`. By forming an immutable pointer, the programmer is conveying to the compiler that the data at the pointer address will never change during the execution of the current program. Therefore loads from immutable pointers can be deduplicated by the compiler, and will translate to `__ldg` when generating code for CUDA. The SPIRV backend is not changed in this PR, since the current SPIRV spec makes it very difficult to specify loads from an immutable address without generating tons of wrappers and boilerplate type declarations. We would like to see the spec evolve a bit around its support of `NonWritable` physical storage pointers or immutable loads before we attempt to express such immutability in SPIRV. For now we simply emit ordinary pointers and loads when generating SPIRV.
---------
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-10-15Clean up Slang IR representation of undefined values (#8708)Theresa Foley
Prior to this change, the Slang IR used a single opcode (`kIROp_Undefined`) to encode all cases of undefined values. The particular motivation for this change was a need to distinguish those undefined values that represent a load from an uninitialized memory location versus other sorts of undefined values. If transforming a variable into SSA form results in `undefined` values in cases where a `load` was executed without a prior `store`, that represents an error on the programmer's part, and should be diagnosed. However, other cases of undefined values can arise during program transformation and optimization, and should not typically result in diagnostics being emitted.

While it was not the original motivation for this change, it is also worth noting that the LLVM project has transitioned from initially using only a single `undef` instruction to having a more nuanced model, and the same factors that motivated their shift also apply to the Slang IR. Counter-intuitively, the semantics of undefined values actually need to be carefully defined.

Concretely, this change splits the pre-existing `undefined` opcode into two sub-cases:
- `kIROp_LoadFromUninitializedMemory`, to represent the case of loading from a memory location (such as a local variable) that has not been initialized.
- `kIROp_Poison`, corresponding to the LLVM `poison` value.

Our poison instruction is intended to have semantics comparable to LLVM's equivalent. Conceptually, any operation that is invoked with a poison value as input will (with a few exceptions) produce a poison value as output. One can think of the behavior of `poison` as similar to how not-a-number values propagate in floating-point computations: by default they "infect" the result of any computation they are involved in. This semantic choice helps to ensure that many optimizations end up being correct in the presence of undefined values, even if they did not specifically account for them.
The `kIROp_LoadFromUninitializedMemory` case is comparable to the combination of `freeze` and `undef` in LLVM. An LLVM `undef` value has semantics that allow *each* use of that value to be replaced with a *different* arbitrary value; these semantics cause many optimizations to only be correct in the absence of undefined values. An LLVM `freeze` instruction can take an undefined value as input, and produces a single value that is still arbitrary, but must be consistent across all uses. The latter semantics are what we want, since a given `load` from an uninitialized memory location will yield an arbitrary-but-fixed value. Note that we intentionally do not have a direct analogue to LLVM's `undef` instruction, because of the way that `undef` causes so many complications when trying to write optimizations. We also do not add a `kIROp_Freeze` instruction in this change, but that is simply because we currently have no need for it. Existing code that was creating `IRUndefined` values has been updated to create either `IRPoison` or `IRLoadFromUninitializedMemory` values, as appropriate to the use case. Code that was checking for the `kIROp_Undefined` opcode has been updated to either check for both of the new opcodes (in the case of `switch` statements), or to use `as<IRUndefined>` to perform a dynamic cast to the common base type of the two new instructions. Note that this change does not alter the way that instructions representing undefined values are typically emitted as ordinary instructions in the block that produces an undefined value. While emitting `IRLoadFromUninitializedMemory` as an ordinary instruction is exactly what we want, the `IRPoison` case would actually be better represented in Slang IR as a "hoistable" instruction, so that there would only be a singular `poison` value of each type. 
Changing `IRPoison` to be hoistable would be a good follow-up change, but might run into more challenges depending on what assumptions (if any) the codebase is making about where undefined values get emitted. --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
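The propagation rule described in this commit message can be illustrated with a toy model. This is a minimal sketch and not Slang IR code: `Value`, `makePoison`, `makeInt`, `add`, and `freeze` are hypothetical names invented here, and returning 0 from `freeze` is an arbitrary assumption standing in for "some fixed value".

```cpp
#include <cassert>

// Toy model of a value that may be poison. NOT the Slang IR; all names
// here are hypothetical and exist only for illustration.
struct Value
{
    bool poison;
    int payload; // only meaningful when !poison
};

inline Value makePoison() { return Value{true, 0}; }
inline Value makeInt(int x) { return Value{false, x}; }

// Any arithmetic on a poison operand yields poison, much like NaN
// propagating through floating-point arithmetic.
inline Value add(Value a, Value b)
{
    if (a.poison || b.poison)
        return makePoison();
    return makeInt(a.payload + b.payload);
}

// A freeze-like operation: turns poison into one arbitrary but fixed
// value that is then consistent across all uses. Picking 0 is an
// assumption made for this sketch.
inline Value freeze(Value v)
{
    return v.poison ? makeInt(0) : v;
}
```

In this model, `add(makeInt(1), add(makePoison(), makeInt(3)))` is poison because poison flows through both additions, while `freeze(makePoison())` is an ordinary value that every later use observes identically.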
2025-10-14Handle SPIR-V aliases (#8704)Jay Kwak
Fixes https://github.com/shader-slang/slang/issues/8703
2025-10-13Fix segfault on arrays of structs containing parameter blocks (#8555)Ellie Hermaszewska
Closes https://github.com/shader-slang/slang/issues/8154. However, there is further design work to do on implementing the "NonAddressableType" suggestion.
2025-10-11Decouple debug level control from separate-debug-info (#8680)Jay Kwak
Fixes https://github.com/shader-slang/slang/issues/8649
2025-10-11Add diagnostic for cyclic #include. (#8679)Yong He
2025-10-10Allow entry points with missing numthreads on CPU targets (#8678)Julius Ikkala
Several tests have compute entry points without a `[numthreads(x,y,z)]` decoration. Currently, none of these tests run on the CPU target, as they crash the compiler. I took a look at the SPIR-V emitter, which falls back to a workgroup size of (1,1,1): https://github.com/shader-slang/slang/blob/1e0908bd7107dfbdac912b693c3ab9bd6e1dc8b3/source/slang/slang-ir-spirv-legalize.cpp#L1635-L1643 To match this behaviour, this PR implements a fallback solution that makes `emitCalcGroupExtents()` emit (1,1,1). This PR is both a question and a suggestion; I'm not sure the approach here is at all reasonable. Personally, I'd just like to explicitly add `[numthreads(1,1,1)]` to all such tests, but I don't know if it's actually legal and supported to not have a `numthreads`. So the implementation here is a bit conservative. I ran across these when I went through tests for the upcoming LLVM target. These were the final blockers to get all autodiff and language-features tests passing (not counting the ones using things like wave intrinsics and barriers etc.)
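The fallback described above can be sketched as follows. This is only an illustration under assumptions: `EntryPointInfo` and `calcGroupExtents` are hypothetical stand-ins and do not reproduce the real `emitCalcGroupExtents()` code.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for an entry point. `hasNumThreads` models whether
// a `[numthreads(x,y,z)]` decoration was present in the source.
struct EntryPointInfo
{
    bool hasNumThreads;
    std::array<int, 3> numThreads; // ignored when hasNumThreads is false
};

// Returns the declared group extents, or (1,1,1) when none were declared,
// mirroring the fallback the SPIR-V emitter is described as using.
inline std::array<int, 3> calcGroupExtents(const EntryPointInfo& ep)
{
    if (!ep.hasNumThreads)
        return std::array<int, 3>{{1, 1, 1}};
    return ep.numThreads;
}
```

An entry point without the decoration then yields a single-thread group instead of crashing the compiler, while decorated entry points keep their declared size.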
2025-10-10Fix `specializeRTTIObject` to use non-zero RTTI value to work with `Optional<T>`. (#8677)Yong He
Closes #8673. The issue is that we use the RTTI field of an existential to check if it is null. We have the logic to help the user to fill in a non-zero value for the RTTI field when such an object is filled from the host. However, when there is slang code creating an existential value, we still have old logic in the compiler that just fills in 0 for the RTTI field, causing an `Optional<IFoo>.hasValue` to always return false in such cases.
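The null check at the heart of this bug can be modeled in a few lines. This is a minimal sketch: the `Existential` layout, the function names, and the choice of 1 as the non-zero id are all assumptions for illustration, not the compiler's actual representation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout of an existential value: an RTTI id plus a payload.
// An RTTI id of 0 is treated as "no value".
struct Existential
{
    std::uint32_t rttiId;
    std::uint32_t payload;
};

// Models `Optional<IFoo>.hasValue`: the value is present iff the id is non-zero.
inline bool hasValue(const Existential& e) { return e.rttiId != 0; }

// Old behavior described above: values created in shader code got id 0.
inline Existential makeExistentialOld(std::uint32_t payload)
{
    return Existential{0, payload};
}

// Fixed behavior: any non-zero id satisfies the null check. The value 1
// is an arbitrary assumption for this sketch.
inline Existential makeExistentialFixed(std::uint32_t payload)
{
    return Existential{1, payload};
}
```

With the old constructor, `hasValue` reports false even though a payload was stored, which is the symptom the commit describes.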
2025-10-10Addition of `Load`/`Store` coherent operations (#8395)16-Bit-Dog
Fixes: https://github.com/shader-slang/slang/issues/7634
Duplicate of PR https://github.com/shader-slang/slang/pull/8052

Primary Changes:
* Added `storeCoherent` and `loadCoherent` for coherent load/store via pointers. This is backed by `IRMemoryScopeAttr` which is an `IRAttr` attached to `IRLoad` and `IRStore`
* Logic in `source\slang\slang-emit-spirv.cpp` for load/store emitting has been reworked to be less messy and more maintainable
* Add to `hlsl.meta.slang` coop vector and coop matrix coherent load/store operations

Secondary Changes:
* Added a missing load/store test for coop matrix: `tests\cooperative-matrix\load-store-pointer.slang`

---------
Co-authored-by: ArielG-NV <aglasroth@nvidia.com>
Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com>
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
Co-authored-by: Nathan V. Morrical <natemorrical@gmail.com>
2025-10-10implement dot products for 1 vectors (#8599)Ellie Hermaszewska
Closes https://github.com/shader-slang/slang/issues/8378
2025-10-10Specialize interfaces in DebugFunction (#8617)Julius Ikkala
E.g. in [generic-extension-2.slang](https://github.com/shader-slang/slang/blob/master/tests/language-feature/extensions/generic-extension-2.slang), incorrect DebugFunctions are generated for `getFirstOuter`:
```
let %33 : Void = DebugFunction("getFirstOuter", 18 : UInt, 3 : UInt, %26, Func(Int, 0 : Int))
```
This happens because specialization passes are leaving a `%IFoo` in the function type, instead of replacing with a concrete type:
```
let %34 : Void = DebugFunction("getFirstOuter", 18 : UInt, 3 : UInt, %26, Func(Int, %IFoo))
```
and later, `cleanUpInterfaceTypes()` just replaces all interfaces with the literal zero. So now we have a parameter type which isn't actually a type at all, but an IntLit instead.

I'm not sure if the approach I picked is good, though. Some other options that crossed my mind were:
* Make `fixUpFuncType` also update related DebugFunctions
  - But is there a reason why DebugFunctions separately carry a function type in the first place?
* Make `cleanUpInterfaceTypes` less aggressive or at least replace types with a type instead of a value
  - But this will still make the debug info incorrect :(
2025-10-108503 wgsl depth texture (#8645)Sami Kiminki (NVIDIA)
Add built-in type aliases for DepthTexture* and unify Sampler*Shadow

Add the following type aliases:
- DepthTexture1D, DepthTexture1DArray
- DepthTexture2D, DepthTexture2DArray
- DepthTexture2DMS, DepthTexture2DMSArray
- DepthTexture3D
- DepthTextureCube, DepthTextureCubeArray

These match with the type aliases for non-depth textures.

Also, unify the Sampler*Shadow type aliases with DepthTexture* ones. This adds the following:
- Sampler2DMSShadow
- Sampler2DMSArrayShadow

and removes the Sampler3DArrayShadow type alias.

As a side-effect, the descriptions of Sampler*ArrayShadow type aliases are fixed ("texture-sampler for shadow" ==> "texture-sampler array for shadow").

Update the slang tests to use the newly introduced type aliases instead of the custom type aliases that use _Texture<> directly.

Add DepthTexture testing in hlsl-intrinsic/texture/texture-intrinsics. Do this by extracting the test logic of computeMain() in a separate function and parametrize it for non-depth/depth texture types. This adds basic coverage for the following types:
- DepthTexture1D
- DepthTexture2D
- DepthTexture3D
- DepthTextureCube
- DepthTexture1DArray
- DepthTexture2DArray
- DepthTextureCubeArray

Issue #6166
Issue #8503
2025-10-10Update debug var when in-param proxy var is being updated. (#8671)Yong He
Closes #8664. The problem is that when there is an `in` parameter, Slang will create a local variable to proxy the parameter, copy the value of the parameter into the proxy variable, and replace all uses of the parameter in the function body to use the proxy variable instead. This way all writes to the parameter become writes to the proxy variable. However, when there is debug info enabled, we are also going to create a "debugVariable" corresponding to the parameter, but this debugVariable isn't updated when the proxy variable is updated. The fix is to map the proxy var instead of the original param to the debug var during the `insertDebugValueStore` pass, so that any changes to the proxy var will result in additional stores being inserted to the debug var. Allowing function body to modify an `in` parameter is a bad legacy behavior we inherited from HLSL that we should really be moving away from. I would like us to completely treat an `in` parameter as immutable by default in the next language version (Slang 2026), and make it an error if the user tries to do so. This will allow us to generate much cleaner code and in many cases would help with performance.
2025-10-09Improve perf with `-separate-debug-info` (#8670)Jay Kwak
When Slang builds a new copy of the SPIR-V code without the debug info, the `List` container needs to reserve its memory up front, before items are added to it. This reduces the given repro test time from 56 minutes to 6 minutes.
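The effect of reserving capacity before a long run of appends can be shown with a small sketch. It uses `std::vector` as a stand-in for Slang's `List`, and `stripDebugWords` and the `isDebugWord` predicate are hypothetical; real SPIR-V stripping operates on whole instructions rather than single words.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Copies every word that is not flagged as debug info into a new buffer.
// `isDebugWord` is a hypothetical predicate supplied by the caller.
inline std::vector<std::uint32_t> stripDebugWords(
    const std::vector<std::uint32_t>& words,
    bool (*isDebugWord)(std::uint32_t))
{
    std::vector<std::uint32_t> result;
    // The output can never be larger than the input, so one up-front
    // reservation avoids repeated reallocation and copying while appending.
    result.reserve(words.size());
    for (std::uint32_t w : words)
    {
        if (!isDebugWord(w))
            result.push_back(w);
    }
    return result;
}
```

Reserving the input size over-allocates slightly when much of the input is debug info, which is usually an acceptable trade for avoiding growth during the copy.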
2025-10-10Defer `IRCastStorageToLogicalDeref` in lowerBufferElementType pass. (#8668)Yong He
Fix a regression on a Metal test. In the `lowerBufferElementTypeToStorageType` pass, we not only want to defer an argument that is `CastStorageToLogical` to the callee, but also want to apply the same deferral logic to `CastStorageToLogicalDeref`. This is because `CastStorageToLogicalDeref` will appear as an argument if `lowerBufferElementTypeToStorageType` is run before we apply the `in->borrow` transformation pass, which is the case for Metal parameter block legalization.
2025-10-10Small fix to buffer load specialization pass to allow more specialization to happen. (#8653)Yong He
This allows us to further clean up unnecessary copies in the target code we generate. Part of the effort of #8652.
2025-10-08Fix DerivativeGroupQuadsKHR workgroup size validation for texture sampling (#8647)Lujin Wang
Fixes #8545 where Slang generates SPIR-V with DerivativeGroupQuadsKHR execution mode but doesn't validate workgroup sizes when texture sampling triggers automatic derivative computation.

**Root Cause**: Validation code was looking for IRNumThreadsDecoration on the wrong IR node

**Fix**: One-line change in slang-emit-spirv.cpp to search decoration on entryPoint instead of entryPointDecor

**Tests**: Added regression tests for both quad and linear derivative group validation

Generated with [Claude Code](https://claude.ai/code)

---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Lujin Wang <lujinwangnv@users.noreply.github.com>
Co-authored-by: slangbot <ellieh+slangbot@nvidia.com>
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-10-08Allow 1D SV_DispatchThreadID in CPU targets (#8612)Julius Ikkala
The varying param legalization pass didn't deal with this 1D form of SV_DispatchThreadID for CPU targets:
```slang
void computeMain(int i : SV_DispatchThreadID)
```
Instead, it just overrode the type of `i` with a `uint3`, breaking lots of code that attempted to use `i` for something, like a `switch` statement for example.

I ran across this when going through `language-feature` tests for the LLVM target, which will also use this legalization pass. I'm separately submitting this now because this also fixes the existing CPU target. The test I enable in this PR is one that was previously generating broken code on CPU. (somewhat related issue: #7468)
2025-10-08parser: Avoid dropping modifiers when splitting list (#8546)James Helferty (NVIDIA)
Fix for a linked list usage bug; avoids dropping any modifiers when moving type modifiers from a linked list of modifiers into their own linked list. Since this change results in no_diff modifiers to traditional functions ending up on the return type instead of the function (due to the order they're parsed in), we duplicate the no_diff modifier onto the function declaration after the fact. Includes a test for the original issue. The no_diff redistribution case is covered by a slangpy device test case. Fixes #8332 --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-10-08Fix UnixPipeStream::read() not handling EOF (#8626)ncelikNV
Fixes #6754.
2025-10-08Improve texture loads and stores on CUDA (#8644)Simon Kallweit
- fix handling layer and mip level
- add support for 1D layered textures
- reduce code by using macros
- assert when trying to emit unsupported intrinsics

There is a new set of unit tests in slang-rhi for exhaustive testing of shader loads/stores on textures. These fixes allow most of these tests to be enabled. Formatted loads/stores on surfaces are not supported in the PTX ISA, so this would require codegen for the conversion, which in theory should be possible but not as part of the CUDA prelude.
2025-10-08`ExprLoweringVisitorBase::getDefaultVal(Type*)` use `MakeVector/MatrixFromScalar` (#8512)Ronan
- Allows using `Vector/Matrix` type with yet unresolved dimensions
- Simpler implementation and in-line with default `Array`
- Added `test/bugs/gh-8512.slang`
2025-10-07Fix a bug that causes a struct field to be initialized twice. (#8619)Yong He
We insert field initialization logic at the beginning of every ctor in `synthesizeCtorBody`, but then immediately insert another round of initialization for explicit ctors in `maybeInsertDefaultInitExpr`, both called from `SemanticsDeclBodyVisitor::visitAggTypeDecl` right next to each other. The fix is to remove `maybeInsertDefaultInitExpr`.

This change also enhances the address aliasing analysis, so that for the following case:
```
this->member1 = 0;
this->member2 = 0;
this->member1 = param;
```
We can still remove the first assignment to `this->member1` despite seeing `this->member2=0`, since it is easy to know that `this->member2` cannot alias with `this->member1`.

Closes #8600.
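The redundant-store case above can be sketched with a toy dead-store elimination over field names. This is an illustration under strong assumptions, not the compiler's analysis: `FieldStore` and `removeOverwrittenStores` are hypothetical, distinct field names are assumed never to alias, and the model contains no loads between the stores.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// A store to a named field of `this`. Distinct field names are assumed
// never to alias each other.
struct FieldStore
{
    std::string field;
    int value;
};

// Removes every store that is overwritten by a later store to the same
// field. Valid only because this toy model has no intervening loads.
inline std::vector<FieldStore> removeOverwrittenStores(const std::vector<FieldStore>& stores)
{
    std::set<std::string> seenLater; // fields written later in the sequence
    std::vector<bool> keep(stores.size(), false);

    // Walk backwards: the first time a field is seen is its last store.
    for (std::size_t i = stores.size(); i-- > 0;)
        keep[i] = seenLater.insert(stores[i].field).second;

    std::vector<FieldStore> result;
    for (std::size_t i = 0; i < stores.size(); ++i)
    {
        if (keep[i])
            result.push_back(stores[i]);
    }
    return result;
}
```

For the three-store sequence in the commit message, the store to `member2` does not block removal of the first store to `member1`, so only the `member2` store and the final `member1` store remain.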
2025-10-07Use symbol alias instead of wrapper synthesis to implement link-time types. (#8603)Yong He
This change achieves link-time type resolution with a different mechanism. For `extern struct Foo : IFoo = FooImpl;`, instead of synthesizing a wrapper type `Foo` that has a `FooImpl inner` field and dispatches all interface method calls to `inner.method()`, this PR completely removes this synthesis step, and instead just lowers such `extern`/`export` types as `IRSymbolAlias` instructions that are just references to the type being wrapped. Then we extend the linker logic to clone the referenced symbol instead of the SymbolAlias insts themselves during linking. By doing so, we greatly simplify the logic needed to support link-time types, and achieve higher robustness by not having to deal with many AST synthesis scenarios. Closes #8554.
---------
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
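The idea of resolving an alias to the symbol it refers to at link time can be sketched with a simple table lookup. This is a hedged illustration only: `Symbol` and `resolveSymbol` are hypothetical, and the real linker clones IR instructions rather than returning names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical symbol table entry: either a real definition or an alias
// that names another symbol.
struct Symbol
{
    bool isAlias;
    std::string target; // aliased symbol name when isAlias, unused otherwise
};

// Follows alias links until a real definition is found and returns its
// name. Returns an empty string for a missing symbol or an alias cycle.
inline std::string resolveSymbol(const std::map<std::string, Symbol>& table, std::string name)
{
    // A chain longer than the table size must contain a cycle.
    for (std::size_t steps = 0; steps <= table.size(); ++steps)
    {
        std::map<std::string, Symbol>::const_iterator it = table.find(name);
        if (it == table.end())
            return std::string();
        if (!it->second.isAlias)
            return name;
        name = it->second.target;
    }
    return std::string();
}
```

With a table in which `Foo` aliases `FooImpl`, a lookup of `Foo` yields `FooImpl` directly, so no wrapper type forwarding calls to an `inner` field is needed.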
2025-10-06Prefer IntegerType over LogicalType integer matrix mul() overloads (#8426)pdeayton-nv
Integer mul(matrix, matrix) and mul(vector, matrix) are not disambiguated between __BuiltinIntegerType and __BuiltinLogicalType, emitting an ambiguous call compilation error. Use the OverloadRank attribute to prefer the IntegerType overload over the LogicalType overload. Fixes #8424
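Rank-based disambiguation between otherwise equally good candidates can be sketched as below. This is a toy model, not Slang's overload resolution: `Candidate` and `pickOverload` are hypothetical, and the sketch assumes that a higher rank wins.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical overload candidate with a rank used only for tie-breaking.
struct Candidate
{
    std::string name;
    int rank;
};

// Picks the unique highest-ranked candidate. Returns nullptr when the
// list is empty or when the top rank is shared (an ambiguous call).
inline const Candidate* pickOverload(const std::vector<Candidate>& candidates)
{
    const Candidate* best = nullptr;
    bool ambiguous = false;
    for (const Candidate& c : candidates)
    {
        if (!best || c.rank > best->rank)
        {
            best = &c;
            ambiguous = false;
        }
        else if (c.rank == best->rank)
        {
            ambiguous = true;
        }
    }
    return ambiguous ? nullptr : best;
}
```

Two candidates with equal rank reproduce the ambiguity error, while giving the integer overload a higher rank makes it the unique choice.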
2025-10-05Matrix legalization for missing instructions & MakeMatrix of vectors (#8605)Julius Ikkala
Fixes these issues:
* During matrix legalization, `MakeMatrix` crashed if it was given a list of vectors instead of individual elements.
* Matrix casts, IRem, and Frem would be emitted using arrays, e.g. `IntToFloatCast` with `float2[2]` parameters.

I found these bugs while enabling various `hlsl-intrinsic` tests for the LLVM target. For now, I've chosen to get rid of all matrix types with the matrix legalization pass so that the LLVM emitter doesn't need to be aware of them. These bugs were preventing `tests/hlsl-intrinsic/matrix-double-reduced-intrinsic.slang` and `tests/hlsl-intrinsic/matrix-double.slang` from passing there.
2025-10-04Respect isShadow() flag when setting depth type in SPIR-V backend (#8604)Nils Hasenbanck
This is important for SPIR-V targets that need to know if a texture is designated as a depth texture or not (for example WebGPU). I didn't change the default behavior for when isShadow() is not set, since I didn't want to make the change too invasive.
2025-10-03Add direct usage support for gl_DrawID (#8594)Copilot
- [x] Add `gl_DrawID` property declaration in `source/slang/glsl.meta.slang` similar to `gl_BaseVertex` and `gl_BaseInstance`
- [x] Ensure it maps to `SV_DrawIndex` semantic
- [x] Add extension requirements in `slang-ir-glsl-legalize.cpp` (GLSL 460 and GL_ARB_shader_draw_parameters)
- [x] Create test case to verify SPIRV output contains `DrawIndex` decoration
- [x] Run formatting script before committing
- [x] Verify existing tests still pass
- [x] Remove accidentally committed temp files (1.glsl, 3.spv-asm)
- [x] Update .gitignore to prevent similar files from being committed

<details>
<summary>Original prompt</summary>

> ----
>
> *This section details on the original issue you should resolve*
>
> <issue_title>Missing direct usage support for gl_DrawID</issue_title>
> <issue_description>
> # Problem Description
> Because DirectX does not have support for DrawIndex like vulkan does, i add the support in d3d12 by emulating it and storing it in a root constant. In vulkan though, i should be able to use it directly, but because now i store it as a global variable, i cannot use the SV_DrawIndex semantic either.
>
> # Preferred Solution
> I'd like to be able to use gl_DrawID like how you can use gl_BaseVertex and gl_BaseInstance.
> </issue_description>
>
> <agent_instructions>You can just edit glsl.meta.slang and add a declaration for gl_DrawID in a similar way to gl_BaseVertex, and map it to SV_DrawIndex.
>
> Create a `//TEST:SIMPLE(filecheck=SPIRV): -target spirv` test case, and use filecheck to verify the output contains the correct SPIRV `DrawIndex` decoration.</agent_instructions>
>
> ## Comments on the Issue (you are @copilot in this section)
>
> <comments>
> </comments>

</details>

Fixes shader-slang/slang#8548

---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com>
2025-10-03Fix legalization crash when processing metal parameter blocks. (#8591)Yong He
Closes #7606. When Slang compiles for a bindful target, we run the resource type legalization pass to hoist resource-typed struct fields outside of the struct type, define them as global parameters, and pass them around via dedicated function parameters. When we compile for a bindless target, we don't run this pass. However, Metal is a hybrid bindful and bindless target. We need to run type legalization for the constant buffer, but skip type legalization for parameter blocks.

The previous attempt to support this behavior was to hack the type legalization pass to return `LegalVal::simple` when it sees a `ParameterBlock<T>`. However, whenever the code is accessing `parameterBlock.someNestedField`, the type of the nested field may get a `LegalType::tuple`, and now we run into inconsistent scenarios where we have a `LegalVal::simple` on the operand val, but the legalization logic is expecting that val to be a `LegalType::tuple`. This breaks a lot of assumptions and invariants in the type legalization pass, resulting in unstable and fragile behavior.

To systematically solve this problem, this change generalizes the existing legalize buffer element type pass to translate `ParameterBlock<Texture2D>` (and similar cases) to `ParameterBlock<Texture2D.Handle>`, so that such a parameter block will always be legalized to `LegalType::simple` during type legalization, and we will never run into any inconsistent cases. This allowed us to get rid of the hacky logic in the type legalization pass that tried to work around the inconsistencies.
2025-10-03Rename some symbols related to pointers types (#8592)Theresa Foley
Note that while this change touched a large number of files, there are no changes to functionality being made here. The only things being done are renaming various symbols and, in a few cases, updating or adding comments for consistency with the new names.

The core of the naming changes are:
* Most things named to refer to `OutType` (e.g., `IROutType`, `IRBuilder::getOutType()`, etc.) have been consistently renamed to refer to `OutParamType`, to emphasize that the relevant AST/IR node types are only intended for use to represent `out` parameters.
* The same change as described above for `OutType` is also made for `RefType`, which becomes `RefParamType` in most cases. One mess that this exposes is the way that the `ExplicitRef<T>` type in the core module currently lowers to `IRRefParamType`. This change sticks to the rule of not making functional changes, so that mess is left as-is for now.
* Names referring to `InOutType` have been changed to instead refer to `BorrowInOutType`. The intention with this naming change is to emphasize that the Slang rules for `inout` are semantically those of a borrow (or at least our interpretation of what a borrow means).
* Names referring to `ConstRefType` have been changed to instead refer to `BorrowInType`. This change starts work on clarifying that the existing `__constref` modifier was never intended to be a read-only analogue of `__ref`, and instead is the input-only analogue of `inout`.
* The `ParameterDirection` enum type has been changed to `ParamPassingMode`, to reflect the fact that the concept of "direction" fails to capture what is actually being encoded, particularly once we have modes beyond simple `in`/`out`/`inout`.

While this change does not alter behavior in any case (the user-exposed Slang language is unchanged), it is intended to set up subsequent changes that will work to make the handling of these types in the compiler more nuanced and correct.
Breaking this part of the change out separately is primarily motivated by a desire to minimize the effort for reviewers. --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-10-02Relax the inst definition order rule (#8588)kaizhangNV
Close #8572. The root cause of the issue is that in the `_replaceInstUsesWith` call, if the use of the inst is a generic parameter, and the inst is the data type of that generic parameter, we could end up moving the data type before the generic parameter. This breaks the layout of generic parameters, where all the generic parameters should be laid out consecutively at the beginning of the first block of the generic. Therefore, we don't perform that relocation in such cases.
2025-10-02Fix the missing derivative member check (#8569)kaizhangNV
Close #8568. The root cause of this issue is that when the struct indirectly inherits from the IDifferentiable type, we do not check the reference of the DerivativeMember attribute. This PR fixes this issue by checking the DerivativeMember attribute right before synthesizing the requirement methods of the IDifferentiable interface.
2025-10-01Fix incorrect binding index assignment for StructuredBuffer and ByteAddressBuffer with DescriptorHandle (#8252)Copilot
- [x] Fix segmentation fault in wrapConstantBufferElement for DescriptorHandle types
- [x] Split DescriptorKind.Buffer into ConstantBuffer and StorageBuffer
- [x] Update binding enums with descriptive names (ConstantBuffer_Read, StorageBuffer_Read, etc.)
- [x] Update resource type mappings for correct binding assignments
- [x] Update template logic to handle ConstantBuffer and StorageBuffer kinds separately
- [x] Update tests to reflect correct binding assignments
- [x] Split DescriptorKind.TexelBuffer into UniformTexelBuffer and StorageTexelBuffer
- [x] Update TextureBuffer<T> to use UniformTexelBuffer kind
- [x] Update _Texture extension to determine texel buffer kind based on access mode
- [x] Update test desc-handle-1.slang to handle new DescriptorKind enum cases

---------
Co-authored-by: Yong He <yonghe@outlook.com>
2025-10-01Misc parser improvements. (#8563)Yong He
- Fix bug parsing multiple link-time structs on the same line. Closes #8553.
- Fix bug parsing anonymous struct type as function return type in modern syntax. Closes #8558
- Support semantics on modern style param/var declarations.
2025-10-01Use glslang public API (#8369)Jeremy Hayes
Stop including private header (see #8333). Co-authored-by: Yong He <yonghe@outlook.com>
2025-09-30Enhance buffer load specialization pass to specialize past field extracts. (#8547)Yong He
This allows us to specialize functions whose argument is a sub element of a constant buffer, instead of being only applicable to an entire buffer element. Closes #8421.

This change also implements a proper heuristic to determine when to specialize the calls and defer the buffer loads. This PR addresses a pathological case exposed in `slangpy\slangpy\benchmarks\test_benchmark_tensor.py`, which used to take 27ms to finish, and now takes 1.25ms.

For example, given:
```
struct Bottom
{
    float bigArray[1024];
    [mutating] void setVal(int index, float value) { bigArray[index] = value; }
}
struct Root
{
    Bottom top[2];
    [mutating] void setTopVal(int x, int y, float value) { top[x].setVal(y, value); }
}
RWStructuredBuffer<Root> sb;

[shader("compute")]
[numthreads(1, 1, 1)]
void compute_main(uint3 tid: SV_DispatchThreadID)
{
    sb[0].setTopVal(1, 2, 100.0f);
}
```
We are now able to specialize the call to `setTopVal` into:
```
void compute_main(uint3 tid: SV_DispatchThreadID)
{
    setTopVal_specialized(0, 1, 2, 100.0f);
}
void setTopVal_specialized(int sbIdx, int x, int y, float value)
{
    Bottom_setVal_specialized(sbIdx, x, y, value);
}
void Bottom_setVal_specialized(int sbIdx, int x, int y, float value)
{
    sb[sbIdx].top[x].bigArray[y] = value;
}
```
And get rid of all unnecessary loads. Achieving this requires a combination of function call specialization and the buffer-load-defer pass. The buffer-load-defer pass has been completely rewritten to be more correct and avoid introducing redundant loads.

This PR also adds tests to make sure pointers, bindless handles, and loads from structured buffers or constant buffers work as expected.
2025-09-30Handle getEquivalentStructuredBuffer(castDynamicResource) in byte address legalization pass. (#8567)Yong He
This is a crash that can be triggered by providing a custom `getDescriptorFromHandle` and using it to access a ByteAddressBuffer from a bindless handle. Closes #8355.
2025-09-30canonical type equality constraint (#8445)Ronan
Fixes #8439 When checked, generic type equality constraints types are now in a canonical order, allowing for a commutative type equality operator. --------- Co-authored-by: Mukund Keshava <mkeshava@nvidia.com>
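Why a canonical order makes the equality operator commutative can be shown with a short sketch. Everything here is an assumption for illustration: constraints are modeled as pairs of type names, and lexicographic name order stands in for whatever ordering the compiler actually uses.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical representation of a constraint `A == B` using type names.
typedef std::pair<std::string, std::string> TypeEqualityConstraint;

// Puts the two sides into a fixed order so that `A == B` and `B == A`
// produce the same canonical constraint. Lexicographic order on names is
// an assumption made for this sketch.
inline TypeEqualityConstraint canonicalize(TypeEqualityConstraint c)
{
    if (c.second < c.first)
        std::swap(c.first, c.second);
    return c;
}
```

After canonicalization, the constraints `T.Assoc == int` and `int == T.Assoc` compare equal with plain member-wise comparison, so later passes do not need to test both orders.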
2025-09-30Rewriting the lower-buffer-element-type pass to avoid unnecessary ↵Yong He
packing/unpacking. (#8526) Part of the effort to improve the performance of generated SPIRV code. The existing lower-buffer-element-type pass works by loading the entire buffer element content from memory, and translate it to logical type stored in a local variable at the earliest reference of a buffer handle. This means that is can generate inefficient code that reads more than necessary. Consider this example: ``` struct BigStruct { bool values[1024]; } ConstantBuffer<BigStruct> cb; void test(BigStruct v) { if (v.values[0]) { printf("ok"); } } [numthreads(1,1,1)] void computeMain() { test(cb); } ``` In IR, the `computeMain` function before lower-buffer-element-type pass is something like following: ``` func test: %v = param : BigStruct %barr = fieldExtract(%v, "values") %element = elementExtract(%barr, 0) ... // uses %element func computeMain: %v = load(cb) call %test %v ``` The existing lower-buffer-element-type pass will rewrite the bool array in `BigStruct` into `int` array so it is legal in SPIRV. However, it does so by inserting the translation on the first `load` of the constant buffer: ``` struct BigStruct_std430 { int values[1024]; } var cb : ConstantBuffer<BigStruct_std430>; func computeMain: %tmpVar : var<BigStruct> call %unpackStorage(%tmpVar, cb) %v : BigStruct = load %tmpVar call %test %v ``` This means that the entire array will be loaded and translated to int, before calling `test`, which only uses one element. It turns out that the downstream compiler isn't always able to optimize out this inefficient translation/copy. This PR completely rewrites the way buffer-element-type lowering is handled to avoid producing this inefficient code. It works in two parts: first we turn on the `transformParamsToConstRef` pass for SPIRV target as well, so we will translate the `test` function to take the `v` parameter as `constref`. 
The second part is a redesigned buffer-element-type pass that defers the storage-type to logical-type translation until a value is actually used by a `load` instruction. In this example, after `transformParamsToConstRef`, the IR is:

```
func test:
    %v = param : ConstRef<BigStruct>
    %barr = fieldAddr(%v, "values")
    %elementPtr = elementAddr(%barr, 0)
    %element = load(%elementPtr)
    ... // uses %element

func computeMain:
    call %test %cb
```

The new `buffer-element-type-lowering` pass will take this IR, insert the translation at the latest possible point across the entire call graph, and translate the IR into:

```
func test:
    %v = param : ConstRef<BigStruct_std430>
    %barr = fieldAddr(%v, "values")
    %elementPtr : ptr<int> = elementAddr(%barr, 0)
    %element_int = load(%elementPtr)
    %element = cast(%element_int) : %bool
    ... // uses %element

func computeMain:
    call %test %cb
```

In this new IR, there is no longer a load and conversion of the entire array. See the new comment in `slang-ir-lower-buffer-element-type.cpp` for more details of how the pass works.

This PR also addresses many other issues surfaced by turning on the `transformParamsToConstRef` pass on the SPIRV backend.

--------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
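The cost difference between the old and new lowering strategies can be illustrated outside the compiler. The following C++ sketch is only an analogy: `BigStructStd430`, `BigStruct`, `unpackWholeStruct`, `loadElementDeferred`, and the `gConversions` counter are hypothetical names introduced here, not Slang APIs, and the counter exists solely to make the amount of translation work visible.

```cpp
#include <array>
#include <cstddef>

// Hypothetical storage layout: bools are stored as ints, mirroring the
// `BigStruct_std430` example in the commit message.
struct BigStructStd430
{
    std::array<int, 1024> values{};
};

// Hypothetical logical layout seen by the shader code.
struct BigStruct
{
    std::array<bool, 1024> values{};
};

// Counts int-to-bool conversions so the two strategies can be compared.
static std::size_t gConversions = 0;

// Old strategy: translate the whole element at the first reference.
BigStruct unpackWholeStruct(const BigStructStd430& storage)
{
    BigStruct result;
    for (std::size_t i = 0; i < storage.values.size(); ++i)
    {
        result.values[i] = storage.values[i] != 0;
        ++gConversions;
    }
    return result;
}

// New strategy: keep addressing the storage type and convert only the
// single value that is actually loaded.
bool loadElementDeferred(const BigStructStd430& storage, std::size_t index)
{
    ++gConversions;
    return storage.values[index] != 0;
}
```

Reading `values[0]` through `unpackWholeStruct` performs 1024 conversions, while `loadElementDeferred` performs one, which is the effect the redesigned pass aims for in the generated code.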
2025-09-29Optimize CapabilitySet deserialization performance (#8552)Ellie Hermaszewska
Closes https://github.com/shader-slang/slang/issues/8477 About a 50% reduction in deserialization time for capability sets
2025-09-29Update function type after inserting KernelContext parameter (#8551)Julius Ikkala
Without this, there are functions with missing parameters in their type in the IR after running the `introduceExplicitGlobalContext` pass:

```
[layout(%15)]
[export("_SV4test12outputBuffer")]
[nameHint("outputBuffer")]
let %outputBuffer : _ = key

[noSideEffect]
[export("_S4test7dostuffp1pi_ff")]
[nameHint("dostuff")]
func %dostuff : Func(Float, Float)
{
block %34(
        [nameHint("f")]
        param %f : Float,
        [nameHint("kernelContext")]
        param %kernelContext : Ptr(%KernelContext, 0 : UInt64, 1 : UInt64)):
    let %35 : Float = mul(%f, %f)
    let %36 : Ptr(ConstantBuffer(%GlobalParams, DefaultLayout), 0 : UInt64, 1 : UInt64) = get_field_addr(%kernelContext, %globalParams)
    let %37 : ConstantBuffer(%GlobalParams, DefaultLayout) = load(%36)
    let %38 : Ptr(RWStructuredBuffer(Float, DefaultLayout, %20)) = get_field_addr(%37, %outputBuffer)
    let %39 : RWStructuredBuffer(Float, DefaultLayout, %20) = load(%38)
    let %40 : Ptr(Float) = rwstructuredBufferGetElementPtr(%39, 1 : Int)
    let %41 : Float = load(%40)
    let %42 : Float = mul(%35, %41)
    return_val(%42)
}
```

Not sure why this doesn't seem to negatively affect existing targets, but it sure is an issue for the LLVM target I'm working on. I could've left this fix for that PR, but I want to check now if this causes any issues with the existing targets using the CI.

This also happens with the entry point functions, where the function type is not updated after adding `ComputeThreadVaryingInput`. This had no effect in the C++ target because `convertEntryPointPtrParamsToRawPtrs(irModule);` is called right after and fixes it.
2025-09-29Fix segfault when shader entry points return resource types (#8434)Copilot
The Slang compiler was segfaulting when trying to compile shaders that return resource types (like `Texture2D`, `RWTexture2D`, `SamplerState`, etc.) from entry point functions. This occurred because there was missing validation that should reject such invalid return types before they reach IR generation. For example, this code would cause a segfault:

```slang
StructuredBuffer<Texture2D<int>> skyLight;

[shader("compute")]
Texture2D<int> computeMain(uint3 threadID : SV_DispatchThreadID)
{
    return skyLight[threadID.x];
}
```

## Root Cause

The issue was in the entry point validation logic in `validateEntryPoint()`. While there was a TODO comment indicating that return type validation should be performed, it was never implemented. The compiler would accept the invalid shader code and attempt to process it during IR lowering, where resource types as return values are not properly handled, leading to a segmentation fault.

## Solution

1. **Added robust validation**: Modified `validateEntryPoint()` in `slang-check-shader.cpp` to use the existing `SemanticsVisitor::getTypeTags()` functionality to check for invalid return types by detecting `TypeTag::Opaque` and `TypeTag::Unsized` bits. This leverages the existing type analysis infrastructure that comprehensively handles:
   - Direct resource types (Texture2D, RWTexture2D, SamplerState, etc.)
   - Structs containing resource-typed fields (through type tag propagation)
   - Nested structures and complex type hierarchies
   - Arrays and other composite types
2. **Added diagnostic message**: Uses existing diagnostic `entryPointCannotReturnResourceType` (error 38010) that provides a clear error message explaining why resource types cannot be returned from shader entry points
3. **Updated existing tests**: Modified existing tests to match the updated validation behavior

## Result

Instead of a segfault, users now get a clear, actionable error message:

```
error 38010: entry point 'computeMain' cannot return type 'Texture2D<int>' that contains resource types
```

The fix properly handles all resource types including `Texture2D`, `RWTexture2D`, `SamplerState`, and others, while preserving the ability to compile valid shaders that return simple data types.

Fixes #6438.

--------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: expipiplus1 <857308+expipiplus1@users.noreply.github.com> Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com> Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com>
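The type tag propagation that this validation relies on can be sketched with a toy type model. The C++ below is an assumption-laden illustration, not the code in `slang-check-shader.cpp`: `ToyType`, `computeToyTypeTags`, `isValidEntryPointReturnType`, and the `kTag...` constants are hypothetical and only loosely modeled on the `TypeTag::Opaque` and `TypeTag::Unsized` bits named above.

```cpp
#include <string>
#include <vector>

// Hypothetical tag bits, loosely modeled on TypeTag::Opaque / TypeTag::Unsized.
constexpr unsigned kTagNone = 0u;
constexpr unsigned kTagOpaque = 1u << 0;
constexpr unsigned kTagUnsized = 1u << 1;

// Toy type: its own tags plus the types of its fields or elements.
struct ToyType
{
    std::string name;
    unsigned ownTags;
    std::vector<const ToyType*> members;
};

// Tags propagate upward: a composite type carries the union of the tags
// of everything it contains, however deeply nested.
unsigned computeToyTypeTags(const ToyType& type)
{
    unsigned tags = type.ownTags;
    for (const ToyType* member : type.members)
        tags |= computeToyTypeTags(*member);
    return tags;
}

// An entry point return type is rejected if any contained type is
// opaque (a resource) or unsized.
bool isValidEntryPointReturnType(const ToyType& type)
{
    return (computeToyTypeTags(type) & (kTagOpaque | kTagUnsized)) == 0;
}
```

Because the check looks at propagated tags rather than at the top-level type only, a struct that merely wraps a resource-typed field is rejected in the same way as the resource type itself.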
2025-09-26Add SPV_NV_bindless_texture support (#8534)Lujin Wang
Treat DescriptorHandle as uint64_t instead of uint2. Implement target-specific SPIR-V emission with the bindless texture support. For OpImageTexelPointer, Image must have a type of OpTypePointer with Type OpTypeImage. Fix the issue by using [constref] in __subscript. Add test coverage for various texture/sampler handle types. --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-09-26Diagnostic on use of unsupported entry point modifiers (#8487)James Helferty (NVIDIA)
Generate a diagnostic warning whenever unsupported modifiers (keywords, attributes) are found on entry point parameters. These have been silently ignored up until now, with the parser accepting them but Slang not actually doing anything with them. Fixes #7151 --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-09-25Prepare VulkanSDK release Oct 2025 (#8525)Jay Kwak
Related to - https://github.com/shader-slang/slang/issues/8519
2025-09-24Remove unnecessary Load and Store pair (#8433)Jay Kwak
This commit removes unnecessary Load and Store pairs in IR. When the IR looks like

```
let %1 = var
let %2 = load(%ptr)
store(%1, %2)
```

this PR replaces all uses of %1 with %ptr and removes the load and store instructions. However, there can be cases where %2 is still used later by other instructions. In those cases, the removal of the load instruction relies on DCE. --------- Co-authored-by: slangbot <ellieh+slangbot@nvidia.com>
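The rewrite described in this commit can be sketched on a toy instruction list. This is not the Slang pass: `Op`, `Inst`, `findDef`, `countStoresTo`, and `removeLoadStorePairs` are hypothetical, instructions are identified by name strings, and the sketch assumes a single basic block in which the source pointer is not written between the load and the later uses (aliasing is ignored entirely).

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class Op { Var, Load, Store, Use };

struct Inst
{
    Op op;
    std::string result;            // name defined by this instruction, or empty
    std::vector<std::string> args; // operands; for Store: {destination, value}
};

// Index of the instruction defining `name`, or -1 if there is none.
int findDef(const std::vector<Inst>& insts, const std::string& name)
{
    for (std::size_t i = 0; i < insts.size(); ++i)
        if (!insts[i].result.empty() && insts[i].result == name)
            return static_cast<int>(i);
    return -1;
}

std::size_t countStoresTo(const std::vector<Inst>& insts, const std::string& dest)
{
    std::size_t count = 0;
    for (const Inst& inst : insts)
        if (inst.op == Op::Store && !inst.args.empty() && inst.args[0] == dest)
            ++count;
    return count;
}

// For `%1 = var; %2 = load(%ptr); store(%1, %2)`, redirect uses of %1 to
// %ptr and drop the var and the store. The load is left in place; if it
// becomes unused, a separate dead-code elimination step would remove it.
std::vector<Inst> removeLoadStorePairs(std::vector<Inst> insts)
{
    bool changed = true;
    while (changed)
    {
        changed = false;
        for (std::size_t s = 0; s < insts.size(); ++s)
        {
            if (insts[s].op != Op::Store || insts[s].args.size() != 2)
                continue;
            const std::string var = insts[s].args[0];
            const int varIndex = findDef(insts, var);
            const int loadIndex = findDef(insts, insts[s].args[1]);
            if (varIndex < 0 || insts[varIndex].op != Op::Var)
                continue;
            if (loadIndex < 0 || insts[loadIndex].op != Op::Load)
                continue;
            const std::string source = insts[loadIndex].args[0];
            // Only handle variables written exactly once, from another address.
            if (source == var || countStoresTo(insts, var) != 1)
                continue;

            std::vector<Inst> rewritten;
            for (std::size_t i = 0; i < insts.size(); ++i)
            {
                if (i == s || i == static_cast<std::size_t>(varIndex))
                    continue; // drop the store and the variable
                Inst copy = insts[i];
                for (std::string& arg : copy.args)
                    if (arg == var)
                        arg = source;
                rewritten.push_back(copy);
            }
            insts = rewritten;
            changed = true;
            break; // indices are stale; rescan from the start
        }
    }
    return insts;
}
```

The single-store restriction is a simplification chosen for this sketch; the commit message does not spell out the exact conditions the real pass checks.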
2025-09-23Legalize type as well in legalizeOperand (#8483)Gangzheng Tong
This fixes a type mismatch issue. See the generated cuda code:

```cuda
struct Query_0
{
    EmptyExample_0 query_0;
    uint hasNonEmptyAbsorbingBoundary_0;
};

struct Query_1
{
    uint hasNonEmptyAbsorbingBoundary_0;
};

struct GlobalParams_0
{
    Query_0* gQuery_0;
    RWStructuredBuffer<float3 > gInput_0;
    RWStructuredBuffer<float> gOutput_0;
};
...
Query_1 _S4 = *globalParams_0->gQuery_0; // ==> type mismatch at call site!
```

**Root Cause:** During the empty type legalization pass in Slang's IR processing, struct types were being optimized (e.g., `Query_0` → `Query_1` with the empty type removed), but this created an inconsistency:

- **Function parameters were updated:** When the `Query_compute_0` function was legalized, its parameter type was correctly updated from `Query_0` to the optimized `Query_1`.
- **Global parameter types were NOT updated:** The `ParameterBlock<Struct>` type in globalParams still referenced the old `Query_0` type.

The PR adds special handling for type operands in the `legalizeInst` function. This triggers the legalization of the `StructType` from the original `legalizeOperand` call site. The legalized result will be saved in the type-to-legal-type map and reused when the same type requires legalization again (e.g. in the `IRFunc` as a parameter).

Fixes: https://github.com/shader-slang/slang/issues/7905
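The role of the type-to-legal-type map can be illustrated with a small memoizing legalizer. This is a hedged sketch, not the Slang implementation: `ToyStruct`, `ToyLegalizer`, the `"Empty"` marker, and the `_legal` suffix are all hypothetical. The point it shows is that every reference to the same original type receives the same legalized type, so a global parameter and a function parameter cannot end up disagreeing.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy struct type: field types are given by name, and the name "Empty"
// stands in for a field whose type is empty.
struct ToyStruct
{
    std::string name;
    std::vector<std::string> fieldTypes;
};

class ToyLegalizer
{
public:
    // Returns the legalized form of `original`, creating it at most once.
    std::shared_ptr<const ToyStruct> legalize(const std::shared_ptr<const ToyStruct>& original)
    {
        auto found = m_legalized.find(original);
        if (found != m_legalized.end())
            return found->second; // reuse the earlier result

        auto result = std::make_shared<ToyStruct>();
        result->name = original->name + "_legal";
        for (const std::string& field : original->fieldTypes)
            if (field != "Empty")
                result->fieldTypes.push_back(field); // drop empty-typed fields

        m_legalized[original] = result;
        return result;
    }

private:
    // Maps each original type to its legalized replacement.
    std::map<std::shared_ptr<const ToyStruct>, std::shared_ptr<const ToyStruct>> m_legalized;
};
```

In this model, legalizing the struct once when it appears as a type operand and again when it appears as a function parameter yields the identical object, which is the consistency the fix restores.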
2025-09-23Lookup refactor (#8467)kaizhangNV
Close #8201. This PR unifies the lowering logic for a LookupDeclRef of an interface requirement. We will always lower this AST node to a LookupWitness IR. The key of this IR is the special witnessTableType `ThisTypeWitness`; this witness table is simply a wrapper for an interface type. Our current specialization pass doesn't handle this kind of LookupWitness IR at all, so we also add specialization of the this_type IR.
2025-09-23fix a crash when using type equality constraint (#8515)kaizhangNV
Close #8193. When constructing a `TransitiveTypeWitness` node, we should check whether any operand represents two equal types. Currently, we only check whether the operand is a `TypeEqualityWitness`, which is not good enough, because a `DeclaredSubtypeWitness` could also represent two identical types; in that case, we should also constant-fold this kind of witness. Failing to do so, we could end up generating a lookup witness IR on a generic parameter that is not supposed to be looked up.
2025-09-23Fix varying output structs in GLSL source (#8501)Julius Ikkala
Closes #8500. `slang-ir-translate-global-varying-var.cpp` turns the global varying outputs into a struct that's returned from the entry point. Currently, there's a problem when one of the outputs is a struct. It always creates a generic `IRTypeLayout`, even when a correct type layout already exists. Somehow, this appears to work when the global varying outputs aren't structs. The crash occurs in `slang-ir-glsl-legalize.cpp:createGLSLGlobalVaryingsImpl()`. It correctly handles the generated outer struct, but when that contains an inner struct, the inner struct has been given a non-struct type layout and the function crashes. This PR uses the correct layout if found, instead of generating a broken placeholder. This matches the behaviour that has already been implemented for inputs. Additionally, I removed a call to `addResourceUsage` from both the input and output side. I can't see any way in which it would've affected anything: the layout builder is never used after that call, and it doesn't retroactively modify the layout that was already created.