slang.git - Making it easier to work with shaders

Age	Commit message	Author
2025-12-17	fix compiler bug lol (HEAD, master)	yum

2025-10-31	meow	yum

2025-10-28	non-exported public labels get namespaced now	yum

2025-10-28	fix compiler break	yum

2025-10-17	Improve unhandled instruction error message	yum
	It now includes the opcode in human-readable form.
2025-10-17	Optionally disable entry point param cbuffer transform	yum

2025-10-17	Fix infinite loop in SPIRVLegalizationContext::processWorkList (#8712)	davli-nv
	When slangc is invoked with -g, a source shader that contains a static infinite loop can generate IR in which a block branches to another block that branches back to the first one, causing an infinite loop. Change SPIRVLegalizationContext::processWorkList to only add a branch target to the work list via its parent, which avoids the infinite loop above. Also change addToWorkList to stop calling addUsersToWorkList; users should be added explicitly by the logic for specific insts. Add regression test tests/spirv/infinite-loop.slang. Fixes #8669
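The termination argument can be sketched in Python on a toy CFG, assuming each block simply lists its branch targets (the names `process_work_list` and `cfg` are hypothetical, and this is not Slang's C++ implementation): a target is enqueued only the first time it is reached from a parent, so blocks that branch to each other cannot keep re-enqueueing each other.

```python
from collections import deque


def process_work_list(cfg, entry):
    """Visit every block reachable from `entry` exactly once.

    `cfg` maps a block name to the list of blocks it branches to.
    A target is enqueued only when it is first reached from a parent,
    so cyclic branches cannot keep re-enqueueing blocks.
    """
    visited = {entry}
    work_list = deque([entry])
    order = []
    while work_list:
        block = work_list.popleft()
        order.append(block)  # a real pass would legalize `block` here
        for target in cfg.get(block, []):
            if target not in visited:
                visited.add(target)
                work_list.append(target)
    return order


# Two blocks that branch to each other, as in a static infinite loop.
loop_cfg = {"A": ["B"], "B": ["A"]}
print(process_work_list(loop_cfg, "A"))  # ['A', 'B']
```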
2025-10-17	Re-enable slangpy test_blit.py::test_generate_mips for CUDA. (#8740)	Yong He
	Issue is fixed with https://github.com/shader-slang/slang/pull/8710
2025-10-16	Update SPIRV-Tools and SPIRV-Headers to latest versions (#8722)	aidanfnv

2025-10-16	Inline global constants for shader style CPU targets (#8686)	Julius Ikkala
	On the shader-host-callable target, test `gh-4874.slang` generates IR that contains global constants referencing global params. These need to get inlined into functions, as otherwise `introduceExplicitGlobalContext()` will fail with "no outer func at use site for global", making the test crash the compiler.
2025-10-16	Fix wrong diagnostic when checking identical casting expr. (#8727)	Yong He
	`SemanticsVisitor::CheckInvokeExprWithCheckedOperands` made several references to the `expr` parameter in its `inout` parameter l-value-ness validation logic to access arguments, which is wrong because `expr` is not necessarily the same as `result`/`invoke` (the result of calling `ResolveInvoke()` in the first line of the function). This change uses `invoke` instead, for consistency. It also adds special-case logic to return early when the resolved invoke expr is `argument[0]` and the original invoke expr is `T(funcThatReturnsT())`. Closes #8659.
2025-10-16	Fix use of variadic generics with [Differentiable]. (#8736)	Yong He
	A bug caused the compiler to fail to treat a `no_diff TypePack` as a type pack, and thus to diagnose an error when resolving the following call. The fix is to unwrap any ModifiedType wrappers in the `IsTypePack()` check.
2025-10-16	Update slang-rhi (#8709)	Simon Kallweit
	- Update to latest slang-rhi - Enable additional slang-rhi tests for OptiX 8.0 and 8.1
2025-10-16	Fix WGSL bitshift test typo (#8720)	Julius Ikkala
	Random drive-by test fix: this test was reading past the end of the buffer, but usually succeeded because the expected result is 0.
2025-10-16	Immutable access qualifier for pointers and use `__ldg` on cuda. (#8710)	Yong He
	This PR implements `Access.Immutable` to allow pointers to immutable data. The new type `ImmutablePtr<T>` is defined as an alias of `Ptr<T, Address.Immutable>`. By forming an immutable pointer, the programmer is conveying to the compiler that the data at the pointer address will never change during the execution of the current program. Therefore, loads from immutable pointers can be deduplicated by the compiler, and will translate to `__ldg` when generating code for CUDA. The SPIR-V backend is not changed in this PR, since the current SPIR-V spec makes it very difficult to specify loads from immutable addresses without generating tons of wrappers and boilerplate type declarations. We would like to see the spec evolve a bit around its support for `NonWritable` physical storage pointers or immutable loads before we attempt to express such immutability in SPIR-V. For now, we simply emit ordinary pointers and loads when generating SPIR-V. --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
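A minimal sketch of why immutability enables load deduplication, assuming a toy IR of `(result, op, pointer, is_immutable)` tuples (hypothetical, not Slang IR): a repeated load through the same immutable pointer can reuse the first result, while loads through ordinary pointers are left alone.

```python
def dedup_immutable_loads(instructions):
    """Remove repeated loads through the same immutable pointer.

    Each instruction is a (result, op, pointer, is_immutable) tuple.
    Returns the kept instructions and a dict mapping every removed
    result name to the earlier result that replaces it.
    """
    first_load = {}  # immutable pointer -> result of its first load
    replacements = {}
    kept = []
    for result, op, pointer, is_immutable in instructions:
        if op == "load" and is_immutable:
            if pointer in first_load:
                # The data cannot have changed, so reuse the earlier load.
                replacements[result] = first_load[pointer]
                continue
            first_load[pointer] = result
        kept.append((result, op, pointer, is_immutable))
    return kept, replacements


program = [
    ("a", "load", "p", True),   # p is an immutable pointer
    ("b", "load", "q", False),  # q is an ordinary pointer
    ("c", "load", "p", True),   # duplicate of "a"
    ("d", "load", "q", False),  # kept: ordinary pointers are never deduplicated here
]
kept, replacements = dedup_immutable_loads(program)
print(replacements)  # {'c': 'a'}
```

A real pass would also rewrite uses of `c` to `a`; the sketch only records that mapping.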
2025-10-16	[CI] Skip slangpy test_blit.py::test_generate_mips for CUDA (#8725)	aidanfnv
	This skips a new slangpy test that hits an internal assert in Slang CI; the assert went uncaught in slangpy's own testing because slangpy's CI uses release builds. See https://github.com/shader-slang/slangpy/issues/575 for details.
2025-10-15	Clean up Slang IR representation of undefined values (#8708)	Theresa Foley
	Prior to this change, the Slang IR used a single opcode (`kIROp_Undefined`) to encode all cases of undefined values. The particular motivation for this change was a need to distinguish those undefined values that represent a load from an uninitialized memory location from other sorts of undefined values. If transforming a variable into SSA form results in `undefined` values in cases where a `load` was executed without a prior `store`, that represents an error on the programmer's part, and should be diagnosed. However, other cases of undefined values can arise during program transformation and optimization, and should not typically result in diagnostics being emitted. While it was not the original motivation for this change, it is also worth noting that the LLVM project has transitioned from initially using only a single `undef` instruction to having a more nuanced model, and the same factors that motivated their shift also apply to the Slang IR. Counter-intuitively, the semantics of undefined values actually need to be carefully defined. Concretely, this change splits the pre-existing `undefined` opcode into two sub-cases: - `kIROp_LoadFromUninitializedMemory`, to represent the case of loading from a memory location (such as a local variable) that has not been initialized. - `kIROp_Poison`, corresponding to the LLVM `poison` value. Our poison instruction is intended to have semantics comparable to LLVM's equivalent. Conceptually, any operation that is invoked with a poison value as input will (with a few exceptions) produce a poison value as output. One can think of the behavior of `poison` as similar to how not-a-number values propagate in floating-point computations: by default they "infect" the result of any computation they are involved in. This semantic choice helps to ensure that many optimizations end up being correct in the presence of undefined values, even if they did not specifically account for them.
The `kIROp_LoadFromUninitializedMemory` case is comparable to the combination of `freeze` and `undef` in LLVM. An LLVM `undef` value has semantics that allow each use of that value to be replaced with a different arbitrary value; these semantics cause many optimizations to only be correct in the absence of undefined values. An LLVM `freeze` instruction can take an undefined value as input, and produces a single value that is still arbitrary, but must be consistent across all uses. The latter semantics are what we want, since a given `load` from an uninitialized memory location will yield an arbitrary-but-fixed value. Note that we intentionally do not have a direct analogue to LLVM's `undef` instruction, because of the way that `undef` causes so many complications when trying to write optimizations. We also do not add a `kIROp_Freeze` instruction in this change, but that is simply because we currently have no need for it. Existing code that was creating `IRUndefined` values has been updated to create either `IRPoison` or `IRLoadFromUninitializedMemory` values, as appropriate to the use case. Code that was checking for the `kIROp_Undefined` opcode has been updated to either check for both of the new opcodes (in the case of `switch` statements), or to use `as<IRUndefined>` to perform a dynamic cast to the common base type of the two new instructions. Note that this change does not alter the way that instructions representing undefined values are typically emitted as ordinary instructions in the block that produces an undefined value. While emitting `IRLoadFromUninitializedMemory` as an ordinary instruction is exactly what we want, the `IRPoison` case would actually be better represented in Slang IR as a "hoistable" instruction, so that there would only be a singular `poison` value of each type. 
Changing `IRPoison` to be hoistable would be a good follow-up change, but might run into more challenges depending on what assumptions (if any) the codebase is making about where undefined values get emitted. --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
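A toy Python model of the two semantics described above, assuming plain integers stand in for IR values (`POISON`, `sub`, and `load_from_uninitialized_memory` are hypothetical names, not Slang IR APIs): poison propagates like NaN, while a load from uninitialized memory yields one arbitrary-but-fixed value, so folding `x - x` to `0` stays valid.

```python
import random


class Poison:
    """Stand-in for an IR poison value."""

    def __repr__(self):
        return "poison"


POISON = Poison()


def sub(a, b):
    # Like NaN, poison propagates through arithmetic.
    if a is POISON or b is POISON:
        return POISON
    return a - b


def load_from_uninitialized_memory(rng):
    # Freeze-like: one arbitrary value is chosen, and every use of the
    # returned result sees that same value.
    return rng.randint(-(2**31), 2**31 - 1)


x = load_from_uninitialized_memory(random.Random(0))
print(sub(x, x) == 0)  # True: folding x - x to 0 is valid
print(sub(POISON, 1))  # poison
```

Under LLVM-style `undef` semantics each use of `x` could observe a different value, so the same fold would not be justified; that is the complication the new opcodes avoid.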
2025-10-15	Retry file reads in slang-test to handle intermittent I/O errors (#8713)	Jay Kwak
	Related #8705
2025-10-14	Handle SPIR-V aliases (#8704)	Jay Kwak
	Fixes https://github.com/shader-slang/slang/issues/8703
2025-10-14	Add support targeting older OptiX versions (#8700)	Simon Kallweit
	Currently, the emitted CUDA code only compiles with the latest OptiX 9.0. This change allows code to be compiled with OptiX 8.0 and later by not emitting OptiX calls that are unavailable in those versions. In a later step we should add proper capabilities for the various OptiX versions.
2025-10-13	Fix segfault on arrays of structs containing parameter blocks (#8555)	Ellie Hermaszewska
	Closes https://github.com/shader-slang/slang/issues/8154. However, there is further design work to do on implementing the "NonAddressableType" suggestion.
2025-10-13	Support tests outside tests directory (#7791) (#8666)	Janne Kiviluoto (NVIDIA)
	Because running slang-test from the Slang root directory is implicitly assumed (and mentioned in CONTRIBUTING.md), no detailed path checks are done.
2025-10-11	Decouple debug level control from separate-debug-info (#8680)	Jay Kwak
	Fixes https://github.com/shader-slang/slang/issues/8649
2025-10-11	Update build to allow setting external mimalloc path (#8676)	Lujin Wang
	Update the build to allow setting a user-specific path for the external mimalloc module.
2025-10-11	Add diagnostic for cyclic #include. (#8679)	Yong He

2025-10-10	Allow entry points with missing numthreads on CPU targets (#8678)	Julius Ikkala
	Several tests have compute entry points without a `[numthreads(x,y,z)]` decoration. Currently, none of these tests run on the CPU target, as they crash the compiler. I took a look at the SPIR-V emitter, which falls back to a workgroup size of (1,1,1): https://github.com/shader-slang/slang/blob/1e0908bd7107dfbdac912b693c3ab9bd6e1dc8b3/source/slang/slang-ir-spirv-legalize.cpp#L1635-L1643 To match this behaviour, this PR implements a fallback solution that makes `emitCalcGroupExtents()` emit (1,1,1). This PR is both a question and a suggestion; I'm not sure the approach here is at all reasonable. Personally, I'd just like to explicitly add `[numthreads(1,1,1)]` to all such tests, but I don't know if it's actually legal and supported to not have a `numthreads`. So the implementation here is a bit conservative. I ran across these when I went through tests for the upcoming LLVM target. These were the final blockers to get all autodiff and language-features tests passing (not counting the ones using things like wave intrinsics and barriers etc.)
2025-10-10	Fix `specializeRTTIObject` to use non-zero RTTI value to work with `Optional<T>`. (#8677)	Yong He
	Closes #8673. The issue is that we use the RTTI field of an existential to check if it is null. We already have logic to help the user fill in a non-zero value for the RTTI field when such an object is filled from the host. However, when Slang code creates an existential value, old logic in the compiler still just fills in 0 for the RTTI field, causing `Optional<IFoo>.hasValue` to always return false in such cases.
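A toy Python model of the bug, assuming an existential is just an `(rtti, payload)` pair and that `hasValue` tests the RTTI field against 0 as described above (the class and helper names are hypothetical): an existential built with RTTI 0 looks empty even when it carries a payload.

```python
from dataclasses import dataclass
from typing import Any

NULL_RTTI = 0  # an RTTI value of 0 marks an empty existential


@dataclass
class Existential:
    rtti: int  # runtime type id of the payload
    payload: Any


def has_value(opt: Existential) -> bool:
    # The null check reads only the RTTI field.
    return opt.rtti != NULL_RTTI


def make_existential_old(payload: Any) -> Existential:
    # Old behaviour for existentials created in shader code: RTTI left as 0.
    return Existential(rtti=NULL_RTTI, payload=payload)


def make_existential_fixed(payload: Any, type_id: int) -> Existential:
    # Fixed behaviour: always store a non-zero RTTI value.
    if type_id == NULL_RTTI:
        raise ValueError("type_id must be non-zero")
    return Existential(rtti=type_id, payload=payload)


print(has_value(make_existential_old(42)))       # False, despite the payload
print(has_value(make_existential_fixed(42, 7)))  # True
```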
2025-10-10	Addition of `Load`/`Store` coherent operations (#8395)	16-Bit-Dog
	Fixes: https://github.com/shader-slang/slang/issues/7634 Duplicate of PR https://github.com/shader-slang/slang/pull/8052 Primary Changes: * Added `storeCoherent` and `loadCoherent` for coherent load/store via pointers. This is backed by `IRMemoryScopeAttr`, which is an `IRAttr` attached to `IRLoad` and `IRStore`. * Logic in `source\slang\slang-emit-spirv.cpp` for emitting loads/stores has been reworked to be less messy and more maintainable. * Added coherent load/store operations for coop vector and coop matrix to `hlsl.meta.slang`. Secondary Changes: * Added a missing load/store test for coop matrix: `tests\cooperative-matrix\load-store-pointer.slang` --------- Co-authored-by: ArielG-NV <aglasroth@nvidia.com> Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com> Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com> Co-authored-by: Nathan V. Morrical <natemorrical@gmail.com>
2025-10-10	implement dot products for 1 vectors (#8599)	Ellie Hermaszewska
	Closes https://github.com/shader-slang/slang/issues/8378
2025-10-10	Specialize interfaces in DebugFunction (#8617)	Julius Ikkala
	E.g. in [generic-extension-2.slang](https://github.com/shader-slang/slang/blob/master/tests/language-feature/extensions/generic-extension-2.slang), incorrect DebugFunctions are generated for `getFirstOuter`: ``` let %33 : Void = DebugFunction("getFirstOuter", 18 : UInt, 3 : UInt, %26, Func(Int, 0 : Int)) ``` This happens because specialization passes are leaving a `%IFoo` in the function type, instead of replacing with a concrete type: ``` let %34 : Void = DebugFunction("getFirstOuter", 18 : UInt, 3 : UInt, %26, Func(Int, %IFoo)) ``` and later, `cleanUpInterfaceTypes()` just replaces all interfaces with the literal zero. So now we have a parameter type which isn't actually a type at all, but an IntLit instead. I'm not sure if the approach I picked is good, though. Some other options that crossed my mind were: * Make `fixUpFuncType` also update related DebugFunctions - But is there a reason why DebugFunctions separately carry a function type in the first place? * Make `cleanUpInterfaceTypes` less aggressive or at least replace types with a type instead of a value - But this will still make the debug info incorrect :(
2025-10-10	8503 wgsl depth texture (#8645)	Sami Kiminki (NVIDIA)
	Add built-in type aliases for DepthTexture* and unify SamplerShadow. Add the following type aliases: - DepthTexture1D, DepthTexture1DArray - DepthTexture2D, DepthTexture2DArray - DepthTexture2DMS, DepthTexture2DMSArray - DepthTexture3D - DepthTextureCube, DepthTextureCubeArray. These match the type aliases for non-depth textures. Also, unify the SamplerShadow type aliases with the DepthTexture* ones. This adds the following: - Sampler2DMSShadow - Sampler2DMSArrayShadow and removes the Sampler3DArrayShadow type alias. As a side-effect, the descriptions of Sampler*ArrayShadow type aliases are fixed ("texture-sampler for shadow" ==> "texture-sampler array for shadow"). Update the slang tests to use the newly introduced type aliases instead of the custom type aliases that use _Texture<> directly. Add DepthTexture testing in hlsl-intrinsic/texture/texture-intrinsics. Do this by extracting the test logic of computeMain() into a separate function and parametrizing it for non-depth/depth texture types. This adds basic coverage for the following types: - DepthTexture1D - DepthTexture2D - DepthTexture3D - DepthTextureCube - DepthTexture1DArray - DepthTexture2DArray - DepthTextureCubeArray. Issue #6166 Issue #8503
2025-10-10	Update debug var when in-param proxy var is being updated. (#8671)	Yong He
	Closes #8664. The problem is that when there is an `in` parameter, Slang will create a local variable to proxy the parameter, copy the value of the parameter into the proxy variable, and replace all uses of the parameter in the function body with the proxy variable. This way, all writes to the parameter become writes to the proxy variable. However, when debug info is enabled, we also create a "debugVariable" corresponding to the parameter, but this debugVariable isn't updated when the proxy variable is updated. The fix is to map the proxy var instead of the original param to the debug var during the `insertDebugValueStore` pass, so that any changes to the proxy var result in additional stores being inserted to the debug var. Allowing a function body to modify an `in` parameter is a bad legacy behavior we inherited from HLSL that we should really be moving away from. I would like us to treat an `in` parameter as completely immutable by default in the next language version (Slang 2026), and make it an error if the user tries to modify one. This will allow us to generate much cleaner code and in many cases would help with performance.
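A sketch of the mapping change, assuming a toy instruction list of `(op, var, value)` tuples (the function and variable names here are hypothetical, not Slang's `insertDebugValueStore` implementation): once writes are redirected to the proxy variable, keying the debug map on the original parameter produces no debug stores, while keying it on the proxy mirrors every write.

```python
def insert_debug_value_stores(body, debug_map):
    """Insert a debug store after every store to a variable in `debug_map`.

    `body` is a list of (op, var, value) tuples; `debug_map` maps a
    variable name to the debug variable that should mirror it.
    """
    out = []
    for op, var, value in body:
        out.append((op, var, value))
        if op == "store" and var in debug_map:
            out.append(("debug_store", debug_map[var], value))
    return out


# After proxying, every write to the `in` parameter `p` targets `p_proxy`.
body = [("store", "p_proxy", 1), ("store", "p_proxy", 2)]

old = insert_debug_value_stores(body, {"p": "dbg_p"})        # keyed on the param
new = insert_debug_value_stores(body, {"p_proxy": "dbg_p"})  # keyed on the proxy
print(len(old), len(new))  # 2 4
```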
2025-10-09	Improve perf with `-separate-debug-info` (#8670)	Jay Kwak
	When Slang forms new SPIR-V code without the debug info, the List container now reserves its memory before items are added to it. This improves the time of the given repro test from 56 minutes to 6 minutes.
2025-10-10	Defer `IRCastStorageToLogicalDeref` in lowerBufferElementType pass. (#8668)	Yong He
	Fix a regression in a Metal test. In the `lowerBufferElementTypeToStorageType` pass, we want to defer to the callee not only arguments that are `CastStorageToLogical`, but also arguments that are `CastStorageToLogicalDeref`, applying the same deferral logic. This is because `CastStorageToLogicalDeref` will appear as an argument if `lowerBufferElementTypeToStorageType` is run before we apply the `in->borrow` transformation pass, which is the case for Metal parameter block legalization.
2025-10-10	Small fix to buffer load specialization pass to allow more specialization to happen. (#8653)	Yong He
	This allows us to further clean up unnecessary copies in the target code we generate. Part of the effort for #8652.
2025-10-08	Fix DerivativeGroupQuadsKHR workgroup size validation for texture sampling (#8647)	Lujin Wang
	Fixes #8545, where Slang generates SPIR-V with the DerivativeGroupQuadsKHR execution mode but doesn't validate workgroup sizes when texture sampling triggers automatic derivative computation. Root Cause: Validation code was looking for IRNumThreadsDecoration on the wrong IR node. Fix: One-line change in slang-emit-spirv.cpp to search for the decoration on entryPoint instead of entryPointDecor. Tests: Added regression tests for both quad and linear derivative group validation. Generated with [Claude Code](https://claude.ai/code) --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Lujin Wang <lujinwangnv@users.noreply.github.com> Co-authored-by: slangbot <ellieh+slangbot@nvidia.com> Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-10-08	Allow 1D SV_DispatchThreadID in CPU targets (#8612)	Julius Ikkala
	The varying param legalization pass didn't deal with this 1D form of SV_DispatchThreadID for CPU targets: ```slang void computeMain(int i : SV_DispatchThreadID) ``` Instead, it just overrode the type of `i` with a `uint3`, breaking lots of code that attempted to use `i` for something, like a `switch` statement for example. I ran across this when going through `language-feature` tests for the LLVM target, which will also use this legalization pass. I'm separately submitting this now because this also fixes the existing CPU target. The test I enable in this PR is one that was previously generating broken code on CPU. (somewhat related issue: #7468)
2025-10-08	parser: Avoid dropping modifiers when splitting list (#8546)	James Helferty (NVIDIA)
	Fix for a linked-list usage bug; avoids dropping any modifiers when moving type modifiers from a linked list of modifiers into their own linked list. Since this change causes no_diff modifiers on traditional functions to end up on the return type instead of the function (due to the order in which they're parsed), we duplicate the no_diff modifier onto the function declaration after the fact. Includes a test for the original issue. The no_diff redistribution case is covered by a slangpy device test case. Fixes #8332 --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-10-08	Feature/improve formatting sh (#8641)	Janne Kiviluoto (NVIDIA)

2025-10-08	Fix UnixPipeStream::read() not handling EOF (#8626)	ncelikNV
	Fixes #6754.
2025-10-08	Improve texture loads and stores on CUDA (#8644)	Simon Kallweit
	- fix handling of layer and mip level - add support for 1D layered textures - reduce code by using macros - assert when trying to emit unsupported intrinsics. There is a new set of unit tests in slang-rhi for exhaustive testing of shader loads/stores on textures. These fixes allow most of these tests to be enabled. Formatted loads/stores on surfaces are not supported in the PTX ISA, so supporting them would require codegen for the conversion, which should be possible in theory but not as part of the CUDA prelude.
2025-10-08	Add deterministic shuffling of tests in directory (#8622)	Janne Kiviluoto (NVIDIA)
	Fixes #8621. Add command-line options to enable shuffling and to provide a custom seed. Use a Mersenne Twister engine for a deterministic shuffle.
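A sketch of a seeded, reproducible test shuffle in Python, whose `random.Random` is itself a Mersenne Twister (MT19937) generator. The function name is hypothetical, and this does not reproduce slang-test's C++ code or its actual command-line options. Sorting first makes the result independent of the order in which the filesystem lists the directory.

```python
import random


def shuffle_tests(test_paths, seed):
    """Return a reproducible permutation of `test_paths` for `seed`."""
    ordered = sorted(test_paths)  # fixed starting order
    rng = random.Random(seed)     # MT19937 under the hood
    rng.shuffle(ordered)
    return ordered


tests = ["b.slang", "a.slang", "d.slang", "c.slang"]
first = shuffle_tests(tests, seed=1234)
second = shuffle_tests(list(reversed(tests)), seed=1234)
print(first == second)  # True: same seed, same order
```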
2025-10-08	`ExprLoweringVisitorBase::getDefaultVal(Type*)` use `MakeVector/MatrixFromScalar` (#8512)	Ronan
	- Allows using `Vector/Matrix` type with yet unresolved dimensions - Simpler implementation and in-line with default `Array` - Added `test/bugs/gh-8512.slang`
2025-10-07	Fix a bug that causes a struct field to be initialized twice. (#8619)	Yong He
	We insert field initialization logic at the beginning of every ctor in `synthesizeCtorBody`, but then immediately insert another round of initialization for explicit ctors in `maybeInsertDefaultInitExpr`; both are called from `SemanticsDeclBodyVisitor::visitAggTypeDecl` right next to each other. The fix is to remove `maybeInsertDefaultInitExpr`. This change also enhances the address aliasing analysis, so that for the following case: ``` this->member1 = 0; this->member2 = 0; this->member1 = param; ``` we can still remove the first assignment to `this->member1` despite seeing `this->member2 = 0`, since it is easy to know that `this->member2` cannot alias with `this->member1`. Closes #8600.
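A simplified sketch of the dead-store reasoning for the example above, assuming straight-line code with no reads or calls between the assignments and that distinct field names always denote distinct addresses (the function name is hypothetical; a real alias analysis must also handle reads, calls, and pointer arithmetic): only the last store to each field survives.

```python
def remove_dead_field_stores(stores):
    """Keep only the last store to each field in a straight-line sequence.

    `stores` is a list of (field, value) assignments to `this->field`
    with no reads in between. Distinct fields never alias, so a store
    to `member2` does not keep an earlier store to `member1` alive.
    """
    last_index = {field: i for i, (field, _) in enumerate(stores)}
    return [s for i, s in enumerate(stores) if last_index[s[0]] == i]


stores = [("member1", "0"), ("member2", "0"), ("member1", "param")]
print(remove_dead_field_stores(stores))
# [('member2', '0'), ('member1', 'param')]
```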
2025-10-07	Use GitHub runners for Windows releases, disable CUDA for aarch64 (#8613)	aidanfnv
	For #8596 Fixes #8597 This switches our release workflow back to using GitHub's `windows-latest` runners, which we were using previously. It also adds the variable `extra-cmake-flags` to the `windows-aarch64` entry in the workflow's matrix with the value `"-DSLANG_ENABLE_CUDA=0"`. If we are cross-compiling aarch64 on x86_64, and the x86_64 CUDA Toolkit is installed, it will be auto-detected by cmake and the build will fail (no aarch64 version of CUDA Toolkit exists). The `windows-latest` runners do not have CUDA Toolkit, so they do not encounter this issue, but if we do end up building on runners that do (such as the temporary move to self-hosted runners), adding that flag eliminates that potential problem. This release workflow does build properly on `windows-latest` with `extra-cmake-flags`: https://github.com/aidanfnv/slang/actions/runs/18293521738
2025-10-07	Disable branching subgroup test for WGSL (#8614)	Jay Kwak
	WGSL doesn't allow subgroup-related functions inside a branch; they must be used in uniform control flow. This commit disables a test for such a case. Note that the test was supposed to be disabled in the previous PR, but was mistakenly left enabled. - #8386
2025-10-07	Use loadModuleFromSourceString in specialization example snippet (#8616)	aidanfnv
	Fixes #8221 This modifies the code snippet used to demonstrate link-time specialization to use the public `loadModuleFromSourceString` API instead of the internal `UnownedRawBlob::create`. It also corrects a couple of variable names in the snippet.
2025-10-07	Minor Documentation Update to Remove Outdated Section (#8606)	Xiang Hong
	As mentioned in #8316, there is a small duplicated and outdated section about specialization constants support in the WGSL-Specific Functionalities documentation. This removes the outdated duplicate, as there is a newer section on the page. --------- Co-authored-by: Yong He <yonghe@outlook.com>
2025-10-07	Use symbol alias instead of wrapper synthesis to implement link-time types. (#8603)	Yong He
	This change achieves link-time type resolution with a different mechanism. For `extern struct Foo : IFoo = FooImpl;`, instead of synthesizing a wrapper type `Foo` that has a `FooImpl inner` field and dispatches all interface method calls to `inner.method()`, this PR completely removes this synthesis step and instead lowers such `extern`/`export` types as `IRSymbolAlias` instructions that simply reference the type being wrapped. We then extend the linker logic to clone the referenced symbol, instead of the SymbolAlias inst itself, during linking. By doing so, we greatly simplify the logic needed to support link-time types, and achieve higher robustness by not having to deal with many AST synthesis scenarios. Closes #8554. --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
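A sketch of the linker-side alias resolution, assuming a toy symbol table in which each entry is either `("alias", target)` or `("def", definition)` (hypothetical structures, not Slang's `IRSymbolAlias` API): linking `Foo` follows the alias chain and clones the concrete definition it ends at, rejecting alias cycles.

```python
def resolve_symbol(symbols, name):
    """Follow alias entries until a concrete definition is reached.

    `symbols` maps a name to ("alias", target_name) or ("def", definition).
    Returns (resolved_name, definition); the linker would clone this
    definition rather than the alias entry.
    """
    seen = set()
    while True:
        if name in seen:
            raise ValueError(f"cyclic alias involving {name!r}")
        seen.add(name)
        kind, payload = symbols[name]
        if kind == "def":
            return name, payload
        name = payload  # follow the alias to its target


symbols = {
    "Foo": ("alias", "FooImpl"),  # extern struct Foo : IFoo = FooImpl;
    "FooImpl": ("def", "struct FooImpl : IFoo { ... }"),
}
print(resolve_symbol(symbols, "Foo")[0])  # FooImpl
```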
2025-10-06	Add check for NVRTC backend in unit test cudaCodeGenBug (#8611)	Sami Kiminki (NVIDIA)
	Test `slang-unit-test-tool/cudaCodeGenBug.internal` requires that the CUDA toolkit is available. Add a check for the NVRTC backend to avoid a failure when this is not the case. Fixes #6636