slang.git - Making it easier to work with shaders

	Commit message (Collapse)	Author	Age
*	Addition of `Load`/`Store` coherent operations (#8395)	16-Bit-Dog	2025-10-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes: https://github.com/shader-slang/slang/issues/7634 Duplicate of PR https://github.com/shader-slang/slang/pull/8052 Primary Changes: * Added `storeCoherent` and `loadCoherent` for coherent load/store via pointers. This is backed by `IRMemoryScopeAttr` which is an `IRAttr` attached to `IRLoad` and `IRStore` * Logic in `source\slang\slang-emit-spirv.cpp` for load/store emitting has been reworked to be less messy and more maintainable * Add to `hlsl.meta.slang` coop vector and coop matrix coherent load/store operations Secondary Changes: * Added a missing load/store test for coop matrix: `tests\cooperative-matrix\load-store-pointer.slang` --------- Co-authored-by: ArielG-NV <aglasroth@nvidia.com> Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com> Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com> Co-authored-by: Nathan V. Morrical <natemorrical@gmail.com>
*	implement dot products for 1 vectors (#8599)	Ellie Hermaszewska	2025-10-10
\| \| \|	Closes https://github.com/shader-slang/slang/issues/8378
*	8503 wgsl depth texture (#8645)	Sami Kiminki (NVIDIA)	2025-10-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add built-in type aliases for DepthTexture* and unify SamplerShadow Add the following type aliases: - DepthTexture1D, DepthTexture1DArray - DepthTexture2D, DepthTexture2DArray - DepthTexture2DMS, DepthTexture2DMSArray - DepthTexture3D - DepthTextureCube, DepthTextureCubeArray These match with the type aliases for non-depth textures. Also, unify the SamplerShadow type aliases with DepthTexture* ones. This adds the following: - Sampler2DMSShadow - Sampler2DMSArrayShadow and removes the Sampler3DArrayShadow type alias. As a side-effect, the descriptions of Sampler*ArrayShadow type aliases are fixed ("texture-sampler for shadow" ==> "texture-sampler array for shadow"). Update the slang tests to use the newly introduced type aliases instead of the custom type aliases that use _Texture<> directly. Add DepthTexture testing in hlsl-intrinsic/texture/texture-intrinsics. Do this by extracting the test logic of computeMain() in a separate function and parametrize it for non-depth/depth texture types. This adds basic coverage for the following types: - DepthTexture1D - DepthTexture2D - DepthTexture3D - DepthTextureCube - DepthTexture1DArray - DepthTexture2DArray - DepthTextureCubeArray Issue #6166 Issue #8503
*	Improve texture loads and stores on CUDA (#8644)	Simon Kallweit	2025-10-08
\| \| \| \| \| \| \| \| \| \| \| \|	- fix handling layer and mip level - add support for 1D layered textures - reduce code by using macros - assert when trying to emit unsupported intrinsics There is a new set of unit tests in slang-rhi for exhaustive testing of shader loads/stores on textures. These fixes allow to enable most of these tests. Formatted loads/stores on surfaces are not supported in PTX ISA, so this would require codegen for the conversion which in theory should be possible but not as part of the CUDA prelude.
*	Prefer IntegerType over LogicalType integer matrix mul() overloads (#8426)	pdeayton-nv	2025-10-06
\| \| \| \| \| \| \| \| \| \|	Integer mul(matrix, matrix) and mul(vector, matrix) are not disambiguated between __BuiltinIntegerType and __BuiltinLogicalType, emitting an ambiguous call compilation error. Use the OverloadRank attribute to prefer the IntegerType overload over the LogicalType overload. Fixes #8424
*	Fix incorrect binding index assignment for StructuredBuffer and ↵	Copilot	2025-10-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ByteAddressBuffer with DescriptorHandle (#8252) - [x] Fix segmentation fault in wrapConstantBufferElement for DescriptorHandle types - [x] Split DescriptorKind.Buffer into ConstantBuffer and StorageBuffer - [x] Update binding enums with descriptive names (ConstantBuffer_Read, StorageBuffer_Read, etc.) - [x] Update resource type mappings for correct binding assignments - [x] Update template logic to handle ConstantBuffer and StorageBuffer kinds separately - [x] Update tests to reflect correct binding assignments - [x] Split DescriptorKind.TexelBuffer into UniformTexelBuffer and StorageTexelBuffer - [x] Update TextureBuffer<T> to use UniformTexelBuffer kind - [x] Update _Texture extension to determine texel buffer kind based on access mode - [x] Update test desc-handle-1.slang to handle new DescriptorKind enum cases <!-- START COPILOT CODING AGENT TIPS --> --- ✨ Let Copilot coding agent [set things up for you](https://github.com/shader-slang/slang/issues/new?title=✨+Set+up+Copilot+instructions&body=Configure%20instructions%20for%20this%20repository%20as%20documented%20in%20%5BBest%20practices%20for%20Copilot%20coding%20agent%20in%20your%20repository%5D%28https://gh.io/copilot-coding-agent-tips%29%2E%0A%0A%3COnboard%20this%20repo%3E&assignees=copilot) — coding agent works faster and does higher quality work when set up for your repo. --------- Co-authored-by: Yong He <yonghe@outlook.com>
*	Add SPV_NV_bindless_texture support (#8534)	Lujin Wang	2025-09-26
\| \| \| \| \| \| \| \| \| \| \| \| \|	Treat DescriptorHandle as uint64_t instead of uint2. Implement target-specific SPIR-V emission with the bindless texture support. For OpImageTexelPointer, Image must have a type of OpTypePointer with Type OpTypeImage. Fix the issue by using [constref] in __subscript. Add a test coverage for various texture/sampler handle types. --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
*	Prepare VulkanSDK release Oct 2025 (#8525)	Jay Kwak	2025-09-25
\| \| \| \|	Related to - https://github.com/shader-slang/slang/issues/8519
*	Split overloaded uses of RefType in front-end (#8427)	Theresa Foley	2025-09-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Overview ======== This change is the start of an attempt to address how the Slang compiler codebase has ended up conflating two similar, but semantically distinct, concepts: * The long-standing notion of `ref` parameters (only allowed for use in the builtin modules), which are encoded using a wrapper `Type` in the AST as part of the representation of the parameters of a `FuncType`. * A recently-introduced notion of explicit reference types that mirror the built-in `Ptr` type, with a relationship comparable to that between pointer and reference types in C++. The change splits the `Ref<T>` type in the core module into two distinct types, with one for each of the two use cases. Similarly, the `RefType` class in the compiler's AST is split into two distinct classes, to represent the two cases. Background ========== The `Ref<T>` type in the core module (hidden and not intended for users to ever see or use) was originally introduced to encode the `ref` parameter-passing mode, comparable to the hidden `Out<T>` and `InOut<T>` types used to encode `out` and `inout` parameter-passing modes. The `Ref<T>` type in the core module was encoded as a instance of the `RefType` class in the Slang AST (similar to how `Out<T>` mapped to an `OutType`). These AST classes were only intended to be used by the compiler front-end as part of its encoding of function types. The `FuncType` class needed a way to distinguish an `inout int` parameter from a plain (implicitly `in`) `int` parameter, so these wrapper like `RefType` and `OutType` were introduced to encode both the parameter type (`T`) and the parameter-passing mode in a form that could be passed around as a `Type`. Notably, the `Ref<T>` type (and `Out<T>`, etc.) were not intended to be type names that ever get uttered in Slang code (not even in the builtin modules), and the vast majority of the compiler code was not supposed to ever encounter them. They were an implementation detail of `FuncType`, and nothing else. (In hindsight it may have been a mistake to use a nominal type declared in the core module to implement these wrappers; it might have been a good idea to use an entirely separate class of `Type` for this case...) Recent changes to the builtin modules introduced functions that wanted to return a reference (so that the parameter-passing-mode modifiers like `ref` could not trivially be used), and as part of those changes the appealingly-named `Ref<T>` type in the core module was re-used for this new case. Builtin operations were declared with an explicit `Ref<T>` return type, and parts of the compiler front-end that had previously been blissfully unaware of the AST's `RefType` (and `InOutType`, etc.) had to start accounting for the possibility that an explicit `Ref<T>` would show up. Related changes also introduced a comparable conflation of the (unfortunately-named) `constref` parameter-passing modifier and builtin operations that wanted to return an explicit reference that is read-only. Both use cases were mapped to the core-module `ConstRef<T>` type, which appeared in the AST as an instance of the `ConstRefType` class. The overlapping use of `ConstRef<T>`` is actually significantly more troublesome than the `Ref<T>` case because, despite what its name implies, `constref` was not really supposed to be the read-only analogue of `ref`, but rather it is closer to the "immutable value borrow" analogue to `inout`'s "mutable value borrow." The semantics of a "value borrow" vs. a "memory reference" in Slang have not been very carefully codified, and the conflation around `ConstRef<T>` has contributed to things becoming increasingly muddy in the compiler back-end. Main Changes ============ Core Module ----------- The `Ref<T>` type has been replaced with two distinct types, with one for each use case: * `RefParam<T>` is intended for use when encoding a `ref` parameter in a function type * `ExplicitRef<T>` is intended for use when an operation in a builtin module wants to return a reference The other types used to represent parameter-passing modes (e.g., `InOut<T>`) were renamed to better indicate that their role in defining parameter types (e.g., `InOutParam<T>`). The `ExplicitRef<T>` type was given additional generic parameters for the allowed access and the address space, akin to what `Ptr<T>` now supports. The pointer dereference operator (prefix ``) in the core module should now properly propagate the access and address space of the pointer over to the reference that gets returned. The two distinct use cases of `ConstRef<T>` were not split in the way as `Ref<T>`, instead the case for the `constref` parameter-passing mode uses `ConstParamRef<T>`, while cases that previously used `ConstRef<T>` to represent a read-only explicit reference instead now use `ExplicitRef<T, Access.Read>`. Prior to this change there were two subscripts declared on pointers: one in the `Ptr` type itself, and another in an `extension` for pointers with `Access.ReadWrite`. The comments on the code seemed to indicate that the catch-all subscript used to only have a `get` accessor, while the `ref` was only available on read-write pointers, but it seems that subsequent changes converted the default subscript to support `ref`. This change eliminates the subscript added via `extension`, since it is redundant. AST and Front-End ================= Similar to the changes in the core module, the AST `RefType` class was split into: `RefParamType` for the case of encoding `ref` parameters * `ExplicitRefType` for the case where the user meant an explicit reference type All the other classes that represent wrappers for encoding parameter-passing modes (e.g., `OutType`) were similarly renamed (e.g., `OutParamType`). The `ConstRefType` class was simply renamed to `ConstRefParamType`, because any use cases of `ConstRefType` that intended an explicit reference type will now use `ExplicitRefType` with `Acccess.Read`. For convenience, this change includes type aliases to map the old names for these types over to the new ones (e.g., `using OutType = OutParamType`) so that the change doesn't need to affect quite so many lines of code. The `RefType` and `ConstRefType` names are intentionally left undefined, since it woudl be unsafe to assume that existing use sites should default to either of the two possible interpretations. All use cases of `RefType` and `ConstRefType` (and their former shared base class `RefTypeBase`) were audited and updated to refer to either `RefParamType`/`ConstRefParamType` or `ExplicitRefType`, as appropriate (based on whether the context of the code indicated it was working with parameter-passing mode wrapper types, or explicit reference types). In many (many) cases comments were added to the code that was updated (and some unrelated code that needed to be audited along the way) to note cases where there appears to be something fishy going on in the compiler and/or there are obvious opportunities for next-step improvement. The `QualType` constructor used to infer l-value-ness when passed a `RefType` or `ConstRefType`; that code was introduced to support explicit reference types. The code was updated to consult the access argument of an `ExplicitRefType` to try and determine the right l-value-ness to use. There is some ambiguity about what should be done in the case where the value of the generic argument representing the access cannot be statically determined; a better solution may be needed. Many other cases in the front-end that were working with `RefType` and `ConstRefType` for explicit references also need to figure out l-value-ness, and these were changed to rely on the logic already added to `QualType` so that it wouldn't have to be duplicated. It isn't clear if this structure is the best way to tackle the problem, but it seems to at least be an upgrade over the more strictly ad-hoc logic that was in place before. Future Work =========== IR-Level Work ------------- The most obvious next step to take is that the split that was made in the compiler front-end needs to be properly plumbed through all of the back-end. There appears to be a lot of code in the back end of the compiler that has made the same conflation of `ref` parameters and explicit reference types that the front-end did. In practice, any uses of `ExplicitRef<T>` in the front-end should desugar into plain pointer-based code in the IR. Clean Up Parameter-Passing Modes -------------------------------- The code that handles different parameter-passing modes (`ParameterDirection`s) and their wrapper types is somewhat scattered and messy (as found while auditing use cases of `RefType`). A cleanup pass is warranted to ensure that most code only needs to think about `ParameterDirection`s. There should ideally be only a single operation in the front-end that handles determining the `ParameterDirection` of a parameter based on its modifiers. Similarly, there should be one operation to wrap a value type based on a parameter direction, and one operation to derive a `ParameterDirection` from the wrapper type. Ideally, the accessors for `FuncType` should not provide unrestricted access to the potentially-wrapped parameter types, and should instead return some kind of `ParamInfo` struct that encodes both a `ParameterDirection` and the unwrapped `Type` of the parameter. Clean Up `QualType` ------------------- A significant piece of future work that appears required is to drastically clean up and improve the way that `QualType`s are represente and handled in the front-end. There are currently various distinct `bool` flags in `QualType` (some with very unclear meaning) and differnet parts of the codebase consult/modify only subsets of them; a clear enumeration of the "value categories" (to use the C++ terminology) that Slang supports could be quite helpful. Naively, a `QualType` should at least encode the basic information that a `Ptr` type encodes: * A value type * Allowed access (read-only, read-write, etc.) * Address space The main additional thing that a `QualType` needs is a way to distinguish cases where an expression evaluates to: * A reference to a memory location, where all the information from a `Ptr` is relevant * A simple value, such that the access and address space are irrelevant * A reference to an abstract storage location (a `property`, `subscript`, or an implicit conversion that needs to support being an l-value), in which case address space is irrelevant and the "allowed access" basically amounts to a listing of the accessors the storage location supports Eliminate Explicit Reference Types ---------------------------------- Finally, twe should eventually eliminate the `ExplicitRef<T>` type from the core module (and all of the supporting code from the front-end), since the feature is not a good fit for the Slang language. We should find some other way to decorate operations in the builtin module that need to returns a reference rather than a value (note how `ref` accessors already avoided exposing explicit reference types, by design). --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
*	Fix LSS intrinsics for hit objects in ray tracing tests (#8469)	Harsh Aggarwal (NVIDIA)	2025-09-17
\| \| \| \| \| \| \| \| \| \| \| \|	Enable GetLssPositionsAndRadii() call in rayGenLssIntrinsicsHitObject shader that was previously commented out. This fixes the failing ray-tracing-lss-intrinsics-hit-object test which was returning all zero values for LSS position and radius data. The hit object LSS intrinsics are now working correctly in D3D12 backend, returning proper endcap positions and radii values as expected by the test. All 27 test assertions now pass successfully. Fixes #8128
*	Diagnose error when the function args can't satisfy constexpr parameter ↵	Gangzheng Tong	2025-09-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	requirements (#7269) ## Summary This PR enhances constexpr validation by adding proper error checking when function arguments cannot satisfy constexpr parameter requirements, addressing issue #6370. ## Problem Previously, when a function declared constexpr parameters, the compiler would attempt to propagate constexpr-ness to the call site arguments, but there was insufficient validation and error reporting when this propagation failed. This could lead silent failures where constexpr requirements weren't properly enforced ## Solution This PR adds checks that: 1. Validates constexpr arguments: When a function parameter is marked as `constexpr`, the compiler now explicitly checks that the corresponding argument can be marked as `constexpr` 2. Issues clear compilation errors: added `Diagnostics::argIsNotConstexpr`) 3. Handles both call scenarios: The validation works for both: - Direct function calls with IR-level function definitions - Calls to function from external modules Fixes #6370 --------- Co-authored-by: slangbot <ellieh+slangbot@nvidia.com> Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
*	Fix#8128 LSS and sphere hit object intrinsics fail to compile (#8339)	Harsh Aggarwal (NVIDIA)	2025-09-04
\| \| \|	Update intrinsics signature as per the nvapi header
*	Enable CUDA support for additional HLSL intrinsic tests (#8293)	Harsh Aggarwal (NVIDIA)	2025-09-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Enable CUDA support for additional HLSL intrinsic tests by implementing missing functionality and fixing compiler bugs affecting CUDA targets. - Fix critical bug in InterlockedCompareStore64 where division used /4 instead of /8 for 64-bit types, causing incorrect memory addressing for all signed int 64_t atomics - Add signed int64_t atomic wrappers (atomicExch, atomicCAS) to CUDA prelu de that properly cast to/from unsigned types as required by CUDA's atomic API - Enable tests: atomic-intrinsics-64bit.slang - Implement CUDA support for QuadAny and QuadAll operations using warp shu ffle primitives (__shfl_sync with quad-level lane masking) - Add CUDA to quad_control capability definition in slang-capabilities.capdef - Add _slang_quadAny/_slang_quadAll helper functions to CUDA prelude - Enable tests: quad-control-comp-functionality.slang, subgroup-quad.slang --------- Co-authored-by: szihs <675653+szihs@users.noreply.github.com>
*	[CBP] Pointer frontend changes + groupshared pointer support (#7848)	ArielG-NV	2025-08-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Resolves #7628 Resolves: #8197 Primary Goals: 1. Add `Access` to pointer 2. AddressSpace::GroupShared support for pointers (SPIR-V) 3. Add `__getAddress()` to replace `&` * `&` is not updated to `require(cpu)` since slangpy uses `&`. This means we must: (1) merge PR; (2) replace `&` with `__getAddress()`; (3) add `require(cpu)` to `&` Changes: * Added to `Ptr` the `Access` generic argument & logic (for `Access::Read`). * Moved the generic argument `AddressSpace` from `Ptr` to the end of the type. * Added pointer casting support between any `Ptr` as long as the `AddressSpace` is the same * Disallow globallycoherent T* and coherent T* * Disallow const T, T const, and const T* * Fixed .natvis display of `ConstantValue` `ValOperandNode` * Support generic resolution of type-casted integers * Added `VariablePointer` emitting for spirv + other minor logic needed for groupshared pointers Breaking Changes: * Anyone using the `AddressSpace` of `Ptr` will now have to account for the `Access` argument * we disallow various syntax paired with `Ptr` and `T*` --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
*	[Documentation] optix test coverage #463 (#8311)	Harsh Aggarwal (NVIDIA)	2025-08-28
\| \| \| \|	Update docs/shader-execution-reordering.md with additional intrinsics Add correct capability `LoadLocalRootTableConstant`
*	Introduce CDataLayout & -fvk-use-c-layout (#8136)	Julius Ikkala	2025-08-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Closes #8112. ~~The issue asks for a "C layout", but in this PR I use the term "CPU layout" because this naming was pre-existing in the codebase as `kCPULayoutRulesImpl_`. The primary purpose of this layout is to match CPU-side struct definitions with the shader side. I'm open to better naming suggestions, though.~~ Edit: switched back to using `CDataLayout` & `-fvk-use-c-layout`, as the CPU target depends on the object layout rules of existing CPU layout rules, but they're incompatible with actual shaders. So a new `kCLayoutRulesImpl_` was needed anyway. --------- Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com>
*	Fix nextafter() (#8195)	Julius Ikkala	2025-08-20
\| \| \| \| \| \| \| \| \| \|	Fixes #8185. The previous implementation is incorrect and basically only works in the `x = 0` case. `delta` was the smallest possible positive value representable as a float, but that's below the rounding error of addition with almost all reasonably sized floats. This fixed implementation is based on bit twiddling instead. I've checked the float case against the C++ `nextafterf` with both a -inf -> inf and inf -> -inf sweep, in addition to the test included in this PR.
*	Add Metal support for WaveGetActiveMask and WaveActiveCountBits (#8218)	Tianyu Li	2025-08-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## Summary - Add Metal platform support for `WaveGetActiveMask()` and `WaveActiveCountBits()` wave intrinsics - Update capability requirements to include Metal platform for subgroup ballot operations - Implement Metal-specific intrinsic assembly using `simd_ballot()` and `simd_vote` APIs ## Changes - source/slang/hlsl.meta.slang: - Add Metal target case for `WaveGetActiveMask()` using `simd_ballot(true)` - Update capability requirements from `cuda_glsl_hlsl_spirv` to `cuda_glsl_hlsl_metal_spirv` for wave ballot functions - source/slang/slang-capabilities.capdef: - Add `metal` to `subgroup_ballot_activemask` capability alias
*	Updated support to enable batch3 (#8219)	Harsh Aggarwal (NVIDIA)	2025-08-20
\| \| \| \| \| \| \| \| \|	Enable CUDA support for batch 3 tests - Enhanced wave operations with exclusive support - Added proper identity values for min/max operations - Fixed intrinsic name mapping issues - Updated test configurations Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com>
*	Use 64bit int instead of emulation on metal (#8180)	James Helferty (NVIDIA)	2025-08-15
\| \| \| \| \| \| \| \| \| \| \| \|	Metal's popcount prototype is `T popcount(T x)` but we want to use it to implement `countbits` where the prototype always returns `uint`. Using `popcount` directly would implicitly cast successfully to the 32-bit return value in all cases except when the argument is a 64-bit type. Thus, this change always explicitly casts the result to `$TR`, which should be one of the `uint[N]` types, and should always be able to hold the number of bits in the type. Addresses #6877
*	Prohibit use of buffer.GetDimensions on metal (#8156)	James Helferty (NVIDIA)	2025-08-15
\| \| \|	Fixes #7011
*	[Capability System] Fix bug where capabilities do not correctly propegate if ↵	ArielG-NV	2025-08-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	AST-parent has target+set the AST-child does not (#8175) Fixes: #8174 Changes: * To determine if we propagate capabilities, we need to ensure that a `join` will do nothing (optimization since `join` is expensive + caching data for the `join` adds up to be expensive). This logic was changed in `slang-check-decl.cpp` since the current logic was incorrect. * A parent could have the set `metal+glsl` and the use-site could have `glsl`. In this case, we will not remove `metal` from the parent since `{metal+glsl}.implies({glsl})` is true. --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
*	Fix atomics error diagnostics (#8117)	venkataram-nv	2025-08-09
\| \| \| \| \| \| \|	Fixes #8116 --------- Co-authored-by: Jay Kwak <82421531+jkwak-work@users.noreply.github.com>
*	Error if super-type capabilities are a super-set of sub-type (#7452)	ArielG-NV	2025-08-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes: #7410 Changes: 1. super-type capabilities must be a super-set of sub-type capabilities (and support the same shader stages/targets) * InheritanceDecl visits super-type to inherit it's capabilities; validate InheritanceDecl capabilities against sub-type * visit all container decl's with a default case * clean up functionDeclBase visitor * Simplify `diagnoseUndeclaredCapability` by moving logic into capability checking (more correct) 3. added changed behavior to documentation 4. fixed some incorrect capabilities 5. we do not* diagnose capability errors on interface requirement-to-implementation if both lack explicit capability requirements. This change is to work around a slangpy regression (test case for the failing situation is in `tests\language-feature\capability\capability-interface-extension-1.slang`), Note: maybe for slang-2026 we don't do this? 6. requirement & implementation must support the same shader stage/target. This was changed because otherwise we can have cases where `X` inherits from `Y`, but `Y` is only expected to be used in `glsl` whilst `X` is expected to be used in `hlsl \| glsl` 7. removed `tests/language-feature/capability/capabilitySimplification3.slang` because it tests nothing special (redundant) Note: not using rebase due to separate branches depending on this PR --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
*	Fix intrinsic LoadLocalRootTableConstant for optix (#7949)	Harsh Aggarwal (NVIDIA)	2025-08-07
\| \| \| \| \| \| \| \| \| \| \| \|	Due to an older version of spec referred there was an inconsitency v1.29 2/20/2025 - [HitObject LoadLocalRootArgumentsConstant] Latest spec https://microsoft.github.io/DirectX-Specs/d3d/Raytracing.html#hitobject-loadlocalroottableconstant Refer: OptiX backend support for Shader Execution Reordering (SER) features as outlined in issue #6647. -
*	Drain sink when single-argument constructor call fail (#7883)	ArielG-NV	2025-08-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* fix bug * fix test * push test changs for clarity * fix bug * fix test * push test changs for clarity * test what fails * remove redundant code
*	Lowering unsupported matrix types for GLSL/WGSL/Metal targets (#7936)	venkataram-nv	2025-07-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add emit cases for WGSL and GLSL * Fix compilation warnings Modify short cutting test to reflect change in emit logic Lower matrix for metal as well Add emit matrix logic for metal Fix compiler warning Brace initializer for lowered matrices Fix compiler warnings * Tests for metal * Fix mult, any, and determinant * Fix matrix-matrix multiplication * Fix mat mul to be element-wise * Fix compiler warning * Move makeMatrix to legalization * Move unary and binary arithmetic operator lowering to legalization * Remove emit logic and move final comparison operators to legalization * Handle vector/matrix negation for WGSL * Restore older SPIR-V emit logic * Address PR comments * Revert to zero minus for negation * format code --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
*	[Language Server]: Show signature help on generic parameters. (#7913)	Yong He	2025-07-29
\| \| \| \| \| \| \| \| \| \| \| \| \|	* Show signature help on generic parameters. * Fix. * Update tests. * slang-test: make vvl error go through stderr. * update slang-rhi * Update slang-rhi
*	Improve diagnostics over ambiguous references. (#7930)	Yong He	2025-07-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Improve diagnostics over ambiguous references. * Fix. * Remove files. * Fix some optix hitobject intrinsics. * Fix some hitobject intrinsics for optix. * Fix. * update rhi * revert slang-rhi * Update slang-rhi
*	Fix CUDA backend missing U32_firstbitlow implementation (#7921)	Copilot	2025-07-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Initial plan * Add U32_firstbitlow implementation for CUDA and CPP backends Co-authored-by: bmillsNV <163073245+bmillsNV@users.noreply.github.com> * Add I32_firstbitlow and comprehensive testing for signed/unsigned firstbitlow Co-authored-by: bmillsNV <163073245+bmillsNV@users.noreply.github.com> * Convert firstbitlow test to use inline filecheck syntax Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com> * Add U32_firstbithigh and I32_firstbithigh implementations for CUDA and CPP backends Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com> * Update prelude/slang-cpp-scalar-intrinsics.h * Update prelude/slang-cpp-scalar-intrinsics.h * Update prelude/slang-cpp-scalar-intrinsics.h * Refactor Metal bit intrinsics to handle zero case correctly Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com> * Update slang-cuda-prelude.h remove fake links * Update hlsl.meta.slang * if -1, return -1 due to implicit hlsl rule * -1 or 0 is ~0u as per hlsl implictly * 0 or -1 as per hlsl * fix the math to map to hlsl * fix compile error * forgot `31 - clz` * format code (#7943) Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com> * Update source/slang/hlsl.meta.slang * Update source/slang/hlsl.meta.slang * Update source/slang/hlsl.meta.slang * Update source/slang/hlsl.meta.slang --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: bmillsNV <163073245+bmillsNV@users.noreply.github.com> Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com> Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com> Co-authored-by: ArielG-NV <aglasroth@nvidia.com> Co-authored-by: slangbot <ellieh+slangbot@nvidia.com> Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
*	Lower int/uint/bool matrices to arrays for SPIRV (#7687)	venkataram-nv	2025-07-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add tests for expected behaviour * Allow matrix types in logical or/and * Legalize int/bool matrix types and construction with makeMatrix * Legalize uint matrices and operations * Limit testing to only SPIRV * Better tests for int and bool * Add test for uint * Remove GLSL tests * Remove old test for diagnosing int matrices * Emit SPIRV directly in tests * format code * Address PR comments * Improve testing * Address PR comments * format code * Add tests for matrix intrinsic operations * Move matrix lowering to dedicated legalization pass * Fix compiler warning * Remove signal again * Reorder matrix and vector legalization * Fix formatting * Add shift and comparison tests --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
*	Fix CUDA issues with texture reads and surface writes (#7780)	Mukund Keshava	2025-07-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Fix 1D texture reads in CUDA target Fixes #7570: 1D surface writes don't work The issue was that the Load function for read-only textures (hlsl.meta.slang lines 3629-3656) only supported 2D and 3D textures for CUDA targets, causing 1D texture reads to fall through to <invalid intrinsic>. This affected the srcTexture[tid.x] read operation in the reproduction case. Changes: - Updated static_assert to include SLANG_TEXTURE_1D support - Added tex1DArrayfetch_int<T> for 1D array texture reads - Added tex1Dfetch_int<T> for regular 1D texture reads 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Mukund Keshava <mkeshavaNV@users.noreply.github.com> * Add 1D texture read support for CUDA target - Add tex1Dfetch_int template specializations for float2, float4, uint, uint2, uint4 - Remove TODO comment about 1D PTX not being supported - Enable 1D texture test in texture-subscript-cuda.slang - Fix assembly code issues in original template specializations 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Mukund Keshava <mkeshavaNV@users.noreply.github.com> * Update slang-cuda-prelude.h * Fix texture3d ptx issue * undo 1D texture changes * Update hlsl.meta.slang * Update hlsl.meta.slang * Update hlsl.meta.slang * Update hlsl.meta.slang * Extend texture-subscript-cuda.slang test with uint and int format variants Add test cases for newly supported texture formats in CUDA: - 2D textures with uint, uint2, uint4 - 2D textures with int, int2, int4 - 3D textures with uint, uint2, uint4 - 3D textures with int, int2, int4 This ensures the texture subscript operations work correctly for all the format variants added in the CUDA texture fixes. Co-authored-by: Mukund Keshava <mkeshavaNV@users.noreply.github.com> * update expected file --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Mukund Keshava <mkeshavaNV@users.noreply.github.com>
*	Document why failing VVL coop-vec tests cannot be enabled & minor ↵	ArielG-NV	2025-07-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	cleanup/bug-fix (#7754) * Changes Fix a coop-vec bug, fix incorrect test synax, clean up test, comment why some coop-vec tests are stil disabled * Changes Fix a coop-vec bug, fix incorrect test synax, clean up test, comment why some coop-vec tests are stil disabled * disable failing tests * push changes * add to failing GLSL test list
*	Fix WGSL sign function to return correct int type instead of float (#7739)	Copilot	2025-07-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Initial plan * Fix WGSL sign function to return int type properly Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com> * Add vector version check for WGSL sign function test Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com> * Fix FileCheck order in WGSL sign test - use DAG for order-independent checking Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com>
*	Replace [KnownBuiltin] string-based comparisons with enum-based system (#7714)	Copilot	2025-07-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Initial plan * Implement enum-based KnownBuiltin system to replace string comparisons Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com> * Add test for enum-based KnownBuiltin system and verify functionality Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com> * Implement enum-based KnownBuiltin system with direct integer values Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com> * Fix IntVal access and update tests for new enum-based KnownBuiltin system Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com> * Replace hardcoded KnownBuiltin integers with preprocessor enum syntax - Updated all KnownBuiltin attributes to use $( (int)KnownBuiltinDeclName::EnumValue) syntax - Added space between parentheses to avoid preprocessor bug: $( (int) instead of $((int) - Updated both core.meta.slang and hlsl.meta.slang files - Eliminates preprocessor-time integer conversion, baking enum values directly into meta files - Maintains same functionality while using type-safe enum references Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com> * Fix IDifferentiablePtr KnownBuiltin mapping regression Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com> * Remove unused IDifferentiablePtrType enum case from KnownBuiltinDeclName Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com> * Clean up temporary AST dump files from testing Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com> * Replace hardcoded integer with descriptive constant in KnownBuiltin test Replace the hardcoded [KnownBuiltin(0)] with a descriptive named constant GEOMETRY_STREAM_APPEND_BUILTIN to improve code readability and maintainability. The test now clearly indicates which builtin enum value is being tested. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-authored-by: Gangzheng Tong <gtong-nv@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Gangzheng Tong <gtong-nv@users.noreply.github.com> Co-authored-by: Gangzheng Tong <tonggangzheng@gmail.com>
*	Add WorkgroupCount function (#7734)	davli-nv	2025-07-12
\| \| \| \| \| \| \| \| \|	Fixes #7733 Copy gl_NumWorkGroups into hlsl.meta.slang as WorkgroupCount function so that it can be used for GLSL and SPIR-V targets without GLSL syntax. Also change WorkgroupSize function to allow use with mesh shading capability. Update pipeline/rasterization/mesh/task-simple.slang to test it in task and mesh stages.
*	Fix CommittedRayInstanceCustomIndex generating wrong SPIR-V opcode (#7659)	Julius Ikkala	2025-07-10
\|
*	Fix typo in image format table (#7661)	aidanfnv	2025-07-09
\|
*	Make copysign function differentiable (#7585)	Copilot	2025-07-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Initial plan * Implement copysign forward and backward derivatives Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com> * Fix copysign test format to use expected.txt file Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com> * Add wgsl support to copysign and fix y==0 derivative case Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com> * Add wgsl support to copysign helper functions Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com> * Fix copysign derivative to return 0 when either x or y is 0 Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com>
*	Add MLP training examples. (#7550)	Yong He	2025-06-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add MLP training examples. * Formatting fix. * Fix. * Improve documentation on coopvector. * Improve doc. * Update doc. * Fix typo. * Cleanup shader. * Cleanup. * Fix test. * Fix type check recursion. * Fix. * Fix. * Fix override check.
*	Support `mad` in WGSL (#7538)	Mehmet Oguz Derin	2025-06-26
\|
*	Add matrix operand for OpCooperativeVectorMatrixMulAddNV (#7524)	Gangzheng Tong	2025-06-26
\| \| \| \| \| \| \| \| \| \| \|	* Add matrix operand for OpCooperativeVectorMatrixMulAddNV * update tests to use the supported UINT32 input component type * Add MatrixCSignedComponentsKHR for coopVecMatMulAddPacked --------- Co-authored-by: Jay Kwak <82421531+jkwak-work@users.noreply.github.com>
*	Add default implementation for determinant (#7505)	Julius Ikkala	2025-06-22
\| \| \|	Co-authored-by: Nathan V. Morrical <natemorrical@gmail.com>
*	Fix coopvector neg intrinsic. (#7481)	Yong He	2025-06-18
\| \| \| \| \|	* Fix coopvector neg intrinsic. * Add test case.
*	Add new capdef for lss intrinsics (#7427)	Mukund Keshava	2025-06-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add new capdef for lss intrinsics Fixes #7426 Raygen shaders need to be supported for only hitobject APIs. So we need a special capability for that, instead of a common one. * regenerate command line reference --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
*	Fix an issue in extension override. (#7402)	Yong He	2025-06-11
\| \| \| \| \|	* Fix an issue in extension override. * Fix typo in comment.
*	Add optix support for coopvec (#7286)	Mukund Keshava	2025-06-10
\| \| \| \| \| \| \| \| \| \| \| \| \|	* WiP: Add coopvec support for Optix * format code * fix minor issues * Fix review comments --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
*	Implement isnan and isinf for WGSL with bitwise operations (#7344)	Jay Kwak	2025-06-05
\| \| \| \| \| \| \| \|	WGSL doesn't support isnan and isinf, because it assumes that it always uses fast-math and fast-math doesnt' handle NaN as defined in IEEE standard. The initial implementation used a clever workaround but it stopped working from some point. This PR implemented isnan and isinf with a bitwise operation, which can be expensive. But that seems to be an only option at the moment.
*	Use MatrixResultSignedComponents on OpCooperativeVectorMatrixMulNV (#7227)	Jay Kwak	2025-06-02
\|
*	Fix coopvec::fill to use a simpler expression (#7253)	Jay Kwak	2025-06-02
\|