slang.git - Making it easier to work with shaders

Age	Commit message	Author
2025-06-19	Always emit DepthReplacing execution mode when writing to FragDepth (#7450)	dubiousconst282

2025-06-19	Add support for on-demand AST deserialization (#7482)	Theresa Foley
	Note that this change does not actually enable on-demand deserialization of ASTs, because doing so is incompatible with the current compiler architecture where we have both an `ASTBuilder` and a `SharedASTBuilder`, and there are important invariants about how all AST nodes related to the core module must be created before those of any module using the core module. Instead, this change simply adds the infrastructure for on-demand deserialization, and ensures that those code paths get used at runtime, but actually "demands" all of the nodes in a given serialized AST immediately as part of the deserialization process. Important notes about the implementation approach: * PR #7242 ensured that all of the code accessing the direct member declarations of a `ContainerDecl` went through a small(-ish) set of accessor methods. This change takes advantage of that work by further abstracting the storage of the direct member declarations out in a type, `ContainerDeclDirectMemberDecls`, which makes it easy to add custom serialization logic for just that type. * The `ContainerDeclDirectMemberDecls` type also stores two pointers (one a `RefPtr` and the other a plain pointer) that are only used in the case where the members of a given `ContainerDecl` are being accessed through on-demand deserialization. This can be queried using the `isUsingOnDemandDeserialization()` method but any code accessing a `ContainerDecl` through the intended public API should never need to care about that detail. * Many of the accessor methods that were added in PR #7242 now branch on whether `isUsingOnDemandDeserialization()` is set. The normal code path is unchanged, and the implementation logic for the on-demand-deserialization case is largely held in `slang-serialize-ast.cpp`, to keep it close to the definitions of the serialized data structures themselves. 
* A few types in the `slang-ast-.h` headers have had `FIDDLE()` annotations added to them, so that they can be used to synthesize some of the serialization logic that was previously hand-written. The `_registerBuiltinDeclsRec()` function (which is used to scan the built-in module ASTs for the various "magic" declarations that the `SharedASTBuilder` needs to know about) was factored a bit to support the way that registration needs to behave differently in the case of loading a serialized module (if we kept using the existing recursive search, then it would force every declaration in the core module to be loaded right away). The new `_collectBuiltinDeclsThatNeedRegistrationRec()` function mirrors the overall traversal pattern to produce a flat list that gets included in the serialized AST module. Note in particular that we no longer call `registerBuiltinDecls()` from within `_readBuiltinModule()`. * The interface of the `Module` type was slightly expanded so that there is a more complete API for accessing the declarations exported from the module. Previously they could only be queried by their mangled name, but the new API also allows the entire list to be iterated over. The `ensureLookupAcceleratorBuilt()` method factors out the logic for building those data structures for a module. Note that in the case where on-demand deserialization is being used for a module, the `findExportedDeclByMangledName()` query will use serialized data directly, rather than build the lookup accelerators as C++ data structures (this is required if we are to avoid immediately deserializing all of the (exported) declarations in the core module as soon as it is loaded). 
* A few methods related to loading serialized modules (e.g., `loadSerializedModule()`) have been updated so that along with a pointer to the serialized `ModuleChunk` (which, for those who aren't aware, is a pointer directly into the serialized bytes of the module file), they receive an `ISlangBlob` that refers to the entire blob holding the serialized data (which the `ModuleChunk` is part of). Passing this pointer down allows code running under these methods to retain a reference-counted pointer to the blob to stop the memory of the serialized module from being released until deserialization has been completed. * The data types defined in `slang-fossil.h` have been overhauled significantly: * The most important change that is relevant to this work is the introduction of the `Fossilized<T>` template, which is used to statically map a "live" C++ type `T` to its binary fossilized representation. The `slang-fossil.h` file provides infrastructure allowing `Fossilized<T>` to be specialized for user-defined types, and also provides the necessary mappings for the core types like strings, arrays, and dictionaries. * A key point is that in C++ code, one can take a value of some type `Foo`, serialize it using a `Fossil::SerialWriter`, get a pointer to that serialized data, and then directly cast it to a `Fossilized<Foo>` and navigate the serialized data directly (without deserializing it back into a `Foo`). For that process to work, any specialization of `Fossilized<T>` must be sure to match the layout that will be produced by the `serialize()` implementation for `T`, when writing to a `Fossil::SerialWriter`. Another key change in the public interface of `slang-fossil.h` is that dynamically-typed traversal of the data used to be handled just with `FossilizedValRef`, but now uses a few different types. 
The `Fossil::ValRef<T>` and `Fossil::AnyValRef` types are used to capture the use cases that want reference-like behavior (basically a `Fossil::ValRef<T>` can be thought of as sort of like a `T&`), while `Fossil::ValPtr<T>` and `Fossil::AnyValPtr` are used for cases that want pointer-like behavior (akin to `T*`). Then there are related changes in `slang-serialize-fossil.`: The implementation of `Fossil::SerialReader` has been changed to use `Fossil::AnyValPtr` in most places where it formerly used `FossilizedValRef`. Using pointers (that can be null) instead of a weird kind of pseudo-reference (that could still be null) to traverse things was making the code harder to follow than it ought to be, in terms of understanding the levels of indirection in various places. * Some of the state that was previously in `Fossil::SerialReader` has been split into `Fossil::ReadContext`. This type allows multiple `Fossil::SerialReader`s to be created to read from the same serialized blob(s), while maintaining a persistent mapping from fossilized data pointers to live object pointers. The `ReadContext` also maintains the work list of deferred deserialization actions waiting to be performed, and only flushes that list when the last currently-open `SerialReader` is about to go out of scope. * In order to support the split of `Fossil::SerialReader` described above (and also to clean up something that didn't quite feel right in the original serialization design) the base serialization framework in `slang-serialize.h` has been tweaked so that a `Serializer` now wraps two pointers instead of just one. The first pointer continues to be an implementation of `ISerializerImpl`, which handles the actual reading/writing of data, while the other pointer is an explicit "context" pointer for operations that need additional user-defined context. 
* Similar to the changes made to the accessors for direct member declarations in a `ContainerDecl`, the `Module::findExportedDeclByMangledName()` method was updated to conditionally execute a different code path in the case of a module that has been loaded from serialized data. * Some improvements have been made to the fiddle tool: * Most importantly, the error-handling logic around Lua script execution has been cleaned up to better match correct Lua idiom. Native functions exposed to the Lua scripts have been changed to just use `lua_call` instead of `lua_pcall`, so rather than attempt to intercept Lua errors they will just automatically propagate them. * All Lua-related errors are caught at the top level, and reported in a way that uses the source location of the fiddle template that was being evaluated when the error was raised. In most cases, a Lua error should be accompanied by a stack trace of the Lua evaluation state. The file paths and line numbers given should be accurate, but aren't directly double-clickable in the Visual Studio output panel, because they use a different format (a good future change might be to process the Lua stack trace and rewrite it into a format that is better for our needs). * Fixed a subtle bug where having "raw" content (parts of the template that should neither be evaluated nor emitted into the output) that consisted of only whitespace could result in a template being translated to invalid Lua code. * The bulk of the change is, unsurprisingly, in `slang-serialize-ast.cpp`. * This file has been refactored enough to look like a complete rewrite. A lot of work has been put into comments that describe the overall approach being taken, so hopefully it can be understood even by somebody who wasn't familiar with the previous code. Some of these are just plain cleanups, rather than being directly related to on-demand serialization. 
* Where possible, the code for reading and writing types that needed custom serialization has been moved so that the read/write functions are next to one another, making it easier to visually confirm that the serialized representations match on the read and write sides. * Where possible, the serialization logic for all types (not just the AST nodes, as was the case before) is being generated via fiddle. * Rather than just defining `serialize()` overloads for each of the relevant types, the code now defines `Fossilized<...>` specializations for these types as well, to enable statically-typed in-memory traversal of the serialized data. Note, however, that for the most part the `Fossilized<...>` representation types are not being used by the code (really only the `ASTModuleInfo` and `ContainerDeclDirectMemberDeclsInfo` types are traversed directly). This can be considered more as work to prove out the design of the `Fossilized<...>` template approach, and it may or may not end up being relevant in the future. * The trivial bit of work to enable on-demand deserialization is in `ASTSerialReadContext::handleContainerDeclDirectMemberDecls()` where, rather than recursively reading the contained declarations, the method effectively just grabs the current cursor of the `Fossil::SerialReader` (which is pointed into the fossilized data) and stashes it into the `ContainerDeclDirectMemberDecls`, along with a `RefPtr` to the `ASTSerialReadContext` itself. Those stashed pointers are what enables the accessors on `ContainerDeclDirectMemberDecls` to look up information on-demand. * The more interesting bits of the approach mostly come at the end of the file, where the accessor operations for on-demand deserialization are implemented. Once all the relevant work has been done to write the data structures, and produce `Fossilized<...>` types with the right layout, the work itself may seem almost trivial: a little bit of array iteration, and a little bit of binary-search lookup. 
* As a reminder, all of this infrastructure for on-demand deserialization is now in place and able to be invoked by the rest of the compiler, but declarations are currently all being loaded eagerly. The `SLANG_DISABLE_ON_DEMAND_AST_DESERIALIZATION` macro is being used to enable a small bit of extra logic in `ASTSerialReadContext::_cleanUpASTNode` so that the "cleanup" on a just-deserialized `ContainerDecl` includes eagerly querying its list of direct member declarations, which will cause them to be recursively deserialized.
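The "array iteration plus binary-search lookup" idea described above can be illustrated with a minimal, self-contained sketch. This is not the code in `slang-serialize-ast.cpp`: the names `SerializedEntry`, `LiveDecl`, and `LazyMemberTable` are hypothetical, and a sorted `std::vector` stands in for the fossilized data. The only point shown is that a lookup by name searches the serialized table and materializes just the entry it finds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for one serialized entry: a name plus the payload
// that would be expanded into a live declaration on demand.
struct SerializedEntry
{
    std::string name;
    int payload;
};

// Hypothetical stand-in for a live, fully deserialized declaration.
struct LiveDecl
{
    std::string name;
    int value;
};

class LazyMemberTable
{
public:
    // `entries` must already be sorted by name, as a writer would emit them.
    explicit LazyMemberTable(std::vector<SerializedEntry> entries)
        : m_entries(std::move(entries))
    {
    }

    // Binary-search the serialized table and deserialize only the match.
    const LiveDecl* findByName(const std::string& name)
    {
        auto cached = m_live.find(name);
        if (cached != m_live.end())
            return &cached->second;

        auto it = std::lower_bound(
            m_entries.begin(),
            m_entries.end(),
            name,
            [](const SerializedEntry& e, const std::string& key) { return e.name < key; });
        if (it == m_entries.end() || it->name != name)
            return nullptr;

        ++m_deserializeCount;
        auto inserted = m_live.emplace(name, LiveDecl{it->name, it->payload});
        return &inserted.first->second;
    }

    // Number of entries that have been materialized so far.
    std::size_t deserializeCount() const { return m_deserializeCount; }

private:
    std::vector<SerializedEntry> m_entries;
    std::unordered_map<std::string, LiveDecl> m_live;
    std::size_t m_deserializeCount = 0;
};
```

Repeated lookups of the same name return the cached object, which loosely mirrors the persistent fossilized-to-live mapping that the commit message attributes to `Fossil::ReadContext`.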
2025-06-18	Fix coopvector neg intrinsic. (#7481)	Yong He
	* Fix coopvector neg intrinsic. * Add test case.
2025-06-18	Use 1-based argument index for DebugLocalVariable (#7438)	Lujin Wang
	* Use 1-based argument index for DebugLocalVariable GLSLANG/DXC NSDI uses 1-based index of the argument. Slang should follow this convention like other SPIR-V generators. * format code --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-06-18	Fix out of bound buffer access in the preprocessor. (#7475)	Yong He
	* Fix out of bound buffer access in the preprocessor. * Fix test regression. --------- Co-authored-by: Jay Kwak <82421531+jkwak-work@users.noreply.github.com>
2025-06-17	LanguageServer: Enhance auto completion for override. (#7465)	Yong He
	* Add additional completion keywords. * LanguageServer: Enhance auto completion for `override`.
2025-06-17	AD cleanup for 'break' replacement (#7456)	kaizhangNV
	close #4187. This PR makes some cleanup on the variable name. During CFG normalization, we introduce a variable to replace the break, but the variable indicates that we should continue running the loop if it's true, and break the loop if it's false. Previously we named this variable as 'breakVar', which is opposite to its purpose. So rename it to keepRunning.
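A small sketch of the naming issue, assuming nothing about the actual CFG normalization pass: the two hypothetical functions below compute the same result, one with `break` and one with a flag that is true while the loop should continue, which is why `keepRunning` describes the flag better than `breakVar`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original form: leave the loop with `break` at the first negative value.
int sumUntilNegativeWithBreak(const std::vector<int>& values)
{
    int sum = 0;
    for (int v : values)
    {
        if (v < 0)
            break;
        sum += v;
    }
    return sum;
}

// Normalized form: the flag is true while the loop should keep going,
// and is set to false at the point where the original code would break.
int sumUntilNegativeWithFlag(const std::vector<int>& values)
{
    int sum = 0;
    bool keepRunning = true;
    for (std::size_t i = 0; keepRunning && i < values.size(); ++i)
    {
        if (values[i] < 0)
            keepRunning = false;
        if (keepRunning)
            sum += values[i];
    }
    return sum;
}
```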
2025-06-16	Fix for missing signedness cast in SwizzleIR (#7448)	Jerran Schmidt
	* Cast if there is a signedness mismatch on the swizzle * Move isSignedType to slang-util and add test --------- Co-authored-by: Yong He <yonghe@outlook.com>
2025-06-16	Require `override` keyword for overriding default interface methods. (#7458)	Yong He
	* Require `override` keyword for overriding default interface methods. * Update doc. * Fix test.
2025-06-16	Disable periodic diagnostic update on language-server on CI (#7445)	Jay Kwak
	The "textDocument/publishDiagnostics" Notification in the official Language Server Protocol, or LSP for short, is a notification that the server sends to a client such as VSCode or Visual Studio without the client having to ask for it. Its purpose is to provide a list of errors, warnings, or other informational "squiggles" for a specific file. Because the notification is an asynchronous push notification, it is received as an unexpected RPC message during the slang-test CI tests. When a notification is unexpectedly sent to slang-test, the communication goes out of sync and the rest of the language-server based tests intermittently fail. To address the problem, this PR adds a new command-line argument that makes the notification more deterministic: it can be sent only in one of three cases: didOpen, didChange, and didClose. Because these events are the only ways to cause a new notification, we can still expect to get the same diagnostic messages without missing any of them. For the slang-test CI tests, this new option will be used to make the notification more deterministic.
2025-06-14	Bump the min shader model version for CUDA 12.8+ (#7415)	Jay Kwak

2025-06-13	Skip processing import declarations after errors (#7393)	James Helferty (NVIDIA)
	* Add test case for missing import attribution Add a test case that imports a non-existent file, followed by a valid file. Tests for absence of a bug where slang reports existent files as non-existent if they're imported after a non-existent file. * Skip processing imports after errors Skip processing additional imports after the first error. This behavior is already observed in Linkage::loadSourceModuleImpl, but since that happens after import processing has already started, a false diagnostic gets generated for a missing import. By hoisting this check out before the import is processed, the diagnostic message for a missing file is no longer erroneously generated. Fixes #6453 * Revert "Skip processing imports after errors" This reverts commit 6b2fef09782414de4c5e017c4ecb5f2affa0c199. * Remove early abort of import processing Partial revert of commit 04f1bad Reverts an early return in Linkage::loadSourceModuleImpl() whenever any error diagnostic message has already been generated. This was causing earlier errors to prevent subsequent imports from succeeding, and was misattributing them to a missing file. Fixes #6453 --------- Co-authored-by: Yong He <yonghe@outlook.com>
2025-06-13	Allow interface methods to have default implementations. (#7439)	Yong He

2025-06-13	Fix a bug in empty array legalization. (#7444)	Yong He

2025-06-13	Fix SLANG_USE_SYSTEM_SPIRV_HEADERS (#7371)	Emil Imbert Villumsen
	* Use aliased SPIRV-Headers::SPIRV-Headers to also work with an installed SPIRV-Headers SPIRV-Headers standalone is only defined when using sources directly. When consuming an installed SPIRV-Headers via find_package, the full SPIRV-Headers::SPIRV-Headers must be used. The full syntax is supported by both source and installed builds. * Fix SLANG_USE_SYSTEM_SPIRV_HEADERS - Use find_package to bring in SPIRV-Headers cmake targets - Set SPIRV-Headers_SOURCE_DIR as a workaround when including spirv-tools - Query cmake for SLANG_SPIRV_HEADERS_INCLUDE_DIR location, supporting default, SLANG_OVERRIDE_SPIRV_HEADERS_PATH and find_package builds. - Cleanup unnecessary SPIRV_HEADER_DIR (unconditionally overwritten in spirv-tools)
2025-06-13	Fix issue that struct with member is not its Differential type (#7434)	kaizhangNV
	Close #6176. If the struct has a `no_diff` member, it should not be its Differential type. We miss this check.
2025-06-13	Add new capdef for lss intrinsics (#7427)	Mukund Keshava
	* Add new capdef for lss intrinsics Fixes #7426 Raygen shaders need to be supported for only hitobject APIs. So we need a special capability for that, instead of a common one. * regenerate command line reference --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-06-12	Fix API changes from separate debugging support (#7397)	jarcherNV
	Recent separate debugging support added two new functions which broke backwards compatibility. This change restores the old API and moves the new functions to an IComponentType2 interface which can be used if separate debug files are needed.
2025-06-12	Fix issue of missing scope for 'Differential' type (#7433)	kaizhangNV
	* Fix issue of missing scope for 'Differential' type When we synthesize the struct decl for Differential type, we should add the ownedScope for this decl, because the scope is used in lots of locations in the following synthesized processes, e.g. constructor synthesize. And that could cause surprising behavior, e.g. the 'this' expression could access the members of parent struct decl. Fix the issue by adding the scope. The containerDecl will be the Differential struct decl itself, parent scope will be the parent struct. * Add a unit-test
2025-06-12	Diagnose on use of struct inheritance. (#7419)	Yong He
	* Diagnose on use of struct inheritance. * fix test. * Fix tests. * fix. --------- Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com>
2025-06-12	Fix intermittent debug failures with Debug build (#7369)	Jay Kwak
	This PR replaces enable/disable style C function calls with C++ RAII style code. In a debug build, when an assertion fails between the enable and disable calls, an exception is thrown and the disable function is never called. RAII style code stays correct when an exception is thrown.
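The general pattern can be sketched as follows. The names `enableFeature`, `disableFeature`, `FeatureScope`, and the counter they manipulate are hypothetical and are not the functions touched by this PR; the sketch only shows why a scope guard restores state when an exception unwinds the stack.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical global state standing in for whatever the paired
// enable/disable calls manage.
static int g_featureDepth = 0;

void enableFeature() { ++g_featureDepth; }
void disableFeature() { --g_featureDepth; }

// RAII guard: enables on construction, disables on destruction,
// including destruction during stack unwinding.
class FeatureScope
{
public:
    FeatureScope() { enableFeature(); }
    ~FeatureScope() { disableFeature(); }
    FeatureScope(const FeatureScope&) = delete;
    FeatureScope& operator=(const FeatureScope&) = delete;
};

// Throws while the guard is alive, then reports whether the state
// was restored by the time the exception was caught.
bool stateRestoredAfterThrow()
{
    try
    {
        FeatureScope scope;
        throw std::runtime_error("simulated assertion failure");
    }
    catch (const std::runtime_error&)
    {
        return g_featureDepth == 0;
    }
}
```

With manual `enableFeature()` / `disableFeature()` calls in place of the guard, the throw would skip the disable call and leave the counter at 1.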
2025-06-11	Add DebugLine before IfElse (#7368)	Lujin Wang
	A missing DebugLine in some basic blocks that include OpBranchConditional causes an invalid line number '0' to appear in the line table of the '.debug_line' section. Emitting DebugLine before IfElse fixes the issue. Modified maybeEmitDebugLine() to handle the case without Stmt.
2025-06-11	Fix an issue in extension override. (#7402)	Yong He
	* Fix an issue in extension override. * Fix typo in comment.
2025-06-10	Fix reflection to json issue (#7379)	kaizhangNV
	Apply argument buffer tier2 rule when using parameter block for Metal target. Close #6803.
2025-06-10	Allow checking capabilities in specific stages (#7375)	jarcherNV
	This allows checking capabilities in any stage, needed specifically for the hlsl_2018 capability which is defined for sm_5_1 and above. Stage specific capabilities such as cs_5_1 would not find this in any stage other than compute, so we need to restrict the check to only desired stages.
2025-06-10	Fix IR layout of 3-element vectors in cbuffers for -fvk-use-dx-layout (#7282)	James Helferty (NVIDIA)
	* Better handling for 16-byte boundary of d3d cbuffer Fixes #6921 D3D cbuffers have slightly different packing rules that allow packing vectors into a 16-byte slot at element alignments, except when a field would cross a 16-byte boundary. In that case, we need to realign the field to the next 16-byte boundary. In particular, this impacts vec3s, which are not a power of two in size and thus require slightly different alignment logic, compared to std430 and std140. (Example: a float and float3 should fit together in that order in a single slot.) Adds test cases. Adds documentation page for GLSL target
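The packing rule described above can be sketched as a small offset calculation. This is an illustration only, not Slang's layout code: `alignUp` and `placeVector` are hypothetical helpers, and the sketch assumes vectors of at most 16 bytes.

```cpp
#include <cassert>
#include <cstddef>

// Round `offset` up to the next multiple of `alignment`.
std::size_t alignUp(std::size_t offset, std::size_t alignment)
{
    return (offset + alignment - 1) / alignment * alignment;
}

// Returns the offset at which a vector of `elementCount` elements of
// `elementSize` bytes is placed, given that `cursor` bytes are already used.
// The field is aligned to its element size, and is pushed to the next
// 16-byte slot only if it would otherwise straddle a slot boundary.
std::size_t placeVector(std::size_t cursor, std::size_t elementSize, std::size_t elementCount)
{
    const std::size_t kSlotSize = 16;
    const std::size_t size = elementSize * elementCount;
    assert(size > 0 && size <= kSlotSize); // assumption of this sketch

    std::size_t offset = alignUp(cursor, elementSize);
    const std::size_t firstSlot = offset / kSlotSize;
    const std::size_t lastSlot = (offset + size - 1) / kSlotSize;
    if (firstSlot != lastSlot)
        offset = alignUp(offset, kSlotSize);
    return offset;
}
```

Under these assumptions a `float` followed by a `float3` shares one slot (offsets 0 and 4), matching the example in the commit message, while a `float2` followed by a `float3` moves the `float3` to offset 16.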
2025-06-10	remove unnecessary use of std::bit_cast (#7384)	Ellie Hermaszewska

2025-06-10	Legalise out parameters for vertex shaders on metal (#6943)	Ellie Hermaszewska
	* Handle pointer types when getting type cast style Closes https://github.com/shader-slang/slang/issues/6025 * Move vertex shader out parameters to return type for Metal Closes https://github.com/shader-slang/slang/issues/6025 * More asserts * Make struct instead of tuple * More layout preservation * Handle same function result * more layout * remove layout * a * more debug code * more debug code * a * layout working * refactored * more tests * more tests * fuse loops * remove unused comments * Correct filecheck usage * debug code * correct name and order of filecheck vars * simplify * Address review comments fix warning * simplify handling of simple vertex shaders
2025-06-10	Add optix support for coopvec (#7286)	Mukund Keshava
	* WiP: Add coopvec support for Optix * format code * fix minor issues * Fix review comments --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-06-09	Mediate access to ContainerDecl members (#7242)	Theresa Foley
	Most of what this change does is straightforward: take all the places in the code that used to operate directly on `ContainerDecl::members` and related fields, and instead have them call into a smaller set of accessor methods defined on `ContainerDecl`. The primary motivation for making this change is that in order to implement on-demand loading of members from serialized AST modules, we need a way to identify and intercept the "demand" for those members. On-demand loading benefits from having all accesses to the members of a `ContainerDecl` be as narrow as possible. If a part of the code only needs a member at a specific index, it should say so. If it only needs access to members with a specific name, or a given subclass of `Decl`, then it should say so. A secondary motivation for this change is that there have recently been several changes that added complexity and special cases by introducing code that operated on (and mutated) the member list of a container decl in ways that the existing code had never done before. Any code that mutates the member list of a `ContainerDecl` needs to be sure to not disrupt the invariants that the lookup acceleration structures currently rely on. One of the recent changes added a declaration-to-index map to the set of acceleration structures (with different validation/invalidation behavior than the others...) while other recent changes would remove or insert declarations in ways that could change the indices of other declarations in the same container. It is not clear if any of these pieces of code were aware of the others, and the invariants that might be expected or broken along the way. This change bottlenecks the vast majority of accesses to the members of a `ContainerDecl` through the following operations: * Getting a `List` of all of the direct member declarations of a container * Getting the number of direct member declarations, and accessing them by index. * Looking up the list of direct member declarations with a given name. 
* Adding a new direct member declaration to the end of the list. Some other operations are layered on top of those (e.g., getting a list of all the direct member declarations of a given C++ class). These layered operations are still centralized on the `ContainerDecl`, with the intention that we can change them to be non-layered implementations if we ever need to for performance (e.g., by building a lookup structure for finding member declarations by their type). The exceptional cases of access/mutation on the direct members of a `ContainerDecl` have also been encapsulated, but rather than expose what would risk appearing like general-purpose accessors (e.g., `removeDecl(d)`, `setDecl(index)`, etc.), these operations have been explicitly named after the specific use case that they serve in the codebase today, to discourage others from using them for more kinds of operations we'd rather not support. These operations have also been given parameter signatures that match their use cases, to make it so that even somebody determined to abuse them would have to invent suitable arguments out of thin air. In the case of the declaration-to-index mapping, this change eliminates that acceleration structure, in favor of slightly more complicated (and possibly inefficient, yes) code at the use site. Over time, it would be good to closely scrutinize each of the use cases that requires more complicated interaction with the members of a `ContainerDecl`, to see whether any of them can be reframed in terms of the more basic operations, or if there is some clean abstraction we can introduce to make operations that mutate the member list feel less... hacky.
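The four basic operations listed above can be sketched as a small class that keeps its member list private. This is a hedged illustration, not the `ContainerDecl` API: the `Container` class, its method names, and the lazily updated name index are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical minimal declaration type.
struct Decl
{
    std::string name;
};

// Hypothetical container: all access goes through a few narrow operations,
// so the name lookup structure can never fall out of step with the list.
class Container
{
public:
    // Whole-list access.
    const std::vector<const Decl*>& getDirectMemberDecls() const { return m_members; }

    // Count and access by index.
    std::size_t getDirectMemberDeclCount() const { return m_members.size(); }
    const Decl* getDirectMemberDecl(std::size_t index) const { return m_members[index]; }

    // The only general-purpose mutation: append to the end.
    void addDirectMemberDecl(const Decl* decl) { m_members.push_back(decl); }

    // Lookup by name; indexes any members appended since the last lookup.
    const std::vector<const Decl*>& findDirectMemberDeclsOfName(const std::string& name)
    {
        for (; m_indexedCount < m_members.size(); ++m_indexedCount)
        {
            const Decl* decl = m_members[m_indexedCount];
            m_byName[decl->name].push_back(decl);
        }
        static const std::vector<const Decl*> kEmpty;
        auto it = m_byName.find(name);
        return it == m_byName.end() ? kEmpty : it->second;
    }

private:
    std::vector<const Decl*> m_members;
    std::unordered_map<std::string, std::vector<const Decl*>> m_byName;
    std::size_t m_indexedCount = 0;
};
```

Because append is the only mutation exposed here, indices of existing members never change, which is the kind of invariant the commit message says ad hoc insertions and removals put at risk.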
2025-06-08	Fix interface types as RHS of is/as operators (#7234)	Jay Kwak
	Added error checking to reject interface types as the right-hand side of is and as operators. Enhanced semantic analysis with new diagnostic 30301 and comprehensive test coverage.
2025-06-06	Remove unnecessary compile definitions (#7355)	Emil Imbert Villumsen
	* Remove unused compile definitions No need for full external path; slang is already linking with the SPIRV-Headers cmake target which adds the proper include flags
2025-06-06	Add command line option for separate debug info (#7178)	jarcherNV
	* Add command line option for separate debug info Add command line arg -separate-debug-info which, if provided, produces both a .spv and a .dbg.spv file. The .dbg.spv file contains full debug info and the .spv file has all debug info stripped out. Also add a DebugBuildIdentifier instruction to store a unique hash in both the output files, so they can be more easily matched together. A matching API is provided to allow using the Slang API to retrieve a base and debug SPIRV as well as the debug build identifier string.
2025-06-06	Address issues with GLSL style global in/out vars (#6669) (#6998)	sricker-nvidia
	* Address issues with GLSL style global in/out vars (#6669) Asserts and segfaults were observed trying to compile a simple vertex shader like: ```` in int2 inPos; [shader("vertex")] main(uniform int2 test1, int2 test2, out float4 pos: SV_Position) void main() { // Bogus use of all input vars to prevent optimizing out. pos = float4(inPos.x, test1.x, test2.y, 0); } ```` Further investigation found that while replacing "uniform int2 test1" with "int2 test1" allowed for successful compilation, the resulting output shader would have overlapping location qualifiers. For example, compiling the above with "int2 test1" to glsl might give: ```` ... layout(location = 0) in ivec2 test1_0; layout(location = 1) in ivec2 test2_0; layout(location = 0) in ivec2 translatedGlobalParams_inPos_0; ... ```` This was because Slang does not actually support mixing GLSL style global in/out vars and entry point params. However, this is never checked for or noted in documentation. Slang source also assumes input shaders do not mix these and these assumptions ultimately led to the observed asserts and seg faults when using uniform entry point params. This change makes updates to throw an error when the compiler detects that it is trying to translate global in/out variables into entry point params when an entry point already contains parameters, allowing for compilation to fail gracefully. Certain tests have been updated to avoid mixing GLSL style global in/out vars and entry point params. This was mostly for tests that were using functions like WaveGetLaneIndex which use global in vars for certain platforms (see __builtinWaveLaneIndex). * Address issues with GLSL style global in/out vars - updates 1 (#6669) Update addresses review feedback to support mixing GLSL-flavored global in/out vars and entrypoint parameters when either all global in/out vars or all entry point params have a system value binding semantic. 
* Address issues with GLSL style global in/out vars - updates 2 (#6669) This update attempts to actually allow mixing GLSL style global in vars and entry point vars. Change attempts to recalculate offsets when adding the global input vars into the recreated entry point params layout. Additional updates were made to: -resolve further issues uncovered with entry point uniform params. -Address improper use of SV_DispatchThreadID in wave-get-lane-index.slang for metal. "thread_position_in_grid" is not supported for signed integer scalars or vectors. -Fix a spirv casting conflict due to the implementation of gl_PrimitiveID.get conflicting with PrimitiveIndex(). -Add a call to remove a global var in replaceUsesOfGlobalVar(). The global var is already replaced in this function and keeping it around can prevent it from being cleaned up by DCE if it still has decorations. * format code --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-06-06	Disable Link-Time-Optimization by default (#7345)	Jay Kwak
	* Disable Link-Time-Optimization by default LTO was requested for the release package a while ago. When we added it, LTO was enabled by default although it was needed only for the release packages. Later we found that the Release build cannot be incrementally recompiled when LTO is enabled. It sometimes works fine, but it required full recompilation when it doesn't work. We added a new CMake option, `SLANG_ENABLE_RELEASE_LTO`, to disable it for developers. But many Slang developers don't know the option exists. I was going to update the document, CONTRIBUTING.md, but I thought it will be better to change the default behavior. * Fix a compiler warning treated as an error on linux A padding variable was uninitialized, which is fine, but the compiler was complaining about it. * Fix other gcc error for uninitialized variable * Fix more compile warning treated as error * Fix compiler warning from gcc 11 It appears that this is a valid warning that the `delete this` is done on an offset 8 when the class uses multiple inheritance. 
The compiler warning is following: ``` In file included from /home/runner/work/slang/slang/source/core/slang-memory-file-system.h:5, from /home/runner/work/slang/slang/tools/slang-unit-test/unit-test-module-ptr.cpp:3: In destructor ‘virtual Slang::ComBaseObject::~ComBaseObject()’, inlined from ‘uint32_t Slang::ComBaseObject::_releaseImpl()’ at /home/runner/work/slang/slang/source/core/slang-com-object.h:49:16, inlined from ‘virtual uint32_t Slang::MemoryFileSystem::release()’ at /home/runner/work/slang/slang/source/core/slang-memory-file-system.h:34:5: /home/runner/work/slang/slang/source/core/slang-com-object.h:33:31: error: ‘void operator delete(void, std::size_t)’ called on pointer ‘<unknown>’ with nonzero offset 8 [-Werror=free-nonheap-object] 33 \| virtual ~ComBaseObject() {} \| ^ In destructor ‘virtual Slang::ComBaseObject::~ComBaseObject()’, inlined from ‘uint32_t Slang::ComBaseObject::_releaseImpl()’ at /home/runner/work/slang/slang/source/core/slang-com-object.h:49:16, inlined from ‘virtual uint32_t Slang::MemoryFileSystem::release()’ at /home/runner/work/slang/slang/source/core/slang-memory-file-system.h:34:5, inlined from ‘Slang::ComPtr<T>::~ComPtr() [with T = ISlangMutableFileSystem]’ at /home/runner/work/slang/slang/include/slang-com-ptr.h:113:34, inlined from ‘void _modulePtr_impl(UnitTestContext)’ at /home/runner/work/slang/slang/tools/slang-unit-test/unit-test-module-ptr.cpp:92:1: /home/runner/work/slang/slang/source/core/slang-com-object.h:33:31: error: ‘void operator delete(void, std::size_t)’ called on pointer ‘<unknown>’ with nonzero offset 8 [-Werror=free-nonheap-object] 33 \| virtual ~ComBaseObject() {} \| ^ /home/runner/work/slang/slang/tools/slang-unit-test/unit-test-module-ptr.cpp: In function ‘void _modulePtr_impl(UnitTestContext)’: /home/runner/work/slang/slang/tools/slang-unit-test/unit-test-module-ptr.cpp:36:69: note: returned from ‘void* operator new(std::size_t)’ 36 \| ComPtr<ISlangMutableFileSystem>(new Slang::MemoryFileSystem()); \| 
^ ``` The problem is that `ComBaseObject` is not the first base class in the multiple inheritance: ``` class MemoryFileSystem : public ISlangMutableFileSystem, public ComBaseObject { public: // ISlangUnknown SLANG_COM_BASE_IUNKNOWN_ALL ``` It should be: ``` class MemoryFileSystem : public ComBaseObject, public ISlangMutableFileSystem ``` The chain of calls behind a ComObject release is a little complicated, and it is easy to make a mistake. Here is a summary with details: 1. `release()` is declared as a pure virtual in `ISlangUnknown`, which is one of the base classes of `ISlangMutableFileSystem`. ``` struct ISlangUnknown { virtual SLANG_NO_THROW uint32_t SLANG_MCALL release() = 0; ``` 2. `release()` is implemented with the macro `SLANG_COM_BASE_IUNKNOWN_RELEASE`. ``` SLANG_NO_THROW uint32_t SLANG_MCALL release() SLANG_OVERRIDE \ { \ return _releaseImpl(); \ } inline uint32_t ComBaseObject::_releaseImpl() { // Check there is a ref count to avoid underflow SLANG_ASSERT(m_refCount != 0); const uint32_t count = --m_refCount; if (count == 0) { delete this; } return count; } ``` 3. The instance of `MemoryFileSystem` is held by a `ComPtr`, and `ComPtr::~ComPtr()` calls `release()`. ``` ComPtr<ISlangMutableFileSystem> memoryFileSystem = ComPtr<ISlangMutableFileSystem>(new Slang::MemoryFileSystem()); SLANG_FORCE_INLINE ~ComPtr() { if (m_ptr) ((Ptr)m_ptr)->release(); } ``` 4. When `delete this` is called, because `ComBaseObject` is not the first base class in the multiple inheritance, `this` is 8 bytes off from the actual instance address. The fix is to change the order of inheritance so that `ComBaseObject` comes first.
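The address shift described in step 4 can be observed with a small standalone sketch. The types below (`InterfaceLike`, `RefCountedBase`, `InterfaceFirst`, `RefCountedFirst`) and the helper `refCountedBaseOffset` are hypothetical stand-ins, not Slang's classes. The offsets are what mainstream ABIs produce in practice; the C++ standard does not guarantee them.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an interface such as ISlangUnknown.
struct InterfaceLike
{
    virtual ~InterfaceLike() {}
    virtual unsigned release() = 0;
};

// Hypothetical stand-in for a ref-counting base such as ComBaseObject.
struct RefCountedBase
{
    virtual ~RefCountedBase() {}
    unsigned refCount = 1;
};

// Interface listed first: RefCountedBase becomes a secondary base.
struct InterfaceFirst : InterfaceLike, RefCountedBase
{
    unsigned release() override { return --refCount; }
};

// Ref-counting base listed first, mirroring the reordering in the fix.
struct RefCountedFirst : RefCountedBase, InterfaceLike
{
    unsigned release() override { return --refCount; }
};

// Byte distance from the start of a T object to its RefCountedBase subobject.
template<typename T>
std::ptrdiff_t refCountedBaseOffset()
{
    T object;
    const char* whole = reinterpret_cast<const char*>(&object);
    const char* base =
        reinterpret_cast<const char*>(static_cast<const RefCountedBase*>(&object));
    return base - whole;
}
```

With common compilers, `refCountedBaseOffset<RefCountedFirst>()` is 0, while `refCountedBaseOffset<InterfaceFirst>()` is nonzero, typically the size of one vtable pointer. That is consistent with the "nonzero offset 8" in the diagnostic above.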
2025-06-05	Fix some mis-definitions of capabilities (#7356)	kaizhangNV
	Close #7315. A couple of capabilities were mis-defined: * `sm_50` shouldn't require CUDA `compute_9_0`; drop it to `compute_6_0` * `unpack` should only require `compute_6_0` * `subgroup_ballot` will require `sm_60` Co-authored-by: Yong He <yonghe@outlook.com>
2025-06-05	Implement isnan and isinf for WGSL with bitwise operations (#7344)	Jay Kwak
	WGSL doesn't support isnan and isinf, because it assumes that fast-math is always in use, and fast-math doesn't handle NaN as defined in the IEEE standard. The initial implementation used a clever workaround, but it stopped working at some point. This PR implements isnan and isinf with bitwise operations, which can be expensive, but that seems to be the only option at the moment.
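As an illustration of the general bitwise approach, here is a minimal sketch in C++ rather than WGSL. It is not the code this PR emits; the helper names `floatBits`, `bitwiseIsNan`, and `bitwiseIsInf` are made up for the example, and it assumes a 32-bit IEEE 754 `float`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Reinterpret the bits of a 32-bit float as an unsigned integer.
inline std::uint32_t floatBits(float value)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    return bits;
}

// NaN: exponent bits all ones and a non-zero mantissa.
// Masking off the sign bit leaves a value strictly above the infinity pattern.
inline bool bitwiseIsNan(float value)
{
    return (floatBits(value) & 0x7FFFFFFFu) > 0x7F800000u;
}

// Infinity: exponent bits all ones and a zero mantissa, for either sign.
inline bool bitwiseIsInf(float value)
{
    return (floatBits(value) & 0x7FFFFFFFu) == 0x7F800000u;
}
```

Because the checks operate on the integer bit pattern, they do not rely on floating-point comparisons that a fast-math mode may optimize away.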
2025-06-05	Fix crash when loading modules with syntax errors (#6993) (#7288)	Harsh Aggarwal (NVIDIA)
	* Fix #6993 - Emit Diagnostic Warning and Fix SIGSEGV * Update external/slang-rhi submodule * Add checks for valid stage names for paq in SemanticsVisitor check * format code --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com> Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com>
2025-06-05	Fix missing debug info for the included slang file (#7281)	Lujin Wang
	* Fix missing debug info in the included slang file Issue: https://github.com/shader-slang/slang/issues/7271 Debug info, including DebugFunction, DebugLocation, and DebugValue, is missing in the IR for an "#included" Slang shader file. The included shader file was not added to the TranslationUnit's source file list, so mapSourceFileToDebugSourceInst.add() was not called for that source in generateIRForTranslationUnit(). Later, mapSourceFileToDebugSourceInst.tryGetValue() could not find a value for the source when adding DebugLocationDecoration, which led to missing DebugFunction, DebugLocation, and other debug info for the included file in the IR. Adding the included file to the TranslationUnit's source file list fixes the issue. * Add source file using PreprocessorHandler Call _addSourceFile from FrontEndPreprocessorHandler::handleFileDependency. * Just use FrontEndPreprocessorHandler * Make _addSourceFile public * format code * Distinguish the included source file * Add m_includedFileSet to avoid adding dup file HashSet<SourceFile*> m_includedFileSet; --------- Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com> Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
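The duplicate-avoidance idea behind `m_includedFileSet` can be sketched roughly as follows. This is a simplified illustration and not the Slang implementation: the class `SourceFileList` and its methods are hypothetical, and it uses standard containers and path strings in place of `HashSet<SourceFile*>`.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a translation unit's list of source files.
class SourceFileList
{
public:
    // Records the file unless it was seen before; returns true if it was added.
    bool addIncludedFile(const std::string& path)
    {
        // insert() reports through .second whether the element was newly inserted.
        if (!m_includedFileSet.insert(path).second)
            return false;
        m_sourceFiles.push_back(path);
        return true;
    }

    const std::vector<std::string>& getSourceFiles() const { return m_sourceFiles; }

private:
    std::vector<std::string> m_sourceFiles;            // keeps insertion order
    std::unordered_set<std::string> m_includedFileSet; // fast duplicate check
};
```

Keeping the set alongside the ordered list means a header included from several places is registered once, while the list still reflects the order in which files were first seen.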
2025-06-04	Add legalization for 0-sized arrays. (#7327)	Yong He
	* Add legalization for 0-sized arrays. * Allow 0-sized arrays in the front-end. * More tests. * Add `Conditional<T, hasValue>` type to core module. * Update toc. * Fix wording. * Update test.
2025-06-04	Make interface types non c-style in Slang2026. (#7260)	Yong He
	* Make interface types non c-style. * Make Optional<T> work with autodiff and existential types. * Fix. * patch behind slang 2026. * Fix warnings. * cleanup. * Fix tests. * Fix. * Fix com interface lowering. * Add comment to test. * regenerate command line reference * Add test for passing `none` to autodiff function. * Fix recording of `getDynamicObjectRTTIBytes`. * Fix nested Optional types. --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-06-04	Break down record replay to individual tests to avoid timeout (#7340)	Jay Kwak
	* Break down RecordReplay to individual tests to avoid timeout In Debug builds, the RecordReplay unit test was timing out. It was running six tests in a single unit test; this commit breaks it down into individual tests so that each one can finish within the timeout limit. The issue was seen only in Debug builds, and it went unnoticed because even when the test failed with test-server, it still passed on retry, since the timeout applies only when using test-server. * Reduce the retry from 2 times to 1 time * Remove RecordReplay from expected failure
2025-06-04	Fix rpc errors on slang-unit-tests (#7339)	Jay Kwak

2025-06-03	Fix the memory leak (#7290)	Jay Kwak
	When we return a raw pointer to a module, we should decrement the reference count. The module is owned by its session, so it remains valid as long as the session is valid.
2025-06-03	Fix specialization constants getting incorrectly folded (#7299)	Julius Ikkala

2025-06-03	Add CUDA as a compile target for slang-playground	Gangzheng Tong

2025-06-02	Add a new slang-test option `-enable-debug-layers` (#7300)	Jay Kwak
	* Add a new slang-test option `-enable-debug-layers` The variable `disableDebugLayer` is renamed to `enableDebugLayers`, and a corresponding command-line argument, `-enable-debug-layers`, is added. The previous option `-disable-debug-layer` is still available, but it prints a deprecation warning. The new option makes the setting available in both Debug and Release builds: it is enabled by default in Debug builds and disabled by default in Release builds, and we should be able not only to disable it but also to enable it in a Release build. Ideally this option would be enabled all the time, but there are currently too many VUID error messages printed, so for now it is enabled only in Debug builds. Note that CI/CD will run with the option disabled until we resolve all of the VUID errors.
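The behavior described above (a build-dependent default, a new option, and a deprecated alias that warns) could be sketched as below. This is an assumption-laden illustration, not slang-test's actual parser: `TestOptions` and `parseTestOptions` are hypothetical, the warning text is invented, and the sketch treats `-enable-debug-layers` as a plain flag even though the real option may take a value.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical option set; only the field discussed above is modeled.
struct TestOptions
{
    bool enableDebugLayers = false;
};

// Parses the two flags, starting from a default that depends on the build type.
// Warnings are written to the supplied stream so callers can inspect them.
inline TestOptions parseTestOptions(
    const std::vector<std::string>& args,
    bool isDebugBuild,
    std::ostream& warnings)
{
    TestOptions options;
    options.enableDebugLayers = isDebugBuild; // on by default in Debug, off in Release

    for (const std::string& arg : args)
    {
        if (arg == "-enable-debug-layers")
        {
            options.enableDebugLayers = true;
        }
        else if (arg == "-disable-debug-layer")
        {
            // Deprecated spelling: still honored, but reported.
            warnings << "warning: -disable-debug-layer is deprecated\n";
            options.enableDebugLayers = false;
        }
    }
    return options;
}
```

Passing the build type in as a parameter, rather than reading a preprocessor macro inside the function, keeps both defaults testable from a single binary.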
2025-06-02	Fix compile error in debug build (#7291)	Jay Kwak

2025-06-02	Use MatrixResultSignedComponents on OpCooperativeVectorMatrixMulNV (#7227)	Jay Kwak

2025-06-02	Fix coopvec::fill to use a simpler expression (#7253)	Jay Kwak