summaryrefslogtreecommitdiffstats
path: root/source/slang
Commit message (Collapse)AuthorAge
...
* Use 64bit int instead of emulation on metal (#8180)James Helferty (NVIDIA)2025-08-15
| | | | | | | | | | | | Metal's popcount prototype is `T popcount(T x)` but we want to use it to implement `countbits` where the prototype always returns `uint`. Using `popcount` directly would implicitly cast successfully to the 32-bit return value in all cases except when the argument is a 64-bit type. Thus, this change always explicitly casts the result to `$TR`, which should be one of the `uint[N]` types, and should always be able to hold the number of bits in the type. Addresses #6877
* [CUDA] Fix incorrect `kIROp_RaytracingAccelerationStructureType` emitting ↵ArielG-NV2025-08-15
| | | | | | | | | | | | | | logic (#8168) Fixes: #8167 Current emitting logic does not work, this has been corrected. The provided test ensures our CUDA code is valid by compiling PTX from it. `m_writer->emit("OptixTraversableHandle");` should be `out <<` since `out` adds to type-name-cache; otherwise using a type twice will produce bad type-names (since we filled type-name cache with "" instead of "typeName")
* Prohibit use of buffer.GetDimensions on metal (#8156)James Helferty (NVIDIA)2025-08-15
| | | Fixes #7011
* Clean up `natvis` and use fiddle to generate info needed for `.natvis` ↵ArielG-NV2025-08-14
| | | | | | | | | | | | | debugging (#8192) fixes: #8188 Changes: * Fix Indentation * Add a visualizer for `NodeBase` based on changes to `slang-fiddle` --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
* [Capability System] Fix bug where capabilities do not correctly propegate if ↵ArielG-NV2025-08-14
| | | | | | | | | | | | | | | | | | AST-parent has target+set the AST-child does not (#8175) Fixes: #8174 Changes: * To determine if we propagate capabilities, we need to ensure that a `join` will do nothing (optimization since `join` is expensive + caching data for the `join` adds up to be expensive). This logic was changed in `slang-check-decl.cpp` since the current logic was incorrect. * A parent could have the set `metal+glsl` and the use-site could have `glsl`. In this case, we will not remove `metal` from the parent since `{metal+glsl}.implies({glsl})` is true. --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
* Fix typo in "compilation ceased" error identifier (#8189)Sam Estep2025-08-14
|
* Handle SV_Barycentrics on metal (#8163)James Helferty (NVIDIA)2025-08-13
| | | Fixes #6785
* Fix error for not defining `Val::_toTextOverride` (#8177)Sam Estep2025-08-13
| | | | | #1729 renamed `Val::_toStringOverride` to `Val::_toTextOverride` but did not update the error message for when it is not overridden. This PR fixes that.
* Remove the semantic decoration from the original entry struct (#8146)Jay Kwak2025-08-13
| | | | | | | | | | | | | | | | | | | | When we legalize the entry point param, there are cases where we need to reconstruct a struct for the parameter and the original struct wouldn't be used. But if the user tries to use the origianl struct as a type for a function parameter, we will end up using both the original struct and the synthesized struct at the same time. On Metal and WGSL, it causes an error when an identical semtaic is used on more than one variable. This commit removes the semantics from the original struct after cloning the type. Fixes https://github.com/shader-slang/slang/issues/8141 Related to https://github.com/shader-slang/slang/issues/7693 --------- Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com>
* Do case-sensitive search for category first in slangc -h (#8142)aidanfnv2025-08-11
| | | | | | | | | | | | | | | | | | | | | | | | Related to #7969 In the slangc help text we have two categories, the option category `Target` and the values category `target`, where the names only differ by case. The help parser finds the category in the list by the case-insensitive name, so `slangc -h Target` and `slangc -h target` will both print the `Target` help text and never the `target` help text. This commit replaces the case-insensitive name search with a case-sensitive one, and then only if the case-sensitive check fails do we do a case-insensitive search. That way, if the user uses `Target` they get the `Target` text, for `target` they get the `target` text. If they use something like `Capability`, a category that does not exist but whose name is upper-case of the value category `capability`, they will still get the help text for `capability` as before as a fallback. --------- Co-authored-by: slangbot <ellieh+slangbot@nvidia.com> Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
* Allow specializing entrypoints with generic value args or variadic types ↵Yong He2025-08-09
| | | | | | from API (#8119) Closes #8110. Closes #8011.
* [SPIR-V] Emit control flags for `branch/flatten` decorations (#8134)amidescent2025-08-09
| | | | Co-authored-by: Jay Kwak <82421531+jkwak-work@users.noreply.github.com> Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
* Fix atomics error diagnostics (#8117)venkataram-nv2025-08-09
| | | | | | | Fixes #8116 --------- Co-authored-by: Jay Kwak <82421531+jkwak-work@users.noreply.github.com>
* Error if super-type capabilities are a super-set of sub-type (#7452)ArielG-NV2025-08-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes: #7410 Changes: 1. super-type capabilities must be a super-set of sub-type capabilities (and support the same shader stages/targets) * InheritanceDecl visits super-type to inherit it's capabilities; validate InheritanceDecl capabilities against sub-type * visit all container decl's with a default case * clean up functionDeclBase visitor * Simplify `diagnoseUndeclaredCapability` by moving logic into capability checking (more correct*) 3. added changed behavior to documentation 4. fixed some incorrect capabilities 5. **we do not** diagnose capability errors on interface requirement-to-implementation if both lack explicit capability requirements. This change is to work around a slangpy regression (test case for the failing situation is in `tests\language-feature\capability\capability-interface-extension-1.slang`), Note: maybe for slang-2026 we don't do this? 6. requirement & implementation must support the same shader stage/target. This was changed because otherwise we can have cases where `X` inherits from `Y`, but `Y` is only expected to be used in `glsl` whilst `X` is expected to be used in `hlsl | glsl` 7. removed `tests/language-feature/capability/capabilitySimplification3.slang` because it tests nothing special (redundant) Note: not using rebase due to separate branches depending on this PR --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
* Initial copy elision pass regression fix (#8126)ArielG-NV2025-08-08
| | | | | | | | | | | | | | when we have `GetElementPtr -> load -> GetElement` in our use-chain, the final `GetElement` may use the `load` as a `Index`, not a base. This is a non-issue with `getFieldExtract` since a field is a StructKey. We will still add this check to ensure no bugs down the line. --------- Co-authored-by: Harsh Aggarwal <haaggarwal@nvidia.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: slangbot <ellieh+slangbot@nvidia.com> Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
* Diagnose on array of parameterblock instead of asserting (#8123)Ellie Hermaszewska2025-08-08
| | | Closes https://github.com/shader-slang/slang/issues/5750
* Fix FunctionReflection getName() returning null for overloaded functions (#8113)Copilot2025-08-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When using Slang reflection API to find functions by name in a type, `FunctionReflection::getName()` would return `nullptr` for overloaded functions instead of the expected function name. The issue occurred in `spReflectionFunction_GetName` when `findFunctionByNameInType` returned a `SlangReflectionFunction` wrapping an `OverloadedExpr` (for overloaded functions) instead of a single `DeclRef<FunctionDeclBase>`. The `convertToFunc()` function would fail for `OverloadedExpr` objects, causing `getName()` to return `nullptr`. **Example of the bug:** ```cpp // This code would fail before the fix public interface IBase { public void step(inout float f); } public struct Impl : IBase { public void step(inout float f) { f += 1.0f; } } // Using reflection API: auto implType = reflection->findTypeByName("Impl"); auto stepFunction = reflection->findFunctionByNameInType(implType, "step"); auto name = stepFunction->getName(); // Would return nullptr ``` **Fix:** Modified `spReflectionFunction_GetName` to handle overloaded functions by falling back to the name of the first overload candidate when `convertToFunc` fails. This follows the same pattern already used by `spReflectionFunction_specializeWithArgTypes`. The fix ensures that: - Overloaded function containers return the correct function name - Individual overload candidates also return their names correctly - Non-overloaded functions continue to work as before - No regression in existing functionality **Testing:** Added comprehensive test cases covering both the original issue scenario and explicit function overloading. All existing reflection tests continue to pass. Fixes #8047. <!-- START COPILOT CODING AGENT TIPS --> --- 💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more [Copilot coding agent tips](https://gh.io/copilot-coding-agent-tips) in the docs. --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com> Co-authored-by: Yong He <yonghe@outlook.com> Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com> Co-authored-by: slangbot <ellieh+slangbot@nvidia.com> Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
* Fix intrinsic LoadLocalRootTableConstant for optix (#7949)Harsh Aggarwal (NVIDIA)2025-08-07
| | | | | | | | | | | | Due to an older version of spec referred there was an inconsitency v1.29 2/20/2025 - [HitObject LoadLocalRootArgumentsConstant] Latest spec https://microsoft.github.io/DirectX-Specs/d3d/Raytracing.html#hitobject-loadlocalroottableconstant Refer: OptiX backend support for Shader Execution Reordering (SER) features as outlined in issue #6647. -
* Support `expand` on concrete tuple values. (#8106)Yong He2025-08-07
| | | | | | | | Closes #8061. Along with the fix, also enhanced coercion/overload resolution to filter candidates based on the target type, allowing `tests\language-feature\higher-order-functions\overloaded.slang` to pass.
* Add new _getName to get extension decl names for core module docs (#7985)aidanfnv2025-08-07
| | | | | | | | | | | | | | | | | | | | | | | | | | Fixes #7479 This adds a new variant of `_getName(ExtensionDecl* decl)` to our markdown doc writer that takes an ExtensionDecl* and returns the name in the format `extension <name> : <interface>`. This is required to display "extension T : ITexelElement" properly in the core module docs, as the existing `_getName(Decl* decl)` returns an empty string because the name does not come from the `decl` itself, but rather its `targetType`. The target type alone is not enough, as that would return `T`, which will be erroneously interpreted as the name for the generic parameter, and the doc system will link every mention of `T` to the extension's page. ReadTheDocs already displays the name correctly in the TOC of the docs there, but that is because it falls back to the page title when the name in the TOC is empty. --------- Co-authored-by: slangbot <ellieh+slangbot@nvidia.com> Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
* Add warning for comma operators used outside for-loops and expand ↵Copilot2025-08-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | expressions in legacy mode (#7984) This PR implements a warning system to help users identify potentially unintended comma operator usage in expressions. The comma operator can be confusing when used in contexts like variable initialization where users might have intended to use braces for initialization instead. ## Problem The following code compiles without error but is likely not written as intended: ```slang float4 vColor = (0.f, 0.f, 0.f, 1.f); // Uses comma operators, evaluates to 1.f ``` The intended code should use braces: ```slang float4 vColor = {0.f, 0.f, 0.f, 1.f}; // Proper initialization ``` ## Solution Added a new warning diagnostic (`commaOperatorUsedInExpression`, ID: 41024) that warns when comma operators are used in expressions, with exemptions for contexts where they are commonly intended: - **For-loop side effects**: `for (int i = 0; i < 10; i++, x++)` - no warning - **Expand expressions**: `expand(f(), g(each param))` - no warning - **Slang 2026+ mode**: `let m = (1,2,3)` creates tuples - no warning - **All other expressions**: `float4 v = (a, b, c, d)` and `return a, b` - warns for each comma ## Implementation Details - Added context tracking in `SemanticsContext` with `m_inForLoopSideEffect` flag - Modified `visitForStmt` to use special context when checking side effect expressions - Added comma operator detection in `visitInvokeExpr` for `InfixExpr` nodes - Added language version check using `isSlang2026OrLater()` to disable warnings in Slang 2026+ mode where parentheses create tuples - Performance optimization: language version check is hoisted to avoid unnecessary casting - Warning can be suppressed using `-Wno-41024` command line flag ## Test Coverage Added comprehensive test cases using filecheck format that verify: - Warnings are generated for comma operators in variable initialization (legacy mode only) - Warnings are generated for comma operators in return statements (legacy mode only) - Warnings are generated for comma operators in general expressions (legacy mode only) - No warnings for comma operators in for-loop side effects - No warnings in Slang 2026+ mode where parentheses create tuples - Warning suppression works correctly Example output (legacy mode): ``` warning 41024: comma operator used in expression (may be unintended) float4 vColor = (0.f, 0.f, 0.f, 1.f); ^ warning 41024: comma operator used in expression (may be unintended) return a *= 2, a + 1; ^ ``` Fixes #6732. <!-- START COPILOT CODING AGENT TIPS --> --- 💬 Share your feedback on Copilot coding agent for the chance to win a $200 gift card! Click [here](https://survey.alchemer.com/s3/8343779/Copilot-Coding-agent) to start the survey. --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: aidanfnv <198290069+aidanfnv@users.noreply.github.com> Co-authored-by: slangbot <ellieh+slangbot@nvidia.com> Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com> Co-authored-by: aidanfnv <aidanf@nvidia.com> Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com>
* Initial copy elision pass (#8042)ArielG-NV2025-08-07
| | | | | | | | | | | | | | | | | | | | | | Fixes #7574 Changes: * Add an initial (fairly simple) optimization pass which is able to eliminate redundant copies. * Our current existing optimizer passes remove redundant load/store very robustly, this pass will focus on other cases of copy elimination * Primary approach is to make all functions which are `in T` and `T` is trivial to copy into a `__constref T`. We then (depending on scenario) manually insert a variable+load if a pass-by-reference is not possible; otherwise we pass by `constref`. * Added optimizations to eliminate redundant code which causes `constref` to fail to compile --------- Co-authored-by: Harsh Aggarwal <haaggarwal@nvidia.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: slangbot <ellieh+slangbot@nvidia.com> Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
* Fix atomic fp16 vector SPIRV emit (#8104)jarcherNV2025-08-07
| | | | Update the SPIRV emit of atomic fp16 vector extension from its previous incorrect name to SPV_NV_shader_atomic_fp16_vector.
* Improve performance of AST deserialization (#7935)Theresa Foley2025-08-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Improve performance of AST deserialization The primary goal of these changes is to reduce the total time spent in the global session's `loadBuiltinModule()`, which gets called as part of global session creation to load the core module, and thus impacts every invocation of `slangc` and every user of the Slang compiler API. The majority of the time is spent simply deserializing the core module's AST and IR and, of those two, the AST takes significantly longer to load than the IR (in the ballpark of 5x the time). This change is focused on the serialization infrastructure but, given the performance situation described above, the focus is first and foremost on *deserialization* performance for the Slang *AST*, when using the *fossil* format. That focus shows through in the changes that have been implemented. Change serialization framework to use `template` instead of `virtual` ===================================================================== The recently-introduced serialization framework in `slang-serialize.h` was centered around a dynamically-dispatched `ISerializerImpl` interface. As a result, every single invocation of a `serialize(...)` call ultimately went through `virtual` function dispatch. While the overhead of the `virtual` calls themselves does not have a major impact on the total deserialization performance, those calls end up serving as a barrier to further optimization. This change changes operations that used to take a `Serializer const&` (which wraps an `ISerializerImpl*`), to instead declare a template parameter `<typename S>` and take an `S const&`. The main consequence of the change is that `serialize()` functions for user-defined types will need to be template functions, and thus either be defined in headers (alongside the type that they serialize) or else in the specific source file that handles serialization (as is currently being done for the AST-related types in `slang-serialize-ast.cpp`). Note that if we later decide that we want the ability to perform serialization through a dynamically-dispatched interface (e.g., to easily toggle between different serialization back-ends), it will be easier to layer a dynamically-dispatched implementation on top of the statically-dispatched `template` version than the other way around. Generous use of `SLANG_FORCE_INLINE` ==================================== In order to unlock further optimizations, a bunch of operations were marked with `SLANG_FORCE_INLINE`. It is important to note that forcing inlining like this is a big hammer, and needs to be approached with at least a little caution. The simplest cases are: * trivial wrapper function that just delegate to another function * functions that only have a single call site (but exist to keep abstractions clean) Externalize Scope for `begin`/`end` Operations ============================================== The old `ISerializerImpl` interface had a bunch of paired begin/end operations that define the hierarchical structure of data being read. Most serializer implementations (whether for reading or writing) use these operations to help maintain some kind of internal stack for tracking state in the hierarchy. The overhead of maintaining such a stack with something like a `List<T>` amortizes out over many operations, but even that overhead is unnecessary when the begin/end pairs are *already* mirroring the call stack of the code invoking serialization. This change modifies the `ScopedSerializerFoo` types so that they each provide a piece of stack-allocated storage to the serializer back-end's `beginFoo()` and `endFoo()` operations. Currently only the `Fossil::SerialReader` is making use of that facility, but the other implementations of readers and writers in the codebase could be adapted if we ever wanted to. Streamline `Fossil::SerialReader` ================================= The most significant performance gains came from changes to the `Fossil::SerialReader` type, aimed at minimizing the cycles spent in the core `_readValPtr()` routine. That function used to have a large-ish `switch` statement that implemented superficially very different reading logic depending on the outer container/object being read from. The new logic pushes more work back on the `begin` and `end` operations (which get invoked far less frequently than simple scalar/pointer values get read), so that they always set up the state of the reader with direct pointers to the data and layout for the next fossilized value to be read. The remaining work in `_readValPtr()` has been factored into a differnt subroutine - `_advanceCursor()` - that takes responsibility for advancing the data pointer, and updating the various other fields. The `_advanceCursor()` routine is still messier than is ideal, because it has to deal with the various different kinds of logic required for navigating to the next value. Various other conditionals inside the `SerialReader` implementation were streamlined, mostly by collapsing the `State::Type` enumeration down to only represent the cases that are truly semantically distinct. Evaluated: Streamline Layout Rules for Fossil ============================================= One potential approach that I implemented but then reverted (after finding it had little to no performance impact) was changing the fossil format to always write things with 4-byte alignment/granularity. That would mean values smaller than 4 bytes would get inflated to a full 4 bytes, and scalar values larger than 4 bytes get written with only 4-byte alignment (requiring unaligned loads to read them). I found that the only way to take advantage of the simplified layout rules to improve read performance would be to more-or-less eliminate the use of the layout information embedded in the fossil data, which would make it very difficult to validate that the data is correctly structured. Possible Future Work: Further Type Specialization ================================================= As it stands, the biggest overhead remaining on the critical path of `_readValPtr()` is the way the `_advanceCursor()` logic needs to take different approaches depending on the type of the surrounding context (advancing through elements of a container is very different than advancing through fields of a `struct`, for example). The interesting thing to note is that at the use site within a `serialize()` function, it is usually manifestly obvious which case something is in. If the code uses `SLANG_SCOPED_SERIALIZER_ARRAY` it is in a container, while if it uses `SLANG_SCOPED_SERIALIZER_STRUCT` it is in a struct. This means that the contextual information is staticaly available, but just isn't exposed in a way that lets the core reading logic take advantage of it. A logical extension of the work here would be to expand on the `Scope` idea added in this change such that most of the serialization operations (`handleInt32`, `handleString`, etc.) are actually dispatched through the scope, and then have each of the `SLANG_SCOPED_SERIALIZER_...` macros instantiate a *different* scope type (still dependent on the serializer). * fixup * format code * typo --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
* Add Discord server link to Slang internal error messages (#8026)Copilot2025-08-06
| | | | | | | | | | | | | | | | | | | | | | | | | * Initial plan * Add Discord server link to Slang internal error messages Co-authored-by: aidanfnv <198290069+aidanfnv@users.noreply.github.com> * Format code according to Slang coding standards Co-authored-by: aidanfnv <198290069+aidanfnv@users.noreply.github.com> * Update Discord URL and message format for error diagnostics - Changed Discord URL from discord.gg/slang to khr.io/slangdiscord - Updated message format to "For assistance, file an issue on GitHub (...) or join the Slang Discord (...)" - Applied changes to all internal error diagnostics: unimplemented, unexpected, internalCompilerError, compilationAborted, compilationAbortedDueToException Co-authored-by: aidanfnv <198290069+aidanfnv@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: aidanfnv <198290069+aidanfnv@users.noreply.github.com> Co-authored-by: aidanfnv <aidanf@nvidia.com>
* Fix unused space discovery for bindless heap. (#8075)Yong He2025-08-06
|
* Enable on-demand deserialization of AST decls (#8095)Theresa Foley2025-08-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Overview -------- This change basically just flips a `#define` switch to enable the changes that were already checked in with PR #7482. That earlier change added the infrastructure required to do on-demand deserialization, but it couldn't be enabled at the time due to problematic interactions with the approach to AST node deduplication that was in place. PR #8072 introduced a new approach to AST node deduplication that eliminates the problematic interaction, and thus unblocks this feature. Impact ------ Let's look at some anecdotal performance numbers, collected on my dev box using a `hello-world.exe` from a Release x64 Windows build. The key performance stats from a build before this change are: ``` [*] loadBuiltinModule 1 254.29ms [*] checkAllTranslationUnits 1 6.14ms ``` After this change, we see: ``` [*] loadBuiltinModule 1 91.75ms [*] checkAllTranslationUnits 1 11.40ms ``` This change reduces the time spent in `loadBuiltinModule()` by just over 162ms, and increases the time spent in `checkAllTranslationUnits()` by about 5.25ms (the time spent in other compilation steps seems to be unaffected). Because `loadBuiltinModule()` is the most expensive step for trivial one-and-done compiles like this, reducing its execution time by over 60% is a big gain. For this example, the time spent in `checkAllTranslationUnits()` has almost doubled, due to operations that force AST declarations from the core module to be deserialized. Note, however, that in cases where multiple modules are compiled using the same global session, that extra work should eventually amortize out, because each declaration from the core module can only be demand-loaded once (after which the in-memory version will be used). Because of some unrelated design choices in the compiler, loading of the core module causes approximately 17% of its top-level declarations to be demand-loaded. After compiling the code for the `hello-world` example, approximately 20% of the top-level declarations have been demand-loaded. Further work could be done to reduce the number of core-module declarations that must always be deserialized, potentially reducing the time spent in `loadBuiltinModule()` further. The data above also implies that `loadBuiltinModule()` may include large fixed overheads, which should also be scrutinized further. Relationship to PR #7935 ------------------------ PR #7935, which at this time hasn't yet been merged, implements several optimizations to overall deserialization performance. On a branch with those optimizations in place (but not this change), the corresponding timings are: ``` [*] loadBuiltinModule 1 176.62ms [*] checkAllTranslationUnits 1 6.04ms ``` It remains to be seen how performance fares when this change and the optimizations in PR #7935 are combined. In principle, the two approaches are orthogonal, each attacking a different aspect of the performance problem. We thus expect the combination of the two to be better than either alone but, of course, testing will be required.
* Fix GetDimensions to use mipLevel for SPIRV (#8065)Jay Kwak2025-08-06
| | | | | | | | | * Fix GetDimensions to use mipLevel for SPIRV * format code (#84) --------- Co-authored-by: slangbot <ellieh+slangbot@nvidia.com>
* A new approach to AST node deduplication (#8072)Theresa Foley2025-08-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * A new approach to AST node deduplication Background ---------- The Slang compiler currently relies on the idea that AST nodes derived from `Val` are always deduplicated based on their opcode and operands. Deduplication requires caching, and we thus have to determine the right scope at which to allocate and cache different `Val`s. We need to ensure that `Val`s that refer to declarations/nodes in a specific compilation session/linkage are not cached in a scope that will outlive that session/linkage (or else the cache will contain garbage pointers). Conversely, we also need to ensure that any `Val`s that are referred to by something like a builtin module's AST will remain alive at least as long as that module/AST, and that every compilation session/linkage that refers to that module will find that `Val` cached if they try to create an equivalent one. The existing approach to deduplication has some subtleties: * A builtin module like the core module needs to be fully loaded (using the `ASTBuilder` for the global session) before any other compilation session might construct `Val`s referring to nodes in the AST of that module. * The various declarations/types/values cached on the `SharedASTBuilder` need to be created before any user-defined compilation occurs, to ensure that all of the relevant `Val`s are created on the global session's AST builder (see `Session::finalizeSharedASTBuilder`). * Related to all of the above: the deduplication cache on a non-top-level `ASTBuilder` is initialized by copying the cache from the top-level (global session) `ASTBuilder` on creation. Any subsequent allocations on the global-session `ASTBuilder` will not be visible to the linkage-level `ASTBuilder`. This led to weird things like the *options parsing* logic for `-load-core-module` reaching in and performing a manual copy of the cache from the global session to a linkage to try to restore the invariants. * Notably, the deduplication logic doesn't actually care if two different linkages create distinct `Val`s that represent the same thing, so long as nothing in the AST of a builtin module will refer to that thing. E.g., if the builtin modules never mention `Ptr<String>`, then two different linkages that both refer to that type will likely construct distinct `Val`s to represent it. Thus the deduplication is not as complete as some might assume. The subtle ordering considerations when creating `Val`s related to builtin modules has proved to be a sticking point for implementing on-demand deserialization of the builtin modules (in order to improve startup times). Overview -------- This change implements a new approach that hopefully makes it easier to ensure correctness, even in the case where AST nodes for builtin modules and `Val`s that refer to those nodes sometimes get created after other compilation has been performed. The key points of the new approach are: * `ASTBuilder`s are now explicitly (rather than just implicitly) linked into a hierarchy. Currently the compiler codebase will only create a two-level hierarchy: the `ASTBuilder` for a global session is the parent to all of the `ASTBuilder`s created for `Linkage`s. The expectation is that the code can and will generalize to more levels of nesting. * When a request is made to create a `Val` (or find it in a cache), there is a single `ASTBuilder` that is determined to be responsible for owning that `Val` for its lifetime. The logic involved is comparable to the way that the `IRBuilder` decides where to insert a "hoistable" (deduplicated) instruction, with the simplification that we are dealing with a tree instead of a CFG. Details ------- * `Session::finalizeASTBuilder()` is gone, because it is no longer needed * The logic in the `ASTBuilder` constructor and in `slang-options.cpp` that was manually copying the `m_cachedNodes` from the global-session `ASTBuilder` over to a linkage's `ASTBuilder` is removed, since it is no longer needed. * Made every AST node (`NodeBase`) carry a pointer to the `ASTBuilder` that created it. This is wasteful, but makes it easier to be sure the implementation will work. * Introduced a class `RootASTBuilder`, derived from `ASTBuilder` to represent the root of a given hierarchy of AST builders. * Every non-root `ASTBuilder` is now constructed with a pointer to its parent builder. * Changed it so that instead of allocating a `SharedASTBuilder` and then passing it in to create one or more `ASTBuilder`s, the shared AST builder state is more of an implementation detail of the `ASTBuilder` type, and is automatically allocated behind the scenes as part of creating an `ASTBuilder`. * The inline (defined in header) `ASTBuilder::_getOrCreateImpl()` now just does a first-pass check for an existing cached `Val` and, if it doesn't find one, delegates to `ASTBuilder::_getOrCreateImplSlowPath()`, which encapsulates the logic for the cache-miss case (even if that logic is just two lines). * The `ASTBuilder::_findAppropriateASTBuilderForVal()` method inspects a `ValNodeDesc` to determine the "deepest" of the AST builders among its operands (falling back to the root AST builder if there are no relevant operands). * The `ASTBuilder::_getOrCreateValDirectly()` is intended for use when the correct AST builder to use for caching/allocation has already been identified. * Moved the caching of generic arguments for "default substitutions" out of `ASTBuilder` and onto `GenericDecl` itself. Note that the naming of the old field (`m_cachedGenericDefaultArgs`) was unclear about the fact that this is related to "default substitutions" (where each generic parameter is fed an argument that refers to the parameter itself), and has *nothing* to do with any default argument values that might be set on the generic parameters. * Changed the global session (`Session`) so that instead of storing pointers to both the root/builtin `ASTBuilder` *and* the corresponding `SharedASTBuilder`, it just retains a pointer to the root `ASTBuilder`. This is consistent with the move toward making the `SharedASTBuilder` just an implementation detail. Possible Future Work -------------------- * In theory this change should unblock on-demand AST deserialization, so it would be good to get back to those changes once this new approach lands. * Many AST node subclasses end up storing their own `ASTBuilder` pointers, which are likely now redundant with the one in `NodeBase`. These should be eliminated where possible. * A lot of code for AST-related manipulations has been changed over time to require an `ASTBuilder` to be passed in, but at this point such a builder should always be derive-able so long as the operation has at least one non-null AST node available to it. * Most of the accessors that are currently on `SharedASTBuilder` can/should be migrated to just be on `ASTBuilder`, where they can use the data from the `SharedASTBuilder` as part of their implementation. Ideally most code should only nee to interface with `ASTBuilder`s directly, and will never need to talk to the `SharedASTBuilder`. * This change cleaned up some of the ownership hierarchy, and made it so that the global session only retains a pointer to the root AST builder (and not the shared state as well). A logical follow-on for that would be to make the `Linkage` more properly own (and thus allocate) its own AST builder (and other related objects), and have the global session only store a pointer to the root linkage (and not point to the various sub-objects directly). * The `Linkage` and `SourceManager` types have their own form of parent/child hierarchy (restricted to two levels), and could be generalized in a way that is similar to what this change does for `ASTBuilder`. Support for a hierarchy of `Linkage`s could be a powerful tool for end users, if we expose it in the right way. * format code --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
* Fix noperspective modifier for SV_Barycentrics in SPIRV and GLSL (#8067)davli-nv2025-08-06
| | | | | | | | | | | | | | | | | | | | | | | | | | * Fix noperspective modifier for SV_Barycentrics in SPIRV and GLSL - Added test case with both regular and noperspective SV_Barycentrics inputs 🤖 Generated with [Claude Code](https://claude.ai/code) Co-authored-by: davli-nv <davli-nv@users.noreply.github.com> * fixup format * address review https://github.com/shader-slang/slang/pull/8067#pullrequestreview-3090037501 * address review https://github.com/shader-slang/slang/pull/8067#discussion_r2255818595 * add test case from review --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: davli-nv <davli-nv@users.noreply.github.com>
* Add reflection api for overload candidate filtering. (#8066)Yong He2025-08-06
| | | | | | | | | | | | | * Add reflection api for overload candidate filtering. * Fix API. * Fix. * Update build. * Update test. * Update formatting.
* Implement SPV_EXT_fragment_invocation_density (SPV_NV_shading_rate) (#8037)davli-nv2025-08-05
| | | | | | | | | | | | | | | | | | | | | * Implement SPV_EXT_fragment_invocation_density -Adds semantics SV_FragSize and SV_FragInvocationCount and implements them for SPIRV and GLSL using the appropriate target builtins from extensions. -Adds test case checking for expected target builtins from these semantics. -For future work, could implement SV_FragSize using pixel shader input SV_ShadingRate for HLSL, and SV_FragInvocationCount needs research. Fixes #7974 Generated with Claude Code * address review feedback https://github.com/shader-slang/slang/pull/8037#pullrequestreview-3084645845 * fixup format * review feedback https://github.com/shader-slang/slang/pull/8037#pullrequestreview-3086442819
* Add missing `ASTBuilder` methods for int types (#8056)Sam Estep2025-08-05
|
* Fix #pragma warning not working with multifile modules (#7942)Copilot2025-08-05
| | | | | | | | | | | | | | | | | | | | | | | | * Initial plan * Fix pragma warning not working with multifile modules - Check if DiagnosticSink already has a WarningStateTracker before creating new one - This preserves pragma warning state across __include'd files - Add regression tests for multifile pragma warnings Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com> * Add additional test cases for nested pragma warnings - Test nested __include scenarios with pragma warning directives - Verify pragma warnings work correctly with multiple levels of includes Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com> Co-authored-by: Yong He <yonghe@outlook.com>
* Add support for pointer literals in metal (#8040)jarcherNV2025-08-04
| | | | Add support for kIROp_PtrLit types in metal and add a test for null pointer values, which is the only valid value.
* fix overload in extension issue (#7999)kaizhangNV2025-08-03
| | | | | | | | | | | | close #7931. For a generic callable, we have two passes overload resolution, in first pass, we will resolve the generic by only checking the generic parameters, while in the second pass, we will resolve the function signature to resolve the overload. But in our candidate comparison logic, we pick a preferred generic even two generics are equally good. However, we should not make this decision in the first pass, because we don't know about the function arguments in this pass yet. So we just return OverloadEpxr2 in this case, and let the function overload resolution to break the tie.
* Drain sink when single-argument constructor call fail (#7883)ArielG-NV2025-08-01
| | | | | | | | | | | | | | | | | * fix bug * fix test * push test changs for clarity * fix bug * fix test * push test changs for clarity * test what fails * remove redundant code
* Fix 7441: CUDA boolean vector layout to use 1-byte elements (#7862)Harsh Aggarwal (NVIDIA)2025-08-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Fix 7441: CUDA boolean vector layout to use 1-byte elements Boolean vectors (bool1, bool2, bool3, bool4) were incorrectly implemented as integer-based types using 4 bytes per element instead of actual 1-byte boolean elements on CUDA targets. Changes: - Update CUDA prelude to define boolean vectors as structs with bool fields instead of typedef aliases to integer vectors - Implement CUDALayoutRulesImpl::GetVectorLayout to use 1-byte alignment for boolean vectors, matching actual CUDA memory layout behavior - Update make_bool functions to populate struct fields correctly This ensures boolean vectors have the same memory layout as bool[4] arrays: - bool1: 1 byte (was 4 bytes) - bool2: 2 bytes (was 8 bytes) - bool3: 3 bytes (was 12 bytes) - bool4: 4 bytes (was 16 bytes) Fixes memory layout mismatch between Slang reflection API and actual CUDA compilation, achieving 75% memory savings for boolean vector usage. * Fix CI issues - Add and update associated functions and operators * Make boolX same as uchar * Use align construct on struct for boolX * Improve Test case for robust alignment checks * Formatting * Disable selected slangpy tests * add metal check which is slightly different than cuda * Test-1 * Test-2 * Test-3 * Test-4 * ReflectionChange * cleanup and update * _slang_select with plain bool is needed for reverse-loop-checkpoint-test
* Omit "Invalid" capability from slangc -h output (#8020)aidanfnv2025-08-01
| | | | | | | | | | | | * Omit "Invalid" capability from slangc -h output * format code (#23) Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com> --------- Co-authored-by: slangbot <ellieh+slangbot@nvidia.com> Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
* Fix segmentation fault in ray tracing parameter consolidation. (#7997)Copilot2025-07-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | * Initial plan * Fix segfault in ray tracing parameter consolidation Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com> * Fix. * Fix. * Keep entrypoint param layout consistent during `MoveEntryPointUniformParametersToGlobalScope`. * Fix. * fix. * Fix. * Fix pending layout handling. --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com> Co-authored-by: Yong He <yonghe@outlook.com>
* Add matrix select intrinsic (#7566)venkataram-nv2025-07-31
| | | | | | | | | | | | | | | | | | | | | | | * Add matrix select intrinsic * Fix hlsl test * Restrict matrix select to HLSL * Better test for HLSL side * Select route for GLSL/SPIRV * Exclude matrices from select legalization * Exclude CUDA from select test * Inline and move * format code --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
* msvc style bitfield packing (#7963)Ellie Hermaszewska2025-07-31
| | | | | | Closes https://github.com/shader-slang/slang/issues/3646 New tests rather than just adding another TEST line to existing tests so that we get the msvc- prefix in the output of slang-test
* disallow `static const` variables without default-value (#7993)ArielG-NV2025-07-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | * Fix static const variables without initializers causing internal errors Add validation in SemanticsDeclHeaderVisitor::checkVarDeclCommon to detect static const variables without initializers and emit proper error diagnostics instead of allowing internal errors to escape during SPIR-V generation. - Add new diagnostic (ID 31225) for static const variables without initializers - Skip validation for extern static const variables - Skip validation for interface member variables - Add comprehensive test case covering various scenarios Fixes #7989 Co-authored-by: ArielG-NV <ArielG-NV@users.noreply.github.com> * clean up test and implementation * format code (#7994) Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com> --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: ArielG-NV <ArielG-NV@users.noreply.github.com> Co-authored-by: slangbot <ellieh+slangbot@nvidia.com> Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
* [Language Server]: Don't eagerly check file upon open doc. (#7995)Yong He2025-07-30
|
* Lowering unsupported matrix types for GLSL/WGSL/Metal targets (#7936)venkataram-nv2025-07-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Add emit cases for WGSL and GLSL * Fix compilation warnings Modify short cutting test to reflect change in emit logic Lower matrix for metal as well Add emit matrix logic for metal Fix compiler warning Brace initializer for lowered matrices Fix compiler warnings * Tests for metal * Fix mult, any, and determinant * Fix matrix-matrix multiplication * Fix mat mul to be element-wise * Fix compiler warning * Move makeMatrix to legalization * Move unary and binary arithmetic operator lowering to legalization * Remove emit logic and move final comparison operators to legalization * Handle vector/matrix negation for WGSL * Restore older SPIR-V emit logic * Address PR comments * Revert to zero minus for negation * format code --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
* Fix ICE when immutable value is passed to a bwd_diff function. (#7973)Yong He2025-07-29
|
* Fix Metal invalid as_type cast for 64-bit RWByteAddressBuffer.Store values ↵Gangzheng Tong2025-07-29
| | | | | | | | | | | | | | | | | | | | (#7843) * Fix 64-bit val lowering for metal * Add ByteAddressBuffer load/store 64-bit tests * Handle Store/Load ptr types * Use bitcast for non-pointer typers * format code (#7966) Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com> --------- Co-authored-by: slangbot <ellieh+slangbot@nvidia.com> Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
* [Language Server]: Show signature help on generic parameters. (#7913)Yong He2025-07-29
| | | | | | | | | | | | | * Show signature help on generic parameters. * Fix. * Update tests. * slang-test: make vvl error go through stderr. * update slang-rhi * Update slang-rhi
* Detect uses of uninitialized resource fields (#7962)Ellie Hermaszewska2025-07-29
| | | Closes https://github.com/shader-slang/slang/issues/3386
* Improve diagnostics over ambiguous references. (#7930)Yong He2025-07-29
| | | | | | | | | | | | | | | | | | | * Improve diagnostics over ambiguous references. * Fix. * Remove files. * Fix some optix hitobject intrinsics. * Fix some hitobject intrinsics for optix. * Fix. * update rhi * revert slang-rhi * Update slang-rhi