slang.git - Making it easier to work with shaders

Age	Commit message (Collapse)	Author
2025-09-10	CUDA: Fix compiler crash with unsized array field - nonuniformres-as-… (#8380)	Harsh Aggarwal (NVIDIA)
	…function-parameter.slang #8315 Root Cause: CUDA compilation crashed with `assert failure: !seenFinalUnsizedArrayField` because unsized arrays like `RWStructuredBuffer<uint> globalBuffer[]` were not the final field in generated parameter structs, violating the layout constraint in slang-ir-layout.cpp. Fix: Extended `collectGlobalUniformParameters` to automatically reorder struct fields for CUDA targets - regular fields first, unsized arrays last. Other targets preserve original order. Impact: - Enables CUDA support for nonuniform resource indexing as function parameters - Zero impact on existing GLSL/HLSL/SPIRV targets - Automatic handling - no manual parameter reordering required Files: slang-emit.cpp, slang-ir-collect-global-uniforms.cpp/.h, test file --------- Co-authored-by: slangbot <ellieh+slangbot@nvidia.com> Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com> Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com>
2025-04-05	Fix crash when using GLSL global uniforms and varying inputs/ouputs together ↵	Darren Wihandi
	(#6651) * Fix incorrect assert on mixed global uniform and varyings * add test * remove unnecessary include * fix incorrect logic * fix comment grammar * address review comments and improve test * minimize diff * fix more issues for cuda build * remove unnecessary line for diff
2025-01-14	Implement specialization constant support in numthreads / local_size (#5963)	Julius Ikkala
	* Allow using specialization constants in numthreads attribute * Add support for GLSL local_size_x_id syntax * Fix overeager specialization constant parsing * Add diagnostics for specialization constant numthreads * Remove unused variable * Fix local_size_x_id not finding existing specialization constant * Allow materializeGetWorkGroupSize to reference specialization constants * Use SpvOpExecutionModeId for modes that require it * Cleanup specialization constant numthreads code * Add tests for specialization constant work group sizes * Fix implicit Slang::Int -> int32_t cast * Fix querying thread group size in reflection API --------- Co-authored-by: Yong He <yonghe@outlook.com>
2024-11-21	Add datalayout for constant buffers. (#5608)	Yong He
	* Add datalayout for constant buffers. * Fixes. * Fix test. * Fix glsl codegen. * Update spirv-specific doc. * Fix test. * Fix binding in the presense of specialization constants. * address comments. * Add a test for constant buffer layout.
2024-11-05	Move switch statement bodies to their own lines (#5493)	Ellie Hermaszewska
	* Move switch statement bodies to their own lines * format --------- Co-authored-by: Yong He <yonghe@outlook.com>
2024-10-29	format	Ellie Hermaszewska
	* format * Minor test fixes * enable checking cpp format in ci
2023-04-25	Support recomputing phi params in bwd prop func. (#2841)	Yong He

2023-02-16	Remove `SharedIRBuilder`. (#2657)	Yong He
	Co-authored-by: Yong He <yhe@nvidia.com>
2023-02-16	Overhaul global inst deduplication and cpp/cuda backend. (#2654)	Yong He
	* Overhaul global inst deduplication and cpp/cuda backend. * Update IR documentation. --------- Co-authored-by: Yong He <yhe@nvidia.com>
2022-05-27	Added NativeStringType (#2252)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Use TerminatedUnownedStringSlice for literals in output C++. * Remove Escape/Unescape functions used in slang-token-reader.cpp Add target type of 'host-cpp' etc to map to the target types. * Fix some corner cases around string encoding. * Added unit test for string escaping. Fixed some assorted escaping bugs. * Updated test output. * Added decode test. * Stop using hex output, to get around 'greedy' aspect. Use octal instead.
2022-05-18	Support for querying which parameters are used in emitted code (#2239)	Alexey Panteleev
	See https://github.com/shader-slang/slang/issues/2213
2021-12-17	Cleanup refactoring work around the IR builder (#2061)	Theresa Foley
	* Cleanup refactoring work around the IR builder We have some long-term goals for the IR that require a more centralized and disciplined set of rules for how IR instructions get created/emitted. I had been working on trying to set things up so that all IR instruction creation goes through a single bottleneck point, but the non-trivial work in that branch was getting drowned out by the sheer volume of cleanup and refactoring changes. This change tries to pull together several of the more important cleanups. The big pieces are: * `IRBuilder` and `SharedIRBuilder` now protect their data members and rely on users to initialize them more directly via constructor of an `init()` method. This change affects a bunch of sites where `IRBuilder`s were created. I changed use sites to use the constructors whenever possible, and to use `init()` in cases where we had longer-lived builders that needed to be initialized multiple times. * The insertion location for the `IRBuilder` now uses an encapsulated type called `IRInsertLoc`. This new type can replace what used to be just two `IRInst` fields in the builder, and also covers some new functionality (if we ever want to take advantage of it). Very little client code cares about this change, but it is still a nice cleanup in terms of making things more explicit. The creation of an `IRModule` has been moded out of `IRBuilder`, because in practice we `IRBuilder` always wants to be associated with a pre-existing `IRModule` at creation time (via its `SharedIRBuilder`). There is now an `IRModule::create()` operation instead. This required changing the sequencing at many `IRModule` creation sites, since most had been contriving to make an `IRBuilder` first. There were also several cleanups because code had been carelessly using non-reference-counted pointers for `IRModule`s in ways that broke now that `IRModule::create()` always returns a `RefPtr`. * The core operations to actually allocate memory for IR instructions were moved into `IRModule` (since they interact with the memory pool that the module owns). These were called `createEmptyInst()` but have been renamed into `_allocateInst()`. In principle these seem like they should only be needed to be called by the `IRBuilder`, but in practice they are also needed by the IR deserialization logic. * A few core operations for emitting IR instructions that were associted with `IRBuilder` were moved to actually be methods on `IRBuilder`. First is `_findOrEmitConstant` which is the primary bottleneck for creating simple scalar constant values. Another is `_createInst` (formerly part of the templated `createInstImpl` along with `createInstWithSizeImpl`) which is the main bottleneck for allocation and initialization of any instruction other than a constant (well, the `IRModuleInst` is the other exception...). Finally, there is also `_maybeSetSourceLoc()`, which is obvious to scope inside the `IRBuilder` once it is protecting the source-location info. Notes: * The `minSizeInBytes` parameter to `_createInst()` might not actually be needed at all. At this point any `IRInst` subtypes that need data allocated for things other than their operands already get created manually via `_allocateInst` or `_findOrEmitConstant`, so I think we could remove that part. I will handle that in a subsequent cleanup if it turns out to be the case. * There is one IR pass (`slang-ir-string-hash.cpp`) that is using manual `_allocateInst()` instead of going through an `IRBuilder`. It could be easily cleaned up to not do so (and I will probably make that change down the line), but for now I wanted to avoid doing anything that wasn't close to pure refactoring if I could. * At this point in our design an `IRBuilder` is a very lightweight thing - it basically just owns the insertion location plus a source location to write into instructions. A lot of our code currently treats `IRBuilder`s like they are expensive and/or need to be re-used (which leads to them being used in more mutable/stateful ways). It is quite likely that as we clean up other aspects of the implementation of IR creation/emission we can make `IRBuilder` use feel more lightweight in ways that can streamline and simplify code. * The next step for this work is to identify the different paths that eventually lead to `_createInst()` being called, and unify them at a single bottleneck operation that can own the decisions around when to create an instruction vs. when to re-use an existing one (rather than those decisions being baked into the various `IRBuilder` subroutines that create instructions of the various subtypes). * fixup: gcc/clang C++ spec details
2021-02-16	Add an accessor for IRInst opcode (#1707)	Tim Foley
	* Add an accessor for IRInst opcode This main changing is renaming `IRInst::op` over to `IRInst::m_op` and then adds an accessor `IRInst::getOp()` to read it. The rest of the changes are just changing use sites to `getOp` (or to `m_op` in the limited cases where we write to it). This work is in anticipation of a future change that might need to store an extra bit in the same field as the opcode. It seemed better to do this massive refactoring as a separate PR. * fixup
2020-11-19	Unify handling of static and dynamic dispatch for interfaces (#1612)	Tim Foley
	Overview ======== Prior to this change, we had two different code generation strategies for interface/existential types in Slang, that didn't always play nicely together: * The "legacy" static specialization approach could handle plugging in an arbitrary concrete type for an existential type parameter (including types with resources, etc.), but wouldn't work well with things like a `StructuredBuffer<>` of an interface type, and requires somewhat counter-intuitive layout rules to make work. * The new dynamic dispatch approach produces simpler, more easily understood layouts by assuming that values of interface type can fit into a fixed number of bytes. The tradeoff there is that it cannot handle types that include resources (only POD types). The goal of this change is to make it so that the two strategies can co-exist. In particular, in cases where a shader is amenable to both static specialization and dynamic dispatch, the type layouts should agree. In order to make the type layouts agree, we: * Declare that all values of existential type reserve storage according to the dynamic-dispatch rules (so 16 bytes for the RTTI and witness-table information, plus whatever bytes are needed to story "any value" of a conforming type). * Then we modify the "legacy" layout rules so that if a value of concrete type can fit in the reserved "any value" space for a given interface, then it is laid out there exactly like the dynamic dispatch rules would do. Otherwise, we fall back to the previous legacy rules (since we don't need to agree with the dynamic-dispatch layout on types that can't be used with dynamic dispatch). Details ======= * Renamed `ExistentialBox` to `BoundInterfaceType` to better clarify how it relates to `BindExistentialsType` * Unconditionally apply the `lowerGenerics` pass during emit, since it is now responsible for aspects of the lowering of existential types when specialization is used. * Made IR type layout take the target into account, so that the layout of resource types can vary by target (e.g., being POD on some targets, and invalid on others) * Cleaned up some issues around using global shader parameters as the "key" for their layout information in the global-scope layout (only comes up when there are global-scope `uniform` parameters) * Made there be a default any-value size (16) instead of making it be an error to leave out. This was the simplest option; we could try to go back to having an error, but we'd need to only issue it if we are sure a type/interface is being used with dynamic dispatch, since static dispatch doesn't have to obey the restrictions. * Changed lowering of existential types to tuples so that bound interfaces where the concrete type won't fit use a "pseudo-pointer" instead of an "any-value" to hold the payload * Changed IR type legalization to handle the "pseudo-pointer" case and apply layout information from an interface type over to the payload part when static specialization was used. * Changed some details of how witness tables were being lowered, so that we didn't have to create "proxy" witness tables for the constraints on associated types (just use the actual requirement entries we generate) * Changed witness tables so that they know the subtype doing the conforming * Added logic so that we don't generate pack/unpack logic and witness table wrapper functions for types that are incompatible with any-value/dynamic dispatch for a given interface. * Changed the core AST-level type layout logic to use the dynamic-dispatch layout in case things fit, and the legacy static specialization case when things don't (while also reserving space for the dynamic-dispatch fields) * Changed a bunch of test cases for static specialization to properly use the new layout (which introduces new buffers in some cases, and moves data around in others). Future Work =========== The experience of trying to reconcile our older way of handling interface-type specialization with our newer model (that supports dynamic dispatch) makes it clear that we really need to make similar changes to our handling of generic type parameters on entry points and at the global scope. A future change should make it so that a global type parameter is lowered with a type layout similar to a value parameter of interface type, including the RTTI and witness-table pieces, and just leaving out the "any value" piece. A similar translation strategy should apply to entry-point generic parameters (mirroring how we lower generic functions for dynamic dispatch already), and value specialization parameters. Co-authored-by: Yong He <yonghe@outlook.com>
2020-09-04	Allow mixing unspecialized and specialized existential parameters. (#1533)	Yong He
	* Allow mixing unspecialized and specialized existential parameters. * Fixes.
2020-07-08	Add support for global uniform shader parameters (#1433)	Tim Foley
	* Adding support for global uniform shader parameters This change adds support for Slang programmers to declare shader parameters of "ordinary" types at global scope: ```hlsl uniform float gScaleFactor; void main() { ... = gScaleFactor; ... } ``` The generated HLSL/GLSL/DXIL/SPIR-V output will be something along the lines of: ```hlsl struct GlobalParams { float gScaleFactor; } cbuffer globalParams { GlobalParams globalParams; } void main() { ... = globalParams.gScaleFactor; ... } ``` The binding information used for the implicit `globalParams` constant buffer will be determined by the existing implicit parameter binding logic (which already had support for this kind of transformation). The reason this change is being pursued right now is because it is one step toward removing the implicit `KernelContext` type that is used to wrap the generated code for our CPU and CUDA C++ targets. Handling global-scope parameters of ordinary type requires an IR pass that synthesizes the `GlobalParams` structure type above, and that step ends up removing the need for the similar `UniformState` structure that was being used in the CPU/CUDA emit logic. A more detailed guide to the changes included follows: * The diagnostic for a global-scope variable that is implicitly a shader parameter was kept, but changed to a warning. Users can opt out of the warning by decorating their parameter as a `uniform` (since that keyword is already being used to mark entry-point parameters that should be treated as uniform shader parameters). * To simplify the task of finding the global shader parameters, the `CLikeSourceEmitter` type has been given an `m_irModule` member. The previous emit logic for `UniformState` was having to do a roundabout solution involving the `EmitAction`s to deal with not having direct access to the module. * Removed a few dead declarations in the emit logic (related to a much earlier point where emit was based on the AST instead of the IR). * Made the computation of type names in C++ emit take into account `ConstantBuffer<T>` and `ParameterBlock<T>`. As far as I can tell, these were being handled with some special-case hacks in the emit logic instead of being supported more fundamentally. It might actually be good to pass these through as `ConstantBuffer<T>` and `ParameterBlock<T>` in the C++ output, and allow the prelude to customize their translation (defaulting to defining them as `T`). Removed the special-case C++ emit logic for references to global shader parameters. There are now at most two global shader parameters to deal with, and the default emit logic (referring to them by name) does the Right Thing. * Changed the handling of entry points for C++ (both CPU and CUDA) so that it handles the bundled-up shader paameters for the global and entry-point scopes the same way. The main complication here is OptiX, where parameter data is passed very differently than it is for CUDA compute kernels. * Reverted changes to `ir-entry-point-uniforms` that had made its logic depend on the compilation target. The parameter binding logic was already responsible for deciding if a given target needed to wrap up its entry-point parameters in a constant buffer, and the IR pass was respecting that layout information. The current workaround had been removing the `ConstantBuffer<T>` indirection from this IR pass for CPU/CUDA, but then reintroducing the same indirection later on in the emit step. * Added an explicit IR pass with the task of collecting global-scope parameters of uniform/ordinary type and packaging them up into a `struct`, and then optionally packaging that `struct` up in a constant buffer. This pass bases its decisions on the IR layout information that was already computed, so it should match whatever policy choices were made at the layout level. * Changed the "key" operand on IR `struct` layout information to not assume an `IRStructKey`. The problem here is that the global scope gets a `StructTypeLayout` to represent its members, and this is convenient (rather than having to always special-case logic that handles the global scope), but the "fields" of that struct are global variables which do not have `IRStructKey`s associated with them. The simplest solution is to use the variables themselves as the keys, which required removing the assumption in the IR encoding. * Updated the IR layout process to compute a layout for the global scope of an entire program, and to attach that to the `IRModule` via a decoration. Updated the IR linking process to carry through that decoration to the linked output. This is necessary so that the IR pass that transforms global parameters can access the global-scope layout information. An important concern with this approach is that the contents and layout of the monolithic `GlobalParams` structure depends on the exact set of modules that were linked (and the order in which they were specified, in some cases). This isn't really a new thing with this change, but it becomes more important as we start to think of how to generalize things to better support separate compilation and linking. There are changes that can (and should) be made to the way that IR layouts are computed for programs (e.g., so that we compute layout per-module and then combine them rather than as a whole-program step). In this case, the problem of forming the combined/linked global layout can be moved down the IR level and not be reliant on AST-level information. Just changing the way layout and linking interact would not change the fundamental problem that global shader parameters as they currently exist in Slang/HLSL/GLSL are not readily compatible with true separate compilation. We either need to find a solution strategy that we can apply to allow existing shaders to work with separate compilation or we need to incrementally work toward removing support for global-scope shader parameters in favor of explicit entry-point parameters in all cases. * fixup: missing files * fixup: comment the new code