slang.git - Making it easier to work with shaders

	Commit message	Author	Age
*	Rename some symbols related to pointers types (#8592)	Theresa Foley	2025-10-03
	Note that while this change touches a large number of files, it makes no changes to functionality. It only renames various symbols and, in a few cases, updates or adds comments for consistency with the new names. The core naming changes are:
	* Most things named to refer to `OutType` (e.g., `IROutType`, `IRBuilder::getOutType()`, etc.) have been consistently renamed to refer to `OutParamType`, to emphasize that the relevant AST/IR node types are only intended to represent `out` parameters.
	* The same change as described above for `OutType` is also made for `RefType`, which becomes `RefParamType` in most cases. One mess that this exposes is the way that the `ExplicitRef<T>` type in the core module currently lowers to `IRRefParamType`. This change sticks to the rule of not making functional changes, so that mess is left as-is for now.
	* Names referring to `InOutType` have been changed to instead refer to `BorrowInOutType`. The intention with this naming change is to emphasize that the Slang rules for `inout` are semantically those of a borrow (or at least our interpretation of what a borrow means).
	* Names referring to `ConstRefType` have been changed to instead refer to `BorrowInType`. This change starts work on clarifying that the existing `__constref` modifier was never intended to be a read-only analogue of `__ref`, and instead is the input-only analogue of `inout`.
	* The `ParameterDirection` enum type has been changed to `ParamPassingMode`, to reflect the fact that the concept of "direction" fails to capture what is actually being encoded, particularly once we have modes beyond simple `in`/`out`/`inout`.
	While this change does not alter behavior in any case (the user-exposed Slang language is unchanged), it is intended to set up subsequent changes that will make the handling of these types in the compiler more nuanced and correct. Breaking this part of the change out separately is primarily motivated by a desire to minimize the effort for reviewers.
	---------
	Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
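	The renames described above can be summarized as a small lookup table. The following is only an illustrative C++ sketch: the enumerator names of `ParamPassingMode`, the `In` case, and the helper functions `paramTypeNameFor` and `formerTypeNameFor` are assumptions made for this example and are not taken from the Slang sources; only the old and new type names come from the commit message.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the renamed enum. The enumerators are assumptions;
// only the type names returned below appear in the commit message.
enum class ParamPassingMode
{
    In,          // plain `in` parameter (assumed to have no wrapper type)
    Out,         // `out`
    BorrowInOut, // `inout`
    BorrowIn,    // `__constref`
    Ref,         // `__ref`
};

// New name of the AST/IR wrapper type associated with each mode.
std::string paramTypeNameFor(ParamPassingMode mode)
{
    switch (mode)
    {
    case ParamPassingMode::Out:         return "OutParamType";
    case ParamPassingMode::BorrowInOut: return "BorrowInOutType";
    case ParamPassingMode::BorrowIn:    return "BorrowInType";
    case ParamPassingMode::Ref:         return "RefParamType";
    case ParamPassingMode::In:          return "";
    }
    return "";
}

// Name the same wrapper type had before the rename.
std::string formerTypeNameFor(ParamPassingMode mode)
{
    switch (mode)
    {
    case ParamPassingMode::Out:         return "OutType";
    case ParamPassingMode::BorrowInOut: return "InOutType";
    case ParamPassingMode::BorrowIn:    return "ConstRefType";
    case ParamPassingMode::Ref:         return "RefType";
    case ParamPassingMode::In:          return "";
    }
    return "";
}

// Small self-check showing how the table reads.
void checkRenameTable()
{
    assert(formerTypeNameFor(ParamPassingMode::Out) == "OutType");
    assert(paramTypeNameFor(ParamPassingMode::Out) == "OutParamType");
}
```

	Reading the table row by row shows the point of the rename: `__constref` sits next to `inout` as a borrow, rather than next to `__ref`.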
*	Enhance buffer load specialization pass to specialize past field extracts. ↵	Yong He	2025-09-30
	(#8547)
	This allows us to specialize functions whose argument is a sub-element of a constant buffer, instead of being applicable only to an entire buffer element. Closes #8421.
	This change also implements a proper heuristic to determine when to specialize the calls and defer the buffer loads. This PR addresses a pathological case exposed in `slangpy\slangpy\benchmarks\test_benchmark_tensor.py`, which used to take 27ms to finish, and now takes 1.25ms.
	For example, given:
	```
	struct Bottom
	{
	    float bigArray[1024];
	    [mutating] void setVal(int index, float value) { bigArray[index] = value; }
	}
	struct Root
	{
	    Bottom top[2];
	    [mutating] void setTopVal(int x, int y, float value) { top[x].setVal(y, value); }
	}
	RWStructuredBuffer<Root> sb;
	[shader("compute")]
	[numthreads(1, 1, 1)]
	void compute_main(uint3 tid: SV_DispatchThreadID)
	{
	    sb[0].setTopVal(1, 2, 100.0f);
	}
	```
	We are now able to specialize the call to `setTopVal` into:
	```
	void compute_main(uint3 tid: SV_DispatchThreadID)
	{
	    setTopVal_specialized(0, 1, 2, 100.0f);
	}
	void setTopVal_specialized(int sbIdx, int x, int y, float value)
	{
	    Bottom_setVal_specialized(sbIdx, x, y, value);
	}
	void Bottom_setVal_specialized(int sbIdx, int x, int y, float value)
	{
	    sb[sbIdx].top[x].bigArray[y] = value;
	}
	```
	And get rid of all unnecessary loads.
	Achieving this requires a combination of function call specialization and the buffer-load-defer pass. The buffer-load-defer pass has been completely rewritten to be more correct and avoid introducing redundant loads.
	This PR also adds tests to make sure pointers, bindless handles, and loads from structured buffers or constant buffers work as expected.
*	Force inline functions that take InputPatch and OutputPatch (#6407)	Jay Kwak	2025-02-19
	This commit inlines functions that take InputPatch or OutputPatch as a function parameter. Co-authored-by: Yong He <yonghe@outlook.com>
*	Fix resource specialization issue where store insts from inlined calls are ↵	Sai Praveen Bangaru	2025-01-17
\| \| \| \| \| \| \|	not considered properly. (#6099) * Fix resource specialization issue where stores from inlined calls are not considered. * Format
*	Add `IDifferentiablePtrType` support for arrays (#5576)	Sai Praveen Bangaru	2024-11-18
\| \| \| \| \| \| \|	* Add `IDifferentiablePtrType` support for arrays - Also fixes an issue with spirv-emit of constructors that contain references to global params * Fix GLSL legalization for arrays of resource types
*	Make various parameters and return types require specialization when ↵	Anders Leino	2024-11-06
\| \| \| \| \| \| \| \| \| \| \|	targeting WGSL (#5483) Structured buffer types translate to array types in the WGSL emitter. WGSL doesn't allow passing runtime-sized arrays to functions. Similarly for pointers to texture handles. Also, structured buffers (runtime-sized arrays) cannot be returned in WGSL. This closes issue #5228, issue #5278 and issue #5288 by enabling specialized functions to be generated in these cases, in order to work around these constraints.
*	Move switch statement bodies to their own lines (#5493)	Ellie Hermaszewska	2024-11-05
\| \| \| \| \| \| \| \| \|	* Move switch statement bodies to their own lines * format --------- Co-authored-by: Yong He <yonghe@outlook.com>
*	Write only texture types. (#5454)	Yong He	2024-10-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add support for write-only textures. * Fix capabilities. * Fix implementation. * Fix. * format code --------- Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com> Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
*	format	Ellie Hermaszewska	2024-10-29
\| \| \| \| \| \| \|	* format * Minor test fixes * enable checking cpp format in ci
*	Fix resource specialization with `-embed-dxil` (#4990)	ArielG-NV	2024-09-04
\| \| \| \| \| \| \| \|	* Fix resource specialization with `-embed-dxil` fixes: #4989 Changes: 1. Before handing off to DCE an `oldFunc` which should be removed, clean up any leftover `IRKeepAliveDecoration` (else DCE won't remove our `oldFunc`s)
*	Fix invalid code generation for when using nested resource specialization ↵	ArielG-NV	2024-07-30
\| \| \| \|	(#4751)
*	Add generic descriptor indexing intrinsic (#4389)	dubiousconst282	2024-07-24
	* Add ResourceArray intrinsic type
	* Move aliased parameter generation to GLSL legalization
	* Add DynamicResourceEntry type for proxying layout of GenericResourceArray
	* Reimplement as DynamicResource
	* Add reflection test
	* Don't reuse alias cache between different parameters
	* Add dynamic cast extensions for buffer types
	* Minor format fix
	* Fix VarDecl diagnostics after finding non-applicable initializer candidates
	---------
	Co-authored-by: Yong He <yonghe@outlook.com>
*	Add options to speedup compilation. (#4240)	Yong He	2024-05-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add options to speedup compilation. * Fix. * Plumb options to DCE pass. * Revert debug change. * Fix regressions. * More optimizations. * more cleanup and fixes. * remove comment. * Fixes. * Another fix. * Fix errors. * Fix errors. * Add comments.
*	Add `-minimum-slang-optimization` to favor compile time. (#4186)	Yong He	2024-05-17
*	Remove use of `G0` and `__target_intrinsic` in stdlib. (#4170)	Yong He	2024-05-14
\| \| \| \| \| \| \|	* Remove use of `G0` and `__target_intrinsic` in stdlib. * Fix. * Fix calling intrinsic in global scope.
*	Implement 8.14-8.19 of OpenGL-GLSL specification	ArielG-NV	2024-04-03
	The following PR implements 8.14-8.19 of the [OpenGL-GLSL specification](https://registry.khronos.org/OpenGL/specs/gl/GLSLangSpec.4.60.pdf). Fully implements all functions and built-in types; resolves https://github.com/shader-slang/slang/issues/3692 for GLSL & SPIR-V targets.
	_Notes:_
	Testing Tools:
	* Fragment shaders cannot test computational results. Only OpCodes are checked for proper emitting.
	Implementation Notes:
	* SubpassInput requires an unknown image format.
	* SubpassInput is disjoint from TextureType: __SubpassImpl (.slang) & SubpassInputType (Compiler) to reduce code generation required.
	* SubpassInput required an additional input layout modifier, input_attachment_index; this was added as a new parameter binding attribute. Since the following qualifiers can overlap with different resources (`layout(input_attachment_index = 0, binding = 0, set = 0)`), input_attachment_index is checked for overlapping resource bindings separately from other qualifiers with `LayoutResourceKind::InputAttachmentIndex`.
	* `GLSLInputAttachmentIndexLayoutModifier` was added to enforce function parameters only accepting `in` decorated variables.
	* `in` decorated variables needed their emitting modified to allow directly emitting the variable into function calls if used as a parameter. Normally Slang has a "global variable" shadow a "global parameter" through a copy. This does not work here and is solved using `GlobalVariableShadowingGlobalParameterDecoration` to build a relationship of "global variable" to "global parameter"; we then resolve this relationship and replace "global variable" uses later in compilation.
	* The `AtomicCounterMemory` memory constraint requires `OpCapability AtomicStorage`, and `AtomicStorage` is invalid for Vulkan targets. glslang outputs `AtomicCounterMemory` as a memory constraint for `barrier`, `memoryBarrier`, and `groupMemoryBarrier`. This compiles as valid SPIR-V for Vulkan since `OpCapability AtomicStorage` is not declared. This behavior of glslang is undefined as per [3.31.Capability of the SPIR-V specification](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_capability). We will omit `AtomicCounterMemory` from our barrier calls.
*	Fix spirv generation for using output stream in a function. (#3806)	Yong He	2024-03-20
\| \| \| \| \|	* Fix spirv generation for using output stream in a function. * polish.
*	Refactor compiler option representations. (#3598)	Yong He	2024-02-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Refactor compiler option representation. * Fix binary compatibility. * Add a test for specifying compiler options at link time. * Fix binary compatibility. * Fix binary compatibility. * Fix backward compatibility on matrix layout. * Fix. * Fix. * Fix. * Fix gfx. * Fix gfx. * Fix dynamic dispatch. * Polish.
*	Fix spirv emit that leads to pathological downstream time. (#3546)	Yong He	2024-02-03
*	Add ConstBufferPointer::subscript. (#3415)	Yong He	2023-12-15
\| \| \|	Co-authored-by: Yong He <yhe@nvidia.com>
*	Small warnings and bugs (#3272)	Ellie Hermaszewska	2023-10-11
\| \| \| \| \| \| \| \| \| \| \|	* Correctly use removeTrivialSingleIterationLoops during simplification * remove unused variables * Fix invalid fallthrough --------- Co-authored-by: Yong He <yonghe@outlook.com>
*	SPIRV compiler performance fixes. (#3258)	Yong He	2023-10-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* SPIRV compiler performance fixes. * Cleanup. * update project files * Cleanup debug code. * Make redundancy removal non-recursive. --------- Co-authored-by: Yong He <yhe@nvidia.com>
*	Add SPIRV intrinsics for ShaderExecutionReordering and RW/Buffer. (#3252)	Yong He	2023-10-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add SPIRV intrinsics for ShaderExecutionReordering. * Add intrinsics for `Buffer` and `RWBuffer`. * Various spirv fixes. * Marshal bool vector type. * Inline global constants + OpFOrdNotEqual->OpFUnordNotEqual. * Fix. --------- Co-authored-by: Yong He <yhe@nvidia.com>
*	Proper lowering of functions that return NonCopyable values. (#3179)	Yong He	2023-09-03
	* Proper lowering of functions that return NonCopyable values. * Fix tests. * Fix clang errors. * Fix. * Fix clang error. --------- Co-authored-by: Yong He <yhe@nvidia.com>
*	Fix GLSL code gen around RayQuery and HitObject types. (#3173)	Yong He	2023-09-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Update slang-llvm. * Fix. * fix. * Fix unit tests for multi-thread execution. * Fix tests. * fixes. * update tests. * Add gfx-smoke to linux expected failure list. * Try fix test. --------- Co-authored-by: Yong He <yhe@nvidia.com>
*	Inline all RayQuery/HitObject typed functions when targeting GLSL. (#3172)	Yong He	2023-08-31
\| \| \| \| \| \| \| \| \|	* Inline all RayQuery/HitObject typed functions when targeting GLSL. * update test. --------- Co-authored-by: Yong He <yhe@nvidia.com>
*	Add SPIRV atomics intrinsics and fix buffer layout lowering. (#3170)	Yong He	2023-08-31
\| \| \| \| \| \| \| \| \| \| \| \| \|	* Fix atomics intrinsics and buffer layout lowering. * Fix. * Add more test. * Fix. --------- Co-authored-by: Yong He <yhe@nvidia.com>
*	Fixes for Shader Execution Reordering on VK (#2929)	Theresa Foley	2023-06-13
	* Fixes for Shader Execution Reordering on VK
	There are some mismatches between the way that hit objects are handled in the current NVAPI/HLSL and in the proposed GLSL extensions for shader execution reordering. These mismatches create complications for generating valid GLSL/SPIR-V code from input Slang.
	Many of the problems that apply to `HitObject` also apply to the existing `RayQuery<>` type used for "inline" ray tracing. In the case of `RayQuery<>` we have that for both HLSL and GLSL/SPIR-V:
	* A `RayQuery` (or `rayQueryEXT`) is an opaque handle to underlying mutable storage
	* The storage that backs a `RayQuery` is allocated as part of the "default constructor" for a local variable declared with type `RayQuery`.
	* The `RayQuery` API provides numerous operations that mutate the storage referred to by the opaque handle.
	The key difference between HLSL and GLSL/SPIR-V for the case of a `RayQuery` amounts to:
	* In HLSL, local variables of type `RayQuery` can be assigned to, and assignment has by-reference semantics. It is possible to create multiple aliased handles to the same underlying storage.
	* In GLSL/SPIR-V, local variables of type `rayQueryEXT` cannot be assigned to, returned from functions, etc. It is impossible to create multiple aliased handles to the same underlying storage.
	The case for `HitObject`s is significantly messier, because:
	* In NVAPI/HLSL a `HitObject` is effectively a "value type" in that it only exposes constructors, and there is no way to mutate the state of a `HitObject` other than by assignment to a variable of that type. It makes no semantic difference whether a `HitObject` directly stores the value(s), or if it is a handle, since there is no way to introduce aliasing of mutable state. Assignment of `HitObject`s semantically creates a copy.
	* In GLSL/SPIR-V, a `hitObjectNV` is, like a `rayQueryEXT`, a handle to underlying mutable state. These handles cannot be assigned, returned from functions, etc. There is no way to make a copy of a hit object.
	This change includes several changes to how both `RayQuery<>` and `HitObject` are implemented, with the intention of getting more cases to work correctly when compiling for GLSL/SPIR-V, and of setting up a clearer mental model for the semantics we want to give to these types in Slang, and how those semantics can/should map to our targets.
	An overview of important changes:
	* Marked a few operations on `RayQuery` as `[mutating]` that realistically should have already been that way.
	* Marked the `HitObject` type as being non-copyable (an attribute we do not currently enforce), and marked the various GLSL operations that construct a hit object as having an `out` parameter of the `HitObject` type (even if they are nominally specified in GLSL as not writing to the corresponding parameter).
	* Added a distinct IR opcode (`allocateOpaqueHandle`) to represent the implicit allocation that happens when declaring a variable of type `HitObject` or `RayQuery`, and made the "implicit constructor" for those types map to the new op. This operation took a lot of tweaking to get emitting in a reasonable way, and I'm still not 100% sure that all of the emission-related logic for it is strictly required (or correct).
	* Added new IR instructions for `HitObject` and `RayQuery` types, and made the stdlib types map to those IR instructions.
	* Treat `HitObject` and `RayQuery` as resource types for the purpose of our existing pass that specializes calls to functions that have outputs of resource type
	* Added a new test case that includes a function that returns a `HitObject` as its result.
	* Many test cases saw slight changes in their output (especially around the relative ordering of declarations of `HitObject`s and `RayQuery`s with other instructions)
	* Remove debugging logic
*	Add API for querying total compile time. (#2898)	Yong He	2023-05-23
\| \| \| \| \| \| \| \| \| \| \| \| \|	* Add API for querying total compile time. * Optimize. * Remove redundant simplifyIR calls. * Fix. --------- Co-authored-by: Yong He <yhe@nvidia.com>
*	Remove `SharedIRBuilder`. (#2657)	Yong He	2023-02-16
\| \| \|	Co-authored-by: Yong He <yhe@nvidia.com>
*	Overhaul global inst deduplication and cpp/cuda backend. (#2654)	Yong He	2023-02-16
\| \| \| \| \| \| \| \| \|	* Overhaul global inst deduplication and cpp/cuda backend. * Update IR documentation. --------- Co-authored-by: Yong He <yhe@nvidia.com>
*	Mesh shader support (#2464)	Ellie Hermaszewska	2022-11-16
	* Add gdb generated files to .gitignore
	* Switch to c++17. TODO: Ellie update coding style doc
	* WIP mesh shaders
	* Add MeshOutputType and mesh output decorations
	* Lift array type layout creation out of _createTypeLayout in preparation for sharing it elsewhere
	* Initial pass at GLSL legalization for mesh shaders
	* Create output types for builtin mesh outputs. This should be rendered as an out parameter block
	* Handle writes to member fields in mesh shader output
	* Per primitive output from mesh shaders
	* Add mesh shader tests
	* Redeclare mesh output builtins
	* Remove unused instruction
	* Emit explicit mesh output max size
	* Add unimplemented warning for array members in mesh output
	* Implement mesh output splitting for GLSL in terms of getSubscriptVal
	* Allow HLSL syntax for mesh output modifiers
	* Improve error messages for mesh output
	* Add test for HLSL style mesh output syntax
	* Emit explicit mesh output indices max size
	* HLSL generation support for mesh shaders
	* Better errors for mesh shader misuse
	* Neaten comments
	* Regenerate vs2019 project files
	* Fix build on vs2019
	* Retreat on c++17. Will make the change in a separate PR
	* slang-glslang binary dep 11.10.0 -> 11.12.0-32
	* Fixes for msvc compiler
	* Update msvc project
*	Fix resource inout param specialization. (#2394)	Yong He	2022-09-05
\| \| \|	Co-authored-by: Yong He <yhe@nvidia.com>
*	Multi parameter `__subscript` (#2392)	Yong He	2022-09-05
\| \| \| \| \| \| \| \| \| \| \|	* Multi parameter `__subscript` * Fix. * Fix bugs. * Fix. Co-authored-by: Yong He <yhe@nvidia.com>
*	Clean up void returns. (#2260)	Yong He	2022-06-01
\| \| \| \| \| \| \|	* Clean up `IRReturnVoid`. * Update gitignore. Co-authored-by: Yong He <yhe@nvidia.com>
*	Refactor: eliminate BackEndCompileRequest (#2178)	Theresa Foley	2022-04-11
	An earlier refactoring pass over the compiler codebase split the type that had been called `CompileRequest` into three distinct pieces:
	* `FrontEndCompileRequest` which was supposed to own state and options related to running the compiler front end and producing IR + reflection (e.g., what translation units and source files/strings are included).
	* `BackEndCompileRequest` which was supposed to own state and options related to running the compiler back end to translate the IR for a `ComponentType` (program) into output code. (Note that the `BackEndCompileRequest` was conceived of as orthogonal to the `TargetRequest`s, which store per-target and target-specific options.)
	* `EndToEndCompileRequest` which was an umbrella object that owns separate front-end and back-end requests, plus any state that is only relevant when doing a true end-to-end compile (such as the kinds of compiles initiated with `slangc`). As originally conceived, the only state that this type was supposed to own was stuff related to "pass-through" compilation, as well as state related to writing of generated code to output files.
	That refactoring work was very useful at the time, because it allowed us to "scrub" the back end compilation steps to remove all dependencies on front-end and AST state (this was important for our goals of enabling linking and codegen from serialized Slang IR).
	At this point, however, it is clear that the hierarchy that was built up serves very little purpose:
	* The `BackEndCompileRequest` type is only used in two places:
		* As part of an `EndToEndCompileRequest`, where the settings on the `BackEndCompileRequest` can be configured, but only through the `EndToEndCompileRequest`
		* As part of on-demand code generation through the `IComponentType` APIs. In this case, the settings stored on the `BackEndCompileRequest` are not accessible to the application at all, and will always use their default values, so that instantiating a "request" object doesn't really make any sense.
	* The `FrontEndCompileRequest` type has a similar situation:
		* Front-end compilation as part of an `EndToEndCompileRequest` supports user configuration of `FrontEndCompileRequest` settings, but only through the `EndToEndCompileRequest`
		* Front-end compilation triggered by an `import` or a `loadModule()` call does not support user configuration of settings at all. It will always derive all relevant settings from those on the session ("linkage").
	In addition, subsequent changes have been made to the compiler that show a bit of a "code smell" and/or forward-looking worries for this decomposition:
	* In some cases we've had to add the same setting to multiple types in the breakdown (front-end, back-end, end-to-end, linkage, target, etc.) which makes it harder for us to validate that all the possible mixtures of state work correctly.
	* Related to the above, in some cases we have manual logic that copies state from one of the objects in the breakdown to another, in order to ensure that the user's intention is actually followed.
	* As a forward-looking concern, it seems that developers have sometimes added new configuration options and state to places that don't really make sense according to the rationale of the original decomposition (e.g., we probably don't want to have a lot of state that is only available via end-to-end requests, given that the API structure is meant to push users away from end-to-end compiles).
	As a result of all of the above, I've been planning a large refactor with the following big-picture goals:
	* Eliminate `BackEndCompileRequest`
	* Move all relevant state/options from the back-end request to the end-to-end request, since that is the only place they could be set anyway.
	* Introduce a transient "context" type to be used for the duration of code generation that serves the main functions that back-end requests really served in the codebase
	* Make `EndToEndCompileRequest` be a subclass of `FrontEndCompileRequest`
	* Consider adding a transient "context" type for front-end compiles that can be used in `import`-like cases rather than needing a full front-end request object. If this works, then eliminate `FrontEndCompileRequest` and be back to a world with just a single `CompileRequest` type
	* Move all compiler configuration options to a distinct type (named something like `CompilerConfig` or `CompilerOptions` or whatever) which stores settings as key-value pairs, and has a notion of "inheritance" such that one configuration can extend or build on top of another. Make all the relevant types use this catch-all structure instead of redundantly storing flags in many places.
	This change deals with the first of those bullets: removal of `BackEndCompileRequest`. The addition of the `CodeGenContext` type is perhaps an unnecessary additional step, but making that change helps clean up a bunch of the code related to per-target code generation, so I think it is the right choice.
	Co-authored-by: Yong He <yonghe@outlook.com>
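	The last goal above (options stored as key-value pairs, with one configuration able to build on another) could look roughly like the following. This is a minimal C++ sketch under stated assumptions, not the design Slang adopted: the class shape, the `set`/`getOr` methods, and the option keys used in the demo are all hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical sketch of a key-value option set with "inheritance".
// Nothing here mirrors Slang's real types; it only illustrates the idea.
class CompilerOptions
{
public:
    // `parent` is the configuration this one extends (may be null).
    // The parent must outlive this object.
    explicit CompilerOptions(const CompilerOptions* parent = nullptr)
        : m_parent(parent)
    {
    }

    // Set (or override) a value at this level only.
    void set(const std::string& key, const std::string& value)
    {
        m_values[key] = value;
    }

    // Look up a key here first, then walk up the parent chain.
    // Returns `fallback` if no level defines the key.
    std::string getOr(const std::string& key, const std::string& fallback) const
    {
        for (const CompilerOptions* level = this; level; level = level->m_parent)
        {
            auto it = level->m_values.find(key);
            if (it != level->m_values.end())
                return it->second;
        }
        return fallback;
    }

private:
    const CompilerOptions* m_parent;
    std::map<std::string, std::string> m_values;
};

// Usage: per-target options extend session-wide options.
void demoOptionInheritance()
{
    CompilerOptions session;
    session.set("optimization", "1");       // hypothetical key
    session.set("matrix-layout", "row");    // hypothetical key

    CompilerOptions target(&session);
    target.set("optimization", "0");        // override for this target only

    assert(target.getOr("optimization", "") == "0");
    assert(target.getOr("matrix-layout", "") == "row");
}
```

	With a single structure like this, a setting lives in one place and each level either overrides it or inherits it, which avoids the manual state copying between request objects described above.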
*	Improved SCCP, inlining and resource specialization passes, legalize ↵	Yong He	2022-02-25
\| \| \| \|	`ImageSubscript` for GLSL (#2146)
*	Cleanup refactoring work around the IR builder (#2061)	Theresa Foley	2021-12-17
	* Cleanup refactoring work around the IR builder
	We have some long-term goals for the IR that require a more centralized and disciplined set of rules for how IR instructions get created/emitted. I had been working on trying to set things up so that all IR instruction creation goes through a single bottleneck point, but the non-trivial work in that branch was getting drowned out by the sheer volume of cleanup and refactoring changes. This change tries to pull together several of the more important cleanups.
	The big pieces are:
	* `IRBuilder` and `SharedIRBuilder` now protect their data members and rely on users to initialize them more directly via a constructor or an `init()` method. This change affects a bunch of sites where `IRBuilder`s were created. I changed use sites to use the constructors whenever possible, and to use `init()` in cases where we had longer-lived builders that needed to be initialized multiple times.
	* The insertion location for the `IRBuilder` now uses an encapsulated type called `IRInsertLoc`. This new type can replace what used to be just two `IRInst` fields in the builder, and also covers some new functionality (if we ever want to take advantage of it). Very little client code cares about this change, but it is still a nice cleanup in terms of making things more explicit.
	* The creation of an `IRModule` has been moved out of `IRBuilder`, because in practice an `IRBuilder` always wants to be associated with a pre-existing `IRModule` at creation time (via its `SharedIRBuilder`). There is now an `IRModule::create()` operation instead. This required changing the sequencing at many `IRModule` creation sites, since most had been contriving to make an `IRBuilder` first. There were also several cleanups because code had been carelessly using non-reference-counted pointers for `IRModule`s in ways that broke now that `IRModule::create()` always returns a `RefPtr`.
	* The core operations to actually allocate memory for IR instructions were moved into `IRModule` (since they interact with the memory pool that the module owns). These were called `createEmptyInst()` but have been renamed to `_allocateInst()`. In principle it seems like these should only need to be called by the `IRBuilder`, but in practice they are also needed by the IR deserialization logic.
	* A few core operations for emitting IR instructions that were associated with `IRBuilder` were moved to actually be methods on `IRBuilder`. First is `_findOrEmitConstant` which is the primary bottleneck for creating simple scalar constant values. Another is `_createInst` (formerly part of the templated `createInstImpl` along with `createInstWithSizeImpl`) which is the main bottleneck for allocation and initialization of any instruction other than a constant (well, the `IRModuleInst` is the other exception...). Finally, there is also `_maybeSetSourceLoc()`, which is obvious to scope inside the `IRBuilder` once it is protecting the source-location info.
	Notes:
	* The `minSizeInBytes` parameter to `_createInst()` might not actually be needed at all. At this point any `IRInst` subtypes that need data allocated for things other than their operands already get created manually via `_allocateInst` or `_findOrEmitConstant`, so I think we could remove that part. I will handle that in a subsequent cleanup if it turns out to be the case.
	* There is one IR pass (`slang-ir-string-hash.cpp`) that is using manual `_allocateInst()` instead of going through an `IRBuilder`. It could be easily cleaned up to not do so (and I will probably make that change down the line), but for now I wanted to avoid doing anything that wasn't close to pure refactoring if I could.
	* At this point in our design an `IRBuilder` is a very lightweight thing - it basically just owns the insertion location plus a source location to write into instructions. A lot of our code currently treats `IRBuilder`s like they are expensive and/or need to be re-used (which leads to them being used in more mutable/stateful ways). It is quite likely that as we clean up other aspects of the implementation of IR creation/emission we can make `IRBuilder` use feel more lightweight in ways that can streamline and simplify code.
	* The next step for this work is to identify the different paths that eventually lead to `_createInst()` being called, and unify them at a single bottleneck operation that can own the decisions around when to create an instruction vs. when to re-use an existing one (rather than those decisions being baked into the various `IRBuilder` subroutines that create instructions of the various subtypes).
	* fixup: gcc/clang C++ spec details
*	Fix a few issues around opaque types as outputs (#1918)	Theresa Foley	2021-08-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Fix a few issues around opaque types as outputs Slang and HLSL support opaque types (textures, buffers, samplers, etc.) as members of `struct`s, mutable local variables, function results, and `out`/`inout` parameters. GLSL and SPIR-V do not. In order to translate Slang code over to GLSL/SPIR-V we use a variety of passes that seek to eliminate all of the above use cases and produce code that only uses opaque types in the limited ways that GLSL/SPIR-V allow. This change relates to the passes that deal with function results and `out`/`inout` parameters. There are two basic changes here: 1. The `specializeResourceOutputs` pass was only dealing with resource (texture/buffer) types. This change updates it to process sampler types as well. 2. The sequencing of the passes made it possible that an opaque-typed local variable might be left around after `specializeResourceOutputs`, which would mean the code is still invalid for GLSL/SPIR-V. This change adds an additional SSA-formation pass which would eliminate any opaque-type local variables whose lifetimes were made simple enough by the optimizations. Together these changes fix a problem-case user shader that was failing to compile for Vulkan. * Update slang-emit.cpp Fix typo 'reuslt' * Update slang-emit.cpp Comment change to re-trigger CI build. Co-authored-by: jsmall-nvidia <jsmall@nvidia.com>
*	Work to mitigate SPIR-V bloat (#1914)	Theresa Foley	2021-07-21
* Work to mitigate SPIR-V bloat

SPIR-V is not an especially compact format, but some patterns in how Slang generates code and then runs it through `spirv-opt` lead to many redundant field-by-field copy operations being emitted. This change attempts to address some of the resulting bloat from the Slang side of things.

Note: experimentation shows that the bloat is less pronounced when running either no SPIR-V optimizations or full SPIR-V optimizations, so it is also likely that the bloat should be addressed by changing which `spirv-opt` passes the Slang compiler runs in default (`-O1`) builds. Such changes should come as a distinct pull request.

This change primarily does two things:

First, the code generation strategy for passing arguments to `out` and `inout` parameters has been changed. In the past, the compiler would always copy the argument value into a temporary, then pass the address of the temporary, and then write back the value after the call. The new code generation strategy attempts to identify when an argument value already has a simple address in memory and passes that address directly when possible. This eliminates many copy operations that occur before/after calls to functions with `out`/`inout` parameters.

Second, we introduce an IR optimization pass that detects call sites where the entire contents of a buffer (usually a constant buffer) is being passed to a callee function, such that many bytes are loaded and then passed even if only very few are used in the callee. The pass moves the load operations from the caller to a specialized version of the callee where possible (e.g., when the constant buffer in question is a global shader parameter). Doing this eliminates another major category of copies.

Notes:

* The IR lowering logic is complicated by the fact that several kinds of l-values (values that are usable as the destination of assignment, or for `out`/`inout` arguments) are not actually addressable. An easy example is a non-contiguous swizzle like `v.xwz` on a `float4`, where the value occupies 12 bytes, but not 12 consecutive bytes with a single address. There are many more corner cases like that and the IR lowering pass carries a lot of complexity to deal with them. A more systematic overhaul is due some time soon.

* The IR representation of `out` and `inout` parameters deserves some careful scrutiny when making these kinds of changes. The official semantics of `inout` in HLSL has been "copy in copy out" (and `out` is just "copy out") which is observably different from any solution that passes in the address of an l-value directly. By making this change we are saying that Slang's semantics are not precisely those of legacy HLSL, and that our semantics for `inout` parameters are closer to those of `inout` in Swift or of a mutable borrow in Rust. In the Swift case the implementation can freely pass the underlying storage of an l-value or the address of a temporary, and valid programs may not observe the difference. It is thus illegal to observe the value in a storage location while a mutation to that location is "in flight." All of this is way more detailed and technical than 99% of Slang users will ever care about, but importantly it gives us semantic cover to eliminate these copies in the IR, and also to emit output C++ code that implements `out` and `inout` as by-reference parameter passing.

* There was an existing generic pass for specializing functions based on call sites that uses a "template method" style of pattern to customize its behavior. That pass needed to be generalized to handle this use case because it had previously operated on the assumption that the "desire" to specialize a callee function must be driven by the parameter declarations of that function, and not on the argument values passed in. The code has been slightly refactored to allow the policy for specialization to consider both parameters and arguments.

* Unsurprisingly, a bunch of the GLSL (and thus SPIR-V) generated has changed with this work, so several baseline `.slang.glsl` files needed to be updated.

* This change is incomplete in that it does not address broader cases of buffer loads, including both partial loads from constant buffers (just loading one field, but a field that uses a "large" structure type), and loads from multi-element buffers (a load from a structured buffer where the element type is "large"). The main question in each of those cases is how to define how "large" a structure needs to be before we decide to try and sink loads into callee functions like this. In the worst case, sinking loads in this way may actually create more memory traffic (because the same values get loaded in multiple callee functions).

* fixup: run premake

* fixup: typo
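The observable difference between "copy in copy out" and passing the address of an l-value directly can be sketched in C++ as follows. This is only an illustration under stated assumptions: `g_counter`, `bumpAndPeek`, `callCopyInCopyOut`, and `callByReference` are hypothetical names, and the code models the two lowering strategies by hand rather than showing what the Slang compiler emits.

```cpp
#include <cassert>

// Hypothetical global that the callee can also see directly.
int g_counter = 0;

// Hypothetical callee with an `inout`-style parameter. It mutates the
// parameter and then reads the global while the mutation is "in flight".
int bumpAndPeek(int& param)
{
    param += 1;
    return g_counter;
}

// Legacy HLSL-style lowering: copy in, call, copy out.
int callCopyInCopyOut()
{
    g_counter = 10;
    int tmp = g_counter;         // copy in
    int seen = bumpAndPeek(tmp); // callee only touches the temporary
    assert(g_counter == 10);     // global is untouched until the write-back
    g_counter = tmp;             // copy out
    return seen;
}

// Direct lowering: pass the address of the l-value itself.
int callByReference()
{
    g_counter = 10;
    return bumpAndPeek(g_counter); // callee mutates the global in place
}
```

Both functions leave `g_counter` at 11, but the callee sees the old value in the first case and the updated value in the second. A program that depends on that difference is exactly the kind that the borrow-style interpretation of `inout` treats as invalid.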
*	Add support for returning structures that contain opaque types (#1835)	Tim Foley	2021-05-04
Introduction
============

Several of our target platforms share a concept of "opaque" types, including resources (`Texture2D`) and samplers (`SamplerState`), which are restricted in how they can be used.

GLSL and SPIR-V place very severe restrictions, in that opaque types cannot be used for the type of:

* (mutable) local variables
* (mutable) global variables
* structure fields
* function result/return
* `out` or `inout` parameters

The HLSL language allows all of these cases, but with the practical caveat that the compiler front-end must be able to statically analyze how opaque types have been used and "optimize away" all of the above cases. For example, it is legal to have a local variable of an opaque type, but at any point where the variable gets used it must be statically known which top-level shader parameter the variable refers to.

Existing Work
=============

In the Slang compiler we need to implement our own passes to detect these "illegal" uses of opaque types and legalize them. The work is basically broken into two distinct steps:

* The existing `legalizeResourceTypes()` pass detects illegal types (e.g., a `struct` that has a field of type `Texture2D`) and replaces them with legal types, sometimes by splitting apart declarations (e.g., a parameter using such a `struct` type gets split into multiple parameters). At a high level, we can think of this as "exposing" opaque types so that they are not hidden inside of nested structures.

* Next, the `specializeResourceOutputs()` pass detects calls to functions that output opaque types (whether by the function return value or `out` / `inout` parameters). The pass analyzes the body of such functions, and tries to isolate the logic that determines their resource-type outputs and hoist that logic into call sites (so that the opaque-type outputs can then be eliminated).

This Change
===========

One important missing case was that the type legalization step was incapable of legalizing types that appear in the result/return type of functions. The existing logic would simply diagnose an internal/unimplemented error if it encountered a non-simple type in the return position.

At a high level, supporting this case seems simple enough. Given a function signature like:

```
struct Things { int a; Texture2D b; }
Things myFunc(int x) { ... }
```

we want to split the result type into an "ordinary" result type and then `out` parameters for any opaque-type fields:

```
struct Things_Legal { int a; }
Things_Legal myFunc(int x, out Texture2D result_b) { ... };
```

Similarly, at a call site to a function like this:

```
Things t = myFunc(99);
```

we split the function result into ordinary and opaque-type parts, and pass the latter as `out` parameters:

```
Texture2D t_b;
Things_Legal t = myFunc(99, /*out*/ t_b);
```

The main place where things get tricky is when dealing with `return` sites within the body of a function that needs legalization:

```
Things myFunc(int x)
{
    ...
    Things things = ...;
    ...
    return things;
}
```

In theory the answer is simple: a `return` translates into writes to the `out` parameters for any opaque-type data, followed by a return of the ordinary-type part:

```
Things_Legal myFunc(int x, out Texture2D result_b)
{
    ...
    Things_Legal things = ...;
    Texture2D things_b = ...;
    ...
    result_b = things_b;
    return things;
}
```

The sticking point here is that this step requires tracking data between the legalization of the parameter list for `myFunc` and legalization of the `return`s in its body, so that we can identify the `result_b` parameter to be able to write to it. The existing type legalization pass was not built with the idea that such communication is commonly needed; it assumes that each instruction can be legalized in isolation, so long as dependencies are respected.

This change adds logic such that the `legalizeFunc()` step sets up a data structure that is used to represent information about how a function (and its parameter list) got legalized, so that the logic for a `return` can make use of that legalized information. Right now the information we track consists of just the list of parameters that were introduced to represent a return/result type.

Testing
=======

In order to confirm what features do/don't work, I added a set of tests that cover a cross-product of opaque type use cases:

* The opaque type can be used in the function result type, an `out` parameter, or an `inout` parameter
* The opaque type can be used "directly" or nested inside a `struct`.

These tests are helpful to make sure we handle the most important cases, but it is worth noting that the coverage is still lacking in that we do not sufficiently test all the options for what the function body might do. An opaque-type function result could be derived from many different sources:

* It could be a global shader parameter
* It could be an `in` or `inout` parameter of the function itself
* It could be wrapped up in one or more structure types
* It could be wrapped up in one or more array types (such that the output of specialization needs to pass around array indices)
* It could involve use of the type as a local variable (including passing it into other functions with result/`out`/`inout` outputs of opaque types)

This change makes it so that we can handle the simplest cases involving result/return types with a wrapper `struct`, and adds test cases that confirm we handle several other cases for `out` and `inout` parameters. Gaining confidence that we cover all the cases that arise in practical shaders will require more work over following changes.
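The bookkeeping between signature legalization and `return` legalization can be modeled with a small C++ toy. It is a sketch under stated assumptions, not the compiler's implementation: `Field`, `LegalFuncInfo`, `legalizeSignature`, and `legalizeReturn` are hypothetical names, and statements are represented as plain strings instead of IR instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one field of a function's result struct.
struct Field
{
    std::string name;
    bool isOpaque; // true for texture/sampler-like fields
};

// Hypothetical record of how a function was legalized. As in the text,
// it tracks only the `out` parameters introduced for the result type.
struct LegalFuncInfo
{
    std::vector<std::string> resultParams;
};

// Signature step: every opaque result field becomes an `out` parameter.
LegalFuncInfo legalizeSignature(const std::vector<Field>& resultFields)
{
    LegalFuncInfo info;
    for (const Field& f : resultFields)
        if (f.isOpaque)
            info.resultParams.push_back("result_" + f.name);
    return info;
}

// Return step: uses the recorded parameters to turn `return value;` into
// writes to the `out` parameters followed by a return of the ordinary part.
std::vector<std::string> legalizeReturn(
    const LegalFuncInfo& info,
    const std::vector<Field>& resultFields,
    const std::string& value)
{
    std::vector<std::string> stmts;
    std::size_t next = 0;
    for (const Field& f : resultFields)
    {
        if (!f.isOpaque)
            continue;
        assert(next < info.resultParams.size());
        stmts.push_back(info.resultParams[next++] + " = " + value + "_" + f.name + ";");
    }
    stmts.push_back("return " + value + ";");
    return stmts;
}
```

For the `Things` example, with fields `a` (ordinary) and `b` (opaque) and a returned value named `things`, the toy produces the two statements `result_b = things_b;` and `return things;`, matching the hand-written legalized body in the message.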
*	Add an accessor for IRInst opcode (#1707)	Tim Foley	2021-02-16
* Add an accessor for IRInst opcode

The main change renames `IRInst::op` to `IRInst::m_op` and adds an accessor `IRInst::getOp()` to read it. The rest of the changes just update use sites to `getOp` (or to `m_op` in the limited cases where we write to it).

This work is in anticipation of a future change that might need to store an extra bit in the same field as the opcode. It seemed better to do this massive refactoring as a separate PR.

* fixup
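A minimal C++ sketch of why routing reads through an accessor helps if an extra bit later shares the opcode field. Only `m_op` and `getOp()` come from the message; the opcode values, the position of the flag bit, and `setOp`/`getFlag`/`setFlag` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for a few opcodes.
enum IROp : uint32_t
{
    kIROp_Nop = 0,
    kIROp_Add = 1,
    kIROp_Call = 2,
};

// Assumption: the top bit of the field is reserved for the extra flag.
constexpr uint32_t kFlagBit = 1u << 31;

struct IRInst
{
    // Opcode and flag share one field; callers read through the accessors.
    uint32_t m_op = kIROp_Nop;

    // Masks off the flag so use sites never see it.
    IROp getOp() const { return IROp(m_op & ~kFlagBit); }

    // Writes the opcode while preserving the flag bit.
    void setOp(IROp op)
    {
        assert((op & kFlagBit) == 0);
        m_op = (m_op & kFlagBit) | op;
    }

    bool getFlag() const { return (m_op & kFlagBit) != 0; }

    void setFlag(bool value)
    {
        m_op = value ? (m_op | kFlagBit) : (m_op & ~kFlagBit);
    }
};
```

With this shape, code that compares `inst.getOp()` against an opcode keeps working when the flag is set, whereas code that read the raw field directly would start seeing unexpected values.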
*	Add a pass to support resource return values (#1537)	Tim Foley	2020-09-10
A long-standing problem for the Slang implementation has been that some targets (notably GLSL/SPIR-V) do not support treating resources (textures, buffers, samplers, etc.) as first-class types. Resource types on such platforms are restricted so that they may not be used as the type of:

1. fields of aggregate types (`struct`s)
2. local variables
3. function results or `out`/`inout` parameters

Issue (1) is handled by our "type legalization" pass today, by splitting aggregates that contain resources into separate fields/variables/parameters.

Issue (2) is worked around by putting code into SSA form and promoting local variables to SSA temporaries when possible; the net result is that many local variables of texture type are eliminated (that pass is not perfect, though, and it is possible for users to get errors when it doesn't fully clean up local variables of texture type).

Issue (3) is a much more complicated matter, and it is what this change is concerned with.

A typical solution to issue (3) is to simply inline all of the code in a program, at which point function results and `out`/`inout` parameters will no longer exist to cause problems. We reject such solutions for two reasons. First, there are limitations on control-flow structure in HLSL/GLSL/SPIR-V that mean they cannot express certain programs after inlining has been performed. Second, and more importantly, the philosophy of the Slang compiler is to perform as little duplication of code as possible, so that we do not accidentally contribute to binary size bloat.

Instead, this change tackles the problem of functions that output resource types by adding a new specialization pass. The pass detects functions that ought to be specialized (because they have resource-type outputs), and inspects their bodies to see if the values they output have a predictable structure that can be replicated outside of the function body. The same logic that inspects the function body also rewrites (a copy of) the function to not have the offending outputs. Finally, all the call sites to a function that is rewritten in this way also get rewritten so that instead of using output values from the function itself, they reproduce the expected output value(s) in their own code.

The pass as presented here is intentionally limited in the scope of what it can optimize away (and the test case only touches on that specific functionality). The goal is to get a basic version of this pass in place and evaluated, and then to expand on its functionality incrementally over time.
*	Run array specialization in a sperate pass. (#1449)	Yong He	2020-07-23
* Run array specialization in a sperate pass.

* rename specializeFunctionCall->specializeFunctionCalls

Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
*	Disable specializing function calls if they have a struct param, that ↵	jsmall-nvidia	2020-07-17
contains an array (#1448)

* This code is disabled. It was part of the optimization `Specialize function calls involving array arguments. (#1389)` on GitHub. It is disabled here because it causes a problem when a struct that contains a structured buffer and an array is passed to a function. The call is specialized on the struct type, and so those types become parameters to the function. If the struct contains a structured buffer this is a problem on GLSL/VK based targets because currently structured buffers cannot be function parameters. The fix for now is to just disable this optimization.

* Fix typo in name of test expected values.
*	Specialize function calls involving array arguments. (#1389)	Yong He	2020-06-15
Fixes #890.

Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
*	Use slang- prefix on slang compiler and core source (#973)	jsmall-nvidia	2019-05-31
* Prefixing source files in source/slang with slang-

* Prefix source in source/slang with slang- prefix.

* Rename core source files with slang- prefix.

* Update project files.

* Fix problems from automatic merge.