slang.git - Making it easier to work with shaders

	Commit message (Collapse)	Author	Age
*	Use symbol alias instead of wrapper synthesis to implement link-time types. ↵	Yong He	2025-10-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(#8603) This change achieves link-time type resolution with a different mechanism. For `extern struct Foo : IFoo = FooImpl;`, instead of synthesizing a wrapper type `Foo` that has a `FooImpl inner` field and dispatches all interface method calls to `inner.method()`, this PR completely removes this synthesis step, and instead just lower such `extern`/`export` types as `IRSymbolAlias` instructions that is just a reference to the type being wrapped. Then we extend the linker logic to clone the referenced symbol instead of the SymbolAlias insts itself during linking. By doing so, we greatly simply the logic need to support link-time types, and achieves higher robustness by not having to deal with many AST synthesis scenarios. Closes #8554. --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
*	Relax restriction on using link-time types for shader parameters. (#8387)	Yong He	2025-09-05
\| \| \| \| \| \| \| \| \|	This change relaxes a previous restriction on link-time types and constants, so that we now allow them to be used to define shader parameters. Doing so will result in a parameter layout that is incomplete prior to linking. The PR added a test to call the reflection API on a fully linked program and ensure that we can report correct binding info.
*	Resolve 'extern' types during type layout generation if possible (#6450)	Ellie Hermaszewska	2025-02-28
\| \| \| \| \| \| \| \| \| \| \| \| \|	* Resolve 'extern' types during type layout generation if possible Closes https://github.com/shader-slang/slang/issues/5994 Closes https://github.com/shader-slang/slang/issues/6437 * format code --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com> Co-authored-by: Yong He <yonghe@outlook.com>
*	Fix CUDA reflection for acceleration structure handle size. (#6055)	Yong He	2025-01-10
\|
*	Fix Metal type layout reflection for nested parameter blocks. (#6042)	Yong He	2025-01-10
\|
*	Add datalayout for constant buffers. (#5608)	Yong He	2024-11-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add datalayout for constant buffers. * Fixes. * Fix test. * Fix glsl codegen. * Update spirv-specific doc. * Fix test. * Fix binding in the presense of specialization constants. * address comments. * Add a test for constant buffer layout.
*	format	Ellie Hermaszewska	2024-10-29
\| \| \| \| \| \| \|	* format * Minor test fixes * enable checking cpp format in ci
*	preparation for clang format (#5422)	Ellie Hermaszewska	2024-10-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Clang-format excludes * Add .clang-format * Don't clang-format in external * Missing includes and forward declarations * Replace wonky include-once macro name * neaten include naming * Add clang-format to formatting script * Add xargs and diff to required binaries * add clang-format to ci formatting check * Add max version check to formatting script * temporarily disable checking formatting for cpp files
*	Fix spirv debug info for pointer types. (#5319)	Yong He	2024-10-16
\| \| \| \| \|	* Fix spirv debug info for pointer types. * fix comment.
*	Support parameter block in metal shader objects. (#4671)	Yong He	2024-07-19
\| \| \| \| \| \| \| \| \| \| \| \| \|	* Support parameter block in metal shader objects. * Ingore parameter block tests on devices without tier2 argument buffer. * Fix warning. * Fix texture subscript test. --------- Co-authored-by: Yong He <yhe@nvidia.com>
*	Move the file public header files to `include` dir (#4636)	kaizhangNV	2024-07-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Move the file public header files to `include` dir Close the issue (#4635). Move the following headers files to a `include` dir located at root dir of slang repo: slang-com-helper.h -> include/slang-com-helper.h slang-com-ptr.h -> include/slang-com-ptr.h slang-gfx.h -> include/slang-gfx.h slang.h -> include/slang.h Change cmake/SlangTarget.cmake to add include path to every target, and change the source file to use "#include <slang.h>" to include the public headers. The source code update is by the script like follow: ``` fileNames_slang=$(grep -r "\".slang\.h\"" source/ -l) for fileName in "${fileNames_slang[@]}" do echo "$fileName" sed -i "s/\".slang\.h\"/\"slang\.h\"/" $fileName done ``` * Fix the test issues * Fix cpu test issues by adding include seach path * Update cmake to not add include path for every target Also change "#include <slang.h>" to "include "slang.h" " to make the coding style consistent with other slang code. * Change public include to private include for unit-test and slang-glslang
*	Implementing `tbuffer` layout(s) (#4436)	ArielG-NV	2024-06-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Implementing `tbuffer` layouts. 1. Add to layout options 'TextureBuffer' layouts. 2. Add on to existing logic a way to allocate appropriate registers for TextureBufferType (this was made to work with parameter block logic). 3. Added asserts so objects missing a layout will gracefully crash This means `tbuffer` now works for hlsl,glsl,metal targets, spirv has yet to implement logic for `TextureBufferType`. * disable metal tests and fix emitting code a bit fixing the emitting code means metal compilation emits a useful error (help point users/developers to #4435) * fix warning
*	Parameter layout and reflection for Metal bindings. (#4022)	Yong He	2024-04-24
\|
*	Support combined texture sampler when targeting HLSL. (#3963)	Yong He	2024-04-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Support combined texture sampler when targeting HLSL. * Fix glsl intrinsics. * Update source/slang/slang-ir-lower-combined-texture-sampler.cpp Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com> * Update source/slang/slang-ir-lower-combined-texture-sampler.cpp Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com> * Update source/slang/slang-ir-lower-combined-texture-sampler.cpp Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com> * Fix., * Enhance test. * Remove unused field. * Fix indentation --------- Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com>
*	[GFX] Fix d3d12 buffer view creation logic for StructuredBuffers. (#3954)	Yong He	2024-04-15
\|
*	Implement 8.14-8.19 of OpenGL-GLSL specification	ArielG-NV	2024-04-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The following PR implements 8.14-8.19 of the [OpenGL-GLSL specification](https://registry.khronos.org/OpenGL/specs/gl/GLSLangSpec.4.60.pdf). Fully implements all functions and built-in type's, resolves https://github.com/shader-slang/slang/issues/3692 for GLSL & SPRI-V targets. _Notes:_ Testing Tools: * Fragment shaders cannot test computational results. Only OpCodes are checked for proper emitting. Implementation Notes: * SubpassInput requires an unknown image format. * SubpassInput is disjoint from TextureType: __SubpassImpl (.slang) & SubpassInputType (Compiler) to reduce code generation required. * SubpassInput required an additional input layout modifier, input_attachment_index, this was added as a new parameter binding attribute. Since the following qualifiers can overlap with different resources (`layout(input_attachment_index = 0, binding = 0, set = 0)`) input_attachment_index is checked for overlapping resource bindings separately from other qualifiers with `LayoutResourceKind::InputAttachmentIndex`. * `GLSLInputAttachmentIndexLayoutModifier` was added to enforce function parameters only accepting `in` decorated variables. * `in` decorated variables needed to have emitting modified to allow directly emitting the variable into function calls if used as a parameter, normally Slang has a "global variable" shadow as a "global parameter" through a copy. This does not work and is solved using `GlobalVariableShadowingGlobalParameterDecoration` to build a relationship of "global variable" to "global parameter", we then resolve this relationship and replace "global variable" uses later in compile. * `AtomicCounterMemory` memory-constraint requires `OpCapability AtomicStorage`, `AtomicStorage` is invalid for Vulkan targets. glslang outputs for `barrier`, `memoryBarrier`, and `groupMemoryBarrier` `AtomicCounterMemory` as a memory constraint. This compiles as valid SPIR-V for Vulkan since `OpCapability AtomicStorage` is not declared. This behavior of glslang is undefined as per [3.31.Capability of the SPIR-V specification](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_capability). We will omit `AtomicCounterMemory` from our barrier calls.
*	Implement glsl atomic's [non image or memory scope] with optional ↵	ArielG-NV	2024-03-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	extension(s); resolves #3587 for GLSL & SPIR-V targets (#3755) The following commit implements atomic operations & types associated with OpenGL 4.6, GL_EXT_vulkan_glsl_relaxed, GLSL_EXT_shader_atomic_float, GLSL_EXT_shader_atomic_float2, for GLSL & SPIR-V targets. Fully implements all functions, and built-in type's, resolves https://github.com/shader-slang/slang/issues/3560 for GLSL & SPRI-V targets. [Atomic extensions for GLSL can be found here](https://github.com/KhronosGroup/GLSL/tree/main) Notes of worth: * atomic_uint is well defined in GLSL->OpenGL, although was removed in GLSL->VK unless a compiler extension is supported (GL_EXT_vulkan_glsl_relaxed). This support entails transforming all atomic_uint operations and references into a storage buffer. SPIR-V has AtomicCounter+AtomicStorage (atomic_uint parallel) but does not implement these capabilities for SPIR-V->VK in any scenario. Due to the case we transform atomic_uint ourselves (GLSL_Syntax->Slang_IR) to accommodate transforming atomic_uint into valid syntax. * GLSL_EXT_shader_atomic_float2 (all float16_t & some float/double operations) support is minimal and worth watching out for if enabling the tests.
*	Link-time specialization fixes. (#3734)	Yong He	2024-03-11
\| \| \| \| \|	* Fix method synthesis logic for static differentiable methods. * Support link-time constants in thread group size reflection.
*	Refactor compiler option representations. (#3598)	Yong He	2024-02-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Refactor compiler option representation. * Fix binary compatibility. * Add a test for specifying compiler options at link time. * Fix binary compatibility. * Fix binary compatibility. * Fix backward compatibility on matrix layout. * Fix. * Fix. * Fix. * Fix gfx. * Fix gfx. * Fix dynamic dispatch. * Polish.
*	Parameter binding and gfx fixes. (#3302)	Yong He	2023-11-01
\| \| \| \| \| \| \| \| \| \| \|	* Parameter binding and gfx fixes. * Add diagnostics on entry point parameters. * Fix. --------- Co-authored-by: Yong He <yhe@nvidia.com>
*	Type layouts for structured buffers with counters (#3269)	Ellie Hermaszewska	2023-10-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* More tests for append structured buffer * Append and Consume structured buffer tests for DX12 * neaten * test wobble * Add counter layout information to append/consume structured buffers * add getRWStructuredBufferType * Correct definition of get size for append/consume structured buffers * tweak append structured buffer test * Allow initializing counter buffer in render test * vulkan test for consume structured buffer * Handle null counterVarLayout in getExplicitCounterBindingRangeOffset * remove dead code * Implement atomic counter increment/decrement for spirv * explicit spirv test * Add missing check on result * Hold on to counter resources --------- Co-authored-by: Yong He <yonghe@outlook.com>
*	Compile append and consume structured buffers to glsl. (#3142)	Yong He	2023-08-21
\| \| \| \| \| \| \| \| \| \| \|	* Compile append and consume structured buffers to glsl. * Fix. * Update CI config. --------- Co-authored-by: Yong He <yhe@nvidia.com>
*	Use ankerl/unordered_dense as a hashmap implementation (#3036)	Ellie Hermaszewska	2023-08-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Correct namespace for getClockFrequency * missing const * Add missing assignment operator * Remove unused variables * Return correct modified variable * Use stable hash code for file system identity * terse static_assert * Structured binding for map iteration * Make (==) and getHashCode const on many structs * Add ConstIterator for LinkedList * Replace uses of ItemProxy::getValue with Dictionary::at * Extract list of loads from gradientsMap before updating it * Const correctness in type layout * Add unordered_dense hashmap submodule * Use wyhash or getHashCode in slang-hash.h * refactor slang-hash.h * Use ankerl/unordered_dense as a hashmap implementation Notable changes: - The subscript operator returns a reference directly to the value, rather than a lazy ItemProxy (pair of dict pointer and key) slang-profile time (95% over 10 runs): - Before: 6.3913906 (±0.0746) - After: 5.9276123 (±0.0964) * 64 bit hash for strings So they have the same hash as char buffers with the same contents * Narrowing warnings for gcc to match msvc * revert back to c++17 * Correct c++ version for msvc * Use path to unordered_dense which keeps tests happy * Do not assign to and read from map in same expression * Remove redundant map operations in primal-hoist * Split out stable hash functions into slang-stable-hash.h * 64 bit hash by default * regenerate vs projects * Correct return type from HashSetBase::getCount() * correct width for call to Dictionary::reserve * Use stable hash for obfuscated module ids * Signed int for reserve * clearer variable naming * Parameterize Dictionary on hash and equality functors * Allow heterogenous lookup for Dictionary * missing const * Use set over operator[] in some places * Remove unused function * s/at/getValue
*	Redesign `DeclRef` and systematic `Val` deduplication (#3049)	Yong He	2023-08-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Redesign DeclRef + Deduplicate Val. * Update project files * Fix warning. * Fix. * Fix. * Remove `Val::_equalsImplOverride`. * Rmove `Val::_getHashCodeOverride`. * Remove `semanticVisitor` param from `resolve`. * Cleanups. --------- Co-authored-by: Yong He <yhe@nvidia.com>
*	Better handling of bindings with multiple resource kind "aliases" for GLSL ↵	jsmall-nvidia	2023-07-21
\| \| \| \| \| \| \| \| \|	emit (#3009) * A more way robust way to handle resource consumption might use multiple `kind`s on GLSL emit. * Improve method naming and some comments. * Small consistency fix.
*	Support for vk-shift-* without explicit bindings (#3000)	jsmall-nvidia	2023-07-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Improvements to HLSLToVulkanLayoutOptions. * WIP vk-shift-* with HLSL like binding. Detecting clashes. * Shift example seems to be working correctly. One oddness is that "used" data is now reflected, as we only enable for D3D shader resource types. Now we use those with inferred VK mode they appear. * Implicit seems to work. * Disable inference with Sampler/CombinedTextureSampler. I guess? we could just use the HLSL texture register binding to infer. * Report overlapping ranges in diagnostic. The hlsl-to-vulkan-shift-diagnostic result might be surprising but it is correct, because u is automatically laid out so consumes DescriptorSlot 0, but that's already consumed by c. * First attempt at array layout with infer on Vulkan. * Fix the vulkan shift output. * Array example.
*	Pointer layout support (#2930)	jsmall-nvidia	2023-06-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* WIP looking at reflection with pointers. * Added GetPointerLayout. * Initial test via reflection with layout of ptr type. * WIP handles ptrs to types that have layout that hasn't been completed. * Move tests to ptr. * WIP try to take into account lowering correctly between AggTypeDecl and Type, but doesn't quite work. * WIP a different path to handling recursive lowering problem with Ptr. * Fix issues with reflection output. * Small tidy. * Fix for infinite recursion issue. * Lower IRPointerTypeLayout * Working with generics. Has a hack to work around Layout around Ptr in IR. The reflection around the generic - the name isn't much use, it should probably have the generic parameters, but that would require getName to do something more sophisticated. * Fix issue around calling finishOuterGenerics to early. * Remove feature/ptr test. * Fix type legalization being an infinite loop with Ptr self referencing. * Disable the pointer self reference test because produces an infintie loop on emit. * Fixed comment based on review. * Fix for issue with emit and pointers causing infinite recursion.
*	Various dxc/fxc compatibility fixes. (#2863)	Yong He	2023-05-02
\| \| \| \| \| \| \| \| \| \| \| \| \|	* Various dxc/fxc compatibility fixes. * Cleanup. * Fix test cases. * Fix comments. --------- Co-authored-by: Yong He <yhe@nvidia.com>
*	Use GLSL scalar layout for constant buffers. (#2147)	Yong He	2022-02-28
\|
*	Improved SCCP, inlining and resource specialization passes, legalize ↵	Yong He	2022-02-25
\| \| \| \|	`ImageSubscript` for GLSL (#2146)
*	gfx: support shader record overwrite and fix QueryPool. (#2123)	Yong He	2022-02-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Various fixes to gfx. * Fix. * Fixes. * Fix. * gfx: support root parameter via user-defined attribute. * Fix. * Fix. * Skip d3d12 tests on win x86. * Fixes. * gfx: support shader record overwrite. * Fix QueyPool implementation. * Rename to `getBindingRangeLeafVariable` Co-authored-by: Yong He <yhe@nvidia.com>
*	gfx: support d3d12 root parameters (#2122)	Yong He	2022-02-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Various fixes to gfx. * Fix. * Fixes. * Fix. * gfx: support root parameter via user-defined attribute. * Fix. * Fix. * Skip d3d12 tests on win x86. * Fixes. Co-authored-by: Yong He <yhe@nvidia.com> Co-authored-by: jsmall-nvidia <jsmall@nvidia.com>
*	Various fixes to gfx. (#2120)	Yong He	2022-02-09
\| \| \| \| \| \| \| \| \| \| \|	* Various fixes to gfx. * Fix. * Fixes. * Fix. Co-authored-by: Yong He <yhe@nvidia.com>
*	Improve comments around -Xarg and C++/CUDA layout (#1884)	jsmall-nvidia	2021-06-14
\| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * Alter comments around layout size/alignment to reflect nuance on C++/CUDA. * Fix some errors in -X documentation, and clarify some of the behavior. * Small doc improvements.
*	Fix cyclic reference in `ExtendedTypeLayout`. (#1868)	Yong He	2021-06-02
\|
*	Update gfx back-ends to handle static specialization (#1826)	Tim Foley	2021-04-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Update gfx back-ends to handle static specialization The main goal here is to make the D3D11, D3D12 and Vulkan back-ends support static specialization of interface types in the case where the data for the type won't "fit" in the pre-allocated space for existential values. This includes all cases where the concrete type being specialized to has resources/samplers/etc., as well as any cases where its ordinary/uniform data exceeds the space available. (Note that the CPU and CUDA targets don't need this work since they can (in theory) support arbitrary-size data in the fixed-size existential payload by using pointer indirection. Actually supporting indirection in those cases should be a distinct change) The Slang compiler already performs layout for programs that have this kind of data that doesn't "fit," and it lays them out using an idea of "pending" type layouts. Basically, a type that contains some amount of specialized interface-type fields will produce both a "primary" type layout that just covers the data for the unspecialized case, as well as "pending" type layout that describes the layout for all the extra data needed by specialization. When laying out a `ConstantBuffer<X>` or `ParameterBlocK<X>` ("CB" or "PB"), the front-end will try to place as much of that "pending" data into the layout of the buffer/block itself as is possible. That means that both CBs and PBs will be able to allocate trailing bytes for any ordinary data in the "pending" layout. PBs will be able to allocate any trailing resources/samplers into their layout, but for CBs they will spill out to be part of the pending layout for the buffer itself. In order for the back-ends to properly handle pending data, they need to either assume the exact layout rules used by the front-end and try to reproduce them (e.g., by iterating over binding ranges and sub-objects in the exact same order that front-end layout would enumerate them), or they need to respect the reflection information produced by the front-end. This change takes the latter approach, trying to make only minimal assumptions about the layout rules being used. This choice is motivated by wanting to decouple the `gfx` implementation from the compiler front-end, especially insofar as this work has made me question whether the current layout rules are the best ones possible. A common theme across all the implementations is to have a fixed-size type that can represent "binding offsets" for the chosen back-end. The offset type has fields that depend on the API-specific way bindings are indexed; e.g., for D3D11 it has offsets for CBV, SRV, UAV, and sampler bindings. This fixed-size offset type can be filled in based on Slang reflecton information, and then used to compute derived offsets with just a few add operations. The simple offset type for each API is then extended to produce an offset type that includes both the offsets for "primary" data and also the offsets for "pending" data. Most logic that traffics in offsets doesn't have to know about this more complicated representation. Making consistent use of these offsets required that I pretty much rewrite the logic that actually applies shader objects to the API state. Doing so might be lowering the efficiency of the system in the near term, but the increase in clarity was important for getting the work done, and it seems like it will also be important if/when we start trying to perform special-case optimizations around root and entry-point parameter setting. While there are many API-specific differences, we can identify a repeated pattern where many steps, whether applying parameters to the pipeline stage or constructing signatures / layouts, can be broken down into three main operations on `ShaderObject`s or their layouts: * `AsValue()` is the core operation, and is the one used for the `ExistentialValue` case most of the time. It ignores the ordinary data in the object, and instead processes all nested binding ranges (for resources/smaplers) and sub-objects. `AsConstantBuffer()` handles the `ConstntBuffer<X>` case, by dealing with the implicit buffer for ordinary data (if it is needed) and then delegates to the `AsValue()` case. * `AsParameterBlock()` handles the `ParameterBlock<X>` case, by allocating/preparing/etc. any descriptor tables/sets that would be required for the current object/layout and then delegating to `AsConstantBuffer()` to do the rest The idea is that by having the parameter block case delegate to the constant buffer case, which delegates to the value/existential case, we can streamline a lot of the logic so that it doesn't seem quite as full of special cases. Note: When preparing this pull request I spent a reasonable amount of time trying to clean up the D3D11 and Vulkan implementations, so they are probably the easiest to read and understand when it comes to the new code. Doing the cleanup work also helped to work out some weird corner case bugs/issues. In contrast, the D3D12 path hasn't had as much attention given to cleanliness and comments, so it really needs some attention down the line to get things into a state that is easier to understand. * fixup: remove debugging code spotted in review
*	Various fixes to make `model-viewer` example almost working. (#1801)	Yong He	2021-04-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Fixing `PseudoPtr` legalization and `gfx` lifetime issues. * Fixing `model-viewer` example. This change contains various fixes to bring `model-viewer` example to fully functional. These fixes include: 1. Add `spReflectionTypeLayout_getSubObjectRangeSpaceOffset` function to return the space index for a sub object referenced through a `ParameterBlock` binding. 2. Make sure `D3D12Device` specifies column major matrix order creating a Slang session. 3. Fix `platform::Window::close()` and `platform::Application::quit()`. 4. Fix memory leak during `model-viewer''s model loading. 5. Fix command buffer recording in `model-viewer`. With these changes, model viewer can now produce an image with a gray cube. The lighting is still incorrect becuase the `gfx` shader object implementation still does not handle "pending layout" resulting from global existential parameters. * Fix d3d12 root signature creation. * Use row-major matrix layout in model-viewer
*	Add shader object parameter binding to renderer_test. (#1622)	Yong He	2020-12-03
\| \| \| \| \| \| \| \| \|	* Add shader object parameter binding to renderer_test. * remove multiple-definitions.hlsl * Fix cuda implementation. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
*	Unify handling of static and dynamic dispatch for interfaces (#1612)	Tim Foley	2020-11-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Overview ======== Prior to this change, we had two different code generation strategies for interface/existential types in Slang, that didn't always play nicely together: * The "legacy" static specialization approach could handle plugging in an arbitrary concrete type for an existential type parameter (including types with resources, etc.), but wouldn't work well with things like a `StructuredBuffer<>` of an interface type, and requires somewhat counter-intuitive layout rules to make work. * The new dynamic dispatch approach produces simpler, more easily understood layouts by assuming that values of interface type can fit into a fixed number of bytes. The tradeoff there is that it cannot handle types that include resources (only POD types). The goal of this change is to make it so that the two strategies can co-exist. In particular, in cases where a shader is amenable to both static specialization and dynamic dispatch, the type layouts should agree. In order to make the type layouts agree, we: * Declare that all values of existential type reserve storage according to the dynamic-dispatch rules (so 16 bytes for the RTTI and witness-table information, plus whatever bytes are needed to story "any value" of a conforming type). * Then we modify the "legacy" layout rules so that if a value of concrete type can fit in the reserved "any value" space for a given interface, then it is laid out there exactly like the dynamic dispatch rules would do. Otherwise, we fall back to the previous legacy rules (since we don't need to agree with the dynamic-dispatch layout on types that can't be used with dynamic dispatch). Details ======= * Renamed `ExistentialBox` to `BoundInterfaceType` to better clarify how it relates to `BindExistentialsType` * Unconditionally apply the `lowerGenerics` pass during emit, since it is now responsible for aspects of the lowering of existential types when specialization is used. * Made IR type layout take the target into account, so that the layout of resource types can vary by target (e.g., being POD on some targets, and invalid on others) * Cleaned up some issues around using global shader parameters as the "key" for their layout information in the global-scope layout (only comes up when there are global-scope `uniform` parameters) * Made there be a default any-value size (16) instead of making it be an error to leave out. This was the simplest option; we could try to go back to having an error, but we'd need to only issue it if we are sure a type/interface is being used with dynamic dispatch, since static dispatch doesn't have to obey the restrictions. * Changed lowering of existential types to tuples so that bound interfaces where the concrete type won't fit use a "pseudo-pointer" instead of an "any-value" to hold the payload * Changed IR type legalization to handle the "pseudo-pointer" case and apply layout information from an interface type over to the payload part when static specialization was used. * Changed some details of how witness tables were being lowered, so that we didn't have to create "proxy" witness tables for the constraints on associated types (just use the actual requirement entries we generate) * Changed witness tables so that they know the subtype doing the conforming * Added logic so that we don't generate pack/unpack logic and witness table wrapper functions for types that are incompatible with any-value/dynamic dispatch for a given interface. * Changed the core AST-level type layout logic to use the dynamic-dispatch layout in case things fit, and the legacy static specialization case when things don't (while also reserving space for the dynamic-dispatch fields) * Changed a bunch of test cases for static specialization to properly use the new layout (which introduces new buffers in some cases, and moves data around in others). Future Work =========== The experience of trying to reconcile our older way of handling interface-type specialization with our newer model (that supports dynamic dispatch) makes it clear that we really need to make similar changes to our handling of generic type parameters on entry points and at the global scope. A future change should make it so that a global type parameter is lowered with a type layout similar to a value parameter of interface type, including the RTTI and witness-table pieces, and just leaving out the "any value" piece. A similar translation strategy should apply to entry-point generic parameters (mirroring how we lower generic functions for dynamic dispatch already), and value specialization parameters. Co-authored-by: Yong He <yonghe@outlook.com>
*	Rework type layout for ExistentialSpecializedType (#1531)	Yong He	2020-09-03
\|
*	ASTNodes use MemoryArena (#1376)	jsmall-nvidia	2020-06-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add a ASTBuilder to a Module Only construct on valid ASTBuilder (was being called on nullptr on occassion) * Add nodes to ASTBuilder. * Compiles with RefPtr removed from AST node types. * Initialize all AST node pointer variables in headers to nullptr; * Initialize AST node variables as nullptr. Make ASTBuilder keep a ref on node types. Make SyntaxParseCallback returns a NodeBase * Don't release canonicalType on dtor (managed by ASTBuilder). * Give ASTBuilders a name and id, to help in debugging. For now destroy the session TypeCache, to stop it holding things released when the compile request destroys ASTBuilders. * Moved the TypeCheckingCache over to Linkage from Session. * NodeBase no longer derived from RefObject. * Only add/dtor nodes that need destruction. First pass compile on linux.
*	WIP: ASTBuilder (#1358)	jsmall-nvidia	2020-05-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Compiles. * Small tidy up around session/ASTBuilder. * Tests are now passing. * Fix Visual Studio project. * Fix using new X to use builder when protectedness of Ctor is not enough. Substitute->substitute * Add some missing ast nodes created outside of ASTBuilder. * Compile time check that ASTBuilder is making an AST type. * Moced findClasInfo and findSyntaxClass (essentially the same thing) to SharedASTBuilder from Session.
*	Remove static struct members from layout and reflection (#1310)	jsmall-nvidia	2020-04-08
\| \| \| \| \| \| \| \| \| \| \| \|	* * Added MemberFilterStyle - controls action of FilteredMemberList and FilteredMemberRefList * Splt out template implementations * Use more standard method names dofr FilteredMemberRefList * Added reflect-static.slang test * Added isNotEmpty/isEmpty to filtered lists * Added ability to index into filtered list (so not require building of array) * Default MemberFilterStyle to All. * Remove explicit MemberFilterStyle::All
*	Add attributes to enable dual-source blending on Vulkan (#1210)	Tim Foley	2020-02-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change adds support for the `[[vk::location(...)]]` and `[[vk::index(...)]]` attributes, which can be used together to mark up shader outputs for dual-source blending on Vulkan. HLSL/Slang code like the following: ```hlsl struct Output { [[vk::location(0)]] float4 a : SV_Target0; [[vk::location(0), vk::index(1)]] float4 b : SV_Target1; } [shader("fragment")] Output main(...) { ...} ``` can be used to set up dual-source blending on both D3D and Vulkan APIs. The output GLSL for the above will look something like: ```glsl layout(location = 0) out vec4 a; layout(location = 0, index = 1) out vec4 b; void main() { ... } ``` The more or less straightforward parts of this change were: * Added new `attribute_syntax` declarations to the stdlib, for `[[vk::location(...)]]` and `[[vk::index(...)]]` * Added new AST node types for the new attribute cases, sharing a base class so that argument checking can be shared * Added checks for the arguments to the new attributes in `slang-check-modifier.cpp` (eventually this kind of logic shouldn't be needed for new attributes) * Updated GLSL emit logic so that it treats the `index`/`space` parts of a variable layout as the `location`/`index` for varying parameters. * Updated GLSL legalization so that when it translates entry-point parameters into globals (and scalarizes structures) it handles both a binding index and space for the parameters. * Added a cross-compilation test case to verify that the basics of the feature work The remaining work is all in `slang-parameter-binding.cpp`. There is some work that isn't technically related to this change (and which could be reverted if it causes problems), around the detection and handling of fragment shader outputs with `SV_Target` semantics. The basic changes (which could be backed out and then merged separately) are: * Made the special-case `SV_Target` logic only trigger for fragment shaders (that is the only place where `SV_Target` should appear, but we weren't guarding against it) * Made the logic to reserve a `u<N>` register for `SV_Target<N>` only trigger for D3D Shader Model 5.0 and below (since it is not required for SM 5.1 and up). This could be a breaking change for some users, but that seems unlikely. * Fixed one test case that relied on the behavior of reserving `u0` for `SV_Target0` even though it was a SM6.0 test. * Also added more comments to the system-value handling logic. The more interesting changes come up starting in `processEntryPointVaryingParameterDecl()`. The basic issue is that we have so far only supported implicit layout for varying parameters on GLSL/Vulkan, but the `[[vk::location(...)]]` attribute is a form of explicit layout annotation. Rather than try to kludge something that only works in narrow cases, I instead opted to try to fix things more generally. In `processEntryPointVaryingParameterDecl()` we now check for the `location` and `index` attributes when we are on "Khronos" targets (Vulkan/OpenGL/GLSL) and immediately add them to the variable layout being constructed if they are found. There is nothing in this logic specific to fragment-shader outputs, so this feature now applies to any varying input/output on Khronos targets. Allowing explicit layouts creates the potential for mixing implicit and explicit layout. For example, consider: ```hlsl struct Output { float4 color : COLOR; [[vk::location(0)]] float3 normal : NORMAL; } ``` What `location` should `color` get? Should this code be an error? There are two cases where this conundrum can come up: when working with `struct` types used for varying parameters, and the entry-point parameter list itself. For the varying `struct` case we currently make an expedient choice. We handle fields with both implicit or explicit layotu with appropriate logic, but logic that doesn't account for the case of mixing the two. Then at the end of layout for the `struct` we issue an error if there was a mix of implicit and explicit layout (such that our results aren't likely to be valid). For the entry point varying parameter case, things were already using a `ScopeLayoutBuilder` type (that encapsulates some logic shared between entry-point and global parameters). The entry-point-specific bits were moved out into a `SimpleScopeLayoutBuilder` and it was updated so that rather than assuming all parameters use implicit layout it does a two-phase layout approach similar to what we use for the global scope: * First all parameters are enumerated to collect explicit bindings and mark certain ranges as "used" * Next the parameters are enumerated again and those without explicit bindings get allocated space using a "first fit" algorithm In principle we could extend the two-phase approach to apply to `struct` types as well, but that would be best saved for a future refactoring of some of this parameter binding logic, since I would like to exploit more of the opportunities for sharing code across the uniform/varying and struct/entry-point/global cases. By moving the point where entry point parameters get their offsets assigned, it was necessary to move around some of the logic that removes varying parameter usage (and other things that shouldn't "leak" out of an entry point) to a different point in the entry point layout process. While adding these various pieces does not quite enable us to support explicit bindings on entry point parameters (e.g., putting `uniform Texture2D t : register(t0)` in an entry point parameter list) or in `struct` types (e.g., explicit `packoffset` annotations on fields), it starts to provide some of the infrastructure that we'd need in order to support those cases.
*	Synthesizing CUDA tests (#1183)	jsmall-nvidia	2020-01-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* When using setUniform clamp the amount of data written to the buffer size. * CUDA implement StructuredBuffer/ByteAddressBuffer as pointer/count as is on CPU. Allow bounds check to zero index. Update docs. * Synthesize tests. * Fix bug in CUDA output. * Fixing more tests to run on CUDA. * Added BaseType for layout of Vector and Matrix - as they are held as int32_t vector array types. * Enable unbound array support on CUDA. * Added unsized array support for CUDA documentation.
*	Feature/string hash review (#1142)	jsmall-nvidia	2019-12-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* * Added ConstArrayView * Made StringSlicePool have styles * Remove point about strings not having terminating 0 (they do), and restriction around "" * spCalcStringHash -> spComputeStringHash * Small code improvements. Closer to coding conventions. * Fix small bug with Empty adding c string. * Fix typo in assert. * Fix ArrayView compiling issue on gcc/clang. * Remove tabs. * Improve comments around StringSlicePool. Simplify getting the added slices.
*	getStringHash on string literals (#1140)	jsmall-nvidia	2019-12-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* WIP getStringHash * Have a use. * Add slang-string-hash.h/.cpp * Use StringSlicePool for holding strings for StringHash. Add outputBuffer to string-literal-hash.slang so value can be tested. Ignore the GlobalHashedStringLiterals instruction on emit. * Add all the hashed string literals to ProgramLayout. * Add reflection support for hashed string literals to reflection test. * Fix string literal hash test. * Small fixes to pass test suite. * Fix issue in serialization where IRUse is not correctly initialized. * Fix problem initializing IRUse for string hash pass. Remove hack from slang-ir-specialize - specially handling if user is not null. * * Use shared builder when replacing getStringHash * Comments for functions in slang-ir-string-hash * Do not allow zero length string literals. Could be allowed, but doing so would require StringSlicePool to have a special case (or some other mechanism)
*	* Added getCStr(Name*) (#1121)	jsmall-nvidia	2019-11-13
\| \| \| \| \| \|	* Added the name to the EntryPointLayout so is always available * Made spReflectionEntryPoint_getName use name * Improved checking for entry point name in render-test * Improved COMPILE test type to allow failure and output of failure.
*	User IR-based layout for all IR steps (#1084)	Tim Foley	2019-10-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change builds on previous work that moves toward a more IR-based representation of layout. Those steps added some instructions for representing layout in the IR (initially just proxies for the AST layout objects), and an explicit lowering pass that could build a target-specific IR module that binds parameters and entry points to layout information. This change aims to complete that work, in the sense that the IR representation of layout is now self-contained and does not rely on having pointers back into the AST-level representation. Achieving this requires two main kinds of work: 1. Update any code that used layout information derived from the IR (most notably all the `slang-emit-` code) to use the new IR representation and its accessors. 2. Update any code that constructs* layouts using information derived from the IR to construct IR layouts instead. The biggest new infrastructure feature in this change is support for "attributes" in the IR (I'd welcome feedback on the naming). An attribute can either be thought of like key/value arguments that can be added to certain instructions to encode optional data, or alternatively like a decoration that is referenced as an operand instead of a child. The value of attributes over decorations is that they can affect the hash/identity of an instruction (which decorations can't), while the advantage of decorations is that they can easily be added/removed over the lifetime of an instruction (which attributes can't). We mostly use them here to represent operands that are logically optional. Once attributes are available, the encoding of layout information into the IR is mostly straightforward: * An `IRVarLayout` has a fixed operand for its type layout, and can accept a few different attributes * Zero or more `IRVarOffsetAttr`s that specify the offset of the variable for a given resource kind. These are equivalent to the `VarLayout::ResourceInfo`s at the AST level. * An optional `IRUserSemanticAttr` and `IRSystemValueSemanticAttr` to represent the (possibly derived) semantic of a varying input/output parameter. * An option `IRStageAttr` to represent the known stage for a parameter. * An `IREntryPointLayout` has a var layout for the entry point parameters (logically grouped in to a struct) and another var layout for the result parameter. * There is a small type hierarchy rooted at `IRTypeLayout` where each subtype can add fixed operands and attributes that are expected to appear. It also supports `IRTypeSizeAttr`s that serve a similar role to the `IRVarOffsetAttr`s. * Structure types maintain the mapping of fields to their var layouts using `IRStructFieldLayoutAttr`s. With the encoding in place, most of the changes in category (1) (code that just uses rather than creates layouts) was straightforward. The biggest different beyond name changes was that everything needs to be fetched using accessors instead of bare fields. It would have been possible to stage this commit and make the diffs smaller by first introducing mandatory acessors to the AST layout types. The changes in category (2) were more involved. There were a lot of places in the existing code where a `TypeLayout` or `VarLayout` would be created, and then initialized piecemeal over several lines of code (and sometimes even across functions). Because of the way that layouts need to support many optional properties, it did not seem practical to just have monolithic factory functions that took all the options as arguments, so I instead opted for a builder approach. The builders for `IRVarLayout` and `IREntryPointLayout` are both straightforward, and honestly there is no realy need for a builder for entry point layouts right now, but I was trying to future-proof in case we decidd to add some optional attributes to them. The builders for type layouts are more involved because of the inheritance hierarchy. Each concrete sub-type of type layout needs to define its own builder type that customizes the opcode, operands, and attributes of the final instruction. The refactoring that had to go into this change was a nice excuse to clean up a few ugly warts in the AST layout code that were largely there to support IR use cases. While this change adds a lot of new infrastructure code to the IR, most of the client code has stayed the same or gotten simpler. One annoying wart that remains with this change is the notion of an "offset element type layout" for parameter group types. That idea was added to deal with a legacy feature in the reflection API that we realized was a mistake, but unfortunately having that "offset" layout handy made writing a few other pieces of code simpler so that there are use cases of the feature even in the IR. Removing those uses is do-able, but requires careful refactoring so it is best left to a follow-on change. Another thing that could be considered for a follow-on change is how much information should be specified when constructing a `Builder` for an IR type layout, and how much should be allowed to be specified statefully/piecemeal. It would be nice to force all the required operands to be specified up front, but `IRParameterGroupTypeLayout::Builder` doesn't currently work that way because so much of the client code that needs it involved a lot of stateful setting and would need to be refactored heavily to provide the necessary information up front.
*	Initial work on representing layout at IR level (#1079)	Tim Foley	2019-10-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Initial work on representing layout at IR level This change starts the process of making the back-end of the compiler independent of the AST-level layout information (`TypeLayout`, `VarLayout`, etc.) so that it instead only relies on layout information that is embedded into IR modules. This brings us incrementally closer to a world in which the back-end could be run without the AST-level structures even existing (e.g., for an application that just wants to ship IR without any AST information for IP protection, while still supporting some amount of linking and specialization). The main parts of the change are: * There is a bunch of incidental churn related to specifying entry points by index instead of the `EntryPoint` object for certain operations. This ends up being a better choice because we can use the index to look up side-band information about the entry point that might not be stored on the `EntryPoint` object itself. In particular... * We expand the `ComponentType` interface to support looking up the mangled name of an entry point by index. In common cases (no generic/interface specialization) this would be the same as asking the `EntryPoint` for its mangled name, but in cases where we have specialized a generic entry point, the mangled name would include speicalization arguments that are only available on the `SpecializedComponentType` that wraps the entry point. This part of the change isn't ideal and there might be a better solution waiting to be invented. Note that we store mangled entry point names as strings rather than using `DeclRef`s because that ensures that the information could be serialized and deserialized without a dependence on the AST. * The `TargetProgram` type (which represents binding a specific `ComponentType` for a shader program to a specific `TargetRequest` that represents the target platform) is expanded to include an `IRModule` that represents layout information, in addition to the AST-level `ProgramLayout` it already contained. We create both of these objects at the same time (on-demand) to simplify the overall flow (so that any code that triggers creation of the AST-level layout will also ensure that the IR-level layout exists). * A bunch of code in the emit passes that was passing down layout-related objects has been eliminated. It appears that most of those objects weren't actually being used, so this is just a cleanup, but it helps ensure that the back-end steps are "clean" and don't depend on the AST-level information. The one big exception here is that the emit logic needs to know the stage for the entry point being emitted (to deal with one wrinkle in translating DXR to VKRT). * A big change (actually introduced by @jsmall-nvidia in a branch that this change copied and then built from) is to introduce some more explicit IR instructions to represent layout information, notably an `IRTypeLayout` and an `IRVarLayout`. For now these objects still reference their AST equivalents, but the separation gives us an incremental path to move information from the AST-level objects over to the IR ones. This work includes logic in `IRBuilder` to construct the IR-level layout objects from the AST-level ones on-demand, so that the existing code paths that try to attach AST-level layout will continue to work for now. * Because layout information is now embedded in the IR, the `slang-ir-link.cpp` logic loses a lot of cases that used to deal with attaching AST-level layout objects to IR-level instructions during the linking process. Instead, the linker now assumes that one (or more) of the input IR modules will have layout information associated with it, and the linker makes sure to copy layout decorations (and the instructions they reference) from the input IR module(s) to the output using its more ordinary mechanisms. * Inside `slang-lower-to-ir.cpp`, we add logic to construct an IR module in a `TargetProgram` that simply references the global shader parameters, entry points, etc. and attaches IR layout decorations to them. This is akin to the existing pass in the same file that constructs IR to represent specialization information, and both of these passes share infrastructure with the main AST->IR lowering pass. Eventually, it is expected that this pass will encompass more of the logic for copying AST-level layout information over to IR-level equivalents. * One small wrinkle with this change was that the output for an HLSL generation test case changed some of its `#line` directives. The old code was actually more inaccurate than the new, so this change just updated the baseline. It also added some logic in the linker to make sure that when an IR instruction has multiple definitions, we try to pick up a source location from any of them, in case the "main" one somehow didn't get a location. * Another small fix was that the key/value map in `StructTypeLayout` for mapping fields/members to their layouts was keyed on `Decl` when it really should have been `VarDeclBase`. This change should in principle be a pure refactoring with no functionality changes, so no new tests were added. It is unfortunately also a change that has a high probability of breaking at least some client code, so we may want to be defensive and mark this with a new major version number (well, a new minor version number since we are pre-`1.0`) to give us some room for releasing hotfixes to the old version if needed. * fixup: infinite recursion bug detected by clang * fixup: remove commented-out code