slang.git - Making it easier to work with shaders

	Commit message (Collapse)	Author	Age
...
*	Fix vk-shift-* mapping issue (#2993)	jsmall-nvidia	2023-07-14
\| \| \| \| \| \| \|	* Fix vk-shift-* mappings. * Add some doc info about vk-shift. * Fix diagnostic test.
*	Make DeclRefBase a Val, and DeclRef<T> a helper class. (#2967)	Yong He	2023-07-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Make DeclRefBase a Val, and DeclRef<T> a helper class. * Fixes. * Workaround gcc parser issue. * Revert NodeOperand change. * Fix. * Fix clang incomplete class complains. * Fix code review. * Small cleanups and improvements. --------- Co-authored-by: Yong He <yhe@nvidia.com>
*	Bottleneck DeclRef creation through ASTBuilder. (#2689)	Yong He	2023-07-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Bottleneck DeclRef creation through ASTBuilder. * Fix clang error. * Fix. * Fix. * More fix. * Rebase on top of tree. --------- Co-authored-by: Yong He <yhe@nvidia.com>
*	Pointer layout support (#2930)	jsmall-nvidia	2023-06-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* WIP looking at reflection with pointers. * Added GetPointerLayout. * Initial test via reflection with layout of ptr type. * WIP handles ptrs to types that have layout that hasn't been completed. * Move tests to ptr. * WIP try to take into account lowering correctly between AggTypeDecl and Type, but doesn't quite work. * WIP a different path to handling recursive lowering problem with Ptr. * Fix issues with reflection output. * Small tidy. * Fix for infinite recursion issue. * Lower IRPointerTypeLayout * Working with generics. Has a hack to work around Layout around Ptr in IR. The reflection around the generic - the name isn't much use, it should probably have the generic parameters, but that would require getName to do something more sophisticated. * Fix issue around calling finishOuterGenerics to early. * Remove feature/ptr test. * Fix type legalization being an infinite loop with Ptr self referencing. * Disable the pointer self reference test because produces an infintie loop on emit. * Fixed comment based on review. * Fix for issue with emit and pointers causing infinite recursion.
*	Improvements around HLSLToVulkanLayout (#2867)	jsmall-nvidia	2023-05-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * Improve the HLSLToVulkanLayoutOptions interface. Add more diagnostics. Add diagnostics test. * Add check for global binding using file check. * Fix issues with some tests around making some diagnostics ids unique. * Small improvements with doc/handling of vk-<>-shift option setup. --------- Co-authored-by: Yong He <yonghe@outlook.com>
*	HLSL->Vulkan binding support (#2865)	jsmall-nvidia	2023-05-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* WIP around VK shift binding. * Refactor around options parsing. * Remove needless passing around of sink. * Some more tidying around OptionsParser. * Handle vulkan shift parsing. * Fix small issue around vk binding and "all". * Fixing some small issues. Missing break. * Split out VulkanLayoutOptions * WIP binding taking into account HLSL->Vulkan options. * First attempt at making binding work with HLSLVulkanOptions. * VulkanLayoutOptions -> HLSLToVulkanLayoutOptions * WIP with HLSL-Vulkan binding. * Some more testing around vk-shift. * Improvements around global binding. More tests. * Improve test coverage. Improve checking for requirements around default space. * Update command line options. * Small fixes. * Small fix in options reporting. * Fix warning issue. * Some fixes for isDefault for HLSLToVulkanLayoutOptions. * Update hlsl-to-vulkan-shift output. The difference was due to default handling if shift isn't specified, and not being specified was not correctly tracked.
*	Various dxc/fxc compatibility fixes. (#2863)	Yong He	2023-05-02
\| \| \| \| \| \| \| \| \| \| \| \| \|	* Various dxc/fxc compatibility fixes. * Cleanup. * Fix test cases. * Fix comments. --------- Co-authored-by: Yong He <yhe@nvidia.com>
*	Fix most of the disabled warnings on gcc/clang (#2839)	Ellie Hermaszewska	2023-04-26
\|
*	Dictionary using lowerCamel (#2835)	jsmall-nvidia	2023-04-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * WIP lowerCamel Dictionary. * WIP more lowerCamel fixes for Dictionary. * Add/Remove/Clear * GetValue/Contains * Fix tabs in dictionary. Count -> getCount * Fix fields with caps. * Key -> key Value -> value Use m_ for members where appropriate. Use lowerCamel in linked list. * Some small fixes/improvements to Dictionary. * Kick CI.
*	Make ArrayExpressionType a DeclRefType and define its autodiff extension in ↵	Yong He	2023-01-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	stdlib. (#2615) * Allow array parameters in forward diff. * Use type canonicalization instead of coersion. * Reimplement array type. * Fix. * Update test case. --------- Co-authored-by: Yong He <yhe@nvidia.com>
*	Mesh shader support (#2464)	Ellie Hermaszewska	2022-11-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add gdb generated files to .gitignore * Switch to c++17 TODO: Ellie update coding style doc * WIP mesh shaders * Add MeshOutputType and mesh output decorations * Lift array type layout creation out of _createTypeLayout in preparation for sharing it elsewhere * Initial pass at GLSL legalization for mesh shaders * Create output types for builtin mesh outputs This should be rendered as an out paramter block * Handle writes to member fields in mesh shader output * Per primitive output from mesh shaders * Add mesh shader tests * Redeclare mesh output builtins * Remove unused instruction * Emit explicit mesh output max max size * Add unimplemented warning for array members in mesh output * Implement mesh output splitting for GLSL in terms of getSubscriptVal * Allow HLSL syntax for mesh output modifiers * Improve error messages for mesh output * Add test for HLSL style mesh output syntax * Emit explicit mesh output indices max size * HLSL generation support for mesh shaders * Better errors for mesh shader misuse * Neaten comments * Regenerate vs2019 project files * Fix build on vs2019 * Retreat on c++17 Will make the change in a separate PR * slang-glslang binary dep 11.10.0 -> 11.12.0-32 * Fixes for msvc compiler * Update msvc project
*	Assorted Artifact improvements (#2374)	jsmall-nvidia	2022-08-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * WIP replacing DownstreamCompileResult. * First attempt at replacing DownstreamCompileResult with IArtifact and associated types. * Small renaming around CharSlice. * ICastable -> ISlangCastable Added IClonable Fix issue with cloning in ArtifactDiagnostics. * Only add the blob if one is defined in DXC. * Guard adding blob representation. * Make cloneInterface available across code base. Set enums backing type for ArtifactDiagnostic. * Added ::create for ArtifactDiagnostics. * Use SemanticVersion for DownstreamCompilerDesc. Set sizes for enum types. * Depreciate old incompatible CompileOptions. Change SemanticVersion use 32 bits for the patch. * Split out CastableUtil. * Change IDownstreamCompiler to use canConvert and convert to use artifact types. * Fix typos. * Fix typo bug. Allow trafficing in PTX assembly/binaries * struct DownstreamCompilerBaseUtil -> struct DownstreamCompilerUtilBase * Add other riff types. * Small fix around artifact kind. * Make using slices instead of strings explicit on atomic ref counted types. (not complete). Added IArtifactList. Use IArtifactList to hold the 'associated' files. Use IUnknown for scoping for atomic ref counting. Small naming improvements. * Make artifact not use String in construction (so it owns contents). * Calculate compile products as artifacts. * Small improvements around ArtifactDesc. * Use ICastableList for list of artifacts and remove IArtifactList.
*	Artifact split interface and implementation (#2349)	jsmall-nvidia	2022-08-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * WIP with hierarchical enums. * Some small fixes and improvements around artifact desc related types. * Improvements around hierarchical enum. * Fixes to get Artifact types refactor to be able to execute tests. * Attempt to better categorize PTX. * Work around for potentially unused function warning. * Typo fix. * Simplify Artifact header. * Small improvements around Artifact kind/payload/style. * Added IDestroyable/ICastable * Add IArtifactList. * First impl of IArtifactUtil. * Use the ICastable interface for IArtifactRepresentation. * Added IArtifactRepresentation & IArtifactAssociated. * Add SLANG_OVERRIDE to avoid gcc/clang warning. * Fix calling convention issue on win32. * Fix missing SLANG_OVERRIDE. * First attempt at file abstraction around Artifact. * Added creation of lock file. * Move functionality for determining file paths to the IArtifactUtil. Add casting to ICastable. * Added some casting/finding mechanisms. * Simplify IArtifact interface, and use Items for file reps. * Fix problem with libraries on DXIL. * Split out ArtifactRepresentation. * Move ArtifactDesc functionality to ArtifactDescUtil. ArtifactInfoUtil becomes ArtifactDescUtil. * Split implementations from the interfaces for Artifact. * Use TypeTextUtil for target name outputting. * Add artifact impls.
*	Improvements around Artifact (#2346)	jsmall-nvidia	2022-08-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * WIP with hierarchical enums. * Some small fixes and improvements around artifact desc related types. * Improvements around hierarchical enum. * Fixes to get Artifact types refactor to be able to execute tests. * Attempt to better categorize PTX. * Work around for potentially unused function warning. * Typo fix. * Simplify Artifact header. * Small improvements around Artifact kind/payload/style. * Added IDestroyable/ICastable * Add IArtifactList. * First impl of IArtifactUtil. * Use the ICastable interface for IArtifactRepresentation. * Added IArtifactRepresentation & IArtifactAssociated. * Add SLANG_OVERRIDE to avoid gcc/clang warning. * Fix calling convention issue on win32. * Fix missing SLANG_OVERRIDE.
*	COM interfaces with host callable (#2258)	jsmall-nvidia	2022-06-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * Use TerminatedUnownedStringSlice for literals in output C++. * Remove Escape/Unescape functions used in slang-token-reader.cpp Add target type of 'host-cpp' etc to map to the target types. * Fix some corner cases around string encoding. * Added unit test for string escaping. Fixed some assorted escaping bugs. * Updated test output. * Added decode test. * Stop using hex output, to get around 'greedy' aspect. Use octal instead. * Added HostHostCallable Small changes to use ArtifactDesc/Info instead of large switches. * Fix C++ emit to handle arbitrary function export. * Add options handling for callable without an output being specified. * Can compile with COM interface. Added example using com interface. * Use the IR Ptr type instead of hack in C++ emit for interfaces. * Fix issue with outputting the COM call when ptr is used. * Fix crash issue on compilation failure.
*	Allow slangc to generate exe from .slang file. (#2170)	Yong He	2022-03-28
\|
*	Use GLSL scalar layout for constant buffers. (#2147)	Yong He	2022-02-28
\|
*	Various fixes to gfx. (#2120)	Yong He	2022-02-09
\| \| \| \| \| \| \| \| \| \| \|	* Various fixes to gfx. * Fix. * Fixes. * Fix. Co-authored-by: Yong He <yhe@nvidia.com>
*	Revise entrypoint renaming interface. (#2113)	Yong He	2022-01-31
\| \| \| \| \| \| \| \| \|	Changed the interface from `IEntryPoint::getRenamedEntryPoint` to `IComponentType::renameEntryPoint`. The underlying implementation creates a `RenamedEntryPointComponentType` wrapper object around the base entry-point. This new implementation allows the user to specify entry point renaming on an IComponentType that isn't just a `EntryPoint`, but also on `SpecializedComponentType` or `CompositeComponentType` as long as the component defines a single entry point. Co-authored-by: Yong He <yhe@nvidia.com>
*	Add API to control interface specialization. (#1925)	Yong He	2021-08-26
\|
*	Use "capability" system to select VKRT extension (#1647)	Tim Foley	2021-01-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Use "capability" system to select VKRT extension Slang currently supports translation of ray tracing shader code to Vulkan GLSL code that uses the `GL_NV_ray_tracing` extension. A multi-vendor equivalent of that extension has been released as `GL_EXT_ray_tracing` and we want Slang to support that extension as well. At the simplest, making the change from one extension to the other is just a matter of changing a few strings, since it does not appear that anything of significance was changed at the GLSL level (or even in SPIR-V). Where this gets trickier is when we have users who want us to support both extensions, and to be able to switch between them. The solution we've implemented here more or less amounts to: * If you don't tell the compiler which extension to use, it will default to `GL_EXT_ray_tracing` (the newer multi-vendor one). * If you explicitly want the older extension, you can opt into it using the `-profile` option or via a new API for explicitly adding capabilities to your target. Making that work required a few different kinds of changes: * The options parsing and public API needed ways to add optional capabilities to a target. * During GLSL code emit, we can check the capabilities that were added to the target to see if the `GL_NV_ray_tracing` extension was explicitly enabled and, if not, default to using the `GL_EXT_ray_tracing` names for things. This step is needed because some of the modifiers/attributes involved in the extension have to be handled explicitly in the code generator rather than implicitly as part of mapping intrinsic functions. * We add two different translations to the relevant operatiosn in the stdlib, one marked with each of the extensions. If profile/capability-based overload resolution can be relied on to pick the right one, this should Just Work. * Next, a bunch of work had to go into making capability-based overloading Just Work for the purposes of this change. There's been a nearly complete reworking of the implementation of `CapabilitySet` here to make it more suitable for our needs. * The tests that were using ray tracing translation for Vulkan needed to be updated. For some of them I updated their baselines to use `GL_EXT_ray_tracing` so that they can test the new path. For others, I updated the command line for the test case so that it explicitly opts into using `GL_NV_ray_tracing`. The result is that we have some coverage of each extension. I would have liked to have each test run in both modes, but our pass-through glslang support doesn't support `-D` options, so I couldn't take that step easily. This change does not add support for `GL_EXT_ray_query`, the extension that supports "DXR 1.1" style queries under Vulkan. Adding support for that extension should hopefully be a smaller step because it doesn't have the same multiple-extensions issue. This change does not address a lot of possible avenues for improvement or cleanup around the capability system. It focuses only on those changes that are necessary to make the ray tracing feature work and leaves the rest for future work. * fixup: infinite loop * Comment-only change to retrigger TC build
*	Simplify workflow when using NVAPI (#1556)	Tim Foley	2020-09-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In some cases, functionality is available as either a GLSL extension for Vulkan/SPIR-V, or through the NVAPI system for D3D. This situation creates complications because while GLSL extensions are generally all supported by the open-source glslang compiler (which we can bundle and ship), NVAPI operations are exposed through a specific header (`nvHLSLExtns.h`) that ships as part of the NVAPI SDK. When a user wants to explicitly use NVAPI-provided operations in their shader code, there are no major complications for Slang; the user sets up their include paths, `#include`s the relevant header, calls functions in it, and lets Slang deal with the details of compilation. The challenge for Slang arises when we want to provide a cross-platform interface in our standard library (e.g., the `RWByteAddressBuffer.InterlockedAddF32` method that was recently added) that uses either a GLSL extension (when compiling for Vulkan/SPIR-V) or an NVAPI (when compiling to DXBC or DXIL). In that case, the code generated by Slang now has a dependency on NVAPI, and we need to somehow emit a `#include` directive that pulls it in when invoking fxc or dxc. Because we do not (and seemingly cannot) bundle the NVAPI header with the compiler, we have to rely on ther user to have it available and to somehow communicate to Slang where it is. Exposing portable routines that sometimes use NVAPI currently creates two main challenges: 1. The user is forced to interact with the "prelude" mechanism in the compiler, which allows the programmer to define code in a given target language that gets prepended to the Slang-generated code. While the prelude mechanism is powerful, it is also hard for users to integrate into their workflow, and our experience so far is that users want something that Just Works. 2. If the user writes code that uses some of our abstract operations that layer on NVAPI and they also want to use NVAPI explicitly, they end up with two copies of the NVAPI header (one included by the Slang front-end, and another included by the downstream fxc/dxc compiler). This puts the user in the situation of (a) having to ensure that they set the defines like `NV_SHADER_EXTN_SLOT` consistently both when invoking Slang and when adding their prelude, and (b) even if they do make the definitions consistent, they run into the problem that fxc/dxc complain about overlapping register bindings on the two copies of the `g_NvidiaExt` global shader paraemter that the NVAPI header declares. This change attempts to resolve both issues by adding a lot of "do what I mean" logic to the compiler to try to ease things in the common case. In particular: 1. The user no longer needs to use the "prelude" mechanism when using NVAPI. The compiler now embeds a default prelude for HLSL output, which will `#include` the NVAPI header if and only if the generated code needs NVAPI access because of portable standard library routines that were used. 2. The user can mix-and-match explicit NVAPI use and stdlib functions that compile to use NVAPI. The register/space to be used by NVAPI when included via prelude is now set based on whatever the user set via the preprocessor so that it should automatically be consistent between both cases. Furthermore, the code we emit for the declaration of `g_NvidiaExt` when compiling explicit NVAPI use is set up to be conditional, so that it is skipped in the case where the prelude will pull in its own declaration of that parameter. The way all this is achieved involves a lot of moving pieces: * We now have an HLSL prelude, which mostly just serves to `#include "nvHLSLExtns.h"` in the case where NVAPI support is needed downstream. * Standard library operations that require NVAPI for their implementation on HLSL include a new `[__requiresNVAPI]` attribute. * The preprocessor has been extended so that after tokenizing an input file it looks up the NVAPI-relevant macros in the resulting environment, and if they are set it attached a modifier (`NVAPISlotModifier1) to the AST `ModuleDecl` that is based on their values. Logic is added to detect if multiple input files specify values for the macros in ways that conflict. * The semantic checking step is extended so that it detects the "magic" NVAPI declarations (the `g_NvidiaExt` paramter and the `NvShaderExtnStruct` type that it uses) and attaches a modifier to them so that they can be identified as such in later steps. * Parameter binding is extended to collect a list of the AST modifiers that reflect NVAPI binding, and to reserve the relevant register(s) so that ordinary user-defined parameters cannot conflict with them. * IR lowering translates the three new AST modifiers related to NVAPI over to IR equivalents. * IR linking is extended to make sure that it clones any `IRNVAPISlotDecoration`s attached to the input modules. The pass intentionally does not care where the modifiers came from; it just collects them all and leaves it to downstream code to sort out what they mean. * Emit logic is extended to have a notion of "prelude directives" which are preprocessor directives that should come before the prelude in the generated code, because they can impact the way that the prelude compiles. This is done so that we don't have to introduce ad hoc logic for each downstream compiler to set any relevant `-D` flags (e.g., both fxc and dxc would need to duplicate such logic for NVAPI support). * The HLSL source emitter is extended to track whether it emits any operations that require NVAPI support. * The HLSL source emitter is extended to emit prelude directives based on whether NVAPI is needed and, if it is, to also set the register and space that NVAPI should use based on what was stored in the decoration(s) on the IR module. * The HLSL source emitter is extended so that it detects global instructions that represent "magic" NVAPI constructs , and emit them as conditional definitions so that they are skipped when NVAPI is included via the prelude. * The handling of requires capabilities during emit logic was cleaned up a bit so that more logic is shared across targets, and also so that the same logic is used both when emitting a function declaration/definition and when emitting a call to an instrinsic function (which won't get declared/defined).
*	Fix an issue with double-counting uniform data for CUDA/CPU (#1545)	Tim Foley	2020-09-17
\| \| \| \| \| \| \| \| \|	The `SimpleScopeLayoutBuilder` helper that is used to build up binding information for entry-point parameter lists has logic to try to support both explicit and implicit binding of parameters. This logic was added as part of supporting dual-source color blending on Vulkan. The basic approach is similar to that used for the global scope, where parameters with explicit binding first "carve out" the ranges they claim via a `UsedRangeSet`, and then parameters without explicit binding allocate space from what is left. The logic is (seemingly by accident) also applied to uniform/ordinary data, which creates a problem because the `ScopeLayoutBuilder` base type is also responsible for computing a layout for uniform/ordinary data that is 100% implicit (while dealing with all the relevant alignment restrictions). That logic goes on to add the computed uniform/ordinary resource usage to the computed type layout, but because such a layout has already been computed (albeit without taking alignment into account), the result is that the uniform/ordinary usage is reported at approximately double what it should be. The fix here is to skip uniform/ordinary resource usage when doing the explicit/implicit dance in `SimpleScopeLayoutBuilder`. This approach means that explicit bindings on entry-point `uniform` parameters will only apply to resources (which matches our rules for the global scope, where we don't allow for explicit binding on uniform/ordinary parameters). This is appropriate since the only reason we are supported explicit layout at all is for dual-source color blending (in general, we only support explicit `register` and `[[vk::binding(...)]]` modifiers on global parameters; users are stuck with our computed layouts in all other cases. Co-authored-by: Yong He <yonghe@outlook.com>
*	Change the policy for entry-point uniform parameters on Vulkan (#1476)	Tim Foley	2020-08-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Entry point `uniform` parameters were a feature of the original Cg and HLSL, but have not been used much in production shader code. One of our goals on Slang is to reduce the (ab)use of the global scope, so bringing entry point `uniform` parameters up to a greater level of usability is an important goal. Some policy choices about how global vs. entry-point `uniform` parameters behave have already been made, that shape decisions looking forward: * For DXBC/DXIL, it makes the most sense to follow the lead of fxc/dxc, by treating entry point `uniform` parameters as a kind of syntax sugar for global shader parameters. Any parameters of "ordinary" types are bundles up into an implicit constant buffer, and all the resources (including the implicit constant buffer) are assigned `register`s just as for globals. It is up to the application to decide how to bind those parameters via a root signature (using root descriptors, root constants, descriptor tables, local vs. global root signature, etc.) * For CPU, it makes sense to pass global vs. entry-point parameters as two different pointers, although the details of what we do for CPU are the least constrained across all current targets. * For CUDA compute, it makes the most sense to map global shader parameters to `__constant__` global data, and entry-point `uniform` parameters to kernel parameters. This choice ensures that the signature of a kernel when translated from Slang->CUDA follows the Principle of Least Surprise, at the cost of making entry-point vs. global parameters be passed via different mechanisms. * For OptiX ray tracing, it makes sense to expand on the precedent from CUDA compute: pass global parameters via global `__constant__` data (as is already expected by OptiX for whole-launch parameters), and pass entry-point `uniform` parameters via the "shader record." This establishes a precedent that for ray-tracing shaders, global-scope parameters map to the "global root signature" concept from DXR, while entry-point `uniform` parameters map to a "local root signature" or "shader record." * For Vulkan ray tracing, the precedent from OptiX then argues that entry-point `uniform` parameters should map to the Vulkan "shader record" concept (and thus cannot support things like resource types). * The remaining interesting case is what to do for non-ray-tracing shaders on Vulkan. The dev team agrees that the most reasonable choice to make for non-ray-tracing Vulkan shaders is to map entry-point `uniform` parameters to "push constants." In particular, this makes it easy to express the case of a compute kernel with direct parameters of ordinary/value types in the way that will be implemented most efficiently. The big picture is then that a kernel like: ```hlsl void computeMain(uniform float someValue) { ... } ``` will map to output GLSL like: ```glsl layout(push_constant) uniform { float someValue; } U; void main() { ... } ``` If the user really wanted a constant-buffer binding to be created instead, they can easily change their input to make the buffer explicit: ```hlsl struct Params { float someValue; } void computeMain(uniform ConstantBuffer<Params> params) { ... } ``` (Forcing the user to be explicit about the desire for a buffer here creates a nice symmetry between Vulkan and CUDA; in the first case the user sets up the data in host memory and passes it to the GPU by copy, while in the second case the user must allocate and set up a device-memory buffer for the data. This symmetry extends to D3D if the application chooses to map entry-point `uniform` parameters to root constants.) This change implements logic in the "parameter binding" part of the Slang compiler to make sure that entry-point `uniform` parameters are wrapped up in a push-constant buffer rather than an ordinary constant buffer for non-ray-tracing shaders on Vulkan (and in a shader record "buffer" for the ray-tracing case). The majority of the actual work was in adding support for root/push constants to the test framework and the graphics API abstraction it uses. To be clear about that support: * Root constant ranges are (perhaps confusingly) treated as a new kind of "slot" that can appear on a descriptor set. This choice ensures that the implicit numbering of registers/spaces used by the back-ends can account for these ranges correctly. * The `TEST_INPUT` lines are extended to allow a `root_constants` case that behaves more or less like `cbuffer` * The CPU and CUDA paths can treat a `root_constants` input identically to a `cbuffer`. They already allocate the actual buffers based on reflection, and just use `cbuffer` as a directive that causes bytes to be copied in. * On D3D12 and Vulkan, a descriptor set allocates a `List<char>` to hold the bytes of root constant data assigned into it, and these bytes are flushed to the command list when the table is actually bound (usually right before rendering). * On D3D11, a descriptor set treats a root constant range more or less like a constant buffer range (with a single buffer), except that it also automatically allocates a buffer to hold the data. Assigning "root constant" data automatically copies it into that buffer. The small number of tests that used entry-point `uniform` parameters of ordinary types were updated to use the new `root_constant` input type, and the bugs that surfaced were fixed. A new test to confirm that entry-point `uniform` parameters map to the shader record for VK ray tracing was added. An important but technically unrelated change is the removal of the `DescriptorSetImpl::Binding` type and related function from the Vulkan implementation of `Renderer`. That type was created to ensure that objects that are bound into a descriptor set don't get released while the descriptor set is still alive, but the implementation relied on a complicated linear search to check for existing bindings, which could create a performance issue for descriptor sets that include large arrays of descriptors. The new implementation makes use of the approach already present in the various `Renderer` implementations (including the Vulkan one) for assigning ranges in a descriptor set a flat/linear index for where their pertinent data is to be bound. As a result, the Vulkan `DescriptorSetImpl` now uses a single flat array of `RefPtr`s to track bound objects, and has no need for linear search when binding. Co-authored-by: Yong He <yonghe@outlook.com>
*	Add support for global uniform shader parameters (#1433)	Tim Foley	2020-07-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Adding support for global uniform shader parameters This change adds support for Slang programmers to declare shader parameters of "ordinary" types at global scope: ```hlsl uniform float gScaleFactor; void main() { ... = gScaleFactor; ... } ``` The generated HLSL/GLSL/DXIL/SPIR-V output will be something along the lines of: ```hlsl struct GlobalParams { float gScaleFactor; } cbuffer globalParams { GlobalParams globalParams; } void main() { ... = globalParams.gScaleFactor; ... } ``` The binding information used for the implicit `globalParams` constant buffer will be determined by the existing implicit parameter binding logic (which already had support for this kind of transformation). The reason this change is being pursued right now is because it is one step toward removing the implicit `KernelContext` type that is used to wrap the generated code for our CPU and CUDA C++ targets. Handling global-scope parameters of ordinary type requires an IR pass that synthesizes the `GlobalParams` structure type above, and that step ends up removing the need for the similar `UniformState` structure that was being used in the CPU/CUDA emit logic. A more detailed guide to the changes included follows: * The diagnostic for a global-scope variable that is implicitly a shader parameter was kept, but changed to a warning. Users can opt out of the warning by decorating their parameter as a `uniform` (since that keyword is already being used to mark entry-point parameters that should be treated as uniform shader parameters). * To simplify the task of finding the global shader parameters, the `CLikeSourceEmitter` type has been given an `m_irModule` member. The previous emit logic for `UniformState` was having to do a roundabout solution involving the `EmitAction`s to deal with not having direct access to the module. * Removed a few dead declarations in the emit logic (related to a much earlier point where emit was based on the AST instead of the IR). * Made the computation of type names in C++ emit take into account `ConstantBuffer<T>` and `ParameterBlock<T>`. As far as I can tell, these were being handled with some special-case hacks in the emit logic instead of being supported more fundamentally. It might actually be good to pass these through as `ConstantBuffer<T>` and `ParameterBlock<T>` in the C++ output, and allow the prelude to customize their translation (defaulting to defining them as `T`). Removed the special-case C++ emit logic for references to global shader parameters. There are now at most two global shader parameters to deal with, and the default emit logic (referring to them by name) does the Right Thing. * Changed the handling of entry points for C++ (both CPU and CUDA) so that it handles the bundled-up shader paameters for the global and entry-point scopes the same way. The main complication here is OptiX, where parameter data is passed very differently than it is for CUDA compute kernels. * Reverted changes to `ir-entry-point-uniforms` that had made its logic depend on the compilation target. The parameter binding logic was already responsible for deciding if a given target needed to wrap up its entry-point parameters in a constant buffer, and the IR pass was respecting that layout information. The current workaround had been removing the `ConstantBuffer<T>` indirection from this IR pass for CPU/CUDA, but then reintroducing the same indirection later on in the emit step. * Added an explicit IR pass with the task of collecting global-scope parameters of uniform/ordinary type and packaging them up into a `struct`, and then optionally packaging that `struct` up in a constant buffer. This pass bases its decisions on the IR layout information that was already computed, so it should match whatever policy choices were made at the layout level. * Changed the "key" operand on IR `struct` layout information to not assume an `IRStructKey`. The problem here is that the global scope gets a `StructTypeLayout` to represent its members, and this is convenient (rather than having to always special-case logic that handles the global scope), but the "fields" of that struct are global variables which do not have `IRStructKey`s associated with them. The simplest solution is to use the variables themselves as the keys, which required removing the assumption in the IR encoding. * Updated the IR layout process to compute a layout for the global scope of an entire program, and to attach that to the `IRModule` via a decoration. Updated the IR linking process to carry through that decoration to the linked output. This is necessary so that the IR pass that transforms global parameters can access the global-scope layout information. An important concern with this approach is that the contents and layout of the monolithic `GlobalParams` structure depends on the exact set of modules that were linked (and the order in which they were specified, in some cases). This isn't really a new thing with this change, but it becomes more important as we start to think of how to generalize things to better support separate compilation and linking. There are changes that can (and should) be made to the way that IR layouts are computed for programs (e.g., so that we compute layout per-module and then combine them rather than as a whole-program step). In this case, the problem of forming the combined/linked global layout can be moved down the IR level and not be reliant on AST-level information. Just changing the way layout and linking interact would not change the fundamental problem that global shader parameters as they currently exist in Slang/HLSL/GLSL are not readily compatible with true separate compilation. We either need to find a solution strategy that we can apply to allow existing shaders to work with separate compilation or we need to incrementally work toward removing support for global-scope shader parameters in favor of explicit entry-point parameters in all cases. * fixup: missing files * fixup: comment the new code
*	Fix and improvements around repro (#1397)	jsmall-nvidia	2020-06-18
\| \| \| \| \| \|	* * Fix output in slang repro command line * Profile uses lowerCamel method names (had mix of upper and lower) * Rename slang-serialize-state/SerializeStateUtil to slang-repro and ReproUtil.
*	ASTNodes use MemoryArena (#1376)	jsmall-nvidia	2020-06-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add a ASTBuilder to a Module Only construct on valid ASTBuilder (was being called on nullptr on occassion) * Add nodes to ASTBuilder. * Compiles with RefPtr removed from AST node types. * Initialize all AST node pointer variables in headers to nullptr; * Initialize AST node variables as nullptr. Make ASTBuilder keep a ref on node types. Make SyntaxParseCallback returns a NodeBase * Don't release canonicalType on dtor (managed by ASTBuilder). * Give ASTBuilders a name and id, to help in debugging. For now destroy the session TypeCache, to stop it holding things released when the compile request destroys ASTBuilders. * Moved the TypeCheckingCache over to Linkage from Session. * NodeBase no longer derived from RefObject. * Only add/dtor nodes that need destruction. First pass compile on linux.
*	Feature/ast syntax standard (#1360)	jsmall-nvidia	2020-05-29
\| \| \| \| \| \| \| \| \|	* Small improvements to documentation and code around DiagnosticSink * Made methods/functions in slang-syntax.h be lowerCamel Removed some commented out source (was placed elsewhere in code) * Making AST related methods and function lowerCamel. Made IsLeftValue -> isLeftValue.
*	WIP: ASTBuilder (#1358)	jsmall-nvidia	2020-05-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Compiles. * Small tidy up around session/ASTBuilder. * Tests are now passing. * Fix Visual Studio project. * Fix using new X to use builder when protectedness of Ctor is not enough. Substitute->substitute * Add some missing ast nodes created outside of ASTBuilder. * Compile time check that ASTBuilder is making an AST type. * Moced findClasInfo and findSyntaxClass (essentially the same thing) to SharedASTBuilder from Session.
*	Tidy up around AST nodes (#1353)	jsmall-nvidia	2020-05-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Fields from upper to lower case in slang-ast-decl.h * Lower camel field names in slang-ast-stmt.h * Fix fields in slang-ast-expr.h * slang-ast-type.h make fields lowerCamel. * slang-ast-base.h members functions lowerCamel. * Method names in slang-ast-type.h to lowerCamel. * GetCanonicalType -> getCanonicalType * Substitute -> substitute * Equals -> equals ToString -> toString * ParentDecl -> parentDecl Members -> members
*	Reduce the size of Token (#1349)	jsmall-nvidia	2020-05-19
\| \| \| \| \| \|	* Token size on 64 bits is 24 bytes (from 40). On 32 bits is 16 bytes from 24. * Added hasContent method to Token. Some other small improvements around Token.
*	Remove static struct members from layout and reflection (#1310)	jsmall-nvidia	2020-04-08
\| \| \| \| \| \| \| \| \| \| \| \|	* * Added MemberFilterStyle - controls action of FilteredMemberList and FilteredMemberRefList * Splt out template implementations * Use more standard method names dofr FilteredMemberRefList * Added reflect-static.slang test * Added isNotEmpty/isEmpty to filtered lists * Added ability to index into filtered list (so not require building of array) * Default MemberFilterStyle to All. * Remove explicit MemberFilterStyle::All
*	Renamed UnownedStringSlice::size to getLength to make match String. (#1254)	jsmall-nvidia	2020-03-02
\|
*	Don't allocate a default space for a VK push-constant buffer (#1231)	Tim Foley	2020-02-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a shader only uses `ParameterBlock`s plus a single buffer for root constants: ```hlsl ParameterBlock<A> a; ParameterBlock<B> b; [[vk::push_constant]] cbuffer Stuff { ... } ``` we expect the push-constant buffer should not affect the `space` allocated to the parameter blocks (so `a` should get `space=0`). This behavior wasn't being implemented correctly in `slang-parameter-binding.cpp`. There was logic to ignore certain resource kinds in entry-point parameter lists when computing whether a default space is needed, but the equivalent logic for the global scope only considered parameters that consuem whole register spaces/sets. This change shuffles the code around and makes sure it considers a global push-constant buffer as not needing a default space/set. Note that this change will have no impact on D3D targets, where `Stuff` above would always get put in `space0` because for D3D targets a push-constant buffer is no different from any other constant buffer in terms of register/space allocation. One unrelated point that this change brings to mind is the `[[vk::push_constant]]` should ideally also be allowed to apply to an entry point (where it would modify the default/implicit constant buffer). In fact, it could be argued that push-constant allocation should be the default for (non-RT) entry point `uniform` parameters (while `[[vk::shader_record]]` should be the default for RT entry point `uniform` parameters).
*	Add attributes to enable dual-source blending on Vulkan (#1210)	Tim Foley	2020-02-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change adds support for the `[[vk::location(...)]]` and `[[vk::index(...)]]` attributes, which can be used together to mark up shader outputs for dual-source blending on Vulkan. HLSL/Slang code like the following: ```hlsl struct Output { [[vk::location(0)]] float4 a : SV_Target0; [[vk::location(0), vk::index(1)]] float4 b : SV_Target1; } [shader("fragment")] Output main(...) { ...} ``` can be used to set up dual-source blending on both D3D and Vulkan APIs. The output GLSL for the above will look something like: ```glsl layout(location = 0) out vec4 a; layout(location = 0, index = 1) out vec4 b; void main() { ... } ``` The more or less straightforward parts of this change were: * Added new `attribute_syntax` declarations to the stdlib, for `[[vk::location(...)]]` and `[[vk::index(...)]]` * Added new AST node types for the new attribute cases, sharing a base class so that argument checking can be shared * Added checks for the arguments to the new attributes in `slang-check-modifier.cpp` (eventually this kind of logic shouldn't be needed for new attributes) * Updated GLSL emit logic so that it treats the `index`/`space` parts of a variable layout as the `location`/`index` for varying parameters. * Updated GLSL legalization so that when it translates entry-point parameters into globals (and scalarizes structures) it handles both a binding index and space for the parameters. * Added a cross-compilation test case to verify that the basics of the feature work The remaining work is all in `slang-parameter-binding.cpp`. There is some work that isn't technically related to this change (and which could be reverted if it causes problems), around the detection and handling of fragment shader outputs with `SV_Target` semantics. The basic changes (which could be backed out and then merged separately) are: * Made the special-case `SV_Target` logic only trigger for fragment shaders (that is the only place where `SV_Target` should appear, but we weren't guarding against it) * Made the logic to reserve a `u<N>` register for `SV_Target<N>` only trigger for D3D Shader Model 5.0 and below (since it is not required for SM 5.1 and up). This could be a breaking change for some users, but that seems unlikely. * Fixed one test case that relied on the behavior of reserving `u0` for `SV_Target0` even though it was a SM6.0 test. * Also added more comments to the system-value handling logic. The more interesting changes come up starting in `processEntryPointVaryingParameterDecl()`. The basic issue is that we have so far only supported implicit layout for varying parameters on GLSL/Vulkan, but the `[[vk::location(...)]]` attribute is a form of explicit layout annotation. Rather than try to kludge something that only works in narrow cases, I instead opted to try to fix things more generally. In `processEntryPointVaryingParameterDecl()` we now check for the `location` and `index` attributes when we are on "Khronos" targets (Vulkan/OpenGL/GLSL) and immediately add them to the variable layout being constructed if they are found. There is nothing in this logic specific to fragment-shader outputs, so this feature now applies to any varying input/output on Khronos targets. Allowing explicit layouts creates the potential for mixing implicit and explicit layout. For example, consider: ```hlsl struct Output { float4 color : COLOR; [[vk::location(0)]] float3 normal : NORMAL; } ``` What `location` should `color` get? Should this code be an error? There are two cases where this conundrum can come up: when working with `struct` types used for varying parameters, and the entry-point parameter list itself. For the varying `struct` case we currently make an expedient choice. We handle fields with both implicit or explicit layotu with appropriate logic, but logic that doesn't account for the case of mixing the two. Then at the end of layout for the `struct` we issue an error if there was a mix of implicit and explicit layout (such that our results aren't likely to be valid). For the entry point varying parameter case, things were already using a `ScopeLayoutBuilder` type (that encapsulates some logic shared between entry-point and global parameters). The entry-point-specific bits were moved out into a `SimpleScopeLayoutBuilder` and it was updated so that rather than assuming all parameters use implicit layout it does a two-phase layout approach similar to what we use for the global scope: * First all parameters are enumerated to collect explicit bindings and mark certain ranges as "used" * Next the parameters are enumerated again and those without explicit bindings get allocated space using a "first fit" algorithm In principle we could extend the two-phase approach to apply to `struct` types as well, but that would be best saved for a future refactoring of some of this parameter binding logic, since I would like to exploit more of the opportunities for sharing code across the uniform/varying and struct/entry-point/global cases. By moving the point where entry point parameters get their offsets assigned, it was necessary to move around some of the logic that removes varying parameter usage (and other things that shouldn't "leak" out of an entry point) to a different point in the entry point layout process. While adding these various pieces does not quite enable us to support explicit bindings on entry point parameters (e.g., putting `uniform Texture2D t : register(t0)` in an entry point parameter list) or in `struct` types (e.g., explicit `packoffset` annotations on fields), it starts to provide some of the infrastructure that we'd need in order to support those cases.
*	Fix a bug in handling explicit register space bindings (#1175)	Tim Foley	2020-01-23
\| \| \| \| \|	The logic for handling explicit `space`/`set` bindings on shader parameters for parameter blocks was not correctly marking the `space`/`set` that gets grabbed as used, and as a result it was possible for another parameter block that relies on implicit assignment to end up with a conflicting space. This change fixes the original oversight, and adds a test case to prevent against regression.
*	WIP CUDA source emit (#1157)	jsmall-nvidia	2019-12-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* CPPCompiler -> DownstreamCompiler * Added DownstreamCompileResult to start abstraction such that we don't need files. * * Split out slang-blob.cpp * Made CompileResult hold a DownstreamCompileResult - for access to binary or ISlangSharedLibrary * Keep temporary files in scope. * Add a hash to the hex dump stream. * Move all file tracking into DownstreamCompiler. * WIP support for nvrtc. * WIP: Adding support for nvrtc compiler. Adding enum types, wiring up the nvrtc into slang. * Fix remaining CPPCompiler references. * Fix order issue on target string matching. * Use ISlangSharedLibrary for nvrtc. * Use DownstreamCompiler for nvrtc. * WIP first pass at compilation win nvrtc. * Added testing if file is on file system into CommandLineDownstreamCompiler. Added sourceContentsPath. * Make test cuda-compile.cu work by just compiling not comparing output. * Genearlize DownstreamCompiler usage. * Fix warning on clang. * Remove CompilerType from DownstreamCompiler. * Use DownstreamCompiler interface for all compilers. NOTE for FXC, DXC and GLSLANG this doesn't mean using 'compile' - it's still extracting functions from shared library. * Replace DownstreamCompiler::SourceType -> SlangSourceLanguage * Replace _canCompile with something data driven. * Fix compiling on gcc/clang for DownstreamCompiler. * Moved some text conversions into DownstreamCompiler. * Fix problem on non-vc builds with not having return on locateCompilers for VS. * Change so no warning for code not reachable on locateCompilers for vs. * WIP: CUDA code generation - currently just using CPU layout and HLSL. * emitXXXForEntryPoint -> emitEntryPointSource emitSourceForEntryPoint -> emitEntryPointSourceFromIR Fix up generating cuda to get PTX. * WIP emitting cuda for IR. * Small improvements to CUDA ouput. * Disable the CUDA emit test, as output not currently compilable.
*	Remove legacy feature for merging global shader parameters (#1139)	Tim Foley	2019-12-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Remove legacy feature for merging global shader parameters There is a fair amount of special-case code in the Slang compiler today to deal with the scenario where a programmer declares the "same" shader parameter across two different translation units: ```hlsl // a.hlsl Texture2D a; cbuffer C { float4 c; } ``` ```hlsl // b.hlsl cbuffer C { float4 c; } Texture2D b; ``` An important note here is that the declaration of `C` may be in a header file that both `a.hlsl` and `b.hlsl` `#include`, because from the standpoint of the parser and later stages of the compiler, there is no difference between `C` being in an included file vs. it being copy-pasted across both `a.hlsl` and `b.hlsl`. When a user invokes `slangc a.hlsl b.hlsl` (or the equivalent via the API), then they may decide that it is "obvious" that the shader parameter `C` is the "same" in both `a.hlsl` and `b.hlsl`. Knowing that the parameter is the "same" may lead them to make certain assumptions: * They may assume that generated code for entry points in `a.hlsl` and `b.hlsl` will both agree on the exact `register`/`binding` occupied by `C`. * They may assume that reflection information for their program will only reflect `C` once, and it will reflect it in a way that is applicable to entry points in both `a.hlsl` and `b.hlsl` * They may assume that the compiler can and should handle this use case even when `C` contains fields with `struct` types that are declared in both `a.hlsl` and `b.hlsl` that have the "same" definition. * They may assume that in cases where `C` is declared inconsistently between `a.hlsl` and `b.hlsl` the compiler can and will diagnose an error. Making these assumptions work in practice required a lot of special-case code: * When composing/linking programs was `ComponentType`s we had to include a special case `LegacyProgram` type that could provide these "do what I mean" semantics, since they are not what one would want in the general case for a `CompositeComponentType`. * During enumeration of global shader parameter in a `LegacyProgram`, we had to detect parameters from distinct modules (translation units) with the same name, and then enforce that they must have the "same" type (via an ad hoc recursive structural type match). No other semantic checking logic needs or uses that kind of structural check. * During parameter binding generation, we need to handle the case where a single global shader parameter might have multiple declarations, and make sure to collect explicit bindings from all of them (checking for inconsistency) and also to apply generated bindings to all of them. * The `mapVarToLayout` member in `StructTypeLayout` is a concession to the fact that we might have multiple `VarDecl`s for each field of the struct that represents the global scope, we might need to look up a field and its layout using any of those declarations (much of the need for this field had gone away now that IR passes are largely using IR-based layout). All of these different special cases added more complex code in many places in the compiler, all to support a scenario that isn't especially common. Most users won't be affected by the original issue, because they will do one of several things that rule it out: * Anybody using `slangc` like a stand-in for `fxc` or `dxc` and compiling one translation unit at a time will not suffer from any problems. If/when such users want consistent bindings across translation units, they already use either explicit binding or rely on consistent ordering and implicit binding. * Anybody who puts all the entry points that get combined into a pass/pipeline in a single file will not have problems. They will automatically get consistent bindings because of Slang's guarantees, and there can't be duplicated declarations when there is only one translation unit. * Anybody using `import` to factor out common declarations while compiling multiple translation units at once will not be affected. Parameters declared in an `import`ed module are the "same" in a much deeper way that it is trivial for Slang to support. Only users of the Falcor framework are likely to be affected by this, and they have two easy migration paths: either put related entry points into the same file, or factor common parameters into an `import`ed module. (It is also worth noting that for command-line `slangc`, it is possible to have a single module with multiple `.slang` files in it, which can all see global declarations like parameters across all the files. Anybody who buys into doing things the Slang Way should have no problem avoiding duplicated declarations) With the rationale out of the way, the actual change mostly just amounts to deleting lots of code that is no longer needed. An astute reviewer might notice several `assert`-fail conditions where complex Slang features were never actually made to work correctly with this legacy behavior. A small number of test cases broke with the code changes, but these were tests that specifically exercised the behavior being removed. In the case of the tests around binding/reflection generating, I rewrote the tests to use one of the idomatic workarounds (putting the shared parameters into an `import`ed module), but doing so required me to add support for `#include` when doing pass-through compilation with `fxc`. That logic added a bit more cruft than I had originally hoped to this commit, but having `#include` support when doing pass-through compilation is probably a net win. * fixup: 64-bit warning
*	getStringHash on string literals (#1140)	jsmall-nvidia	2019-12-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* WIP getStringHash * Have a use. * Add slang-string-hash.h/.cpp * Use StringSlicePool for holding strings for StringHash. Add outputBuffer to string-literal-hash.slang so value can be tested. Ignore the GlobalHashedStringLiterals instruction on emit. * Add all the hashed string literals to ProgramLayout. * Add reflection support for hashed string literals to reflection test. * Fix string literal hash test. * Small fixes to pass test suite. * Fix issue in serialization where IRUse is not correctly initialized. * Fix problem initializing IRUse for string hash pass. Remove hack from slang-ir-specialize - specially handling if user is not null. * * Use shared builder when replacing getStringHash * Comments for functions in slang-ir-string-hash * Do not allow zero length string literals. Could be allowed, but doing so would require StringSlicePool to have a special case (or some other mechanism)
*	* Added getCStr(Name*) (#1121)	jsmall-nvidia	2019-11-13
\| \| \| \| \| \|	* Added the name to the EntryPointLayout so is always available * Made spReflectionEntryPoint_getName use name * Improved checking for entry point name in render-test * Improved COMPILE test type to allow failure and output of failure.
*	Add basic support for entry points in `.slang-lib` files. (#1112)	Tim Foley	2019-11-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add basic support for entry points in `.slang-lib` files. The basic idea here is that when writing out a `.slang-lib` file based on a compile request, we include new sections in the generated RIFF that represent the entry points that were requested. The entry-point information is serialized in an entirely ad hoc fashion (a future change might clean it up to use the `OffsetContainer` machinery), and contains the name, profile, and mangled symbol name of an entry point. When deserializing this information, we create a list of "extra" entry points that gets attached to the front-end compile requests. These "extra" entry points get turned into `EntryPoint` objects at the same place in the code that entry points specified on the command line or via API would be checked, but the extra entry points bypass the semantic checking and just create "dummy" `EntryPoint` objects. Aside: the ability for a compile request to end up with entry points that weren't originally specified via API or command-line is not new. We already had support for compiling a translation unit with entry points entirely specified via `[shader(...)]` attributes, and this new support tries to function similarly. Because the "dummy" entry points don't retain AST-level information, several parts of the code have been modified to defensively check for `EntryPoint` objects without a matching AST declaration, and skip over them. The main place where this creates a problem is paramete binding, where ignoring the dummy entry point is appropriate since we currently assume linked-in library code has been laid out manually. One small cleanup here is that the `-r` command-line flag and the `spAddLibraryReference` API functio now bottleneck through a common routine to do their work, so that they both gain the new behavior without needing copy-paste programming. In order to keep the existing test case for library linking with entry points working, I had to add a flag to the `render-test` tool so that it can skip specifying entry point names as part of the compile request it creates. In that case it must instead assume that the entry points will be added to the compile request via other means. This logic is a bit magical, and hints that we should be looking for other ways to expose the library linking functionality over time. * fixup: remove alignment assertion
*	Initial work on representing layout at IR level (#1079)	Tim Foley	2019-10-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Initial work on representing layout at IR level This change starts the process of making the back-end of the compiler independent of the AST-level layout information (`TypeLayout`, `VarLayout`, etc.) so that it instead only relies on layout information that is embedded into IR modules. This brings us incrementally closer to a world in which the back-end could be run without the AST-level structures even existing (e.g., for an application that just wants to ship IR without any AST information for IP protection, while still supporting some amount of linking and specialization). The main parts of the change are: * There is a bunch of incidental churn related to specifying entry points by index instead of the `EntryPoint` object for certain operations. This ends up being a better choice because we can use the index to look up side-band information about the entry point that might not be stored on the `EntryPoint` object itself. In particular... * We expand the `ComponentType` interface to support looking up the mangled name of an entry point by index. In common cases (no generic/interface specialization) this would be the same as asking the `EntryPoint` for its mangled name, but in cases where we have specialized a generic entry point, the mangled name would include speicalization arguments that are only available on the `SpecializedComponentType` that wraps the entry point. This part of the change isn't ideal and there might be a better solution waiting to be invented. Note that we store mangled entry point names as strings rather than using `DeclRef`s because that ensures that the information could be serialized and deserialized without a dependence on the AST. * The `TargetProgram` type (which represents binding a specific `ComponentType` for a shader program to a specific `TargetRequest` that represents the target platform) is expanded to include an `IRModule` that represents layout information, in addition to the AST-level `ProgramLayout` it already contained. We create both of these objects at the same time (on-demand) to simplify the overall flow (so that any code that triggers creation of the AST-level layout will also ensure that the IR-level layout exists). * A bunch of code in the emit passes that was passing down layout-related objects has been eliminated. It appears that most of those objects weren't actually being used, so this is just a cleanup, but it helps ensure that the back-end steps are "clean" and don't depend on the AST-level information. The one big exception here is that the emit logic needs to know the stage for the entry point being emitted (to deal with one wrinkle in translating DXR to VKRT). * A big change (actually introduced by @jsmall-nvidia in a branch that this change copied and then built from) is to introduce some more explicit IR instructions to represent layout information, notably an `IRTypeLayout` and an `IRVarLayout`. For now these objects still reference their AST equivalents, but the separation gives us an incremental path to move information from the AST-level objects over to the IR ones. This work includes logic in `IRBuilder` to construct the IR-level layout objects from the AST-level ones on-demand, so that the existing code paths that try to attach AST-level layout will continue to work for now. * Because layout information is now embedded in the IR, the `slang-ir-link.cpp` logic loses a lot of cases that used to deal with attaching AST-level layout objects to IR-level instructions during the linking process. Instead, the linker now assumes that one (or more) of the input IR modules will have layout information associated with it, and the linker makes sure to copy layout decorations (and the instructions they reference) from the input IR module(s) to the output using its more ordinary mechanisms. * Inside `slang-lower-to-ir.cpp`, we add logic to construct an IR module in a `TargetProgram` that simply references the global shader parameters, entry points, etc. and attaches IR layout decorations to them. This is akin to the existing pass in the same file that constructs IR to represent specialization information, and both of these passes share infrastructure with the main AST->IR lowering pass. Eventually, it is expected that this pass will encompass more of the logic for copying AST-level layout information over to IR-level equivalents. * One small wrinkle with this change was that the output for an HLSL generation test case changed some of its `#line` directives. The old code was actually more inaccurate than the new, so this change just updated the baseline. It also added some logic in the linker to make sure that when an IR instruction has multiple definitions, we try to pick up a source location from any of them, in case the "main" one somehow didn't get a location. * Another small fix was that the key/value map in `StructTypeLayout` for mapping fields/members to their layouts was keyed on `Decl` when it really should have been `VarDeclBase`. This change should in principle be a pure refactoring with no functionality changes, so no new tests were added. It is unfortunately also a change that has a high probability of breaking at least some client code, so we may want to be defensive and mark this with a new major version number (well, a new minor version number since we are pre-`1.0`) to give us some room for releasing hotfixes to the old version if needed. * fixup: infinite recursion bug detected by clang * fixup: remove commented-out code
*	Callable CPU code support (#1014)	jsmall-nvidia	2019-08-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* First pass support for compiling to a loaded shared library. * Improve documentation for cpu target. * Removed the SLANG_COMPILE_FLAG_LOAD_SHARED_LIBRARY flag. Use the SLANG_HOST_CALLABLE code target Document mechanism. * Fix typo in cpp-resource.slang In test code if the target is 'callable' we don't need to compile (indeed there is no source file). * Small refactor using CommandLineCPPCompiler as base class to implement VisualStudioCPPCompiler and GCCCPPCompiler. * Improvements around CPPCompiler. Mechanism to know products produced. Cleaning up products after execution. * Fix multiple definition of 'SourceType'
*	WIP: Preliminary Slang -> C++ code generation (#1009)	jsmall-nvidia	2019-08-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Expanded prelude for some other resource types. Disable C++ output for ParameterGroup. * WIP: Layout for CPU. * Fixes to CPU layout. * WIP: The uniform is output, but the variable definition is not. * WIP: Entry point parameters to global scope in C++. Handling of resource types (in so far as outputting) * Some discussion of ABI and different input types. * WIP: More C++ support around resource types. * WIP: Split up variables into different structures on emit. * WIP: Emitting C++ with wrapping up of 'Context' * WIP: C++ code has access to semantic values. Wrap in struct so can use method calls to pass shared state. Disable legalizeResourceTypes and legalizeExistentialTypeLayout * Fix structured buffer layout for CPU. * Remove testing/handling of global uniforms on CPU path. Typo fix. Changed CPU tests to use new CPU calling convention. * Check globals are working. Initalize context to zero globals. * Order the global parameters for C++ ouput by their layout. Note - that layout isn't quite working correctly because the StructuredBuffer<int> the int seems to be consuming uniform space. * Work around for reflection not having all data needed for layout ordering for C++ code. * Output constant buffers as pointers. * Entry point parameters accessed through pointer to struct. * WIP: Layout for CPU is reasonable for test case. * Only output 'f' after float literal if type marks as a float. * Cast construction works on C++. * Made IntrinsicOp::ConvertConstruct to make intent clearer. * C++ handling construction from scalar. Handle access of a scalar with .x. Check default initialization. * Comment about need for split of kIROp_construct. Release build works. * Added support from constructVectorFromScalar to C/C++ target. * Handling of in/out in C/C++. * First pass documentation CPU support. * Improvements to C++/C slang code generation documentation. * Small doc change to include need for mechansim to specify cpp compiler path. * Better handling of swizzling - allow swizzling a scalar into a vector.
*	Revise new COM-lite API (#1007)	Tim Foley	2019-08-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Revise new COM-lite API This change revises the "COM-lite" API that was recently introduced to try to streamline it and introduce some missing central/base concepts. The central new abstraction in the API is the notion of a "component type," which is a unit of shader code composition. A component type can have: * IR code for some number of functions/types/etc. * Zero or more global shader parameters * Zero or more "entry point" functions at which execution can start * Zero or more "specialization" parameters (types or values that must be filled in before kernel code can be generated) * Zero or more "requirements" (dependencies on other component types that must be satisfied before kernel code can be generated) Both individual compiled modules, and validated entry points are then examples of component types, and we additionally define a few services that apply to all component types: * We can take N component types and compose them to create a new component type that combines their code, shader parameters, entry points, and specialization parameters. A composed component type may also include requirements from the sub-component types, but it is also possible that by composing thing we satisfy requirements (if `A` requires `B`, and we compose `A` and `B`, then the requirement is now satisfied, and doesn't appear on the composite). * We can take a component type with N specialization parameters, and specialize it by giving N compatible specialization arguments. The result of specialization is a new component type with zero specialization parameters. Under the right circumstances the specialzed component type will be layout compatible with the unspecialized one. * One more example that isn't exposed in the public API today is that we can take a component with requirements and "complete" it by automatically composing it with component types that satisfy those requirements. This can be seen as a kind of linking step that pulls together the transitive closure of dependencies. * We can query the layout for the shader parameters and entry points of a component type, for a specific target. * We can query compiled kernel code for an entry point in a component type (for a specific target). This only works for component types with zero specialization parameters and zero requirements. The idea is that by giving users a fairly general algebra of operations on component types, they can compose final programs in ways that meet their requirements. For example, it becomes possible to incrementally "grow" a component type to represent the global root signature for ray tracing shaders as new entry points are added, in such a way that it always stays layout-compatible with kernels that have already been compiled. Much of the implementation work here is in implementing the unifying component type abstraction, and in particular re-writing code that used to assume a program consisted of a flat list of modules and entry points to work with a hierarchical representation that reflects the underlying algebra (e.g., with types to represent composite and specialized component types). There's also a hidden "legacy" case of a component type to deal with some legacy compiler behaviors that can't be directly modeled on top of the simple algebra with modules and entry points. This API is by no means feature-complete or fully developed. It is expected that we will flesh it out more when bringing up application code (e.g., Falcor) on top of the revamped API. One notable thing that went away in this change is explicit support for "entry point groups" and notions of local root signatures (especially the Falcor-specific handling of the `shared` keyword, which a previous change turned into an explicitly supported feature). With the new "building blocks" approach, it should be possible for a DXR application to deal with local root signatures as a matter of policy (on top of the API we provide). If/when we need to provide some kind of emulation of local root signatures for Vulkan (and/or if Vulkan is extended with an explicit notion of local root signatures), we might need to revisit this choice. * Fix debug build There was invalid code inside an `assert()`, so the release build didn't catch it. * fixup: warnings * fixup: more warnings-as-errors * fixup: review notes * fixup: use component type visitors in place of dynamic casting
*	Add an attribute to disable the overlapping-bindings warning (#1005)	Tim Foley	2019-07-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently if the user gives two global shader parameters conflicting bindings, they get a warning diagnostic: ```hlsl Texture2D a : register(t0); Texture2D b : register(t0); // WARNING: overlapping bindings ``` This change adds a way to locally disable that warning using an attribute: ```hlsl [allow("overlapping-bindings")] Texture2D a : register(t0); [allow("overlapping-bindings")] Texture2D b : register(t0); // OK ``` Note that as a policy decision, the implementation requires `[allow("overlapping-bindings")]` on both declarations in order to disable the warning, under the assumption that the behavior should be strictly opt-in, and not silently affect a programmer who adds a new shader parameter with no knowledge or expectation of possible overlap. The `[allow(...)]` attribute is intended to be a fairly generally mechanism for disabling optional diagnostics within certain scopes (e.g., for the body of a function definition), but as implemented in this change it is quite restrictive: * Only the single name `"overlapping-bindings"` will be recognized, and this name cannot be used with, e.g., a `-W` flag on the command line to enable/disable the same diagnostic, or turn it into an error. Adding more cases would be easy enough, but wiring it up to command-line flags could be trickier. * Only the code that checks for parameter binding overlap is currently checking for `[allow(...)]` attributes, so it is not "wired up" to enable/disable any others. Doing this systematically would ideally involve something in `diagnose()`, but there could be complications to a systematic approach (finding the AST node(s) to use when searching for `[allow(...)]`. On gotcha here is that versions of Slang without this feature will error out on the `[allow(...)]` attribute since they don't understand it, and if we add future diagnostics that it covers then old compiler versions will (as written) error out on a diagnostic they haven't heard of rather than just assume the `[allow(...)]` attribute doesn't apply to them. These kinds of issues can and should be addressed in future changes.
*	Start exposing a new COM-lite API (#987)	Tim Foley	2019-06-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Start exposing a new COM-lite API This change is mostly about exposing a new API to the Slang compiler that allows more fine-grained control over the compilation flow. The basic concepts in the new API are: * An `IGlobalSession` is the granularity at which we load/parse the Slang stdlib, and therefore gives applications a way to amortize startup cost for the library across multiple compiles. This is a concept that might be able to go away in a future version of Slang. * An `ISession` owns all the code that gets loaded/compiled/generated. Any `import`ed modules are shared across everything in a session (we don't re-parse/-check the code when we see another `import` for the same module). Any generic- or interface-based code in the session can be specialized using types from the same session (but not necessarily across sessions). * An `IModule` is the unit of code loading and scoping. It doesn't expose any API in this change, but would be the right scope for looking up types or entry points by name. * An `IProgram` is a "linked" combination of modules and entry points from which code can be generated and reflection information queried. This change re-uses the existing reflection API types, rather than introduce a new API that duplicates that functionality. That will probably change in a future revision. There are two major pieces of functionality added here that aren't related to the new API: * We now have an API concept of "entry point groups" which are one or more entry points that are intended to be used together so that they need to have non-overlapping parameters. For now this is being used to handle "hit groups" and local root signatures for ray tracing, but I'm not sure this is a concept we will keep in the long run. * We have a very special-case (client-application-specific) flag that ascribes special meaning to the `shared` keyword, so that it can be attached to global parameters to indicate that they are actually to be part of the local root signature rather than the global one for DXR. None of the API design (including naming) here is finalized; the only reason to check in the changes at this point to avoid having a long-running branch that leads to merge pain. Clients should not try to depend on the new API just yet, since it is still a work in progress. * fixup: clang warning * fixup: try to detect clang C++11 support * fixup * fixup * fixup * fixup * fixup: review feedback
*	Use slang- prefix on slang compiler and core source (#973)	jsmall-nvidia	2019-05-31
	* Prefixing source files in source/slang with slang- * Prefix source in source/slang with slang- prefix. * Rename core source files with slang- prefix. * Update project files. * Fix problems from automatic merge.