slang.git - Making it easier to work with shaders

	Commit message (Collapse)	Author	Age
...
*	Support for HLSL `export` (#2223)	jsmall-nvidia	2022-05-05
\| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * Add support for HLSL `export`. * Test for using `export` keyword.
*	Preliminary Liveness tracking (#2218)	jsmall-nvidia	2022-05-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * WIP tracking liveness. * Skeleton around adding liveness instructions. * Calling into liveness tracking logic. Adds live start to var insts. * Liveness macros have initial output. * Looking at different initialization scenarios. * Some discussion around liveness. * WIP for working out liveness end. * WIP Updated liveness using use lists. * Is now adding liveness information * Some small fixes. * WIP around liveness. * Seems to output liveness correctly for current scenario. * Tidy up liveness code. * Update comment arounds liveness to current status. * Small fixes to liveness test. * Add support for call in liveness analysis. * Improve liveness example with array access. * Small updates to comments. * Disable liveness test because inconsistencies with output on CI system. * Fix some issues brought up in PR. * Rename liveness instructions.
*	Fix the way IR "regions" store conditions (#2216)	Theresa Foley	2022-04-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As part of generating high-level-language code, we have a pass that builds a data structure representing structured control-flow `Region`s and their nesting relationship. That data structure is then used when emitting control-flow statements for the body of a function. There are `Region` subtypes coresponding to different kinds of control flow constructs. Both the `IfElseRegion` and `SwitchRegion` subtypes were implemented to store a reference to the branch condition direclty in the `Region`. This turns out to be the root cause of the problem. After the nested `Region` structure is constructed, we have an IR pass that uses the region hierarchy to detect and fix problems where the implicit "scoping" rules of SSA form are incompatible with the scoping rules that will be in effect when those regions are emitted as high-level-language control-flow statements. A bug arose when one of the SSA values that required the scoping fix was the branch condition of an `if` statement. While the IR pass did what it was supposed to and replaced the operand to the `IRIfElse` instruction, doing so did not change the cached condition in the corresponding `IfElseRegion`, and thus didn't effect the way code got emitted for the `if(...)` condition in HLSL. The fix here is simple: the relevant `Region` subtypes now store a pointer to the relevant control-flow instruction rather than to the branch condition. The emit logic can thus fetch the correct condition from the control-flow instruction at the time it emits an `if` or `switch`. Note: We do not need to have the same worries around the `IRIfElse` or `IRSwitch` instructions, nor for the `IRBlock`s that the `Region`s still store. The passes that come after the `Region`s get created are not supposed to alter the CFG in any way, because otherwise they would risk changing/invalidating the `Region` structure. Similarly, this change doesn't modify the `IRInst`s refernced in the `Case`s for a `SwitchRegion` under the assumption that these must always be literal integer constants, and thus cannot be changed out. Co-authored-by: jsmall-nvidia <jsmall@nvidia.com>
*	Callable shader fix and explicit payload locations for GLSL (#2185)	Alexey Panteleev	2022-04-13
\| \| \| \| \| \| \| \|	* Fixed the callable shader payload type for GLSL. * Added location parameters to the __vulkanRayPayload and __vulkanCallablePayload attributes. The default value is -1 which means use the old auto-assignment logic. * Fixed the vkray/callable-caller test.
*	Support `[DllImport]` (#2181)	Yong He	2022-04-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Support `[DllImport]` * Fix. * Fix. * Fix array type emit in cpp. * Fix. * Fix. * Fix Co-authored-by: Yong He <yhe@nvidia.com>
*	Refactor: eliminate BackEndCompileRequest (#2178)	Theresa Foley	2022-04-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	An earlier refactoring pass over the compiler codebase split the type that had been called `CompileRequest` into three distinct pieces: * `FrontEndCompileRequest` which was supposed to own state and options related to running the compiler front end and producing IR + reflection (e.g., what translation units and source files/strings are included). * `BackEndCompileRequest` which was supposed to own state and options related to running the compiler back end to translate the IR for a `ComponentType` (program) into output code. (Note that the `BackEndCompileRequest` was conceived of as orthogonal to the `TargetRequest`s, which store per-target and target-specific options.) * `EndToEndCompileRequest` which was an umbrella object that owns separate front-end and back-end requests, plus any state that is only relevant when doing a true end-to-end compile (such as the kinds of compiles initiated with `slangc`). As originally conceived, the only state that this type was supposed to own was stuff related to "pass-through" compilation, as well as state related to writing of generated code to output files. That refactoring work was very useful at the time, because it allowed us to "scrub" the back end compilation steps to remove all dependencies on front-end and AST state (this was important for our goals of enabling linking and codegen from serialized Slang IR). At this point, however, it is clear that the hierarchy that was built up serves very little purpose: * The `BackEndCompileRequest` type is only used in two places: * As part of an `EndToEndCompileRequest`, where the settings on the `BackEndCompileRequest` can be configured, but only through the `EndToEndCompileRequest` * As part of on-demand code generation through the `IComponentType` APIs. In this case, the settings stored on the `BackEndCompileRequest` are not accessible to the application at all, and will always use their default values, so that instantiating a "request" object doesn't really make any sense. * The `FrontEndCompileRequest` type has a similar situation: * Front-end compilation as part of an `EndToEndCompileRequest` supports user configuration of `FrontEndCompileRequest` settings, but only through the `EndToEndCompileRequest` * Front-end compilation triggered by an `import` or a `loadModule()` call does not support user configuration of settings at all. It will always derive all relevant settings from thsoe on the session ("linkage"). In addition, subsequent changes have been made to the compiler that show a bit of a "code smell" and/or forward-looking worries for this decomposition: * In some cases we've had to add the same setting to multiple types in the breakdown (front-end, back-end, end-to-end, linkage, target, etc.) which makes it harder for us to validate that all the possible mixtures of state work correctly. * Related to the above, in some cases we have manual logic that copies state from one of the objects in the breakdown to another, in order to ensure that the user's intention is actually followed. * As a forward-looking concern, it seems that developers have sometimes added new configuration options and state to places that don't really make sense according to the rationale of the original decomposition (e.g., we probably don't want to have a lot of state that is only available via end-to-end requests, given that the API structure is meant to push users away from end-to-end compiles). As a result of all of the above, I've been planning a large refactor with the following big-picture goals: * Eliminate `BackEndCompileRequest` * Move all relevant state/options from the back-end request to the end-to-end request, since that is the only place they could be set anyway. * Introduce a transient "context" type to be used for the duration of code generation that serves the main functions that back-end requests really served in the codebase * Make `EndToEndCompileRequest` be a subclass of `FrontEndCompileRequest` * Consider addding a transient "context" type for front-end compiles that can be used in `import`-like cases rather than needing a full front-end request object. If this works, then eliminate `FrontEndCompileRequest` and be back to world with just a single `CompileRequest` type * Move all compiler configuration options to a distinct type (named something like `CompilerConfig` or `CompilerOptions` or whatever) which stores setting as key-value pairs, and has a notion of "inheritance" such that one configuration can extend or build on top of another. Make all the relevant types use this catch-all structure instead of redundantly storing flags in many places. This change deals with the first of those bullets: removeal of `BackEndCompileRequest`. The addition of the `CodeGenContext` type is perhaps an unncessary additional step, but making that change helps clean up a bunch of the code related to per-target code generation, so I think it is the right choice. Co-authored-by: Yong He <yonghe@outlook.com>
*	Allow slangc to generate exe from .slang file. (#2170)	Yong He	2022-03-28
\|
*	Fix type truncation during SCCP. (#2163)	Yong He	2022-03-18
\|
*	Improved SCCP, inlining and resource specialization passes, legalize ↵	Yong He	2022-02-25
\| \| \| \|	`ImageSubscript` for GLSL (#2146)
*	Add target option to force `scalar` layout for storage buffers. (#2135)	Yong He	2022-02-17
\| \| \|	Co-authored-by: Yong He <yhe@nvidia.com>
*	Fixed naming conflicts in heterogeneous-hello-world (#2114)	David Siher	2022-02-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Fixed naming conflicts in heterogeneous-hello-world Added 3 new modifiers (`__unmangled`, `__exportDirectly`, `__externLib`) `__unmangled` causes mangleName() to return the normal name of the decl. `__exportDirectly` changes parent decl name concatenation behavior to use "::" instead of "." (for Name Hint) and emits the name hint when it exists, otherwise it emits the mangled name. `__externLib` stops Slang from emitting the corresponding struct. Also made necessary changes to heterogeneous-hello-world so that this new functionality is shown off. * Undo unintentional formatting changes Co-authored-by: Yong He <yonghe@outlook.com>
*	Add support for HLSL unorm/snorm (#2095)	Theresa Foley	2022-01-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Read/write resource types (what D3D/HLSL often refer to as UAVs) can be broadly categorized based on whether they require an underlying format (e.g., a `DXGI_FORMAT`) for reads, or not. D3D refers to the ones that require a format as "typed" UAVs (even though a `RWStructuredBuffer<MyData>` is clearly "typed" at the HLSL level). Vulkan refers to these cases as "storage images" and "storage texel buffers." Under the D3D model, an application does not have to specify the exact format for a formatted/"typed" UAV in order for loads to work, but it does need to specify if an HLSL resource with a declared `float` or vector-of-`float` element type will be backed by data with a `_UNORM` or `_SNORM` format. This is where the `unorm` and `snorm` type modifiers come in. Superficially, it might seem that adding this feature to the Slang compiler is "just" a matter of adding the two modifiers, which is easily done with a pair of one-line `syntax` declarations in `core.meta.slang` plus the corresponding AST node types. Unfortunately the superficial view misses the detail that, to date, Slang has not had any support for type modifiers at all, and has only supported declaration modifiers. The distinction has so far not mattered, even with modifiers like `const` because, e.g., the difference between a "`const` array of `float`" and an "array of `const float`" doesn't really matter. So, adding these two modifiers required introducing a lot of infrastructure along the way. Let's walk through what needed to happen: * As described above, the actual `syntax` was added easily in the Slang stdlib * I added a new subclass of `Modifier` for `TypeModifier`s in the AST, and added the AST nodes for `unorm` and `snorm` as subclasses of that. * In order to syntactically support modifiers applied to types (e.g., `unorm float`), I needed to add a `ModifiedTypeExpr` subclass of `Expr` that represents a base type expression with one or more modifiers applied * The parser needed some subtle new logic. There are two main cases where type modifiers will come up: 1. In contexts where we might be parsing a declaration (e.g., `const unorm float a`), we need to support a list of modifiers that might freely mix type modifiers and "declaration modifiers" which are not intended to apply to types. In this case we need to split the lis tof modifiers into the type-related ones and the declaration-related ones, and attach each subset to the appropriate place. This is very important for features like C-style pointers, where in `static const float* a;`, the `static` modifier applies to the entire declaration of `a`, but the `const` modifier only applies to the `float` type specifier, and not to the outer pointer type (the actual type of `a`). 2. In contexts where we are not parsing a declaration (e.g., a generic type argument), we need to support a list of modifiers and appy them all to the type specifier being parsed, even if some of them might not be appropriate. * While working in the parser I implemented a certain amount of unrelated cleanup for code that was using raw `Modifier`s to represent lists of modifiers, instead of the purpose-built `Modifiers` type. The `_parseGenericArg` case needed specific work, because it is an important case in the grammar where we need to parse either a type expression or a value exprssion, but cannot easily predict which we will see. The fix implemented for now is to always try to parse modifiers and, if we see any, to assume we are in the type case. Because of the rules for how modifiers in a C-like language inhere to the type specifier (and not necessarily the entire type), we need to refactor some of the type expression parsing routines to support parsing a "suffix" of a type expression. * Note: I decided to be conservative and only make these changes in `_parseGenericArg` because that is place that is needed in order for user code with `unorm`/`snorm` to work, but in practice a user could still confuse our parser by using type modifiers as part of a cast (e.g., `x = (unorm float)y;`). While there is currently no reason why a user should want to do this, it does suggest that we need to be prepared to see type modifiers in other ambiguous "expression or type?" contexts. We have so far preferred to avoid looking up built-in syntax declarations like modifiers in expression contexts, because we want to allow users to create variable names that might conflict with some of the more surprising modifier keywords in HLSL (e.g., both `triangle` and `sample` are modifier keyword). A nuanced strategy may be required when we get around to closing this gap (which will be needed around when we want full pointer support, since a cast like `(const SomeType)somePtr` is pretty common). In semantic checking, we now need a `visitModifiedTypeExpr`, which visits the base expression to produce a `Type` and then checks each of the `Modifier`s attached to it. During this process we need to translate the AST-level `Modifier`s into something that can exist properly in the universe of `Type`s. We introduce a `ModifiedType` subclass of `Type`, distinct from the `ModifiedTypeExpr` subclass of `Expr`. Furthermore, we introduce a `ModifierVal` subclass of `Val`, distinct from `Modifier`/`TypeModifier`. * One unfortunate thing here is that it means we have both, e.g., `UNormModifier` to represent the parsed syntax, and `UNormModifierVal` to represent the `Type`/`Val`-level representation of the same concept. It is quite likely that we are near the point where we can/should consider having two distinct AST representations: one for freshly-parsed ASTs and one for semantically-checked ASTs. The `Type`/`Val` hierarchy clearly belongs to the latter. * No actual semantic checking is currently being applied to the `unorm` and `snorm` modifiers, although we should in principle check that they are only being applied to `float` and vector-of-`float` types. * In an attempt to simplify some of the creation logic and build a tiny bit of reusable infrastructure, I went ahead and added the skeleton of a dedupe-caching system in `ASTBuilder` so that we can easily ensure only a single `UNormModifierVal` and a single `SNormModifierVal` ever get created inside the scope of a single builder. * TODO: Thinking about this, I'm now worried the deduplication does not mean I can make the simplifications I currently do in semantic checking by assuming that any two `UNormModifierVal`s will be pointer-identical. This is because we do not currently (IIRC) have the required "bottleneck" in the compiler where all ASTs get serialized after initial checking, and then deserialized when `import`ed into a downstream module, so that every AST node during a checking step comes from a single `ASTBuilder`. Hmm... * If we can rely on deduplication to do its thing, then the `Val` and `Type` implementations of modifiers can be relatively simple. * TODO: One issue here is that the equality comparison for `ModifiedType` currently checks for the same base type and the same modifiers in the same order. This works for now when we only have a small number of type modifiers and any given type will hae at most one, but in the longer run it relies on us to implement some kind of canonicalization scheme, which would both ensure that between `Modified(T, {A, B})` and `Modified(T, {B, A})` only one is allowed (that is, a canonical ordering on modifiers), and that we do not allow `Modified(Modified(T, {A}), {B})`. * TODO: One other issues is that the `ModifiedType` case does not currently interact correctly with the `as()`-based casting for types (whereas that operation does interact in a semantically-correct fashion with `typedef`s). Fixing this issue in a robust way really depends on us re-architecting the `Type` system so that any `Type` can have modifiers attached, with modifiers affecting type identity/deduplication. * The key place where `ModifiedType` creates a complication in semantic checking is type conversion/coercion. A user is likely to declare a `RWTexture2D<unorm float>`, fetch from it (producing a value of type `unorm float`) and then assign the result to a `float` variable, prompting for a conversion from `unorm float` to `float` (because they are distinct `Type`s). * We handle this case in the core `_coerce()` operation by checking if either `toType` or `fromType` is a `ModifiedType`. If either one is a modified type, we apply logic to check for modifiers that are present on one and not the other. Basically we check which modifiers need to be "dropped" and which need to be "added" during conversion, and validate that these modifiers can be dropped/added without creating a semantic error. The only type modifiers we support right now can be dropped/added like this, so we are fine. * TODO: When we add more complete pointer support, we could need logic here to validate when casts between, e.g., `const int` and `int` should/shouldn't be allowed. * Note: Even opening the door to type modifiers at all creates the same kind of challenges for user-defined generic types (and functions!) since `MyType<int>` and `MyType<const int>` are distinct instantiations in a future where we support `const` as a type modifier. We may need to plan to restrict where modified types can be used, so that certain built-in generic types support modified types as arguments, but user-defined types don't (or at least might need to opt-in to get support). * The result of a `_coerce()` that drops/adds modifiers is a `ModifierCastExpr`, which is a kind of no-op AST node that merely expresses that the conversion is allowed and valid. * In IR lowering we currently do the simple thing and translate a `ModifiedType` to a distinct IR node called `AttributedType`. * The change in terminology from "modifier" to "attribute" is to follow the way that these kinds of modifiers best map to the `IRAttr` case in the IR (rather than the `IRDecoration` case). We probably ought to do a careful terminology scrub here, because having this terminology mismatch between IR and AST could be a source of confusion. * TODO: In principle, using `IRAttributedType` creates the same basic problems as using `ModifiedType`: code that is usin `as()` or similar operations to check for a specific subclass of `IRType` may not see the case they were looking for due to use of `IRAttributedType`. * Initially I had hoped to avoid the problem by having the `IRAttr`s be attached directly as operands to an otherwise-ordinary `IRType`. E.g., a lowered `unorm float4` would be an `IRVectorType` with an "extra" operand that is an `IRUNormAttr`, something like: `Vector<Float, 4, UNorm>`. This sounds great (and looks great!), but runs into the problem that it is incompatible with the way we currently represent things like generic type parameters. A generic type parameter `T` is represented as an `IRParam`, and it does not make sense to have an additional `IRParam` to represent `const T` or `unorm T`, etc. * The Right Way to solve this stuff at both the AST and IR levels is to avoid passing around bare `Type` or `IRType` in general, and instead use a value type that implements the needed policy more directly: something like a `TypeHolder` or `IRTypeHolder` (placeholder name). The `Holder` type would abstract over the various "wrapper" nodes required to store all the additional data like attributes but, importantly, would not* allow that extra information to be dropped or lost during operations like casting (e.g., note how the current `Type` implementation of `as()` loses information on `typedef` names, making our error messages slightly worse). This is actually quite similar to how we currently use the `DeclRef<T>` system to allow working with what is usually a `T` under the hood, but in a way that ensures we don't lose track of any generic substitution information. During C-like code emit we have a process that turns an `IRType` into a chain of declarators as needed to emit a C-like declaration with pointers, arrays, etc. The `IRAttributedType` case needs to get folded into this logic. Basically, when we see an `IRAttributedType` we immediately emit any modifiers that are required to be in a prefix position, then recursively emit the underlying type with an extra layer of declarator that tracks the modifiers, so that we can emit any modifiers that should be placed in a postfix position after the type. As a specific example, our C/C++ back-end would want to use the postifx option to handle `const`, because then it can properly emit stuff like `int const * const ` and not the incorrect `const const int`. The HLSL emit logic overrides the prefix case for handling type attributes, and uses it to emit `unorm` and `snorm` where they occur. * One unfortunate detail is that (apparently) some downstream HLSL compilers do not allow the `unorm`/`snorm` modifiers to apply to `vector<float, >` types, even though that should be semantically valid. Instead, they only support `float`, `float2`, `float3`, and `float4` explicitly. To work around this issue, we go ahead and change our HLSL emit logic so that when we encountered 1-to-4 component vectors of `float`, `int`, or `uint` we emit the type name using the typical HLSL shorthand. This is actually a signficicant change in our HLSL output, but it both seemed like a good fix to have anyway, and was also the only obvious way to address the downstream parser shortcomings without a massive kludge. As a result of this change the `half-texture.slang` test broke, since it was using raw HLSL as the expected output. I changed the test to do a DXIL comparison instead, which is our preferred way of testing cross-compilation behavior (since it is more robust in the face of small changes to our source output).
*	Fix for issue #2069 (#2082)	jsmall-nvidia	2022-01-18
\| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * Fix unused initialized variable warning. Fixed typo found in issue 2069 causing g++11 error of access through this.
*	Add GLSL450 intrinsics to SPIRV direct emit. (#1921)	Yong He	2021-08-17
\| \| \| \| \| \| \| \| \| \| \| \| \|	* Add GLSL450 intrinsics to SPIRV direct emit. * Fix. * Fix compiler error. * Fix. * Fix compiler error. * Make direct-spirv tests actually run.
*	Further implementation of SPIRV direct emit. (#1920)	Yong He	2021-08-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Further implementation of SPIRV direct emit. This change implements: - Struct, Vector, Matrix and Unsized Array types. - Basic arithmetic opcodes, vector construct, swizzle etc. - getElementPtr, getElement, fieldAddress, extractField. - SPIRV target intrinsics with SPIRV asm code in stdlib. - RWStructuredBuffer and StructuredBuffer. - Pointer storage class propagation. - Control flow. * Fix.
*	Handle unexpected character in mangled names (#1919)	Theresa Foley	2021-08-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change is related to a case where a user saw a `/` character printed as part of a structure field name in output HLSL (which obviously broke downstream compilation). The root cause in that case was that a module that was `import`ed was found via a file system directory path, and thus the module name that the Slang compiler formed included a `/`. Due to other factors the structure field was emitted based on its mangled name (missing name hint), and that meant the bare `/` escaped into the output. That situation implies some other things maybe are questionable: * Could we have avoided a `/` ending up in the module name to begin with? Maybe, but we kind of want to support non-ASCII characters in names in the long run. * Could we have fixed whatever was causing the name hint to be dropped? Maybe, but we don't ever want name hints to be required for valid compilation (hence why they are "hints"). Even if we investigate these other issues, it is important that the name mangling scheme only ever produce output that uses a constrained subset of ASCII that we expect most compilation targets can support (GLSL being an annoying outlier because of its rules around `_`). This change adds a path where when mangling a `Name` as part of a mangled symbol name we check for the presence of bytes that aren't allowed and then produce an escaped form instead if needed. Note that the code is still byte-oriented for now aothough in the long-run it might want to be oriented around Unicode code points or extended grapheme clusters.
*	Various fixes to CUDA backend. (#1877)	Yong He	2021-06-08
\| \| \| \| \| \| \| \| \| \| \| \|	- Fix emitting `StructuredBuffer<ISomething>::Load`, which triggers emitting for `IROp_WrapExistential` that is previously unhandled. - Fix cuda layout around vectors, they should be aligned to 1,2,4,8,16 bytes instead of just using element type's alignment. That means `float4` has alignment of 16 instead of 4. - Fix `SLANG_CUDA_HANDLE_ERROR` macro definition. - Fix navis sometimes fail to find `Slang::kIROp_*` enum values when debugging external projects. Co-authored-by: Yong He <yhe@nvidia.com> Co-authored-by: jsmall-nvidia <jsmall@nvidia.com>
*	Added compiler-core project (#1775)	jsmall-nvidia	2021-04-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * Split out compiler-core initially with just slang-source-loc.cpp * More lexer, name, token to compiler-core. * Split Lexer and Core diagnostics. * Move slang-file-system to core. * Add slang-file-system to core. * More DownstreamCompiler into compiler-core * Fix typo. * Add compiler-core to bootstrap proj. * Small fixes to premake * For linux try with compiler-core * Remove compiler-core from examples. * Added NameConventionUtil to compiler-core * Add global function to CharUtil to hopefully avoid linking issue. * Hack to make linkage of CharUtil work on linux.
*	Add GLSL/SPIR-V support got GetAttributeAtVertex (#1733)	Tim Foley	2021-03-03
\| \| \| \| \| \| \|	This change allows varying fragment shader inputs to be declared in a way that allows the `GetAttributeAtVertex` operation to compile to valid code for both D3D and GLSL/SPIR-V/Vulkan. The key is that rather than just use ordinary `nointerpolation`-qualified inputs the code must declare these varying inputs with a new `pervertex` qualifier that marks them as only being usable with `GetAttributeAtVertex`. The `pervertex`-tagged inputs then translate to GLSL inputs using the `pervertexNV` qualifier Note that this change does not include any enforcement of the requirements around how these qualifiers are used (and the compiler doesn't have enforcement for the existing operations like `EvaluateAttributeAtCentroid`). The underlying problem is that the inerpolation-mode qualifiers and explicit interpolation functions in HLSL constitute a kind of rate-qualified type system, but without any systematic rules. It seems wasteful to encode a bunch of ad hoc rules for this stuff as special cases in the compiler when the clear right answer is to implement a systematic approach to rates.
*	Clean up declarator handling during source emit (#1732)	Tim Foley	2021-03-02
\| \| \| \| \|	This change tidies up some code related to the handling of declarators for the purpose of "unparsing" types into C-like declarations. The big change is that the `EDeclarator` type is changed to `DeclaratorInfo` and now has a bit of a subtype hierarchy under it rather than just using a `union`. The declarations have been moved to the header for CLikeSourceEmitter` so that they can be used by subclasses. I also removed the `IRDeclaratorInfo` type that was being declared but never actually used, and moved the case for pointers from that type into the main `EDeclarator`/`DeclaratorInfo`.
*	Fix issue with long identifier names in GLSL output (#1731)	jsmall-nvidia	2021-03-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * First pass at handling 'names' that are too long in GLSL output. * Test to check functionality with very long func name. * Add access a long names buffer. * Fix typo in assert. Fix issue with coercion error for 1.0f / 0x7fffffff Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
*	Add support for GetAttributeAtVertex for D3D (#1725)	Tim Foley	2021-02-24
\| \| \| \| \| \| \| \| \| \|	This operation was added along with the `SV_Barycentrics` system-value input, and allows for a `nointerpolation` varying input to a fragment shader to be fetched at a specific vertex index within the primitive that is causing the fragment shader to be invoked. This change adds support for the new operations in the standard library, and also includes a test case to make sure that we emit it correctly when producing HLSL/DXIL. This change also includes a small bug fix to our emission logic for function parameters so that we properly emit layout-related attributes for varying parameters declared directly on an entry point. (Note that most attribute end up being declared in `struct` types in existing HLSL shaders, and our IR passes produce only global variables for attributes on GLSL; the only case this affects is inidividual scalar/vector attributes declared declared as entry-point parameters, when outputting HLSL) Note that this change only adds support for the new function on the HLSL/DXIL path, and doesn't yet add any cross-compilation support for GLSL/SPIR-V. The reason for this is that the equivalent GLSL feature(s) appear to use a different model to the HLSL version, and we need to invent a suitable approach to align them to make portable code possible.
*	More #line improvements (#1713)	jsmall-nvidia	2021-02-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * WIP: First pass in supporting output of line error information. * Add support for lexing to better be able to indicate SourceLocation information. * Fix lexer usage in DiagnosticSink in C++ extractor. * Update diagnostics tests to have line location info. * Fixed test expected output that now have source location information in them. * Better handling of tab. * Fix test expected results for tabbing change. * DiagnosticLexer -> DiagnosticSink::SourceLocationLexer Added line continuation tests. * Fix typo. * Added String::appendRepeatedChar * Change to rerun tests. * Added source locations to IR dumping. * Output column for IR dump source loc. * Add support for closing brace location to AST. Use closing brace location in lowering when adding return void. * Set the source location through SourceLoc - simplifies identifying if current loc is valid. * Copy terminator sloc. * Test for improved #line handling. * Made writer the last parameter for dumpIR. Small improvements to comments. * Disable sloc output on dump IR by default. * Fix issue with #line and inlining. * Fix for output with improved #line output. * Small comment change - mainly to kick off TC build. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
*	Add an accessor for IRInst opcode (#1707)	Tim Foley	2021-02-16
\| \| \| \| \| \| \| \| \|	* Add an accessor for IRInst opcode This main changing is renaming `IRInst::op` over to `IRInst::m_op` and then adds an accessor `IRInst::getOp()` to read it. The rest of the changes are just changing use sites to `getOp` (or to `m_op` in the limited cases where we write to it). This work is in anticipation of a future change that might need to store an extra bit in the same field as the opcode. It seemed better to do this massive refactoring as a separate PR. * fixup
*	Initial support for DXR payload access qualifiers (#1705)	Tim Foley	2021-02-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change adds initial support for a feature being proposed for inclusion in dxc: https://github.com/microsoft/DirectXShaderCompiler/pull/3171. The main features are: * A `[payload]` attribute that indicates which `struct` types are intended to be used as payloads. Consistent use of this attribute should mean that an application no longer needs to manually specify a maximum payload size when creating a ray-tracing pipeline. * `read(...)` and `write(...)` qualifiers which can be attached to fields of `struct` types (usually `[payload]`-attributed types) to indicate which ray tracing pipeline stages are allowed read/write access to that part of the payload. Use of these qualifiers should allow an implementation to optimize storage of ray payload elements across RT pipeline stages. The work in this change just adds basic parsing for these features, translation to matching IR decorations, and then emission of HLSL text based on those decorations. Notable gaps in this first change include: * No work is currently being done to validate access to ray payloads in RT entry points based on these qualifiers. * The stage names in `read(...)` and `write(...)` are not being validated, and are being stored in the IR as text. These should probably use the `Stage` enumeration in some fashion, but we would need to have a way to encode the additional `caller` pseudo-stage that the feature uses. * No work is currently being done to adjust or react to the chosen shader model when emitting HLSL code. We should either have these attributes force a switch to a higher shader model, or skip emission of these attributes if the chosen shader model / profile does not imply support for them. * No tests are currently included for this work, because tests would rely on using a custom `dxcompiler.dll` build with the new feature supported.
*	Fix line offset problem (#1690)	jsmall-nvidia	2021-02-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * WIP diagnostics for line number output. * Small param naming change * Use x macro for pass through compile human name lookup/getting. * WIP on parsing downstream compiler output. * Split out parsing into ParseDiagnosticUtil. Added test result of single line. * Dump out the std output on fail to parse diagnostics. * Change test type for syntax-error-intrinsic.slang be TEST not TEST_DIAGNOSTIC * Use Index for StringUtil. * WIP: First pass support for parsing Slang diagnostics. * WIP Testing comparing with ParseDiagnosticUtil with previous ad-hoc mechanism. * Use the new parsing mechanism for diagnostic comparisons. * Fix layout on GLSL, doesn't have CR so runs into main. * Split out switch on outputting intrinsic 'specials'. Output code around intrinsic as emit - so that we get the appropriate indenting (and potentially other benefits). * Improvements to diagnostics parsing. Better error handling, and fallback handling. Added ability to parse downstream compilers without a prefix. Added ability to parse Slang with a prefix. * DownstreamDiagnostic::Type -> Severity and related fixes. * Small fixes around moving from DownstreamDiagnostic::Type -> Severity * Fix handling of 'special intrinsic' expansion * Split out the handling of intrinsic expansion into it's own type and files. * Fixes to reading expected output - for SimpleLine test. * Test using += to check #line output. * A test around += and return. * Small comment fixes. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
*	Fix issue when passing ray query to a subroutine (#1680)	Tim Foley	2021-01-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The problem would manifest for any code that declared a DXR 1.1 `RayQuery` value, but then only used it as one location in their code. The most common way for this to arise in user code was declaring a `RayQuery` and then handing it off to a helper/worker subroutine. RayQuery<0> myRayQuery; helperRoutine(myRayQuery, ...); The root cause was in the emit logic, where the initialization of `myRayQuery` above (a `defaultConstruct` operation in our IR) was getting folded into its (only) use site. This folding makes some sense, because the initialization of a ray query is not an operation with side effects, but doesn't work in practice because our way of handling default construction in HLSL output is by using a variable declaration. The simple fix here is to ensure that `defaultConstruct` instructions never get folded into use sites. If we decide to revisit the logic here, it might be possible to separate out the case where a `defaultConstruct` is being used as a stand-alone instruction, where we can emit it as: RayQuery<0> myRayQuery; versus cases where the `defaultConstruct` is being used as a sub-expression, such as: helperRoutine(RayQuery<0>(), ...); Whether or not we can emit the latter form (or if it would be equivalent) depends on details of how constructors like this are being implemented in dxc. For now it seems safest to emit things in a form that is obviously expected to work. Aside: Historically, the HLSL language has had no notion of "constructors" as being a thing. A variable that is declared but not initialized in HLSL has always been left uninitialized, since the first version of the language. The `RayQuery` type in DXR 1.1 is the first example of a type that appears to have a C++-style "default constructor," although HLSL as implemented by dxc still does not expose constructors as a user-visible or documented feature. (There is the small detail that the DXR 1.0 `HitGroup` type also relied on C++ constructor syntax, but I'm not aware of anybody using that feature right now, so it is mostly a curiosity.)
*	Obfuscation naming issue fix (#1676)	jsmall-nvidia	2021-01-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * Work around for issue with obfuscation (and lack of name hints) leading to names in output not being correctly uniquified. * Improve appendChar Remove unrequired memory juggling to scrub names. * Remove test code. * Small fixes in comments and method called. * Remove linkage decoration on functions that are specialized. * Obfuscation naming with specialization test. * Fix instruction deletion. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
*	Add support for [noinline] attribute (#1650)	Tim Foley	2021-01-07
\| \| \| \| \| \| \| \| \| \| \|	This adds the `[noinline]` attribute to the front-end, and passes it through when generating HLSL output. Notes: * This change doesn't include a test since the dxc version I have locally parses `[noinline]` but then generates DXIL that fails validation. * This change doesn't include logic to handle `[noinline]` for other targets. Notably, SPIR-V has decorations that convey the same intention, but we don't yet take advantage of the GLSL extension(s) that would let us generate those decorations. * By necesstiy, `[noinline]` is only a "strong suggestion" and not actually something the compiler can ever guarantee/enforce.
*	Use "capability" system to select VKRT extension (#1647)	Tim Foley	2021-01-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Use "capability" system to select VKRT extension Slang currently supports translation of ray tracing shader code to Vulkan GLSL code that uses the `GL_NV_ray_tracing` extension. A multi-vendor equivalent of that extension has been released as `GL_EXT_ray_tracing` and we want Slang to support that extension as well. At the simplest, making the change from one extension to the other is just a matter of changing a few strings, since it does not appear that anything of significance was changed at the GLSL level (or even in SPIR-V). Where this gets trickier is when we have users who want us to support both extensions, and to be able to switch between them. The solution we've implemented here more or less amounts to: * If you don't tell the compiler which extension to use, it will default to `GL_EXT_ray_tracing` (the newer multi-vendor one). * If you explicitly want the older extension, you can opt into it using the `-profile` option or via a new API for explicitly adding capabilities to your target. Making that work required a few different kinds of changes: * The options parsing and public API needed ways to add optional capabilities to a target. * During GLSL code emit, we can check the capabilities that were added to the target to see if the `GL_NV_ray_tracing` extension was explicitly enabled and, if not, default to using the `GL_EXT_ray_tracing` names for things. This step is needed because some of the modifiers/attributes involved in the extension have to be handled explicitly in the code generator rather than implicitly as part of mapping intrinsic functions. * We add two different translations to the relevant operatiosn in the stdlib, one marked with each of the extensions. If profile/capability-based overload resolution can be relied on to pick the right one, this should Just Work. * Next, a bunch of work had to go into making capability-based overloading Just Work for the purposes of this change. There's been a nearly complete reworking of the implementation of `CapabilitySet` here to make it more suitable for our needs. * The tests that were using ray tracing translation for Vulkan needed to be updated. For some of them I updated their baselines to use `GL_EXT_ray_tracing` so that they can test the new path. For others, I updated the command line for the test case so that it explicitly opts into using `GL_NV_ray_tracing`. The result is that we have some coverage of each extension. I would have liked to have each test run in both modes, but our pass-through glslang support doesn't support `-D` options, so I couldn't take that step easily. This change does not add support for `GL_EXT_ray_query`, the extension that supports "DXR 1.1" style queries under Vulkan. Adding support for that extension should hopefully be a smaller step because it doesn't have the same multiple-extensions issue. This change does not address a lot of possible avenues for improvement or cleanup around the capability system. It focuses only on those changes that are necessary to make the ray tracing feature work and leaves the rest for future work. * fixup: infinite loop * Comment-only change to retrigger TC build
*	Heterogeneous Flag Error Visibility (#1642)	Dietrich Geisler	2020-12-18
\| \| \| \| \| \| \| \| \| \|	* PR to fix issue #1638. This change introduces a diagnostic sink to the emitModule function, and updates all associated calls to that function. Additionally, this commit updates the heterogeneous hello world example to not need the entry and stage flags for simplicity. * Updated emit-cpp per suggested changes Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
*	Add first steps toward a "capability" system (#1636)	Tim Foley	2020-12-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add first steps toward a "capability" system We already have cases in the stdlib where we mark declarations as being specific to certain targets, e.g.: ``` // My ordinary function to add two numbers. // Works everywhere. // void myFunc(int a, int b) { return a + b; } // On the "coolgpu" target, we can use a secret intrinsic // that adds numbers even faster! // __specialized_for_target(coolgpu) void myFunc(int a, int b) { return __secretIntrinsic(a, b); } ``` The existing logic for dealing with these modifiers (`__specialized_for_target` and `__target_intrinsic`) was almost entirely string-based. We would turn the chosen compilation target into a string, and then use that to try and search for the "best" definition of a function at a few steps: * During IR linking, we always pick one definition of an `[import]`ed function, and that definition will be the one with the "best" target-specialization modifier (if any) * During final code generation, we always look up the "best" target-intrinsic modifier, and use it as the template for the code we output. This change preserves the basic flow there, but replaces the ad hoc string-based logic with something a bit more principled, in terms of a new `CapabilitySet` type. A `CapabilitySet` represents a set of zero or more atomic features (here represented as `CapabilityAtom`s). What a `CapabilitySet` means depends on how and where it is used: * A compilation target implies a `CapabilitySet` where the contents of the set are the features the target supports. * A `CapabilitySet` attached to a declaration (or a modifier on that declaration) describes a set of feature that declaration requires. The current implementation of `CapabilitySet` is wasteful and inefficient, but that is something we can iterate on over time. In practice, most of the current code only ever uses capability sets that are either empty (because they represent a function with no specific requirements) or singleton (because they represent asingle atomic capability like "is a GLSL target," "is an HLSL target," etc.). The main goal here was to put in the skeleton of a new system, including some of the features it might need down the line, and then to leave changes that eventually use the greater flexibility for later. Eventually, the capability system should encompass: * Differences between shader model versions, GLSL versions, SPIR-V versions, etc. (currently tracked with other modifiers) * Optional extensions, and functions that are made available only with certain extensions (currently tracked with other modifiers) * Front-end checking that the call graph of a program doesn't violate any capability-requirements (e.g., having a GLSL+HLSL portable function call a GLSL-only subroutine) * Hypothetically we can also try to fold stage-specific (vertex-only, fragment-only, etc.) functions into this system, but doing so would require more linker cleverness if we allow overloading on stages (since we might have to clone a caller if it calls through to a callee with multiple stage-specific versions) One important complication that the system has to deal with just because of the "do what I mean" nature of the current compiler is that somethings a current Slang user might compile for target X and specify version N, but then use a function that actually requires version N+1 of that target. Currently the Slang compiler silently "upgrades" the version(s) used by user code in these cases, because it is often what users want in cross-compilation scenarios. Dealing with the "silent upgrade" situation requires us to be a little careful and sometimes pick a "best" capability set that doesn't appear to be supported on our target. Refining that system and potentially getting rid of the "do what I mean" behavior over time could be a goal for future changes. * fixup: handle case where value is incompatible during linking
*	Unify handling of static and dynamic dispatch for interfaces (#1612)	Tim Foley	2020-11-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Overview ======== Prior to this change, we had two different code generation strategies for interface/existential types in Slang, that didn't always play nicely together: * The "legacy" static specialization approach could handle plugging in an arbitrary concrete type for an existential type parameter (including types with resources, etc.), but wouldn't work well with things like a `StructuredBuffer<>` of an interface type, and requires somewhat counter-intuitive layout rules to make work. * The new dynamic dispatch approach produces simpler, more easily understood layouts by assuming that values of interface type can fit into a fixed number of bytes. The tradeoff there is that it cannot handle types that include resources (only POD types). The goal of this change is to make it so that the two strategies can co-exist. In particular, in cases where a shader is amenable to both static specialization and dynamic dispatch, the type layouts should agree. In order to make the type layouts agree, we: * Declare that all values of existential type reserve storage according to the dynamic-dispatch rules (so 16 bytes for the RTTI and witness-table information, plus whatever bytes are needed to story "any value" of a conforming type). * Then we modify the "legacy" layout rules so that if a value of concrete type can fit in the reserved "any value" space for a given interface, then it is laid out there exactly like the dynamic dispatch rules would do. Otherwise, we fall back to the previous legacy rules (since we don't need to agree with the dynamic-dispatch layout on types that can't be used with dynamic dispatch). Details ======= * Renamed `ExistentialBox` to `BoundInterfaceType` to better clarify how it relates to `BindExistentialsType` * Unconditionally apply the `lowerGenerics` pass during emit, since it is now responsible for aspects of the lowering of existential types when specialization is used. * Made IR type layout take the target into account, so that the layout of resource types can vary by target (e.g., being POD on some targets, and invalid on others) * Cleaned up some issues around using global shader parameters as the "key" for their layout information in the global-scope layout (only comes up when there are global-scope `uniform` parameters) * Made there be a default any-value size (16) instead of making it be an error to leave out. This was the simplest option; we could try to go back to having an error, but we'd need to only issue it if we are sure a type/interface is being used with dynamic dispatch, since static dispatch doesn't have to obey the restrictions. * Changed lowering of existential types to tuples so that bound interfaces where the concrete type won't fit use a "pseudo-pointer" instead of an "any-value" to hold the payload * Changed IR type legalization to handle the "pseudo-pointer" case and apply layout information from an interface type over to the payload part when static specialization was used. * Changed some details of how witness tables were being lowered, so that we didn't have to create "proxy" witness tables for the constraints on associated types (just use the actual requirement entries we generate) * Changed witness tables so that they know the subtype doing the conforming * Added logic so that we don't generate pack/unpack logic and witness table wrapper functions for types that are incompatible with any-value/dynamic dispatch for a given interface. * Changed the core AST-level type layout logic to use the dynamic-dispatch layout in case things fit, and the legacy static specialization case when things don't (while also reserving space for the dynamic-dispatch fields) * Changed a bunch of test cases for static specialization to properly use the new layout (which introduces new buffers in some cases, and moves data around in others). Future Work =========== The experience of trying to reconcile our older way of handling interface-type specialization with our newer model (that supports dynamic dispatch) makes it clear that we really need to make similar changes to our handling of generic type parameters on entry points and at the global scope. A future change should make it so that a global type parameter is lowered with a type layout similar to a value parameter of interface type, including the RTTI and witness-table pieces, and just leaving out the "any value" piece. A similar translation strategy should apply to entry-point generic parameters (mirroring how we lower generic functions for dynamic dispatch already), and value specialization parameters. Co-authored-by: Yong He <yonghe@outlook.com>
*	Support CUDA bindless texture in dynamic dispatch code. (#1575)	Yong He	2020-10-09
\|
*	Simplify workflow when using NVAPI (#1556)	Tim Foley	2020-09-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In some cases, functionality is available as either a GLSL extension for Vulkan/SPIR-V, or through the NVAPI system for D3D. This situation creates complications because while GLSL extensions are generally all supported by the open-source glslang compiler (which we can bundle and ship), NVAPI operations are exposed through a specific header (`nvHLSLExtns.h`) that ships as part of the NVAPI SDK. When a user wants to explicitly use NVAPI-provided operations in their shader code, there are no major complications for Slang; the user sets up their include paths, `#include`s the relevant header, calls functions in it, and lets Slang deal with the details of compilation. The challenge for Slang arises when we want to provide a cross-platform interface in our standard library (e.g., the `RWByteAddressBuffer.InterlockedAddF32` method that was recently added) that uses either a GLSL extension (when compiling for Vulkan/SPIR-V) or an NVAPI (when compiling to DXBC or DXIL). In that case, the code generated by Slang now has a dependency on NVAPI, and we need to somehow emit a `#include` directive that pulls it in when invoking fxc or dxc. Because we do not (and seemingly cannot) bundle the NVAPI header with the compiler, we have to rely on ther user to have it available and to somehow communicate to Slang where it is. Exposing portable routines that sometimes use NVAPI currently creates two main challenges: 1. The user is forced to interact with the "prelude" mechanism in the compiler, which allows the programmer to define code in a given target language that gets prepended to the Slang-generated code. While the prelude mechanism is powerful, it is also hard for users to integrate into their workflow, and our experience so far is that users want something that Just Works. 2. If the user writes code that uses some of our abstract operations that layer on NVAPI and they also want to use NVAPI explicitly, they end up with two copies of the NVAPI header (one included by the Slang front-end, and another included by the downstream fxc/dxc compiler). This puts the user in the situation of (a) having to ensure that they set the defines like `NV_SHADER_EXTN_SLOT` consistently both when invoking Slang and when adding their prelude, and (b) even if they do make the definitions consistent, they run into the problem that fxc/dxc complain about overlapping register bindings on the two copies of the `g_NvidiaExt` global shader paraemter that the NVAPI header declares. This change attempts to resolve both issues by adding a lot of "do what I mean" logic to the compiler to try to ease things in the common case. In particular: 1. The user no longer needs to use the "prelude" mechanism when using NVAPI. The compiler now embeds a default prelude for HLSL output, which will `#include` the NVAPI header if and only if the generated code needs NVAPI access because of portable standard library routines that were used. 2. The user can mix-and-match explicit NVAPI use and stdlib functions that compile to use NVAPI. The register/space to be used by NVAPI when included via prelude is now set based on whatever the user set via the preprocessor so that it should automatically be consistent between both cases. Furthermore, the code we emit for the declaration of `g_NvidiaExt` when compiling explicit NVAPI use is set up to be conditional, so that it is skipped in the case where the prelude will pull in its own declaration of that parameter. The way all this is achieved involves a lot of moving pieces: * We now have an HLSL prelude, which mostly just serves to `#include "nvHLSLExtns.h"` in the case where NVAPI support is needed downstream. * Standard library operations that require NVAPI for their implementation on HLSL include a new `[__requiresNVAPI]` attribute. * The preprocessor has been extended so that after tokenizing an input file it looks up the NVAPI-relevant macros in the resulting environment, and if they are set it attached a modifier (`NVAPISlotModifier1) to the AST `ModuleDecl` that is based on their values. Logic is added to detect if multiple input files specify values for the macros in ways that conflict. * The semantic checking step is extended so that it detects the "magic" NVAPI declarations (the `g_NvidiaExt` paramter and the `NvShaderExtnStruct` type that it uses) and attaches a modifier to them so that they can be identified as such in later steps. * Parameter binding is extended to collect a list of the AST modifiers that reflect NVAPI binding, and to reserve the relevant register(s) so that ordinary user-defined parameters cannot conflict with them. * IR lowering translates the three new AST modifiers related to NVAPI over to IR equivalents. * IR linking is extended to make sure that it clones any `IRNVAPISlotDecoration`s attached to the input modules. The pass intentionally does not care where the modifiers came from; it just collects them all and leaves it to downstream code to sort out what they mean. * Emit logic is extended to have a notion of "prelude directives" which are preprocessor directives that should come before the prelude in the generated code, because they can impact the way that the prelude compiles. This is done so that we don't have to introduce ad hoc logic for each downstream compiler to set any relevant `-D` flags (e.g., both fxc and dxc would need to duplicate such logic for NVAPI support). * The HLSL source emitter is extended to track whether it emits any operations that require NVAPI support. * The HLSL source emitter is extended to emit prelude directives based on whether NVAPI is needed and, if it is, to also set the register and space that NVAPI should use based on what was stored in the decoration(s) on the IR module. * The HLSL source emitter is extended so that it detects global instructions that represent "magic" NVAPI constructs , and emit them as conditional definitions so that they are skipped when NVAPI is included via the prelude. * The handling of requires capabilities during emit logic was cleaned up a bit so that more logic is shared across targets, and also so that the same logic is used both when emitting a function declaration/definition and when emitting a call to an instrinsic function (which won't get declared/defined).
*	Fix a bug around byte-address buffer loads of vectors (#1557)	Tim Foley	2020-09-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This problem is only visible when * using `RWByteAddressBuffer.Load<V>`, * when `V` is `vector<T,N>`, * and `V` is a non-32-bit type In such a case, the Slang compiler generates output HLSL like: someBuffer.Load<vector<T,N>>(someOffset); and dxc balks because it fails to parse the `>>` as closing the generics, and instead parses it as a shift operator. The solution here is simple: add a space before the closing `>` when emitting a `.Load<T>` operation. Note that this change does not come with a test fix yet because writing the test case exposed a more complicated issue with GLSL codegen for this same scenario. This change includes the simple single-byte fix, to unblock users while we work on a fix for the GLSL case (and that fix will include the test coverage).
*	Enable all dynamic dispatch tests on CUDA. (#1552)	Yong He	2020-09-21
\| \| \| \| \|	* Enable all dynamic dispatch tests on CUDA. * Fix expected cross-compile test results.
*	Initial attempt to enable CUDA dynamic dispatch codegen (#1549)	Yong He	2020-09-17
\| \| \| \| \|	* Front-load cuda module loading to fill in RTTI pointers. * Enable dynamic dispatch codegen for CUDA.
*	Allow existential types in `StructuredBuffer` element type. (#1536)	Yong He	2020-09-10
\| \| \| \| \| \| \| \| \| \| \|	* Allow existential types in `StructuredBuffer` element type. * Handle StructuredBuffer.Load/.Consume methods * Clean up unnecessary changes * Code cleanup * Update test comment
*	Allow mixing unspecialized and specialized existential parameters. (#1533)	Yong He	2020-09-04
\| \| \| \| \|	* Allow mixing unspecialized and specialized existential parameters. * Fixes.
*	Allow unspecialized existential shader parameters (dynamic dispatch). (#1529)	Yong He	2020-09-02
\| \| \| \| \| \| \| \| \|	* Allow unspecialized existential shader parameters (dynamic dispatch). * Fixes. * Fixes * disable cuda test
*	Add support for (undocumented) HLSL 16-bit bit-cast ops (#1528)	Tim Foley	2020-09-02
\| \| \| \| \| \| \| \| \| \| \| \| \|	As of SM 6.2, the dxc compiler added support for a set of 16-bit bit-cast operations to mirror the `asuint`, `asfloat`, and `asint` operations that were provided for 32-bit scalar types. These operations are not publicly documented, so we didn't think to add them. It should be noted that there was already a similar operation in HLSL, called `f32tof16`, that took as input a `float` and then packed a half-precision version of it into the low bits of a `uint`. The problem is that using that operation for `half`->`uint16_t` conversion required a round trip through a `float`, and downstream compilers seemingly can't optimize away that conversion. This change adds the new operations along with a test that tries to make use of them to ensure the results are what is expected. There are enough cases to cover that I had to write the test in a way where each thread only writes out a subset of the required output. There are two other changes here are that are not directly related to the main feature: First, it seems like the `[__forceInlineEarly]` attribute on some of these overloads interacts poorly with generics, and results in an `IRVectorType` appearing at local scope in the output code. That is semantically reasonable given our IR model, but it would ideally be something that gets eliminated as a result of deduplication of types. For now I've introduced a slight hack to make types always get inlined into their use sites during emission, which should handle the case of locally-defined types. I'm not 100% happy with that solution, but it seemed better than introducing a bunch of unrelated fixes into this PR. Second, the way that conversion operations were being declared for matrix types seems to have been incorrect: we had a single explicit initializer added to matrix types via an `extension` that allowed them to be initialized from other matrix types with the same size and any element type. In order to support implicit conversions of matrix types, I cribbed the code we were already using to introduce implicit conversion operations for vector types.
*	Enable lower-generics pass universally. (#1518)	Yong He	2020-08-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Enable lower-generics pass universally. * Exclude builtin interfaces and functions from lower-generics pass. * Update stdlib. * Fixup. * Fixes handling of nested intrinsic generic functions. * Fixes. * Fixes.
*	GPU Foreach Loop (#1498)	Dietrich Geisler	2020-08-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	* GPU Foreach Loop This PR introduces the completed GPU foreach loop and updates the heterogeneous-hello-world example to use it. This PR builds on the previous introduction of the GPU Foreach loop parsing and semantic checking PR (#1482) by introducing IR lowering and emmitting. THe new feature can be used by having a GPU_Foreach loop interacting with a named non-CPP entry point, and using the -heterogeneous flag. * Fix to path Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
*	IR support for Tuple types. (#1492)	Yong He	2020-08-13
\| \| \| \| \| \| \| \| \|	* Tuple types. * Fix x86 warning * Improved deduplication Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
*	`AnyValue` based dynamic dispatch code gen (#1477)	Yong He	2020-08-05
\| \| \| \| \|	* AnyValue based dynamic code gen * Fix aarch64 build error
*	Fix issues arising around DXR 1.1 RayQuery usage (#1468)	Tim Foley	2020-07-31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change includes a few different fixes for issues that arose in a user shader that made use of DXR 1.1. The existing solution we had for handling the DXR 1.1 `RayQuery` type relied on the fact that a declaration like: ```hlsl RayQuery<0> myRayQuery; ``` Looks like an undefined variable to existing Slang, while to dxc it is a variable declaration that runs an implicit default constructor (sneaking a bit of C++ into HLSL, but only in a way the standard library can use). Slang was getting away with the fact that this maps to an undefined variable because it turns out that our emit logic would output the exact same declaration for an undefined value (since declaring a variable without initializing it is the simplest way to get an undefined value of a given type in a C-like language). The main bug that arose here was that if the `RayQuery<...>`-typed variable was declared under control flow, then the `undefined` instructions introduced by our SSA pass would actually get inserted into the wrong block. Basically, when a block was trying to read a variable, and there was no preceding `store` to that variable in the block, we'd start looking for incoming values from its predecessor block(s). In the case where the variable never gets stored to, this search would eventually reach the first block of the function, where we'd realize the value must be `undefined`. The result was that we might insert an `undefined` instruction of some `T` into the first block of a function, but the type `T` might be the result of a lookup operation performed later in that function. This ends up creating a use of `T` that isn't dominated by the definition, which violates the SSA property. This violation of the SSA properties lead to us generating incorrect code in a later pass that deals with scoping differences between SSA form and our structured output statements; that code would end up creating a local variable to hold a type instead of a value. The main fix is in `slang-ir-ssa.cpp`, where we catch the case of trying to read a variable in the block that declares it, if there we no preceding `store`s. We simply insert an `undefined` instruction before the first such read and write that out as the value of the variable to be used for subsequent instructions (up to the next `store`). This fixes the SSA dominance property for the `undefined` values that get introduced and thus technically fixes the output code for the user shader. A secondary issue is that it is kind of gross to be relying on the behavior of `undefined` instructions in the IR for the semantics of an important standard library type like `RayQuery<...>`. A preceding change already added basic support for Slang to run default initializers (declared as `__init()`) on variables that are declared without an initial-value expression. This change adds such a default initializer to `RayQuery<...>` and maps it to a dedicated IR instruction that is intended to represent the idea of running a C++-style default constructor to produce a value. It turns out that the code we need to emit in that cse is identical to what we currently emit for `undefined` instructions, so that is helpful. A tertiary issue is that when trying to run the user shader in debug mode, I ran into an assertion because our type layout logic for reflection had never dealt with the issue of user-defined `enum` types being used in constant buffers or other memory that needs layout. I added a quick fix that lays out any `enum` types as their "tag type" (which defaults to `int`). Unfortunately, there is no easy way to check in a regression test for the user issue, because official `dxcompiler` versions with support for DXR 1.1 are not yet released (at least as of last time I checked).
*	Change parameter passing convention for CUDA (#1463)	Tim Foley	2020-07-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The Big Picture =============== Given input Slang code like: ```hlsl Texture2D gA; [shader("compute")] void kernelFunc(uniform Texture2D b, uint3 tid : SV_DispatchThreadID) { ... } ``` the existing CUDA code generation strategy would always generate a kernel with a signature like: ```c++ struct GlobalParams { Texture2D gA; } struct EntryPointParams { Texture2D b; } extern "C" __global__ void kernelFunc(EntryPointParams* entryPointParams, GlobalParams* globalParams) { ... } ``` This choice was consistent with the conventions of the CPU kernel target, and shares the advantage that it is easy for the user to data-drive the logic for filling in parameters and then invoking a kernel. However, the approach outlined above has two serious problems when used for CUDA kernels: * First, it defies the programmer's expectation about what an "equivalent" CUDA kernel signature would be, which makes it awkward for a developer to invoke this kernel from CUDA C++ host code (especially in the context of an app that might also run hand-written CUDA kernels). * Second, the performance of this approach suffers because every access to a global or entry point parameter turns into a load from global memory. In contrast, a typical hand-written CUDA kernel passes its parameters via an implementation-specific path that (for current CUDA platforms) seems to be equivalent to `__constant__` memory in performance. This change alters the convention so that the Slang compiler takes the code from the top of this message and translates it into something like: ```c++ struct GlobalParams { Texture2D gA; } __constant__ GlobalParams SLANG_globalParams; extern "C" __global__ void kernelFunc( Texture2D b ) { ... } ``` This translation alleviates both problems with the current translation: * The signature of the generated CUDA kernel function is as close to that of the original as is possible (we had to eliminate the `SV_`-semantic varying inputs), and should directly match what the programmer would expect in common cases. Entry-point parameters are passed via CUDA kernel parameters, and should thus match in performance. Global parameters are passed via a variable in `__constant__` memory, and thus should also perform as well as possible/expected. Detailed Changes ================ * Disable the `collectEntryPointUniformParams` pass for CUDA, so that entry-point `uniform` parameters are not bundles into a single `struct` and/or `ConstantBuffer`. * When targeting CUDA, disable the logic for generating an entry-point parameter for passing in the global shader parameter(s) * Allow `CLikeSourceEmitter` subclasses to override the name generated for entry-point symbols, and use this to add the required prefix for each OptiX kernel type when translating a ray-tracing kernel. * Add logic to emit "parameter groups" in a specialized way for CUDA (this is the same approach that allows us to generate `cbufffer { ... }` declarations for fxc). A global-scope parameter group will turn into a global `__constant__` variable called `SLANG_globalParams` (that name becomes part of the ABI for Slang-compiled shaders). * Update the logic in `render-test` for loading and invoking CUDA kernels to handle the new policy. The last bullet there merits expansion, since it is indicative of the work a client using Slang would have to go through to use our generated kernels with the new policy: * When loading a CUDA module with one or more kernels, we also use `cuModuleGetGlobal` to query the address of the `SLANG_globalParams` symbol in that CUDA module. That pointer needs to be used when setting global parameter values to be used by kernels in that CUDA odule. * Because our existing `BindPoint` logic for CUDA always sets up parameter data in GPU memory, we end up having to copy the entry-point parameter data from GPU memory to host memory. This step would ideally be skipped in a codebase that understands the correct policy, but it is a bit unfortunate that it is no longer trivially correct for an application to store all parameter data in GPU memory. * Before invoking the kernel, we need to use a `cudaMemcpyAsync` to copy from the prepared GPU memory for global parameters over to the `SLANG_globalParams` symbol associated with the kernel to be invoked. Because this operations is issued on the same CUDA stream as the kernel call, it is guaranteed to not overlap with GPU kernel execution. * When invoking the kernel, we take advantage of the seldom-used `CU_LAUNCH_PARAM_BUFFER_POINTER` facility to specify a contiguous memory region with all the entry-point parameters in it instead of passing each entry-point parameter separately. Given Slang reflection it is also possible to query the offset of each entry-point parameter in the buffer, so we could invoke the kernel in the traditional fashion as well. The choice here is up to the application. Caveats ======= * This is a breaking change, and any subsequent release will need to reflect that fact. Any customers who rely on Slang's current CUDA codegen strategy are likely to be surprised by this change, and I don't see an easy way to give them a more gentle transition. * This change does not remove the logic that introduces a `KernelContext` type for code that requires it. That means that things like `static` global variables can continue to work on CUDA for now, but we know that those are not going to be something we can support in the long-term with separate compilation. * While the policy implemented in this change is a reasonable default, it is still not going to perfectly match expecations for some developers. In particular, some developers who are familiar with both D3D and CUDA will likely wonder why a global `cbuffer` in Slang translates to a global-memory pointer in the output CUDA instead of one global `__constant__` variable per `cbuffer`. A more detailed alternate translation would generate a distinct global `__constant__` variable for each top-level constant buffer or parameter block. We may need to refine the translation even more based on feedback from users who care about how we handle global-scope parameters. * Recent changes in Slang have broken the logic that handles the OptiX "shader record" as an alternative mechanism for passing entry-point parameters. In order to get any level of OptiX support up and running we will have to change the IR passes that run on CUDA kernels to actually run the "collection" of `uniform` parameters for ray tracing stages, and then to replace references to the resulting parameter with a call to the function to access the shader record. * The use of `SLANG_globalParams` here works well enough in the case of whole-program compilation; every `CUmodule` ends up with (zero or) one parameter with this name, and an application can just hard-code it. As a mechanism it wouldn't work in the presence of separately-compiled modules that might introduce their own global parameters (including cases like constant lookup tables that really want to be at the global scope). An alternative approach would have Slang generate output PTX for each module, where a module has an optional global symbol for its own global-scope parameters (with a mangled name that is based on the module name), and then a linked CUDA binary has all of those distinct symbols. Such an approach would be compatible with module-at-a-time reflection and parameter binding, but would lead to another breaking change down the line for code that switches to `SLANG_globalParams`.
*	Fix support for nested generic intrinsics (#1462)	Tim Foley	2020-07-28
\| \| \| \| \|	The logic that detects intrinsic functions during emit was not able to properly detect an intrinsic generic method nested in a generic type. The basic problem was that this led to a `specialize(specialize(...), ...)` in the IR, which wasn't being handled (only one level of `specialize` was handled). The fix is local and simple. The larger issue was that the author of this commit had thought our IR ruled out nested generics like this, when in fact that is precisely how we handle nested generics throughout the IR. This oversight/misunderstanding means that we might have broken passes in other places that assume nested generics cannot happen. This change doesn't pretend to fix that other issue, but we should pay attention to it.
*	Run SSA pass to clean up temporary variables during generics lowering. (#1447)	Yong He	2020-07-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Run SSA pass to clean up generic temporary variables during lowering. * Fix `undefined` emitting logic. * revert dumpir control flag * Defer fold decision of `undefined` values after special case logic for GLSL and HLSL. * Update expected test result. * Manually update raygen.slang.glsl to minimize change. * fix formatting Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>