summaryrefslogtreecommitdiffstats
path: root/source/slang/slang-emit-hlsl.cpp
Commit message (Collapse)AuthorAge
...
* Rename IR opcodes to unify style. (#2556)Yong He2022-12-07
| | | Co-authored-by: Yong He <yhe@nvidia.com>
* Remove `construct` IR op. (#2555)Yong He2022-12-07
| | | Co-authored-by: Yong He <yhe@nvidia.com>
* Inline functions with string param/return for GPU targets (#2544)jsmall-nvidia2022-12-02
| | | | | | | | | | | | | | | * #include an absolute path didn't work - because paths were taken to always be relative. * WIP inlining of functions that take or return string related types on GPU targets. * Small fixes. * Added a test. * Add checking for any getStringHash insts are valid. * Support getStringHash on CUDA. * Tweak diagnostic.
* Mesh shader support (#2464)Ellie Hermaszewska2022-11-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Add gdb generated files to .gitignore * Switch to c++17 TODO: Ellie update coding style doc * WIP mesh shaders * Add MeshOutputType and mesh output decorations * Lift array type layout creation out of _createTypeLayout in preparation for sharing it elsewhere * Initial pass at GLSL legalization for mesh shaders * Create output types for builtin mesh outputs This should be rendered as an out paramter block * Handle writes to member fields in mesh shader output * Per primitive output from mesh shaders * Add mesh shader tests * Redeclare mesh output builtins * Remove unused instruction * Emit explicit mesh output max max size * Add unimplemented warning for array members in mesh output * Implement mesh output splitting for GLSL in terms of getSubscriptVal * Allow HLSL syntax for mesh output modifiers * Improve error messages for mesh output * Add test for HLSL style mesh output syntax * Emit explicit mesh output indices max size * HLSL generation support for mesh shaders * Better errors for mesh shader misuse * Neaten comments * Regenerate vs2019 project files * Fix build on vs2019 * Retreat on c++17 Will make the change in a separate PR * slang-glslang binary dep 11.10.0 -> 11.12.0-32 * Fixes for msvc compiler * Update msvc project
* Shader Execution Reordering without HLSL2021 (#2489)jsmall-nvidia2022-11-03
| | | | | | | | | | * #include an absolute path didn't work - because paths were taken to always be relative. * Disable SER tests and enabling HLSL2021 by default. * Small typos fix. Improve SER coverage in testing. * Fix typo.
* Various gfx fixes. (#2434)Yong He2022-10-05
|
* Add support for GL_EXT_debug_printf extension to slang (#2399)Qubaef2022-09-15
|
* Language feature: pointer sized int types. (#2401)Yong He2022-09-15
| | | | | | | | | | | | | | | | | | | | | * Language feature: pointer sized int types. * Fix. * small change to test. * Fix stdlib. * Fix. * Fix. * Add typedef for `size_t` in stdlib. * Fix test. * Add `intptr_t::size` constant. Co-authored-by: Yong He <yhe@nvidia.com>
* Added NativeStringType (#2252)jsmall-nvidia2022-05-27
| | | | | | | | | | | | | | | | | | | * #include an absolute path didn't work - because paths were taken to always be relative. * Use TerminatedUnownedStringSlice for literals in output C++. * Remove Escape/Unescape functions used in slang-token-reader.cpp Add target type of 'host-cpp' etc to map to the target types. * Fix some corner cases around string encoding. * Added unit test for string escaping. Fixed some assorted escaping bugs. * Updated test output. * Added decode test. * Stop using hex output, to get around 'greedy' aspect. Use octal instead.
* Refactor prelude emit (#2236)jsmall-nvidia2022-05-17
| | | | | | | | | | | * #include an absolute path didn't work - because paths were taken to always be relative. * Refactor how prelude output works in emit. * Small improvement to emit output. * Move around comment on target specific language directives based on review. Co-authored-by: Theresa Foley <10618364+tangent-vector@users.noreply.github.com>
* Initial work around groupshared (#2224)jsmall-nvidia2022-05-06
| | | | | | | | | * #include an absolute path didn't work - because paths were taken to always be relative. * Allow rate modifier on parameter. * Add test. * Disable test for now as breaks on source comparison because around nvAPI.
* Support for HLSL `export` (#2223)jsmall-nvidia2022-05-05
| | | | | | | * #include an absolute path didn't work - because paths were taken to always be relative. * Add support for HLSL `export`. * Test for using `export` keyword.
* Make artifact an interface (#2195)jsmall-nvidia2022-04-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * #include an absolute path didn't work - because paths were taken to always be relative. * Compile to a dxil library. * Added CompileProduct. * Support handling of ModuleLibrary. * CacheBehavior -> Cache * Use CompileProduct for -r references. * CompileProduct -> Artifact. * Determining an artifact type on binding. * Determine binary linkability. * Added Artifact::exists. * Added ArtifactKeep. * Small fixes. * Small improvements to Artifact. * Add zip extension. * Fix some comments. * Fix multiple adding of PublicDecoration. Make public output export for DXIL/lib. Add checking for simpleDecorations such that only added once. * Use 'whole program' to identify library build. * Move slang-artifact into compiler-core. * Split out Keep free functions. * Artifact::Keep -> ArtifactKeep. * Handle libraries as artifacts. * Add -target dxil so test infrastructure knows it needs DXC. * Linking working in DXC. * Improve handling around emit for 'export'. * Add comment around Artifact name. * Render test working with linking. * Improvements around Artifact handling. * Add ArtifactPayloadInfo. * Small tidy up around artifact. * Split out code to get info about Artifacts into artifact-info.cpp/.h * IArtifact interface and IArtifactInstance interface. * Fix small issues. * Fix compilation warning issue. * Fix missing SLANG_OVERRIDE. * Small fixes to make compilation work on Visual Studio 2022. * Small improvements to Artifact interface/naming. * Added Desc with each element in IArchive to allow more flexibility in usage. * Fix clang warning issue. * Add ArtifactPayload::Diagnostics * More discussion around IArtifact usage. * Re-add slang-artifact.h which was removed during merge. * Fix typo identified in review.
* Linking in DXC (#2190)jsmall-nvidia2022-04-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * #include an absolute path didn't work - because paths were taken to always be relative. * Compile to a dxil library. * Added CompileProduct. * Support handling of ModuleLibrary. * CacheBehavior -> Cache * Use CompileProduct for -r references. * CompileProduct -> Artifact. * Determining an artifact type on binding. * Determine binary linkability. * Added Artifact::exists. * Added ArtifactKeep. * Small fixes. * Small improvements to Artifact. * Add zip extension. * Fix some comments. * Fix multiple adding of PublicDecoration. Make public output export for DXIL/lib. Add checking for simpleDecorations such that only added once. * Use 'whole program' to identify library build. * Move slang-artifact into compiler-core. * Split out Keep free functions. * Artifact::Keep -> ArtifactKeep. * Handle libraries as artifacts. * Add -target dxil so test infrastructure knows it needs DXC. * Linking working in DXC. * Improve handling around emit for 'export'. * Add comment around Artifact name. * Render test working with linking. Co-authored-by: Theresa Foley <10618364+tangent-vector@users.noreply.github.com>
* `export` support in HLSL (#2188)jsmall-nvidia2022-04-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * #include an absolute path didn't work - because paths were taken to always be relative. * Compile to a dxil library. * Added CompileProduct. * Support handling of ModuleLibrary. * CacheBehavior -> Cache * Use CompileProduct for -r references. * CompileProduct -> Artifact. * Determining an artifact type on binding. * Determine binary linkability. * Added Artifact::exists. * Added ArtifactKeep. * Small fixes. * Small improvements to Artifact. * Add zip extension. * Fix some comments. * Fix multiple adding of PublicDecoration. Make public output export for DXIL/lib. Add checking for simpleDecorations such that only added once. * Use 'whole program' to identify library build. * Add -target dxil so test infrastructure knows it needs DXC.
* Add support for HLSL unorm/snorm (#2095)Theresa Foley2022-01-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Read/write resource types (what D3D/HLSL often refer to as UAVs) can be broadly categorized based on whether they require an underlying format (e.g., a `DXGI_FORMAT`) for reads, or not. D3D refers to the ones that require a format as "typed" UAVs (even though a `RWStructuredBuffer<MyData>` is clearly "typed" at the HLSL level). Vulkan refers to these cases as "storage images" and "storage texel buffers." Under the D3D model, an application does not have to specify the exact format for a formatted/"typed" UAV in order for loads to work, but it *does* need to specify if an HLSL resource with a declared `float` or vector-of-`float` element type will be backed by data with a `*_UNORM` or `*_SNORM` format. This is where the `unorm` and `snorm` type modifiers come in. Superficially, it might seem that adding this feature to the Slang compiler is "just" a matter of adding the two modifiers, which is easily done with a pair of one-line `syntax` declarations in `core.meta.slang` plus the corresponding AST node types. Unfortunately the superficial view misses the detail that, to date, Slang has not had any support for *type modifiers* at all, and has only supported *declaration modifiers*. The distinction has so far not mattered, even with modifiers like `const` because, e.g., the difference between a "`const` array of `float`" and an "array of `const float`" doesn't really matter. So, adding these two modifiers required introducing a lot of infrastructure along the way. Let's walk through what needed to happen: * As described above, the actual `syntax` was added easily in the Slang stdlib * I added a new subclass of `Modifier` for `TypeModifier`s in the AST, and added the AST nodes for `unorm` and `snorm` as subclasses of that. * In order to syntactically support modifiers applied to types (e.g., `unorm float`), I needed to add a `ModifiedTypeExpr` subclass of `Expr` that represents a base type expression with one or more modifiers applied * The parser needed some subtle new logic. There are two main cases where type modifiers will come up: 1. In contexts where we might be parsing a declaration (e.g., `const unorm float a`), we need to support a list of modifiers that might freely mix type modifiers and "declaration modifiers" which are not intended to apply to types. In this case we need to split the lis tof modifiers into the type-related ones and the declaration-related ones, and attach each subset to the appropriate place. This is very important for features like C-style pointers, where in `static const float* a;`, the `static` modifier applies to the entire declaration of `a`, but the `const` modifier *only* applies to the `float` type specifier, and *not* to the outer pointer type (the actual type of `a`). 2. In contexts where we are not parsing a declaration (e.g., a generic type argument), we need to support a list of modifiers and appy them *all* to the type specifier being parsed, even if some of them might not be appropriate. * While working in the parser I implemented a certain amount of unrelated cleanup for code that was using raw `Modifier*`s to represent lists of modifiers, instead of the purpose-built `Modifiers` type. * The `_parseGenericArg` case needed specific work, because it is an important case in the grammar where we need to parse *either* a type expression or a value exprssion, but cannot easily predict which we will see. The fix implemented for now is to always try to parse modifiers and, if we see any, to assume we are in the type case. Because of the rules for how modifiers in a C-like language inhere to the type specifier (and not necessarily the entire type), we need to refactor some of the type expression parsing routines to support parsing a "suffix" of a type expression. * Note: I decided to be conservative and only make these changes in `_parseGenericArg` because that is place that is *needed* in order for user code with `unorm`/`snorm` to work, but in practice a user could still confuse our parser by using type modifiers as part of a cast (e.g., `x = (unorm float)y;`). While there is currently no reason why a user should want to do this, it *does* suggest that we need to be prepared to see type modifiers in other ambiguous "expression or type?" contexts. We have so far preferred to avoid looking up built-in syntax declarations like modifiers in expression contexts, because we want to allow users to create variable names that might conflict with some of the more surprising modifier keywords in HLSL (e.g., both `triangle` and `sample` are modifier keyword). A nuanced strategy may be required when we get around to closing this gap (which will be needed around when we want full pointer support, since a cast like `(const SomeType*)somePtr` is pretty common). * In semantic checking, we now need a `visitModifiedTypeExpr`, which visits the base expression to produce a `Type` and then checks each of the `Modifier`s attached to it. During this process we need to translate the AST-level `Modifier`s into something that can exist properly in the universe of `Type`s. We introduce a `ModifiedType` subclass of `Type`, distinct from the `ModifiedTypeExpr` subclass of `Expr`. Furthermore, we introduce a `ModifierVal` subclass of `Val`, distinct from `Modifier`/`TypeModifier`. * One unfortunate thing here is that it means we have both, e.g., `UNormModifier` to represent the parsed syntax, and `UNormModifierVal` to represent the `Type`/`Val`-level representation of the same concept. It is quite likely that we are near the point where we can/should consider having two distinct AST representations: one for freshly-parsed ASTs and one for semantically-checked ASTs. The `Type`/`Val` hierarchy clearly belongs to the latter. * No actual semantic checking is currently being applied to the `unorm` and `snorm` modifiers, although we should in principle check that they are only being applied to `float` and vector-of-`float` types. * In an attempt to simplify some of the creation logic and build a tiny bit of reusable infrastructure, I went ahead and added the skeleton of a dedupe-caching system in `ASTBuilder` so that we can easily ensure only a single `UNormModifierVal` and a single `SNormModifierVal` ever get created inside the scope of a single builder. * TODO: Thinking about this, I'm now worried the deduplication does not mean I can make the simplifications I currently do in semantic checking by assuming that any two `UNormModifierVal`s will be pointer-identical. This is because we do not currently (IIRC) have the required "bottleneck" in the compiler where all ASTs get serialized after initial checking, and then deserialized when `import`ed into a downstream module, so that every AST node during a checking step comes from a single `ASTBuilder`. Hmm... * If we can rely on deduplication to do its thing, then the `Val` and `Type` implementations of modifiers can be relatively simple. * TODO: One issue here is that the equality comparison for `ModifiedType` currently checks for the same base type and the same modifiers in the same order. This works for now when we only have a small number of type modifiers and any given type will hae at most one, but in the longer run it relies on us to implement some kind of canonicalization scheme, which would both ensure that between `Modified(T, {A, B})` and `Modified(T, {B, A})` only one is allowed (that is, a canonical ordering on modifiers), and that we do not allow `Modified(Modified(T, {A}), {B})`. * TODO: One other issues is that the `ModifiedType` case does not currently interact correctly with the `as()`-based casting for types (whereas that operation *does* interact in a semantically-correct fashion with `typedef`s). Fixing this issue in a robust way really depends on us re-architecting the `Type` system so that *any* `Type` can have modifiers attached, with modifiers affecting type identity/deduplication. * The key place where `ModifiedType` creates a complication in semantic checking is type conversion/coercion. A user is likely to declare a `RWTexture2D<unorm float>`, fetch from it (producing a value of type `unorm float`) and then assign the result to a `float` variable, prompting for a conversion from `unorm float` to `float` (because they are distinct `Type`s). * We handle this case in the core `_coerce()` operation by checking if either `toType` or `fromType` is a `ModifiedType`. If *either* one is a modified type, we apply logic to check for modifiers that are present on one and not the other. Basically we check which modifiers need to be "dropped" and which need to be "added" during conversion, and validate that these modifiers *can* be dropped/added without creating a semantic error. The only type modifiers we support right now *can* be dropped/added like this, so we are fine. * TODO: When we add more complete pointer support, we could need logic here to validate when casts between, e.g., `const int*` and `int*` should/shouldn't be allowed. * Note: Even opening the door to type modifiers at all creates the same kind of challenges for user-defined generic types (and functions!) since `MyType<int>` and `MyType<const int>` are distinct instantiations in a future where we support `const` as a type modifier. We *may* need to plan to restrict where modified types can be used, so that certain built-in generic types support modified types as arguments, but user-defined types don't (or at least might need to opt-in to get support). * The result of a `_coerce()` that drops/adds modifiers is a `ModifierCastExpr`, which is a kind of no-op AST node that merely expresses that the conversion is allowed and valid. * In IR lowering we currently do the simple thing and translate a `ModifiedType` to a distinct IR node called `AttributedType`. * The change in terminology from "modifier" to "attribute" is to follow the way that these kinds of modifiers best map to the `IRAttr` case in the IR (rather than the `IRDecoration` case). We probably ought to do a careful terminology scrub here, because having this terminology mismatch between IR and AST could be a source of confusion. * TODO: In principle, using `IRAttributedType` creates the same basic problems as using `ModifiedType`: code that is usin `as()` or similar operations to check for a specific subclass of `IRType` may not see the case they were looking for due to use of `IRAttributedType`. * Initially I had hoped to avoid the problem by having the `IRAttr`s be attached directly as operands to an otherwise-ordinary `IRType`. E.g., a lowered `unorm float4` would be an `IRVectorType` with an "extra" operand that is an `IRUNormAttr`, something like: `Vector<Float, 4, UNorm>`. This sounds great (and looks great!), but runs into the problem that it is incompatible with the way we currently represent things like generic type parameters. A generic type parameter `T` is represented as an `IRParam`, and it does *not* make sense to have an additional `IRParam` to represent `const T` or `unorm T`, etc. * The Right Way to solve this stuff at both the AST and IR levels is to avoid passing around bare `Type*` or `IRType*` in general, and instead use a value type that implements the needed policy more directly: something like a `TypeHolder` or `IRTypeHolder` (placeholder name). The `*Holder` type would abstract over the various "wrapper" nodes required to store all the additional data like attributes but, importantly, would *not* allow that extra information to be dropped or lost during operations like casting (e.g., note how the current `Type` implementation of `as()` loses information on `typedef` names, making our error messages slightly worse). This is actually quite similar to how we currently use the `DeclRef<T>` system to allow working with what is *usually* a `T*` under the hood, but in a way that ensures we don't lose track of any generic substitution information. * During C-like code emit we have a process that turns an `IRType` into a chain of declarators as needed to emit a C-like declaration with pointers, arrays, etc. The `IRAttributedType` case needs to get folded into this logic. Basically, when we see an `IRAttributedType` we immediately emit any modifiers that are required to be in a prefix position, then recursively emit the underlying type with an extra layer of declarator that tracks the modifiers, so that we can emit any modifiers that should be placed in a postfix position *after* the type. As a specific example, our C/C++ back-end would want to use the postifx option to handle `const`, because then it can properly emit stuff like `int const * const *` and not the incorrect `const const int**`. * The HLSL emit logic overrides the prefix case for handling type attributes, and uses it to emit `unorm` and `snorm` where they occur. * One unfortunate detail is that (apparently) some downstream HLSL compilers do not allow the `unorm`/`snorm` modifiers to apply to `vector<float, *>` types, even though that should be semantically valid. Instead, they only support `float`, `float2`, `float3`, and `float4` explicitly. To work around this issue, we go ahead and change our HLSL emit logic so that when we encountered 1-to-4 component vectors of `float`, `int`, or `uint` we emit the type name using the typical HLSL shorthand. This is actually a signficicant change in our HLSL output, but it both seemed like a good fix to have anyway, and was also the only obvious way to address the downstream parser shortcomings without a massive kludge. * As a result of this change the `half-texture.slang` test broke, since it was using raw HLSL as the expected output. I changed the test to do a DXIL comparison instead, which is our preferred way of testing cross-compilation behavior (since it is more robust in the face of small changes to our source output).
* Passing associated type arguments to existential parameters + packing for ↵Yong He2021-10-21
| | | | | | | | | `bool`. (#1987) * Passing associated type arguments to existential parameters + packing for `bool`. * fix typo Co-authored-by: Yong He <yhe@nvidia.com>
* Avoid upcasting to f32 in 16bit float-uint bit cast. (#1938)Yong He2021-09-14
| | | Co-authored-by: Yong He <yhe@nvidia.com>
* Bug fix in 16bit type emit, vk validation error fix. (#1936)Yong He2021-09-13
| | | | | | + Implement bit_cast between float16 and uint16 in GLSL. + Enable pack-any-value-16bit test on vk. Co-authored-by: Yong He <yhe@nvidia.com>
* `reinterpret` and 16-bit value packing. (#1933)Yong He2021-09-09
| | | | | | | | | * `reinterpret` and 16-bit value packing. * Update `half-texture` cross-compile test reference result. * Revert inadvertent reformatting of slang-ir-inst-defs.h Co-authored-by: Yong He <yhe@nvidia.com>
* Add GLSL/SPIR-V support got GetAttributeAtVertex (#1733)Tim Foley2021-03-03
| | | | | | | This change allows varying fragment shader inputs to be declared in a way that allows the `GetAttributeAtVertex` operation to compile to valid code for both D3D and GLSL/SPIR-V/Vulkan. The key is that rather than just use ordinary `nointerpolation`-qualified inputs the code must declare these varying inputs with a new `pervertex` qualifier that marks them as *only* being usable with `GetAttributeAtVertex`. The `pervertex`-tagged inputs then translate to GLSL inputs using the `pervertexNV` qualifier Note that this change does *not* include any enforcement of the requirements around how these qualifiers are used (and the compiler doesn't have enforcement for the existing operations like `EvaluateAttributeAtCentroid`). The underlying problem is that the inerpolation-mode qualifiers and explicit interpolation functions in HLSL constitute a kind of rate-qualified type system, but without any systematic rules. It seems wasteful to encode a bunch of ad hoc rules for this stuff as special cases in the compiler when the clear right answer is to implement a systematic approach to rates.
* Add an accessor for IRInst opcode (#1707)Tim Foley2021-02-16
| | | | | | | | | * Add an accessor for IRInst opcode This main changing is renaming `IRInst::op` over to `IRInst::m_op` and then adds an accessor `IRInst::getOp()` to read it. The rest of the changes are just changing use sites to `getOp` (or to `m_op` in the limited cases where we write to it). This work is in anticipation of a future change that might need to store an extra bit in the same field as the opcode. It seemed better to do this massive refactoring as a separate PR. * fixup
* Initial support for DXR payload access qualifiers (#1705)Tim Foley2021-02-12
| | | | | | | | | | | | | | | | | | | | This change adds initial support for a feature being proposed for inclusion in dxc: https://github.com/microsoft/DirectXShaderCompiler/pull/3171. The main features are: * A `[payload]` attribute that indicates which `struct` types are intended to be used as payloads. Consistent use of this attribute should mean that an application no longer needs to manually specify a maximum payload size when creating a ray-tracing pipeline. * `read(...)` and `write(...)` qualifiers which can be attached to fields of `struct` types (usually `[payload]`-attributed types) to indicate which ray tracing pipeline stages are allowed read/write access to that part of the payload. Use of these qualifiers should allow an implementation to optimize storage of ray payload elements across RT pipeline stages. The work in this change just adds basic parsing for these features, translation to matching IR decorations, and then emission of HLSL text based on those decorations. Notable gaps in this first change include: * No work is currently being done to validate access to ray payloads in RT entry points based on these qualifiers. * The stage names in `read(...)` and `write(...)` are not being validated, and are being stored in the IR as text. These should probably use the `Stage` enumeration in some fashion, but we would need to have a way to encode the additional `caller` pseudo-stage that the feature uses. * No work is currently being done to adjust or react to the chosen shader model when emitting HLSL code. We should *either* have these attributes force a switch to a higher shader model, *or* skip emission of these attributes if the chosen shader model / profile does not imply support for them. * No tests are currently included for this work, because tests would rely on using a custom `dxcompiler.dll` build with the new feature supported.
* Add support for [noinline] attribute (#1650)Tim Foley2021-01-07
| | | | | | | | | | | This adds the `[noinline]` attribute to the front-end, and passes it through when generating HLSL output. Notes: * This change doesn't include a test since the dxc version I have locally parses `[noinline]` but then generates DXIL that fails validation. * This change doesn't include logic to handle `[noinline]` for other targets. Notably, SPIR-V has decorations that convey the same intention, but we don't yet take advantage of the GLSL extension(s) that would let us generate those decorations. * By necesstiy, `[noinline]` is only a "strong suggestion" and not actually something the compiler can ever guarantee/enforce.
* Simplify workflow when using NVAPI (#1556)Tim Foley2020-09-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In some cases, functionality is available as either a GLSL extension for Vulkan/SPIR-V, or through the NVAPI system for D3D. This situation creates complications because while GLSL extensions are generally all supported by the open-source glslang compiler (which we can bundle and ship), NVAPI operations are exposed through a specific header (`nvHLSLExtns.h`) that ships as part of the NVAPI SDK. When a user wants to explicitly use NVAPI-provided operations in their shader code, there are no major complications for Slang; the user sets up their include paths, `#include`s the relevant header, calls functions in it, and lets Slang deal with the details of compilation. The challenge for Slang arises when we want to provide a cross-platform interface in our standard library (e.g., the `RWByteAddressBuffer.InterlockedAddF32` method that was recently added) that uses either a GLSL extension (when compiling for Vulkan/SPIR-V) or an NVAPI (when compiling to DXBC or DXIL). In that case, the code *generated* by Slang now has a dependency on NVAPI, and we need to somehow emit a `#include` directive that pulls it in when invoking fxc or dxc. Because we do not (and seemingly cannot) bundle the NVAPI header with the compiler, we have to rely on ther user to have it available and to somehow communicate to Slang where it is. Exposing portable routines that sometimes use NVAPI currently creates two main challenges: 1. The user is forced to interact with the "prelude" mechanism in the compiler, which allows the programmer to define code in a given target language that gets prepended to the Slang-generated code. While the prelude mechanism is powerful, it is also hard for users to integrate into their workflow, and our experience so far is that users want something that Just Works. 2. If the user writes code that uses some of our abstract operations that layer on NVAPI *and* they also want to use NVAPI explicitly, they end up with two copies of the NVAPI header (one included by the Slang front-end, and another included by the downstream fxc/dxc compiler). This puts the user in the situation of (a) having to ensure that they set the defines like `NV_SHADER_EXTN_SLOT` consistently both when invoking Slang and when adding their prelude, and (b) even if they do make the definitions consistent, they run into the problem that fxc/dxc complain about overlapping register bindings on the two copies of the `g_NvidiaExt` global shader paraemter that the NVAPI header declares. This change attempts to resolve both issues by adding a lot of "do what I mean" logic to the compiler to try to ease things in the common case. In particular: 1. The user no longer needs to use the "prelude" mechanism when using NVAPI. The compiler now embeds a default prelude for HLSL output, which will `#include` the NVAPI header if and only if the generated code needs NVAPI access because of portable standard library routines that were used. 2. The user can mix-and-match explicit NVAPI use and stdlib functions that compile to use NVAPI. The register/space to be used by NVAPI when included via prelude is now set based on whatever the user set via the preprocessor so that it should automatically be consistent between both cases. Furthermore, the code we emit for the declaration of `g_NvidiaExt` when compiling explicit NVAPI use is set up to be conditional, so that it is skipped in the case where the prelude will pull in its own declaration of that parameter. The way all this is achieved involves a lot of moving pieces: * We now have an HLSL prelude, which mostly just serves to `#include "nvHLSLExtns.h"` in the case where NVAPI support is needed downstream. * Standard library operations that require NVAPI for their implementation on HLSL include a new `[__requiresNVAPI]` attribute. * The preprocessor has been extended so that after tokenizing an input file it looks up the NVAPI-relevant macros in the resulting environment, and if they are set it attached a modifier (`NVAPISlotModifier1) to the AST `ModuleDecl` that is based on their values. Logic is added to detect if multiple input files specify values for the macros in ways that conflict. * The semantic checking step is extended so that it detects the "magic" NVAPI declarations (the `g_NvidiaExt` paramter and the `NvShaderExtnStruct` type that it uses) and attaches a modifier to them so that they can be identified as such in later steps. * Parameter binding is extended to collect a list of the AST modifiers that reflect NVAPI binding, and to reserve the relevant register(s) so that ordinary user-defined parameters cannot conflict with them. * IR lowering translates the three new AST modifiers related to NVAPI over to IR equivalents. * IR linking is extended to make sure that it clones any `IRNVAPISlotDecoration`s attached to the input modules. The pass intentionally does not care where the modifiers came from; it just collects them all and leaves it to downstream code to sort out what they mean. * Emit logic is extended to have a notion of "prelude directives" which are preprocessor directives that should come *before* the prelude in the generated code, because they can impact the way that the prelude compiles. This is done so that we don't have to introduce ad hoc logic for each downstream compiler to set any relevant `-D` flags (e.g., both fxc and dxc would need to duplicate such logic for NVAPI support). * The HLSL source emitter is extended to track whether it emits any operations that require NVAPI support. * The HLSL source emitter is extended to emit prelude directives based on whether NVAPI is needed and, if it is, to also set the register and space that NVAPI should use based on what was stored in the decoration(s) on the IR module. * The HLSL source emitter is extended so that it detects global instructions that represent "magic" NVAPI constructs , and emit them as conditional definitions so that they are skipped when NVAPI is included via the prelude. * The handling of requires capabilities during emit logic was cleaned up a bit so that more logic is shared across targets, and also so that the same logic is used both when emitting a function declaration/definition and when emitting a call to an instrinsic function (which won't get declared/defined).
* First pass support for Sampler Feedback (#1470)jsmall-nvidia2020-08-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Add the Feedback texture types. Depreciate SLANG_RESOURCE_EXT_SHAPE_MASK. * Starting point to test sampler feedback. * WIP on FeedbackSampler. * Use __target_intrinsic to override the output of sampler feedback types. * Use newer generic syntax for FeedbackTexture. * Reflects Feedback type. * SLANG_TYPE_KIND_TEXTURE_FEEDBACK -> SLANG_TYPE_KIND_FEEDBACK * Added reflection test. * Reneable issue with generics in sampler-feedback-basic.slang * Add methods to FeedbackTexture2D/Array. Make test cover test cases. * Sampler feedback produces DXC code. * Disabled Sampler feedback test - as requires newer version of DXC. * Fix bug in reflection tool output. * Fix problem with direct-spirv-emit.slang.expected due to update to glslang. * Fix direct-spirv-emit.slang * Use SLANG_RESOURCE_EXT_SHAPE_MASK again * Make Feedback be emitted as a textue type prefix. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
* Prelude is associated with SourceLanguage (#1398)jsmall-nvidia2020-06-18
| | | | | | | | | | | | | | | * Associate a downstream compiler for prelude lookup even if output is source. * Remove LanguageStyle and just use SourceLanguage instread. * Added set/getPrelude. Made prelude work on source language. * Fix typo in method name replacement. get/SetPrelude get/setLanguagePrelude * Fix issue because of method name change. * Remove getPreludeDownstreamCompilerForTarget
* Fix and improvements around repro (#1397)jsmall-nvidia2020-06-18
| | | | | | * * Fix output in slang repro command line * Profile uses lowerCamel method names (had mix of upper and lower) * Rename slang-serialize-state/SerializeStateUtil to slang-repro and ReproUtil.
* Remove aborting in emitLoopControlDecoration default case.Yong He2020-06-04
|
* Emit [loop] attribute to output HLSL.Yong He2020-06-04
|
* Feature/ast syntax standard (#1360)jsmall-nvidia2020-05-29
| | | | | | | | | * Small improvements to documentation and code around DiagnosticSink * Made methods/functions in slang-syntax.h be lowerCamel Removed some commented out source (was placed elsewhere in code) * Making AST related methods and function lowerCamel. Made IsLeftValue -> isLeftValue.
* Improvements around hashing (#1355)jsmall-nvidia2020-05-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Fields from upper to lower case in slang-ast-decl.h * Lower camel field names in slang-ast-stmt.h * Fix fields in slang-ast-expr.h * slang-ast-type.h make fields lowerCamel. * slang-ast-base.h members functions lowerCamel. * Method names in slang-ast-type.h to lowerCamel. * GetCanonicalType -> getCanonicalType * Substitute -> substitute * Equals -> equals ToString -> toString * ParentDecl -> parentDecl Members -> members * * Make hash code types explicit * Use HashCode as return type of GetHashCode * Added conversion from double to int64_t * Split Stable from other hash functions * toHash32/64 to convert a HashCode to the other styles. GetHashCode32/64 -> getHashCode32/64 GetStableHashCode32/64 -> getStableHashCode32/64 * Other Get/Stable/HashCode32/64 fixes * GetHashCode -> getHashCode * Equals -> equals * CreateCanonicalType -> createCanonicalType * Catches of polymorphic types should be through references otherwise slicing can occur. * Fixes for newer verison of gcc. Fix hashing problem on gcc for Dictionary. * Another fix for GetHashPos * Fix signed issue around GetHashPos
* Add support for generic load/store on byte-addressed buffers (#1334)Tim Foley2020-04-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Add support for generic load/store on byte-addressed buffers Introduction ============ The HLSL `*ByteAddressBuffer` types originaly only supported loading/storing `uint` values or vectors of the same, using `Load`/`Load2`/`Load3`/`Load4` or `Store`/`Store2`/`Store3`/`Store4`. More recent versions of dxc have added support for generic `Load<T>` and `Store<T>`, which adds a two main pieces of functionality for users. The first and more fundamental feature is that `T` can be a type that isn't 32 bits in size (or a vector with elements of such a type), thus exposing a capability that is difficult or impossible to emulate on top of 32-bit load/store (depending on what guarantees `*StructuredBuffer` makes about the atomicity of loads/stores). The secondary benefit of having a generic `Load<T>` and `Store<T>` is that it becomes possible to load/store types like `float` without manual bit-casting, and also becomes possible to load/store `struct` types so long as all the fields are loadable/storable. This change adds generic `Load<T>` and `Store<T>` to the Slang standard library definition of byte-address buffers, and tries to bring those same benefits to as many targets as possible. In particular, the secondary benefits become available on all targets, including DXBC: byte-address buffers can be used to directly load/store types other than `uint`, including user-defined `struct` types, so long as all of the fields of those types can be loaded/stored. The ability to load/store non-32-bit types depends on target capabilities, and so is only available where direct support for those types is available. For 16-bit types like `half` this includes both Vulkan and D3D12 DXIL with appropriate extensions or shader models. The implementation is somewhat involved, so I will try to explain the pieces here. Standard Library ================ The changes to the Slang standard library in `hlsl.meta.slang` are pretty simple. We add new `Load<T>` and `Store<T>` generic methods to `*ByteAddressBuffer`, and route them through to a new IR opcode. Right now the generic `Load<T>` and `Store<T>` do *not* place any constraints on the type `T`, although in practice they should only work when `T` is a fixed-size type that only contains "first class" uniform/ordinary data (so no resources, unless the target makes resource types first class). Our front-end checking cannot currently represent first-class-ness and validate it (nor can it represent fixed-size-ness), so these gaps will have to do for now. Rather than directly translate `Load<T>` or `Store<T>` calls into a single instruction, we instead bottleneck them through internal-use-only subroutines. The design choice here is intended to ensure that for some large user-defined type like `MassiveMaterialStruct` we only emit code for loading all of its fields *once* in the output HLSL/GLSL rather than once per load site. While downstream compilers are likely to inline all of this logic anyway, we are doing what we can to avoid generating bloated code. Emit and C++/CUDA ================= Over in `slang-emit-c-like.cpp` we translate the new ops into output code in a straightforward way. A call like `obj.Load<Foo>(offset)` will eventually output as a call like `obj.Load<Foo>(offset)` in the generated code, by default. For the CPU C++ and CUDA C++ codegen paths, this is enough to make a workable implementation, and we add suitable templated `Load<T>` and `Store<T>` declarations to the prelude for those targets. Legalization ============ For targets like DXBC and GLSL there is no way to emit a load operation for an aggregate type like a `struct`, so we introduce a legalization pass on the IR that will translate our byte-address-buffer load/store ops into multiple ops that are legal for the target. Scalarization ------------- The big picture here is easy enough to understand: when we see a load of a `struct` type from a byte-address buffer, we translate that into loads for each of the fields, and then assemble a new `struct` value from the results. We do similar things for arrays, matrices, and optionally for vectors (depending on the target). Bit Casting ----------- After scalarization alone, we might have a load of a `float` or a `float3` that isn't legal for D3D11/DXBC, but that *would* be legal if we just loaded a `uint` or `uint3` and then bit-casted it. The legalization pass thus includes an option to allow for loads/stores to be translated to operate on a same-size unsigned integer type and then to bit-cast. To make this work actually usable, I had to add some more details to the implementation of the bit-cast op during HLSL emit and, more importantly, I had to customize the way that the byte-address buffer load/store ops get emitted to HLSL so that it prefers to use the existing operations like `Load`/`Load2`/`Load3`/`Load4` instead of the generic one, whenever operating on `uint`s or vectors of `uint`. Translation to Structured Buffers --------------------------------- Even after scalarizing all byte-address-buffer loads/stores, we still have a problem for GLSL targets, because a single global `buffer` declaration used to back a byte-address buffer can only have a single element type (currently always `uint`), so the granularity of loads/stores it can express is fixed at declaration time. If we want to load a `half` from a byte-address buffer, we need a dedicated `buffer` declaration in the output GLSL with an element type of `half`. The solution we employ here is to translate all byte-address buffer loads into "equivalent" structured-buffer ops when targetting GLSL. We add logic to find the underlying global shader parameter that was used for a load/store and introduce a new structured-buffer parameter with the desired element type (e.g., `half`) and then rewrite the load/store op to use that buffer instead. We copy layout information from the original buffer to the new one, so that in the output GLSL all the various `buffer`s will use a single `binding` and thus alias "for free." We don't want to create a new global buffer for every load/store, so we try to cache these "equivalent" structured buffers as best as we can. For the caching I ended up needing a pair to use as a key, so I tweaked the `KeyValuePair<K,V>` type in `core` so that it could actually work for that purpose. Because we are working at the level of IR instructions instead of stdlib functions at this work I had to add new IR opcodes to represent structured-buffer load/store that only (currently) apply to GLSL. Layout ====== In order to translate a load/store of a `struct` type into per-field load/store we need a way to access layout information for the types of the fields. Previously layout information has been an AST-level concern that then gets passed down to the IR only when needed and only on global parameters, so layout information isn't always available in cases like this, at the actual load/store point. As an expedient move for now I've introduced a dedicated module that does IR-level layout and caches its results on the IR types themselves. This approach *only* supports the "natural" layout of a type, and thus is usable for structured buffers and byte-address buffers (or general pointer load/store on targets that support it), but which is *not* usable for things like constant buffer layout. We've known for a while that the Right Way to do layout going forward is to have an IR-based layout system, and this could either be seen as a first step toward it, or else as a gross short-term hack. YMMV. Details ======= The GLSL "extension tracker" stuff around type support needed to be tweaked to recognize that types like `int16_t` aren't actually available by default. I switched it from using a "black list" of unavailable types at initialization time over to using a "white list" of types that are known to always be available without any extensions. Tests ===== There are two tests checked in here: one for the basic case of a `struct` type that has fields that should all be natively loadable, and one that stresses 16-bit types. Each test uses both load and store operations. Future Directions ================= Right now we translate vector load/store to GLSL as load/store of individual scalars, which means the assumed alignment is just that of the scalars (consistent with HLSL byte-address buffer rules). We could conceivably introduce some controls to allow outputting the vector load/store ops more directly to GLSL (e.g., declaring a `buffer` of `float4`s), which might enable more efficient load/store based on the alignment rules for `buffer`s. The IR layout work has a number of rough edges, but the most worrying is probably the assumption that all matrices are laid out in row-major order. Slang really needs an overhaul of its handling of matrices and matrix layout, so I don't know if we can do much better in the near term. At some point the IR-based layout system needs to be reconciled with our current AST-base layout, and we need to figure out how "natural" layout and the currently computed layouts co-exist (in particular, we need to make sure that the IR-based layout and the existing layout logic for structured buffers will agree). This probably needs to come along once we have moved the core layout logic to operate on IR types instead of AST types (a change we keep talking about). As part of this work I had to touch the implementation of bit-casting for HLSL, and it seems like that logic has some serious gaps. We really ought to consider a separate legalization pass that can turn IR bitcast instructions into the separate ops that a target actually supports so that we can implement `uint64_t`<->`double` and other conersions that are technically achievable, but which are hard to express in HLSL today. * fixup: missing files
* Replace /* unhandled */ in source emit with a real error (#1313)Tim Foley2020-04-08
| | | | | | | | | | | | | | | | | | | | | For a long time the various source-to-source back-ends have been emitted the text `/* unhandled */` when they encounter an IR instruction opcode that didn't have any emit logic implemented. This choice had two apparent benefits: * In most common cases, emitting `/* unhandled */` in place of an expression would lead to downstream compilation failure, so most catastrophic cases seemed to work as desired (e.g., if we emit `int i = /* unhandled */;` we get a downstream parse error and we know something is wrong. * In a few cases, if a dead-but-harmless instruction slips through (e.g., a type formed in the body of a specialized generic function), we would emit `/* unhandled */;`, which is a valid empty statement. It is already clear from the above write-up that the benefits of the policy aren't really that compelling, and where it has recently turned out to be a big liability is that there are actually plenty of cases where emitting `/* unhandled */` instead of a sub-expression won't cause downstream compilation failure, and will instead silently compute incorrect results: * Emitting `/* unhandled */ + b` instead of `a + b` * Emitting `/* unhandled */(a)` instead of `f(a)`, or even `/* unhandled */(a, b)` instead of `f(a,b)` * Emitting `f(/*unhandled*/)` instead of `f(a)` in cases where `f` is a built-in with both zero-argument and one-argument overloads The right fix here is simple: where we would have emitted `/* unhandled */` to the output we should instead diagnose an internal compiler error, thus leading to compilation failure. This change appears to pass all our current tests, but it is possible that there are going to be complicated cases in user code that were relying on the previous lax behavior. I know from experience that we sometimes see `/* unhandled */` in output for generics, and while we have eliminated many of those cases I don't have confidence we've dealt with them all. When this change lands we should make sure that the first release that incorporates it is marked as potentially breaking for clients, and we should make sure to test the changes in the context of the client codebases before those codebases integrate the new release.
* Unroll target improvements (#1291)jsmall-nvidia2020-03-25
| | | | | | | | | | | * Add unroll support for CUDA, and preliminary for C++. Document [unroll] support. * Fix loop-unroll to run on CPU, and test on CPU and elsewhere. Fix bug in emitting loop unroll condition. * Improved comment. * Added support for vk/glsl loop unrolling.
* Renamed UnownedStringSlice::size to getLength to make match String. (#1254)jsmall-nvidia2020-03-02
|
* Fixes for DXR 1.1 RayQuery type (#1227)Tim Foley2020-02-19
| | | | | | | | | | | | | | | | | | | | The previous change that added `RayQuery` to the standard library didn't mark it as any kind of intrinsic, so the first fix here was to add the appopriate attribtue to the stdlib declaration of `RayQuery`. Next I found that the legalization pass was obliterating the `RayQuery` type because it had no members, and thus looked like an empty `struct` (which we eliminate for a variety of reasons). I fixed that by adding a check for a target-intrinsic decoration in type legalization. Next I found that the type wasn't emitted correctly because our generic specialization was turning `RayQuery<0>` into a new type `RayQuery_0` (which is what our specialization is designed to do, after all). I then disabled generic specialization for types that are marked as target intrinsics (which probably renders the preceding fix moot). Finally, I found that the emit logic for types in HLSL wasn't handling the case of a generic intrinsic type that didn't also use its own dedicated opcode. I fixes that up by adding a specific case for `IRSpecialize` as a late catch-all. After all these changes, a declaration of a `RayQuery` variable seems to Just Work (even without any new/improved behavior for handling default constructors). One potential gotcha looking forward is that my checks for `IRTargetIntrinsicDecoration` aren't checking what target the decoration is for. This is fine for now because there are only two types using the decoration right now (`RayDesc` and `RayQuery`), and the special cases above are reasonable for both of them. If/when we have more target-intrinsic types with this decoration, and some of them are only intrinsic for specific targets, then we will need to revisit this choice and either: * make these checks perform filtering based on the "current" target (similar to what the emit logic has today), or * (more likely) make the linking and target-specialization step strip out any target-intrinsic decorations that aren't the right one(s) for the current target Note that this change doesn't include a test case yet because I don't have a DXR 1.1 ready version of dxc to test against. I have manually confirmed that appropriate Slang input seems to be producing reasonable HLSL output when using these functions, but I cannot yet try to check that in (using an HLSL file for the expected output would be quite fragile).
* Change handling of strings for HLSL/GLSL targets (#1204)Tim Foley2020-02-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | * Change handling of strings for HLSL/GLSL targets This change switches our handling of string literals and `getStringHash` to something that is more streamlined at the cost of potentially being less general/flexible. * `String` is now allowed as a parameter type in user-defined functions * `getStringHash` is now allowed to apply to `String`-type values that aren't literals * The list of strings in an IR module is now generated during IR lowering as part of lowering a string literal expression, rather than being defined by recursively walking the IR of the module looking for `getStringHash` calls. The public API still refers to these as "hashed" strings, but they are realistically now "static strings." * When emitting code for HLSL/GLSL, the `String` type emits as `int`, and `getStringHash(x)` emits as `x`. In terms of implementation, the choice was whether to translate `String` over to `int` in an explicit IR pass, or to lump it into the emit pass. While adding the logic to emit clutters up an already complicated step, it is ultimately much easier to make the change there than to write a clean IR pass to eliminate all `String` use. Note that other targets that can handle a more full-featured `String` type are *not* addressed by this change and thus do not support `String` at all. It may be woth emitting `String` as `const char*` on those targets, and emitting string literals directly, but the `getStringHash` function would need to be implemented in the "prelude" then, and we probably want to pick a well-known/-documented hash algorithm before we go that far. This change also brings along some some clean-ups to the `gpu-printing` example, since it can now take advantage of the new functionality of `String`. * Fix up tests for new string handling * Add global string literal list to string-literal test (since we now list *all* static string literals and not just those passed to `getStringHash`) * Disable `getStringHash` test on CPU, since we don't have a working `String` on that platform right now (only HLSL/GLSL) Co-authored-by: Tim Foley <tim.foley.is@gmail.com>
* Literal handling improvements (#1202)jsmall-nvidia2020-02-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * WIP: 64 literal diagnostic and truncation. * Improve how integer truncation is handled/supported. Added literal-int64.slang test. Set a suffix on all literals. Fixed problem on C++ based targets where l suffix was not the same as int() cast. So on C++ derived emitters, int() is used instead of l suffix to have same behavior across targets. * Add literal diagnostic testing. * Allow lexer to lex - in front of literals. * Fix lexing and converting int literal with -. * Too large small values of floats become inf. Handling writing inf types out on different targets. Add function to deterimine if a float literals kind. * Roll back the support of lexer lexing negative literals. * Fixed tests broken because of diagnostics numbers. Improved _isFinite * Fix compilation on linux. * Fix problem with abs on linux - use Math::Abs. * Fix typo. * * Improve warnings for float literals zeroed * Improved 64 bit type documentation * Handle half * Improved comments * Fixed tests broken * Use capital letters for suffixes. * Make default behavior on outputting a int literal that is an 'int32_t' is cast (not suffix) to avoid platform inconsistencies. Improve documentation for 64 bit types. Make tests cover material in docs. * Fixed tests. * Rename FloatKind::Normal -> Finite * Fix half zero check.
* Address review comments on IR layout PR (#1091)Tim Foley2019-10-24
| | | These were meant to be merged into #1084, but I failed to push the changes to the server.
* User IR-based layout for all IR steps (#1084)Tim Foley2019-10-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change builds on previous work that moves toward a more IR-based representation of layout. Those steps added some instructions for representing layout in the IR (initially just proxies for the AST layout objects), and an explicit lowering pass that could build a target-specific IR module that binds parameters and entry points to layout information. This change aims to complete that work, in the sense that the IR representation of layout is now self-contained and does not rely on having pointers back into the AST-level representation. Achieving this requires two main kinds of work: 1. Update any code that used layout information derived from the IR (most notably all the `slang-emit-*` code) to use the new IR representation and its accessors. 2. Update any code that *constructs* layouts using information derived from the IR to construct IR layouts instead. The biggest new infrastructure feature in this change is support for "attributes" in the IR (I'd welcome feedback on the naming). An attribute can either be thought of like key/value arguments that can be added to certain instructions to encode optional data, or alternatively like a decoration that is referenced as an operand instead of a child. The value of attributes over decorations is that they can affect the hash/identity of an instruction (which decorations can't), while the advantage of decorations is that they can easily be added/removed over the lifetime of an instruction (which attributes can't). We mostly use them here to represent operands that are logically optional. Once attributes are available, the encoding of layout information into the IR is mostly straightforward: * An `IRVarLayout` has a fixed operand for its type layout, and can accept a few different attributes * Zero or more `IRVarOffsetAttr`s that specify the offset of the variable for a given resource kind. These are equivalent to the `VarLayout::ResourceInfo`s at the AST level. * An optional `IRUserSemanticAttr` and `IRSystemValueSemanticAttr` to represent the (possibly derived) semantic of a varying input/output parameter. * An option `IRStageAttr` to represent the known stage for a parameter. * An `IREntryPointLayout` has a var layout for the entry point parameters (logically grouped in to a struct) and another var layout for the result parameter. * There is a small type hierarchy rooted at `IRTypeLayout` where each subtype can add fixed operands and attributes that are expected to appear. It also supports `IRTypeSizeAttr`s that serve a similar role to the `IRVarOffsetAttr`s. * Structure types maintain the mapping of fields to their var layouts using `IRStructFieldLayoutAttr`s. With the encoding in place, most of the changes in category (1) (code that just *uses* rather than *creates* layouts) was straightforward. The biggest different beyond name changes was that everything needs to be fetched using accessors instead of bare fields. It would have been possible to stage this commit and make the diffs smaller by first introducing mandatory acessors to the AST layout types. The changes in category (2) were more involved. There were a lot of places in the existing code where a `TypeLayout` or `VarLayout` would be created, and then initialized piecemeal over several lines of code (and sometimes even across functions). Because of the way that layouts need to support many optional properties, it did not seem practical to just have monolithic factory functions that took all the options as arguments, so I instead opted for a builder approach. The builders for `IRVarLayout` and `IREntryPointLayout` are both straightforward, and honestly there is no realy need for a builder for entry point layouts right now, but I was trying to future-proof in case we decidd to add some optional attributes to them. The builders for type layouts are more involved because of the inheritance hierarchy. Each concrete sub-type of type layout needs to define its own builder type that customizes the opcode, operands, and attributes of the final instruction. The refactoring that had to go into this change was a nice excuse to clean up a few ugly warts in the AST layout code that were largely there to support IR use cases. While this change adds a lot of new infrastructure code to the IR, most of the client code has stayed the same or gotten simpler. One annoying wart that remains with this change is the notion of an "offset element type layout" for parameter group types. That idea was added to deal with a legacy feature in the reflection API that we realized was a mistake, but unfortunately having that "offset" layout handy made writing a few other pieces of code simpler so that there are use cases of the feature even in the IR. Removing those uses is do-able, but requires careful refactoring so it is best left to a follow-on change. Another thing that could be considered for a follow-on change is how much information should be specified when constructing a `Builder` for an IR type layout, and how much should be allowed to be specified statefully/piecemeal. It would be nice to force all the required operands to be specified up front, but `IRParameterGroupTypeLayout::Builder` doesn't currently work that way because so much of the client code that needs it involved a lot of stateful setting and would need to be refactored heavily to provide the necessary information up front.
* Initial work on representing layout at IR level (#1079)Tim Foley2019-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Initial work on representing layout at IR level This change starts the process of making the back-end of the compiler independent of the AST-level layout information (`TypeLayout`, `VarLayout`, etc.) so that it instead only relies on layout information that is embedded into IR modules. This brings us incrementally closer to a world in which the back-end could be run without the AST-level structures even existing (e.g., for an application that just wants to ship IR without any AST information for IP protection, while still supporting some amount of linking and specialization). The main parts of the change are: * There is a bunch of incidental churn related to specifying entry points by index instead of the `EntryPoint` object for certain operations. This ends up being a better choice because we can use the index to look up side-band information about the entry point that might not be stored on the `EntryPoint` object itself. In particular... * We expand the `ComponentType` interface to support looking up the mangled name of an entry point by index. In common cases (no generic/interface specialization) this would be the same as asking the `EntryPoint` for its mangled name, but in cases where we have specialized a generic entry point, the mangled name would include speicalization arguments that are only available on the `SpecializedComponentType` that wraps the entry point. This part of the change isn't ideal and there might be a better solution waiting to be invented. Note that we store mangled entry point names as strings rather than using `DeclRef`s because that ensures that the information could be serialized and deserialized without a dependence on the AST. * The `TargetProgram` type (which represents binding a specific `ComponentType` for a shader program to a specific `TargetRequest` that represents the target platform) is expanded to include an `IRModule` that represents layout information, in addition to the AST-level `ProgramLayout` it already contained. We create both of these objects at the same time (on-demand) to simplify the overall flow (so that any code that triggers creation of the AST-level layout will also ensure that the IR-level layout exists). * A bunch of code in the emit passes that was passing down layout-related objects has been eliminated. It appears that most of those objects weren't actually being used, so this is just a cleanup, but it helps ensure that the back-end steps are "clean" and don't depend on the AST-level information. The one big exception here is that the emit logic needs to know the stage for the entry point being emitted (to deal with one wrinkle in translating DXR to VKRT). * A big change (actually introduced by @jsmall-nvidia in a branch that this change copied and then built from) is to introduce some more explicit IR instructions to represent layout information, notably an `IRTypeLayout` and an `IRVarLayout`. For now these objects still reference their AST equivalents, but the separation gives us an incremental path to move information from the AST-level objects over to the IR ones. This work includes logic in `IRBuilder` to construct the IR-level layout objects from the AST-level ones on-demand, so that the existing code paths that try to attach AST-level layout will continue to work for now. * Because layout information is now embedded in the IR, the `slang-ir-link.cpp` logic loses a lot of cases that used to deal with attaching AST-level layout objects to IR-level instructions during the linking process. Instead, the linker now assumes that one (or more) of the input IR modules will have layout information associated with it, and the linker makes sure to copy layout decorations (and the instructions they reference) from the input IR module(s) to the output using its more ordinary mechanisms. * Inside `slang-lower-to-ir.cpp`, we add logic to construct an IR module in a `TargetProgram` that simply references the global shader parameters, entry points, etc. and attaches IR layout decorations to them. This is akin to the existing pass in the same file that constructs IR to represent specialization information, and both of these passes share infrastructure with the main AST->IR lowering pass. Eventually, it is expected that this pass will encompass more of the logic for copying AST-level layout information over to IR-level equivalents. * One small wrinkle with this change was that the output for an HLSL generation test case changed some of its `#line` directives. The old code was actually more inaccurate than the new, so this change just updated the baseline. It also added some logic in the linker to make sure that when an IR instruction has multiple definitions, we try to pick up a source location from any of them, in case the "main" one somehow didn't get a location. * Another small fix was that the key/value map in `StructTypeLayout` for mapping fields/members to their layouts was keyed on `Decl*` when it really should have been `VarDeclBase*`. This change should in principle be a pure refactoring with no functionality changes, so no new tests were added. It is unfortunately also a change that has a high probability of breaking at least *some* client code, so we may want to be defensive and mark this with a new major version number (well, a new *minor* version number since we are pre-`1.0`) to give us some room for releasing hotfixes to the old version if needed. * fixup: infinite recursion bug detected by clang * fixup: remove commented-out code
* Fixed from Review of Entry Point decoration #1068 (#1072)jsmall-nvidia2019-10-08
| | | | | | | | * Remove typo around GeometryPrimitiveTypeDecoration * * GeometryPrimitiveTypeDecoration -> GeometryInputPrimitiveTypeDecoration (to try and closer match meaning and the Modifier name) * Remove a small problem around definition of IRGeometryPrimitiveTypeDecoration * Fix comment around IRStreamOutputTypeDecoration
* Remove EntryPointLayout* use in emit logic. (#1071)jsmall-nvidia2019-10-08
| | | | | | | | | | | | | | | | | | | | | | * Split out EntryPointParamDecoration. * Add profile to EntryPointDecoration. * WIP for GS handling for GLSL. * WIP for StreamOut GLSL * Fixed GLSL geometry output. * Clean up - remove unneeded/commented out code from the entry point change. * Use Op nums to identify GeometryTypeDecorations (as opposed to contained enum). * Remove setSampleRateFlag & doSampleRateInputCheck * Remove EntryPointLayout from emit. * Change to force CI.
* Feature/ir entry point profile (#1068)jsmall-nvidia2019-10-08
| | | | | | | | | | | | | | | | * Split out EntryPointParamDecoration. * Add profile to EntryPointDecoration. * WIP for GS handling for GLSL. * WIP for StreamOut GLSL * Fixed GLSL geometry output. * Clean up - remove unneeded/commented out code from the entry point change. * Use Op nums to identify GeometryTypeDecorations (as opposed to contained enum).
* IR types for subset of Attributes (#1067)jsmall-nvidia2019-10-04
| | | | | | | | | | | | | | | | | | | | | | | | | | * IROutputControlPointsDecoration * IROutputTopologyDecoration * IRPartitioningDecoration * IRDomainDecoration * Use IRPatchConstantDecoration alone for hlsl output. * IRMaxVertexCountDecoration * IRInstanceDecoration * Removed _emitHLSLAttributeSingleString and _emitHLSLAttributeSingleInt Removed GLSLBindingAttribute and just use NumThreadsAttribute * Added IRNumThreadsDecoration. * Added IRNumThreadsDecoration * Fix build problem on x86. Improve diagnostic text based on review.
* Revise new COM-lite API (#1007)Tim Foley2019-08-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Revise new COM-lite API This change revises the "COM-lite" API that was recently introduced to try to streamline it and introduce some missing central/base concepts. The central new abstraction in the API is the notion of a "component type," which is a unit of shader code composition. A component type can have: * IR code for some number of functions/types/etc. * Zero or more global shader parameters * Zero or more "entry point" functions at which execution can start * Zero or more "specialization" parameters (types or values that must be filled in before kernel code can be generated) * Zero or more "requirements" (dependencies on other component types that must be satisfied before kernel code can be generated) Both individual compiled modules, and validated entry points are then examples of component types, and we additionally define a few services that apply to all component types: * We can take N component types and compose them to create a new component type that combines their code, shader parameters, entry points, and specialization parameters. A composed component type may also include requirements from the sub-component types, but it is also possible that by composing thing we satisfy requirements (if `A` requires `B`, and we compose `A` and `B`, then the requirement is now satisfied, and doesn't appear on the composite). * We can take a component type with N specialization parameters, and specialize it by giving N compatible specialization arguments. The result of specialization is a new component type with zero specialization parameters. Under the right circumstances the specialzed component type will be layout compatible with the unspecialized one. * One more example that isn't exposed in the public API today is that we can take a component with requirements and "complete" it by automatically composing it with component types that satisfy those requirements. This can be seen as a kind of linking step that pulls together the transitive closure of dependencies. * We can query the layout for the shader parameters and entry points of a component type, for a specific target. * We can query compiled kernel code for an entry point in a component type (for a specific target). This only works for component types with zero specialization parameters and zero requirements. The idea is that by giving users a fairly general algebra of operations on component types, they can compose final programs in ways that meet their requirements. For example, it becomes possible to incrementally "grow" a component type to represent the global root signature for ray tracing shaders as new entry points are added, in such a way that it always stays layout-compatible with kernels that have already been compiled. Much of the implementation work here is in implementing the unifying component type abstraction, and in particular re-writing code that used to assume a program consisted of a flat list of modules and entry points to work with a hierarchical representation that reflects the underlying algebra (e.g., with types to represent composite and specialized component types). There's also a hidden "legacy" case of a component type to deal with some legacy compiler behaviors that can't be directly modeled on top of the simple algebra with modules and entry points. This API is by no means feature-complete or fully developed. It is expected that we will flesh it out more when bringing up application code (e.g., Falcor) on top of the revamped API. One notable thing that went away in this change is explicit support for "entry point groups" and notions of local root signatures (especially the Falcor-specific handling of the `shared` keyword, which a previous change turned into an explicitly supported feature). With the new "building blocks" approach, it should be possible for a DXR application to deal with local root signatures as a matter of policy (on top of the API we provide). If/when we need to provide some kind of emulation of local root signatures for Vulkan (and/or if Vulkan is extended with an explicit notion of local root signatures), we might need to revisit this choice. * Fix debug build There was invalid code inside an `assert()`, so the release build didn't catch it. * fixup: warnings * fixup: more warnings-as-errors * fixup: review notes * fixup: use component type visitors in place of dynamic casting
* Change how global-scope constants are handled (#1001)Tim Foley2019-07-17
| | | | | | | | | | | | | | | | | | | Before this change, global and function-scope `static const` declarations were represented as instructions of type `IRGlobalConstant`, which was represented similarly to an `IRGlobalVar`: with a "body" block of instructions that compute/return the initial value. This representation inhibited optimizations (because a reference to a global constant would not in general be replaced with a reference to its value), and also caused problems for resource type legalization because the logic for type legalization did not (and still does not) handle initializers on globals (so global *variables* that contain resource types are still unsupported). The change here is simple at the high level: we get rid of `IRGlobalConstant` and instead handle global-scope constants as "ordinary" instructions at the global scope. E.g., if we have a declaration like: static const int a[] = { ... } that will be represented in the IR as a `makeArray` instruction at the global scope, referencing other global-scope instructions that represent the values in the array. This simple choice addresses both of the main limitations. A `static const` variable of integer/float/whatever type is now represented as just a reference to the given IR value and thus enables all the same optimizations. When a `static const` variable uses a type with resources, the existing legalization logic (which can handle most of the "ordinary" instructions already) applies. Another secondary benefit of this approach is that the hacky `IREmitMode` enumeration is no longer needed to help us special-case source code emit for `static const` variables. Beyond just removing `IRGlobalConstant`, and updating the lowering logic to use the initializer direclty, the main change here is to the emit logic to make it properly handle "ordinary" instructions that might appear at global scope. One open issue with this change, that could be addressed in a follow-up change, is that "extern" global constants that need to be imported from another module (but which might not have a known value when the current module is compiled) aren't supported - we don't have a way to put a linkage decoration on them. A future change might re-introduce global constants as a distinct IR instruction type that just references the value as an operand (if it is available). We would then need to replace references to an IR constant with references to its value right after linking.
* Split out target code generation from CLikeSourceEmitter (#976)jsmall-nvidia2019-06-06
* * Added SourceStyle to CLikeSourceEmitter, to limit cases to actual target types. * Made Impl methods _ prefixed * Small tidyup * * SourceStream -> SourceWriter * use slang-emit- prefix on SourceWriter file * * Remove EmitContext -> merge into CLikeSourceEmitter * slang-c-like-source-emitter -> slang-emit-source.cpp * ExtensionUsageTracker -> GLSLExtensionTracker slang-extension-usage-tracker.cpp/.h -> slang-emit-glsl-extension-tracker.cpp/.h * emit-source.cpp.h -> emit-c-like.cpp/.h * Small fix to move where some _ prefixed functions are declared in CLikeSourceEmitter. * * CLikeSourceEmitter::CInfo -> Desc * Functions to get and find CodeGenTarget by name * Split out empty language impls * Create an impl based on SourceStyle * * CodeGenTarget conversion to and from string * Move HLSL specific functions to HLSLEmitSource. * Emitting texture and image types. * Move move GLSL specific functionality to GLSLSourceEmitter * Split more out of slang-emit-c-like * Refactor more out of slang-emit-c-like * * tryEmitIRInstExprImpl(IRInst* inst, IREmitMode mode, const EmitOpInfo& inOuterPrec) * Fix bug around output of uintBitsToFloat * More work refactoring out target specifics from slang-emit-c-like * Move functions that are only implemented once in GLSL impl into their Impl method. * Move rate qualification out of slang-emit-c-like * * Added getEmitOpForOp - allows for table usage so different ops can be dealt with the same way * Moved vector comparison to slang-emit-glsl * * * Use EmitOpInfo to control output in slang-emit-c-like.cpp for unary ops * Move more functionality from CLikeSourceEmitter to HLSLSourceEmitter * Make output of parameters implementaion specific. * Extracted interpolation modifiers. * Remove IR from methods that don't need them. * Remove IR from method names. * Refactor handling of output of types - to make the impls implement the full path without lots of cases for specific impls * Add variable declaration modifiers and matrix layout to larget specific in slang-emit. * Make target specific internal functions _ prefixed.