slang.git - Making it easier to work with shaders

	Commit message (Collapse)	Author	Age
*	Include hierarchy output (#1595)	jsmall-nvidia	2020-11-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * Improve diagnostic for token pasting. * Token paste location test. * Output include hierarchy. * WIP on includes hierarchy. * Improved include hierarchy output - to handle source files without tokens. Improved test case. * Small comment improvements. Fixed a typo with not returning a reference. * Slight simplification of the ViewInitiatingHierarchy, by adding GetOrAddValue to Dictionary. * Remove the need for ViewInitiatingHierarchy type. * Improve output of path in diagnostic for includes hierarchy. * Remove comment in diagnostic for token-paste-location.slang * Update command line docs to include `-output-includes` Co-authored-by: Yong He <yonghe@outlook.com>
*	Refactor preprocessor API to avoid coupling (#1559)	Tim Foley	2020-09-24
\| \| \| \| \| \| \| \| \| \| \|	Based on review feedback from #1556, this change updates the Slang preprocessor so that it is no longer coupled to policy details from higher levels of the software stack. In particular, the preprocessor used to: * Deal with updating the list of file paths that a `Module` depends on. * (As of #1556) detect NVAPI-related macro definitions and use them to construct an AST-level `Modifier` attached to the `ModuleDecl`. This change introduces a callback interface where the `Preprocessor` calls out to a `PreprocessorHandler` at certain points during execution, allowing the handler to introduce custom logic that suits a particular high-level use case. This change also removes the dependence of the preprocessor on the `Linkage`, because in practice only a small number of its sub-objects were needed. As a convenience, a wrapper function that takes a `Linkage` was left in place so that the existing call sites didn't have to change very much.
*	Simplify workflow when using NVAPI (#1556)	Tim Foley	2020-09-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In some cases, functionality is available as either a GLSL extension for Vulkan/SPIR-V, or through the NVAPI system for D3D. This situation creates complications because while GLSL extensions are generally all supported by the open-source glslang compiler (which we can bundle and ship), NVAPI operations are exposed through a specific header (`nvHLSLExtns.h`) that ships as part of the NVAPI SDK. When a user wants to explicitly use NVAPI-provided operations in their shader code, there are no major complications for Slang; the user sets up their include paths, `#include`s the relevant header, calls functions in it, and lets Slang deal with the details of compilation. The challenge for Slang arises when we want to provide a cross-platform interface in our standard library (e.g., the `RWByteAddressBuffer.InterlockedAddF32` method that was recently added) that uses either a GLSL extension (when compiling for Vulkan/SPIR-V) or an NVAPI (when compiling to DXBC or DXIL). In that case, the code generated by Slang now has a dependency on NVAPI, and we need to somehow emit a `#include` directive that pulls it in when invoking fxc or dxc. Because we do not (and seemingly cannot) bundle the NVAPI header with the compiler, we have to rely on ther user to have it available and to somehow communicate to Slang where it is. Exposing portable routines that sometimes use NVAPI currently creates two main challenges: 1. The user is forced to interact with the "prelude" mechanism in the compiler, which allows the programmer to define code in a given target language that gets prepended to the Slang-generated code. While the prelude mechanism is powerful, it is also hard for users to integrate into their workflow, and our experience so far is that users want something that Just Works. 2. If the user writes code that uses some of our abstract operations that layer on NVAPI and they also want to use NVAPI explicitly, they end up with two copies of the NVAPI header (one included by the Slang front-end, and another included by the downstream fxc/dxc compiler). This puts the user in the situation of (a) having to ensure that they set the defines like `NV_SHADER_EXTN_SLOT` consistently both when invoking Slang and when adding their prelude, and (b) even if they do make the definitions consistent, they run into the problem that fxc/dxc complain about overlapping register bindings on the two copies of the `g_NvidiaExt` global shader paraemter that the NVAPI header declares. This change attempts to resolve both issues by adding a lot of "do what I mean" logic to the compiler to try to ease things in the common case. In particular: 1. The user no longer needs to use the "prelude" mechanism when using NVAPI. The compiler now embeds a default prelude for HLSL output, which will `#include` the NVAPI header if and only if the generated code needs NVAPI access because of portable standard library routines that were used. 2. The user can mix-and-match explicit NVAPI use and stdlib functions that compile to use NVAPI. The register/space to be used by NVAPI when included via prelude is now set based on whatever the user set via the preprocessor so that it should automatically be consistent between both cases. Furthermore, the code we emit for the declaration of `g_NvidiaExt` when compiling explicit NVAPI use is set up to be conditional, so that it is skipped in the case where the prelude will pull in its own declaration of that parameter. The way all this is achieved involves a lot of moving pieces: * We now have an HLSL prelude, which mostly just serves to `#include "nvHLSLExtns.h"` in the case where NVAPI support is needed downstream. * Standard library operations that require NVAPI for their implementation on HLSL include a new `[__requiresNVAPI]` attribute. * The preprocessor has been extended so that after tokenizing an input file it looks up the NVAPI-relevant macros in the resulting environment, and if they are set it attached a modifier (`NVAPISlotModifier1) to the AST `ModuleDecl` that is based on their values. Logic is added to detect if multiple input files specify values for the macros in ways that conflict. * The semantic checking step is extended so that it detects the "magic" NVAPI declarations (the `g_NvidiaExt` paramter and the `NvShaderExtnStruct` type that it uses) and attaches a modifier to them so that they can be identified as such in later steps. * Parameter binding is extended to collect a list of the AST modifiers that reflect NVAPI binding, and to reserve the relevant register(s) so that ordinary user-defined parameters cannot conflict with them. * IR lowering translates the three new AST modifiers related to NVAPI over to IR equivalents. * IR linking is extended to make sure that it clones any `IRNVAPISlotDecoration`s attached to the input modules. The pass intentionally does not care where the modifiers came from; it just collects them all and leaves it to downstream code to sort out what they mean. * Emit logic is extended to have a notion of "prelude directives" which are preprocessor directives that should come before the prelude in the generated code, because they can impact the way that the prelude compiles. This is done so that we don't have to introduce ad hoc logic for each downstream compiler to set any relevant `-D` flags (e.g., both fxc and dxc would need to duplicate such logic for NVAPI support). * The HLSL source emitter is extended to track whether it emits any operations that require NVAPI support. * The HLSL source emitter is extended to emit prelude directives based on whether NVAPI is needed and, if it is, to also set the register and space that NVAPI should use based on what was stored in the decoration(s) on the IR module. * The HLSL source emitter is extended so that it detects global instructions that represent "magic" NVAPI constructs , and emit them as conditional definitions so that they are skipped when NVAPI is included via the prelude. * The handling of requires capabilities during emit logic was cleaned up a bit so that more logic is shared across targets, and also so that the same logic is used both when emitting a function declaration/definition and when emitting a call to an instrinsic function (which won't get declared/defined).
*	Remove IncludeHandler. (#1505)	jsmall-nvidia	2020-08-19
\| \| \| \| \| \| \| \|	nvAPI -> NVAPI nvAPIPath -> nvapiPath DxcIncludeHandler don't reference count. nv-api-path -> nvapi-path Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
*	Fix a preprocessor bug affecting X-macros (#1436)	Tim Foley	2020-07-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Fix a preprocessor bug affecting X-macros Fixes #1435 This bug exhibited as nondeterministic output from the preprocessor in release builds, but using a debug build it was narrowed down to a use-after-free issue. The core problem is subtle, but relates to how we set up the linked list that represents the "busy" status of macros in a particular expansion environment. Consider this scenario: ```hlsl X(A) ``` The flow we expect from the preprocessor is something like: 1. Read the `X` token in `X(A)` and recognize the start of a function-like macro invocation. Create an expansion environment for `X`, with the global environment as a parent, read in the arguments (just `A`), and push that expansion onto the stack. 2. Read the `M` token that starts the expansion of `M`, and recognize it as an invocation of the object-like macro representing the argument `M`. Create an expansion environment for the definition of `M` (which is just `A`), and push it onto the stack. 3. Read the token `A` from the expansion for the argument `M`, and recognize it as an invocation of the function-like macro `A`. Create an expansion environemnt for `A`, with the current environment as its parent, read in the arguments (just `0`), and push that expansion onto the stack. 4. Read the token `y` from the expansion for `A`, and recognize it as an invocation of the object-like macro representing the argument `y`. Create an expansion environment for the definition of `y` (which is just `0`) and push it onto the stack. 5. Read `0`. 6. Read a bunch of end-of-file tokens that cause all of these expansions to be popped. That all looks fine as written, but the gotcha is that the input stream for the expansion in step (2) is only a single token (`A`), which means that during step (3) the current input stream at the time we create the macro expansion for `A` is at the end of its input, and by the time we've read in the macro arguments that expansion will have been popped. The problem, then, is that the logic for setting up the stack of "busy" macros was being performed at the beginning of the expansion (the part referred to as "create an expansion" above), when it should only have been set up as part of pushing the xpansion onto the stack (since at that point we have a guarantee that the parent expansion cannot be popped until the child expansion has been). The fix here is thus pretty simple: we already have distinct operations for `initializeMacroExpansion()` and `pushMacroExpansion()`, and I simply moved the logic for setting up the "busy" state from the former to the latter. * fixup: typo
*	Reduce the size of Token (#1349)	jsmall-nvidia	2020-05-19
\| \| \| \| \| \|	* Token size on 64 bits is 24 bytes (from 40). On 32 bits is 16 bytes from 24. * Added hasContent method to Token. Some other small improvements around Token.
*	Code standard changes for Lexer (#1209)	jsmall-nvidia	2020-02-07
\| \| \| \| \| \| \| \|	* Upper camel -> lowerCamel m_ prefix members where appropriate _ prefix module local functions * m_ prefix members in Lexer. Fit's standard because type has methods/ctor.
*	Fix for a macro expansion bug (#1192)	Tim Foley	2020-01-31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Fix for a macro expansion bug The work in #1177 added a notion of a macro being "busy" to prevent errors around recursively-defined macros: #define BAR(X) BAR(0) BAR(A) /* should yield BAR(0) and not infinite-loop / The fix was to place an `isBusy` flag on each macro, setting it to `true` when an expansion of that macro gets pushed onto the stack of input streams, and back to `false` when the expansion gets popped from the stack of input streams. That approach meant that we had to pre-expand macro arguments to avoid incorrectly treating a macro as busy for nested-but-not-recursive invocations: #define NEG(X) (-X) NEG(NEG(3)) // should expand to (-(-3)) and not (-NEG(3)) The subtle bug that arose with the `isBusy` flag is that the current preprocessor design doesn't always pop input streams when we might expect. For input like the following: BAR(A) BAR(B) The preprocessor can be in a state where the stack of input streams look like: // top input stream: BAR(X) => BAR(0) ----------------^ // bottom input stream BAR(A) BAR(B) ------^ That is, the first macro expansion is still "open," even though it is at the end of its input. When the preprocesor looks ahead and sees `BAR` as the next token and asks whether it should be expanded, the `isBusy` state on the BAR macro hasn't yet been unset, so expansion gets skipped. Aside: it is completely reasonable to ask at this point whether we should just "fix" the behavior of not popping input streams that are at their end. We should consider doing that, but the current code goes to some lengths to preserve the current behavior and I do not* recall why we had found it necessary. Any attempt to change that behavior should come along with lots of testing. This change instead adds a different approach for tracking the busy-ness of macros that doesn't rely on mutable state in the macro, and thus works even without pre-expansion of macro arguments. In the Slang preprocessor, macro names are always looked up in environments, where each environment maps names to macros, and has an option parent environment. There is a global environment used for most cases, but when expanding a function-like macro we would create an "argument environment" for it that maps the parameter names to their argument tokens. By expanding the environment type to include an (optional) field for a macro that is made "busy" by that environment, we can check if a macro is busy in a given environment by checking if the given macro has been associated with any of the environments on the chain of parent environments. A function-like macro expansion could then attach the macro to the "argument environment" and automatically make itself busy for the purposes of expansion of its body. To extend the same functionality to all macros, the "argument environment" was changes to an "expansion environment" that always gets used for a macro expansion and always has the identity of the macro attached. Because the arguments to a function-like macro have their own environment for expansion that bypasses the argument/expansion environment of the function-like macro, the macro will show as busy in its own body, but not in the expansion of its arguments. Thus the pre-expansion of macro arguments is not needed with this change. Aside: it might not be clear how this design avoids the problem of the unpopped input streams that aren't at their end. The answer is that the "current" environment for the preprocessor is already taken as the environment of the top-most input stream that is not at its end. This means that the new test for busy-ness benefits from the existing logic that was in place to deal with non popping input streams as soon as then end. (This is not to say that the current input-stream behavior isn't questionable) This change includes a new test case for the behavior that was broken, and expands the test case from #1177 to also test object-like macros. * Fix to handle the mutually-recursive case The revised "busy" logic was unable to handle a mutually-recursive macro like: #define ABC XYZ #define XYZ ABC This change restores the ability to handle mutually recursive cases, and makes sure that we test that case. The crux of the fix relies on splitting the environment and busy-macro tracking into more separate data structures. Rather than the list of environments directly trakcing busy-ness via the chain of parent environments, an environment now stores a separate linked list of macros that should be considered busy in that environment. Decoupling the two lists allows for an important change: when expanding an ordinary macro (either object-like or function-like, but not a function argument) the parent environment comes from the point where the macro was defined (this is required for lookup to make sense), but the parent list of busy macros comes from the current input stream at the point where expansion takes place. That change ensures that non-argument macros always properly "stack up" the list of busy macros.
*	Fix for infinite recursion with macro invocation (#1177)	jsmall-nvidia	2020-01-24
\| \| \| \| \| \| \|	* First pass fix of macro expansion logic to stop recursive application (causting a recursive loop), whilst also allowing application on parameters to a macro. * Added recursive-macro test. Fixed macro application example.
*	Use slang- prefix on slang compiler and core source (#973)	jsmall-nvidia	2019-05-31
	* Prefixing source files in source/slang with slang- * Prefix source in source/slang with slang- prefix. * Rename core source files with slang- prefix. * Update project files. * Fix problems from automatic merge.