Simplify workflow when using NVAPI (#1556)

In some cases, functionality is available either as a GLSL extension for Vulkan/SPIR-V or through the NVAPI system for D3D. This creates complications: GLSL extensions are generally all supported by the open-source glslang compiler (which we can bundle and ship), whereas NVAPI operations are exposed through a specific header (`nvHLSLExtns.h`) that ships as part of the NVAPI SDK.

When a user wants to explicitly use NVAPI-provided operations in their shader code, there are no major complications for Slang; the user sets up their include paths, `#include`s the relevant header, calls functions in it, and lets Slang deal with the details of compilation.

The challenge for Slang arises when we want to provide a cross-platform interface in our standard library (e.g., the `RWByteAddressBuffer.InterlockedAddF32` method that was recently added) that uses either a GLSL extension (when compiling for Vulkan/SPIR-V) or an NVAPI operation (when compiling to DXBC or DXIL). In that case, the code *generated* by Slang now has a dependency on NVAPI, and we need to somehow emit a `#include` directive that pulls it in when invoking fxc or dxc. Because we do not (and seemingly cannot) bundle the NVAPI header with the compiler, we have to rely on the user to have it available and to somehow communicate to Slang where it is.

Exposing portable routines that sometimes use NVAPI currently creates two main challenges:

1. The user is forced to interact with the "prelude" mechanism in the compiler, which allows the programmer to define code in a given target language that gets prepended to the Slang-generated code. While the prelude mechanism is powerful, it is also hard for users to integrate into their workflow, and our experience so far is that users want something that Just Works.

2. If the user writes code that uses some of our abstract operations that layer on NVAPI *and* they also want to use NVAPI explicitly, they end up with two copies of the NVAPI header (one included by the Slang front-end, and another included by the downstream fxc/dxc compiler). This puts the user in the situation of (a) having to ensure that they set defines like `NV_SHADER_EXTN_SLOT` consistently both when invoking Slang and when adding their prelude, and (b) even if they do make the definitions consistent, running into the problem that fxc/dxc complain about overlapping register bindings on the two copies of the `g_NvidiaExt` global shader parameter that the NVAPI header declares.

This change attempts to resolve both issues by adding a lot of "do what I mean" logic to the compiler to try to ease things in the common case. In particular:

1. The user no longer needs to use the "prelude" mechanism when using NVAPI. The compiler now embeds a default prelude for HLSL output, which will `#include` the NVAPI header if and only if the generated code needs NVAPI access because of portable standard library routines that were used.

2. The user can mix and match explicit NVAPI use and stdlib functions that compile to use NVAPI. The register/space to be used by NVAPI when included via the prelude is now set based on whatever the user set via the preprocessor, so that it should automatically be consistent between both cases. Furthermore, the code we emit for the declaration of `g_NvidiaExt` when compiling explicit NVAPI use is set up to be conditional, so that it is skipped in the case where the prelude will pull in its own declaration of that parameter.

The way all this is achieved involves a lot of moving pieces:

* We now have an HLSL prelude, which mostly just serves to `#include "nvHLSLExtns.h"` in the case where NVAPI support is needed downstream.

* Standard library operations that require NVAPI for their implementation on HLSL include a new `[__requiresNVAPI]` attribute.

* The preprocessor has been extended so that after tokenizing an input file it looks up the NVAPI-relevant macros in the resulting environment, and if they are set it attaches a modifier (`NVAPISlotModifier`) to the AST `ModuleDecl` that is based on their values. Logic is added to detect if multiple input files specify values for the macros in ways that conflict.

* The semantic checking step is extended so that it detects the "magic" NVAPI declarations (the `g_NvidiaExt` parameter and the `NvShaderExtnStruct` type that it uses) and attaches a modifier to them so that they can be identified as such in later steps.

* Parameter binding is extended to collect a list of the AST modifiers that reflect NVAPI binding, and to reserve the relevant register(s) so that ordinary user-defined parameters cannot conflict with them.

* IR lowering translates the three new AST modifiers related to NVAPI over to IR equivalents.

* IR linking is extended to make sure that it clones any `IRNVAPISlotDecoration`s attached to the input modules. The pass intentionally does not care where the modifiers came from; it just collects them all and leaves it to downstream code to sort out what they mean.

* Emit logic is extended to have a notion of "prelude directives", which are preprocessor directives that should come *before* the prelude in the generated code, because they can impact the way that the prelude compiles. This is done so that we don't have to introduce ad hoc logic for each downstream compiler to set any relevant `-D` flags (e.g., both fxc and dxc would need to duplicate such logic for NVAPI support).

* The HLSL source emitter is extended to track whether it emits any operations that require NVAPI support.

* The HLSL source emitter is extended to emit prelude directives based on whether NVAPI is needed and, if it is, to also set the register and space that NVAPI should use based on what was stored in the decoration(s) on the IR module.

* The HLSL source emitter is extended so that it detects global instructions that represent "magic" NVAPI constructs, and emits them as conditional definitions so that they are skipped when NVAPI is included via the prelude.

* The handling of required capabilities during emit was cleaned up a bit so that more logic is shared across targets, and also so that the same logic is used both when emitting a function declaration/definition and when emitting a call to an intrinsic function (which won't get declared/defined).
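
The "prelude directives" step described above can be sketched in isolation. The following is a minimal, standalone C++ approximation and not the code from this change: `NvapiSlot` and `buildPreludeDirectives` are hypothetical names that stand in for the IR slot decoration and for `HLSLSourceEmitter::emitPreludeDirectivesImpl`, and the function returns a string rather than writing to the emitter's source writer. It assumes the same rules the commit describes: nothing is emitted unless NVAPI is required, and the register space is only emitted when it is not `space0`.

```cpp
#include <string>

// Hypothetical stand-in for the register/space stored on the IR module.
struct NvapiSlot
{
    std::string registerName; // e.g. "u3", taken from NV_SHADER_EXTN_SLOT
    std::string spaceName;    // e.g. "space1"; "space0" when the user set none
};

// Build the text that must appear *before* the HLSL prelude.
// `slot` may be null when the user never defined the NVAPI macros.
std::string buildPreludeDirectives(bool requiresNVAPI, const NvapiSlot* slot)
{
    std::string out;
    if (!requiresNVAPI)
        return out; // the prelude then skips the NVAPI header entirely

    // Tell the prelude to `#include "nvHLSLExtns.h"`.
    out += "#define SLANG_HLSL_ENABLE_NVAPI 1\n";

    if (slot)
    {
        out += "#define NV_SHADER_EXTN_SLOT " + slot->registerName + "\n";

        // Leave out `space0` so the output can still compile with fxc,
        // which has no notion of register spaces.
        if (slot->spaceName != "space0")
            out += "#define NV_SHADER_EXTN_REGISTER_SPACE " + slot->spaceName + "\n";
    }
    return out;
}
```

Under these assumptions, a module that uses `InterlockedAddF32` and defines only `NV_SHADER_EXTN_SLOT` as `u3` would get the enable define and the slot define ahead of the prelude, while a module that never touches an NVAPI-backed operation would get no directives at all.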
author: Tim Foley <tfoleyNV@users.noreply.github.com> 2020-09-23 15:47:14 -0700
committer: GitHub <noreply@github.com> 2020-09-23 15:47:14 -0700
commit: 895405212aa286701031a4f62b6904938105411c (patch)
tree: 81abc616192e51c8500e3d3d119cef653349341a
parent: 3d063a7024e54340b6fed2af964ea2790056a3e3 (diff)
31 files changed, 732 insertions, 80 deletions
diff --git a/.gitignore b/.gitignore
index 0ee4e49e4..c25e68b15 100644
--- a/.gitignore
+++ b/.gitignore
@@ -37,7 +37,8 @@ tests/**/*.slang-module
 *.spv
 
 # Intermediate source files generated during build process
-/source/slang/slang-ast-generated.h
-/source/slang/slang-ast-generated-macro.h
-/source/slang/hlsl.meta.slang.h
-/source/slang/core.meta.slang.h
+/source/slang/slang-ast-generated.h
+/source/slang/slang-ast-generated-macro.h
+/source/slang/hlsl.meta.slang.h
+/source/slang/core.meta.slang.h
+prelude/*.h.cpp
diff --git a/prelude/slang-hlsl-prelude.h b/prelude/slang-hlsl-prelude.h
new file mode 100644
index 000000000..c01159e4a
--- /dev/null
+++ b/prelude/slang-hlsl-prelude.h
@@ -0,0 +1,3 @@
+#ifdef SLANG_HLSL_ENABLE_NVAPI
+#include "nvHLSLExtns.h"
+#endif
diff --git a/premake5.lua b/premake5.lua
index f6bfe6d77..1b350998b 100644
--- a/premake5.lua
+++ b/premake5.lua
@@ -954,7 +954,10 @@ standardProject "slang"
     -- compile for their embedded code, since they will not
     -- exist at the time projects/makefiles are generated,
     -- and thus a glob would not match anything.
-    files { "prelude/slang-cuda-prelude.h.cpp" }
+    files {
+        "prelude/slang-cuda-prelude.h.cpp",
+        "prelude/slang-hlsl-prelude.h.cpp",
+    }
 
     -- 
     -- The most challenging part of building `slang` is that we need
diff --git a/source/core/slang-string.h b/source/core/slang-string.h
index 02a43a806..25bf99023 100644
--- a/source/core/slang-string.h
+++ b/source/core/slang-string.h
@@ -161,6 +161,11 @@ namespace Slang
             return !(*this == other);
         }
 
+        bool operator!=(char const* str) const
+        {
+            return (*this) != UnownedStringSlice(str, str + ::strlen(str));
+        }
+
         bool startsWith(UnownedStringSlice const& other) const;
         bool startsWith(char const* str) const;
 
diff --git a/source/slang/core.meta.slang b/source/slang/core.meta.slang
index 8d0ce1d1d..f8b1bd1f9 100644
--- a/source/slang/core.meta.slang
+++ b/source/slang/core.meta.slang
@@ -2079,3 +2079,6 @@ attribute_syntax [anyValueSize(size:int)] : AnyValueSizeAttribute;
 
 __attributeTarget(DeclBase)
 attribute_syntax [builtin] : BuiltinAttribute;
+
+__attributeTarget(DeclBase)
+attribute_syntax [__requiresNVAPI] : RequiresNVAPIAttribute;
diff --git a/source/slang/hlsl.meta.slang b/source/slang/hlsl.meta.slang
index aeadf8eba..7c5ca0027 100644
--- a/source/slang/hlsl.meta.slang
+++ b/source/slang/hlsl.meta.slang
@@ -285,6 +285,7 @@ ${{{{
     __target_intrinsic(hlsl, "($3 = NvInterlockedAddFp32($0, $1, $2))")
     __cuda_sm_version(2.0)
     __target_intrinsic(cuda, "(*$3 = atomicAdd((float*)$0._getPtrAt($1), $2))")
+    [__requiresNVAPI]
     void InterlockedAddF32(uint byteAddress, float valueToAdd, out float originalValue);
 
     __specialized_for_target(glsl)
diff --git a/source/slang/run-generators.vcxproj b/source/slang/run-generators.vcxproj
index c03b67ce1..41f61f01d 100644
--- a/source/slang/run-generators.vcxproj
+++ b/source/slang/run-generators.vcxproj
@@ -209,6 +209,19 @@
       <AdditionalInputs Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">../../bin/windows-x86/release/slang-embed.exe</AdditionalInputs>
       <AdditionalInputs Condition="'$(Configuration)|$(Platform)'=='Release|x64'">../../bin/windows-x64/release/slang-embed.exe</AdditionalInputs>
     </CustomBuild>
+    <CustomBuild Include="..\..\prelude\slang-hlsl-prelude.h">
+      <FileType>Document</FileType>
+      <Command Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">"../../bin/windows-x86/debug/slang-embed" %(Identity)</Command>
+      <Command Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">"../../bin/windows-x64/debug/slang-embed" %(Identity)</Command>
+      <Command Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">"../../bin/windows-x86/release/slang-embed" %(Identity)</Command>
+      <Command Condition="'$(Configuration)|$(Platform)'=='Release|x64'">"../../bin/windows-x64/release/slang-embed" %(Identity)</Command>
+      <Outputs>../../prelude/slang-hlsl-prelude.h.cpp</Outputs>
+      <Message>slang-embed %(Identity)</Message>
+      <AdditionalInputs Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">../../bin/windows-x86/debug/slang-embed.exe</AdditionalInputs>
+      <AdditionalInputs Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">../../bin/windows-x64/debug/slang-embed.exe</AdditionalInputs>
+      <AdditionalInputs Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">../../bin/windows-x86/release/slang-embed.exe</AdditionalInputs>
+      <AdditionalInputs Condition="'$(Configuration)|$(Platform)'=='Release|x64'">../../bin/windows-x64/release/slang-embed.exe</AdditionalInputs>
+    </CustomBuild>
     <CustomBuild Include="core.meta.slang">
       <FileType>Document</FileType>
       <Command Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">"../../bin/windows-x86/debug/slang-generate" %(Identity)</Command>
diff --git a/source/slang/run-generators.vcxproj.filters b/source/slang/run-generators.vcxproj.filters
index 2f0200bd1..6a51d5ba7 100644
--- a/source/slang/run-generators.vcxproj.filters
+++ b/source/slang/run-generators.vcxproj.filters
@@ -26,6 +26,9 @@
     <CustomBuild Include="..\..\prelude\slang-cuda-prelude.h">
       <Filter>Header Files</Filter>
     </CustomBuild>
+    <CustomBuild Include="..\..\prelude\slang-hlsl-prelude.h">
+      <Filter>Header Files</Filter>
+    </CustomBuild>
     <CustomBuild Include="core.meta.slang">
       <Filter>Source Files</Filter>
     </CustomBuild>
diff --git a/source/slang/slang-ast-modifier.h b/source/slang/slang-ast-modifier.h
index 6a80d8722..cbdca6ab7 100644
--- a/source/slang/slang-ast-modifier.h
+++ b/source/slang/slang-ast-modifier.h
@@ -893,4 +893,47 @@ class AnyValueSizeAttribute : public Attribute
     int32_t size;
 };
 
+    /// A `[__requiresNVAPI]` attribute indicates that the declaration being modifed
+    /// requires NVAPI operations for its implementation on D3D.
+class RequiresNVAPIAttribute : public Attribute
+{
+    SLANG_CLASS(RequiresNVAPIAttribute)
+};
+
+    /// Indicates that the modified declaration is one of the "magic" declarations
+    /// that NVAPI uses to communicate extended operations. When NVAPI is being included
+    /// via the prelude for downstream compilation, declarations with this modifier
+    /// will not be emitted, instead allowing the versions from the prelude to be used.
+class NVAPIMagicModifier : public Modifier
+{
+    SLANG_CLASS(NVAPIMagicModifier)
+};
+
+    /// A modifier that attaches to a `ModuleDecl` to indicate the register/space binding
+    /// that NVAPI wants to use, as indicated by, e.g., the `NV_SHADER_EXTN_SLOT` and
+    /// `NV_SHADER_EXTN_REGISTER_SPACE` preprocessor definitions.
+class NVAPISlotModifier : public Modifier
+{
+    SLANG_CLASS(NVAPISlotModifier)
+
+        /// The name of the register that is to be used (e.g., `"u3"`)
+        ///
+        /// This value will come from the `NV_SHADER_EXTN_SLOT` macro, if set.
+        ///
+        /// The `registerName` field must always be filled in when adding
+        /// an `NVAPISlotModifier` to a module; if no register name is defined,
+        /// then the modifier should not be added.
+        ///
+    String registerName;
+
+        /// The name of the register space to be used (e.g., `space1`)
+        ///
+        /// This value will come from the `NV_SHADER_EXTN_REGISTER_SPACE` macro,
+        /// if set.
+        ///
+        /// It is valid for a user to specify a register name but not a space name,
+        /// and in that case `spaceName` will be set to `"space0"`.
+    String spaceName;
+};
+
 } // namespace Slang
diff --git a/source/slang/slang-check-decl.cpp b/source/slang/slang-check-decl.cpp
index 3ccf6fe06..a51dc0313 100644
--- a/source/slang/slang-check-decl.cpp
+++ b/source/slang/slang-check-decl.cpp
@@ -89,6 +89,7 @@ namespace Slang
 
         void visitPropertyDecl(PropertyDecl* decl);
 
+        void visitStructDecl(StructDecl* decl);
 
             /// Get the type of the storage accessed by an accessor.
             ///
@@ -915,6 +916,87 @@ namespace Slang
             //
             validateArraySizeForVariable(varDecl);
         }
+
+        // The NVAPI library allows user code to express extended operations
+        // (not supported natively by D3D HLSL) by communicating with
+        // a specially identified shader parameter called `g_NvidiaExt`.
+        //
+        // By default, that shader parameter would look like an ordinary
+        // global shader parameter to Slang, but we want to be able to
+        // associate special behavior with it to make downstream compilation
+        // work nicely (especially in the case where certain cross-platform
+        // operations in the Slang standard library need to use NVAPI).
+        //
+        // We will detect a global variable declaration that appears to
+        // be declaring `g_NvidiaExt` from NVAPI, and mark it with a special
+        // modifier to allow downstream steps to detect it whether or
+        // not it has an associated name.
+        //
+        if( as<ModuleDecl>(varDecl->parentDecl)
+            && varDecl->getName()
+            && varDecl->getName()->text == "g_NvidiaExt" )
+        {
+            addModifier(varDecl, m_astBuilder->create<NVAPIMagicModifier>());
+        }
+        //
+        // One thing that the `NVAPIMagicModifier` is going to do is ensure
+        // that `g_NvidiaExt` always gets emitted with *exactly* that name,
+        // whether or not obfuscation or other steps are enabled.
+        //
+        // The `g_NvidiaExt` variable is declared as a:
+        //
+        //      RWStructuredBuffer<NvShaderExtnStruct>
+        //
+        // and we also want to make sure that the fields of that struct
+        // retain their original names in output code. We will detect
+        // variable declarations that represent fields of that struct
+        // and flag them as "magic" as well.
+        //
+        // Note: The goal here is to make it so that generated HLSL output
+        // can either use these declarations as they have been preocessed
+        // by the Slang front-end *or* they can use declarations directly
+        // from the NVAPI header during downstream compilation.
+        //
+        // TODO: It would be nice if we had a way to identify *all* of the
+        // declarations that come from the NVAPI header and mark them, so
+        // that the Slang front-end doesn't have to take responsibility
+        // for generating code from them (and can instead rely on the downstream
+        // compiler alone).
+        //
+        // The NVAPI header doesn't put any kind of macro-defined modifier
+        // (defaulting to an empty macro) in front of its declarations,
+        // so the most plausible way to add a modifier to all the declarations
+        // would be to tag the `nvHLSLExtns.h` header in a list of "magic"
+        // headers which should get all their declarations flagged during
+        // front-end processing, and then use the same header again during
+        // downstream compilation.
+        //
+        // For now, the current hackery seems a bit less complicated.
+        //
+        if( auto structDecl = as<StructDecl>(varDecl->parentDecl))
+        {
+            if( structDecl->getName()
+                && structDecl->getName()->text == "NvShaderExtnStruct" )
+            {
+                addModifier(varDecl, m_astBuilder->create<NVAPIMagicModifier>());
+            }
+        }
+    }
+
+    void SemanticsDeclHeaderVisitor::visitStructDecl(StructDecl* structDecl)
+    {
+        // As described above in `SemanticsDeclHeaderVisitor::checkVarDeclCommon`,
+        // we want to identify and tag the "magic" declarations that make NVAPI
+        // work, so that downstream passes can identify them and act accordingly.
+        //
+        // In this case, we are looking for the `NvShaderExtnStruct` type, which
+        // is used by `g_NvidiaExt`.
+        //
+        if( structDecl->getName()
+            && structDecl->getName()->text == "NvShaderExtnStruct" )
+        {
+            addModifier(structDecl, m_astBuilder->create<NVAPIMagicModifier>());
+        }
     }
 
     void SemanticsDeclBodyVisitor::checkVarDeclCommon(VarDeclBase* varDecl)
diff --git a/source/slang/slang-diagnostic-defs.h b/source/slang/slang-diagnostic-defs.h
index 044ce0830..d7f561537 100644
--- a/source/slang/slang-diagnostic-defs.h
+++ b/source/slang/slang-diagnostic-defs.h
@@ -555,6 +555,15 @@ DIAGNOSTIC(52003, Error, cppCompilerNotFound, "Could not find a suitable C/C++ c
 DIAGNOSTIC(52004, Error, unableToWriteFile, "Unable to write file '$0'")
 DIAGNOSTIC(52005, Error, unableToReadFile, "Unable to read file '$0'")
 
+//
+// 8xxxx - Issues specific to a particular library/technology/platform/etc.
+//
+
+// 811xx - NVAPI
+
+DIAGNOSTIC(81110, Error, nvapiMacroMismatch, "conflicting definitions for NVAPI macro '$0': '$1' and '$2'")
+
+
 // 99999 - Internal compiler errors, and not-yet-classified diagnostics.
 
 DIAGNOSTIC(99999, Internal, unimplemented, "unimplemented feature in Slang compiler: $0")
diff --git a/source/slang/slang-emit-c-like.cpp b/source/slang/slang-emit-c-like.cpp
index 9322c30bd..e0499319f 100644
--- a/source/slang/slang-emit-c-like.cpp
+++ b/source/slang/slang-emit-c-like.cpp
@@ -674,6 +674,18 @@ String CLikeSourceEmitter::generateName(IRInst* inst)
         return String(intrinsicDecoration->getDefinition());
     }
 
+    // If the instruction reprsents one of the "magic" declarations
+    // that makes the NVAPI library work, then we want to make sure
+    // it uses the original name it was declared with, so that our
+    // generated code will work correctly with either a Slang-compiled
+    // or directly `#include`d version of those declarations during
+    // downstream compilation.
+    //
+    if(auto nvapiDecor = inst->findDecoration<IRNVAPIMagicDecoration>())
+    {
+        return String(nvapiDecor->getName());
+    }
+
     auto entryPointDecor = inst->findDecoration<IREntryPointDecoration>();
     if (entryPointDecor)
     {
@@ -1997,12 +2009,23 @@ void CLikeSourceEmitter::_emitCallArgList(IRCall* inst)
     m_writer->emit(")");
 }
 
+void CLikeSourceEmitter::handleRequiredCapabilities(IRInst* inst)
+{
+    auto decoratedValue = inst;
+    while (auto specInst = as<IRSpecialize>(decoratedValue))
+    {
+        decoratedValue = getSpecializedValue(specInst);
+    }
+
+    handleRequiredCapabilitiesImpl(decoratedValue);
+}
+
 void CLikeSourceEmitter::emitCallExpr(IRCall* inst, EmitOpInfo outerPrec)
 {
     auto funcValue = inst->getOperand(0);
 
     // Does this function declare any requirements.
-    handleCallExprDecorationsImpl(funcValue);
+    handleRequiredCapabilities(funcValue);
 
     // We want to detect any call to an intrinsic operation,
     // that we can emit it directly without mangling, etc.
@@ -3214,6 +3237,10 @@ void CLikeSourceEmitter::emitSimpleFuncImpl(IRFunc* func)
         emitEntryPointAttributes(func, entryPointDecor);
     }
 
+    // Deal with required features/capabilities of the function
+    //
+    handleRequiredCapabilitiesImpl(func);
+
     emitFunctionPreambleImpl(func);
 
     auto name = getName(func);
@@ -3735,6 +3762,11 @@ void CLikeSourceEmitter::emitGlobalParam(IRGlobalParam* varDecl)
 
 void CLikeSourceEmitter::emitGlobalInst(IRInst* inst)
 {
+    emitGlobalInstImpl(inst);
+}
+
+void CLikeSourceEmitter::emitGlobalInstImpl(IRInst* inst)
+{
     m_writer->advanceToSourceLocation(inst->sourceLoc);
 
     switch(inst->op)
diff --git a/source/slang/slang-emit-c-like.h b/source/slang/slang-emit-c-like.h
index 05cffd053..9c91078ee 100644
--- a/source/slang/slang-emit-c-like.h
+++ b/source/slang/slang-emit-c-like.h
@@ -270,6 +270,7 @@ public:
     void emitGlobalParam(IRGlobalParam* varDecl);
 
     void emitGlobalInst(IRInst* inst);
+    virtual void emitGlobalInstImpl(IRInst* inst);
 
     void ensureInstOperand(ComputeEmitActionsContext* ctx, IRInst* inst, EmitAction::Level requiredLevel = EmitAction::Level::Definition);
 
@@ -282,6 +283,13 @@ public:
     void executeEmitActions(List<EmitAction> const& actions);
     void emitModule(IRModule* module) { m_irModule = module; emitModuleImpl(module); }
 
+        /// Emit any preprocessor directives that should come *before* the prelude code
+        ///
+        /// These are directives that are intended to customize some aspect(s) of the
+        /// prelude's behavior.
+        ///
+    void emitPreludeDirectives() { emitPreludeDirectivesImpl(); }
+
     void emitPreprocessorDirectives() { emitPreprocessorDirectivesImpl(); }
     void emitSimpleType(IRType* type);
 
@@ -308,6 +316,7 @@ public:
 
     virtual void emitImageFormatModifierImpl(IRInst* varDecl, IRType* varType) { SLANG_UNUSED(varDecl); SLANG_UNUSED(varType); }
     virtual void emitLayoutQualifiersImpl(IRVarLayout* layout) { SLANG_UNUSED(layout); }
+    virtual void emitPreludeDirectivesImpl() {}
     virtual void emitPreprocessorDirectivesImpl() {}
     virtual void emitLayoutDirectivesImpl(TargetRequest* targetReq) { SLANG_UNUSED(targetReq); }
     virtual void emitRateQualifiersImpl(IRRate* rate) { SLANG_UNUSED(rate); }
@@ -338,11 +347,15 @@ public:
     virtual void emitInterface(IRInterfaceType* interfaceType);
     virtual void emitRTTIObject(IRRTTIObject* rttiObject);
 
-    virtual void handleCallExprDecorationsImpl(IRInst* funcValue) { SLANG_UNUSED(funcValue); }
-
     virtual bool tryEmitGlobalParamImpl(IRGlobalParam* varDecl, IRType* varType) { SLANG_UNUSED(varDecl); SLANG_UNUSED(varType); return false; }
     virtual bool tryEmitInstExprImpl(IRInst* inst, const EmitOpInfo& inOuterPrec) { SLANG_UNUSED(inst); SLANG_UNUSED(inOuterPrec); return false; }
 
+        /// Inspect the capabilities required by `inst` (according to its decorations),
+        /// and ensure that those capabilities have been detected and stored in the
+        /// target-specific extension tracker.
+    void handleRequiredCapabilities(IRInst* inst);
+    virtual void handleRequiredCapabilitiesImpl(IRInst* inst) { SLANG_UNUSED(inst); }
+
     void _emitArrayType(IRArrayType* arrayType, EDeclarator* declarator);
     void _emitUnsizedArrayType(IRUnsizedArrayType* arrayType, EDeclarator* declarator);
     void _emitType(IRType* type, EDeclarator* declarator);
diff --git a/source/slang/slang-emit-cpp.cpp b/source/slang/slang-emit-cpp.cpp
index 88afead1a..5359740b4 100644
--- a/source/slang/slang-emit-cpp.cpp
+++ b/source/slang/slang-emit-cpp.cpp
@@ -2184,7 +2184,7 @@ bool CPPSourceEmitter::tryEmitInstExprImpl(IRInst* inst, const EmitOpInfo& inOut
             auto funcValue = inst->getOperand(0);
 
             // Does this function declare any requirements.
-            handleCallExprDecorationsImpl(funcValue);
+            handleRequiredCapabilities(funcValue);
 
             // try doing automatically
             return _tryEmitInstExprAsIntrinsic(inst, inOuterPrec);
diff --git a/source/slang/slang-emit-cuda.cpp b/source/slang/slang-emit-cuda.cpp
index b029a8aa6..1e6df82ba 100644
--- a/source/slang/slang-emit-cuda.cpp
+++ b/source/slang/slang-emit-cuda.cpp
@@ -592,18 +592,12 @@ void CUDASourceEmitter::_requireCUDASMVersion(SemanticVersion const& version)
     }
 }
 
-void CUDASourceEmitter::handleCallExprDecorationsImpl(IRInst* funcValue)
+void CUDASourceEmitter::handleRequiredCapabilitiesImpl(IRInst* inst)
 {
-    // Does this function declare any requirements on GLSL version or
-    // extensions, which should affect our output?
+    // Does this function declare any requirements on CUDA capabilities
+    // that should affect output?
 
-    auto decoratedValue = funcValue;
-    while (auto specInst = as<IRSpecialize>(decoratedValue))
-    {
-        decoratedValue = getSpecializedValue(specInst);
-    }
-
-    for (auto decoration : decoratedValue->getDecorations())
+    for (auto decoration : inst->getDecorations())
     {
         if( auto smDecoration = as<IRRequireCUDASMVersionDecoration>(decoration))
         {
diff --git a/source/slang/slang-emit-cuda.h b/source/slang/slang-emit-cuda.h
index 9378453ba..01fc3fb5b 100644
--- a/source/slang/slang-emit-cuda.h
+++ b/source/slang/slang-emit-cuda.h
@@ -64,7 +64,7 @@ protected:
 
     virtual void emitLoopControlDecorationImpl(IRLoopControlDecoration* decl) SLANG_OVERRIDE;
 
-    virtual void handleCallExprDecorationsImpl(IRInst* funcValue) SLANG_OVERRIDE;
+    virtual void handleRequiredCapabilitiesImpl(IRInst* inst) SLANG_OVERRIDE;
 
     virtual bool tryEmitGlobalParamImpl(IRGlobalParam* varDecl, IRType* varType) SLANG_OVERRIDE;
     virtual bool tryEmitInstExprImpl(IRInst* inst, const EmitOpInfo& inOuterPrec) SLANG_OVERRIDE;
diff --git a/source/slang/slang-emit-glsl.cpp b/source/slang/slang-emit-glsl.cpp
index d09c778b9..cc2494455 100644
--- a/source/slang/slang-emit-glsl.cpp
+++ b/source/slang/slang-emit-glsl.cpp
@@ -1452,18 +1452,12 @@ bool GLSLSourceEmitter::tryEmitInstExprImpl(IRInst* inst, const EmitOpInfo& inOu
     return false;
 }
 
-void GLSLSourceEmitter::handleCallExprDecorationsImpl(IRInst* funcValue)
+void GLSLSourceEmitter::handleRequiredCapabilitiesImpl(IRInst* inst)
 {
     // Does this function declare any requirements on GLSL version or
     // extensions, which should affect our output?
 
-    auto decoratedValue = funcValue;
-    while (auto specInst = as<IRSpecialize>(decoratedValue))
-    {
-        decoratedValue = getSpecializedValue(specInst);
-    }
-
-    for (auto decoration : decoratedValue->getDecorations())
+    for (auto decoration : inst->getDecorations())
     {
         switch (decoration->op)
         {
diff --git a/source/slang/slang-emit-glsl.h b/source/slang/slang-emit-glsl.h
index 5c46d5471..56da1b064 100644
--- a/source/slang/slang-emit-glsl.h
+++ b/source/slang/slang-emit-glsl.h
@@ -44,7 +44,7 @@ protected:
     virtual void emitVarDecorationsImpl(IRInst* varDecl) SLANG_OVERRIDE;
     virtual void emitMatrixLayoutModifiersImpl(IRVarLayout* layout) SLANG_OVERRIDE;
 
-    virtual void handleCallExprDecorationsImpl(IRInst* funcValue) SLANG_OVERRIDE;
+    virtual void handleRequiredCapabilitiesImpl(IRInst* inst) SLANG_OVERRIDE;
 
     virtual bool tryEmitGlobalParamImpl(IRGlobalParam* varDecl, IRType* varType) SLANG_OVERRIDE;
     virtual bool tryEmitInstExprImpl(IRInst* inst, const EmitOpInfo& inOuterPrec) SLANG_OVERRIDE;
diff --git a/source/slang/slang-emit-hlsl.cpp b/source/slang/slang-emit-hlsl.cpp
index 036c4701a..57d603940 100644
--- a/source/slang/slang-emit-hlsl.cpp
+++ b/source/slang/slang-emit-hlsl.cpp
@@ -1001,5 +1001,98 @@ void HLSLSourceEmitter::emitMatrixLayoutModifiersImpl(IRVarLayout* layout)
     }
 }
 
+void HLSLSourceEmitter::handleRequiredCapabilitiesImpl(IRInst* inst)
+{
+    if(inst->findDecoration<IRRequiresNVAPIDecoration>())
+    {
+        m_extensionTracker->m_requiresNVAPI = true;
+    }
+}
+
+void HLSLSourceEmitter::emitPreludeDirectivesImpl()
+{
+    if( m_extensionTracker->m_requiresNVAPI )
+    {
+        // If the generated code includes implicit NVAPI use,
+        // then we need to ensure that NVAPI support is included
+        // via the prelude.
+        //
+        m_writer->emit("#define SLANG_HLSL_ENABLE_NVAPI 1\n");
+
+        // In addition, if the user has informed the Slang compiler of
+        // the register/space that it wants to use for NVAPI, then we
+        // need to pass along that information to prelude in the
+        // generated code, so that it can be picked up by the NVAPI
+        // header at the point where it gets included.
+        //
+        // Note: If the user doesn't inform the Slang compiler where
+        // it wants the NVAPI parameter to be bound, then a downstream
+        // compiler error is going to occur. We could try to produce
+        // our own error message here, but our error is unlikely to
+        // be significantly better, and also it is *technically*
+        // possible for the user to use Slang to generate HLSL,
+        // and then go on to compile it manually via fxc/dxc, where
+        // they could pass in these `#define`s using command-line
+        // or API options.
+        //
+        if( auto decor = m_irModule->getModuleInst()->findDecoration<IRNVAPISlotDecoration>() )
+        {
+            m_writer->emit("#define NV_SHADER_EXTN_SLOT ");
+            m_writer->emit(decor->getRegisterName());
+            m_writer->emit("\n");
+
+            // Note: We only emit a preprocessor directive if the space
+            // is not `space0`, because we want to ensure that the output
+            // code can compile with fxc when possible (and fxc has no
+            // understanding of `space`s).
+            //
+            auto spaceName = decor->getSpaceName();
+            if( spaceName != "space0" )
+            {
+                m_writer->emit("#define NV_SHADER_EXTN_REGISTER_SPACE ");
+                m_writer->emit(spaceName);
+                m_writer->emit("\n");
+            }
+        }
+    }
+}
+
+void HLSLSourceEmitter::emitGlobalInstImpl(IRInst* inst)
+{
+    if( auto nvapiDecor = inst->findDecoration<IRNVAPIMagicDecoration>() )
+    {
+        // When emitting one of the "magic" NVAPI declarations,
+        // we will wrap it in a preprocessor conditional that
+        // skips it if the NVAPI header is already being included
+        // via the prelude. In that case, the definitions from
+        // the prelude-included NVAPI will be used instead of
+        // those that were processed by the Slang front-end.
+        //
+        // TODO: In theory we could drop the downstream preprocessor
+        // conditional here, and either emit or not emit the
+        // instruction based on whether the code needs NVAPI (which
+        // is when `SLANG_HLSL_ENABLE_NVAPI` would be set).
+        // Such a change would require that we replace the current
+        // approach of tracking extension use during emit with an
+        // approach that detects requirements as a pure pre-pass.
+        //
+        // Note: We skip `IRStructKey` instructions here because
+        // the fields of the `NvShaderExtnStruct` are also decorated,
+        // but field keys don't produce anything in the output, so
+        // we'd have conditionals that are wrapping empty lines.
+        //
+        if( !as<IRStructKey>(inst) )
+        {
+            m_writer->emit("#ifndef SLANG_HLSL_ENABLE_NVAPI\n");
+            Super::emitGlobalInstImpl(inst);
+            m_writer->emit("#endif\n");
+            return;
+        }
+    }
+
+    Super::emitGlobalInstImpl(inst);
+}
+
+
 
 } // namespace Slang
diff --git a/source/slang/slang-emit-hlsl.h b/source/slang/slang-emit-hlsl.h
index f9b0ad0fb..b93cc694b 100644
--- a/source/slang/slang-emit-hlsl.h
+++ b/source/slang/slang-emit-hlsl.h
@@ -7,16 +7,27 @@
 namespace Slang
 {
 
+class HLSLExtensionTracker : public RefObject
+{
+public:
+        /// Has any operation been used that requires NVAPI to be included via prelude?
+    bool m_requiresNVAPI = false;
+};
+
 class HLSLSourceEmitter : public CLikeSourceEmitter
 {
 public:
     typedef CLikeSourceEmitter Super;
 
-    HLSLSourceEmitter(const Desc& desc) :
-        Super(desc)
+    HLSLSourceEmitter(const Desc& desc)
+        : Super(desc)
+        , m_extensionTracker(new HLSLExtensionTracker)
     {}
 
+    virtual RefObject* getExtensionTracker() SLANG_OVERRIDE { return m_extensionTracker; }
+
 protected:
+    RefPtr<HLSLExtensionTracker> m_extensionTracker;
 
     virtual void emitLayoutSemanticsImpl(IRInst* inst, char const* uniformSemanticSpelling) SLANG_OVERRIDE;
     virtual void emitParameterGroupImpl(IRGlobalParam* varDecl, IRUniformParameterGroupType* type) SLANG_OVERRIDE;
@@ -35,6 +46,11 @@ protected:
     virtual void emitSimpleValueImpl(IRInst* inst) SLANG_OVERRIDE;
     virtual void emitLoopControlDecorationImpl(IRLoopControlDecoration* decl) SLANG_OVERRIDE;
 
+    virtual void handleRequiredCapabilitiesImpl(IRInst* inst) SLANG_OVERRIDE;
+    virtual void emitPreludeDirectivesImpl() SLANG_OVERRIDE;
+
+    virtual void emitGlobalInstImpl(IRInst* inst) SLANG_OVERRIDE;
+
         // Emit a single `register` semantic, as appropriate for a given resource-type-specific layout info
         // Keyword to use in the uniform case (`register` for globals, `packoffset` inside a `cbuffer`)
     void _emitHLSLRegisterSemantic(LayoutResourceKind kind, EmitVarChain* chain, char const* uniformSemanticSpelling = "register");
diff --git a/source/slang/slang-emit.cpp b/source/slang/slang-emit.cpp
index 4b7b13e4f..fbddd2910 100644
--- a/source/slang/slang-emit.cpp
+++ b/source/slang/slang-emit.cpp
@@ -825,6 +825,8 @@ SlangResult emitEntryPointsSourceFromIR(
     // Now that we've emitted the code for all the declarations in the file,
     // it is time to stitch together the final output.
 
+    sourceEmitter->emitPreludeDirectives();
+
     {
         // If there is a prelude emit it
         const auto& prelude = compileRequest->getSession()->getPreludeForLanguage(sourceLanguage);
diff --git a/source/slang/slang-ir-inst-defs.h b/source/slang/slang-ir-inst-defs.h
index 7863b9a9c..c7d223ab6 100644
--- a/source/slang/slang-ir-inst-defs.h
+++ b/source/slang/slang-ir-inst-defs.h
@@ -565,6 +565,16 @@ INST(HighLevelDeclDecoration,               highLevelDecl,          1, 0)
 
     INST(BuiltinDecoration, BuiltinDecoration, 0, 0)
 
+        /// The decorated instruction requires NVAPI to be included via prelude when compiling for D3D.
+    INST(RequiresNVAPIDecoration, requiresNVAPI, 0, 0)
+
+        /// The decorated instruction is part of the NVAPI "magic" and should always use its original name
+    INST(NVAPIMagicDecoration, nvapiMagic, 1, 0)
+
+        /// A decoration that applies to an entire IR module, and indicates the register/space binding
+        /// that the NVAPI shader parameter intends to use.
+    INST(NVAPISlotDecoration, nvapiSlot, 2, 0)
+
     INST(SemanticDecoration, semantic, 2, 0)
 
     INST_RANGE(Decoration, HighLevelDeclDecoration, SemanticDecoration)
diff --git a/source/slang/slang-ir-insts.h b/source/slang/slang-ir-insts.h
index 3b390015b..ef382373f 100644
--- a/source/slang/slang-ir-insts.h
+++ b/source/slang/slang-ir-insts.h
@@ -260,7 +260,28 @@ IR_SIMPLE_DECORATION(GloballyCoherentDecoration)
 IR_SIMPLE_DECORATION(PreciseDecoration)
 IR_SIMPLE_DECORATION(PublicDecoration)
 IR_SIMPLE_DECORATION(KeepAliveDecoration)
+IR_SIMPLE_DECORATION(RequiresNVAPIDecoration)
 
+struct IRNVAPIMagicDecoration : IRDecoration
+{
+    enum { kOp = kIROp_NVAPIMagicDecoration };
+    IR_LEAF_ISA(NVAPIMagicDecoration)
+
+    IRStringLit* getNameOperand() { return cast<IRStringLit>(getOperand(0)); }
+    UnownedStringSlice getName() { return getNameOperand()->getStringSlice(); }
+};
+
+struct IRNVAPISlotDecoration : IRDecoration
+{
+    enum { kOp = kIROp_NVAPISlotDecoration };
+    IR_LEAF_ISA(NVAPISlotDecoration)
+
+    IRStringLit* getRegisterNameOperand() { return cast<IRStringLit>(getOperand(0)); }
+    UnownedStringSlice getRegisterName() { return getRegisterNameOperand()->getStringSlice(); }
+
+    IRStringLit* getSpaceNameOperand() { return cast<IRStringLit>(getOperand(1)); }
+    UnownedStringSlice getSpaceName() { return getSpaceNameOperand()->getStringSlice(); }
+};
 
 struct IROutputControlPointsDecoration : IRDecoration
 {
@@ -2457,6 +2478,16 @@ struct IRBuilder
         addDecoration(value, kIROp_PublicDecoration);
     }
 
+    void addNVAPIMagicDecoration(IRInst* value, UnownedStringSlice const& name)
+    {
+        addDecoration(value, kIROp_NVAPIMagicDecoration, getStringValue(name));
+    }
+
+    void addNVAPISlotDecoration(IRInst* value, UnownedStringSlice const& registerName, UnownedStringSlice const& spaceName)
+    {
+        addDecoration(value, kIROp_NVAPISlotDecoration, getStringValue(registerName), getStringValue(spaceName));
+    }
+
         /// Add a decoration that indicates that the given `inst` depends on the given `dependency`.
         ///
         /// This decoration can be used to ensure that a value that an instruction
diff --git a/source/slang/slang-ir-link.cpp b/source/slang/slang-ir-link.cpp
index f40f83fb8..802288c0b 100644
--- a/source/slang/slang-ir-link.cpp
+++ b/source/slang/slang-ir-link.cpp
@@ -1499,6 +1499,45 @@ LinkedIR linkIR(
         }
     }
 
+    // It is possible that metadata has been attached to the input modules
+    // themselves, which should be copied over to the output module.
+    //
+    // In cases where multiple input modules specify the same metadata
+    // decoration, we will need rules to merge such decorations according
+    // to opcode-specific policy (e.g., a `[maxStackSizeRequired(...)]`
+    // decoration might use a maximum over all specified values, while a
+    // `[assumedWaveSize(...)]` decoration might require that all specified
+    // values match exactly).
+    //
+    for (IRModule* irModule : irModules)
+    {
+        for( auto decoration : irModule->getModuleInst()->getDecorations() )
+        {
+            switch( decoration->op )
+            {
+            case kIROp_NVAPISlotDecoration:
+                {
+                    // For now we just clone every decoration we see,
+                    // which means that an arbitrary one will end up
+                    // "winning" and being the one found by searches
+                    // in later code.
+                    //
+                    // TODO: need validation to check if decorations are
+                    // consistent with one another, in the case where
+                    // multiple input modules have matching decorations.
+                    //
+                    auto cloned = cloneInst(context, context->builder, decoration);
+                    cloned->insertAtStart(state->irModule->getModuleInst());
+                }
+                break;
+
+            default:
+                break;
+            }
+        }
+    }
+
+
     // TODO: *technically* we should consider the case where
     // we have global variables with initializers, since
     // these should get run whether or not the entry point
diff --git a/source/slang/slang-lower-to-ir.cpp b/source/slang/slang-lower-to-ir.cpp
index 424358d3c..e9381268b 100644
--- a/source/slang/slang-lower-to-ir.cpp
+++ b/source/slang/slang-lower-to-ir.cpp
@@ -5480,6 +5480,8 @@ struct DeclLoweringVisitor : DeclVisitor<DeclLoweringVisitor, LoweredValInfo>
             builder->addHighLevelDeclDecoration(irParam, decl);
         }
 
+        addTargetIntrinsicDecorations(irParam, decl);
+
         // A global variable's SSA value is a *pointer* to
         // the underlying storage.
         setGlobalValue(context, decl, paramVal);
@@ -6533,6 +6535,11 @@ struct DeclLoweringVisitor : DeclVisitor<DeclLoweringVisitor, LoweredValInfo>
 
             builder->addTargetIntrinsicDecoration(irInst, targetName, definition.getUnownedSlice());
         }
+
+        if(auto nvapiMod = decl->findModifier<NVAPIMagicModifier>())
+        {
+            builder->addNVAPIMagicDecoration(irInst, decl->getName()->text.getUnownedSlice());
+        }
     }
 
         /// Is `decl` a member function (or effectively a member function) when considered as a stdlib declaration?
@@ -7000,6 +7007,11 @@ struct DeclLoweringVisitor : DeclVisitor<DeclLoweringVisitor, LoweredValInfo>
             getBuilder()->addRequireCUDASMVersionDecoration(irFunc, versionMod->version);
         }
 
+        if(decl->findModifier<RequiresNVAPIAttribute>())
+        {
+            getBuilder()->addSimpleDecoration<IRRequiresNVAPIDecoration>(irFunc);
+        }
+
         if (decl->findModifier<PublicModifier>()) {
             getBuilder()->addSimpleDecoration<IRPublicDecoration>(irFunc);
         }
@@ -7662,6 +7674,14 @@ IRModule* generateIRForTranslationUnit(
         }
     }
 
+    if(auto nvapiSlotModifier = translationUnit->getModuleDecl()->findModifier<NVAPISlotModifier>())
+    {
+        builder->addNVAPISlotDecoration(
+            module->getModuleInst(),
+            nvapiSlotModifier->registerName.getUnownedSlice(),
+            nvapiSlotModifier->spaceName.getUnownedSlice());
+    }
+
 #if 0
     {
         DiagnosticSinkWriter writer(compileRequest->getSink());
diff --git a/source/slang/slang-parameter-binding.cpp b/source/slang/slang-parameter-binding.cpp
index 90b5d9824..73c94f722 100644
--- a/source/slang/slang-parameter-binding.cpp
+++ b/source/slang/slang-parameter-binding.cpp
@@ -377,6 +377,9 @@ struct SharedParameterBindingContext
     // The space to use for auto-generated bindings.
     UInt defaultSpace = 0;
 
+    // Any NVAPI slot binding information that has been generated
+    List<NVAPISlotModifier*> nvapiSlotModifiers;
+
     TargetRequest* getTargetRequest() { return targetRequest; }
     DiagnosticSink* getSink() { return m_sink; }
     Linkage* getLinkage() { return targetRequest->getLinkage(); }
@@ -483,16 +486,19 @@ LayoutResourceKind findRegisterClassFromName(UnownedStringSlice const& registerC
     return LayoutResourceKind::None;
 }
 
-LayoutSemanticInfo ExtractLayoutSemanticInfo(
-    ParameterBindingContext*    context,
-    HLSLLayoutSemantic*         semantic)
+LayoutSemanticInfo extractHLSLLayoutSemanticInfo(
+    UnownedStringSlice  registerName,
+    SourceLoc           registerLoc,
+    UnownedStringSlice  spaceName,
+    SourceLoc           spaceLoc,
+    DiagnosticSink*     sink
+    )
 {
     LayoutSemanticInfo info;
     info.space = 0;
     info.index = 0;
     info.kind = LayoutResourceKind::None;
 
-    UnownedStringSlice registerName = semantic->registerName.getContent();
     if (registerName.getLength() == 0)
         return info;
 
@@ -513,7 +519,7 @@ LayoutSemanticInfo ExtractLayoutSemanticInfo(
     LayoutResourceKind kind = findRegisterClassFromName(registerClassName);
     if(kind == LayoutResourceKind::None)
     {
-        getSink(context)->diagnose(semantic->registerName, Diagnostics::unknownRegisterClass, registerClassName);
+        sink->diagnose(registerLoc, Diagnostics::unknownRegisterClass, registerClassName);
         return info;
     }
 
@@ -521,7 +527,7 @@ LayoutSemanticInfo ExtractLayoutSemanticInfo(
     // how it works for varying input/output semantics).
     if( registerIndexDigits.getLength() == 0 )
     {
-        getSink(context)->diagnose(semantic->registerName, Diagnostics::expectedARegisterIndex, registerClassName);
+        sink->diagnose(registerLoc, Diagnostics::expectedARegisterIndex, registerClassName);
     }
 
     UInt index = 0;
@@ -531,49 +537,67 @@ LayoutSemanticInfo ExtractLayoutSemanticInfo(
         index = index * 10 + (c - '0');
     }
 
-
     UInt space = 0;
-    if( auto registerSemantic = as<HLSLRegisterSemantic>(semantic) )
+    if(spaceName.getLength() != 0)
     {
-        auto const& spaceName = registerSemantic->spaceName.getContent();
-        if(spaceName.getLength() != 0)
-        {
-            UnownedStringSlice spaceSpelling;
-            UnownedStringSlice spaceDigits;
-            splitNameAndIndex(spaceName, spaceSpelling, spaceDigits);
+        UnownedStringSlice spaceSpelling;
+        UnownedStringSlice spaceDigits;
+        splitNameAndIndex(spaceName, spaceSpelling, spaceDigits);
 
-            if( kind == LayoutResourceKind::RegisterSpace )
-            {
-                getSink(context)->diagnose(registerSemantic->spaceName, Diagnostics::unexpectedSpecifierAfterSpace, spaceName);
-            }
-            else if( spaceSpelling != UnownedTerminatedStringSlice("space") )
-            {
-                getSink(context)->diagnose(registerSemantic->spaceName, Diagnostics::expectedSpace, spaceSpelling);
-            }
-            else if( spaceDigits.getLength() == 0 )
-            {
-                getSink(context)->diagnose(registerSemantic->spaceName, Diagnostics::expectedSpaceIndex);
-            }
-            else
+        if( kind == LayoutResourceKind::RegisterSpace )
+        {
+            sink->diagnose(spaceLoc, Diagnostics::unexpectedSpecifierAfterSpace, spaceName);
+        }
+        else if( spaceSpelling != UnownedTerminatedStringSlice("space") )
+        {
+            sink->diagnose(spaceLoc, Diagnostics::expectedSpace, spaceSpelling);
+        }
+        else if( spaceDigits.getLength() == 0 )
+        {
+            sink->diagnose(spaceLoc, Diagnostics::expectedSpaceIndex);
+        }
+        else
+        {
+            for(auto c : spaceDigits)
             {
-                for(auto c : spaceDigits)
-                {
-                    SLANG_ASSERT(isDigit(c));
-                    space = space * 10 + (c - '0');
-                }
+                SLANG_ASSERT(isDigit(c));
+                space = space * 10 + (c - '0');
             }
         }
     }
 
+    info.kind = kind;
+    info.index = (int) index;
+    info.space = space;
+    return info;
+}
+
+LayoutSemanticInfo ExtractLayoutSemanticInfo(
+    ParameterBindingContext*    context,
+    HLSLLayoutSemantic*         semantic)
+{
+    Token const& registerToken = semantic->registerName;
+
+    Token defaultSpaceToken;
+    Token const* spaceToken = &defaultSpaceToken;
+    if( auto registerSemantic = as<HLSLRegisterSemantic>(semantic) )
+    {
+        spaceToken = &registerSemantic->spaceName;
+    }
+
+    LayoutSemanticInfo info = extractHLSLLayoutSemanticInfo(
+        registerToken.getContent(),
+        registerToken.loc,
+        spaceToken->getContent(),
+        spaceToken->loc,
+        getSink(context));
+
     // TODO: handle component mask part of things...
     if( semantic->componentMask.hasContent())
     {
         getSink(context)->diagnose(semantic->componentMask, Diagnostics::componentMaskNotSupported);
     }
 
-    info.kind = kind;
-    info.index = (int) index;
-    info.space = space;
     return info;
 }
 
@@ -2886,6 +2910,14 @@ struct CollectParametersVisitor : ComponentTypeVisitor
 
             collectGlobalScopeParameter(m_context, shaderParamInfo, SubstitutionSet());
         }
+
+        if( auto moduleDecl = module->getModuleDecl() )
+        {
+            if( auto nvapiSlotModifier = moduleDecl->findModifier<NVAPISlotModifier>() )
+            {
+                m_context->shared->nvapiSlotModifiers.add(nvapiSlotModifier);
+            }
+        }
     }
 
 };
@@ -3361,6 +3393,82 @@ RefPtr<ProgramLayout> generateParameterBindings(
         generateParameterBindings(&context, parameter);
     }
 
+    // It is possible that code has specified an explicit location
+    // for the UAV used to communicate with NVAPI, but the code
+    // is not actually including/using the NVAPI header. We still
+    // need to ensure that user-defined shader parameters do not
+    // conflict with the location that will be used by NVAPI.
+    //
+    if( isD3DTarget(targetReq) )
+    {
+        // Information about the NVAPI parameter was recorded
+        // on the AST `ModuleDecl`s during earlier stages of
+        // compilation, and there might be multiple modules
+        // as input to layout and back-end compilation.
+        //
+        // We do not take responsibility for diagnosing conflicts
+        // at this point, and simply process all of the modifiers
+        // we see.
+        //
+        for( auto nvapiSlotModifier : sharedContext.nvapiSlotModifiers )
+        {
+            // For a given modifier, we start by parsing the semantic
+            // strings that were specified, just as we would if they
+            // had been specified via a `register(...)` semantic.
+            //
+            auto info = extractHLSLLayoutSemanticInfo(
+                nvapiSlotModifier->registerName.getUnownedSlice(),
+                nvapiSlotModifier->loc,
+                nvapiSlotModifier->spaceName.getUnownedSlice(),
+                nvapiSlotModifier->loc,
+                sink);
+            auto kind = info.kind;
+            if (kind == LayoutResourceKind::None)
+                continue;
+
+            // The NVAPI parameter always uses a single register.
+            //
+            LayoutSize count = 1;
+
+            // We are going to mark the register range declared for the
+            // NVAPI parameter as used.
+            //
+            // Note: It is possible that the range has *already* been
+            // marked as used, because the `g_NvidiaExt` parameter has
+            // already been processed by the `generateParameterBindings`
+            // logic above. We don't worry about that case and do not
+            // diagnose an error.
+            //
+            // Note: It is *also* possible that some non-NVAPI parameter
+            // with an explicit binding will collide with the NVAPI
+            // parameter, and we also do not diagnose a problem in
+            // that case.
+            //
+            // TODO: We could probably make the user experience nicer
+            // here in the case of conflicts, but we are already deep
+            // into a do-what-I-mean edge case.
+            //
+            // Note: In the case where the user sets up the NVAPI
+            // register/space via the front-end (e.g., by setting a
+            // `NV_SHADER_EXTN_SLOT` macro), but doesn't actually
+            // `#include` the NVAPI header, we will *still* reserve
+            // the appropriate register so that it won't be used
+            // by a user parameter.
+            //
+            // That policy means that simply defining a particular
+            // macro can alter layout behavior, which is conceptually
+            // kind of a mess, but also seems to be the best possible
+            // answer given the constraints.
+            //
+            auto usedRangeSet = findUsedRangeSetForSpace(&context, info.space);
+            markSpaceUsed(&context, nullptr, info.space);
+            usedRangeSet->usedResourceRanges[(int)kind].Add(
+                nullptr,
+                info.index,
+                info.index + count);
+        }
+    }
+
     // Once we have a canonical list of all the parameters, we can
     // detect if there are any global-scope parameters that make use
     // of `LayoutResourceKind::Uniform`, since such parameters would
diff --git a/source/slang/slang-preprocessor.cpp b/source/slang/slang-preprocessor.cpp
index fceee27f9..5a54bfcb0 100644
--- a/source/slang/slang-preprocessor.cpp
+++ b/source/slang/slang-preprocessor.cpp
@@ -222,6 +222,9 @@ struct Preprocessor
         /// The module, if any, that the preprocessed result will belong to
     Module*                                 parentModule = nullptr;
 
+        /// The AST builder that should be used when creating AST nodes for `parentModule`
+    ASTBuilder*                             astBuilder = nullptr;
+
     // The unique identities of any paths that have issued `#pragma once` directives to
     // stop them from being included again.
     HashSet<String>                         pragmaOnceUniqueIdentities;
@@ -2447,18 +2450,141 @@ static TokenList ReadAllTokens(
     return tokens;
 }
 
+    /// Try to look up a macro with the given `macroName` and produce its value as a string
+static bool _findMacroValue(
+    Preprocessor*   preprocessor,
+    char const*     macroName,
+    String&         outValue,
+    SourceLoc&      outLoc)
+{
+    auto namePool = preprocessor->linkage->getNamePool();
+    auto macro = LookupMacro(preprocessor, namePool->getName(macroName));
+    if(!macro)
+        return false;
+    if(macro->flavor != PreprocessorMacroFlavor::ObjectLike)
+        return false;
+
+    MacroExpansion* expansion = new MacroExpansion();
+    initializeMacroExpansion(preprocessor, expansion, macro);
+    pushMacroExpansion(preprocessor, expansion);
+
+    String value;
+    for(bool first = true;;first = false)
+    {
+        Token token = ReadToken(preprocessor);
+        if(token.type == TokenType::EndOfFile)
+            break;
+
+        if(!first && (token.flags & TokenFlag::AfterWhitespace))
+            value.append(" ");
+        value.append(token.getContent());
+    }
+
+    outValue = value;
+    outLoc = macro->getLoc();
+    return true;
+}
+
+    /// Validate that a redefinition of an NVAPI-related macro matches any previous definition
+static void _validateNVAPIMacroMatch(
+    Preprocessor*   preprocessor,
+    char const*     macroName,
+    String const&   existingValue,
+    String const&   newValue,
+    SourceLoc       loc)
+{
+    if( existingValue != newValue )
+    {
+        preprocessor->sink->diagnose(loc, Diagnostics::nvapiMacroMismatch, macroName, existingValue, newValue);
+    }
+}
+
+    /// Collect macro definitions that are relevant to subsequent compilation steps, and store them
+static void _collectDownstreamRelevantMacros(
+    Preprocessor*   preprocessor)
+{
+    // For now, the only case of semantically-relevant macros we need to worry
+    // about are the NVAPI macros used to establish the register/space to use.
+    //
+    static const char* kNVAPIRegisterMacroName = "NV_SHADER_EXTN_SLOT";
+    static const char* kNVAPISpaceMacroName = "NV_SHADER_EXTN_REGISTER_SPACE";
+
+    // For NVAPI use, the `NV_SHADER_EXTN_SLOT` macro is required to be defined.
+    //
+    String nvapiRegister;
+    SourceLoc nvapiRegisterLoc;
+    if(_findMacroValue(preprocessor, kNVAPIRegisterMacroName, nvapiRegister, nvapiRegisterLoc))
+    {
+        // In contrast, NVAPI can be used without defining `NV_SHADER_EXTN_REGISTER_SPACE`,
+        // which effectively defaults to `space0`.
+        //
+        String nvapiSpace = "space0";
+        SourceLoc nvapiSpaceLoc;
+        _findMacroValue(preprocessor, kNVAPISpaceMacroName, nvapiSpace, nvapiSpaceLoc);
+
+        // We are going to store the values of these macros on the AST-level `ModuleDecl`
+        // so that they will be available to later processing stages.
+        //
+        // Note: An alternative design here would be to either put the data directly
+        // on the `Module`, or to define some kind of side-channel output that the
+        // preprocessor can use to communicate these macro values back (and then
+        // allow another system to create the AST nodes). In practice, all of these
+        // alternatives would actually increase the amount of code/complexity,
+        // so we stick with the simple-but-hacky option of having the
+        // preprocessor create AST nodes directly.
+        //
+        auto module = preprocessor->parentModule;
+        if(!module) return;
+        auto moduleDecl = module->getModuleDecl();
+
+        // We need to make sure that the AST nodes we create will have the right
+        // lifetime (it should match the module we are adding to).
+        //
+        auto astBuilder = preprocessor->astBuilder;
+        if(!astBuilder) return;
+
+        if(auto existingModifier = moduleDecl->findModifier<NVAPISlotModifier>())
+        {
+            // If there is already a modifier attached to the module (perhaps
+            // because of preprocessing a different source file, or because
+            // of settings established via command-line options), then we
+            // need to validate that the values being set in this file
+            // match those already set (or else there is likely to be
+            // some kind of error in the user's code).
+            //
+            _validateNVAPIMacroMatch(preprocessor, kNVAPIRegisterMacroName, existingModifier->registerName, nvapiRegister,  nvapiRegisterLoc);
+            _validateNVAPIMacroMatch(preprocessor, kNVAPISpaceMacroName,    existingModifier->spaceName,    nvapiSpace,     nvapiSpaceLoc);
+        }
+        else
+        {
+            // If there is no existing modifier on the module, then we
+            // take responsibility for adding one, based on the macro
+            // values we saw.
+            //
+            auto modifier = astBuilder->create<NVAPISlotModifier>();
+            modifier->loc = nvapiRegisterLoc;
+            modifier->registerName = nvapiRegister;
+            modifier->spaceName = nvapiSpace;
+
+            addModifier(moduleDecl, modifier);
+        }
+    }
+}
+
 TokenList preprocessSource(
     SourceFile*                 file,
     DiagnosticSink*             sink,
     IncludeSystem*              includeSystem,
     Dictionary<String, String>  defines,
     Linkage*                    linkage,
-    Module*                     parentModule)
+    Module*                     parentModule,
+    ASTBuilder*                 astBuilder)
 {
     Preprocessor preprocessor;
     InitializePreprocessor(&preprocessor, sink);
     preprocessor.linkage = linkage;
     preprocessor.parentModule = parentModule;
+    preprocessor.astBuilder = astBuilder;
 
     preprocessor.includeSystem = includeSystem;
     for (auto p : defines)
@@ -2475,6 +2601,17 @@ TokenList preprocessSource(
 
     TokenList tokens = ReadAllTokens(&preprocessor);
 
+    // We look at the preprocessor state after reading the entire
+    // source file/string, in order to see if any macros have been
+    // set that should be considered semantically relevant for
+    // later stages of compilation.
+    //
+    // Note: Checking the macro environment *after* preprocessing is complete
+    // means that we can treat macros introduced via `-D` options or the API
+    // equivalently to macros introduced via `#define`s in user code.
+    //
+    _collectDownstreamRelevantMacros(&preprocessor);
+
     FinalizePreprocessor(&preprocessor);
 
     // debugging: build the pre-processed source back together
diff --git a/source/slang/slang-preprocessor.h b/source/slang/slang-preprocessor.h
index 472a8675d..7c72a859c 100644
--- a/source/slang/slang-preprocessor.h
+++ b/source/slang/slang-preprocessor.h
@@ -10,6 +10,7 @@
 
 namespace Slang {
 
+class ASTBuilder;
 class DiagnosticSink;
 class Linkage;
 class Module;
@@ -22,7 +23,8 @@ TokenList preprocessSource(
     IncludeSystem*              includeSystem,
     Dictionary<String, String>  defines,
     Linkage*                    linkage,
-    Module*                     parentModule);
+    Module*                     parentModule,
+    ASTBuilder*                 astBuilder = nullptr);
 
 } // namespace Slang
 
diff --git a/source/slang/slang.cpp b/source/slang/slang.cpp
index fbdbe14b6..0add5acea 100644
--- a/source/slang/slang.cpp
+++ b/source/slang/slang.cpp
@@ -42,6 +42,7 @@
 #endif
 
 extern char const* slang_cuda_prelude;
+extern char const* slang_hlsl_prelude;
 
 namespace Slang {
 
@@ -189,6 +190,7 @@ void Session::init()
 
     // Set up default prelude code for target languages that need a prelude
     m_languagePreludes[Index(SourceLanguage::CUDA)] = slang_cuda_prelude;
+    m_languagePreludes[Index(SourceLanguage::HLSL)] = slang_hlsl_prelude;
 }
 
 ISlangUnknown* Session::getInterface(const Guid& guid)
@@ -989,7 +991,8 @@ void FrontEndCompileRequest::parseTranslationUnit(
             &includeSystem,
             combinedPreprocessorDefinitions,
             getLinkage(),
-            module);
+            module,
+            astBuilder);
 
         parseSourceFile(
             astBuilder,
diff --git a/source/slang/slang.vcxproj b/source/slang/slang.vcxproj
index a0865d16c..a5d51e36b 100644
--- a/source/slang/slang.vcxproj
+++ b/source/slang/slang.vcxproj
@@ -298,10 +298,8 @@
     <ClInclude Include="slang-visitor.h" />
   </ItemGroup>
   <ItemGroup>
-    <ClCompile Include="..\..\prelude\slang-cpp-prelude.h.cpp" />
-    <ClCompile Include="..\..\prelude\slang-cpp-scalar-intrinsics.h.cpp" />
-    <ClCompile Include="..\..\prelude\slang-cpp-types.h.cpp" />
     <ClCompile Include="..\..\prelude\slang-cuda-prelude.h.cpp" />
+    <ClCompile Include="..\..\prelude\slang-hlsl-prelude.h.cpp" />
     <ClCompile Include="slang-ast-builder.cpp" />
     <ClCompile Include="slang-ast-decl.cpp" />
     <ClCompile Include="slang-ast-dump.cpp" />
diff --git a/source/slang/slang.vcxproj.filters b/source/slang/slang.vcxproj.filters
index 5edd4d31e..fec82a8f5 100644
--- a/source/slang/slang.vcxproj.filters
+++ b/source/slang/slang.vcxproj.filters
@@ -341,16 +341,10 @@
     </ClInclude>
   </ItemGroup>
   <ItemGroup>
-    <ClCompile Include="..\..\prelude\slang-cpp-prelude.h.cpp">
-      <Filter>Header Files</Filter>
-    </ClCompile>
-    <ClCompile Include="..\..\prelude\slang-cpp-scalar-intrinsics.h.cpp">
-      <Filter>Header Files</Filter>
-    </ClCompile>
-    <ClCompile Include="..\..\prelude\slang-cpp-types.h.cpp">
+    <ClCompile Include="..\..\prelude\slang-cuda-prelude.h.cpp">
       <Filter>Header Files</Filter>
     </ClCompile>
-    <ClCompile Include="..\..\prelude\slang-cuda-prelude.h.cpp">
+    <ClCompile Include="..\..\prelude\slang-hlsl-prelude.h.cpp">
       <Filter>Header Files</Filter>
     </ClCompile>
     <ClCompile Include="slang-ast-builder.cpp">
author	Tim Foley <tfoleyNV@users.noreply.github.com>	2020-09-23 15:47:14 -0700
committer	GitHub <noreply@github.com>	2020-09-23 15:47:14 -0700
commit	895405212aa286701031a4f62b6904938105411c (patch)
tree	81abc616192e51c8500e3d3d119cef653349341a
parent	3d063a7024e54340b6fed2af964ea2790056a3e3 (diff)
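
The directive-emission logic in `slang-emit-hlsl.cpp` above can be illustrated in isolation. The following is a minimal standalone sketch, not the Slang implementation: `NvapiSlot` and `buildPreludeDirectives` are hypothetical names standing in for `IRNVAPISlotDecoration` and the emitter's `m_writer` calls, and the early return when NVAPI is not required is an assumption based on the commit message (the enclosing condition is not visible in this part of the diff).

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the register/space pair carried by
// the module-level NVAPI slot decoration.
struct NvapiSlot
{
    std::string registerName; // e.g. "u0"
    std::string spaceName;    // e.g. "space0"
};

// Build the preprocessor directives that would be written ahead of the
// HLSL prelude. Assumption: nothing is emitted unless NVAPI is required.
std::string buildPreludeDirectives(
    bool requiresNVAPI,
    const std::optional<NvapiSlot>& slot)
{
    std::string out;
    if (!requiresNVAPI)
        return out;

    out += "#define SLANG_HLSL_ENABLE_NVAPI 1\n";

    if (slot)
    {
        assert(!slot->registerName.empty());
        out += "#define NV_SHADER_EXTN_SLOT " + slot->registerName + "\n";

        // Skip the space directive for `space0` so that the output
        // can still compile with fxc, which does not understand spaces.
        if (slot->spaceName != "space0")
            out += "#define NV_SHADER_EXTN_REGISTER_SPACE " + slot->spaceName + "\n";
    }
    return out;
}
```

When no slot is supplied, only the enable macro is produced, which matches the diff's note that a missing `NV_SHADER_EXTN_SLOT` is left for the downstream compiler to report.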