path: root/source/slang/slang-emit-cuda.cpp
Commit message | Author | Age
* Immutable access qualifier for pointers and use `__ldg` on cuda. (#8710) | Yong He | 2025-10-16
  This PR implements `Access.Immutable` to allow pointers to immutable data. The new type `ImmutablePtr<T>` is defined as an alias of `Ptr<T, Address.Immutable>`. By forming an immutable pointer, the programmer tells the compiler that the data at the pointer address will never change during the execution of the current program. Therefore, loads from immutable pointers can be deduplicated by the compiler, and they translate to `__ldg` when generating code for CUDA. The SPIR-V backend is not changed in this PR, since the current SPIR-V spec makes it very difficult to specify loads from immutable addresses without generating tons of wrappers and boilerplate type declarations. We would like to see the spec evolve a bit around its support for `NonWritable` physical storage pointers or immutable loads before we attempt to express such immutability in SPIR-V. For now, we simply emit ordinary pointers and loads when generating SPIR-V. --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
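To make the CUDA side concrete, here is a minimal C++ sketch of how a source emitter might choose `__ldg` for loads through immutable pointers. The `PtrAccess` enum and `emitCudaLoad` function are hypothetical stand-ins, not Slang's emitter API; only `__ldg` itself is a real CUDA intrinsic.

```cpp
#include <string>

// Hypothetical access modes mirroring the idea of Access.Immutable.
enum class PtrAccess { ReadWrite, Immutable };

// Emit CUDA source text for a load through `ptrExpr`.
// Immutable data may be read through the read-only cache via __ldg;
// everything else is an ordinary dereference.
std::string emitCudaLoad(const std::string& ptrExpr, PtrAccess access)
{
    if (access == PtrAccess::Immutable)
        return "__ldg(" + ptrExpr + ")";
    return "(*" + ptrExpr + ")";
}
```

CUDA only provides `__ldg` overloads for built-in scalar and vector types, so a real emitter presumably needs a fallback for aggregate types; this sketch ignores that case.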
* [CUDA] Fix incorrect `kIROp_RaytracingAccelerationStructureType` emitting logic (#8168) | ArielG-NV | 2025-08-15
  Fixes: #8167. The current emitting logic did not work; this has been corrected. The provided test ensures our CUDA code is valid by compiling PTX from it. `m_writer->emit("OptixTraversableHandle");` should be `out <<`, since `out` adds to the type-name cache; otherwise, using a type twice produces bad type names (because the type-name cache was filled with "" instead of "typeName").
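The cache bug described above can be reproduced in a small C++ model. `TypeNameEmitter`, `calcTypeName`, `getTypeName`, and `emitType` are hypothetical simplifications (with `std::ostringstream` standing in for `m_writer`), assuming that computed names are memoized and then written to the output. Writing the name directly to the writer leaves an empty string in the cache, so only the first use of the type prints correctly.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical, simplified model of a memoizing type-name emitter.
struct TypeNameEmitter
{
    std::map<std::string, std::string> cache; // type key -> cached type name
    std::ostringstream writer;                // stands in for m_writer
    bool buggy = false;                       // true: reproduce the old behavior

    // Compute a type's name; whatever is written to `out` gets cached.
    void calcTypeName(const std::string& key, std::ostringstream& out)
    {
        if (key != "RaytracingAccelerationStructure")
            return;
        if (buggy)
            writer << "OptixTraversableHandle"; // bypasses `out`, so "" is cached
        else
            out << "OptixTraversableHandle";    // goes through `out`, so it is cached
    }

    std::string getTypeName(const std::string& key)
    {
        auto it = cache.find(key);
        if (it != cache.end())
            return it->second;
        std::ostringstream out;
        calcTypeName(key, out);
        cache[key] = out.str();
        return cache[key];
    }

    // Emit a type name followed by a space.
    void emitType(const std::string& key)
    {
        std::string name = getTypeName(key);
        writer << name << ' ';
    }
};
```

With `buggy` set, two uses of the type produce `"OptixTraversableHandle  "`: the second use prints the cached empty string.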
* Fix operator precedence in OptiX ray payload pointer casting which broke due to (#6326) (#7194) | Harsh Aggarwal (NVIDIA) | 2025-05-26
  * Fix operator precedence in OptiX ray payload pointer casting. Added extra parentheses around the cast to ensure proper operator precedence when dereferencing the OptiX ray payload pointer. This fixes the issue where the compiler was treating the expression as `(RayPayload_0 *)getOptiXRayPayloadPtr()->color_0` instead of `((RayPayload_0 *)getOptiXRayPayloadPtr())->color_0`. Error: nvrtc 12.9: tests/cuda/optix-cluster.slang(17): error : expression must have pointer-to-class type but it has type "void *" nvrtc 12.9: note : (RayPayload_0 *)getOptiXRayPayloadPtr()->color_0 = color_1; nvrtc 12.9: note : ^ Tested using: `./build/Debug/bin/slangc -target ptx -Xnvrtc -I"/home/haaggarwal/NVIDIA-OptiX-SDK-9.0.0-linux64-x86_64/include" -DSLANG_CUDA_ENABLE_OPTIX -entry closestHitShaderA ./tests/cuda/optix-cluster.slang` * Fix Check
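The precedence problem is that the postfix `->` operator binds more tightly than a C-style cast. The following C++ sketch shows an emitter helper that always parenthesizes the cast, plus the equivalent hand-written access; `emitPayloadFieldAccess`, `RayPayload`, and `getPayloadPtr` are hypothetical names for illustration (the last standing in for `getOptiXRayPayloadPtr`).

```cpp
#include <string>

// Hypothetical emitter helper: access `field` through `ptrExpr` cast to `type *`.
// The cast gets its own parentheses because postfix `->` binds tighter than
// a C-style cast: `(T *)p->f` parses as `(T *)(p->f)`.
std::string emitPayloadFieldAccess(const std::string& type,
                                   const std::string& ptrExpr,
                                   const std::string& field)
{
    return "((" + type + " *)" + ptrExpr + ")->" + field;
}

// Hand-written equivalent, with a hypothetical payload type and accessor.
struct RayPayload { float color; };
static RayPayload g_payload;
void* getPayloadPtr() { return &g_payload; }

void writeColor(float c)
{
    ((RayPayload*)getPayloadPtr())->color = c; // cast first, then dereference
}
```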
* Implement shader execution reordering support for OptiX (#7211) | Harsh Aggarwal (NVIDIA) | 2025-05-26
  * Implement shader execution reordering support for OptiX. Added OptiX backend support for Shader Execution Reordering (SER) features as outlined in issue #6647. This implementation: 1. Added CUDA target support for HitObject API 2. Implemented core SER functionality (TraceRay, MakeHit/Miss, Invoke) 3. Added OptiX-specific hit object handling functions 4. Added test case for OptiX SER functionality * format code --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
* Cleanups related to RIFF support (#7041) | Theresa Foley | 2025-05-12
* Fix various intptr_t issues by defining its width in `getIntTypeInfo` (#6786) | Julius Ikkala | 2025-05-09
  * Define a bit size for the intptr types * Fix intptr_t sign * Extend intptr test to check for previously broken operations * Fix intptr vector test on CUDA * Handle intptr size in getAnyValueSize * Fix formatting * Try with __ARM_ARCH_ISA_64 * On macs, int64_t != intptr_t Yikes * Move define to prelude header * Also check apple in host-prelude * Fix define location
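A minimal sketch of the idea behind giving pointer-sized integers an explicit width: the width of `intptr_t`/`uintptr_t` is derived from the target's pointer size rather than being fixed. The `BaseType` enum and this `getIntTypeInfo` signature are hypothetical, not Slang's actual declarations.

```cpp
#include <cstdint>

// Hypothetical subset of integer base types.
enum class BaseType { Int32, UInt32, Int64, UInt64, IntPtr, UIntPtr };

struct IntInfo
{
    int width;     // width in bits
    bool isSigned; // intptr_t is signed, uintptr_t is not
};

// Report width and signedness; pointer-sized types follow the target's pointer size.
IntInfo getIntTypeInfo(BaseType t, int pointerSizeInBytes)
{
    switch (t)
    {
    case BaseType::Int32:   return {32, true};
    case BaseType::UInt32:  return {32, false};
    case BaseType::Int64:   return {64, true};
    case BaseType::UInt64:  return {64, false};
    case BaseType::IntPtr:  return {pointerSizeInBytes * 8, true};
    case BaseType::UIntPtr: return {pointerSizeInBytes * 8, false};
    }
    return {0, false};
}
```

The macOS note in the log reflects a C/C++ detail: on Apple 64-bit platforms `int64_t` is `long long` while `intptr_t` is `long`, so the two are distinct types for overloading even though both are 64 bits wide.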
* Add Slang-specific intrinsics for integer pack/unpack (#6459) | Darren Wihandi | 2025-02-28
  * update hlsl meta * update test * use slang syntax in meta file * improve meta file * fix pack clamp u8 * remove builtin packed types, use typealias instead * fix wgsl pack clamp * fix formatting --------- Co-authored-by: Yong He <yonghe@outlook.com>
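For reference, the semantics of HLSL-style 8-bit pack/unpack operations can be modeled in plain C++. This is a sketch under the assumption that component `x` occupies the least significant byte; `packClampU8` and `unpackU8U32` are hypothetical names loosely mirroring HLSL's `pack_clamp_u8` and `unpack_u8u32`, not Slang's implementation.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Clamp each signed component to [0, 255] and pack four bytes into 32 bits.
// Assumption: component 0 (x) goes in the least significant byte.
std::uint32_t packClampU8(const std::array<std::int32_t, 4>& v)
{
    std::uint32_t packed = 0;
    for (int i = 0; i < 4; ++i)
    {
        std::int32_t c = std::min<std::int32_t>(std::max<std::int32_t>(v[i], 0), 255);
        packed |= static_cast<std::uint32_t>(c) << (8 * i);
    }
    return packed;
}

// Zero-extend each byte of `packed` back into a 32-bit component.
std::array<std::uint32_t, 4> unpackU8U32(std::uint32_t packed)
{
    std::array<std::uint32_t, 4> out{};
    for (int i = 0; i < 4; ++i)
        out[i] = (packed >> (8 * i)) & 0xFFu;
    return out;
}
```

For example, packing `{-5, 300, 16, 255}` saturates to the bytes `0, 255, 16, 255`, giving `0xFF10FF00`.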
* Add packed 8bit builtin types (#5939) | Darren Wihandi | 2024-12-26
  * Add packed bytes builtin type * fix test
* Move switch statement bodies to their own lines (#5493) | Ellie Hermaszewska | 2024-11-05
  * Move switch statement bodies to their own lines * format --------- Co-authored-by: Yong He <yonghe@outlook.com>
* Write only texture types. (#5454) | Yong He | 2024-10-30
  * Add support for write-only textures. * Fix capabilities. * Fix implementation. * Fix. * format code --------- Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com> Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
* format | Ellie Hermaszewska | 2024-10-29
  * format * Minor test fixes * enable checking cpp format in ci
* Cleanup atomic intrinsics. (#5324) | Yong He | 2024-10-17
  * Cleanup atomic intrinsics. * Fix. * Fix glsl. * Remove hacky intrinsic expansion logic for glsl image atomics. * Fix all tests. * Fix. * Add `InterlockedAddF16Emulated`. * Fix glsl intrinsic. * Fix.
* Initial `Atomic<T>` type implementation. (#5125) | Yong He | 2024-09-20
  * Initial Atomic<T> type implementation. * Update design doc. * Fix. * Add test. * Fixes and add tests. * Fix WGSL. * Fix glsl. * Fix metal. * experiment with github metal. * experiment github metal 2 * github metal experiment 3 * experiment with github metal 4. * experiment with metal 5. * experiment 7. * metal experiment 8. * Fix metal tests. --------- Co-authored-by: Yong He <yhe@nvidia.com>
* Respect matrix layout in uniform and in/out parameters for HLSL target. (#5013) | Yong He | 2024-09-05
  * Respect matrix layout in uniform and in/out parameters for HLSL target. * Update test. * Fix test. * fix test. * Fix metal layout calculation. * Fix compile error. * Fix compiler error. --------- Co-authored-by: Yong He <yhe@nvidia.com>
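Respecting matrix layout matters because the same logical element lands at a different memory offset under each layout. A minimal C++ sketch of the two offset rules, in elements and without padding (the function names are hypothetical; HLSL constant-buffer packing additionally starts each row or column on a 16-byte register, which this sketch ignores):

```cpp
#include <cstddef>

// Row-major: consecutive elements of a row are adjacent in memory.
std::size_t rowMajorOffset(std::size_t row, std::size_t col, std::size_t cols)
{
    return row * cols + col;
}

// Column-major: consecutive elements of a column are adjacent in memory.
std::size_t columnMajorOffset(std::size_t row, std::size_t col, std::size_t rows)
{
    return col * rows + row;
}
```

For a 3x4 matrix, element (1, 2) sits at offset 6 in row-major order but offset 7 in column-major order.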
* Overhaul IR lowering of pointer types. (#4710) | Yong He | 2024-07-25
  * Overhaul IR lowering of pointer types. * Propagate address space in IRBuilder. * Fixup. * Fix. * Fix. * Change how Ptr type is printed to text. * Fix.
* Implement HLSL resource bindings and default type `float4` to `SubpassInput<T>` (#4462) | ArielG-NV | 2024-06-28
  * Add case to `emitVectorReshape` for `vector<>` type, `scalar` value: 1. Add new case 2. Add test * fix warning * fix warning * Implement HLSL resource bindings and default type `float4` to `SubpassInput<T>`. Fixes: #4440 1. Removed the GLSLInputAttachmentIndexLayout modifier and the somewhat 'hacky' 'Input Attachment' binding model previously relied upon. This was changed to work with the slang-type-layout rules system. This change allows Slang automatic bindings, HLSL bindings, GLSL bindings, and translation of GLSL to and from HLSL bindings to work. 2. Added default argument `float4` to SubpassInput<T>. 3. Merged glsl.meta and hlsl.meta SubpassInput logic. * fix InputAttachment attribute checks: fix InputAttachment attribute checks for HLSL and GLSL syntax * remove unused var * validate attribute correctly: Attributes do not have type information. We must check the type expression to validate attribute usage. * remove hacky validation: type-based validation before types are fully resolved is quite hacky and fragile with respect to changes and wrapped types * fix warning * remove redundant `!= nullptr` * remove extra `!= nullptr` * fix some warnings/errors * subpass capability to limit to dxc & remove default values in some functions * revert logic to previous logic: return if we have a binding regardless of whether a VarDecl is given the binding
* Remove use of `G0` and `__target_intrinsic` in stdlib. (#4170) | Yong He | 2024-05-14
  * Remove use of `G0` and `__target_intrinsic` in stdlib. * Fix. * Fix calling intrinsic in global scope.
* Unify stdlib `Texture` types into one generic type. (#3327) | Yong He | 2023-11-16
  * Unify Texture types in stdlib into 1 generic type. * Fixes. * Fix. * Fixes. * Fix reflection. * Fix binding reflection. * Add gather intrinsics. * Fix gather intrinsics. * Fix texture type toText. * Fix intrinsic. * fix cuda intrinsic. * Fix project files. * cleanup. * Fix. * Fix. * Fix sampler feedback test. * Fix getDimension intrinsics. * Fix spirv sample image intrinsics. * Fix test. * Fix GLSL intrinsic. * Cleanup. --------- Co-authored-by: Yong He <yhe@nvidia.com>
* Add `requirePrelude()` intrinsic function. (#3250) | Yong He | 2023-09-29
  * Add `requirePrelude()` intrinsic function. * Fix. --------- Co-authored-by: Yong He <yhe@nvidia.com>
* Add Mesh and Task shader support to GFX (#3190) | Ellie Hermaszewska | 2023-09-11
  * Bump vulkan headers. Also just use vulkan-headers as a submodule * Add drawMeshTasks to gfx graphics pipelines * Add DispatchMesh overload with no payload, with GLSL intrinsic * Require spirv 1.4 for mesh shaders * Add vulkan mesh shader feature discovery * Add mesh shader stage bits to vk-util * Add mesh and task shader support to render-test * Add mesh and task tests * Preserve "payload" specifier in task shaders * Add mesh shader pipeline support to gfx * Add TODO * Add numThreads attribute for amplification stage * Add payload to task shader test * Drop dependency on d3dx12 * Allow passing payloads from task to mesh shaders * regenerate vs projects * check DispatchMesh name correctly * Add mesh shader tests to failing tests * Detect wave-ops feature on vulkan * Add fuse-product to expected failures. This fails because the global variable `count` is not initialized * Add required extension to WaveMaskMatch SPIR-V impl * Remove meshShader member from pipeline desc * Identify mesh shader support on d3d12
* Add `target_switch` and `intrinsic_asm` statement. (#3154) | Yong He | 2023-08-28
  * Add `target_switch` and `__intrinsic_asm` statement. * Cleanup. * WaveGetActiveMask, WaveGetActiveMask, WaveCountBits. * WaveIsFirstLane. * More wave intrinsics. * wave intrinsics. * merge fix. * Fix. * Fix. * Update test. * update test. * Fix. --------- Co-authored-by: Yong He <yhe@nvidia.com>
* Lower all ByteAddressBuffer uses for SPIRV. (#3143) | Yong He | 2023-08-23
  Co-authored-by: Yong He <yhe@nvidia.com>
* Fix literals needing cast (#3039) | jsmall-nvidia | 2023-08-01
  * Cast integer literals. * Fix expected output. * For CUDA, search global instructions to see what types are used. Improve lookup for fp16 header in CUDA. * Fix issue with f16tof32 * Small improvement around finding used base types.
* Various dxc/fxc compatibility fixes. (#2863) | Yong He | 2023-05-02
  * Various dxc/fxc compatibility fixes. * Cleanup. * Fix test cases. * Fix comments. --------- Co-authored-by: Yong He <yhe@nvidia.com>
* Fix most of the disabled warnings on gcc/clang (#2839) | Ellie Hermaszewska | 2023-04-26
* Fix missing `f` suffix for float lits in CUDA backend. (#2791) | Yong He | 2023-04-11
  Co-authored-by: Yong He <yhe@nvidia.com>
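Why the suffix matters: in C++ and CUDA, an unsuffixed literal such as `0.5` has type `double`, which can silently promote surrounding `float` arithmetic to double precision. Below is a hedged sketch of a literal emitter (the function name is hypothetical) that always produces a valid `float` literal; non-finite values are out of scope here.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: emit a 32-bit float as a C++/CUDA `float` literal.
// Non-finite values (inf/NaN) are not handled in this sketch.
std::string emitFloatLiteral(float value)
{
    char buf[64];
    // 9 significant digits are enough to round-trip any float.
    std::snprintf(buf, sizeof(buf), "%.9g", value);
    std::string text = buf;
    // "%g" may print integral values as "1"; `1f` is not a valid literal,
    // so add ".0" unless a decimal point or exponent is already present.
    if (text.find_first_of(".e") == std::string::npos)
        text += ".0";
    return text + "f"; // without the suffix, the literal would be a double
}
```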
* Add PyTorch C++ binding generation. (#2734) | Yong He | 2023-03-26
  * Add PyTorch C++ binding generation. * fix --------- Co-authored-by: Yong He <yhe@nvidia.com>
* Add support for emitting cuda kernel and host functions. (#2712) | Yong He | 2023-03-17
  * Add support for emitting cuda kernel and host functions. * Update test. * Fix cuda preamble emit. --------- Co-authored-by: Yong He <yhe@nvidia.com>
* Overhaul global inst deduplication and cpp/cuda backend. (#2654) | Yong He | 2023-02-16
  * Overhaul global inst deduplication and cpp/cuda backend. * Update IR documentation. --------- Co-authored-by: Yong He <yhe@nvidia.com>
* Rename IR opcodes to unify style. (#2556) | Yong He | 2022-12-07
  Co-authored-by: Yong He <yhe@nvidia.com>
* Remove `construct` IR op. (#2555) | Yong He | 2022-12-07
  Co-authored-by: Yong He <yhe@nvidia.com>
* Run simple compute kernel in gfx-smoke test. (#2400) | Yong He | 2022-09-15
* Language feature: pointer sized int types. (#2401) | Yong He | 2022-09-15
  * Language feature: pointer sized int types. * Fix. * small change to test. * Fix stdlib. * Fix. * Fix. * Add typedef for `size_t` in stdlib. * Fix test. * Add `intptr_t::size` constant. Co-authored-by: Yong He <yhe@nvidia.com>
* Refactor prelude emit (#2236) | jsmall-nvidia | 2022-05-17
  * #include an absolute path didn't work - because paths were taken to always be relative. * Refactor how prelude output works in emit. * Small improvement to emit output. * Move around comment on target specific language directives based on review. Co-authored-by: Theresa Foley <10618364+tangent-vector@users.noreply.github.com>
* Improved SCCP, inlining and resource specialization passes, legalize `ImageSubscript` for GLSL (#2146) | Yong He | 2022-02-25
* Bug fix for optix SBT access (#1922) | Nathan V. Morrical | 2021-08-23
  * optix SBT record data can now be accessed using uniform parameters on ray tracing entry points * Update slang-emit.cpp * fixing a bug where SBT instruction was missing a location at which to insert. Switching back to emitFieldExtract and accounting for changes in instruction emission location
* Enable reading OptiX SBT records via uniform parameters on ray tracing entry points (#1917) | Nathan V. Morrical | 2021-08-10
  * optix SBT record data can now be accessed using uniform parameters on ray tracing entry points * Update slang-emit.cpp
* Enable tracing rays with OptiX backend (#1871) | Nathan V. Morrical | 2021-06-04
  * OptiX ray payload can now be read from and written to using the two payload register pointer method * changing op to more descriptive name * small tweak to allow for dumping out intermediate source for cuda targets * initial trace ray call compiling * hit attributes now work for float and int types, and vectors thereof * Hitgroups using structs and arrays now work with optix Co-authored-by: T. Foley <tfoleyNV@users.noreply.github.com>
* OptiX ray payload read/write support in raytracing pipeline shaders (#1853) | Nathan V. Morrical | 2021-05-25
  * OptiX ray payload can now be read from and written to using the two payload register pointer method * changing op to more descriptive name * fixup: comment change to re-trigger CI Co-authored-by: T. Foley <tfoleyNV@users.noreply.github.com>
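The "two payload register pointer" method relies on splitting a 64-bit pointer across two 32-bit payload registers. In OptiX device code the halves would be read back with `optixGetPayload_0()` and `optixGetPayload_1()`; the plain C++ sketch below shows only the packing arithmetic. `packPointer` and `unpackPointer` are illustrative names, and which register holds the high half is a convention assumed here.

```cpp
#include <cstdint>

// Split a pointer into two 32-bit values, e.g. for two payload registers.
// Convention assumed here: `hi` carries the upper 32 bits.
inline void packPointer(void* ptr, std::uint32_t& hi, std::uint32_t& lo)
{
    const std::uint64_t bits =
        static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(ptr));
    hi = static_cast<std::uint32_t>(bits >> 32);
    lo = static_cast<std::uint32_t>(bits & 0xFFFFFFFFull);
}

// Reassemble the pointer from the two 32-bit halves.
inline void* unpackPointer(std::uint32_t hi, std::uint32_t lo)
{
    const std::uint64_t bits = (static_cast<std::uint64_t>(hi) << 32) | lo;
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(bits));
}
```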
* CUDA half comparison support (#1834) | jsmall-nvidia | 2021-05-04
  * #include an absolute path didn't work - because paths were taken to always be relative. * Split out StringEscapeUtil. * Added StringEscapeUtil. * Fix typo in unix quoting type. * Small comment improvements. * Try to fix linux linking issue. * Fix typo. * Attempt to fix linux link issue. * Update VS proj even though nothing really changed. * Fix another typo issue. * Fix for windows issue. Fixed bug. * Make separate Utils for escaping. * Fix typo. * Split out into StringEscapeHandler. * Windows shell does handle removing quotes (so remove code to remove them). * Handle unescaping if not initiating using the shell. * Slight improvement around shell like decoding. * Simplify command extraction. * Add shared-library category type. * Fix bug in command extraction. * Typo in transcendental category. * Enable unit-test on in smoke test category. * Make parsing failing output as a failing test. * Fixes for transcendental tests. Disable tests that do not work. * Changed category parsing. * Removed the TestResult parameter from _gatherTestsForFile. Made testsList only output. * Remove testing if all tests were disabled. * Make args of CommandLine always unescaped. * Add category. * Don't need escaping on unix/linux. * Remove some no longer used functions. * Add requireSMVersion to CUDAExtensionTracker. * half-calc.slang now works for CUDA. * bit-cast-16-bit works on CUDA. * WIP handling of CUDA vector<half> types. * Half swizzle CUDA. * Half vector test. * Fix swizzle half bug. * Fix compilation issue with narrowing to Index. * Add unary ops. * Add some vector scalar maths ops. * Add half vector conversions for CUDA. * Fix erroneous comment. * Support for half comparisons. * First pass test for half compare. * Fix bug in CUDA specialized emit control. Updated tests to have pre and post inc/dec. * Removed unneeded parts of the cuda prelude. * Half structured buffer works on CUDA. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
* Add support for RaytracingAccelerationStructure type for PTX targets (#1831) | Nathan V. Morrical | 2021-05-04
  * enabling command line compiler to output PTX with multiple entry points. * adding some simple optix intrinsics to slang * Now handling the kIROp_RaytracingAccelerationStructureType for CUDA. Source seems to generate correctly, but accels are always optimized out of resulting PTX due to no trace calls * fixing unnecessary diff with master * allowing unhandled untyped buffers to fall through to super::calcTypeName Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
* More CUDA Half support (#1833) | jsmall-nvidia | 2021-05-04
  * #include an absolute path didn't work - because paths were taken to always be relative. * Split out StringEscapeUtil. * Added StringEscapeUtil. * Fix typo in unix quoting type. * Small comment improvements. * Try to fix linux linking issue. * Fix typo. * Attempt to fix linux link issue. * Update VS proj even though nothing really changed. * Fix another typo issue. * Fix for windows issue. Fixed bug. * Make separate Utils for escaping. * Fix typo. * Split out into StringEscapeHandler. * Windows shell does handle removing quotes (so remove code to remove them). * Handle unescaping if not initiating using the shell. * Slight improvement around shell like decoding. * Simplify command extraction. * Add shared-library category type. * Fix bug in command extraction. * Typo in transcendental category. * Enable unit-test on in smoke test category. * Make parsing failing output as a failing test. * Fixes for transcendental tests. Disable tests that do not work. * Changed category parsing. * Removed the TestResult parameter from _gatherTestsForFile. Made testsList only output. * Remove testing if all tests were disabled. * Make args of CommandLine always unescaped. * Add category. * Don't need escaping on unix/linux. * Remove some no longer used functions. * Add requireSMVersion to CUDAExtensionTracker. * half-calc.slang now works for CUDA. * bit-cast-16-bit works on CUDA. * WIP handling of CUDA vector<half> types. * Half swizzle CUDA. * Half vector test. * Fix swizzle half bug. * Fix compilation issue with narrowing to Index. * Add unary ops. * Add some vector scalar maths ops. * Add half vector conversions for CUDA. * Fix erroneous comment.
* Preliminary CUDA half maths (#1827) | jsmall-nvidia | 2021-04-30
  * #include an absolute path didn't work - because paths were taken to always be relative. * Split out StringEscapeUtil. * Added StringEscapeUtil. * Fix typo in unix quoting type. * Small comment improvements. * Try to fix linux linking issue. * Fix typo. * Attempt to fix linux link issue. * Update VS proj even though nothing really changed. * Fix another typo issue. * Fix for windows issue. Fixed bug. * Make separate Utils for escaping. * Fix typo. * Split out into StringEscapeHandler. * Windows shell does handle removing quotes (so remove code to remove them). * Handle unescaping if not initiating using the shell. * Slight improvement around shell like decoding. * Simplify command extraction. * Add shared-library category type. * Fix bug in command extraction. * Typo in transcendental category. * Enable unit-test on in smoke test category. * Make parsing failing output as a failing test. * Fixes for transcendental tests. Disable tests that do not work. * Changed category parsing. * Removed the TestResult parameter from _gatherTestsForFile. Made testsList only output. * Remove testing if all tests were disabled. * Make args of CommandLine always unescaped. * Add category. * Don't need escaping on unix/linux. * Remove some no longer used functions. * Add requireSMVersion to CUDAExtensionTracker. * half-calc.slang now works for CUDA. * bit-cast-16-bit works on CUDA. * WIP handling of CUDA vector<half> types. * Half swizzle CUDA. * Half vector test. * Fix swizzle half bug. * Fix compilation issue with narrowing to Index. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
* Preliminary CUDA Half support (#1808) | jsmall-nvidia | 2021-04-23
  * #include an absolute path didn't work - because paths were taken to always be relative. * WIP CUDA half support. * Working support for half on CUDA - requires cuda_fp16.h and associated files can be found. * Fix for win32 for unused funcs. * Fix for Clang. * Hack to disable unused local function warning.
* Add an accessor for IRInst opcode (#1707) | Tim Foley | 2021-02-16
  * Add an accessor for IRInst opcode. The main change is renaming `IRInst::op` to `IRInst::m_op` and adding an accessor `IRInst::getOp()` to read it. The rest of the changes just switch use sites to `getOp` (or to `m_op` in the limited cases where we write to it). This work is in anticipation of a future change that might need to store an extra bit in the same field as the opcode. It seemed better to do this massive refactoring as a separate PR. * fixup
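The motivation for routing all reads through an accessor can be sketched in C++: once every reader calls `getOp()`, a spare bit in the same field can later be masked off without touching use sites. The struct below is a simplified, hypothetical stand-in for `IRInst`, and the flag bit is purely illustrative; the commit only anticipates such a bit.

```cpp
#include <cstdint>

// Hypothetical opcode values, for illustration only.
enum IROp : std::uint32_t { kIROp_Nop = 0, kIROp_Add = 1, kIROp_Load = 2 };

// Simplified stand-in for IRInst: the raw field is `m_op`, and reads go
// through getOp(). A hypothetical flag bit shares the same 32-bit field.
struct IRInst
{
    static constexpr std::uint32_t kFlagBit = 1u << 31;

    std::uint32_t m_op = kIROp_Nop;

    // Readers never see the flag bit, so call sites stay unchanged.
    IROp getOp() const { return static_cast<IROp>(m_op & ~kFlagBit); }

    bool hasFlag() const { return (m_op & kFlagBit) != 0; }
    void setFlag(bool on) { m_op = on ? (m_op | kFlagBit) : (m_op & ~kFlagBit); }
};
```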
* Heterogeneous Flag Error Visibility (#1642) | Dietrich Geisler | 2020-12-18
  * PR to fix issue #1638. This change introduces a diagnostic sink to the emitModule function, and updates all associated calls to that function. Additionally, this commit updates the heterogeneous hello world example to not need the entry and stage flags for simplicity. * Updated emit-cpp per suggested changes Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
* Make RTTI objects __constant__ in CUDA (#1573) | Yong He | 2020-10-09
  Co-authored-by: Yong He <yhe@nvidia.com>
* Simplify workflow when using NVAPI (#1556) | Tim Foley | 2020-09-23
  In some cases, functionality is available either as a GLSL extension for Vulkan/SPIR-V or through the NVAPI system for D3D. This situation creates complications: GLSL extensions are generally all supported by the open-source glslang compiler (which we can bundle and ship), but NVAPI operations are exposed through a specific header (`nvHLSLExtns.h`) that ships as part of the NVAPI SDK.
  When a user wants to explicitly use NVAPI-provided operations in their shader code, there are no major complications for Slang; the user sets up their include paths, `#include`s the relevant header, calls functions in it, and lets Slang deal with the details of compilation.
  The challenge for Slang arises when we want to provide a cross-platform interface in our standard library (e.g., the `RWByteAddressBuffer.InterlockedAddF32` method that was recently added) that uses either a GLSL extension (when compiling for Vulkan/SPIR-V) or NVAPI (when compiling to DXBC or DXIL). In that case, the code *generated* by Slang now has a dependency on NVAPI, and we need to somehow emit a `#include` directive that pulls it in when invoking fxc or dxc. Because we do not (and seemingly cannot) bundle the NVAPI header with the compiler, we have to rely on the user to have it available and to somehow tell Slang where it is.
  Exposing portable routines that sometimes use NVAPI currently creates two main challenges:
  1. The user is forced to interact with the "prelude" mechanism in the compiler, which allows the programmer to define code in a given target language that gets prepended to the Slang-generated code. While the prelude mechanism is powerful, it is also hard for users to integrate into their workflow, and our experience so far is that users want something that Just Works.
  2. If the user writes code that uses some of our abstract operations that layer on NVAPI *and* they also want to use NVAPI explicitly, they end up with two copies of the NVAPI header (one included by the Slang front-end, and another included by the downstream fxc/dxc compiler). This puts the user in the position of (a) having to ensure that they set defines like `NV_SHADER_EXTN_SLOT` consistently both when invoking Slang and when adding their prelude, and (b) even if they do make the definitions consistent, running into the problem that fxc/dxc complain about overlapping register bindings on the two copies of the `g_NvidiaExt` global shader parameter that the NVAPI header declares.
  This change attempts to resolve both issues by adding a lot of "do what I mean" logic to the compiler to ease things in the common case. In particular:
  1. The user no longer needs to use the "prelude" mechanism when using NVAPI. The compiler now embeds a default prelude for HLSL output, which will `#include` the NVAPI header if and only if the generated code needs NVAPI access because of portable standard library routines that were used.
  2. The user can mix and match explicit NVAPI use and stdlib functions that compile to use NVAPI. The register/space to be used by NVAPI when included via the prelude is now set based on whatever the user set via the preprocessor, so that it should automatically be consistent between both cases. Furthermore, the code we emit for the declaration of `g_NvidiaExt` when compiling explicit NVAPI use is set up to be conditional, so that it is skipped when the prelude will pull in its own declaration of that parameter.
  The way all this is achieved involves a lot of moving pieces:
  * We now have an HLSL prelude, which mostly just serves to `#include "nvHLSLExtns.h"` in the case where NVAPI support is needed downstream.
  * Standard library operations that require NVAPI for their implementation on HLSL include a new `[__requiresNVAPI]` attribute.
  * The preprocessor has been extended so that, after tokenizing an input file, it looks up the NVAPI-relevant macros in the resulting environment and, if they are set, attaches a modifier (`NVAPISlotModifier`) based on their values to the AST `ModuleDecl`. Logic is added to detect if multiple input files specify values for the macros in ways that conflict.
  * The semantic checking step is extended so that it detects the "magic" NVAPI declarations (the `g_NvidiaExt` parameter and the `NvShaderExtnStruct` type that it uses) and attaches a modifier to them so that they can be identified as such in later steps.
  * Parameter binding is extended to collect a list of the AST modifiers that reflect NVAPI binding, and to reserve the relevant register(s) so that ordinary user-defined parameters cannot conflict with them.
  * IR lowering translates the three new AST modifiers related to NVAPI over to IR equivalents.
  * IR linking is extended to make sure that it clones any `IRNVAPISlotDecoration`s attached to the input modules. The pass intentionally does not care where the modifiers came from; it just collects them all and leaves it to downstream code to sort out what they mean.
  * Emit logic is extended to have a notion of "prelude directives", which are preprocessor directives that should come *before* the prelude in the generated code, because they can affect the way the prelude compiles. This is done so that we don't have to introduce ad hoc logic for each downstream compiler to set any relevant `-D` flags (e.g., both fxc and dxc would need to duplicate such logic for NVAPI support).
  * The HLSL source emitter is extended to track whether it emits any operations that require NVAPI support.
  * The HLSL source emitter is extended to emit prelude directives based on whether NVAPI is needed and, if it is, to also set the register and space that NVAPI should use based on what was stored in the decoration(s) on the IR module.
  * The HLSL source emitter is extended so that it detects global instructions that represent "magic" NVAPI constructs, and emits them as conditional definitions so that they are skipped when NVAPI is included via the prelude.
  * The handling of required capabilities during emit was cleaned up a bit so that more logic is shared across targets, and so that the same logic is used both when emitting a function declaration/definition and when emitting a call to an intrinsic function (which won't get declared/defined).
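The ordering constraint (prelude directives before the prelude, and conditional "magic" declarations) can be sketched as plain string assembly in C++. This is a hypothetical model, not Slang's emitter: `assembleHLSL`, `emitMagicNVAPIDecl`, `NVAPIBinding`, and the `SLANG_HLSL_ENABLE_NVAPI` guard macro are assumptions for illustration, while `NV_SHADER_EXTN_SLOT`, `NV_SHADER_EXTN_REGISTER_SPACE`, and `nvHLSLExtns.h` come from NVAPI.

```cpp
#include <string>

// Hypothetical register/space chosen for NVAPI, e.g. from user macros.
struct NVAPIBinding
{
    std::string reg;   // e.g. "u0"
    std::string space; // e.g. "space0"
};

// Prelude directives come first because they change how the prelude compiles.
std::string assembleHLSL(bool needsNVAPI, const NVAPIBinding& binding, const std::string& body)
{
    std::string out;
    if (needsNVAPI)
    {
        out += "#define SLANG_HLSL_ENABLE_NVAPI 1\n"; // hypothetical guard macro
        out += "#define NV_SHADER_EXTN_SLOT " + binding.reg + "\n";
        out += "#define NV_SHADER_EXTN_REGISTER_SPACE " + binding.space + "\n";
    }
    // Prelude: pull in the NVAPI header only when the directives asked for it.
    out += "#ifdef SLANG_HLSL_ENABLE_NVAPI\n#include \"nvHLSLExtns.h\"\n#endif\n";
    out += body;
    return out;
}

// "Magic" NVAPI declarations from user code are skipped when the prelude
// already provides them.
std::string emitMagicNVAPIDecl(const std::string& decl)
{
    return "#ifndef SLANG_HLSL_ENABLE_NVAPI\n" + decl + "\n#endif\n";
}
```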
* Enable all dynamic dispatch tests on CUDA. (#1552) | Yong He | 2020-09-21
  * Enable all dynamic dispatch tests on CUDA. * Fix expected cross-compile test results.
* Initial attempt to enable CUDA dynamic dispatch codegen (#1549) | Yong He | 2020-09-17
  * Front-load cuda module loading to fill in RTTI pointers. * Enable dynamic dispatch codegen for CUDA.