diff options
| author | Harsh Aggarwal (NVIDIA) <haaggarwal@nvidia.com> | 2025-09-18 21:16:44 +0530 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2025-09-18 15:46:44 +0000 |
| commit | 54faa55c0bd4c4beede7337a76ed3a56d1eb4f15 (patch) | |
| tree | 352d7e67fbc63801c7aa3e7942f875850a34eac4 /source/slang/slang-emit.cpp | |
| parent | 206f6cf5657177dfec9113d1f1b88b685eb2b75a (diff) | |
Fix CUDA global variable initialization with constructor calls (#8340)
Fix CUDA global variable initialization with constructor calls
Resolves CUDA compilation failure where global variables with struct
constructor
initialization generated illegal `__device__` variable runtime
initialization.
**Problem:**
```cuda
// Generated invalid CUDA code:
__device__ static const Stuff_0 gStuff_0 = Stuff_x24init_0(args...);
// Error: "dynamic initialization is not supported for a __device__
variable"
Root Cause Discovered:
Through extensive debugging, found that
moveGlobalVarInitializationToEntryPoints
pass only handled kIROp_GlobalVar instructions, but global constants
with
constructor calls appeared as kIROp_Call instructions at module scope.
Solution:
1. IR Pipeline Fix: Extended moveGlobalVarInitializationToEntryPoints to
detect
and transform module-level constructor calls into proper global
variables with
entry-point initialization
2. Field Access Fix: Enhanced kIROp_FieldExtract logic to emit correct
->
syntax for pointer types and address-of operations
3. Constructor Emission: Added CUDA-specific handling for constructor
calls
Architecture:
- Transforms let %gStuff = call %Constructor(...) into kernel context
initialization
- Moves runtime initialization from global scope to entry-point
execution
- Follows CUDA best practices for global state management
Files:
- source/slang/slang-ir-explicit-global-init.cpp: Extended IR
transformation pass
- source/slang/slang-emit-c-like.cpp: Enhanced field access and foldable
value logic
- source/slang/slang-emit-cuda.cpp: Added CUDA-specific field extraction
handling
Result:
// Now generates proper CUDA code:
struct KernelContext_0 { Stuff_0 gStuff_1; };
// Runtime initialization in entry point:
kernelContext_1.gStuff_1 = constructor_call();
Fixes: tests/compute/type-legalize-global-with-init.slang
Diffstat (limited to 'source/slang/slang-emit.cpp')
| -rw-r--r-- | source/slang/slang-emit.cpp | 9 |
1 files changed, 8 insertions, 1 deletions
diff --git a/source/slang/slang-emit.cpp b/source/slang/slang-emit.cpp index 6725ac3de..e1689ccfc 100644 --- a/source/slang/slang-emit.cpp +++ b/source/slang/slang-emit.cpp @@ -1264,10 +1264,17 @@ Result linkAndOptimizeIR( legalizeEmptyArray(irModule, sink); + // For CUDA targets, always inline global constants to avoid dynamic initialization + // of __device__ variables rejected by NVRTC. This runs independently of the broader + // resource/existential type legalization, which remains disabled for CUDA. + if (target == CodeGenTarget::CUDASource || options.shouldLegalizeExistentialAndResourceTypes) + { + inlineGlobalConstantsForLegalization(irModule); + } + // We don't need the legalize pass for C/C++ based types if (options.shouldLegalizeExistentialAndResourceTypes) { - inlineGlobalConstantsForLegalization(irModule); // The Slang language allows interfaces to be used like // ordinary types (including placing them in constant |
