| author | Theresa Foley <10618364+tangent-vector@users.noreply.github.com> | 2025-05-12 15:50:32 -0700 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2025-05-12 15:50:32 -0700 |
| commit | b423ea55b4b00004bde1f91d95d9e5161d0ae629 (patch) | |
| tree | 848bb548ae3813736f9a26a06352060d0813bc67 /docs/user-guide | |
| parent | 54ff7fd879e71f51ed3c0fda16224cbbcf0831eb (diff) | |
Make CUDA version capabilities reach NVRTC (#7074)
Fixes #7049
The root cause of #7049 is that newer NVRTC versions emit a warning when asked to generate code for older CUDA SM versions. The default SM version that Slang requested was old enough to trigger that warning, which tripped up the test case (it only looks at the first diagnostic produced by the downstream compiler).
Superficially, the fix was easy: change the test case in question (`tests/diagnostics/local-line.slang`) to request `-capability cuda_sm_8_0`, the minimum version supported by current NVRTC.
Unfortunately, the simple fix required some other fixes in order to actually work.
The capability system includes capability names of the form `cuda_sm_*_*`, but specifying such a capability had *no* impact on the CUDA SM version passed in when invoking NVRTC.
Instead, only the CUDA SM versions requested in the implementation of intrinsics in the core module were affecting the version number passed down.
This change adds logic to `slang-compiler.cpp` to take explicitly requested capabilities into account when inferring the CUDA SM version to be passed downstream.
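The inference described above can be sketched roughly as follows. This is a minimal illustration, not the code added to `slang-compiler.cpp`: `SmVersion`, `parseCudaSmCapability`, and `inferCudaSmVersion` are hypothetical names, and the sketch assumes the downstream version is simply the highest of the version implied by intrinsics and any explicitly requested `cuda_sm_*_*` capability.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a CUDA SM version; not Slang's actual type.
struct SmVersion {
    int majorVersion;
    int minorVersion;
};

// True when `a` is a strictly lower SM version than `b`.
static bool isOlder(const SmVersion& a, const SmVersion& b) {
    if (a.majorVersion != b.majorVersion) return a.majorVersion < b.majorVersion;
    return a.minorVersion < b.minorVersion;
}

// Parse a short, non-empty run of decimal digits; reject anything else.
static bool parseDigits(const std::string& text, int& out) {
    if (text.empty() || text.size() > 4) return false;
    int value = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] < '0' || text[i] > '9') return false;
        value = value * 10 + (text[i] - '0');
    }
    out = value;
    return true;
}

// Recognize names of the form "cuda_sm_<major>_<minor>"; ignore all others.
bool parseCudaSmCapability(const std::string& name, SmVersion& out) {
    const std::string prefix = "cuda_sm_";
    if (name.compare(0, prefix.size(), prefix) != 0) return false;
    const std::string rest = name.substr(prefix.size());
    const std::size_t sep = rest.find('_');
    if (sep == std::string::npos) return false;
    SmVersion parsed = {0, 0};
    if (!parseDigits(rest.substr(0, sep), parsed.majorVersion)) return false;
    if (!parseDigits(rest.substr(sep + 1), parsed.minorVersion)) return false;
    out = parsed;
    return true;
}

// Start from the version implied by intrinsics, then raise it to cover any
// explicitly requested cuda_sm_*_* capability (assumed policy for this sketch).
SmVersion inferCudaSmVersion(SmVersion fromIntrinsics,
                             const std::vector<std::string>& explicitCapabilities) {
    SmVersion result = fromIntrinsics;
    for (std::size_t i = 0; i < explicitCapabilities.size(); ++i) {
        SmVersion requested = {0, 0};
        if (parseCudaSmCapability(explicitCapabilities[i], requested) &&
            isOlder(result, requested)) {
            result = requested;
        }
    }
    return result;
}
```

Under these assumptions, a request for `cuda_sm_8_0` raises an intrinsic-derived version of 7.0 to 8.0, while capability names for other targets leave the result unchanged.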
A more complete fix would also add similar logic for all the other targets.
Unfortunately... yet again... that fix wasn't enough to make things work as expected.
Now I had the problem that requesting `-capability cuda_sm_8_0` was actually causing the NVRTC invocation to request CUDA SM version **9.0**!
The underlying problem *there* was that the `slang-capabilities.capdef` file has defined certain capability names in a way that implies atomic capabilities much higher than one would expect.
E.g., the `cuda_sm_8_0` alias was including HLSL `sm_5_0`, but then `sm_5_0` in turn included `_cuda_sm_9_0`.
The fix, for now, is to change the definitions in `slang-capabilities.capdef` to not have the counter-intuitive definitions for `cuda_sm_*_*`.
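To illustrate how an alias can transitively pull in a higher atom, here is a small sketch of alias expansion. It does not reproduce the contents or syntax of `slang-capabilities.capdef`: the `CapabilityDefs` map, the helper names, and the atoms `_cuda_sm_8_0` and `_sm_5_0` are assumptions made for the example. Only the chain `cuda_sm_8_0` → `sm_5_0` → `_cuda_sm_9_0` comes from the description above.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: each alias maps to the names it includes.
// A name with no entry in the map is treated as an atom.
typedef std::map<std::string, std::vector<std::string> > CapabilityDefs;

// Depth-first expansion of `name` into the atoms it implies.
static void collectAtoms(const CapabilityDefs& defs, const std::string& name,
                         std::set<std::string>& visited,
                         std::set<std::string>& atoms) {
    if (!visited.insert(name).second) return;  // already expanded
    CapabilityDefs::const_iterator it = defs.find(name);
    if (it == defs.end()) {
        atoms.insert(name);
        return;
    }
    for (std::size_t i = 0; i < it->second.size(); ++i) {
        collectAtoms(defs, it->second[i], visited, atoms);
    }
}

std::set<std::string> expandToAtoms(const CapabilityDefs& defs,
                                    const std::string& name) {
    std::set<std::string> visited;
    std::set<std::string> atoms;
    collectAtoms(defs, name, visited, atoms);
    return atoms;
}

// Simplified "before" definitions: the CUDA alias includes the HLSL alias,
// which in turn includes a CUDA SM 9.0 atom.
CapabilityDefs makeDefsBeforeFix() {
    CapabilityDefs defs;
    defs["cuda_sm_8_0"].push_back("_cuda_sm_8_0");
    defs["cuda_sm_8_0"].push_back("sm_5_0");
    defs["sm_5_0"].push_back("_sm_5_0");
    defs["sm_5_0"].push_back("_cuda_sm_9_0");
    return defs;
}

// Simplified "after" definitions: the CUDA alias no longer includes sm_5_0.
CapabilityDefs makeDefsAfterFix() {
    CapabilityDefs defs = makeDefsBeforeFix();
    defs["cuda_sm_8_0"].clear();
    defs["cuda_sm_8_0"].push_back("_cuda_sm_8_0");
    return defs;
}
```

In this model, expanding `cuda_sm_8_0` against the "before" definitions yields a set containing `_cuda_sm_9_0`, whereas the "after" definitions yield only `_cuda_sm_8_0`.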
With this set of fixes, the test failure in the original bug report no longer occurs.
The work that went into this change suggests several larger-scope fixes that would be good to pursue:
* Ideally the capability definitions would have some sort of validation checking to make sure that counter-intuitive results like `cuda_sm_8_0` requesting CUDA SM 9.0 do not occur.
* The translation of capabilities over to version numbers for a downstream compiler should be expanded to cover other targets, and not just CUDA. It might be better/simpler to just pass the capabilities themselves to the downstream compiler, since it is possible that a downstream compiler could have more fine-grained enable/disable options than a simple version number.
* The entire approach to computing version numbers required for downstream compilation should be cleaned up so that we don't have this duplication between the capabilities that represent those versions and separate syntactic constructs that are used to "request" those versions as part of code generation.
* We are very much at the point where we should consider dropping the current behavior where a profile name or capability like `sm_5_0`, that is specific to a single target or a subset of targets, also implies a set of comparable capabilities for other targets.
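The validation idea in the first bullet could look something like the sketch below. It is an assumption about what such a check might do, not an existing Slang feature: `parseCudaSm` and `findOverreachingAtoms` are hypothetical helpers that take an alias name plus its already-expanded atom set and report any CUDA SM atom higher than the version the alias names.

```cpp
#include <cstdio>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Parse "cuda_sm_<major>_<minor>", allowing one leading underscore as used
// for atom names in this sketch. Returns false for any other name.
bool parseCudaSm(const std::string& name, std::pair<int, int>& out) {
    const char* s = name.c_str();
    if (*s == '_') ++s;
    int majorPart = 0, minorPart = 0, consumed = 0;
    if (std::sscanf(s, "cuda_sm_%d_%d%n", &majorPart, &minorPart, &consumed) != 2)
        return false;
    if (s[consumed] != '\0') return false;  // reject trailing characters
    out = std::make_pair(majorPart, minorPart);
    return true;
}

// Report every CUDA SM atom in `atoms` that exceeds the version named by
// `alias`. An alias that is not of the cuda_sm_*_* form yields no findings.
std::vector<std::string> findOverreachingAtoms(const std::string& alias,
                                               const std::set<std::string>& atoms) {
    std::vector<std::string> offenders;
    std::pair<int, int> limit;
    if (!parseCudaSm(alias, limit)) return offenders;
    for (std::set<std::string>::const_iterator it = atoms.begin();
         it != atoms.end(); ++it) {
        std::pair<int, int> version;
        if (parseCudaSm(*it, version) && limit < version) {
            offenders.push_back(*it);
        }
    }
    return offenders;
}
```

A check like this, run over every `cuda_sm_*_*` alias when the capability definitions are processed, would have flagged `cuda_sm_8_0` implying `_cuda_sm_9_0`.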
Diffstat (limited to 'docs/user-guide')
| -rw-r--r-- | docs/user-guide/a3-02-reference-capability-atoms.md | 12 |
1 files changed, 6 insertions, 6 deletions
diff --git a/docs/user-guide/a3-02-reference-capability-atoms.md b/docs/user-guide/a3-02-reference-capability-atoms.md
index 4f4a1a12b..12cda6bed 100644
--- a/docs/user-guide/a3-02-reference-capability-atoms.md
+++ b/docs/user-guide/a3-02-reference-capability-atoms.md
@@ -7,12 +7,12 @@ Capability Atoms
 ### Sections:
-1. [Targets](#targets)
-2. [Stages](#stages)
-3. [Versions](#versions)
-4. [Extensions](#extensions)
-5. [Compound Capabilities](#compound-capabilities)
-6. [Other](#other)
+1. [Targets](#Targets)
+2. [Stages](#Stages)
+3. [Versions](#Versions)
+4. [Extensions](#Extensions)
+5. [Compound Capabilities](#Compound-Capabilities)
+6. [Other](#Other)
 Targets
 ----------------------
