| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Fix 7441: CUDA boolean vector layout to use 1-byte elements
Boolean vectors (bool1, bool2, bool3, bool4) were incorrectly implemented
as integer-based types using 4 bytes per element instead of actual 1-byte
boolean elements on CUDA targets.
Changes:
- Update CUDA prelude to define boolean vectors as structs with bool fields
instead of typedef aliases to integer vectors
- Implement CUDALayoutRulesImpl::GetVectorLayout to use 1-byte alignment
for boolean vectors, matching actual CUDA memory layout behavior
- Update make_bool functions to populate struct fields correctly
This ensures boolean vectors have the same memory layout as bool[4] arrays:
- bool1: 1 byte (was 4 bytes)
- bool2: 2 bytes (was 8 bytes)
- bool3: 3 bytes (was 12 bytes)
- bool4: 4 bytes (was 16 bytes)
Fixes memory layout mismatch between Slang reflection API and actual
CUDA compilation, achieving 75% memory savings for boolean vector usage.
* Fix CI issues -
Add and update associated functions and operators
* Make boolX same as uchar
* Use align construct on struct for boolX
* Improve Test case for robust alignment checks
* Formatting
* Disable selected slangpy tests
* add metal check which is slightly different than cuda
* Test-1
* Test-2
* Test-3
* Test-4
* ReflectionChange
* cleanup and update
* _slang_select with plain bool is needed for reverse-loop-checkpoint-test
|