| Commit message (Collapse) | Author | Age |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR implements `Access.Immutable` to allow pointers to immutable
data.
The new type `ImmutablePtr<T>` is defined as an alias of `Ptr<T,
Access.Immutable>`.
By forming an immutable pointer, the programmer tells the
compiler that the data at the pointer address will never change during
the execution of the current program. Loads from immutable
pointers can therefore be deduplicated by the compiler, and translate to
`__ldg` when generating code for CUDA.
The SPIR-V backend is not changed in this PR, since the current SPIR-V
spec makes it very difficult to specify loads from immutable addresses
without generating many wrappers and boilerplate type declarations.
We would like to see the spec evolve around its support of
`NonWritable` physical storage pointers or immutable loads before we
attempt to express such immutability in SPIR-V. For now we simply emit
ordinary pointers and loads when generating SPIR-V.
---------
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
|
| |
|
|
|
|
| |
Currently, the emitted CUDA code only compiles with the latest OptiX
9.0. This change allows code to be compiled with OptiX 8.0 and later by
not emitting OptiX calls that are unavailable. In a later step we
should add proper capabilities for the various OptiX versions.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
- fix handling layer and mip level
- add support for 1D layered textures
- reduce code by using macros
- assert when trying to emit unsupported intrinsics
There is a new set of unit tests in slang-rhi for exhaustive testing of
shader loads/stores on textures. These fixes allow most of these tests
to be enabled. Formatted loads/stores on surfaces are not supported in
the PTX ISA, so they would require codegen for the conversion, which in
theory should be possible but not as part of the CUDA prelude.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(#8547)
This allows us to specialize functions whose argument is a sub-element
of a constant buffer, instead of only functions whose argument is an
entire buffer element. Closes #8421.
This change also implements a proper heuristic to determine when to
specialize the calls and defer the buffer loads.
This PR addresses a pathological case exposed in
`slangpy\slangpy\benchmarks\test_benchmark_tensor.py`, which used to
take 27ms to finish, and now takes 1.25ms.
For example, given:
```
struct Bottom
{
    float bigArray[1024];
    [mutating]
    void setVal(int index, float value) { bigArray[index] = value; }
}
struct Root
{
    Bottom top[2];
    [mutating]
    void setTopVal(int x, int y, float value)
    {
        top[x].setVal(y, value);
    }
}
RWStructuredBuffer<Root> sb;
[shader("compute")]
[numthreads(1, 1, 1)]
void compute_main(uint3 tid: SV_DispatchThreadID)
{
    sb[0].setTopVal(1, 2, 100.0f);
}
```
We are now able to specialize the call to `setTopVal` into:
```
void compute_main(uint3 tid: SV_DispatchThreadID)
{
    setTopVal_specialized(0, 1, 2, 100.0f);
}
void setTopVal_specialized(int sbIdx, int x, int y, float value)
{
    Bottom_setVal_specialized(sbIdx, x, y, value);
}
void Bottom_setVal_specialized(int sbIdx, int x, int y, float value)
{
    sb[sbIdx].top[x].bigArray[y] = value;
}
```
This gets rid of all unnecessary loads. Achieving it requires a
combination of function call specialization and the buffer-load-defer pass.
The buffer-load-defer pass has been completely rewritten to be more
correct and to avoid introducing redundant loads.
This PR also adds tests to make sure pointers, bindless handles, and
loads from structured buffers or constant buffers work as expected.
|
| |
|
|
| |
This fixes an issue where non-raytracing kernels couldn't contain any
RaytracingAccelerationStructure resources even when not used.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Enable CUDA support for additional HLSL intrinsic tests by implementing
missing functionality and fixing compiler bugs affecting CUDA targets.
- Fix critical bug in InterlockedCompareStore64 where division used /4
instead of /8 for 64-bit types, causing incorrect memory addressing for
all signed int64_t atomics
- Add signed int64_t atomic wrappers (atomicExch, atomicCAS) to CUDA
prelude that properly cast to/from unsigned types as required by CUDA's
atomic API
- Enable tests: atomic-intrinsics-64bit.slang
- Implement CUDA support for QuadAny and QuadAll operations using warp
shuffle primitives (__shfl_sync with quad-level lane masking)
- Add CUDA to quad_control capability definition in
slang-capabilities.capdef
- Add _slang_quadAny/_slang_quadAll helper functions to CUDA prelude
- Enable tests: quad-control-comp-functionality.slang,
subgroup-quad.slang
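The /4 versus /8 addressing bug described above can be illustrated with a small host-side C++ sketch. The helper names `elementIndex` and `loadAtByteOffset` are hypothetical and do not come from the CUDA prelude; the sketch only shows why a byte offset must be divided by the element size (8 for 64-bit types) rather than by a hard-coded 4.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: convert a byte offset into an element index by
// dividing by the size of the element type instead of a hard-coded constant.
template <typename T>
constexpr std::size_t elementIndex(std::size_t byteOffset)
{
    return byteOffset / sizeof(T);
}

// Hypothetical helper: read a 64-bit value stored at a given byte offset.
inline std::int64_t loadAtByteOffset(const std::int64_t* base, std::size_t byteOffset)
{
    // The offset is expected to be a multiple of the 8-byte element size.
    assert(byteOffset % sizeof(std::int64_t) == 0);
    return base[elementIndex<std::int64_t>(byteOffset)];
}
```

With a byte offset of 16, dividing by 8 selects element 2, whereas dividing by 4 would select element 4, which is a different (and possibly out-of-bounds) location.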
---------
Co-authored-by: szihs <675653+szihs@users.noreply.github.com>
|
| |
|
|
|
|
|
|
|
| |
Enable CUDA support for batch 3 tests
- Enhanced wave operations with exclusive support
- Added proper identity values for min/max operations
- Fixed intrinsic name mapping issues
- Updated test configurations
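Why exclusive wave operations need an identity value can be sketched on the host in plain C++. This is an illustration only: `exclusivePrefixMin` is a hypothetical name, the vector stands in for the lanes of a wave, and nothing here reflects how the CUDA prelude actually implements the operation.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Exclusive prefix-min over the "lanes" of a hypothetical wave.
// Lane i receives the minimum of lanes [0, i); lane 0 has no predecessors,
// so it receives the identity of min, which is the largest int.
inline std::vector<int> exclusivePrefixMin(const std::vector<int>& lanes)
{
    std::vector<int> out(lanes.size());
    int running = std::numeric_limits<int>::max(); // identity for min
    for (std::size_t i = 0; i < lanes.size(); ++i)
    {
        out[i] = running;
        running = lanes[i] < running ? lanes[i] : running;
    }
    return out;
}
```

For a max operation the identity would be the smallest representable value instead; using 0 for either would give wrong results for lane 0 whenever the inputs have the opposite sign.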
Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com>
|
| |
|
|
| |
Enable CUDA for the tests listed in issue #8078
This requires a minor CUDA prelude change, adding some math functions.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
An older version of the spec was referenced, which caused an inconsistency:
v1.29 2/20/2025 - [HitObject LoadLocalRootArgumentsConstant]
Latest spec:
https://microsoft.github.io/DirectX-Specs/d3d/Raytracing.html#hitobject-loadlocalroottableconstant
Refer to:
OptiX backend support for Shader Execution Reordering (SER) features as
outlined in issue #6647.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes #7574
Changes:
* Add an initial (fairly simple) optimization pass which is able to
eliminate redundant copies.
* Our existing optimizer passes already remove redundant loads/stores very
robustly, so this pass focuses on other cases of copy elimination
* The primary approach is to turn every `in T` parameter, where `T` is
trivial to copy, into a `__constref T`. Depending on the scenario, we then
manually insert a variable+load if pass-by-reference is not possible;
otherwise we pass by `constref`.
* Added optimizations to eliminate redundant code which causes
`constref` to fail to compile
---------
Co-authored-by: Harsh Aggarwal <haaggarwal@nvidia.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: slangbot <ellieh+slangbot@nvidia.com>
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Fix 7441: CUDA boolean vector layout to use 1-byte elements
Boolean vectors (bool1, bool2, bool3, bool4) were incorrectly implemented
as integer-based types using 4 bytes per element instead of actual 1-byte
boolean elements on CUDA targets.
Changes:
- Update CUDA prelude to define boolean vectors as structs with bool fields
instead of typedef aliases to integer vectors
- Implement CUDALayoutRulesImpl::GetVectorLayout to use 1-byte alignment
for boolean vectors, matching actual CUDA memory layout behavior
- Update make_bool functions to populate struct fields correctly
This ensures boolean vectors have the same memory layout as bool[4] arrays:
- bool1: 1 byte (was 4 bytes)
- bool2: 2 bytes (was 8 bytes)
- bool3: 3 bytes (was 12 bytes)
- bool4: 4 bytes (was 16 bytes)
Fixes memory layout mismatch between Slang reflection API and actual
CUDA compilation, achieving 75% memory savings for boolean vector usage.
* Fix CI issues -
Add and update associated functions and operators
* Make boolX same as uchar
* Use align construct on struct for boolX
* Improve Test case for robust alignment checks
* Formatting
* Disable selected slangpy tests
* add metal check which is slightly different than cuda
* Test-1
* Test-2
* Test-3
* Test-4
* ReflectionChange
* cleanup and update
* _slang_select with plain bool is needed for reverse-loop-checkpoint-test
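The layout described in this commit (boolean vectors as structs of 1-byte `bool` fields) can be sketched in host C++. `Bool4Sketch` and `makeBool4Sketch` are hypothetical names, not the prelude's definitions, and the sketch assumes `sizeof(bool) == 1`, which holds on mainstream compilers but is not guaranteed by the C++ standard.

```cpp
#include <cstddef>

// Hypothetical stand-in for a 4-component boolean vector with 1-byte elements.
struct Bool4Sketch
{
    bool x, y, z, w;
};

// Assumption: bool occupies one byte on the target compiler.
static_assert(sizeof(bool) == 1, "sketch assumes 1-byte bool");
// Same size as bool[4], rather than the 16 bytes of an int4-based typedef.
static_assert(sizeof(Bool4Sketch) == sizeof(bool[4]), "expected 4-byte layout");
static_assert(offsetof(Bool4Sketch, w) == 3, "fields are packed byte by byte");

// Hypothetical constructor helper that populates the struct fields directly.
inline Bool4Sketch makeBool4Sketch(bool x, bool y, bool z, bool w)
{
    return Bool4Sketch{x, y, z, w};
}
```

Going from 16 bytes to 4 bytes for `bool4` is where the 75% figure quoted above comes from.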
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Initial plan
* Add U32_firstbitlow implementation for CUDA and CPP backends
Co-authored-by: bmillsNV <163073245+bmillsNV@users.noreply.github.com>
* Add I32_firstbitlow and comprehensive testing for signed/unsigned firstbitlow
Co-authored-by: bmillsNV <163073245+bmillsNV@users.noreply.github.com>
* Convert firstbitlow test to use inline filecheck syntax
Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com>
* Add U32_firstbithigh and I32_firstbithigh implementations for CUDA and CPP backends
Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com>
* Update prelude/slang-cpp-scalar-intrinsics.h
* Update prelude/slang-cpp-scalar-intrinsics.h
* Update prelude/slang-cpp-scalar-intrinsics.h
* Refactor Metal bit intrinsics to handle zero case correctly
Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com>
* Update slang-cuda-prelude.h
remove fake links
* Update hlsl.meta.slang
* if -1, return -1 due to implicit hlsl rule
* -1 or 0 is ~0u as per hlsl implicitly
* 0 or -1 as per hlsl
* fix the math to map to hlsl
* fix compile error
* forgot `31 - clz`
* format code (#7943)
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
* Update source/slang/hlsl.meta.slang
* Update source/slang/hlsl.meta.slang
* Update source/slang/hlsl.meta.slang
* Update source/slang/hlsl.meta.slang
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: bmillsNV <163073245+bmillsNV@users.noreply.github.com>
Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com>
Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com>
Co-authored-by: ArielG-NV <aglasroth@nvidia.com>
Co-authored-by: slangbot <ellieh+slangbot@nvidia.com>
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
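The zero and -1 handling mentioned in the bullets above can be sketched in portable C++. The function names are hypothetical and the loops stand in for the `clz`-style builtins a real prelude would likely use; the sketch assumes the HLSL convention that an input with no qualifying bit yields `~0u`, and that the signed `firstbithigh` searches for the first 0 bit when the input is negative.

```cpp
#include <cstdint>

// Index of the lowest set bit, or ~0u when no bit is set.
inline std::uint32_t firstBitLowSketch(std::uint32_t v)
{
    if (v == 0)
        return ~0u;
    std::uint32_t index = 0;
    while ((v & 1u) == 0)
    {
        v >>= 1;
        ++index;
    }
    return index;
}

// Index of the highest set bit counted from the LSB (the `31 - clz` form),
// or ~0u when no bit is set.
inline std::uint32_t firstBitHighSketch(std::uint32_t v)
{
    if (v == 0)
        return ~0u;
    std::uint32_t index = 31;
    while ((v & 0x80000000u) == 0)
    {
        v <<= 1;
        --index;
    }
    return index;
}

// Signed variant: for negative inputs, look for the highest 0 bit instead,
// so both 0 and -1 produce ~0u.
inline std::uint32_t firstBitHighSignedSketch(std::int32_t v)
{
    std::uint32_t bits = static_cast<std::uint32_t>(v);
    if (v < 0)
        bits = ~bits;
    return firstBitHighSketch(bits);
}
```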
|
| |
|
|
| |
Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com>
Co-authored-by: Yong He <yonghe@outlook.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* option to use riff as serialization backend
* option to use riff as serialization backend
* perf
* shuffle code
* perf improvements to deserialization
* formatting
* remove bit_cast
* correct IR verification
* neaten serialized format
* fix peek module info
* formatting
* remove temporary profiling code
* cleanup
* fix wasm build
* more explicit sizes
* deserialize via fossil on 32 bit wasm
* Make serialized modules Int size agnostic
* reorder stable names to allow range based check for 64 bit constants
* format
* review comments
* fix build
* fix
* c++17 compat slang-common.h
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Fix 1D texture reads in CUDA target
Fixes #7570: 1D surface writes don't work
The issue was that the Load function for read-only textures (hlsl.meta.slang lines 3629-3656)
only supported 2D and 3D textures for CUDA targets, causing 1D texture reads to fall through
to <invalid intrinsic>. This affected the srcTexture[tid.x] read operation in the reproduction case.
Changes:
- Updated static_assert to include SLANG_TEXTURE_1D support
- Added tex1DArrayfetch_int<T> for 1D array texture reads
- Added tex1Dfetch_int<T> for regular 1D texture reads
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Mukund Keshava <mkeshavaNV@users.noreply.github.com>
* Add 1D texture read support for CUDA target
- Add tex1Dfetch_int template specializations for float2, float4, uint, uint2, uint4
- Remove TODO comment about 1D PTX not being supported
- Enable 1D texture test in texture-subscript-cuda.slang
- Fix assembly code issues in original template specializations
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Mukund Keshava <mkeshavaNV@users.noreply.github.com>
* Update slang-cuda-prelude.h
* Fix texture3d ptx issue
* undo 1D texture changes
* Update hlsl.meta.slang
* Update hlsl.meta.slang
* Update hlsl.meta.slang
* Update hlsl.meta.slang
* Extend texture-subscript-cuda.slang test with uint and int format variants
Add test cases for newly supported texture formats in CUDA:
- 2D textures with uint, uint2, uint4
- 2D textures with int, int2, int4
- 3D textures with uint, uint2, uint4
- 3D textures with int, int2, int4
This ensures the texture subscript operations work correctly for all
the format variants added in the CUDA texture fixes.
Co-authored-by: Mukund Keshava <mkeshavaNV@users.noreply.github.com>
* update expected file
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Mukund Keshava <mkeshavaNV@users.noreply.github.com>
|
| |
|
|
|
| |
* Replace SLANG_ALIGN_OF with C++11 alignof
* Fix formatting (again)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Enable LSS hit object test
Enabled LSS SER tests now that PR #7211, which added SER support to OptiX,
has been merged.
Ran: ./build/Debug/bin/slangc.exe tests/cuda/lss-test.slang -target ptx
-Xnvrtc -I"C:/ProgramData/NVIDIA Corporation/OptiX SDK 9.0.0/include"
and confirmed that the HitObject intrinsic is called.
eg:
call (%f15, %f16, %f17, %f18, %f19, %f20, %f21, %f22),
_optix_hitobject_get_linear_curve_vertex_data, ();
* format code
---------
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
* WiP: LSS intrinsics: initial commit
* format code
* Fix CI failures
* Address review comment
---------
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Implement shader execution reordering support for OptiX
Added OptiX backend support for Shader Execution Reordering (SER) features as outlined in issue #6647. This implementation:
1. Added CUDA target support for HitObject API
2. Implemented core SER functionality (TraceRay, MakeHit/Miss, Invoke)
3. Added OptiX-specific hit object handling functions
4. Added test case for OptiX SER functionality
* format code
---------
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
|
| |
|
|
|
|
|
|
|
|
|
|
| |
* add template specializations for signed integer texture fetches
* format code (#7162)
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
---------
Co-authored-by: slangbot <ellieh+slangbot@nvidia.com>
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* WiP: Add more formats for texture reads
* fix test
* format code
* add float2/float4 versions for 1D and 3D as well
* fixed review comment
* fix review comments
---------
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Define a bit size for the intptr types
* Fix intptr_t sign
* Extend intptr test to check for previously broken operations
* Fix intptr vector test on CUDA
* Handle intptr size in getAnyValueSize
* Fix formatting
* Try with __ARM_ARCH_ISA_64
* On macs, int64_t != intptr_t
Yikes
* Move define to prelude header
* Also check apple in host-prelude
* Fix define location
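A minimal C++ sketch of the properties these commits touch on: `intptr_t` has a definite bit size and is signed, yet it is not necessarily the same type as `int64_t` even when both are 64 bits wide. The constant names are hypothetical; the sketch checks only standard library facts and does not reproduce the prelude's defines.

```cpp
#include <climits>
#include <cstdint>
#include <type_traits>

// Bit size of the pointer-sized integer types on the compiling platform.
constexpr int kIntPtrBits = static_cast<int>(sizeof(std::intptr_t) * CHAR_BIT);

// intptr_t is signed and uintptr_t is unsigned.
static_assert(std::is_signed<std::intptr_t>::value, "intptr_t must be signed");
static_assert(std::is_unsigned<std::uintptr_t>::value, "uintptr_t must be unsigned");
static_assert(sizeof(std::intptr_t) == sizeof(std::uintptr_t), "sizes must match");

// Whether intptr_t and int64_t are the same type is platform dependent:
// equal size does not imply type identity, so overloads and template
// specializations written for int64_t may not apply to intptr_t.
constexpr bool kIntPtrIsInt64 = std::is_same<std::intptr_t, std::int64_t>::value;
```

On platforms where `kIntPtrIsInt64` is false, code that dispatches on `int64_t` needs separate handling for `intptr_t`.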
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
* fix cuda surface write intrinsics
* format code (#7023)
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
---------
Co-authored-by: Mukund Keshava <mkeshava@nvidia.com>
Co-authored-by: slangbot <ellieh+slangbot@nvidia.com>
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
|
| |
|
|
|
|
|
|
| |
This change adds 16-bit and 8-bit support for the countbits intrinsic. In
cases where a backend's native countbits lacks support, support
is emulated.
New tests are added for 16-bit and 8-bit support. Additional testing is
added for 32-bit, and minor updates are made to 64-bit countbits.
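One plausible way to emulate narrow countbits on a backend that only offers a 32-bit version is to zero-extend the operand first. The sketch below is an assumption about the general technique, not the emitted code, and all function names are hypothetical.

```cpp
#include <cstdint>

// Portable 32-bit population count (stand-in for a native intrinsic).
inline std::uint32_t countBits32Sketch(std::uint32_t v)
{
    std::uint32_t count = 0;
    while (v != 0)
    {
        v &= v - 1; // clear the lowest set bit
        ++count;
    }
    return count;
}

// 8-bit emulation: zero-extend to 32 bits, then count.
inline std::uint32_t countBits8Sketch(std::uint8_t v)
{
    return countBits32Sketch(static_cast<std::uint32_t>(v));
}

// 16-bit signed emulation: reinterpret as unsigned 16-bit first so that
// sign extension does not add spurious set bits in the upper half.
inline std::uint32_t countBits16Sketch(std::int16_t v)
{
    return countBits32Sketch(static_cast<std::uint16_t>(v));
}
```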
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* cuda: Add support for subscript operator
This CL adds support for the subscript operator for Read Only
textures in cuda. Also adds a test for this.
Fixes #6781
* format code
* fix review comments
* format code
---------
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Change modifies the countbits intrinsic to use generics in order to
support 64bit countbits on select platforms where this is supported.
On platforms where this is not natively supported, we emulate by
converting the 64-bit type into a uint2 (metal and spir-v).
This should align with the implementation of other uint64_t
intrinsics such as abs, min, max and clamp.
Added new countbits64 test to verify changes.
Updated documentation for 64bit-type-support.html
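The uint2-based emulation described above amounts to counting the bits of each 32-bit half and adding the results. A minimal host-side C++ sketch, with hypothetical function names and a portable loop standing in for the native 32-bit intrinsic:

```cpp
#include <cstdint>

// Portable 32-bit population count (stand-in for a native intrinsic).
inline std::uint32_t popcount32Sketch(std::uint32_t v)
{
    std::uint32_t count = 0;
    while (v != 0)
    {
        v &= v - 1; // clear the lowest set bit
        ++count;
    }
    return count;
}

// 64-bit count emulated by splitting the value into low and high halves,
// mirroring the conversion of a 64-bit value into a uint2.
inline std::uint32_t countBits64ViaHalves(std::uint64_t v)
{
    const std::uint32_t lo = static_cast<std::uint32_t>(v);
    const std::uint32_t hi = static_cast<std::uint32_t>(v >> 32);
    return popcount32Sketch(lo) + popcount32Sketch(hi);
}
```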
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
(#6675)
* Improve embed tool to search all include directories as determined by CMake
Hopefully this puts an end to prelude generation issues.
* Update CMakeLists.txt
* Update CMakeLists.txt
* Use Slang's string representation instead of malloc-ing chars
|
| |
|
| |
Co-authored-by: Yong He <yonghe@outlook.com>
|
| |
|
| |
Co-authored-by: Yong He <yonghe@outlook.com>
|
| |
|
|
|
|
|
|
|
| |
* Add intptr_t abs/min/max operations for CPU & CUDA targets
* Define intptr_t and uintptr_t with CUDACC_RTC
---------
Co-authored-by: Yong He <yonghe@outlook.com>
|
| |
|
|
|
|
|
| |
* Add SLANG_ENABLE_RELEASE_LTO cmake option
* Fix cmake static build
* Disable install SlangTargets to avoid static build failing
|
| |
|
|
|
| |
* Fix issue with slang-embed & include ordering
* Update CMakeLists.txt
|
| |
|
|
|
| |
* Fix CUDA prelude for makeMatrix
* Add regression test.
|
| |
|
|
|
|
|
| |
* format
* Minor test fixes
* enable checking cpp format in ci
|
| |
|
|
|
|
|
|
|
| |
* format cmake files
* format code
---------
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
|
| |
|
|
|
| |
(#5415)
This commit changes the word "stdlib" or "standard library" to "core module" in the source code.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Split examples cmake desc
* declutter top level CMakeLists.txt
* fail if building tests without gfx
* Move llvm fetching to another cmake file
* Further split CMakeLists.txt
* Neaten llvm fetching
* Remove last premake remnant
* correct cross builds
* Neaten
* Neaten project organization in vs
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Cleanup atomic intrinsics.
* Fix.
* Fix glsl.
* Remove hacky intrinsic expansion logic for glsl image atomics.
* Fix all tests.
* Fix.
* Add `InterlockedAddF16Emulated`.
* Fix glsl intrinsic.
* Fix.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Add options to prevent usage of own submodules
Signed-off-by: Jacki <jacki@thejackimonster.de>
* Allow using external unordered dense headers
Signed-off-by: Jacki <jacki@thejackimonster.de>
* Link system wide installed unordered dense
Signed-off-by: Jacki <jacki@thejackimonster.de>
* Allow external header usage for lz4 and spirv
Signed-off-by: Jacki <jacki@thejackimonster.de>
* Add more options to disable targets
Signed-off-by: Jacki <jacki@thejackimonster.de>
* Add option to provide explicit path for spirv headers and remove earlier options that break the build process
Signed-off-by: Jacki <jacki@thejackimonster.de>
* Rename options to use common prefix
Signed-off-by: Jacki <jacki@thejackimonster.de>
* Fix indentation for the cmake changes
Signed-off-by: Jacki <jacki@thejackimonster.de>
* Add advanced_option function for cmake
* Normalize includes between system and submodule dependencies
Fix any before-accidentally-working problems
* Add option for enabling/disabling slang-rhi
Signed-off-by: Jacki <jacki@thejackimonster.de>
* Pass correct include path for cpu tests
* Correct include path
---------
Signed-off-by: Jacki <jacki@thejackimonster.de>
Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com>
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
column_major/row_major. (#4653)
* Allow CPP/CUDA/Metal to legalize their buffer-elements.
Fixes: #4537
Changes:
1. Matrix inputs require legalization (pack/unpack) to ensure consistent row_major/column_major throughout entire shader, the following enabled legalization pass fixes this.
2. Added missing CUDA intrinsic so CUDA can run more tests.
3. Added a memory packing test since this still fails for cpp/cuda/metal (due to having no memory packing enforcement).
* change memory packing tests to run for targets without packing
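The pack/unpack idea behind matrix legalization can be sketched in C++: a matrix held as rows is written to a buffer in column-major order and read back, so the in-shader representation stays consistent regardless of the buffer layout. The function names are hypothetical and this is not the compiler's legalization pass, only an illustration of the index mapping.

```cpp
#include <array>
#include <cstddef>

// Pack an R x C matrix (stored as rows) into a flat column-major buffer.
template <std::size_t R, std::size_t C>
std::array<float, R * C> packColumnMajor(const std::array<std::array<float, C>, R>& m)
{
    std::array<float, R * C> out{};
    for (std::size_t r = 0; r < R; ++r)
        for (std::size_t c = 0; c < C; ++c)
            out[c * R + r] = m[r][c];
    return out;
}

// Unpack a flat column-major buffer back into rows.
template <std::size_t R, std::size_t C>
std::array<std::array<float, C>, R> unpackColumnMajor(const std::array<float, R * C>& buf)
{
    std::array<std::array<float, C>, R> m{};
    for (std::size_t r = 0; r < R; ++r)
        for (std::size_t c = 0; c < C; ++c)
            m[r][c] = buf[c * R + r];
    return m;
}
```

Packing followed by unpacking returns the original matrix, which is the round-trip property a legalization pass of this kind has to preserve.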
---------
Co-authored-by: Yong He <yonghe@outlook.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Move the public header files to the `include` dir
Closes issue #4635.
Move the following header files to an `include` dir
located at the root dir of the slang repo:
slang-com-helper.h -> include/slang-com-helper.h
slang-com-ptr.h -> include/slang-com-ptr.h
slang-gfx.h -> include/slang-gfx.h
slang.h -> include/slang.h
Change cmake/SlangTarget.cmake to add the include path to
every target, and change the source files to use
"#include <slang.h>" to include the public headers.
The source code was updated by a script like the following:
```
fileNames_slang=$(grep -r "\".*slang\.h\"" source/ -l)
for fileName in "${fileNames_slang[@]}"
do
echo "$fileName"
sed -i "s/\".*slang\.h\"/\"slang\.h\"/" $fileName
done
```
* Fix the test issues
* Fix cpu test issues by adding include search path
* Update cmake to not add include path for every target
Also change `#include <slang.h>` to `#include "slang.h"` to
make the coding style consistent with other slang code.
* Change public include to private include for unit-test and slang-glslang
|
| | |
|
| |
|
| |
Fixes https://github.com/shader-slang/slang/issues/4549
|
| |
|
|
|
|
|
|
|
| |
This avoids a problem with broadcasted tensors. Our tensor-view platform is designed to allow unrestricted access to tensor memory, while broadcasted tensors were designed for 'read-only' use-cases. Trying to write into a broadcasted tensor needs re-allocation, which Slang is not designed to do.
For now, we enforce contiguity on tensors with any 0 strides.
In the future, we will introduce a ConstTensorView object to allow such tensors to be used as an input.
This patch also propagates name-hint information through structs & arrays of tensors, to allow sensible names for the error messages (before this the error messages were temporary inst numbers, which is nearly impossible to debug)
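The zero-stride rule can be sketched as a simple host-side check in C++. Both helpers are hypothetical: the first detects a broadcast dimension (stride 0), and the second computes the row-major strides a contiguous copy of the same shape would have. Neither reflects the actual tensor-view implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// A stride of 0 means every index along that dimension aliases the same
// memory, which is what a broadcasted tensor looks like.
inline bool hasZeroStride(const std::vector<std::int64_t>& strides)
{
    for (std::int64_t s : strides)
        if (s == 0)
            return true;
    return false;
}

// Row-major (contiguous) strides, in elements, for a given shape.
inline std::vector<std::int64_t> contiguousStrides(const std::vector<std::int64_t>& shape)
{
    std::vector<std::int64_t> strides(shape.size());
    std::int64_t running = 1;
    for (std::size_t i = shape.size(); i-- > 0;)
    {
        strides[i] = running;
        running *= shape[i];
    }
    return strides;
}
```

A caller could reject, or first materialize a contiguous copy of, any tensor for which `hasZeroStride` returns true before handing it to code that may write through the view.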
|
| | |
|
| |
|
|
|
|
|
|
|
|
| |
Resolves #3980
Based on operator precedence, Slang may omit parentheses when they
are not needed. DXC prints warnings for such cases, and some applications
may treat the warnings as errors.
This commit emits parentheses to avoid the DXC warning even when they
are not needed.
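The idea of always parenthesizing nested binary expressions can be sketched with a toy expression printer in C++. The `ExprSketch` type and the helper functions are hypothetical and far simpler than Slang's emitter; the sketch only shows that the printer ignores precedence and wraps every non-root binary subexpression.

```cpp
#include <memory>
#include <string>

// Toy expression tree: either a leaf name or a binary operator node.
struct ExprSketch
{
    std::string leaf;
    std::string op;
    std::shared_ptr<ExprSketch> lhs;
    std::shared_ptr<ExprSketch> rhs;
};

inline std::shared_ptr<ExprSketch> leafExpr(const std::string& name)
{
    return std::make_shared<ExprSketch>(ExprSketch{name, "", nullptr, nullptr});
}

inline std::shared_ptr<ExprSketch> binaryExpr(
    const std::string& op,
    std::shared_ptr<ExprSketch> lhs,
    std::shared_ptr<ExprSketch> rhs)
{
    return std::make_shared<ExprSketch>(ExprSketch{"", op, lhs, rhs});
}

// Emit every nested binary expression in parentheses, whether or not
// operator precedence would require them.
inline std::string emitAlwaysParenthesized(const ExprSketch& e, bool isRoot = true)
{
    if (!e.lhs)
        return e.leaf;
    std::string text = emitAlwaysParenthesized(*e.lhs, false) + " " + e.op + " " +
                       emitAlwaysParenthesized(*e.rhs, false);
    return isRoot ? text : "(" + text + ")";
}
```

For the tree of `a << b + c`, where `+` already binds tighter than `<<`, the printer produces `a << (b + c)`, making the grouping explicit.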
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This PR implements sections 8.14-8.19 of the [OpenGL-GLSL specification](https://registry.khronos.org/OpenGL/specs/gl/GLSLangSpec.4.60.pdf).
It fully implements all functions and built-in types, and resolves https://github.com/shader-slang/slang/issues/3692 for GLSL & SPIR-V targets.
_Notes:_
Testing Tools:
* Fragment shaders cannot test computational results. Only OpCodes are checked for proper emitting.
Implementation Notes:
* SubpassInput requires an unknown image format.
* SubpassInput is disjoint from TextureType: __SubpassImpl (.slang) & SubpassInputType (Compiler) to reduce code generation required.
* SubpassInput required an additional input layout modifier, input_attachment_index, this was added as a new parameter binding attribute. Since the following qualifiers can overlap with different resources (`layout(input_attachment_index = 0, binding = 0, set = 0)`) input_attachment_index is checked for overlapping resource bindings separately from other qualifiers with `LayoutResourceKind::InputAttachmentIndex`.
* `GLSLInputAttachmentIndexLayoutModifier` was added to enforce function parameters only accepting `in` decorated variables.
* `in` decorated variables needed to have emitting modified to allow directly emitting the variable into function calls if used as a parameter, normally Slang has a "global variable" shadow as a "global parameter" through a copy. This does not work and is solved using `GlobalVariableShadowingGlobalParameterDecoration` to build a relationship of "global variable" to "global parameter", we then resolve this relationship and replace "global variable" uses later in compile.
* `AtomicCounterMemory` memory-constraint requires `OpCapability AtomicStorage`, `AtomicStorage` is invalid for Vulkan targets. glslang outputs for `barrier`, `memoryBarrier`, and `groupMemoryBarrier` `AtomicCounterMemory` as a memory constraint. This compiles as valid SPIR-V for Vulkan since `OpCapability AtomicStorage` is not declared. This behavior of glslang is undefined as per [3.31.Capability of the SPIR-V specification](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_capability). We will omit `AtomicCounterMemory` from our barrier calls.
|
| | |
|
| |
|
|
| |
SLANG_CUDA_RTC (#3624)
|