| Age | Commit message (Collapse) | Author |
|
This fixes an issue where non-raytracing kernels couldn't contain any
RaytracingAccelerationStructure resources even when not used.
|
|
Enable CUDA support for additional HLSL intrinsic tests by implementing
missing functionality and fixing compiler bugs affecting CUDA targets.
- Fix critical bug in InterlockedCompareStore64 where division used /4
instead of /8 for 64-bit types, causing incorrect memory addressing for
all signed int 64_t atomics
- Add signed int64_t atomic wrappers (atomicExch, atomicCAS) to CUDA
prelu de that properly cast to/from unsigned types as required by CUDA's
atomic API
- Enable tests: atomic-intrinsics-64bit.slang
- Implement CUDA support for QuadAny and QuadAll operations using warp
shu ffle primitives (__shfl_sync with quad-level lane masking)
- Add CUDA to quad_control capability definition in
slang-capabilities.capdef
- Add _slang_quadAny/_slang_quadAll helper functions to CUDA prelude
- Enable tests: quad-control-comp-functionality.slang,
subgroup-quad.slang
---------
Co-authored-by: szihs <675653+szihs@users.noreply.github.com>
|
|
Enable CUDA support for batch 3 tests
- Enhanced wave operations with exclusive support
- Added proper identity values for min/max operations
- Fixed intrinsic name mapping issues
- Updated test configurations
Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com>
|
|
Enable CUDA for the tests listed in issue #8078
This requires a minor CUDA prelude change, adding some math functions.
|
|
Due to an older version of spec referred there was an inconsitency v1.29
2/20/2025 - [HitObject LoadLocalRootArgumentsConstant]
Latest spec
https://microsoft.github.io/DirectX-Specs/d3d/Raytracing.html#hitobject-loadlocalroottableconstant
Refer:
OptiX backend support for Shader Execution Reordering (SER) features as
outlined in issue #6647. -
|
|
Fixes #7574
Changes:
* Add an initial (fairly simple) optimization pass which is able to
eliminate redundant copies.
* Our current existing optimizer passes remove redundant load/store very
robustly, this pass will focus on other cases of copy elimination
* Primary approach is to make all functions which are `in T` and `T` is
trivial to copy into a `__constref T`. We then (depending on scenario)
manually insert a variable+load if a pass-by-reference is not possible;
otherwise we pass by `constref`.
* Added optimizations to eliminate redundant code which causes
`constref` to fail to compile
---------
Co-authored-by: Harsh Aggarwal <haaggarwal@nvidia.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: slangbot <ellieh+slangbot@nvidia.com>
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
|
|
* Fix 7441: CUDA boolean vector layout to use 1-byte elements
Boolean vectors (bool1, bool2, bool3, bool4) were incorrectly implemented
as integer-based types using 4 bytes per element instead of actual 1-byte
boolean elements on CUDA targets.
Changes:
- Update CUDA prelude to define boolean vectors as structs with bool fields
instead of typedef aliases to integer vectors
- Implement CUDALayoutRulesImpl::GetVectorLayout to use 1-byte alignment
for boolean vectors, matching actual CUDA memory layout behavior
- Update make_bool functions to populate struct fields correctly
This ensures boolean vectors have the same memory layout as bool[4] arrays:
- bool1: 1 byte (was 4 bytes)
- bool2: 2 bytes (was 8 bytes)
- bool3: 3 bytes (was 12 bytes)
- bool4: 4 bytes (was 16 bytes)
Fixes memory layout mismatch between Slang reflection API and actual
CUDA compilation, achieving 75% memory savings for boolean vector usage.
* Fix CI issues -
Add and update associated functions and operators
* Make boolX same as uchar
* Use align construct on struct for boolX
* Improve Test case for robust alignment checks
* Formatting
* Disable selected slangpy tests
* add metal check which is slightly different than cuda
* Test-1
* Test-2
* Test-3
* Test-4
* ReflectionChange
* cleanup and update
* _slang_select with plain bool is needed for reverse-loop-checkpoint-test
|
|
* Initial plan
* Add U32_firstbitlow implementation for CUDA and CPP backends
Co-authored-by: bmillsNV <163073245+bmillsNV@users.noreply.github.com>
* Add I32_firstbitlow and comprehensive testing for signed/unsigned firstbitlow
Co-authored-by: bmillsNV <163073245+bmillsNV@users.noreply.github.com>
* Convert firstbitlow test to use inline filecheck syntax
Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com>
* Add U32_firstbithigh and I32_firstbithigh implementations for CUDA and CPP backends
Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com>
* Update prelude/slang-cpp-scalar-intrinsics.h
* Update prelude/slang-cpp-scalar-intrinsics.h
* Update prelude/slang-cpp-scalar-intrinsics.h
* Refactor Metal bit intrinsics to handle zero case correctly
Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com>
* Update slang-cuda-prelude.h
remove fake links
* Update hlsl.meta.slang
* if -1, return -1 due to implicit hlsl rule
* -1 or 0 is ~0u as per hlsl implictly
* 0 or -1 as per hlsl
* fix the math to map to hlsl
* fix compile error
* forgot `31 - clz`
* format code (#7943)
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
* Update source/slang/hlsl.meta.slang
* Update source/slang/hlsl.meta.slang
* Update source/slang/hlsl.meta.slang
* Update source/slang/hlsl.meta.slang
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: bmillsNV <163073245+bmillsNV@users.noreply.github.com>
Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com>
Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com>
Co-authored-by: ArielG-NV <aglasroth@nvidia.com>
Co-authored-by: slangbot <ellieh+slangbot@nvidia.com>
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
|
|
Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com>
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
* option to use riff as serialization backend
* option to use riff as serialization backend
* perf
* shuffle code
* perf improvements to deserialization
* formatting
* remove bit_cast
* correct IR verification
* neaten serialized format
* fix peek module info
* formatting
* remove temporary profiling code
* cleanup
* fix wasm build
* more explicit sizes
* deserialize via fossil on 32 bit wasm
* Make serialized modules Int size agnostic
* reorder stable names to allow range based check for 64 bit constants
* format
* review comments
* fix build
* fix
* c++17 compat slang-common.h
|
|
* Fix 1D texture reads in CUDA target
Fixes #7570: 1D surface writes don't work
The issue was that the Load function for read-only textures (hlsl.meta.slang lines 3629-3656)
only supported 2D and 3D textures for CUDA targets, causing 1D texture reads to fall through
to <invalid intrinsic>. This affected the srcTexture[tid.x] read operation in the reproduction case.
Changes:
- Updated static_assert to include SLANG_TEXTURE_1D support
- Added tex1DArrayfetch_int<T> for 1D array texture reads
- Added tex1Dfetch_int<T> for regular 1D texture reads
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Mukund Keshava <mkeshavaNV@users.noreply.github.com>
* Add 1D texture read support for CUDA target
- Add tex1Dfetch_int template specializations for float2, float4, uint, uint2, uint4
- Remove TODO comment about 1D PTX not being supported
- Enable 1D texture test in texture-subscript-cuda.slang
- Fix assembly code issues in original template specializations
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Mukund Keshava <mkeshavaNV@users.noreply.github.com>
* Update slang-cuda-prelude.h
* Fix texture3d ptx issue
* undo 1D texture changes
* Update hlsl.meta.slang
* Update hlsl.meta.slang
* Update hlsl.meta.slang
* Update hlsl.meta.slang
* Extend texture-subscript-cuda.slang test with uint and int format variants
Add test cases for newly supported texture formats in CUDA:
- 2D textures with uint, uint2, uint4
- 2D textures with int, int2, int4
- 3D textures with uint, uint2, uint4
- 3D textures with int, int2, int4
This ensures the texture subscript operations work correctly for all
the format variants added in the CUDA texture fixes.
Co-authored-by: Mukund Keshava <mkeshavaNV@users.noreply.github.com>
* update expected file
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Mukund Keshava <mkeshavaNV@users.noreply.github.com>
|
|
* Replace SLANG_ALIGN_OF with C++11 alignof
* Fix formatting (again)
|
|
* Enable LSS hit object test
Enabled LSS SER tests now that PR #7211, which added SER support to OptiX,
has been merged.
Ran: ./build/Debug/bin/slangc.exe tests/cuda/lss-test.slang -target ptx
-Xnvrtc -I"C:/ProgramData/NVIDIA Corporation/OptiX SDK 9.0.0/include"
and confirmed that the HitObject intrinsic is called.
eg:
call (%f15, %f16, %f17, %f18, %f19, %f20, %f21, %f22),
_optix_hitobject_get_linear_curve_vertex_data, ();
* format code
---------
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
|
|
* WiP: LSS intrinsics: initial commit
* format code
* Fix CI failures
* Address review comment
---------
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
|
|
* Implement shader execution reordering support for OptiX
Added OptiX backend support for Shader Execution Reordering (SER) features as outlined in issue #6647. This implementation:
1. Added CUDA target support for HitObject API
2. Implemented core SER functionality (TraceRay, MakeHit/Miss, Invoke)
3. Added OptiX-specific hit object handling functions
4. Added test case for OptiX SER functionality
* format code
---------
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
|
|
* add template specializations for signed integer texture fetches
* format code (#7162)
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
---------
Co-authored-by: slangbot <ellieh+slangbot@nvidia.com>
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
|
|
* WiP: Add more formats for texture reads
* fix test
* format code
* add float2/float4 versions for 1D and 3D as well
* fixed review comment
* fix review comments
---------
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com>
|
|
* Define a bit size for the intptr types
* Fix intptr_t sign
* Extend intptr test to check for previously broken operations
* Fix intptr vector test on CUDA
* Handle intptr size in getAnyValueSize
* Fix formatting
* Try with __ARM_ARCH_ISA_64
* On macs, int64_t != intptr_t
Yikes
* Move define to prelude header
* Also check apple in host-prelude
* Fix define location
|
|
* fix cuda surface write intrinsics
* format code (#7023)
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
---------
Co-authored-by: Mukund Keshava <mkeshava@nvidia.com>
Co-authored-by: slangbot <ellieh+slangbot@nvidia.com>
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
|
|
Change adds 16-bit and 8-bit support for countbits intrinsic. In
cases where a backend's native counbits lacks support, support
is emulated.
New tests are added for 16-bit and 8-bit support. Additional testing
added for 32-bit and minor updates made to 64-bit countbits.
|
|
* cuda: Add support for subscript operator
This CL adds support for the subscript operator for Read Only
textures in cuda. Also adds a test for this.
Fixes #6781
* format code
* fix review comments
* format code
---------
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com>
|
|
Change modifies the countbits intrinsic to use generics in order to
support 64bit countbits on select platforms where this is supported.
On platforms where this is not natively supported, we emulate by
converting the 64-bit type into a uint2 (metal and spir-v).
This should align with the implementation of other uint64_t
intrinsics such as abs, min, max and clamp.
Added new countbits64 test to verify changes.
Updated documentation for 64bit-type-support.html
|
|
(#6675)
* Improve embed tool to search all include directories as determined by CMake
Hopefully this puts an end to prelude generation issues.
* Update CMakeLists.txt
* Update CMakeLists.txt
* Use Slang's string representation instead of malloc-ing chars
|
|
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
* Add intptr_t abs/min/max operations for CPU & CUDA targets
* Define intptr_t and uintptr_t with CUDACC_RTC
---------
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
* Add SLANG_ENABLE_RELEASE_LTO cmake option
* Fix cmake static build
* Disable install SlangTargets to avoid static build failing
|
|
* Fix issue with slang-embed & include ordering
* Update CMakeLists.txt
|
|
* Fix CUDA prelude for makeMatrix
* Add regression test.
|
|
* format
* Minor test fixes
* enable checking cpp format in ci
|
|
* format cmake files
* format code
---------
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
|
|
(#5415)
This commit changes the word "stdlib" or "standard library" to "core module" in the source code.
|
|
* Split examples cmake desc
* declutter top level CMakeLists.txt
* fail if building tests without gfx
* Move llvm fetching to another cmake file
* Further split CMakeLists.txt
* Neaten llvm fetching
* Remove last premake remnant
* correct cross builds
* Neaten
* Neaten project organization in vs
|
|
* Cleanup atomic intrinsics.
* Fix.
* Fix glsl.
* Remove hacky intrinsic expansion logic for glsl image atomics.
* Fix all tests.
* Fix.
* Add `InterlockedAddF16Emulated`.
* Fix glsl intrinsic.
* Fix.
|
|
* Add options to prevent usage of own submodules
Signed-off-by: Jacki <jacki@thejackimonster.de>
* Allow using external unordered dense headers
Signed-off-by: Jacki <jacki@thejackimonster.de>
* Link system wide installed unordered dense
Signed-off-by: Jacki <jacki@thejackimonster.de>
* Allow external header usage for lz4 and spirv
Signed-off-by: Jacki <jacki@thejackimonster.de>
* Add more options to disable targets
Signed-off-by: Jacki <jacki@thejackimonster.de>
* Add option to provide explizit path for spirv headers and remove earlier options that break the build process
Signed-off-by: Jacki <jacki@thejackimonster.de>
* Rename options to use common prefix
Signed-off-by: Jacki <jacki@thejackimonster.de>
* Fix indentation for the cmake changes
Signed-off-by: Jacki <jacki@thejackimonster.de>
* Add advanced_option function for cmake
* Normalize includes between system and submodule dependencies
Fix any before-accidentally-working problems
* Add option for enabling/disabling slang-rhi
Signed-off-by: Jacki <jacki@thejackimonster.de>
* Pass correct include path for cpu tests
* Correct include path
---------
Signed-off-by: Jacki <jacki@thejackimonster.de>
Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com>
|
|
|
|
column_major/row_major. (#4653)
* Allow CPP/CUDA/Metal to legalize their buffer-elements.
Fixes: #4537
Changes:
1. Matrix inputs require legalization (pack/unpack) to ensure consistent row_major/column_major throughout entire shader, the following enabled legalization pass fixes this.
2. Added missing CUDA intrinsic so CUDA can run more tests.
3. Added a memory packing test since this still fails for cpp/cuda/metal (due to having no memory packing enforcement).
* change memory packing tests to run for targets without packing
---------
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
* Move the file public header files to `include` dir
Close the issue (#4635).
Move the following headers files to a `include` dir
located at root dir of slang repo:
slang-com-helper.h -> include/slang-com-helper.h
slang-com-ptr.h -> include/slang-com-ptr.h
slang-gfx.h -> include/slang-gfx.h
slang.h -> include/slang.h
Change cmake/SlangTarget.cmake to add include path to
every target, and change the source file to use
"#include <slang.h>" to include the public headers.
The source code update is by the script like follow:
```
fileNames_slang=$(grep -r "\".*slang\.h\"" source/ -l)
for fileName in "${fileNames_slang[@]}"
do
echo "$fileName"
sed -i "s/\".*slang\.h\"/\"slang\.h\"/" $fileName
done
```
* Fix the test issues
* Fix cpu test issues by adding include seach path
* Update cmake to not add include path for every target
Also change "#include <slang.h>" to "include "slang.h" " to
make the coding style consistent with other slang code.
* Change public include to private include for unit-test and slang-glslang
|
|
|
|
Fixes https://github.com/shader-slang/slang/issues/4549
|
|
This avoids a problem with broadcasted tensors. Our tensor-view platform is designed to allow unrestricted access to tensor memory, while broadcasted tensors were designed for 'read-only' use-cases. Trying to write into a broadcasted tensor needs re-allocation, which Slang is not designed to do.
For now, we enforce contiguity on tensors with any 0 strides.
In the future, we will introduce a ConstTensorView object to allow such tensors to be used as an input.
This patch also propagates name-hint information through structs & arrays of tensors, to allow sensible names for the error messages (before this the error messages were temporary inst numbers, which is nearly impossible to debug)
|
|
|
|
Resolves #3980
Based on the operator precedence, Slang may omits the parentheses if they
are not needed. DXC prints warnings for such cases and some applications
may treat the warnings as errors.
This commit emits parentheses to avoid the DXC warning even when they
are not needed.
|
|
The following PR implements 8.14-8.19 of the [OpenGL-GLSL specification](https://registry.khronos.org/OpenGL/specs/gl/GLSLangSpec.4.60.pdf).
Fully implements all functions and built-in type's, resolves https://github.com/shader-slang/slang/issues/3692 for GLSL & SPRI-V targets.
_Notes:_
Testing Tools:
* Fragment shaders cannot test computational results. Only OpCodes are checked for proper emitting.
Implementation Notes:
* SubpassInput requires an unknown image format.
* SubpassInput is disjoint from TextureType: __SubpassImpl (.slang) & SubpassInputType (Compiler) to reduce code generation required.
* SubpassInput required an additional input layout modifier, input_attachment_index, this was added as a new parameter binding attribute. Since the following qualifiers can overlap with different resources (`layout(input_attachment_index = 0, binding = 0, set = 0)`) input_attachment_index is checked for overlapping resource bindings separately from other qualifiers with `LayoutResourceKind::InputAttachmentIndex`.
* `GLSLInputAttachmentIndexLayoutModifier` was added to enforce function parameters only accepting `in` decorated variables.
* `in` decorated variables needed to have emitting modified to allow directly emitting the variable into function calls if used as a parameter, normally Slang has a "global variable" shadow as a "global parameter" through a copy. This does not work and is solved using `GlobalVariableShadowingGlobalParameterDecoration` to build a relationship of "global variable" to "global parameter", we then resolve this relationship and replace "global variable" uses later in compile.
* `AtomicCounterMemory` memory-constraint requires `OpCapability AtomicStorage`, `AtomicStorage` is invalid for Vulkan targets. glslang outputs for `barrier`, `memoryBarrier`, and `groupMemoryBarrier` `AtomicCounterMemory` as a memory constraint. This compiles as valid SPIR-V for Vulkan since `OpCapability AtomicStorage` is not declared. This behavior of glslang is undefined as per [3.31.Capability of the SPIR-V specification](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_capability). We will omit `AtomicCounterMemory` from our barrier calls.
|
|
|
|
SLANG_CUDA_RTC (#3624)
|
|
* Generate lookup tables from cmake
* Correct add_custom_command generator dependencies
* set options for lookup table source
* include path
* use slang_add_target for capability generated targets
* vs project regenerate
* ci wobble
---------
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
* More robust input and output selection in generator tools
* Add cmake build system
* Get slang-test running with cmake
* Bump lz4 and miniz dependencies
* Make cmake build more declarative
* Correct preprocessor logic in slang.h
* Add cuda test to compute/simple
* Remove empty cmake files
* output placement for cmake, and commenting
* Correct include paths in spirv-embed-generator
* Format cmake with gersemi
* Make cmake build clerer
* Neaten header generation
Also work around https://gitlab.kitware.com/cmake/cmake/-/issues/18399
by introducing correct_generated_properties to set the GENERATED flag in
the correct scope
* remove unused files
* use 3.20 to set GENERATOR property properly
* spelling
* more flexible linker arg setting
* replace slang-static with obj collection
* Set rpath and linker path correctly
* neaten generated file generation
* tests working with cmake build
* fix premake5 build
* comment and neaten cmake
* remove unnecessary dependency
* Build aftermath example only when aftermath is enabled
* Add slang-llvm and other dependencies
* Put modules alongside binaries
* Find slang-glslang correctly
* Better option handling
* comments
* add llvm build test
* Better option handling
* cmake wobble
* use UNICODE and _UNICODE
* remove other workflows
* use ccache
* neaten
* limit parallel for llvm build
* use ninja for build
* Windows and Darwin slang-llvm builds
* cache key
* verbose llvm build
* cl on windows
* sccache and cl.exe
* use cl.exe
* Correct package detection
* less verbosity
* Simplify miniz inclusion
* fix build with sccache
* Neaten llvm building
* neaten
* Neaten slang-llvm fetching
* more surgical workarounds
* Add ci action
* Get version from git
* better variable naming
* add missing include
* clean up after premake in cmake
* more docs on cmake build
* ci wobble
* add imgui target
* more selective source
* do not download swiftshader
* Some missing dependencies
* only build llvm on dispatch
* Disable /Zi in CI where sccache is present
* simplify
* set PIC for miniz
* set policies before project
* reengage workaround
* more runs on ci
* Add cmake presets
* Add cpack
* move iterator debug level to preset
* Correct lib flag
* simplify action
* Neaten cmake init
* Add todo
* Add simple test wrapper
* Add tests to workflow presets
* rename packing preset
* Correctly set definitions
* docs
* correct preset names
* Make slang-test depend on test-server/test-process
* neaten
* use workflow in actions
* install docs
* Correct module install dir
* debug dist workflow
* Install headers
* neaten header globbing
* Neaten dependency handling
* make lib and bin variables
* Do not set compiler for vs builds, unnecessary
* docs
* allow setting explicit source for target
* maintain archive subdir
* cmake docs
* install headers
* place targets into folders
* cmake docs
* nest external projects in folder
* remove name clash
* Neater external packages
* meta targets in folder structure
* cleaner slang-glslang dll
* Add missing static directive to slang-no-embedded-stdlib
* more robust module copying
* make slang-test the startup project
* folder tweak
* Make FETCH_BINARY the default on all platforms
* Set DEBUG_DIR
* add natvis files to source
* skip spirv tests
* remove test step from debug dist
* Add build to .gitignore
* redo warnings to be more like premake
* Update imgui
* clean more premake files
* Disable PCH for glslang, gcc throws a warning
* Add /MP for msvc builds
* warning wobble
* Add script to build llvm
* Add slang-llvm and generators components
* Build slang-llvm in ci
* comments
* fetch llvm with git
* better abi approximation for cache
* better sccache key
* formatting
* Correct logic around disabling problematic debug info for ccache
* exclude gcc and clang from windows ci
* Make dist workflows use system llvm
* naming
* restore normal dist builds
* formatting
* run tests in ci
* Correct slang-llvm url setting
* Rely on the system to find the test tool library
* actions matrix wiggle
* cope with OSX ancient bash
* Correct compilers on windows
* more ci debugging
* Correct rpath handling on OSX
* neaten
* correct path to slang-llvm
* Correct rpath separator on osx
* Find slang-llvm correctly
* smoke tests only on osx
* ci wobble
* Give MacOS module a dylib suffix
* get swiftshader correctly
* cope with bsd cp
* remove debug output
* full tests on osx
* ci wobble
* Add some vk tests to expected failures
* simplify ci
* ci wobble
* exclude dx12 tests from github ci
* remove cmake code for building llvm
* warnings
* warnings as errors for cl
* spirv-tools in path
* add aarch64 ci build
* Add SLANG_GENERATORS_PATH option for prebuilt generators
* neaten
* Correct generator target name
* remove yaml anchors because github actions does not support them
* Demote CMake in docs
Also add info on cross compiling
* Restore premake CI
* use minimal ci for cmake
* Write miniz_export for premake build
and .gitignore it
* Mention build config tool options in docs
* Remove redefined macro for miniz
* regenerate vs project
|
|
reduction intrinsics. (#3314)
* CUDA: Fixes for NVRTC 12.x, warp mask ambiguity; add reduction partial specializations.
* Fixes running NVRTC on CUDA 12 without a specified profile (used in testing, e.g. `slang-test -api cuda -category wave`)
* Fixes mask ambiguity between getting the lane index from threadId.x and a full mask of threads.
* Adds partial specializations for compute capability 8.x warp reduction intrinsics.
* Fix formatting
|
|
* Make the exponent return value from frexp int
Fixes https://github.com/shader-slang/slang/issues/3282
* Update slang-llvm.
---------
Co-authored-by: Yong He <yhe@nvidia.com>
|