path: root/prelude
Age | Commit message | Author
2025-09-09 | Always define OptixTraversableHandle (#8411) | Simon Kallweit
This fixes an issue where non-raytracing kernels couldn't contain any RaytracingAccelerationStructure resources even when not used.
2025-09-04 | Enable CUDA support for additional HLSL intrinsic tests (#8293) | Harsh Aggarwal (NVIDIA)
Enable CUDA support for additional HLSL intrinsic tests by implementing missing functionality and fixing compiler bugs affecting CUDA targets.
- Fix critical bug in InterlockedCompareStore64 where division used /4 instead of /8 for 64-bit types, causing incorrect memory addressing for all signed int64_t atomics
- Add signed int64_t atomic wrappers (atomicExch, atomicCAS) to CUDA prelude that properly cast to/from unsigned types as required by CUDA's atomic API
- Enable tests: atomic-intrinsics-64bit.slang
- Implement CUDA support for QuadAny and QuadAll operations using warp shuffle primitives (__shfl_sync with quad-level lane masking)
- Add CUDA to quad_control capability definition in slang-capabilities.capdef
- Add _slang_quadAny/_slang_quadAll helper functions to CUDA prelude
- Enable tests: quad-control-comp-functionality.slang, subgroup-quad.slang
---------
Co-authored-by: szihs <675653+szihs@users.noreply.github.com>
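The /4 versus /8 bug described in this commit comes down to converting a byte offset into an element index with the wrong element size. A minimal C++ sketch of that idea follows; `elementIndexFromByteOffset` and `load64` are hypothetical names chosen for illustration and are not taken from the Slang prelude.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the Slang prelude): map a byte offset into a
// raw buffer to the index of the element of type T that starts at that offset.
// Dividing by sizeof(T) keeps 32-bit (/4) and 64-bit (/8) accesses consistent.
template <typename T>
constexpr std::size_t elementIndexFromByteOffset(std::size_t byteOffset)
{
    return byteOffset / sizeof(T);
}

// Hypothetical helper: read a 64-bit value at a byte offset, the way a
// byte-addressed buffer access would.
inline std::int64_t load64(const std::int64_t* data, std::size_t byteOffset)
{
    return data[elementIndexFromByteOffset<std::int64_t>(byteOffset)];
}
```

With a hard-coded `/4`, a byte offset of 16 into an array of 64-bit values would select element 4 instead of element 2, which matches the kind of incorrect addressing the commit message reports.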
2025-08-20 | Updated support to enable batch3 (#8219) | Harsh Aggarwal (NVIDIA)
Enable CUDA support for batch 3 tests
- Enhanced wave operations with exclusive support
- Added proper identity values for min/max operations
- Fixed intrinsic name mapping issues
- Updated test configurations
Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com>
2025-08-12 | Enable CUDA testing for batch 2 (#8147) | jarcherNV
Enable CUDA for the tests listed in issue #8078. This requires a minor CUDA prelude change, adding some math functions.
2025-08-07 | Fix intrinsic LoadLocalRootTableConstant for optix (#7949) | Harsh Aggarwal (NVIDIA)
An older version of the spec was referenced, which led to an inconsistency:
v1.29 2/20/2025 - [HitObject LoadLocalRootArgumentsConstant]
Latest spec: https://microsoft.github.io/DirectX-Specs/d3d/Raytracing.html#hitobject-loadlocalroottableconstant
Refer: OptiX backend support for Shader Execution Reordering (SER) features as outlined in issue #6647.
2025-08-07 | Initial copy elision pass (#8042) | ArielG-NV
Fixes #7574
Changes:
* Add an initial (fairly simple) optimization pass which is able to eliminate redundant copies.
* Our current existing optimizer passes remove redundant load/store very robustly, this pass will focus on other cases of copy elimination
* Primary approach is to make all functions which are `in T` and `T` is trivial to copy into a `__constref T`. We then (depending on scenario) manually insert a variable+load if a pass-by-reference is not possible; otherwise we pass by `constref`.
* Added optimizations to eliminate redundant code which causes `constref` to fail to compile
---------
Co-authored-by: Harsh Aggarwal <haaggarwal@nvidia.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: slangbot <ellieh+slangbot@nvidia.com>
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-08-01 | Fix 7441: CUDA boolean vector layout to use 1-byte elements (#7862) | Harsh Aggarwal (NVIDIA)
* Fix 7441: CUDA boolean vector layout to use 1-byte elements
Boolean vectors (bool1, bool2, bool3, bool4) were incorrectly implemented as integer-based types using 4 bytes per element instead of actual 1-byte boolean elements on CUDA targets.
Changes:
- Update CUDA prelude to define boolean vectors as structs with bool fields instead of typedef aliases to integer vectors
- Implement CUDALayoutRulesImpl::GetVectorLayout to use 1-byte alignment for boolean vectors, matching actual CUDA memory layout behavior
- Update make_bool functions to populate struct fields correctly
This ensures boolean vectors have the same memory layout as bool[4] arrays:
- bool1: 1 byte (was 4 bytes)
- bool2: 2 bytes (was 8 bytes)
- bool3: 3 bytes (was 12 bytes)
- bool4: 4 bytes (was 16 bytes)
Fixes memory layout mismatch between Slang reflection API and actual CUDA compilation, achieving 75% memory savings for boolean vector usage.
* Fix CI issues - Add and update associated functions and operators
* Make boolX same as uchar
* Use align construct on struct for boolX
* Improve Test case for robust alignment checks
* Formatting
* Disable selected slangpy tests
* add metal check which is slightly different than cuda
* Test-1
* Test-2
* Test-3
* Test-4
* ReflectionChange
* cleanup and update
* _slang_select with plain bool is needed for reverse-loop-checkpoint-test
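The layout change in this commit can be pictured with plain structs of `bool` fields. The following is a minimal sketch only: `Bool3Sketch`, `Bool4Sketch`, and `makeBool4Sketch` are hypothetical names, the real prelude types and any alignment attributes they carry are not reproduced here, and the sketch assumes a platform where `sizeof(bool)` is 1.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the prelude's boolean vector types: one bool
// field per component, instead of an alias to an integer vector type.
struct Bool3Sketch
{
    bool x, y, z;
};

struct Bool4Sketch
{
    bool x, y, z, w;
};

// Hypothetical constructor helper in the spirit of the make_bool functions.
inline Bool4Sketch makeBool4Sketch(bool x, bool y, bool z, bool w)
{
    return Bool4Sketch{x, y, z, w};
}

// Assumption: bool occupies one byte, as on mainstream compilers.
static_assert(sizeof(bool) == 1, "sketch assumes 1-byte bool");
// With 1-byte fields the struct has the same size as a bool array.
static_assert(sizeof(Bool3Sketch) == sizeof(bool[3]), "3 bytes expected");
static_assert(sizeof(Bool4Sketch) == sizeof(bool[4]), "4 bytes expected");
```

Under that assumption a four-component boolean vector takes 4 bytes rather than the 16 bytes of a four-component integer vector, which is the 75% saving the commit message mentions.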
2025-07-29 | Fix CUDA backend missing U32_firstbitlow implementation (#7921) | Copilot
* Initial plan
* Add U32_firstbitlow implementation for CUDA and CPP backends
Co-authored-by: bmillsNV <163073245+bmillsNV@users.noreply.github.com>
* Add I32_firstbitlow and comprehensive testing for signed/unsigned firstbitlow
Co-authored-by: bmillsNV <163073245+bmillsNV@users.noreply.github.com>
* Convert firstbitlow test to use inline filecheck syntax
Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com>
* Add U32_firstbithigh and I32_firstbithigh implementations for CUDA and CPP backends
Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com>
* Update prelude/slang-cpp-scalar-intrinsics.h
* Update prelude/slang-cpp-scalar-intrinsics.h
* Update prelude/slang-cpp-scalar-intrinsics.h
* Refactor Metal bit intrinsics to handle zero case correctly
Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com>
* Update slang-cuda-prelude.h remove fake links
* Update hlsl.meta.slang
* if -1, return -1 due to implicit hlsl rule
* -1 or 0 is ~0u as per hlsl implicitly
* 0 or -1 as per hlsl
* fix the math to map to hlsl
* fix compile error
* forgot `31 - clz`
* format code (#7943)
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
* Update source/slang/hlsl.meta.slang
* Update source/slang/hlsl.meta.slang
* Update source/slang/hlsl.meta.slang
* Update source/slang/hlsl.meta.slang
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: bmillsNV <163073245+bmillsNV@users.noreply.github.com>
Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com>
Co-authored-by: csyonghe <2652293+csyonghe@users.noreply.github.com>
Co-authored-by: ArielG-NV <aglasroth@nvidia.com>
Co-authored-by: slangbot <ellieh+slangbot@nvidia.com>
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
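The follow-up notes in this commit (zero and -1 both mapping to `~0u`, and the `31 - clz` correction) suggest the semantics sketched below. This is a portable C++ illustration under my reading of those notes, not the prelude's code: every function name here is hypothetical, and the loops stand in for whatever hardware bit-scan intrinsics the real backends use.

```cpp
#include <cstdint>

// Index of the lowest set bit, or ~0u when no bit is set.
inline std::uint32_t firstBitLowSketch(std::uint32_t v)
{
    if (v == 0)
        return ~0u;
    std::uint32_t index = 0;
    while ((v & 1u) == 0)
    {
        v >>= 1;
        ++index;
    }
    return index;
}

// Number of zero bits above the highest set bit; caller guarantees v != 0.
inline std::uint32_t countLeadingZerosSketch(std::uint32_t v)
{
    std::uint32_t n = 0;
    while ((v & 0x80000000u) == 0)
    {
        v <<= 1;
        ++n;
    }
    return n;
}

// Index (counted from the least significant bit) of the highest set bit,
// or ~0u when no bit is set. This is the "31 - clz" mapping.
inline std::uint32_t firstBitHighUnsignedSketch(std::uint32_t v)
{
    if (v == 0)
        return ~0u;
    return 31u - countLeadingZerosSketch(v);
}

// Signed variant: for negative inputs, search for the highest 0 bit instead,
// so that both 0 and -1 yield ~0u.
inline std::uint32_t firstBitHighSignedSketch(std::int32_t v)
{
    std::uint32_t bits = static_cast<std::uint32_t>(v);
    if (v < 0)
        bits = ~bits;
    return firstBitHighUnsignedSketch(bits);
}
```

For example, `firstBitLowSketch(8)` yields 3 and `firstBitHighSignedSketch(-2)` yields 0, since bit 0 is the highest bit of -2 that differs from the sign.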
2025-07-17 | fix typo (#7794) | Dennis Brakhane
Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com>
Co-authored-by: Yong He <yonghe@outlook.com>
2025-07-17 | Perf improvements to IR serialization (#7751) | Ellie Hermaszewska
* option to use riff as serialization backend
* option to use riff as serialization backend
* perf
* shuffle code
* perf improvements to deserialization
* formatting
* remove bit_cast
* correct IR verification
* neaten serialized format
* fix peek module info
* formatting
* remove temporary profiling code
* cleanup
* fix wasm build
* more explicit sizes
* deserialize via fossil on 32 bit wasm
* Make serialized modules Int size agnostic
* reorder stable names to allow range based check for 64 bit constants
* format
* review comments
* fix build
* fix
* c++17 compat slang-common.h
2025-07-16 | Fix CUDA issues with texture reads and surface writes (#7780) | Mukund Keshava
* Fix 1D texture reads in CUDA target Fixes #7570: 1D surface writes don't work The issue was that the Load function for read-only textures (hlsl.meta.slang lines 3629-3656) only supported 2D and 3D textures for CUDA targets, causing 1D texture reads to fall through to <invalid intrinsic>. This affected the srcTexture[tid.x] read operation in the reproduction case. Changes: - Updated static_assert to include SLANG_TEXTURE_1D support - Added tex1DArrayfetch_int<T> for 1D array texture reads - Added tex1Dfetch_int<T> for regular 1D texture reads 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Mukund Keshava <mkeshavaNV@users.noreply.github.com> * Add 1D texture read support for CUDA target - Add tex1Dfetch_int template specializations for float2, float4, uint, uint2, uint4 - Remove TODO comment about 1D PTX not being supported - Enable 1D texture test in texture-subscript-cuda.slang - Fix assembly code issues in original template specializations 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Mukund Keshava <mkeshavaNV@users.noreply.github.com> * Update slang-cuda-prelude.h * Fix texture3d ptx issue * undo 1D texture changes * Update hlsl.meta.slang * Update hlsl.meta.slang * Update hlsl.meta.slang * Update hlsl.meta.slang * Extend texture-subscript-cuda.slang test with uint and int format variants Add test cases for newly supported texture formats in CUDA: - 2D textures with uint, uint2, uint4 - 2D textures with int, int2, int4 - 3D textures with uint, uint2, uint4 - 3D textures with int, int2, int4 This ensures the texture subscript operations work correctly for all the format variants added in the CUDA texture fixes. Co-authored-by: Mukund Keshava <mkeshavaNV@users.noreply.github.com> * update expected file --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Mukund Keshava <mkeshavaNV@users.noreply.github.com>
2025-07-03 | Replace SLANG_ALIGN_OF with C++11 alignof (#7523) | Julius Ikkala
* Replace SLANG_ALIGN_OF with C++11 alignof
* Fix formatting (again)
2025-05-30 | Enable LSS hit object test (#7273) | Mukund Keshava
* Enable LSS hit object test
Enabled LSS SER tests now that PR #7211, which added SER support to OptiX, has been merged.
Ran: ./build/Debug/bin/slangc.exe tests/cuda/lss-test.slang -target ptx -Xnvrtc -I"C:/ProgramData/NVIDIA Corporation/OptiX SDK 9.0.0/include"
and confirmed that the HitObject intrinsic is called. eg: call (%f15, %f16, %f17, %f18, %f19, %f20, %f21, %f22), _optix_hitobject_get_linear_curve_vertex_data, ();
* format code
---------
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-05-27 | Add LSS intrinsics (#7200) | Mukund Keshava
* WiP: LSS intrinsics: initial commit
* format code
* Fix CI failures
* Address review comment
---------
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-05-26 | Implement shader execution reordering support for OptiX (#7211) | Harsh Aggarwal (NVIDIA)
* Implement shader execution reordering support for OptiX
Added OptiX backend support for Shader Execution Reordering (SER) features as outlined in issue #6647. This implementation:
1. Added CUDA target support for HitObject API
2. Implemented core SER functionality (TraceRay, MakeHit/Miss, Invoke)
3. Added OptiX-specific hit object handling functions
4. Added test case for OptiX SER functionality
* format code
---------
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-05-20 | [CUDA] Add template specializations for signed integer texture fetches (#7161) | Simon Kallweit
* add template specializations for signed integer texture fetches
* format code (#7162)
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
---------
Co-authored-by: slangbot <ellieh+slangbot@nvidia.com>
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-05-12 | cuda: Add more formats for texture read/write (#7012) | Mukund Keshava
* WiP: Add more formats for texture reads
* fix test
* format code
* add float2/float4 versions for 1D and 3D as well
* fixed review comment
* fix review comments
---------
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com>
2025-05-09 | Fix various intptr_t issues by defining its width in `getIntTypeInfo` (#6786) | Julius Ikkala
* Define a bit size for the intptr types
* Fix intptr_t sign
* Extend intptr test to check for previously broken operations
* Fix intptr vector test on CUDA
* Handle intptr size in getAnyValueSize
* Fix formatting
* Try with __ARM_ARCH_ISA_64
* On macs, int64_t != intptr_t Yikes
* Move define to prelude header
* Also check apple in host-prelude
* Fix define location
2025-05-07 | [CUDA] Fix surface write intrinsics (#7004) | Simon Kallweit
* fix cuda surface write intrinsics
* format code (#7023)
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
---------
Co-authored-by: Mukund Keshava <mkeshava@nvidia.com>
Co-authored-by: slangbot <ellieh+slangbot@nvidia.com>
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-05-05 | Add countbits 16-bit and 8-bit support (#6433) (#6897) | sricker-nvidia
Change adds 16-bit and 8-bit support for countbits intrinsic. In cases where a backend's native countbits lacks support, support is emulated. New tests are added for 16-bit and 8-bit support. Additional testing added for 32-bit and minor updates made to 64-bit countbits.
2025-04-30 | Add subscript operator support in cuda (#6830) | Mukund Keshava
* cuda: Add support for subscript operator
This CL adds support for the subscript operator for Read Only textures in cuda. Also adds a test for this.
Fixes #6781
* format code
* fix review comments
* format code
---------
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com>
2025-04-19 | Implement 64bit countbits intrinsic (#6433) (#6845) | sricker-nvidia
Change modifies the countbits intrinsic to use generics in order to support 64bit countbits on select platforms where this is supported. On platforms where this is not natively supported, we emulate by converting the 64-bit type into a uint2 (metal and spir-v). This should align with the implementation of other uint64_t intrinsics such as abs, min, max and clamp. Added new countbits64 test to verify changes. Updated documentation for 64bit-type-support.html
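The emulation mentioned in this commit, treating a 64-bit value as two 32-bit halves, can be sketched in portable C++ as follows. The names `countBits32Sketch` and `countBits64Sketch` are hypothetical, and the loop-based 32-bit count stands in for whichever native 32-bit intrinsic a backend provides.

```cpp
#include <cstdint>

// Population count of a 32-bit value; each iteration clears the lowest set bit.
inline std::uint32_t countBits32Sketch(std::uint32_t v)
{
    std::uint32_t count = 0;
    while (v != 0)
    {
        v &= v - 1;
        ++count;
    }
    return count;
}

// 64-bit population count emulated by splitting the value into low and high
// 32-bit halves (analogous to viewing it as a uint2) and summing the counts.
inline std::uint32_t countBits64Sketch(std::uint64_t v)
{
    const std::uint32_t lo = static_cast<std::uint32_t>(v & 0xFFFFFFFFull);
    const std::uint32_t hi = static_cast<std::uint32_t>(v >> 32);
    return countBits32Sketch(lo) + countBits32Sketch(hi);
}
```

Because the two halves share no bits, the sum of their counts equals the count of the full 64-bit value.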
2025-03-25 | Improve embed tool to search all include directories as determined by CMake (#6675) | Sai Praveen Bangaru
* Improve embed tool to search all include directories as determined by CMake
Hopefully this puts an end to prelude generation issues.
* Update CMakeLists.txt
* Update CMakeLists.txt
* Use Slang's string representation instead of malloc-ing chars
2025-02-05 | Fix matrix comparison operators on CPU (#6296) | Julius Ikkala
Co-authored-by: Yong He <yonghe@outlook.com>
2025-01-28 | Added const version for the operator[] in Matrix (#6186) | Norbert Nopper
Co-authored-by: Yong He <yonghe@outlook.com>
2025-01-24 | Add intptr_t abs/min/max operations for CPU & CUDA targets (#6160) | Julius Ikkala
* Add intptr_t abs/min/max operations for CPU & CUDA targets
* Define intptr_t and uintptr_t with CUDACC_RTC
---------
Co-authored-by: Yong He <yonghe@outlook.com>
2025-01-24 | Fix static build and install (#6158) | Dario Mylonopoulos
* Add SLANG_ENABLE_RELEASE_LTO cmake option
* Fix cmake static build
* Disable install SlangTargets to avoid static build failing
2024-11-25 | Fix issue with slang-embed & include ordering (#5680) | Sai Praveen Bangaru
* Fix issue with slang-embed & include ordering
* Update CMakeLists.txt
2024-11-07 | Fix CUDA prelude for makeMatrix (#5509) | Yong He
* Fix CUDA prelude for makeMatrix
* Add regression test.
2024-10-29 | format | Ellie Hermaszewska
* format
* Minor test fixes
* enable checking cpp format in ci
2024-10-29 | format cmake files (#5406) | Ellie Hermaszewska
* format cmake files
* format code
---------
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2024-10-28 | Replace the word stdlib or standard-library with core-module for source code (#5415) | Jay Kwak
This commit changes the word "stdlib" or "standard library" to "core module" in the source code.
2024-10-24 | declutter top level CMakeLists.txt (#5391) | Ellie Hermaszewska
* Split examples cmake desc
* declutter top level CMakeLists.txt
* fail if building tests without gfx
* Move llvm fetching to another cmake file
* Further split CMakeLists.txt
* Neaten llvm fetching
* Remove last premake remnant
* correct cross builds
* Neaten
* Neaten project organization in vs
2024-10-17 | Cleanup atomic intrinsics. (#5324) | Yong He
* Cleanup atomic intrinsics.
* Fix.
* Fix glsl.
* Remove hacky intrinsic expansion logic for glsl image atomics.
* Fix all tests.
* Fix.
* Add `InterlockedAddF16Emulated`.
* Fix glsl intrinsic.
* Fix.
2024-10-04 | Allow building using external dependencies (#5076) | Tobias Frisch
* Add options to prevent usage of own submodules Signed-off-by: Jacki <jacki@thejackimonster.de> * Allow using external unordered dense headers Signed-off-by: Jacki <jacki@thejackimonster.de> * Link system wide installed unordered dense Signed-off-by: Jacki <jacki@thejackimonster.de> * Allow external header usage for lz4 and spirv Signed-off-by: Jacki <jacki@thejackimonster.de> * Add more options to disable targets Signed-off-by: Jacki <jacki@thejackimonster.de> * Add option to provide explizit path for spirv headers and remove earlier options that break the build process Signed-off-by: Jacki <jacki@thejackimonster.de> * Rename options to use common prefix Signed-off-by: Jacki <jacki@thejackimonster.de> * Fix indentation for the cmake changes Signed-off-by: Jacki <jacki@thejackimonster.de> * Add advanced_option function for cmake * Normalize includes between system and submodule dependencies Fix any before-accidentally-working problems * Add option for enabling/disabling slang-rhi Signed-off-by: Jacki <jacki@thejackimonster.de> * Pass correct include path for cpu tests * Correct include path --------- Signed-off-by: Jacki <jacki@thejackimonster.de> Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com>
2024-07-24 | Add missing make_bool intrinsics in cuda prelude. (#4735) | Yong He
2024-07-18 | Allow CPP/CUDA/Metal to lower/legalize buffer-elements to support column_major/row_major. (#4653) | ArielG-NV
* Allow CPP/CUDA/Metal to legalize their buffer-elements.
Fixes: #4537
Changes:
1. Matrix inputs require legalization (pack/unpack) to ensure consistent row_major/column_major throughout entire shader, the following enabled legalization pass fixes this.
2. Added missing CUDA intrinsic so CUDA can run more tests.
3. Added a memory packing test since this still fails for cpp/cuda/metal (due to having no memory packing enforcement).
* change memory packing tests to run for targets without packing
---------
Co-authored-by: Yong He <yonghe@outlook.com>
2024-07-17 | Move the file public header files to `include` dir (#4636) | kaizhangNV
* Move the file public header files to `include` dir
Close the issue (#4635).
Move the following header files to an `include` dir located at the root dir of the slang repo:
slang-com-helper.h -> include/slang-com-helper.h
slang-com-ptr.h -> include/slang-com-ptr.h
slang-gfx.h -> include/slang-gfx.h
slang.h -> include/slang.h
Change cmake/SlangTarget.cmake to add include path to every target, and change the source file to use "#include <slang.h>" to include the public headers.
The source code update is done by a script like the following:
```
fileNames_slang=$(grep -r "\".*slang\.h\"" source/ -l)
for fileName in "${fileNames_slang[@]}"
do
    echo "$fileName"
    sed -i "s/\".*slang\.h\"/\"slang\.h\"/" $fileName
done
```
* Fix the test issues
* Fix cpu test issues by adding include search path
* Update cmake to not add include path for every target
Also change `#include <slang.h>` to `#include "slang.h"` to make the coding style consistent with other slang code.
* Change public include to private include for unit-test and slang-glslang
2024-07-10 | Add `float16` support to slang-torch (#4584) | Sai Praveen Bangaru
2024-07-05 | Correct type for double log10 (#4550) | Ellie Hermaszewska
Fixes https://github.com/shader-slang/slang/issues/4549
2024-07-01 | Error out when constructing tensor views from tensors with 0 stride. (#4516) | Sai Praveen Bangaru
This avoids a problem with broadcasted tensors. Our tensor-view platform is designed to allow unrestricted access to tensor memory, while broadcasted tensors were designed for 'read-only' use-cases. Trying to write into a broadcasted tensor needs re-allocation, which Slang is not designed to do. For now, we enforce contiguity on tensors with any 0 strides. In the future, we will introduce a ConstTensorView object to allow such tensors to be used as an input. This patch also propagates name-hint information through structs & arrays of tensors, to allow sensible names for the error messages (before this the error messages were temporary inst numbers, which is nearly impossible to debug)
2024-04-24 | Prevent pointer validation for zero-size arrays (#4021) | Sai Praveen Bangaru
2024-04-24 | Avoid DXC warnings for missing bitwise op parantheses (#4004) | Jay Kwak
Resolves #3980
Based on the operator precedence, Slang may omit the parentheses if they are not needed. DXC prints warnings for such cases and some applications may treat the warnings as errors. This commit emits parentheses to avoid the DXC warning even when they are not needed.
2024-04-03 | Implement 8.14-8.19 of OpenGL-GLSL specification | ArielG-NV
The following PR implements 8.14-8.19 of the [OpenGL-GLSL specification](https://registry.khronos.org/OpenGL/specs/gl/GLSLangSpec.4.60.pdf). Fully implements all functions and built-in type's, resolves https://github.com/shader-slang/slang/issues/3692 for GLSL & SPRI-V targets. _Notes:_ Testing Tools: * Fragment shaders cannot test computational results. Only OpCodes are checked for proper emitting. Implementation Notes: * SubpassInput requires an unknown image format. * SubpassInput is disjoint from TextureType: __SubpassImpl (.slang) & SubpassInputType (Compiler) to reduce code generation required. * SubpassInput required an additional input layout modifier, input_attachment_index, this was added as a new parameter binding attribute. Since the following qualifiers can overlap with different resources (`layout(input_attachment_index = 0, binding = 0, set = 0)`) input_attachment_index is checked for overlapping resource bindings separately from other qualifiers with `LayoutResourceKind::InputAttachmentIndex`. * `GLSLInputAttachmentIndexLayoutModifier` was added to enforce function parameters only accepting `in` decorated variables. * `in` decorated variables needed to have emitting modified to allow directly emitting the variable into function calls if used as a parameter, normally Slang has a "global variable" shadow as a "global parameter" through a copy. This does not work and is solved using `GlobalVariableShadowingGlobalParameterDecoration` to build a relationship of "global variable" to "global parameter", we then resolve this relationship and replace "global variable" uses later in compile. * `AtomicCounterMemory` memory-constraint requires `OpCapability AtomicStorage`, `AtomicStorage` is invalid for Vulkan targets. glslang outputs for `barrier`, `memoryBarrier`, and `groupMemoryBarrier` `AtomicCounterMemory` as a memory constraint. This compiles as valid SPIR-V for Vulkan since `OpCapability AtomicStorage` is not declared. 
This behavior of glslang is undefined as per [3.31.Capability of the SPIR-V specification](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_capability). We will omit `AtomicCounterMemory` from our barrier calls.
2024-03-08 | Improve cpp prelude. (#3725) | Yong He
2024-02-24 | Enable SLANG_MAKE_VECTOR calls when using SLANG_CUDA_ENABLE_HALF without SLANG_CUDA_RTC (#3624) | NBickford
2024-01-24 | Generate lookup tables from cmake (#3461) | Ellie Hermaszewska
* Generate lookup tables from cmake
* Correct add_custom_command generator dependencies
* set options for lookup table source
* include path
* use slang_add_target for capability generated targets
* vs project regenerate
* ci wobble
---------
Co-authored-by: Yong He <yonghe@outlook.com>
2023-12-08 | WIP: CMake (#3326) | Ellie Hermaszewska
* More robust input and output selection in generator tools * Add cmake build system * Get slang-test running with cmake * Bump lz4 and miniz dependencies * Make cmake build more declarative * Correct preprocessor logic in slang.h * Add cuda test to compute/simple * Remove empty cmake files * output placement for cmake, and commenting * Correct include paths in spirv-embed-generator * Format cmake with gersemi * Make cmake build clerer * Neaten header generation Also work around https://gitlab.kitware.com/cmake/cmake/-/issues/18399 by introducing correct_generated_properties to set the GENERATED flag in the correct scope * remove unused files * use 3.20 to set GENERATOR property properly * spelling * more flexible linker arg setting * replace slang-static with obj collection * Set rpath and linker path correctly * neaten generated file generation * tests working with cmake build * fix premake5 build * comment and neaten cmake * remove unnecessary dependency * Build aftermath example only when aftermath is enabled * Add slang-llvm and other dependencies * Put modules alongside binaries * Find slang-glslang correctly * Better option handling * comments * add llvm build test * Better option handling * cmake wobble * use UNICODE and _UNICODE * remove other workflows * use ccache * neaten * limit parallel for llvm build * use ninja for build * Windows and Darwin slang-llvm builds * cache key * verbose llvm build * cl on windows * sccache and cl.exe * use cl.exe * Correct package detection * less verbosity * Simplify miniz inclusion * fix build with sccache * Neaten llvm building * neaten * Neaten slang-llvm fetching * more surgical workarounds * Add ci action * Get version from git * better variable naming * add missing include * clean up after premake in cmake * more docs on cmake build * ci wobble * add imgui target * more selective source * do not download swiftshader * Some missing dependencies * only build llvm on dispatch * Disable /Zi in CI where sccache is 
present * simplify * set PIC for miniz * set policies before project * reengage workaround * more runs on ci * Add cmake presets * Add cpack * move iterator debug level to preset * Correct lib flag * simplify action * Neaten cmake init * Add todo * Add simple test wrapper * Add tests to workflow presets * rename packing preset * Correctly set definitions * docs * correct preset names * Make slang-test depend on test-server/test-process * neaten * use workflow in actions * install docs * Correct module install dir * debug dist workflow * Install headers * neaten header globbing * Neaten dependency handling * make lib and bin variables * Do not set compiler for vs builds, unnecessary * docs * allow setting explicit source for target * maintain archive subdir * cmake docs * install headers * place targets into folders * cmake docs * nest external projects in folder * remove name clash * Neater external packages * meta targets in folder structure * cleaner slang-glslang dll * Add missing static directive to slang-no-embedded-stdlib * more robust module copying * make slang-test the startup project * folder tweak * Make FETCH_BINARY the default on all platforms * Set DEBUG_DIR * add natvis files to source * skip spirv tests * remove test step from debug dist * Add build to .gitignore * redo warnings to be more like premake * Update imgui * clean more premake files * Disable PCH for glslang, gcc throws a warning * Add /MP for msvc builds * warning wobble * Add script to build llvm * Add slang-llvm and generators components * Build slang-llvm in ci * comments * fetch llvm with git * better abi approximation for cache * better sccache key * formatting * Correct logic around disabling problematic debug info for ccache * exclude gcc and clang from windows ci * Make dist workflows use system llvm * naming * restore normal dist builds * formatting * run tests in ci * Correct slang-llvm url setting * Rely on the system to find the test tool library * actions matrix wiggle * 
cope with OSX ancient bash * Correct compilers on windows * more ci debugging * Correct rpath handling on OSX * neaten * correct path to slang-llvm * Correct rpath separator on osx * Find slang-llvm correctly * smoke tests only on osx * ci wobble * Give MacOS module a dylib suffix * get swiftshader correctly * cope with bsd cp * remove debug output * full tests on osx * ci wobble * Add some vk tests to expected failures * simplify ci * ci wobble * exclude dx12 tests from github ci * remove cmake code for building llvm * warnings * warnings as errors for cl * spirv-tools in path * add aarch64 ci build * Add SLANG_GENERATORS_PATH option for prebuilt generators * neaten * Correct generator target name * remove yaml anchors because github actions does not support them * Demote CMake in docs Also add info on cross compiling * Restore premake CI * use minimal ci for cmake * Write miniz_export for premake build and .gitignore it * Mention build config tool options in docs * Remove redefined macro for miniz * regenerate vs project
2023-11-07 | CUDA: Fixes for NVRTC 12.x and warp mask ambiguity; adds CC 8.x warp reduction intrinsics. (#3314) | Neil Bickford
* CUDA: Fixes for NVRTC 12.x, warp mask ambiguity; add reduction partial specializations.
* Fixes running NVRTC on CUDA 12 without a specified profile (used in testing, e.g. `slang-test -api cuda -category wave`)
* Fixes mask ambiguity between getting the lane index from threadId.x and a full mask of threads.
* Adds partial specializations for compute capability 8.x warp reduction intrinsics.
* Fix formatting
2023-10-26 | Make the exponent return value from frexp int (#3284) | Ellie Hermaszewska
* Make the exponent return value from frexp int
Fixes https://github.com/shader-slang/slang/issues/3282
* Update slang-llvm.
---------
Co-authored-by: Yong He <yhe@nvidia.com>