summaryrefslogtreecommitdiff
path: root/prelude
AgeCommit message (Collapse)Author
2025-05-30Enable LSS hit object test (#7273)Mukund Keshava
* Enable LSS hit object test Enabled LSS SER tests now that PR #7211, which added SER support to OptiX, has been merged. Ran: ./build/Debug/bin/slangc.exe tests/cuda/lss-test.slang -target ptx -Xnvrtc -I"C:/ProgramData/NVIDIA Corporation/OptiX SDK 9.0.0/include" and confirmed that the HitObject intrinsic is called. eg: call (%f15, %f16, %f17, %f18, %f19, %f20, %f21, %f22), _optix_hitobject_get_linear_curve_vertex_data, (); * format code --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-05-27Add LSS intrinsics (#7200)Mukund Keshava
* WiP: LSS intrinsics: initial commit * format code * Fix CI failures * Address review comment --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-05-26Implement shader execution reordering support for OptiX (#7211)Harsh Aggarwal (NVIDIA)
* Implement shader execution reordering support for OptiX Added OptiX backend support for Shader Execution Reordering (SER) features as outlined in issue #6647. This implementation: 1. Added CUDA target support for HitObject API 2. Implemented core SER functionality (TraceRay, MakeHit/Miss, Invoke) 3. Added OptiX-specific hit object handling functions 4. Added test case for OptiX SER functionality * format code --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-05-20[CUDA] Add template specializations for signed integer texture fetches (#7161)Simon Kallweit
* add template specializations for signed integer texture fetches * format code (#7162) Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com> --------- Co-authored-by: slangbot <ellieh+slangbot@nvidia.com> Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-05-12cuda: Add more formats for texture read/write (#7012)Mukund Keshava
* WiP: Add more formats for texture reads * fix test * format code * add float2/float4 versions for 1D and 3D as well * fixed review comment * fix review comments --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com> Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com>
2025-05-09Fix various intptr_t issues by defining its width in `getIntTypeInfo` (#6786)Julius Ikkala
* Define a bit size for the intptr types * Fix intptr_t sign * Extend intptr test to check for previously broken operations * Fix intptr vector test on CUDA * Handle intptr size in getAnyValueSize * Fix formatting * Try with __ARM_ARCH_ISA_64 * On macs, int64_t != intptr_t Yikes * Move define to prelude header * Also check apple in host-prelude * Fix define location
2025-05-07[CUDA] Fix surface write intrinsics (#7004)Simon Kallweit
* fix cuda surface write intrinsics * format code (#7023) Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com> --------- Co-authored-by: Mukund Keshava <mkeshava@nvidia.com> Co-authored-by: slangbot <ellieh+slangbot@nvidia.com> Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2025-05-05Add countbits 16-bit and 8-bit support (#6433) (#6897)sricker-nvidia
Change adds 16-bit and 8-bit support for countbits intrinsic. In cases where a backend's native counbits lacks support, support is emulated. New tests are added for 16-bit and 8-bit support. Additional testing added for 32-bit and minor updates made to 64-bit countbits.
2025-04-30Add subscript operator support in cuda (#6830)Mukund Keshava
* cuda: Add support for subscript operator This CL adds support for the subscript operator for Read Only textures in cuda. Also adds a test for this. Fixes #6781 * format code * fix review comments * format code --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com> Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com>
2025-04-19Implement 64bit countbits intrinsic (#6433) (#6845)sricker-nvidia
Change modifies the countbits intrinsic to use generics in order to support 64bit countbits on select platforms where this is supported. On platforms where this is not natively supported, we emulate by converting the 64-bit type into a uint2 (metal and spir-v). This should align with the implementation of other uint64_t intrinsics such as abs, min, max and clamp. Added new countbits64 test to verify changes. Updated documentation for 64bit-type-support.html
2025-03-25Improve embed tool to search all include directories as determined by CMake ↵Sai Praveen Bangaru
(#6675) * Improve embed tool to search all include directories as determined by CMake Hopefully this puts an end to prelude generation issues. * Update CMakeLists.txt * Update CMakeLists.txt * Use Slang's string representation instead of malloc-ing chars
2025-02-05Fix matrix comparison operators on CPU (#6296)Julius Ikkala
Co-authored-by: Yong He <yonghe@outlook.com>
2025-01-28Added const version for the operator[] in Matrix (#6186)Norbert Nopper
Co-authored-by: Yong He <yonghe@outlook.com>
2025-01-24Add intptr_t abs/min/max operations for CPU & CUDA targets (#6160)Julius Ikkala
* Add intptr_t abs/min/max operations for CPU & CUDA targets * Define intptr_t and uintptr_t with CUDACC_RTC --------- Co-authored-by: Yong He <yonghe@outlook.com>
2025-01-24Fix static build and install (#6158)Dario Mylonopoulos
* Add SLANG_ENABLE_RELEASE_LTO cmake option * Fix cmake static build * Disable install SlangTargets to avoid static build failing
2024-11-25Fix issue with slang-embed & include ordering (#5680)Sai Praveen Bangaru
* Fix issue with slang-embed & include ordering * Update CMakeLists.txt
2024-11-07Fix CUDA prelude for makeMatrix (#5509)Yong He
* Fix CUDA prelude for makeMatrix * Add regression test.
2024-10-29formatEllie Hermaszewska
* format * Minor test fixes * enable checking cpp format in ci
2024-10-29format cmake files (#5406)Ellie Hermaszewska
* format cmake files * format code --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2024-10-28Replace the word stdlib or standard-library with core-module for source code ↵Jay Kwak
(#5415) This commit changes the word "stdlib" or "standard library" to "core module" in the source code.
2024-10-24declutter top level CMakeLists.txt (#5391)Ellie Hermaszewska
* Split examples cmake desc * declutter top level CMakeLists.txt * fail if building tests without gfx * Move llvm fetching to another cmake file * Further split CMakeLists.txt * Neaten llvm fetching * Remove last premake remnant * correct cross builds * Neaten * Neaten project organization in vs
2024-10-17Cleanup atomic intrinsics. (#5324)Yong He
* Cleanup atomic intrinsics. * Fix. * Fix glsl. * Remove hacky intrinsic expansion logic for glsl image atomics. * Fix all tests. * Fix. * Add `InterlockedAddF16Emulated`. * Fix glsl intrinsic. * Fix.
2024-10-04Allow building using external dependencies (#5076)Tobias Frisch
* Add options to prevent usage of own submodules Signed-off-by: Jacki <jacki@thejackimonster.de> * Allow using external unordered dense headers Signed-off-by: Jacki <jacki@thejackimonster.de> * Link system wide installed unordered dense Signed-off-by: Jacki <jacki@thejackimonster.de> * Allow external header usage for lz4 and spirv Signed-off-by: Jacki <jacki@thejackimonster.de> * Add more options to disable targets Signed-off-by: Jacki <jacki@thejackimonster.de> * Add option to provide explizit path for spirv headers and remove earlier options that break the build process Signed-off-by: Jacki <jacki@thejackimonster.de> * Rename options to use common prefix Signed-off-by: Jacki <jacki@thejackimonster.de> * Fix indentation for the cmake changes Signed-off-by: Jacki <jacki@thejackimonster.de> * Add advanced_option function for cmake * Normalize includes between system and submodule dependencies Fix any before-accidentally-working problems * Add option for enabling/disabling slang-rhi Signed-off-by: Jacki <jacki@thejackimonster.de> * Pass correct include path for cpu tests * Correct include path --------- Signed-off-by: Jacki <jacki@thejackimonster.de> Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com>
2024-07-24Add missing make_bool intrinsics in cuda prelude. (#4735)Yong He
2024-07-18Allow CPP/CUDA/Metal to lower/legalize buffer-elements to support ↵ArielG-NV
column_major/row_major. (#4653) * Allow CPP/CUDA/Metal to legalize their buffer-elements. Fixes: #4537 Changes: 1. Matrix inputs require legalization (pack/unpack) to ensure consistent row_major/column_major throughout entire shader, the following enabled legalization pass fixes this. 2. Added missing CUDA intrinsic so CUDA can run more tests. 3. Added a memory packing test since this still fails for cpp/cuda/metal (due to having no memory packing enforcement). * change memory packing tests to run for targets without packing --------- Co-authored-by: Yong He <yonghe@outlook.com>
2024-07-17Move the file public header files to `include` dir (#4636)kaizhangNV
* Move the file public header files to `include` dir Close the issue (#4635). Move the following headers files to a `include` dir located at root dir of slang repo: slang-com-helper.h -> include/slang-com-helper.h slang-com-ptr.h -> include/slang-com-ptr.h slang-gfx.h -> include/slang-gfx.h slang.h -> include/slang.h Change cmake/SlangTarget.cmake to add include path to every target, and change the source file to use "#include <slang.h>" to include the public headers. The source code update is by the script like follow: ``` fileNames_slang=$(grep -r "\".*slang\.h\"" source/ -l) for fileName in "${fileNames_slang[@]}" do echo "$fileName" sed -i "s/\".*slang\.h\"/\"slang\.h\"/" $fileName done ``` * Fix the test issues * Fix cpu test issues by adding include seach path * Update cmake to not add include path for every target Also change "#include <slang.h>" to "include "slang.h" " to make the coding style consistent with other slang code. * Change public include to private include for unit-test and slang-glslang
2024-07-10Add `float16` support to slang-torch (#4584)Sai Praveen Bangaru
2024-07-05Correct type for double log10 (#4550)Ellie Hermaszewska
Fixes https://github.com/shader-slang/slang/issues/4549
2024-07-01Error out when constructing tensor views from tensors with 0 stride. (#4516)Sai Praveen Bangaru
This avoids a problem with broadcasted tensors. Our tensor-view platform is designed to allow unrestricted access to tensor memory, while broadcasted tensors were designed for 'read-only' use-cases. Trying to write into a broadcasted tensor needs re-allocation, which Slang is not designed to do. For now, we enforce contiguity on tensors with any 0 strides. In the future, we will introduce a ConstTensorView object to allow such tensors to be used as an input. This patch also propagates name-hint information through structs & arrays of tensors, to allow sensible names for the error messages (before this the error messages were temporary inst numbers, which is nearly impossible to debug)
2024-04-24Prevent pointer validation for zero-size arrays (#4021)Sai Praveen Bangaru
2024-04-24Avoid DXC warnings for missing bitwise op parantheses (#4004)Jay Kwak
Resolves #3980 Based on the operator precedence, Slang may omits the parentheses if they are not needed. DXC prints warnings for such cases and some applications may treat the warnings as errors. This commit emits parentheses to avoid the DXC warning even when they are not needed.
2024-04-03Implement 8.14-8.19 of OpenGL-GLSL specificationArielG-NV
The following PR implements 8.14-8.19 of the [OpenGL-GLSL specification](https://registry.khronos.org/OpenGL/specs/gl/GLSLangSpec.4.60.pdf). Fully implements all functions and built-in type's, resolves https://github.com/shader-slang/slang/issues/3692 for GLSL & SPRI-V targets. _Notes:_ Testing Tools: * Fragment shaders cannot test computational results. Only OpCodes are checked for proper emitting. Implementation Notes: * SubpassInput requires an unknown image format. * SubpassInput is disjoint from TextureType: __SubpassImpl (.slang) & SubpassInputType (Compiler) to reduce code generation required. * SubpassInput required an additional input layout modifier, input_attachment_index, this was added as a new parameter binding attribute. Since the following qualifiers can overlap with different resources (`layout(input_attachment_index = 0, binding = 0, set = 0)`) input_attachment_index is checked for overlapping resource bindings separately from other qualifiers with `LayoutResourceKind::InputAttachmentIndex`. * `GLSLInputAttachmentIndexLayoutModifier` was added to enforce function parameters only accepting `in` decorated variables. * `in` decorated variables needed to have emitting modified to allow directly emitting the variable into function calls if used as a parameter, normally Slang has a "global variable" shadow as a "global parameter" through a copy. This does not work and is solved using `GlobalVariableShadowingGlobalParameterDecoration` to build a relationship of "global variable" to "global parameter", we then resolve this relationship and replace "global variable" uses later in compile. * `AtomicCounterMemory` memory-constraint requires `OpCapability AtomicStorage`, `AtomicStorage` is invalid for Vulkan targets. glslang outputs for `barrier`, `memoryBarrier`, and `groupMemoryBarrier` `AtomicCounterMemory` as a memory constraint. This compiles as valid SPIR-V for Vulkan since `OpCapability AtomicStorage` is not declared. This behavior of glslang is undefined as per [3.31.Capability of the SPIR-V specification](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_capability). We will omit `AtomicCounterMemory` from our barrier calls.
2024-03-08Improve cpp prelude. (#3725)Yong He
2024-02-24Enable SLANG_MAKE_VECTOR calls when using SLANG_CUDA_ENABLE_HALF without ↵NBickford
SLANG_CUDA_RTC (#3624)
2024-01-24Generate lookup tables from cmake (#3461)Ellie Hermaszewska
* Generate lookup tables from cmake * Correct add_custom_command generator dependencies * set options for lookup table source * include path * use slang_add_target for capability generated targets * vs project regenerate * ci wobble --------- Co-authored-by: Yong He <yonghe@outlook.com>
2023-12-08WIP: CMake (#3326)Ellie Hermaszewska
* More robust input and output selection in generator tools * Add cmake build system * Get slang-test running with cmake * Bump lz4 and miniz dependencies * Make cmake build more declarative * Correct preprocessor logic in slang.h * Add cuda test to compute/simple * Remove empty cmake files * output placement for cmake, and commenting * Correct include paths in spirv-embed-generator * Format cmake with gersemi * Make cmake build clerer * Neaten header generation Also work around https://gitlab.kitware.com/cmake/cmake/-/issues/18399 by introducing correct_generated_properties to set the GENERATED flag in the correct scope * remove unused files * use 3.20 to set GENERATOR property properly * spelling * more flexible linker arg setting * replace slang-static with obj collection * Set rpath and linker path correctly * neaten generated file generation * tests working with cmake build * fix premake5 build * comment and neaten cmake * remove unnecessary dependency * Build aftermath example only when aftermath is enabled * Add slang-llvm and other dependencies * Put modules alongside binaries * Find slang-glslang correctly * Better option handling * comments * add llvm build test * Better option handling * cmake wobble * use UNICODE and _UNICODE * remove other workflows * use ccache * neaten * limit parallel for llvm build * use ninja for build * Windows and Darwin slang-llvm builds * cache key * verbose llvm build * cl on windows * sccache and cl.exe * use cl.exe * Correct package detection * less verbosity * Simplify miniz inclusion * fix build with sccache * Neaten llvm building * neaten * Neaten slang-llvm fetching * more surgical workarounds * Add ci action * Get version from git * better variable naming * add missing include * clean up after premake in cmake * more docs on cmake build * ci wobble * add imgui target * more selective source * do not download swiftshader * Some missing dependencies * only build llvm on dispatch * Disable /Zi in CI where sccache is present * simplify * set PIC for miniz * set policies before project * reengage workaround * more runs on ci * Add cmake presets * Add cpack * move iterator debug level to preset * Correct lib flag * simplify action * Neaten cmake init * Add todo * Add simple test wrapper * Add tests to workflow presets * rename packing preset * Correctly set definitions * docs * correct preset names * Make slang-test depend on test-server/test-process * neaten * use workflow in actions * install docs * Correct module install dir * debug dist workflow * Install headers * neaten header globbing * Neaten dependency handling * make lib and bin variables * Do not set compiler for vs builds, unnecessary * docs * allow setting explicit source for target * maintain archive subdir * cmake docs * install headers * place targets into folders * cmake docs * nest external projects in folder * remove name clash * Neater external packages * meta targets in folder structure * cleaner slang-glslang dll * Add missing static directive to slang-no-embedded-stdlib * more robust module copying * make slang-test the startup project * folder tweak * Make FETCH_BINARY the default on all platforms * Set DEBUG_DIR * add natvis files to source * skip spirv tests * remove test step from debug dist * Add build to .gitignore * redo warnings to be more like premake * Update imgui * clean more premake files * Disable PCH for glslang, gcc throws a warning * Add /MP for msvc builds * warning wobble * Add script to build llvm * Add slang-llvm and generators components * Build slang-llvm in ci * comments * fetch llvm with git * better abi approximation for cache * better sccache key * formatting * Correct logic around disabling problematic debug info for ccache * exclude gcc and clang from windows ci * Make dist workflows use system llvm * naming * restore normal dist builds * formatting * run tests in ci * Correct slang-llvm url setting * Rely on the system to find the test tool library * actions matrix wiggle * cope with OSX ancient bash * Correct compilers on windows * more ci debugging * Correct rpath handling on OSX * neaten * correct path to slang-llvm * Correct rpath separator on osx * Find slang-llvm correctly * smoke tests only on osx * ci wobble * Give MacOS module a dylib suffix * get swiftshader correctly * cope with bsd cp * remove debug output * full tests on osx * ci wobble * Add some vk tests to expected failures * simplify ci * ci wobble * exclude dx12 tests from github ci * remove cmake code for building llvm * warnings * warnings as errors for cl * spirv-tools in path * add aarch64 ci build * Add SLANG_GENERATORS_PATH option for prebuilt generators * neaten * Correct generator target name * remove yaml anchors because github actions does not support them * Demote CMake in docs Also add info on cross compiling * Restore premake CI * use minimal ci for cmake * Write miniz_export for premake build and .gitignore it * Mention build config tool options in docs * Remove redefined macro for miniz * regenerate vs project
2023-11-07CUDA: Fixes for NVRTC 12.x and warp mask ambiguity; adds CC 8.x warp ↵Neil Bickford
reduction intrinsics. (#3314) * CUDA: Fixes for NVRTC 12.x, warp mask ambiguity; add reduction partial specializations. * Fixes running NVRTC on CUDA 12 without a specified profile (used in testing, e.g. `slang-test -api cuda -category wave`) * Fixes mask ambiguity between getting the lane index from threadId.x and a full mask of threads. * Adds partial specializations for compute capability 8.x warp reduction intrinsics. * Fix formatting
2023-10-26Make the exponent return value from frexp int (#3284)Ellie Hermaszewska
* Make the exponent return value from frexp int Fixes https://github.com/shader-slang/slang/issues/3282 * Update slang-llvm. --------- Co-authored-by: Yong He <yhe@nvidia.com>
2023-09-23More `slangpy` features + polishing (#3233)Sai Praveen Bangaru
* Update user-guide with new slangpy features * More polishing of new slangpy docs * Update a1-02-slangpy.md * Only require contiguity for vector element types * Added `loadOnce/storeOnce` and subscript operations * Added docs, `DiffTensorView.dims()` & `DiffTensorView.stride(uint)` * Add constructors, remove storeOnce/loadOnce test * Adjusted intrinsic definitions
2023-09-08Add check for contiguous tensors (#3199)Sai Praveen Bangaru
Otherwise, this can lead to undetected scenario where the strides are incorrect for non-scalar types (`float2`, `float3`, etc..) Users must call `tensor = tensor.contiguous()` on the inputs to avoid this error.
2023-09-08Remove unsupported torch types + add bool type. (#3197)Sai Praveen Bangaru
Co-authored-by: Yong He <yonghe@outlook.com>
2023-08-24Misc. SPIRV Fixes, Part 2. (#3147)Yong He
* Misc. SPIRV Fixes, Part 2. * Fix up. * Fix. * Add system smenatic values. * 16 bit int and floats, matrix/vector reshape, bool ops. * Fix. * Fix. * Allow push constant entry point params. * entrypoint params. * swizzleSet and swizzledStore. * packoffset. * string hash. * Fix. * Matrix arithmetics. --------- Co-authored-by: Yong He <yhe@nvidia.com>
2023-08-23Lower all ByteAddressBuffer uses for SPIRV. (#3143)Yong He
Co-authored-by: Yong He <yhe@nvidia.com>
2023-08-02Only define atomics for `float2` and `float4` when CUDA arch<900 (#3041)Sai Praveen Bangaru
From Ada onwards, these definitions are already available in CUDA's stdlib and will cause a compiler error.
2023-07-14Avoid implicit casts or device transfers. (#2992)Sai Praveen Bangaru
2023-07-12Fix native string emit for CUDA/Cpp backend. (#2980)Yong He
Co-authored-by: Yong He <yhe@nvidia.com>
2023-06-27Support for infinite literal of from 34.2432#INF (#2944)jsmall-nvidia
2023-05-09Various fixes for autodiff and slangpy. (#2876)Yong He
* Various fixes for autodiff and slangpy. * Fix cuda code gen for `select`. * Fix getBuildTagString(). * Fix. --------- Co-authored-by: Yong He <yhe@nvidia.com>
2023-04-13Set sharedMem argument to 0 when launching cuda kernel. (#2799)Yong He
Co-authored-by: Yong He <yhe@nvidia.com>
2023-04-11Small fixes to TorchTensor. (#2790)Yong He
Co-authored-by: Yong He <yhe@nvidia.com>