summaryrefslogtreecommitdiff
path: root/source/slang
AgeCommit message (Collapse)Author
2024-05-14Support combined textures for Metal target (#4169)Jay Kwak
2024-05-14Remove use of `G0` and `__target_intrinsic` in stdlib. (#4170)Yong He
* Remove use of `G0` and `__target_intrinsic` in stdlib. * Fix. * Fix calling intrinsic in global scope.
2024-05-14Implement texture functions for Metal target (#4158)Jay Kwak
* Impl texture APIs for Metal target This commit is to implement texture functions for Metal target. The following functions are implemented and tested. - GetDimensions() - CalculateLevelOfDetail() - CalculateLevelOfDetailUnclamped() - Sample() - SampleBias() - SampleLevel() - SampleCmp() - SampleCmpLevelZero() - Gather() - SampleGrad() - Load() Metal has limited support for the texture functions compared to HLSL. - LOD is not supported for 1D texture, - Depth textures are limited to 2D, 2DArray, Cube and CubeArray textures. - "Offset" variants are limited to 2D, 2DArray, 2D-Depth, 2DArray-Depth and 3D textures. The functions that cannot be implemented for Metal should properly be handled by the capability system later. * Fix the failing test, multi-file.hlsl I am not sure why this change is needed. * Fix compile errors on macOS 2nd try * Remove a typo character to fix the compile error * Trivial clean up * Remove `as_type` where it was intended as static_cast * Use a simpler sytax for __intrinsic_asm * Trivial clean up * Remove TEST_AFTER_FIXING_CAPABILITY_PROBLEM after fixing normalize * Fix the failing test properly * Fix an incorrect setup of Depth-cube texture --------- Co-authored-by: Yong He <yonghe@outlook.com>
2024-05-14Fix CFG reversal logic for loops (#4162)Sai Praveen Bangaru
Handles a corner case where the first block after the condition on the true-side is another condition. This would currently result in an invalid reverse graph, where the reverse version of the true-block is the merge point for two different branching insts (the reverse version of the loop as well as the second condition). This patch simply adds a blank block when constructing the reverse-loop (similar to critical edge breaking) so that each branch inst in the reversed loop has a unique merge block.
2024-05-14Propagate warning settings on `Linkage` to IR passes. (#4156)Yong He
2024-05-13Add LoadAligned and StoreAligned methods to ByteAddressBuffers (#4066)Sriram Murali
Fixes #4062 This change enables wide load/stores for byte-address-buffer backed resources, when the data is accessed at an offset that is aligned. **Goals** - Improve performance by issuing wider instructions instead of sequence of scalar instructions, for load and stores of byte-address buffers. - Reduce code-size and readability of the generated shaders. - Help naive users as well as ninja programmers, generate optimal code. **Non Goals** - Help with Structured buffers, or other resources. - Target compilation time improvements. **Key changes** Adds 2 new overloads for Load and Store operations on ByteAddress Buffers. 1. Load / Store with an extra alignment parameter ``` resource.Load<T>(offset, alignment); resource.Store<T>(offset, value, alignment); ``` 2. LoadAligned / StoreAligned with no extra parameter, with the same signature as orignial Load / Store. ``` resource.LoadAligned<T>(offset); resource.StoreAligned<T>(offset, value); ``` - This overload will implicitly identify the alignment value, from the base type T of the elementary unit of the resource. **Supported resources** 1. Vectors This can be upto 4 elements, i.e. float -- float4. 2. Arrays This does not have a limit on number of elements, but on a conservative estimate, we can limit to few hundreds. 3. Structures This is used to group a resource of a single type. ``` struct { float4 x; } ``` **Code updates** - Modified byte-address-ir legalize to handle struct, array and vector kinds of load or store access - Added custom hlsl stdlib functions to implement all the overloads for Load, Store etc. - Added C-like emitter, SPIR-V emitter for handling ByteAddressBuffers. - Added a new core stdlib function intrinsic to wrap around alignOf<T>(). - Added a new peephole optimization entry to identify the equivalent IntLiteral value from the alignOf<T>() inst. - Added tests to check explicit, and implicit aligned Load and Store operations.
2024-05-10Fix race-condition and visual artifacts issues (#4152)kaizhangNV
* Fix race-condition and visual artifacts issues In PerformanceProfiler::getProfiler() we return a static object for the profiler implementation, this is not thread-safe, so change it to thead_local. There is still some visual artifacts when using slang as the shading language. We don't know the root cause yet, but found out it's related to our loop inversion algorithm. So stage this feature for now, and turn it into an internal option and default off. We will re-enable it after more investigation on this optimization. File an new issue 4151 to track it. * Add '-loop-inversion' to the few tests
2024-05-10More Metal Intrinsics. (#4143)Yong He
2024-05-08Metal: propagate and specialize address space. (#4137)Yong He
2024-05-08Support `getAddress` of a single-element vector swizzle. (#4138)Yong He
Fixes #4112.
2024-05-08Support `[__ref]` attribute to make `this` pass by reference. (#4139)Yong He
Fixes #4110.
2024-05-08`slangc` tool experience improvements. (#4140)Yong He
* `slangc` tool experience improvements. Fixes #4123. Fixes #4127. * Update doc.
2024-05-08Fix legalization of `kIROp_GetLegalizedSPIRVGlobalParamAddr`. (#4141)Yong He
2024-05-08Fix crash in obfuscation (#4134)kaizhangNV
Slang crashes during obfuscation because of referencing the nullptr pointer. Add the checking. In addition, above situation happens when user provide an empty slang shader with '-obfuscate' option, we shouldn't do anything in that case. So add an early return in obfuscateModuleLocs if no IR code is actually generated.
2024-05-08Fix NonUniformResourceIndex legalization for SPIRV. (#4133)Yong He
* Fix NonUniformResourceIndex legalization for SPIRV. * Update gh-4131.slang
2024-05-08capture/replay: interface implementation 1 (#4122)kaizhangNV
* capture/replay: interface implementation 1 - Add global session, filesystem, and session capture interface classes: GlobalSessionCapture for IGlobalSession FileSystemCapture for ISlangFileSystemExt SessionCapture for ISession - Add environment variables to enable it The 2 variables are SLANG_CAPTURE_LAYER and SLANG_CAPTURE_LOG_LEVEL SLANG_CAPTURE_LAYER: In slang_createGlobalSession(), after the compiling/loading stdlib, we will check the capture environment variable, if it's set to 1, we will create a GlobalSessionCapture object and return to user code. SLANG_CAPTURE_LOG_LEVEL: This is to set the log level, user can choose the loglevel to debug. (We can remove this when the feature is fully implemented). - Update premake file and cmake file to add the capture/replay source folder * Fix Windows build error Fix windows build error by adding the "SLANG_MCALL" keyword. Change to use Slang::ComPtr for those captured object pointers to simplify the resource management. Use __func__ macro to print the function name in the log.
2024-05-07Make sure pointer local vars have `AliasedPointer` decoration. (#4132)Yong He
2024-05-07Support Metal math functions (#4118)Jay Kwak
* Support Metal math functions Closes #4024 Note that Metal document says Metal doesn't support "double" type; "Metal does not support the double, long long, unsigned long long, and long double data types." According to Metal document, math functions are not defined for integer types. That leaves only two types to test: half and float. As a code clean up, __floatCast is replaced with __realCast. But I had to add a new signature that can convert from integer to float. Some of GLSL functions are moved to hlsl.meta.slang. For those functions, there isn't builtin functions for HLSL but there are for GLSL and Metal. "nextafter(T,T)" is currently not working because it requires Metal version 3.1 and we invoke metal compiler with a profile version lower than 3.1. * Changes based on review comments.
2024-05-06Support groupshared variables for Metal. (#4116)Yong He
2024-05-06Delete `wrap-global-context` pass. (#4114)Yong He
* Delete `wrap-global-context` pass. The pass was added for the metal backend without realizing that the existing `explicit-global-context` does 99% of the job. Instead of duplicating the logic in a different pass for metal, we extend same explicit-global-context pass to work for metal. * Fix build.
2024-05-03Fix mistake in WaveMatch intrinsic. (#4105)Yong He
2024-05-03Add host shared library target. (#4098)Yong He
* Add host shared library target. * Attempt fix. * Fix warnings. * try fix. * Fix test. * Fix.
2024-05-03Don't bottleneck Wave intrinsics through `WaveMask*` for spirv. (#4099)Yong He
* Don't bottleneck Wave intrinsics through `WaveMask*` for spirv. * Fix.
2024-05-03Fix `Ptr::__subscript` to accept any integer index. (#4100)Yong He
* Fix `Ptr::__subscript` to accept any integer index. * Fix `Ptr::__subscript` to allow 64bit indices.
2024-05-03Utilize vector operations over scalar if possible (#4092)Jay Kwak
* Utilize vector operations over scalar if possible Closes #4085 * Fix for the failing CI [ForceUnroll] is removed because it changed the emitted SPIR-V code a little differently for half-conversion.slang. SPIR-V code style is changed to a more preferred style, from "OpXX $$T result $x" to "result:$$T = OpXX $x"
2024-05-02Support generic constraints that are dependent on another generic param. (#4091)Yong He
2024-05-02Fix unzipping logic for inout non-diff parameters and adjust tests (#4090)Sai Praveen Bangaru
* Fix unzipping logic for inout non-diff parameters and adjust tests + Removed `-g0` from `struct-this-parameter.slang` test. Works correctly with the new unzipping logic. + Removed `-g0` from `was/warped-sampling-1d.slang` test. Works correctly with DX12 & CS_5_1. CS_5_0 appears to run into an FXC compiler bug with detecting infinite loops where there don't appear to be any. * Update slang-ir-autodiff-unzip.h * Update warped-sampling-1d.slang
2024-05-02Handle case where types can be used as their own `Differential` type. (#4057)Sai Praveen Bangaru
* Avoid synthesis for when types can be used as their own differenial + Add test * Add missing files.. * Fix issue with method synthesis for self-differential types + Add a generic test * Fix * Fix issue with out-of-date type resolution cache. Witness tables created during the conformance checking phase not being taken into account during the decl type resolution phase because the epoch is not updated after conformance checking. This leads to certain complex associated-type lookup chains (such as the one in tests/compute/assoctype-nested-lookup) not resolving properly and causing errors. * Delete self-differential-type-synthesis-extension.slang * Quick fix to repopulate stdlib cache for deferred stdlib loading * Update slang-check-decl.cpp
2024-05-02Allow multiple _AttributeTargets for attribute declaration (#4087)kaizhangNV
The syntax like: [__AttributeUsage(_AttributeTargets.Var)] [__AttributeUsage(_AttributeTargets.Param)] struct DefaultValueAttribute { int iParam; }; is allowed. For user-defined attribute, we can specify more attribute targets on the attribute declaration. So one attribute can be used in more than one situations.
2024-05-02Fix fmod behavior targetting GLSL and SPIR-V (#4080)Jay Kwak
* Fix fmod behavior targetting GLSL and SPIR-V The default implementation of fmod was doing "Modulo" operation when "fmod" in HLSL should do "remainder" operation. * Fix a mistake in `fmod` GLSL target When using __intrinsic_asm, the "if" logic wasn't emitted. "__intrinsic_asm" had to be called from a new function and `fmod` had to call it. Alternatively, I am using `operator?()` to workaround. A similar modification is made to `roundEven()` hoping for a better performance.
2024-05-02Implement SPIR-V target for GLSL functions (#4083)Jay Kwak
Fixes #4051 This commit implements SPIR-V target for GLSL functions. It also fixes a few problesm of GLSL targetting implemention too.
2024-05-01Delete out-of-date assert. (#4079)Yong He
2024-05-01Fix/replace target intrinsic to target switch part 2 (#4058)Jay Kwak
* Fix texture capabilities * Remove more __target_intrinsic and fix capability for texture Fixes #3906 With this commit, following functions will use __target_switch: - abs - asdouble - clamp - min - max - EvaluateAttributeSnapped - frexp - log10 - modf - __glsl_textureXXX For an unknown reason, I couldn't get "min(int,int)" working with __target_switch. It causes a test failure in Falcore unit test. --------- Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com>
2024-05-01Copy default target's optionSet to code-gen target's optionSet (#4073)kaizhangNV
In current implementation, the some options will be to added to the target that is only specified by command line "-target". But if user specifies the target by just using slang API, e.g. 'spAddCodeGenTarget', those options will be missed. To fix the problem, we copy the default target's options to the code-gen target's option set. The default target will only be useful when there is no target specified in the command line.
2024-05-01Fix compile failures when using debug symbol. (#4069)Yong He
* Fix compile failures when using debug symbol. * Various fixes. * Fix intrinsic. * Fix test.
2024-05-01SPIRV: Fix performance issue when handling large arrays. (#4064)Yong He
* SPIRV: Fix performance issue when handling large arrays. * Add test for packing. * Fix clang.
2024-05-01SPIRV: Fix storage class for unwrapped pointers (#4068)cheneym2
In SPIRV legalization, a struct wrapper is created around push constants but is not itself legalized. Putting the struct type into the work list causes the storage access of the push constant pointer to be PhysicalStorageBuffer as expected, instead of Function scope that was produced without the added struct legalization. Adds a SPIRV test that exercises the fix. Fixes #3946 Co-authored-by: Yong He <yonghe@outlook.com>
2024-05-01Add ParamDecl as the attribute target (#4067)kaizhangNV
Currently we only allow variable, struct, and function as the target for the user-defined attribute, this change adds the function parameter to the target as well.
2024-05-01Adds functionality to dump IR to stdout (#4065)Sriram Murali
Adds a member dump() to IRInst that can writes the immediate value or IR inst value to stdout to help with debugging
2024-04-30Avoid classifying methods with `[numthreads]` as entry points for ↵Sai Praveen Bangaru
CUDA-related targets (#4063)
2024-04-30Added diagnostics & built-in type lowering for `[CUDAKernel]` functions (#4042)Sai Praveen Bangaru
* Added diagnostics & built-in type lowering for `[CUDAKernel]` functions This PR adds - Diagnostics for non-void return from a cuda kernel entry point - Diagnostics for using differentiable types in a differentiable cuda kernel entry point - Logic for converting built-in types (float3, float3x3, etc..) to portable struct types and unpacks the parameter back into a built-in type on the CUDA side. This is because built-in types have different implementations in CUDA & CPP targets, which causes signature mis-match when linking. * Fix error codes * Add ability to lower structs and arrays that contain built-in types. + Added tests + Fix issue where the host-side was not marshalling data to lowered types. * Update slang-ir-pytorch-cpp-binding.cpp --------- Co-authored-by: Yong He <yonghe@outlook.com>
2024-04-30Generate vectorized version of byteaddress load/store methods (#4036)Sriram Murali
Fixes #3533 - Add logic to perform aligned memory operations for loading from and storing to composite resources, like vectors within the ByteAddress legalize pass. - Checks Added a new test for byte address with/without alignment. --------- Co-authored-by: Yong He <yonghe@outlook.com>
2024-04-30Change stdlib to not depend on short-circuit (#4056)kaizhangNV
Do not use "&&" to implement the intrinsic kIROp_And, instead define a 'and' function in stdlib. So it will be up to us to determine whether we want to use 'short-circuit' behavior in stdlib.
2024-04-30Add option -disable-short-circuit (#4054)kaizhangNV
Add option -disable-short-circuit to disable short circuit for logic operators && and ||. Also, disable the short circuit by default in the stdlib.
2024-04-30Metal: Vertex/Fragment builtin and layouts. (#4044)Yong He
* Metal: Vertex/Fragment builtin and layouts. * Fix. * Fix test. * Emit user semantic on vertex/fragment attributes.
2024-04-29Replace __target_intrinsics and __specialize_for_target, part 1 (#4050)Jay Kwak
* Replace __target_intrinsics and __specialize_for_target Partially resolves #3906 Most but not all __target_intrinsics are replaced with __target_switch. All __specialize_for_target are replaced with __target_switch. This change is mostly processed by a temporary c++ program mechanically. Because the change is already too big, the remaining __target_intrinsics will be replaced later in another commit. * Fix indentations * Add diff.meta.slang * Revert the change in __sizeOf<>(). "$G0" doesn't seem to work. It needs to be addressed later. * Revert more functions that use `$G0` keyword
2024-04-29Do not mangle the name of identifiers when __extern_cpp is added (#4052)kaizhangNV
Do not mange the name of identifiers decorated by "__extern_cpp". For a slang files that are included by the library module and entry point module, slang could generated two different mangled names for the same functions, because the function with a struct parameter will make the mangled function name contains the file name. Therefore, we allow using "__extern_cpp" on such struct, such that no file name is associated in the mangled name.
2024-04-26Fix invoke resolution when dealing with overloded type expressions (#4043)Sai Praveen Bangaru
2024-04-26WIP: Force Inline If RefType (#4005)ArielG-NV
* Force Inline if reftype Fixes #3997. If we are using a refType, we now ForceInline. remarks: 1. Modifications were made in slang-ir-glsl-legalize to change how we translate GlobalParam proxy's into GlobalParam. a. We now handle the senario where a globalParam is used in multiple disjoint blocks (like 2 different functions). * try to figure out why CI fails but local works try to inline DispatchMesh, works locally, may fail on CI(?) * try another fix * add task tests + don't allow semi-early task-shader inline Task shader uses DispatchMesh which is a very big 'hack' where we check for the function name and modify the callees in very large ways. This function does inline, but it cannot inline early due to future mangling that this operation requires todo. This is reflected with the `[noRefInline]` modifier. It is a modifier so users may stop mandatory inlines with `__ref` parameter.
2024-04-25Fix unpackUnorm4x8 and unpackSnorm4x8 (#4033)Jay Kwak
Fixes #4031 Each component of unpackU/Snorm4x8 had to be masked for 8bits.