path: root/tests/compute
Commit message | Author | Age
...
* Fix WGSL parameter block binding. (#5500) | Yong He | 2024-11-06
  * Fix WGSL parameter block binding. * Re-enable tests. * Update failure list. * Fix entrypoint parameters. * Update tests. * Enable stat-var test.
* [WGSL] Enable arbitrary arrays in uniform buffers. (#5497) | Yong He | 2024-11-06
  * [WGSL] Enable arbitrary arrays in uniform buffers. * format code * Undo irrelevant change and fixups. * Update expected failure list. * Fix. * Rename. --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
* Make various parameters and return types require specialization when targeting WGSL (#5483) | Anders Leino | 2024-11-06
  Structured buffer types translate to array types in the WGSL emitter, and WGSL doesn't allow passing runtime-sized arrays to functions. The same restriction applies to pointers to texture handles. In addition, structured buffers (runtime-sized arrays) cannot be returned from functions in WGSL. This closes issues #5228, #5278 and #5288 by generating specialized functions in these cases, which works around these constraints.
* Legalize the Entry-point for WGSL (#5498) | Jay Kwak | 2024-11-05
  * Legalize the Entry-point for WGSL. The return type of the entry point needs to be legalized when targeting WGSL. This commit flattens the nested structs of the entry point's return type and input parameters. Most of the code is copied from the legalization code for Metal. The following functions are identical, or nearly identical, to the Metal implementation: - flattenInputParameters() : 136 lines - reportUnsupportedSystemAttribute() : 7 lines - ensureResultStructHasUserSemantic() : 46 lines - struct MapStructToFlatStruct : 176 lines - flattenNestedStructs() : 95 lines - maybeFlattenNestedStructs() : 42 lines - _replaceAllReturnInst() : 19 lines - _returnNonOverlappingAttributeIndex() : 16 lines - _replaceAttributeOfLayout() : 23 lines - tryConvertValue() : 41 lines - legalizeSystemValueParameters() : 11 lines. They should be refactored later to reduce the duplication. The test case `tests/compute/assoctype-lookup.slang` had a bug: the compute shader tried to use varying input/output with user-defined semantics. This commit removes the user-defined semantics, because compute shaders cannot use them. --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
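The flattening that #5498 applies to entry-point return types and input parameters can be modeled in a few lines. The sketch below is not Slang's `flattenNestedStructs()` or `MapStructToFlatStruct`; it assumes a toy representation in which a struct type is a Python dict mapping field names to either a scalar type name or a nested struct, and it names flattened fields by joining the field path with `_` (a hypothetical naming scheme).

```python
# Toy model (assumption): a struct type is a dict mapping field name -> type,
# where a type is either a scalar type name (str) or a nested struct (dict).

def flatten_struct(struct_type, prefix=""):
    """Return (flat_field_name, scalar_type) pairs in declaration order."""
    flat = []
    for name, field_type in struct_type.items():
        path = f"{prefix}{name}"
        if isinstance(field_type, dict):
            # Nested struct: recurse, prefixing its leaves with the outer field name.
            flat.extend(flatten_struct(field_type, prefix=path + "_"))
        else:
            flat.append((path, field_type))
    return flat


def rebuild(struct_type, flat_values, prefix=""):
    """Inverse mapping: reassemble a nested value from a dict of flat field values."""
    value = {}
    for name, field_type in struct_type.items():
        path = f"{prefix}{name}"
        if isinstance(field_type, dict):
            value[name] = rebuild(field_type, flat_values, prefix=path + "_")
        else:
            value[name] = flat_values[path]
    return value
```

A real pass must also guarantee that the joined names do not collide and carry each field's semantic over to its flat counterpart; the sketch ignores both.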
* Fix system semantics of SV_GroupIndex (#5496) | kaizhangNV | 2024-11-05
  Closes issue #4940.
* Revert uint<->int implicit cast cost to prefer promotion to unsigned. (#5480) | Yong He | 2024-11-02
* Enable a few more WGPU tests (#5476) | Anders Leino | 2024-11-01
  * Enable tests/compute/func-cbuffer-param * Enable tests/language-feature/tuple/tuple-parameter.slang
* Update Slang-RHI again to get more WGPU fixes (#5475) | Anders Leino | 2024-11-01
  This fixes a teardown crash and a buffer usage mismatch during bind group creation. These Slang-RHI fixes allow several WGPU tests to be enabled: - tests/compute/column-major.slang - tests/compute/constant-buffer-memory-packing.slang - tests/compute/matrix-layout.hlsl - tests/compute/non-square-column-major.slang - tests/compute/row-major.slang - tests/hlsl/packoffset.slang. This helps to address issue #5222.
* Cleanup atomic intrinsics. (#5324) | Yong He | 2024-10-17
  * Cleanup atomic intrinsics. * Fix. * Fix glsl. * Remove hacky intrinsic expansion logic for glsl image atomics. * Fix all tests. * Fix. * Add `InterlockedAddF16Emulated`. * Fix glsl intrinsic. * Fix.
* Enable WebGPU tests in CI (#5239) | Anders Leino | 2024-10-15
* WGSL: Enable load & store from byte-addressable buffers (#5252) | Anders Leino | 2024-10-11
* WGSL emitter: Specify private address space for global non-handle variable declarations (#5236) | Anders Leino | 2024-10-08
  Closes issue #5229.
* Add WGSL support for slang-test (#5174) | Anders Leino | 2024-10-07
  * Use the assembly description as target when disassembling. I believe this is a bugfix. It seems to have worked before because, up until the WGSL case, the disassembler has been the same executable as the one producing the binary to be disassembled. * Add Tint as a downstream compiler. This closes issue #5104. * Add downstream compiler for Tint. * Tint is wrapped in a shared library, 'slang-tint', available from [1]. * The header file for slang-tint.dll is added in external/slang-tint-headers. * Add some boilerplate for WGSL targets. * Add an entry point test for WGSL. [1] https://github.com/shader-slang/dawn/releases/tag/slang-tint-0 * Add WGSL_SPIRV as supported target for Glslang * Add WebGPU support to slang-test. This helps to address issue #5051. * Disable lots of crashing compute tests for 'wgpu'. This closes issue #5051. --------- Co-authored-by: Yong He <yonghe@outlook.com>
* Respect matrix layout in uniform and in/out parameters for HLSL target. (#5013) | Yong He | 2024-09-05
  * Respect matrix layout in uniform and in/out parameters for HLSL target. * Update test. * Fix test. * fix test. * Fix metal layout calculation. * Fix compile error. * Fix compiler error. --------- Co-authored-by: Yong He <yhe@nvidia.com>
* Draft: integrate slang-rhi (#4970) | Simon Kallweit | 2024-08-30
  * add slang-rhi submodule * refactor render-test to use slang-rhi and remove OpenGL support * remove -vk -glsl tests * remove gl test * disable failing test * allow recursive submodules in github actions * update slang-rhi * update slang-rhi --------- Co-authored-by: Yong He <yonghe@outlook.com>
* Do not zero-initialize groupshared and rayquery variables (#4838) | ArielG-NV | 2024-08-14
  * Do not zero-initialize groupshared and rayquery variables. Fixes: #4824. The `-zero-initialize` option will explicitly not: 1. set any groupshared values to defaults; 2. set any rayQuery object to a default state (this currently produces invalid code). * grammar * disallow groupshared initializers and adjust tests accordingly * remove disallowed groupshared-init expression * do not default-init if non-copyable --------- Co-authored-by: Yong He <yonghe@outlook.com>
* Issue/legalize resource (#4769) | kaizhangNV | 2024-08-14
  * Fix the issue that NonUniformResourceIndex is ignored. After `specializeFunctionCalls`, `NonUniformResourceIndex` was ignored in the generated specialized function. The reason is that if a function has a non-uniform resource parameter, we legalize it by replacing the resource parameter with an index, and the indexing of the resource is moved inside the specialized function. E.g. ``` void func(ResourceType resource) { ... } func(resource[NonUniformResourceIndex(0)]) ``` will be specialized into ``` void func(int index) { resource[index]; } func(0); ``` In this case, inside the function, we lose the information about whether the resource is non-uniform. So we handle this corner case by inserting a `NonUniformResourceIndex` into the specialized function: ``` void func(int index) { int nonUniformIdx = NonUniformResourceIndex(index); resource[nonUniformIdx]; } ``` * Fix the issue that arguments mismatch after specialization callsite. A specializeCall() call could cause the arguments at a call site to mismatch the parameters of the specialized function. For example, if the function parameter contains a resource type ``` void func(ResourceType res) { ... } int index = ... func(resources[index]); ``` this will be specialized into ``` void func(int index) { resources[index]; } int index = ... func(index); ``` However, if there is more than one call site, and another call site doesn't use `int` as the index, e.g. ``` uint index = ... func(resources[index]); ``` that call site will be specialized into ``` uint index = ... func(index); ``` which is invalid, because the argument doesn't match the parameter.
  So we add the data type of the new arguments to the function key. For the uniformity info, we add a new attribute "IROp_NonUniformAttr", so we form an IRAttributedType that encodes both uniformity and data type, and use it as the key of the call info. If a call site uses a different data type for the resource index, we specialize a new function for it. * Handle the intCast and uintCast operations. After an intCast/uintCast of a non-uniform index, the result is still a non-uniform index, so we handle this case as well. Also, add a new test to cover this.
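The keying scheme in #4769 (one specialization per combination of index data type and non-uniformity) can be sketched as a small cache. This is a hypothetical Python model, not Slang's `specializeFunctionCalls`: a plain tuple stands in for the `IRAttributedType` key, and the specialized "function" is just emitted text with a made-up `func_N` naming scheme.

```python
# Hypothetical sketch: specialized functions cached by (callee, index type, non-uniform).
# In Slang the last two are encoded in an IRAttributedType; a tuple stands in here.
specializations = {}


def specialize_call(callee, index_type, non_uniform):
    """Return the text of the specialized function for one call site."""
    key = (callee, index_type, non_uniform)
    if key not in specializations:
        # Re-apply NonUniformResourceIndex inside the body, because the call
        # site's non-uniformity would otherwise be lost after specialization.
        index_expr = "NonUniformResourceIndex(index)" if non_uniform else "index"
        name = f"{callee}_{len(specializations)}"
        specializations[key] = (
            f"void {name}({index_type} index) {{ resources[{index_expr}]; }}"
        )
    return specializations[key]
```

Call sites that differ only in the index type (`int` vs `uint`) now map to different keys, so each gets a parameter type that matches its argument, which is the mismatch the commit fixes.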
* Support an Upper-case variant of [NumThreads] and [Shader] (#4780) | Jay Kwak | 2024-08-06
  Closes #4746. This commit adds support for the "NumThreads" and "Shader" attribute keywords, which are in CamelCase starting with an upper-case letter. Attribute keywords in HLSL are case-insensitive; for example, one of the D3D documents says, "The attribute name "Shader" is case insensitive." https://microsoft.github.io/DirectX-Specs/d3d/WorkGraphs.html Slang, however, doesn't support case-insensitivity: attribute keywords must be either all lower-case or CamelCase starting with an upper-case letter.
* Support parameter block in metal shader objects. (#4671) | Yong He | 2024-07-19
  * Support parameter block in metal shader objects. * Ignore parameter block tests on devices without tier2 argument buffer. * Fix warning. * Fix texture subscript test. --------- Co-authored-by: Yong He <yhe@nvidia.com>
* Allow CPP/CUDA/Metal to lower/legalize buffer-elements to support column_major/row_major. (#4653) | ArielG-NV | 2024-07-18
  * Allow CPP/CUDA/Metal to legalize their buffer-elements. Fixes: #4537. Changes: 1. Matrix inputs require legalization (pack/unpack) to ensure a consistent row_major/column_major layout throughout the entire shader; the newly enabled legalization pass fixes this. 2. Added a missing CUDA intrinsic so CUDA can run more tests. 3. Added a memory packing test, since this still fails for cpp/cuda/metal (they have no memory packing enforcement). * change memory packing tests to run for targets without packing --------- Co-authored-by: Yong He <yonghe@outlook.com>
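The pack/unpack step that #4653 enables on CPP/CUDA/Metal converts between the order in which a matrix is stored in a buffer and the order the shader code expects. The commit does not show that code; as a generic illustration only, here is the index arithmetic for a `rows x cols` matrix stored in column-major order and read back as a list of rows. It ignores any padding that real buffer layout rules may insert between columns.

```python
def unpack_column_major(flat, rows, cols):
    """Read a rows x cols matrix stored column by column into a list of rows."""
    assert len(flat) == rows * cols, "flat storage must hold exactly rows*cols elements"
    # Element (r, c) lives at offset c * rows + r in column-major storage.
    return [[flat[c * rows + r] for c in range(cols)] for r in range(rows)]


def pack_column_major(matrix):
    """Inverse of unpack_column_major: write a list of rows back column by column."""
    rows, cols = len(matrix), len(matrix[0])
    return [matrix[r][c] for c in range(cols) for r in range(rows)]
```

Applying the matching pack on stores keeps a round trip lossless, which is what lets the whole shader agree on one layout.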
* Warnings for uninitialized fields in constructors (#4680) | venkataram-nv | 2024-07-18
  * Detect uninitialized fields in constructors * Reachability check for early returns * Specialized warnings for synthesized default initializers * Handling quirks with constructors * Addressing review comments * Ignore synthesized constructors if they are not used
* Warnings function parameters (#4626) | venkataram-nv | 2024-07-16
  * Handle out/inout functions with separate consideration * Fixing bug with passing aliasable instructions * Handle autodiff functions (fwd and rev) in warning system * Handling interface methods * Handling ref parameters like out/inout * Temporary fix to remaining bugs * Refactoring methods and tests * Recursive check for empty structs * Using default initializable interface in tests * Resolving CI fail
* Fix issue with synthesized `__init` methods not getting added to witness table (#4638) | Sai Praveen Bangaru | 2024-07-16
* use `nullptr` for IRStructKey with `IRDerivativeMemberDecoration` (#4623) | ArielG-NV | 2024-07-12
* Fix lowering of associated types and synthesis of dispatch functions. (#4568) | Sai Praveen Bangaru | 2024-07-10
  * Treat global variables and parameters as non-differentiable when checking derivative data-flow. Global parameters are by default not differentiable (even if they are of a differentiable type), because our auto-diff passes do not touch anything outside of function bodies. The solution is to use wrapper objects with differentiable getter/setter methods (and we should provide a few such objects in the stdlib). Fixes: #3289. This is a potentially breaking change: user code that previously worked with global variables of a differentiable type will now throw an error (previously the gradient was dropped without warning). The solution is to use `detach()` to keep the same behavior as before, or to rewrite the access using differentiable getter/setter methods. * Fix issues with lookup witness lowering * Update slang-ir-lower-witness-lookup.cpp * Add tests * Update slang-ir-lower-witness-lookup.cpp * Cleanup * Update nested-assoc-types.slang --------- Co-authored-by: Yong He <yonghe@outlook.com>
* Implement non member function atomic texture support (#4544) | ArielG-NV | 2024-07-10
  * Implement non-member-function atomic texture support for texture_buffer and texture1d. Fixes: #4538. Related to: #4291, fixes `tests/compute/atomics-buffer.slang`. Texture objects cannot use `__getMetalAtomicRef` to cast objects into an atomic value type; [texture objects mandate the use of member functions](https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf#Texture%20Functions). The implementation is as follows: * We detect texture object usage by checking for an `IRImageSubscript` operation. `__isTextureAccess()` was added to evaluate at compile time (before `static_assert`) whether we have an `IRImageSubscript` operation; `__isTextureAccess()` only performs this check when targeting Metal. * All the parameter data needed to call a texture atomic function is embedded in `IRImageSubscript`. `__extractTextureFromTextureAccess()` and `__extractCoordFromTextureAccess()` were added to extract this data for use with Metal atomics.
  Note: * Metal documentation has various incorrect details (function names). * Since we currently hardcode Metal versions for compiling, the Metal compiler version was changed to target `Metal 3.1` (`slang-gcc-compiler-util.cpp`). * Textures do not permit atomic float operations. * add fallthrough attribute + fix bug with 'exchange instead of xor' + fix warning bug * incorrect function name fix * missing filecheck * disable atomics-buffer.slang compute test, since a GFX issue causes it to fail * Array support for metal interlockedAtomic and proper verification of texture with interlockedAtomic functions * Array support for metal interlockedAtomic * proper verification of texture with interlockedAtomic functions. Note: had to separate many functions to allow forceInlining to run. * missing getOperand(0) * push atomic fix for metal * fix atomic syntax for metal and hlsl emitting extra brackets (breaks tests) * test changes and meta changes: 1. Metal allows at most 8 RW textures, so the tests were split up to stay under this limit. 2. Added back `[0]`...,`T` to the test, since this legalizes the Metal atomic intrinsic. * macro-ify some of the atomic code: 1. addresses review; 2. makes the code easier to modify in the future (rather than sifting through 1000 lines, we can just look at ~10-30). * fix test 'check' * missing float support due to macro * add functions the macro generates, `InternalAtomicOperationInfo` --------- Co-authored-by: Yong He <yonghe@outlook.com>
* Fixes to Metal Input parameters and Output value input/output semantics (#4536) | ArielG-NV | 2024-07-10
  * initial change to test with CI for CPU/CUDA errors * Fixes to Metal Input parameters and Output values. Note: 1. Flattening a struct is the process of making a struct have no struct/class members. Changes: 1. Separated `legalizeSystemValueParameters`. This makes it easier to run `legalizeSystemValue` one system value at a time, which simplifies the logic. This change is optional and can be undone if not preferred. 2. Wrap everything inside a Metal legalization context. This simplifies a lot of logic and will be required for #4375. 3. Created `convertSystemValueSemanticNameToEnum` and expanded the existing system-value enum. This allows (sometimes) faster comparisons and helps prepare the code for porting into `slang-ir-legalize-varying-params.cpp` (#4375). 4. Added a more dynamic `legalizeSystemValue` system so that more than 2 types can be targeted for legalization. This is required to legalize `output`. There is still no preference for any converted type; the first valid type is the one converted to. 5. Flatten all input (`flattenInputParameters`) and output (part of `wrapReturnValueInStruct`) structs and assign semantics accordingly. 6. When legalized, semantics follow no specific logic other than to: 1. avoid overlapping semantics; 2. prefer explicit semantics specified by the user. 7. Fixed an issue with incorrect output semantics for non-fragment stages (when no semantics are assigned). * change metallib test to the correct metal test * comment code & cleanup (did not address all review comments). Added comments for clarity and cleaned up some messy areas. * Add comment to `fixFieldSemanticsOfFlatStruct`. I found `fixFieldSemanticsOfFlatStruct` still confusing at a cursory glance, so I added comments to make the function more understandable. * white space * Address review comments: 1. Fix semantic propagation.
  2. Fix how we map struct fields of the flat struct to the struct. This is especially important when the same struct is reused twice, since struct member info is not unique per struct instance used. * Fix semantic legalization by adding TreeMap. Add TreeMap to allow properly sorted data iteration. * Fix some compile issues * try to fix gcc compile error * compile error * fix logic bug in treeMap iterator next-semantic setter * fix vsproject filters * filter file syntax error * remove need of a context to make copies stable * Rename treemap to the more appropriate name of "treeset", adjust code comments accordingly. * remove custom type `TreeSet` and use `std::set` * remove TreeMap fully --------- Co-authored-by: Yong He <yonghe@outlook.com>
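Point 6 of #4536 (avoid overlapping semantics, prefer the user's explicit semantics) amounts to a small allocation policy, and the final commits track used semantics in a `std::set`. The sketch below is a hypothetical Python model of that policy, not the code in `fixFieldSemanticsOfFlatStruct`: fields are `(name, explicit_index)` pairs, explicit indices are claimed first, and every remaining field receives the lowest index not yet taken.

```python
def assign_semantic_indices(fields):
    """fields: list of (name, explicit_index or None). Returns {name: index}.

    Hypothetical policy: explicit indices are honored first; every other field
    gets the lowest unused index, so no two fields share a semantic index.
    """
    used = set()
    result = {}
    # Pass 1: claim the indices the user wrote explicitly.
    for name, explicit in fields:
        if explicit is not None:
            if explicit in used:
                raise ValueError(f"overlapping semantic index {explicit} on {name}")
            used.add(explicit)
            result[name] = explicit
    # Pass 2: fill in the remaining fields with the lowest free index.
    next_free = 0
    for name, explicit in fields:
        if explicit is None:
            while next_free in used:
                next_free += 1
            used.add(next_free)
            result[name] = next_free
    return result
```

Doing the explicit pass first is what makes user-specified semantics win: an automatically assigned field can never take an index the user asked for later in the struct.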
* correctly setting launch parameters should fix the test (#4551) | ArielG-NV | 2024-07-07
* Fix the type error in kIROp_RWStructuredBufferLoad (#4523) | kaizhangNV | 2024-07-01
  * Fix the type error in kIROp_RWStructuredBufferLoad. In StructuredBuffer::Load(), we allow any integer type as the input. However, when emitting GLSL, StructuredBuffer::Load(index) is translated to a subscript of the buffer, e.g. buffer[index], and GLSL doesn't allow a 64-bit integer as the subscript. So the easiest fix is to convert the index to uint when emitting GLSL. * Add commit
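The GLSL-side fix in #4523 converts the subscript to `uint` before emitting it. Below is a hypothetical emitter helper that illustrates the idea; the function name and the set of 64-bit type names are assumptions, and the commit message does not say whether 32-bit indices are also wrapped in `uint(...)`, so this sketch only narrows the 64-bit ones.

```python
# Hypothetical emitter helper; the type names below are illustrative strings.
SIXTY_FOUR_BIT_INTS = {"int64_t", "uint64_t"}


def emit_buffer_subscript(buffer_name, index_expr, index_type):
    """Emit a GLSL subscript, narrowing 64-bit indices because GLSL rejects them."""
    if index_type in SIXTY_FOUR_BIT_INTS:
        index_expr = f"uint({index_expr})"
    return f"{buffer_name}[{index_expr}]"
```

The narrowed index cannot represent values of 2^32 or more, which is an acceptable trade-off only while buffers stay below that many elements.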
* Fix Texture2DMSArray (#4485) | Jay Kwak | 2024-06-26
  * Fix Texture2DMSArray. Closes #4427. We had the postfix order wrong for the keyword MS. This commit changes the incorrect name Texture2DArrayMS to Texture2DMSArray.
* Expand upon existing `ImageSubscript` support (Metal, GLSL, SPIRV) (#4408) | ArielG-NV | 2024-06-26
  * Add additional `ImageSubscript` features: 1. Added ImageSubscript support for Metal & a test case * Merge GLSL/SPIRV/Metal `ImageSubscript` legalization pass 2. Added multisample support to glsl/spirv/metal when using ImageSubscript * Added in this PR since the overhaul of the code merges the GLSL/SPIRV/Metal implementations together 3. Fixed minor Metal texture `Load`/`Read` bugs * [HLSL methods of access do not support subscript accessor for texture cube array](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/texturecubearray) * removed swizzling of uint/int/float * other odd bugs which were causing compile errors. Note: compute tests do not work due to what seems to be the GFX backend (it crashes without an error report), so the tests are disabled. * disable LOD with texture 1d: it seems that LOD for 1D textures needs to be a compile-time constant, as per an error Metal throws * syntax error in hlsl.meta * static_assert along with intrinsic_asm error provides cleaner errors. Note: `static_assert` seems to be unstable and not fully respected (it still requires `intrinsic_asm` to avoid a stdlib compile error). * change comment to `// lod is not supported for 1D texture` * add `static_assert` in related code gen paths * address review * address review * add asserts as per review comment. NOTE: unclear if these should be release 'asserts' as well
* Support atomic intrinsics for Metal (#4473) | Jay Kwak | 2024-06-25
  * Support atomic intrinsics for Metal. This commit adds support for the atomic intrinsics in Metal. The atomic member functions for buffers are not implemented yet. Metal requires the first argument of the atomic functions to be of an atomic data type. This implementation relies on the fact that we can do a C-style type cast from a regular data type to an atomic data type.
* [Metal] Fix global constant array emit. (#4392) | Yong He | 2024-06-14
  * [Metal] Fix global constant array emit. * Try enable more tests.
* Implement for metal `SV_GroupIndex` (#4385) | ArielG-NV | 2024-06-13
  * Implement for metal `SV_GroupIndex`: 1. If we don't have `sv_GroupThreadId` available, we create one using the location data of `SV_GroupIndex`. 2. We emit code emulating `sv_GroupThreadId` using the same logic that CUDA/CPP uses. * address most review comments. Addressed all but two: [1](https://github.com/shader-slang/slang/pull/4385#discussion_r1639058473) and [2](https://github.com/shader-slang/slang/pull/4385#issuecomment-2166934855). I want to enable tests and use CI to be sure there are no bugs before I redesign the code, so that I have a working fallback. * address comment, enable tests: enable tests that now function because `SV_GroupIndex` works with Metal * syntax error with groupThreadID search: did `= param` instead of `= i.param` * add `sv_groupid` for test + test fixes * disable test that won't work regardless
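HLSL defines `SV_GroupIndex` as the flattened form of `SV_GroupThreadID` within a `[numthreads(X, Y, Z)]` group: `z*X*Y + y*X + x`. The sketch below shows that mapping and its inverse in Python. It illustrates the fixed relationship between the two system values that an emulation like the one in #4385 relies on; it is not the code that commit generates.

```python
def group_index(group_thread_id, group_size):
    """Flatten a 3D SV_GroupThreadID into SV_GroupIndex for numthreads(sx, sy, sz)."""
    tx, ty, tz = group_thread_id
    sx, sy, _sz = group_size
    return tz * sx * sy + ty * sx + tx


def group_thread_id(index, group_size):
    """Inverse mapping: recover the 3D thread ID from a flat group index."""
    sx, sy, _sz = group_size
    return (index % sx, (index // sx) % sy, index // (sx * sy))
```

Because x varies fastest, walking x, then y, then z visits the flat indices 0, 1, 2, ... in order, so either value can be derived from the other with a few integer operations.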
* Metal: misc fixes and enable more tests. (#4374) | Yong He | 2024-06-13
  * Fix and enable tests for metal. * Fix. * Fix. * Fix tests. * Fix warnings. * Fix. --------- Co-authored-by: Yong He <yonghe@Yongs-Mac-mini.local>
* Enable full test on macos. (#4327) | Yong He | 2024-06-12
  * Enable full test on macos. * Add failing test to expected list. * Fix CI script. * Update expected failure list. * Update test list.
* Capability System: Implicit capability upgrade warning/error (#4241) | ArielG-NV | 2024-06-12
  * capability upgrade warning/error: adjusted the implementation and tests to support a warning/error if capabilities are implicitly upgraded, and tested accordingly. * add glsl profile caps * add GLSL and HLSL capabilities to the associated capability * syntax error in capdef * only error if user explicitly enables capabilities: 1. changed the testing infrastructure to not set a `profile` explicitly; 2. added tests to be sure this works as intended with the user API and with the slangc command line * Change capability atom definitions and how Slang manages them to fix errors: 1. most `glsl_spirv` version atoms have been removed from `.capdef`; instead we translate `spirv` version atoms into `glsl_spirv`, since there is no point in writing the same code twice in `.capdef` files to define `spirv` versions. 2. add spirv version and hlsl sm version (and equivalent) capability dependencies 3. removed some stage requirements which were set on objects, but kept the wrapper capabilities. I am keeping the wrapper capabilities since I am unsure whether there are stage limitations (spec says code in practice does not work). * check internal version instead of version profile (_spirv_1_5 vs. spirv_1_5) * remove unused OpCapability; adjust SPIRV versioning again for glsl_spirv * apply workaround for glslang bug with rayquery usage * ensure capabilities targeted by a profile and added together by a user are valid * remove additions to `spirv_1_*` wrapper * spirv_* -> glsl_spirv fix * fix bug where incompatible profiles would cause invalid target caps * try to avoid joining invalid capabilities * fix the warning/error & printing * run through tests to fix capability system and test mistakes. Many mistakes were mesh shaders using `-profile glsl_450+spirv_1_4`. This is not allowed for a few reasons: 1. the test tooling does not handle arguments the same way as `slangc`; 2. the glsl_450 core profile does not support mesh shaders, nor does spirv_1_4. sm_6_5 does work in this scenario. * set some sm_4_1 intrinsics to sm_4_0 * replace `GLSL_` defs with `glsl_` * swap the unsupported render-test syntax for working syntax * set d3d11/d3d12 profile defaults: this is required since the sm version changes compiled code & behavior * adjusted nvapi capabilities with atomics + d3d11 set to use sm_5_0 as per default * cleanup * address review * incorrect styling * change `bitscanForward` to work as intended on 32 bit targets --------- Co-authored-by: Yong He <yonghe@outlook.com>
* [gfx] Metal texture fixes (#4331) | skallweitNV | 2024-06-11
* enable more metal tests (#4326) | skallweitNV | 2024-06-10
* Metal compute tests (#4292) | skallweitNV | 2024-06-07
* Add LoadAligned and StoreAligned methods to ByteAddressBuffers (#4066) | Sriram Murali | 2024-05-13
  Fixes #4062. This change enables wide loads/stores for byte-address-buffer-backed resources when the data is accessed at an aligned offset. **Goals** - Improve performance by issuing wider instructions instead of a sequence of scalar instructions for loads and stores of byte-address buffers. - Reduce code size and improve the readability of the generated shaders. - Help naive users as well as ninja programmers generate optimal code. **Non Goals** - Help with structured buffers or other resources. - Target compilation-time improvements. **Key changes** Adds 2 new overloads for Load and Store operations on ByteAddress buffers. 1. Load / Store with an extra alignment parameter: ``` resource.Load<T>(offset, alignment); resource.Store<T>(offset, value, alignment); ``` 2. LoadAligned / StoreAligned with no extra parameter, with the same signature as the original Load / Store: ``` resource.LoadAligned<T>(offset); resource.StoreAligned<T>(offset, value); ``` - This overload implicitly determines the alignment value from the base type T of the elementary unit of the resource. **Supported resources** 1. Vectors: up to 4 elements, i.e. float to float4. 2. Arrays: there is no limit on the number of elements, but as a conservative estimate we can limit it to a few hundred. 3. Structures: used to group a resource of a single type. ``` struct { float4 x; } ``` **Code updates** - Modified byte-address-ir legalize to handle struct, array and vector kinds of load or store access - Added custom hlsl stdlib functions to implement all the overloads for Load, Store etc. - Added C-like emitter and SPIR-V emitter handling for ByteAddressBuffers. - Added a new core stdlib intrinsic function to wrap around alignOf<T>(). - Added a new peephole optimization entry to identify the equivalent IntLiteral value from the alignOf<T>() inst.
  - Added tests to check explicit and implicit aligned Load and Store operations.
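The core decision behind `Load<T>(offset, alignment)` is whether one wide access can replace a run of scalar ones. The following is a hypothetical planner, not the byte-address legalization pass: it assumes `alignment` is the caller's promise that `offset` is a multiple of it, and it emits a single wide load only when that alignment is a multiple of the whole value's size.

```python
def plan_load(offset, element_size, count, alignment):
    """Return (byte_offset, byte_width) loads for `count` elements of `element_size` bytes.

    Assumption: `alignment` is the caller's guarantee that `offset` is a multiple of it.
    """
    assert offset % alignment == 0, "offset violates the promised alignment"
    total = element_size * count
    if alignment % total == 0:
        # The promised alignment covers the whole value: one wide access suffices.
        return [(offset, total)]
    # Otherwise fall back to one access per element.
    return [(offset + i * element_size, element_size) for i in range(count)]
```

For a `float4` (four 4-byte elements), a 16-byte alignment yields one 16-byte load, while a 4-byte alignment yields four scalar loads, which is the code-size and instruction-count difference the commit's goals describe.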
* Handle case where types can be used as their own `Differential` type. (#4057) | Sai Praveen Bangaru | 2024-05-02
  * Avoid synthesis when types can be used as their own differential + Add test * Add missing files.. * Fix issue with method synthesis for self-differential types + Add a generic test * Fix * Fix issue with out-of-date type resolution cache. Witness tables created during the conformance checking phase were not taken into account during the decl type resolution phase, because the epoch is not updated after conformance checking. This leads to certain complex associated-type lookup chains (such as the one in tests/compute/assoctype-nested-lookup) not resolving properly and causing errors. * Delete self-differential-type-synthesis-extension.slang * Quick fix to repopulate stdlib cache for deferred stdlib loading * Update slang-check-decl.cpp
* Generate vectorized version of byteaddress load/store methods (#4036) | Sriram Murali | 2024-04-30
  Fixes #3533 - Add logic to perform aligned memory operations when loading from and storing to composite resources, such as vectors, within the ByteAddress legalize pass. - Checks: added a new test for byte address with/without alignment. --------- Co-authored-by: Yong He <yonghe@outlook.com>
* Switch to direct-to-spirv backend as default. (#4002) | Yong He | 2024-04-23
  * Switch to direct-to-spirv backend as default. * Fix slang-test. * Fix. * Fix.
* Enable NonUniformResourceIndex support for glsl, hlsl and spirv (#3899) | sriramm-nv | 2024-04-19
  Fixes #387676 * ForceInline SampleLevel to allow decorations to apply * explicitly add all the SPIRVAsmOperand insts to the non-differentiable list, since they might get inadvertently processed when these functions are inlined into the main shader * Support NonUniformResourceIndex for SPIR-V target. Fixes #3876 * add a new IR instruction for NonUniformResourceIndex * slang ir emitter for nonuniform resource index * update the hlsl meta slang * Add test cases for NonUniformResourceIndex access for buffers and textures, with/without cast, nested access etc. * add default c-like emitter for nonuniformresourceinfo * added hlsl emitter * added glsl emitter * requisites for spirv enabling - new decorator for nonuniformresourceindex - emitter for nonuniformresourceindex signature change * add hasResourceType checker * add rwStructBuffType in resourcetype checker * add a case for nonuniformres in emitDecorations * DO NOT COMMIT: This change adds special handling for RWStructBuf within the isResourceType function; if it is a pointer to this resource, return true to make it work with the nonuniformres test * spirv emitter for decorations - update emitLocalInst to perform decorations at the end * added main spirv emitter code * slang emit spirv bugfix * hacky way of supporting Call Inst * move code to clean up nonuniform inst into helper function * remove stale code from test * add spirv decoration for nonuniform * update test to remove global variables * update coherent-2 test * update comment for special handling * update the spirv legalize to handle nested nonuniforms: improved logic that handles call ops, rwstructbuf, nested nonuniforms etc.
  * update nonuniform-array-of-tex test * missed removing nonuniform inst, causing duplicate decorations * add glsl and hlsl variants of nonuniform tests * repurpose the hasResource function into something specific for the nonuniform inst decoration helper * clean up comments and code around spirv-legalization to emit nonuniform inst by recursively looking into the inst * use the helper canDecorateNonUniformInst to convert `nonUniformResourceInfo` inst to decoration * converted compute/unbounded-array-of-array cross compile test into a simple check test * update contains Resource helper function to be more generic * clean up the case for opcall handling with nonuniform resource inst * update ptr to struct buffer check to be more explicit and rename the function to check for ptr to resource type * update comments and fix the test for coherent * fix typos * update logic in spirv legalize to delete dead instructions - for some reason this doesn't happen automatically * add comments to declarations * add NonuniformResourceIndex to the non-differential inst list
* Fix the cuda left-hand swizzle issue (#3538) (#3691) · kaizhangNV · 2024-03-06
* Implement short-circuit logic operator (#3635) · kaizhangNV · 2024-03-04
  * Implement short-circuit logic operator
    Implement short-circuit evaluation for the logical && and || operators. The short-circuit behavior is only used when the operands involved are scalar and the parent function is non-differentiable.
    In the implementation, we define a new class 'LogicOperatorShortCircuitExpr' derived from 'OperatorExpr'. In the visitInvoke() call, we create a new 'LogicOperatorShortCircuitExpr' expression object if the expression is a logical && or ||, so that the new visit function 'visitLogicOperatorShortCircuitExpr' can generate IR code that implements the short-circuit behavior.
    Add a new test for the short-circuit behavior.
  * Fix a compile issue that occurred in the Falcon test
    Previously, we returned early in the convertToLogicOperatorExpr call when at least one of the operands of "&&" or "||" was a vector. However, by that point the arguments involved in the expression had already been type checked. When it falls back to 'visitInvokeExpr', the arguments are checked again, which can cause unexpected behavior and, in turn, an internal error. So we add a check in 'visitInvokeExpr' to avoid type checking the arguments twice.
  * Update glsl subgroup test to not use short-circuit
    Short-circuit evaluation can cause threads to diverge around subgroup intrinsics, so the test no longer uses "&&" to chain those subgroup intrinsics together. Instead, it uses "&", which works because the test functions return bool.
  * Disable short-circuit in a few situations
    Disable short-circuit in the following situations:
    1. generic parameter lists
    2. static const variable initialization
  * Use a flag to indicate the enablement of short-circuit
    Instead of using a struct to describe the outer environment of the current expression, use a simple bool flag to indicate whether or not to apply short-circuiting to the current expression. There are only a few situations where short-circuiting is disabled, and none of them nest, so a flag is sufficient.
  * Disable short-circuit in index expression
    Also fix the build issue. (A cleanup for the last change.)
  * check both 'static' and 'const' modifiers
    Previously we only checked HLSLStaticModifier to decide whether or not to use short-circuit, but we really should check the 'static' and 'const' modifiers together, because we only want to disable short-circuiting for the init expression of a 'static const' variable.
  * relax the restriction of short-circuit for index expression
    Disable short-circuiting for an index expression only when declaring an array.
  * Simplify the logic by creating subVisitor
    Simplify the logic by creating a sub-expression visitor so that we don't need to introduce extra recursion.
  * Call convertToLogicOperatorExpr after args check
    Call convertToLogicOperatorExpr after the arguments check in visitInvokeExpr, so that we no longer need to track whether the arguments have already been checked to avoid the double-checking issue.
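The subgroup-test change in this commit hinges on the difference between short-circuit and eager evaluation. The following is a minimal Python sketch of that difference, not Slang's implementation: `operand()` is a hypothetical stand-in for an operand with a side effect (for example, a subgroup intrinsic that every thread is expected to reach), Python's `and` plays the role of a short-circuiting scalar `&&`, and `&` on two bools plays the role of the eager alternative the test switched to. It does not model the cases where Slang keeps eager evaluation (vector operands, differentiable functions, generic parameter lists, `static const` initializers, array-size index expressions).

```python
# Record which operands actually get evaluated.
calls = []


def operand(name, value):
    """Hypothetical operand with a side effect, e.g. a subgroup intrinsic
    that every thread is expected to execute."""
    calls.append(name)
    return value


def evaluate_short_circuit():
    """`and` skips the right operand once the left one is False."""
    calls.clear()
    result = operand("lhs", False) and operand("rhs", True)
    return result, list(calls)


def evaluate_eager():
    """`&` on two bools always evaluates both operands."""
    calls.clear()
    result = operand("lhs", False) & operand("rhs", True)
    return result, list(calls)


print(evaluate_short_circuit())  # (False, ['lhs'])
print(evaluate_eager())          # (False, ['lhs', 'rhs'])
```

Both forms produce the same boolean result; only the set of evaluated operands differs. That is why chaining with `&` keeps every intrinsic call on every thread while preserving the test's pass/fail logic.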
* Refactor compiler option representations. (#3598) · Yong He · 2024-02-20
  * Refactor compiler option representation.
  * Fix binary compatibility.
  * Add a test for specifying compiler options at link time.
  * Fix binary compatibility.
  * Fix binary compatibility.
  * Fix backward compatibility on matrix layout.
  * Fix.
  * Fix.
  * Fix.
  * Fix gfx.
  * Fix gfx.
  * Fix dynamic dispatch.
  * Polish.
* [SPIRV] Support `globallycoherent` and `[vk::index()]`. (#3488) · Yong He · 2024-01-24
  * [SPIRV] Support `globallycoherent` modifier.
  * Fix.
  * Disable executable cooperative vector tests.
  * Update expected failure.
  * [SPIRV] Emit varying output index decoration.
  * Add test.
  * Update tests.
  * Fix test.
  * Emit `SpvExecutionModeEarlyFragmentTests`.
  * Lower `StructuredBuffer<bool>`.
  * Support globallycoherent on ByteAddressBuffer.
  ---------
  Co-authored-by: Yong He <yhe@nvidia.com>
* Fix incorrect behavior of operator% (#3470) · Jay Kwak · 2024-01-23
  * Fix incorrect behavior of operator%
    Fixes #1059.
    This change fixes the incorrect translation of "operator%" from HLSL to SPIR-V. The issue stems from the fact that "operator%" behaves differently in HLSL and GLSL: in HLSL it behaves as a remainder, whereas in GLSL it behaves as a modulus. We had been using SpvOpFMod for operator% when Slang compiles from HLSL to SPIR-V, which is incorrect. This change switches it to SpvOpFRem.
    The tests are slightly modified to reveal any potential issues.
  * Change output type of test/compute/frem
    For testing operator%, we were using "int" as the output type of the test "test/compute/frem.slang". Since the operands are floats, a float result type is preferable. This can be done with the option "-output-using-type".
  ---------
  Co-authored-by: Yong He <yonghe@outlook.com>
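To make the remainder/modulus distinction concrete, here is a small Python sketch. It assumes that HLSL-style float `%` corresponds to a truncated remainder (the same definition as C `fmod`, whose result takes the sign of the dividend, as SPIR-V `OpFRem` does) and that GLSL `mod()` corresponds to the floored definition `x - y * floor(x / y)` (result takes the sign of the divisor, as SPIR-V `OpFMod` does). The helper names `hlsl_style_rem` and `glsl_style_mod` are hypothetical and do not come from Slang.

```python
import math


def hlsl_style_rem(x, y):
    """Truncated remainder: result has the sign of the dividend x
    (same definition as C fmod / SPIR-V OpFRem)."""
    return math.fmod(x, y)


def glsl_style_mod(x, y):
    """Floored modulus x - y * floor(x / y): result has the sign of the
    divisor y (same definition as GLSL mod() / SPIR-V OpFMod)."""
    return x - y * math.floor(x / y)


# The two definitions agree when the operands share a sign and
# differ when the signs are mixed.
for x, y in [(7.0, 3.0), (-7.0, 3.0), (7.0, -3.0)]:
    print(x, y, hlsl_style_rem(x, y), glsl_style_mod(x, y))
```

Because the results only diverge for operands of mixed sign, a test for this fix needs such inputs to tell the two lowerings apart. Note that Python's own `%` on floats is floored, so it matches `glsl_style_mod` rather than `hlsl_style_rem`.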