summaryrefslogtreecommitdiff
path: root/source/slang/slang-emit.cpp
AgeCommit message (Collapse)Author
2025-01-09[Auto-diff] Overhaul auto-diff type tracking + Overhaul dynamic dispatch for ↵Sai Praveen Bangaru
differentiable functions (#5866) * Overhauled the auto-diff system for dynamic dispatch * More fixes * remove intermediate dumps * Update slang-ast-type.h * More fixes + add a workaround for existential no-diff * Update reverse-control-flow-3.slang * remove dumps * remove more dumps * Delete working-reverse-control-flow-3.hlsl * Cleanup comments + unused variables * More comment cleanup * Add support for lowering `DiffPairType(TypePack)` & `MakePair(MakeValuePack, MakeValuePack)` * Fix array of issues in Falcor tests. * Update slang-ir-autodiff-pairs.cpp * More fixes for Falcor image tests * Small fixups. --------- Co-authored-by: Yong He <yonghe@outlook.com>
2025-01-07Lower varying parameters as pointers instead of SSA values. (#5919)Yong He
* Add executable test on matrix-typed vertex input. * Fix emit logic of matrix layout qualifier. * Pass fragment shader varying input by constref to allow EvaluateAttributeAtCentroid etc. to be implemented correctly.
2025-01-07Use disassemble API from SPIRV-Tools (#6001)Jay Kwak
* Use disassemble API from SPIRV-Tools This commit uses C API version of SPIRV disassemble function rather than calling spirv-dis.exe. This allows us to use a correct version of SPIRV disassble function that Slangc.exe is using. The implementation is mostly copied from external/spirv-tools/tools/dis/dis.cpp, which is a source file for building spirv-dis.exe. This commit also includes a fix for a bug in RPC communication to `test-server`. When an RPC connection to `test-server.exe` is reused and the second test abruptly fails due to a compile error or SPIRV validation error, the output from the first test run was incorrectly reused as the output for the second test. This commit resets the RPC result before waiting for the response so that even when the RPC connection is erratically disconnected, the result from the previous run will not be reused incorrectly. Some of the tests appear to be relying on this type of behavior. By using an option, `-skip-spirv-validation`, the RPC connection will continue without an interruption.
2024-12-11Fix the logic to determine whether lower generic pass should run. (#5837)Yong He
2024-12-05Do recursive function checks early during IR linking (#5777)Darren Wihandi
2024-11-26wgpu: Enable Metal-like legalization for byte addressible buffers (#5681)Anders Leino
* Enable hlsl-intrinsic/byte-address-buffer/byte-address-struct * Set byte address buffer legalization options for WGSL
2024-11-21Add datalayout for constant buffers. (#5608)Yong He
* Add datalayout for constant buffers. * Fixes. * Fix test. * Fix glsl codegen. * Update spirv-specific doc. * Fix test. * Fix binding in the presense of specialization constants. * address comments. * Add a test for constant buffer layout.
2024-11-14Fix issue with raw default constructors in SPIRV emit (#5556)Sai Praveen Bangaru
2024-11-12Push buffer load to end of access chain. (#5544)Yong He
* Push buffer load to end of access chain. * Update test. * Fix. * Fix. * Fix. * Make more robust. * Fix.
2024-11-06[WGSL] Enable arbitrary arrays in uniform buffers. (#5497)Yong He
* [WGSL] Enable arbitrary arrays in uniform buffers. * format code * Undo irrelevant change and fixups. * Update expected failure list. * Fix. * Rename. --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
2024-11-05Move switch statement bodies to their own lines (#5493)Ellie Hermaszewska
* Move switch statement bodies to their own lines * format --------- Co-authored-by: Yong He <yonghe@outlook.com>
2024-10-29formatEllie Hermaszewska
* format * Minor test fixes * enable checking cpp format in ci
2024-10-17Cleanup atomic intrinsics. (#5324)Yong He
* Cleanup atomic intrinsics. * Fix. * Fix glsl. * Remove hacky intrinsic expansion logic for glsl image atomics. * Fix all tests. * Fix. * Add `InterlockedAddF16Emulated`. * Fix glsl intrinsic. * Fix.
2024-10-07Add WGSL support for slang-test (#5174)Anders Leino
* Use the assembly description as target when disassembling I believe this is a bugfix. It seems to have worked before because up until the WGSL case, the disassembler has been the same executable as the one producing the binary to be disassembled. * Add Tint as a downstream compiler This closes issue #5104. * Add downstream compiler for Tint. * Tint is wrapped in a shared library, 'slang-tint' available from [1]. * The header file for slang-tint.dll is added in external/slang-tint-headers. * Add some boilerplate for WGSL targets. * Add an entry point test for WGSL. [1] https://github.com/shader-slang/dawn/releases/tag/slang-tint-0 * Add WGSL_SPIRV as supported target for Glslang * Add WebGPU support to slang-test This helps to address issue #5051. * Disable lots of crashing compute tests for 'wgpu' This closes issue #5051. --------- Co-authored-by: Yong He <yonghe@outlook.com>
2024-09-26Move texture format inference to frontend and add reflection api for it. (#5155)Yong He
2024-09-26Always run AD cleanup pass. (#5157)Sai Praveen Bangaru
2024-09-23Implemented Combined-texture for WGSL (#5130)Jay Kwak
* Implemented Combined-texture for WGSL * Remove unnecessary comment * Limit to std430 layout * Fix compiler warning for unused variable --------- Co-authored-by: Yong He <yonghe@outlook.com>
2024-09-18Report AD checkpoint contexts (#5058)venkataram-nv
* Transferring source locations when creating phi instructions * Tracking for simple variables * Deriving source locations for loop counters * Printing checkpoint structure breakdown * More readable output format * Special behavior for loop counters * Writing report to file * Add slangc option to enable checkpoint reports * Display types of checkpointed fields * Message in case there are no checkpointing contexts * Catch source locations for function calls * Source cleanup * Fix compilation warnings * Remove stray dump() * Provide the report through diagnostic notes * Add missing path for sourceLoc during unzip pass * Add tests for reporting intermediates * Include more transfer cases for source locations * Fix ordering in address elimination * Fill in more holes with source location transfer * Remove debugging line * Reverting changes to diagnostic sink * Simplify address elimination using source location RAII contexts * Eliminating manual source loc transfers in forward transcription * Fix local var adaptation to use RAII location setter * Simplify primal hoisting logic for source location transfer * Simplify unzipping with RAII location scopes * Simplify transpose logic * Cleaning up for rev.cpp * Reverting spacing changes * Fix mistake with source loc RAII instantiation * Fix formatting issues
2024-09-10Specialize existential return types when possible. (#5044)Yong He
* Fix inccorect dropping of declref during Unification of DeclaredSubtypeWitness. * Add extension test. * Specialize existential return types when possible. * Fix. * Fix. * Fix falcor issue.
2024-09-09Initial WGSL support (#5006)Anders Leino
* Add WGSL as a target This is required for #4807. * C-like emitter: Allow the function header emission to be overloaded WGSL-style function headers are pretty different from normal C-style headers: Normal C-style headers: ReturnType Func(...) void VoidFunc(...) WGSL-style headers: fn Func(...) -> ReturnType fn VoidFunc(...) This change allows the header style to be overloaded, in order to accomodate WGSL-style headers as required to resolve issue #4807, but retains normal C-style headers as the default implementation. [1] https://www.w3.org/TR/WGSL/#function-declaration-sec * C-like emitter: Allow emission of switch case selectors to be overloaded The C-like emitter will emit code like this: switch(a.x) { case 0: case 1: { ... } break; ... } This is not allowed in WGSL. Instead, selectors for cases that share a body must [1] be separated by commas, like this: switch(a.x) { case 0, 1: { ... } break; ... } To prepare for addressing issue #4807, this patch makes the emission of switch case selectors overloadable. [1] https://www.w3.org/TR/WGSL/#syntax-case_selectors * C-like emitter: Support WGSL-style declarations This patch helps to address issue 4807. C-like languages declare variables like this: i32 a; WGSL declares variables like this: var a : i32 The patch introduces overloads so that the forthcoming WGSL emitter can output WGSL-style declarations, which helps to resolve #4807. * C-like emitter: Support overloading of declarators Unlike C-like languages, WGSL does not support the following types at the syntax level, via declarators: - arrays - pointers - references For this reason, this patch introduces support for overloading the declarator emitter, in order to help address issue #4807. C-like languages: int a[3]; // Array-ness of type is mixed into the "declarator" WGSL: var a : array<int, 3>; // Array-ness of type is part of the... type_specifier! * C-like emitter: Allow struct declaration separator to be overridden C-like languages use ';' as a separator, and languages like e.g. WGSL use ','. This change prepares for addressing issue #4807. * C-like emitter: Allow overriding of whether pointer-like syntax is necessary Things like e.g. structured buffers map to "ptr-to-array" in WGSL, but ptr-typed expressions don't always need C-style pointer-like syntax. Therefore, make it overrideable whether or not such syntax is emitted in various cases in order to address #4807. * C-like emitter: Emit parenthesis to avoid warning about & and + precedence This helps with #4807 because WGSL compilers (e.g. Tint) treat absence of parenthesis as an error. * C-like emitter: Add hook for emitting struct field attributes WGSL requires @align attributes to specify explicit field alignment in certain cases. Thus, this patch prepares for addressing #4807. * C-like emitter: Add hook for emitting global param types Declarations of structured buffers map to global array declarations in WGSL. However, in all other cases such as when structured buffers are used in operands, their types map to *ptr*-to-array. This patch makes it possible for the WGSL back-end to say that structured buffers generally map to "ptr-to-array" types, but still have a special case of just "array" when declaring the global shader parameter. Thus, this patch helps with addressing #4807. * IR lowering: Use std140 for WGSL uniform buffers This patch just cuts out some logic that prevented std140 to be chosen for WGSL uniform buffers. Note that WGSL buffers in the uniform address space is not quite std140, but for now it's close enough to avoid compile issues. Later on, a custom layout should be created for WGSL uniform buffers. When that's done, this change will be revisited, but for now it helps to resolve #4807. * Don't emit line directives in WGSL by default WGSL does not support line directives [1]. The plan currently seems to be to instead support source-map [2]. This is part of addressing issue #4807. [1] https://github.com/gpuweb/gpuweb/issues/606 [2] https://github.com/mozilla/source-map * WGSL IR legalization: Map SV's The implementation closely follows the cooresponding one for Metal. Supported: - DispatchThreadID - GroupID - GroupThreadID - GroupThreadID Unsupported: - GSInstanceID This is not complete, but it helps to address #4807. * WGSL emitter: Add support for basic language constructs A lot of the basics are added in order to generate correct WGSL code for basic Slang language constructs. This addresses issue #4807. This adds support for at least the following: - statments - if statements - ternary operator - while statement - for statements - variable declarations - switch statements - Note: Slang may emit non-constant case expressions, see issue 4834 - literals - integer literals - u?int[16|32|64]_t - float and half literals - bool literals - vector literals and splatting (e.g 1.xxx) - function definitions - assignments - +=, *=, /= - array assignments - vector assignments/updates - swizzles of other vectors - from matrix rows ('m[i]' notation) - from matrix cols (using swizzle notation, e.g 'm._11_12_13') - matrix assignments/updates - to rows ('m[i]' notation) - to cols (using swizzle notation, e.g 'm._11_12_13') - declarations - arrays [1] https://www.w3.org/TR/WGSL/#syntax-switch_body * Add some WGSL capabilities This patch registers some WGSL capabilities required to pass many of the initial compute shader compile tests. Many capabilities still remain to be added -- this is just an initial set to help resolve issue #4807. - asint - min and max - cos and sin - all and any * WGSL and C-like emitters: Add hack to bitcast case expression In WGSL, the switch condition and case types must match. https://www.w3.org/TR/WGSL/#switch-statement Slang currently allows these types to mismatch, as pointed out in #4921. Issue #4921 should eventually be addressed in the front-end by a patch like [1]. However, at the moment that would break Falcor tests. Thus, this patch temporarily works around the issue in the WGSL emitter only in order to help resolve #4807. In the future, the Falcor tests should be fixed, this patch should be dropped and [1] should be merged instead. [1] a32156ef52f43b8503b2c77f2f1d51220ab9bdea
2024-09-05Initial -embed-spirv support (#4974)cheneym2
* Initial -embed-spirv support Add support for SPIR-V precompilation using the framework established for DXIL. Work on #4883 * SLANG_UNUSED * Add linkage attributes to exported spirv functions * Combine DXIL and SPIRV paths * Whitespace fix * Merge remaining precompiled spirv/dxil paths * Change inst accessors to return codegentarget * Add unit test for precompiled spirv --------- Co-authored-by: Yong He <yonghe@outlook.com>
2024-08-29Support mixture of precompiled and non-precompiled modules (#4860)cheneym2
* Support mixture of precompiled and non-precompiled modules This changes the implementation of precompile DXIL modules to accept combinations of modules with precompiled DXIL, ones without, and ones with a mixture of precompiled DXIL and Slang IR. During precompilation, module IR is analyzed to find public functions which appear to be capable of being compiled as HLSL, and those functions are given a HLSLExport decoration, ensuring they are emitted as HLSL and preserved in the precompiled DXIL blob. The IR for those functions is then tagged with a new decoration AvailableInDXIL, which marks that their implementation is present in the embedded DXIL blob. The DXIL blob is attached to the IR as before, inside a EmbeddedDXIL BlobLit instruction. The logic that determines whether or not functions should be precompiled to DXIL is a placeholder at this point, returning true always. A subsequent change will add selection criteria. During module linking, the full module IR is available, as well as the optional EmbeddedDXIL blob. The IR for functions implemented by the blob are tagged with AvailableInDXIL in the module IR. After linking the IR for all modules to program level IR, the IR for the functions marked AvailableInDXIL are deleted from the linked IR, prior to emitting HLSL and compiling linking the result. This change also changes the point of time when the module IR is checked for EmbeddedDXIL blobs. Instead of happening at load time as before, it happens during immediately before final linking, meaning that the blob does not need to be independently stored with the module separate from the IR as was done previously. Work on #4792 * Clean up debug prints * Call isSimpleHLSLDataType stub * Address feedback on precompiled dxil support Allow for IR filtering both before and after linking. Only mark AvailableInDXIL those functions which pass both filtering stages. Functions are corrlated using mangled function names. Rather than delete functions entirely when linking with libraries that include precompiled DXIL, instead convert the IR function definitions to declarations by gutting them, removing child blocks. * Use artifact metadata and name list instead of linkedir hack * Use String instead of UnownedStringSlice * Update tests * Renaming * Minor edits * Don't fully remove functions post-link * Unexport before collecting metadata
2024-08-27Adds a warning for using `[PreferRecompute]` on methods that may contain ↵Sai Praveen Bangaru
side effects (#4707) * Adds a warning for using prefer-recompute on methods that contain side effects * Rename `SideEffects` -> `SideEffectBehavior` --------- Co-authored-by: Yong He <yonghe@outlook.com>
2024-08-26Fix Varying Variable Location Assignments With Hull Shaders (#4915)ArielG-NV
* Fix Varying Variable Location Assignments With Hull Shaders Fixes: #4913 Fixes: #4540 Changes: 1. Added `kIROp_ControlBarrier` to HLSL/GLSL emitting. 2. Added a method to track 'used' and 'unused' varyings for when legalizing GLSL. This allows us to assign correct offsets to automatically added varyings * Added a `ZeroLSB` check to UIntSet for this purpose * add missing return * code comment adjustment * cleanup * comment and HLSL controlBarrier mistake * assume space for glsl/spriv varying is irrelevant
2024-07-30Fixes for Metal ParameterBlock support. (#4752)Yong He
2024-07-30Move SPIRV global variables into a context variable (#4741)ArielG-NV
2024-07-26Allow passing sized array to unsized array parameter. (#4744)Yong He
2024-07-25Overhaul IR lowering of pointer types. (#4710)Yong He
* Overhaul IR lowering of pointer types. * Propagate address space in IRBuilder. * Fixup. * Fix. * Fix. * Change how Ptr type is printed to text. * Fix.
2024-07-24Add generic descriptor indexing intrinsic (#4389)dubiousconst282
* Add ResourceArray intrinsic type * Move aliased parameter generation to GLSL legalization * Add DynamicResourceEntry type for proxying layout of GenericResourceArray * Reimplement as DynamicResource * Add reflection test * Don't reuse alias cache between different parameters * Add dynamic cast extensions for buffer types * Minor format fix * Fix VarDecl diagnostics after finding non-appliable initializer candidates --------- Co-authored-by: Yong He <yonghe@outlook.com>
2024-07-19move autodiff-decoration-stripping-pass so it always runs (#4632)ArielG-NV
Co-authored-by: Yong He <yonghe@outlook.com>
2024-07-18Metal: `Interlocked` (atomic) member function support for buffers (#4655)ArielG-NV
* Metal: `Interlocked` (atomic) member function support for buffers fixes: #4654 fixes: #4481 1. Add `Interlocked` (atomic) member function support for buffers to Metal 2. Fix `__getEquivalentStructuredBuffer` so it works with CPP/Metal targets * add `CompareStore` support * legalize RWByteAddressBuffer to fully replace StructuredBuffer * destroy replaced byte-addr buffer * cleanup as per review and add comment to explain why certain code exists * fix flow of byte-address-buffer replacement * toggle on option to translate byteAddrBuffer to StructuredBuffer * cleanup unused buffers * add treatGetEquivalentStructuredBufferAsGetThis flag to treat getEquivStructuredBuffer as a byteAddressBuffer * comment to explain `treatGetEquivalentStructuredBufferAsGetThis` --------- Co-authored-by: Yong He <yonghe@outlook.com>
2024-07-18Allow CPP/CUDA/Metal to lower/legalize buffer-elements to support ↵ArielG-NV
column_major/row_major. (#4653) * Allow CPP/CUDA/Metal to legalize their buffer-elements. Fixes: #4537 Changes: 1. Matrix inputs require legalization (pack/unpack) to ensure consistent row_major/column_major throughout entire shader, the following enabled legalization pass fixes this. 2. Added missing CUDA intrinsic so CUDA can run more tests. 3. Added a memory packing test since this still fails for cpp/cuda/metal (due to having no memory packing enforcement). * change memory packing tests to run for targets without packing --------- Co-authored-by: Yong He <yonghe@outlook.com>
2024-07-17Use slang-glslang.dll for spirv-validation (#4642)Jay Kwak
* Use slang-glslang.dll for spirv-validation This change replaces the use of "spirv-val.exe" with an API call to "spvtools::SpirvTools::Validate()". Closes #4610
2024-07-16SCCP instead of CFG since SCCP removes code of unused branches, not CFG (#4640)ArielG-NV
2024-07-12use `nullptr' for IRStructKey with `IRDerivativeMemberDecoration` (#4623)ArielG-NV
2024-07-10Implement non member function atomic texture support (#4544)ArielG-NV
* Implement non member function atomic texture support texture_buffer and texture1d Fixes: #4538 Related to: #4291, fixes `tests/compute/atomics-buffer.slang` Texture objects cannot use `__getMetalAtomicRef` to cast objects into atomic value type. [Texture objects mandate use of member functions](https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf#Texture%20Functions). The implementation is as follows: * We can detect texture object usage through checking for an `IRImageSubscript` Operation. `__isTextureAccess()` was added to evaluate if we have an `IRImageSubscript` operation at compile time (before `static_assert`). `__isTextureAccess()` only checks if we are targeting Metal. * We have all parameter data needed to call a texture atomic function embedded inside `IRImageSubscript`. `__extractTextureFromTextureAccess()` and `__extractCoordFromTextureAccess()` was added to extract this data for use with Metal atomics. Note: * Metal documentation has various incorrect details (function names) * Since we currently hardcode metal versions for compiling, the Metal compiler version was changed to target `Metal 3.1` (`slang-gcc-compiler-util.cpp`) * textures do not permit atomic float operations * add fallthrough attribute + fix bug with 'exchange instead of xor' + fix warning bug * incorrect function name fix * missing filecheck * disable atomics-buffer.slang compute test since GFX issue causing it to fail * Array support for metal interlockedAtomic and proper verification of texture with interlockedAtomic functions * Array support for metal interlockedAtomic * proper verification of texture with interlockedAtomic functions note: had to seperate many functions to allow forceInlining to run * missing getOperand(0) * push atomic fix for metal * fix atomic syntax for metal and hlsl emitting extra brackets (breaks tests) * test changes and meta changes 1. max is 8 rw textures with metal because Metal has this limit. Split up tests to not hit this limit 2. added back `[0]`...,`T` to test since this legalizes metal atomic intrinsic * macro'ify some of the atomic code 1. addresses review 2. makes code easier to modify in the future (rather than sifting through 1000 lines we can just look at ~10-30 * fix test 'check' * missing float support due to macro * add functions macro generates, `InternalAtomicOperationInfo` --------- Co-authored-by: Yong He <yonghe@outlook.com>
2024-06-27Remove returned-array-legalization pass for metal (#4478)ArielG-NV
* disable return array optimization pass for metal targets fixes: #4468
2024-06-26Expand upon existing `ImageSubscript` support (Metal, GLSL, SPIRV) (#4408)ArielG-NV
* Add additional `ImageSubscript` features: 1. Added ImageSubscript support for Metal & a test case * Merge GLSL/SPIRV/Metal `ImageSubscript` legalization pass 2. Added multisample support to glsl/spirv/metal for when using ImageSubscript * Added in this PR since the overhaul of the code merges together GLSL/SPIRV/Metal implementation 3. Fixed minor metal texture `Load`/`Read` bugs * [HLSL methods of access do not support subscript accessor for texture cube array](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/texturecubearray) * removed swizzling of uint/int/float * other odd bugs which were causing compile errors note: Compute tests do not work due to what seems to be the GFX backend (causes crash without error report). The tests are disabled. * disable LOD with texture 1d seems that LOD for 1d textures need to be a compile time constant as per an error metal throws * syntax error in hlsl.meta * static_assert alone with intrinsic_asm error provides cleaner errors Note: `static_assert` seems to be unstable and not be fully respected (still require `intrinsic_asm` to avoid a stdlib compile error) * change comment to `// lod is not supported for 1D texture * add `static_assert` in related code gen paths * address review * address review * add asserts as per review comment, NOTE: unclear if these should be release 'asserts' as well
2024-06-13Remove `IRHLSLExportDecoration` and `IRKeepAliveDecoration` for ↵Sai Praveen Bangaru
non-CUDA/Torch targets (#4364) * Remove `IRHLSLExportDecoration` and `IRKeepAliveDecoration` for non-CUDA/Torch targets * Update hlsl-torch-cross-compile.slang
2024-06-12Add option to preserve shader parameter declaration in output SPIRV. (#4344)Yong He
* Add option to preserve shader parameter declarations in output. * Add test.
2024-06-10Address glslang ordering requirments for 'derivative_group_*NV' (#4323)ArielG-NV
* Address glslang ordering requirments for 'derivative_group_*NV' fixes: #4305 The solution is to emit some `layout`s after a module source is emitted. Added to slangs gfx backend code to enable the compute shader derivative extension for testing purposes. * address review * enable removed test --------- Co-authored-by: Yong He <yonghe@outlook.com>
2024-06-10Partial implementation of static_assert (#4294)Jay Kwak
* Error out for types not supported by texture sample functions This commit prints errors with a new keyword, `static_assert`, when the given texture type is not supported for the target. * Moving the check to linkAndOptimizeIR after specialization is done * Remove unnecessary change * Adding test * Remove kIROp_StaticAssert once processed * Do not remove StaticAssert because it is needed for the next specialization * Remove after iteration of child is done --------- Co-authored-by: Yong He <yonghe@outlook.com>
2024-05-29Improve compile time performance. (#3857)Yong He
* Handle type check cache update on extensions more gracefully. * Correctness fix. * Cache implcit cast overload resolution results. * Fix. * More optimizations. * Cache implicit default ctor resolution. * Disable redundancy removal. * Fix. * Fix test. * Fix. * Correctness fix. * Fix. * Fix, * Fix test. * Small tweak.
2024-05-29Add options to speedup compilation. (#4240)Yong He
* Add options to speedup compilation. * Fix. * Plumb options to DCE pass. * Revert debug change. * Fix regressions. * More optimizations. * more cleanup and fixes. * remove comment. * Fixes. * Another fix. * Fix errors. * Fix errors. * Add comments.
2024-05-17Add `-minimum-slang-optimization` to favor compile time. (#4186)Yong He
2024-05-16RasterizerOrder resource for spirv and metal. (#4175)Yong He
* RasterizerOrder resource for spirv and metal. Also fixes the byte address buffer logic for metal. * Fix. * Delete commented lines. --------- Co-authored-by: Jay Kwak <82421531+jkwak-work@users.noreply.github.com>
2024-05-14Support combined textures for Metal target (#4169)Jay Kwak
2024-05-13Add LoadAligned and StoreAligned methods to ByteAddressBuffers (#4066)Sriram Murali
Fixes #4062 This change enables wide load/stores for byte-address-buffer backed resources, when the data is accessed at an offset that is aligned. **Goals** - Improve performance by issuing wider instructions instead of sequence of scalar instructions, for load and stores of byte-address buffers. - Reduce code-size and readability of the generated shaders. - Help naive users as well as ninja programmers, generate optimal code. **Non Goals** - Help with Structured buffers, or other resources. - Target compilation time improvements. **Key changes** Adds 2 new overloads for Load and Store operations on ByteAddress Buffers. 1. Load / Store with an extra alignment parameter ``` resource.Load<T>(offset, alignment); resource.Store<T>(offset, value, alignment); ``` 2. LoadAligned / StoreAligned with no extra parameter, with the same signature as orignial Load / Store. ``` resource.LoadAligned<T>(offset); resource.StoreAligned<T>(offset, value); ``` - This overload will implicitly identify the alignment value, from the base type T of the elementary unit of the resource. **Supported resources** 1. Vectors This can be upto 4 elements, i.e. float -- float4. 2. Arrays This does not have a limit on number of elements, but on a conservative estimate, we can limit to few hundreds. 3. Structures This is used to group a resource of a single type. ``` struct { float4 x; } ``` **Code updates** - Modified byte-address-ir legalize to handle struct, array and vector kinds of load or store access - Added custom hlsl stdlib functions to implement all the overloads for Load, Store etc. - Added C-like emitter, SPIR-V emitter for handling ByteAddressBuffers. - Added a new core stdlib function intrinsic to wrap around alignOf<T>(). - Added a new peephole optimization entry to identify the equivalent IntLiteral value from the alignOf<T>() inst. - Added tests to check explicit, and implicit aligned Load and Store operations.
2024-05-06Delete `wrap-global-context` pass. (#4114)Yong He
* Delete `wrap-global-context` pass. The pass was added for the metal backend without realizing that the existing `explicit-global-context` does 99% of the job. Instead of duplicating the logic in a different pass for metal, we extend same explicit-global-context pass to work for metal. * Fix build.
2024-04-30Added diagnostics & built-in type lowering for `[CUDAKernel]` functions (#4042)Sai Praveen Bangaru
* Added diagnostics & built-in type lowering for `[CUDAKernel]` functions This PR adds - Diagnostics for non-void return from a cuda kernel entry point - Diagnostics for using differentiable types in a differentiable cuda kernel entry point - Logic for converting built-in types (float3, float3x3, etc..) to portable struct types and unpacks the parameter back into a built-in type on the CUDA side. This is because built-in types have different implementations in CUDA & CPP targets, which causes signature mis-match when linking. * Fix error codes * Add ability to lower structs and arrays that contain built-in types. + Added tests + Fix issue where the host-side was not marshalling data to lowered types. * Update slang-ir-pytorch-cpp-binding.cpp --------- Co-authored-by: Yong He <yonghe@outlook.com>