summaryrefslogtreecommitdiff
path: root/source/slang/slang-emit.cpp
AgeCommit message (Collapse)Author
2024-07-10Implement non member function atomic texture support (#4544)ArielG-NV
* Implement non member function atomic texture support texture_buffer and texture1d Fixes: #4538 Related to: #4291, fixes `tests/compute/atomics-buffer.slang` Texture objects cannot use `__getMetalAtomicRef` to cast objects into atomic value type. [Texture objects mandate use of member functions](https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf#Texture%20Functions). The implementation is as follows: * We can detect texture object usage through checking for an `IRImageSubscript` Operation. `__isTextureAccess()` was added to evaluate if we have an `IRImageSubscript` operation at compile time (before `static_assert`). `__isTextureAccess()` only checks if we are targeting Metal. * We have all parameter data needed to call a texture atomic function embedded inside `IRImageSubscript`. `__extractTextureFromTextureAccess()` and `__extractCoordFromTextureAccess()` was added to extract this data for use with Metal atomics. Note: * Metal documentation has various incorrect details (function names) * Since we currently hardcode metal versions for compiling, the Metal compiler version was changed to target `Metal 3.1` (`slang-gcc-compiler-util.cpp`) * textures do not permit atomic float operations * add fallthrough attribute + fix bug with 'exchange instead of xor' + fix warning bug * incorrect function name fix * missing filecheck * disable atomics-buffer.slang compute test since GFX issue causing it to fail * Array support for metal interlockedAtomic and proper verification of texture with interlockedAtomic functions * Array support for metal interlockedAtomic * proper verification of texture with interlockedAtomic functions note: had to seperate many functions to allow forceInlining to run * missing getOperand(0) * push atomic fix for metal * fix atomic syntax for metal and hlsl emitting extra brackets (breaks tests) * test changes and meta changes 1. max is 8 rw textures with metal because Metal has this limit. Split up tests to not hit this limit 2. added back `[0]`...,`T` to test since this legalizes metal atomic intrinsic * macro'ify some of the atomic code 1. addresses review 2. makes code easier to modify in the future (rather than sifting through 1000 lines we can just look at ~10-30 * fix test 'check' * missing float support due to macro * add functions macro generates, `InternalAtomicOperationInfo` --------- Co-authored-by: Yong He <yonghe@outlook.com>
2024-06-27Remove returned-array-legalization pass for metal (#4478)ArielG-NV
* disable return array optimization pass for metal targets fixes: #4468
2024-06-26Expand upon existing `ImageSubscript` support (Metal, GLSL, SPIRV) (#4408)ArielG-NV
* Add additional `ImageSubscript` features: 1. Added ImageSubscript support for Metal & a test case * Merge GLSL/SPIRV/Metal `ImageSubscript` legalization pass 2. Added multisample support to glsl/spirv/metal for when using ImageSubscript * Added in this PR since the overhaul of the code merges together GLSL/SPIRV/Metal implementation 3. Fixed minor metal texture `Load`/`Read` bugs * [HLSL methods of access do not support subscript accessor for texture cube array](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/texturecubearray) * removed swizzling of uint/int/float * other odd bugs which were causing compile errors note: Compute tests do not work due to what seems to be the GFX backend (causes crash without error report). The tests are disabled. * disable LOD with texture 1d seems that LOD for 1d textures need to be a compile time constant as per an error metal throws * syntax error in hlsl.meta * static_assert alone with intrinsic_asm error provides cleaner errors Note: `static_assert` seems to be unstable and not be fully respected (still require `intrinsic_asm` to avoid a stdlib compile error) * change comment to `// lod is not supported for 1D texture * add `static_assert` in related code gen paths * address review * address review * add asserts as per review comment, NOTE: unclear if these should be release 'asserts' as well
2024-06-13Remove `IRHLSLExportDecoration` and `IRKeepAliveDecoration` for ↵Sai Praveen Bangaru
non-CUDA/Torch targets (#4364) * Remove `IRHLSLExportDecoration` and `IRKeepAliveDecoration` for non-CUDA/Torch targets * Update hlsl-torch-cross-compile.slang
2024-06-12Add option to preserve shader parameter declaration in output SPIRV. (#4344)Yong He
* Add option to preserve shader parameter declarations in output. * Add test.
2024-06-10Address glslang ordering requirments for 'derivative_group_*NV' (#4323)ArielG-NV
* Address glslang ordering requirments for 'derivative_group_*NV' fixes: #4305 The solution is to emit some `layout`s after a module source is emitted. Added to slangs gfx backend code to enable the compute shader derivative extension for testing purposes. * address review * enable removed test --------- Co-authored-by: Yong He <yonghe@outlook.com>
2024-06-10Partial implementation of static_assert (#4294)Jay Kwak
* Error out for types not supported by texture sample functions This commit prints errors with a new keyword, `static_assert`, when the given texture type is not supported for the target. * Moving the check to linkAndOptimizeIR after specialization is done * Remove unnecessary change * Adding test * Remove kIROp_StaticAssert once processed * Do not remove StaticAssert because it is needed for the next specialization * Remove after iteration of child is done --------- Co-authored-by: Yong He <yonghe@outlook.com>
2024-05-29Improve compile time performance. (#3857)Yong He
* Handle type check cache update on extensions more gracefully. * Correctness fix. * Cache implcit cast overload resolution results. * Fix. * More optimizations. * Cache implicit default ctor resolution. * Disable redundancy removal. * Fix. * Fix test. * Fix. * Correctness fix. * Fix. * Fix, * Fix test. * Small tweak.
2024-05-29Add options to speedup compilation. (#4240)Yong He
* Add options to speedup compilation. * Fix. * Plumb options to DCE pass. * Revert debug change. * Fix regressions. * More optimizations. * more cleanup and fixes. * remove comment. * Fixes. * Another fix. * Fix errors. * Fix errors. * Add comments.
2024-05-17Add `-minimum-slang-optimization` to favor compile time. (#4186)Yong He
2024-05-16RasterizerOrder resource for spirv and metal. (#4175)Yong He
* RasterizerOrder resource for spirv and metal. Also fixes the byte address buffer logic for metal. * Fix. * Delete commented lines. --------- Co-authored-by: Jay Kwak <82421531+jkwak-work@users.noreply.github.com>
2024-05-14Support combined textures for Metal target (#4169)Jay Kwak
2024-05-13Add LoadAligned and StoreAligned methods to ByteAddressBuffers (#4066)Sriram Murali
Fixes #4062 This change enables wide load/stores for byte-address-buffer backed resources, when the data is accessed at an offset that is aligned. **Goals** - Improve performance by issuing wider instructions instead of sequence of scalar instructions, for load and stores of byte-address buffers. - Reduce code-size and readability of the generated shaders. - Help naive users as well as ninja programmers, generate optimal code. **Non Goals** - Help with Structured buffers, or other resources. - Target compilation time improvements. **Key changes** Adds 2 new overloads for Load and Store operations on ByteAddress Buffers. 1. Load / Store with an extra alignment parameter ``` resource.Load<T>(offset, alignment); resource.Store<T>(offset, value, alignment); ``` 2. LoadAligned / StoreAligned with no extra parameter, with the same signature as orignial Load / Store. ``` resource.LoadAligned<T>(offset); resource.StoreAligned<T>(offset, value); ``` - This overload will implicitly identify the alignment value, from the base type T of the elementary unit of the resource. **Supported resources** 1. Vectors This can be upto 4 elements, i.e. float -- float4. 2. Arrays This does not have a limit on number of elements, but on a conservative estimate, we can limit to few hundreds. 3. Structures This is used to group a resource of a single type. ``` struct { float4 x; } ``` **Code updates** - Modified byte-address-ir legalize to handle struct, array and vector kinds of load or store access - Added custom hlsl stdlib functions to implement all the overloads for Load, Store etc. - Added C-like emitter, SPIR-V emitter for handling ByteAddressBuffers. - Added a new core stdlib function intrinsic to wrap around alignOf<T>(). - Added a new peephole optimization entry to identify the equivalent IntLiteral value from the alignOf<T>() inst. - Added tests to check explicit, and implicit aligned Load and Store operations.
2024-05-06Delete `wrap-global-context` pass. (#4114)Yong He
* Delete `wrap-global-context` pass. The pass was added for the metal backend without realizing that the existing `explicit-global-context` does 99% of the job. Instead of duplicating the logic in a different pass for metal, we extend same explicit-global-context pass to work for metal. * Fix build.
2024-04-30Added diagnostics & built-in type lowering for `[CUDAKernel]` functions (#4042)Sai Praveen Bangaru
* Added diagnostics & built-in type lowering for `[CUDAKernel]` functions This PR adds - Diagnostics for non-void return from a cuda kernel entry point - Diagnostics for using differentiable types in a differentiable cuda kernel entry point - Logic for converting built-in types (float3, float3x3, etc..) to portable struct types and unpacks the parameter back into a built-in type on the CUDA side. This is because built-in types have different implementations in CUDA & CPP targets, which causes signature mis-match when linking. * Fix error codes * Add ability to lower structs and arrays that contain built-in types. + Added tests + Fix issue where the host-side was not marshalling data to lowered types. * Update slang-ir-pytorch-cpp-binding.cpp --------- Co-authored-by: Yong He <yonghe@outlook.com>
2024-04-30Generate vectorized version of byteaddress load/store methods (#4036)Sriram Murali
Fixes #3533 - Add logic to perform aligned memory operations for loading from and storing to composite resources, like vectors within the ByteAddress legalize pass. - Checks Added a new test for byte address with/without alignment. --------- Co-authored-by: Yong He <yonghe@outlook.com>
2024-04-30Metal: Vertex/Fragment builtin and layouts. (#4044)Yong He
* Metal: Vertex/Fragment builtin and layouts. * Fix. * Fix test. * Emit user semantic on vertex/fragment attributes.
2024-04-26WIP: Force Inline If RefType (#4005)ArielG-NV
* Force Inline if reftype Fixes #3997. If we are using a refType, we now ForceInline. remarks: 1. Modifications were made in slang-ir-glsl-legalize to change how we translate GlobalParam proxy's into GlobalParam. a. We now handle the senario where a globalParam is used in multiple disjoint blocks (like 2 different functions). * try to figure out why CI fails but local works try to inline DispatchMesh, works locally, may fail on CI(?) * try another fix * add task tests + don't allow semi-early task-shader inline Task shader uses DispatchMesh which is a very big 'hack' where we check for the function name and modify the callees in very large ways. This function does inline, but it cannot inline early due to future mangling that this operation requires todo. This is reflected with the `[noRefInline]` modifier. It is a modifier so users may stop mandatory inlines with `__ref` parameter.
2024-04-24Parameter layout and reflection for Metal bindings. (#4022)Yong He
2024-04-23Switch to direct-to-spirv backend as default. (#4002)Yong He
* Switch to direct-to-spirv backend as default. * Fix slang-test. * Fix. * Fix.
2024-04-22bit_cast & reinterpret warning if src->dst type not equally sized. (#3988)ArielG-NV
* bit_cast & reinterpret warning if src->dst type not equally sized. bit_cast & reinterpret warning if src->dst type not equally sized. --------- Co-authored-by: Yong He <yonghe@outlook.com>
2024-04-18Metal: rewrite global variables as explicit context. (#3981)Yong He
* Metal: rewrite global variables as explicit context. * Small tweaks.
2024-04-17Support combined texture sampler when targeting HLSL. (#3963)Yong He
* Support combined texture sampler when targeting HLSL. * Fix glsl intrinsics. * Update source/slang/slang-ir-lower-combined-texture-sampler.cpp Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com> * Update source/slang/slang-ir-lower-combined-texture-sampler.cpp Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com> * Update source/slang/slang-ir-lower-combined-texture-sampler.cpp Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com> * Fix., * Enhance test. * Remove unused field. * Fix indentation --------- Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com>
2024-04-17Add skeleton for metal backend. (#3971)Yong He
2024-04-11WIP: Fix the variable scope issue (#3838) (#3892)kaizhangNV
* Fix the variable scope issue (#3838) In the IR optimization pass, we turn all the loop to do-while loop form. But in the do-while loop form, the loop body block is dominating the blocks after the loop break block. This assumption is fine for SPIRV and IR code, however, it's incorrect for all the other language target (e.g. c/c++/cuda/glsl/hlsl) because the instructions defined in the loop body is invisible from outside of the loop. Therefore, when translating to other textual language, there could be issue for the variables scope. To fix this issue, we first detect the instructions that are defined inside the loop block, then check if these instructions are used after the break block. If so, we duplicate these instructions right before their users such that we can make those instructions available globally. * Update slang vcxproj file because of add new source files * Minor fix - Update the method to get the block of an instruction - Avoid query the hash-map twice by using "add" method directly. * Reduce complexity In searching loop region blocks, we don't actually need to traverse the instructions. Instead, we only have to check each block to see if it's in a loop region, and hash such block for later on processing. So we can remove one level of loop. In the second pass, we can use that hash to filter out the blocks that are not in the loop region, and only process the instructions inside the loop region. Add description for the new fix-up pass declared in slang-ir-variable-scope-correction.h. * Categorize the unstorable and storable instructs 1. When checking the loop regions, there could be multi-levels nested loops, so we should use a list to store the loopHeaders. 2. Categorize the instructs based on storable and non-storable, because we only have to duplicate the non-storable instructs. Note pointer type instruct is also belonged to non-storable class because we can not store a pointer in local variable. * Fix some test failure * Fix test failures * Recursively process the operands Besides process the out-of-scope instruction, we have to also process all the operands of this instructions. Therefore, we have to make the process logic recursive until all the involved instructions are accessible. * Change how to check storable type * Add target checking for CPP/CUDA In decide whether the type is storable, add target checking for CPP/CUDA as they can store any types. Cleanup the code to remove those debug log prints. * Addressing feedbacks Address some feedbacks. Change the depth-first traverse to breadth-first traverse when processing instruction and its operands. * Minor fix for the variable names
2024-04-03Legalization of non-struct when function expects struct, resolves #3840 (#3880)ArielG-NV
* Legalization of non-struct when expects struct. `__forceVarIntoStructTemporarily()` solves the issue of passing "non-struct type's" into a parameter that only accepts "struct type's". The intrinsic solves the issue through checking the parameter of the intrinsic: If the parameter is a "struct type" * Return a reference to the parameter else * a "struct type" Temporary variable is made and the "non struct type" parameter is copied to a member of this struct. This struct is then returned by `__forceVarIntoStructTemporarily()`. Optionally if the use location of this call is a argument which can have side effects (out, inout, ref, etc.) the temporary struct variable is copied into the original "non struct type" parameter. Testing code has "addComplexity" functions to avoid optimizations through forcing side effects so we can predict the code output. * Address review comments - ForceInline ray functions - fix testing - adjust how we replace operands in senarios to avoid unexpected side effects of replacing operands without any explicit checks * Adjust nv test slightly and remove .glsl file * Remove implicit LOD sampling & test additions - Implicit LOD sampling is not allowed in a raygen. Implicit LOD sampling requires depth (from a fragment shader) to sample. Raygen does not have the depth, so this function was replaced. - Changed other tests for correctness/clarity * Test if Falcor breaks through use of ForceInline * Add back force inline may need to look at how Falcor wrote its slang shaders. This will be done if ForceInline causes issues since ForceInline should not affect code gen in an impactable way.
2024-03-18Check cyclic types after specialization. (#3791)Yong He
2024-03-18Fix SPIRV for mesh shaders, checks for invalid target code&recursion. (#3788)Yong He
* Fix #3780. * Fixers #3781. * Add test for #3781. * Diagnose error on unsupported builtin intrinsic types. * Add check for recursion. * Fix. * Fix. * Fix recursion detection. * Fix. * Fix. * Fix recursion logic. * More fix.
2024-03-15Implement raytracing extension(s); resolves #3560 for GLSL & SPIR-V targets ↵ArielG-NV
(#3675) The following PR implements raytracing extensions (GLSL_EXT_ray_tracing, GLSL_EXT_ray_query, GLSL_NV_shader_invocation_reorder & GLSL_NV_ray_tracing_motion_blur); for GLSL & SPIR-V targets. Fully implements all functions, built-in variables, & syntax; resolves #3560 for GLSL & SPIR-V Targets. notes of worth: * __rayPayloadFromLocation, __rayAttributeFromLocation, and __rayCallableFromLocation, were added as SPIR-V Intrinsics to refer to location's of raytracing objects in SPIR-V for when using GLSL syntax.
2024-03-12Make type names spec-conformant in SPIRV reflect. (#3748)Yong He
* Preserve ByteAddressBuffer user type name. * Make user type lowercase. * Make typenames conform to spec. * Use `SpvOpDecorateString`.
2024-03-11Add `-fvk-use-dx-position-w` and fix ExecutionMode ordering for geometry ↵Yong He
shaders. (#3731) * Add `-fvk-use-dx-position-w`. * Fix ordering of OutputVertices and output primitive type decoration in spirv. * Fix. * fix * Fix. * Move test around.
2024-03-08Enhance link-time type test. (#3724)Yong He
* Enhance link-time type test. * Fix. * Fix.
2024-03-07Uniformity analysis. (#3704)Yong He
* Uniformity analysis. * Add [NonUniformReturn] decorations to some hlsl intrinsic functions.
2024-03-04Extend `as` and `is` operator to work on generic types. (#3672)Yong He
2024-02-20Refactor compiler option representations. (#3598)Yong He
* Refactor compiler option representation. * Fix binary compatibility. * Add a test for specifying compiler options at link time. * Fix binary compatibility. * Fix binary compatibility. * Fix backward compatibility on matrix layout. * Fix. * Fix. * Fix. * Fix gfx. * Fix gfx. * Fix dynamic dispatch. * Polish.
2024-02-02GLSL Passthrough support for SSBO types (#3446)Ellie Hermaszewska
* GLSL Passthrough support for SSBO types * GLSL Passthrough support for SSBO types * Correctly apply glsl local size layout to entry points during lowering * Test for glsl layout correctness * typo * Reflect GLSL SSBO as raw buffers * Functional test for glsl ssbo * Allow allow glsl for render tests * Functional test for ssbo passthrough * Functional test for ssbo passthrough with spirv-direct * fix windows build error --------- Co-authored-by: Yong He <yonghe@outlook.com>
2024-01-23SPIRV Legalization fixes. (#3479)Yong He
* Fix CFG legalization for SPIRV backend. * Emit DepthReplacing execution mode. * Fix do-while lowering. --------- Co-authored-by: Yong He <yhe@nvidia.com>
2023-12-30Fix the intrinsic expansion of ObjectToWorld3x4 in spirv_asm. Data type (#3428)Pankaj Mistry
2023-12-15Add ConstBufferPointer::subscript. (#3415)Yong He
Co-authored-by: Yong He <yhe@nvidia.com>
2023-12-15GLSL SSBO Support (#3400)Ellie Hermaszewska
* Squash warnings and fix build with SLANG_EMBED_STDLIB * Add GLSLShaderStorageBuffer magic wrapper * Make GLSLSSBO not a uniform type * Buffers are global variables * Allow creating ssbo aggregate types * Allow reading from RWSB using builder * Nicer debug printing for ssbos * Lower SSBO to RWSB * Parse interface blocks into wrapped structs * Lower Interface Block Decls to structs * remove comment * Two simple ssbo tests * Move ssbo pass earlier * Correct mutable buffer detection * Do not replace ssbo usages outside of blocks * Treat GLSLSSBO as a mutable buffer for type layouts * regenerate vs projects * Correctly detect ssbo types * Diagnose illegal ssbo * remove unreachable code * neaten * ci wobble * Make GLSLSSBO ast handling more uniform * Add modifier cases for glsl * Use empty val info for unhandled interface blocks necessary for ./tests/glsl/out-binding-redeclaration.slang * more sophisticated modifier check * Correct ssbo wrapper name
2023-12-14Looks like `#3327` left in some debugging code. (#3411)jsmall-nvidia
2023-12-13Fix GLSL static initialization bug. (#3409)Yong He
* Fix GLSL static initialization bug. Fixes #3408. * Update comment. * Fold global var initializer as an expression if possible. --------- Co-authored-by: Yong He <yhe@nvidia.com>
2023-11-16Unify stdlib `Texture` types into one generic type. (#3327)Yong He
* Unify Texture types in stdlib into 1 generic type. * Fixes. * Fix. * Fixes. * Fix reflection. * Fix binding reflection. * Add gather intrinsics. * Fix gather intrinsics. * Fix texture type toText. * Fix intrinsic. * fix cuda intrinsic. * Fix project files. * cleanup. * Fix. * Fix. * Fix sampler feedback test. * Fix getDimension intrinsics. * Fix spirv sample image intrinsics. * Fix test. * Fix GLSL intrinsic. * Cleanup. --------- Co-authored-by: Yong He <yhe@nvidia.com>
2023-11-14Add GLSL Compatibility. (#3321)Yong He
* Parse glsl buffer blocks to GLSLInterfaceBlockDecl * Parse glsl local size layout declarations * Parse (and ignore) glsl version directives * spelling * Better l-value interpretation for glsl interface blocks * Better l-value interpretation for glsl interface blocks * Add compile flag for enabling glsl * Parse and ignore precision modifiers. * Automatically import `glsl` module for compatiblity. * Complete vector and matrix types for glsl * Remove generated file from repo * Bump .gitignore * do not mark out globals as params * Synthesize entrypoint layout from global inout vars. * update test result. * Allow HLSL semantic on global variables. * Fix. * Fix test. * Fix win32 compile error. * Add more builtin input/output and texture intrinsics. * Add struct/array constructor syntax. * Skip `#extension` lines. * overide operator * for matrix/vector multiplication. * Add `matrixCompMult`. * Parse modifiers in for loop init var declr. * Add more glsl intrinsics, add stage into to var layout. * Allow `int[3] x` syntax. * Fix array type syntax. --------- Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com> Co-authored-by: Yong He <yhe@nvidia.com>
2023-10-11Small warnings and bugs (#3272)Ellie Hermaszewska
* Correctly use removeTrivialSingleIterationLoops during simplification * remove unused variables * Fix invalid fallthrough --------- Co-authored-by: Yong He <yonghe@outlook.com>
2023-10-11Report spirv-opt time. (#3271)Yong He
* Report spirv-opt time. * Removing timing logic in `loadModule`. --------- Co-authored-by: Yong He <yhe@nvidia.com>
2023-10-09Run curated spirv-opt passes through slang-glslang. (#3266)Yong He
* Run curated spirv-opt passes through slang-glslang. * Cleanup. * Replace spirv-dis downstream compiler with glslang. * delete slang-spirv-opt.cpp. --------- Co-authored-by: Yong He <yhe@nvidia.com>
2023-10-05Various AD Fixes (#3263)Sai Praveen Bangaru
* Various fixes * Remove unused parameter * Update slang-ir-loop-unroll.cpp --------- Co-authored-by: Yong He <yonghe@outlook.com>
2023-10-04SPIRV compiler performance fixes. (#3258)Yong He
* SPIRV compiler performance fixes. * Cleanup. * update project files * Cleanup debug code. * Make redundancy removal non-recursive. --------- Co-authored-by: Yong He <yhe@nvidia.com>
2023-09-27WIP Mesh shaders for SPIR-V (#3226)Ellie Hermaszewska
* SPIR-V impl for SetMeshOutputCounts and DispatchMesh * Unsightly fix for legalization ordering differences between GLSL and SPIR-V * spelling * Start a new block after terminating one in the OpEmitMeshTasksExt SPIR-V asm block * Emit mesh shader decorations in SPIR-V * Mesh and task shader stages for spir-v * Output explicit gl builtins for spir-v * Be more hygenic when SOAizing mesh outputs * Do not create builtin paramter block for spirv mesh outputs * Pass mesh payloads around by ref * comment * less expected failure * remove unused * Add spirv op * Correct type query for default flat modifier --------- Co-authored-by: Yong He <yonghe@outlook.com>