slang.git - Making it easier to work with shaders

	Commit message	Author	Age
...
*	Add options to speedup compilation. (#4240)	Yong He	2024-05-29
	* Add options to speedup compilation.
	* Fix.
	* Plumb options to DCE pass.
	* Revert debug change.
	* Fix regressions.
	* More optimizations.
	* more cleanup and fixes.
	* remove comment.
	* Fixes.
	* Another fix.
	* Fix errors.
	* Fix errors.
	* Add comments.
*	Add `-minimum-slang-optimization` to favor compile time. (#4186)	Yong He	2024-05-17
*	RasterizerOrder resource for spirv and metal. (#4175)	Yong He	2024-05-16
	* RasterizerOrder resource for spirv and metal. Also fixes the byte address buffer logic for metal.
	* Fix.
	* Delete commented lines.
	---------
	Co-authored-by: Jay Kwak <82421531+jkwak-work@users.noreply.github.com>
*	Support combined textures for Metal target (#4169)	Jay Kwak	2024-05-14
*	Add LoadAligned and StoreAligned methods to ByteAddressBuffers (#4066)	Sriram Murali	2024-05-13
	Fixes #4062
	This change enables wide loads/stores for byte-address-buffer backed resources when the data is accessed at an aligned offset.
	Goals
	- Improve performance by issuing wider instructions instead of a sequence of scalar instructions for loads and stores of byte-address buffers.
	- Reduce code size and improve readability of the generated shaders.
	- Help naive users as well as ninja programmers generate optimal code.
	Non Goals
	- Help with structured buffers or other resources.
	- Target compilation time improvements.
	Key changes
	Adds 2 new overloads for Load and Store operations on ByteAddress buffers.
	1. Load / Store with an extra alignment parameter
	```
	resource.Load<T>(offset, alignment);
	resource.Store<T>(offset, value, alignment);
	```
	2. LoadAligned / StoreAligned with no extra parameter, with the same signature as the original Load / Store.
	```
	resource.LoadAligned<T>(offset);
	resource.StoreAligned<T>(offset, value);
	```
	- This overload implicitly derives the alignment value from the base type T of the elementary unit of the resource.
	Supported resources
	1. Vectors: up to 4 elements, i.e. float -- float4.
	2. Arrays: no limit on the number of elements, but as a conservative estimate we can limit it to a few hundred.
	3. Structures: used to group a resource of a single type.
	```
	struct { float4 x; }
	```
	Code updates
	- Modified byte-address-ir legalize to handle struct, array and vector kinds of load or store access
	- Added custom hlsl stdlib functions to implement all the overloads for Load, Store etc.
	- Added C-like emitter, SPIR-V emitter for handling ByteAddressBuffers.
	- Added a new core stdlib function intrinsic to wrap around alignOf<T>().
	- Added a new peephole optimization entry to identify the equivalent IntLiteral value from the alignOf<T>() inst.
	- Added tests to check explicit and implicit aligned Load and Store operations.
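The idea behind #4066 (one wide access when the offset is suitably aligned, scalar accesses otherwise) can be modelled outside the compiler. The following Python sketch is only an illustration under assumptions: `load_vector`, its parameters, and the 4-byte scalar fallback are hypothetical and do not reflect Slang's actual legalization code.

```python
import struct

def load_vector(buf: bytes, offset: int, count: int, alignment: int = 4):
    """Toy model: load `count` 32-bit floats from `buf`.

    Returns (values, number_of_load_operations). Hypothetical helper,
    not part of Slang.
    """
    size = 4 * count
    if alignment >= size and offset % alignment == 0:
        # The whole vector is covered by the guaranteed alignment:
        # a single wide load is enough.
        return list(struct.unpack_from(f"<{count}f", buf, offset)), 1
    # Otherwise fall back to one scalar load per element.
    values = [struct.unpack_from("<f", buf, offset + 4 * i)[0] for i in range(count)]
    return values, count

# Eight floats 0.0 .. 7.0 packed little-endian.
data = struct.pack("<8f", *range(8))
wide = load_vector(data, 16, 4, alignment=16)  # aligned: one wide load
scalar = load_vector(data, 4, 4)               # default alignment: four scalar loads
```

Both calls read four floats; only the number of load operations differs, which is the saving the commit message describes.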
*	Delete `wrap-global-context` pass. (#4114)	Yong He	2024-05-06
	* Delete `wrap-global-context` pass. The pass was added for the metal backend without realizing that the existing `explicit-global-context` does 99% of the job. Instead of duplicating the logic in a different pass for metal, we extend the same explicit-global-context pass to work for metal.
	* Fix build.
*	Added diagnostics & built-in type lowering for `[CUDAKernel]` functions (#4042)	Sai Praveen Bangaru	2024-04-30
	* Added diagnostics & built-in type lowering for `[CUDAKernel]` functions
	This PR adds
	- Diagnostics for non-void return from a cuda kernel entry point
	- Diagnostics for using differentiable types in a differentiable cuda kernel entry point
	- Logic for converting built-in types (float3, float3x3, etc.) to portable struct types and unpacking the parameter back into a built-in type on the CUDA side. This is because built-in types have different implementations in CUDA & CPP targets, which causes a signature mismatch when linking.
	* Fix error codes
	* Add ability to lower structs and arrays that contain built-in types. + Added tests + Fix issue where the host-side was not marshalling data to lowered types.
	* Update slang-ir-pytorch-cpp-binding.cpp
	---------
	Co-authored-by: Yong He <yonghe@outlook.com>
*	Generate vectorized version of byteaddress load/store methods (#4036)	Sriram Murali	2024-04-30
	Fixes #3533
	- Add logic to perform aligned memory operations for loading from and storing to composite resources, like vectors, within the ByteAddress legalize pass.
	- Checks: added a new test for byte address with/without alignment.
	---------
	Co-authored-by: Yong He <yonghe@outlook.com>
*	Metal: Vertex/Fragment builtin and layouts. (#4044)	Yong He	2024-04-30
	* Metal: Vertex/Fragment builtin and layouts.
	* Fix.
	* Fix test.
	* Emit user semantic on vertex/fragment attributes.
*	WIP: Force Inline If RefType (#4005)	ArielG-NV	2024-04-26
	* Force Inline if reftype
	Fixes #3997. If we are using a refType, we now ForceInline.
	remarks:
	1. Modifications were made in slang-ir-glsl-legalize to change how we translate GlobalParam proxies into GlobalParam.
	a. We now handle the scenario where a globalParam is used in multiple disjoint blocks (like 2 different functions).
	* try to figure out why CI fails but local works
	try to inline DispatchMesh, works locally, may fail on CI(?)
	* try another fix
	* add task tests + don't allow semi-early task-shader inline
	Task shader uses DispatchMesh which is a very big 'hack' where we check for the function name and modify the callees in very large ways. This function does inline, but it cannot inline early due to the future mangling that this operation requires. This is reflected with the `[noRefInline]` modifier. It is a modifier so users may stop mandatory inlines with `__ref` parameter.
*	Parameter layout and reflection for Metal bindings. (#4022)	Yong He	2024-04-24
*	Switch to direct-to-spirv backend as default. (#4002)	Yong He	2024-04-23
	* Switch to direct-to-spirv backend as default.
	* Fix slang-test.
	* Fix.
	* Fix.
*	bit_cast & reinterpret warning if src->dst type not equally sized. (#3988)	ArielG-NV	2024-04-22
	* bit_cast & reinterpret warning if src->dst type not equally sized.
	---------
	Co-authored-by: Yong He <yonghe@outlook.com>
*	Metal: rewrite global variables as explicit context. (#3981)	Yong He	2024-04-18
	* Metal: rewrite global variables as explicit context.
	* Small tweaks.
*	Support combined texture sampler when targeting HLSL. (#3963)	Yong He	2024-04-17
	* Support combined texture sampler when targeting HLSL.
	* Fix glsl intrinsics.
	* Update source/slang/slang-ir-lower-combined-texture-sampler.cpp
	Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com>
	* Update source/slang/slang-ir-lower-combined-texture-sampler.cpp
	Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com>
	* Update source/slang/slang-ir-lower-combined-texture-sampler.cpp
	Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com>
	* Fix.
	* Enhance test.
	* Remove unused field.
	* Fix indentation
	---------
	Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com>
*	Add skeleton for metal backend. (#3971)	Yong He	2024-04-17
*	WIP: Fix the variable scope issue (#3838) (#3892)	kaizhangNV	2024-04-11
	* Fix the variable scope issue (#3838)
	In the IR optimization pass, we turn every loop into do-while form. In do-while form, the loop body block dominates the blocks after the loop break block. This is fine for SPIRV and IR code, but it is incorrect for all the other language targets (e.g. c/c++/cuda/glsl/hlsl) because instructions defined in the loop body are invisible outside of the loop. Therefore, when translating to a textual language, variables can end up out of scope.
	To fix this issue, we first detect the instructions that are defined inside the loop block, then check whether these instructions are used after the break block. If so, we duplicate these instructions right before their users so that those instructions are available outside the loop.
	* Update slang vcxproj file because of newly added source files
	* Minor fix
	- Update the method to get the block of an instruction
	- Avoid querying the hash-map twice by using the "add" method directly.
	* Reduce complexity
	When searching loop region blocks, we don't actually need to traverse the instructions. Instead, we only have to check whether each block is in a loop region, and hash such blocks for later processing. So we can remove one level of loop. In the second pass, we can use that hash to filter out the blocks that are not in the loop region, and only process the instructions inside the loop region. Add description for the new fix-up pass declared in slang-ir-variable-scope-correction.h.
	* Categorize the unstorable and storable instructions
	1. When checking the loop regions, there could be multi-level nested loops, so we should use a list to store the loopHeaders.
	2. Categorize the instructions as storable or non-storable, because we only have to duplicate the non-storable instructions. Note that pointer-typed instructions also belong to the non-storable class because we cannot store a pointer in a local variable.
	* Fix some test failure
	* Fix test failures
	* Recursively process the operands
	Besides processing the out-of-scope instruction, we also have to process all the operands of this instruction. Therefore, we have to make the processing logic recursive until all the involved instructions are accessible.
	* Change how to check storable type
	* Add target checking for CPP/CUDA
	When deciding whether the type is storable, add target checking for CPP/CUDA as they can store any type. Clean up the code to remove the debug log prints.
	* Addressing feedback
	Address some feedback. Change the depth-first traversal to a breadth-first traversal when processing an instruction and its operands.
	* Minor fix for the variable names
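The detection step described in #3892 can be sketched on a toy IR. This is a rough Python illustration under stated assumptions: the dictionary-based IR, the block names, and `find_out_of_scope_defs` are all hypothetical, the storable/non-storable distinction is left out, and the real pass lives in the Slang C++ sources.

```python
from collections import deque

def find_out_of_scope_defs(insts, loop_blocks):
    """Toy model of the scope check.

    insts: name -> (defining_block, [operand names])
    loop_blocks: set of block names that belong to a loop body
    Returns user -> list of loop-defined instructions that would have to
    be duplicated right before that user.
    """
    clones = {}
    for user, (block, operands) in insts.items():
        if block in loop_blocks:
            continue  # uses inside the loop already see their operands
        needed = []
        queue = deque(operands)
        while queue:  # breadth-first walk over the operand graph
            name = queue.popleft()
            def_block, def_operands = insts[name]
            if def_block in loop_blocks and name not in needed:
                needed.append(name)
                queue.extend(def_operands)  # operands may be out of scope too
        if needed:
            # For this simple chain, reversing puts definitions before
            # their users; a real pass needs a proper ordering.
            clones[user] = list(reversed(needed))
    return clones

# i is defined before the loop, a and b inside it, r after the break block.
insts = {
    "i": ("entry", []),
    "a": ("loop_body", ["i"]),
    "b": ("loop_body", ["a"]),
    "r": ("after_loop", ["b"]),
}
result = find_out_of_scope_defs(insts, {"loop_body"})
```

Here `r` uses `b`, which in turn depends on `a`; both were defined in the loop body, so both are reported for duplication in front of `r`, while `i` stays untouched.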
*	Legalization of non-struct when function expects struct, resolves #3840 (#3880)	ArielG-NV	2024-04-03
	* Legalization of non-struct when expects struct.
	`__forceVarIntoStructTemporarily()` solves the issue of passing non-struct types into a parameter that only accepts struct types. The intrinsic solves the issue by checking its parameter:
	If the parameter is a struct type
	- Return a reference to the parameter
	else
	- A struct-typed temporary variable is made and the non-struct parameter is copied to a member of this struct. This struct is then returned by `__forceVarIntoStructTemporarily()`. Optionally, if the use location of this call is an argument which can have side effects (out, inout, ref, etc.), the temporary struct variable is copied back into the original non-struct parameter.
	Testing code has "addComplexity" functions to avoid optimizations through forcing side effects so we can predict the code output.
	* Address review comments
	- ForceInline ray functions
	- fix testing
	- adjust how we replace operands in scenarios to avoid unexpected side effects of replacing operands without any explicit checks
	* Adjust nv test slightly and remove .glsl file
	* Remove implicit LOD sampling & test additions
	- Implicit LOD sampling is not allowed in a raygen. Implicit LOD sampling requires depth (from a fragment shader) to sample. Raygen does not have the depth, so this function was replaced.
	- Changed other tests for correctness/clarity
	* Test if Falcor breaks through use of ForceInline
	* Add back force inline
	may need to look at how Falcor wrote its slang shaders. This will be done if ForceInline causes issues since ForceInline should not affect code gen in an impactful way.
*	Check cyclic types after specialization. (#3791)	Yong He	2024-03-18
*	Fix SPIRV for mesh shaders, checks for invalid target code&recursion. (#3788)	Yong He	2024-03-18
	* Fix #3780.
	* Fixes #3781.
	* Add test for #3781.
	* Diagnose error on unsupported builtin intrinsic types.
	* Add check for recursion.
	* Fix.
	* Fix.
	* Fix recursion detection.
	* Fix.
	* Fix.
	* Fix recursion logic.
	* More fix.
*	Implement raytracing extension(s); resolves #3560 for GLSL & SPIR-V targets (#3675)	ArielG-NV	2024-03-15
	This PR implements raytracing extensions (GLSL_EXT_ray_tracing, GLSL_EXT_ray_query, GLSL_NV_shader_invocation_reorder & GLSL_NV_ray_tracing_motion_blur) for GLSL & SPIR-V targets. Fully implements all functions, built-in variables, & syntax; resolves #3560 for GLSL & SPIR-V targets.
	Notes:
	* __rayPayloadFromLocation, __rayAttributeFromLocation, and __rayCallableFromLocation were added as SPIR-V intrinsics to refer to the locations of raytracing objects in SPIR-V when using GLSL syntax.
*	Make type names spec-conformant in SPIRV reflect. (#3748)	Yong He	2024-03-12
	* Preserve ByteAddressBuffer user type name.
	* Make user type lowercase.
	* Make typenames conform to spec.
	* Use `SpvOpDecorateString`.
*	Add `-fvk-use-dx-position-w` and fix ExecutionMode ordering for geometry shaders. (#3731)	Yong He	2024-03-11
	* Add `-fvk-use-dx-position-w`.
	* Fix ordering of OutputVertices and output primitive type decoration in spirv.
	* Fix.
	* fix
	* Fix.
	* Move test around.
*	Enhance link-time type test. (#3724)	Yong He	2024-03-08
	* Enhance link-time type test.
	* Fix.
	* Fix.
*	Uniformity analysis. (#3704)	Yong He	2024-03-07
	* Uniformity analysis.
	* Add [NonUniformReturn] decorations to some hlsl intrinsic functions.
*	Extend `as` and `is` operator to work on generic types. (#3672)	Yong He	2024-03-04
*	Refactor compiler option representations. (#3598)	Yong He	2024-02-20
	* Refactor compiler option representation.
	* Fix binary compatibility.
	* Add a test for specifying compiler options at link time.
	* Fix binary compatibility.
	* Fix binary compatibility.
	* Fix backward compatibility on matrix layout.
	* Fix.
	* Fix.
	* Fix.
	* Fix gfx.
	* Fix gfx.
	* Fix dynamic dispatch.
	* Polish.
*	GLSL Passthrough support for SSBO types (#3446)	Ellie Hermaszewska	2024-02-02
	* GLSL Passthrough support for SSBO types
	* GLSL Passthrough support for SSBO types
	* Correctly apply glsl local size layout to entry points during lowering
	* Test for glsl layout correctness
	* typo
	* Reflect GLSL SSBO as raw buffers
	* Functional test for glsl ssbo
	* Allow glsl for render tests
	* Functional test for ssbo passthrough
	* Functional test for ssbo passthrough with spirv-direct
	* fix windows build error
	---------
	Co-authored-by: Yong He <yonghe@outlook.com>
*	SPIRV Legalization fixes. (#3479)	Yong He	2024-01-23
	* Fix CFG legalization for SPIRV backend.
	* Emit DepthReplacing execution mode.
	* Fix do-while lowering.
	---------
	Co-authored-by: Yong He <yhe@nvidia.com>
*	Fix the intrinsic expansion of ObjectToWorld3x4 in spirv_asm. Data type (#3428)	Pankaj Mistry	2023-12-30
*	Add ConstBufferPointer::subscript. (#3415)	Yong He	2023-12-15
	Co-authored-by: Yong He <yhe@nvidia.com>
*	GLSL SSBO Support (#3400)	Ellie Hermaszewska	2023-12-15
	* Squash warnings and fix build with SLANG_EMBED_STDLIB
	* Add GLSLShaderStorageBuffer magic wrapper
	* Make GLSLSSBO not a uniform type
	* Buffers are global variables
	* Allow creating ssbo aggregate types
	* Allow reading from RWSB using builder
	* Nicer debug printing for ssbos
	* Lower SSBO to RWSB
	* Parse interface blocks into wrapped structs
	* Lower Interface Block Decls to structs
	* remove comment
	* Two simple ssbo tests
	* Move ssbo pass earlier
	* Correct mutable buffer detection
	* Do not replace ssbo usages outside of blocks
	* Treat GLSLSSBO as a mutable buffer for type layouts
	* regenerate vs projects
	* Correctly detect ssbo types
	* Diagnose illegal ssbo
	* remove unreachable code
	* neaten
	* ci wobble
	* Make GLSLSSBO ast handling more uniform
	* Add modifier cases for glsl
	* Use empty val info for unhandled interface blocks
	necessary for ./tests/glsl/out-binding-redeclaration.slang
	* more sophisticated modifier check
	* Correct ssbo wrapper name
*	Looks like `#3327` left in some debugging code. (#3411)	jsmall-nvidia	2023-12-14
*	Fix GLSL static initialization bug. (#3409)	Yong He	2023-12-13
	* Fix GLSL static initialization bug. Fixes #3408.
	* Update comment.
	* Fold global var initializer as an expression if possible.
	---------
	Co-authored-by: Yong He <yhe@nvidia.com>
*	Unify stdlib `Texture` types into one generic type. (#3327)	Yong He	2023-11-16
	* Unify Texture types in stdlib into 1 generic type.
	* Fixes.
	* Fix.
	* Fixes.
	* Fix reflection.
	* Fix binding reflection.
	* Add gather intrinsics.
	* Fix gather intrinsics.
	* Fix texture type toText.
	* Fix intrinsic.
	* fix cuda intrinsic.
	* Fix project files.
	* cleanup.
	* Fix.
	* Fix.
	* Fix sampler feedback test.
	* Fix getDimension intrinsics.
	* Fix spirv sample image intrinsics.
	* Fix test.
	* Fix GLSL intrinsic.
	* Cleanup.
	---------
	Co-authored-by: Yong He <yhe@nvidia.com>
*	Add GLSL Compatibility. (#3321)	Yong He	2023-11-14
	* Parse glsl buffer blocks to GLSLInterfaceBlockDecl
	* Parse glsl local size layout declarations
	* Parse (and ignore) glsl version directives
	* spelling
	* Better l-value interpretation for glsl interface blocks
	* Better l-value interpretation for glsl interface blocks
	* Add compile flag for enabling glsl
	* Parse and ignore precision modifiers.
	* Automatically import `glsl` module for compatibility.
	* Complete vector and matrix types for glsl
	* Remove generated file from repo
	* Bump .gitignore
	* do not mark out globals as params
	* Synthesize entrypoint layout from global inout vars.
	* update test result.
	* Allow HLSL semantic on global variables.
	* Fix.
	* Fix test.
	* Fix win32 compile error.
	* Add more builtin input/output and texture intrinsics.
	* Add struct/array constructor syntax.
	* Skip `#extension` lines.
	* override operator * for matrix/vector multiplication.
	* Add `matrixCompMult`.
	* Parse modifiers in for loop init var declr.
	* Add more glsl intrinsics, add stage into to var layout.
	* Allow `int[3] x` syntax.
	* Fix array type syntax.
	---------
	Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com>
	Co-authored-by: Yong He <yhe@nvidia.com>
*	Small warnings and bugs (#3272)	Ellie Hermaszewska	2023-10-11
	* Correctly use removeTrivialSingleIterationLoops during simplification
	* remove unused variables
	* Fix invalid fallthrough
	---------
	Co-authored-by: Yong He <yonghe@outlook.com>
*	Report spirv-opt time. (#3271)	Yong He	2023-10-11
	* Report spirv-opt time.
	* Removing timing logic in `loadModule`.
	---------
	Co-authored-by: Yong He <yhe@nvidia.com>
*	Run curated spirv-opt passes through slang-glslang. (#3266)	Yong He	2023-10-09
	* Run curated spirv-opt passes through slang-glslang.
	* Cleanup.
	* Replace spirv-dis downstream compiler with glslang.
	* delete slang-spirv-opt.cpp.
	---------
	Co-authored-by: Yong He <yhe@nvidia.com>
*	Various AD Fixes (#3263)	Sai Praveen Bangaru	2023-10-05
	* Various fixes
	* Remove unused parameter
	* Update slang-ir-loop-unroll.cpp
	---------
	Co-authored-by: Yong He <yonghe@outlook.com>
*	SPIRV compiler performance fixes. (#3258)	Yong He	2023-10-04
	* SPIRV compiler performance fixes.
	* Cleanup.
	* update project files
	* Cleanup debug code.
	* Make redundancy removal non-recursive.
	---------
	Co-authored-by: Yong He <yhe@nvidia.com>
*	WIP Mesh shaders for SPIR-V (#3226)	Ellie Hermaszewska	2023-09-27
	* SPIR-V impl for SetMeshOutputCounts and DispatchMesh
	* Unsightly fix for legalization ordering differences between GLSL and SPIR-V
	* spelling
	* Start a new block after terminating one in the OpEmitMeshTasksExt SPIR-V asm block
	* Emit mesh shader decorations in SPIR-V
	* Mesh and task shader stages for spir-v
	* Output explicit gl builtins for spir-v
	* Be more hygienic when SOAizing mesh outputs
	* Do not create builtin parameter block for spirv mesh outputs
	* Pass mesh payloads around by ref
	* comment
	* less expected failure
	* remove unused
	* Add spirv op
	* Correct type query for default flat modifier
	---------
	Co-authored-by: Yong He <yonghe@outlook.com>
*	Various SPIRV fixes. (#3231)	Yong He	2023-09-27
	* Various SPIRV fixes.
	- Geometry shader support (WIP).
	- Fix texture get dimension and load.
	- Fold global GetElement(MakeArray/MakeVector) insts.
	- Call spvopt to inline all functions.
	- Translate OpImageSubscript.
	- Emit struct member names and global variable names.
	- Fix lowering of OpBitNot -> OpNot, instead of OpBitReverse.
	* Fix test.
	* Fix geometry shader.
	* Fix geometry shader emit.
	* Add atomic Image access test.
	* Fix tests.
	* don't fail if spirv-opt fails.
	* Update comments.
	* Fix test.
	* Cleanups.
	* indentation
	---------
	Co-authored-by: Yong He <yhe@nvidia.com>
	Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com>
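One item in #3231 corrects the lowering of OpBitNot to OpNot rather than OpBitReverse. The two operations are easy to confuse by name but produce very different results, as this small Python model of 32-bit integers shows. The helper names `bit_not32` and `bit_reverse32` are made up for this illustration and are not part of Slang or any SPIR-V tooling.

```python
MASK32 = 0xFFFFFFFF

def bit_not32(x: int) -> int:
    """Bitwise complement of a 32-bit value (what a bitwise NOT should do)."""
    return ~x & MASK32

def bit_reverse32(x: int) -> int:
    """Reverse the order of the 32 bits (a different operation entirely)."""
    return int(f"{x & MASK32:032b}"[::-1], 2)

not_one = bit_not32(1)          # every bit flipped
reversed_one = bit_reverse32(1)  # lowest bit moved to the highest position
```

For the input 1, the complement sets every bit except the lowest, while the reversal only moves the single set bit to the top, so emitting the wrong opcode silently changes results.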
*	Revert inlining change in #3217. (#3229)	Yong He	2023-09-21
	Co-authored-by: Yong He <yhe@nvidia.com>
*	move global initializers to entry point for spirv (#3225)	Ellie Hermaszewska	2023-09-21
	* move global initializers to entry point for spirv
	* less expected failure
	---------
	Co-authored-by: Yong He <yonghe@outlook.com>
*	Move force inlining step to before `processAutodiffCalls` (and run in loop) (#3217)	Sai Praveen Bangaru	2023-09-20
	* Move auto-diff force inlining step to before `processAutodiffCalls`
	* Fix `replaceUsesWith` to handle existing inst defined after current use.
	* Fix.
	---------
	Co-authored-by: Yong He <yhe@nvidia.com>
*	Added `[AutoPyBindCUDA]` for automatic kernel binding + `[PyExport]` for exporting type information (#3209)	Sai Praveen Bangaru	2023-09-19
	* Initial: add a DiffTensor impl
	* Auto-binding and diff tensor implementations now work
	* Refactored diff-tensor implementation + added py-export for struct types
	* Cleanup
	* Update slang-ir-pytorch-cpp-binding.cpp
	* Updated test names
	* Update autodiff-data-flow.slang.expected
	* Add more versions of load/store & default generic args for DiffTensorView.
	* Add diagnostic for default generic arg and more tests
	* Add more `[AutoPyBind]` tests
*	Direct SPIRV ParameterBlock fix. (#3212)	Yong He	2023-09-19
	Co-authored-by: Yong He <yhe@nvidia.com>
*	Add Mesh and Task shader support to GFX (#3190)	Ellie Hermaszewska	2023-09-11
	* Bump vulkan headers
	Also just use vulkan-headers as a submodule
	* Add drawMeshTasks to gfx graphics pipelines
	* Add DispatchMesh overload with no payload, with GLSL intrinsic
	* Require spirv 1.4 for mesh shaders
	* Add vulkan mesh shader feature discovery
	* Add mesh shader stage bits to vk-util
	* Add mesh and task shader support to render-test
	* Add mesh and task tests
	* Preserve "payload" specifier in task shaders
	* Add mesh shader pipeline support to gfx
	* Add TODO
	* Add numThreads attribute for amplification stage
	* Add payload to task shader test
	* Drop dependency on d3dx12
	* Allow passing payloads from task to mesh shaders
	* regenerate vs projects
	* check DispatchMesh name correctly
	* Add mesh shader tests to failing tests
	* Detect wave-ops feature on vulkan
	* Add fuse-product to expected failures
	This fails because the global variable `count` is not initialized
	* Add required extension to WaveMaskMatch SPIR-V impl
	* Remove meshShader member from pipeline desc
	* Identify mesh shader support on d3d12
*	Lower LValue implicit cast before autodiff. (#3194)	Yong He	2023-09-07
	Co-authored-by: Yong He <yhe@nvidia.com>