| Commit message (Collapse) | Author | Age |
| ... | |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Add options to speedup compilation.
* Fix.
* Plumb options to DCE pass.
* Revert debug change.
* Fix regressions.
* More optimizations.
* more cleanup and fixes.
* remove comment.
* Fixes.
* Another fix.
* Fix errors.
* Fix errors.
* Add comments.
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
* RasterizerOrder resource for spirv and metal.
Also fixes the byte address buffer logic for metal.
* Fix.
* Delete commented lines.
---------
Co-authored-by: Jay Kwak <82421531+jkwak-work@users.noreply.github.com>
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes #4062
This change enables wide load/stores for byte-address-buffer backed
resources, when the data is accessed at an offset that is aligned.
**Goals**
- Improve performance by issuing wider instructions instead of sequence
of scalar instructions, for load and stores of byte-address buffers.
- Reduce code-size and readability of the generated shaders.
- Help naive users as well as ninja programmers, generate optimal code.
**Non Goals**
- Help with Structured buffers, or other resources.
- Target compilation time improvements.
**Key changes**
Adds 2 new overloads for Load and Store operations on ByteAddress Buffers.
1. Load / Store with an extra alignment parameter
```
resource.Load<T>(offset, alignment);
resource.Store<T>(offset, value, alignment);
```
2. LoadAligned / StoreAligned with no extra parameter,
with the same signature as orignial Load / Store.
```
resource.LoadAligned<T>(offset);
resource.StoreAligned<T>(offset, value);
```
- This overload will implicitly identify the alignment value,
from the base type T of the elementary unit of the resource.
**Supported resources**
1. Vectors
This can be upto 4 elements, i.e. float -- float4.
2. Arrays
This does not have a limit on number of elements, but on a
conservative estimate, we can limit to few hundreds.
3. Structures
This is used to group a resource of a single type.
```
struct {
float4 x;
}
```
**Code updates**
- Modified byte-address-ir legalize to handle struct, array and vector
kinds of load or store access
- Added custom hlsl stdlib functions to implement all the overloads for Load,
Store etc.
- Added C-like emitter, SPIR-V emitter for handling ByteAddressBuffers.
- Added a new core stdlib function intrinsic to wrap around alignOf<T>().
- Added a new peephole optimization entry to identify the equivalent
IntLiteral value from the alignOf<T>() inst.
- Added tests to check explicit, and implicit aligned Load and Store
operations.
|
| |
|
|
|
|
|
| |
* Delete `wrap-global-context` pass.
The pass was added for the metal backend without realizing that the existing `explicit-global-context` does 99% of the job. Instead of duplicating the logic in a different pass for metal, we extend same explicit-global-context pass to work for metal.
* Fix build.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Added diagnostics & built-in type lowering for `[CUDAKernel]` functions
This PR adds
- Diagnostics for non-void return from a cuda kernel entry point
- Diagnostics for using differentiable types in a differentiable cuda kernel entry point
- Logic for converting built-in types (float3, float3x3, etc..) to portable struct types and unpacks the parameter back into a built-in type on the CUDA side. This is because built-in types have different implementations in CUDA & CPP targets, which causes signature mis-match when linking.
* Fix error codes
* Add ability to lower structs and arrays that contain built-in types.
+ Added tests
+ Fix issue where the host-side was not marshalling data to lowered types.
* Update slang-ir-pytorch-cpp-binding.cpp
---------
Co-authored-by: Yong He <yonghe@outlook.com>
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Fixes #3533
- Add logic to perform aligned memory operations for loading from and storing to composite resources, like vectors within the ByteAddress legalize pass.
- Checks
Added a new test for byte address with/without alignment.
---------
Co-authored-by: Yong He <yonghe@outlook.com>
|
| |
|
|
|
|
|
|
|
| |
* Metal: Vertex/Fragment builtin and layouts.
* Fix.
* Fix test.
* Emit user semantic on vertex/fragment attributes.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Force Inline if reftype
Fixes #3997.
If we are using a refType, we now ForceInline.
remarks:
1. Modifications were made in slang-ir-glsl-legalize to change how we translate GlobalParam proxy's into GlobalParam.
a. We now handle the senario where a globalParam is used in multiple disjoint blocks (like 2 different functions).
* try to figure out why CI fails but local works
try to inline DispatchMesh, works locally, may fail on CI(?)
* try another fix
* add task tests + don't allow semi-early task-shader inline
Task shader uses DispatchMesh which is a very big 'hack' where we check for the function name and modify the callees in very large ways. This function does inline, but it cannot inline early due to future mangling that this operation requires todo. This is reflected with the `[noRefInline]` modifier. It is a modifier so users may stop mandatory inlines with `__ref` parameter.
|
| | |
|
| |
|
|
|
|
|
|
|
| |
* Switch to direct-to-spirv backend as default.
* Fix slang-test.
* Fix.
* Fix.
|
| |
|
|
|
|
|
|
|
| |
* bit_cast & reinterpret warning if src->dst type not equally sized.
bit_cast & reinterpret warning if src->dst type not equally sized.
---------
Co-authored-by: Yong He <yonghe@outlook.com>
|
| |
|
|
|
| |
* Metal: rewrite global variables as explicit context.
* Small tweaks.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Support combined texture sampler when targeting HLSL.
* Fix glsl intrinsics.
* Update source/slang/slang-ir-lower-combined-texture-sampler.cpp
Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com>
* Update source/slang/slang-ir-lower-combined-texture-sampler.cpp
Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com>
* Update source/slang/slang-ir-lower-combined-texture-sampler.cpp
Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com>
* Fix.,
* Enhance test.
* Remove unused field.
* Fix indentation
---------
Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com>
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Fix the variable scope issue (#3838)
In the IR optimization pass, we turn all the loop to do-while loop form.
But in the do-while loop form, the loop body block is dominating the
blocks after the loop break block. This assumption is fine for SPIRV and
IR code, however, it's incorrect for all the other language target (e.g.
c/c++/cuda/glsl/hlsl) because the instructions defined in the loop body
is invisible from outside of the loop. Therefore, when translating to
other textual language, there could be issue for the variables scope.
To fix this issue, we first detect the instructions that are defined
inside the loop block, then check if these instructions are used after
the break block. If so, we duplicate these instructions right before
their users such that we can make those instructions available globally.
* Update slang vcxproj file because of add new source files
* Minor fix
- Update the method to get the block of an instruction
- Avoid query the hash-map twice by using "add" method directly.
* Reduce complexity
In searching loop region blocks, we don't actually need to traverse the
instructions. Instead, we only have to check each block to see if it's
in a loop region, and hash such block for later on processing.
So we can remove one level of loop.
In the second pass, we can use that hash to filter out the blocks that
are not in the loop region, and only process the instructions inside the
loop region.
Add description for the new fix-up pass declared in
slang-ir-variable-scope-correction.h.
* Categorize the unstorable and storable instructs
1. When checking the loop regions, there could be multi-levels nested
loops, so we should use a list to store the loopHeaders.
2. Categorize the instructs based on storable and non-storable, because
we only have to duplicate the non-storable instructs. Note pointer
type instruct is also belonged to non-storable class because we can
not store a pointer in local variable.
* Fix some test failure
* Fix test failures
* Recursively process the operands
Besides process the out-of-scope instruction, we have to also process
all the operands of this instructions. Therefore, we have to make the
process logic recursive until all the involved instructions are
accessible.
* Change how to check storable type
* Add target checking for CPP/CUDA
In decide whether the type is storable, add target checking for CPP/CUDA
as they can store any types.
Cleanup the code to remove those debug log prints.
* Addressing feedbacks
Address some feedbacks.
Change the depth-first traverse to breadth-first traverse when
processing instruction and its operands.
* Minor fix for the variable names
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Legalization of non-struct when expects struct.
`__forceVarIntoStructTemporarily()` solves the issue of passing "non-struct type's" into a parameter that only accepts "struct type's".
The intrinsic solves the issue through checking the parameter of the intrinsic:
If the parameter is a "struct type"
* Return a reference to the parameter
else
* a "struct type" Temporary variable is made and the "non struct type" parameter is copied to a member of this struct. This struct is then returned by `__forceVarIntoStructTemporarily()`. Optionally if the use location of this call is a argument which can have side effects (out, inout, ref, etc.) the temporary struct variable is copied into the original "non struct type" parameter.
Testing code has "addComplexity" functions to avoid optimizations through forcing side effects so we can predict the code output.
* Address review comments
- ForceInline ray functions
- fix testing
- adjust how we replace operands in senarios to avoid unexpected side effects of replacing operands without any explicit checks
* Adjust nv test slightly and remove .glsl file
* Remove implicit LOD sampling & test additions
- Implicit LOD sampling is not allowed in a raygen. Implicit LOD sampling requires depth (from a fragment shader) to sample. Raygen does not have the depth, so this function was replaced.
- Changed other tests for correctness/clarity
* Test if Falcor breaks through use of ForceInline
* Add back force inline
may need to look at how Falcor wrote its slang shaders. This will be done if ForceInline causes issues since ForceInline should not affect code gen in an impactable way.
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Fix #3780.
* Fixers #3781.
* Add test for #3781.
* Diagnose error on unsupported builtin intrinsic types.
* Add check for recursion.
* Fix.
* Fix.
* Fix recursion detection.
* Fix.
* Fix.
* Fix recursion logic.
* More fix.
|
| |
|
|
|
|
|
|
|
| |
(#3675)
The following PR implements raytracing extensions (GLSL_EXT_ray_tracing, GLSL_EXT_ray_query, GLSL_NV_shader_invocation_reorder & GLSL_NV_ray_tracing_motion_blur); for GLSL & SPIR-V targets. Fully implements all functions, built-in variables, & syntax; resolves #3560 for GLSL & SPIR-V Targets.
notes of worth:
* __rayPayloadFromLocation, __rayAttributeFromLocation, and __rayCallableFromLocation, were added as SPIR-V Intrinsics to refer to location's of raytracing objects in SPIR-V for when using GLSL syntax.
|
| |
|
|
|
|
|
|
|
| |
* Preserve ByteAddressBuffer user type name.
* Make user type lowercase.
* Make typenames conform to spec.
* Use `SpvOpDecorateString`.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
shaders. (#3731)
* Add `-fvk-use-dx-position-w`.
* Fix ordering of OutputVertices and output primitive type decoration in spirv.
* Fix.
* fix
* Fix.
* Move test around.
|
| |
|
|
|
|
|
| |
* Enhance link-time type test.
* Fix.
* Fix.
|
| |
|
|
|
| |
* Uniformity analysis.
* Add [NonUniformReturn] decorations to some hlsl intrinsic functions.
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Refactor compiler option representation.
* Fix binary compatibility.
* Add a test for specifying compiler options at link time.
* Fix binary compatibility.
* Fix binary compatibility.
* Fix backward compatibility on matrix layout.
* Fix.
* Fix.
* Fix.
* Fix gfx.
* Fix gfx.
* Fix dynamic dispatch.
* Polish.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* GLSL Passthrough support for SSBO types
* GLSL Passthrough support for SSBO types
* Correctly apply glsl local size layout to entry points during lowering
* Test for glsl layout correctness
* typo
* Reflect GLSL SSBO as raw buffers
* Functional test for glsl ssbo
* Allow allow glsl for render tests
* Functional test for ssbo passthrough
* Functional test for ssbo passthrough with spirv-direct
* fix windows build error
---------
Co-authored-by: Yong He <yonghe@outlook.com>
|
| |
|
|
|
|
|
|
|
|
|
| |
* Fix CFG legalization for SPIRV backend.
* Emit DepthReplacing execution mode.
* Fix do-while lowering.
---------
Co-authored-by: Yong He <yhe@nvidia.com>
|
| | |
|
| |
|
| |
Co-authored-by: Yong He <yhe@nvidia.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Squash warnings and fix build with SLANG_EMBED_STDLIB
* Add GLSLShaderStorageBuffer magic wrapper
* Make GLSLSSBO not a uniform type
* Buffers are global variables
* Allow creating ssbo aggregate types
* Allow reading from RWSB using builder
* Nicer debug printing for ssbos
* Lower SSBO to RWSB
* Parse interface blocks into wrapped structs
* Lower Interface Block Decls to structs
* remove comment
* Two simple ssbo tests
* Move ssbo pass earlier
* Correct mutable buffer detection
* Do not replace ssbo usages outside of blocks
* Treat GLSLSSBO as a mutable buffer for type layouts
* regenerate vs projects
* Correctly detect ssbo types
* Diagnose illegal ssbo
* remove unreachable code
* neaten
* ci wobble
* Make GLSLSSBO ast handling more uniform
* Add modifier cases for glsl
* Use empty val info for unhandled interface blocks
necessary for ./tests/glsl/out-binding-redeclaration.slang
* more sophisticated modifier check
* Correct ssbo wrapper name
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
* Fix GLSL static initialization bug.
Fixes #3408.
* Update comment.
* Fold global var initializer as an expression if possible.
---------
Co-authored-by: Yong He <yhe@nvidia.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Unify Texture types in stdlib into 1 generic type.
* Fixes.
* Fix.
* Fixes.
* Fix reflection.
* Fix binding reflection.
* Add gather intrinsics.
* Fix gather intrinsics.
* Fix texture type toText.
* Fix intrinsic.
* fix cuda intrinsic.
* Fix project files.
* cleanup.
* Fix.
* Fix.
* Fix sampler feedback test.
* Fix getDimension intrinsics.
* Fix spirv sample image intrinsics.
* Fix test.
* Fix GLSL intrinsic.
* Cleanup.
---------
Co-authored-by: Yong He <yhe@nvidia.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Parse glsl buffer blocks to GLSLInterfaceBlockDecl
* Parse glsl local size layout declarations
* Parse (and ignore) glsl version directives
* spelling
* Better l-value interpretation for glsl interface blocks
* Better l-value interpretation for glsl interface blocks
* Add compile flag for enabling glsl
* Parse and ignore precision modifiers.
* Automatically import `glsl` module for compatiblity.
* Complete vector and matrix types for glsl
* Remove generated file from repo
* Bump .gitignore
* do not mark out globals as params
* Synthesize entrypoint layout from global inout vars.
* update test result.
* Allow HLSL semantic on global variables.
* Fix.
* Fix test.
* Fix win32 compile error.
* Add more builtin input/output and texture intrinsics.
* Add struct/array constructor syntax.
* Skip `#extension` lines.
* overide operator * for matrix/vector multiplication.
* Add `matrixCompMult`.
* Parse modifiers in for loop init var declr.
* Add more glsl intrinsics, add stage into to var layout.
* Allow `int[3] x` syntax.
* Fix array type syntax.
---------
Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com>
Co-authored-by: Yong He <yhe@nvidia.com>
|
| |
|
|
|
|
|
|
|
|
|
| |
* Correctly use removeTrivialSingleIterationLoops during simplification
* remove unused variables
* Fix invalid fallthrough
---------
Co-authored-by: Yong He <yonghe@outlook.com>
|
| |
|
|
|
|
|
|
|
| |
* Report spirv-opt time.
* Removing timing logic in `loadModule`.
---------
Co-authored-by: Yong He <yhe@nvidia.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
* Run curated spirv-opt passes through slang-glslang.
* Cleanup.
* Replace spirv-dis downstream compiler with glslang.
* delete slang-spirv-opt.cpp.
---------
Co-authored-by: Yong He <yhe@nvidia.com>
|
| |
|
|
|
|
|
|
|
|
|
| |
* Various fixes
* Remove unused parameter
* Update slang-ir-loop-unroll.cpp
---------
Co-authored-by: Yong He <yonghe@outlook.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* SPIRV compiler performance fixes.
* Cleanup.
* update project files
* Cleanup debug code.
* Make redundancy removal non-recursive.
---------
Co-authored-by: Yong He <yhe@nvidia.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* SPIR-V impl for SetMeshOutputCounts and DispatchMesh
* Unsightly fix for legalization ordering differences between GLSL and SPIR-V
* spelling
* Start a new block after terminating one in the OpEmitMeshTasksExt SPIR-V asm block
* Emit mesh shader decorations in SPIR-V
* Mesh and task shader stages for spir-v
* Output explicit gl builtins for spir-v
* Be more hygenic when SOAizing mesh outputs
* Do not create builtin paramter block for spirv mesh outputs
* Pass mesh payloads around by ref
* comment
* less expected failure
* remove unused
* Add spirv op
* Correct type query for default flat modifier
---------
Co-authored-by: Yong He <yonghe@outlook.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Various SPIRV fixes.
- Geometry shader support (WIP).
- Fix texture get dimension and load.
- Fold global GetElement(MakeArray/MakeVector) insts.
- Call spvopt to inline all functions.
- Translate OpImageSubscript.
- Emit struct member names and global variable names.
- Fix lowering of OpBitNot -> OpNot, instead of OpBitReverse.
* Fix test.
* Fix geometry shader.
* Fix geometry shader emit.
* Add atomic Image access test.
* Fix tests.
* don't fail if spirv-opt fails.
* Update comments.
* Fix test.
* Cleanups.
* indentation
---------
Co-authored-by: Yong He <yhe@nvidia.com>
Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com>
|
| |
|
| |
Co-authored-by: Yong He <yhe@nvidia.com>
|
| |
|
|
|
|
|
|
|
| |
* move global initializers to entry point for spirv
* less expected failure
---------
Co-authored-by: Yong He <yonghe@outlook.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
(#3217)
* Move auto-diff force inlining step to before `processAutodiffCalls`
* Fix `replaceUsesWith` to handle existing inst defined after current use.
* Fix.
---------
Co-authored-by: Yong He <yhe@nvidia.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
exporting type information (#3209)
* Initial: add a DiffTensor impl
* Auto-binding and diff tensor implementations now work
* Refactored diff-tensor implementation + added py-export for struct types
* Cleanup
* Update slang-ir-pytorch-cpp-binding.cpp
* Updated test names
* Update autodiff-data-flow.slang.expected
* Add more versions of load/store & default generic args for DiffTensorView.
* Add diagnostic for default generic arg and more tests
* Add more `[AutoPyBind]` tests
|
| |
|
| |
Co-authored-by: Yong He <yhe@nvidia.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Bump vulkan headers
Also just use vulkan-headers as a submodule
* Add drawMeshTasks to gfx graphics pipelines
* Add DispatchMesh overload with no payload, with GLSL intrinsic
* Require spirv 1.4 for mesh shaders
* Add vulkan mesh shader feature discovery
* Add mesh shader stage bits to vk-util
* Add mesh and task shader support to render-test
* Add mesh and task tests
* Preserve "payload" specifier in task shaders
* Add mesh shader pipeline support to gfx
* Add TODO
* Add numThreads attribute for amplification stage
* Add payload to task shader test
* Drop dependency on d3dx12
* Allow passing payloads from task to mesh shaders
* regenerate vs projects
* check DispatchMesh name correctly
* Add mesh shader tests to failing tests
* Detect wave-ops feature on vulkan
* Add fuse-product to expected failures
This fails because the global varaible `count` is not initialized
* Add required extension to WaveMaskMatch SPIR-V impl
* Remove meshShader member from pipeline desc
* Identify mesh shader support on d3d12
|
| |
|
| |
Co-authored-by: Yong He <yhe@nvidia.com>
|