| Age | Commit message (Collapse) | Author |
|
And also add an actual test case from the User Guide example.
|
|
* Handle type check cache update on extensions more gracefully.
* Correctness fix.
* Cache implcit cast overload resolution results.
* Fix.
* More optimizations.
* Cache implicit default ctor resolution.
* Disable redundancy removal.
* Fix.
* Fix test.
* Fix.
* Correctness fix.
* Fix.
* Fix,
* Fix test.
* Small tweak.
|
|
* push fix: if no sample, set to 0 for textureMS
* push fixes to hlsl [] operator + test so it will error
|
|
per shader program. (#4189)
* Emit only 1 execution mode of type per entry point
Added a dictionary<SpvWord,Hash<ExecutionMode>> to ensure we don't emit multiple.
* get inst->id directly
* address review + fix test
---------
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
* SPIR-V support for GLSL texture functions
Closes #4147
This commit implements GLSL texture functions with SPIR-V intrinsics.
It also implements some of missing GLSL implementations.
- textureProj
- textureLod
- texelFetchOffset
- textureProjOffset
- textureLodOffset
- textureProjLod
- textureProjLodOffset
- textureGrad
- textureGradOffset
- textureProjGrad
- textureProjGradOffset
* Fix SPIR-V issues discovered while improving the test case.
* Add __requireComputeDerivative() whenever sampling
* Do not touch GetDimensions
|
|
|
|
|
|
* RasterizerOrder resource for spirv and metal.
Also fixes the byte address buffer logic for metal.
* Fix.
* Delete commented lines.
---------
Co-authored-by: Jay Kwak <82421531+jkwak-work@users.noreply.github.com>
|
|
* Capabilities System, Backing Logic Overhaul
Fixes #4015
Problems to address:
1. Currently the capabilities system spends anywhere from 25-50% of compile time on the CapabilityVisitor. Most of this time is spent on join logic: 1. Finding abstract atoms 2. Comparing list1<->list2. This should and can be made significantly faster.
2. Error system does not produce errors with auxiliary information. This will require a partial redesign to provide more useful semantic information for debugging.
What was addressed:
1. Array backed `CapabilityConjunctionSet` was replaced in-favor for a `UIntSet` backed `CapabilityTargetSets`. The design is described below.
Design:
* `CapabilityTargetSets` is a `Dictionary<targetAtom, CapabilityTargetSet>`. This is not an array for 2 reasons: 1. Easy to figure out which target is missing between two `CapabilityTargetSets` 2. To statically allocate an array requires the preprocessor to manually annotate which Capability is a target and link that Capability to an index. This means a dictionary is required for lookup regardless of implementation.
* `CapabilityTargetSet` is an intermediate representation of all capabilities for a singular `target` atom (`glsl`, `hlsl`, `metal`, ...). This structure contains a dictionary to all stage specific capability sets for fast lookup of stage capabilities supported by a `CapabilitySet` for a `target` atom. This reduces number of sets searched.
* `CapabilityStageSet` is an intermediate representation of all capabilities for a singular `stage` atom (`vertex`, `fragment`, ...). This structure holds all disjoint capability sets for a `stage`. A disjoint set is rare, but may exist in some scenarios (as an example): `{glsl, EXT_GL_FOO}{glsl, _GLSL_130, _GLSL_150}`. This reduces the number of sets searched.
* `UIntSet` is the main reason for the redesign for better performance and memory usage. All set operations only require a few operations, making all set logic trivial and with minimal cost to run. All algorithms were modified to focus around `UIntSet` operations.
2. Errors
* Semantic information are now better linked to the calling function to provide a connection of function<->function_body for when saving semantic information for errors.
* Missing targets now print errors much like other error code by finding code which could be a cause of incompatibility.
What is missing:
1. Add non naive support for non-stage specific capabilities such as `{hlsl, _sm_5_0}`. Currently non stage specific targets emulate the behavior through assigning such capabilities to every stage: `{hlsl, _sm_5_0, vertex} {hlsl, _sm_5_0, fragment}...`. Removal of this behavior would remove redundant shader stage sets being made at construction time (~80% of new implementation runtime). This is an addition, not an overhaul.
2. Optionally: `UIntSet` should be modified to support SIMD operations for significantly faster operations. This is not required immediately since `UIntSet` is already not a performance constraint.
Notes:
* UIntSet had implementation bugs which were fixed in this PR.
* The old capabilities system had bugs which were fixed in this PR when transforming to the new implementation.
* fix .natvis debug view
* Small optimizations I found while working on the addition
the AST building pass looks like so now:
1% = ~capabilitySet
2% = capabilitySet()
1.5% capabilitySet::unionWith()
0.8% capabilitySet::join()
1.5% auxillary info for debugging
~0.5-1% extra visitor overhead
~5% total for the visitor
~6.5% for total runtime costs
* fix caps which were wrong but worked
* push minor syntax fix (still looking for why other tests fail)
* perf & bug fixes
1. did not properly remake isBetterForTarget for this->empty case with that as Invalid. This is best case in this senario.
2. Remade seralizer for stdlib generation. Faster (more direct) & cleaner code.
NOTE: did not address review comments
* fix glsl.meta caps error
* fixing findBest logic again & UIntSet wrapper
findBest was not checking for 'more specialized' targets & was element counter was flawed
* faster getElements algorithm + natvis for UIntSet + wrong warning
* type incompatability of bitscanForward implementations
* try to fix warnings again
* remove ptr for clang intrinsic
* add missing header
* ifdef to allow clang compile
* compiler hackery to fix up platform/type independent operations
* bracket
* fix MSVC error
* missing template
* change types out again
* changes to fix compiling
* adjustment to parameter for Clang/GCC
* added iterator to delay processing all atomSets of a CapabilitySet
* add a few missing consts's
* ensure we never have more than 1 disjointSet
Added a wrapper + assert + union functionality to all possible disjoint sets. This was done in favor of a removal of the LinkedList for 2 reasons:
1. We still need 0-1 set functionality.
2. Might as well keep the code, just disallow the problematic functionality.
* address review comments
non linked-list refactor review comments addressed; add doc comments + remove redundant code
* comments + remove isValid for bool operator
* push removal of linkedlist for capabilities
* add missing break
* address review comments
minor adjustments of syntax
* push a fix to the `CapabilitySet({shader, missing target})` code
* quality + error
1. add iterator to UIntSet
2. do not specialize target_switch if profile is derived from case (GLSL_150 is not compatable with GLSL_400)
* fix target_switch erroring + temporarily remove UIntSet::Interator
temporarily remove UIntSet::Interator. It will be added after, testing code on CI first so I can multi-task fixing the UIntSet Iterator
* fix the UIntSet iterator
* Revert "fix the UIntSet iterator" temporarily to pull from master
* add metal error as per texture.slang
(took a while I realize this was why things were breaking, likely should adjust errors to reflect this)
* Rework UIntSet to have a template for output type
This is done so it is reasonable to debug the iterator output and not just dealing with messy int's
Fix problems with the iterators implemented + invalid capabilities handling
* removed incorrect `__target_switch` capability
barycentric was being used with anticipation of `profile glsl450`, this does not expand into `GL_EXT_fragment_shader_barycentric`, this instead caused an error which is hidden during cross-compile.
* remove some uses of getElements
* remove undeclared_stage for now
* remove redundant code associated with `undeclared_stage`
* remove unused variable
* address review
specifically to note removed static in a thread dangerous scope. Now using a `const static` for read only (thread safe) which precompile steps generate
* move GLSL_150 capdef change to sm_4_1 (more accurate)
* address most review comments
did not address: https://github.com/shader-slang/slang/pull/4145#discussion_r1602256776
* revert incorrect code review suggestion
* push changes for all code review suggestions
|
|
* Add diagnostic to prevent defining unsized static variables.
* Fix tests.
* Add more tests.
* Fix to allow defining variables of link-time size.
* update diagnostic message.
* Fix tests.
* Simplify code.
|
|
|
|
* Remove use of `G0` and `__target_intrinsic` in stdlib.
* Fix.
* Fix calling intrinsic in global scope.
|
|
* Impl texture APIs for Metal target
This commit is to implement texture functions for Metal target.
The following functions are implemented and tested.
- GetDimensions()
- CalculateLevelOfDetail()
- CalculateLevelOfDetailUnclamped()
- Sample()
- SampleBias()
- SampleLevel()
- SampleCmp()
- SampleCmpLevelZero()
- Gather()
- SampleGrad()
- Load()
Metal has limited support for the texture functions compared to HLSL.
- LOD is not supported for 1D texture,
- Depth textures are limited to 2D, 2DArray, Cube and CubeArray
textures.
- "Offset" variants are limited to 2D, 2DArray, 2D-Depth,
2DArray-Depth and 3D textures.
The functions that cannot be implemented for Metal should properly
be handled by the capability system later.
* Fix the failing test, multi-file.hlsl
I am not sure why this change is needed.
* Fix compile errors on macOS 2nd try
* Remove a typo character to fix the compile error
* Trivial clean up
* Remove `as_type` where it was intended as static_cast
* Use a simpler sytax for __intrinsic_asm
* Trivial clean up
* Remove TEST_AFTER_FIXING_CAPABILITY_PROBLEM after fixing normalize
* Fix the failing test properly
* Fix an incorrect setup of Depth-cube texture
---------
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
Handles a corner case where the first block after the condition on the true-side is another condition. This would currently result in an invalid reverse graph, where the reverse version of the true-block is the merge point for two different branching insts (the reverse version of the loop as well as the second condition).
This patch simply adds a blank block when constructing the reverse-loop (similar to critical edge breaking) so that each branch inst in the reversed loop has a unique merge block.
|
|
Slang APIs are documented as taking UTF-8 encoded shader source,
though it's not explicitly documented whether it is allowed to
include a BOM (Byte Order Marker). This change adds support for
UTF-8 BOM markers by virtue of disposing of BOM data. As a bonus,
UTF-16 input which can cleanly decode to UTF-8 is now also
accepted.
Throwing out the BOM on input is done by leveraging existing
functionality in "determineEncoding()", however a bug exists there
for null-terminated single character input, where the null byte
caused a heuristic to guess UTF-16, even though the null byte
isn't part of the string. The bug in "determineEncoding" is fixed
by only guessing when bytes >= 2 and not looking past the end
of the buffer. The 'implicit-cast' test was mistakenly relying
on the bug to pass, as its expected file was being read as UTF16
and cropped to zero length due to the bug. The expected output
of implicit-cast is updated to pass with the bug fix in place.
The decoding of UTF-16 to UTF-8 is done through an existing
'decode' method. This change fixes a bug in UTF16-LE 'decode'
where it was decoded as if it were Big-Endian.
Adds 3 small tests to ensure the compiler doesn't choke on source
files in UTF-8 (with BOM), UTF16-LE, or UTF16-BE.
Bonus: Fixes a bug in diagnostic reporting where hex values were
incorrectly translated to text, leading to incorrect, possibly
truncated strings.
Fixes #4046
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
Fixes #4062
This change enables wide load/stores for byte-address-buffer backed
resources, when the data is accessed at an offset that is aligned.
**Goals**
- Improve performance by issuing wider instructions instead of sequence
of scalar instructions, for load and stores of byte-address buffers.
- Reduce code-size and readability of the generated shaders.
- Help naive users as well as ninja programmers, generate optimal code.
**Non Goals**
- Help with Structured buffers, or other resources.
- Target compilation time improvements.
**Key changes**
Adds 2 new overloads for Load and Store operations on ByteAddress Buffers.
1. Load / Store with an extra alignment parameter
```
resource.Load<T>(offset, alignment);
resource.Store<T>(offset, value, alignment);
```
2. LoadAligned / StoreAligned with no extra parameter,
with the same signature as orignial Load / Store.
```
resource.LoadAligned<T>(offset);
resource.StoreAligned<T>(offset, value);
```
- This overload will implicitly identify the alignment value,
from the base type T of the elementary unit of the resource.
**Supported resources**
1. Vectors
This can be upto 4 elements, i.e. float -- float4.
2. Arrays
This does not have a limit on number of elements, but on a
conservative estimate, we can limit to few hundreds.
3. Structures
This is used to group a resource of a single type.
```
struct {
float4 x;
}
```
**Code updates**
- Modified byte-address-ir legalize to handle struct, array and vector
kinds of load or store access
- Added custom hlsl stdlib functions to implement all the overloads for Load,
Store etc.
- Added C-like emitter, SPIR-V emitter for handling ByteAddressBuffers.
- Added a new core stdlib function intrinsic to wrap around alignOf<T>().
- Added a new peephole optimization entry to identify the equivalent
IntLiteral value from the alignOf<T>() inst.
- Added tests to check explicit, and implicit aligned Load and Store
operations.
|
|
* Fix race-condition and visual artifacts issues
In PerformanceProfiler::getProfiler() we return a static object
for the profiler implementation, this is not thread-safe, so change
it to thead_local.
There is still some visual artifacts when using slang as the shading
language. We don't know the root cause yet, but found out it's related
to our loop inversion algorithm. So stage this feature for now, and turn
it into an internal option and default off. We will re-enable it after
more investigation on this optimization.
File an new issue 4151 to track it.
* Add '-loop-inversion' to the few tests
|
|
|
|
derivatives (#4136)
* Add stdlib tests for `clamp` derivatives which also checks `max` and `min` derivatives
* Extend test
|
|
|
|
Fixes #4112.
|
|
Fixes #4110.
|
|
|
|
* Fix NonUniformResourceIndex legalization for SPIRV.
* Update gh-4131.slang
|
|
|
|
* Support Metal math functions
Closes #4024
Note that Metal document says Metal doesn't support "double" type;
"Metal does not support the double, long long, unsigned long long,
and long double data types."
According to Metal document, math functions are not defined for
integer types.
That leaves only two types to test: half and float.
As a code clean up, __floatCast is replaced with __realCast.
But I had to add a new signature that can convert from integer to
float.
Some of GLSL functions are moved to hlsl.meta.slang.
For those functions, there isn't builtin functions for HLSL but
there are for GLSL and Metal.
"nextafter(T,T)" is currently not working because it requires Metal
version 3.1 and we invoke metal compiler with a profile version
lower than 3.1.
* Changes based on review comments.
|
|
|
|
* Add host shared library target.
* Attempt fix.
* Fix warnings.
* try fix.
* Fix test.
* Fix.
|
|
* Fix `Ptr::__subscript` to accept any integer index.
* Fix `Ptr::__subscript` to allow 64bit indices.
|
|
* Utilize vector operations over scalar if possible
Closes #4085
* Fix for the failing CI
[ForceUnroll] is removed because it changed the emitted SPIR-V code a
little differently for half-conversion.slang.
SPIR-V code style is changed to a more preferred style,
from "OpXX $$T result $x"
to "result:$$T = OpXX $x"
|
|
|
|
* Fix unzipping logic for inout non-diff parameters and adjust tests
+ Removed `-g0` from `struct-this-parameter.slang` test. Works correctly with the new unzipping logic.
+ Removed `-g0` from `was/warped-sampling-1d.slang` test. Works correctly with DX12 & CS_5_1. CS_5_0 appears to run into an FXC compiler bug with detecting infinite loops where there don't appear to be any.
* Update slang-ir-autodiff-unzip.h
* Update warped-sampling-1d.slang
|
|
* Avoid synthesis for when types can be used as their own differenial
+ Add test
* Add missing files..
* Fix issue with method synthesis for self-differential types
+ Add a generic test
* Fix
* Fix issue with out-of-date type resolution cache.
Witness tables created during the conformance checking phase not being taken into account during the decl type resolution phase because the epoch is not updated after conformance checking.
This leads to certain complex associated-type lookup chains (such as the one in tests/compute/assoctype-nested-lookup) not resolving properly and causing errors.
* Delete self-differential-type-synthesis-extension.slang
* Quick fix to repopulate stdlib cache for deferred stdlib loading
* Update slang-check-decl.cpp
|
|
The syntax like:
[__AttributeUsage(_AttributeTargets.Var)]
[__AttributeUsage(_AttributeTargets.Param)]
struct DefaultValueAttribute
{
int iParam;
};
is allowed.
For user-defined attribute, we can specify more attribute targets on the
attribute declaration. So one attribute can be used in more than one
situations.
|
|
* Fix fmod behavior targetting GLSL and SPIR-V
The default implementation of fmod was doing "Modulo" operation when
"fmod" in HLSL should do "remainder" operation.
* Fix a mistake in `fmod` GLSL target
When using __intrinsic_asm, the "if" logic wasn't emitted.
"__intrinsic_asm" had to be called from a new function and `fmod` had to
call it.
Alternatively, I am using `operator?()` to workaround.
A similar modification is made to `roundEven()` hoping for a better
performance.
|
|
Fixes #4051
This commit implements SPIR-V target for GLSL functions.
It also fixes a few problesm of GLSL targetting implemention too.
|
|
The reflection test doesn't print the user attributes decorating for
the variables, only types. Therefore, add the print for user attributes
of variables.
|
|
* Fix compile failures when using debug symbol.
* Various fixes.
* Fix intrinsic.
* Fix test.
|
|
* SPIRV: Fix performance issue when handling large arrays.
* Add test for packing.
* Fix clang.
|
|
In SPIRV legalization, a struct wrapper is created around push constants
but is not itself legalized. Putting the struct type into the work
list causes the storage access of the push constant pointer to be
PhysicalStorageBuffer as expected, instead of Function scope that was
produced without the added struct legalization.
Adds a SPIRV test that exercises the fix.
Fixes #3946
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
* Added diagnostics & built-in type lowering for `[CUDAKernel]` functions
This PR adds
- Diagnostics for non-void return from a cuda kernel entry point
- Diagnostics for using differentiable types in a differentiable cuda kernel entry point
- Logic for converting built-in types (float3, float3x3, etc..) to portable struct types and unpacks the parameter back into a built-in type on the CUDA side. This is because built-in types have different implementations in CUDA & CPP targets, which causes signature mis-match when linking.
* Fix error codes
* Add ability to lower structs and arrays that contain built-in types.
+ Added tests
+ Fix issue where the host-side was not marshalling data to lowered types.
* Update slang-ir-pytorch-cpp-binding.cpp
---------
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
Fixes #3533
- Add logic to perform aligned memory operations for loading from and storing to composite resources, like vectors within the ByteAddress legalize pass.
- Checks
Added a new test for byte address with/without alignment.
---------
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
* Metal: Vertex/Fragment builtin and layouts.
* Fix.
* Fix test.
* Emit user semantic on vertex/fragment attributes.
|
|
* add variable pointers to render-test-vk and failing (but ignored with workarounds) test-case
|
|
|
|
* Force Inline if reftype
Fixes #3997.
If we are using a refType, we now ForceInline.
remarks:
1. Modifications were made in slang-ir-glsl-legalize to change how we translate GlobalParam proxy's into GlobalParam.
a. We now handle the senario where a globalParam is used in multiple disjoint blocks (like 2 different functions).
* try to figure out why CI fails but local works
try to inline DispatchMesh, works locally, may fail on CI(?)
* try another fix
* add task tests + don't allow semi-early task-shader inline
Task shader uses DispatchMesh which is a very big 'hack' where we check for the function name and modify the callees in very large ways. This function does inline, but it cannot inline early due to future mangling that this operation requires todo. This is reflected with the `[noRefInline]` modifier. It is a modifier so users may stop mandatory inlines with `__ref` parameter.
|
|
Fixes #4031
Each component of unpackU/Snorm4x8 had to be masked for 8bits.
|
|
* Keep const-ness in generic functions
Closes #3834
The issue was that "const" variables inside of generic functions became
non-const variables. This issue prevented some of GLSL texture
functions from being called inside of generic functions.
When `propagateConstExpr()` iterates the global functions, the generic
functions had to be handled little differently. This commit allows the
iteration to happen for the generic functions.
* Adding an explantion of the test as a comment
|
|
* Support derivative functions in compute & capabilities adjustments
fixes #4000
PR implements derivative functions in compute shaders properly so we have the functionality for SPIR-V & GLSL. Tests reflect fragment and compute paths.
PR also adjusts capabilities to correct wrong SPRI-V target capabilities for when using textures.
Remarks:
1. __requireComputeDerivative(); is a intrinsic_op and not modifier since inlining will destroy the modifier.
2. Derivative mode is tied to an entry point decoration `[DerivativeGroupQuad]`/`[DerivativeGroupLinear]` or GLSL syntax ``derivative_group_linearNV`. Default is to set the mode to `[DerivativeGroupQuad]`
* remove -emit-spirv-directly
* fixes
1. fix minor issue fwidth change where I returned the wrong type
2. fix issue where glslang{glsl->spirv} is wrong, so we don't run that test and just run the glsl test & direct spir-v test for intrinsic-texture.slang
* adjust as per review and refine code
1. add test to ensure multi-diverging-in-logic entry points work -- 2 functions which may cause computeDerivatives + 1 that uses, 1 that does not.
2. naming
3. use entry point ref graph for c-like-targets
4. reordered some code to util's and removed `static linline` since that was just for ease of coding on my end (should not have been pushed).
* Grammer
* split up source file + issolate GLSL emit path change.
---------
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
|