| Age | Commit message | Author |
|
This closes issue #5085.
|
|
* Add `IRWArray` interface, and make StructuredBuffer conform to it.
* Update user guide.
* Fix.
* Fixes.
|
|
* Add WGSL pack/unpack intrinsics
This addresses issue #5080.
* Add WGSL constructor intrinsics
This addresses issue #5081.
* Add WGSL derivative and miscellaneous intrinsics
This addresses issue #5083.
* Add some missing WGSL intrinsics
- degrees
- faceforward
|
|
Two WGSL functions, frexp and modf, behave slightly differently from their
counterparts in other shader languages: instead of using an output
parameter, they return a struct that holds both result values.
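As a rough illustration of the struct-returning shape, here is a minimal Python sketch. The class and function names are hypothetical and are not part of Slang; the field names `fract`, `exp` and `whole` follow the WGSL result structs, and Python's `math.frexp` and `math.modf` stand in for the shader built-ins.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FrexpResult:
    fract: float  # significand, magnitude in [0.5, 1) for nonzero finite input
    exp: int      # power-of-two exponent


@dataclass(frozen=True)
class ModfResult:
    fract: float  # fractional part, same sign as the input
    whole: float  # integral part, same sign as the input


def wgsl_style_frexp(x: float) -> FrexpResult:
    # Both results travel back in one struct instead of an out-parameter.
    fract, exp = math.frexp(x)
    return FrexpResult(fract, exp)


def wgsl_style_modf(x: float) -> ModfResult:
    fract, whole = math.modf(x)
    return ModfResult(fract, whole)
```

A caller that expects the HLSL-style signature would read `.whole` or `.exp` from the returned value and store it into its own output variable.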
|
|
* Implement math intrinsics for WGSL
This commit implements math related intrinsics and a few others for
WGSL.
The implementation is based on the following doc,
https://www.w3.org/TR/WGSL
slang-test was looking for the downstream compiler for WGSL even though
it is not used.
This commit adds a minimal change to avoid the crash.
|
|
* Add WGSL as a target
This is required for #4807.
* C-like emitter: Allow the function header emission to be overloaded
WGSL-style function headers are pretty different from normal C-style headers:
Normal C-style headers:
ReturnType Func(...)
void VoidFunc(...)
WGSL-style headers:
fn Func(...) -> ReturnType
fn VoidFunc(...)
This change allows the header style to be overloaded, in order to accommodate WGSL-style
headers as required to resolve issue #4807, but retains normal C-style headers as the
default implementation.
[1] https://www.w3.org/TR/WGSL/#function-declaration-sec
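The overload point described above can be sketched in Python as a base emitter whose header method a subclass replaces. This is only an illustration under assumed names (`CLikeEmitter`, `WGSLEmitter`, `emit_function_header` are hypothetical); the real emitters are C++ classes inside Slang.

```python
class CLikeEmitter:
    """Default behaviour: C-style 'ReturnType Func(Type name, ...)'."""

    def emit_function_header(self, return_type, name, params):
        args = ", ".join(f"{ty} {nm}" for ty, nm in params)
        return f"{return_type} {name}({args})"


class WGSLEmitter(CLikeEmitter):
    """Overrides only the header style: 'fn Func(name : Type, ...) -> ReturnType'."""

    def emit_function_header(self, return_type, name, params):
        args = ", ".join(f"{nm} : {ty}" for ty, nm in params)
        header = f"fn {name}({args})"
        # A function that returns nothing has no '-> Type' suffix in WGSL.
        if return_type != "void":
            header += f" -> {return_type}"
        return header
```

Keeping the C-style method on the base class means existing back-ends see no change unless they opt in by overriding it.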
* C-like emitter: Allow emission of switch case selectors to be overloaded
The C-like emitter will emit code like this:
switch(a.x)
{
case 0:
case 1:
{
...
} break;
...
}
This is not allowed in WGSL. Instead, selectors for cases that share a body must [1] be
separated by commas, like this:
switch(a.x)
{
case 0, 1:
{
...
} break;
...
}
To prepare for addressing issue #4807, this patch makes the emission of switch case
selectors overloadable.
[1] https://www.w3.org/TR/WGSL/#syntax-case_selectors
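A minimal Python sketch of the two selector styles, assuming a hypothetical `emit_case_selectors` hook that receives every selector value sharing one case body:

```python
class CLikeEmitter:
    def emit_case_selectors(self, values):
        # C-like output: one 'case N:' label per value, falling through to a shared body.
        return [f"case {v}:" for v in values]


class WGSLEmitter(CLikeEmitter):
    def emit_case_selectors(self, values):
        # WGSL output: a single label with comma-separated selectors.
        return ["case " + ", ".join(str(v) for v in values) + ":"]
```

Each method returns the label lines that precede the shared body, so the surrounding switch emission can stay identical for both targets.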
* C-like emitter: Support WGSL-style declarations
This patch helps to address issue 4807.
C-like languages declare variables like this:
i32 a;
WGSL declares variables like this:
var a : i32
The patch introduces overloads so that the forthcoming WGSL emitter can output WGSL-style
declarations, which helps to resolve #4807.
* C-like emitter: Support overloading of declarators
Unlike C-like languages, WGSL does not support the following types at the syntax level,
via declarators:
- arrays
- pointers
- references
For this reason, this patch introduces support for overloading the declarator emitter,
in order to help address issue #4807.
C-like languages:
int a[3]; // Array-ness of type is mixed into the "declarator"
WGSL:
var a : array<int, 3>; // Array-ness of type is part of the... type_specifier!
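The difference between the two declaration styles can be sketched with two small Python helpers. The function names are hypothetical and only mimic the output strings; they do not reflect how the Slang emitter is structured.

```python
def emit_c_like_decl(elem_type, name, array_size=None):
    # C-like: array-ness is attached to the declarator, after the name.
    suffix = f"[{array_size}]" if array_size is not None else ""
    return f"{elem_type} {name}{suffix};"


def emit_wgsl_decl(elem_type, name, array_size=None):
    # WGSL: array-ness is folded into the type; the declarator is just the name.
    ty = f"array<{elem_type}, {array_size}>" if array_size is not None else elem_type
    return f"var {name} : {ty};"
```

For example, `emit_c_like_decl("int", "a", 3)` yields `int a[3];`, while `emit_wgsl_decl("i32", "a", 3)` yields `var a : array<i32, 3>;`.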
* C-like emitter: Allow struct declaration separator to be overridden
C-like languages use ';' as a separator, while languages such as WGSL use ','.
This change prepares for addressing issue #4807.
* C-like emitter: Allow overriding of whether pointer-like syntax is necessary
Things such as structured buffers map to "ptr-to-array" in WGSL, but ptr-typed
expressions don't always need C-style pointer-like syntax.
Therefore, make it overridable whether or not such syntax is emitted in various cases in
order to address #4807.
* C-like emitter: Emit parentheses to avoid warning about & and + precedence
This helps with #4807 because WGSL compilers (e.g. Tint) treat absence of parentheses as
an error.
* C-like emitter: Add hook for emitting struct field attributes
WGSL requires @align attributes to specify explicit field alignment in certain cases.
Thus, this patch prepares for addressing #4807.
* C-like emitter: Add hook for emitting global param types
Declarations of structured buffers map to global array declarations in WGSL.
However, in all other cases such as when structured buffers are used in operands, their
types map to *ptr*-to-array.
This patch makes it possible for the WGSL back-end to say that structured buffers
generally map to "ptr-to-array" types, but still have a special case of just "array" when
declaring the global shader parameter.
Thus, this patch helps with addressing #4807.
* IR lowering: Use std140 for WGSL uniform buffers
This patch just cuts out some logic that prevented std140 from being chosen for WGSL uniform
buffers.
Note that the layout of WGSL buffers in the uniform address space is not quite std140, but for now it's
close enough to avoid compile issues.
Later on, a custom layout should be created for WGSL uniform buffers.
When that's done, this change will be revisited, but for now it helps to resolve #4807.
* Don't emit line directives in WGSL by default
WGSL does not support line directives [1].
The plan currently seems to be to instead support source-map [2].
This is part of addressing issue #4807.
[1] https://github.com/gpuweb/gpuweb/issues/606
[2] https://github.com/mozilla/source-map
* WGSL IR legalization: Map SV's
The implementation closely follows the corresponding one for Metal.
Supported:
- DispatchThreadID
- GroupID
- GroupThreadID
Unsupported:
- GSInstanceID
This is not complete, but it helps to address #4807.
* WGSL emitter: Add support for basic language constructs
A lot of the basics are added in order to generate correct WGSL code for basic Slang language constructs.
This addresses issue #4807.
This adds support for at least the following:
- statements
- if statements
- ternary operator
- while statement
- for statements
- variable declarations
- switch statements
- Note: Slang may emit non-constant case expressions, see issue 4834
- literals
- integer literals
- u?int[16|32|64]_t
- float and half literals
- bool literals
- vector literals and splatting (e.g. 1.xxx)
- function definitions
- assignments
- +=, *=, /=
- array assignments
- vector assignments/updates
- swizzles of other vectors
- from matrix rows ('m[i]' notation)
- from matrix cols (using swizzle notation, e.g. 'm._11_12_13')
- matrix assignments/updates
- to rows ('m[i]' notation)
- to cols (using swizzle notation, e.g. 'm._11_12_13')
- declarations
- arrays
[1] https://www.w3.org/TR/WGSL/#syntax-switch_body
* Add some WGSL capabilities
This patch registers some WGSL capabilities required to pass many of the initial compute
shader compile tests.
Many capabilities still remain to be added -- this is just an initial set to help resolve
issue #4807.
- asint
- min and max
- cos and sin
- all and any
* WGSL and C-like emitters: Add hack to bitcast case expression
In WGSL, the switch condition and case types must match.
https://www.w3.org/TR/WGSL/#switch-statement
Slang currently allows these types to mismatch, as pointed out in #4921.
Issue #4921 should eventually be addressed in the front-end by a patch like [1].
However, at the moment that would break Falcor tests.
Thus, this patch temporarily works around the issue in the WGSL emitter only in order to
help resolve #4807.
In the future, the Falcor tests should be fixed, this patch should be dropped and [1]
should be merged instead.
[1] a32156ef52f43b8503b2c77f2f1d51220ab9bdea
|
|
|
|
* Metal: mesh shading skeleton
* Metal: fixing mesh payload
* Metal: improving mesh shader indices output
* Metal: Implementing conditional mesh output set
* Metal: Trying to not break other backends
* Metal: trying to fix mesh output set
* Metal: Fixing MeshOutputSet usages
* Metal: Fixing vertex and primitive semantics
* Metal: Fixing code style
* Metal: Fixed hlsl indices set
* Fixed HLSL mesh output set disappearing and GLSL mesh output crashing
* Metal: Adjusting task test matching
---------
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
1. Document `__ref` in stdlib.
2. Remove `__ref` example in `docs\user-guide\a1-04-interop.md`
3. New example in `docs\user-guide\a1-04-interop.md` to compensate for no longer providing an example that uses `&` and `OpCapability`/`OpExtension`.
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
|
|
* Overhaul IR lowering of pointer types.
* Propagate address space in IRBuilder.
* Fixup.
* Fix.
* Fix.
* Change how Ptr type is printed to text.
* Fix.
|
|
* Add ResourceArray intrinsic type
* Move aliased parameter generation to GLSL legalization
* Add DynamicResourceEntry type for proxying layout of GenericResourceArray
* Reimplement as DynamicResource
* Add reflection test
* Don't reuse alias cache between different parameters
* Add dynamic cast extensions for buffer types
* Minor format fix
* Fix VarDecl diagnostics after finding non-applicable initializer candidates
---------
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
* Metal: `Interlocked` (atomic) member function support for buffers
fixes: #4654
fixes: #4481
1. Add `Interlocked` (atomic) member function support for buffers to Metal
2. Fix `__getEquivalentStructuredBuffer` so it works with CPP/Metal targets
* add `CompareStore` support
* legalize RWByteAddressBuffer to fully replace StructuredBuffer
* destroy replaced byte-addr buffer
* cleanup as per review and add comment to explain why certain code exists
* fix flow of byte-address-buffer replacement
* toggle on option to translate byteAddrBuffer to StructuredBuffer
* cleanup unused buffers
* add treatGetEquivalentStructuredBufferAsGetThis flag to treat getEquivStructuredBuffer as a byteAddressBuffer
* comment to explain `treatGetEquivalentStructuredBufferAsGetThis`
---------
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
* Implement 64-bit version of clockARB
* Fix capability versions
* Corrections to capabilities
---------
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
* Implement non-member-function atomic texture support for texture_buffer and texture1d
Fixes: #4538
Related to: #4291, fixes `tests/compute/atomics-buffer.slang`
Texture objects cannot use `__getMetalAtomicRef` to cast objects into atomic value type. [Texture objects mandate use of member functions](https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf#Texture%20Functions). The implementation is as follows:
* We can detect texture object usage through checking for an `IRImageSubscript` Operation. `__isTextureAccess()` was added to evaluate if we have an `IRImageSubscript` operation at compile time (before `static_assert`). `__isTextureAccess()` only checks if we are targeting Metal.
* We have all parameter data needed to call a texture atomic function embedded inside `IRImageSubscript`. `__extractTextureFromTextureAccess()` and `__extractCoordFromTextureAccess()` were added to extract this data for use with Metal atomics.
Note:
* Metal documentation has various incorrect details (function names)
* Since we currently hardcode metal versions for compiling, the Metal compiler version was changed to target `Metal 3.1` (`slang-gcc-compiler-util.cpp`)
* textures do not permit atomic float operations
* add fallthrough attribute + fix bug with 'exchange instead of xor' + fix warning bug
* incorrect function name fix
* missing filecheck
* disable atomics-buffer.slang compute test since a GFX issue causes it to fail
* Array support for metal interlockedAtomic and proper verification of texture with interlockedAtomic functions
* Array support for metal interlockedAtomic
* proper verification of texture with interlockedAtomic functions
note: had to separate many functions to allow forceInlining to run
* missing getOperand(0)
* push atomic fix for metal
* fix atomic syntax for metal and hlsl emitting extra brackets (breaks tests)
* test changes and meta changes
1. max is 8 rw textures with metal because Metal has this limit. Split up tests to not hit this limit
2. added back `[0]`...,`T` to test since this legalizes metal atomic intrinsic
* macro'ify some of the atomic code
1. addresses review
2. makes code easier to modify in the future (rather than sifting through 1000 lines we can just look at ~10-30)
* fix test 'check'
* missing float support due to macro
* add functions macro generates, `InternalAtomicOperationInfo`
---------
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
* initial change to test with CI for CPU/CUDA errors
* Fixes to Metal Input parameters and Output values
Note:
1. Flattening a struct is the process of making a struct have 0 struct/class members.
Changes:
1. Separated `legalizeSystemValueParameters`. This was done to make it easier to run `legalizeSystemValue` 1 system-value at a time to simplify logic. This change is optional and can be undone if not preferred.
2. Wrap everything inside a Metal legalization context. This was done since it simplifies a lot of logic and will be required for #4375
3. Created `convertSystemValueSemanticNameToEnum` and expanded the existing System-Value Enum system. This allows (sometimes) faster comparisons and helps prepare code for porting into `slang-ir-legalize-varying-params.cpp` (#4375)
4. Added a more dynamic `legalizeSystemValue` system so more than 2 types can be targeted for legalization. This is required to legalize `output`. There is still no preference for any converted type, the first valid type will be converted to.
5. Flatten all input(`flattenInputParameters`)/output(part of `wrapReturnValueInStruct`) structs and assign semantics accordingly.
6. Semantics when legalized have no specific logic other than to: 1. avoid overlapping semantics 2. Prefer assigning explicit semantics specified by a user.
7. Fixed an issue with incorrect output semantics if not a fragment stage (when there are not any assigned semantics)
* change metallib test to the correct metal test
* comment code & cleanup -- Did not address all review
Added comments for clarity + cleaned up some odd areas which were messy
* Add comment to `fixFieldSemanticsOfFlatStruct`
I found `fixFieldSemanticsOfFlatStruct` to still be confusing at a cursory glance. Added comments to make the function more understandable.
* white space
* Address review comments
1. Fix semantic propagation.
2. Fix how we map struct fields of the flat struct to struct. This is especially important when reusing the same struct twice, since struct member info is not unique per struct instance used.
* Fix semantic legalization by adding TreeMap
Add TreeMap to allow proper sorted-object data iteration.
* Fix some compile issues
* try to fix gcc compile error
* compile error
* fix logic bug in treeMap iterator next-semantic setter
* fix vsproject filters
* filter file syntax error
* remove need of a context to make copies stable
* Rename treemap to the more appropriate name of "treeset", adjust code comments accordingly.
* remove custom type `TreeSet` and use `std::set`
* remove TreeMap fully
---------
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
* Support status argument for GatherXXX
This commit adds an argument to all texture GatherXXX functions.
The new argument is for "status" as described in the SM5.0 definition.
Close #4466
Limit Gather with status to HLSL
Exclude Gather-status test from VK
* Fix capability errors
---------
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
|
|
placement (#4534)
Closes #4533
Fixes part of #4531
|
|
* Fix invalid capabilities being allowed
fixes: #4506
fixes: #4508
1. As per #4506, no longer allow invalid `[require(...)]`
2. As per #4508, no longer allow mismatch between `case` and `require` of a calling function
3. Fixes incorrect hlsl.meta capabilities
4. Added a ref to the parent function/decl for when erroring with capabilities to help debug meta.slang files for when weird source locations are generated.
* rename vars and copy lambdas by value
* fix some more capabilities
* incorrect capabilities on a texture test
* push capabilities fix
note: separated capabilities for glsl,spirv,cuda,hlsl since not all functions support all targets (source of capability error)
* fix cmd line arg by using `xslang` to passthrough to slangc
* let auto-infer run for certain capabilities to reduce simple mistakes
---------
Co-authored-by: Jay Kwak <82421531+jkwak-work@users.noreply.github.com>
|
|
* Add `.sample` operator for MS texture types
* Adding filecheck tests for `.sample`
|
|
* Implement CheckAccessFullyMapped
Closes #4438
Closes #4445
Closes #1712
Related to #4495
This commit implements "CheckAccessFullyMapped()" for HLSL target.
All other "status" variants for Sample/Load are limited to HLSL by
the capability system, because they are not properly implemented yet.
|
|
`SubpassInput<T>` (#4462)
* Add case to `emitVectorReshape` for `vector<>` type, `scalar` value
1. Add new case
2. Add test
* fix warning
* fix warning
* Implement HLSL resource bindings and default type `float4` to `SubpassInput<T>`
fixes: #4440
1. Removed GLSLInputAttachmentIndexLayout modifier and the somewhat 'hacky' binding model 'Input Attachment' previously relied upon. This was changed to work with the slang-type-layout rules system. This change allows Slang automatic bindings, HLSL bindings, GLSL bindings, and translation of GLSL to and from HLSL bindings to work.
2. Added default argument `float4` to SubpassInput<T>.
3. Merged glsl.meta and hlsl.meta SubpassInput logic.
* fix InputAttachment attribute checks
fix InputAttachment attribute checks for HLSL and GLSL syntax
* remove unused var
* validate attribute correctly
Attributes do not have type information. We must check the type expression to validate attribute usage.
* remove hacky validation
type based validation before types are fully resolved is quite hacky and unstable to changes and wrapped types
* fix warning
* remove redundant `!= nullptr`
* remove extra `!= nullptr`
* fix some warnings/errors
* subpass capability to limit to dxc & remove default values in some functions
* revert logic to previous logic
revert logic to return if we have a binding regardless of if a VarDecl is given the binding
|
|
* Extend `countbits` intrinsic for vector types
This commit adds vector overloads of the `countbits` function,
because HLSL provides the following overloads:
```
uint count_bits(uint value);
uint2 count_bits(uint2 value);
uint3 count_bits(uint3 value);
uint4 count_bits(uint4 value);
```
https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/countbits
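The vector overloads apply the scalar bit count to each component independently. A small Python sketch of that semantics, assuming 32-bit unsigned components and modelling a vector as a tuple (the helper is illustrative, not Slang's stdlib code):

```python
def countbits(value):
    """Count set bits of a 32-bit uint, or of each component of a tuple 'vector'."""

    def popcount32(x):
        # Mask to 32 bits so the sketch matches uint semantics.
        return bin(x & 0xFFFFFFFF).count("1")

    if isinstance(value, (tuple, list)):
        return tuple(popcount32(c) for c in value)
    return popcount32(value)
```

Here `countbits(0b1011)` gives `3`, and a four-component input such as `(0, 1, 0xFF, 0xFFFFFFFF)` gives `(0, 1, 8, 32)`.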
|
|
* Implement asdouble2 intrinsic and tests
Fixes #4437
Adds a new asdouble2 intrinsic for all platforms except Metal.
Extends the test for asdouble to test asdouble2 as well.
|
|
* Fix Texture2DMSArray
Close #4427
We had the postfix order wrong for the keyword MS. This commit changes
the incorrect name Texture2DArrayMS to Texture2DMSArray.
|
|
* Add additional `ImageSubscript` features:
1. Added ImageSubscript support for Metal & a test case
* Merge GLSL/SPIRV/Metal `ImageSubscript` legalization pass
2. Added multisample support to glsl/spirv/metal for when using ImageSubscript
* Added in this PR since the overhaul of the code merges together GLSL/SPIRV/Metal implementation
3. Fixed minor metal texture `Load`/`Read` bugs
* [HLSL methods of access do not support subscript accessor for texture cube array](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/texturecubearray)
* removed swizzling of uint/int/float
* other odd bugs which were causing compile errors
note: Compute tests do not work due to what seems to be the GFX backend (causes crash without error report). The tests are disabled.
* disable LOD with texture 1d
seems that LOD for 1d textures needs to be a compile time constant as per an error metal throws
* syntax error in hlsl.meta
* static_assert alone with intrinsic_asm error
provides cleaner errors
Note: `static_assert` seems to be unstable and not be fully respected (still require `intrinsic_asm` to avoid a stdlib compile error)
* change comment to `// lod is not supported for 1D texture`
* add `static_assert` in related code gen paths
* address review
* address review
* add asserts as per review comment, NOTE: unclear if these should be release 'asserts' as well
|
|
* Support atomic intrinsics for Metal
This commit adds support for the atomic intrinsics in Metal.
The atomic member functions for buffers are not implemented yet.
Metal requires the first argument for the atomic functions to be an
atomic data type. This implementation relies on the fact that we can do a
C-style type casting from a regular data type to an atomic data type.
|
|
Add InHelperLane() intrinsic for HLSL, GLSL, Metal and Spirv.
|
|
Closes #4414
|
|
* Metal: Implement fix for non vector4 texture Load/Sample
1. Fixes buffer-swizzle-store
2. Added test cases to texture.slang to cover all types
* remove 1d lod support and buffer swizzle store
this can be enabled later
|
|
'raytracing' and 'texture-footprint' tests
fixed texture-footprint bug
changed when we emit raytracing/rayquery extensions with glsl backend (to reduce incorrect extension emitting)
|
|
* Fix and enable tests for metal.
* Fix.
* Fix.
* Fix tests.
* Fix warnings.
* Fix.
---------
Co-authored-by: Yong He <yonghe@Yongs-Mac-mini.local>
|
|
* capability upgrade warning/error
adjusted implementation + tests to support a warning/error if capabilities are implicitly upgraded and test accordingly.
* add glsl profile caps
* add GLSL and HLSL capabilities to the associated capability
* syntax error in capdef
* only error if user explicitly enables capabilities
1. changed testing infrastructure to not set a `profile` explicitly,
2. Added tests to be sure this works as intended with user API and with slangc command line
* Change capability atom definitions and how Slang manages them to fix errors
1. most `glsl_spirv` version atoms have been removed from `.capdef`, instead we will translate `spirv` version atoms into `glsl_spirv` since there is no point in writing the same code twice in `.capdef` files to define `spirv` versions.
2. add spirv version, and hlsl sm version (and equivalent) capability dependencies
3. removed some stage requirements which were set on objects, keep the wrapper capabilities. I am keeping the wrapper capabilities since I am unsure whether there are stage limitations (spec says code in practice does not work).
* check internal version instead of version profile (_spirv_1_5 vs. spirv_1_5)
* remove unused OpCapability. adjust SPIRV version'ing again for glsl_spirv
* apply workaround for glslang bug with rayquery usage
* ensure capabilities targeted by a profile and added together by a user are valid
* remove additions to `spirv_1_*` wrapper
* spirv_* -> glsl_spirv fix
* fix bug where incompatible profiles would cause invalid target caps
* try to avoid joining invalid capabilities
* fix the warning/error & printing
* run through tests to fix capability system and test mistakes
many mistakes were mesh shaders doing `-profile glsl_450+spirv_1_4`. This is not allowed for a few reasons
1. the test tooling does not handle arguments the same as `slangc`
2. glsl_450 core profile does not support mesh shaders, nor does spirv_1_4. sm_6_5 does work in this scenario
* set some sm_4_1 intrinsics to sm_4_0
* replace `GLSL_` defs with `glsl_`
* swap the unsupported render-test syntax for working syntax
* set d3d11/d3d12 profile defaults
this is required since sm version changes compiled code & behavior
* adjusted nvapi capabilities with atomics + d3d11 set to use sm_5_0 as per default
* cleanup
* address review
* incorrect styling
* change `bitscanForward` to work as intended on 32 bit targets
---------
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
|
|
* Support integer typed textures for GLSL
This commit re-enables the ability to sample from an integer typed
texture for GLSL functions while keeping it unavailable for HLSL target.
|
|
|
|
* Support all integer typed indices in StructuredBuffer Load/Store/[].
* Fix tests.
---------
Co-authored-by: Jay Kwak <82421531+jkwak-work@users.noreply.github.com>
|
|
|
|
* Remove unnecessary call to __requireComputeDerivative
SPIR-V operators whose names contain the keyword "Implicit" require a
call to "__requireComputeDerivative()".
Operators whose names contain "Explicit" do not need that call.
|
|
|
|
|
|
* Handle type check cache update on extensions more gracefully.
* Correctness fix.
* Cache implicit cast overload resolution results.
* Fix.
* More optimizations.
* Cache implicit default ctor resolution.
* Disable redundancy removal.
* Fix.
* Fix test.
* Fix.
* Correctness fix.
* Fix.
* Fix.
* Fix test.
* Small tweak.
|
|
* Add options to speedup compilation.
* Fix.
* Plumb options to DCE pass.
* Revert debug change.
* Fix regressions.
* More optimizations.
* more cleanup and fixes.
* remove comment.
* Fixes.
* Another fix.
* Fix errors.
* Fix errors.
* Add comments.
|
|
* push fix: if no sample, set to 0 for textureMS
* push fixes to hlsl [] operator + test so it will error
|
|
* RasterizerOrder resource for spirv and metal.
Also fixes the byte address buffer logic for metal.
* Fix.
* Delete commented lines.
---------
Co-authored-by: Jay Kwak <82421531+jkwak-work@users.noreply.github.com>
|
|
* Capabilities System, Backing Logic Overhaul
Fixes #4015
Problems to address:
1. Currently the capabilities system spends anywhere from 25-50% of compile time on the CapabilityVisitor. Most of this time is spent on join logic: 1. Finding abstract atoms 2. Comparing list1<->list2. This should and can be made significantly faster.
2. Error system does not produce errors with auxiliary information. This will require a partial redesign to provide more useful semantic information for debugging.
What was addressed:
1. Array backed `CapabilityConjunctionSet` was replaced in favor of a `UIntSet` backed `CapabilityTargetSets`. The design is described below.
Design:
* `CapabilityTargetSets` is a `Dictionary<targetAtom, CapabilityTargetSet>`. This is not an array for 2 reasons: 1. Easy to figure out which target is missing between two `CapabilityTargetSets` 2. To statically allocate an array requires the preprocessor to manually annotate which Capability is a target and link that Capability to an index. This means a dictionary is required for lookup regardless of implementation.
* `CapabilityTargetSet` is an intermediate representation of all capabilities for a singular `target` atom (`glsl`, `hlsl`, `metal`, ...). This structure contains a dictionary to all stage specific capability sets for fast lookup of stage capabilities supported by a `CapabilitySet` for a `target` atom. This reduces number of sets searched.
* `CapabilityStageSet` is an intermediate representation of all capabilities for a singular `stage` atom (`vertex`, `fragment`, ...). This structure holds all disjoint capability sets for a `stage`. A disjoint set is rare, but may exist in some scenarios (as an example): `{glsl, EXT_GL_FOO}{glsl, _GLSL_130, _GLSL_150}`. This reduces the number of sets searched.
* `UIntSet` is the main reason for the redesign for better performance and memory usage. All set operations only require a few operations, making all set logic trivial and with minimal cost to run. All algorithms were modified to focus around `UIntSet` operations.
2. Errors
* Semantic information is now better linked to the calling function to provide a connection of function<->function_body for when saving semantic information for errors.
* Missing targets now print errors much like other error code by finding code which could be a cause of incompatibility.
What is missing:
1. Add non naive support for non-stage specific capabilities such as `{hlsl, _sm_5_0}`. Currently non stage specific targets emulate the behavior through assigning such capabilities to every stage: `{hlsl, _sm_5_0, vertex} {hlsl, _sm_5_0, fragment}...`. Removal of this behavior would remove redundant shader stage sets being made at construction time (~80% of new implementation runtime). This is an addition, not an overhaul.
2. Optionally: `UIntSet` should be modified to support SIMD operations for significantly faster operations. This is not required immediately since `UIntSet` is already not a performance constraint.
Notes:
* UIntSet had implementation bugs which were fixed in this PR.
* The old capabilities system had bugs which were fixed in this PR when transforming to the new implementation.
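To illustrate why a bitset-backed set makes join and comparison logic cheap, here is a minimal Python sketch. `BitSet` is a hypothetical stand-in and not Slang's `UIntSet`; it assumes capability atoms are small non-negative integers and uses one Python integer as the bit storage.

```python
class BitSet:
    """Set of small non-negative integers stored as bits of a single int."""

    def __init__(self, elements=()):
        self.bits = 0
        for e in elements:
            self.bits |= 1 << e

    @classmethod
    def _from_bits(cls, bits):
        result = cls()
        result.bits = bits
        return result

    def union(self, other):
        # One bitwise OR replaces an element-by-element list merge.
        return BitSet._from_bits(self.bits | other.bits)

    def intersection(self, other):
        return BitSet._from_bits(self.bits & other.bits)

    def is_subset_of(self, other):
        # No bit may be set here that is missing from 'other'.
        return self.bits & ~other.bits == 0

    def elements(self):
        return [i for i in range(self.bits.bit_length()) if (self.bits >> i) & 1]
```

With atoms numbered this way, checking whether one capability set implies another reduces to a single `is_subset_of` call rather than a nested list comparison.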
* fix .natvis debug view
* Small optimizations I found while working on the addition
the AST building pass looks like so now:
1% = ~capabilitySet
2% = capabilitySet()
1.5% capabilitySet::unionWith()
0.8% capabilitySet::join()
1.5% auxiliary info for debugging
~0.5-1% extra visitor overhead
~5% total for the visitor
~6.5% for total runtime costs
* fix caps which were wrong but worked
* push minor syntax fix (still looking for why other tests fail)
* perf & bug fixes
1. did not properly remake isBetterForTarget for this->empty case with that as Invalid. This is best case in this scenario.
2. Remade serializer for stdlib generation. Faster (more direct) & cleaner code.
NOTE: did not address review comments
* fix glsl.meta caps error
* fixing findBest logic again & UIntSet wrapper
findBest was not checking for 'more specialized' targets & the element counter was flawed
* faster getElements algorithm + natvis for UIntSet + wrong warning
* type incompatibility of bitscanForward implementations
* try to fix warnings again
* remove ptr for clang intrinsic
* add missing header
* ifdef to allow clang compile
* compiler hackery to fix up platform/type independent operations
* bracket
* fix MSVC error
* missing template
* change types out again
* changes to fix compiling
* adjustment to parameter for Clang/GCC
* added iterator to delay processing all atomSets of a CapabilitySet
* add a few missing const's
* ensure we never have more than 1 disjointSet
Added a wrapper + assert + union functionality to all possible disjoint sets. This was done in favor of a removal of the LinkedList for 2 reasons:
1. We still need 0-1 set functionality.
2. Might as well keep the code, just disallow the problematic functionality.
* address review comments
non linked-list refactor review comments addressed; add doc comments + remove redundant code
* comments + remove isValid for bool operator
* push removal of linkedlist for capabilities
* add missing break
* address review comments
minor adjustments of syntax
* push a fix to the `CapabilitySet({shader, missing target})` code
* quality + error
1. add iterator to UIntSet
2. do not specialize target_switch if profile is derived from case (GLSL_150 is not compatible with GLSL_400)
* fix target_switch erroring + temporarily remove UIntSet::Iterator
temporarily remove UIntSet::Iterator. It will be added after, testing code on CI first so I can multi-task fixing the UIntSet Iterator
* fix the UIntSet iterator
* Revert "fix the UIntSet iterator" temporarily to pull from master
* add metal error as per texture.slang
(took a while to realize this was why things were breaking, likely should adjust errors to reflect this)
* Rework UIntSet to have a template for output type
This is done so it is reasonable to debug the iterator output and not just dealing with messy int's
Fix problems with the iterators implemented + invalid capabilities handling
* removed incorrect `__target_switch` capability
barycentric was being used with anticipation of `profile glsl450`, this does not expand into `GL_EXT_fragment_shader_barycentric`, this instead caused an error which is hidden during cross-compile.
* remove some uses of getElements
* remove undeclared_stage for now
* remove redundant code associated with `undeclared_stage`
* remove unused variable
* address review
specifically to note removed static in a thread dangerous scope. Now using a `const static` for read only (thread safe) which precompile steps generate
* move GLSL_150 capdef change to sm_4_1 (more accurate)
* address most review comments
did not address: https://github.com/shader-slang/slang/pull/4145#discussion_r1602256776
* revert incorrect code review suggestion
* push changes for all code review suggestions
|
|
|
|
* Remove use of `G0` and `__target_intrinsic` in stdlib.
* Fix.
* Fix calling intrinsic in global scope.
|
|
* Impl texture APIs for Metal target
This commit is to implement texture functions for Metal target.
The following functions are implemented and tested.
- GetDimensions()
- CalculateLevelOfDetail()
- CalculateLevelOfDetailUnclamped()
- Sample()
- SampleBias()
- SampleLevel()
- SampleCmp()
- SampleCmpLevelZero()
- Gather()
- SampleGrad()
- Load()
Metal has limited support for the texture functions compared to HLSL.
- LOD is not supported for 1D texture,
- Depth textures are limited to 2D, 2DArray, Cube and CubeArray
textures.
- "Offset" variants are limited to 2D, 2DArray, 2D-Depth,
2DArray-Depth and 3D textures.
The functions that cannot be implemented for Metal should properly
be handled by the capability system later.
* Fix the failing test, multi-file.hlsl
I am not sure why this change is needed.
* Fix compile errors on macOS 2nd try
* Remove a typo character to fix the compile error
* Trivial clean up
* Remove `as_type` where it was intended as static_cast
* Use a simpler syntax for __intrinsic_asm
* Trivial clean up
* Remove TEST_AFTER_FIXING_CAPABILITY_PROBLEM after fixing normalize
* Fix the failing test properly
* Fix an incorrect setup of Depth-cube texture
---------
Co-authored-by: Yong He <yonghe@outlook.com>
|