| Age | Commit message (Collapse) | Author |
|
* Address glslang ordering requirements for 'derivative_group_*NV'
fixes: #4305
The solution is to emit some `layout`s after a module source is emitted.
Added code to Slang's gfx backend to enable the compute shader derivative extension for testing purposes.
* address review
* enable removed test
---------
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
* Error out for types not supported by texture sample functions
This commit prints errors with a new keyword, `static_assert`, when the
given texture type is not supported for the target.
* Moving the check to linkAndOptimizeIR after specialization is done
* Remove unnecessary change
* Adding test
* Remove kIROp_StaticAssert once processed
* Do not remove StaticAssert because it is needed for the next
specialization
* Remove after iteration of child is done
---------
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
|
|
* Support all integer typed indices in StructuredBuffer Load/Store/[].
* Fix tests.
---------
Co-authored-by: Jay Kwak <82421531+jkwak-work@users.noreply.github.com>
|
|
* fix double semicolons
* fix another double semicolon
* wait for init data upload
* remove obsolete setData
* refactor swapchain to work on virtual back buffers
* buffer/texture use breakable device reference
* refactor input layout
* create render command encoder
* add todo
* refactor framebuffer layout
* refactor framebuffer
* refactor shader program
* translatePrimitiveType
* add more translate functions
* refactor framebuffer
* refactor render pass
* implement graphics pipeline state
* add depth stencil state
* initial render command encoder support
* comment
|
|
* SPIRV `Block` decoration fixes.
- SPIRV does not allow duplicate `Block` decorations. So we shouldn't be generating them.
- Also fixes duplication of OpName.
- SPIRV and HLSL do not allow ConstantBuffer with trailing unsized arrays. Added a check in the front-end against such code.
* Convert failing cross-compile tests to filecheck.
---------
Co-authored-by: Jay Kwak <82421531+jkwak-work@users.noreply.github.com>
|
|
message. (#4312)
|
|
|
|
|
|
|
|
* fixes: #4163
Precompute UIntSet from individual capabilities inside generator (removes intermediate form of capabilities).
note:
1. I still expand capabilities which are missing `target` and `stage` atoms.
* fix compile warning<->error with clang
* address review
preallocate the pregenerated UIntSet's
* disable incorrect warning of 'unreachable code'
The warning is wrong since, when `out` has 0 elements (does not start `for` loop), the `return` is reached.
* fix clang warnings
1. use unsigned long for the buffer serializer
2. braces around scalar init
* address review
added work around to avoid warning with `for(...) return; return;` pattern
`else if constexpr` addition instead of cascading blocks
* push fix for use of `_BitScanForward`
* cleanup
* move around assert for proper checking
* syntax error
* use SLANG_ASSERT instead of assert
* test for why SLANG_ASSERT caused CI to fail with linux-arm builds
* test if `SLANG_ASSERT` really is causing a build issue for linux-arm
|
|
|
|
* Fix build warnings and treat warnings as error
|
|
* Remove unnecessary call to __requireComputeDerivative
SPIR-V operators whose names contain the keyword "Implicit" require a
call to the function "__requireComputeDerivative()".
Operators whose names contain "Explicit" do not need the call.
|
|
* implement sampler state
* implement input layout
* implement fence object
* buffer implementation
* texture implementation
* cleanup
* add adapter enumeration
* supported formats and allocation info
* work on device and implement readBufferResource
* skeleton for transient resource heap
* initial work on command queue / buffers / encoders
* fix uploading initial buffer data
* implement buffer resource view
* string utility functions
* wip query pool implementation
* cleanup
* swapchain
* wip
* remove plain buffer view
* extend gfxGetDeviceTypeName with metal
* basic support for resource binding with compute shaders
* needed for metal bindings
* replace assert(0) with SLANG_UNIMPLEMENTED_X
|
|
|
|
|
|
|
|
|
|
Use memcpy to replace strncpy_s in SlangProfiler::SlangProfiler to fix
the error in Windows.
|
|
* Add APIs to get profile of compile time
Add serial time measurement
Add a profiler that measures many stages of Slang compilation. It
can accumulate the time spent in each thread in the multi-threaded case
and finally report serial timing info.
* Add invocation times to the profiler
* Simplify the profiler and provide a 'clear' option
Change the profiler design to only return the thread_local
profiler to the user.
We create an ISlangProfiler interface to expose the thread_local
PerformanceProfilerImpl profiler to the user.
In addition, we provide a new option in the input parameter to
control whether or not the user wants to clear the previous profile
data, so spGetCompileProfile() can always return fresh
profiling data.
* Change to use slang container List
Stop using std::vector, instead use slang's container List.
Generate a UUID for ISlangProfiler
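The profiler design described above (a per-thread accumulator with invocation counts and a 'clear' option) could be sketched roughly as follows. This is a minimal illustration only: `Profiler`, `ProfileEntry`, and `ScopedTimer` are hypothetical names, and the code does not reproduce `PerformanceProfilerImpl` or `ISlangProfiler`.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical record: accumulated time and number of invocations for one stage.
struct ProfileEntry
{
    std::chrono::nanoseconds total{0};
    std::uint64_t invocations = 0;
};

// Hypothetical per-thread profiler; each thread gets its own instance.
class Profiler
{
public:
    static Profiler& get()
    {
        thread_local Profiler instance; // one accumulator per thread
        return instance;
    }

    void record(const std::string& name, std::chrono::nanoseconds elapsed)
    {
        ProfileEntry& entry = m_entries[name];
        entry.total += elapsed;
        ++entry.invocations;
    }

    // Drops previously collected data so the next query starts fresh.
    void clear() { m_entries.clear(); }

    const std::map<std::string, ProfileEntry>& entries() const { return m_entries; }

private:
    std::map<std::string, ProfileEntry> m_entries;
};

// RAII helper: records the lifetime of the scope under the given stage name.
class ScopedTimer
{
public:
    explicit ScopedTimer(std::string name)
        : m_name(std::move(name)), m_start(std::chrono::steady_clock::now())
    {
    }

    ~ScopedTimer()
    {
        const auto elapsed = std::chrono::steady_clock::now() - m_start;
        Profiler::get().record(
            m_name, std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed));
    }

private:
    std::string m_name;
    std::chrono::steady_clock::time_point m_start;
};
```

A caller would place a `ScopedTimer` at the top of each stage it wants to measure and read `Profiler::get().entries()` on the same thread afterwards.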
|
|
* Print warning when operator<< shifting too much
Closes #3944
When the type of the left-hand operand of `operator<<` is not big
enough for the shift amount given by the right-hand operand, print a
warning that the result will always be zero.
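The condition behind this warning can be illustrated with a small check. This is a sketch under the assumption that the warning fires when a constant shift amount is at least the bit width of the left operand's type; `shiftAlwaysZero` is a hypothetical helper, not a function from the Slang code base.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true when shifting a value of the given bit
// width left by `shiftAmount` would push every bit out, leaving zero.
constexpr bool shiftAlwaysZero(unsigned leftOperandBits, std::uint64_t shiftAmount)
{
    return shiftAmount >= leftOperandBits;
}
```

For example, shifting a 32-bit value left by 32 or more would be flagged, while a shift by 31 would not.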
|
|
|
|
Add option "-disable-source-map" to disable the source map in obfuscation.
|
|
|
|
code (#4250)
|
|
|
|
|
|
* Fix a bug on default initialization of interface typed value.
* Fix.
|
|
|
|
|
|
* Handle type check cache update on extensions more gracefully.
* Correctness fix.
* Cache implicit cast overload resolution results.
* Fix.
* More optimizations.
* Cache implicit default ctor resolution.
* Disable redundancy removal.
* Fix.
* Fix test.
* Fix.
* Correctness fix.
* Fix.
* Fix.
* Fix test.
* Small tweak.
|
|
* Add options to speedup compilation.
* Fix.
* Plumb options to DCE pass.
* Revert debug change.
* Fix regressions.
* More optimizations.
* more cleanup and fixes.
* remove comment.
* Fixes.
* Another fix.
* Fix errors.
* Fix errors.
* Add comments.
|
|
When a memory leak is detected, this commit dumps information
about the leak. This feature is available only in Debug builds
on the Windows platform.
Also note that the message will not be printed by client
applications that use slang.dll, because the printing happens as part
of slangc.exe, not slang.dll.
I found a bug where Slang::StdWriters was closing `stdout` and `stderr`
in its destructor, which prevented CRT functions from printing messages
to `stdout` and `stderr`.
|
|
* Update slang-performance-profiler.cpp
* modified: source/core/slang-performance-profiler.cpp
* reviews
---------
Co-authored-by: Jay Kwak <82421531+jkwak-work@users.noreply.github.com>
|
|
* push fix: if no sample, set to 0 for textureMS
* push fixes to hlsl [] operator + test so it will error
|
|
* fix all Clang-14 warnings
* remove a clang-14 warning fix because it is a MSVC warning...
|
|
|
|
per shader program. (#4189)
* Emit only 1 execution mode of type per entry point
Added a dictionary<SpvWord,Hash<ExecutionMode>> to ensure we don't emit multiple.
* get inst->id directly
* address review + fix test
---------
Co-authored-by: Yong He <yonghe@outlook.com>
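The deduplication idea in this commit (remember which execution modes were already emitted for each entry point) could look roughly like the sketch below. The names `ExecutionModeTracker` and `shouldEmit` are hypothetical, and standard containers stand in for Slang's own dictionary and hash set types.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

// Hypothetical tracker keyed by entry point id; the mapped set holds the
// execution modes that have already been emitted for that entry point.
class ExecutionModeTracker
{
public:
    // Returns true the first time a (entryPointId, mode) pair is seen,
    // meaning the caller should emit it; returns false for duplicates.
    bool shouldEmit(std::uint32_t entryPointId, std::uint32_t mode)
    {
        return m_emitted[entryPointId].insert(mode).second;
    }

private:
    std::unordered_map<std::uint32_t, std::unordered_set<std::uint32_t>> m_emitted;
};
```

The emitter would call `shouldEmit` before writing each execution mode and skip the write when it returns false.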
|
|
|
|
* SPIR-V support for GLSL texture functions
Closes #4147
This commit implements GLSL texture functions with SPIR-V intrinsics.
It also implements some of the missing GLSL functions:
- textureProj
- textureLod
- texelFetchOffset
- textureProjOffset
- textureLodOffset
- textureProjLod
- textureProjLodOffset
- textureGrad
- textureGradOffset
- textureProjGrad
- textureProjGradOffset
* Fix SPIR-V issues discovered while improving the test case.
* Add __requireComputeDerivative() whenever sampling
* Do not touch GetDimensions
|
|
* capture/relay: Add capture interface classes
Add `ModuleCapture` class for capturing `IModule`
- The `IModule` can only be created from
-- `ISession::loadModule`
-- `ISession::loadModuleFromIRBlob`
-- `ISession::loadModuleFromSource`
-- `ISession::loadModuleFromSourceString`
so, we create the `ModuleCapture` at those methods in `SessionCapture`
class. We use a hash map to store a map from `IModule` to `ModuleCapture`
to avoid creating new `ModuleCapture` when there is already an old one.
- In `SessionCapture::getLoadedModule`, we will assert on not finding
a `ModuleCapture` instance.
Add `EntryPointCapture` class for capturing `IEntryPoint`.
- The `IEntryPoint` can only be created from:
-- `IModule::findEntryPointByName`
-- `IModule::findAndCheckEntryPoint`
so, we create the `EntryPointCapture` at those methods in `ModuleCapture`.
Similarly, we use a hash map to store a map from `IEntryPoint` to
`EntryPointCapture`.
- In `IModule::getDefinedEntryPoint`, we will assert on not finding
an `EntryPointCapture` instance.
Add `CompositeComponentTypeCapture` class for capturing CompositeComponentType.
Since the user is only exposed to `IComponentType`, `CompositeComponentTypeCapture`
just inherits from `IComponentType`.
- `CompositeComponentType` can only be created from:
-- ISession::createCompositeComponentType
so create it here.
Add `TypeConformanceCapture` class for capturing `ITypeConformance`.
- The `ITypeConformance` can only be created from:
-- `ISession::createTypeConformanceComponentType`
so create it here.
In addition, because `EntryPointCapture` and `ModuleCapture` share the same
base class `IComponentType`, we generate the COM GUID for those two
classes to differentiate them.
* Fix the build issue
* Add nullptr check for output parameter
* define the SLANG_CAPTURE_ASSERT macro used in both debug and release build
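The wrapper lookup described in this commit (a hash map from the wrapped object to its capture object, so that an existing wrapper is reused instead of creating a new one) might be sketched as follows. `Module`, `ModuleWrapper`, and `WrapperCache` are hypothetical stand-ins for `IModule`, `ModuleCapture`, and the map kept in `SessionCapture`; the real classes are COM-style interfaces and are not reproduced here.

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>

// Hypothetical stand-in for the wrapped interface (IModule in the commit).
struct Module
{
};

// Hypothetical stand-in for the capture wrapper (ModuleCapture in the commit).
struct ModuleWrapper
{
    explicit ModuleWrapper(Module* module) : actual(module) {}
    Module* actual;
};

class WrapperCache
{
public:
    // Returns the existing wrapper for `module`, creating one on first use.
    ModuleWrapper* getOrCreate(Module* module)
    {
        std::unique_ptr<ModuleWrapper>& slot = m_wrappers[module];
        if (!slot)
            slot = std::make_unique<ModuleWrapper>(module);
        return slot.get();
    }

    // Lookup only: returns nullptr when the module was never wrapped.
    // A caller can assert on the result where a wrapper must already exist.
    ModuleWrapper* find(Module* module) const
    {
        auto it = m_wrappers.find(module);
        return it == m_wrappers.end() ? nullptr : it->second.get();
    }

private:
    std::unordered_map<Module*, std::unique_ptr<ModuleWrapper>> m_wrappers;
};
```

In this sketch, the creation paths would call `getOrCreate`, while a query path such as the one described for `getLoadedModule` would call `find` and assert that the result is not null.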
|
|
|
|
* RasterizerOrder resource for spirv and metal.
Also fixes the byte address buffer logic for metal.
* Fix.
* Delete commented lines.
---------
Co-authored-by: Jay Kwak <82421531+jkwak-work@users.noreply.github.com>
|
|
* Capabilities System, Backing Logic Overhaul
Fixes #4015
Problems to address:
1. Currently the capabilities system spends anywhere from 25-50% of compile time on the CapabilityVisitor. Most of this time is spent on join logic: 1. Finding abstract atoms 2. Comparing list1<->list2. This should and can be made significantly faster.
2. Error system does not produce errors with auxiliary information. This will require a partial redesign to provide more useful semantic information for debugging.
What was addressed:
1. Array backed `CapabilityConjunctionSet` was replaced in favor of a `UIntSet` backed `CapabilityTargetSets`. The design is described below.
Design:
* `CapabilityTargetSets` is a `Dictionary<targetAtom, CapabilityTargetSet>`. This is not an array for 2 reasons: 1. Easy to figure out which target is missing between two `CapabilityTargetSets` 2. To statically allocate an array requires the preprocessor to manually annotate which Capability is a target and link that Capability to an index. This means a dictionary is required for lookup regardless of implementation.
* `CapabilityTargetSet` is an intermediate representation of all capabilities for a singular `target` atom (`glsl`, `hlsl`, `metal`, ...). This structure contains a dictionary to all stage specific capability sets for fast lookup of stage capabilities supported by a `CapabilitySet` for a `target` atom. This reduces number of sets searched.
* `CapabilityStageSet` is an intermediate representation of all capabilities for a singular `stage` atom (`vertex`, `fragment`, ...). This structure holds all disjoint capability sets for a `stage`. A disjoint set is rare, but may exist in some scenarios (as an example): `{glsl, EXT_GL_FOO}{glsl, _GLSL_130, _GLSL_150}`. This reduces the number of sets searched.
* `UIntSet` is the main reason for the redesign for better performance and memory usage. All set operations only require a few operations, making all set logic trivial and with minimal cost to run. All algorithms were modified to focus around `UIntSet` operations.
2. Errors
* Semantic information is now better linked to the calling function, providing a function<->function_body connection when saving semantic information for errors.
* Missing targets now print errors much like other error code by finding code which could be a cause of incompatibility.
What is missing:
1. Add non naive support for non-stage specific capabilities such as `{hlsl, _sm_5_0}`. Currently non stage specific targets emulate the behavior through assigning such capabilities to every stage: `{hlsl, _sm_5_0, vertex} {hlsl, _sm_5_0, fragment}...`. Removal of this behavior would remove redundant shader stage sets being made at construction time (~80% of new implementation runtime). This is an addition, not an overhaul.
2. Optionally: `UIntSet` should be modified to support SIMD operations for significantly faster operations. This is not required immediately since `UIntSet` is already not a performance constraint.
Notes:
* UIntSet had implementation bugs which were fixed in this PR.
* The old capabilities system had bugs which were fixed in this PR when transforming to the new implementation.
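To illustrate why a bitset-backed set makes the join and comparison logic cheap, here is a minimal sketch of such a set. `BitSet` is a hypothetical stand-in written for this example; it is not Slang's `UIntSet` and makes no claim about that class's actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bitset-backed set of small unsigned integers.
// Each 64-bit word stores membership for 64 consecutive values.
class BitSet
{
public:
    void add(std::size_t value)
    {
        const std::size_t word = value / 64;
        if (word >= m_words.size())
            m_words.resize(word + 1, 0);
        m_words[word] |= std::uint64_t(1) << (value % 64);
    }

    bool contains(std::size_t value) const
    {
        const std::size_t word = value / 64;
        return word < m_words.size() && ((m_words[word] >> (value % 64)) & 1u) != 0;
    }

    // Union costs one OR per 64 potential elements.
    void unionWith(const BitSet& other)
    {
        if (other.m_words.size() > m_words.size())
            m_words.resize(other.m_words.size(), 0);
        for (std::size_t i = 0; i < other.m_words.size(); ++i)
            m_words[i] |= other.m_words[i];
    }

    // Subset test: no bit may be set here that is clear in `other`.
    bool isSubsetOf(const BitSet& other) const
    {
        for (std::size_t i = 0; i < m_words.size(); ++i)
        {
            const std::uint64_t theirs = i < other.m_words.size() ? other.m_words[i] : 0;
            if ((m_words[i] & ~theirs) != 0)
                return false;
        }
        return true;
    }

private:
    std::vector<std::uint64_t> m_words;
};
```

With capability atoms mapped to small integers, comparing or merging two capability sets reduces to a handful of word-wide operations instead of element-by-element list comparison.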
* fix .natvis debug view
* Small optimizations I found while working on the addition
the AST building pass looks like so now:
1% = ~capabilitySet
2% = capabilitySet()
1.5% capabilitySet::unionWith()
0.8% capabilitySet::join()
1.5% auxiliary info for debugging
~0.5-1% extra visitor overhead
~5% total for the visitor
~6.5% for total runtime costs
* fix caps which were wrong but worked
* push minor syntax fix (still looking for why other tests fail)
* perf & bug fixes
1. did not properly remake isBetterForTarget for this->empty case with that as Invalid. This is the best case in this scenario.
2. Remade serializer for stdlib generation. Faster (more direct) & cleaner code.
NOTE: did not address review comments
* fix glsl.meta caps error
* fixing findBest logic again & UIntSet wrapper
findBest was not checking for 'more specialized' targets & the element counter was flawed
* faster getElements algorithm + natvis for UIntSet + wrong warning
* type incompatibility of bitscanForward implementations
* try to fix warnings again
* remove ptr for clang intrinsic
* add missing header
* ifdef to allow clang compile
* compiler hackery to fix up platform/type independent operations
* bracket
* fix MSVC error
* missing template
* change types out again
* changes to fix compiling
* adjustment to parameter for Clang/GCC
* added iterator to delay processing all atomSets of a CapabilitySet
* add a few missing consts
* ensure we never have more than 1 disjointSet
Added a wrapper + assert + union functionality to all possible disjoint sets. This was done in favor of a removal of the LinkedList for 2 reasons:
1. We still need 0-1 set functionality.
2. Might as well keep the code, just disallow the problematic functionality.
* address review comments
non linked-list refactor review comments addressed; add doc comments + remove redundant code
* comments + remove isValid for bool operator
* push removal of linkedlist for capabilities
* add missing break
* address review comments
minor adjustments of syntax
* push a fix to the `CapabilitySet({shader, missing target})` code
* quality + error
1. add iterator to UIntSet
2. do not specialize target_switch if profile is derived from case (GLSL_150 is not compatible with GLSL_400)
* fix target_switch erroring + temporarily remove UIntSet::Iterator
temporarily remove UIntSet::Iterator. It will be added back afterwards; testing the code on CI first so I can multi-task fixing the UIntSet iterator
* fix the UIntSet iterator
* Revert "fix the UIntSet iterator" temporarily to pull from master
* add metal error as per texture.slang
(took a while to realize this was why things were breaking; likely should adjust errors to reflect this)
* Rework UIntSet to have a template for output type
This is done so it is reasonable to debug the iterator output and not just deal with messy ints
Fix problems with the iterators implemented + invalid capabilities handling
* removed incorrect `__target_switch` capability
barycentric was being used with anticipation of `profile glsl450`, this does not expand into `GL_EXT_fragment_shader_barycentric`, this instead caused an error which is hidden during cross-compile.
* remove some uses of getElements
* remove undeclared_stage for now
* remove redundant code associated with `undeclared_stage`
* remove unused variable
* address review
specifically to note removed static in a thread dangerous scope. Now using a `const static` for read only (thread safe) which precompile steps generate
* move GLSL_150 capdef change to sm_4_1 (more accurate)
* address most review comments
did not address: https://github.com/shader-slang/slang/pull/4145#discussion_r1602256776
* revert incorrect code review suggestion
* push changes for all code review suggestions
|
|
* Add diagnostic to prevent defining unsized static variables.
* Fix tests.
* Add more tests.
* Fix to allow defining variables of link-time size.
* update diagnostic message.
* Fix tests.
* Simplify code.
|
|
|
|
* Remove use of `G0` and `__target_intrinsic` in stdlib.
* Fix.
* Fix calling intrinsic in global scope.
|
|
* Impl texture APIs for Metal target
This commit is to implement texture functions for Metal target.
The following functions are implemented and tested.
- GetDimensions()
- CalculateLevelOfDetail()
- CalculateLevelOfDetailUnclamped()
- Sample()
- SampleBias()
- SampleLevel()
- SampleCmp()
- SampleCmpLevelZero()
- Gather()
- SampleGrad()
- Load()
Metal has limited support for the texture functions compared to HLSL.
- LOD is not supported for 1D texture,
- Depth textures are limited to 2D, 2DArray, Cube and CubeArray
textures.
- "Offset" variants are limited to 2D, 2DArray, 2D-Depth,
2DArray-Depth and 3D textures.
The functions that cannot be implemented for Metal should properly
be handled by the capability system later.
* Fix the failing test, multi-file.hlsl
I am not sure why this change is needed.
* Fix compile errors on macOS 2nd try
* Remove a typo character to fix the compile error
* Trivial clean up
* Remove `as_type` where it was intended as static_cast
* Use a simpler syntax for __intrinsic_asm
* Trivial clean up
* Remove TEST_AFTER_FIXING_CAPABILITY_PROBLEM after fixing normalize
* Fix the failing test properly
* Fix an incorrect setup of Depth-cube texture
---------
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
Handles a corner case where the first block after the condition on the true-side is another condition. This would currently result in an invalid reverse graph, where the reverse version of the true-block is the merge point for two different branching insts (the reverse version of the loop as well as the second condition).
This patch simply adds a blank block when constructing the reverse-loop (similar to critical edge breaking) so that each branch inst in the reversed loop has a unique merge block.
|