| author | ArielG-NV <159081215+ArielG-NV@users.noreply.github.com> | 2024-05-16 00:04:12 -0400 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2024-05-16 00:04:12 -0400 |
| commit | 1b89f78cd1762aa08402bd656e807b66833b11d0 (patch) | |
| tree | 2be71c9d97af8d28d440981d0c5adc726d9eac56 /source | |
| parent | 3b0de8b6ea484091146f61e663c63beeac5b4798 (diff) | |
Capabilities System, CapabilitySet Logic Overhaul (#4145)
* Capabilities System, Backing Logic Overhaul
Fixes #4015
Problems to address:
1. Currently the capabilities system spends 25-50% of compile time in the CapabilityVisitor. Most of this time is spent on join logic: (a) finding abstract atoms and (b) comparing list1<->list2. This can and should be made significantly faster.
2. The error system does not produce errors with auxiliary information. Providing more useful semantic information for debugging will require a partial redesign.
What was addressed:
1. The array-backed `CapabilityConjunctionSet` was replaced in favor of a `UIntSet`-backed `CapabilityTargetSets`. The design is described below.
Design:
* `CapabilityTargetSets` is a `Dictionary<targetAtom, CapabilityTargetSet>`. This is not an array for 2 reasons: (1) a dictionary makes it easy to figure out which target is missing between two `CapabilityTargetSets`; (2) statically allocating an array would require the preprocessor to manually annotate which Capability is a target and link that Capability to an index, so a dictionary is required for lookup regardless of implementation.
* `CapabilityTargetSet` is an intermediate representation of all capabilities for a single `target` atom (`glsl`, `hlsl`, `metal`, ...). This structure contains a dictionary of all stage-specific capability sets, for fast lookup of the stage capabilities supported by a `CapabilitySet` for a `target` atom. This reduces the number of sets searched.
* `CapabilityStageSet` is an intermediate representation of all capabilities for a singular `stage` atom (`vertex`, `fragment`, ...). This structure holds all disjoint capability sets for a `stage`. A disjoint set is rare, but may exist in some scenarios (as an example): `{glsl, EXT_GL_FOO}{glsl, _GLSL_130, _GLSL_150}`. This reduces the number of sets searched.
* `UIntSet` is the main reason the redesign improves performance and memory usage. Each set operation needs only a few bitwise operations per element block, so set logic is simple and cheap to run. All algorithms were modified to center on `UIntSet` operations.
2. Errors
* Semantic information is now better linked to the calling function, providing a function<->function_body connection when saving semantic information for errors.
* Missing targets now print errors much like other error codes, by finding code that could be the cause of the incompatibility.
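The word-level set operations described in the design above can be sketched as follows. This is a minimal, hypothetical illustration and not Slang's actual `UIntSet`: the class name `BitSetSketch`, its method names, and `bitSetSketchDemo` are assumptions made for this example, and `std::vector` stands in for Slang's `List`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a word-backed integer set (not Slang's UIntSet).
// Value v is present when bit (v % 64) of word (v / 64) is set.
class BitSetSketch
{
public:
    using Word = std::uint64_t;
    static constexpr std::size_t kBitsPerWord = sizeof(Word) * 8;

    void add(std::size_t value)
    {
        const std::size_t wordIndex = value / kBitsPerWord;
        if (wordIndex >= m_words.size())
            m_words.resize(wordIndex + 1, 0); // new words start empty
        m_words[wordIndex] |= Word(1) << (value % kBitsPerWord);
    }

    bool contains(std::size_t value) const
    {
        const std::size_t wordIndex = value / kBitsPerWord;
        if (wordIndex >= m_words.size())
            return false;
        return ((m_words[wordIndex] >> (value % kBitsPerWord)) & Word(1)) != 0;
    }

    // Union: one OR per word.
    void unionWith(const BitSetSketch& other)
    {
        if (m_words.size() < other.m_words.size())
            m_words.resize(other.m_words.size(), 0);
        for (std::size_t i = 0; i < other.m_words.size(); ++i)
            m_words[i] |= other.m_words[i];
    }

    // Intersection: one AND per shared word; words the other set lacks become empty.
    void intersectWith(const BitSetSketch& other)
    {
        const std::size_t common = std::min(m_words.size(), other.m_words.size());
        for (std::size_t i = 0; i < common; ++i)
            m_words[i] &= other.m_words[i];
        for (std::size_t i = common; i < m_words.size(); ++i)
            m_words[i] = 0;
    }

    // Subtraction: one AND-NOT per shared word.
    void subtractWith(const BitSetSketch& other)
    {
        const std::size_t common = std::min(m_words.size(), other.m_words.size());
        for (std::size_t i = 0; i < common; ++i)
            m_words[i] &= ~other.m_words[i];
    }

    // True when every value in `other` is also in this set.
    bool containsAll(const BitSetSketch& other) const
    {
        for (std::size_t i = 0; i < other.m_words.size(); ++i)
        {
            const Word mine = i < m_words.size() ? m_words[i] : Word(0);
            if ((other.m_words[i] & ~mine) != 0)
                return false;
        }
        return true;
    }

private:
    std::vector<Word> m_words;
};

// Usage sketch: the integer values stand in for hypothetical capability atoms.
inline void bitSetSketchDemo()
{
    BitSetSketch required;
    required.add(3);
    required.add(70);

    BitSetSketch available;
    available.add(3);
    available.add(5);
    available.add(70);

    assert(available.containsAll(required));
    available.subtractWith(required);
    assert(available.contains(5) && !available.contains(3) && !available.contains(70));
}
```

Because every operation is a single pass over the backing words, comparing or joining two sets costs a handful of bitwise instructions per 64 atoms, which is the property the redesign relies on.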
What is missing:
1. Add non-naive support for non-stage-specific capabilities such as `{hlsl, _sm_5_0}`. Currently, non-stage-specific targets emulate the behavior by assigning such capabilities to every stage: `{hlsl, _sm_5_0, vertex} {hlsl, _sm_5_0, fragment}...`. Removing this behavior would eliminate the redundant shader stage sets made at construction time (~80% of the new implementation's runtime). This is an addition, not an overhaul.
2. Optionally: `UIntSet` should be modified to support SIMD operations for significantly faster operations. This is not required immediately since `UIntSet` is not currently a performance constraint.
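The per-stage emulation described in item 1 above can be sketched as follows. This is a hypothetical model, not the Slang implementation: `TargetSetsSketch`, the `Target` and `Stage` enums, the three-stage list, and the use of a single `uint64_t` as the atom set are all assumptions made to keep the example small.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical atoms; the real system has many more targets and stages.
enum class Target { glsl, hlsl, metal };
enum class Stage { vertex, fragment, compute };
constexpr std::array<Stage, 3> kAllStages = {Stage::vertex, Stage::fragment, Stage::compute};

using AtomBits = std::uint64_t;                  // one bit per hypothetical atom
constexpr AtomBits kSm50 = AtomBits(1) << 0;     // stands in for `_sm_5_0`

// target -> stage -> atoms, mirroring the dictionary-of-dictionaries layout.
struct TargetSetsSketch
{
    std::map<Target, std::map<Stage, AtomBits>> targets;

    void addForStage(Target target, Stage stage, AtomBits atoms)
    {
        targets[target][stage] |= atoms;
    }

    // Naive emulation: a stage-agnostic capability is copied into every stage,
    // so one logical set becomes one stage set per known stage.
    void addStageAgnostic(Target target, AtomBits atoms)
    {
        for (Stage stage : kAllStages)
            targets[target][stage] |= atoms;
    }

    // Number of stage sets currently stored across all targets.
    std::size_t stageSetCount() const
    {
        std::size_t count = 0;
        for (const auto& entry : targets)
            count += entry.second.size();
        return count;
    }

    // Lookup touches only the one stage set for (target, stage).
    bool supports(Target target, Stage stage, AtomBits required) const
    {
        const auto targetIt = targets.find(target);
        if (targetIt == targets.end())
            return false;
        const auto stageIt = targetIt->second.find(stage);
        if (stageIt == targetIt->second.end())
            return false;
        return (stageIt->second & required) == required;
    }
};
```

In this model a single call to `addStageAgnostic` creates as many stage sets as there are stages, which illustrates why the commit message singles out this construction cost as the next thing to remove.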
Notes:
* UIntSet had implementation bugs which were fixed in this PR.
* The old capabilities system had bugs which were fixed in this PR when transforming to the new implementation.
* fix .natvis debug view
* Small optimizations I found while working on the addition
The AST building pass now looks like this:
1% = ~capabilitySet
2% = capabilitySet()
1.5% capabilitySet::unionWith()
0.8% capabilitySet::join()
1.5% auxiliary info for debugging
~0.5-1% extra visitor overhead
~5% total for the visitor
~6.5% for total runtime costs
* fix caps which were wrong but worked
* push minor syntax fix (still looking for why other tests fail)
* perf & bug fixes
1. did not properly remake isBetterForTarget for the this->empty case with that as Invalid. This is the best case in this scenario.
2. Remade serializer for stdlib generation. Faster (more direct) & cleaner code.
NOTE: did not address review comments
* fix glsl.meta caps error
* fixing findBest logic again & UIntSet wrapper
findBest was not checking for 'more specialized' targets & the element counter was flawed
* faster getElements algorithm + natvis for UIntSet + wrong warning
* type incompatibility of bitscanForward implementations
* try to fix warnings again
* remove ptr for clang intrinsic
* add missing header
* ifdef to allow clang compile
* compiler hackery to fix up platform/type independent operations
* bracket
* fix MSVC error
* missing template
* change types out again
* changes to fix compiling
* adjustment to parameter for Clang/GCC
* added iterator to delay processing all atomSets of a CapabilitySet
* add a few missing consts
* ensure we never have more than 1 disjointSet
Added a wrapper + assert + union functionality to all possible disjoint sets. This was done instead of removing the LinkedList for 2 reasons:
1. We still need 0-1 set functionality.
2. Might as well keep the code, just disallow the problematic functionality.
* address review comments
non linked-list refactor review comments addressed; add doc comments + remove redundant code
* comments + remove isValid for bool operator
* push removal of linkedlist for capabilities
* add missing break
* address review comments
minor adjustments of syntax
* push a fix to the `CapabilitySet({shader, missing target})` code
* quality + error
1. add iterator to UIntSet
2. do not specialize target_switch if profile is derived from case (GLSL_150 is not compatible with GLSL_400)
* fix target_switch erroring + temporarily remove UIntSet::Iterator
temporarily remove UIntSet::Iterator. It will be added back afterwards; testing the code on CI first so I can multi-task fixing the UIntSet Iterator
* fix the UIntSet iterator
* Revert "fix the UIntSet iterator" temporarily to pull from master
* add metal error as per texture.slang
(it took a while to realize this was why things were breaking; the errors should likely be adjusted to reflect this)
* Rework UIntSet to have a template for output type
This is done so that iterator output is reasonable to debug, rather than dealing with messy ints
Fix problems with the iterators implemented + invalid capabilities handling
* removed incorrect `__target_switch` capability
barycentric was being used in anticipation of `profile glsl450`; this does not expand into `GL_EXT_fragment_shader_barycentric` and instead caused an error that is hidden during cross-compile.
* remove some uses of getElements
* remove undeclared_stage for now
* remove redundant code associated with `undeclared_stage`
* remove unused variable
* address review
Of particular note: removed a static in a thread-unsafe scope. Now using a `const static` for read-only (thread-safe) data, which precompile steps generate
* move GLSL_150 capdef change to sm_4_1 (more accurate)
* address most review comments
did not address: https://github.com/shader-slang/slang/pull/4145#discussion_r1602256776
* revert incorrect code review suggestion
* push changes for all code review suggestions
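The `getElements` and iterator work mentioned in the message above rests on one idea: find the lowest set bit of a word, emit its index, then clear it with `n &= n - 1`. A minimal sketch under stated assumptions follows. The names `lowestSetBitIndex` and `collectSetBits` are hypothetical, `std::vector` stands in for Slang's `List`, and the bit scan is a portable loop rather than the `_BitScanForward64` / `__builtin_ctzll` intrinsics the diff uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Portable stand-in for a count-trailing-zeros intrinsic.
// Precondition: word != 0 (otherwise the loop would never terminate).
inline unsigned lowestSetBitIndex(std::uint64_t word)
{
    unsigned index = 0;
    while ((word & 1u) == 0)
    {
        word >>= 1;
        ++index;
    }
    return index;
}

// Collect every value stored in a word-backed set, converted to output type T.
// Templating on T lets callers get back an enum (for example, a capability
// name type) instead of raw integers, which is easier to read in a debugger.
template <typename T>
std::vector<T> collectSetBits(const std::vector<std::uint64_t>& words)
{
    constexpr std::size_t kBitsPerWord = 64;
    std::vector<T> out;
    for (std::size_t block = 0; block < words.size(); ++block)
    {
        std::uint64_t remaining = words[block];
        while (remaining != 0)
        {
            // Value = bit position within the word + offset of the word.
            out.push_back(static_cast<T>(lowestSetBitIndex(remaining) + kBitsPerWord * block));
            remaining &= remaining - 1; // clear the lowest set bit
        }
    }
    return out;
}
```

The inner loop runs once per stored value rather than once per possible bit, so sparse sets are cheap to enumerate.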
Diffstat (limited to 'source')
23 files changed, 1624 insertions, 1437 deletions
diff --git a/source/core/slang-linked-list.h b/source/core/slang-linked-list.h index 93b5e435c..840ef8cd6 100644 --- a/source/core/slang-linked-list.h +++ b/source/core/slang-linked-list.h @@ -323,7 +323,7 @@ public: } return rs; } - int getCount() { return count; } + int getCount() const { return count; } }; } // namespace Slang #endif diff --git a/source/core/slang-uint-set.cpp b/source/core/slang-uint-set.cpp index e973cbc3a..b6871c192 100644 --- a/source/core/slang-uint-set.cpp +++ b/source/core/slang-uint-set.cpp @@ -3,18 +3,6 @@ namespace Slang { -static bool _areAllZero(const UIntSet::Element* elems, Index count) -{ - for (Index i = 0; count; ++i) - { - if (elems[i]) - { - return false; - } - } - return true; -} - UIntSet& UIntSet::operator=(UIntSet&& other) { m_buffer = _Move(other.m_buffer); @@ -49,14 +37,8 @@ void UIntSet::setAll() void UIntSet::resize(UInt size) { - const Index oldCount = m_buffer.getCount(); const Index newCount = Index((size + kElementMask) >> kElementShift); - m_buffer.setCount(newCount); - - if (newCount > oldCount) - { - ::memset(m_buffer.getBuffer() + oldCount, 0, (newCount - oldCount) * sizeof(Element)); - } + resizeBackingBufferDirectly(newCount); } void UIntSet::clear() @@ -66,17 +48,7 @@ void UIntSet::clear() bool UIntSet::isEmpty() const { - const Element*const src = m_buffer.getBuffer(); - const Index count = m_buffer.getCount(); - - for (Index i = 0; i < count; ++i) - { - if (src[i]) - { - return false; - } - } - return true; + return _areAllZero(m_buffer.getBuffer(), m_buffer.getCount()); } void UIntSet::clearAndDeallocate() @@ -106,7 +78,7 @@ bool UIntSet::operator==(const UIntSet& set) const const Index minCount = Math::Min(aCount, bCount); - return ::memcmp(aElems, bElems, minCount) == 0 && + return ::memcmp(aElems, bElems, minCount*sizeof(Element)) == 0 && _areAllZero(aElems + minCount, aCount - minCount) && _areAllZero(bElems + minCount, bCount - minCount); } @@ -123,6 +95,15 @@ void UIntSet::intersectWith(const 
UIntSet& set) } } +void UIntSet::subtractWith(const UIntSet& set) +{ + const Index minCount = Math::Min(this->m_buffer.getCount(), set.m_buffer.getCount()); + for (Index i = 0; i < minCount; i++) + { + this->m_buffer[i] = this->m_buffer[i] & (~set.m_buffer[i]); + } +} + /* static */void UIntSet::calcUnion(UIntSet& outRs, const UIntSet& set1, const UIntSet& set2) { outRs.m_buffer.setCount(Math::Max(set1.m_buffer.getCount(), set2.m_buffer.getCount())); @@ -162,5 +143,24 @@ void UIntSet::intersectWith(const UIntSet& set) return false; } +Index UIntSet::countElements() const +{ + // TODO: This can be made faster using SIMD intrinsics to count set bits. + uint64_t tmp; + constexpr Index loopSize = ((sizeof(Element) / sizeof(tmp)) != 0) ? sizeof(Element) / sizeof(tmp) : 1; + Index count = 0; + for (auto index = 0; index < this->m_buffer.getCount(); index++) + { + for (auto i = 0; i < loopSize; i++) + { + tmp = m_buffer[index] >> (sizeof(tmp) * i); + tmp = tmp - ((tmp >> 1) & 0x5555555555555555); + tmp = (tmp & 0x3333333333333333) + ((tmp >> 2) & 0x3333333333333333); + count += ((tmp + (tmp >> 4) & 0xF0F0F0F0F0F0F0F) * 0x101010101010101) >> 56; + } + } + return count; +} + } diff --git a/source/core/slang-uint-set.h b/source/core/slang-uint-set.h index 0f2165bab..22ca457b0 100644 --- a/source/core/slang-uint-set.h +++ b/source/core/slang-uint-set.h @@ -6,31 +6,83 @@ #include "slang-common.h" #include "slang-hash.h" +#if defined(_MSC_VER) +#include <intrin.h> +#endif #include <memory.h> namespace Slang { +template<typename T> +constexpr static Index computeElementShift() +{ + Index currentShift = 0; + Index currentShiftValue = 1; + + while (currentShiftValue != sizeof(T) * 8) + { + currentShift++; + currentShiftValue *= 2; + } + + return currentShift; +} + +static inline Index bitscanForward(uint64_t in) +{ +#if defined(_MSC_VER) + +#ifdef _WIN64 + uint64_t out = 0; + _BitScanForward64((unsigned long*)&out, in); + return Index(out); +#else + constexpr uint32_t bitsInType = 
sizeof(uint32_t) * 8; + uint32_t out; + // check for 0s in 0bit->31bit. If all 0's, check for 0s in 32bit->63bit + _BitScanForward((unsigned long*)&out, *(((uint32_t*)&in) + 1)); + if (out != bitsInType) + return Index(out); + _BitScanForward((unsigned long*)&out, *(((uint32_t*)&in))); + return Index(out + bitsInType); +#endif// #ifdef _WIN64 + +#else + return Index(__builtin_ctzll(in)); +#endif// #if defined(_MSC_VER) +} + /* Hold a set of UInt values. Implementation works by storing as a bit per value */ +/// UIntSet is essentially a Element[], where each Element is `b` bits big. +/// Each index has `b` number of integers. If the bit is 1, we have an element there. +/// Value of each element is equal to the binary offset from Element[0], bit 0. class UIntSet { public: typedef UIntSet ThisType; - typedef uint32_t Element; ///< Type that holds the bits to say if value is present + typedef uint64_t Element; ///< Type that holds the bits to say if value is present + constexpr static Index kElementSize = sizeof(Element) * 8; ///< The number of bits in an element. This also determines how many values a element can hold. + constexpr static Index kElementMask = kElementSize - 1; ///< Mask to get shift from an index + constexpr static Index kElementShift = computeElementShift<Element>(); ///< How many bits to shift to get Element index from an index. 5 for 2^5=32 elements in a uint32_t. 6 for 2^6=64 in a uint64_t. 
+ UIntSet() {} UIntSet(const UIntSet& other) { m_buffer = other.m_buffer; } UIntSet(UIntSet && other) { *this = (_Move(other)); } UIntSet(UInt maxVal) { resizeAndClear(maxVal); } + UIntSet(List<UIntSet::Element> buffer) { m_buffer = buffer; } UIntSet& operator=(UIntSet&& other); UIntSet& operator=(const UIntSet& other); HashCode getHashCode() const; - /// Return the count of all bits directly represented + /// Return the count of all bits directly represented Int getCount() const { return Int(m_buffer.getCount()) * kElementSize; } + List<Element>& getBuffer() { return m_buffer; } + /// Resize such that val can be stored and clear contents void resizeAndClear(UInt val); /// Set all of the values up to count, as set @@ -38,6 +90,7 @@ public: /// Resize (but maintain contents) up to bit size. /// NOTE! That since storage is in Element blocks, it may mean some values after size are set (up to the Element boundary) void resize(UInt size); + void resizeBackingBufferDirectly(Index size); /// Clear all of the contents (by clearing the bits) void clear(); @@ -47,6 +100,8 @@ public: /// Add a value inline void add(UInt val); + inline void add(const UIntSet& val); + /// Remove a value inline void remove(UInt val); /// Returns true if the value is present @@ -59,10 +114,12 @@ public: /// != bool operator!=(const UIntSet& set) const { return !(*this == set); } - /// Store the union between this and set in this + /// Store the union between this and set void unionWith(const UIntSet& set); - /// Store the intersection between this and set in this + /// Store the intersection between this and set void intersectWith(const UIntSet& set); + /// Store the subtraction between this and set + void subtractWith(const UIntSet& set); /// bool isEmpty() const; @@ -70,6 +127,10 @@ public: /// Swap this with rhs void swapWith(ThisType& rhs) { m_buffer.swapWith(rhs.m_buffer); } + template<typename T> + List<T> getElements() const; + Index countElements() const; + /// Store the union of set1 and 
set2 in outRs static void calcUnion(UIntSet& outRs, const UIntSet& set1, const UIntSet& set2); /// Store the intersection of set1 and set2 in outRs @@ -80,16 +141,98 @@ public: /// Returns true if set1 and set2 have a same value set (ie there is an intersection) static bool hasIntersection(const UIntSet& set1, const UIntSet& set2); -private: - enum + struct Iterator { - kElementShift = 5, ///< How many bits to shift to get Element index from an index - kElementSize = sizeof(Element) * 8, ///< The number of bits in an element - kElementMask = kElementSize - 1, ///< Mask to get shift from an index + friend class UIntSet; + private: + const List<Element>* context; + Index block = 0; + Element processedElement = 0; + uint64_t LSB = 0; + + void clearLSB() + { + LSB = bitscanForward(processedElement); + processedElement &= processedElement - 1; + } + public: + Iterator(const List<Element>* inContext) + { + context = inContext; + } + + Element operator*() + { + return Element(LSB + (kElementSize * block)); + } + + Iterator& operator++() + { + while (processedElement == 0) + { + block++; + if (block >= context->getCount()) + { + return *this; + } + processedElement = (*context)[block]; + } + clearLSB(); + return *this; + } + Iterator& operator++(int) + { + return ++(*this); + } + bool operator==(const Iterator& other) const + { + return other.block == this->block + && other.processedElement == this->processedElement; + } + bool operator!=(const Iterator& other) const + { + return !(other == *this); + } }; + Iterator begin() const + { + Iterator tmp(&m_buffer); + if (m_buffer.getCount() == 0) + return tmp; + + tmp.processedElement = m_buffer[0]; + if (tmp.processedElement == 0) + tmp++; + + tmp.clearLSB(); - // Make sure they are correct for the Element type - SLANG_COMPILE_TIME_ASSERT((1 << kElementShift) == kElementSize); + return tmp; + } + Iterator end() const + { + Iterator tmp(&m_buffer); + tmp.block = m_buffer.getCount(); + tmp.processedElement = 0; + return tmp; + } 
+ + bool areAllZero() + { + return _areAllZero(m_buffer.getBuffer(), m_buffer.getCount()); + } + +protected: + static bool _areAllZero(const UIntSet::Element* elems, Index count) + { + for (Index i = 0; i < count; ++i) + { + if (elems[i]) + { + return false; + } + } + return true; + } List<Element> m_buffer; }; @@ -132,6 +275,18 @@ inline bool UIntSet::contains(const UIntSet& set) const } // -------------------------------------------------------------------------- + +inline void UIntSet::resizeBackingBufferDirectly(Index newCount) +{ + const Index oldCount = m_buffer.getCount(); + m_buffer.setCount(newCount); + + if (newCount > oldCount) + { + ::memset(m_buffer.getBuffer() + oldCount, 0, (newCount - oldCount) * sizeof(Element)); + } +} + inline void UIntSet::add(UInt val) { const Index idx = Index(val >> kElementShift); @@ -142,6 +297,38 @@ inline void UIntSet::add(UInt val) m_buffer[idx] |= Element(1) << (val & kElementMask); } +inline void UIntSet::add(const UIntSet& other) +{ + auto otherCount = other.m_buffer.getCount(); + if (this->m_buffer.getCount() < otherCount) + resizeBackingBufferDirectly(otherCount); + + for (auto i = 0; i < otherCount; i++) + m_buffer[i] |= other.m_buffer[i]; } +template<typename T> +List<T> UIntSet::getElements() const +{ + auto count = m_buffer.getCount(); + if (count == 0) + return {}; + + // Specific path for uint64_t. If using SIMD we should not use this path due to larger data types. 
+ + List<T> elements; + elements.reserve(count); + for (Index block = 0; block < count; block++) + { + Element n = m_buffer[block]; + while (n != 0) + { + elements.add(T(bitscanForward((uint64_t)n) + (kElementSize * block))); + n &= n - 1; + } + } + return elements; +} + +} #endif diff --git a/source/slang/glsl.meta.slang b/source/slang/glsl.meta.slang index bacc8958e..881fabb52 100644 --- a/source/slang/glsl.meta.slang +++ b/source/slang/glsl.meta.slang @@ -328,7 +328,7 @@ public vector<T,N> atan(vector<T,N> y, vector<T,N> x) __generic<T : __BuiltinFloatingPointType> [__readNone] [ForceInline] -[require(cpp_cuda_glsl_hlsl_spirv, GLSL_130)] +[require(cpp_cuda_glsl_hlsl_spirv, sm_2_0_GLSL_140)] public T inversesqrt(T x) { return rsqrt(x); @@ -337,7 +337,7 @@ public T inversesqrt(T x) __generic<T : __BuiltinFloatingPointType, let N:int> [__readNone] [ForceInline] -[require(cpp_cuda_glsl_hlsl_spirv, GLSL_130)] +[require(cpp_cuda_glsl_hlsl_spirv, sm_2_0_GLSL_140)] public vector<T, N> inversesqrt(vector<T, N> x) { return rsqrt(x); @@ -350,7 +350,7 @@ public vector<T, N> inversesqrt(vector<T, N> x) __generic<T : __BuiltinFloatingPointType> [__readNone] [ForceInline] -[require(cpp_cuda_glsl_hlsl_metal_spirv, GLSL_130)] +[require(cpp_cuda_glsl_hlsl_metal_spirv, sm_2_0_GLSL_140)] public T roundEven(T x) { return rint(x); @@ -359,7 +359,7 @@ public T roundEven(T x) __generic<T : __BuiltinFloatingPointType, let N:int> [__readNone] [ForceInline] -[require(cpp_cuda_glsl_hlsl_metal_spirv, GLSL_130)] +[require(cpp_cuda_glsl_hlsl_metal_spirv, sm_2_0_GLSL_140)] public vector<T,N> roundEven(vector<T,N> x) { return rint(x); @@ -8425,7 +8425,7 @@ public vec4 noise4(vector<float, N> x) // TODO: if called after a return, error. 
[ForceInline] -[require(glsl_hlsl_spirv, shader_stages_compute_tesscontrol_tesseval)] +[require(glsl_hlsl_spirv, glsl_barrier)] public void barrier() { __target_switch diff --git a/source/slang/hlsl.meta.slang b/source/slang/hlsl.meta.slang index 176c7c0e4..26e691ddb 100644 --- a/source/slang/hlsl.meta.slang +++ b/source/slang/hlsl.meta.slang @@ -1818,7 +1818,7 @@ Array<T,4> __makeArray<T>(T v0, T v1, T v2, T v3); // Gather for scalar textures. __generic<TElement, T, Shape: __ITextureShape, let isArray:int, let sampleCount:int, let access:int, let isShadow:int, let format:int> [ForceInline] -[require(glsl_metal_spirv, GLSL_400)] +[require(glsl_metal_spirv, texture_gather)] vector<TElement,4> __texture_gather(__TextureImpl<T, Shape, isArray, 0, sampleCount, access, isShadow, 0, format> texture, SamplerState s, vector<float, Shape.dimensions+isArray> location, int component) { __target_switch @@ -1867,7 +1867,7 @@ vector<TElement,4> __texture_gather(__TextureImpl<T, Shape, isArray, 0, sampleCo } __generic<TElement, T, Shape: __ITextureShape, let isArray:int, let sampleCount:int, let access:int, let isShadow:int, let format:int> [ForceInline] -[require(glsl_spirv, GLSL_400)] +[require(glsl_spirv, texture_gather)] vector<TElement,4> __texture_gather(__TextureImpl<T, Shape, isArray, 0, sampleCount, access, isShadow, 1, format> sampler, vector<float, Shape.dimensions+isArray> location, int component) { __target_switch @@ -1882,7 +1882,7 @@ vector<TElement,4> __texture_gather(__TextureImpl<T, Shape, isArray, 0, sampleCo } __generic<TElement, T, Shape: __ITextureShape, let isArray:int, let sampleCount:int, let access:int, let isShadow:int, let format:int> [ForceInline] -[require(glsl_metal_spirv, GLSL_400)] +[require(glsl_metal_spirv, texture_gather)] vector<TElement,4> __texture_gather_offset(__TextureImpl<T, Shape, isArray, 0, sampleCount, access, isShadow, 0, format> texture, SamplerState s, constexpr vector<float, Shape.dimensions+isArray> location, constexpr 
vector<int, Shape.planeDimensions> offset, int component) { __target_switch @@ -1917,7 +1917,7 @@ vector<TElement,4> __texture_gather_offset(__TextureImpl<T, Shape, isArray, 0, s } __generic<TElement, T, Shape: __ITextureShape, let isArray:int, let sampleCount:int, let access:int, let isShadow:int, let format:int> [ForceInline] -[require(glsl_spirv, GLSL_400)] +[require(glsl_spirv, texture_gather)] vector<TElement,4> __texture_gather_offset(__TextureImpl<T, Shape, isArray, 0, sampleCount, access, isShadow, 1, format> sampler, vector<float, Shape.dimensions+isArray> location, constexpr vector<int, Shape.planeDimensions> offset, int component) { __target_switch @@ -1932,7 +1932,7 @@ vector<TElement,4> __texture_gather_offset(__TextureImpl<T, Shape, isArray, 0, s } __generic<TElement, T, Shape: __ITextureShape, let isArray:int, let sampleCount:int, let access:int, let isShadow:int, let format:int> [ForceInline] -[require(glsl_spirv, GLSL_400)] +[require(glsl_spirv, texture_gather)] vector<TElement,4> __texture_gather_offsets(__TextureImpl<T, Shape, isArray, 0, sampleCount, access, isShadow, 0, format> texture, SamplerState s, vector<float, Shape.dimensions+isArray> location, constexpr vector<int, Shape.planeDimensions> offset1, constexpr vector<int, Shape.planeDimensions> offset2, @@ -1955,8 +1955,9 @@ vector<TElement,4> __texture_gather_offsets(__TextureImpl<T, Shape, isArray, 0, } __generic<TElement, T, Shape: __ITextureShape, let isArray:int, let sampleCount:int, let access:int, let isShadow:int, let format:int> [ForceInline] -[require(glsl_spirv, GLSL_400)] +[require(glsl_spirv, texture_gather)] vector<TElement,4> __texture_gather_offsets(__TextureImpl<T, Shape, isArray, 0, sampleCount, access, isShadow, 1, format> sampler, vector<float, Shape.dimensions+isArray> location, + constexpr vector<int, Shape.planeDimensions> offset1, constexpr vector<int, Shape.planeDimensions> offset2, constexpr vector<int, Shape.planeDimensions> offset3, @@ -1977,7 +1978,7 @@ 
vector<TElement,4> __texture_gather_offsets(__TextureImpl<T, Shape, isArray, 0, } __generic<TElement, T, Shape: __ITextureShape, let isArray:int, let sampleCount:int, let access:int, let isShadow:int, let format:int> [ForceInline] -[require(glsl_metal_spirv, GLSL_400)] +[require(glsl_metal_spirv, texture_gather)] vector<TElement,4> __texture_gatherCmp(__TextureImpl<T, Shape, isArray, 0, sampleCount, access, isShadow, 0, format> texture, SamplerComparisonState s, vector<float, Shape.dimensions+isArray> location, TElement compareValue) { __target_switch @@ -2025,7 +2026,7 @@ vector<TElement,4> __texture_gatherCmp(__TextureImpl<T, Shape, isArray, 0, sampl } __generic<TElement, T, Shape: __ITextureShape, let isArray:int, let sampleCount:int, let access:int, let isShadow:int, let format:int> [ForceInline] -[require(glsl_spirv, GLSL_400)] +[require(glsl_spirv, texture_gather)] vector<TElement,4> __texture_gatherCmp(__TextureImpl<T, Shape, isArray, 0, sampleCount, access, isShadow, 1, format> sampler, vector<float, Shape.dimensions+isArray> location, TElement compareValue) { __target_switch @@ -2040,7 +2041,7 @@ vector<TElement,4> __texture_gatherCmp(__TextureImpl<T, Shape, isArray, 0, sampl } __generic<TElement, T, Shape: __ITextureShape, let isArray:int, let sampleCount:int, let access:int, let isShadow:int, let format:int> [ForceInline] -[require(glsl_metal_spirv, GLSL_400)] +[require(glsl_metal_spirv, texture_gather)] vector<TElement,4> __texture_gatherCmp_offset(__TextureImpl<T, Shape, isArray, 0, sampleCount, access, isShadow, 0, format> texture, SamplerComparisonState s, vector<float, Shape.dimensions+isArray> location, TElement compareValue, constexpr vector<int, Shape.planeDimensions> offset) { __target_switch @@ -2075,7 +2076,7 @@ vector<TElement,4> __texture_gatherCmp_offset(__TextureImpl<T, Shape, isArray, 0 } __generic<TElement, T, Shape: __ITextureShape, let isArray:int, let sampleCount:int, let access:int, let isShadow:int, let format:int> [ForceInline] 
-[require(glsl_spirv, GLSL_400)] +[require(glsl_spirv, texture_gather)] vector<TElement,4> __texture_gatherCmp_offset(__TextureImpl<T, Shape, isArray, 0, sampleCount, access, isShadow, 1, format> sampler, vector<float, Shape.dimensions+isArray> location, TElement compareValue, constexpr vector<int, Shape.planeDimensions> offset) { __target_switch @@ -2090,7 +2091,7 @@ vector<TElement,4> __texture_gatherCmp_offset(__TextureImpl<T, Shape, isArray, 0 } __generic<TElement, T, Shape: __ITextureShape, let isArray:int, let sampleCount:int, let access:int, let isShadow:int, let format:int> [ForceInline] -[require(glsl_spirv, GLSL_400)] +[require(glsl_spirv, texture_gather)] vector<TElement,4> __texture_gatherCmp_offsets(__TextureImpl<T, Shape, isArray, 0, sampleCount, access, isShadow, 0, format> texture, SamplerComparisonState s, vector<float, Shape.dimensions+isArray> location, TElement compareValue, vector<int, Shape.planeDimensions> offset1, vector<int, Shape.planeDimensions> offset2, @@ -2112,7 +2113,7 @@ vector<TElement,4> __texture_gatherCmp_offsets(__TextureImpl<T, Shape, isArray, } __generic<TElement, T, Shape: __ITextureShape, let isArray:int, let sampleCount:int, let access:int, let isShadow:int, let format:int> [ForceInline] -[require(glsl_spirv, GLSL_400)] +[require(glsl_spirv, texture_gather)] vector<TElement,4> __texture_gatherCmp_offsets(__TextureImpl<T, Shape, isArray, 0, sampleCount, access, isShadow, 1, format> sampler, vector<float, Shape.dimensions+isArray> location, TElement compareValue, vector<int, Shape.planeDimensions> offset1, vector<int, Shape.planeDimensions> offset2, @@ -2509,7 +2510,7 @@ extension __TextureImpl<T,Shape,isArray,1,sampleCount,0,isShadow,isCombined,form [__readNone] [ForceInline] - [require(cpp_glsl_hlsl_spirv, texture_sm_4_1)] + [require(cpp_glsl_hlsl_spirv, texture_sm_4_1_samplerless)] T Load(vector<int, Shape.dimensions + isArray + 1> locationAndSampleIndex) { return Load(__vectorReshape<Shape.dimensions + 
isArray>(locationAndSampleIndex), locationAndSampleIndex[Shape.dimensions + isArray]); @@ -5079,7 +5080,7 @@ matrix<T, N, M> acos(matrix<T, N, M> x) __generic<T : __BuiltinFloatingPointType> [__readNone] [ForceInline] -[require(cpp_cuda_glsl_hlsl_metal_spirv, GLSL_130)] +[require(cpp_cuda_glsl_hlsl_metal_spirv, sm_2_0_GLSL_140)] T acosh(T x) { __target_switch @@ -5099,7 +5100,7 @@ T acosh(T x) __generic<T : __BuiltinFloatingPointType, let N:int> [__readNone] [ForceInline] -[require(cpp_cuda_glsl_hlsl_metal_spirv, GLSL_130)] +[require(cpp_cuda_glsl_hlsl_metal_spirv, sm_2_0_GLSL_140)] vector<T,N> acosh(vector<T,N> x) { __target_switch @@ -5535,7 +5536,7 @@ matrix<T, N, M> asin(matrix<T, N, M> x) __generic<T : __BuiltinFloatingPointType> [__readNone] [ForceInline] -[require(cpp_cuda_glsl_hlsl_metal_spirv, GLSL_130)] +[require(cpp_cuda_glsl_hlsl_metal_spirv, sm_2_0_GLSL_140)] T asinh(T x) { __target_switch @@ -5555,7 +5556,7 @@ T asinh(T x) __generic<T : __BuiltinFloatingPointType, let N:int> [__readNone] [ForceInline] -[require(cpp_cuda_glsl_hlsl_metal_spirv, GLSL_130)] +[require(cpp_cuda_glsl_hlsl_metal_spirv, sm_2_0_GLSL_140)] vector<T,N> asinh(vector<T,N> x) { __target_switch @@ -6114,7 +6115,7 @@ matrix<T,N,M> atan2(matrix<T,N,M> y, matrix<T,N,M> x) __generic<T : __BuiltinFloatingPointType> [__readNone] [ForceInline] -[require(cpp_cuda_glsl_hlsl_metal_spirv, GLSL_130)] +[require(cpp_cuda_glsl_hlsl_metal_spirv, sm_2_0_GLSL_140)] T atanh(T x) { __target_switch @@ -6485,7 +6486,7 @@ matrix<T, N, M> cos(matrix<T, N, M> x) // Hyperbolic cosine __generic<T : __BuiltinFloatingPointType> [__readNone] -[require(cpp_cuda_glsl_hlsl_metal_spirv)] +[require(cpp_cuda_glsl_hlsl_metal_spirv, sm_2_0_GLSL_140)] T cosh(T x) { __target_switch @@ -6503,7 +6504,7 @@ T cosh(T x) __generic<T : __BuiltinFloatingPointType, let N : int> [__readNone] -[require(cpp_cuda_glsl_hlsl_metal_spirv)] +[require(cpp_cuda_glsl_hlsl_metal_spirv, sm_2_0_GLSL_140)] vector<T,N> cosh(vector<T,N> x) { 
__target_switch @@ -6521,7 +6522,7 @@ vector<T,N> cosh(vector<T,N> x) __generic<T : __BuiltinFloatingPointType, let N : int, let M : int> [__readNone] -[require(cpp_cuda_glsl_hlsl_metal_spirv)] +[require(cpp_cuda_glsl_hlsl_metal_spirv, sm_2_0_GLSL_140)] matrix<T, N, M> cosh(matrix<T, N, M> x) { __target_switch @@ -6536,7 +6537,7 @@ matrix<T, N, M> cosh(matrix<T, N, M> x) __generic<T : __BuiltinFloatingPointType> [__readNone] -[require(cpp_cuda_glsl_hlsl_metal_spirv)] +[require(cpp_cuda_glsl_hlsl_metal_spirv, sm_2_0_GLSL_140)] T cospi(T x) { __target_switch @@ -6549,7 +6550,7 @@ T cospi(T x) __generic<T : __BuiltinFloatingPointType, let N: int> [__readNone] -[require(cpp_cuda_glsl_hlsl_metal_spirv)] +[require(cpp_cuda_glsl_hlsl_metal_spirv, sm_2_0_GLSL_140)] vector<T,N> cospi(vector<T,N> x) { __target_switch @@ -6939,7 +6940,7 @@ T distance(T x, T y) __generic<T : __BuiltinFloatingPointType> [__readNone] -[require(cpp_cuda_glsl_hlsl_metal_spirv)] +[require(cpp_cuda_glsl_hlsl_metal_spirv, sm_2_0_GLSL_140)] T fdim(T x, T y) { __target_switch @@ -6952,7 +6953,7 @@ T fdim(T x, T y) __generic<T : __BuiltinFloatingPointType, let N : int> [__readNone] -[require(cpp_cuda_glsl_hlsl_metal_spirv)] +[require(cpp_cuda_glsl_hlsl_metal_spirv, sm_2_0_GLSL_140)] vector<T,N> fdim(vector<T,N> x, vector<T,N> y) { __target_switch @@ -8081,6 +8082,7 @@ matrix<T, N, M> fwidth(matrix<T, N, M> x) __generic<T : __BuiltinType> [__readNone] __glsl_version(450) +__glsl_extension(GL_EXT_fragment_shader_barycentric) [require(glsl_hlsl_spirv, getattributeatvertex)] T GetAttributeAtVertex(T attribute, uint vertexIndex) { @@ -8088,7 +8090,7 @@ T GetAttributeAtVertex(T attribute, uint vertexIndex) { case hlsl: __intrinsic_asm "GetAttributeAtVertex"; - case _GL_EXT_fragment_shader_barycentric: + case glsl: __intrinsic_asm "$0[$1]"; case spirv: return spirv_asm { @@ -8114,6 +8116,7 @@ T GetAttributeAtVertex(T attribute, uint vertexIndex) __generic<T : __BuiltinType, let N : int> [__readNone] 
 __glsl_version(450)
+__glsl_extension(GL_EXT_fragment_shader_barycentric)
 [require(glsl_hlsl_spirv, getattributeatvertex)]
 vector<T,N> GetAttributeAtVertex(vector<T,N> attribute, uint vertexIndex)
 {
@@ -8121,7 +8124,7 @@ vector<T,N> GetAttributeAtVertex(vector<T,N> attribute, uint vertexIndex)
     {
     case hlsl:
         __intrinsic_asm "GetAttributeAtVertex";
-    case _GL_EXT_fragment_shader_barycentric:
+    case glsl:
         __intrinsic_asm "$0[$1]";
     case spirv:
         return spirv_asm {
@@ -8147,6 +8150,7 @@ vector<T,N> GetAttributeAtVertex(vector<T,N> attribute, uint vertexIndex)
 __generic<T : __BuiltinType, let N : int, let M : int>
 [__readNone]
 __glsl_version(450)
+__glsl_extension(GL_EXT_fragment_shader_barycentric)
 [require(glsl_hlsl_spirv, getattributeatvertex)]
 matrix<T,N,M> GetAttributeAtVertex(matrix<T,N,M> attribute, uint vertexIndex)
 {
@@ -8154,7 +8158,7 @@ matrix<T,N,M> GetAttributeAtVertex(matrix<T,N,M> attribute, uint vertexIndex)
     {
     case hlsl:
         __intrinsic_asm "GetAttributeAtVertex";
-    case _GL_EXT_fragment_shader_barycentric:
+    case glsl:
         __intrinsic_asm "$0[$1]";
     case spirv:
         return spirv_asm {
@@ -11194,7 +11198,7 @@ vector<uint, N> reversebits(vector<uint, N> value)
 __generic<T : __BuiltinFloatingPointType>
 [__readNone]
 [ForceInline]
-[require(cpp_cuda_glsl_hlsl_metal_spirv, GLSL_130)]
+[require(cpp_cuda_glsl_hlsl_metal_spirv, sm_2_0_GLSL_140)]
 T rint(T x)
 {
     __target_switch
@@ -11225,7 +11229,7 @@ T rint(T x)
 __generic<T : __BuiltinFloatingPointType, let N:int>
 [__readNone]
 [ForceInline]
-[require(cpp_cuda_glsl_hlsl_metal_spirv, GLSL_130)]
+[require(cpp_cuda_glsl_hlsl_metal_spirv, sm_2_0_GLSL_140)]
 vector<T,N> rint(vector<T,N> x)
 {
     __target_switch
@@ -12091,7 +12095,7 @@ WaveMask __WaveGetActiveMask();
 __glsl_extension(GL_KHR_shader_subgroup_ballot)
 __spirv_version(1.3)
-[require(cuda_glsl_hlsl_spirv, subgroup_ballot)]
+[require(cuda_glsl_hlsl_spirv, subgroup_ballot_activemask)]
 WaveMask WaveGetActiveMask()
 {
     __target_switch
@@ -12200,7 +12204,7 @@ WaveMask WaveMaskBallot(WaveMask mask, bool condition)
     }
 }
-[require(cuda_glsl_hlsl_spirv, subgroup_ballot)]
+[require(cuda_glsl_hlsl_spirv, subgroup_basic_ballot)]
 uint WaveMaskCountBits(WaveMask mask, bool value)
 {
     __target_switch
@@ -13793,7 +13797,7 @@ uint4 WaveActiveBallot(bool condition)
     }
 }
-[require(cuda_glsl_hlsl_spirv, subgroup_ballot)]
+[require(cuda_glsl_hlsl_spirv, subgroup_basic_ballot)]
 uint WaveActiveCountBits(bool value)
 {
     __target_switch
@@ -13873,7 +13877,7 @@ bool WaveIsFirstLane()
 // It's useful to have a wave uint4 version of countbits, because some wave functions return uint4.
 // This implementation tries to limit the amount of work required by the actual lane count.
 __spirv_version(1.3)
-[require(cpp_cuda_glsl_hlsl_spirv, subgroup_ballot)]
+[require(cpp_cuda_glsl_hlsl_spirv, subgroup_basic_ballot)]
 uint _WaveCountBits(uint4 value)
 {
     __target_switch
diff --git a/source/slang/slang-ast-dump.cpp b/source/slang/slang-ast-dump.cpp
index 0986d7284..8b2494310 100644
--- a/source/slang/slang-ast-dump.cpp
+++ b/source/slang/slang-ast-dump.cpp
@@ -715,20 +715,21 @@ struct ASTDumpContext
     {
         m_writer->emit("capability_set(");
         bool isFirstSet = true;
-        for (auto& set : capSet.getExpandedAtoms())
+        for (auto& set : capSet.getAtomSets())
         {
             if (!isFirstSet)
             {
                 m_writer->emit(" | ");
             }
             bool isFirst = true;
-            for (auto atom : set.getExpandedAtoms())
+            for (auto atom : set)
             {
+                CapabilityName formattedAtom = (CapabilityName)atom;
                 if (!isFirst)
                 {
                     m_writer->emit("+");
                 }
-                dump(capabilityNameToString((CapabilityName)atom));
+                dump(capabilityNameToString((CapabilityName)formattedAtom));
                 isFirst = false;
             }
             isFirstSet = false;
diff --git a/source/slang/slang-capabilities.capdef b/source/slang/slang-capabilities.capdef
index 5c672d398..5a0df9f9b 100644
--- a/source/slang/slang-capabilities.capdef
+++ b/source/slang/slang-capabilities.capdef
@@ -46,13 +46,13 @@ def c : target + textualTarget;
 def cpp : target + textualTarget;
 def cuda : target + textualTarget;
 def metal : target + textualTarget;
+def spirv_1_0 : target;
 // We have multiple capabilities for the various SPIR-V versions,
 // arranged so that they inherit from one another to represent which versions
 // provide a super-set of the features of earlier ones (e.g., SPIR-V 1.4 is
 // expressed as inheriting from SPIR-V 1.3).
 //
-def spirv_1_0 : target;
 def spirv_1_1 : spirv_1_0;
 def spirv_1_2 : spirv_1_1;
 def spirv_1_3 : spirv_1_2;
@@ -73,6 +73,8 @@ alias cpp_cuda_glsl_hlsl = cpp | cuda | glsl | hlsl;
 alias cpp_cuda_glsl_hlsl_spirv = cpp | cuda | glsl | hlsl | spirv_1_0;
 alias cpp_cuda_glsl_hlsl_metal_spirv = cpp | cuda | glsl | hlsl | metal | spirv_1_0;
 alias cpp_cuda_hlsl = cpp | cuda | hlsl;
+alias cpp_cuda_hlsl_spirv = cpp | cuda | hlsl | spirv_1_0;
+alias cpp_cuda_hlsl_metal_spirv = cpp | cuda | hlsl | metal | spirv_1_0;
 alias cpp_glsl = cpp | glsl;
 alias cpp_glsl_hlsl_spirv = cpp | glsl | hlsl | spirv_1_0;
 alias cpp_glsl_hlsl_metal_spirv = cpp | glsl | hlsl | metal | spirv_1_0;
@@ -99,9 +101,50 @@ def glsl_spirv_1_4 : glsl_spirv_1_3;
 def glsl_spirv_1_5 : glsl_spirv_1_4;
 def glsl_spirv_1_6 : glsl_spirv_1_5;
+def _GLSL_130 : glsl;
+def _GLSL_140 : _GLSL_130;
+def _GLSL_150 : _GLSL_140;
+def _GLSL_330 : _GLSL_150;
+def _GLSL_400 : _GLSL_330;
+def _GLSL_410 : _GLSL_400;
+def _GLSL_420 : _GLSL_410;
+def _GLSL_430 : _GLSL_420;
+def _GLSL_440 : _GLSL_430;
+def _GLSL_450 : _GLSL_440;
+def _GLSL_460 : _GLSL_450;
+
+
+// metal versions
 def metallib_2_3 : metal;
 def metallib_2_4 : metallib_2_3;
+// hlsl versions
+def _sm_4_0 : hlsl;
+def _sm_4_1 : _sm_4_0;
+def _sm_5_0 : _sm_4_1;
+def _sm_5_1 : _sm_5_0;
+def _sm_6_0 : _sm_5_1;
+def _sm_6_1 : _sm_6_0;
+def _sm_6_2 : _sm_6_1;
+def _sm_6_3 : _sm_6_2;
+def _sm_6_4 : _sm_6_3;
+def _sm_6_5 : _sm_6_4;
+def _sm_6_6 : _sm_6_5;
+def _sm_6_7 : _sm_6_6;
+
+def hlsl_nvapi : hlsl;
+
+// cuda versions
+def _cuda_sm_1_0 : cuda;
+def _cuda_sm_2_0 : _cuda_sm_1_0;
+def _cuda_sm_3_0 : _cuda_sm_2_0;
+def _cuda_sm_3_5 : _cuda_sm_3_0;
+def _cuda_sm_4_0 : _cuda_sm_3_5;
+def _cuda_sm_5_0 : _cuda_sm_4_0;
+def _cuda_sm_6_0 : _cuda_sm_5_0;
+def _cuda_sm_7_0 : _cuda_sm_6_0;
+def _cuda_sm_8_0 : _cuda_sm_7_0;
+def _cuda_sm_9_0 : _cuda_sm_8_0;
 abstract stage;
 def vertex : stage;
@@ -118,6 +161,10 @@ def miss : stage;
 def mesh : stage;
 def amplification : stage;
 def callable : stage;
+alias any_stage = vertex | fragment | compute | hull | domain | geometry
+    | raygen | intersection | anyhit | closesthit | miss | mesh
+    | amplification | callable
+    ;
 // shader stage alias's
 alias pixel = fragment;
@@ -143,44 +190,6 @@ alias raytracing_stages_compute_amplification_mesh = raytracing_stages_compute |
 alias raytracing_stages_compute_fragment = raytracing_stages | shader_stages_compute_fragment;
 alias raytracing_stages_compute_fragment_geometry_vertex = raytracing_stages | shader_stages_compute_fragment_geometry_vertex;
-def _GLSL_130 : glsl;
-def _GLSL_140 : _GLSL_130;
-def _GLSL_150 : _GLSL_140;
-def _GLSL_330 : _GLSL_150;
-def _GLSL_400 : _GLSL_330;
-def _GLSL_410 : _GLSL_400;
-def _GLSL_420 : _GLSL_410;
-def _GLSL_430 : _GLSL_420;
-def _GLSL_440 : _GLSL_430;
-def _GLSL_450 : _GLSL_440;
-def _GLSL_460 : _GLSL_450;
-
-def _sm_4_0 : hlsl;
-def _sm_4_1 : _sm_4_0;
-def _sm_5_0 : _sm_4_1;
-def _sm_5_1 : _sm_5_0;
-def _sm_6_0 : _sm_5_1;
-def _sm_6_1 : _sm_6_0;
-def _sm_6_2 : _sm_6_1;
-def _sm_6_3 : _sm_6_2;
-def _sm_6_4 : _sm_6_3;
-def _sm_6_5 : _sm_6_4;
-def _sm_6_6 : _sm_6_5;
-def _sm_6_7 : _sm_6_6;
-
-def hlsl_nvapi : hlsl;
-
-def _cuda_sm_1_0 : cuda;
-def _cuda_sm_2_0 : _cuda_sm_1_0;
-def _cuda_sm_3_0 : _cuda_sm_2_0;
-def _cuda_sm_3_5 : _cuda_sm_3_0;
-def _cuda_sm_4_0 : _cuda_sm_3_5;
-def _cuda_sm_5_0 : _cuda_sm_4_0;
-def _cuda_sm_6_0 : _cuda_sm_5_0;
-def _cuda_sm_7_0 : _cuda_sm_6_0;
-def _cuda_sm_8_0 : _cuda_sm_7_0;
-def _cuda_sm_9_0 : _cuda_sm_8_0;
-
 // SPIRV extensions.
 def SOURCE_EXT_GL_NV_compute_shader_derivatives : spirv_1_0;
@@ -302,10 +311,10 @@ alias GL_ARB_derivative_control = _GL_ARB_derivative_control | spvDerivativeCont
 alias GL_ARB_fragment_shader_interlock = _GL_ARB_fragment_shader_interlock | spvFragmentShaderPixelInterlockEXT;
 alias GL_ARB_gpu_shader5 = _GL_ARB_gpu_shader5 | spirv_1_0;
 alias GL_ARB_sparse_texture_clamp = _GL_ARB_fragment_shader_interlock | spirv_1_0;
-alias GL_EXT_texture_query_lod = _GL_EXT_texture_query_lod | spvImageQuery;
-alias GL_ARB_texture_query_levels = _GL_ARB_texture_query_levels |spvImageQuery;
+alias GL_EXT_texture_query_lod = _GL_EXT_texture_query_lod | spvImageQuery | metal;
+alias GL_ARB_texture_query_levels = _GL_ARB_texture_query_levels | spvImageQuery | metal;
 alias GL_ARB_texture_cube_map = _GL_ARB_texture_cube_map | spirv_1_0;
-alias GL_ARB_texture_gather = _GL_ARB_texture_gather | spirv_1_0;
+alias GL_ARB_texture_gather = _GL_ARB_texture_gather | spirv_1_0 | metal;
 alias GL_EXT_buffer_reference = _GL_ARB_fragment_shader_interlock | spirv_1_5;
 alias GL_EXT_buffer_reference_uvec2 = _GL_EXT_buffer_reference_uvec2 | spirv_1_0;
 alias GL_EXT_debug_printf = _GL_EXT_debug_printf | SPV_KHR_non_semantic_info;
@@ -334,8 +343,8 @@ alias GL_KHR_shader_subgroup_shuffle_relative = _GL_KHR_shader_subgroup_shuffle_
 alias GL_KHR_shader_subgroup_vote = _GL_KHR_shader_subgroup_vote | spvGroupNonUniformVote;
 alias GL_KHR_shader_subgroup_quad = _GL_KHR_shader_subgroup_quad | spvGroupNonUniformQuad;
 alias GL_NV_compute_shader_derivatives = _GL_NV_compute_shader_derivatives | SOURCE_EXT_GL_NV_compute_shader_derivatives | SPV_NV_compute_shader_derivatives | _sm_6_6;
-alias GL_ARB_shader_image_size = _GL_ARB_shader_image_size | spvImageQuery;
-alias GL_ARB_shader_texture_image_samples = _GL_ARB_shader_texture_image_samples | spvImageQuery;
+alias GL_ARB_shader_image_size = _GL_ARB_shader_image_size | spvImageQuery | metal;
+alias GL_ARB_shader_texture_image_samples = _GL_ARB_shader_texture_image_samples | spvImageQuery | metal;
 alias GL_NV_shader_atomic_fp16_vector = _GL_NV_shader_atomic_fp16_vector + _GL_NV_gpu_shader5 | spirv_1_0;
 alias GL_NV_shader_subgroup_partitioned = _GL_NV_shader_subgroup_partitioned | spvGroupNonUniformPartitionedNV;
 alias GL_NV_ray_tracing_motion_blur = _GL_NV_ray_tracing_motion_blur | spvRayTracingMotionBlurNV;
@@ -368,17 +377,6 @@ alias fragmentshaderbarycentric = GL_EXT_fragment_shader_barycentric | _sm_6_1;
 alias shadermemorycontrol = glsl | spirv_1_0 | _sm_5_0;
 alias shadermemorycontrol_compute = raytracing_stages_compute + shadermemorycontrol;
 alias subpass = fragment + any_gfx_target;
-alias subgroup_basic = GL_KHR_shader_subgroup_basic | GL_KHR_shader_subgroup_basic + spirv_1_0 | _sm_6_0 | _cuda_sm_7_0;
-alias subgroup_basic_ballot = GL_KHR_shader_subgroup_basic + GL_KHR_shader_subgroup_ballot | _sm_6_0 | _cuda_sm_7_0;
-alias subgroup_vote = GL_KHR_shader_subgroup_vote | _sm_6_0 | _cuda_sm_7_0;
-alias subgroup_arithmetic = GL_KHR_shader_subgroup_arithmetic | _sm_6_0 | _cuda_sm_7_0;
-alias subgroup_ballot = GL_KHR_shader_subgroup_ballot | _sm_6_0 | _cuda_sm_7_0;
-alias subgroup_shuffle = GL_KHR_shader_subgroup_shuffle | _sm_6_0 | _cuda_sm_7_0;
-alias subgroup_shufflerelative = GL_KHR_shader_subgroup_shuffle_relative | _sm_6_0 | _cuda_sm_7_0;
-alias subgroup_clustered = GL_KHR_shader_subgroup_clustered | _sm_6_0 | _cuda_sm_7_0;
-alias subgroup_quad = GL_KHR_shader_subgroup_quad | _sm_6_0 | _cuda_sm_7_0;
-alias subgroup_partitioned = GL_NV_shader_subgroup_partitioned | _sm_6_5;
-alias shaderinvocationgroup = subgroup_vote;
 alias waveprefix = _sm_6_5 | _cuda_sm_7_0 | GL_KHR_shader_subgroup_arithmetic;
 alias bufferreference = GL_EXT_buffer_reference;
 alias bufferreference_int64 = bufferreference + GL_EXT_shader_explicit_arithmetic_types_int64;
@@ -395,7 +393,7 @@ alias sm_4_0 = _sm_4_0
     ;
 alias sm_4_1 = _sm_4_1
-    | glsl_spirv_1_0 + sm_4_0
+    | glsl_spirv_1_0 + _GLSL_150 + sm_4_0
     | spirv_1_0 + sm_4_0
     | _cuda_sm_6_0
     | metal
@@ -587,8 +585,8 @@ alias DX_6_7 = sm_6_7;
 alias METAL_2_3 = metallib_2_3;
 alias METAL_2_4 = metallib_2_4;
-alias sm_2_0_GLSL_140 = sm_4_0 | glsl | spirv_1_0 | cuda | cpp;
-alias sm_2_0_GLSL_400 = sm_4_0 | glsl | spirv_1_0 | cuda | cpp;
+alias sm_2_0_GLSL_140 = _GLSL_140 + sm_4_0 | sm_4_0;
+alias sm_2_0_GLSL_400 = _GLSL_400 + sm_4_0 | sm_4_0;
 alias appendstructuredbuffer = sm_5_0 + raytracing_stages_compute_fragment;
 alias atomic_hlsl = _sm_4_0;
 alias atomic_hlsl_nvapi = _sm_4_0 + hlsl_nvapi;
@@ -606,15 +604,26 @@ alias fragmentprocessing_derivativecontrol = fragment + _sm_5_0
     ;
 alias getattributeatvertex = fragment + _sm_6_1 | fragment + GL_EXT_fragment_shader_barycentric;
 alias memorybarrier_compute = raytracing_stages_compute + sm_5_0;
+alias glsl_barrier = hlsl + memorybarrier_compute
+    | glsl_spirv + shader_stages_compute_tesscontrol_tesseval
+    ;
 alias structuredbuffer = sm_4_0;
 alias structuredbuffer_rw = sm_4_0 + raytracing_stages_compute_fragment;
-alias texture_sm_4_1 = sm_4_1 + _GLSL_150 | sm_4_1;
-alias texture_sm_4_1_samplerless = texture_sm_4_1 + GL_EXT_samplerless_texture_functions;
+alias texture_sm_4_1 = sm_4_1
+    ;
+alias texture_sm_4_1_samplerless = cpp + texture_sm_4_1
+    | cuda + texture_sm_4_1
+    | glsl + texture_sm_4_1 + GL_EXT_samplerless_texture_functions
+    | hlsl + texture_sm_4_1 + raytracing_stages_compute_fragment
+    | spirv_1_0 + texture_sm_4_1 + GL_EXT_samplerless_texture_functions
+    | metal + texture_sm_4_1
+    ;
 alias texture_sm_4_1_compute_fragment = cpp + texture_sm_4_1
     | cuda + texture_sm_4_1
     | glsl + texture_sm_4_1
     | hlsl + texture_sm_4_1 + raytracing_stages_compute_fragment
     | spirv_1_0 + texture_sm_4_1
+    | metal + texture_sm_4_1
     ;
 // supposedly works on compute but docs say nothing, so for now keep as compute_fragment
 alias texture_sm_4_1_fragment = cpp + texture_sm_4_1
@@ -622,6 +631,7 @@ alias texture_sm_4_1_fragment = cpp + texture_sm_4_1
     | glsl + texture_sm_4_1
     | hlsl + texture_sm_4_1 + raytracing_stages_compute_fragment
     | spirv_1_0 + texture_sm_4_1
+    | metal + texture_sm_4_1
     ;
 alias texture_sm_4_1_clamp_fragment = texture_sm_4_1_fragment + GL_ARB_sparse_texture_clamp;
 alias texture_sm_4_1_vertex_fragment_geometry = cpp + texture_sm_4_1
@@ -629,6 +639,7 @@ alias texture_sm_4_1_vertex_fragment_geometry = cpp + texture_sm_4_1
     | glsl + texture_sm_4_1
     | hlsl + texture_sm_4_1 + raytracing_stages_compute_fragment_geometry_vertex
     | spirv_1_0 + texture_sm_4_1
+    | metal + texture_sm_4_1
     ;
 alias texture_gather = texture_sm_4_1_vertex_fragment_geometry + GL_ARB_texture_gather;
 alias image_samples = texture_sm_4_1_compute_fragment + GL_ARB_shader_texture_image_samples;
@@ -645,7 +656,7 @@ alias texture_querylevels_cube = texture_querylevels + GL_ARB_texture_cube_map |
 alias atomic_glsl_float1 = GL_EXT_shader_atomic_float;
 alias atomic_glsl_float2 = GL_EXT_shader_atomic_float2;
 alias atomic_glsl_halfvec = GL_NV_shader_atomic_fp16_vector;
-alias atomic_glsl = GLSL_430_SPIRV_1_0;
+alias atomic_glsl = spirv_1_0 | _GLSL_400;
 alias atomic_glsl_int64 = atomic_glsl + GL_EXT_shader_atomic_int64;
 alias GLSL_430_SPIRV_1_0_compute = GLSL_430_SPIRV_1_0 + compute;
 alias image_loadstore = GL_EXT_shader_image_load_store + GLSL_420;
@@ -654,8 +665,32 @@ alias printf = GL_EXT_debug_printf | _sm_4_0 | _cuda_sm_2_0 | cpp;
 alias texturefootprint = GL_NV_shader_texture_footprint + GLSL_450 | hlsl_nvapi + _sm_4_0;
 alias texturefootprintclamp = texturefootprint + GL_ARB_sparse_texture_clamp;
-alias shader5_sm_4_0 = GL_ARB_gpu_shader5 | sm_4_0;
-alias shader5_sm_5_0 = GL_ARB_gpu_shader5 | sm_5_0;
+alias shader5_sm_4_0 = GL_ARB_gpu_shader5 + sm_4_0 | sm_4_0;
+alias shader5_sm_5_0 = GL_ARB_gpu_shader5 + sm_4_0 | sm_5_0;
+
+alias subgroup_basic = GL_KHR_shader_subgroup_basic | _sm_6_0 | _cuda_sm_7_0;
+alias subgroup_ballot = spirv_1_0 + GL_KHR_shader_subgroup_ballot
+    | glsl + GL_KHR_shader_subgroup_ballot + shader5_sm_5_0
+    | _sm_6_0 + shader5_sm_5_0
+    | _cuda_sm_7_0 + shader5_sm_5_0
+    ;
+alias subgroup_ballot_activemask = spirv_1_0 + GL_KHR_shader_subgroup_ballot
+    | glsl + GL_KHR_shader_subgroup_ballot
+    | _sm_6_0
+    | _cuda_sm_7_0
+    ;
+alias subgroup_basic_ballot = glsl + GL_KHR_shader_subgroup_basic + subgroup_ballot
+    | spirv + GL_KHR_shader_subgroup_basic + subgroup_ballot
+    | hlsl + subgroup_ballot | cuda + subgroup_ballot
+    ;
+alias subgroup_vote = GL_KHR_shader_subgroup_vote | _sm_6_0 | _cuda_sm_7_0;
+alias shaderinvocationgroup = subgroup_vote;
+alias subgroup_arithmetic = GL_KHR_shader_subgroup_arithmetic | _sm_6_0 | _cuda_sm_7_0;
+alias subgroup_shuffle = glsl + GL_KHR_shader_subgroup_shuffle | _sm_6_0 | _cuda_sm_7_0;
+alias subgroup_shufflerelative = GL_KHR_shader_subgroup_shuffle_relative | _sm_6_0 | _cuda_sm_7_0;
+alias subgroup_clustered = GL_KHR_shader_subgroup_clustered | _sm_6_0 | _cuda_sm_7_0;
+alias subgroup_quad = GL_KHR_shader_subgroup_quad | _sm_6_0 | _cuda_sm_7_0;
+alias subgroup_partitioned = GL_NV_shader_subgroup_partitioned + subgroup_ballot_activemask | _sm_6_5;
 alias atomic_glsl_hlsl_cuda = atomic_glsl | _sm_5_0 | _cuda_sm_2_0;
 alias atomic_glsl_hlsl_cuda_float1 = atomic_glsl_float1 | atomic_hlsl_nvapi | _cuda_sm_2_0;
diff --git a/source/slang/slang-capability.cpp b/source/slang/slang-capability.cpp
index 0daf83dac..5cd46f631 100644
--- a/source/slang/slang-capability.cpp
+++ b/source/slang/slang-capability.cpp
@@ -66,6 +66,12 @@ struct CapabilityAtomInfo
 #include "slang-generated-capability-defs-impl.h"
+static UInt asAtomUInt(CapabilityName name)
+{
+    SLANG_ASSERT((CapabilityAtom)name < CapabilityAtom::Count);
+    return (UInt)((CapabilityAtom)name);
+}
+
 static CapabilityAtom asAtom(CapabilityName name)
 {
     SLANG_ASSERT((CapabilityAtom)name < CapabilityAtom::Count);
@@ -110,7 +116,7 @@ bool lookupCapabilityName(const UnownedStringSlice& str, CapabilityName& value);
 CapabilityName findCapabilityName(UnownedStringSlice const& name)
 {
-    CapabilityName result;
+    CapabilityName result{};
     if (!lookupCapabilityName(name, result))
         return CapabilityName::Invalid;
     return result;
@@ -134,664 +140,115 @@ bool isCapabilityDerivedFrom(CapabilityAtom atom, CapabilityAtom base)
     return false;
 }
-//
-// CapabilityConjunctionSet
-//
-
-// The current design choice in `CapabilityConjunctionSet` is that it stores
-// an expanded, deduplicated, and sorted list of the capability
-// atoms in the set. "Expanded" here means that it includes the
-// transitive closure of the inheritance graph of those atoms.
-//
-// This choice is intended to make certain operations on
-// capability sets more efficient, since use things like
-// binary searches to efficiently detect whether an atom
-// is present in a set.
-
-CapabilityConjunctionSet::CapabilityConjunctionSet()
-{}
-
-CapabilityConjunctionSet::CapabilityConjunctionSet(Int atomCount, CapabilityAtom const* atoms)
-{
-    _init(atomCount, atoms);
-}
-
-CapabilityConjunctionSet::CapabilityConjunctionSet(CapabilityAtom atom)
-{
-    _init(1, &atom);
-}
+//// CapabiltySet
-CapabilityConjunctionSet::CapabilityConjunctionSet(List<CapabilityAtom> const& atoms)
+void CapabilitySet::addToTargetCapabilityWithValidUIntSetAndTargetAndStage(CapabilityName target, CapabilityName stage, CapabilityAtomSet setToAdd)
 {
-    _init(atoms.getCount(), atoms.getBuffer());
-}
+    SLANG_ASSERT(target != CapabilityName::Invalid && stage != CapabilityName::Invalid);
+    auto stageAtom = asAtom(stage);
+    auto targetAtom = asAtom(target);
+    CapabilityTargetSet& targetSet = m_targetSets[targetAtom];
+    targetSet.target = targetAtom;
+    targetSet.shaderStageSets.reserve(kCapabilityStageCount);
-CapabilityConjunctionSet CapabilityConjunctionSet::makeEmpty()
-{
-    return CapabilityConjunctionSet();
-}
+    auto& localStageSets = targetSet.shaderStageSets[stageAtom];
+    localStageSets.stage = stageAtom;
-CapabilityConjunctionSet CapabilityConjunctionSet::makeInvalid()
-{
-    // An invalid capability set will always be a singleton
-    // set of the `Invalid` atom, and we will construct
-    // the set directly rather than use the more expensive
-    // logic in `_init()`.
-    //
-    CapabilityConjunctionSet result;
-    result.m_expandedAtoms.add(CapabilityAtom::Invalid);
-    return result;
+    localStageSets.addNewSet(std::move(setToAdd));
 }
-void CapabilityConjunctionSet::_init(Int atomCount, CapabilityAtom const* atoms)
+void CapabilitySet::addToTargetCapabilityWithTargetAndStageAtom(CapabilityName target, CapabilityName stage, const ArrayView<CapabilityName>& canonicalRepresentation)
 {
-    // We will use an explicit hash set to deduplicate input atoms.
-    //
-    HashSet<CapabilityAtom> expandedAtomsSet;
-    for(Int i = 0; i < atomCount; ++i)
-    {
-        if (expandedAtomsSet.add(atoms[i]))
-        {
-            auto& info = _getInfo(atoms[i]);
-
-            // Add the base items that this atom implies.
-            if (info.canonicalRepresentation.getCount() == 1)
-            {
-                // The atom must have only one conjunction.
-                SLANG_ASSERT(info.canonicalRepresentation.getCount() == 1);
-
-                for (auto base : info.canonicalRepresentation[0])
-                {
-                    expandedAtomsSet.add(asAtom(base));
-                }
-            }
-        }
-    }
-
-    // We can then translate the set of atoms into a list,
-    // and then sort that list to produce the data that
-    // we use in all our other queries.
-    //
-    for(auto atom : expandedAtomsSet)
+    // If no provided 'stage', set the capability as a target of all stages
+    if (stage == CapabilityName::Invalid)
     {
-        m_expandedAtoms.add(atom);
-    }
-    m_expandedAtoms.sort();
-}
-
-void CapabilityConjunctionSet::calcCompactedAtoms(List<CapabilityAtom>& outAtoms) const
-{
-    // A "compacted" list of atoms is one that starts with
-    // the "expanded" list and removes any atoms that are
-    // implied by another atom already in the list.
-    //
-    // If the expanded list contains atom A, and A inherits
-    // from B, then we know that the expanded list also contains B,
-    // but the compacted list should not.
-    //
-    // We can thus look through the list of atoms A and for
-    // each base B of A, add it to a set of "redundant" atoms
-    // that need not appear in the compacted list.
-    //
-    HashSet<CapabilityAtom> redundantAtomsSet;
-    for( auto atom : m_expandedAtoms )
-    {
-        auto& atomInfo = _getInfo(atom);
-        if (atomInfo.canonicalRepresentation.getCount() != 1)
-        {
-            // If the atom is not a single conjunction, skip.
-            continue;
-        }
-        for(auto baseAtom : atomInfo.canonicalRepresentation[0])
-        {
-            // Note: don't add atom itself into redundant set.
-            if(asAtom(baseAtom) == atom)
-                continue;
-
-            redundantAtomsSet.add(asAtom(baseAtom));
-        }
-    }
-
-    // Once we are done figuring out which atoms are redundant,
-    // we can iterate over the expanded list and add all the
-    // non-redundant ones to the compacted output list.
-    //
-    outAtoms.clear();
-    for( auto atom : m_expandedAtoms )
-    {
-        if(!redundantAtomsSet.contains(atom))
-        {
-            outAtoms.add(atom);
-        }
-    }
-}
-
-bool CapabilityConjunctionSet::isEmpty() const
-{
-    // Checking if a capability set is empty is trivial in any representation;
-    // all we need to know is if it has zero atoms in its definition.
-    //
-    return m_expandedAtoms.getCount() == 0;
-}
-
-bool CapabilityConjunctionSet::isInvalid() const
-{
-    // We will assume here that there is only one canonical representation of
-    // an invalid capability set, which is a singleton set of the `Invalid`
-    // atom.
-    //
-    // TODO: We should ensure that any algorithms that make new capability
-    // sets by combining others properly ensure that they return the
-    // canonical invalid set rather than any other set that happens to be
-    // invalid (e.g., a set {A,B} would be invalid if A and B are incompatible,
-    // but it would not be in the canonical form this subroutine checks).
-    //
-    if(m_expandedAtoms.getCount() != 1) return false;
-    return m_expandedAtoms[0] == CapabilityAtom::Invalid;
-}
-
-bool CapabilityConjunctionSet::isIncompatibleWith(CapabilityAtom that) const
-{
-    // Checking for incompatibility is complicated, and it is best
-    // to only implement it for full (expanded) sets.
-    //
-    return isIncompatibleWith(CapabilityConjunctionSet(that));
-}
-
-static UIntSet _calcConflictMask(CapabilityAtom atom)
-{
-    UIntSet mask;
-    auto abstractBase = _getInfo(atom).abstractBase;
-    if (abstractBase != CapabilityName::Invalid)
-    {
-        mask.add((UInt)abstractBase);
-    }
-    return mask;
-}
-
-static UIntSet _calcConflictMask(const CapabilityConjunctionSet& set)
-{
-    // Given a capbility set, we want to compute the mask representing
-    // all groups of features for which it holds a potentially-conflicting atom.
-    //
-    UIntSet mask;
-    for (auto atom : set.getExpandedAtoms())
-    {
-        auto abstractBase = _getInfo(atom).abstractBase;
-        if (abstractBase != CapabilityName::Invalid)
-        {
-            mask.add((UInt)abstractBase);
-        }
-    }
-    return mask;
-}
-
-bool CapabilityConjunctionSet::isIncompatibleWith(CapabilityConjunctionSet const& that) const
-{
-    // The `this` and `that` sets are incompatible if there exists
-    // an atom A in `this` and an atom `B` in `that` such that
-    // A and B are not equal, but the two have overlapping "conflict group."
-    //
-    // Equivalently, we can say that the two are in conflict if
-    //
-    // * One of the two sets contains an atom A with conflict mask M
-    // * The other set contains at least one atom that conflicts with M
-    // * The other set does not contain A
-    //
-    // Our approach here is all about minimizing the number of
-    // iterations we take over lists of atoms, and trying to
-    // avoid anything super-linear.
-
-    // We start by identifying the OR of the conflict masks for
-    // all features in `this` and `that`.
-    //
-    UIntSet thisMask = _calcConflictMask(*this);
-    UIntSet thatMask = _calcConflictMask(that);
-
-    // Note: there is a possible early-exit opportunity here if
-    // `thisMask` and `thatMask` have no overlap: there could
-    // be no conflicts in that case.
-
-    // Next we will iterate over the two sets in tandem (O(N) time
-    // in the size of the larger set), and identify any elements
-    // that are present in one and not the other.
-    //
-    Index thisCount = this->m_expandedAtoms.getCount();
-    Index thatCount = that.m_expandedAtoms.getCount();
-    Index thisIndex = 0;
-    Index thatIndex = 0;
-    for(;;)
-    {
-        if(thisIndex == thisCount) break;
-        if(thatIndex == thatCount) break;
-
-        auto thisAtom = this->m_expandedAtoms[thisIndex];
-        auto thatAtom = that.m_expandedAtoms[thatIndex];
-
-        if(thisAtom == thatAtom)
-        {
-            thisIndex++;
-            thatIndex++;
-            continue;
-        }
-
-        if( thisAtom < thatAtom )
-        {
-            // `thisAtom` is present in `this` but not `that.
-            //
-            // If `thisAtom` has a conflict mask that overlaps
-            // with `thatMask`, then we have a conflict: the
-            // other set doesn't include `thisAtom`, but *does*
-            // include something with an overlapping mask
-            // (we don't know what at this point in the code).
-            //
-            auto thisConflictMask = Slang::_calcConflictMask(thisAtom);
-            if(UIntSet::hasIntersection(thisConflictMask, thatMask))
-                return true;
-            thisIndex++;
-        }
-        else
-        {
-            SLANG_ASSERT(thisAtom > thatAtom);
-
-            // `thatAtom` is present in `that` but not `this.
-            //
-            // The logic here is the mirror image of the case above.
-            //
-            auto thatConflictMask = Slang::_calcConflictMask(thatAtom);
-            if(UIntSet::hasIntersection(thatConflictMask, thisMask))
-                return true;
-            thatIndex++;
-        }
-    }
-
-    return false;
-}
-
-bool CapabilityConjunctionSet::implies(CapabilityConjunctionSet const& that) const
-{
-    // One capability set implies another if it is a super-set
-    // of the other one. Think of it this way: if your target
-    // supports features {X, Y, Z}, then that implies it also
-    // supports features {X,Z}.
-    //
-    // Because both `this` and `that` have expanded lists
-    // of all the capability atoms they imply *and* those
-    // lists are sorted, we can simply walk through the
-    // lists in tandem and see if there are any entries
-    // in `that` which are not present in `this.
-
-    Index thisCount = this->m_expandedAtoms.getCount();
-    Index thatCount = that.m_expandedAtoms.getCount();
-
-    // We cannot possibly have `this` contain all the atoms
-    // in `that` if the latter is has more atoms.
-    //
-    if(thatCount > thisCount)
-        return false;
-
-    // Note: the following iteration is O(N) in the size
-    // of the larger of the two sets, which is probably
-    // needlessly inefficient. We might expect that `that`
-    // will often be a much smaller set, and we'd like to
-    // scale in its size rather than the size of `this`.
-    //
-    // A more advanced algorithm here would be to do
-    // something recursive:
-    //
-    // * If `that` is singleton set, then we can find
-    // whether `this` contains it via binary search.
-    //
-    // * Otherwise, we can split `that` into two
-    // equally-sized subsets. By taking a "pivot" value
-    // from where that split took place we can then
-    // use a binary search to partition `this` into
-    // two subsets and recurse on each side of that
-    // partition.
-    //
-    // In practice, the size of the sets we are dealing
-    // with right now doesn't justify such a "clever" algorithm.
-
-    Index thisIndex = 0;
-    Index thatIndex = 0;
-    for(;;)
-    {
-        if(thisIndex == thisCount) break;
-        if(thatIndex == thatCount) break;
-
-        auto thisAtom = this->m_expandedAtoms[thisIndex];
-        auto thatAtom = that.m_expandedAtoms[thatIndex];
-
-        if( thisAtom == thatAtom )
-        {
-            // We have an atom that both sets contain;
-            // we should skip past it and keep looking.
-            //
-            thisIndex++;
-            thatIndex++;
-            continue;
-        }
-
-        if( thisAtom < thatAtom )
-        {
-            // We have an atom that `this` contains,
-            // but `that` doesn't; that is consistent
-            // with `this` being a super-set, so we
-            // just skip the item and keep searching.
-            //
-            thisIndex++;
-        }
-        else
+        auto info = _getInfo(CapabilityName::any_stage);
+        List<CapabilityName> newArr;
+        auto count = canonicalRepresentation.getCount();
+        newArr.setCount(count + 1);
+        memcpy(newArr.getBuffer(), canonicalRepresentation.getBuffer(), count * sizeof(CapabilityName));
+        m_targetSets[asAtom(target)].shaderStageSets.reserve(info.canonicalRepresentation.getCount());
+        for (auto i : info.canonicalRepresentation)
         {
-            SLANG_ASSERT(thisAtom > thatAtom);
-
-            // We have an atom in `that` which isn't
-            // also in `this`, so we know it cannot
-            // be a subset.
-            //
-            return false;
+            newArr[count] = i[0];
+            addToTargetCapabilityWithTargetAndStageAtom(target, i[0], newArr.getArrayView());
         }
+        return;
     }
-    // We reached the end of either this or that atom.
-    // If we reached the end of 'that', we know everything in 'that'
-    // is also contained in this, so this implies that.
-    return thatIndex == thatCount;
-}
-
-    /// Helper functor for binary search on lists of `CapabilityAtom`
-struct CapabilityAtomComparator
-{
-    int operator()(CapabilityAtom left, CapabilityAtom right)
-    {
-        return int(Int(left) - Int(right));
-    }
-};
+
+    CapabilityAtomSet setToAdd = CapabilityAtomSet((UInt)CapabilityAtom::Count);
+    for(auto i : canonicalRepresentation)
+        setToAdd.add(asAtomUInt(i));
-bool CapabilityConjunctionSet::implies(CapabilityAtom atom) const
-{
-    // Every non-alias atom that `this` implies should
-    // be presented in the `m_expandedAtoms` list.
-    //
-    // Because the list is sorted, we can find out whether
-    // it contains `atom` with a binary search.
-    //
-    Index result = m_expandedAtoms.binarySearch(atom, CapabilityAtomComparator());
-    return result >= 0;
+    addToTargetCapabilityWithValidUIntSetAndTargetAndStage(target, stage, setToAdd);
 }
-Int CapabilityConjunctionSet::countIntersectionWith(CapabilityConjunctionSet const& that) const
+// No targets atoms have been defined on yet, set stage to target any_target capability
+void CapabilitySet::addToTargetCapabilityWithStageAtom(CapabilityName stage, const ArrayView<CapabilityName>& canonicalRepresentation)
 {
-    // The goal of this subroutine is to count the number of
-    // elements in the intersection of `this` and `that`,
-    // without explicitly forming that intersection.
-    //
-    // Our approach here will be to iterate over the two
-    // sets in tandem (O(N) in the size of the larger set)
-    // and check for elements that both contain.
-    //
-    // TODO: There should be an asymptotically faster
-    // recursive algorithm here.
-
-    Int intersectionCount = 0;
-
-    Index thisCount = this->m_expandedAtoms.getCount();
-    Index thatCount = that.m_expandedAtoms.getCount();
-    Index thisIndex = 0;
-    Index thatIndex = 0;
-    for(;;)
+
+    if (m_targetSets.getCount() == 0)
     {
-        if(thisIndex == thisCount) break;
-        if(thatIndex == thatCount) break;
-
-        auto thisAtom = this->m_expandedAtoms[thisIndex];
-        auto thatAtom = that.m_expandedAtoms[thatIndex];
-
-        if( thisAtom == thatAtom )
-        {
-            // An item both contain.
-
-            intersectionCount++;
-            thisIndex++;
-            thatIndex++;
-            continue;
-        }
-
-        if( thisAtom < thatAtom )
-        {
-            // An item in `this` but not `that`.
- - thisIndex++; - } - else + const auto anyTargetInfo = _getInfo(CapabilityName::any_target); + CapabilityAtomSet setToAdd; + setToAdd.resize((UInt)CapabilityAtom::Count); + for (int i = 0; i < canonicalRepresentation.getCount(); i++) + setToAdd.add((UInt)canonicalRepresentation[i]); + CapabilityName targetAtom{}; + for (const auto& targetAtomCanonicalRep : anyTargetInfo.canonicalRepresentation) { - SLANG_ASSERT(thisAtom > thatAtom); - - // An item in `that` but not `this`. - - thatIndex++; + for (auto anyTargetAtom : targetAtomCanonicalRep) + { + setToAdd.add((UInt)anyTargetAtom); + if (_getInfo(anyTargetAtom).abstractBase == CapabilityName::target) + targetAtom = anyTargetAtom; + } + addToTargetCapabilityWithValidUIntSetAndTargetAndStage(targetAtom, stage, setToAdd); + for (auto anyTargetAtom : targetAtomCanonicalRep) + setToAdd.remove((UInt)anyTargetAtom); } } - return intersectionCount; } -bool CapabilityConjunctionSet::isBetterForTarget( - CapabilityConjunctionSet const& existingCaps, - CapabilityConjunctionSet const& targetCaps) const +void CapabilitySet::addToTargetCapabilityWithTargetAndOrStageAtom(CapabilityName target, CapabilityName stage, const ArrayView<CapabilityName>& canonicalRepresentation) { - auto& candidateCaps = *this; - - // The task here is to determine if `candidateCaps` should - // be considered "better" than `existingCaps` in the context - // of compilation for a target with the given `targetCaps`. - // - // In an ideal world, this computation could be quite simple: - // - // * If either `candidateCaps` or `existingCaps` is not implied by - // `targetCaps` (that is, they include requirements that aren't - // provided by the target), then the other is automatically "better." - // - // * Otherwise, one set is "better" than the other if it is a - // super-set (which is what `implies()` tests). - // - // There are two main reasons we can't use that simple logic: - // - // 1. 
Currently a user of Slang can compile for a target but - // not actually spell out its capabilities fully or correctly. - // They might compile for `sm_5_0` but use ray tracing features - // that require `sm_6_2` and expect the compiler to figure out - // what they "obviously" meant. Thus we cannot assume that - // `targetCaps` can be used to rule out candidates fully. - // - // 2. Sometimes there are multiple ways for a target to provide - // the same feature (e.g., multiple extensions) and because of (1) - // we cannot always rely on the `targetCaps` to tell us which to - // use. Thus we cannot rely on pure subset/`implies()` to define - // better-ness, and need some way to break ties. - // - // The following logic is a bunch of "do what I mean" nonsense that - // tries to capture a reasonable intuition of what "better"-ness - // should mean with these caveats. - - // First, if either candidate is fundamentally incompatible - // with the target, we shouldn't favor it. - // - if(candidateCaps.isIncompatibleWith(targetCaps)) return false; - if(existingCaps.isIncompatibleWith(targetCaps)) return true; - - // Next, we want to compare the candidates to the `targetCaps` - // to figure out whether one is obviously "more specialized" for - // the target. - // - // We measure the degree to which a candidate is specialized for - // the target as the size of its set intersection with `targetCaps`. - // - // TODO: If both `candidateCaps` and `existingCaps` are implied - // by `targetCaps`, then this amounts to just measuring the - // size of each set. We probably want this size-based check to - // come later in the overall process. - // - // TODO: A better model here might be to actually compute the actual - // intersected sets, and then check if one is a super-set of the other. 
- // - auto candidateIntersectionSize = targetCaps.countIntersectionWith(candidateCaps); - auto existingIntersectionSize = targetCaps.countIntersectionWith(existingCaps); - if(candidateIntersectionSize != existingIntersectionSize) - return candidateIntersectionSize > existingIntersectionSize; - - // Next we want to consider that if one of the two candidates - // is actually available on the target (meaning that it is - // implied by `targetCaps`) then we probably want to pick that one - // (since we can use that candidate on the chosen target without - // enabling any additional features the user didn't ask for). - // - // TODO: This step currently needs to come after the preceeding - // one because otherwise we risk selecting a `__target_intrinsic` - // decoration with *no* requirements (which are currently being - // added implicitly in many places) over any one with explicit - // requirements (since every target implies the empty set of - // requirements). - // - // In many ways the counting-based logic above amounts to a quick - // fix to prefer a non-empty set of requirements over an empty one, - // so long as something in that non-empty set overlaps with the target. - // - // TODO: The best fix is probably to figure out how "catch-all" - // intrinsic function definitions should be encoded; we clearly - // want them to be used only as a fallback when no target-specific - // variants are present. - // - bool candidateIsAvailable = targetCaps.implies(candidateCaps); - bool existingIsAvailable = targetCaps.implies(existingCaps); - if(candidateIsAvailable != existingIsAvailable) - return candidateIsAvailable; - - // All preceding factors being equal, we prefer - // a candidate that is strictly more specialized than the other. - // - // We want to avoid choosing the candidate that uses - // optional features if they aren't necessary. 
- // For example, the set {glsl, optionalFeature} should not be preferred - // over the set {glsl}, if optionalFeature isn't requested explictly. - // - // The solution here is that we want to partition - // `candidateCaps` and `existingCaps` into two parts: their - // intersection with `targetCaps` and their difference with it. - // - // For the intersection part of things, we'd want to favor a - // definition that is more specialized, while for the difference - // part we'd actually wnat to favor a definition that is less - // specialized. - // - CapabilityConjunctionSet candidateCapsIntersection; - CapabilityConjunctionSet candidateCapsDifference; - for (auto atom : candidateCaps.m_expandedAtoms) - { - if (targetCaps.implies(atom)) - candidateCapsIntersection.m_expandedAtoms.add(atom); - else - candidateCapsDifference.m_expandedAtoms.add(atom); - } - CapabilityConjunctionSet existingCapsIntersection; - CapabilityConjunctionSet existingCapsDifference; - for (auto atom : existingCaps.m_expandedAtoms) - { - if (targetCaps.implies(atom)) - existingCapsIntersection.m_expandedAtoms.add(atom); - else - existingCapsDifference.m_expandedAtoms.add(atom); - } - auto scoreCandidate = candidateCapsIntersection.m_expandedAtoms.getCount() - candidateCapsDifference.m_expandedAtoms.getCount(); - auto scoreExisting = existingCapsIntersection.m_expandedAtoms.getCount() - existingCapsDifference.m_expandedAtoms.getCount(); - if (scoreCandidate != scoreExisting) - return scoreCandidate > scoreExisting; - - // At this point we have the problem that neither candidate - // appears to be "obviously" better for the target, but we - // want some way to disambiguate them. - // - // What we want to do now is scan through what makes each candidate - // different from the other, and see if anything in either case - // has a ranking that should make it be preferred. 
- // - auto candidateScore = candidateCapsDifference._calcDifferenceScoreWith(existingCapsDifference); - auto existingScore = existingCapsDifference._calcDifferenceScoreWith(candidateCapsDifference); - if(candidateScore != existingScore) - return candidateScore > existingScore; - - return false; + if(target != CapabilityName::Invalid) + addToTargetCapabilityWithTargetAndStageAtom(target, stage, canonicalRepresentation); + else if(stage != CapabilityName::Invalid) + addToTargetCapabilityWithStageAtom(stage, canonicalRepresentation); } -uint32_t CapabilityConjunctionSet::_calcDifferenceScoreWith(CapabilityConjunctionSet const& that) const +void CapabilitySet::addToTargetCapabilitesWithCanonicalRepresentation(const ArrayView<CapabilityName>& canonicalRepresentation) { - uint32_t score = 0; - - // Our approach here will be to scan through `this` and `that` - // to identify atoms that are in `this` but not `that` (that is, - // the atoms that would be present in the set difference `this - that`) - // and then compute the maximum rank/score of those atoms. - - Index thisCount = this->m_expandedAtoms.getCount(); - Index thatCount = that.m_expandedAtoms.getCount(); - Index thisIndex = 0; - Index thatIndex = 0; - for(;;) + // only need to search i == 0/1 to find a relevant node + // target node should ALWAYS be first, so if we find a node, we stop searching. This is the most important node. We assume only stage+target with this logic. + // canonicalRepresentation of node has optionally 0-1 abstract node of each type, with a minimum of 1 abstract node total. 
+ CapabilityName target = CapabilityName::Invalid; + CapabilityName stage = CapabilityName::Invalid; + for (const auto& i : canonicalRepresentation) { - if(thisIndex == thisCount) break; - if(thatIndex == thatCount) break; - - auto thisAtom = this->m_expandedAtoms[thisIndex]; - auto thatAtom = that.m_expandedAtoms[thatIndex]; - - if( thisAtom == thatAtom ) - { - thisIndex++; - thatIndex++; + const auto info = _getInfo(i); + if (info.abstractBase == CapabilityName::Invalid) continue; - } - - if( thisAtom < thatAtom ) - { - // `thisAtom` is not present in `that`, so it - // should contribute to our ranking of the difference. - // - auto thisAtomInfo = _getInfo(thisAtom); - auto thisAtomRank = thisAtomInfo.rank; - - if( thisAtomRank > score ) - { - score = thisAtomRank; - } + else if (info.abstractBase == CapabilityName::target) + target = i; + else if (info.abstractBase == CapabilityName::stage) + stage = i; - thisIndex++; - } - else - { - SLANG_ASSERT(thisAtom > thatAtom); - thatIndex++; - } + if (target != CapabilityName::Invalid && stage != CapabilityName::Invalid) + break; } - return score; -} - -bool CapabilityConjunctionSet::operator==(CapabilityConjunctionSet const& other) const -{ - return m_expandedAtoms == other.m_expandedAtoms; + addToTargetCapabilityWithTargetAndOrStageAtom(target, stage, canonicalRepresentation); } -bool CapabilityConjunctionSet::operator<(CapabilityConjunctionSet const& that) const +void CapabilitySet::addUnexpandedCapabilites(CapabilityName atom) { - for (Index i = 0; i < Math::Min(m_expandedAtoms.getCount(), that.m_expandedAtoms.getCount()); i++) - { - if (m_expandedAtoms[i] < that.m_expandedAtoms[i]) - return true; - else if (m_expandedAtoms[i] > that.m_expandedAtoms[i]) - return false; - } - return m_expandedAtoms.getCount() < that.m_expandedAtoms.getCount(); + auto info = _getInfo(atom); + for (const auto& cr : info.canonicalRepresentation) + addToTargetCapabilitesWithCanonicalRepresentation(cr); } - CapabilitySet::CapabilitySet() 
{} @@ -803,14 +260,8 @@ CapabilitySet::CapabilitySet(Int atomCount, CapabilityName const* atoms) CapabilitySet::CapabilitySet(CapabilityName atom) { - auto info = _getInfo(atom); - for (auto conjunction : info.canonicalRepresentation) - { - CapabilityConjunctionSet set; - for (auto atomName : conjunction) - set.getExpandedAtoms().add(asAtom(atomName)); - m_conjunctions.add(_Move(set)); - } + this->m_targetSets.reserve(kCapabilityTargetCount); + addUnexpandedCapabilites(atom); } CapabilitySet::CapabilitySet(List<CapabilityName> const& atoms) @@ -826,13 +277,9 @@ CapabilitySet CapabilitySet::makeEmpty() CapabilitySet CapabilitySet::makeInvalid() { - // An invalid capability set will always be a singleton - // set of the `Invalid` atom, and we will construct - // the set directly rather than use the more expensive - // logic in `_init()`. - // CapabilitySet result; - result.m_conjunctions.add(CapabilityConjunctionSet(CapabilityAtom::Invalid)); + result.m_targetSets[CapabilityAtom::Invalid].target = CapabilityAtom::Invalid; + return result; } @@ -843,24 +290,23 @@ void CapabilitySet::addCapability(CapabilityName name) bool CapabilitySet::isEmpty() const { - return m_conjunctions.getCount() == 0; + return m_targetSets.getCount() == 0; } bool CapabilitySet::isInvalid() const { - return m_conjunctions.getCount() == 1 && m_conjunctions[0].isInvalid(); + return m_targetSets.containsKey(CapabilityAtom::Invalid); } bool CapabilitySet::isIncompatibleWith(CapabilityAtom other) const { + // should be a target or derivative, otherwise this makes no sense. + if (isEmpty()) return false; - - // If all conjunctions are incompatible with the atom, then we are incompatible. 
- for (auto& c : m_conjunctions) - if (!c.isIncompatibleWith(other)) - return false; - return true; + + CapabilitySet otherSet((CapabilityName)other); + return isIncompatibleWith(otherSet); } bool CapabilitySet::isIncompatibleWith(CapabilityName other) const @@ -871,367 +317,440 @@ bool CapabilitySet::isIncompatibleWith(CapabilityName other) const return isIncompatibleWith(otherSet); } -bool CapabilitySet::isIncompatibleWith(CapabilityConjunctionSet const& other) const +bool CapabilitySet::isIncompatibleWith(CapabilitySet const& other) const { if (isEmpty()) return false; + if (other.isEmpty()) + return false; + + // Incompatible means there are 0 intersecting abstract nodes from sets in `other` with sets in `this` + for (auto& otherSet : other.m_targetSets) + { + auto targetSet = this->m_targetSets.tryGetValue(otherSet.first); + if (!targetSet) + continue; + + for (auto& otherStageSet : otherSet.second.shaderStageSets) + { + auto stageSet = targetSet->shaderStageSets.tryGetValue(otherStageSet.first); + if (!stageSet) + continue; - // If all conjunctions are incompatible with the atom, then we are incompatible. - for (auto& c : m_conjunctions) - if (!c.isIncompatibleWith(other)) return false; + } + } return true; } -bool CapabilitySet::isIncompatibleWith(CapabilitySet const& other) const +const CapabilityAtomSet& getAtomSetOfTargets() { - if (isEmpty()) - return false; - if (other.isEmpty()) - return false; - - // If all conjunctions in other are incompatible with the this set, then we are incompatible. 
- for (auto& oc : other.m_conjunctions) - for (auto& c : m_conjunctions) - if (!c.isIncompatibleWith(oc)) - return false; - return true; + return kAnyTargetUIntSetBuffer; +} +const CapabilityAtomSet& getAtomSetOfStages() +{ + return kAnyStageUIntSetBuffer; } -bool CapabilitySet::implies(CapabilityAtom atom) const +bool hasTargetAtom(const CapabilityAtomSet& setIn, CapabilityAtom& targetAtom) { - if (isEmpty()) - return false; + CapabilityAtomSet intersection; + setIn.calcIntersection(intersection, getAtomSetOfTargets(), setIn); - for (auto& c : m_conjunctions) - if (c.implies(atom)) - return true; + if (intersection.isEmpty()) + return false; - return false; + targetAtom = intersection.getElements<CapabilityAtom>().getLast(); + return true; } -bool CapabilitySet::implies(const CapabilityConjunctionSet& set) const +bool CapabilitySet::implies(CapabilityAtom atom) const { - if (isEmpty()) + if (isEmpty() || atom == CapabilityAtom::Invalid) return false; - for (auto& c : m_conjunctions) - if (c.implies(set)) - return true; + CapabilitySet tmpSet = CapabilitySet(CapabilityName(atom)); - return false; + return this->implies(tmpSet); } -bool CapabilitySet::implies(CapabilitySet const& other) const +bool CapabilitySet::implies(CapabilitySet const& other, const bool onlyRequireSingleImply) const { // x implies (c | d) only if (x implies c) and (x implies d). 
- if (other.isEmpty()) - return true; - for (auto& c : other.m_conjunctions) - if (!implies(c)) - return false; - return true; -} - -bool CapabilitySet::operator==(CapabilitySet const& that) const -{ - return m_conjunctions == that.m_conjunctions; -} - -void CapabilitySet::calcCompactedAtoms(List<List<CapabilityAtom>>& outAtoms) const -{ - for (auto& c : m_conjunctions) - { - List<CapabilityAtom> atoms; - c.calcCompactedAtoms(atoms); - outAtoms.add(atoms); - } -} -void CapabilitySet::unionWith(const CapabilityConjunctionSet& conjunctionToAdd) -{ - // We add conjunctionToAdd to resultSet only if it does not imply any existing conjunctions. - // For example, if `resultSet` is (a), and conjunctionToAdd is (ab), then we don't want to add the conjunction - // to form (a | ab) because that would reduce to (a). - bool skipAdd = false; - for (auto& c : m_conjunctions) + for (const auto& otherTarget : other.m_targetSets) { - if (conjunctionToAdd.implies(c)) + auto thisTarget = this->m_targetSets.tryGetValue(otherTarget.first); + if (!thisTarget) { - skipAdd = true; - break; + // 'this' lacks a target 'other' has. + return false; } - } - if (!skipAdd) - { - // Once we added the new conjunction, any existing conjunctions that implies the new one can be - // removed. - // For example, if resultSet was (ab), and we are adding (a), the result should be just (a). - for (Index i = 0; i < m_conjunctions.getCount();) + + for (const auto& otherStage : otherTarget.second.shaderStageSets) { - if (m_conjunctions[i].implies(conjunctionToAdd)) + auto thisStage = thisTarget->shaderStageSets.tryGetValue(otherStage.first); + if (!thisStage) { - m_conjunctions.fastRemoveAt(i); + // 'this' lacks a stage 'other' has. 
+ return false; } - else + + // all stage sets that are in 'other' must be contained by 'this' + if(thisStage->atomSet) { - i++; + auto& thisStageSet = thisStage->atomSet.value(); + if(otherStage.second.atomSet) + { + if (!onlyRequireSingleImply) + { + if (!thisStageSet.contains(otherStage.second.atomSet.value())) + return false; + } + else + { + if (thisStageSet.contains(otherStage.second.atomSet.value())) + return true; + } + } } } - m_conjunctions.add(conjunctionToAdd); } + return !onlyRequireSingleImply; } -void CapabilitySet::canonicalize() +void CapabilityTargetSet::unionWith(const CapabilityTargetSet& other) { - // Make sure conjunctions are sorted so equality tests are trivial. - m_conjunctions.sort(); + for (auto otherStageSet : other.shaderStageSets) + { + auto& thisStageSet = this->shaderStageSets[otherStageSet.first]; + thisStageSet.stage = otherStageSet.first; + + if (!thisStageSet.atomSet) + thisStageSet.atomSet = otherStageSet.second.atomSet; + else + if(otherStageSet.second.atomSet) + thisStageSet.atomSet->unionWith(*otherStageSet.second.atomSet); + } } -CapabilitySet CapabilitySet::getTargetsThisIsMissingFromOther(const CapabilitySet& other) +void CapabilitySet::unionWith(const CapabilitySet& other) { - CapabilitySet conflicts{}; - List<CapabilityConjunctionSet> textualTargetsNotHandled; - for (auto conjunction : this->m_conjunctions) + if (this->isInvalid() || other.isInvalid()) + return; + + this->m_targetSets.reserve(other.m_targetSets.getCount()); + for (auto otherTargetSet : other.m_targetSets) { - textualTargetsNotHandled.add({}); - auto& currentList = textualTargetsNotHandled.getLast(); - for (auto thatNode : conjunction.getExpandedAtoms()) - { - // To make this faster we can make an assumption that the nodes are: - // {textualTarget, targetAbstract(), targetAbstract(), nonTarget} - // this assumption is not being used since it relies on ordering of .capdef file - if (_getInfo(thatNode).abstractBase == CapabilityName::target) - 
currentList.getExpandedAtoms().add(thatNode); - } + CapabilityTargetSet& thisTargetSet = this->m_targetSets[otherTargetSet.first]; + thisTargetSet.target = otherTargetSet.first; + thisTargetSet.shaderStageSets.reserve(otherTargetSet.second.shaderStageSets.getCount()); + thisTargetSet.unionWith(otherTargetSet.second); } - for (auto& thatConjunction : other.m_conjunctions) - { - // Worth the check to early leave due to ~5*5 elements to loop around - if (textualTargetsNotHandled.getCount() == 0) - break; +} - for (int i = 0 ; i < textualTargetsNotHandled.getCount(); i++) - { - auto& textualTargets = textualTargetsNotHandled[i]; +/// Join sets, but: +/// 1. do not destroy target set's which are incompatible with `other` (destroying shaderStageSets is fine) +/// 2. do not create an `CapabilityAtom::Invalid` target set. +void CapabilitySet::nonDestructiveJoin(const CapabilitySet& other) +{ + if (this->isInvalid() || other.isInvalid()) + return; - if (textualTargets.countIntersectionWith(thatConjunction) != textualTargets.getExpandedAtoms().getCount()) - continue; - - textualTargetsNotHandled[i] = textualTargets.makeEmpty(); - } + if (this->isEmpty()) + { + this->m_targetSets = other.m_targetSets; + return; } - CapabilitySet set; - for (auto& i : textualTargetsNotHandled) + for (auto& thisTargetSet : this->m_targetSets) { - if (i.isEmpty()) - continue; - set.unionWith(i); + thisTargetSet.second.tryJoin(other.m_targetSets); } - return set; } -// We only run 'join' logic on "this" conjunctions which are compatiable with "other" conjunctions. -// We only add specific nodes which satisfy the abstractMask. -// Any non-compatible conjunctions with "other"s cconjunctions will be preserved and unmodified. 
-void CapabilitySet::simpleJoinWithSetMask(const CapabilitySet& other, CapabilityName abstractMask) +void CapabilitySet::addCapability(List<List<CapabilityAtom>>& atomLists) { - CapabilitySet resultSet; - HashSet<CapabilityConjunctionSet*> setUsed; - // get used abstract mask nodes per conjunction so we can trivially check - // if we need to add the abstract mask node to avoid duplicates - List<HashSet<CapabilityAtom>> abstractMaskNodeInUse; - abstractMaskNodeInUse.growToCount(m_conjunctions.getCount()); - for (int i = 0; i < m_conjunctions.getCount(); i++) + for (const auto& cr : atomLists) + addToTargetCapabilitesWithCanonicalRepresentation( (*(List<CapabilityName>*)(&cr)).getArrayView()); +} + +CapabilitySet CapabilitySet::getTargetsThisHasButOtherDoesNot(const CapabilitySet& other) +{ + CapabilitySet newSet{}; + for (auto& i : this->m_targetSets) { - auto& thisConjunction = m_conjunctions[i]; - auto& setOfInUseNode = abstractMaskNodeInUse[i]; + if (other.m_targetSets.tryGetValue(i.first)) + continue; - for (auto& atom : thisConjunction.getExpandedAtoms()) - { - if (_getInfo(atom).abstractBase != abstractMask) - continue; - setOfInUseNode.add(atom); - } + newSet.m_targetSets[i.first].target = i.first; + auto info = _getInfo(i.first); + if(info.canonicalRepresentation.getCount() > 0) + newSet.addToTargetCapabilityWithTargetAndStageAtom((CapabilityName)i.first, CapabilityName::Invalid, info.canonicalRepresentation[0]); } + return newSet; +} - for (auto& thatConjunction : other.m_conjunctions) - { - for (int i = 0; i < m_conjunctions.getCount(); i++) - { - auto& thisConjunction = m_conjunctions[i]; - auto& setOfInUseNode = abstractMaskNodeInUse[i]; - CapabilityConjunctionSet conjunctionToAddToResultSet; +/// Join `this` with a compatible stage set of `CapabilityTargetSet other`. +/// Return false when `other` is fully incompatible. +/// Incompatibility is when `this->stage` is not a supported stage by `other.shaderStageSets`. 
+bool CapabilityStageSet::tryJoin(const CapabilityTargetSet& other) +{ + const CapabilityStageSet* otherStageSet = other.shaderStageSets.tryGetValue(this->stage); + if (!otherStageSet) + return false; - if (thisConjunction.isIncompatibleWith(thatConjunction)) - continue; - conjunctionToAddToResultSet = thisConjunction; - setUsed.add(&thisConjunction); - for (auto atom : thatConjunction.getExpandedAtoms()) - { - if (_getInfo(atom).abstractBase != abstractMask - || setOfInUseNode.contains(atom)) - continue; - conjunctionToAddToResultSet.getExpandedAtoms().add(atom); - } - conjunctionToAddToResultSet.getExpandedAtoms().sort(); - resultSet.unionWith(conjunctionToAddToResultSet); - } - } - for (auto& c : m_conjunctions) + // should not exceed far beyond 2*2 or 1*1 elements + if(otherStageSet->atomSet && this->atomSet) + this->atomSet->add(*otherStageSet->atomSet); + + return true; +} + +/// Join a compatible target set from `this` with `CapabilityTargetSet other`. +/// Return false when `other` is fully incompatible. +/// Incompatibility is when one of 2 scenarios is true: +/// 1. `this->target` is not a supported target by `other.shaderStageSets` +/// 2. `this` has completely disjoint shader stages from other. 
+bool CapabilityTargetSet::tryJoin(const CapabilityTargetSets& other) +{ + const CapabilityTargetSet* otherTargetSet = other.tryGetValue(this->target); + if (otherTargetSet == nullptr) + return false; + + List<CapabilityAtom> destroySet; + destroySet.reserve(this->shaderStageSets.getCount()); + for (auto& shaderStageSet : this->shaderStageSets) { - if (!setUsed.contains(&c)) - resultSet.m_conjunctions.add(c); + if (!shaderStageSet.second.tryJoin(*otherTargetSet)) + destroySet.add(shaderStageSet.first); } - m_conjunctions = resultSet.m_conjunctions; -} + if (destroySet.getCount() == Slang::Index(this->shaderStageSets.getCount())) + return false; + + for (const auto& i : destroySet) + this->shaderStageSets.remove(i); + + return true; +} void CapabilitySet::join(const CapabilitySet& other) { - if (isEmpty() || other.isInvalid()) + if (this->isEmpty() || other.isInvalid()) { *this = other; return; } - if (isInvalid()) + if (this->isInvalid()) return; if (other.isEmpty()) return; - CapabilitySet resultSet; - for (auto& thatConjunction : other.m_conjunctions) + List<CapabilityAtom> destroySet; + destroySet.reserve(this->m_targetSets.getCount()); + for (auto& thisTargetSet : this->m_targetSets) { - for (auto& thisConjunction : m_conjunctions) + if (!thisTargetSet.second.tryJoin(other.m_targetSets)) { - if (thisConjunction.isIncompatibleWith(thatConjunction)) - continue; + destroySet.add(thisTargetSet.first); + } + } + for (const auto& i : destroySet) + { + this->m_targetSets.remove(i); + } + // join made a invalid CapabilitySet + if (this->m_targetSets.getCount() == 0) + this->m_targetSets[CapabilityAtom::Invalid].target = CapabilityAtom::Invalid; +} - CapabilityConjunctionSet conjunction; - CapabilityConjunctionSet *conjunctionToAdd = nullptr; +static uint32_t _calcAtomListDifferenceScore(List<CapabilityAtom> const& thisList, List<CapabilityAtom> const& thatList) +{ + uint32_t score = 0; - // Add atoms from thatConjunction that are not existant in thisConjunction. 
- for (auto atom : thatConjunction.getExpandedAtoms()) - { - if (thisConjunction.getExpandedAtoms().binarySearch(atom, CapabilityAtomComparator()) == -1) - { - conjunction.getExpandedAtoms().add(atom); - } - } + // Our approach here will be to scan through `this` and `that` + // to identify atoms that are in `this` but not `that` (that is, + // the atoms that would be present in the set difference `this - that`) + // and then compute the maximum rank/score of those atoms. - if (conjunction.getExpandedAtoms().getCount() != 0) - { - // If we find any capabilities in thatConjunction that is missing from thisConjunction, - // create a new ConjunctionSet that contains atoms from both, and add it to the disjunction set. - conjunction.getExpandedAtoms().addRange(thisConjunction.getExpandedAtoms()); - conjunction.getExpandedAtoms().sort(); - conjunctionToAdd = &conjunction; - } - else + Index thisCount = thisList.getCount(); + Index thatCount = thatList.getCount(); + Index thisIndex = 0; + Index thatIndex = 0; + for (;;) + { + if (thisIndex == thisCount) break; + if (thatIndex == thatCount) break; + + auto thisAtom = thisList[thisIndex]; + auto thatAtom = thatList[thatIndex]; + + if (thisAtom == thatAtom) + { + thisIndex++; + thatIndex++; + continue; + } + + if (thisAtom < thatAtom) + { + // `thisAtom` is not present in `that`, so it + // should contribute to our ranking of the difference. + // + auto thisAtomInfo = _getInfo(thisAtom); + auto thisAtomRank = thisAtomInfo.rank; + + if (thisAtomRank > score) { - // Otherwise, thisConjunction implies thatConjunction, so we just add thisConjunction to resultSet. 
- conjunctionToAdd = &thisConjunction; + score = thisAtomRank; } - resultSet.unionWith(*conjunctionToAdd); + + thisIndex++; + } + else + { + SLANG_ASSERT(thisAtom > thatAtom); + thatIndex++; } } - m_conjunctions = _Move(resultSet.m_conjunctions); + return score; +} - if (m_conjunctions.getCount() == 0) - { - // If the result is empty, then we should return as impossible. - *this = CapabilitySet::makeInvalid(); - } - else +bool CapabilitySet::hasSameTargets(const CapabilitySet& other) const +{ + for (const auto& i : this->m_targetSets) { - canonicalize(); + if (!other.m_targetSets.tryGetValue(i.first)) + return false; } + return this->m_targetSets.getCount() == other.m_targetSets.getCount(); } -bool CapabilitySet::isBetterForTarget(CapabilitySet const& that, CapabilitySet const& targetCaps) const + +// MSVC incorrectly throws warning +#pragma warning(push) +#pragma warning(disable:4702) +/// returns true if 'this' is a better target for 'targetCaps' than 'that' +/// isEqual: is `this` and `that` equal +/// isIncompatible: is `this` and `that` incompatible +bool CapabilitySet::isBetterForTarget(CapabilitySet const& that, CapabilitySet const& targetCaps, bool& isEqual) const { - if (targetCaps.isIncompatibleWith(*this)) - return false; - if (targetCaps.isIncompatibleWith(that)) + if (this->isEmpty() && (that.isEmpty() || that.isInvalid())) + { + if(this->isEmpty() && that.isEmpty()) + isEqual = true; return true; + } - ArrayView<CapabilityConjunctionSet> thisSets = m_conjunctions.getArrayView(); - ArrayView<CapabilityConjunctionSet> thatSets = that.m_conjunctions.getArrayView(); - CapabilityConjunctionSet emtpySet = CapabilityConjunctionSet::makeEmpty(); - - if (isEmpty()) - thisSets = makeArrayViewSingle(emtpySet); - if (that.isEmpty()) - thatSets = makeArrayViewSingle(emtpySet); - - // It is hard to think about what it means exactly to compare a general disjunction set to another with regard - // to a target that itself is also a disjunction set. 
- // Instead of trying to find a meaning for the general case, we just want to extend the logic - // for conjunction sets to disjunction sets in a way that common situations are handled correctly. - // Note that when we reach here, most of these sets are likely to contain only one conjunction, so - // we just need to make sure the more general logic here yields correct result for that case. - // - // Right now, we define betterness for disjunctions as follows: - // A capability set X is determined to be better for a target T than capability set Y, - // if we find a conjunction A in X and a conjunction B in Y and a conjunction C in T such that - // A is better then B for target C. - // - struct ViableConjunctionIndex + // required to have target. + for (auto& targetWeNeed : targetCaps.m_targetSets) { - Index index; - UIntSet targetConjunctionIndices; - }; - auto getViableConjunction = [&](ArrayView<CapabilityConjunctionSet> set, List<ViableConjunctionIndex>& outList) + auto thisTarget = this->m_targetSets.tryGetValue(targetWeNeed.first); + if (!thisTarget) { - for (Index i = 0; i < set.getCount(); i++) - { - auto& conjunction = set[i]; - ViableConjunctionIndex viableConjunction; - viableConjunction.index = i; - for (Index j = 0; j < targetCaps.m_conjunctions.getCount(); j++) - { - auto& targetConjunction = targetCaps.m_conjunctions[j]; - if (conjunction.isIncompatibleWith(targetConjunction)) - continue; - viableConjunction.targetConjunctionIndices.add(j); - } - if (!viableConjunction.targetConjunctionIndices.isEmpty()) - { - outList.add(viableConjunction); - } - } - }; - List<ViableConjunctionIndex> viableConjunctionsThis; - List<ViableConjunctionIndex> viableConjunctionsThat; + isEqual = hasSameTargets(that); + return false; + } + auto thatTarget = that.m_targetSets.tryGetValue(targetWeNeed.first); + if (!thatTarget) + { + isEqual = hasSameTargets(that); + return true; + } - getViableConjunction(thisSets, viableConjunctionsThis); - getViableConjunction(thatSets, 
viableConjunctionsThat); - - for (auto& thisConjunctionIndex : viableConjunctionsThis) - { - auto& thisConjunction = thisSets[thisConjunctionIndex.index]; - for (auto& thatConjunctionIndex : viableConjunctionsThat) + // required to have shader stage + for (auto& shaderStageSetsWeNeed : targetWeNeed.second.shaderStageSets) { - auto& thatConjunction = thatSets[thatConjunctionIndex.index]; - UIntSet intersection = thisConjunctionIndex.targetConjunctionIndices; - intersection.intersectWith(thatConjunctionIndex.targetConjunctionIndices); - if (!intersection.isEmpty()) + auto thisStageSets = thisTarget->shaderStageSets.tryGetValue(shaderStageSetsWeNeed.first); + if (!thisStageSets) + return false; + auto thatStageSets = thatTarget->shaderStageSets.tryGetValue(shaderStageSetsWeNeed.first); + if (!thatStageSets) + return true; + + // We want the smallest (most specialized) set which is still contained by this/that. This means: + // 1. target.contains(this/that) + // 2. choose smallest super set + // 3. 
rank each super set and their atoms, choose the smallest rank'd set (most specialized) + if(shaderStageSetsWeNeed.second.atomSet) { - for (Index targetConjunctionIndex = 0; targetConjunctionIndex < targetCaps.m_conjunctions.getCount(); targetConjunctionIndex++) + auto& shaderStageSetWeNeed = shaderStageSetsWeNeed.second.atomSet.value(); + + CapabilityAtomSet tmp_set{}; + Index tmpCount = 0; + CapabilityAtomSet thisSet{}; + Index thisSetCount = 0; + CapabilityAtomSet thatSet{}; + Index thatSetCount = 0; + + // subtraction of the set we want gets us the "elements which 'targetSet' has but `this/that` is less specialized for" + if(thisStageSets->atomSet) { - if (!intersection.contains((UInt)targetConjunctionIndex)) - continue; - if (thisConjunction.isBetterForTarget(thatConjunction, targetCaps.m_conjunctions[targetConjunctionIndex])) + auto& thisStageSet = thisStageSets->atomSet.value(); + // if `thisStageSet` is more specialized than the target, `thisStageSet` should not be a candidate + if (thisStageSet == shaderStageSetWeNeed) + return true; + if (shaderStageSetWeNeed.contains(thisStageSet)) { - return true; + CapabilityAtomSet::calcSubtract(tmp_set, shaderStageSetWeNeed, thisStageSet); + tmpCount = tmp_set.countElements(); + if (thisSetCount < tmpCount) + { + thisSet = tmp_set; + thisSetCount = tmpCount; + } } } + if (thatStageSets->atomSet) + { + auto& thatStageSet = thatStageSets->atomSet.value(); + if (thatStageSet == shaderStageSetWeNeed) + return false; + if (shaderStageSetWeNeed.contains(thatStageSet)) + { + CapabilityAtomSet::calcSubtract(tmp_set, shaderStageSetWeNeed, thatStageSet); + tmpCount = tmp_set.countElements(); + if (thatSetCount < tmpCount) + { + thatSet = tmp_set; + thatSetCount = tmpCount; + } + } + } + + if (thisSet == thatSet) + isEqual = true; + + //empty means no candidate + if (thisSet.areAllZero()) + return false; + if (thatSet.areAllZero()) + return true; + if (thisSetCount < thatSetCount) + return true; + else if (thisSetCount > 
thatSetCount) + return false; + + auto thisSetElements = thisSet.getElements<CapabilityAtom>(); + auto thatSetElements = thisSet.getElements<CapabilityAtom>(); + auto shaderStageSetWeNeedElements = shaderStageSetWeNeed.getElements<CapabilityAtom>(); + + auto thisDiffScore = _calcAtomListDifferenceScore(thisSetElements, shaderStageSetWeNeedElements); + auto thatDiffScore = _calcAtomListDifferenceScore(thisSetElements, shaderStageSetWeNeedElements); + + return thisDiffScore < thatDiffScore; } } } - return false; + return true; } +#pragma warning(pop) -bool CapabilitySet::checkCapabilityRequirement(CapabilitySet const& available, CapabilitySet const& required, const CapabilityConjunctionSet*& outFailedAvailableSet) +CapabilitySet::AtomSets::Iterator CapabilitySet::getAtomSets() const +{ + return CapabilitySet::AtomSets::Iterator(&this->getCapabilityTargetSets()).begin(); +} + +bool CapabilitySet::checkCapabilityRequirement(CapabilitySet const& available, CapabilitySet const& required, CapabilityAtomSet& outFailedAvailableSet) { // Requirements x are met by available disjoint capabilities (a | b) iff // both 'a' satisfies x and 'b' satisfies x. @@ -1243,75 +762,85 @@ bool CapabilitySet::checkCapabilityRequirement(CapabilitySet const& available, C // We will check that for every capability conjunction X of F(), there is one capability conjunction Y in g() such that X implies Y. // - outFailedAvailableSet = nullptr; + // if empty there is no body, all capabilities are supported. + if (required.isEmpty()) + return true; if (required.isInvalid()) + { + outFailedAvailableSet.add((UInt)CapabilityAtom::Invalid); return false; + } // If F's capability is empty, we can satisfy any non-empty requirements. 
// if (available.isEmpty() && !required.isEmpty()) return false; - - for (auto& availTargetSet : available.getExpandedAtoms()) + + + // if all sets in `available` are not a super-set to at least 1 `required` set, then we have an err + for (auto& availableTarget : available.m_targetSets) { - bool implied = false; - for (auto& requiredTargetSet : required.getExpandedAtoms()) + auto reqTarget = required.m_targetSets.tryGetValue(availableTarget.first); + if (!reqTarget) { - if (availTargetSet.implies(requiredTargetSet)) - { - implied = true; - break; - } - } - if (!implied) - { - outFailedAvailableSet = &availTargetSet; + outFailedAvailableSet.add((UInt)availableTarget.first); return false; } - } - - return true; -} -bool CapabilitySet::isExactSubset(CapabilitySet const& maybeSuperSet) -{ - // This should only be used when absolutely required due to the - // cost for complex sets. Simple sets are fine (glsl|spirv...) - for (auto& thisCon : m_conjunctions) - { - bool foundEqualCon = false; - for (auto& thatCon : maybeSuperSet.m_conjunctions) + for (auto& availableStage : availableTarget.second.shaderStageSets) { - if (thisCon == thatCon) - foundEqualCon = true; - } - if (foundEqualCon == false) - return false; + auto reqStage = reqTarget->shaderStageSets.tryGetValue(availableStage.first); + if (!reqStage) + { + outFailedAvailableSet.add((UInt)availableStage.first); + return false; + } + + const CapabilityAtomSet* lastBadStage = nullptr; + if(availableStage.second.atomSet) + { + const auto& availableStageSet = availableStage.second.atomSet.value(); + lastBadStage = nullptr; + if(reqStage->atomSet) + { + const auto& reqStageSet = reqStage->atomSet.value(); + if (availableStageSet.contains(reqStageSet)) + break; + else + lastBadStage = &reqStageSet; + } + if (lastBadStage) + { + // get missing atoms + CapabilityAtomSet::calcSubtract(outFailedAvailableSet, *lastBadStage, availableStageSet); + return false; + } + } + } } + return true; } void 
printDiagnosticArg(StringBuilder& sb, const CapabilitySet& capSet) { bool isFirstSet = true; - for (auto& set : capSet.getExpandedAtoms()) + for (auto& set : capSet.getAtomSets()) { - List<CapabilityAtom> compactAtomList; - set.calcCompactedAtoms(compactAtomList); - if (!isFirstSet) { sb<< " | "; } bool isFirst = true; - for (auto atom : compactAtomList) + for (auto atom : set) { + CapabilityName formattedAtom = (CapabilityName)atom; if (!isFirst) { sb << " + "; } - auto name = capabilityNameToString((CapabilityName)atom); + auto name = capabilityNameToString((CapabilityName)formattedAtom); if (name.startsWith("_")) name = name.tail(1); sb << name; @@ -1331,4 +860,211 @@ void printDiagnosticArg(StringBuilder& sb, CapabilityName name) sb << _getInfo(name).name; } +#ifdef UNIT_TEST_CAPABILITIES + +#define CHECK_CAPS(inData) SLANG_ASSERT(inData>0) + +int TEST_findTargetCapSet(CapabilitySet& capSet, CapabilityAtom target) +{ + return true + && capSet.getCapabilityTargetSets().containsKey(target); +} + +int TEST_findTargetStage( + CapabilitySet& capSet, + CapabilityAtom target, + CapabilityAtom stage) +{ + return capSet.getCapabilityTargetSets()[target].shaderStageSets.containsKey(stage); +} + +int TEST_targetCapSetWithSpecificSetInStage( + CapabilitySet& capSet, + CapabilityAtom target, + CapabilityAtom stage, + List<CapabilityAtom> setToFind) +{ + + bool containsStageKey = capSet.getCapabilityTargetSets()[target].shaderStageSets.containsKey(stage); + if (!containsStageKey) + return 0; + + auto& stageSet = capSet.getCapabilityTargetSets()[target].shaderStageSets[stage]; + if (stage != stageSet.stage) + return -1; + + CapabilityAtomSet set; + for (auto i : setToFind) + set.add(UInt(i)); + + if (stageSet.atomSet) + { + auto& i = stageSet.atomSet.value(); + if (i == set) + return true; + } + + return -2; +} + +void TEST_CapabilitySet_addAtom() +{ + CapabilitySet testCapSet{}; + + // ------------------------------------------------------------ + + testCapSet = 
CapabilitySet(CapabilityName::TEST_ADD_1); + + CHECK_CAPS(TEST_findTargetCapSet(testCapSet, CapabilityAtom::hlsl)); + CHECK_CAPS(TEST_targetCapSetWithSpecificSetInStage(testCapSet, CapabilityAtom::hlsl, CapabilityAtom::vertex, + { CapabilityAtom::textualTarget, CapabilityAtom::hlsl, CapabilityAtom::vertex, + CapabilityAtom::_sm_4_0, CapabilityAtom::_sm_4_1 })); + + CHECK_CAPS(TEST_findTargetCapSet(testCapSet, CapabilityAtom::glsl)); + CHECK_CAPS(TEST_targetCapSetWithSpecificSetInStage(testCapSet, CapabilityAtom::glsl, CapabilityAtom::vertex, + { CapabilityAtom::textualTarget, CapabilityAtom::glsl, CapabilityAtom::vertex, + CapabilityAtom::_GLSL_130 })); + + CHECK_CAPS(TEST_findTargetCapSet(testCapSet, CapabilityAtom::spirv_1_0)); + CHECK_CAPS(TEST_targetCapSetWithSpecificSetInStage(testCapSet, CapabilityAtom::spirv_1_0, CapabilityAtom::vertex, + { CapabilityAtom::spirv_1_0, CapabilityAtom::vertex, + CapabilityAtom::spirv_1_1 })); + + CHECK_CAPS(TEST_findTargetCapSet(testCapSet, CapabilityAtom::metal)); + CHECK_CAPS(TEST_targetCapSetWithSpecificSetInStage(testCapSet, CapabilityAtom::metal, CapabilityAtom::vertex, + { CapabilityAtom::textualTarget, CapabilityAtom::metal, CapabilityAtom::vertex })); + + // ------------------------------------------------------------ + + testCapSet = CapabilitySet(CapabilityName::TEST_ADD_2); + + CHECK_CAPS(TEST_findTargetCapSet(testCapSet, CapabilityAtom::hlsl)); + CHECK_CAPS(TEST_targetCapSetWithSpecificSetInStage(testCapSet, CapabilityAtom::hlsl, CapabilityAtom::vertex, + { CapabilityAtom::textualTarget, CapabilityAtom::hlsl, CapabilityAtom::vertex, + CapabilityAtom::_sm_4_0, CapabilityAtom::_sm_4_1 })); + CHECK_CAPS(TEST_targetCapSetWithSpecificSetInStage(testCapSet, CapabilityAtom::hlsl, CapabilityAtom::fragment, + { CapabilityAtom::textualTarget, CapabilityAtom::hlsl, CapabilityAtom::fragment, + CapabilityAtom::_sm_4_0, CapabilityAtom::_sm_4_1 })); + + // ------------------------------------------------------------ + + testCapSet 
= CapabilitySet(CapabilityName::TEST_ADD_3); + + CHECK_CAPS((int)!TEST_findTargetCapSet(testCapSet, CapabilityAtom::spirv_1_0)); + CHECK_CAPS(TEST_findTargetCapSet(testCapSet, CapabilityAtom::glsl)); + CHECK_CAPS(TEST_targetCapSetWithSpecificSetInStage(testCapSet, CapabilityAtom::glsl, CapabilityAtom::fragment, + { CapabilityAtom::textualTarget, CapabilityAtom::glsl, CapabilityAtom::fragment, + CapabilityAtom::_GLSL_130 })); + // ------------------------------------------------------------ +} + +void TEST_CapabilitySet_join() +{ + CapabilitySet testCapSetA{}; + CapabilitySet testCapSetB{}; + + // ------------------------------------------------------------ + + testCapSetA = CapabilitySet(CapabilityName::TEST_JOIN_1A); + testCapSetB = CapabilitySet(CapabilityName::TEST_JOIN_1B); + testCapSetA.join(testCapSetB); + + CHECK_CAPS((int)!TEST_findTargetCapSet(testCapSetA, CapabilityAtom::hlsl)); + CHECK_CAPS((int)!TEST_findTargetCapSet(testCapSetA, CapabilityAtom::glsl)); + + // ------------------------------------------------------------ + + testCapSetA = CapabilitySet(CapabilityName::TEST_JOIN_2A); + testCapSetB = CapabilitySet(CapabilityName::TEST_JOIN_2B); + testCapSetA.join(testCapSetB); + + CHECK_CAPS(TEST_findTargetCapSet(testCapSetA, CapabilityAtom::hlsl)); + CHECK_CAPS(TEST_targetCapSetWithSpecificSetInStage(testCapSetA, CapabilityAtom::hlsl, CapabilityAtom::vertex, + { CapabilityAtom::textualTarget, CapabilityAtom::hlsl, CapabilityAtom::vertex, + CapabilityAtom::_sm_4_0, CapabilityAtom::_sm_4_1 })); + + // ------------------------------------------------------------ + + testCapSetA = CapabilitySet(CapabilityName::TEST_JOIN_3A); + testCapSetB = CapabilitySet(CapabilityName::TEST_JOIN_3B); + testCapSetA.join(testCapSetB); + + CHECK_CAPS((int)!TEST_findTargetCapSet(testCapSetA, CapabilityAtom::spirv_1_0)); + CHECK_CAPS(TEST_findTargetCapSet(testCapSetA, CapabilityAtom::glsl)); + CHECK_CAPS((int)!TEST_findTargetStage(testCapSetA, CapabilityAtom::glsl, 
CapabilityAtom::raygen)); + CHECK_CAPS(TEST_targetCapSetWithSpecificSetInStage(testCapSetA, CapabilityAtom::glsl, CapabilityAtom::fragment, + { CapabilityAtom::textualTarget, CapabilityAtom::glsl, CapabilityAtom::fragment, + CapabilityAtom::_GLSL_130, CapabilityAtom::_GLSL_140 })); + CHECK_CAPS(TEST_targetCapSetWithSpecificSetInStage(testCapSetA, CapabilityAtom::glsl, CapabilityAtom::vertex, + { CapabilityAtom::textualTarget, CapabilityAtom::glsl, CapabilityAtom::vertex, + CapabilityAtom::_GLSL_130, CapabilityAtom::_GLSL_140 })); + CHECK_CAPS(TEST_targetCapSetWithSpecificSetInStage(testCapSetA, CapabilityAtom::hlsl, CapabilityAtom::fragment, + { CapabilityAtom::textualTarget, CapabilityAtom::hlsl, CapabilityAtom::fragment, + CapabilityAtom::_sm_4_0, CapabilityAtom::_sm_4_1 })); + CHECK_CAPS(TEST_targetCapSetWithSpecificSetInStage(testCapSetA, CapabilityAtom::hlsl, CapabilityAtom::vertex, + { CapabilityAtom::textualTarget, CapabilityAtom::hlsl, CapabilityAtom::vertex, + CapabilityAtom::_sm_4_0 })); + + // ------------------------------------------------------------ + + testCapSetA = CapabilitySet(CapabilityName::TEST_JOIN_4A); + testCapSetB = CapabilitySet(CapabilityName::TEST_JOIN_4B); + testCapSetA.join(testCapSetB); + + CHECK_CAPS(TEST_findTargetCapSet(testCapSetA, CapabilityAtom::glsl)); + CHECK_CAPS(TEST_targetCapSetWithSpecificSetInStage(testCapSetA, CapabilityAtom::glsl, CapabilityAtom::fragment, + { CapabilityAtom::textualTarget, CapabilityAtom::glsl, CapabilityAtom::fragment, + CapabilityAtom::_GLSL_130, CapabilityAtom::_GLSL_140, CapabilityAtom::_GLSL_150, CapabilityAtom::_GL_EXT_texture_query_lod, CapabilityAtom::_GL_EXT_texture_shadow_lod })); + + // ------------------------------------------------------------ + + +} + +void TEST_CapabilitySet() +{ + TEST_CapabilitySet_addAtom(); + TEST_CapabilitySet_join(); +} + +/* +/// Test Capabilities + +alias TEST_ADD_1 = _sm_4_1 | _GLSL_130 | spirv_1_1 | metal + ; + +alias TEST_ADD_2 = _sm_4_1 | _sm_4_0 + 
shader_stages_compute_fragment
+    ;
+
+alias TEST_ADD_3 = _GLSL_130 + shader_stages_compute_fragment_geometry_vertex;
+
+//
+
+alias TEST_JOIN_1A = hlsl;
+alias TEST_JOIN_1B = glsl;
+
+alias TEST_JOIN_2A = hlsl;
+alias TEST_JOIN_2B = _sm_4_1 | glsl;
+
+alias TEST_JOIN_3A = glsl + fragment | _sm_4_0 + fragment
+    | glsl + vertex | hlsl + vertex
+    ;
+alias TEST_JOIN_3B = _sm_4_1 + fragment
+    | _sm_4_0 + vertex
+    | _sm_4_0 + compute
+    | _GLSL_140 + vertex
+    | _GLSL_140 + fragment
+    | spirv_1_0 + fragment
+    | glsl + raygen
+    | hlsl + raygen
+    ;
+
+alias TEST_JOIN_4A = _GLSL_140 + _GL_EXT_texture_query_lod;
+alias TEST_JOIN_4B = _GLSL_150 + _GL_EXT_texture_shadow_lod;
+///
+*/
+#undef CHECK_CAPS
+
+#endif
+
 }
diff --git a/source/slang/slang-capability.h b/source/slang/slang-capability.h
index feac03337..9e4bdb3a8 100644
--- a/source/slang/slang-capability.h
+++ b/source/slang/slang-capability.h
@@ -3,8 +3,10 @@
 #include "../core/slang-list.h"
 #include "../core/slang-string.h"
+#include "../core/slang-dictionary.h"
 #include <stdint.h>
+#include <optional>
 namespace Slang
 {
@@ -46,108 +48,47 @@ namespace Slang
 //
 // In all cases, we represent a set of capabilities with `CapabilitySet`.
-    /// A set of capabilities, representing features that are either supported or required
-struct CapabilityConjunctionSet
+struct CapabilityAtomSet : UIntSet
 {
-public:
-    /// Default-construct an empty capability set
-    CapabilityConjunctionSet();
-
-    CapabilityConjunctionSet(CapabilityConjunctionSet const& other) = default;
-    CapabilityConjunctionSet& operator=(CapabilityConjunctionSet const& other) = default;
-    CapabilityConjunctionSet(CapabilityConjunctionSet&& other) = default;
-    CapabilityConjunctionSet& operator=(CapabilityConjunctionSet&& other) = default;
-
-    /// Construct a capability set from an explicit list of atomic capabilities
-    CapabilityConjunctionSet(Int atomCount, CapabilityAtom const* atoms);
-
-    /// Construct a capability set from an explicit list of atomic capabilities
-    explicit CapabilityConjunctionSet(List<CapabilityAtom> const& atoms);
-
-    /// Construct a singleton set from a single atomic capability
-    explicit CapabilityConjunctionSet(CapabilityAtom atom);
-
-    /// Make an empty capability set
-    static CapabilityConjunctionSet makeEmpty();
-
-    /// Make an invalid capability set (such that no target could ever support it)
-    static CapabilityConjunctionSet makeInvalid();
-
-    /// Is this capability set empty (such that any target supports it)?
-    bool isEmpty() const;
-
-    /// Is this capability set invalid (such that no target could support it)?
-    bool isInvalid() const;
-
-    // Capabilities are "incompatible" if no target platform can ever support both
-    // at the same time. For example, the `HLSL` and `GLSL` capabilities are
-    // incompatible, because a single target cannot be both an HLSL target and
-    // a GLSL target (at least for now).
-    //
-    // Note that we are using the term "incompatible" here even though it
-    // seems like "disjoint" would be intuitively correct (HLSL and GLSL
-    // targets sure do seem to be disjoint).
The problem is that in our
-    // set-theoretic representation of capabilities, incompatible capability
-    // sets are *never* disjoint sets of atoms, and (valid) disjoint sets of atoms
-    // *never* represent incompatible capability sets.
-
-    /// Is this capability set incompatible with the given `other` set.
-    bool isIncompatibleWith(CapabilityAtom other) const;
-
-    /// Is this capability set incompatible with the given `other` atomic capability.
-    bool isIncompatibleWith(CapabilityConjunctionSet const& other) const;
-
-    // One capability set A "implies" another set B if a target that
-    // supports A must also support all of B.
-    //
-    // In practice, this means that "A implies B" is the same as
-    // "A is a subset of B" in the set-theoretic model, but
-    // we ant to think of this primarily as supported/required features,
-    // and not get hung up on the set theory.
-
-    /// Does this capability set imply all the capabilities in `other`?
-    bool implies(CapabilityConjunctionSet const& other) const;
-
-
-    /// Does this capability set imply the atomic capability `other`?
-    bool implies(CapabilityAtom other) const;
-
-    // A capability set is equal to another if each implies the other.
-
-    /// Are these two capability sets equal?
-    bool operator==(CapabilityConjunctionSet const& that) const;
-    bool operator<(CapabilityConjunctionSet const& that) const;
-
-    /// Get access to the raw atomic capabilities that define this set.
-    List<CapabilityAtom> const& getExpandedAtoms() const { return m_expandedAtoms; }
-    List<CapabilityAtom>& getExpandedAtoms() { return m_expandedAtoms; }
-
-    /// Calculate a list of "compacted" atoms, which excludes any atoms from the expanded list that are implies by another item in the list.
-    void calcCompactedAtoms(List<CapabilityAtom>& outAtoms) const;
+    using UIntSet::UIntSet;
+};
-    Int countIntersectionWith(CapabilityConjunctionSet const& that) const;
+struct CapabilityTargetSet;
+typedef Dictionary<CapabilityAtom, CapabilityTargetSet> CapabilityTargetSets;
-    bool isBetterForTarget(CapabilityConjunctionSet const& that, CapabilityConjunctionSet const& targetCaps) const;
+/// CapabilityStageSet encapsulates all capabilities of a specific shader stage for a specific target.
+/// Capabilities may be disjoint, but only in rare cases:
+/// {{glsl, _GLSL_130, GL_EXT_FOO1}, {glsl, _GLSL_130, _GLSL_140, _GLSL_150}}
+struct CapabilityStageSet
+{
+    CapabilityAtom stage{};
+
+    /// LinkedList of all disjoint sets for fast remove/add of unconstrained list positions.
+    std::optional<CapabilityAtomSet> atomSet{};
+
+    void addNewSet(CapabilityAtomSet&& setToAdd)
+    {
+        if (!atomSet)
+            atomSet = setToAdd;
+        else
+            atomSet->add(setToAdd);
+    }
+    bool tryJoin(const CapabilityTargetSet& other);
+};
-private:
-    void _init(Int atomCount, CapabilityAtom const* atoms);
+/// CapabilityTargetSet encapsulates all capabilities of a specific target
+/// Format: {shader_stage, shader_stage_set}
+typedef Dictionary<CapabilityAtom, CapabilityStageSet> CapabilityStageSets;
+struct CapabilityTargetSet
+{
+    CapabilityAtom target{};
-    uint32_t _calcDifferenceScoreWith(CapabilityConjunctionSet const& other) const;
+    CapabilityStageSets shaderStageSets{};
-    // The underlying representation we use is a sorted and deduplicated
-    // list of all the (non-alias) atoms that are present in the set.
-    // This "expanded" list uses the transitive closure over the inheritnace
-    // relationship between the atoms.
-    //
-    List<CapabilityAtom> m_expandedAtoms;
+    bool tryJoin(const CapabilityTargetSets& other);
+    void unionWith(const CapabilityTargetSet& other);
 };
-    /// Are the `left` and `right` capability sets unequal?
-inline bool operator!=(CapabilityConjunctionSet const& left, CapabilityConjunctionSet const& right)
-{
-    return !(left == right);
-}
-
 struct CapabilitySet
 {
 public:
@@ -168,9 +109,6 @@ public:
     /// Construct a singleton set from a single atomic capability
     explicit CapabilitySet(CapabilityName atom);
-    /// Construct a singleton set from conjunctions
-    explicit CapabilitySet(const List<CapabilityConjunctionSet>& conjunctions);
-
     /// Make an empty capability set
     static CapabilitySet makeEmpty();
@@ -190,61 +128,158 @@ public:
     bool isIncompatibleWith(CapabilityName other) const;
     /// Is this capability set incompatible with the given `other` atomic capability.
-    bool isIncompatibleWith(CapabilityConjunctionSet const& other) const;
-
-    /// Is this capability set incompatible with the given `other` atomic capability.
     bool isIncompatibleWith(CapabilitySet const& other) const;
     /// Does this capability set imply all the capabilities in `other`?
-    bool implies(CapabilitySet const& other) const;
-
-    /// Does this capability set imply all the capabilities in `other`?
-    bool implies(CapabilityConjunctionSet const& other) const;
+    bool implies(CapabilitySet const& other, const bool onlyRequireSingleImply = false) const;
     /// Does this capability set imply the atomic capability `other`?
     bool implies(CapabilityAtom other) const;
-    /// Join two capability sets to form (this & other).
+    /// Join two capability sets to form ('this' & 'other').
+    /// Destroy incompatible targets/sets apart of 'this' between ('this' & 'other').
+    /// `this` may be made invalid if other is fully disjoint.
     void join(const CapabilitySet& other);
-    void unionWith(const CapabilityConjunctionSet& other);
-
-    void simpleJoinWithSetMask(const CapabilitySet& other, CapabilityName abstractMask);
+    /// Join two capability sets to form ('this' & 'other').
+    /// If a target/set has an incompatible atom, do not destroy the target/set.
+    void nonDestructiveJoin(const CapabilitySet& other);
-    CapabilitySet getTargetsThisIsMissingFromOther(const CapabilitySet& other);
+    /// Add all targets/sets of 'other' into 'this'. Overlapping sets are removed.
+    void unionWith(const CapabilitySet& other);
-    void canonicalize();
+    /// Return a capability set of 'target' atoms 'this' has, but 'other' does not.
+    CapabilitySet getTargetsThisHasButOtherDoesNot(const CapabilitySet& other);
     /// Are these two capability sets equal?
     bool operator==(CapabilitySet const& that) const;
-    /// Get access to the raw atomic capabilities that define this set.
-    List<CapabilityConjunctionSet>& getExpandedAtoms() { return m_conjunctions; }
-    const List<CapabilityConjunctionSet>& getExpandedAtoms() const { return m_conjunctions; }
-
-
+    void addCapability(List<List<CapabilityAtom>>& atomLists);
     /// Calculate a list of "compacted" atoms, which excludes any atoms from the expanded list that are implies by another item in the list.
-    void calcCompactedAtoms(List<List<CapabilityAtom>>& outAtoms) const;
-
-    bool isBetterForTarget(CapabilitySet const& that, CapabilitySet const& targetCaps) const;
-
-    static bool checkCapabilityRequirement(CapabilitySet const& available, CapabilitySet const& required, const CapabilityConjunctionSet*& outFailedAvailableSet);
-
-    bool isExactSubset(CapabilitySet const& maybeSuperSet);
+    bool isBetterForTarget(CapabilitySet const& that, CapabilitySet const& targetCaps, bool& isEqual) const;
+
+    /// Find any capability sets which are in 'available' but not in 'required'. Return false if this situation occurs.
+    static bool checkCapabilityRequirement(CapabilitySet const& available, CapabilitySet const& required, CapabilityAtomSet& outFailedAvailableSet);
+
+    inline void addToTargetCapabilityWithValidUIntSetAndTargetAndStage(CapabilityName target, CapabilityName stage, CapabilityAtomSet setToAdd);
+    inline void addToTargetCapabilityWithTargetAndStageAtom(CapabilityName target, CapabilityName stage, const ArrayView<CapabilityName>& canonicalRepresentation);
+    inline void addToTargetCapabilityWithTargetAndOrStageAtom(CapabilityName target, CapabilityName stage, const ArrayView<CapabilityName>& canonicalRepresentation);
+    inline void addToTargetCapabilityWithStageAtom(CapabilityName stage, const ArrayView<CapabilityName>& canonicalRepresentation);
+    inline void addToTargetCapabilitesWithCanonicalRepresentation(const ArrayView<CapabilityName>& atom);
+    inline void addUnexpandedCapabilites(CapabilityName atom);
+
+    CapabilityTargetSets& getCapabilityTargetSets() { return m_targetSets; }
+    const CapabilityTargetSets& getCapabilityTargetSets() const { return m_targetSets; }
+
+    struct AtomSets
+    {
+        struct Iterator
+        {
+        private:
+            const CapabilityTargetSets* context;
+            CapabilityTargetSets::ConstIterator targetNode{};
+            CapabilityStageSets::ConstIterator stageNode{};
+            const std::optional<CapabilityAtomSet>* atomSetNode;
+
+        public:
+            operator bool() const
+            {
+                return atomSetNode->has_value();
+            }
+            const CapabilityAtomSet& operator*() const
+            {
+                return *(*this->atomSetNode);
+            }
+            const CapabilityAtomSet* operator->() const
+            {
+                return &(*(*this->atomSetNode));
+            }
+            bool operator==(const Iterator& other) const
+            {
+                return other.context == this->context
+                    && other.targetNode == this->targetNode
+                    && other.stageNode == this->stageNode
+                    ;
+            }
+            bool operator!=(const Iterator& other) const
+            {
+                return !(other == *this);
+            }
+
+            Iterator& operator++()
+            {
+                for(;;)
+                {
+                    this->stageNode++;
+                    if (this->stageNode == (*this->targetNode).second.shaderStageSets.end())
+                    {
+
                        for(;;)
+                        {
+                            this->targetNode++;
+                            if (this->targetNode == this->context->end())
+                            {
+                                this->stageNode = {};
+                                this->atomSetNode = {};
+                                return *this;
+                            }
+                            this->stageNode = (*this->targetNode).second.shaderStageSets.begin();
+                            if (this->stageNode == (*this->targetNode).second.shaderStageSets.end())
+                                continue;
+                            break;
+                        }
+                    }
+                    if (!(*this->stageNode).second.atomSet)
+                        continue;
+                    this->atomSetNode = &(*this->stageNode).second.atomSet;
+                    break;
+                }
+                return *this;
+            }
+            Iterator& operator++(int)
+            {
+                return ++(*this);
+            }
+            Iterator begin() const
+            {
+                Iterator tmp(this->context);
+                tmp.targetNode = this->context->begin();
+                if (tmp.targetNode == this->context->end())
+                    return tmp;
+                tmp.stageNode = (*tmp.targetNode).second.shaderStageSets.begin();
+                if (tmp.stageNode == (*tmp.targetNode).second.shaderStageSets.end())
+                {
+                    tmp++;
+                    return tmp;
+                }
+                tmp.atomSetNode = &(*tmp.stageNode).second.atomSet;
+                if (!tmp.atomSetNode->has_value())
+                    tmp++;
+                return tmp;
+            }
+            Iterator end() const
+            {
+                Iterator tmp(this->context);
+                tmp.targetNode = this->context->end();
+                return tmp;
+            }
+            Iterator(const CapabilityTargetSets* mainContext)
+            {
+                context = mainContext;
+            }
+        };
+    };
+    /// Get access to the raw atomic capabilities that define this set.
+    /// Get all bottom level UIntSets for each CapabilityTargetSet.
+    CapabilitySet::AtomSets::Iterator getAtomSets() const;
 private:
-    // The underlying representation we use is a list of conjunctions.
-    //
-    List<CapabilityConjunctionSet> m_conjunctions;
+    /// underlying data of CapabilitySet.
+    CapabilityTargetSets m_targetSets{};
     void addCapability(CapabilityName name);
-};
-/// Are the `left` and `right` capability sets unequal?
-inline bool operator!=(CapabilitySet const& left, CapabilitySet const& right)
-{
-    return !(left == right);
-}
+    bool hasSameTargets(const CapabilitySet& other) const;
+};
 /// Returns true if atom is derived from base
 bool isCapabilityDerivedFrom(CapabilityAtom atom, CapabilityAtom base);
@@ -262,4 +297,14 @@ bool isDirectChildOfAbstractAtom(CapabilityAtom name);
 void printDiagnosticArg(StringBuilder& sb, CapabilityAtom atom);
 void printDiagnosticArg(StringBuilder& sb, CapabilityName name);
+const CapabilityAtomSet& getAtomSetOfTargets();
+const CapabilityAtomSet& getAtomSetOfStages();
+
+bool hasTargetAtom(const CapabilityAtomSet& setIn, CapabilityAtom& targetAtom);
+
+//#define UNIT_TEST_CAPABILITIES
+#ifdef UNIT_TEST_CAPABILITIES
+void TEST_CapabilitySet();
+#endif
+
 }
diff --git a/source/slang/slang-check-decl.cpp b/source/slang/slang-check-decl.cpp
index 9a4ee9d71..fbf5332b8 100644
--- a/source/slang/slang-check-decl.cpp
+++ b/source/slang/slang-check-decl.cpp
@@ -744,7 +744,7 @@ namespace Slang
         void visitInheritanceDecl(InheritanceDecl* inheritanceDecl);
-        void diagnoseUndeclaredCapability(Decl* decl, const DiagnosticInfo& diagnosticInfo, const CapabilityConjunctionSet* failedAvailableSet);
+        void diagnoseUndeclaredCapability(Decl* decl, const DiagnosticInfo& diagnosticInfo, const CapabilityAtomSet& failedAtomsInsideAvailableSet);
     };
@@ -9932,11 +9932,17 @@ namespace Slang
                     oldCaps);
             }
         }
+
+        // if stmt inside parent, set the provenance tracker to the calling function
+        if(!decl)
+            decl = visitor->getParentFuncOfVisitor();
         if (referencedDecl && decl)
         {
-            for (auto& capSet : nodeCaps.getExpandedAtoms())
+            for (auto& capSet : nodeCaps.getAtomSets())
             {
-                for (auto atom : capSet.getExpandedAtoms())
+                auto elements = capSet.getElements<CapabilityAtom>();
+                decl->capabilityRequirementProvenance.reserve(decl->capabilityRequirementProvenance.getCount()+elements.getCount());
+                for (auto atom : elements)
                 {
                     decl->capabilityRequirementProvenance.addIfNotExists(atom,
DeclReferenceWithLoc{ referencedDecl, referenceLoc }); } @@ -10008,9 +10014,9 @@ namespace Slang } if (!maybeRequireCapability) - targetCap = (CapabilitySet(CapabilityName::any_target).getTargetsThisIsMissingFromOther(set)); + targetCap = (CapabilitySet(CapabilityName::any_target).getTargetsThisHasButOtherDoesNot(set)); else - targetCap = (maybeRequireCapability->capabilitySet.getTargetsThisIsMissingFromOther(set)); + targetCap = (maybeRequireCapability->capabilitySet.getTargetsThisHasButOtherDoesNot(set)); } else { @@ -10024,10 +10030,8 @@ namespace Slang { diagnoseCapabilityErrors(Base::getSink(), outerContext.getOptionSet(), targetCase->body->loc, Diagnostics::conflictingCapabilityDueToStatement, bodyCap, "target_switch", oldCap); } - for (auto& conjunction : targetCap.getExpandedAtoms()) - set.unionWith(conjunction); + set.unionWith(targetCap); } - set.canonicalize(); handleReferenceFunc(stmt, set, stmt->loc); } @@ -10092,8 +10096,7 @@ namespace Slang { for (auto decoration : parent->getModifiersOfType<RequireCapabilityAttribute>()) { - for (auto& set : decoration->capabilitySet.getExpandedAtoms()) - localDeclaredCaps.unionWith(set); + localDeclaredCaps.unionWith(decoration->capabilitySet); } } else @@ -10102,13 +10105,8 @@ namespace Slang shouldBreak = true; } // Merge decl's capability declaration with the parent. - for (auto& localConjunction : localDeclaredCaps.getExpandedAtoms()) - { - if (declaredCaps.isIncompatibleWith(localConjunction)) - declaredCaps.unionWith(localConjunction); - else - declaredCaps.join(localDeclaredCaps); - } + declaredCaps.nonDestructiveJoin(localDeclaredCaps); + // If the parent already has inferred capability requirements, we should stop now // since that already covers transitive parents. 
if (shouldBreak) @@ -10127,27 +10125,37 @@ namespace Slang decl->inferredCapabilityRequirements = getDeclaredCapabilitySet(decl); } - void SemanticsDeclCapabilityVisitor::visitFunctionDeclBase(FunctionDeclBase* funcDecl) + template<typename ProcessFunc> + static inline void _dispatchCapabilitiesVisitorOfFunctionDecl(SemanticsVisitor* visitor, FunctionDeclBase* funcDecl, ProcessFunc propegateFuncForReferences) { + visitor->setParentFuncOfVisitor(funcDecl); + for (auto member : funcDecl->members) { - ensureDecl(member, DeclCheckState::CapabilityChecked); - _propagateRequirement(this, funcDecl->inferredCapabilityRequirements, funcDecl, member, member->inferredCapabilityRequirements, member->loc); + visitor->ensureDecl(member, DeclCheckState::CapabilityChecked); + _propagateRequirement(visitor, funcDecl->inferredCapabilityRequirements, funcDecl, member, member->inferredCapabilityRequirements, member->loc); } - visitReferencedDecls(*this, funcDecl->body, funcDecl->loc, funcDecl->findModifier<RequireCapabilityAttribute>(), [this, funcDecl](SyntaxNode* node, const CapabilitySet& nodeCaps, SourceLoc refLoc) - { - _propagateRequirement(this, funcDecl->inferredCapabilityRequirements, funcDecl, node, nodeCaps, refLoc); - }); + + visitReferencedDecls(*visitor, funcDecl->body, funcDecl->loc, funcDecl->findModifier<RequireCapabilityAttribute>(), propegateFuncForReferences); if (!isEffectivelyStatic(funcDecl)) { auto parentAggTypeDecl = getParentAggTypeDecl(funcDecl); if (parentAggTypeDecl) { - ensureDecl(parentAggTypeDecl, DeclCheckState::CapabilityChecked); - _propagateRequirement(this, funcDecl->inferredCapabilityRequirements, funcDecl, parentAggTypeDecl, parentAggTypeDecl->inferredCapabilityRequirements, funcDecl->loc); + visitor->ensureDecl(parentAggTypeDecl, DeclCheckState::CapabilityChecked); + _propagateRequirement(visitor, funcDecl->inferredCapabilityRequirements, funcDecl, parentAggTypeDecl, parentAggTypeDecl->inferredCapabilityRequirements, funcDecl->loc); } } + } + + 
void SemanticsDeclCapabilityVisitor::visitFunctionDeclBase(FunctionDeclBase* funcDecl) + { + _dispatchCapabilitiesVisitorOfFunctionDecl(this, funcDecl, + [this, funcDecl](SyntaxNode* node, const CapabilitySet& nodeCaps, SourceLoc refLoc) + { + _propagateRequirement(this, funcDecl->inferredCapabilityRequirements, funcDecl, node, nodeCaps, refLoc); + }); auto declaredCaps = getDeclaredCapabilitySet(funcDecl); @@ -10169,26 +10177,12 @@ namespace Slang } auto vis = getDeclVisibility(funcDecl); + + // If 0 capabilities were annotated on a function, capabilities are inferred from the function body if (declaredCaps.isEmpty()) { - // If the user has not declared any capabilities, - // we should diagnose a warning if any_target is not - // a super-set by exact atoms. - if (vis == DeclVisibility::Public && !funcDecl->inferredCapabilityRequirements.isEmpty()) - { - if (!getModuleDecl(funcDecl)->isInLegacyLanguage) - { - if (!funcDecl->inferredCapabilityRequirements.isExactSubset(getAnyPlatformCapabilitySet())) - { - diagnoseCapabilityErrors( - getSink(), - this->getOptionSet(), - funcDecl->loc, - Diagnostics::missingCapabilityRequirementOnPublicDecl, - funcDecl, funcDecl->inferredCapabilityRequirements); - } - } - } + declaredCaps = funcDecl->inferredCapabilityRequirements; + return; } else { @@ -10199,7 +10193,7 @@ namespace Slang // At a minimum we will propagate shader requirements to our // function from calling children in all cases so the parent // can enforce shader targets correctly and propagate to `main` - const CapabilityConjunctionSet* failedAvailableCapabilityConjunction = nullptr; + CapabilityAtomSet failedAvailableCapabilityConjunction; if (!CapabilitySet::checkCapabilityRequirement( declaredCaps, funcDecl->inferredCapabilityRequirements, @@ -10209,7 +10203,7 @@ namespace Slang funcDecl->inferredCapabilityRequirements = declaredCaps; } else - funcDecl->inferredCapabilityRequirements.simpleJoinWithSetMask(declaredCaps, CapabilityName::stage); + 
funcDecl->inferredCapabilityRequirements.nonDestructiveJoin(declaredCaps); } else { @@ -10241,7 +10235,7 @@ namespace Slang ensureDecl(requirementDecl, DeclCheckState::CapabilityChecked); ensureDecl(implDecl.declRefBase, DeclCheckState::CapabilityChecked); - const CapabilityConjunctionSet* failedAvailableCapabilityConjunction = nullptr; + CapabilityAtomSet failedAvailableCapabilityConjunction; if (!CapabilitySet::checkCapabilityRequirement( requirementDecl->inferredCapabilityRequirements, implDecl.getDecl()->inferredCapabilityRequirements, @@ -10303,7 +10297,7 @@ namespace Slang return defaultVis; } - void diagnoseCapabilityProvenance(CompilerOptionSet& optionSet, DiagnosticSink* sink, Decl* decl, CapabilityAtom missingAtom) + void diagnoseCapabilityProvenance(CompilerOptionSet& optionSet, DiagnosticSink* sink, Decl* decl, CapabilityAtom atomToFind, bool optionallyNeverPrintDecl) { HashSet<Decl*> printedDecls; auto thisModule = getModuleDecl(decl); @@ -10311,9 +10305,9 @@ namespace Slang while (declToPrint) { printedDecls.add(declToPrint); - if (auto provenance = declToPrint->capabilityRequirementProvenance.tryGetValue(missingAtom)) + if (auto provenance = declToPrint->capabilityRequirementProvenance.tryGetValue(atomToFind)) { - sink->diagnose(provenance->referenceLoc, Diagnostics::seeUsingOf, provenance->referencedDecl); + diagnoseCapabilityErrors(sink, optionSet, provenance->referenceLoc, Diagnostics::seeUsingOf, provenance->referencedDecl); declToPrint = provenance->referencedDecl; if (printedDecls.contains(declToPrint)) break; @@ -10332,54 +10326,17 @@ namespace Slang break; } } - if (declToPrint) + if (declToPrint && !optionallyNeverPrintDecl) { diagnoseCapabilityErrors(sink, optionSet, declToPrint->loc, Diagnostics::seeDefinitionOf, declToPrint); } } - // Print diagnostics tracing which referenced decls are not compatible with the given atom. 
- void diagnoseIncompatibleAtomProvenance(SemanticsVisitor* visitor, DiagnosticSink* sink, Decl* decl, CapabilityAtom incompatibleAtom, int traceLevels = 10) + void SemanticsDeclCapabilityVisitor::diagnoseUndeclaredCapability(Decl* decl, const DiagnosticInfo& diagnosticInfo, const CapabilityAtomSet& failedAtomsInsideAvailableSet) { - Decl* refDecl = nullptr; - SourceLoc loc; - HashSet<Decl*> printedDecls; - while (traceLevels > 0) - { - refDecl = nullptr; - visitReferencedDecls(*visitor, decl, decl->loc, decl->findModifier<RequireCapabilityAttribute>(), [&](SyntaxNode* node, const CapabilitySet& nodeCaps, SourceLoc refLoc) - { - if (nodeCaps.isIncompatibleWith(incompatibleAtom)) - { - if (auto referencedDecl = as<Decl>(node)) - { - refDecl = referencedDecl; - loc = refLoc; - } - else - diagnoseCapabilityErrors(sink, visitor->getOptionSet(), refLoc, Diagnostics::seeDefinitionOf, "statement"); - } - }); - if (!refDecl) - break; - if (printedDecls.add(refDecl)) - { - diagnoseCapabilityErrors(sink, visitor->getOptionSet(), loc, Diagnostics::seeUsingOf, refDecl); - decl = refDecl; - } - else - { - break; - } - traceLevels--; - } - } - - void SemanticsDeclCapabilityVisitor::diagnoseUndeclaredCapability(Decl* decl, const DiagnosticInfo& diagnosticInfo, const CapabilityConjunctionSet* failedAvailableSet) - { - if (decl->inferredCapabilityRequirements.getExpandedAtoms().getCount() == 0) + if (decl->inferredCapabilityRequirements.isEmpty()) return; - if(!failedAvailableSet) + if(failedAtomsInsideAvailableSet.isEmpty() || failedAtomsInsideAvailableSet.contains((UInt)CapabilityAtom::Invalid)) return; // There are two causes for why type checking failed on failedAvailableSet. @@ -10394,90 +10351,51 @@ namespace Slang // } // In this case we should diagnose error reporting printf isn't defined on a required target. // - // The second scenario is when the callee is using a capability that is not provided by the requirement. 
- // For example: - // [require(hlsl,b,c)] - // void caller() - // { - // useD(); // require capability (hlsl,d) - // } - // In this case we should report that useD() is using a capability that is not declared by caller. - // - // Now, we detect if we are case 1. - if (decl->inferredCapabilityRequirements.isIncompatibleWith(*failedAvailableSet)) + { - // Find the most derived atom that is leading to the incompatiblity. - for (Index i = failedAvailableSet->getExpandedAtoms().getCount() - 1; i >= 0; i--) + CapabilityAtom outFailedAtom{}; + if (hasTargetAtom(failedAtomsInsideAvailableSet, outFailedAtom)) { - auto atom = failedAvailableSet->getExpandedAtoms()[i]; - if (!isDirectChildOfAbstractAtom(atom)) - continue; - if (decl->inferredCapabilityRequirements.isIncompatibleWith(atom)) + diagnoseCapabilityErrors(getSink(), this->getOptionSet(), decl->loc, Diagnostics::declHasDependenciesNotCompatibleOnTarget, decl, outFailedAtom); + + // Anything defined on a non-failed target atom may be the culprit to why we fail having a target capability. + // Print out all possible culprits. + CapabilityAtomSet failedAtomSet; + failedAtomSet.add((UInt)outFailedAtom); + CapabilityAtomSet targetsNotUsedSet; + CapabilityAtomSet::calcSubtract(targetsNotUsedSet, getAtomSetOfTargets(), failedAtomSet); + + for (auto atom : targetsNotUsedSet) { - diagnoseCapabilityErrors(getSink(), this->getOptionSet(), decl->loc, Diagnostics::declHasDependenciesNotDefinedOnTarget, decl, atom); - diagnoseIncompatibleAtomProvenance(this, getSink(), decl, atom); - return; + CapabilityAtom formattedAtom = (CapabilityAtom)atom; + diagnoseCapabilityProvenance(this->getOptionSet(), getSink(), decl, formattedAtom, true); } + return; } - return; } - // If we reach here, we are case 2. + //// The second scenario is when the callee is using a capability that is not provided by the requirement. 
+ //// For example: + //// [require(hlsl,b,c)] + //// void caller() + //// { + //// useD(); // require capability (hlsl,d) + //// } + //// In this case we should report that useD() is using a capability that is not declared by caller. + //// - CapabilityConjunctionSet* matchingRequirement = &decl->inferredCapabilityRequirements.getExpandedAtoms().getFirst(); - CapabilityAtom missingAtom = matchingRequirement->getExpandedAtoms().getFirst(); - if (missingAtom == CapabilityAtom::Invalid) - return; + //// If we reach here, we are case 2. - if (failedAvailableSet) + // We will produce all failed atoms. This is important since provenance of multiple atoms + // can come from multiple referenced items in a function body. + for (auto i : failedAtomsInsideAvailableSet) { - Int maxIntersectionCount = 0; - for (auto& usedSet : decl->inferredCapabilityRequirements.getExpandedAtoms()) - { - auto intersection = usedSet.countIntersectionWith(*failedAvailableSet); - if (intersection > maxIntersectionCount) - { - matchingRequirement = &usedSet; - maxIntersectionCount = intersection; - } - } - Index pos = 0; - for (Index i = 0; i < matchingRequirement->getExpandedAtoms().getCount(); i++) - { - auto atom = matchingRequirement->getExpandedAtoms()[i]; - while (pos < failedAvailableSet->getExpandedAtoms().getCount()) - { - if (failedAvailableSet->getExpandedAtoms()[pos] < atom) - pos++; - else - break; - } - - if (pos >= failedAvailableSet->getExpandedAtoms().getCount() || - failedAvailableSet->getExpandedAtoms()[pos] != atom) - { - missingAtom = atom; - break; - } - } - - // Select the most derived atom of `missingAtom`. 
- for (Index i = matchingRequirement->getExpandedAtoms().getCount() - 1; i >= 0 ; i--) - { - auto atom = matchingRequirement->getExpandedAtoms()[i]; - if (CapabilityConjunctionSet(atom).implies(missingAtom)) - { - missingAtom = atom; - break; - } - } + CapabilityAtom formattedAtom = (CapabilityAtom)i; + diagnoseCapabilityErrors(getSink(), this->getOptionSet(), decl->loc, diagnosticInfo, decl, formattedAtom); + // Print provenances. + diagnoseCapabilityProvenance(this->getOptionSet(), getSink(), decl, formattedAtom); } - - diagnoseCapabilityErrors(getSink(), this->getOptionSet(), decl->loc, diagnosticInfo, decl, missingAtom); - - // Print provenances. - diagnoseCapabilityProvenance(this->getOptionSet(), getSink(), decl, missingAtom); } } diff --git a/source/slang/slang-check-impl.h b/source/slang/slang-check-impl.h index 20139b4e4..569e27e7c 100644 --- a/source/slang/slang-check-impl.h +++ b/source/slang/slang-check-impl.h @@ -825,6 +825,9 @@ namespace Slang return result; } + FunctionDeclBase* getParentFuncOfVisitor() { return m_parentFunc; } + void setParentFuncOfVisitor(FunctionDeclBase* funcDecl) { m_parentFunc = funcDecl; } + SemanticsContext withParentFunc(FunctionDeclBase* parentFunc) { SemanticsContext result(*this); @@ -2786,7 +2789,7 @@ namespace Slang DeclVisibility getDeclVisibility(Decl* decl); - void diagnoseCapabilityProvenance(CompilerOptionSet& optionSet, DiagnosticSink* sink, Decl* decl, CapabilityAtom missingAtom); + void diagnoseCapabilityProvenance(CompilerOptionSet& optionSet, DiagnosticSink* sink, Decl* decl, CapabilityAtom atomToFind, bool optionallyNeverPrintDecl = false); void _ensureAllDeclsRec( SemanticsDeclVisitorBase* visitor, diff --git a/source/slang/slang-check-modifier.cpp b/source/slang/slang-check-modifier.cpp index b8ff1a116..aa30f66ca 100644 --- a/source/slang/slang-check-modifier.cpp +++ b/source/slang/slang-check-modifier.cpp @@ -1641,8 +1641,7 @@ namespace Slang previous = m; continue; } - for(auto& con : 
req->capabilitySet.getExpandedAtoms()) - firstRequire->capabilitySet.unionWith(con); + firstRequire->capabilitySet.unionWith(req->capabilitySet); if(previous) previous->next = next; continue; diff --git a/source/slang/slang-check-shader.cpp b/source/slang/slang-check-shader.cpp index 2c1f8651c..7a6deed5c 100644 --- a/source/slang/slang-check-shader.cpp +++ b/source/slang/slang-check-shader.cpp @@ -520,29 +520,27 @@ namespace Slang if (targetCaps.isIncompatibleWith(entryPointFuncDecl->inferredCapabilityRequirements)) { diagnoseCapabilityErrors(sink, linkage->m_optionSet, entryPointFuncDecl, Diagnostics::entryPointUsesUnavailableCapability, entryPointFuncDecl, entryPointFuncDecl->inferredCapabilityRequirements, targetCaps); - auto& interredCapConjunctions = entryPointFuncDecl->inferredCapabilityRequirements.getExpandedAtoms(); - + // Find out what exactly is incompatible and print out a trace of provenance to // help user diagnose their code. - auto& conjunctions = targetCaps.getExpandedAtoms(); - if (conjunctions.getCount() == 1 && interredCapConjunctions.getCount() == 1) + // TODO: provedence should have a way to filter out for provenance that are missing X capabilitySet from their caps, else in big functions we get junk errors + // This is specifically a problem for when a function is missing a target but otherwise has identical capabilities. 
+ + const auto& interredCapConjunctions = entryPointFuncDecl->inferredCapabilityRequirements.getAtomSets(); + const auto& compileCaps = targetCaps.getAtomSets(); + if (compileCaps && interredCapConjunctions) { - for (auto atom : conjunctions[0].getExpandedAtoms()) + for (auto inferredAtom : *interredCapConjunctions.begin()) { - for (auto inferredAtom : interredCapConjunctions[0].getExpandedAtoms()) + CapabilityAtom inferredAtomFormatted = (CapabilityAtom)inferredAtom; + if (!compileCaps->contains((UInt)inferredAtom)) { - if (CapabilityConjunctionSet(inferredAtom).isIncompatibleWith(atom)) - { - diagnoseCapabilityProvenance(linkage->m_optionSet, sink, entryPointFuncDecl, inferredAtom); - goto breakLabel; - } + diagnoseCapabilityProvenance(linkage->m_optionSet, sink, entryPointFuncDecl, inferredAtomFormatted); } } } } } - breakLabel:; - } // Given an entry point specified via API or command line options, diff --git a/source/slang/slang-check-stmt.cpp b/source/slang/slang-check-stmt.cpp index 89ec82e48..ae817f867 100644 --- a/source/slang/slang-check-stmt.cpp +++ b/source/slang/slang-check-stmt.cpp @@ -340,7 +340,7 @@ namespace Slang } if (stmt->capabilityToken.getContentLength() != 0 && - (set.getExpandedAtoms().getCount() != 1 || set.isInvalid() || set.isEmpty())) + (set.getCapabilityTargetSets().getCount() != 1 || set.isInvalid() || set.isEmpty())) { getSink()->diagnose( stmt->capabilityToken.loc, diff --git a/source/slang/slang-compiler.cpp b/source/slang/slang-compiler.cpp index b2b765c0e..5ef9a50b1 100644 --- a/source/slang/slang-compiler.cpp +++ b/source/slang/slang-compiler.cpp @@ -614,11 +614,11 @@ namespace Slang GLSLExtensionTracker* extensionTracker, CapabilitySet const& caps) { - for( auto conjunctions : caps.getExpandedAtoms() ) + for(auto& conjunctions : caps.getAtomSets() ) { - for (auto atom : conjunctions.getExpandedAtoms()) + for (auto atom : conjunctions) { - switch (atom) + switch ((CapabilityAtom)atom) { default: break; diff --git 
a/source/slang/slang-diagnostic-defs.h b/source/slang/slang-diagnostic-defs.h index 85989b26d..9ba1e4724 100644 --- a/source/slang/slang-diagnostic-defs.h +++ b/source/slang/slang-diagnostic-defs.h @@ -387,7 +387,7 @@ DIAGNOSTIC(36104, Error, useOfUndeclaredCapabilityOfInterfaceRequirement, "'$0' DIAGNOSTIC(36105, Error, unknownCapability, "unknown capability name '$0'.") DIAGNOSTIC(36106, Error, expectCapability, "expect a capability name.") DIAGNOSTIC(36107, Error, entryPointUsesUnavailableCapability, "entrypoint '$0' requires capability '$1', which is incompatible with the current compilation target '$2'.") -DIAGNOSTIC(36108, Error, declHasDependenciesNotDefinedOnTarget, "'$0' has dependencies that are not defined on the required target '$1'.") +DIAGNOSTIC(36108, Error, declHasDependenciesNotCompatibleOnTarget, "'$0' has dependencies that are not compatible on the required target '$1'.") DIAGNOSTIC(36109, Error, invalidTargetSwitchCase, "'$0' cannot be used as a target_switch case.") DIAGNOSTIC(36110, Error, stageIsInCompatibleWithCapabilityDefinition, "'$0' is defined for stage '$1', which is incompatible with the declared capability set '$2'.") @@ -725,6 +725,7 @@ DIAGNOSTIC(41000, Warning, unreachableCode, "unreachable code detected") DIAGNOSTIC(41001, Error, recursiveType, "type '$0' contains cyclic reference to itself.") DIAGNOSTIC(41010, Warning, missingReturn, "control flow may reach end of non-'void' function") +DIAGNOSTIC(41011, Error, profileIncompatibleWithTargetSwitch, "__target_switch has no compatable target with current profile '$0'") DIAGNOSTIC(41015, Error, usingUninitializedValue, "use of uninitialized value '$0'") DIAGNOSTIC(41016, Warning, returningWithUninitializedOut, "returning without initializing out parameter '$0'") DIAGNOSTIC(41017, Warning, returningWithPartiallyUninitializedOut, "returning without fully initializing out parameter '$0'") diff --git a/source/slang/slang-ir-link.cpp b/source/slang/slang-ir-link.cpp index 
e652745e7..26d96690f 100644 --- a/source/slang/slang-ir-link.cpp +++ b/source/slang/slang-ir-link.cpp @@ -1135,8 +1135,10 @@ bool isBetterForTarget( if(newCaps.isInvalid()) return false; if(oldCaps.isInvalid()) return true; - if(newCaps != oldCaps) - return newCaps.implies(oldCaps); + bool isEqual = false; + bool isNewBetter = newCaps.isBetterForTarget(oldCaps, targetCaps, isEqual); + if(!isEqual) + return isNewBetter; // All preceding factors being equal, an `[export]` is better // than an `[import]`. @@ -1882,7 +1884,7 @@ LinkedIR linkIR( } // Specialize target_switch branches to use the best branch for the target. - specializeTargetSwitch(targetReq, state->irModule); + specializeTargetSwitch(targetReq, state->irModule, codeGenContext->getSink()); // Diagnose on unresolved symbols if we are compiling into a target that does // not allow incomplete symbols. diff --git a/source/slang/slang-ir-specialize-target-switch.cpp b/source/slang/slang-ir-specialize-target-switch.cpp index f4cb6bfa7..fac1dd484 100644 --- a/source/slang/slang-ir-specialize-target-switch.cpp +++ b/source/slang/slang-ir-specialize-target-switch.cpp @@ -7,13 +7,15 @@ namespace Slang { - void specializeTargetSwitch(TargetRequest* target, IRGlobalValueWithCode* code) + void specializeTargetSwitch(TargetRequest* target, IRGlobalValueWithCode* code, DiagnosticSink* sink) { bool changed = false; for (auto block : code->getBlocks()) { + bool failedImplies = false; if (auto targetSwitch = as<IRTargetSwitch>(block->getTerminator())) { + bool isEqual; CapabilitySet bestCapSet = CapabilitySet::makeInvalid(); IRBlock* targetBlock = nullptr; for (UInt i = 0; i < targetSwitch->getCaseCount(); i++) @@ -22,14 +24,22 @@ namespace Slang if (target->getTargetCaps().isIncompatibleWith(cap)) continue; CapabilitySet capSet; - if (cap == CapabilityName::Invalid) + if (cap == CapabilityName::Invalid) // `default` case capSet = CapabilitySet::makeEmpty(); else capSet = CapabilitySet(cap); - if 
(capSet.isBetterForTarget(bestCapSet, target->getTargetCaps())) + bool isBetterForTarget = capSet.isBetterForTarget(bestCapSet, target->getTargetCaps(), isEqual); + if (isBetterForTarget) { - targetBlock = targetSwitch->getCaseBlock(i); - bestCapSet = capSet; + bool targetImpliesCapSet = (target->getTargetCaps().implies(capSet, true) || capSet.isEmpty()); + if (targetImpliesCapSet) + { + // Now check if bestCapSet contains targetCaps. If it does not then this is an invalid target + targetBlock = targetSwitch->getCaseBlock(i); + bestCapSet = capSet; + } + else + failedImplies = true; } } IRBuilder builder(targetSwitch); @@ -40,6 +50,10 @@ namespace Slang } else { + // only error if we have the chance of setting a valid target switch, but did not due to incompatability within same `target` atom. + // Otherwise we will have an issue when we process a `__target_switch() { case metal: return; }` for glsl targets. + if(failedImplies) + sink->diagnose(targetSwitch->sourceLoc, Diagnostics::profileIncompatibleWithTargetSwitch, target->getTargetCaps()); builder.emitMissingReturn(); } targetSwitch->removeAndDeallocate(); @@ -53,19 +67,19 @@ namespace Slang } } - void specializeTargetSwitch(TargetRequest* target, IRModule* module) + void specializeTargetSwitch(TargetRequest* target, IRModule* module, DiagnosticSink* sink) { for (auto globalInst : module->getGlobalInsts()) { if (auto code = as<IRGlobalValueWithCode>(globalInst)) { - specializeTargetSwitch(target, code); + specializeTargetSwitch(target, code, sink); if (auto gen = as<IRGeneric>(code)) { auto retVal = findGenericReturnVal(gen); if (auto innerCode = as<IRGlobalValueWithCode>(retVal)) { - specializeTargetSwitch(target, innerCode); + specializeTargetSwitch(target, innerCode, sink); } } } diff --git a/source/slang/slang-ir-specialize-target-switch.h b/source/slang/slang-ir-specialize-target-switch.h index 91071cec6..03fd7d85a 100644 --- a/source/slang/slang-ir-specialize-target-switch.h +++ 
b/source/slang/slang-ir-specialize-target-switch.h @@ -5,10 +5,11 @@ namespace Slang { struct IRModule; class TargetRequest; + class DiagnosticSink; // Repalce all target_switch insts with the case that matches current target. // - void specializeTargetSwitch(TargetRequest* target, IRModule* module); + void specializeTargetSwitch(TargetRequest* target, IRModule* module, DiagnosticSink* sink); } diff --git a/source/slang/slang-ir.cpp b/source/slang/slang-ir.cpp index b53bfc9d1..c0bec9654 100644 --- a/source/slang/slang-ir.cpp +++ b/source/slang/slang-ir.cpp @@ -334,7 +334,7 @@ namespace Slang for (Index i = 0; i < count; ++i) { auto operand = cast<IRCapabilitySet>(getOperand(i)); - result.getExpandedAtoms().addRange(operand->getCaps().getExpandedAtoms()); + result.unionWith(operand->getCaps()); } return result; } @@ -2440,13 +2440,12 @@ namespace Slang // be a minimal list of atoms such that they will produce // the same `CapabilitySet` when expanded. - List<List<CapabilityAtom>> compactedAtoms; - caps.calcCompactedAtoms(compactedAtoms); + auto compactedAtoms = caps.getAtomSets(); List<IRInst*> conjunctions; - for( auto atomConjunction : compactedAtoms ) + for( auto& atomConjunctionSet : compactedAtoms ) { List<IRInst*> args; - for (auto atom : atomConjunction) + for (auto atom : atomConjunctionSet) args.add(getIntValue(capabilityAtomType, Int(atom))); auto conjunctionInst = createIntrinsicInst( capabilitySetType, kIROp_CapabilityConjunction, args.getCount(), args.getBuffer()); @@ -8284,7 +8283,8 @@ namespace Slang continue; } - if(!bestDecoration || decorationCaps.isBetterForTarget(bestCaps, targetCaps)) + bool isEqual; + if(!bestDecoration || decorationCaps.isBetterForTarget(bestCaps, targetCaps, isEqual)) { bestDecoration = decoration; bestCaps = decorationCaps; diff --git a/source/slang/slang-serialize-ast-type-info.h b/source/slang/slang-serialize-ast-type-info.h index 96c8a438f..1d7628cd4 100644 --- a/source/slang/slang-serialize-ast-type-info.h +++ 
b/source/slang/slang-serialize-ast-type-info.h @@ -78,47 +78,178 @@ struct PtrSerialTypeInfo<T, std::enable_if_t<std::is_base_of_v<Val, T>>> template <typename T> struct SerialTypeInfo<DeclRef<T>> : public SerialTypeInfo<DeclRefBase*> {}; +// UIntSet + template<> -struct SerialTypeInfo<CapabilitySet> +struct SerialTypeInfo<CapabilityAtomSet> { - typedef CapabilitySet NativeType; + typedef CapabilityAtomSet NativeType; typedef SerialIndex SerialType; - enum { SerialAlignment = SLANG_ALIGN_OF(SerialType) }; + enum { SerialAlignment = SLANG_ALIGN_OF(SerialIndex) }; + static void toSerial(SerialWriter* writer, const void* native, void* serial) + { + auto& src = *(NativeType*)native; + auto& dst = *(SerialType*)serial; + + dst = writer->addArray(src.getBuffer().getBuffer(), src.getBuffer().getCount()); + } + static void toNative(SerialReader* reader, const void* serial, void* native) + { + auto& dst = *(NativeType*)native; + auto& src = *(const SerialType*)serial; + + List<CapabilityAtomSet::Element> UIntSetBuffer; + reader->getArray(src, UIntSetBuffer); + + dst = CapabilityAtomSet(UIntSetBuffer); + } +}; + +// ~UIntSet + +template<> +struct SerialTypeInfo<CapabilityStageSet> +{ + struct SerialType + { + SerialIndex stage; + SerialIndex atomSet; + }; + + typedef CapabilityStageSet NativeType; + enum { SerialAlignment = SLANG_ALIGN_OF(SerialIndex) }; static void toSerial(SerialWriter* writer, const void* native, void* serial) { auto& src = *(const NativeType*)native; auto& dst = *(SerialType*)serial; - dst = writer->addArray(src.getExpandedAtoms().getBuffer(), src.getExpandedAtoms().getCount()); + List<SerialTypeInfo<CapabilityStageSet>::SerialType> SatomSetsList; + SatomSetsList.setCount(src.atomSet.has_value()); + + if(src.atomSet) + { + auto& i = src.atomSet.value(); + SerialTypeInfo<CapabilityAtomSet>::toSerial(writer, &i, &SatomSetsList[0]); + } + + SerialTypeInfo<CapabilityAtom>::toSerial(writer, &src.stage, &dst.stage); + dst.atomSet = 
writer->addSerialArray<CapabilityStageSet>(SatomSetsList.getBuffer(), SatomSetsList.getCount()); } static void toNative(SerialReader* reader, const void* serial, void* native) { auto& dst = *(NativeType*)native; auto& src = *(const SerialType*)serial; - reader->getArray(src, dst.getExpandedAtoms()); + CapabilityAtom stage; + List<CapabilityAtomSet> items; + SerialTypeInfo<CapabilityAtom>::toNative(reader, &src.stage, &stage); + reader->getArray(src.atomSet, items); + + dst.stage = stage; + + for (auto i : items) + { + dst.addNewSet(std::move(i)); + } } }; template<> -struct SerialTypeInfo<CapabilityConjunctionSet> +struct SerialTypeInfo<CapabilityTargetSet> { - typedef CapabilityConjunctionSet NativeType; - typedef SerialIndex SerialType; - enum { SerialAlignment = SLANG_ALIGN_OF(SerialType) }; + struct SerialType + { + SerialIndex target; + SerialIndex shaderStageSets; + }; + + typedef CapabilityTargetSet NativeType; + enum { SerialAlignment = SLANG_ALIGN_OF(SerialIndex) }; static void toSerial(SerialWriter* writer, const void* native, void* serial) { auto& src = *(const NativeType*)native; auto& dst = *(SerialType*)serial; - dst = writer->addArray(src.getExpandedAtoms().getBuffer(), src.getExpandedAtoms().getCount()); + List<SerialTypeInfo<CapabilityStageSet>::SerialType> SStageSetList; + SStageSetList.setCount(src.shaderStageSets.getCount()); + Index iter = 0; + for (auto& i : src.shaderStageSets) + { + SerialTypeInfo<CapabilityStageSet>::toSerial(writer, &i.second, &SStageSetList[iter]); + iter++; + } + + SerialTypeInfo<CapabilityAtom>::toSerial(writer, &src.target, &dst.target); + dst.shaderStageSets = writer->addSerialArray<CapabilityStageSet>(SStageSetList.getBuffer(), SStageSetList.getCount()); } static void toNative(SerialReader* reader, const void* serial, void* native) { auto& dst = *(NativeType*)native; auto& src = *(const SerialType*)serial; - reader->getArray(src, dst.getExpandedAtoms()); + CapabilityAtom target; + List<CapabilityStageSet> items; + 
SerialTypeInfo<CapabilityAtom>::toNative(reader, &src.target, &target); + reader->getArray(src.shaderStageSets, items); + + dst.target = target; + + auto& shaderStageSets = dst.shaderStageSets; + shaderStageSets.clear(); + shaderStageSets.reserve(items.getCount()); + Index iter = 0; + for (auto& i : items) + { + dst.shaderStageSets[i.stage] = i; + iter++; + } + } +}; + +template<> +struct SerialTypeInfo<CapabilitySet> +{ + struct SerialType + { + SerialIndex m_targetSets; + }; + + typedef CapabilitySet NativeType; + enum { SerialAlignment = SLANG_ALIGN_OF(SerialIndex) }; + static void toSerial(SerialWriter* writer, const void* native, void* serial) + { + auto& src = *(const NativeType*)native; + auto& dst = *(SerialType*)serial; + + List<SerialTypeInfo<CapabilityTargetSet>::SerialType> STargetSetList; + auto capabilityTargetSets = src.getCapabilityTargetSets(); + STargetSetList.setCount(capabilityTargetSets.getCount()); + Index iter = 0; + for (auto& i : capabilityTargetSets) + { + SerialTypeInfo<CapabilityTargetSet>::toSerial(writer, &i.second, &STargetSetList[iter]); + iter++; + } + + dst.m_targetSets = writer->addSerialArray<CapabilityTargetSet>(STargetSetList.getBuffer(), STargetSetList.getCount()); + } + static void toNative(SerialReader* reader, const void* serial, void* native) + { + auto& dst = *(NativeType*)native; + auto& src = *(const SerialType*)serial; + + List<CapabilityTargetSet> items; + reader->getArray(src.m_targetSets, items); + + auto& targetSets = dst.getCapabilityTargetSets(); + targetSets.clear(); + targetSets.reserve(items.getCount()); + Index iter = 0; + for (auto& i : items) + { + targetSets[i.target] = i; + iter++; + } } }; diff --git a/source/slang/slang.cpp b/source/slang/slang.cpp index e43c9a556..4d83823d2 100644 --- a/source/slang/slang.cpp +++ b/source/slang/slang.cpp @@ -1740,7 +1740,9 @@ CapabilitySet TargetRequest::getTargetCaps() CapabilitySet targetCap = CapabilitySet(atoms); CapabilitySet latestSpirvCapSet = 
CapabilitySet(CapabilityName::spirv_latest); - CapabilityName latestSpirvAtom = (CapabilityName)latestSpirvCapSet.getExpandedAtoms()[0].getExpandedAtoms().getLast(); + auto latestSpirvCapSetElements = latestSpirvCapSet.getAtomSets()->getElements<CapabilityAtom>(); + CapabilityName latestSpirvAtom = (CapabilityName)latestSpirvCapSetElements[latestSpirvCapSetElements.getCount()-2]; //-1 gets shader stage + for (auto atomVal : optionSet.getArray(CompilerOptionName::Capability)) { auto atom = (CapabilityName)atomVal.intValue; diff --git a/source/slang/slang.natvis b/source/slang/slang.natvis index c86faa065..21db4016f 100644 --- a/source/slang/slang.natvis +++ b/source/slang/slang.natvis @@ -842,4 +842,114 @@ <Type Name="Slang::BasicExpressionType"> <DisplayString>BasicExpressionType ({*(DeclRefBase*)m_operands.m_buffer[0].values.nodeOperand})</DisplayString> </Type> + + <Type Name="Slang::CapabilitySet"> + <DisplayString>{m_targetSets.map.m_values}</DisplayString> + <Expand> + <CustomListItems> + <Item Name="m_targetSets">m_targetSets</Item> + <Item Name="m_targetSets.map.m_values">m_targetSets.map.m_values</Item> + </CustomListItems> + </Expand> + </Type> + <Type Name="Slang::CapabilityTargetSet"> + <DisplayString>{{target={target}}}</DisplayString> + <Expand> + <CustomListItems> + <Item Name="target">target</Item> + <Item Name="shaderStageSets">shaderStageSets</Item> + <Item Name="shaderStageSets.map.m_values">shaderStageSets.map.m_values</Item> + </CustomListItems> + </Expand> + </Type> + <Type Name="Slang::CapabilityStageSet"> + <DisplayString>{{size={atomSet}}}</DisplayString> + <Expand> + <CustomListItems> + <Item Name="stage">stage</Item> + <Item Name="atomSet">atomSet</Item> + </CustomListItems> + </Expand> + </Type> + + <!--UIntSet--> + <Type Name="Slang::UIntSet"> + <DisplayString>{{max_size={m_buffer.m_count*Slang::UIntSet::kElementSize}}}</DisplayString> + <Expand> + <Synthetic Name="[Values]"> + <Expand> + <CustomListItems MaxItemsPerView="1000"> + 
<Variable Name="atomType" InitialValue="0"/> + <Variable Name="bitValue" InitialValue="0"/> + <Variable Name="boolRes" InitialValue="false"/> + <Variable Name="bitIter" InitialValue="0"/> + <Variable Name="totalBitIter" InitialValue="0"/> + <Variable Name="value" InitialValue="0"/> + <Variable Name="iter" InitialValue="0"/> + <Exec>iter = (Slang::UIntSet::Element)0</Exec> + <Exec>bitIter = (Slang::UIntSet::Element)0</Exec> + <Exec>totalBitIter = (Slang::UIntSet::Element)0</Exec> + <Exec>value = 0</Exec> + <Loop> + <If Condition="bitIter >= Slang::UIntSet::kElementMask"> + <Exec>bitIter = 0</Exec> + <Exec>totalBitIter++</Exec> + <Exec>iter++</Exec> + </If> + <If Condition="iter >= m_buffer.m_count"> + <Break/> + </If> + <Exec>bitValue = (m_buffer[iter]>>bitIter)&1</Exec> + <If Condition="bitValue != 0"> + <Exec>value = totalBitIter</Exec> + <Item>(CapabilityAtom)value</Item> + </If> + <Exec>bitIter++</Exec> + <Exec>totalBitIter++</Exec> + </Loop> + </CustomListItems> + </Expand> + </Synthetic> + </Expand> + </Type> + <Type Name="Slang::CapabilityAtomSet"> + <DisplayString>{{max_size={m_buffer.m_count*Slang::UIntSet::kElementSize}}}</DisplayString> + <Expand> + <Synthetic Name="[CapabilityAtomView]"> + <Expand> + <CustomListItems MaxItemsPerView="1000"> + <Variable Name="atomType" InitialValue="0"/> + <Variable Name="bitValue" InitialValue="0"/> + <Variable Name="boolRes" InitialValue="false"/> + <Variable Name="bitIter" InitialValue="0"/> + <Variable Name="totalBitIter" InitialValue="0"/> + <Variable Name="value" InitialValue="0"/> + <Variable Name="iter" InitialValue="0"/> + <Exec>iter = (Slang::UIntSet::Element)0</Exec> + <Exec>bitIter = (Slang::UIntSet::Element)0</Exec> + <Exec>totalBitIter = (Slang::UIntSet::Element)0</Exec> + <Exec>value = 0</Exec> + <Loop> + <If Condition="bitIter >= Slang::UIntSet::kElementMask"> + <Exec>bitIter = 0</Exec> + <Exec>totalBitIter++</Exec> + <Exec>iter++</Exec> + </If> + <If Condition="iter >= m_buffer.m_count"> + <Break/> + 
</If> + <Exec>bitValue = (m_buffer[iter]>>bitIter)&1</Exec> + <If Condition="bitValue != 0"> + <Exec>value = totalBitIter</Exec> + <Item>(CapabilityAtom)value</Item> + </If> + <Exec>bitIter++</Exec> + <Exec>totalBitIter++</Exec> + </Loop> + </CustomListItems> + </Expand> + </Synthetic> + </Expand> + </Type> + <!--~UIntSet--> </AutoVisualizer> |
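
Note on the set operations used in the diff above: `diagnoseUndeclaredCapability` builds `targetsNotUsedSet` by subtracting a one-element set from the set of all targets via `CapabilityAtomSet::calcSubtract`, and the natvis visualizer walks the backing buffer bit by bit. The following is a minimal sketch of that idea, not Slang's actual `UIntSet`. The names `AtomBitSet`, `elements`, and `targetsOtherThan` are hypothetical, and the 64-bit word size is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a UIntSet-style bitset. Each atom is an
// unsigned index; bit (atom % 64) of word (atom / 64) records membership.
class AtomBitSet
{
public:
    using Element = std::uint64_t;
    static constexpr std::size_t kBitsPerElement = 64;

    void add(std::size_t atom)
    {
        const std::size_t word = atom / kBitsPerElement;
        if (word >= m_words.size())
            m_words.resize(word + 1, 0); // grow on demand, new words are empty
        m_words[word] |= Element(1) << (atom % kBitsPerElement);
    }

    bool contains(std::size_t atom) const
    {
        const std::size_t word = atom / kBitsPerElement;
        if (word >= m_words.size())
            return false;
        return ((m_words[word] >> (atom % kBitsPerElement)) & 1) != 0;
    }

    // out = lhs minus rhs, computed with one AND-NOT per overlapping word.
    // A temporary is used so that `out` may alias either input.
    static void calcSubtract(AtomBitSet& out, const AtomBitSet& lhs, const AtomBitSet& rhs)
    {
        std::vector<Element> result = lhs.m_words;
        const std::size_t shared = std::min(result.size(), rhs.m_words.size());
        for (std::size_t i = 0; i < shared; ++i)
            result[i] &= ~rhs.m_words[i];
        out.m_words = std::move(result);
    }

    // Lists the stored atoms in ascending order by scanning every bit.
    std::vector<std::size_t> elements() const
    {
        std::vector<std::size_t> found;
        for (std::size_t word = 0; word < m_words.size(); ++word)
            for (std::size_t bit = 0; bit < kBitsPerElement; ++bit)
                if ((m_words[word] >> bit) & 1)
                    found.push_back(word * kBitsPerElement + bit);
        return found;
    }

private:
    std::vector<Element> m_words;
};

// Mirrors the shape of the diagnostic code: put the failed target in its own
// set, subtract it from all known targets, and return what remains.
inline std::vector<std::size_t> targetsOtherThan(const AtomBitSet& allTargets, std::size_t failedTarget)
{
    AtomBitSet failed;
    failed.add(failedTarget);

    AtomBitSet remaining;
    AtomBitSet::calcSubtract(remaining, allTargets, failed);
    assert(!remaining.contains(failedTarget));
    return remaining.elements();
}
```

With word-sized AND-NOT operations, the subtraction cost depends on the number of words rather than the number of stored atoms. That is presumably part of why the commit moves away from array-backed conjunction sets, though the sketch makes no claim about the real implementation's layout.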
