diff options
| author | ArielG-NV <159081215+ArielG-NV@users.noreply.github.com> | 2024-05-16 00:04:12 -0400 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2024-05-16 00:04:12 -0400 |
| commit | 1b89f78cd1762aa08402bd656e807b66833b11d0 (patch) | |
| tree | 2be71c9d97af8d28d440981d0c5adc726d9eac56 /source/core | |
| parent | 3b0de8b6ea484091146f61e663c63beeac5b4798 (diff) | |
Capabilities System, CapabilitySet Logic Overhaul (#4145)
* Capabilities System, Backing Logic Overhaul
Fixes #4015
Problems to address:
1. Currently the capabilities system spends anywhere from 25-50% of compile time on the CapabilityVisitor. Most of this time is spent on join logic: 1. Finding abstract atoms 2. Comparing list1<->list2. This should and can be made significantly faster.
2. Error system does not produce errors with auxiliary information. This will require a partial redesign to provide more useful semantic information for debugging.
What was addressed:
1. Array backed `CapabilityConjunctionSet` was replaced in-favor for a `UIntSet` backed `CapabilityTargetSets`. The design is described below.
Design:
* `CapabilityTargetSets` is a `Dictionary<targetAtom, CapabilityTargetSet>`. This is not an array for 2 reasons: 1. Easy to figure out which target is missing between two `CapabilityTargetSets` 2. To statically allocate an array requires the preprocessor to manually annotate which Capability is a target and link that Capability to an index. This means a dictionary is required for lookup regardless of implementation.
* `CapabilityTargetSet` is an intermediate representation of all capabilities for a singular `target` atom (`glsl`, `hlsl`, `metal`, ...). This structure contains a dictionary to all stage specific capability sets for fast lookup of stage capabilities supported by a `CapabilitySet` for a `target` atom. This reduces number of sets searched.
* `CapabilityStageSet` is an intermediate representation of all capabilities for a singular `stage` atom (`vertex`, `fragment`, ...). This structure holds all disjoint capability sets for a `stage`. A disjoint set is rare, but may exist in some scenarios (as an example): `{glsl, EXT_GL_FOO}{glsl, _GLSL_130, _GLSL_150}`. This reduces the number of sets searched.
* `UIntSet` is the main reason for the redesign for better performance and memory usage. All set operations only require a few operations, making all set logic trivial and with minimal cost to run. All algorithms were modified to focus around `UIntSet` operations.
2. Errors
* Semantic information are now better linked to the calling function to provide a connection of function<->function_body for when saving semantic information for errors.
* Missing targets now print errors much like other error code by finding code which could be a cause of incompatibility.
What is missing:
1. Add non naive support for non-stage specific capabilities such as `{hlsl, _sm_5_0}`. Currently non stage specific targets emulate the behavior through assigning such capabilities to every stage: `{hlsl, _sm_5_0, vertex} {hlsl, _sm_5_0, fragment}...`. Removal of this behavior would remove redundant shader stage sets being made at construction time (~80% of new implementation runtime). This is an addition, not an overhaul.
2. Optionally: `UIntSet` should be modified to support SIMD operations for significantly faster operations. This is not required immediately since `UIntSet` is already not a performance constraint.
Notes:
* UIntSet had implementation bugs which were fixed in this PR.
* The old capabilities system had bugs which were fixed in this PR when transforming to the new implementation.
* fix .natvis debug view
* Small optimizations I found while working on the addition
the AST building pass looks like so now:
1% = ~capabilitySet
2% = capabilitySet()
1.5% capabilitySet::unionWith()
0.8% capabilitySet::join()
1.5% auxillary info for debugging
~0.5-1% extra visitor overhead
~5% total for the visitor
~6.5% for total runtime costs
* fix caps which were wrong but worked
* push minor syntax fix (still looking for why other tests fail)
* perf & bug fixes
1. did not properly remake isBetterForTarget for this->empty case with that as Invalid. This is best case in this senario.
2. Remade seralizer for stdlib generation. Faster (more direct) & cleaner code.
NOTE: did not address review comments
* fix glsl.meta caps error
* fixing findBest logic again & UIntSet wrapper
findBest was not checking for 'more specialized' targets & was element counter was flawed
* faster getElements algorithm + natvis for UIntSet + wrong warning
* type incompatability of bitscanForward implementations
* try to fix warnings again
* remove ptr for clang intrinsic
* add missing header
* ifdef to allow clang compile
* compiler hackery to fix up platform/type independent operations
* bracket
* fix MSVC error
* missing template
* change types out again
* changes to fix compiling
* adjustment to parameter for Clang/GCC
* added iterator to delay processing all atomSets of a CapabilitySet
* add a few missing consts's
* ensure we never have more than 1 disjointSet
Added a wrapper + assert + union functionality to all possible disjoint sets. This was done in favor of a removal of the LinkedList for 2 reasons:
1. We still need 0-1 set functionality.
2. Might as well keep the code, just disallow the problematic functionality.
* address review comments
non linked-list refactor review comments addressed; add doc comments + remove redundant code
* comments + remove isValid for bool operator
* push removal of linkedlist for capabilities
* add missing break
* address review comments
minor adjustments of syntax
* push a fix to the `CapabilitySet({shader, missing target})` code
* quality + error
1. add iterator to UIntSet
2. do not specialize target_switch if profile is derived from case (GLSL_150 is not compatable with GLSL_400)
* fix target_switch erroring + temporarily remove UIntSet::Interator
temporarily remove UIntSet::Interator. It will be added after, testing code on CI first so I can multi-task fixing the UIntSet Iterator
* fix the UIntSet iterator
* Revert "fix the UIntSet iterator" temporarily to pull from master
* add metal error as per texture.slang
(took a while I realize this was why things were breaking, likely should adjust errors to reflect this)
* Rework UIntSet to have a template for output type
This is done so it is reasonable to debug the iterator output and not just dealing with messy int's
Fix problems with the iterators implemented + invalid capabilities handling
* removed incorrect `__target_switch` capability
barycentric was being used with anticipation of `profile glsl450`, this does not expand into `GL_EXT_fragment_shader_barycentric`, this instead caused an error which is hidden during cross-compile.
* remove some uses of getElements
* remove undeclared_stage for now
* remove redundant code associated with `undeclared_stage`
* remove unused variable
* address review
specifically to note removed static in a thread dangerous scope. Now using a `const static` for read only (thread safe) which precompile steps generate
* move GLSL_150 capdef change to sm_4_1 (more accurate)
* address most review comments
did not address: https://github.com/shader-slang/slang/pull/4145#discussion_r1602256776
* revert incorrect code review suggestion
* push changes for all code review suggestions
Diffstat (limited to 'source/core')
| -rw-r--r-- | source/core/slang-linked-list.h | 2 | ||||
| -rw-r--r-- | source/core/slang-uint-set.cpp | 62 | ||||
| -rw-r--r-- | source/core/slang-uint-set.h | 209 |
3 files changed, 230 insertions, 43 deletions
diff --git a/source/core/slang-linked-list.h b/source/core/slang-linked-list.h index 93b5e435c..840ef8cd6 100644 --- a/source/core/slang-linked-list.h +++ b/source/core/slang-linked-list.h @@ -323,7 +323,7 @@ public: } return rs; } - int getCount() { return count; } + int getCount() const { return count; } }; } // namespace Slang #endif diff --git a/source/core/slang-uint-set.cpp b/source/core/slang-uint-set.cpp index e973cbc3a..b6871c192 100644 --- a/source/core/slang-uint-set.cpp +++ b/source/core/slang-uint-set.cpp @@ -3,18 +3,6 @@ namespace Slang { -static bool _areAllZero(const UIntSet::Element* elems, Index count) -{ - for (Index i = 0; count; ++i) - { - if (elems[i]) - { - return false; - } - } - return true; -} - UIntSet& UIntSet::operator=(UIntSet&& other) { m_buffer = _Move(other.m_buffer); @@ -49,14 +37,8 @@ void UIntSet::setAll() void UIntSet::resize(UInt size) { - const Index oldCount = m_buffer.getCount(); const Index newCount = Index((size + kElementMask) >> kElementShift); - m_buffer.setCount(newCount); - - if (newCount > oldCount) - { - ::memset(m_buffer.getBuffer() + oldCount, 0, (newCount - oldCount) * sizeof(Element)); - } + resizeBackingBufferDirectly(newCount); } void UIntSet::clear() @@ -66,17 +48,7 @@ void UIntSet::clear() bool UIntSet::isEmpty() const { - const Element*const src = m_buffer.getBuffer(); - const Index count = m_buffer.getCount(); - - for (Index i = 0; i < count; ++i) - { - if (src[i]) - { - return false; - } - } - return true; + return _areAllZero(m_buffer.getBuffer(), m_buffer.getCount()); } void UIntSet::clearAndDeallocate() @@ -106,7 +78,7 @@ bool UIntSet::operator==(const UIntSet& set) const const Index minCount = Math::Min(aCount, bCount); - return ::memcmp(aElems, bElems, minCount) == 0 && + return ::memcmp(aElems, bElems, minCount*sizeof(Element)) == 0 && _areAllZero(aElems + minCount, aCount - minCount) && _areAllZero(bElems + minCount, bCount - minCount); } @@ -123,6 +95,15 @@ void UIntSet::intersectWith(const UIntSet& set) } } +void UIntSet::subtractWith(const UIntSet& set) +{ + const Index minCount = Math::Min(this->m_buffer.getCount(), set.m_buffer.getCount()); + for (Index i = 0; i < minCount; i++) + { + this->m_buffer[i] = this->m_buffer[i] & (~set.m_buffer[i]); + } +} + /* static */void UIntSet::calcUnion(UIntSet& outRs, const UIntSet& set1, const UIntSet& set2) { outRs.m_buffer.setCount(Math::Max(set1.m_buffer.getCount(), set2.m_buffer.getCount())); @@ -162,5 +143,24 @@ void UIntSet::intersectWith(const UIntSet& set) return false; } +Index UIntSet::countElements() const +{ + // TODO: This can be made faster using SIMD intrinsics to count set bits. + uint64_t tmp; + constexpr Index loopSize = ((sizeof(Element) / sizeof(tmp)) != 0) ? sizeof(Element) / sizeof(tmp) : 1; + Index count = 0; + for (auto index = 0; index < this->m_buffer.getCount(); index++) + { + for (auto i = 0; i < loopSize; i++) + { + tmp = m_buffer[index] >> (sizeof(tmp) * i); + tmp = tmp - ((tmp >> 1) & 0x5555555555555555); + tmp = (tmp & 0x3333333333333333) + ((tmp >> 2) & 0x3333333333333333); + count += ((tmp + (tmp >> 4) & 0xF0F0F0F0F0F0F0F) * 0x101010101010101) >> 56; + } + } + return count; +} + } diff --git a/source/core/slang-uint-set.h b/source/core/slang-uint-set.h index 0f2165bab..22ca457b0 100644 --- a/source/core/slang-uint-set.h +++ b/source/core/slang-uint-set.h @@ -6,31 +6,83 @@ #include "slang-common.h" #include "slang-hash.h" +#if defined(_MSC_VER) +#include <intrin.h> +#endif #include <memory.h> namespace Slang { +template<typename T> +constexpr static Index computeElementShift() +{ + Index currentShift = 0; + Index currentShiftValue = 1; + + while (currentShiftValue != sizeof(T) * 8) + { + currentShift++; + currentShiftValue *= 2; + } + + return currentShift; +} + +static inline Index bitscanForward(uint64_t in) +{ +#if defined(_MSC_VER) + +#ifdef _WIN64 + uint64_t out = 0; + _BitScanForward64((unsigned long*)&out, in); + return Index(out); +#else + constexpr uint32_t bitsInType = sizeof(uint32_t) * 8; + uint32_t out; + // check for 0s in 0bit->31bit. If all 0's, check for 0s in 32bit->63bit + _BitScanForward((unsigned long*)&out, *(((uint32_t*)&in) + 1)); + if (out != bitsInType) + return Index(out); + _BitScanForward((unsigned long*)&out, *(((uint32_t*)&in))); + return Index(out + bitsInType); +#endif// #ifdef _WIN64 + +#else + return Index(__builtin_ctzll(in)); +#endif// #if defined(_MSC_VER) +} + /* Hold a set of UInt values. Implementation works by storing as a bit per value */ +/// UIntSet is essentially a Element[], where each Element is `b` bits big. +/// Each index has `b` number of integers. If the bit is 1, we have an element there. +/// Value of each element is equal to the binary offset from Element[0], bit 0. class UIntSet { public: typedef UIntSet ThisType; - typedef uint32_t Element; ///< Type that holds the bits to say if value is present + typedef uint64_t Element; ///< Type that holds the bits to say if value is present + constexpr static Index kElementSize = sizeof(Element) * 8; ///< The number of bits in an element. This also determines how many values a element can hold. + constexpr static Index kElementMask = kElementSize - 1; ///< Mask to get shift from an index + constexpr static Index kElementShift = computeElementShift<Element>(); ///< How many bits to shift to get Element index from an index. 5 for 2^5=32 elements in a uint32_t. 6 for 2^6=64 in a uint64_t. + UIntSet() {} UIntSet(const UIntSet& other) { m_buffer = other.m_buffer; } UIntSet(UIntSet && other) { *this = (_Move(other)); } UIntSet(UInt maxVal) { resizeAndClear(maxVal); } + UIntSet(List<UIntSet::Element> buffer) { m_buffer = buffer; } UIntSet& operator=(UIntSet&& other); UIntSet& operator=(const UIntSet& other); HashCode getHashCode() const; - /// Return the count of all bits directly represented + /// Return the count of all bits directly represented Int getCount() const { return Int(m_buffer.getCount()) * kElementSize; } + List<Element>& getBuffer() { return m_buffer; } + /// Resize such that val can be stored and clear contents void resizeAndClear(UInt val); /// Set all of the values up to count, as set @@ -38,6 +90,7 @@ public: /// Resize (but maintain contents) up to bit size. /// NOTE! That since storage is in Element blocks, it may mean some values after size are set (up to the Element boundary) void resize(UInt size); + void resizeBackingBufferDirectly(Index size); /// Clear all of the contents (by clearing the bits) void clear(); @@ -47,6 +100,8 @@ public: /// Add a value inline void add(UInt val); + inline void add(const UIntSet& val); + /// Remove a value inline void remove(UInt val); /// Returns true if the value is present @@ -59,10 +114,12 @@ public: /// != bool operator!=(const UIntSet& set) const { return !(*this == set); } - /// Store the union between this and set in this + /// Store the union between this and set void unionWith(const UIntSet& set); - /// Store the intersection between this and set in this + /// Store the intersection between this and set void intersectWith(const UIntSet& set); + /// Store the subtraction between this and set + void subtractWith(const UIntSet& set); /// bool isEmpty() const; @@ -70,6 +127,10 @@ public: /// Swap this with rhs void swapWith(ThisType& rhs) { m_buffer.swapWith(rhs.m_buffer); } + template<typename T> + List<T> getElements() const; + Index countElements() const; + /// Store the union of set1 and set2 in outRs static void calcUnion(UIntSet& outRs, const UIntSet& set1, const UIntSet& set2); /// Store the intersection of set1 and set2 in outRs @@ -80,16 +141,98 @@ public: /// Returns true if set1 and set2 have a same value set (ie there is an intersection) static bool hasIntersection(const UIntSet& set1, const UIntSet& set2); -private: - enum + struct Iterator { - kElementShift = 5, ///< How many bits to shift to get Element index from an index - kElementSize = sizeof(Element) * 8, ///< The number of bits in an element - kElementMask = kElementSize - 1, ///< Mask to get shift from an index + friend class UIntSet; + private: + const List<Element>* context; + Index block = 0; + Element processedElement = 0; + uint64_t LSB = 0; + + void clearLSB() + { + LSB = bitscanForward(processedElement); + processedElement &= processedElement - 1; + } + public: + Iterator(const List<Element>* inContext) + { + context = inContext; + } + + Element operator*() + { + return Element(LSB + (kElementSize * block)); + } + + Iterator& operator++() + { + while (processedElement == 0) + { + block++; + if (block >= context->getCount()) + { + return *this; + } + processedElement = (*context)[block]; + } + clearLSB(); + return *this; + } + Iterator& operator++(int) + { + return ++(*this); + } + bool operator==(const Iterator& other) const + { + return other.block == this->block + && other.processedElement == this->processedElement; + } + bool operator!=(const Iterator& other) const + { + return !(other == *this); + } }; + Iterator begin() const + { + Iterator tmp(&m_buffer); + if (m_buffer.getCount() == 0) + return tmp; + + tmp.processedElement = m_buffer[0]; + if (tmp.processedElement == 0) + tmp++; + + tmp.clearLSB(); - // Make sure they are correct for the Element type - SLANG_COMPILE_TIME_ASSERT((1 << kElementShift) == kElementSize); + return tmp; + } + Iterator end() const + { + Iterator tmp(&m_buffer); + tmp.block = m_buffer.getCount(); + tmp.processedElement = 0; + return tmp; + } + + bool areAllZero() + { + return _areAllZero(m_buffer.getBuffer(), m_buffer.getCount()); + } + +protected: + static bool _areAllZero(const UIntSet::Element* elems, Index count) + { + for (Index i = 0; i < count; ++i) + { + if (elems[i]) + { + return false; + } + } + return true; + } List<Element> m_buffer; }; @@ -132,6 +275,18 @@ inline bool UIntSet::contains(const UIntSet& set) const } // -------------------------------------------------------------------------- + +inline void UIntSet::resizeBackingBufferDirectly(Index newCount) +{ + const Index oldCount = m_buffer.getCount(); + m_buffer.setCount(newCount); + + if (newCount > oldCount) + { + ::memset(m_buffer.getBuffer() + oldCount, 0, (newCount - oldCount) * sizeof(Element)); + } +} + inline void UIntSet::add(UInt val) { const Index idx = Index(val >> kElementShift); @@ -142,6 +297,38 @@ inline void UIntSet::add(UInt val) m_buffer[idx] |= Element(1) << (val & kElementMask); } +inline void UIntSet::add(const UIntSet& other) +{ + auto otherCount = other.m_buffer.getCount(); + if (this->m_buffer.getCount() < otherCount) + resizeBackingBufferDirectly(otherCount); + + for (auto i = 0; i < otherCount; i++) + m_buffer[i] |= other.m_buffer[i]; } +template<typename T> +List<T> UIntSet::getElements() const +{ + auto count = m_buffer.getCount(); + if (count == 0) + return {}; + + // Specific path for uint64_t. If using SIMD we should not use this path due to larger data types. + + List<T> elements; + elements.reserve(count); + for (Index block = 0; block < count; block++) + { + Element n = m_buffer[block]; + while (n != 0) + { + elements.add(T(bitscanForward((uint64_t)n) + (kElementSize * block))); + n &= n - 1; + } + } + return elements; +} + +} #endif |
