| Commit message | Author | Age |
|
* Fix #6993 - Emit Diagnostic Warning and Fix SIGSEGV
* Update external/slang-rhi submodule
* Add checks for valid stage names for paq in SemanticsVisitor check
* format code
---------
Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com>
|
* Added Dictionary::erase(iterator) and fixed crashing when filtering a dictionary in slang-ir-autodiff-loop-analysis.cpp
* Added Dictionary::removeIf(Predicate)
* Removed Dictionary::erase(it)
---------
Co-authored-by: Julius Ikkala <julius.ikkala@gmail.com>
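
Filtering a hash map in place is easy to get wrong, because erasing an entry invalidates the iterator that points at it. The following is a minimal sketch of the `removeIf(Predicate)` idea on top of `std::unordered_map`; the free function `removeIf` and the helper `dropNegatives` are hypothetical names for illustration and are not Slang's `Dictionary` API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper: erase every entry for which pred(key, value) is true.
// Returns the number of removed entries.
template <typename Map, typename Pred>
std::size_t removeIf(Map& map, Pred pred)
{
    std::size_t removed = 0;
    for (auto it = map.begin(); it != map.end();)
    {
        if (pred(it->first, it->second))
        {
            // erase() returns the next valid iterator, so the loop never
            // advances an iterator that has already been invalidated.
            it = map.erase(it);
            ++removed;
        }
        else
        {
            ++it;
        }
    }
    return removed;
}

// Usage: drop all entries whose value is negative.
inline std::size_t dropNegatives(std::unordered_map<std::string, int>& m)
{
    return removeIf(m, [](const std::string&, int v) { return v < 0; });
}
```

Keeping the loop inside one member-style helper means callers never hold an iterator across an erase.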
|
* Update build to allow setting external paths
Update the build to allow setting user-specific paths for the external modules.
This allows building Slang without also fetching the external modules, assuming
they are already present elsewhere locally.
|
* format
* Minor test fixes
* enable checking cpp format in ci
|
auto-diff results (#5394)
* Various AD enhancements
* Fix issue with pt-loop test
* Update pt-loop.slang
* More fixes for perf. Final minimal context test now passes.
* Fix issue with loop-elimination pass not running after dce
* Try fix wgpu test by removing select operator
* Disable wgpu
* Delete out.wgsl
* Remove comments
* Update slang-ir-util.cpp
* Fix header relative paths for slang-embed
* Disable wgpu for a few other tests
* Better way of determining which params to ignore for side-effects
* Update slang-ir-dce.cpp
* Fix issue with circular reference from previous AD pass being left behind for the next AD pass
* Update slang-ir-dce.cpp
|
* Add options to prevent usage of own submodules
Signed-off-by: Jacki <jacki@thejackimonster.de>
* Allow using external unordered dense headers
Signed-off-by: Jacki <jacki@thejackimonster.de>
* Link system wide installed unordered dense
Signed-off-by: Jacki <jacki@thejackimonster.de>
* Allow external header usage for lz4 and spirv
Signed-off-by: Jacki <jacki@thejackimonster.de>
* Add more options to disable targets
Signed-off-by: Jacki <jacki@thejackimonster.de>
* Add option to provide explicit path for spirv headers and remove earlier options that break the build process
Signed-off-by: Jacki <jacki@thejackimonster.de>
* Rename options to use common prefix
Signed-off-by: Jacki <jacki@thejackimonster.de>
* Fix indentation for the cmake changes
Signed-off-by: Jacki <jacki@thejackimonster.de>
* Add advanced_option function for cmake
* Normalize includes between system and submodule dependencies
Fix any before-accidentally-working problems
* Add option for enabling/disabling slang-rhi
Signed-off-by: Jacki <jacki@thejackimonster.de>
* Pass correct include path for cpu tests
* Correct include path
---------
Signed-off-by: Jacki <jacki@thejackimonster.de>
Co-authored-by: Ellie Hermaszewska <ellieh@nvidia.com>
|
* Fix for problem with OrderedHashSet causing crashes during running tests on g++ 7.3
* Fix typo
|
* Add -spirv-core-grammar option to load alternate spirv defs
Also embed a version to use by default
* Use perfect hash for spv op lookup
* Neaten perfect hash embedding
* Refactor spirv grammar lookup in preparation for more kinds of lookups
* Load spirv capability list from spec
* Add all SPIR-V enums to lookup table
* regenerate vs projects
* appease msvc
* Use string slices for spir-v core grammar lookups
* wiggle
* comment
* Add OpInfo for spv ops
* regenerate vs projects
* Embed op names
* Add min/max operand counts and enum categories to spirv info
* neaten
* Operand kinds for spirv ops
* Store and embed all information relating to spirv enums and qualifiers
* Use SPIR-V spec to position instructions in spirv_asm blocks
* Neaten spir-v info embedding
* Neaten perfect hash embedding
* Add assignment syntax to spirv_asm snippets
* Better errors for spirv_asm parser
* Add warning for too many operands in spirv asm
* squash warnings
* neaten
* test wiggle
* Lookup enums for spirv
* Put OpCapability and OpExtension in the correct place for spirv_asm blocks
* Tests for OpCapability and OpExtension
* ci wiggle
* Add expected failure
* Allow raising immediate values to constant ids where necessary in spirv_asm blocks
* Allow bitwise or expressions and numeric literals in spirv_asm blocks
* test numeric literals
* Fix memory issues.
* fix.
---------
Co-authored-by: Yong He <yonghe@outlook.com>
|
* Correct namespace for getClockFrequency
* missing const
* Add missing assignment operator
* Remove unused variables
* Return correct modified variable
* Use stable hash code for file system identity
* terse static_assert
* Structured binding for map iteration
* Make (==) and getHashCode const on many structs
* Add ConstIterator for LinkedList
* Replace uses of ItemProxy::getValue with Dictionary::at
* Extract list of loads from gradientsMap before updating it
* Const correctness in type layout
* Add unordered_dense hashmap submodule
* Use wyhash or getHashCode in slang-hash.h
* refactor slang-hash.h
* Use ankerl/unordered_dense as a hashmap implementation
Notable changes:
- The subscript operator returns a reference directly to the value,
rather than a lazy ItemProxy (pair of dict pointer and key)
slang-profile time (95% over 10 runs):
- Before: 6.3913906 (±0.0746)
- After: 5.9276123 (±0.0964)
* 64 bit hash for strings
So they have the same hash as char buffers with the same contents
* Narrowing warnings for gcc to match msvc
* revert back to c++17
* Correct c++ version for msvc
* Use path to unordered_dense which keeps tests happy
* Do not assign to and read from map in same expression
* Remove redundant map operations in primal-hoist
* Split out stable hash functions into slang-stable-hash.h
* 64 bit hash by default
* regenerate vs projects
* Correct return type from HashSetBase::getCount()
* correct width for call to Dictionary::reserve
* Use stable hash for obfuscated module ids
* Signed int for reserve
* clearer variable naming
* Parameterize Dictionary on hash and equality functors
* Allow heterogeneous lookup for Dictionary
* missing const
* Use set over operator[] in some places
* Remove unused function
* s/at/getValue
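
Once `operator[]` returns a reference straight into the table, an expression such as `m[to] = m[from]` can read through a reference that the insertion on the other side has invalidated, since a densely stored map such as ankerl/unordered_dense may move its elements when it grows. The sketch below shows the "read first, then write" pattern behind "Do not assign to and read from map in same expression". It uses `std::unordered_map` as a stand-in, and `copyEntry` is a hypothetical helper, not part of Slang.

```cpp
#include <cassert>
#include <unordered_map>
#include <utility>

// Hypothetical helper: copy the value stored under `from` to the key `to`.
template <typename Map, typename Key>
void copyEntry(Map& map, const Key& from, const Key& to)
{
    // Step 1: read. Take a copy so that no reference into the table is
    // held across the insertion below. at() throws if `from` is missing.
    typename Map::mapped_type value = map.at(from);

    // Step 2: write. operator[] may insert and grow the table, which is
    // harmless now that `value` is an independent copy.
    map[to] = std::move(value);
}
```

With `std::unordered_map` itself the one-line form happens to be safe, because references to its elements survive rehashing; the two-step form stays correct regardless of the underlying container.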
|
* Simplify lookup.
* Various bug fixes.
* Report type dictionary size in perf benchmark.
* Remove type duplication.
* increase initial dict size.
* Bug fix.
* Fix bugs.
* Fixup.
* Revert type legalization looping.
* Fix specialization pass.
---------
Co-authored-by: Yong He <yhe@nvidia.com>
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* WIP lowerCamel Dictionary.
* WIP more lowerCamel fixes for Dictionary.
* Add/Remove/Clear
* GetValue/Contains
* Fix tabs in dictionary.
Count -> getCount
* Fix fields with caps.
* Key -> key
Value -> value
Use m_ for members where appropriate.
Use lowerCamel in linked list.
* Some small fixes/improvements to Dictionary.
* Kick CI.
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Moved JSON source map writing logic to JSONSourceMapUtil.
* Use ArtifactHandler to read/write SourceMaps.
Use ObjectCastableAdapter to hold SourceMap
Only serialize SourceMap <-> JSON on demand.
* Make some types swappable.
* BoxValue impl.
* Added asBoxValue.
* Remove const get funcs.
* Fix typo in asBoxValue.
* Fix another typo in asBoxValue.
* Slightly simplify conversion to blob of SourceMap.
* Small fix for asBoxValue
|
* Bug fixes.
* Fix.
* Only perform autodiff for functions whose derivative is actually used.
* Fix loop optimize bug.
* Fix high order diff.
* Fix trivial diff func generation.
* Fixes.
* Cleanup.
---------
Co-authored-by: Yong He <yhe@nvidia.com>
|
* WIP: Fix for do-while loops
* Added a somewhat hacky fix for do-while loops
* Redid the indexed region map builder step to fix issue with the nested loops test
* rename
* Used managed pointers
|
* Preserve specialization cache in IR for specialization pass.
* Fix compile error.
* Fix.
* Fix.
* Fix test case.
* Fix.
Co-authored-by: Yong He <yhe@nvidia.com>
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Use TerminatedUnownedStringSlice for literals in output C++.
* Remove Escape/Unescape functions used in slang-token-reader.cpp
Add target type of 'host-cpp' etc to map to the target types.
* Fix some corner cases around string encoding.
* Added unit test for string escaping.
Fixed some assorted escaping bugs.
* Updated test output.
* Added decode test.
* Stop using hex output, to get around 'greedy' aspect. Use octal instead.
* Added HostHostCallable
Small changes to use ArtifactDesc/Info instead of large switches.
* Fix C++ emit to handle arbitrary function export.
* Add options handling for callable without an output being specified.
* Can compile with COM interface. Added example using com interface.
* Use the IR Ptr type instead of hack in C++ emit for interfaces.
* Fix issue with outputting the COM call when ptr is used.
* Fix crash issue on compilation failure.
* Add support for __global.
* Added `ActualGlobalRate`
Added special handling around globals and COM interfaces.
Tested out in cpu-com-example.
* Fix typo in NodeBase.
* Support for accessing globals by name working.
* Check that actual global initialization is working.
* Refactor the com replacement such that it doesn't need a cache or do anything special with GlobalVar.
* Remove context.
Only create replacement if needed.
* Split out COM host-callable into a unit-test.
* host-callable com testing on C++ and LLVM.
* Comment around the COM ptr replacement.
* Disable com test on vs 32 bit.
Fix C++ prelude
* Disable 32 bit targets testing com host-callable.
* Use JSON parsing to locate VS version.
* Need platform detection in C++ prelude.
* Fix com host callable test for LLVM.
* Work around for not being able to include "targetConditionals.h"
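
The move from hex to octal escapes in emitted C++ works around a property of C++ string literals: a `\x` escape consumes every hex digit that follows it, so the bytes 0x01 followed by `f` cannot be written as `"\x1f"`, whereas an octal escape stops after at most three digits. A minimal sketch of an escaper built on that rule follows; the name `escapeForCppLiteral` is hypothetical and this is not the Slang implementation.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: escape raw bytes for use inside a C++ string literal.
std::string escapeForCppLiteral(std::string_view bytes)
{
    std::string out;
    for (unsigned char c : bytes)
    {
        if (c == '\\' || c == '"')
        {
            // Characters that would otherwise end or corrupt the literal.
            out += '\\';
            out += static_cast<char>(c);
        }
        else if (c >= 0x20 && c < 0x7f)
        {
            // Printable ASCII is emitted unchanged.
            out += static_cast<char>(c);
        }
        else
        {
            // Always emit exactly three octal digits, so a following
            // digit character can never be absorbed into the escape.
            out += '\\';
            out += static_cast<char>('0' + ((c >> 6) & 7));
            out += static_cast<char>('0' + ((c >> 3) & 7));
            out += static_cast<char>('0' + (c & 7));
        }
    }
    return out;
}
```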
|
* Vulkan: deferred shader compilation and pipeline creation.
* Fix 32bit build.
* gfx: restructure the code in render-d3d12.cpp
* Move `Submitter`.
* Fix.
* merge with master.
* Revert dictionary change in previous PR.
Co-authored-by: Yong He <yhe@nvidia.com>
|
* Vulkan: deferred shader compilation and pipeline creation.
* Fix 32bit build.
Co-authored-by: Yong He <yhe@nvidia.com>
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Refactor Stream. Working on all tests.
* Split out CharEncode.
* Make method names lower camel.
m_prefix in Writer/Reader
* Tidy up around CharEncode interface.
* Small improvements around encode/decode.
* Better use of types.
* Remove readLine from TextReader.
* Remove exceptions from Stream/Text handling.
* Fix some typos.
* Fix tabbing.
* Fix missing override.
* Remove remaining exception throw/catch via using signal mechanism.
* Remove exceptions that are not used anymore.
* Document the Stream interface.
* Remove index for decoding 'get byte' function.
* Fix CharReader -> ByteReader.
|
* Fix existential specialization of mutable buffer loads.
* fix
Co-authored-by: Yong He <yhe@nvidia.com>
|
* #include an absolute path didn't work - because paths were taken to always be relative.
* Improve diagnostic for token pasting.
* Token paste location test.
* Output include hierarchy.
* WIP on includes hierarchy.
* Improved include hierarchy output - to handle source files without tokens.
Improved test case.
* Small comment improvements.
Fixed a typo with not returning a reference.
* Slight simplification of the ViewInitiatingHierarchy, by adding GetOrAddValue to Dictionary.
* Remove the need for ViewInitiatingHierarchy type.
* Improve output of path in diagnostic for includes hierarchy.
* Remove comment in diagnostic for token-paste-location.slang
* Update command line docs to include `-output-includes`
Co-authored-by: Yong He <yonghe@outlook.com>
|
* ShortList<T> and core.natvis improvements.
* Fix gcc build.
* add `getBuffer()` accessor to `GetArrayViewResult`
|
* Fields from upper to lower case in slang-ast-decl.h
* Lower camel field names in slang-ast-stmt.h
* Fix fields in slang-ast-expr.h
* slang-ast-type.h make fields lowerCamel.
* slang-ast-base.h members functions lowerCamel.
* Method names in slang-ast-type.h to lowerCamel.
* GetCanonicalType -> getCanonicalType
* Substitute -> substitute
* Equals -> equals
ToString -> toString
* ParentDecl -> parentDecl
Members -> members
* Make hash code types explicit
* Use HashCode as return type of GetHashCode
* Added conversion from double to int64_t
* Split Stable from other hash functions
* toHash32/64 to convert a HashCode to the other styles.
GetHashCode32/64 -> getHashCode32/64
GetStableHashCode32/64 -> getStableHashCode32/64
* Other Get/Stable/HashCode32/64 fixes
* GetHashCode -> getHashCode
* Equals -> equals
* CreateCanonicalType -> createCanonicalType
* Catches of polymorphic types should be through references otherwise slicing can occur.
* Fixes for newer version of gcc.
Fix hashing problem on gcc for Dictionary.
* Another fix for GetHashPos
* Fix signed issue around GetHashPos
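
The note about catching polymorphic types by reference can be illustrated with a small sketch. The types `BaseError` and `ParseError` are hypothetical and exist only for this example. Catching by value copies just the base-class part of the thrown object, so virtual calls no longer reach the derived override.

```cpp
#include <cassert>
#include <string>

struct BaseError
{
    virtual ~BaseError() = default;
    virtual std::string name() const { return "BaseError"; }
};

struct ParseError : BaseError
{
    std::string name() const override { return "ParseError"; }
};

// Catching by value: the handler parameter is a sliced BaseError copy.
inline std::string catchByValue()
{
    std::string result;
    try
    {
        throw ParseError();
    }
    catch (BaseError e)
    {
        result = e.name();
    }
    return result;
}

// Catching by reference: the handler sees the original ParseError object.
inline std::string catchByReference()
{
    std::string result;
    try
    {
        throw ParseError();
    }
    catch (const BaseError& e)
    {
        result = e.name();
    }
    return result;
}
```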
|
* Add support for generic load/store on byte-addressed buffers
Introduction
============
The HLSL `*ByteAddressBuffer` types originally only supported loading/storing `uint` values or vectors of the same, using `Load`/`Load2`/`Load3`/`Load4` or `Store`/`Store2`/`Store3`/`Store4`. More recent versions of dxc have added support for generic `Load<T>` and `Store<T>`, which adds two main pieces of functionality for users.
The first and more fundamental feature is that `T` can be a type that isn't 32 bits in size (or a vector with elements of such a type), thus exposing a capability that is difficult or impossible to emulate on top of 32-bit load/store (depending on what guarantees `*StructuredBuffer` makes about the atomicity of loads/stores).
The secondary benefit of having a generic `Load<T>` and `Store<T>` is that it becomes possible to load/store types like `float` without manual bit-casting, and also becomes possible to load/store `struct` types so long as all the fields are loadable/storable.
This change adds generic `Load<T>` and `Store<T>` to the Slang standard library definition of byte-address buffers, and tries to bring those same benefits to as many targets as possible. In particular, the secondary benefits become available on all targets, including DXBC: byte-address buffers can be used to directly load/store types other than `uint`, including user-defined `struct` types, so long as all of the fields of those types can be loaded/stored.
The ability to load/store non-32-bit types depends on target capabilities, and so is only available where direct support for those types is available. For 16-bit types like `half` this includes both Vulkan and D3D12 DXIL with appropriate extensions or shader models.
The implementation is somewhat involved, so I will try to explain the pieces here.
Standard Library
================
The changes to the Slang standard library in `hlsl.meta.slang` are pretty simple. We add new `Load<T>` and `Store<T>` generic methods to `*ByteAddressBuffer`, and route them through to a new IR opcode.
Right now the generic `Load<T>` and `Store<T>` do *not* place any constraints on the type `T`, although in practice they should only work when `T` is a fixed-size type that only contains "first class" uniform/ordinary data (so no resources, unless the target makes resource types first class). Our front-end checking cannot currently represent first-class-ness and validate it (nor can it represent fixed-size-ness), so these gaps will have to do for now.
Rather than directly translate `Load<T>` or `Store<T>` calls into a single instruction, we instead bottleneck them through internal-use-only subroutines. The design choice here is intended to ensure that for some large user-defined type like `MassiveMaterialStruct` we only emit code for loading all of its fields *once* in the output HLSL/GLSL rather than once per load site. While downstream compilers are likely to inline all of this logic anyway, we are doing what we can to avoid generating bloated code.
Emit and C++/CUDA
=================
Over in `slang-emit-c-like.cpp` we translate the new ops into output code in a straightforward way. A call like `obj.Load<Foo>(offset)` will eventually output as a call like `obj.Load<Foo>(offset)` in the generated code, by default.
For the CPU C++ and CUDA C++ codegen paths, this is enough to make a workable implementation, and we add suitable templated `Load<T>` and `Store<T>` declarations to the prelude for those targets.
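
As a rough illustration of what templated `Load<T>` and `Store<T>` can look like on a CPU target, here is a minimal sketch. The `ByteBuffer` and `Particle` types are hypothetical stand-ins and this is not the code in the Slang prelude. It copies bytes with `std::memcpy`, which sidesteps alignment and aliasing problems for arbitrary offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical CPU-side stand-in for a read/write byte-address buffer.
struct ByteBuffer
{
    std::vector<unsigned char> bytes;

    template <typename T>
    T Load(std::size_t offset) const
    {
        static_assert(std::is_trivially_copyable<T>::value,
                      "Load<T> requires a plain-data type");
        assert(offset + sizeof(T) <= bytes.size());
        T value;
        std::memcpy(&value, bytes.data() + offset, sizeof(T));
        return value;
    }

    template <typename T>
    void Store(std::size_t offset, const T& value)
    {
        static_assert(std::is_trivially_copyable<T>::value,
                      "Store<T> requires a plain-data type");
        assert(offset + sizeof(T) <= bytes.size());
        std::memcpy(bytes.data() + offset, &value, sizeof(T));
    }
};

// A user-defined struct whose fields are all plain data.
struct Particle
{
    float position[3];
    std::uint16_t kind;
};
```

On such a target the whole struct can be copied in one step, which is why no scalarization is needed there.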
Legalization
============
For targets like DXBC and GLSL there is no way to emit a load operation for an aggregate type like a `struct`, so we introduce a legalization pass on the IR that will translate our byte-address-buffer load/store ops into multiple ops that are legal for the target.
Scalarization
-------------
The big picture here is easy enough to understand: when we see a load of a `struct` type from a byte-address buffer, we translate that into loads for each of the fields, and then assemble a new `struct` value from the results. We do similar things for arrays, matrices, and optionally for vectors (depending on the target).
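
The scalarization step can be pictured with a small C++ analogy. This is a sketch only: it is written by hand for one hypothetical `Material` struct, whereas the pass described above performs the equivalent rewrite on IR instructions. The only primitive assumed to exist is a 32-bit load, and the struct is rebuilt from one such load per scalar leaf.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical aggregate with a scalar field and an array field.
struct Material
{
    std::uint32_t id;
    std::uint32_t flags[2];
};

// The only "legal" primitive in this sketch: a 4-byte aligned 32-bit load.
inline std::uint32_t loadU32(const std::vector<std::uint32_t>& words,
                             std::size_t byteOffset)
{
    return words[byteOffset / 4];
}

// A load of the whole struct becomes one primitive load per scalar leaf,
// each at base offset + field offset, followed by reassembly.
inline Material loadMaterial(const std::vector<std::uint32_t>& words,
                             std::size_t byteOffset)
{
    Material m;
    m.id = loadU32(words, byteOffset + 0);
    for (std::size_t i = 0; i < 2; ++i)
    {
        // Array elements are loaded one by one, 4 bytes apart.
        m.flags[i] = loadU32(words, byteOffset + 4 + 4 * i);
    }
    return m;
}
```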
Bit Casting
-----------
After scalarization alone, we might have a load of a `float` or a `float3` that isn't legal for D3D11/DXBC, but that *would* be legal if we just loaded a `uint` or `uint3` and then bit-casted it. The legalization pass thus includes an option to allow for loads/stores to be translated to operate on a same-size unsigned integer type and then to bit-cast.
To make this work actually usable, I had to add some more details to the implementation of the bit-cast op during HLSL emit and, more importantly, I had to customize the way that the byte-address buffer load/store ops get emitted to HLSL so that it prefers to use the existing operations like `Load`/`Load2`/`Load3`/`Load4` instead of the generic one, whenever operating on `uint`s or vectors of `uint`.
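
The load-then-bit-cast idea can be sketched in C++ as follows. The helper names are hypothetical, and the example assumes a 32-bit IEEE-754 `float`. A `float` load is expressed as a `uint` load followed by a reinterpretation of the same bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Reinterpret the bits of a 32-bit unsigned integer as a float.
inline float bitCastToFloat(std::uint32_t bits)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "sketch assumes a 32-bit float");
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

// Reinterpret the bits of a float as a 32-bit unsigned integer.
inline std::uint32_t bitCastToUint(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    return bits;
}

// A float load on a uint-only buffer: load the same-size unsigned
// integer, then bit-cast the result.
inline float loadFloat(const std::vector<std::uint32_t>& words,
                       std::size_t byteOffset)
{
    return bitCastToFloat(words[byteOffset / 4]);
}
```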
Translation to Structured Buffers
---------------------------------
Even after scalarizing all byte-address-buffer loads/stores, we still have a problem for GLSL targets, because a single global `buffer` declaration used to back a byte-address buffer can only have a single element type (currently always `uint`), so the granularity of loads/stores it can express is fixed at declaration time. If we want to load a `half` from a byte-address buffer, we need a dedicated `buffer` declaration in the output GLSL with an element type of `half`.
The solution we employ here is to translate all byte-address buffer loads into "equivalent" structured-buffer ops when targeting GLSL. We add logic to find the underlying global shader parameter that was used for a load/store and introduce a new structured-buffer parameter with the desired element type (e.g., `half`) and then rewrite the load/store op to use that buffer instead. We copy layout information from the original buffer to the new one, so that in the output GLSL all the various `buffer`s will use a single `binding` and thus alias "for free."
We don't want to create a new global buffer for every load/store, so we try to cache these "equivalent" structured buffers as best as we can. For the caching I ended up needing a pair to use as a key, so I tweaked the `KeyValuePair<K,V>` type in `core` so that it could actually work for that purpose.
Because we are working at the level of IR instructions rather than stdlib functions at this point, I had to add new IR opcodes to represent structured-buffer load/store; these currently apply only to GLSL.
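
The caching of "equivalent" buffers keyed on a pair can be sketched like this. All names here are hypothetical, and plain integer ids stand in for the IR values that the real pass works with. The key combines the original buffer with the requested element type, so each combination is created at most once.

```cpp
#include <cassert>
#include <map>
#include <utility>

enum class ElementType { UInt, Half, Float };

// Hypothetical cache: (original buffer id, element type) -> new buffer id.
struct EquivalentBufferCache
{
    std::map<std::pair<int, ElementType>, int> entries;
    int nextId = 0;

    int getOrCreate(int originalBufferId, ElementType type)
    {
        const auto key = std::make_pair(originalBufferId, type);
        const auto it = entries.find(key);
        if (it != entries.end())
        {
            // Reuse the buffer already introduced for this combination.
            return it->second;
        }
        // First request for this combination: "declare" a new buffer.
        const int id = nextId++;
        entries.emplace(key, id);
        return id;
    }
};
```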
Layout
======
In order to translate a load/store of a `struct` type into per-field load/store we need a way to access layout information for the types of the fields. Previously layout information has been an AST-level concern that then gets passed down to the IR only when needed and only on global parameters, so layout information isn't always available in cases like this, at the actual load/store point.
As an expedient move for now I've introduced a dedicated module that does IR-level layout and caches its results on the IR types themselves. This approach *only* supports the "natural" layout of a type, and thus is usable for structured buffers and byte-address buffers (or general pointer load/store on targets that support it), but which is *not* usable for things like constant buffer layout.
We've known for a while that the Right Way to do layout going forward is to have an IR-based layout system, and this could either be seen as a first step toward it, or else as a gross short-term hack. YMMV.
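
To make "natural" layout more concrete, here is a minimal sketch of one common rule set. It assumes C-like rules, which may differ in detail from what the IR layout module does: each field is placed at the next offset that satisfies its own alignment, the struct takes the largest field alignment, and its size is rounded up to that alignment. The names `SizeAndAlign`, `alignUp`, and `layoutStruct` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct SizeAndAlign
{
    std::size_t size;
    std::size_t alignment;
};

// Round offset up to the next multiple of alignment.
inline std::size_t alignUp(std::size_t offset, std::size_t alignment)
{
    return (offset + alignment - 1) / alignment * alignment;
}

// Returns the byte offset of each field and writes the struct's own
// size and alignment to outStruct.
inline std::vector<std::size_t> layoutStruct(
    const std::vector<SizeAndAlign>& fields,
    SizeAndAlign& outStruct)
{
    std::vector<std::size_t> offsets;
    std::size_t offset = 0;
    std::size_t maxAlign = 1;
    for (const SizeAndAlign& field : fields)
    {
        offset = alignUp(offset, field.alignment);
        offsets.push_back(offset);
        offset += field.size;
        maxAlign = std::max(maxAlign, field.alignment);
    }
    // Assumption: the total size is padded to the struct alignment.
    outStruct = SizeAndAlign{alignUp(offset, maxAlign), maxAlign};
    return offsets;
}
```

Offsets computed this way are what a per-field load/store needs in order to address each member relative to the start of the struct.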
Details
=======
The GLSL "extension tracker" stuff around type support needed to be tweaked to recognize that types like `int16_t` aren't actually available by default. I switched it from using a "black list" of unavailable types at initialization time over to using a "white list" of types that are known to always be available without any extensions.
Tests
=====
There are two tests checked in here: one for the basic case of a `struct` type that has fields that should all be natively loadable, and one that stresses 16-bit types. Each test uses both load and store operations.
Future Directions
=================
Right now we translate vector load/store to GLSL as load/store of individual scalars, which means the assumed alignment is just that of the scalars (consistent with HLSL byte-address buffer rules). We could conceivably introduce some controls to allow outputting the vector load/store ops more directly to GLSL (e.g., declaring a `buffer` of `float4`s), which might enable more efficient load/store based on the alignment rules for `buffer`s.
The IR layout work has a number of rough edges, but the most worrying is probably the assumption that all matrices are laid out in row-major order. Slang really needs an overhaul of its handling of matrices and matrix layout, so I don't know if we can do much better in the near term.
At some point the IR-based layout system needs to be reconciled with our current AST-based layout, and we need to figure out how "natural" layout and the currently computed layouts co-exist (in particular, we need to make sure that the IR-based layout and the existing layout logic for structured buffers will agree). This probably needs to come along once we have moved the core layout logic to operate on IR types instead of AST types (a change we keep talking about).
As part of this work I had to touch the implementation of bit-casting for HLSL, and it seems like that logic has some serious gaps. We really ought to consider a separate legalization pass that can turn IR bitcast instructions into the separate ops that a target actually supports so that we can implement `uint64_t`<->`double` and other conversions that are technically achievable, but which are hard to express in HLSL today.
* fixup: missing files
|
* Prefixing source files in source/slang with slang-
* Prefix source in source/slang with slang- prefix.
* Rename core source files with slang- prefix.
* Update project files.
* Fix problems from automatic merge.
|