Age | Commit message | Author
2024-05-30Support SPIR-V DebugTypePointer (#4228)Jay Kwak
2024-05-30Various issues in code snippets (#4247)Elie Michel
Fixed as I was testing release `v2024.1.17` (latest) Co-authored-by: ArielG-NV <159081215+ArielG-NV@users.noreply.github.com>
2024-05-30Update document regarding pointer (#4248)Jay Kwak
And also add an actual test case from the User Guide example.
2024-05-30Fix confusion in Translation Units doc (#4245)Elie Michel
I think the sentence was saying the opposite of what it meant! Co-authored-by: Yong He <yonghe@outlook.com>
2024-05-30Fix small typo (#4246)Elie Michel
2024-05-30Increase MSVC warning level to 4 for Slang projects (#4207)Jay Kwak
2024-05-29Improve compile time performance. (#3857)Yong He
* Handle type check cache update on extensions more gracefully.
* Correctness fix.
* Cache implicit cast overload resolution results.
* Fix.
* More optimizations.
* Cache implicit default ctor resolution.
* Disable redundancy removal.
* Fix.
* Fix test.
* Fix.
* Correctness fix.
* Fix.
* Fix.
* Fix test.
* Small tweak.
2024-05-29Add options to speedup compilation. (#4240)Yong He
* Add options to speedup compilation. * Fix. * Plumb options to DCE pass. * Revert debug change. * Fix regressions. * More optimizations. * more cleanup and fixes. * remove comment. * Fixes. * Another fix. * Fix errors. * Fix errors. * Add comments.
2024-05-28Print memory leak info in Debug build of slangc.exe (#4210)Jay Kwak
When a memory leak is detected, this commit dumps information about the leak. This feature is available only in Debug builds on the Windows platform. Also note that the message will not be printed in client applications that use slang.dll, because the printing happens as part of slangc.exe, not slang.dll. I found a bug where Slang::StdWriters was closing `stdout` and `stderr` in its destructor, which prevented CRT functions from printing messages to `stdout` and `stderr`.
2024-05-28Simplify test file names for slang-test (#4227)Jay Kwak
When slang-test.exe is run with a file name that doesn't match a test's name character-by-character, that test doesn't run. This commit normalizes the file name given on the command line so that it behaves in a more expected way.
- "./" is removed.
- "../" is removed along with its parent directory name.
- Back-slash characters are converted to slashes on Windows.
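The three rules above can be sketched as a small path normalizer. This is only an illustration in Python, not the slang-test implementation: the function name `simplify_test_path` is hypothetical, it assumes relative paths, and it converts back-slashes unconditionally rather than only on Windows.

```python
def simplify_test_path(path: str) -> str:
    """Normalize a relative test path so equivalent spellings compare equal.

    Hypothetical helper; not the actual slang-test code.
    """
    # Treat back-slashes as separators (assumption: done on every platform here).
    parts = path.replace("\\", "/").split("/")
    simplified = []
    for part in parts:
        if part in ("", "."):
            # Drop "./" segments and empty segments left by doubled slashes.
            continue
        if part == ".." and simplified and simplified[-1] != "..":
            # "../" cancels the directory name directly in front of it.
            simplified.pop()
            continue
        simplified.append(part)
    return "/".join(simplified)
```

With this sketch, `./tests/compute/simple.slang` and `tests\compute\simple.slang` both normalize to `tests/compute/simple.slang`, so either spelling would select the same test.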
2024-05-27CTS: stage some known failure tests for now (#4226)kaizhangNV
Stage some known failure test cases, will enable them back when the fix is merged. The failure tests can be checked in https://github.com/shader-slang/VK-GL-CTS/blob/main/test-lists/slang-waiver-tests.xml
2024-05-27add support for callable shaders in gfx (#3460)skallweitNV
Co-authored-by: Jay Kwak <82421531+jkwak-work@users.noreply.github.com> Co-authored-by: Yong He <yonghe@outlook.com>
2024-05-27[gfx] metal backend skeleton (#4223)Simon Kallweit
* add metal-cpp submodule * add metal-cpp cmake target * gfx metal backend skeleton * add premake support * add foundation framework * add metal-cpp include to premake * update vs project file --------- Co-authored-by: Simon Kallweit <skallweit@nvidia.com> Co-authored-by: Jay Kwak <82421531+jkwak-work@users.noreply.github.com>
2024-05-24Fix clang-18 build (#4222)exdal
* Update slang-performance-profiler.cpp * modified: source/core/slang-performance-profiler.cpp * reviews --------- Co-authored-by: Jay Kwak <82421531+jkwak-work@users.noreply.github.com>
2024-05-24If no sample is set with a `Texture(.*)MS[]` operation, set sample to 0. (#4225)ArielG-NV
* push fix: if no sample, set to 0 for textureMS * push fixes to hlsl [] operator + test so it will error
2024-05-24Fix pointer example (#4224)cheneym2
* Fix pointer example Make the example shown for pointers something that would compile. Don't redefine pNext and do define MyType. * Fix formatting of struct in pointer example
2024-05-23Fix pointers link in userguide (#4217)cheneym2
Adding (limited) to the header in a previous doc change broke the link. Co-authored-by: Jay Kwak <82421531+jkwak-work@users.noreply.github.com>
2024-05-23CTS: Report error when CTS fails (#4219)kaizhangNV
The CTS nightly stopped reporting errors because `continue-on-error` is set to true. Remove that field so that a CTS failure fails the job. Also add a Slack notification that reports the status of the CTS nightly run to our Slack dev channel.
2024-05-23Update vk-gl-cts-nightly.yml (#4214)cheneym2
Increase timeout from 100 to 180 minutes
2024-05-22Fix all Clang-14 warnings (#4203)ArielG-NV
* fix all Clang-14 warnings * remove a clang-14 warning fix because it is a MSVC warning...
2024-05-20This commit increases the minimum CMake version from 3.20 to 3.25. (#4193)Jay Kwak
I was trying to see if I could lower it to 3.16, but I found that we are currently using a CMake feature that requires version 3.25, not 3.20. This finding is not new. I made a similar change to CMakePresets.json a few days ago. At that time, I didn't realize that the same change had to be made to CMakeLists.txt as well.
2024-05-20Printing a timing of stdlib build time (#4190)Jay Kwak
2024-05-19Emit execution mode of type per entry point once. Emit SPIRV capability once per shader program. (#4189)ArielG-NV
* Emit only 1 execution mode of type per entry point. Added a dictionary<SpvWord,Hash<ExecutionMode>> to ensure we don't emit multiple.
* get inst->id directly
* address review + fix test
--------- Co-authored-by: Yong He <yonghe@outlook.com>
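The deduplication idea, a dictionary keyed by entry point that remembers which execution modes were already emitted, can be sketched as follows. This is a Python illustration only; the class name `ExecutionModeEmitter` and its methods are hypothetical and do not mirror the Slang SPIR-V emitter.

```python
from collections import defaultdict


class ExecutionModeEmitter:
    """Hypothetical sketch: emit each execution mode once per entry point."""

    def __init__(self):
        # entry point id -> set of execution modes already emitted for it
        self._emitted = defaultdict(set)
        self.instructions = []

    def emit(self, entry_point_id: int, mode: str) -> bool:
        """Record the instruction unless this (entry point, mode) pair was seen.

        Returns True when an instruction was actually emitted.
        """
        seen = self._emitted[entry_point_id]
        if mode in seen:
            return False
        seen.add(mode)
        self.instructions.append(("OpExecutionMode", entry_point_id, mode))
        return True
```

The same mode may still be emitted for a different entry point, since the set of seen modes is tracked per entry point id.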
2024-05-17Add `-minimum-slang-optimization` to favor compile time. (#4186)Yong He
2024-05-17SPIR-V support for GLSL texture functions (#4184)Jay Kwak
* SPIR-V support for GLSL texture functions

Closes #4147

This commit implements GLSL texture functions with SPIR-V intrinsics. It also implements some of the missing GLSL functions:
- textureProj
- textureLod
- texelFetchOffset
- textureProjOffset
- textureLodOffset
- textureProjLod
- textureProjLodOffset
- textureGrad
- textureGradOffset
- textureProjGrad
- textureProjGradOffset

* Fix SPIR-V issues discovered while improving the test case.
* Add __requireComputeDerivative() whenever sampling
* Do not touch GetDimensions
2024-05-17Test binding index for combined and not-combined textures (#4180)Jay Kwak
2024-05-17Add warning about CMake version on CONTRIBUTION.mdJay Kwak
Currently the CMake version is required to be 3.20 or above. The version requirement is properly defined in our CMakeLists.txt file, but older versions of CMake may not even print an error about the version requirement.
2024-05-17capture/relay: Add capture interface classes (#4177)kaizhangNV
* capture/relay: Add capture interface classes

Add `ModuleCapture` class for capturing `IModule`.
- An `IModule` can only be created from:
  -- `ISession::loadModule`
  -- `ISession::loadModuleFromIRBlob`
  -- `ISession::loadModuleFromSource`
  -- `ISession::loadModuleFromSourceString`
  so we create the `ModuleCapture` in those methods of the `SessionCapture` class. We use a hash map from `IModule` to `ModuleCapture` to avoid creating a new `ModuleCapture` when one already exists.
- In `SessionCapture::getLoadedModule`, we assert if no `ModuleCapture` instance is found.

Add `EntryPointCapture` class for capturing `IEntryPoint`.
- An `IEntryPoint` can only be created from:
  -- `IModule::findEntryPointByName`
  -- `IModule::findAndCheckEntryPoint`
  so we create the `EntryPointCapture` in those methods of `ModuleCapture`. Similarly, we use a hash map from `IEntryPoint` to `EntryPointCapture`.
- In `IModule::getDefinedEntryPoint`, we assert if no `EntryPointCapture` instance is found.

Add `CompositeComponentTypeCapture` class for capturing CompositeComponentType. Since the user is only exposed to `IComponentType`, `CompositeComponentTypeCapture` just inherits from `IComponentType`.
- A `CompositeComponentType` can only be created from:
  -- ISession::createCompositeComponentType
  so create it there.

Add `TypeConformanceCapture` class for capturing `ITypeConformance`.
- An `ITypeConformance` can only be created from:
  -- `ISession::createTypeConformanceComponentType`
  so create it there.

In addition, because `EntryPointCapture` and `ModuleCapture` share the same base class `IComponentType`, we generate COM GUIDs for those two classes to differentiate them.

* Fix the build issue
* Add nullptr check for output parameter
* define the SLANG_CAPTURE_ASSERT macro used in both debug and release build
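The wrapper-cache pattern described above (create a capture object at the factory methods, reuse it through a hash map, assert on lookups that miss) might look roughly like this. It is a Python sketch under assumptions: the class and method names are simplified stand-ins for the C++ classes named in the commit, and a plain `object()` stands in for an `IModule`.

```python
class ModuleCapture:
    """Hypothetical stand-in for the wrapper that records calls on a module."""

    def __init__(self, module):
        self.module = module


class SessionCapture:
    """Hypothetical stand-in: owns the module -> ModuleCapture map."""

    def __init__(self):
        self._module_captures = {}

    def capture_module(self, module):
        # Called from every method that can create a module; reuse the
        # existing wrapper instead of creating a second one.
        capture = self._module_captures.get(module)
        if capture is None:
            capture = ModuleCapture(module)
            self._module_captures[module] = capture
        return capture

    def get_loaded_module(self, module):
        # A module that was never created through this session is a bug.
        capture = self._module_captures.get(module)
        if capture is None:
            raise AssertionError("no ModuleCapture recorded for this module")
        return capture
```

Keying the map on the wrapped object means two lookups for the same module always yield the same wrapper, which is what lets later calls be attributed to one recorded object.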
2024-05-16ignore capability system skips the capability pass 100% now (#4183)ArielG-NV
2024-05-16Update 03-convenience-features.md (#4179)ArielG-NV
add member init expr and constructor logic to the docs
2024-05-16RasterizerOrder resource for spirv and metal. (#4175)Yong He
* RasterizerOrder resource for spirv and metal. Also fixes the byte address buffer logic for metal. * Fix. * Delete commented lines. --------- Co-authored-by: Jay Kwak <82421531+jkwak-work@users.noreply.github.com>
2024-05-16Fixes running examples from generated SLN (#4173)Hai Nguyen
* Fixes running examples from generated SLN

This CL contains changes to CMakeLists.txt that enable the examples to run from within Visual Studio when using a CMake-generated solution. Previously the working directory was set to examples/<example name>, which resulted in an invalid path in the generated project files. Additionally, the assets (shaders, images, models) were not in a location that was accessible to the executable when run from within Visual Studio.
- Changed examples to use ${CMAKE_BINARY_DIR}/${dir} instead of ${dir} if generator is MSVC.
- Add custom target to copy assets (shaders, images, models, etc.) to the example subdir under ${CMAKE_BINARY_DIR}
- Add dependency to copy prebuilt binaries if building examples in MSVC so DirectX shader signing doesn't fail
- Changed copy-prebuilt-binaries to use copy_if_different to avoid redundant copies

The initial build time is increased by 20 seconds (16%) from 2m3s to 2m23s, due to the asset copy. The incremental build time remained the same at 4 seconds.

* Corrected tabs to spaces. Corrected unintentional use of tabs instead of spaces.
--------- Co-authored-by: Jay Kwak <82421531+jkwak-work@users.noreply.github.com> Co-authored-by: Yong He <yonghe@outlook.com>
2024-05-16Capabilities System, CapabilitySet Logic Overhaul (#4145)ArielG-NV
* Capabilities System, Backing Logic Overhaul

Fixes #4015

Problems to address:
1. Currently the capabilities system spends anywhere from 25-50% of compile time on the CapabilityVisitor. Most of this time is spent on join logic: (1) finding abstract atoms, and (2) comparing list1<->list2. This should and can be made significantly faster.
2. The error system does not produce errors with auxiliary information. This will require a partial redesign to provide more useful semantic information for debugging.

What was addressed:
1. The array-backed `CapabilityConjunctionSet` was replaced in favor of a `UIntSet`-backed `CapabilityTargetSets`. The design is described below.
   Design:
   * `CapabilityTargetSets` is a `Dictionary<targetAtom, CapabilityTargetSet>`. This is not an array for 2 reasons: (1) it is easy to figure out which target is missing between two `CapabilityTargetSets`; (2) statically allocating an array requires the preprocessor to manually annotate which Capability is a target and link that Capability to an index. This means a dictionary is required for lookup regardless of implementation.
   * `CapabilityTargetSet` is an intermediate representation of all capabilities for a singular `target` atom (`glsl`, `hlsl`, `metal`, ...). This structure contains a dictionary of all stage-specific capability sets for fast lookup of the stage capabilities supported by a `CapabilitySet` for a `target` atom. This reduces the number of sets searched.
   * `CapabilityStageSet` is an intermediate representation of all capabilities for a singular `stage` atom (`vertex`, `fragment`, ...). This structure holds all disjoint capability sets for a `stage`. A disjoint set is rare, but may exist in some scenarios (as an example): `{glsl, EXT_GL_FOO}{glsl, _GLSL_130, _GLSL_150}`. This reduces the number of sets searched.
   * `UIntSet` is the main reason for the redesign, for better performance and memory usage. All set operations only require a few operations, making all set logic trivial and with minimal cost to run. All algorithms were modified to focus around `UIntSet` operations.
2. Errors
   * Semantic information is now better linked to the calling function, to provide a function<->function_body connection when saving semantic information for errors.
   * Missing targets now print errors much like other error codes, by finding code which could be a cause of incompatibility.

What is missing:
1. Add non-naive support for non-stage-specific capabilities such as `{hlsl, _sm_5_0}`. Currently non-stage-specific targets emulate the behavior by assigning such capabilities to every stage: `{hlsl, _sm_5_0, vertex} {hlsl, _sm_5_0, fragment}...`. Removal of this behavior would remove redundant shader stage sets being made at construction time (~80% of new implementation runtime). This is an addition, not an overhaul.
2. Optionally: `UIntSet` should be modified to support SIMD operations for significantly faster operations. This is not required immediately since `UIntSet` is already not a performance constraint.

Notes:
* UIntSet had implementation bugs which were fixed in this PR.
* The old capabilities system had bugs which were fixed in this PR when transforming to the new implementation.
* fix .natvis debug view
* Small optimizations I found while working on the addition. The AST building pass looks like so now:
  1% = ~capabilitySet
  2% = capabilitySet()
  1.5% capabilitySet::unionWith()
  0.8% capabilitySet::join()
  1.5% auxiliary info for debugging
  ~0.5-1% extra visitor overhead
  ~5% total for the visitor
  ~6.5% for total runtime costs
* fix caps which were wrong but worked
* push minor syntax fix (still looking for why other tests fail)
* perf & bug fixes
  1. did not properly remake isBetterForTarget for this->empty case with that as Invalid. This is the best case in this scenario.
  2. Remade serializer for stdlib generation. Faster (more direct) & cleaner code.
NOTE: did not address review comments
* fix glsl.meta caps error
* fixing findBest logic again & UIntSet wrapper: findBest was not checking for 'more specialized' targets & the element counter was flawed
* faster getElements algorithm + natvis for UIntSet + wrong warning
* type incompatibility of bitscanForward implementations
* try to fix warnings again
* remove ptr for clang intrinsic
* add missing header
* ifdef to allow clang compile
* compiler hackery to fix up platform/type independent operations
* bracket
* fix MSVC error
* missing template
* change types out again
* changes to fix compiling
* adjustment to parameter for Clang/GCC
* added iterator to delay processing all atomSets of a CapabilitySet
* add a few missing const's
* ensure we never have more than 1 disjointSet: Added a wrapper + assert + union functionality to all possible disjoint sets. This was done in favor of a removal of the LinkedList for 2 reasons: 1. We still need 0-1 set functionality. 2. Might as well keep the code, just disallow the problematic functionality.
* address review comments: non linked-list refactor review comments addressed; add doc comments + remove redundant code
* comments + remove isValid for bool operator
* push removal of linkedlist for capabilities
* add missing break
* address review comments: minor adjustments of syntax
* push a fix to the `CapabilitySet({shader, missing target})` code
* quality + error: 1. add iterator to UIntSet 2. do not specialize target_switch if profile is derived from case (GLSL_150 is not compatible with GLSL_400)
* fix target_switch erroring + temporarily remove UIntSet::Iterator. It will be added after, testing code on CI first so I can multi-task fixing the UIntSet Iterator
* fix the UIntSet iterator
* Revert "fix the UIntSet iterator" temporarily to pull from master
* add metal error as per texture.slang (took a while to realize this was why things were breaking, likely should adjust errors to reflect this)
* Rework UIntSet to have a template for output type: This is done so it is reasonable to debug the iterator output and not just deal with messy ints. Fix problems with the iterators implemented + invalid capabilities handling
* removed incorrect `__target_switch` capability: barycentric was being used in anticipation of `profile glsl450`, which does not expand into `GL_EXT_fragment_shader_barycentric`; this instead caused an error which is hidden during cross-compile.
* remove some uses of getElements
* remove undeclared_stage for now
* remove redundant code associated with `undeclared_stage`
* remove unused variable
* address review, specifically to note removed static in a thread dangerous scope. Now using a `const static` for read only (thread safe) which precompile steps generate
* move GLSL_150 capdef change to sm_4_1 (more accurate)
* address most review comments; did not address: https://github.com/shader-slang/slang/pull/4145#discussion_r1602256776
* revert incorrect code review suggestion
* push changes for all code review suggestions
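To make the "set operations only require a few operations" point concrete, here is a minimal bitset sketch in Python. It is not Slang's `UIntSet`: the method names are hypothetical, capability atoms are assumed to be small non-negative integers, and a single Python integer stands in for the word array a C++ implementation would use.

```python
class UIntSet:
    """Hypothetical bitset sketch: element i is present when bit i is set."""

    def __init__(self, elements=()):
        self._bits = 0
        for element in elements:
            self.add(element)

    def add(self, element: int) -> None:
        self._bits |= 1 << element

    def contains(self, element: int) -> bool:
        return bool((self._bits >> element) & 1)

    def union_with(self, other: "UIntSet") -> None:
        # Union is a single bitwise OR over the backing storage.
        self._bits |= other._bits

    def is_subset_of(self, other: "UIntSet") -> bool:
        # Subset test: no bit of self may fall outside other.
        return (self._bits & ~other._bits) == 0

    def elements(self) -> list:
        # Walk the set bits from lowest to highest.
        result, bits = [], self._bits
        while bits:
            lowest = bits & -bits          # isolate the lowest set bit
            result.append(lowest.bit_length() - 1)
            bits ^= lowest                 # clear it and continue
        return result
```

Comparing two capability sets then reduces to a handful of bitwise operations instead of walking and matching two lists of atoms.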
2024-05-15Add diagnostic to prevent defining unsized variables. (#4168)Yong He
* Add diagnostic to prevent defining unsized static variables. * Fix tests. * Add more tests. * Fix to allow defining variables of link-time size. * update diagnostic message. * Fix tests. * Simplify code.
2024-05-14Support combined textures for Metal target (#4169)Jay Kwak
2024-05-14Remove use of `G0` and `__target_intrinsic` in stdlib. (#4170)Yong He
* Remove use of `G0` and `__target_intrinsic` in stdlib. * Fix. * Fix calling intrinsic in global scope.
2024-05-14Implement texture functions for Metal target (#4158)Jay Kwak
* Impl texture APIs for Metal target

This commit implements texture functions for the Metal target. The following functions are implemented and tested.
- GetDimensions()
- CalculateLevelOfDetail()
- CalculateLevelOfDetailUnclamped()
- Sample()
- SampleBias()
- SampleLevel()
- SampleCmp()
- SampleCmpLevelZero()
- Gather()
- SampleGrad()
- Load()

Metal has limited support for the texture functions compared to HLSL.
- LOD is not supported for 1D textures.
- Depth textures are limited to 2D, 2DArray, Cube and CubeArray textures.
- "Offset" variants are limited to 2D, 2DArray, 2D-Depth, 2DArray-Depth and 3D textures.

The functions that cannot be implemented for Metal should be properly handled by the capability system later.

* Fix the failing test, multi-file.hlsl. I am not sure why this change is needed.
* Fix compile errors on macOS 2nd try
* Remove a typo character to fix the compile error
* Trivial clean up
* Remove `as_type` where it was intended as static_cast
* Use a simpler syntax for __intrinsic_asm
* Trivial clean up
* Remove TEST_AFTER_FIXING_CAPABILITY_PROBLEM after fixing normalize
* Fix the failing test properly
* Fix an incorrect setup of Depth-cube texture
--------- Co-authored-by: Yong He <yonghe@outlook.com>
2024-05-14Fix CFG reversal logic for loops (#4162)Sai Praveen Bangaru
Handles a corner case where the first block after the condition on the true-side is another condition. This would currently result in an invalid reverse graph, where the reverse version of the true-block is the merge point for two different branching insts (the reverse version of the loop as well as the second condition). This patch simply adds a blank block when constructing the reverse-loop (similar to critical edge breaking) so that each branch inst in the reversed loop has a unique merge block.
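The fix can be pictured with a toy model: whenever two branch instructions would share one merge block, a fresh blank block that only forwards to the shared block is inserted for the later branch. The Python sketch below is an assumption-laden illustration, not the Slang pass; the function name and the string block labels are hypothetical.

```python
def give_unique_merge_blocks(merge_of: dict) -> dict:
    """Toy sketch: make every branch's merge block unique.

    merge_of maps a branch name to its merge block name and is updated in
    place. Returns a dict of inserted blank blocks -> block they jump to.
    """
    forwards = {}
    used = set()
    for branch, merge in list(merge_of.items()):
        if merge not in used:
            used.add(merge)
            continue
        # The merge block is already claimed by another branch: insert an
        # empty block that unconditionally jumps to it, and merge there.
        blank = f"{merge}.blank{len(forwards)}"
        forwards[blank] = merge
        merge_of[branch] = blank
        used.add(blank)
    return forwards
```

In this toy model, a reversed loop and a second condition that both merge at block `B` end up with merge blocks `B` and `B.blank0`, where `B.blank0` does nothing except jump to `B`.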
2024-05-14Slang: Support UTF-8 with Byte Order Markers (#4135)cheneym2
Slang APIs are documented as taking UTF-8 encoded shader source, though it's not explicitly documented whether it is allowed to include a BOM (Byte Order Marker). This change adds support for UTF-8 BOM markers by virtue of disposing of BOM data. As a bonus, UTF-16 input which can cleanly decode to UTF-8 is now also accepted. Throwing out the BOM on input is done by leveraging existing functionality in "determineEncoding()", however a bug exists there for null-terminated single character input, where the null byte caused a heuristic to guess UTF-16, even though the null byte isn't part of the string. The bug in "determineEncoding" is fixed by only guessing when bytes >= 2 and not looking past the end of the buffer. The 'implicit-cast' test was mistakenly relying on the bug to pass, as its expected file was being read as UTF16 and cropped to zero length due to the bug. The expected output of implicit-cast is updated to pass with the bug fix in place. The decoding of UTF-16 to UTF-8 is done through an existing 'decode' method. This change fixes a bug in UTF16-LE 'decode' where it was decoded as if it were Big-Endian. Adds 3 small tests to ensure the compiler doesn't choke on source files in UTF-8 (with BOM), UTF16-LE, or UTF16-BE. Bonus: Fixes a bug in diagnostic reporting where hex values were incorrectly translated to text, leading to incorrect, possibly truncated strings. Fixes #4046 Co-authored-by: Yong He <yonghe@outlook.com>
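The BOM handling described above can be sketched as follows. This is a hedged Python illustration, not the code in `determineEncoding()`: the function names are hypothetical, and the null-byte heuristic for BOM-less input is an assumed simplification that, like the fix in the commit, only guesses when at least two bytes are available.

```python
import codecs


def determine_encoding(data: bytes):
    """Return (codec name, BOM length in bytes) for raw source bytes.

    Hypothetical sketch of BOM detection plus a minimal UTF-16 guess.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8", len(codecs.BOM_UTF8)
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le", len(codecs.BOM_UTF16_LE)
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be", len(codecs.BOM_UTF16_BE)
    # Only guess from content when there are at least two real bytes, so a
    # single character is never mistaken for UTF-16.
    if len(data) >= 2:
        if data[0] != 0 and data[1] == 0:
            return "utf-16-le", 0
        if data[0] == 0 and data[1] != 0:
            return "utf-16-be", 0
    return "utf-8", 0


def decode_source(data: bytes) -> str:
    """Discard any BOM and decode the remaining bytes to text."""
    encoding, bom_length = determine_encoding(data)
    return data[bom_length:].decode(encoding)
```

Discarding the BOM before decoding means the rest of the pipeline sees the same text whether or not the file was saved with one.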
2024-05-14Propagate warning settings on `Linkage` to IR passes. (#4156)Yong He
2024-05-13Add LoadAligned and StoreAligned methods to ByteAddressBuffers (#4066)Sriram Murali
Fixes #4062

This change enables wide loads/stores for byte-address-buffer backed resources, when the data is accessed at an offset that is aligned.

**Goals**
- Improve performance by issuing wider instructions instead of a sequence of scalar instructions, for loads and stores of byte-address buffers.
- Reduce code size and improve readability of the generated shaders.
- Help naive users as well as ninja programmers generate optimal code.

**Non Goals**
- Help with Structured buffers, or other resources.
- Target compilation time improvements.

**Key changes**
Adds 2 new overloads for Load and Store operations on ByteAddress Buffers.
1. Load / Store with an extra alignment parameter
```
resource.Load<T>(offset, alignment);
resource.Store<T>(offset, value, alignment);
```
2. LoadAligned / StoreAligned with no extra parameter, with the same signature as the original Load / Store.
```
resource.LoadAligned<T>(offset);
resource.StoreAligned<T>(offset, value);
```
- This overload will implicitly identify the alignment value from the base type T of the elementary unit of the resource.

**Supported resources**
1. Vectors: This can be up to 4 elements, i.e. float -- float4.
2. Arrays: This does not have a limit on the number of elements, but as a conservative estimate, we can limit it to a few hundred.
3. Structures: This is used to group a resource of a single type.
```
struct { float4 x; }
```

**Code updates**
- Modified byte-address-ir legalize to handle struct, array and vector kinds of load or store access
- Added custom hlsl stdlib functions to implement all the overloads for Load, Store etc.
- Added C-like emitter, SPIR-V emitter for handling ByteAddressBuffers.
- Added a new core stdlib function intrinsic to wrap around alignOf<T>().
- Added a new peephole optimization entry to identify the equivalent IntLiteral value from the alignOf<T>() inst.
- Added tests to check explicit and implicit aligned Load and Store operations.
2024-05-13[gfx] specify resource view buffer range in bytes (#4149)skallweitNV
* refactor gfx buffer range to use byte range * create buffer view with zero struct stride for ClearUnorderedAccessViewUint/Float * create buffer descriptors on demand * avoid copying gfx.dll --------- Co-authored-by: Yong He <yonghe@outlook.com>
2024-05-13Update CONTRIBUTION.mdJay Kwak
Clarify which `slang.sln` file needs to be used for cmake workflow.
2024-05-11add missing Result to IRayTracingCommandEncoder::bindPipline (#4148)skallweitNV
2024-05-10Fix race-condition and visual artifacts issues (#4152)kaizhangNV
* Fix race-condition and visual artifacts issues

In PerformanceProfiler::getProfiler() we return a static object for the profiler implementation. This is not thread-safe, so change it to thread_local.

There are still some visual artifacts when using Slang as the shading language. We don't know the root cause yet, but found that it is related to our loop inversion algorithm. So stage this feature for now, turning it into an internal option that is off by default. We will re-enable it after more investigation of this optimization. Filed a new issue, 4151, to track it.

* Add '-loop-inversion' to the few tests
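The static-to-thread-local change can be illustrated with Python's `threading.local`, which plays the role that `thread_local` plays in C++. This is a sketch under assumptions: the `Profiler` class and `get_profiler` function are hypothetical stand-ins for the C++ profiler, not its actual interface.

```python
import threading


class Profiler:
    """Hypothetical profiler: records samples without any locking."""

    def __init__(self):
        self.samples = []


# One attribute namespace per thread, analogous to a thread_local object.
_thread_state = threading.local()


def get_profiler() -> Profiler:
    """Return the calling thread's own profiler, creating it on first use."""
    profiler = getattr(_thread_state, "profiler", None)
    if profiler is None:
        profiler = Profiler()
        _thread_state.profiler = profiler
    return profiler
```

Because each thread gets its own instance, concurrent compilations never append to the same sample list, which removes the race without adding a lock.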
2024-05-10More Metal Intrinsics. (#4143)Yong He
2024-05-09fix typo (#4144)Tomáš Pazdiora
Co-authored-by: Yong He <yonghe@outlook.com>
2024-05-09Add stdlib tests for `clamp` derivatives which also checks `max` and `min` derivatives (#4136)Sai Praveen Bangaru
* Add stdlib tests for `clamp` derivatives which also checks `max` and `min` derivatives
* Extend test
2024-05-08Metal: propagate and specialize address space. (#4137)Yong He
2024-05-08Support `getAddress` of a single-element vector swizzle. (#4138)Yong He
Fixes #4112.