slang.git - Making it easier to work with shaders

Age	Commit message (Collapse)	Author
2022-01-10	Test for dynamicDispatch/resource internal error (#2071)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Test for internal error with resource/dynamic dispatch. * Fix typo. Co-authored-by: Theresa Foley <tfoleyNV@users.noreply.github.com>
2021-12-21	Language experiments (#2068)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Moved to experiments. Added some more tests. * More tests around associated types. * Return interface tests. * More tests.
2021-12-21	Assorted disabled tests around generics (#2062)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Some generic experiments. * Add some more generic tests. * More generic experiments/issues. * Some more generic tests. * Remove erroneous test. * Small improvements. * Disable test that was accidentally enabled. * Add equality-2.slang. * Some more generic tests. * Issues around type inference. * Some more generic tests. * Tuple experiment. * Generic interfaces don't seem to be supported. * Add inheritance test. * Alternative array type issue.
2021-12-03	U/Int64 coverage on slang-llvm JIT (#2040)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Update slang-llvm dependencies.
2021-11-30	Fix issue around constant folding/bool (#2036)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Fix bool handling in constant folding for generic parameters.
2021-11-11	Fix two test cases that were causing problems (#2011)	Theresa Foley
	First, we have a CUDA-only test that simply needed a format name to be changed to match the new conventions in `gfx`. Second, we have one of the "active mask" tests that seems to produce different results locally for developers (under Vulkan) than it does on CI. This is almost certainly down to differences in GPUs and/or drivers. The inconsistency ultimately proves the point that I was trying to make when I wrote those tests - the "active mask" concept is effectively meaningless as exposed in D3D and Vulkan because it has not been specified in a way that allows programmers to reason about its value, and drivers have implemented wildly different interpretations of its supposed semantics for so long that there is no real hope of turning `WaveGetActiveMask()` into something that returns a well-defined value in any but the most trivial cases. TLDR: I disabled that test for Vulkan, which means it is completely disabled.
2021-11-03	Fix an infinite-recursion bug in type-checking (#2004)	Theresa Foley
	Fixes #1990 The underlying problem here is in the `ExtractExistentialType` AST node class. An "existential" in current Slang is typically a value of interface type. When such a value is used in an operation, the type-checker "opens" the extistential so that subsequent type-checking steps can work with the (statically unknown) specific type of the value stored inside. The `ExtractExistentialType` AST node represents the type of an existential that has been "opened" in this way. When the front-end performs lookup "into" a value with one of these types, it nees to use a reference to the original interface declaration with a "this-type substitution" that refers to the "opened" type (a this-type substitution tells the compiler the concrete type it should use in place of `This` in signatures within the interface; it allows compiler to "see" the right associated type definitions to use in a context). Prior to this change, the implementation would store the specialized reference to the original interface declaration in the `ExtractExistentialType` node as part of its state. The catch there is that the specialized interface reference indirectly refers to the `ExtractExistentialType` AST node itself, creating a circularity. As soon as the front-end performs any operation that tries to recurse over that structure, it would go into an infinite loop. The fix here sounds kind of like a hack, but seems to be pretty nice in practice. Instead of always storing the specialized interface reference, we instead store the few values that are needed to construct it, and then create and cache the actual reference on-demand. The on-demand created fields are not considered part of the state of the AST node for any kind of recursion or serialization, so they avoid the original problem. A single test case was added that represents the original bug, and confirms the fix.
2021-10-29	Use detected shader model in gfx/d3d12. (#1996)	Yong He
	* Use detected shader model in gfx/d3d12. * Enable all d3d12 tests on Github. * Improve d3d12 software device detection. * Disable d3d12 tests on github for now. Co-authored-by: Yong He <yhe@nvidia.com>
2021-10-27	Update glslang binaries (#1991)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Use updated slang-binaries that have SPIR-V diagnostics improvements. * Re-enable nv-ray-tracing-motion-blur, because with SPIR-V diagnostic fixes in glslang - there shouldn't be spurious errors from glslang compilation. * If optimization fails use the SPIR-V we have. * Update slang binaries. * Hack to disable gfx unit tests for now to try and get CI pass for this PR.
2021-10-27	SPIR-V fixes (#1992)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Use updated slang-binaries that have SPIR-V diagnostics improvements. * Re-enable nv-ray-tracing-motion-blur, because with SPIR-V diagnostic fixes in glslang - there shouldn't be spurious errors from glslang compilation. * If optimization fails use the SPIR-V we have. * Update SPIR-V headers and generated files. Updated documentation. * Update spirv-headers/tools. Revert slang-binaries. * Remove hack around spir-v optimization as no longer needed. disable nv-ray-tracing-motion-blur.slang
2021-10-26	Runs all gfx unit tests through a 'test proxy' (#1981)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Support for test proxy. * Turn on testing using proxy. * Don't pass sink into check of downstream compiler. * Small change to kick off build. * Remove register specification on transcendental. * Increase poll timeout. Small improvements to proxy. * Disable gfx unit tests. * Put test runner in shared library mode by default. * Change comment. Kick off another CI test. * Small edit to kick off builds. * Run unit tests on proxy. * Turn on using proxy for now. * Enable swift shader. * Fix typo. Add exception support. * Make the default spwan type SharedLibrary Use isolation for gfx unit tests. * Update slang-binaries. * Fix typo. * Report unit test output information.
2021-10-26	Expanded gfx::Format to include additional formats (#1982)	lucy96chen
	* Format list updated with additional formats supported by both D3D and Vulkan; D3DUtil::getMapFormat() and VkUtil::getVkFormat() updated to include additional formats; GFX_FORMAT() updated with all additional formats (BC compression unfinished) * Finished updating GFX_FORMAT with newly added formats and sizes; Pixel size is now tracked using the FormatPixelSize struct containing the values for bytes per block and pixels per block to accomodate BC formats; Updated gfxGetFormatSize and associated sub-calls to return FormatPixelSize instead of uint8_t; Most calls to gfxGetFormatSize() updated to reflect changes, a couple calls still unupdated * Changes to accommodate new formats finished, debugging slang-literal unit test * First format unit test working * One test added for BC1Unorm and RGBA8Unorm_SRGB, both passing * Refactored format testing code to merge BC1Unorm and RGBA8Unorm SRGB into a single file * All unit tests added for BC and Srgb formats * Most tests added and working; Added five additional formats (still need tests) and made the appropriate changes to support these; createTextureView() modified for D3D11, D3D12, and Vulkan to take into account the format specified in the texture view desc when the texture's format is typeless * Format enums renamed to more closely match their D3D counterparts; Added a universal float and uint buffer and buffer view for use across all Format tests * Remaining tests added; D3D12 tests pass, but Vulkan crashes in BC1_UNORM and D3D11 spits out a bunch of D3D11 Errors (but supposedly passes) * re-run premake * Added Sint versions of test shaders; Vulkan and D3D11 tests also pass * Size struct for format unit tests no longer use initializer lists * Fixed a Size struct missed in the previous pass * Fixed minor bugs causing tests to fail * Added documentation detailing all currently unsupported formats * Skip tests causing unsupported format warnings due to swiftshader * updated several test using old Format enum names * Revert change to compareComputeResult() that was added for debugging purposes * DEBUGGING: Added prints to identify which formats are failing on CI * Reverted attempted debugging changes; Fixed texture2d-gather.hlsl to use updated Format enums * Fixed incorrect array sizes in d3d11 _initSrvDesc() * Commented out further tests that produce unexpected results when tested for Vulkan with swiftshader * Revert "Merge branch 'expanded-format-support' of https://github.com/lucy96chen/slang into expanded-format-support" This reverts commit 20008f0d3ecc3b1405ecac8c138edaa3cd37ed6b, reversing changes made to 6081e95827315fee50e18409394d5abd62fac787. * Added a fuzzy comparison function for use with floats * submodule update * Revert messed up changes caused by previous revert after automatically merging on github
2021-10-21	Passing associated type arguments to existential parameters + packing for ↵	Yong He
	`bool`. (#1987) * Passing associated type arguments to existential parameters + packing for `bool`. * fix typo Co-authored-by: Yong He <yhe@nvidia.com>
2021-10-21	Diagnostic for no type conformance + bug fix. (#1985)	Yong He
	* Diagnostic for no type conformance + bug fix. * Fixes. * Fix. * Include heterogeneous example only with --enable-experimental-projects premake flag Co-authored-by: Yong He <yhe@nvidia.com> Co-authored-by: jsmall-nvidia <jsmall@nvidia.com>
2021-10-19	Generalize heterogenous code emit (#1968)	David Siher
	* Bring heterogeneous-hello-world back up to date. * Reintroduced heterogeneous-hello-world into the premake * No longer uses compiled bytecode for entry point, instead a loadModule call is hardocoded with the slang file name. * Entry point is, similarly, hardcoded for now. * Added a bypass to slang-legalize-types for an unneeded GPUForeach check * Run premake and change to relative path * Removed experimental and added README * Add prebuild command to premake for heterogeneous example * Pass in entry point as parameter (also remove shader bytecode) * Pass in module name as parameter * Squashed commit of the following: commit 5b13b57fe600724344c556fe4309a5d6bb3d39ab Author: Kai Yao <kyao@nvidia.com> Date: Thu Oct 7 23:38:50 2021 -0700 Return diagnostics data when encountering module load error by exception (#1966) commit 112e1515c30fa972ff56f91514b70946153c718c Author: jsmall-nvidia <jsmall@nvidia.com> Date: Thu Oct 7 16:12:29 2021 -0400 Disable test crashing CI (#1965) * #include an absolute path didn't work - because paths were taken to always be relative. * Disable test that appears to be crashing. commit da32069a0c1c8c723d7ef45100049a8f0dd5d9c4 Author: Kai Yao <kyao@nvidia.com> Date: Mon Oct 4 13:58:51 2021 -0700 Modified barrier API to accept multiple resources per call (#1959) Co-authored-by: Yong He <yonghe@outlook.com> commit 97bb82ebcdf8f1391b9d93b5a8d7b1dfc4e88e52 Author: jsmall-nvidia <jsmall@nvidia.com> Date: Mon Oct 4 14:15:51 2021 -0400 Removing exceptions from core/compiler-core (#1953) * #include an absolute path didn't work - because paths were taken to always be relative. * Refactor Stream. Working on all tests. * Split out CharEncode. * Make method names lower camel. m_prefix in Writer/Reader * Tidy up around CharEncode interface. * Small improvements around encode/decode. * Better use of types. * Remove readLine from TextReader. * Remove exceptions from Stream/Text handling. * Fix some typos. * Fix tabbing. * Fix missing override. * Remove remaining exception throw/catch via using signal mechanism. * Remove exceptions that are not used anymore. * Document the Stream interface. * Remove index for decoding 'get byte' function. * Fix CharReader -> ByteReader. commit b3dfe383c6d31ff3dbd76dcfb32de8d536382f3e Author: lucy96chen <47800040+lucy96chen@users.noreply.github.com> Date: Mon Oct 4 09:46:33 2021 -0700 Get native handles for TextureResource and BufferResource (#1960) * Added getNativeHandle() to TextureResource and BufferResource; Implemented getNativeHandle() in Vulkan and D3D12; Added new unit test files for the aforementioned implementation * Added missing getNativeHandle() implementations to renderer-shared.cpp and CUDA * Finished new getNativeHandle() unit tests for ITextureResource and IBufferResource; Modified ICommandQueue and ICommandBuffer unit tests to call QueryInterface to convert to IUnknown then back and compare resulting pointers for equality * Unit tests updated and pass locally * Cast m_buffer.m_buffer and m_image to uint64_t commit 35bca4cc432613af3926da3bed217a6baa9cbd26 Author: lucy96chen <47800040+lucy96chen@users.noreply.github.com> Date: Fri Oct 1 13:08:25 2021 -0700 Add getNativeHandle() to ICommandQueue and ICommandBuffer (#1952) * Added support for getting command buffer and command queue handles to ICommandBuffer and ICommandQueue; D3D12Device, VkDevice, and DebugDevice modifieid to implement this new functionality; immediate-renderer-base.cpp also modified to implement the new functions * Removed excess boilerplate * Changed readRef() to get() in D3D12 getNativeHandle() implementation for ICommandBuffer and ICommandQueue * Added unit tests for new getNativeHandle() implementations, unfinished * Queue test added; Minor cleanup changes * getBufferHandleTestImpl() now closes the command buffer before returning * Added getNativeHandle() implementations to CUDADevice * Added comment clarifying that the Vulkan check is checking for a null handle, which is defined to be 0 commit 6c6200f547c7387598743b23bb3c8f0d375d9494 Author: Kai Yao <kyao@nvidia.com> Date: Thu Sep 30 20:25:34 2021 -0700 VK Resource Barrier (#1955) * Resource barrier API and VK implementation * Stub implementations * Handle VK Acceleration Structure flag * Add a couple more cases to pipeline barrier stages commit 627fc976bac5c2381dbace9c7925cb6a68b8de12 Author: Yong He <yonghe@outlook.com> Date: Thu Sep 30 19:48:47 2021 -0700 Fix aarch64 build on github (#1957) Co-authored-by: Yong He <yhe@nvidia.com> commit 122d701513e116856bd59c999221ce36a373d7db Author: Yong He <yonghe@outlook.com> Date: Thu Sep 30 17:51:56 2021 -0700 Fix GitHub release (#1956) * Fix aarch64 release build config. * Fix for WinAarch64 build. * Update premake for embed-std-lib build on aarch64. * `platform` fix for aarach64 build. * Try revert back to use absolute output path for slang-stdlib-generated.h * Fix * fix Co-authored-by: Yong He <yhe@nvidia.com> commit aa8f7b899b7b562b3d3c6e25c3da41569505e70c Author: Chad Engler <englercj@live.com> Date: Wed Sep 29 13:02:47 2021 -0700 Fix ARM64 detection for MSVC (#1951) commit 6736b0c1c5fa3e89bc561eb7965a1a0d17af3466 Author: Yong He <yonghe@outlook.com> Date: Wed Sep 29 11:29:46 2021 -0700 Add ISession::loadModuleFromSource. (#1950) Co-authored-by: Yong He <yhe@nvidia.com> commit d8e452412e14a6a8ba137f2adcae13b398e5cecb Author: Yong He <yonghe@outlook.com> Date: Tue Sep 28 15:03:03 2021 -0700 Fix AbortCompilationException leaking through loadModule API. (#1949) * Fix AbortCompilationException leaking through loadModule API. * Update. * Fix. Co-authored-by: Yong He <yhe@nvidia.com> commit cdf1b2c007fefdca128584d2a9f63dec3d350e16 Author: Yong He <yonghe@outlook.com> Date: Tue Sep 28 11:54:24 2021 -0700 Improvements to the unit test framework. (#1948) commit af788b62e18bbd55cd748ad60400a74cf1bc93ee Author: lucy96chen <47800040+lucy96chen@users.noreply.github.com> Date: Fri Sep 24 16:53:41 2021 -0700 Add existing device handle support unit test (#1946) commit bec8e6aec85b6e3f875c58bdd59eb15613978358 Author: Yong He <yonghe@outlook.com> Date: Fri Sep 24 11:33:44 2021 -0700 Move existing unit tests to a standalone dll. (#1945) commit f2a3c933bc11a498c622fa18694c84beca8ca031 Author: lucy96chen <47800040+lucy96chen@users.noreply.github.com> Date: Thu Sep 23 12:19:49 2021 -0700 Add method to retrieve native handles (#1944) * Added a getNativeHandle() method that retrieves the natively created handles; Modified RendererBase, VKDevice, D3D12Device, and DebugDevice to implement this new method * Moved ExistingDeviceHandles out of Desc directly inside IDevice and renamed to NativeHandles; Modified calls accessing the struct accordingly in RendererBase, DebugDevice, VKDevice, and D3D12Device * Minor cleanup changes (renames, etc.) commit b9b398d038b524f15a86ff27cd6888d54e8754e0 Author: Yong He <yonghe@outlook.com> Date: Wed Sep 22 10:06:59 2021 -0700 Add gfx unit testing framework. (#1943) * Add gfx unit testing framework. * Fix compilation error. * Reset gfxDebugCallback after render_test. * Pass enabledApi flags through. * Fix for code review suggestions. Co-authored-by: Yong He <yhe@nvidia.com> commit 6e9cee69b3588ddae09b08b9f580f59ad899983f Author: lucy96chen <47800040+lucy96chen@users.noreply.github.com> Date: Tue Sep 21 18:46:32 2021 -0700 Support for existing device/instance handles in Vulkan (#1942) commit b1f04c8544c650de3947955ca68f679535d249aa Author: lucy96chen <47800040+lucy96chen@users.noreply.github.com> Date: Wed Sep 15 20:22:45 2021 -0700 Allow D3D12Device to use an existing device handle (#1940) * Added a new field for an existing device handle to IDevice::Desc; Modified D3D12Device::initialize to set the device stored in desc if it already exists instead of creating a new one * Turned existingDeviceHandle into a struct containing an array of two elements; Updated D3D12Device::initialize to match changes to existingDeviceHandle; Updated comments * Fixed style error for ExistingDeviceHandles struct commit 2f7b9f5ae8be21c6c1d75ae9caefbc7b3f8986a9 Author: Pablo Delgado <private@pablode.com> Date: Thu Sep 16 01:17:57 2021 +0200 Fix incorrect WIN32 macros and missing Windows.h inclusion (#1939) * Replace WIN32 preprocessor macros with _WIN32 * Add missing Windows.h include for InterlockedIncrement commit 11d43642008905ac69a3832eb8a9b2ae7b785f86 Author: Yong He <yonghe@outlook.com> Date: Tue Sep 14 11:36:44 2021 -0700 Avoid upcasting to f32 in 16bit float-uint bit cast. (#1938) Co-authored-by: Yong He <yhe@nvidia.com> commit 502aa3812a82cf0d091cff0c67804e4ee448ac78 Author: David Siher <32305650+dsiher@users.noreply.github.com> Date: Tue Sep 14 12:59:55 2021 -0400 Bring heterogeneous-hello-world back up to date. (#1935) * Bring heterogeneous-hello-world back up to date. * Reintroduced heterogeneous-hello-world into the premake * No longer uses compiled bytecode for entry point, instead a loadModule call is hardocoded with the slang file name. * Entry point is, similarly, hardcoded for now. * Added a bypass to slang-legalize-types for an unneeded GPUForeach check * Run premake and change to relative path * Removed experimental and added README Co-authored-by: Yong He <yonghe@outlook.com> * Revert "Squashed commit of the following:" This reverts commit 4f665858d65f7c332c616ef6db9fdafa1c5e0b9f. * Run premake * Remove prebuild command (only works on Windows?) * Rerun premake * Fix heterogeneous prebuild command * Remove linux specific prebuild command * Fix prebuild command (again) * Change target from dxbc to hlsl to see if that fixes linux issues * Use Path::getFileNameWithoutExt * Change string-literal.slang.expected to have extra filename in decoration Co-authored-by: Yong He <yonghe@outlook.com>
2021-10-18	GFX: implement mutable shader objects. (#1963)	Yong He
	* GFX: implement mutable shader objects. * Revert unnecessary changes * Revert more changes. * Fix clang errors. * Fix clang/gcc errors. * Fix clang errors. * Remove CPU test. * Fix after merge. * Fix after merge. * Remove gl test * Code review fixes. * Fixing all vk validation errors. * Flush test output more often. * Fix a crash in `specializeDynamicAssociatedTypeLookup`. * temporarily disable std-lib-serialize test to see what happens * Fix crashes. * Make sure cpu gfx unit tests are properly disabled on TeamCity. * Disable cpu test. * Fix. * Fix cuda. * Disable nv-ray-tracing-motion-blur Co-authored-by: Yong He <yhe@nvidia.com>
2021-10-11	Support for GL_NV_ray_tracing_motion_blur vk extension (#1964)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Upgrade to GLSLANG 11.16.0+ * Small edit to readme - really to kick another build. * Upgrade slang-binaries to include new glslang binaries. * Update slang-binaries to include linux-x86 * Upgrade slang-binaries. * Support for GL_NV_ray_tracing_motion_blur extension.
2021-10-08	Hotfix/reenable vk test (#1969)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Reenable erroneously disabled test.
2021-10-07	Disable test crashing CI (#1965)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Disable test that appears to be crashing.
2021-10-04	Removing exceptions from core/compiler-core (#1953)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Refactor Stream. Working on all tests. * Split out CharEncode. * Make method names lower camel. m_prefix in Writer/Reader * Tidy up around CharEncode interface. * Small improvements around encode/decode. * Better use of types. * Remove readLine from TextReader. * Remove exceptions from Stream/Text handling. * Fix some typos. * Fix tabbing. * Fix missing override. * Remove remaining exception throw/catch via using signal mechanism. * Remove exceptions that are not used anymore. * Document the Stream interface. * Remove index for decoding 'get byte' function. * Fix CharReader -> ByteReader.
2021-09-13	Bug fix in 16bit type emit, vk validation error fix. (#1936)	Yong He
	+ Implement bit_cast between float16 and uint16 in GLSL. + Enable pack-any-value-16bit test on vk. Co-authored-by: Yong He <yhe@nvidia.com>
2021-09-10	First Slang LLVM integration (#1934)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * First integration with 'slang-llvm'. * Fix project. * Fix test output. * First pass assert support. * Add inline impls for min and max. * Add abs inline abs impl for llvm. * Make abs not use ternary op * Fix typo in slang-llvm.h * Sundary fixes to make remaining tests using llvm backend pass.
2021-09-09	`reinterpret` and 16-bit value packing. (#1933)	Yong He
	* `reinterpret` and 16-bit value packing. * Update `half-texture` cross-compile test reference result. * Revert inadvertent reformatting of slang-ir-inst-defs.h Co-authored-by: Yong He <yhe@nvidia.com>
2021-09-03	Fix crash: dynamic dispatch of generic interface method. (#1929)	Yong He
	* Fix crash: dynamic dispatch of generic interface method. * Fix memory error. Co-authored-by: Yong He <yhe@nvidia.com>
2021-08-26	Add API to control interface specialization. (#1925)	Yong He

2021-08-25	Add `createDynamicObject` stdlib function. (#1923)	Yong He
	This function takes a user provided `typeID` and arbitrary typed value, and turns them into an existential value whose `witnessTableID` is `typeID` and whose `anyValue` is the user provided value. This allows the users to pack the runtime type id info in arbitrary way.
2021-08-17	Add GLSL450 intrinsics to SPIRV direct emit. (#1921)	Yong He
	* Add GLSL450 intrinsics to SPIRV direct emit. * Fix. * Fix compiler error. * Fix. * Fix compiler error. * Make direct-spirv tests actually run.
2021-08-12	Further implementation of SPIRV direct emit. (#1920)	Yong He
	* Further implementation of SPIRV direct emit. This change implements: - Struct, Vector, Matrix and Unsized Array types. - Basic arithmetic opcodes, vector construct, swizzle etc. - getElementPtr, getElement, fieldAddress, extractField. - SPIRV target intrinsics with SPIRV asm code in stdlib. - RWStructuredBuffer and StructuredBuffer. - Pointer storage class propagation. - Control flow. * Fix.
2021-07-21	Work to mitigate SPIR-V bloat (#1914)	Theresa Foley
	* Work to mitigate SPIR-V bloat SPIR-V is not an especially compact format, but some patterns in how Slang generates code and then runs it through `spirv-opt` lead to many redundant field-by-field copy operations being emitted. This change attempts to address some of the resulting bloat from the Slang side of things. Note: experimentation shows that the bloat is less pronounced when running either no SPIR-V optimizations or full SPIR-V optimizations, so it is also likely that the bloat should be addressed by changing which `spirv-opt` passes the Slang compiler runs in default (`-O1`) builds. Such changes should come as a distinct pull request. This change primarily does two things: First, the code generation strategy for passing arguments to `out` and `inout` parameters has been changed. In the past, the compiler would always copy the argument value into a temporary, then pass the address of the temporary, and then write back the value after the call. The new code generation strategy attempts to identify when an argument value already has a simple address in memory and passes that address directly when possible. This eliminates many copy operations that occur before/after calls to functions with `out`/`inout` parameters. Second, we introduce an IR optimization pass that detects call sites where the entire contents of a buffer (usually a constant buffer) is being passed to a callee function, such that many bytes are loaded and then passed even if only very few are used in the callee. The pass moves the load operations from the caller to a specialized version of the the callee where possible (e.g., when the constant buffer in question is a global shader parameter). Doing this eliminates another major category of copies. Notes: * The IR lowering logic is complicated by the fact that several kinds of l-values (values that are usable as the desitnation of assignment, or for `out`/`inout` arguments) are not actually addressable. An easy example is a non-contiguous swizzle like `v.xwz` on a `float4`, where the value occupies 12 bytes, but not 12 consecutive bytes with a single address. There are many more corner cases like that and the IR lowering pass carries a lot of complexity to deal with them. A more systematic overhaul is due some time soon. * The IR representation of `out` and `inout` parameters deserves some careful scrutiny when making these kinds of changes. The official semantics of `inout` in HLSL has been "copy in copy out" (and `out` is just "copy out") which is observably different from any solution that passes in the address of an l-value directly. By making this change we are saying that Slang's semantics are not precisely those of legacy HLSL, and that our semantics for `inout` parameters are closer to those of `inout` in Swift or of a mutable borrow in Rust. In the Swift case the implementation can freely pass the underlying storage of an l-value or the address of a temporary, and valid programs may not observe the different. It is thus illegal to observe the value in a storage local while a mutation to that location is "in flight." All of this is way more detailed and technical than 99% of Slang users will ever care about, but importantly it gives us semantic cover to eliminate these copies in the IR, and also to emit output C++ code that implements `out` and `inout` as by-reference parameter passing. * There was an exsting generic pass for specializing functions based on call sites that uses a "template method" style of pattern to customize its behavior. That pass needed to be generalized to handle this use case because it had previously operated on the assumption that the "desire" to specialize a callee function must be driven by the parameter declarations of that function, and not on the argument values passed in. The code has been slightly refactored to allow the policy for specialization to consider both parameters and arguments. * Unsurprisingly, a bunch of the GLSL (and thus SPIR-V) generated has changed with this work, so several baseline `.slang.glsl` files needed to be updated. * This change is incomplete in that it does not address broader cases of buffer loads, including both partial loads from constant buffers (just loading one field, but a field that uses a "large" structure type), and loads from multi-element buffers (a lot from a structured buffer where the element type is "large"). The main question in each of those cases is how to define how "large" a structure needs to be before we decide to try and sink loads into callee functions like this. In the worst case, sinking loads in this way may actually create more memory traffic (because the same values get loaded in multiple callee functions). * fixup: run premake * fixup: typo
2021-07-09	Enable testing with Swiftshader. (#1906)	Yong He

2021-07-08	Allow render-test to run inline ray tracing tests. (#1903)	Yong He
	* Update VS projects to 2019. * Empty commit to trigger build * Implement gfx inline ray tracing on D3D12. * Allow render-test to run inline ray tracing tests. Co-authored-by: Yong He <yhe@nvidia.com>
2021-06-25	Fixes related to combined texture/sampler types (#1894)	Theresa Foley
	* Fixes related to combined texture/sampler types Work on #1891 Our intention has always been to support combined texture/sampler types in Slang, both targets like OpenGL where that is the only option available and for targets like Vulkan where it can be beneficial to performance. Because Slang's current users mostly focus on D3D12+Vulkan codebases, they strongly prefer separate textures and samplers, and the relevant support code in Slang has "bit-rotted" over many releases until the functionality that was there isn't useful any more. This change significantly overhauls the implementation of combined texture/sampler types and adds a test that uses them in the hopes of avoiding future regressions. The new combined texture/sampler types use the prefix `Sampler`, so where there is an existing standalone `Texture2D` type the equivalent combined texture/sampler type will be `Sampler2D`. The intention is that this naming mirrors the GLSL conventions (where the type is `sampler2d`) while following HLSL naming precedent (to the extent it exists). The operations available on a `Sampler2D` are intended to be those that are available on a `Texture2D`, and it is just that in cases where the `Texture2D` operation would take a separate sampler argument: Texture2D t = ...; SamplerState s = ...; float4 result = t.Sample(s, uv); the equivalent `Sampler2D` operation just elides that argument: Sampler2D s = ...; float4 result = s.Sample(uv); In terms of implementation, there are a lot of subtle details here: * I've tried to use the same metaprogramming logic that generates all the stdlib declarations for `Texture` to also generate `Sampler` in the hopes that this helps keep them in sync. * The big catch to the above is that it means that for certain operations the indidces of parameters depend on whether or not an explicit sampler parameter is used. Rather than try to tweak the indices in the stdlib generation logic (which is already complicated) I went and added Yet Another Hack to the logic that handles intrinsic definition strings. Basically, the special-case handling of `$p` has been modified so that it also applies a negative offset to future parameter references in the same intrinsic string. * Trying to actually bring this up in our test framework revealed that the "flat" reflection API was seemingly not reflecting combined texture/sampler types correctly at all (it was reflecting them as just plain textures). Other than that issue, the Vulkan path seems to work fine with combined texture/samplers. * I also had to add logic to the `TEST_INPUT` parsing to re-introduce handling of the combined types (that was something I consciously left out to reduce the amount of code in the earlier refactor there). Luckily, the architecture is such that a combined texture/sampler can leverage most of the existing logic for the separate cases. * fixup: reveiw feedback
2021-06-16	Initial support for variadic macros (#1887)	Theresa Foley
	This change adds support for variadic macros in the C-style preprocessor, e.g.: #define DEBUG(MSG, ...) print(__FILE__, __LINE__, MSG, __VA_ARGS__) Similar to the gcc preprocessor, this feature supports both named variadic macro parameters and unnamed ones (which then default to `__VA_ARGS__`. The implementation work is mostly straightforward, although there are a few subtle design choices worth mentioning: * A variadic macro is represented by it having a variadic parameter that is part of the ordinary parameter list. * Argument parsing does not detect whether the macro being invoked is variadic and collect/combine arguments to form a single argument value for the variadic parameter. This is motivated by the need for some extensions to differentiate a variadic parameter receiving a single empty argument vs. zero arguments. * Because any reference to the variadic parameter needs to expand to the comma-separated arguments that match it, the logic for turning a macro parameter reference into a list of tokens has been factored out into a subroutine that handles the details. * The choice in the earlier refactor to have a macro invocation collect all the argument tokens (including the intervening commas) into a single token list seems to pay off here, because it means that the tokens in the expansion of a variadic parameter reference were already stored contiguously. * The special-case logic for handling an empty argument list had to be tweaked again to ensure that an empty argument list is treated as having zero arguments for the variadic parameter. Note that historically C did not define the behavior of this case, and always required at least one argument for any variadic macro parameter. * The logic for checking whether the number of arguments to a macro invocation is valid needed to handle variadic and non-variadic macros as distinct cases. There really isn't much overlap in how the checks need to work, even if we tried to change the underling representation. The main missing feature here is any way to discard a comma in a macro body that appears before a variadic parameter reference, e.g.: #define DEBUG(...) print("debug:", __VA_ARGS__) In this case, an empty invocation list `DEBUG()` will expand to `print("debug:",)` - a call with a trailing comma in the argument list. If users end up needing a way to discard commas in cases like this, we have many options we can consider. This change does not implement any of them to keep the initial work as minimal as possible.
2021-06-10	CUDA layout corner cases/testing (#1881)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Add support for sizeOf/alignOf/offsetOf to stdlib. Add $G intrinsic expansion that works of the generic parameters not the param type * Test cuda layout. * Fix CUDA layout issues. Fix reflection to handle other built in types. Fix __offsetOf * Tests of reflection and layout as reported directly from CUDA. * Comment about use of aligned size as size. * Fix warning from VS. * Check alignment is pow2. * Small improvements to alignment calcs. * Tab to spaces. * Fix alignment pointer sizes on 32 bit OS for CUDA. * Fix CUDA reflection on 32 bit.
2021-06-09	Enable some VK texture tests (#1878)	jsmall-nvidia

2021-06-08	Fix RWTexture issues on CUDA (#1876)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Re-enable CUDA RWTexture tests. Re-enable RWTexture1D test Make sure tests have only single mip for RWTexture (required for CUDA) * Fix issue with reading CUDA surface. Re-enable working CUDA RWTextureTest. Enable 1D case.
2021-06-06	Fixed issue around 4xFloat16 texture on CUDA (#1874)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Fixes around Float16. Incorrect calculation of 'elementSize'.
2021-06-06	Include a "stack trace" with nested-import errors (#1872)	T. Foley
	* Include a "stack trace" with nested-import errors When errors occur in nested `#include` files it is often helpful to have a "stack trace" / traceback of the `#include` chain that led from a root translation unit to the file with an error. This change implements a similar feature for `import`s. It is worth noting that `import`s don't really require this kind of compiler support the way `#include`s do because the intention is that the meaning of an `import`ed file does not depend on the order or nesting of `import`s. As such, when trying to fix an error in an `import`ed file, you usually don't care how it came to be `import`ed into your shaders. The use case here is somebody adapting a large body of Slang code to use in a different codebase, such that they have certain `.slang` files they don't actually intend to have compile correctly, and they want to be able to diagnose how they came to include those files when/if they cause problems. The actual feature implementation is pretty simple because we already track a stack of active `import`s so that we can detect and diagnose recursive `import`s. This change simply changes the disagnostics when there is an error in imported code so that instead of just noting the inner-most `import` site it lists all the `import` sites that were active at the time. The change includes a test case to confirm that the behavior works (at least for the case of a parse error). * fixup: test outputs Co-authored-by: Yong He <yonghe@outlook.com> Co-authored-by: jsmall-nvidia <jsmall@nvidia.com>
2021-06-06	Fix a bug in preprocessor "busy" logic (#1875)	T. Foley
	* Fix a bug in preprocessor "busy" logic This bug manifested in both incorrect preprocessor output for certain complicated cases, and also (more importantly) a use-after-free issue. One of the "clever" design choices in the Slang preprocessor implementation is that the set of "busy" macros during expansion is implicitly defined by a linked list of those invocations that are actively being read from as part of the input stack. This logic works very well for checking whether a macro name is busy before triggering expansion, and for computing what macros should be considered busy during expansion of an object-like macro. The problem case here was with function-like macros where the preprocessor was re-using the same list of busy macros that had been fetched when reading the macro name, but doing so after all the macro argument tokens had been read. Because additional tokens had been read from the input stream stack, there was no guarantee that the invocations that had been active before were still live. The new logic computes the set of busy macros fresh before starting expansion of a function-like macro invocation. A test is included to ensure that the case that showed the use-after-free bug has been fixed. In addition, the new logic is careful to compute the busy macros only based on where subsequent tokens will come from and not based on any macros that might have contributed to the argument tokens themselves. A test case has been added that relies on getting this detail correct. * Update slang-preprocessor.cpp Remove a test typo. Co-authored-by: jsmall-nvidia <jsmall@nvidia.com>
2021-06-02	Various Fixes to gfx, reflection and emit. (#1867)	Yong He
	* Various Fixes to gfx, reflection and emit. - Fix GLSL emit to properly output `bitsTo` functions for `IRBitCast` insts. - Add line directive mode setting for `ISession`. - Extend `TypeLayout::getElementStride` to handle `VectorType` case. - Fix `IDevice::readBufferResource` 's D3D12 implementation to copy only the requested bytes out. - Fix `render-test` to use the `ISession` from `gfx` instead of creating its own `ISession` to make sure `gfx` and `render-test` agree on WitnessTable and RTTI IDs. - Extend `render-test` to support filling vector and matrix values in the new `set x = ...` TEST_INPUT syntax. - Add a `dynamic-dispatch-15` test case to make sure packing / unpacking works correctly across all targets, and to make sure render-test's RTTI/WitnessTable ID filling logic is correct for non-trivial cases. * Remove default-major test * Fix cyclic reference in `ExtendedTypeLayout`. * Move `lineDirectiveMode` setting to `TargetDesc`. Add `structureSize` to `TargetDesc` and `SessionDesc` for future binary compatibility. * Cleanup. Co-authored-by: Yong He <yhe@nvidia.com>
2021-05-28	Glslang refactor bugfix (#1863)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Fix issue with with SLANG_ENABLE_GLSLANG_SUPPORT * Update expected output from glslang-error.glsl * Fix bug in glsl dissassembly. * Make ExtensionTracker available even if source is not emitted. * Only explicitly set extension tracker based on capability bits, if we are in pass through. * Small simplification of invoke sourceEmit.
2021-05-27	Fix initializer lists for derived structs (#1862)	T. Foley
	If the user has a derived `struct` type: ```hlsl struct Base { int b = 1; } struct Derived : Base { int d = 2; } ``` Then it is still reasonable for them to want to use initializer lists when declaring variables using the `Derived` type: ```hlsl Derived x = {}; Derived y = { 7, 8 }; ``` This change implements two missing pieces of functionality in the Slang compiler to allow this case: * First, when the front-end semantic checks are applied to an initializer list, if the type being initialized is a derived `struct` type it always expects to find initialization arguments for its base type before those for its fields. * Second, when lowering an initializer-list expression from the AST to the IR, the compiler expects the first argument in the list to be the initial value for the base field (if any). This also applies to default-initialization of fields/variables. This change slightly entangles front-end logic with the logic for how struct inheritance is lowered to the IR, but the behavior is unlikely to confuse users who expect C++-like layout. It is worth noting that with this change it should be possible to initialize the base type using either a nested initializer list or flat arguments: ```hlsl struct BigBase { int x; int y; int z; } struct BigDerived : BigBase { int w; } BigDerived a = { {1,2,3}, 4 }; BigDerived b = { 1, 2, 3, 4 }; ``` This behavior should Just Work because of the existing C-like rules for initializer lists where an aggregate can be initialized by either a `{}`-enclosed block or distinct values for its leaf fields.
2021-05-27	Fix a bug in struct inheritance (#1861)	T. Foley
	During lowering from AST to IR, the Slang compiler translates code that uses `struct` inheritance: ```hlsl struct Base { int a; } struct Derived : Base {} ``` into code where the inheritance relationship is "witnessed" by a simple field: ```hlsl struct Base { int a; } struct Derived { Base __anonymous_field__; } ``` The underlying bug here is that the `__anonymous_field__` that the compiler generated during IR lowering was not being given any linkage decorations (no mangled name). As a result, if multiple separately-compiled modules all access that field they could disagree on its identity as an IR instruction. This could lead to output code being generated where the declaration of `__anonymous_field__` uses one IR instruction, but accesses use another. This change includes a fix for the issue, and a test that serves as a reproducer for the original problem.
2021-05-26	Fix a bug for enumerations with explicit "tag type" (#1856)	T. Foley
	The basic bug here was that `enum` types with an explicit tag type: enum Color : int32_t { ... } would have an `InheritanceDecl` implying that `Color` inherits from `int32_t`. The problem is that this is not actually an inheritance relationship, since a `Color` needs to be explicitly cast to/from an `int32_t`. Various parts of the compiler currently treat this case like real inheritance, and as a result the operations taht would apply to an `int32_t` end up applying to a `Color` as well. This particularly leads to an ambiguity between applying the `==` operator, because it has overloads for both the `__EnumType` and `__Builtin{something}` interfaces. The fix here is to explicitly exclude the `InheritanceDecl` that represents an enumeration tag type when considering declared subtype relationships. A more complete version of this fix would need to go through all places in the code where `InheritanceDecl`s are used and make sure that any places using them for true inheritnace relationships ignore those that represent an enumeration tag type. (An alternative option would be to use a distinct kind of `Decl` to represent the tag-type relationship, perhaps even going so far as to modifying the type of the relevant AST node as part of semantic checking) This change includes a regression test for the way this bug surfaced in user code. Co-authored-by: jsmall-nvidia <jsmall@nvidia.com>
2021-05-25	Rework shader object specialization control interface. (#1857)	Yong He

2021-05-25	Allow overriding specialization args via `IShaderObject`. (#1854)	Yong He
	* Allow overriding specialization args via `IShaderObject`. * Fixes. Co-authored-by: T. Foley <tfoleyNV@users.noreply.github.com>
2021-05-22	Improvements in -X support (#1852)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Added SourceLoc handling for command line parsing. * Fix typo in debug. * Fix issue around the DiagnosticSink used in options parsing not having a writer available - by having DiagnosticSink parenting. * Small rename for clarity. * WIP extracting command line args for downstream tools. * Unit tests/bug fixes around extracting args. * Use DownstreamArgs in the EndToEndCompileRequest * Passing downstream compiler options downstream. * Fix issue with endToEndReq being nullptr. * Fix issue with diagnostics number change. * Small improvements to how the source line is displayed if it's too long. Default to 120, as suggested in previous review. * Make render test use x-args parsing and CommandArgReader. * Added missing diagnostics. * More DownstreamArgs to linkage so can be seen by 'components'. Added dxc-x-arg test. * Used combination of name and args instead of two Lists, which whilst equivalent was perhaps a little confusing. * Added documentation for -X support. * Added test for x-args parsing diagnostic. Improved diagnostic with list of known names. * Fix issues from merge. * Fix lookup for -matrix-layout-column-major in render test. * Remove commented out line.
2021-05-21	[gfx] Support StructuredBuffer<IInterface>. (#1851)	Yong He
	Co-authored-by: T. Foley <tfoleyNV@users.noreply.github.com>
2021-05-21	Downstream option handling (#1850)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Added SourceLoc handling for command line parsing. * Fix typo in debug. * Fix issue around the DiagnosticSink used in options parsing not having a writer available - by having DiagnosticSink parenting. * Small rename for clarity. * WIP extracting command line args for downstream tools. * Unit tests/bug fixes around extracting args. * Use DownstreamArgs in the EndToEndCompileRequest * Passing downstream compiler options downstream. * Fix issue with endToEndReq being nullptr. * Fix issue with diagnostics number change. * Small improvements to how the source line is displayed if it's too long. Default to 120, as suggested in previous review. Co-authored-by: T. Foley <tfoleyNV@users.noreply.github.com>
2021-05-21	Overhaul the preprocessor (#1849)	T. Foley
	* Overhaul the preprocessor The old Slang preprocessor was based on a simple mental model that tried to unify two parts of macro expansion: * Scanning for macro invocations in a sequence of tokens * Producing the expanded tokens for a macro expansion by substituting arguments into its body The basic was that substitution of macro arguments into a macro definition is superficially similar to top-level macro expansion, just with an environment where the macro arguments act like `#define`s for the corresponding parameter names. That approach was "clever" and could conceivably have been extended to include a lot of advanced preprocessor features (e.g., a preprocessor-level `lambda` would be easy to support!), but it was basically impossible to make it correctly handle all the corner cases of the full C/C++ preprocessor. The fundamental problem with the old approach was that it conflated the two parts of expansion listed above into one implementation, while the various special cases of the C/C++ preprocessor rely on treating the two cases very differently. The new approach here (which is somewhere between a refactor and a full rewrite of the preprocessor) changes things up in a few key ways: * The abstraction still cares a lot about streams of tokens, but it now treats the top level streams (`InputFile`s) as fairly different from the lower-level streams (`InputStream`s) * Macro expansion is handled as a dedicated type of stream that wraps another stream. This allows macro expansion to be applied to anything, and supports cases where multiple rounds of macro expansion are required by the spec. * Macro invocations and the substitution of their arguments are now handled by a completely new system. * Macro arguments are no longer treated as if they were `#define`s * The macro body/definition is analyzed at definition time to detect various kinds of issues, and to derive a list of "ops" that make it easier to "play back" the definition at substitution time * Token pasting and stringizing are now only handled in macro definitions (rather than being allowed anywhere), and their use cases are restricted to only those that make sense (e.g., you can't stringize anythign except a macro parameter, because anything else wouldn't make sense) The key new types here are the `ExpansionInputStream` which handles scanning for macro invocations, and the `MacroInvocation` type, which handles playing back the macro body with substitutions. The `ExpansionInputStream` is the easier of the two to understand. By refactoring it to use a single token of lookahead, the one major detail it had to deal with before (abandoning expansion of a function-like macro if the macro name was not followed by `(`) is significantly easier to manage. The more subtle part is the `MacroInvocation` type, and most of the complexity there is around handling of token pasting, and the fact that either or both of the operands to a token paste might be empty. Many of the test cases that exposed the problems in the preprocessor have been moved from `current-bugs` to `preprocessor` since they now work correctly. * debugging: enable extractor command line dump * fixup * fixup