slang.git - Making it easier to work with shaders

Age	Commit message (Collapse)	Author
2021-12-17	Cleanup refactoring work around the IR builder (#2061)	Theresa Foley
	* Cleanup refactoring work around the IR builder We have some long-term goals for the IR that require a more centralized and disciplined set of rules for how IR instructions get created/emitted. I had been working on trying to set things up so that all IR instruction creation goes through a single bottleneck point, but the non-trivial work in that branch was getting drowned out by the sheer volume of cleanup and refactoring changes. This change tries to pull together several of the more important cleanups. The big pieces are: * `IRBuilder` and `SharedIRBuilder` now protect their data members and rely on users to initialize them more directly via constructor of an `init()` method. This change affects a bunch of sites where `IRBuilder`s were created. I changed use sites to use the constructors whenever possible, and to use `init()` in cases where we had longer-lived builders that needed to be initialized multiple times. * The insertion location for the `IRBuilder` now uses an encapsulated type called `IRInsertLoc`. This new type can replace what used to be just two `IRInst` fields in the builder, and also covers some new functionality (if we ever want to take advantage of it). Very little client code cares about this change, but it is still a nice cleanup in terms of making things more explicit. The creation of an `IRModule` has been moded out of `IRBuilder`, because in practice we `IRBuilder` always wants to be associated with a pre-existing `IRModule` at creation time (via its `SharedIRBuilder`). There is now an `IRModule::create()` operation instead. This required changing the sequencing at many `IRModule` creation sites, since most had been contriving to make an `IRBuilder` first. There were also several cleanups because code had been carelessly using non-reference-counted pointers for `IRModule`s in ways that broke now that `IRModule::create()` always returns a `RefPtr`. * The core operations to actually allocate memory for IR instructions were moved into `IRModule` (since they interact with the memory pool that the module owns). These were called `createEmptyInst()` but have been renamed into `_allocateInst()`. In principle these seem like they should only be needed to be called by the `IRBuilder`, but in practice they are also needed by the IR deserialization logic. * A few core operations for emitting IR instructions that were associted with `IRBuilder` were moved to actually be methods on `IRBuilder`. First is `_findOrEmitConstant` which is the primary bottleneck for creating simple scalar constant values. Another is `_createInst` (formerly part of the templated `createInstImpl` along with `createInstWithSizeImpl`) which is the main bottleneck for allocation and initialization of any instruction other than a constant (well, the `IRModuleInst` is the other exception...). Finally, there is also `_maybeSetSourceLoc()`, which is obvious to scope inside the `IRBuilder` once it is protecting the source-location info. Notes: * The `minSizeInBytes` parameter to `_createInst()` might not actually be needed at all. At this point any `IRInst` subtypes that need data allocated for things other than their operands already get created manually via `_allocateInst` or `_findOrEmitConstant`, so I think we could remove that part. I will handle that in a subsequent cleanup if it turns out to be the case. * There is one IR pass (`slang-ir-string-hash.cpp`) that is using manual `_allocateInst()` instead of going through an `IRBuilder`. It could be easily cleaned up to not do so (and I will probably make that change down the line), but for now I wanted to avoid doing anything that wasn't close to pure refactoring if I could. * At this point in our design an `IRBuilder` is a very lightweight thing - it basically just owns the insertion location plus a source location to write into instructions. A lot of our code currently treats `IRBuilder`s like they are expensive and/or need to be re-used (which leads to them being used in more mutable/stateful ways). It is quite likely that as we clean up other aspects of the implementation of IR creation/emission we can make `IRBuilder` use feel more lightweight in ways that can streamline and simplify code. * The next step for this work is to identify the different paths that eventually lead to `_createInst()` being called, and unify them at a single bottleneck operation that can own the decisions around when to create an instruction vs. when to re-use an existing one (rather than those decisions being baked into the various `IRBuilder` subroutines that create instructions of the various subtypes). * fixup: gcc/clang C++ spec details
2021-12-07	Output of IR ids as command line option (#2043)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * WIP control of dump options. * Removed SourceManager for IRDumpOptions * Arm aarch64 debug connection timeout - as CI timed out.
2021-10-19	Generalize heterogenous code emit (#1968)	David Siher
	* Bring heterogeneous-hello-world back up to date. * Reintroduced heterogeneous-hello-world into the premake * No longer uses compiled bytecode for entry point, instead a loadModule call is hardocoded with the slang file name. * Entry point is, similarly, hardcoded for now. * Added a bypass to slang-legalize-types for an unneeded GPUForeach check * Run premake and change to relative path * Removed experimental and added README * Add prebuild command to premake for heterogeneous example * Pass in entry point as parameter (also remove shader bytecode) * Pass in module name as parameter * Squashed commit of the following: commit 5b13b57fe600724344c556fe4309a5d6bb3d39ab Author: Kai Yao <kyao@nvidia.com> Date: Thu Oct 7 23:38:50 2021 -0700 Return diagnostics data when encountering module load error by exception (#1966) commit 112e1515c30fa972ff56f91514b70946153c718c Author: jsmall-nvidia <jsmall@nvidia.com> Date: Thu Oct 7 16:12:29 2021 -0400 Disable test crashing CI (#1965) * #include an absolute path didn't work - because paths were taken to always be relative. * Disable test that appears to be crashing. commit da32069a0c1c8c723d7ef45100049a8f0dd5d9c4 Author: Kai Yao <kyao@nvidia.com> Date: Mon Oct 4 13:58:51 2021 -0700 Modified barrier API to accept multiple resources per call (#1959) Co-authored-by: Yong He <yonghe@outlook.com> commit 97bb82ebcdf8f1391b9d93b5a8d7b1dfc4e88e52 Author: jsmall-nvidia <jsmall@nvidia.com> Date: Mon Oct 4 14:15:51 2021 -0400 Removing exceptions from core/compiler-core (#1953) * #include an absolute path didn't work - because paths were taken to always be relative. * Refactor Stream. Working on all tests. * Split out CharEncode. * Make method names lower camel. m_prefix in Writer/Reader * Tidy up around CharEncode interface. * Small improvements around encode/decode. * Better use of types. * Remove readLine from TextReader. * Remove exceptions from Stream/Text handling. * Fix some typos. * Fix tabbing. * Fix missing override. * Remove remaining exception throw/catch via using signal mechanism. * Remove exceptions that are not used anymore. * Document the Stream interface. * Remove index for decoding 'get byte' function. * Fix CharReader -> ByteReader. commit b3dfe383c6d31ff3dbd76dcfb32de8d536382f3e Author: lucy96chen <47800040+lucy96chen@users.noreply.github.com> Date: Mon Oct 4 09:46:33 2021 -0700 Get native handles for TextureResource and BufferResource (#1960) * Added getNativeHandle() to TextureResource and BufferResource; Implemented getNativeHandle() in Vulkan and D3D12; Added new unit test files for the aforementioned implementation * Added missing getNativeHandle() implementations to renderer-shared.cpp and CUDA * Finished new getNativeHandle() unit tests for ITextureResource and IBufferResource; Modified ICommandQueue and ICommandBuffer unit tests to call QueryInterface to convert to IUnknown then back and compare resulting pointers for equality * Unit tests updated and pass locally * Cast m_buffer.m_buffer and m_image to uint64_t commit 35bca4cc432613af3926da3bed217a6baa9cbd26 Author: lucy96chen <47800040+lucy96chen@users.noreply.github.com> Date: Fri Oct 1 13:08:25 2021 -0700 Add getNativeHandle() to ICommandQueue and ICommandBuffer (#1952) * Added support for getting command buffer and command queue handles to ICommandBuffer and ICommandQueue; D3D12Device, VkDevice, and DebugDevice modifieid to implement this new functionality; immediate-renderer-base.cpp also modified to implement the new functions * Removed excess boilerplate * Changed readRef() to get() in D3D12 getNativeHandle() implementation for ICommandBuffer and ICommandQueue * Added unit tests for new getNativeHandle() implementations, unfinished * Queue test added; Minor cleanup changes * getBufferHandleTestImpl() now closes the command buffer before returning * Added getNativeHandle() implementations to CUDADevice * Added comment clarifying that the Vulkan check is checking for a null handle, which is defined to be 0 commit 6c6200f547c7387598743b23bb3c8f0d375d9494 Author: Kai Yao <kyao@nvidia.com> Date: Thu Sep 30 20:25:34 2021 -0700 VK Resource Barrier (#1955) * Resource barrier API and VK implementation * Stub implementations * Handle VK Acceleration Structure flag * Add a couple more cases to pipeline barrier stages commit 627fc976bac5c2381dbace9c7925cb6a68b8de12 Author: Yong He <yonghe@outlook.com> Date: Thu Sep 30 19:48:47 2021 -0700 Fix aarch64 build on github (#1957) Co-authored-by: Yong He <yhe@nvidia.com> commit 122d701513e116856bd59c999221ce36a373d7db Author: Yong He <yonghe@outlook.com> Date: Thu Sep 30 17:51:56 2021 -0700 Fix GitHub release (#1956) * Fix aarch64 release build config. * Fix for WinAarch64 build. * Update premake for embed-std-lib build on aarch64. * `platform` fix for aarach64 build. * Try revert back to use absolute output path for slang-stdlib-generated.h * Fix * fix Co-authored-by: Yong He <yhe@nvidia.com> commit aa8f7b899b7b562b3d3c6e25c3da41569505e70c Author: Chad Engler <englercj@live.com> Date: Wed Sep 29 13:02:47 2021 -0700 Fix ARM64 detection for MSVC (#1951) commit 6736b0c1c5fa3e89bc561eb7965a1a0d17af3466 Author: Yong He <yonghe@outlook.com> Date: Wed Sep 29 11:29:46 2021 -0700 Add ISession::loadModuleFromSource. (#1950) Co-authored-by: Yong He <yhe@nvidia.com> commit d8e452412e14a6a8ba137f2adcae13b398e5cecb Author: Yong He <yonghe@outlook.com> Date: Tue Sep 28 15:03:03 2021 -0700 Fix AbortCompilationException leaking through loadModule API. (#1949) * Fix AbortCompilationException leaking through loadModule API. * Update. * Fix. Co-authored-by: Yong He <yhe@nvidia.com> commit cdf1b2c007fefdca128584d2a9f63dec3d350e16 Author: Yong He <yonghe@outlook.com> Date: Tue Sep 28 11:54:24 2021 -0700 Improvements to the unit test framework. (#1948) commit af788b62e18bbd55cd748ad60400a74cf1bc93ee Author: lucy96chen <47800040+lucy96chen@users.noreply.github.com> Date: Fri Sep 24 16:53:41 2021 -0700 Add existing device handle support unit test (#1946) commit bec8e6aec85b6e3f875c58bdd59eb15613978358 Author: Yong He <yonghe@outlook.com> Date: Fri Sep 24 11:33:44 2021 -0700 Move existing unit tests to a standalone dll. (#1945) commit f2a3c933bc11a498c622fa18694c84beca8ca031 Author: lucy96chen <47800040+lucy96chen@users.noreply.github.com> Date: Thu Sep 23 12:19:49 2021 -0700 Add method to retrieve native handles (#1944) * Added a getNativeHandle() method that retrieves the natively created handles; Modified RendererBase, VKDevice, D3D12Device, and DebugDevice to implement this new method * Moved ExistingDeviceHandles out of Desc directly inside IDevice and renamed to NativeHandles; Modified calls accessing the struct accordingly in RendererBase, DebugDevice, VKDevice, and D3D12Device * Minor cleanup changes (renames, etc.) commit b9b398d038b524f15a86ff27cd6888d54e8754e0 Author: Yong He <yonghe@outlook.com> Date: Wed Sep 22 10:06:59 2021 -0700 Add gfx unit testing framework. (#1943) * Add gfx unit testing framework. * Fix compilation error. * Reset gfxDebugCallback after render_test. * Pass enabledApi flags through. * Fix for code review suggestions. Co-authored-by: Yong He <yhe@nvidia.com> commit 6e9cee69b3588ddae09b08b9f580f59ad899983f Author: lucy96chen <47800040+lucy96chen@users.noreply.github.com> Date: Tue Sep 21 18:46:32 2021 -0700 Support for existing device/instance handles in Vulkan (#1942) commit b1f04c8544c650de3947955ca68f679535d249aa Author: lucy96chen <47800040+lucy96chen@users.noreply.github.com> Date: Wed Sep 15 20:22:45 2021 -0700 Allow D3D12Device to use an existing device handle (#1940) * Added a new field for an existing device handle to IDevice::Desc; Modified D3D12Device::initialize to set the device stored in desc if it already exists instead of creating a new one * Turned existingDeviceHandle into a struct containing an array of two elements; Updated D3D12Device::initialize to match changes to existingDeviceHandle; Updated comments * Fixed style error for ExistingDeviceHandles struct commit 2f7b9f5ae8be21c6c1d75ae9caefbc7b3f8986a9 Author: Pablo Delgado <private@pablode.com> Date: Thu Sep 16 01:17:57 2021 +0200 Fix incorrect WIN32 macros and missing Windows.h inclusion (#1939) * Replace WIN32 preprocessor macros with _WIN32 * Add missing Windows.h include for InterlockedIncrement commit 11d43642008905ac69a3832eb8a9b2ae7b785f86 Author: Yong He <yonghe@outlook.com> Date: Tue Sep 14 11:36:44 2021 -0700 Avoid upcasting to f32 in 16bit float-uint bit cast. (#1938) Co-authored-by: Yong He <yhe@nvidia.com> commit 502aa3812a82cf0d091cff0c67804e4ee448ac78 Author: David Siher <32305650+dsiher@users.noreply.github.com> Date: Tue Sep 14 12:59:55 2021 -0400 Bring heterogeneous-hello-world back up to date. (#1935) * Bring heterogeneous-hello-world back up to date. * Reintroduced heterogeneous-hello-world into the premake * No longer uses compiled bytecode for entry point, instead a loadModule call is hardocoded with the slang file name. * Entry point is, similarly, hardcoded for now. * Added a bypass to slang-legalize-types for an unneeded GPUForeach check * Run premake and change to relative path * Removed experimental and added README Co-authored-by: Yong He <yonghe@outlook.com> * Revert "Squashed commit of the following:" This reverts commit 4f665858d65f7c332c616ef6db9fdafa1c5e0b9f. * Run premake * Remove prebuild command (only works on Windows?) * Rerun premake * Fix heterogeneous prebuild command * Remove linux specific prebuild command * Fix prebuild command (again) * Change target from dxbc to hlsl to see if that fixes linux issues * Use Path::getFileNameWithoutExt * Change string-literal.slang.expected to have extra filename in decoration Co-authored-by: Yong He <yonghe@outlook.com>
2021-09-14	Bring heterogeneous-hello-world back up to date. (#1935)	David Siher
	* Bring heterogeneous-hello-world back up to date. * Reintroduced heterogeneous-hello-world into the premake * No longer uses compiled bytecode for entry point, instead a loadModule call is hardocoded with the slang file name. * Entry point is, similarly, hardcoded for now. * Added a bypass to slang-legalize-types for an unneeded GPUForeach check * Run premake and change to relative path * Removed experimental and added README Co-authored-by: Yong He <yonghe@outlook.com>
2021-09-03	Fix crash: dynamic dispatch of generic interface method. (#1929)	Yong He
	* Fix crash: dynamic dispatch of generic interface method. * Fix memory error. Co-authored-by: Yong He <yhe@nvidia.com>
2021-08-26	Add API to control interface specialization. (#1925)	Yong He

2021-07-21	Work to mitigate SPIR-V bloat (#1914)	Theresa Foley
	* Work to mitigate SPIR-V bloat SPIR-V is not an especially compact format, but some patterns in how Slang generates code and then runs it through `spirv-opt` lead to many redundant field-by-field copy operations being emitted. This change attempts to address some of the resulting bloat from the Slang side of things. Note: experimentation shows that the bloat is less pronounced when running either no SPIR-V optimizations or full SPIR-V optimizations, so it is also likely that the bloat should be addressed by changing which `spirv-opt` passes the Slang compiler runs in default (`-O1`) builds. Such changes should come as a distinct pull request. This change primarily does two things: First, the code generation strategy for passing arguments to `out` and `inout` parameters has been changed. In the past, the compiler would always copy the argument value into a temporary, then pass the address of the temporary, and then write back the value after the call. The new code generation strategy attempts to identify when an argument value already has a simple address in memory and passes that address directly when possible. This eliminates many copy operations that occur before/after calls to functions with `out`/`inout` parameters. Second, we introduce an IR optimization pass that detects call sites where the entire contents of a buffer (usually a constant buffer) is being passed to a callee function, such that many bytes are loaded and then passed even if only very few are used in the callee. The pass moves the load operations from the caller to a specialized version of the the callee where possible (e.g., when the constant buffer in question is a global shader parameter). Doing this eliminates another major category of copies. Notes: * The IR lowering logic is complicated by the fact that several kinds of l-values (values that are usable as the desitnation of assignment, or for `out`/`inout` arguments) are not actually addressable. An easy example is a non-contiguous swizzle like `v.xwz` on a `float4`, where the value occupies 12 bytes, but not 12 consecutive bytes with a single address. There are many more corner cases like that and the IR lowering pass carries a lot of complexity to deal with them. A more systematic overhaul is due some time soon. * The IR representation of `out` and `inout` parameters deserves some careful scrutiny when making these kinds of changes. The official semantics of `inout` in HLSL has been "copy in copy out" (and `out` is just "copy out") which is observably different from any solution that passes in the address of an l-value directly. By making this change we are saying that Slang's semantics are not precisely those of legacy HLSL, and that our semantics for `inout` parameters are closer to those of `inout` in Swift or of a mutable borrow in Rust. In the Swift case the implementation can freely pass the underlying storage of an l-value or the address of a temporary, and valid programs may not observe the different. It is thus illegal to observe the value in a storage local while a mutation to that location is "in flight." All of this is way more detailed and technical than 99% of Slang users will ever care about, but importantly it gives us semantic cover to eliminate these copies in the IR, and also to emit output C++ code that implements `out` and `inout` as by-reference parameter passing. * There was an exsting generic pass for specializing functions based on call sites that uses a "template method" style of pattern to customize its behavior. That pass needed to be generalized to handle this use case because it had previously operated on the assumption that the "desire" to specialize a callee function must be driven by the parameter declarations of that function, and not on the argument values passed in. The code has been slightly refactored to allow the policy for specialization to consider both parameters and arguments. * Unsurprisingly, a bunch of the GLSL (and thus SPIR-V) generated has changed with this work, so several baseline `.slang.glsl` files needed to be updated. * This change is incomplete in that it does not address broader cases of buffer loads, including both partial loads from constant buffers (just loading one field, but a field that uses a "large" structure type), and loads from multi-element buffers (a lot from a structured buffer where the element type is "large"). The main question in each of those cases is how to define how "large" a structure needs to be before we decide to try and sink loads into callee functions like this. In the worst case, sinking loads in this way may actually create more memory traffic (because the same values get loaded in multiple callee functions). * fixup: run premake * fixup: typo
2021-05-27	Fix initializer lists for derived structs (#1862)	T. Foley
	If the user has a derived `struct` type: ```hlsl struct Base { int b = 1; } struct Derived : Base { int d = 2; } ``` Then it is still reasonable for them to want to use initializer lists when declaring variables using the `Derived` type: ```hlsl Derived x = {}; Derived y = { 7, 8 }; ``` This change implements two missing pieces of functionality in the Slang compiler to allow this case: * First, when the front-end semantic checks are applied to an initializer list, if the type being initialized is a derived `struct` type it always expects to find initialization arguments for its base type before those for its fields. * Second, when lowering an initializer-list expression from the AST to the IR, the compiler expects the first argument in the list to be the initial value for the base field (if any). This also applies to default-initialization of fields/variables. This change slightly entangles front-end logic with the logic for how struct inheritance is lowered to the IR, but the behavior is unlikely to confuse users who expect C++-like layout. It is worth noting that with this change it should be possible to initialize the base type using either a nested initializer list or flat arguments: ```hlsl struct BigBase { int x; int y; int z; } struct BigDerived : BigBase { int w; } BigDerived a = { {1,2,3}, 4 }; BigDerived b = { 1, 2, 3, 4 }; ``` This behavior should Just Work because of the existing C-like rules for initializer lists where an aggregate can be initialized by either a `{}`-enclosed block or distinct values for its leaf fields.
2021-05-27	Fix a bug in struct inheritance (#1861)	T. Foley
	During lowering from AST to IR, the Slang compiler translates code that uses `struct` inheritance: ```hlsl struct Base { int a; } struct Derived : Base {} ``` into code where the inheritance relationship is "witnessed" by a simple field: ```hlsl struct Base { int a; } struct Derived { Base __anonymous_field__; } ``` The underlying bug here is that the `__anonymous_field__` that the compiler generated during IR lowering was not being given any linkage decorations (no mangled name). As a result, if multiple separately-compiled modules all access that field they could disagree on its identity as an IR instruction. This could lead to output code being generated where the declaration of `__anonymous_field__` uses one IR instruction, but accesses use another. This change includes a fix for the issue, and a test that serves as a reproducer for the original problem.
2021-04-16	Update `model-viewer` example and fixing compiler bugs. (#1795)	Yong He

2021-03-10	A bunch of overlapping semantic-checking fixes (#1743)	Tim Foley
	This change originally started with the simple goal of allowing generic functions with default argument values on their parameters to work: ``` void someFunction<T>(T value, int optional = 0); ``` The core problem there was that the compiler code was (correctly) anticipate the case where the default argument value for a parameter depends on a generic parameter, such as: ``` interface IDefaultable { static This getDefault(); } void anotherFunction<T : IDefaultable>(T first, T second = T.getDefault()); ``` Supporting this latter case requires some kind of ability to apply subsitutions to an `Expr`, but our compiler logic simply errored out in that case. The first major fix that went into this change was to add a new `SubstExpr<T>` type that behaves a lot like `DeclRef<T>` in that it stores a `T` plus a set of substititions that need to be applied to it. In addition, it was found that even if `anotherFunction<ConcreteType>(...)` might work, when generic argument inference was used for just `anotherFunction(...)` would fail because it includes a strict match on the number of arguments/parameters in the call expression. The next problem that arose was that the test I'd created used an interace with an `__init` requirement, and it appeared that our code generation didn't work for that case: ``` interface IStuff { __init(int val); } void f<T : IStuff>(T x = T(0)); ``` In this case, the `T(0)` initialization would get compiled to `(ConcreteType) 0` in the output rather than calling the function generated for the `__init` inside `ConcreteType`. The basic problem there was a bit of crufty old logic we have in place to work around the large number of `__init` declarations in the stdlib that don't have proper `__intrinsic_op` modifiers on them. We really need to fix the underlying problem there, but I worked around it by having the IR lowering pass only do its workaround magic on stdlib declarations. The next problem down this line was that my test had two different `__init` declarations in the concrete type and the logic for checking interface conformance was picking the wrong one to satisfying an interface requirement despite it being obviously wrong (not even the right number of parameter). This last problem led me down the rabbit-hole of trying to actually get our semantic checking for interface requirements right. There were a few pieces to that work: Actually checking that the parameter and result types for two callables match is the simple part. If that was all that would be required we would have implement this logic a long time ago. * Next we have to deal with functions that make use of the `This` type, associated types, etc. We have to know that when the interface uses `This`, we want to treat that as equivalent to `ConcreteType`, and similarly for associated types. Getting that working is mostly a matter of setting up a this-type subsitution for the interface member being checked. * Finally, when comparing generic declarations like `IBase::doThing<T>` and `Derived::doThing<U>` we need to deal with the way that `T` and `U` represent the "same" logical type parameter, but are distinct `Decl`s. This is handled by specializing the base declaration to the parameters of the derived one (e.g., forming `IBase::doThing<U>` using the `U` from `Derived::doThing`). The result seems to be passing our tests, but there are still a few gotchas lurking, I'm sure.
2021-03-03	Add GLSL/SPIR-V support got GetAttributeAtVertex (#1733)	Tim Foley
	This change allows varying fragment shader inputs to be declared in a way that allows the `GetAttributeAtVertex` operation to compile to valid code for both D3D and GLSL/SPIR-V/Vulkan. The key is that rather than just use ordinary `nointerpolation`-qualified inputs the code must declare these varying inputs with a new `pervertex` qualifier that marks them as only being usable with `GetAttributeAtVertex`. The `pervertex`-tagged inputs then translate to GLSL inputs using the `pervertexNV` qualifier Note that this change does not include any enforcement of the requirements around how these qualifiers are used (and the compiler doesn't have enforcement for the existing operations like `EvaluateAttributeAtCentroid`). The underlying problem is that the inerpolation-mode qualifiers and explicit interpolation functions in HLSL constitute a kind of rate-qualified type system, but without any systematic rules. It seems wasteful to encode a bunch of ad hoc rules for this stuff as special cases in the compiler when the clear right answer is to implement a systematic approach to rates.
2021-02-17	More #line improvements (#1713)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * WIP: First pass in supporting output of line error information. * Add support for lexing to better be able to indicate SourceLocation information. * Fix lexer usage in DiagnosticSink in C++ extractor. * Update diagnostics tests to have line location info. * Fixed test expected output that now have source location information in them. * Better handling of tab. * Fix test expected results for tabbing change. * DiagnosticLexer -> DiagnosticSink::SourceLocationLexer Added line continuation tests. * Fix typo. * Added String::appendRepeatedChar * Change to rerun tests. * Added source locations to IR dumping. * Output column for IR dump source loc. * Add support for closing brace location to AST. Use closing brace location in lowering when adding return void. * Set the source location through SourceLoc - simplifies identifying if current loc is valid. * Copy terminator sloc. * Test for improved #line handling. * Made writer the last parameter for dumpIR. Small improvements to comments. * Disable sloc output on dump IR by default. * Fix issue with #line and inlining. * Fix for output with improved #line output. * Small comment change - mainly to kick off TC build. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2021-02-16	Add an accessor for IRInst opcode (#1707)	Tim Foley
	* Add an accessor for IRInst opcode This main changing is renaming `IRInst::op` over to `IRInst::m_op` and then adds an accessor `IRInst::getOp()` to read it. The rest of the changes are just changing use sites to `getOp` (or to `m_op` in the limited cases where we write to it). This work is in anticipation of a future change that might need to store an extra bit in the same field as the opcode. It seemed better to do this massive refactoring as a separate PR. * fixup
2021-02-12	Initial support for DXR payload access qualifiers (#1705)	Tim Foley
	This change adds initial support for a feature being proposed for inclusion in dxc: https://github.com/microsoft/DirectXShaderCompiler/pull/3171. The main features are: * A `[payload]` attribute that indicates which `struct` types are intended to be used as payloads. Consistent use of this attribute should mean that an application no longer needs to manually specify a maximum payload size when creating a ray-tracing pipeline. * `read(...)` and `write(...)` qualifiers which can be attached to fields of `struct` types (usually `[payload]`-attributed types) to indicate which ray tracing pipeline stages are allowed read/write access to that part of the payload. Use of these qualifiers should allow an implementation to optimize storage of ray payload elements across RT pipeline stages. The work in this change just adds basic parsing for these features, translation to matching IR decorations, and then emission of HLSL text based on those decorations. Notable gaps in this first change include: * No work is currently being done to validate access to ray payloads in RT entry points based on these qualifiers. * The stage names in `read(...)` and `write(...)` are not being validated, and are being stored in the IR as text. These should probably use the `Stage` enumeration in some fashion, but we would need to have a way to encode the additional `caller` pseudo-stage that the feature uses. * No work is currently being done to adjust or react to the chosen shader model when emitting HLSL code. We should either have these attributes force a switch to a higher shader model, or skip emission of these attributes if the chosen shader model / profile does not imply support for them. * No tests are currently included for this work, because tests would rely on using a custom `dxcompiler.dll` build with the new feature supported.
2021-02-10	Fix a bug in IR lowering (#1701)	Tim Foley
	The underlying problem here is that our `SharedIRBuilder` (which currently owns the "global" value-numbering map) has a subtle invariant ("subtle" in the sense of "dangerous and bad"). The value-numbering map stores `IRInst`s for things like constants and types, and if those instructions end up getting modified or deleted (deleting an instruction currently runs its destructor but does not free the pool-allocated memory), then it is possible for the computed hash code for an instruction to no longer match what it was when it was inserted. The trigger in this case was a use of the `IRInst::removeAndDeallocate()` operation inside of the AST-to-IR lowering pass, which uses a single `SharedIRBuilder`. If that `removeAndDeallocate()` happens to apply to a value in the value-numbering map, then it risks breaking the next time the map gets rehashed. The short-term fix here is simple: never try to delete an instruction during IR lowering, even if it is known to be unused. Instead, we can rely on the subsequent DCE pass to eliminate the instruction. A longer-term fix here would involve fixing our entire strategy around value numbering. We know we need to do that, but that would be a big enough change that it couldn't be pursued as part of a simple bug fix like this.
2021-02-05	Initial implementation of interface conjunctions (#1691)	Tim Foley
	The basic feature here is the ability to use the `&` operator to produce the conjunction/intersection of two interfaces. That is, you can have interfaces: interface IFirst { int getFirst(); } interface ISecond { int getSecoond(); } and if you need a generic function where the type parameter `T` must conform to both of these interfaces, you express that by constraining the parameter to the intersection of the interfaces: void someFunction<T : IFirst & ISecond>(T value) { ... } Without this feature, the main alternative an application would have is to define an intermediate interface, like: interface IBoth : IFirst, ISecond {} Forcing users to deal with an intermediate interface creates more work for type authors (they need to remember to inherit from the right combined interface(s)), or for `extension` authors (when you add `ISecond` to a type that used to just support `IFirst`, you had better also add `IBoth`). In the worst case, a family of N related "leaf" interfaces would give rise to an exponential number of intermediate interfaces to represnt the possible combinations. A conjunction like `IFirst & ISecond` is officially its own type, and can be used to declare a type alias: typealias IBoth = IFirst & ISecond; This change only includes the first pass of work on this feature, so there are several caveats to be aware of: * Using a conjunction as part of an inheritance clause is not yet supported (e.g., `struct X : IFirst & ISecond`). This is true even if the conjunction was introduced by an intermediate `typealias` * The `&` syntax introduced here is only parsed in places where only a type (not an expression) is possible. This means you cannot do things like cast to a conjunction with `(IFirst & ISecond)(someValue)`. * This work should apply to conjunctions of more than two interfaces (like `IA & IB & IC`) but that has not yet been tested * In the long run it may be sensible to allow conjunctions that use concrete types, but we really ought to have the semantic checking logic rule that out for now. * During testing, I encountered compiler crashes when trying to use this feature together with `property` declarations. Further investigation and debugging is called for. * The handling of conjunction types is currently incomplete, in that there are many equivalences the compiler does not yet understand. For example, it is clear that `IA & IB` is equivalent to `IB & IA`, but the compiler currently does not understand this and will treat them as different types. A deeper implementation approach is called for. * Conjunctions are currently only supported for generic type parameter constraints, when performing full specialization. Use of conjunctions for existential-type value parameters or with dynamic dispatch is not yet supported.
2021-02-04	Change how function-scope static variables lower to IR (#1686)	Tim Foley
	This change pertains to `static` variables in function scope (including things like methods, initializers, property accessors, etc.). Note that it does not have anything to do with global-scope `static` variables or with `static const` variables (whether inside a function or not). The old code generation strategy had a lot of "clever" code to deal with the problem of a `static` variable inside a generic function (or inside a function inside a generic type, etc.). Basically, if you had input code like: int myFunc<T>(int newVal) { static int state = 0; int result = state; state = newVal; return result; } The language semantics are that `myFunc<float3>` should have a different `state` variable than `myFunc<int2>`. The way that the existing codegen handled that was to generate the `state` variable into its own dedicated `IRGeneric`. Something like: generic myFunc_state<T0> { global_var g_ptr : int; return g_ptr; } generic myFunc<T1> { func f(int newVal) { let result : int = load(state<T>); store(state<T1>, newVal); return result; } } The catch there is that you end up needing to generate an entire second `IRGeneric`, and then references to `state` need to explicitly use `specialize` to instantiate that generic using the same parameters as `myFunc` was passed (note how `T0` and `T1` are distinct IR generic parameters, despite both representing `T` here). Things get even more complicated when you consider function-`static` variables with initialization logic, since we need to be sure we only perform that initialization once, but the initialization could refer to arguments of the outer function, and thus needs to be done inside the function body. To handle that case we emit an additional `bool` global if a function-`static` variable has an initializer, and that `bool` gets wrapped up in yet another generic. That whole approach seems silly in retrospect, and a much simpler solution is possible: just emit the function-`static` variable immediately before the IR function it pertains to, which means it will be nested under the same* IR generic if there is one (and at module scope if there isn't). The result is something like: generic myFunc<T1> { global_var state_ptr : int*; func f(int newVal) { let result : int = load(state_ptr); store(state_ptr, newVal); return result; } } This change implements that simplification, and all the same tests pass (including whatever tests we had for function-`static` variables).
2021-02-02	Remove GlobalGenericParamSubstitution (#1684)	Tim Foley
	The `GlobalGenericParamSubsitution` class used to be used to represent the mapping of global-scope generic parameters to their concrete arguments, so that we could make use of those concrete arguments for things like layout. That representation caused a lot of pain for other parts of the compiler, though, because everything that dealt with `Substitution`s needed to account for the possibility of global-generic-param subsitutions even if they logically could not occur in most parts of the compiler. We have since moved to a model where the values for global-scope generic parameters are stored in a single explicit global structure that is used by both layout computation and IR lowering. There is no actual code that construct `GlobalGenericParamSubstitution`s from scratch any more, so all of the support code for them was actually unused. This change removes all the unused code, and shows that the tests still pass without it (even the tests that use global-scope generic parameters).
2021-01-07	Add support for [noinline] attribute (#1650)	Tim Foley
	This adds the `[noinline]` attribute to the front-end, and passes it through when generating HLSL output. Notes: * This change doesn't include a test since the dxc version I have locally parses `[noinline]` but then generates DXIL that fails validation. * This change doesn't include logic to handle `[noinline]` for other targets. Notably, SPIR-V has decorations that convey the same intention, but we don't yet take advantage of the GLSL extension(s) that would let us generate those decorations. * By necesstiy, `[noinline]` is only a "strong suggestion" and not actually something the compiler can ever guarantee/enforce.
2020-12-11	Add first steps toward a "capability" system (#1636)	Tim Foley
	* Add first steps toward a "capability" system We already have cases in the stdlib where we mark declarations as being specific to certain targets, e.g.: ``` // My ordinary function to add two numbers. // Works everywhere. // void myFunc(int a, int b) { return a + b; } // On the "coolgpu" target, we can use a secret intrinsic // that adds numbers even faster! // __specialized_for_target(coolgpu) void myFunc(int a, int b) { return __secretIntrinsic(a, b); } ``` The existing logic for dealing with these modifiers (`__specialized_for_target` and `__target_intrinsic`) was almost entirely string-based. We would turn the chosen compilation target into a string, and then use that to try and search for the "best" definition of a function at a few steps: * During IR linking, we always pick one definition of an `[import]`ed function, and that definition will be the one with the "best" target-specialization modifier (if any) * During final code generation, we always look up the "best" target-intrinsic modifier, and use it as the template for the code we output. This change preserves the basic flow there, but replaces the ad hoc string-based logic with something a bit more principled, in terms of a new `CapabilitySet` type. A `CapabilitySet` represents a set of zero or more atomic features (here represented as `CapabilityAtom`s). What a `CapabilitySet` means depends on how and where it is used: * A compilation target implies a `CapabilitySet` where the contents of the set are the features the target supports. * A `CapabilitySet` attached to a declaration (or a modifier on that declaration) describes a set of feature that declaration requires. The current implementation of `CapabilitySet` is wasteful and inefficient, but that is something we can iterate on over time. In practice, most of the current code only ever uses capability sets that are either empty (because they represent a function with no specific requirements) or singleton (because they represent asingle atomic capability like "is a GLSL target," "is an HLSL target," etc.). The main goal here was to put in the skeleton of a new system, including some of the features it might need down the line, and then to leave changes that eventually use the greater flexibility for later. Eventually, the capability system should encompass: * Differences between shader model versions, GLSL versions, SPIR-V versions, etc. (currently tracked with other modifiers) * Optional extensions, and functions that are made available only with certain extensions (currently tracked with other modifiers) * Front-end checking that the call graph of a program doesn't violate any capability-requirements (e.g., having a GLSL+HLSL portable function call a GLSL-only subroutine) * Hypothetically we can also try to fold stage-specific (vertex-only, fragment-only, etc.) functions into this system, but doing so would require more linker cleverness if we allow overloading on stages (since we might have to clone a caller if it calls through to a callee with multiple stage-specific versions) One important complication that the system has to deal with just because of the "do what I mean" nature of the current compiler is that somethings a current Slang user might compile for target X and specify version N, but then use a function that actually requires version N+1 of that target. Currently the Slang compiler silently "upgrades" the version(s) used by user code in these cases, because it is often what users want in cross-compilation scenarios. Dealing with the "silent upgrade" situation requires us to be a little careful and sometimes pick a "best" capability set that doesn't appear to be supported on our target. Refining that system and potentially getting rid of the "do what I mean" behavior over time could be a goal for future changes. * fixup: handle case where value is incompatible during linking
2020-12-02	Fix [mutating] generic methods (#1618)	Tim Foley
	Slang generates code that turns the implicit `this` parameter of a method into an explicit parameter. The logic that decides whether that parameter should be `inout` is a bit involved, and there was a bug where a generic method would lead to the use of an `in` modifier (the default) and override the `inout` modifier that was requested by the method itself. This change fixes the logic to treat generic declarations in the parent chain of a leaf method as having no bearing on whether an implicit `this` parameter should be `inout` or not. A test case is included that breaks with the old behavior, and demonstrates that a generic `[mutating]` method can now work correctly.
2020-11-19	Unify handling of static and dynamic dispatch for interfaces (#1612)	Tim Foley
	Overview ======== Prior to this change, we had two different code generation strategies for interface/existential types in Slang, that didn't always play nicely together: * The "legacy" static specialization approach could handle plugging in an arbitrary concrete type for an existential type parameter (including types with resources, etc.), but wouldn't work well with things like a `StructuredBuffer<>` of an interface type, and requires somewhat counter-intuitive layout rules to make work. * The new dynamic dispatch approach produces simpler, more easily understood layouts by assuming that values of interface type can fit into a fixed number of bytes. The tradeoff there is that it cannot handle types that include resources (only POD types). The goal of this change is to make it so that the two strategies can co-exist. In particular, in cases where a shader is amenable to both static specialization and dynamic dispatch, the type layouts should agree. In order to make the type layouts agree, we: * Declare that all values of existential type reserve storage according to the dynamic-dispatch rules (so 16 bytes for the RTTI and witness-table information, plus whatever bytes are needed to story "any value" of a conforming type). * Then we modify the "legacy" layout rules so that if a value of concrete type can fit in the reserved "any value" space for a given interface, then it is laid out there exactly like the dynamic dispatch rules would do. Otherwise, we fall back to the previous legacy rules (since we don't need to agree with the dynamic-dispatch layout on types that can't be used with dynamic dispatch). Details ======= * Renamed `ExistentialBox` to `BoundInterfaceType` to better clarify how it relates to `BindExistentialsType` * Unconditionally apply the `lowerGenerics` pass during emit, since it is now responsible for aspects of the lowering of existential types when specialization is used. * Made IR type layout take the target into account, so that the layout of resource types can vary by target (e.g., being POD on some targets, and invalid on others) * Cleaned up some issues around using global shader parameters as the "key" for their layout information in the global-scope layout (only comes up when there are global-scope `uniform` parameters) * Made there be a default any-value size (16) instead of making it be an error to leave out. This was the simplest option; we could try to go back to having an error, but we'd need to only issue it if we are sure a type/interface is being used with dynamic dispatch, since static dispatch doesn't have to obey the restrictions. * Changed lowering of existential types to tuples so that bound interfaces where the concrete type won't fit use a "pseudo-pointer" instead of an "any-value" to hold the payload * Changed IR type legalization to handle the "pseudo-pointer" case and apply layout information from an interface type over to the payload part when static specialization was used. * Changed some details of how witness tables were being lowered, so that we didn't have to create "proxy" witness tables for the constraints on associated types (just use the actual requirement entries we generate) * Changed witness tables so that they know the subtype doing the conforming * Added logic so that we don't generate pack/unpack logic and witness table wrapper functions for types that are incompatible with any-value/dynamic dispatch for a given interface. * Changed the core AST-level type layout logic to use the dynamic-dispatch layout in case things fit, and the legacy static specialization case when things don't (while also reserving space for the dynamic-dispatch fields) * Changed a bunch of test cases for static specialization to properly use the new layout (which introduces new buffers in some cases, and moves data around in others). Future Work =========== The experience of trying to reconcile our older way of handling interface-type specialization with our newer model (that supports dynamic dispatch) makes it clear that we really need to make similar changes to our handling of generic type parameters on entry points and at the global scope. A future change should make it so that a global type parameter is lowered with a type layout similar to a value parameter of interface type, including the RTTI and witness-table pieces, and just leaving out the "any value" piece. A similar translation strategy should apply to entry-point generic parameters (mirroring how we lower generic functions for dynamic dispatch already), and value specialization parameters. Co-authored-by: Yong He <yonghe@outlook.com>
2020-10-15	Fix a bug in IR lowering (#1578)	Tim Foley
	The basic problem here is that when a function has multiple declarations with matching signatures (e.g., a forward declaration and then a later definition with a body), the IR lowering logic would lower all declarations whenever the first one was encountered, but then would only register an IR value as the lowered version of the first declaration. Other matching declarations would then run the risk of being lowered again, and in the case where they included features like loops with break/continue labels, that would create the risk of keys getting inserted into certain dictionaries more than one, leading to exceptions. This change ensures that when lowering a function that has multiple matching declarations to IR, we register an IR value for all of those declarations and not just the first. I have added a test case that leads to a crash without this change, to ensure that we don't introduce a regression down the line.
2020-09-23	Simplify workflow when using NVAPI (#1556)	Tim Foley
	In some cases, functionality is available as either a GLSL extension for Vulkan/SPIR-V, or through the NVAPI system for D3D. This situation creates complications because while GLSL extensions are generally all supported by the open-source glslang compiler (which we can bundle and ship), NVAPI operations are exposed through a specific header (`nvHLSLExtns.h`) that ships as part of the NVAPI SDK. When a user wants to explicitly use NVAPI-provided operations in their shader code, there are no major complications for Slang; the user sets up their include paths, `#include`s the relevant header, calls functions in it, and lets Slang deal with the details of compilation. The challenge for Slang arises when we want to provide a cross-platform interface in our standard library (e.g., the `RWByteAddressBuffer.InterlockedAddF32` method that was recently added) that uses either a GLSL extension (when compiling for Vulkan/SPIR-V) or an NVAPI (when compiling to DXBC or DXIL). In that case, the code generated by Slang now has a dependency on NVAPI, and we need to somehow emit a `#include` directive that pulls it in when invoking fxc or dxc. Because we do not (and seemingly cannot) bundle the NVAPI header with the compiler, we have to rely on ther user to have it available and to somehow communicate to Slang where it is. Exposing portable routines that sometimes use NVAPI currently creates two main challenges: 1. The user is forced to interact with the "prelude" mechanism in the compiler, which allows the programmer to define code in a given target language that gets prepended to the Slang-generated code. While the prelude mechanism is powerful, it is also hard for users to integrate into their workflow, and our experience so far is that users want something that Just Works. 2. If the user writes code that uses some of our abstract operations that layer on NVAPI and they also want to use NVAPI explicitly, they end up with two copies of the NVAPI header (one included by the Slang front-end, and another included by the downstream fxc/dxc compiler). This puts the user in the situation of (a) having to ensure that they set the defines like `NV_SHADER_EXTN_SLOT` consistently both when invoking Slang and when adding their prelude, and (b) even if they do make the definitions consistent, they run into the problem that fxc/dxc complain about overlapping register bindings on the two copies of the `g_NvidiaExt` global shader paraemter that the NVAPI header declares. This change attempts to resolve both issues by adding a lot of "do what I mean" logic to the compiler to try to ease things in the common case. In particular: 1. The user no longer needs to use the "prelude" mechanism when using NVAPI. The compiler now embeds a default prelude for HLSL output, which will `#include` the NVAPI header if and only if the generated code needs NVAPI access because of portable standard library routines that were used. 2. The user can mix-and-match explicit NVAPI use and stdlib functions that compile to use NVAPI. The register/space to be used by NVAPI when included via prelude is now set based on whatever the user set via the preprocessor so that it should automatically be consistent between both cases. Furthermore, the code we emit for the declaration of `g_NvidiaExt` when compiling explicit NVAPI use is set up to be conditional, so that it is skipped in the case where the prelude will pull in its own declaration of that parameter. The way all this is achieved involves a lot of moving pieces: * We now have an HLSL prelude, which mostly just serves to `#include "nvHLSLExtns.h"` in the case where NVAPI support is needed downstream. * Standard library operations that require NVAPI for their implementation on HLSL include a new `[__requiresNVAPI]` attribute. * The preprocessor has been extended so that after tokenizing an input file it looks up the NVAPI-relevant macros in the resulting environment, and if they are set it attached a modifier (`NVAPISlotModifier1) to the AST `ModuleDecl` that is based on their values. Logic is added to detect if multiple input files specify values for the macros in ways that conflict. * The semantic checking step is extended so that it detects the "magic" NVAPI declarations (the `g_NvidiaExt` paramter and the `NvShaderExtnStruct` type that it uses) and attaches a modifier to them so that they can be identified as such in later steps. * Parameter binding is extended to collect a list of the AST modifiers that reflect NVAPI binding, and to reserve the relevant register(s) so that ordinary user-defined parameters cannot conflict with them. * IR lowering translates the three new AST modifiers related to NVAPI over to IR equivalents. * IR linking is extended to make sure that it clones any `IRNVAPISlotDecoration`s attached to the input modules. The pass intentionally does not care where the modifiers came from; it just collects them all and leaves it to downstream code to sort out what they mean. * Emit logic is extended to have a notion of "prelude directives" which are preprocessor directives that should come before the prelude in the generated code, because they can impact the way that the prelude compiles. This is done so that we don't have to introduce ad hoc logic for each downstream compiler to set any relevant `-D` flags (e.g., both fxc and dxc would need to duplicate such logic for NVAPI support). * The HLSL source emitter is extended to track whether it emits any operations that require NVAPI support. * The HLSL source emitter is extended to emit prelude directives based on whether NVAPI is needed and, if it is, to also set the register and space that NVAPI should use based on what was stored in the decoration(s) on the IR module. * The HLSL source emitter is extended so that it detects global instructions that represent "magic" NVAPI constructs , and emit them as conditional definitions so that they are skipped when NVAPI is included via the prelude. * The handling of requires capabilities during emit logic was cleaned up a bit so that more logic is shared across targets, and also so that the same logic is used both when emitting a function declaration/definition and when emitting a call to an instrinsic function (which won't get declared/defined).
2020-09-04	Allow mixing unspecialized and specialized existential parameters. (#1533)	Yong He
	* Allow mixing unspecialized and specialized existential parameters. * Fixes.
2020-09-02	Allow unspecialized existential shader parameters (dynamic dispatch). (#1529)	Yong He
	* Allow unspecialized existential shader parameters (dynamic dispatch). * Fixes. * Fixes * disable cuda test
2020-08-28	Avoid nondeterministic ordering of output (#1522)	Tim Foley
	Most people agree that it is a Good Thing when compilers are deterministic: the exact same input bits produce the exact same output bits every time the compiler is run. Bonus points are awarded if the results are independent of the platform the compiler was compiled for and run on. One of the easiest kinds of nondeterminism to have sneak into a compiler is for it to produce the "same" code inside functions, but sometimes emits functions or other global symbols in a different order from run to run. Right now, the Slang compiler has some of this kind of nondeterminism. The main way (but not necessarily the only way) that a compiler ends up producing output with a different ordering across runs is by iterating over the contents of a hash-based container (in our codebase, a `Dictionary` or `HashSet`), where the keys make use of pointers. Most operating systems intentionally try to randomize the address space of processes across runs (as a security feature), so that exact pointer values are not stable across runs, and thus hash value are not stable across runs, and thus the ordering of entries is not stable across runs. This change identifies a few cases of iterating over dictionaries or sets that could have produced output non-determinism: * The `HLSLIntrinsicSet` was using a `Dictionary` to store intrinsics that had been referenced, and would later produce a linear list of those intrinsics based on their order in the dictionary. * The `WitnessTable`s produced by the front-end stored a `Dictionary` or requirements, and lowering from AST->IR was iterating over that dictionary to ensure that everythign got emitted. * The `SharedSemanticsContext` was tracking a `HashSet` of modules that were imported into scope (so that their `extension`s should be visible), and an iteration over that list was used when producing candidate extensions during lookup. This case is unlikely to cause any nondeterminism in final output, but could lead to nondeterministic ordering in diagnostic messages for ambiguous reference/overload cases. * The IR linker maintains a `Dictionary` of symbols based on their mangled names, and iterates over it in code that clones all witness tables into the linked IR whether or not they are referenced. For most of these cases the fix is simple: * Keep both a `Dictionary`/`HashSet` and a `List` of the appropriate type * Whenever adding to the hash-based container also add to the list * Whenever iterating, iterate over the list In the final case of the IR linker, the relevant code was marked with a `TODO` comment noting that it shouldn't actually be needed, so I simply dropped it and the change doesn't seem to break any of our tests. I've been fairly confident that code wasn't needed for a while. This change isn't exactly elegant, and a better long term solution might be to introduce two new types, `OrderedDictionary` and `OrderedSet`, which are similar to `Dictionary` and `HashSet` except that they guarantee a deterministic order of enumeration of their contents, based on insertion order. (Note that a `SortedDictionary` and/or `SortedSet` that use something like a binary tree to produce a "determinsitc" sorted order wouldn't actually help here, because sorting entries by pointer values wouldn't solve the underlying problem that the pointer values aren't stable across runs) I've chosen to avoid adding new types to `core` in the interest of making the change as small as possible. If we all agree that new types are warranted, it should be easy to clean up these use cases. Testing this change is difficult, because we can't produce a reliable test to rule out nondeterminism. I have done best-effort testing by hand by crafting shaders that show output nondeterminism, and then compiling them both with and without these changes.
2020-08-28	Enable lower-generics pass universally. (#1518)	Yong He
	* Enable lower-generics pass universally. * Exclude builtin interfaces and functions from lower-generics pass. * Update stdlib. * Fixup. * Fixes handling of nested intrinsic generic functions. * Fixes. * Fixes.
2020-08-27	Enable simple extensions of interface types (#1521)	Tim Foley
	The big picture here is that an `extension` can now apply to an interface type and provide convenience methods for all types that implement that interface. Suppose you have an interface for counters: interface ICounter { [mutating] void add(int val); } and a type that implements it: struct SimpleCounter : ICounter { int _state = 0; ... } If a common operation in your codebase is to increment a counter by adding one, you would be faced with the problem of either: * Add the `increment()` operation to `ICounter`, and force every implementation to implement the new requirement * Add the `increment()` operation to concrete counter types as needed, and thus not be able to use it in generic code * Make `increment()` a global ("free") function, and force clients of counters to have to know which operations use member syntax (`c.add(...)`) and which use global function call syntax (`increment(c)`). The whole idea of `extension`s is to allow for another option that is better than all of the above: extension ICounter { [mutating] void increment() { this.add(1); } } The core of the implementation is relatively straightforward, and consists of two complementary pieces. The first piece is that when emitting a concrete IR entity (function/type/whatever) we treat any enclosing `interface` type (or `extension` thereof) a bit like an enclosing `GenericDecl`, and introduce an `IRGeneric` to wrap things. The generic `IRGeneric` has parameters representing the `This` type for the interface, along with the witness table that shows how `This` conforms to the interface itself. We thus end up with an IR version of `increment()` something like: void increment<This : ICounter>(This this) { this.add(1); } The second (complementary) fix is that when there is code that references this `increment()` operation, we don't treat it like an interface requirement (look up based on its key), and instead treat it like a generic (since that is how it is lowered now) and speciaize it to the information we can glean from the `ThisTypeSubstitution`. A related fix that is required here is that within the body of `increment`, when we perform `this.add`, we need to ensure that the lookup of `add` in the base interface properly takes into account the subtype relationship (`This : ICounter`) and encodes it into the lookup result, so that we get `((ICounter) this).add`, and properly generate code that looks up the `add` method in the witness table for `This`.
2020-08-27	Clean up the way that lookup "through" a base type is encoded (#1519)	Tim Foley
	* Clean up the way that lookup "through" a base type is encoded In order to undestand this change, it is important to undestand how lookup through base interfaces works prior to this change. In order to understand that it helps to be reminded of how inheritance relationships get encoded in the AST. Suppose the user writes: struct Base { int val; } struct Derived : Base { ... } ... Derived d = ...; int v = d.val; The question is how an expression like `d.val` gets semantically checked, and how it is encoded into the IR after semantic checking. You might assume it gets checked and encoded so that we end up with: int v = ((Base) d).val; and that seems like it should Just Work... so of course that isn't what Slang has been doing. Instead, we relied on the fact that the inheritance relationship `Derived : Base` is represented as an `InheritanceDecl` member of the `Derived` type, and we ended up checking the code into something like: int v = d.<anonymous>.val; where `<anonymous>` stands in for the name of the `InheritanceDecl` that represents inheritance from `Base`. This design choice makes a limited amount of sense when you consider how inheritance would typically be lowered to a C-like output language: // struct Derived : Base { ... } // => struct Derived { Base base; ... } The problem with that encoding is that it really doesn't make sense for almost any other scenario. In particular, if you have a generic type parameter `T` that was constrianed with `T : ISomething`, then the constraint isn't even technically a member of the type parameter `T`, so expressing thing as a member reference in the AST is completely incorrect. Unfortunately, by the time it was clear that we needed something better, a bunch of implementation work was done based on the existing representation. This change tries to clean things up so that lookup of a super-type member through a value of a sub-type does the obvious thing: cast the value to the super-type and then look up the member (as in `((Base) d).val`). The core of the change is that in lookup, instead of creating `Constraint` breadcrumbs whenever we are looking up in a super-type (with a reference to the `TypeConstraintDecl` being used) we instead use `SuperType` breadcrumbs (with a reference to a `SubtypeWitness`). Then when we create the expression from a `LookupResultItem`, we translate any `SuperType` breadcrumbs into `CastToSuperTypeExpr`s (an expression type that already existed). This change also adds support for lookup through the `This` type in the context of an interface, and in order for that to work we need a new kind of subtype witness to represent the knowledge that a `This` type is a subtype of the enclosing interface. Making that work forces us to change the representation of `TransitiveSubtypeWitness` so that it takes a pair of subtype witnesses (and not one subtype witness plus one `TypeConstraintDecl`). For the most part this is a small change, but it raises the possibility that some pieces of the code aren't going to be robust against all possible shapes of subtype witnesses. The IR lowering logic has relied on the weird `d.<anonymous>` representation in order to ensure that when looking up interface members we weren't always casting to the interface type (which would create a `makeExistential` instruction), and then calling using that. Basically, the IR lowering would ignore the `d.<anonymous>` part and just emit `d`, but we can't do that for `((Base) d)` or `((IThing) d)` because whehter or not we should actually perform the cast depends on context. For now we solve that problem by adding specific logic to ignore up-casts to interface types when they appear in member expressions or method calls. A more robust solution might be needed down the line, but this seems to work in practice. All of this work is cleanup that I found was needed in order to make `extension`s of `interface` types workable. * fixup: disable an incorrect test
2020-08-25	Export witness table and RTTI objects in compiled libraries. (#1514)	Yong He
	* Export witness table objects in compiled code. - Ensure that witness tables are preceeded with `extern "C"` modifier in the generated C++ code. - RTTI objects use the mangled name of the type directly, so that can be queried using the type's mangled name directly from the resulting DLL. - Expose `Linkage::getTypeConformanceWitnessMangledName` to return the mangled name of witness tables to the host. - Ensure that all witness tables (including those for associated types) have proper mangled name. * Fix GCC error in Slang generated code.
2020-08-20	Initial support for a using construct (#1506)	Tim Foley
	The basic idea is that if you have a namespace: namespace MyCoolNamespace { void f() { ... } ... } then you can bring the declarations from that namespace into scope with: using MyCoolNamespace; f(); The `using` construct is allowed in any scope where declarations are allowed. As an additional feature, the construct allows and then ignores the keyword `namespace` if it occurs right after `using`: using namespace MyCoolNamespace; Note that unlike in C++, `using` a namespace inside another namespace doesn't implicitly make the symbols available to clients of that namespace: namespace hidden { void secret() {...} ... } namespace api { using hidden; ... } api.secret(); // ERROR: `secret()` isn't a member of `api` The implementation of this feature was relatively simple, although it does leave out more advanced features that might be desirable in the future: * No support for `using MCN = MyCoolNamespace` sorts of tricks to define a short name * No support for `using` anything that isn't a namespace (e.g., to make the members of a type available without qualification) * No support for cases where multiple visible modules have a namespace of the same name (or dealing with overloaded namespaces in general)
2020-08-17	GPU Foreach Loop (#1498)	Dietrich Geisler
	* GPU Foreach Loop This PR introduces the completed GPU foreach loop and updates the heterogeneous-hello-world example to use it. This PR builds on the previous introduction of the GPU Foreach loop parsing and semantic checking PR (#1482) by introducing IR lowering and emmitting. THe new feature can be used by having a GPU_Foreach loop interacting with a named non-CPP entry point, and using the -heterogeneous flag. * Fix to path Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-08-12	GPU Foreach Parsing and Checking (#1482)	Dietrich Geisler
	This PR introduces parsing and semantic checking for a GPU foreach loop for heterogeneouis programming. A GPU foreach loop takes the form: ``` __GPU_FOREACH(renderer, gridDims, LAMBDA(uint3 dispatchThreadID) { kernelCall(args, ...); }); ``` And will allow the host code to call into a kernel with the correct renderer and grid dimensions. This commit also introduces a hack to unify types in the heterogeneous hello world file, which will hopefully be amended in the future. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-08-05	`AnyValue` based dynamic dispatch code gen (#1477)	Yong He
	* AnyValue based dynamic code gen * Fix aarch64 build error
2020-07-31	Add [anyValueSize] attribute to interfaces and propagate that in the IR. (#1469)	Yong He
	Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-07-31	Binary for Heterogeneous Example (#1467)	Dietrich Geisler
	* Binary Heterogeneous Example This PR introduces the ability to insert the binary of a non-CPU target by using the -heterogeneous flag. Specifically, this PR updates the emitting logic to produce a variable of name `__[name_of_entryPoint]` when the heterogeneous flag is present. * Prelude path fix Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-07-28	Generalize lowerSimpleIntrinsicType to include generic arguments (#1464)	Yong He
	* Generalize lowerSimpleIntrinsicType to include generic arguments * Use recursion instead of loop to get the correct ordering for nested generics
2020-07-27	Baseline Heterogeneous Example (#1460)	Dietrich Geisler
	* Baseline Heterogeneous Example This PR introduces a baseline heterogeneous example, including both a Slang file and an associated C++ helper file. This refactoring primarily moves the Slang file "into the driver's seat" while maintaining that the C++ side still does most of the actual work. * Fix to prelude path
2020-07-24	Ensure labels are dumped in `lower-to-ir` (#1459)	Yong He
	* Ensure labels are dumped in `lower-to-ir`. There is a `dumpIR` function that accepts a label parameter already in slang-emit.cpp. This change moves it to slang-ir.cpp so it may be called from other files. * update expected test result Co-authored-by: Yong He <yhe@nvidia.com> Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-07-16	Support associatedtype local variables and return values in dynamic dispatch ↵	Yong He
	code (#1444) * Refactor lower-generics pass into separate subpasses. * IR pass to generate witness table wrappers. * Support associatedtype local variables and return values in dynamic dispatch code.
2020-07-15	Refactor lower-generics pass into separate subpasses. (#1442)	Yong He

2020-07-10	Dynamic code gen for generic local variables. (#1434)	Yong He
	* Dynamic code gen for generic local variables. * Fixes to function calls with generic typed `in` argument. * Fixes per code review comments
2020-07-08	Add support for global uniform shader parameters (#1433)	Tim Foley
	* Adding support for global uniform shader parameters This change adds support for Slang programmers to declare shader parameters of "ordinary" types at global scope: ```hlsl uniform float gScaleFactor; void main() { ... = gScaleFactor; ... } ``` The generated HLSL/GLSL/DXIL/SPIR-V output will be something along the lines of: ```hlsl struct GlobalParams { float gScaleFactor; } cbuffer globalParams { GlobalParams globalParams; } void main() { ... = globalParams.gScaleFactor; ... } ``` The binding information used for the implicit `globalParams` constant buffer will be determined by the existing implicit parameter binding logic (which already had support for this kind of transformation). The reason this change is being pursued right now is because it is one step toward removing the implicit `KernelContext` type that is used to wrap the generated code for our CPU and CUDA C++ targets. Handling global-scope parameters of ordinary type requires an IR pass that synthesizes the `GlobalParams` structure type above, and that step ends up removing the need for the similar `UniformState` structure that was being used in the CPU/CUDA emit logic. A more detailed guide to the changes included follows: * The diagnostic for a global-scope variable that is implicitly a shader parameter was kept, but changed to a warning. Users can opt out of the warning by decorating their parameter as a `uniform` (since that keyword is already being used to mark entry-point parameters that should be treated as uniform shader parameters). * To simplify the task of finding the global shader parameters, the `CLikeSourceEmitter` type has been given an `m_irModule` member. The previous emit logic for `UniformState` was having to do a roundabout solution involving the `EmitAction`s to deal with not having direct access to the module. * Removed a few dead declarations in the emit logic (related to a much earlier point where emit was based on the AST instead of the IR). * Made the computation of type names in C++ emit take into account `ConstantBuffer<T>` and `ParameterBlock<T>`. As far as I can tell, these were being handled with some special-case hacks in the emit logic instead of being supported more fundamentally. It might actually be good to pass these through as `ConstantBuffer<T>` and `ParameterBlock<T>` in the C++ output, and allow the prelude to customize their translation (defaulting to defining them as `T`). Removed the special-case C++ emit logic for references to global shader parameters. There are now at most two global shader parameters to deal with, and the default emit logic (referring to them by name) does the Right Thing. * Changed the handling of entry points for C++ (both CPU and CUDA) so that it handles the bundled-up shader paameters for the global and entry-point scopes the same way. The main complication here is OptiX, where parameter data is passed very differently than it is for CUDA compute kernels. * Reverted changes to `ir-entry-point-uniforms` that had made its logic depend on the compilation target. The parameter binding logic was already responsible for deciding if a given target needed to wrap up its entry-point parameters in a constant buffer, and the IR pass was respecting that layout information. The current workaround had been removing the `ConstantBuffer<T>` indirection from this IR pass for CPU/CUDA, but then reintroducing the same indirection later on in the emit step. * Added an explicit IR pass with the task of collecting global-scope parameters of uniform/ordinary type and packaging them up into a `struct`, and then optionally packaging that `struct` up in a constant buffer. This pass bases its decisions on the IR layout information that was already computed, so it should match whatever policy choices were made at the layout level. * Changed the "key" operand on IR `struct` layout information to not assume an `IRStructKey`. The problem here is that the global scope gets a `StructTypeLayout` to represent its members, and this is convenient (rather than having to always special-case logic that handles the global scope), but the "fields" of that struct are global variables which do not have `IRStructKey`s associated with them. The simplest solution is to use the variables themselves as the keys, which required removing the assumption in the IR encoding. * Updated the IR layout process to compute a layout for the global scope of an entire program, and to attach that to the `IRModule` via a decoration. Updated the IR linking process to carry through that decoration to the linked output. This is necessary so that the IR pass that transforms global parameters can access the global-scope layout information. An important concern with this approach is that the contents and layout of the monolithic `GlobalParams` structure depends on the exact set of modules that were linked (and the order in which they were specified, in some cases). This isn't really a new thing with this change, but it becomes more important as we start to think of how to generalize things to better support separate compilation and linking. There are changes that can (and should) be made to the way that IR layouts are computed for programs (e.g., so that we compute layout per-module and then combine them rather than as a whole-program step). In this case, the problem of forming the combined/linked global layout can be moved down the IR level and not be reliant on AST-level information. Just changing the way layout and linking interact would not change the fundamental problem that global shader parameters as they currently exist in Slang/HLSL/GLSL are not readily compatible with true separate compilation. We either need to find a solution strategy that we can apply to allow existing shaders to work with separate compilation or we need to incrementally work toward removing support for global-scope shader parameters in favor of explicit entry-point parameters in all cases. * fixup: missing files * fixup: comment the new code
2020-07-07	Public Keyword for Functions (#1432)	Dietrich Geisler
	This PR introduces support for the public modifier for functions. This keyword allows labelled functions to be written to the compiled without having a link to an entry point. The goal of this change is to help support heterogeneous design of Slang by permitting C++ code to interact with CPU slang functions. Internally, this PR adds the public decoration to the IR and defines a lowering from the public modifier in the AST to this decoration. Additionally, the Keep Alive decoration is added to any public modifier being lowered, which prevents DCE from eliminating functions labelled with the public keyword. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-07-07	Add a test case for dynamic dispatch with `This` type in interface decl. (#1431)	Yong He
	* Add a test case for dynamic dispatch with `This` type in interface decl. * Update comments * fix typo in comments Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-06-30	Initial work on property declarations (#1410)	Tim Foley
	* Initial work on property declarations Introduction ============ The main feature added here is support for `property` declarations, which provide a nicer experience for working with getter/setter pairs. If existing code had something like this: ```hlsl struct Sphere { float4 centerAndRadius; // xyz: center, w: radius float3 getCenter() { return centerAndRadius.xyz; } void setCenter(float3 newValue) { centerAndRadius.xyz = newValue; } // similarly for radius... } void someFunc(in out Sphere s) { float3 c = s.getCenter(); s.setCenter(c + offset); } ``` It can be expressed instead using a `property` declaration for `center`: ```hlsl struct Sphere { float4 centerAndRadius; // xyz: center, w: radius property center : float3 { get { return centerAndRadius.xyz; } set(newValue) { centerAndRadius.xyz = newValue; } } // similarly for radius... } void someFunc(in out Sphere s) { float3 c = s.center; s.center = c + offset; } ``` The benefits at the declaration site aren't that signficiant (e.g., in the example above we actually have slightly more lines of code), but the improvement in code clarity for users is significant. Having `property` declarations should also make it easier to migrate from a simple field to a property with more complex logic without having to first abstract the use-site code using a getter and setter. An important future benefit of `property` syntax will be if we allow `interface`s to include `property` requirements, and then also allow those requirements to be satisfied by ordinary fields in concrete types. Subscripts ---------- The Slang compiler already has limited (stdlib-use-only) support for `__subscript` declarations, which are conceptually similar to `operator[]` from the C++ world, but are expressed in a way that is more in line with `subscript` declarations in Swift. A `SubscriptDecl` in the AST contains zero or more `AccessorDecl`s, which correspond to the `get` and `set` clauses inside the original declaration (there is also a case for a `__ref` accessor, to handle the case where access needs to return a single address/reference that can be atomically mutated). A major goal of the implementation here is to re-use as much of the infrastructure as possible for `__subscript` declarations when implementing `property` declarations. Nonmutating Setters ------------------- One additional thing added in this change is the ability to mark a `set` accessor on either a subscript or a property as `[nonmutating]`, and indeed all of the existing `set` accessors declared in the stdlib have been marked this way. The need for this modifier is a bit subtle. If we think about a typical subscript or property: ```hlsl struct MyThing { int f; property p : int { get { return f; } set(newValue) { f = newValue; } } } ``` it is clear we want the `set` accessor to translate to output HLSL as something like: ``` void MyThing_p_set(inout MyThing this, int newValue) { this.f = newValue; } ``` Note how the implicit `this` parameter is `inout` even though we didn't mark anything as `[mutating]`. This is the obvious thing a user would expect us to generate given a property declaration. Now consider a case like the following: ```hlsl struct MyThing { RWStructuredBuffer<int> storage; property p : int { get { return storage[0]; } set(newValue) { storage[0] = newValue; } } } ``` This new declaration doesn't require (or want) an `inout` `this` parameter at all: ``` void MyThing_p_set(MyThing this, int newValue) { this.storage[0] = newValue; } ``` In fact, given the limitations in the current Slang compiler around functions that return resource types (or use them for `inout` parameters), we can only support a `set` operation like this if we can ensure that the `this` parameter is considered to be `in` instead of `inout`. This is exactly the behavior we allow users to opt into with a `[nonmutating] set` declaration. All of the subscript operations in the stdlib today have `set` accessors that don't actually change the value of `this` that they act on (e.g., storing into a `RWStructuredBuffer` using its `operator[]` doesn't change the value of the `RWStructuredBuffer` variable -- just its contents). We'd gotten away without this detail so far just because `set` accessors were only being declared in the stdlib and they were all implicitly `[nonmutating]` anyway, so it never surfaced as an issue that the code we generated assumed a setter wouldn't change `this`. Implementation ============== Parser and AST -------------- Adding a new AST node for `PropertyDecl` and the relevant parsing logic was mostly straightforward. The biggest change was allowing a `set` declaration to introduce an explicit name for the parameter that represents the new value to be set. This change also adds a `[nonmutating]` attribute as a dual to `[mutating]`, for reasons I will get to later. Semantic Checking ----------------- The `getTypeForDeclRef` logic was updated to allow references to `property` declarations. Some of the semantic checking work for subscripts was pulled out into re-usable subroutines to allow it to be shared by `__subscript` and `property` declarations. The checking of accessor declarations, which sets their result type based on the type of the outer `__subscript` was changed to also handle an outer `property`. Some special-case logic was added for checking of `set` declarations to make sure that their parameter is given the expected type. Some logic around deciding whether or not `this` is mutable had to be updated to correctly note that `this` should be mutable by default in a `set` accessor, with an explicit `[nonmutating]` modifier required to opt out of this default. (This is the inverse of how a typical method or `get` accessor works). IR Lowering ----------- The good news is that after IR lowering, access to properties turns into ordinary function calls (equivalent to what hand-written getters and setters would produce), so that subsequent compiler steps (including all the target-specific emit logic) doesn't have to care about the new feature. The bad news is that adding `property` declarations has revealed a few holes in how IR lowering was handling `__subscript` declarations and their accessors, so that it didn't trivially work for the new case as-is. The IR lowering pass already has the `LoweredValInfo` type that abstractly represents a value that resulted from lowering some AST code to the IR. One of the cases of `LoweredValInfo` was `BoundSubscript` that represented an expression of the form `baseVal[someIndex]` where the AST-level expression referenced a `__subscript` declaration. The key feature of `BoundSubscript` is that it avoided deciding whether to invoke the getter, the setter, or both "too early" and instead tried to only invoke the expected/required operations on-demand. This change generalizes `BoundSubscript` to handle `property` references as well, so it changes to `BoundStorage`. Making the type handle user-defined property declarations required fixing a bunch of issues: * When building up argument lists in the IR, we need to know whether an argument corresponds to an `in` or an `out`/`inout` parameter, to decide whether to pass the value directly or a pointer to the value. Some of the logic in the lowering pass had been playing fast and loose with this, so this change tries to make sure that whenever we care computing a list of `IRInst` that represent the arguments to a call we have the information about the corresponding parameter. Similarly, when emitting a call to an accessor in the IR, the information about the expected type of the callee was missing/unavailable, and the code was incorrectly building up the expected type of the callee based on the types of the arguments at the call site. The logic has been changed so that we can extract the expected signature of an accessor (how it will be translated to the IR) using the same logic that is used to produce the actual `IRFunc` for the accessor (so hopefully both will always agree). * Dealing with `in` vs. `inout` differences around parameters means also dealing with the "fixup" code that is used to assign from the temporary used to pass an `inout` argument back into the actual l-value expression that was used. That logic has all been hoisted out of the expression visitor(s) and into the global scope. Future Work =========== The entire approach to handling l-values in the IR lowering pass is broken, and it is in need a of a complete rewrite based on new first-principles design goals. While something like `LoweredValInfo` is decent for abstracting over the easy cases of r-values, addresses, and a few complicated l-value cases like swizzling, it just doesn't scale to highly abstract l-values like we get from `__subcript` and `property` declarations, nor other corner cases of l-values that we need to handle (e.g., passing an `int` to an `inout float` parameter is allowed in HLSL, and performs conversions in both directions!). It Should be Easy (TM) to extend the logic that tries to synthesize an interface conformance witness method when there isn't an exact match to also support synthesizing a property declaration (plus its accessors) to witness a required property when the type has a field of the same name/type. * fixup: pedantic template parsing error (thanks, clang!) * fixup: cleanups and review feedback * Removed some `#ifdef`'d out code from merge change * Added proper diagnostics for accessor parameter constraints, which led to some fixes/refactorings * Added a test case for the accessor-related diagnostics
2020-06-25	Add a TODO comment for generic interface requirement key	Yong He

2020-06-25	Fixes	Yong He