| Age | Commit message (Collapse) | Author |
|
This PR introduces parsing and semantic checking for a GPU foreach loop
for heterogeneouis programming. A GPU foreach loop takes the form:
```
__GPU_FOREACH(renderer, gridDims, LAMBDA(uint3 dispatchThreadID) {
kernelCall(args, ...); });
```
And will allow the host code to call into a kernel with the correct
renderer and grid dimensions. This commit also introduces a hack to
unify types in the heterogeneous hello world file, which will hopefully
be amended in the future.
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|
|
Better split of responsibilites around _begin/_endInst
|
|
* Fix WaveActiveCountBits on glsl output.
* Fix warning `could not be inlined because the return instruction is not at the end of the function. This could be fixed by running merge-return before inlining.` from glslang - because we weren't including the CreateMergeReturnPasss on default optimization, and it's assumed in InlineExhaustivePass.
* Keep WaveActiveCountBits use the default WaveMask impl.
* Fix WaveCountBits calculation.
Use WaveActiveBallot instead of the _WaveActiveBallot.
|
|
* Improve handling of cast detection when have a more complex cast than just a single identifier.
* Improve comments around heuristic for casting
* Added nested enum test.
* Improve comments
* Define function like - output change.
* Use lookup for types in determining if cast or not.
* Add _isCast function
* Add heuristic test to nested-enum.slang that works if the type test fails.
* Change hueristic based on review.
Allow (..)( to always be an expression, because if it's a type it will be turned into a cast later.
* Fix output of define-function-like.slang - which changes again with improved casting support.
* Improve testing for type in cast - if we find a decl and it's not a type, then we know it's not a cast.
|
|
* Fix the minProgramTexelOffset should be -.
* Improve comments.
|
|
* AnyValue packing/unpacking pass.
* Add diagnostic for types that does not fit in required AnyValueSize.
* Add expected test result
* Fix warnings.
|
|
* Use m_style for OSFindFilesResult
* Refactor of FindFilesResult.
* Fixes on linux for FindFiles.
* Simplify FindFilesState, and linux support for pattern matching.
* Small fixes to linux FindFilesState
* Fix typo on linux FindFiles
* Fix typo in linux FindFiles.
* Renamed some variables, and improved comments on FindFiles.
* Improve comments on FildFiles
* Small improvements around FindFiles.
* Refactor FindFiles again.. into a visitor and function in Path.
* Fix some problems on linux.
* Fix linux typo.
* Renamed os -> find-file-util
* find-file-utl -> directory-util
* Make delete of PathInfo explicit.
* Initialize alwaysCreateCollectedParam .
* WIP spir-v emit using MemoryArena
* Fix bug in spirv emit.
* Fix bug with handling null termination on strings in spirv emit.
* Small improvements in comments around emit spirv
* Remove the 'dst' from emitOperand - we can only emit to the current inst.
* Improve SpirV emit comments.
* Don't store the created instruction in the InstConstructScope - as it's always the m_currentInst.
Don't return the instruction after _beginInst.
Slight comment improvements.
|
|
* Use m_style for OSFindFilesResult
* Refactor of FindFilesResult.
* Fixes on linux for FindFiles.
* Simplify FindFilesState, and linux support for pattern matching.
* Small fixes to linux FindFilesState
* Fix typo on linux FindFiles
* Fix typo in linux FindFiles.
* Renamed some variables, and improved comments on FindFiles.
* Improve comments on FildFiles
* Small improvements around FindFiles.
* Refactor FindFiles again.. into a visitor and function in Path.
* Fix some problems on linux.
* Fix linux typo.
* Renamed os -> find-file-util
* find-file-utl -> directory-util
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|
|
Entry point `uniform` parameters were a feature of the original Cg and HLSL, but have not been used much in production shader code. One of our goals on Slang is to reduce the (ab)use of the global scope, so bringing entry point `uniform` parameters up to a greater level of usability is an important goal.
Some policy choices about how global vs. entry-point `uniform` parameters behave have already been made, that shape decisions looking forward:
* For DXBC/DXIL, it makes the most sense to follow the lead of fxc/dxc, by treating entry point `uniform` parameters as a kind of syntax sugar for global shader parameters. Any parameters of "ordinary" types are bundles up into an implicit constant buffer, and all the resources (including the implicit constant buffer) are assigned `register`s just as for globals. It is up to the application to decide how to bind those parameters via a root signature (using root descriptors, root constants, descriptor tables, local vs. global root signature, etc.)
* For CPU, it makes sense to pass global vs. entry-point parameters as two different pointers, although the details of what we do for CPU are the least constrained across all current targets.
* For CUDA compute, it makes the most sense to map global shader parameters to `__constant__` global data, and entry-point `uniform` parameters to kernel parameters. This choice ensures that the signature of a kernel when translated from Slang->CUDA follows the Principle of Least Surprise, at the cost of making entry-point vs. global parameters be passed via different mechanisms.
* For OptiX ray tracing, it makes sense to expand on the precedent from CUDA compute: pass global parameters via global `__constant__` data (as is already expected by OptiX for whole-launch parameters), and pass entry-point `uniform` parameters via the "shader record." This establishes a precedent that for ray-tracing shaders, global-scope parameters map to the "global root signature" concept from DXR, while entry-point `uniform` parameters map to a "local root signature" or "shader record."
* For Vulkan ray tracing, the precedent from OptiX then argues that entry-point `uniform` parameters should map to the Vulkan "shader record" concept (and thus cannot support things like resource types).
* The remaining interesting case is what to do for non-ray-tracing shaders on Vulkan.
The dev team agrees that the most reasonable choice to make for non-ray-tracing Vulkan shaders is to map entry-point `uniform` parameters to "push constants." In particular, this makes it easy to express the case of a compute kernel with direct parameters of ordinary/value types in the way that will be implemented most efficiently.
The big picture is then that a kernel like:
```hlsl
void computeMain(uniform float someValue) { ... }
```
will map to output GLSL like:
```glsl
layout(push_constant)
uniform
{
float someValue;
} U;
void main() { ... }
```
If the user really wanted a constant-buffer binding to be created instead, they can easily change their input to make the buffer explicit:
```hlsl
struct Params { float someValue; }
void computeMain(uniform ConstantBuffer<Params> params) { ... }
```
(Forcing the user to be explicit about the desire for a buffer here creates a nice symmetry between Vulkan and CUDA; in the first case the user sets up the data in host memory and passes it to the GPU by copy, while in the second case the user must allocate and set up a device-memory buffer for the data. This symmetry extends to D3D if the application chooses to map entry-point `uniform` parameters to root constants.)
This change implements logic in the "parameter binding" part of the Slang compiler to make sure that entry-point `uniform` parameters are wrapped up in a push-constant buffer rather than an ordinary constant buffer for non-ray-tracing shaders on Vulkan (and in a shader record "buffer" for the ray-tracing case).
The majority of the actual work was in adding support for root/push constants to the test framework and the graphics API abstraction it uses. To be clear about that support:
* Root constant ranges are (perhaps confusingly) treated as a new kind of "slot" that can appear on a descriptor set. This choice ensures that the implicit numbering of registers/spaces used by the back-ends can account for these ranges correctly.
* The `TEST_INPUT` lines are extended to allow a `root_constants` case that behaves more or less like `cbuffer`
* The CPU and CUDA paths can treat a `root_constants` input identically to a `cbuffer`. They already allocate the actual buffers based on reflection, and just use `cbuffer` as a directive that causes bytes to be copied in.
* On D3D12 and Vulkan, a descriptor set allocates a `List<char>` to hold the bytes of root constant data assigned into it, and these bytes are flushed to the command list when the table is actually bound (usually right before rendering).
* On D3D11, a descriptor set treats a root constant range more or less like a constant buffer range (with a single buffer), except that it also automatically allocates a buffer to hold the data. Assigning "root constant" data automatically copies it into that buffer.
The small number of tests that used entry-point `uniform` parameters of ordinary types were updated to use the new `root_constant` input type, and the bugs that surfaced were fixed.
A new test to confirm that entry-point `uniform` parameters map to the shader record for VK ray tracing was added.
An important but technically unrelated change is the removal of the `DescriptorSetImpl::Binding` type and related function from the Vulkan implementation of `Renderer`. That type was created to ensure that objects that are bound into a descriptor set don't get released while the descriptor set is still alive, but the implementation relied on a complicated linear search to check for existing bindings, which could create a performance issue for descriptor sets that include large arrays of descriptors. The new implementation makes use of the approach already present in the various `Renderer` implementations (including the Vulkan one) for assigning ranges in a descriptor set a flat/linear index for where their pertinent data is to be bound. As a result, the Vulkan `DescriptorSetImpl` now uses a single flat array of `RefPtr`s to track bound objects, and has no need for linear search when binding.
Co-authored-by: Yong He <yonghe@outlook.com>
|
|
* AnyValue based dynamic code gen
* Fix aarch64 build error
|
|
* Add the Feedback texture types.
Depreciate SLANG_RESOURCE_EXT_SHAPE_MASK.
* Starting point to test sampler feedback.
* WIP on FeedbackSampler.
* Use __target_intrinsic to override the output of sampler feedback types.
* Use newer generic syntax for FeedbackTexture.
* Reflects Feedback type.
* SLANG_TYPE_KIND_TEXTURE_FEEDBACK -> SLANG_TYPE_KIND_FEEDBACK
* Added reflection test.
* Reneable issue with generics in sampler-feedback-basic.slang
* Add methods to FeedbackTexture2D/Array.
Make test cover test cases.
* Sampler feedback produces DXC code.
* Disabled Sampler feedback test - as requires newer version of DXC.
* Fix bug in reflection tool output.
* Fix problem with direct-spirv-emit.slang.expected due to update to glslang.
* Fix direct-spirv-emit.slang
* Use SLANG_RESOURCE_EXT_SHAPE_MASK again
* Make Feedback be emitted as a textue type prefix.
* Add support for GetDimensions to FeedbackTexture2D
* WIP on CPU sampler feedback.
Update of target compatibility.
* Fix some bugs in C++ feedback sampler.
Fix GetDimensions for FeedbackTextures.
Run 'Compile' test for CPU compute feedback texture test.
Update target-compatability.md
* Fix GetDimensions call on feedback sampler.
* Small documentation improvements.
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|
|
The declarations of the left- and right-shift operations in the Slang standard library were set up identically to the declarations of operator binary (and compound binary) operations. A consequence of this choice was that both operands to a shift were expected to have the same type, which can lead to a confusing result.
If the user wrote a shift of the form `int >> uint`, then the ordinary promotion rules for Slang would decide to perform the operation on `uint` value, so it would change to `uint(int) >> uint` and perform an unsigned shift, which isn't what the user would expect.
The fix implemented here is to make the shift operations be declared separately from the other binary operations, with *two* generic type parameters instead of one: distinct parameters for the left-hand-side and right-hand side types. Each parameter is only constrained to be a built-in integer type.
|
|
* Add the Feedback texture types.
Depreciate SLANG_RESOURCE_EXT_SHAPE_MASK.
* Starting point to test sampler feedback.
* WIP on FeedbackSampler.
* Use __target_intrinsic to override the output of sampler feedback types.
* Use newer generic syntax for FeedbackTexture.
* Reflects Feedback type.
* SLANG_TYPE_KIND_TEXTURE_FEEDBACK -> SLANG_TYPE_KIND_FEEDBACK
* Added reflection test.
* Reneable issue with generics in sampler-feedback-basic.slang
* Add methods to FeedbackTexture2D/Array.
Make test cover test cases.
* Sampler feedback produces DXC code.
* Disabled Sampler feedback test - as requires newer version of DXC.
* Fix bug in reflection tool output.
* Fix problem with direct-spirv-emit.slang.expected due to update to glslang.
* Fix direct-spirv-emit.slang
* Use SLANG_RESOURCE_EXT_SHAPE_MASK again
* Make Feedback be emitted as a textue type prefix.
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|
|
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|
|
This change includes a few different fixes for issues that arose in a user shader that made use of DXR 1.1.
The existing solution we had for handling the DXR 1.1 `RayQuery` type relied on the fact that a declaration like:
```hlsl
RayQuery<0> myRayQuery;
```
Looks like an undefined variable to existing Slang, while to dxc it is a variable declaration that runs an implicit default constructor (sneaking a bit of C++ into HLSL, but only in a way the standard library can use).
Slang was getting away with the fact that this maps to an undefined variable because it turns out that our emit logic would output the exact same declaration for an undefined value (since declaring a variable without initializing it is the simplest way to get an undefined value of a given type in a C-like language).
The main bug that arose here was that if the `RayQuery<...>`-typed variable was declared under control flow, then the `undefined` instructions introduced by our SSA pass would actually get inserted into the wrong block. Basically, when a block was trying to read a variable, and there was no preceding `store` to that variable in the block, we'd start looking for incoming values from its predecessor block(s). In the case where the variable *never* gets stored to, this search would eventually reach the first block of the function, where we'd realize the value must be `undefined`. The result was that we might insert an `undefined` instruction of some `T` into the first block of a function, but the type `T` might be the result of a lookup operation performed later in that function. This ends up creating a use of `T` that isn't dominated by the definition, which violates the SSA property. This violation of the SSA properties lead to us generating incorrect code in a later pass that deals with scoping differences between SSA form and our structured output statements; that code would end up creating a local variable to hold a *type* instead of a value.
The main fix is in `slang-ir-ssa.cpp`, where we catch the case of trying to read a variable in the block that declares it, if there we no preceding `store`s. We simply insert an `undefined` instruction before the first such read and write that out as the value of the variable to be used for subsequent instructions (up to the next `store`). This fixes the SSA dominance property for the `undefined` values that get introduced and thus technically fixes the output code for the user shader.
A secondary issue is that it is kind of gross to be relying on the behavior of `undefined` instructions in the IR for the semantics of an important standard library type like `RayQuery<...>`. A preceding change already added basic support for Slang to run default initializers (declared as `__init()`) on variables that are declared without an initial-value expression. This change adds such a default initializer to `RayQuery<...>` and maps it to a dedicated IR instruction that is intended to represent the idea of running a C++-style default constructor to produce a value. It turns out that the code we need to emit in that cse is identical to what we currently emit for `undefined` instructions, so that is helpful.
A tertiary issue is that when trying to run the user shader in debug mode, I ran into an assertion because our type layout logic for reflection had never dealt with the issue of user-defined `enum` types being used in constant buffers or other memory that needs layout. I added a quick fix that lays out any `enum` types as their "tag type" (which defaults to `int`).
Unfortunately, there is no easy way to check in a regression test for the user issue, because official `dxcompiler` versions with support for DXR 1.1 are not yet released (at least as of last time I checked).
|
|
* Binary Heterogeneous Example
This PR introduces the ability to insert the binary of a non-CPU target
by using the -heterogeneous flag. Specifically, this PR updates the
emitting logic to produce a variable of name `__[name_of_entryPoint]`
when the heterogeneous flag is present.
* Prelude path fix
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|
|
* Fix premake5.lua so it uses the new path needed for OpenCLDebugInfo100.h
* Keep including the includes directory.
* Added the spirv-tools-generated files.
* We don't need to include the spirv/unified1 path because the files needed are actually in the spirv-tools-generated folder.
* Put the build_info.h glslang generated files in external/glslang-generated. Alter premake5.lua to pick up that header.
* First pass at documenting how to build glslang and spirv-tools.
* Improved glsl/spir-v tools README.md
* Added revision.h
* Change how gResources is calculated.
Update about revision.h
* Update docs a little.
* Split out spirv-tools into a separate project for building glslang. This was not necessary on linux, but *is* necessary on windows, because there is a file disassemble.cpp in spirv-tools and in glslang, and this leads to VS choosing only one. With the separate library, the problem is resolved.
* Fix direct-spirv-emit output.
* Update to latest version of spirv headers and spirv-tools.
* Upgrade submodule version of glslang in external.
* Add fPIC to build options of slang-spirv-tools
* Upgrade slang-binaries to have new glslang.
* Fix issues with Windows slang-glslang binaries, via update of slang-binaries used.
* Small improvements to glslang building process documentation.
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|
|
list) is freed with free in the RiffContainer. (#1473)
|
|
* Generalize lowerSimpleIntrinsicType to include generic arguments
* Use recursion instead of loop to get the correct ordering for nested generics
|
|
The Big Picture
===============
Given input Slang code like:
```hlsl
Texture2D gA;
[shader("compute")]
void kernelFunc(uniform Texture2D b, uint3 tid : SV_DispatchThreadID)
{ ... }
```
the existing CUDA code generation strategy would always generate a kernel with a signature like:
```c++
struct GlobalParams { Texture2D gA; }
struct EntryPointParams { Texture2D b; }
extern "C" __global__
void kernelFunc(EntryPointParams* entryPointParams, GlobalParams* globalParams)
{ ... }
```
This choice was consistent with the conventions of the CPU kernel target, and shares the advantage that it is easy for the user to data-drive the logic for filling in parameters and then invoking a kernel.
However, the approach outlined above has two serious problems when used for CUDA kernels:
* First, it defies the programmer's expectation about what an "equivalent" CUDA kernel signature would be, which makes it awkward for a developer to invoke this kernel from CUDA C++ host code (especially in the context of an app that might also run hand-written CUDA kernels).
* Second, the performance of this approach suffers because every access to a global or entry point parameter turns into a load from global memory. In contrast, a typical hand-written CUDA kernel passes its parameters via an implementation-specific path that (for current CUDA platforms) seems to be equivalent to `__constant__` memory in performance.
This change alters the convention so that the Slang compiler takes the code from the top of this message and translates it into something like:
```c++
struct GlobalParams { Texture2D gA; }
__constant__ GlobalParams SLANG_globalParams;
extern "C" __global__
void kernelFunc( Texture2D b )
{ ... }
```
This translation alleviates both problems with the current translation:
* The signature of the generated CUDA kernel function is as close to that of the original as is possible (we had to eliminate the `SV_*`-semantic varying inputs), and should directly match what the programmer would expect in common cases.
* Entry-point parameters are passed via CUDA kernel parameters, and should thus match in performance. Global parameters are passed via a variable in `__constant__` memory, and thus should also perform as well as possible/expected.
Detailed Changes
================
* Disable the `collectEntryPointUniformParams` pass for CUDA, so that entry-point `uniform` parameters are *not* bundles into a single `struct` and/or `ConstantBuffer`.
* When targeting CUDA, disable the logic for generating an entry-point parameter for passing in the global shader parameter(s)
* Allow `CLikeSourceEmitter` subclasses to override the name generated for entry-point symbols, and use this to add the required prefix for each OptiX kernel type when translating a ray-tracing kernel.
* Add logic to emit "parameter groups" in a specialized way for CUDA (this is the same approach that allows us to generate `cbufffer { ... }` declarations for fxc). A global-scope parameter group will turn into a global `__constant__` variable called `SLANG_globalParams` (that name becomes part of the ABI for Slang-compiled shaders).
* Update the logic in `render-test` for loading and invoking CUDA kernels to handle the new policy.
The last bullet there merits expansion, since it is indicative of the work a client using Slang would have to go through to use our generated kernels with the new policy:
* When loading a CUDA module with one or more kernels, we also use `cuModuleGetGlobal` to query the address of the `SLANG_globalParams` symbol in that CUDA module. That pointer needs to be used when setting global parameter values to be used by kernels in that CUDA odule.
* Because our existing `BindPoint` logic for CUDA always sets up parameter data in GPU memory, we end up having to copy the entry-point parameter data from GPU memory to host memory. This step would ideally be skipped in a codebase that understands the correct policy, but it is a bit unfortunate that it is no longer trivially correct for an application to store all parameter data in GPU memory.
* Before invoking the kernel, we need to use a `cudaMemcpyAsync` to copy from the prepared GPU memory for global parameters over to the `SLANG_globalParams` symbol associated with the kernel to be invoked. Because this operations is issued on the same CUDA stream as the kernel call, it is guaranteed to not overlap with GPU kernel execution.
* When invoking the kernel, we take advantage of the seldom-used `CU_LAUNCH_PARAM_BUFFER_POINTER` facility to specify a contiguous memory region with all the entry-point parameters in it instead of passing each entry-point parameter separately. Given Slang reflection it is also possible to query the offset of each entry-point parameter in the buffer, so we could invoke the kernel in the traditional fashion as well. The choice here is up to the application.
Caveats
=======
* This is a breaking change, and any subsequent release will need to reflect that fact. Any customers who rely on Slang's current CUDA codegen strategy are likely to be surprised by this change, and I don't see an easy way to give them a more gentle transition.
* This change does *not* remove the logic that introduces a `KernelContext` type for code that requires it. That means that things like `static` global variables can continue to work on CUDA for now, but we know that those are not going to be something we can support in the long-term with separate compilation.
* While the policy implemented in this change is a reasonable default, it is still not going to perfectly match expecations for some developers. In particular, some developers who are familiar with both D3D and CUDA will likely wonder why a global `cbuffer` in Slang translates to a global-memory pointer in the output CUDA instead of one global `__constant__` variable per `cbuffer`. A more detailed alternate translation would generate a distinct global `__constant__` variable for each top-level constant buffer or parameter block. We may need to refine the translation even more based on feedback from users who care about how we handle global-scope parameters.
* Recent changes in Slang have broken the logic that handles the OptiX "shader record" as an alternative mechanism for passing entry-point parameters. In order to get any level of OptiX support up and running we will have to change the IR passes that run on CUDA kernels to actually run the "collection" of `uniform` parameters for ray tracing stages, and then to replace references to the resulting parameter with a call to the function to access the shader record.
* The use of `SLANG_globalParams` here works well enough in the case of whole-program compilation; every `CUmodule` ends up with (zero or) one parameter with this name, and an application can just hard-code it. As a mechanism it wouldn't work in the presence of separately-compiled modules that might introduce their own global parameters (including cases like constant lookup tables that really want to be at the global scope). An alternative approach would have Slang generate output PTX for each module, where a module has an optional global symbol for its own global-scope parameters (with a mangled name that is based on the module name), and then a linked CUDA binary has all of those distinct symbols. Such an approach would be compatible with module-at-a-time reflection and parameter binding, but would lead to another breaking change down the line for code that switches to `SLANG_globalParams`.
|
|
The logic that detects intrinsic functions during emit was not able to properly detect an intrinsic generic method nested in a generic type. The basic problem was that this led to a `specialize(specialize(...), ...)` in the IR, which wasn't being handled (only one level of `specialize` was handled). The fix is local and simple.
The larger issue was that the author of this commit had thought our IR ruled out nested generics like this, when in fact that is precisely how we handle nested generics throughout the IR. This oversight/misunderstanding means that we might have broken passes in other places that assume nested generics cannot happen. This change doesn't pretend to fix that other issue, but we should pay attention to it.
|
|
* Baseline Heterogeneous Example
This PR introduces a baseline heterogeneous example, including both a
Slang file and an associated C++ helper file. This refactoring
primarily moves the Slang file "into the driver's seat" while
maintaining that the C++ side still does most of the actual work.
* Fix to prelude path
|
|
There are two main bug fixes here:
* We were failing to diagnose when code calls a `[mutating]` method on a value that doesn't support mutation (that is an r-value instead of an l-value).
* We had a bug in the synthesis logic for interface requirements where we used the *result* type of the requirement in place of each of the *parameter* types.
The second bug made synthesis often produce incorrect signatures with `void` parameters.
The first bug meant that even though a `[mutating]` method should not be able to satisfy a non-`[mutating]` method (and we had code to enforce this for the "exact match" case), when we go on to try and synthesize a non-`[mutating]` method that satisfies the requirement by delegating to the user-written one, it would end up succeeding, because nothing was stopping a non-`[mutating]` method from calling a `[mutating]` one.
In each case this code adds a fix and a test case to confirm it.
|
|
* Ensure labels are dumped in `lower-to-ir`.
There is a `dumpIR` function that accepts a label parameter already in slang-emit.cpp. This change moves it to slang-ir.cpp so it may be called from other files.
* update expected test result
Co-authored-by: Yong He <yhe@nvidia.com>
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|
|
The IR pass that introduces an explicit `KernelContext` for the CPU/CUDA back-ends was also responsible for adding an explicit parameter to the kernel entry point to receive the constant buffer (pointer) with all the global uniform parameters. However, if there were no global uniform parameters, this parameter wasn't getting introduced, which changed the signature/ABI of the generated entry point function.
This change makes it so that the pass unconditionally adds a parameter. In the case where there are no global uniforms it just adds a `void*` parameter that never gets used.
In order to avoid future regressions, this change also adds a test case to confirm that things work correctly when a kernel has only entry-point parameters and no global parameters.
|
|
* Run array specialization in a sperate pass.
* rename specializeFunctionCall->specializeFunctionCalls
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|
|
* Run SSA pass to clean up generic temporary variables during lowering.
* Fix `undefined` emitting logic.
* revert dumpir control flag
* Defer fold decision of `undefined` values after special case logic for GLSL and HLSL.
* Update expected test result.
* Manually update raygen.slang.glsl to minimize change.
* fix formatting
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|
|
During semantic checking, the compiler used to link together `ExtensionDecl`s into a singly-linked list dangling off of the `AggTypeDecl` that they applied to. This approach made lookup relatively easy, because given a `DeclRef` to an `AggTypeDecl` one could easily find and walk the list of candidate extensions.
Unfortunately, the simple approach has two major strikes against it:
* First, as we recently ran into, it creates a lifetime/ownership problem, in cases where the `ExtensionDecl` is outlived by the `AggTypeDecl` it applies to. This creates the one and only place in the compiler today where an "old" AST node might point to a "new" AST node, and it resulted in use-after-free problems in client code.
* Second, the scoping of `extension`s ends up being completely wrong. All of the `extension` methods on a type end up being visible in all cases, instead of just in the context of modules where the `extension` itself is visible. The comparable feature in C# (static extension methods) is careful to not make scoping mistakes like this. The Swift langauge has loose scoping for `extension` more akin to what we have in Slang today, but the maintainers seem to consider it a misfeature.
This change attempts to clean up both issues by changing the way that extension declarations are stored. There are two main pieces:
1. The primary "source of truth" for extension lookup has been moved to the `ModuleDecl`, where a module is responsible for storing a cache of the extensions declared within that module (keyed by the declaration of the type being extended). This cache is updated at the same point where the old code would mutate the AST node being depended on.
2. A secondary aggregated cache is added to the `SharedSemanticsContext` used during semantic checking. This cache includes entries from across multiple modules, and is intended to be invalidated and rebuilt on demand if new modules are added during checking.
Access to the candidate extensions has now been put behind subroutines that require a semantics-checking context to be passed in (there was always one available in contexts that care about extensions).
In addition, the operation for looking up members including those from extensions was refactored heavily to involve internal rather than external iteration and, more importantly, was changed so that it actually tests whether the `ExtensionDecl`s it loops over apply to the type in question, rather than blindly letting extensions members be looked up in ways that don't make sense.
There are three test cases added here to confirm aspects of the fix:
* First, I added a test that reproduces the crash that was being seen, so that we have a regression test for the fix.
* Second, I added a basic semantic-checking test to confirm that an `extension` from an `import`ed module is still visible/usable, to confirm that I didn't break existing valid uses of extensions.
* Third, I added a diagnostic test that ensures that we correctly ignore extensions that should not be visible in a given context as a result of `import` declarations.
Co-authored-by: jsmall-nvidia <jsmall@nvidia.com>
|
|
* Multiple Entry Point Backend
This PR introduces changes to the IR linking, emitting, and options for
multiple entry points. Specifically, this PR updates several locations
to support a (potentially empty) list of entry points, adding list infrastructure and looping over entry points as appropriate.
* Formatting change
* Updated unknown target case to not require an entry point
* Formatting and list consts updates
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|
|
contains an array (#1448)
* This code is disabled, it was part of the optimization `Specialize function calls involving array arguments. (#1389)` on github.
It is disabled here because it causes a problem when a struct is passed to a function that contains a structured buffer *and* an array. It is specialized on the struct type, and so those types become parameters to the function. If the struct contains a structured buffer this is a problem on GLSL/VK based targets because currently structured buffers cannot be function parameters.
The fix for now is to just disable this optimization.
* Fix typo in name of test expected values.
|
|
* Put the running of generators into a separate project, to try and sure the generated products are available for other dependencies when compiling with multiple threads on linux.
* Made paths Strings in slang-generate. Made paths use / for path separators (rather than \ on windows which causes some problems with #line).
* Make the run-generators proj a utility step.
* Made run-generators a StaticLib.
* Fix problem with generating when not necessary.
* Trying to get abspath to work on linux.
* Add run-generator-main.cpp dummy file.
* Add comment about the issues around linux and correct build triggering.
* Add updated projects.
* Remove the run-generators-main.cpp as no longer needed for 'run-generators' tool.
Removed the adding of files by default from baseSlangProject
Made the run generators project use slang-string.cpp as the file it builds from core.
* Add the run-generators VS project.
|
|
code (#1444)
* Refactor lower-generics pass into separate subpasses.
* IR pass to generate witness table wrappers.
* Support associatedtype local variables and return values in dynamic dispatch code.
|
|
* Refactor lower-generics pass into separate subpasses.
* IR pass to generate witness table wrappers.
* Re-generate vs project files.
* Fix x86 build error.
|
|
|
|
* Remove KernelContext wrapper from CPU/CUDA emit
Currently, the CPU and CUDA C++ targets rely on a `KernelContext` type that is generated during emit, as a way to provide implicit access to things that were global in the input Slang code, but that can't actually be emitted as globals in the target language (because the semantics of global declarations differ).
For example, input like:
```hlsl
ConstantBuffer<Stuff> gStuff; // shader parameter
groupshared int gData[1024]; // thread-group shared variable
static int gCounter = 0; // "thread-local" global-scope variable
void subroutine() { ... }
[shader("compute")] void computeMain() { ... }
```
would translate to output C++ for CPU a bit like:
```c++
struct KernelContext
{
ConstantBuffer<Stuff> gStuff;
int gData[1024];
int gCounter = 0;
void subroutine() { ... }
void computeMain() { ... }
};
```
Note that both `computeMain()` and `subroutine()` are non-`static` members functions on `KernelContext`, so they have an implicit `this` parameter of type `KernelContext`, which allows the bodies of those functions to implicitly reference `gStuff`, etc. by name in their bodies.
Because `KernelContext::computeMain()` is a member function, we end up emitting an additional global-scope function to expose the entry point to the outside world, and that function is responsible for declaring a local `KernelContext` and invoking the generated entry point on it.
This approach has several important drawbacks:
* It complicates the emit logic for CPU and CUDA, with many special cases around when/how things get emitted
* It complicates the implementation of dynamic dispatch, because what seems like a function pointer in Slang IR needs to be a pointer-to-member-function in C++.
* It makes it difficult to have a non-kernel-oriented mode of compilation for CPU where a Slang function with a given signature gets output as a C++ CPU function with the "same" signature (not wrapped up as a member function of `KernelContext`.
This change makes a step toward addressing these issues by making the introducing of the `KernelContext` type be something that is done in an explicit IR pass instead of being handled as part of the last-mile emit logic.
The most important change is the removal of code related to `KernelContext` from the `slang-emit-{cpp,cuda}.{h,cpp}` files, with the equivalent logic instead being handled in a new pass in `slang-ir-explicit-global-context.{h,cpp}`. It should be noted that further cleanups to the emit logic should now be possible; in particular, both the CPU and CUDA emit paths are manually sequencing the `EmitAction`s instead of relying on the default logic, but at this point they should be able to just use the default. The additional cleanups are left for future work.
The explicit IR pass does more or less what one would expect: it identifies global-scope entities (global variables and parameters) that need to be wrapped and turns them into fields of a `KernelContext` type. It then modifies all entry points to initialize a `KernelContext` as part of their startup. Finally, any code that used to refer to the global entities is changed to refer to a field of the context, with the context passed via new function parameters (the new parameter is only added to functions that need it for now).
Transforming global variables into fields of a `KernelContext` type in the IR pass ends up dropping their initial-value expressions (since those were attached as basic blocks on the `IRGlobalVar`). To avoid breaking code that relies on global-scope (but thread-local) variables, this change also adds an explicit pass that takes the initialization logic on all global variables and moves it to explicit logic that runs at the start of every entry point in a linked module (`slang-ir-explicit-global-init.{h,cpp}`). This pass would also be useful when we get back to direct SPIR-V emit, since SPIR-V also requires initialization logic for globals to be emitted into entry points.
One complication that arises when the IR is introducing the types for entry-point parameters, global-scope parameters, and the `KernelContext` type is that it becomes harder for the emit logic to utter the names of those types (they might not even have names, since `IRNameHint`s might get stripped). This created a problem since the wrapper operations that were being generated for CPU were taking `void*` parameters and casting them to the appropriate type. To work around this issue, we have added an explicit IR pass (`slang-ir-entry-point-raw-ptr-params.{h,cpp}`) that transforms the signature of entry points so that any pointer parameters instead become raw pointer (`void*`) parameters, with the casting being handled inside the entry point itself.
One consequence of all the above changes is that for the CUDA target we no longer need a wrapper function to invoke the generated entry point any more, because the IR function for the entry point ends up having the correct/expected signature already. This is also the case for CPU when it comes to the `*_Thread` wrapper function, but this change doesn't try to eliminate the wrapper because of a belief that the `*_Thread`-level interface is going away anyway.
Because the IR is now responsible for ensuring the signature of the IR entry point for CUDA and CPU is what is expected, I needed to modify the `slang-ir-entry-point-uniforms` pass to always create an explicit parameter for the entry point uniforms when compiling for CUDA/CPU, even if there were no `uniform` parameters on the entry point as written. This also ended up requiring some tweaks to the parameter layout logic to ensure that CPU/CUDA targets always treat `ConstantBuffer<T>` as a `T*` even in the case where `T` is an empty `struct` type (which happens when we construct a `struct` type to represent the uniform parameters of an entry point with no uniform parameters...).
There are several future changes that can/should build on this work:
* We should change the generated signatures for CUDA kernels, so that they don't rely on `KernelContext` for global-scope parameters. At that point we can avoid generating a `KernelContext` at all for CUDA, except when a program uses global-scope thread-local variables.
* We should figure out how to make the "ABI" for dynamic-dispatch calls ensure that the kernel context is either always passed, or always *not* passed. Making a hard-and-fast rule as part of the calling convention for dynamic calls would ensure that they access through the context continues to work with dynamic calls (this change might break it in some cases).
* We should figure out how to handle the layout for the `KernelContext` in cases where a program is composed of multiple separately-compiled modules. Right now the layout of the `KernelContext` requires global knowledge (as does the pass that introduces explicit initialization for global-scope thread-locals).
* We should try to further clean up the CPU/CUDA C++ emit logic to fall back on the default emit behavior more, now that the various special-case approaches that were taken are no longer needed
* fixup: restore build files to default configuration
|
|
* Dynamic code gen for functions returning generic types.
* Add expected test result.
|
|
The main change here is that the CPU and CUDA C++ emit paths now rely on an earlier IR pass to legalize the varying parameter list of a kernel and translate references to varying parameters with semantics like `SV_DispatchThreadID`. Doing so removes a lot of special-case logic from the emit passes.
This work moves us even closer to being able to eliminate `KernelContext` from the CPU/CUDA emit logic, because it removes the issue of state related to varying inputs being stored in `KernelContext`.
The new pass that handles the legalization is in `slang-ir-legalize-varying-params.cpp`, and it borrows heavily from the existing `slang-ir-glsl-legalize.cpp` pass. The new pass factors out the target-independent and target-dependent logic, so that both CPU and CUDA can share much of the same code despite having very different rules for how the system-value parameters are being provided.
An eventual goal is to have the new pass also handle the GLSL case, but doing so requires copying even more logic out of the GLSL-specific pass, and doing so seemed like a step to far for what was meant to be a stepping-stone change as part of other work. As a result of the incomplete nature of the pass, certain cases don't work for compute shader inputs for CPU/CUDA (e.g., wrapping your varying inputs in a `struct` type parameter), but those were cases that also didn't work in the existing `emit`-based logic.
One major consequence of this change is that the logic for emitting the various different functions that represent an entry point for our CPU back-end has been streamlined and simplified. The original logic had a fair bit of cleverness built in to try and avoid unnecessary math ops when computing the various IDs/indices, while the new logic is much more simplistic (the main dispatch function loops over threadgroups with a triply-nested `for` and then delegates to the group-level function loops over threads with its own nested `for`s).
Longer term, it will be important to simplify the CPU functions we emit further, by eliminating things like the `_Thread` function that should never really be exposed to users (the minimum granularity of invoking a CPU compute kernel should be a single threadgroup). We may eventually decide to synthesize all of the extra code that is being generated in the `emit` pass as IR instead.
|
|
* Fix a preprocessor bug affecting X-macros
Fixes #1435
This bug exhibited as nondeterministic output from the preprocessor in release builds, but using a debug build it was narrowed down to a use-after-free issue.
The core problem is subtle, but relates to how we set up the linked list that represents the "busy" status of macros in a particular expansion environment.
Consider this scenario:
```hlsl
X(A)
```
The flow we expect from the preprocessor is something like:
1. Read the `X` token in `X(A)` and recognize the start of a function-like macro invocation. Create an expansion environment for `X`, with the global environment as a parent, read in the arguments (just `A`), and push that expansion onto the stack.
2. Read the `M` token that starts the expansion of `M`, and recognize it as an invocation of the object-like macro representing the argument `M`. Create an expansion environment for the definition of `M` (which is just `A`), and push it onto the stack.
3. Read the token `A` from the expansion for the argument `M`, and recognize it as an invocation of the function-like macro `A`. Create an expansion environemnt for `A`, with the current environment as its parent, read in the arguments (just `0`), and push that expansion onto the stack.
4. Read the token `y` from the expansion for `A`, and recognize it as an invocation of the object-like macro representing the argument `y`. Create an expansion environment for the definition of `y` (which is just `0`) and push it onto the stack.
5. Read `0`.
6. Read a bunch of end-of-file tokens that cause all of these expansions to be popped.
That all looks fine as written, but the gotcha is that the input stream for the expansion in step (2) is only a single token (`A`), which means that during step (3) the current input stream at the time we *create* the macro expansion for `A` is at the end of its input, and by the time we've read in the macro arguments that expansion will have been popped.
The problem, then, is that the logic for setting up the stack of "busy" macros was being performed at the beginning of the expansion (the part referred to as "create an expansion" above), when it should only have been set up as part of pushing the xpansion onto the stack (since at that point we have a guarantee that the parent expansion cannot be popped until the child expansion has been).
The fix here is thus pretty simple: we already have distinct operations for `initializeMacroExpansion()` and `pushMacroExpansion()`, and I simply moved the logic for setting up the "busy" state from the former to the latter.
* fixup: typo
|
|
* Dynamic code gen for generic local variables.
* Fixes to function calls with generic typed `in` argument.
* Fixes per code review comments
|
|
* Adding support for global uniform shader parameters
This change adds support for Slang programmers to declare shader parameters of "ordinary" types at global scope:
```hlsl
uniform float gScaleFactor;
void main() { ... *= gScaleFactor; ... }
```
The generated HLSL/GLSL/DXIL/SPIR-V output will be something along the lines of:
```hlsl
struct GlobalParams
{
float gScaleFactor;
}
cbuffer globalParams
{
GlobalParams globalParams;
}
void main() { ... *= globalParams.gScaleFactor; ... }
```
The binding information used for the implicit `globalParams` constant buffer will be determined by the existing implicit parameter binding logic (which already had support for this kind of transformation).
The reason this change is being pursued right now is because it is one step toward removing the implicit `KernelContext` type that is used to wrap the generated code for our CPU and CUDA C++ targets. Handling global-scope parameters of ordinary type requires an IR pass that synthesizes the `GlobalParams` structure type above, and that step ends up removing the need for the similar `UniformState` structure that was being used in the CPU/CUDA emit logic.
A more detailed guide to the changes included follows:
* The diagnostic for a global-scope variable that is implicitly a shader parameter was kept, but changed to a warning. Users can opt out of the warning by decorating their parameter as a `uniform` (since that keyword is already being used to mark entry-point parameters that should be treated as uniform shader parameters).
* To simplify the task of finding the global shader parameters, the `CLikeSourceEmitter` type has been given an `m_irModule` member. The previous emit logic for `UniformState` was having to do a roundabout solution involving the `EmitAction`s to deal with not having direct access to the module.
* Removed a few dead declarations in the emit logic (related to a much earlier point where emit was based on the AST instead of the IR).
* Made the computation of type names in C++ emit take into account `ConstantBuffer<T>` and `ParameterBlock<T>`. As far as I can tell, these were being handled with some special-case hacks in the emit logic instead of being supported more fundamentally. It might actually be good to pass these through as `ConstantBuffer<T>` and `ParameterBlock<T>` in the C++ output, and allow the prelude to customize their translation (defaulting to defining them as `T*`).
* Removed the special-case C++ emit logic for references to global shader parameters. There are now at most two global shader parameters to deal with, and the default emit logic (referring to them by name) does the Right Thing.
* Changed the handling of entry points for C++ (both CPU and CUDA) so that it handles the bundled-up shader paameters for the global and entry-point scopes the same way. The main complication here is OptiX, where parameter data is passed very differently than it is for CUDA compute kernels.
* Reverted changes to `ir-entry-point-uniforms` that had made its logic depend on the compilation target. The parameter binding logic was already responsible for deciding if a given target needed to wrap up its entry-point parameters in a constant buffer, and the IR pass was respecting that layout information. The current workaround had been removing the `ConstantBuffer<T>` indirection from this IR pass for CPU/CUDA, but then reintroducing the same indirection later on in the emit step.
* Added an explicit IR pass with the task of collecting global-scope parameters of uniform/ordinary type and packaging them up into a `struct`, and then optionally packaging that `struct` up in a constant buffer. This pass bases its decisions on the IR layout information that was already computed, so it should match whatever policy choices were made at the layout level.
* Changed the "key" operand on IR `struct` layout information to not assume an `IRStructKey`. The problem here is that the global scope gets a `StructTypeLayout` to represent its members, and this is convenient (rather than having to always special-case logic that handles the global scope), but the "fields" of that struct are global variables which do not have `IRStructKey`s associated with them. The simplest solution is to use the variables themselves as the keys, which required removing the assumption in the IR encoding.
* Updated the IR layout process to compute a layout for the global scope of an entire program, and to attach that to the `IRModule` via a decoration. Updated the IR linking process to carry through that decoration to the linked output. This is necessary so that the IR pass that transforms global parameters can access the global-scope layout information.
An important concern with this approach is that the contents and layout of the monolithic `GlobalParams` structure depends on the exact set of modules that were linked (and the order in which they were specified, in some cases). This isn't really a new thing with this change, but it becomes more important as we start to think of how to generalize things to better support separate compilation and linking.
There are changes that can (and should) be made to the way that IR layouts are computed for programs (e.g., so that we compute layout per-module and then combine them rather than as a whole-program step). In this case, the problem of forming the combined/linked global layout can be moved down the IR level and not be reliant on AST-level information.
Just changing the way layout and linking interact would not change the fundamental problem that global shader parameters as they currently exist in Slang/HLSL/GLSL are not readily compatible with true separate compilation. We either need to find a solution strategy that we can apply to allow existing shaders to work with separate compilation *or* we need to incrementally work toward removing support for global-scope shader parameters in favor of explicit entry-point parameters in all cases.
* fixup: missing files
* fixup: comment the new code
|
|
This PR introduces support for the public modifier for functions. This
keyword allows labelled functions to be written to the compiled without
having a link to an entry point. The goal of this change is to help
support heterogeneous design of Slang by permitting C++ code to interact
with CPU slang functions.
Internally, this PR adds the public decoration to the IR and defines a
lowering from the public modifier in the AST to this decoration.
Additionally, the Keep Alive decoration is added to any public modifier
being lowered, which prevents DCE from eliminating functions labelled
with the public keyword.
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|
|
* Add a test case for dynamic dispatch with `This` type in interface decl.
* Update comments
* fix typo in comments
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|
|
* Multiple Entry Point Cleanup
This commit provides some in-code cleanup of the previous multiple entry point PR (#1411). Specifically, this PR provides refactoring of multiple entry point functions into helper functions, the removal of the EntryPointAndIndex struct, and various stylistic improvements.
* Minor updates
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
|
|
* ShortList<T> and core.natvis improvements.
* Fix gcc build.
* add `getBuffer()` accessor to `GetArrayViewResult`
|
|
Co-authored-by: Yong He <yhe@nvidia.com>
|
|
* Attempt to silence some warnings
This is an attempt to change code in `slang-ast-serialize.cpp` so that it doesn't trigger a warning(-as-error) in one of our build configurations.
The original code is fine in terms of expressing its intent, so the right answer might actually be to silence the warning.
* fixup: make sure to actually initialize
|
|
(#1420)
* Fix handling of UniformState from #1396
* * Fix bug in slang-dxc-support where it didn't get the source path correctly
* Make entryPointIndices const List<Int>&
|
|
|
|
There was a small but non-trivial amount of code across `IRModule`, the `ObjectScopeManager`, and `StringRepresentationCache` that had to do with managing the lifetimes of `RefObject`s that might be referenced by IR instructions (and thus need to be kept alive for the lifetime of the IR module).
We have long since migrated to a model where IR instruction do not include owned references to `RefObject`s, so these facilities weren't actually needed. This streamlines `IRModule`'s declaration, and trims code that we aren't actually using.
One note for the future is that the `StringRepresentationCache` no longer does what its name implies (it is not a cache of `StringRepresentation`s), so we should consider giving it a more narrowly scoped name. I didn't include that in this change because I wanted to keep the diffs narrow and easy to review.
A follow-on renaming change should be trivial if/when we can agree on what the type should be called at this point. Alternatively, we could simply bake the functionality of `StringRepresentationCache` into he IR deserialiation logic itself, since that is the only code using it.
|
|
* Initial work on property declarations
Introduction
============
The main feature added here is support for `property` declarations, which provide a nicer experience for working with getter/setter pairs.
If existing code had something like this:
```hlsl
struct Sphere
{
float4 centerAndRadius; // xyz: center, w: radius
float3 getCenter() { return centerAndRadius.xyz; }
void setCenter(float3 newValue) { centerAndRadius.xyz = newValue; }
// similarly for radius...
}
void someFunc(in out Sphere s)
{
float3 c = s.getCenter();
s.setCenter(c + offset);
}
```
It can be expressed instead using a `property` declaration for `center`:
```hlsl
struct Sphere
{
float4 centerAndRadius; // xyz: center, w: radius
property center : float3
{
get { return centerAndRadius.xyz; }
set(newValue) { centerAndRadius.xyz = newValue; }
}
// similarly for radius...
}
void someFunc(in out Sphere s)
{
float3 c = s.center;
s.center = c + offset;
}
```
The benefits at the declaration site aren't that signficiant (e.g., in the example above we actually have slightly more lines of code), but the improvement in code clarity for users is significant.
Having `property` declarations should also make it easier to migrate from a simple field to a property with more complex logic without having to first abstract the use-site code using a getter and setter.
An important future benefit of `property` syntax will be if we allow `interface`s to include `property` requirements, and then also allow those requirements to be satisfied by ordinary fields in concrete types.
Subscripts
----------
The Slang compiler already has limited (stdlib-use-only) support for `__subscript` declarations, which are conceptually similar to `operator[]` from the C++ world, but are expressed in a way that is more in line with `subscript` declarations in Swift. A `SubscriptDecl` in the AST contains zero or more `AccessorDecl`s, which correspond to the `get` and `set` clauses inside the original declaration (there is also a case for a `__ref` accessor, to handle the case where access needs to return a single address/reference that can be atomically mutated).
A major goal of the implementation here is to re-use as much of the infrastructure as possible for `__subscript` declarations when implementing `property` declarations.
Nonmutating Setters
-------------------
One additional thing added in this change is the ability to mark a `set` accessor on either a subscript or a property as `[nonmutating]`, and indeed all of the existing `set` accessors declared in the stdlib have been marked this way.
The need for this modifier is a bit subtle. If we think about a typical subscript or property:
```hlsl
struct MyThing
{
int f;
property p : int
{
get { return f; }
set(newValue) { f = newValue; }
}
}
```
it is clear we want the `set` accessor to translate to output HLSL as something like:
```
void MyThing_p_set(inout MyThing this, int newValue)
{
this.f = newValue;
}
```
Note how the implicit `this` parameter is `inout` even though we didn't mark anything as `[mutating]`. This is the obvious thing a user would expect us to generate given a property declaration.
Now consider a case like the following:
```hlsl
struct MyThing
{
RWStructuredBuffer<int> storage;
property p : int
{
get { return storage[0]; }
set(newValue) { storage[0] = newValue; }
}
}
```
This new declaration doesn't require (or want) an `inout` `this` parameter at all:
```
void MyThing_p_set(MyThing this, int newValue)
{
this.storage[0] = newValue;
}
```
In fact, given the limitations in the current Slang compiler around functions that return resource types (or use them for `inout` parameters), we can only support a `set` operation like this if we can ensure that the `this` parameter is considered to be `in` instead of `inout`. This is exactly the behavior we allow users to opt into with a `[nonmutating] set` declaration.
All of the subscript operations in the stdlib today have `set` accessors that don't actually change the value of `this` that they act on (e.g., storing into a `RWStructuredBuffer` using its `operator[]` doesn't change the value of the `RWStructuredBuffer` variable -- just its contents).
We'd gotten away without this detail so far just because `set` accessors were only being declared in the stdlib and they were all implicitly `[nonmutating]` anyway, so it never surfaced as an issue that the code we generated assumed a setter wouldn't change `this`.
Implementation
==============
Parser and AST
--------------
Adding a new AST node for `PropertyDecl` and the relevant parsing logic was mostly straightforward. The biggest change was allowing a `set` declaration to introduce an explicit name for the parameter that represents the new value to be set.
This change also adds a `[nonmutating]` attribute as a dual to `[mutating]`, for reasons I will get to later.
Semantic Checking
-----------------
The `getTypeForDeclRef` logic was updated to allow references to `property` declarations.
Some of the semantic checking work for subscripts was pulled out into re-usable subroutines to allow it to be shared by `__subscript` and `property` declarations.
The checking of accessor declarations, which sets their result type based on the type of the outer `__subscript` was changed to also handle an outer `property`.
Some special-case logic was added for checking of `set` declarations to make sure that their parameter is given the expected type.
Some logic around deciding whether or not `this` is mutable had to be updated to correctly note that `this` should be mutable by default in a `set` accessor, with an explicit `[nonmutating]` modifier required to opt out of this default. (This is the inverse of how a typical method or `get` accessor works).
IR Lowering
-----------
The good news is that after IR lowering, access to properties turns into ordinary function calls (equivalent to what hand-written getters and setters would produce), so that subsequent compiler steps (including all the target-specific emit logic) doesn't have to care about the new feature.
The bad news is that adding `property` declarations has revealed a few holes in how IR lowering was handling `__subscript` declarations and their accessors, so that it didn't trivially work for the new case as-is.
The IR lowering pass already has the `LoweredValInfo` type that abstractly represents a value that resulted from lowering some AST code to the IR. One of the cases of `LoweredValInfo` was `BoundSubscript` that represented an expression of the form `baseVal[someIndex]` where the AST-level expression referenced a `__subscript` declaration. The key feature of `BoundSubscript` is that it avoided deciding whether to invoke the getter, the setter, or both "too early" and instead tried to only invoke the expected/required operations on-demand.
This change generalizes `BoundSubscript` to handle `property` references as well, so it changes to `BoundStorage`. Making the type handle user-defined property declarations required fixing a bunch of issues:
* When building up argument lists in the IR, we need to know whether an argument corresponds to an `in` or an `out`/`inout` parameter, to decide whether to pass the value directly or a pointer to the value. Some of the logic in the lowering pass had been playing fast and loose with this, so this change tries to make sure that whenever we care computing a list of `IRInst*` that represent the arguments to a call we have the information about the corresponding parameter.
* Similarly, when emitting a call to an accessor in the IR, the information about the expected type of the callee was missing/unavailable, and the code was incorrectly building up the expected type of the callee based on the types of the arguments at the call site. The logic has been changed so that we can extract the expected signature of an accessor (how it will be translated to the IR) using the same logic that is used to produce the actual `IRFunc` for the accessor (so hopefully both will always agree).
* Dealing with `in` vs. `inout` differences around parameters means also dealing with the "fixup" code that is used to assign from the temporary used to pass an `inout` argument back into the actual l-value expression that was used. That logic has all been hoisted out of the expression visitor(s) and into the global scope.
Future Work
===========
The entire approach to handling l-values in the IR lowering pass is broken, and it is in need a of a complete rewrite based on new first-principles design goals. While something like `LoweredValInfo` is decent for abstracting over the easy cases of r-values, addresses, and a few complicated l-value cases like swizzling, it just doesn't scale to highly abstract l-values like we get from `__subcript` and `property` declarations, nor other corner cases of l-values that we need to handle (e.g., passing an `int` to an `inout float` parameter is allowed in HLSL, and performs conversions in both directions!).
It Should be Easy (TM) to extend the logic that tries to synthesize an interface conformance witness method when there isn't an exact match to also support synthesizing a property declaration (plus its accessors) to witness a required property when the type has a field of the same name/type.
* fixup: pedantic template parsing error (thanks, clang!)
* fixup: cleanups and review feedback
* Removed some `#ifdef`'d out code from merge change
* Added proper diagnostics for accessor parameter constraints, which led to some fixes/refactorings
* Added a test case for the accessor-related diagnostics
|