| Commit message (Collapse) | Author | Age |
| |
|
|
|
|
|
|
|
|
|
|
| |
* Prefixing source files in source/slang with slang-
* Prefix source in source/slang with slang- prefix.
* Rename core source files with slang- prefix.
* Update project files.
* Fix problems from automatic merge.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* List made members m_
Tweaked types to closer match conventions.
* Use asserts for checking conditions on List.
Other small improvements.
* List<T>.Count() -> getSize()
* List<T>
Add -> add
First -> getFirst
Last -> getLast
RemoveLast -> removeLast
ReleaseBuffer -> detachBuffer
GetArrayView -> getArrayView
* List<T>::
AddRange -> addRange
Capacity -> getCapacity
Insert -> insert
InsertRange -> insertRange
AddRange -> addRange
RemoveRange -> removeRange
RemoveAt -> removeAt
Remove -> remove
Reverse -> reverse
FastRemove -> fastRemove
FastRemoveAt -> fastRemoveAt
Clear -> clear
* List<T>
FreeBuffer -> _deallocateBuffer
Free -> clearAndDeallocate
SwapWith -> swapWith
* List<T>
SetSize -> setSize
Reserve -> reserve
GrowToSize growToSize
* UnsafeShrinkToSize -> unsafeShrinkToSize
Compress -> compress
FindLast -> findLastIndex
FindLast -> findLastIndex
Simplify Contains
* List<T>
Removed m_allocator (wasn't used)
Swap -> swapElements
Sort -> sort
Contains -> contains
ForEach -> forEach
QuickSort -> quickSort
InsertionSort -> insertionSort
BinarySearch -> binarySearch
Max -> calcMax
Min -> calcMin
* Initializer::Initialize -> initialize
List<T>::
Allocate -> _allocate
Init -> _init
IndexOf -> indexOf
* * Put #include <assert.h> in common.h, and remove unneeded inclusions
* Small refactor of ArrayView - remove stride as not used
* getSize -> getCount
setSize -> setCount
unsafeShrinkToSize->unsafeShrinkToCount
growToSize -> growToCount
m_size -> m_count
* Some tidy up around Allocator.
* Use Index type on List.
* Refactor of IntSet.
First tentative look at using Index.
* Made Index an Int
Did preliminary fixes.
Made String use Index.
* Partial refactor of String.
* String::Buffer -> getBuffer
ToWString -> toWString
* Small improvements to String.
String::
Buffer() -> getBuffer()
Equals() -> equals
* Try to use Index where appropriate.
* Fix warnings on windows x86 builds.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Allow plugging in types with resources for interface parameters
The key feature enabled by this change is that you can take a shader declared with interface-type parameters:
```hlsl
ConstantBuffer<ILight> gLight;
float4 myShader(IMaterial material, ...)
{ ... }
```
and specialize its interface-type parameters to concrete type that can contain resources like textures, samplers, etc.
The hard part of doing this layout is that we need to support signatures that include a mix of interface and non-interface types. Imagine this contrived example:
```hlsl
float4 myShader(
Texture2D diffuseMap,
ILight light,
Texture2D specularMap)
{ ... }
```
We end up wanting `diffuseMap` to get `register(t0)` and `specularMap` to get `register(t1)`, so that they have the same location no matter what we plug in for `light`.
But if we plug in a concrete type for `light` that needs a texture register, we need to allocate it *somewhere*.
We handle this by having the `TypeLayout` for `light` come back with a "primary" type layout that doesn't have any texture registers, but with a "pending" type layout that includes the texture register requirements of whatever concrete type we plug in.
This split between "primary" and "pending" layout then needs to work its way up the hierarchy, so that an aggregate `struct` type with a mix of interface and non-interface fields (recursively), needs to compute an aggregate "primary type layout" and an aggregate "pending type layout," and then each field needs to be able to compute its offset in the primary/pending layout of the aggregate.
A large chunk of the work in this PR is then just implementing the split between primary and pending data, and ensuring that layouts are computed appropriately.
The next catch is that when a "parameter group" (either a parameter block or constant buffer) contains one or more values of interface type, then we can allow the parameter group to "mask" some of the resource usage of the concrete types we plug in, but others "bleed through."
For example, if we have:
```hlsl
struct MyStuff { float3 color; ILight light; }
ConstantBuffer<MyStuff> myStuff;
struct SpotLight { float3 position; Texture2D shadowMap; }
``
If we plug in the `SpotLight` type for `myStuff.light`, then the `float3` data for the light can be "masked" by the fact that we have a constant buffer (we can just allocate the `float3` `position` right after `color`), but the `Texture2D` needed for `shadowMap` needs to "bleed through" and become "pending" data for the `myStuff` shader parameter.
Adding support for that detail more or less required a full rewrite of the logic for allocating parameter group type layouts.
The next detail is that when we go to legalize a declaration like the `myStuff` buffer, we will end up with something like:
```hlsl
struct MyStuff_stripped { float3 color; }
struct Wrapped
{
MyStuff_stripped primary;
SpotLight pending;
}
ConstantBuffer<Wrapped> myStuff;
```
This "wrapped" version of the buffer type more accurately reflects the layout we need/want for the uniform/ordinary data, but in order to further legalize it and pull out the resource-type fields like `shadowMap` we need to have accurate layout information, and the problem is that layout information for the original buffer can't apply to this new "wrapped" buffer.
The last major piece of this change is logic that runs during existential type legalization to compute new layouts for "wrapped" buffers like these that embeds correct offset/binding/register information for any resources nested inside them. A key challenge in that code is that existential legalization needs to erase any "pending" data from the program entirely, so that offset information that used to be relatie to the "pending" part of a surrounding type now needs to be relative to the primary part.
The work here may not be 100% complete for all scenarios, but it does well enough on the new and existing tests that I want to checkpoint it. Note that a few other tests have had their output changed, but in all cases I've reviewed the diffs and determined that the change in observable behavior is consistent with what we intened Slang's behavior to be.
Note that there is still one major piece of support for interface-type parameters that is missing here, and which might force us to revisit some of the decisions in this code: we don't properly support user-defined `struct` types with interface-type fields.
* fixup: typos
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Split front- and back-ends
This change is a major refactor of several of the types that provide the behind-the-scenes implementation of the public C API.
The goal of this refactor is primarily to allow for future API services that let the user operate both the front- and back-ends of the compiler in a more complex fashion.
For example, as user should be able to compile a bunch of source code into modules, look up types, functions, etc. in those modules, specialize generic types/functions to the types they've looked up, and then finally request target code to be gernerated for specialized entry points.
The back-end code generation they trigger should re-use the front-end compilation work (parsing, semantic checking, IR generation) that was already performed.
The most visible change is that `CompileRequest` has been split up into several smaller types that take responsibility for parts of what it did:
* The `Linkage` type owns the storage for `import`ed modules, and well as the `TargetRequest`s that represent code-generation targets. The intention is that an application could use a single `Linkage` for the duration of its runtime (so long as it was okay with the memory usage), so that each `import`ed module only gets loaded once. For now, this type needs to manage the search paths, file system, and source manager, because of its responsibility for loading files.
* A `FrontEndCompileRequest` owns the stuff related to parsing, semantic checking, and initial IR generation. This most notably includes the `TranslationUnitRequest`s and the `FrontEndEntryPointRequest`s (which used to be just `EntryPointRequest`s). It's main job is to produce AST and IR modules for each translation unit, and to find and validate the entry points. The front-end request does *not* interact with generic arguments for global or entry-point generic parameters.
* The main output of both `import` operations and front-end translation units is the `Module` type, which is just a simple container for both the AST module (to service the reflection/layout APIs, and also for semantic checking of code that `import`s the module) and the IR module (for linking and code generation). This type captures the commonalities between the old `LoadedModule` (which is now just an alias for `Module`) and `TranslationUnitRequest` (which now owns a `Module`).
* The secondary output of front-end compilation is a `Program`, which comprises a list of referenced `Module`s and validated `EntryPoint`s that will be used together. Layout and code generation both need a `Program` to tell them what modules and entry points will be used together (we don't want to just code-gen everythin that has ever been loaded into the linakge). The `Program`s created by the front-end do not include generic arguments, so they may provide incomplete layout information and/or be unsuitable for code generation.
* A `BackEndCompileRequest` owns stuff related to turning a `Program` into output kernels for the targets of a `Linkage`. Most of the data it owns beyond the `Program` to be compiled is minor, so this is a good candidate for demotion from a heap-allocated object to just a `struct` of options that gets passed around.
* The `CompileRequestBase` type is an attempt to wrap up the common functionality of both front-end and back-end compile requests. Most of it is just exposing the availability of a linkage and `DiagnosticSink`, so this type is a good candidate for subsequent removal. The main interesting thing it has is the flags related to dumping and validation of IR, so there is probably a good refactoring still to be made around deciding how options should be handled going forward.
* Behind the scenes, the `Program` type is set up to handle some level of on-line compilation and layout work. The `Program` knows the `Linkage` it belongs to, and allows for a `TargetProgram` to be looked up based on a specific `TargetRequest`. A `TargetProgram` then allows layout information and compiled kernel code to be asked for on-demand, in order to support eventual "live" compilation scenarios.
* The `EndToEndCompileRequest` type is a composition/coordination type that replaces the old `CompileRequest` in a way that uses the services of the various other types. It owns a few pieces of state that only make sense in the context of an end-to-end compile (e.g., there is really no way to "pass through" code when the front- and back-ends are run separately) or a command-line compile (everything to do with specifying output paths for files is really just for the benefit of `slangc`, and might even be moved there over time).
* One important detail is that the `EndToEndCompilRequest` owns all of the string-based generic arguments for both global and entry-point generic parameters. The logic in `check.cpp` for dealing with those arguments has been heavily refactored to separate out the parsings steps that are specific to end-to-end compilation with string-based type arguments, and the semantic checking steps that result in a specialized `Program` (which can be exposed through new APIs that aren't tied to end-to-end compilation).
It is perhaps not surprising that this change had a lot of consequences, so I'll briefly run over some of the main categories of changes required:
* I changed the way that global generic arguments are passed via API (use `spSetGlobalGenericArgs` instead of the generic arguments for `spAddEntryPointEx`, which are not just for entry-point generics), which has been a change that we've needed for a long time. This is technically a breaking API change, although we should have very few client applications that care about it.
* A bunch of places that used to take "big" objects like `CompileRequest` now just take the sub-pieces they care about (e.g., a function might have only needed a `Linkage` and a `DiagnosticSink`). This makes many subroutines or "context" struct types more generally useful, at the cost of taking more parameters.
* In a few cases the conceptually clean separation of the layers breaks down (often for edge-case or compatibility features), and so we may pass along additional objects that are allowed to be null, but are used when present. A big example of this is how the back-end code generation routines accept an `EndToEndCompileRequest` that is optional, and only used to check whether "pass through" compilation is needed. We should probably look into cleaning this kind of logic up over time so that we don't need to violate the apparent separation of phases of compilation.
* In cases where separation of layers was being broken for the sake of GLSL features, I went ahead and ripped them out, since all of that should be dead code anyway.
* In many cases I increased the encapsulation of data in the core types to help track down use sites and make sure they are following invariants better.
* In cases where code was doing, e.g., `context->shared->compileRequest->session->getThing()` I have tried to introduce convenience routines so that the usage site is just `context->getThing()` to improve encapsulation and allow changes to be made more easily going forward.
* The `noteInternalErrorLoc` functionality was moved off of the compile request and into `DiagnosticSink`, since that is the one type you can rely on having around when you want to note an internal error. We may consider going forward if (and how) it should reset the counter used for noting locations on internal errors.
* A few APIs now take `DiagnosticSink*` arguments where they didn't before, and as a result some public APIs need to create `DiagnosticSink`s to pass in, before going ahead and ignoring the messages. In the future there should be variations of these APIs that accept an `ISlangBlob**` parameter for the output.
* fixup: missing include for compilers with accurate template checking (non-VS)
* fixup: review feedback
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Refactor several IR passes
This change takes some IR passes that lived together in `ir.cpp` and moves them into their own files to improve clarity.
In most cases these were passes introduced early in the life of the IR, so that it didn't seem like a big deal to have them all in one file, but now that `ir.cpp` has grown unwieldly this seems like an important cleanup to make.
To give a quick rundown of the passes involved:
* The IR "linking" step has been pulled out to `ir-link.{h,cpp}`. This code for this pass is pretty much identical to what was in `ir.cpp`, and no attempt has been made to clean up or refactor it in the current change.
* The GLSL legalization step has been pulled out to `ir-glsl-legalize.{h,cpp}`. This used to be invoked directly from the linking step, but has been made a new top-level pass invoked from `emit.cpp`. Just like with the linking, the code in the new file is just a copy-paste of what was in `ir.cpp`, and no attempt at cleanup has been made. Also note that it might be a good idea to move this pass later in the overall sequence, but this PR doesn't attempt to do that as it could change results.
* The generic specialization step has been pulled out to `ir-specialize.{h,cpp}`. The file name does not explicitly reference *generic* specialization because I anticipate this pass having to perform other kinds of specialization as well. The code in this case amounts to a heavy cleanup/refactoring pass and thus deserves careful scrutiny. The reason for the cleanup is that the generic specialization step used to be part of the "linking" step long ago, and continued to share infrastructure with it long after that stopped making sense. The newly cleaned up pass has much simpler logic that should be easy enough to follow from the comments.
* In order to reduce code dulication, the IR "cloning" part of the `ir-specialize-resources.{h,cpp}` pass was pulled into its own files (`ir-clone.{h,cpp}`) that both the generic specialization step and the resource-based specialization step now share.
The remaining changes then pertain to deleting a bunch of code out of `ir.cpp` and adding the new files to the build.
The only test that needed updating was `vkray/raygen`, where some subtle ordering change in the refactored generic specialization logic has lead to the relative order of the specialized `TraceRay` and `saturate` functions beind reversed.
* fixup: typo in assert
* fixup: typos in comments
|
|
|
* Specialize away resource-type function parameters
Work on #397.
Introduction
------------
Suppose a user writes a function that takes a resource type as a parameter:
```hlsl
float4 getThing(RWStructuredBuffer<float4> buffer, int index)
{
return buffer[index];
}
```
This function creates challenges when generating code for GLSL-based targets, because a global shader parameter of type `RWStructuredBuffer`:
```hlsl
RWStructuredBuffer<float4> gBuffer;
```
translates to a global GLSL `buffer` declaration:
```hlsl
buffer _S0
{
float4 _data[];
} gBuffer;
```
There is no equivalent to that `buffer` declaration that can be used in function parameter position, and it is illegal in GLSL to pass `gBuffer` into a function.
(Aside: yes, we could in principle translate a function parameter like `RWStructuredBuffer<float4> buffer` to `float4 buffer[]`, but that will not in turn generalize to arrays of structured buffers; it is a dead-end strategy)
The solution employed by many shader compilers is to "inline everything" to eliminate the need for parameters of resource types, and then rely on dataflow optimization to eliminate locals of resource types. This strategy can of course lead to an increase in code size, and it also means that call stacks are lost when doing step-through debugging. Another serious issue is that an "early `return`" from a function can turn into the equivalent of a multi-level `break` when inlined, and not all of our targets support multi-level `break`.
The solution implemented in this change works around some, but not all, of the problems with full inlining.
The approach here generates specialized versions of a function like `getThing`, adapted to the actual arguments provided at different call sites.
Thus if we have code like:
```hlsl
RWStructuredBuffer<float4> gA;
RWStructuredBuffer<float4> gB[10];
...
getThing(gA, x);
getThing(gA, y);
getThing(gB[someVal], z);
```
we will generate two specializations of `getThing`: one specialized for the `buffer` parameter being `gA` and the other for `gB`:
```hlsl
float4 getThing_gA(int index) { return gA[index]; }
float4 getThing_gB(int _val, int index) { return gB[_val][index]; }
```
and the call sites will change to match:
```hlsl
getThing_gA(x);
getThing_gA(y);
getThing_gB(someVal, z);
```
Note how in the case where the argument being passed in was obtained by indexing into an array of resources, the callee is specialized to the identity of the global shader parameter (`gB`), and now accepts a new parameter to indicate the array index into it.
While this description motivates the change based on GLSL output, the same basic issue can arise for other targets.
For example, while current HLSL has added the `ConstantBuffer<T>` type, it is not supported on older targets, and it turns out that even dxc does not allow functions to have `ConstantBuffer<T>` parameters.
Longer-term, we will likely need to do even more aggressive specialization both in order to generate SPIR-V output directly, and also to deal with function that have return values or `out` parameters of resource types.
Implementation
--------------
The meat of the change is in `ir-specialize-resources.{h,cpp}`, where we have a pass that looks at all call sites (`IRCall` instructions) in the program, and attempts to replace them with calls to specialized functions, where the specializations are generated on-demand.
The code in this pass is heavily commented, so hopefully it serves to explain itself all right.
After specialization is complete, we may still have functions like the original `getThing` that will produce invalid code when emitted as GLSL, so we need a way to make sure they don't appear in the output.
To date we've had some very ad hoc approaches for ignoring IR constructs that we don't want to affect emitted code, but this change goes ahead and adds a more real dead code elimination (DCE) pass in `ir-dce.{h,cpp}`.
This pass follows a straightforward approach of tagging instructions that are "live" and then propagating liveness through the whole program, before making a single pass to delete anything that isn't live.
When I first added the DCE pass it eliminated *everything* because there were no "roots" for liveness.
I solved this for now by adding a new decoration, `IREntryPointDecoration`, to mark shader entry points in the IR which should always be live (as should anything they depend on).
A secondary problem that arose was that for GLSL ray tracing shaders it is possible for the incoming/outgoing payload or attributes parameters to be unused, but eliminating them as dead would change the signature of a shader an potential break the rules for how ray tracing programs communicate.
I added a very simple `IRDependsOnDecoration` that allows one IR instruction to keep another alive *as if* it used it, without actually using it.
There's also a fixup in the IR dumping logic where I was forgetting to store anything in the mapping from instruction to their names, so that the name of an instruction was getting incremented each time it was referenced.
Testing
-------
There are three different tests added as part of this change:
* The `compute/func-resource-param` test covers the basic `RWStructuredBuffer` case above, which we expect to work fine for D3D11/12, but fail for Vulkan without specialization.
* The `cross-compile/func-resource-param-array` test covers the case where we don't just have one resource, but an array of them. This is not an end-to-end compute test primarily because our `render-test` application doesn't yet handle arrays of resources correctly in its binding logic.
* The `compute/func-cbuffer-param` test covers the case of a function with a `ConstantBuffer<T>` parameter, which requires specialization to become valid for any of our targets.
* fixup: warnings/errors from other compilers
* fixup: typos and cleanup
* fixup: typos
|