| Age | Commit message (Collapse) | Author |
|
Fixes issue #620
Given a `RWTexture*` store operation like:
```hlsl
RWTexture3D a<float>;
...
float x = 1.0f;
a[crd] = x;
```
We were generating output GLSL like:
```glsl
layout(rgba32f) image3D a;
...
float x = 1.0f;
imageStore(a, crd, x);
```
but in that case, the `imageStore` operation expected a `vec4` and not a `float` for the last argument, and we fail GLSL compilation.
This change extends our handling of the `imageStore` operation in the stdlib so that we pad out the last argument if it is not a 4-vector.
We also flesh out the code that was picking a `layout(...)` modifier for image formats so that it doesn't just blindly use `layout(rgba32f)` and instead takes the element type fed to `RWTexture3D<...>` into account.
With these two changes, the above HLSL/Slang code now translates to:
```glsl
layout(r32f) image3D a;
...
float x = 1.0f;
imageStore(a, crd, vec4(x, float(0), float(0), float(0)));
```
Note that we are padding out the `x` argument to a full vector, and also that we declare the image with `layout(r32f)` to reflect the fact that it has only as single channel.
|
|
* Typo fix, and added dxc to command line documentation.
* Fix small typos.
Added support for Scope to lexer.
Fix bug in Token ctor.
* Add support for attribute names that are scoped.
* Added GLSLBindingAttribute. Make binding work through core.met.slang.
* Allow [[gl::binding(binding, set)]]
[[vk::binding(binding,set)]]
|
|
Partially fixes #615
There's kind of a mess going on here, and it is difficult to be sure which of the changes here are strictly necessary.
Also, our testing isn't setup to run tests that use `RWTexture2D`, so the only testing I can really run is manual tests using Falcor.
The most basic issue here is that in an earlier change I added `ref` accessors for the subscript operation on various `RW*` types in the standard library, and that included `RWTexture2D` (and the other `RWTexture*` types). The compiler ended up favoring a `ref` accessor over a `set` accessor even when the `set` would suffice, but only the `set` accessor could be lowerd to GLSL/SPIR-V.
This change ends up implementing two different fixes for the same problem:
* Logic has been added to try and favor a `set` accessor over a `ref` accessor in the cases where either could be used (but still require a `ref` accessor to be used when it is really needed)
* The `ref` accessor for `RWTexture*` has been removed, since it turns out that the operations that might have benefited from it (atomics, and component-granularity stores) aren't actually allowed on typed UAVs anyway.
There is a deeper issue here that somebody needs to go through and rationalize our representation and handling of accessors like this, but I'm not going to be able to do that in the time I can put into this PR.
|
|
There was some logic called `maybeEmitGLSLFlatModifier()` that was supposed to emit an implicit `flat` modifier for any varying shader parameter with an integer type that wasn't qualified as `nointerpolation` in the input HLSL/Slang (where `nointerpolation` is the equivalent of `flat`).
This wasn't being triggered because I apparently added code to only emit the implicit modifier if there was no explicit one, but then I had this code:
```c++
bool anyModifiers = false;
anyModifiers = true;
...
if(!anyModifiers && ...) { maybeEmitGLSLFlatModifier(); }
```
Unsurprisingly, the `anyModifiers = true` line meant that things never actually triggered. Once I fixed that issue the next problem that arose was that the `maybeEmitGLSLFlatModifier()` logic was being applied to *any* varying integer parameter, which includes fragment outputs, but GLSL forbids the `flat` modifier on fragment outputs and so gave an error on a shader that wrote to an integer target.
I fixed up the logic to take computed layout for a shader parameter into account, and only emit the `flat` modifier for fragment *inputs*. As the `TODO` commend at that location notes, there may be some arcane rules about how a vertex shader also needs to use `flat` when declaring the matching output, so we may need to make that test more careful down the road. For now the shader that originally surfaced this problem now works under Vulkan.
|
|
* Parsing of control of api parameters no longer needs comma separator.
Parsing of API list now can take an initial state.
Document the command line option.
* * Proper handling of 'default' (or initialFlags) - by using if the first token is an operator.
* Clarified parsing of api flags.
* Now 'vk' will mean just use vk. +vk will mean the defaults plus vk, and -vk is the defaults -vk.
* Improve README.md on api expressions.
Improve error text for failure to parse api expressions.
|
|
message in diagnostic string. (#612)
|
|
* * Make spCompile return SlangResult
* Make spProcessCommandLineArguments return SlangResult (and not internally exit)
* Remove calls to exit()
* Fix typos
* Make all output from spProcessCommandLineArguments get sent to diagnostic sink.
|
|
An earlier change made sure that the `half` and `double` types properly conform to the `__BuiltinFloatingPointType` and `__BuiltinRealType` interfaces, but somehow that change modified only the *generated* source file (`core.meta.slang.h`) and not the source that fed into the generator (`core.meta.slang`).
This meant that when building the compiler, we'd end up with spurious diffs because we'd run the generation logic and clobber the (correct) output file with freshly generated (wrong) code.
This change adds the missing lines to the source file to fix up the issue.
|
|
The `render-test` project has an in-progress graphics API abstraction layer, and it makes sense to share this code with our examples rather than write a bunch of redundant code between examples and tests.
Most of this change is just moving files from `tools/render-test/*` to a new library project at `tools/slang-graphics/`. The most complicated code change there is renaming from `render_test` to `slang_graphics`.
The existing `hello` example was ported to use the graphics API layer instead of raw D3D11 API calls. It is still hard-coded to use the D3D11 back-end and the `SLANG_DXBC` target, so more work is needed if we want to actually support multiple APIs in the examples.
I also went ahead and implemented an extremely rudimentary set of APIs to abstract over the Windows platform calls that were being made in the example, so that we could potentially run that same example on other platforms. I did *not* port `render-test` to use those APIs, and I also did not implement them for anything but Windows (my assumption is that for most other platforms we would just use SDL2, and require people to ensure it is installed to their machine before building Slang examples).
|
|
* Fix typo OuptutTopologyAttribute -> OutputTopologyAttribute
First pass support for handing tesselation shaders - domain and hull.
* Added attribute PatchConstantFuncAttribute
* Added visitHLSLPatchType(HLSLPatchType* type) such that the patch type template parameters are handled
* Added IRNotePatchConstantFunc - such that the patch constant function is referenced within IR
* Added support for outputing typical tesselation attributes (although minimal validation is performed)
* Added findFunctionDeclByName
* Small improvements to diagnostic.
* Improved diagnostics and checking for geometry shader attributes.
* Added diagnostic if patchconstantfunc is not found
Handle assert failure when outputing a domain shader alone and therefore attr->patchConstantFuncDecl is not set.
* Simple script tess.hlsl to test out domain/hull shaders.
* Added url for where hull shader attributes are defined.
* Fix unsigned/signed comparison warning.
* Restore removal of fix in "Improve generic argument inference for builtins (#598)"
* Update tessellation test case to compare against fxc
The test was previously comparing against fixed expected DXBC output, but this caused problems when the test runner tried to execute the test on Linux (where there is no fxc to invoke...), and would also be a potential source of problems down the road if different users run using different builds of fxc.
The simple solution here is to convert the test to compare against fxc output generated on the fly. That test type is already filtered out on non-Windows builds, so it eliminates the portability issue (in a crude way).
I also changed the test to compile both entry points in one compiler invocation, just to streamline things into fewer distinct tests.
* Eliminate unnecessary call to `lowerFuncDecl`
In a very obscure case this could cause a bug, if the patch-constant function had somehow already been lowered (because it was called somewhere else in the code).
The call should not be needed because `ensureDecl` will lower a declaration on-demand if required, so eliminating it causes no problems for code that wouldn't be in that extreme corner case.
|
|
* Added Result definitions to the slang.h
* Removed slang-result.h and added slang-com-helper.h
* Move slang-com-ptr.h to be publically available.
* Add SLANG_IUNKNOWN macros to simplify implementing interfaces.
Use the SLANG_IUNKNOWN macros to in slang.c
* Removed slang-defines.h added outstanding defines to slang.h
* Include slang-com-ptr.h and slang-com-helper.h in archives built with CI.
* Use spaces instead of tabs on appveyor.yml
* Put operator== and != for Guid in global namespace.
* Fix binary windows archive to have all the slang headers.
|
|
* Added Result definitions to the slang.h
* Removed slang-result.h and added slang-com-helper.h
* Move slang-com-ptr.h to be publically available.
* Add SLANG_IUNKNOWN macros to simplify implementing interfaces.
Use the SLANG_IUNKNOWN macros to in slang.c
* Removed slang-defines.h added outstanding defines to slang.h
* Include slang-com-ptr.h and slang-com-helper.h in archives built with CI.
* Use spaces instead of tabs on appveyor.yml
|
|
* Added Result definitions to the slang.h
* Removed slang-result.h and added slang-com-helper.h
* Move slang-com-ptr.h to be publically available.
* Add SLANG_IUNKNOWN macros to simplify implementing interfaces.
Use the SLANG_IUNKNOWN macros to in slang.c
* Removed slang-defines.h added outstanding defines to slang.h
|
|
* Add support for "blobs" and a file-system callback
The most obvious change here is that the Slang header now includes a few COM-style interfaces that can be used for communication between the application and compiler. In order to support the declaration of COM-like interfaces, several platform-detection macros were lifted out of `slang-defines.h` and into the public `slang.h` header. As it exists right now, this change makes the Slang API C++-only, but a C-compatible version can be defined later with the help of lots of macros (and/or something like an IDL compiler).
The two big interfaces introduced are:
* The `ISlangBlob` interface, which is compatible with `ID3DBlob`, `IDxcBlob`, etc. This is used to pass ownership of source/compiled code across the API boundary without copies. New versions of various entry points have been added to allow passing blobs: e.g., `spAddTranslationUnitSourceBlob` and `spGetEntryPointCodeBlob`.
* The `ISlangFileSystem` interface, which is used to allow applications to intercept any attempt by the Slang compiler to load a file (input source files, include files, etc.). This is *not* the same as the `IDxcIncludeHandler` interface, because it assumes UTF-8 encoded path names, instead of the 16-bit encoding that dxc/Windows prefer. It is also not very similar to `ID3DInclude` as used by fxc, because this callback interface is *not* responsible for handling the search through include paths, etc. - it is just a file-system abstraction layer.
Internally, a few different parts of the compiler were changed to either store data in blob form all the time, or to be able to synthesize a blob on-demand. Because our internal `String` type is a reference-counted copy-on-write type, using a `SlangStringBlob` to hold string data should achieve transfer of ownership back to the application without extraneous copies. There is plenty of room to clean up the architecture of some of these internal pieces if they *know* that their data will end up in a blob.
The existing Slang testing doesn't touch any of the APIs introduced here, so they can only confirm that existing functionality hasn't been broken. The new ability to return code blobs has been tested by integration of that feature into Falcor, but there has been zero testing of the ability to pass *in* source code as blobs, and the ability to hook file loading. Future changes will need to add test coverage for the new features.
* fixup: define SLANG_NO_THROW for non-Windows builds
* fixup: header copy-paste error caught by clang/gcc
* Cleanup: return reference-counted objects via output parameters
Returning a reference-counted object through the API as a raw pointer creates challenges.
The "obvious" answer is that the returned pointer should have an added reference (it is returned at "+1"), and the caller is responsible for releasing that reference. This makes sense when using raw pointers on the calling side:
```c++
IFoo* foo = spGetFoo(...);
...
foo->Release();
```
However, as soon as smart pointers start getting involved (to handle releasing reference counts when we are done with things), the picture gets more complicated:
```c++
MySmartPtr<IFoo> foo = spGetFoo(...);
...
```
The intention of code like that is that `foo` gets released when the smart pointer goes out of scope, but this probably doesn't happen with most smart pointer implementations. If the `MySmartPtr` constructor that takes a raw pointer retains it, then the destructor will only release *that* reference, and so the object will leak.
It is possible that the user will have a smart pointer type where the constructor that takes a raw pointer doesn't retain it, but in general such types introduce the potential for errors of their own, and no matter what the Slang API shouldn't go in assuming any particular policy.
This change makes it so that any reference-counted objects that are logically returned from a call are returned through output pointers. This design makes the leak-free cases easy (enough) to implement with raw pointers or smart pointers:
```c++
// raw pointer
IFoo* foo = nullptr;
spGetFoo(..., &foo);
...
foo->Release();
// smart pointer
MySmartPtr<IFoo> foo;
spGetFoo(..., foo.writeableRef());
...
```
The only assumption here is that any COM smart-pointer type needs to provide an operation like `writableRef` that is suitable for using that pointer as an output parameter. Given that COM *loves* output parameters, this seems like a safe assumption (at the very least, anybody who interacts with COM would be used to this convention).
Future changes might introduce inline convenience methods for various operations that return results more directly, possibly by introducing a minimal smart-pointer type in the `slang.h` header (without prescribing that clients must use it...).
* fixup: another error caught by gcc/clang
|
|
Fixes #487
The basic problem here is that the user writes something like:
```hlsl
float invSqrt2 = 1 / sqrt(2);
```
In this case the user knows that `sqrt()` is only defined for floating-point types, so they expect this to compile something like:
```hlsl
float invSqrt2 = float(1) / sqrt(float(2));
```
The challenge this creates for the Slang compiler is that we use generics to streamline our declarations of all the builtins, so that the scalar `sqrt()` function is actually declared as:
```hlsl
T sqrt<T:__BuiltinFloatingPointType>(T value);
```
The `__BuiltinFloatingPointType` is an `interface` defined as part of the standard library, such that only built-in floating-point types conform to it (that is, `half`, `float`, and `double`).
When generic argument inference applies to a call like `sqrt(2)`, we see an argument of type `int`, and try to infer `T=int`, which leads to a failure because `int` does not conform to `__BuiltinFloatingPointType`.
The point where this currently fails in in the logic to "join" two types for inference, which is supposed to pick the best type that can represent both of two input types. E.g., a join between `float` and `int3` would be `float3`, since both of those types can convert to it, and it is the "minimal" type with that property.
So, the goal here is simple: we want a "join" between `int` and `__BuiltinFloatingPointType` to yield the `float` type. The way we handle that in this change is to special case the join of a basic scalar type and an interface, by enumerating all the basic scalar types, filtering them for ones that support the chosen interface and can be implicitly converted from the argument type, and then picking the "best" of them (the comments in the code explain what "best" means in this context).
The technique used here could be generalized in the future to deal with user-defined types or more cases, but that would risk slowing down overload resolution even more, which is already the most expensive part of our semantic checking pass.
A test case has been added for the specific case of `sqrt()` applied to an `int` argument.
|
|
* Make render-test use Slang for all shader compilation
This streamlines the code for render-test by having all its shader compilation go through the Slang API, so that it doesn't have to deal with custom logic to compile HLSL->DXBC and HLSL->DXIL. We were already leaning on Slang to generate SPIR-V for Vulkan, so this makes all the paths more consistent.
My original plan with this change was to make the D3D12 render path start using DXIL at this point, since the change would make that easy, but it turns out that some aspects of how we handle parameter binding are not compatible with that right now, so it would need to come as a later change.
There's a lot of details here, so I will try to walk through the changes, including the incidental ones:
* Add logic to `premake5.lua` so that we copy the necessary libraries for HLSL shader compilation to our target directory from the Windows SDK. This is necessary so that our tests can actually invoke `dxcompiler.dll`
* Re-run Premake to generate new project files. This moves around a few files that I manually added in previous changes without re-running Premake.
* When invoking `fxc` as a pass-through compiler, be sure to pass along any macros defines via API or command-line. This isn't a strictly required change with how things worked out, but it is a positive one anyway, because it makes `slangc -pass-through fxc` more useful.
* Don't print output from a downstream `fxc` invocation if it produces warnings but no errors. The main reason for this is so that our tests don't fail because of `fxc` warnings on Slang's output (which then don't match the baselines), but it can also be rationalized as not wanting to confuse users with warnings that don't come from the "real" compiler they are using. This probably needs fine-tuning as a policy.
* Add the HLSL `NonUniformResourceIndex` function. This was an oversight because it isn't documented as a builtin on MSDN, and only gets mentioned obliquely when they talk about resource indexing.
* Add `glsl_<version>` profiles to match our `sm_<version>` profiles, so that it is easy for a user to use the profile mechanism to request a specific GLSL version without also specifying a stage name.
* Update the render-test logic so that there is a single `ShaderCompiler` implementation that *always* uses Slang, and get rid of all of the renderer-specific `ShaderCompiler` implementations.
* Update logic in render-test `main.cpp` to select the options to use for the eventual Slang compile based on the choice of renderer and input language. I didn't change the options that render-test exposes, even though they are getting increasingly silly (e.g., `-glsl-rewrite` doesn't use GLSL as its input...).
* Note: the D3D12 renderer will still use fxc, DXBC, and SM 5.0 for now, since trying to update it to switch to dxc, DXIL, and SM 6.0 didn't work well at the time.
* Add a bit of supporting D3D12 code to make sure that we don't allocate a structured buffer when a buffer has a format.
* Make sure to *also* define the `__HLSL__` macro when compiling Slang code, because otherwise a bunch of tests don't work (I'm not clear on how it worked before...).
* fixup: missing file
|
|
The problem here arose when a complicated l-value was formed like:
```hlsl
struct Foo { float4 a; }
RWStructuredBuffer<Foo> gBuffer;
gBuffer[index].a.xz += whatever;
```
In this case the `gBuffer[index].a.xz` expression is a complex l-value in multiple ways:
* The `gBuffer[index]` subscript could be routed to either a `get` accessor or a `ref` accessor (and maybe also a `set` accessor if we add one to the stdlib definition), and we defer the choice of which to call until as late as possible in codegen today.
* The `_.a` part then becomes a "bound member acess" because we can't actually produce a direct pointer until we've resolved how to implement the subscript operation.
* The `_.xz` part becomes a "swizzled l-value" because there is *no* way to materialize it as a pointer to contiguous storage in the orignal object (the `x` and `z` components of a vector aren't contiguous).
Recent changes to support atomic operations on buffer elements introduced the `ref` accessor on `RWStructuredBuffer`, which made it possible to form a pointer to a buffer element in the IR. This interacted with some code for the "bound member" case that was trying to only introduce a temporary when absolutely necessary, and was doing so by assuming anything with an address didn't need to be moved into a temporary.
The first fix is to clean up that logic in the bound-member case for assignment: always create a temporary, rather than do it conditionally.
The second fix here is more systemic: we add logic to try to coerce the representation of an l-value during codegen into being a simple address, and employ that in cases where we know an address is desired. In a case like the above this helps to get things into the form that is required, so that a swizzled store can be issued.
There is still some potential for cleanup in this logic, but I don't want to introduce more changes than seem necessary to fix the original problem.
|
|
PR #577 tries to eliminate empty `struct` types by replacing them with a `LegalType::tuple` with zero elements, but this seems to run into problems in some cases, where we end up trying to match up `::none` values with empty `::tuple`s.
An alternative way to handle this is to never create empty `LegalType::tuple`s (and the same for `LegalVal::tuple`), and instead create `LegalType::none` and `LegalVal::none`. PR #577 avoided this because there were various cases in the legalization logic that didn't robustly handle `LegalType::Flavor::none`.
This PR thus includes two main changes:
1. Construct a `::none` type when we have an empty `struct` type.
2. Survery all places that handle the `::tuple` case and extend them to handle the `::none` case if it was missing.
This fixes an issue filed in Falcor's internal GitLab as number 424.
|
|
Slang `enum` declarations will always be scoped, e.g.:
```hlsl
enum Color
{
Red,
Green = 2,
Blue,
}
Color c = Color.Red; // Not just `Red`
```
A user can write `enum class` as a placebo for now (to ease sharing of headers with C++).
Slang does not currently support the `::` operator for static member lookup, so it must be `Color.Green` and not `Color::Green`. Support for `::` as an alternate syntax could be added later if there is strong user demand.
An `enum` type can have a declared "tag type" using syntax like C++ `enum class`:
```hlsl
enum MyThings : uint
{
First = 0,
// ...
}
```
The `enum` cases will store their values using that type. An `enum` that doesn't declare a tag type will use the type `int` by default.
Enum cases are assigned values just like in C/C++: cases can have explicit values, but otherwise default to one more than the previous case, or zero for the first case.
All `enum` types will automatically conform to a standard-library `interface` called `__EnumType`, which is used so that basic operators like equality testing can be defined generically for all `enum` types.
This change only adds one operator at first (the `==` comparison), but other should be added later.
An `enum` case needs to be explicitly converted to an integer where needed (e.g., `int(Color.Red)`).
This is implemented by having the main integer types (`int` and `uint`) support built-in initializers that can work for *any* `enum` type (or rather, anything conforming to `__EnumType`).
Eventually these will be restricted so that an `enum` type can only be converted to its associated tag type.
IR code generation completely eliminates `enum` types and their cases.
The `enum` type will be replaced with its tag type, and the cases will be replaced with the tag values.
Currently this could leave some mess in the IR where cast operations are applied between values that actually have the same type.
|
|
* Add basic support for Shader Model 6.3 profiles
This adds `vs_6_3` and friends as available profiles, but doesn't add any new builtins specific to Shader Model 6.3.
In order to better support the ray tracing shader stages, Slang will not automatically map any attempt to compile a DXR shader up to SM 6.3 (the shader model officially required for these stages) and to the `lib_*` profiles (because there are no stage-specific profiles for these cases).
As an added detail, when invoking `dxcompiler.dll` to generate DXIL for DXR shaders, specify an empty entry-point name, since that is expected for `lib_*` profiles.
* Fixup: don't drop [shader(...)] attributes
The previous change makes the "effective profile" for DXR compiles no longer include a stage, but we had been using the stage stored on the effective profile in exactly one place: when determining what to output for a `[shader("...")]` attribute.
This fixup makes it so that we use the stage from the profile on the entry-point layout instead, which seems like the right choice anyway, if we are ever going to emit multiple entry points at once.
|
|
* Fix atomic operations on RWBuffer
An earlier change added support for passing true pointers to `__ref` parameters to fix the global `Interlocked*()` functions when applied to `groupshared` variables or `RWStructureBuffer<T>` elements.
That change didn't apply to `RWBuffer<T>` or `RWTexture2D<T>`, etc. because those types had so far only declared `get` and `set` accessors, but not any `ref` accessors (which return a pointer).
The main fixes here are:
* Add `ref` accessors to the subscript oeprations on the `RW*` resource types
* Adjust the logic for emitting calls to subscript accessors so that we don't get quite as eager about invoking a `ref` accessor, and instead try to invoke just a `get` or `set` accessor when these will suffice. This is important for Vulkan cross-compilation, where we don't yet support the semantics of our `ref` accessors.
* Add a test case for atomics on a `RWBuffer`
* Fix up `render-test` so that we can specify a format for a buffer resource, which allows us to use things other than `*StructuredBuffer` and `*ByteAddressBuffer`. The work there is probably not complete; I just did what I could to get the test working.
* A bunch of files got whitespace edits thanks to the fact that I'm using editorconfig and others on the project seemingly arent...
* fixup: remove ifdefed-out code
|
|
The HLSL/GLSL output by Slang should try to be robust against whatever flags somebody uses to compile it. Therefore, we will go ahead and output a target-language-specific directive to control the default matrix layout mode so that we can override whatever might be specified via flags.
Also, as long as we are at it, this change goes ahead and makes Slang unconditionally emit row/column-major layout modifiers on all matrices (and arrays of matrices) whereas before these were only being output sometimes (the code to do it seemed buggy to me...).
|
|
* First pass at support for textures in vulkan.
* Binding state has first pass support for VkImageView VkSampler.
* Split out _calcImageViewType
* Fix bug in debug build around constant buffer being added but not part of the binding description for the test.
* Offset recalculated for vk texture construction just store the texture size for each mip level.
* When outputing a vector type with a size of 1 in GLSL, it needs to be output as the underlying type. For example vector<float,1> should be output as float in GLSL.
* Vulkan render-test produces right output for the test
tests/compute/textureSamplingTest.slang -slang -gcompute -o tests/compute/textureSamplingTest.slang.actual.txt -vk
* Small improvement around xml encoding a string.
* More generalized test synthesis.
* Fix image usage flags for Vulkan.
* Improvements to what gets synthesized vulkan tests.
* Do transition on all mip levels.
* Fixing problems appearing from vulkan debug layer.
* Disable Vulkan synthesized tests for now.
* Add Resource::Type member to Resource::DescBase.
* Removed the CompactIndexSlice from binding. Just bind the indices needed.
* BindingRegister -> RegisterSet
* RegisterSet -> RegisterRange
* Typo fix for debug build.
* Remove comment that no longer applied.
|
|
* Add options to control matrix layout rules
Up to this point, the Slang compiler has assumed that the default matrix layout conventions for the target API will be used.
This means column-major layout for D3D, and *row major* layout for GL/Vulkan (note that while GL/Vulkan describe the default as "column major" there is an implicit swap of "row" and "column" when mapping HLSL conventions to GLSL).
This commit introduces two main changes:
1. The default layout convention is switched to column-major on all targets, to ensure that D3D and GL/Vulkan can easily be driven by the same application logic. I would prefer to make the default be row-major (because this is the "obvious" convention for matrices), but I don't want to deviate from the defaults in existing HLSL compilers.
2. Command-line and API options are introduced for setting the matrix layout convention to use (by default) for each code generation target. It is still possible for explicit qualifiers like `row_major` to change the layout from within shader code.
I also added an API to query the matrix layout convention that was used for a type layout (which should be of the `SLANG_TYPE_KIND_MATRIX` kind), but this isn't yet exercised.
I added a reflection test case to make sure that the offsets/sizes we compute for matrix-type fields are appropriately modified by the flag that gets passed in.
In a future change we could possibly switch the default convention to row-major, if we also changed our testing to match, since there are currently not many clients to be adversely impacted by the change.
* Fixup: silence 64-bit build warning
|
|
doesn't block on global memory accesses. The fix is to copy the values to be processed by InterlockedAdd into shared array. The previous test ran successfully on Dx11, but broke on Dx12. (#586)
|
|
* First pass at support for textures in vulkan.
* Binding state has first pass support for VkImageView VkSampler.
* Split out _calcImageViewType
* Fix bug in debug build around constant buffer being added but not part of the binding description for the test.
* Offset recalculated for vk texture construction just store the texture size for each mip level.
* When outputing a vector type with a size of 1 in GLSL, it needs to be output as the underlying type. For example vector<float,1> should be output as float in GLSL.
* Vulkan render-test produces right output for the test
tests/compute/textureSamplingTest.slang -slang -gcompute -o tests/compute/textureSamplingTest.slang.actual.txt -vk
* Small improvement around xml encoding a string.
* More generalized test synthesis.
* Fix image usage flags for Vulkan.
* Improvements to what gets synthesized vulkan tests.
* Do transition on all mip levels.
* Fixing problems appearing from vulkan debug layer.
* Disable Vulkan synthesized tests for now.
|
|
Fixes #581
This change adds a new parameter passing mode `__ref` to exist alongisde `in`, `out`, and `inout`.
The `__ref` modifier indicates true by-reference parameter passing (whereas `inout` is copy-in-copy-out).
This is not intended to be something that users interact with directly, but rather a low-level feature that lets us provide a correct signature for the `Interlocked*()` operations in the standard library.
Most of the support for passing what are logically addresses around already exists in the IR, so the majority of the work here is just in introducing the new type `Ref<T>` and then using it appropriately when lowering `__ref` parameters/arguments to the IR.
|
|
|
|
* render-test should not fail on HLSL compiler *warnings*
The logic in `render-test` that invokes `D3DCompile` was causing a test to fail if it produced any warnings (not just if compilation fails).
Warning output can be dealt with by the test runner, since it will compare output between runs anyway, and it is useful to be able to run something through `render-test` that compiles with warnings.
* Be more careful about deleting IR instructions
There was an `IRInst::deallocate()` method that had a precondition that the instruction should already be removed from its parent and clear out all its operands before calling, but it wasn't checking this and the few call sites weren't doing things right either.
I consolidated things on `IRInst::removeAndDeallocate()` which does all the things: removes from the parent, clear out operands, and then deallocates.
I also made sure to clear out the type operand.
This clears up some crashing issues where passes were removing instructions but those instructions would still show up as users of other instructions.
* Don't emit bitwise not for non-Boolean types
It seems like the logic in `emit.cpp` messed things up and decided that `Not` (the IR instruction that is equivalent to `!` in the AST) should emit as `!` for Boolean types and `~` for other types, but this makes no sense (e.g., `~(a & 1)` is very different from `!(a & 1)`, even when interpreted as a condition).
It seems like this logic was intended for the `BitNot` case, where `~a` and `!a` are actually equivalent for Boolean values (but a target language might not like `~a` on `bool` values).
Maybe the original plan was that the `Not` instruction should only apply to Boolean values in the first place, and that other values should be converted to `bool` (or a vector of `bool`) before applying `Not`, but even in that case the emit logic makes no sense.
This caused an actual problem for one of my test cases, so it was important to fix it now.
* Fix issue with cached resolution for overoaded operators
The basic problem was that the lookup logic was forming a key based on the *first* definition it found for the overloaded operator, but that means that when processing a prefix `++a` call we might look up the *postfix* definition of `operator++` and decide to use its opcode as the key.
This "fixes" the logic by looking for the first definition with a "compatible" definition (e.g., a `__prefix` function if we are checking a `PrefixExpr`), and then uses its opcode.
A better fix in the long run would be to make the cache just be keyed on the operator name and the "fixity" of the expression (prefix, postfix, or infix).
* Introduce an intermediate structured control-flow representation
The code previously used a single function called `emitIRStmtsForBlocks` in `emit.cpp` that would take a logical sub-graph of the CFG and emit it as high-level statements.
It would do this by recognizing operations like coniditional branches that it could turn into high-level `if` statements, etc.
The main problem with this function was that it mixed together the logic for how we restructure the program with the logic for how we emit high-level code from that structure.
This change splits those two parts of the algorithm by introducing an intermediate data structure: a tree of `Region`s, which represent single-entry regions of the CFG.
There are subclasses of `Region` corresponding to various structured control-flow constructs, and then a leaf case that wraps a single `IRBlock`.
The new function `generateRegionsForIRBlocks()` (in `ir-restructure.cpp`) now handles the restructuring work, by building one or more `Region`s to represent a sub-graph, while `emitRegion()` handles emitting HLSL/GLSL source code from a region.
Splitting things in this way opens up some opportunities for future changes:
* We can expand the set of IR control-flow constructs allowed, so long as we can still generate structure `Region`s from them, without having to mess with the emit logic (e.g., we could start to support multi-level `break` by introducing temporaries as needed). In the limit we can generate our `Region`s using something like the "Relooper" algorithm.
* We can emit to other representations while retaining the same control-flow restructuring support. E.g., if we drop the structured information from the IR, then emitting to SPIR-V for Vulkan would require us to use the strucured control-flow information from these `Region`s.
* We can do analysis that needs to understand `Region` structure. This is relevant to issue #569, which was what prompted me to start on this work. Now that we have a representation of the nesting of `Region`s, we can use it to reason about visibility of values between blocks.
During development of this change I ran into a gotcha, in that I had been assuming each IR block would map to a single `Region`, forgetting that our current lowering of "continue clauses" in `for` loops leads to them being duplicated. The `Region` representation handles this by having a linked-list struct mapping IR blocks to the `SimpleRegion`s that represent them. I added a test case that includes a `for` loop with a continue clause that is reached along multiple paths just to make sure that we continue to support that case.
The compiler output should not change as a result of this work; this is supposed to be a pure refactoring change.
* Add a pass to resolve scoping issues in generated code
Fixes #569
The basic problem arises because the structured control flow that we output in high-level HLSL/GLSL doesn't match the "scoping" rules of an SSA IR.
In particular, SSA says that a value can be used in any block that is dominated by the definition, but in the presence of `break` and `continue` statements it is easy to construct cases where a block dominates something that is not in its scope for structured control flow. Consider:
```hlsl
for(;;) {
int a = xyz;
if(a) { int b = a; break; }
int c = a;
}
int d = b;
```
This program is invalid as HLSL, because the variable `b` is referenced outside of its scope, but if we look at the CFG for this function, it is clear that the block that computes `b` dominated the block that computes `d`. IR optimizations can easily create code like this, so we need to be ready for it.
The previous change added an explicit `Region` structure to represent the structured control flow that we re-form out of the IR, and this change adds a pass that exploits the structuring information to detect cases like the above and introduce temporaries to fix the scoping issue. For example, the pass would change the earlier code block into something like:
```hlsl
int tmp;
for(;;) {
int a = xyz;
if(a) { int b = a; tmp = b; break; }
int c = a;
}
int d = tmp;
```
That is, we introduce a new `tmp` variable at a scope "above" both the definition and use of `b`, and then we copy `b` into that temporary right where it is computed, and then use the temporary instead of the original `b` at the use site.
A few details that came up during the implementation:
* Downstream compilers may get confused by code like the above, and complain that `tmp` may be used before it is initialized, even though the very definition of dominators in a CFG means we don't have to worry about it. Still, I introduced some one-off code to initialize the temporaries just to silence spurious warnings coming from fxc.
* We need to be careful not to apply this logic to "phi nodes" (the parameters of basic blocks) since they will already be turned into temporaries by the emit logic, and trying to introduce temporaries with this pass led to broken code (I still need to investigate why). It may be that a future version of this pass should also take the code out of SSA form, so that we can introduce both kinds of temporaries in a single pass (and maybe eliminate some unnecessary variables by doing basic register allocation).
There is another transformation that could fix some issues of this kind, by moving code out of a structured control-flow construct and to the "join point" after it. For example, we could turn our loop from the start of this commit message into:
```hlsl
for(;;) {
int a = xyz;
if(a) { break; }
int c = a;
}
int b = a;
int d = b;
```
Moving the definition of `b` to after the loop is possible because there is no way to get out of the loop without executing that code anyway. Now the scoping issue for `d`'s use of `b` has gone away, but of course we've introduced a *new* scoping issue for `a`, when it gets used by `b`.
Adding a pass to re-arrange control flow like this could reduce the cases where we have to apply the current pass, but it wouldn't eliminate them entirely. That means such a pass can be deferred to future work.
This change includes a test case the reproduces the original issue, so that we can confirm the fix works.
|
|
(#573)
Fixes #568
The problem occurs when an entry point declares multiple `out` parameters:
```hlsl
void myVS( out float4 a : A, out float4 b : B )
{
...
a = whatever;
b = somethingElse;
...
if(done)
{
return; // explicit return
}
...
// implicit return
}
```
Slang translates code like this by introducing a GLSL global `out` parameter for each of `a` and `b`, rewriting the logic inside the entry point to use a local temporary instead of the real parameters, and then assigning from the locals to the globals at every `return` site:
```glsl
out vec4 g_a;
out vec4 g_b;
void main()
{
// insertion location (1)
vec4 t_a;
vec4 t_b;
...
t_a = whatever;
t_b = somethingElse;
...
if(done)
{
// insertion location(2)
g_a = t_a;
g_b = t_b;
return; // explicit return
}
...
// insertion location (3)
g_a = t_a;
g_b = t_b;
// implicit return
}
```
Note that there are three different places (for this example) where code gets inserted to make the translation work. We insert declarations of local variables at the top of the function, and then insert the copy from the temporariesto the globals at each `return` site (implicit or explicit).
The bug in this case was that the pass was setting the insertion location to (1) outside of the loop for parameters, so that when it was done with `a` and moved on to `b`, it would end up inseting the temporary `t_b` at the last location used (location (3) in this example), and this would result in invalid code, because `t_b` gets used before it is declared.
This bug has been around for a while, but it has largely been masked by the fact that so few shaders use multiple `out` parameters, and also because Slang's SSA-ification pass would often be able to eliminate the local variable anyway, so that the bug never bites the user. The reason it surfaced now for a user shader was because we introduced `swizzledStore`, which currently inhibits SSA-ification, so that some temporaries that used to get eliminated are now retained so that they can break things.
The fix in this case is small: we use the existing `IRBuilder` only for insertions at location (1) and construct a new builder on the fly for all the insertions at `return` sites. I have not included a test case yet, because our end-to-end Vulkan testing is not yet ready, so this may regress again in the future.
|
|
as the underlying type. For example vector<float,1> should be output as float in GLSL. (#572)
|
|
Fixes #566
The basic problem here is that the front-end translates a structure initializer-list expression into a `makeStruct` instruction (with one argument per field), but the IR type legalization logic wasn't handling the case where a `makeStruct` is used to construct a struct value that needs to get split by legalization.
The implementation is relatively straightforward, and like the other cases of instruction legalization for compound types, it follows the shape of the `LegalType`/`LegalVal` cases. The one interesting bit is that we need to be a bit careful and filter the single argument list for `makeStruct` into two in the case where we generate a "pair" type for something that has both "ordinary" and "special" (resource) fields. Luckily the `PairInfo` data that was generated by type legalization has exactly the information we need (by design).
This change does not address several issues that could be handled in follow-on changes:
* The `makeArray` instruction will face similar issues if it is applied to a type that requires legalization: we'd need to turn an array of `LegalVal`s into a bunch of distinct arrays.
* The error message when we hit the unimplemented case here isn't great. Ideally we should provide the line number of the instruction that fails in an error message when legalization fails.
This change tries to focus narrowly on the bug at hand, and leave these issues for later changes.
|
|
* Generate Visual Studio projects using Premake
This change adds a `premake5.lua` file that allows us to generate our Visual Studio solution using Premake 5 (https://premake.github.io/).
The existing Visual Studio solution/projects are now replaced with the Premake-generated ones, and project contributors will be expected to update these by running premake after adding/removing files.
I have *not* changed the Linux `Makefile` build at all, because that file is also used for things like running our tests, so that clobbering it with a premake-generated `Makefile` would break our continuous testing.
Hopefully future changes can switch to a generated `Makefile` and perhaps even add an XCode project as well.
Notes:
* The `build/slang-build.props` file is no longer needed/used, so it has been removed.
* The `slang-eval-test` test fixture wasn't following our naming conventions for its directory path, so it was updated to streamline the Premake build configuration work. This required changes to the `Makefile` as well
* Some seemingly unncessary preprocessor definitions that were specified for `core` and `slang-glslang` have been dropped. We will see if anything breaks from that.
* Possible fixup for Premake vpath issue
Premake's `vpath` feature seems to be nondeterministic about the order it applies filters (because Lua isn't deterministic about the order of entries in a key/value table), and as a result we can end up in a weird case where it decides that a `foo.cpp.h` file matches the `**.cpp` filter (I'm not sure why) before it tests against the `**.h` filter.
This change uses an (undocumented) Premake facility to set `vpath` using a list of singleton tables, which seems to fix the order in which things get tested.
* Remove support for "single-file" build of Slang
The `hello` example was the only bit of code that uses the "single-file" way of building Slang, and this had already run up against limitations of the Visual Studio compilers in its Debug|x64 build.
Rather than mess with Premake to make it pass through the `/bigobj` linker flag that is needed to work around the issue, it makes more sense just to stop using/supporting the feature since we wouldn't want users to depend on it anyway (our documentation no longer refers to it).
While I was at it I went ahead and made sure that the `SLANG_DYNAMIC` flag doesn't need to be set manually, so that instead there is a non-default `SLANG_STATIC` option (not that we have a static-library build of Slang at the moment).
|
|
Resolves #310
The behavior was fixed in #484, but that change didn't add test cases to cover the new functionality.
|
|
* Cleanups around behavior when the compiler fails
* Add another case where we try to `noteInternalErrorLoc()` if an exception in thrown. This one is the in the logic for emitting an IR instruciton. This could be improved by adding another layer at the function level (as a catch-all for instructions with no location), but something is better than nothing.
* Change a bunch of `assert()`s over to `SLANG_ASSERT()`s, so that we can theoretically take more control over them (e.g., make release builds with asserts enabled)
* Some other small cleanups around the assertions we perform.
In the survey I made, I didn't really see many obvious "smoking gun" cases where we could produce a significantly better error message for some of the unimplemented/unexpected paths, other than to actually implement the missing functionality.
* fixup
|
|
Resolves #357
The example shader from that issue has been added as a test case, and works with the top-of-tree Slang compiler (most likely due to the changes introduced with the IR-level type system).
|
|
|
|
Workaround for cases where we emit illegal-but-unused types
|
|
This is a quick workaround to deal with cases where we try to emit an unreferenced IR type that contains references to pre-legalization types (which might have been removed from the IR even thought they are still referenced). The basic fix is to *not* add types to our global order of instructions to emit by default, and only add them on demand as they are referenced by other instructions.
This is not a real fix for the underlying issue, which is that type legalization is only being applied to a subset of global instructions instead of all of them. A more detailed fix for that problem will need to be devised next.
This fix also doesn't address the question of why an unreferenced `struct` type came to be present in the IR code passed to the back-end in the first place. It would be good to understand how this scenario is arising.
|
|
* Remove serialization of screen captures from a renderer implementation, capture now writes to a Surface. Then client code can decide to serialize (or use as needed).
* Improved comment for captureScreenSurface.
* First pass support for xunit output.
* Controlling output to improve xunit support.
* Xml encoding and writing out of error/skip for xunit.
* Fixes to make build on linux.
* Fix typo for linux build.
|
|
This was based on feedback from Falcor users, who felt like changing the default to have no line directives didn't work out well. Since I'd only made them disabled by default based on what I perceived to be Falcor's needs, I'm happy to turn this back on by default.
I also added a few changes to clean up the output:
* Don't emit a directive for a sub-expression, since that breaks up the code too much. The only directives inside a function body will be on top-level instructions that didn't get folded into a use site.
* Add logic to emit a directive for top-level declarations (globals, functions, structs), and clean up their printing so that they put any extra space *after* the declaration rather than before (so the line numbers can be accurate)
* Don't emit the file path part of a directive if it would be the same as the previous directive. This makes the output less noisy, at the cost of having to work your way backward to find the file if you are looking directly at the output.
There are certainly more cleanups possible, but these make the output decent enough to be useful for working backwards from a downstream compiler error to the offending code.
|
|
The emit logic already had an idea of when an instruction should be "folded" it its use site(s), and this change just expands on that logic to try to be more aggressive. The basic idea is that instead of outputting this:
```hlsl
float4 _S3 = a_0 + b_0;
float4 _S4 = c_0 * _S3;
d_0 = _S4;
```
we can hopefully output something like this:
```hlsl
d_0 = c_0 * (a_0 + b_0);
```
The way this works is that after dealing with the various special cases that decide an instruction `I` must/cannot be folded in, we look and see if it has the following properites:
* `I` has no side effects
* `I` has a single user, `U`
* `I` and `U` are in the same block (and `I` comes before `U` in that block)
* for every instruction `X` between `I` and `U` (exclusive), `X` has no side effects
If all of these conditions are true, then `I` can be folded in as a sub-expression when we emit `U`.
This change doesn't affect most of our test output, but there is still a single test with SPIR-V output that we compare against a GLSL baseline, and so that baseline had to be modified to match the GLSL we now generate.
Similar to #547, this change is not meant to provide a complete solution, but rather to take a concrete but low-risk step toward improving our output. Opportunities to improve the results further include:
* We can/should ensure that when outputting sub-expressions we keep extra parentheses to a minimum. The old logic for emitting from an AST had support for "unparsing" expressions with minimal parentheses, and we should try to do the same. This can be error-prone, because omitting parentheses can lead to silent failures, so it must be done carefully.
* We could try to be more aggressive about detecting what operations might have side effects. The most interesting case is function calls, where we should try to check if the callee is a function known to be side-effect-free. We could start by annotating most builtin functions with an attribute/decoration that indicates freedom from side effects. Deriving this attribute for user functions could be interesting, but we'd have to be careful since "nontermination" is technically a side effect.
* We could try to be more aggressive about determining what side effects in instructions `X` are "safe" for the instruction `I` to move across. For example, if `I` is a load from variable `a` and `X` is a store to variable `b`, then that would seem to be safe. This starts to get into issues of instruction scheduling, though, and that is probably beyond what we want Slang to be doing.
|
|
* Remove serialization of screen captures from a renderer implementation, capture now writes to a Surface. Then client code can decide to serialize (or use as needed).
* Improved comment for captureScreenSurface.
|
|
This code is currently not used by anything, but I wanted to check in a first pass at an implementation of dominator tree construction so that we don't have to keep avoiding implementing algorithms that rely on having dominator information available.
The algorithm used to construct the dominator tree is taken from "A Simple, Fast Dominance Algorithm" by Keith D. Cooper, Timothy J. Harvey, and Ken Kennedy.
This is not the "best" algorithm in terms of asymptotic performance, but it is among the simplest algorithms for computing a dominator tree that still outperforms naive iterative set-based methods.
The actual data structure and API for the dominator tree has a bit of "cleverness" in it to try to make the common queries reasonably fast (e.g., you can check whether A dominates B in constant time).
My hope is that even if we implement a more advanced algorithm for constructing the dominator tree, we can retain compatibility with passes that might make use of this API.
Because no code is currently using this logic, I have done only minimal testing by stepping through this code and validating the results on paper for some very small CFGs.
More serious testing/debugging may need to wait until we have an optimization pass that needs the dominator tree we compute here.
One open question I have is how best to introduce traditional unit testing into Slang, since this is an example of code that would benefit greatly from being unit tested.
|
|
The basic idea here is that when lowering to the IR, the front-end will attach a "name hint" to the IR instruction(s) that represent a given declaration, and then the passes that work on the IR will try to preserve and propagate those names, and then finally the emit logic will use them in place of mangled or unique names when available.
This change does *not* try to deal with the issues that arise when we try to use those variable names in the output without any modification (e.g., handling cases where they might clash with keywords or builtins in the target language). Instead, it tries to establish baseline behavior for propagating through names, so that a later change can concentrate on the issue of using those names exactly when it is legal to do so.
In order to avoid issues around the name "hints" causing problems we take two main steps:
1. We "scrub" each name to reduce it down to the allowed set of identifier characters in C-like languages, and then ensure that it doesn't do things that would be illegal in some downstream languages (e.g., consecutive underscores are not allowed in GLSL) or could clash with Slang's mangled names. This process isn't guaranteed to give distinct results for distinct inputs (it isn't a mangling scheme, after all).
2. We generate a unique ID for each occurence of a given name and always use that as a suffix. This means that even if a name happens to overlap with a keyword (if you somehow have a variable named `do`), we will still add a suffix that makes it not a problem (we'd output `do_0` which is fine).
The logic for generating these names is mostly straightforward. For simple variables, we use their given name directly, while for other declarations we try to form a name that includes their parent declaration (e.g. `SomeType.someMethod`).
Various IR passes need to propagate or preserve this information. The most interesting is type legalization, when we take a variable with an aggregate type and split some of the fields out into their own variables. In that case we generate "dotted" names like `someVar.someTexture` and rely on the emit logic to turn that into `someVar_someTexture`.
During SSA generation, if we are promoting a variable to SSA temporaries, we will try to propagate the name of the variable over to the temporaries (unless they already have a name from some other place). The same applies to block parameters ("phi nodes").
Many of the test changes need their expected output to be updated for this change. Luckily in most cases the output has gotten easier to understand.
|
|
pixels. (#548)
Added PngSerializeUtil allows currently for just writing Surface of RGBA format.
Removes dependency on stbi_image except for in PngSerializeUtil.
Removed use of gWindowWidth/Height globals - pass the height into initialize or Renderer.
|
|
|
|
* First pass at InputLayout for Vulkan
Add support for RGBA_Float32
* Use VulkanModule and VulkanApi to handle accessing Vulkan types.
* First pass at Vulkan swap chain/Device queue.
* Added VulkanUtil for generic function functions.
* Move more functionality to VulkanApi and VulkanUtil.
Make Buffer able to initialize itself.
* More tidy up around VulkanDeviceQueue
* First pass use of VulkanDeviceQueue in VkRenderer
* First pass use of VulkanSwapChain on VkRenderer
* Added depth formats.
Binding for constant and vertex buffers for Vulkan.
* Setting up VkImageView on backbuffers.
* First pass support for setting up vkRenderPass.
* Fixes to work around Vulkan swap chain/verification issues.
* Added support for Pipeline and a pipeline cache.
* Working without waiting - because use of pipeline cache.
* Added support for VkFramebuffer in Vulkan.
* First pass at creating Vulkan graphics pipeline.
* More efforts to get Vulkan to render.
* Small improvement for checking of Binding flags.
* Removed setConstantBuffers from the Renderer interface - so that all resource binding takes place through the BindingState.
To make this work required a 'hack' in render-test main.cpp - so that the constant buffer binding that is needed in some tests is only added when it doesn't clash.
* RendererID -> unified into RendererType. Added getRendererType to Renderer interface.
Added ProjectionStyle, and function to get from RendererType.
Added getIdentityProjection to RendererUtil - to get projection that is the 'identity' - but hits the same pixels for all projection styles.
* Fix build problem on Win32 on Vulkan where should use VK_NULL_HANDLE.
* Improve naming, comments. Remove dead code.
* Remove unwanted comment.
|
|
Speedup type checking using cached overload resolution results.
|
|
|