slang.git - Making it easier to work with shaders

Age	Commit message	Author
2018-08-21	Add support for more RasterizerOrdered types (#628)	Tim Foley
	Fixes #627 The front-end has support for `RasterizerOrderedBuffer` and `RasterizerOrderedTexture`, but left out support for `RasterizerOrderedByteAddressBuffer` and `RasterizerOrderedStructuredBuffer`. [Nitpick: these types are all amazingly annoying to type. It is easy to want to write `RasterOrdered` instead of the bulkier `RasterizerOrdered`, and almost everybody does in casual speech. There's already the issue of wanting to type `StructureBuffer` (a buffer of structures) instead of `StructuredBuffer` (a buffer that is... structured?). Then you have `ByteAddressBuffer` which is just adding to the confusion because it is nominally a "byte addressable" buffer (so that `ByteAddressedBuffer` would actually make sense), but then actually isn't byte addressable in practice.] There were a few `TODO` comments related to this already, and this change was mostly a matter of doing a find-in-files for `RWByteAddressBuffer` and `RWStructuredBuffer` and adding matching `RasterizerOrdered` cases. The test I added just checks that these types make it through the front-end, and doesn't do any actual confirmation that they work as intended. It is worth noting that the handling of ordering in GLSL/VK is different from in HLSL ("pixel shader interlock" instead of "rasterizer ordered views"), so coming up with a cross-compilation story would need to be a later step.
2018-08-10	Improve model-viewer support for lights (#626)	Tim Foley
	* Improve model-viewer support for lights The main visible change here is that the model-viewer example supports multiple light sources, with a basic UI for adding new light sources to the scene, and for manipulating the ones that are there. Along the way I also refactored the `IMaterial` decomposition to be a bit less naive, while still only including a completely naive Blinn-Phong implementation. I also went ahead and spruced up the `cube.obj` file so that it has multiple materials, although it is still a completely uninteresting asset. * Fixup: Windows SDK version
2018-08-06	Add basic support for "Dear IMGUI" (#625)	Tim Foley
	This isn't being made visible just yet, but it will allow us to have a simple UI for loading models into the model-viewer example. In order to support rendering with IMGUI I had to add the following to the `Renderer` layer: * viewports * scissor rects * blend support These are really only fully implemented for D3D11, but adding them to the other back-ends should be a reasonably small task.
2018-08-03	Major overhaul of Renderer abstraction, to support a new example (#624)	Tim Foley
	The original goal here was to bring up a second example program: `model-viewer`. While the existing `hello-world` example is enough to get somebody up to speed with the basics of the Slang API (as a drop-in replacement for `D3DCompile` or similar), it doesn't really show any of the big-picture stuff that Slang is meant to enable. There wasn't any use of D3D12/Vulkan descriptor tables/sets, and there wasn't any use of interfaces, generics, or `ParameterBlock`s in the shader code. The `model-viewer` example addresses these issues. Its shader code involves generics, interfaces, and multiple `ParameterBlock`s, and the host-side code demonstrates a few key things for working with Slang: * There is an application-level abstraction for parameter blocks, that combines the graphics-API descriptor set object with Slang type information * There is a shader cache layer used to look up an appropriate variant of a rendering effect by using parameter block types to "plug in" global type variables * There is a clear separation between the phases of compilation: a first phase that does semantic checking and enables reflection-based allocation of graphics API objects, followed by one or more code generation passes for specialized kernels. This example is certainly not perfect, and it will need to be revamped more going forward. In particular: * The output picture is ugly as sin. We need a plan for how to get this to load better content, perhaps even popping up an error message to note that the required input data isn't present in the basic repository. * The shader code is too simplistic. There isn't any real material variety, and the `IMaterial` abstraction is completely wrong. * The use of parameter blocks is facile because there are no resource parameters right now. Fixing that will likely expose issues around interfacing with Slang's reflection API. 
* The whole example exposes the issue that Slang's current APIs aren't really designed for the benefit of two-phase compilation (since our main client application has been stuck on one-phase compilation). * Global type parameters are actually a Bad Idea that we only did for compatibility with existing codebases. We should not be showing them off in an example of the Right Way to use Slang, but the language support for type parameters on entry points is still not complete. Of course, the majority of the changes here are not inside the example applications, and instead involve a major overhaul of the `Renderer` abstraction that is used for both tests and examples. The main thrust of the change is to make the abstraction layer be closer to the D3D12/Vulkan model than to a D3D11-style model. This is important for the `model-viewer` example, since it aspires to show how Slang can be incorporated into a renderer that targets a modern API. The most important bit is actually the use of descriptor sets and "pipeline layouts" a la Vulkan, since without these Slang's `ParameterBlock` abstraction won't make a lot of sense. Implementation of the abstraction for the various APIs has very much been on an as-needed basis. The current implementation is just enough for the two examples to work, plus enough to get all the tests to pass in both debug and release builds on Windows. A big missing feature in the API abstraction right now is memory lifetime management. The code had been trending toward something D3D11-like where a constant buffer could be mapped per-frame with the implementation doing behind-the-scenes allocation for targets like D3D12/Vulkan. I'd like to shift more toward a model of just exposing "transient" allocations that are only valid for one frame, because these are more representative of how an efficient renderer for next-generation APIs will work. 
That transition isn't actually complete, though, so there are problems with the existing examples where `hello-world` is actually scribbling into memory that the GPU might still be using, while `model-viewer` is doing full-on heavy-weight allocations on a per-frame basis with no real concern for the performance implications. All together, there are a lot of things here that need more work, but this branch has been way too long-lived already, and so I'd like to get this checked in as long as all the tests pass.
2018-07-31	Fix imageStore output for types other than 4-vectors (#622)	Tim Foley
	Fixes issue #620 Given a `RWTexture*` store operation like: ```hlsl RWTexture3D<float> a; ... float x = 1.0f; a[crd] = x; ``` We were generating output GLSL like: ```glsl layout(rgba32f) image3D a; ... float x = 1.0f; imageStore(a, crd, x); ``` but in that case, the `imageStore` operation expected a `vec4` and not a `float` for the last argument, and GLSL compilation failed. This change extends our handling of the `imageStore` operation in the stdlib so that we pad out the last argument if it is not a 4-vector. We also flesh out the code that was picking a `layout(...)` modifier for image formats so that it doesn't just blindly use `layout(rgba32f)` and instead takes the element type fed to `RWTexture3D<...>` into account. With these two changes, the above HLSL/Slang code now translates to: ```glsl layout(r32f) image3D a; ... float x = 1.0f; imageStore(a, crd, vec4(x, float(0), float(0), float(0))); ``` Note that we are padding out the `x` argument to a full vector, and also that we declare the image with `layout(r32f)` to reflect the fact that it has only a single channel.
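The padding rule described in this commit can be illustrated with a small C++ sketch. This is not the Slang stdlib implementation: `padToVec4` and `floatImageFormat` are hypothetical helper names, and the sketch assumes 32-bit float elements only.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not Slang's code): widen an N-component GLSL
// expression to the vec4 that imageStore expects, filling the missing
// channels with float(0).
std::string padToVec4(const std::string& expr, int componentCount)
{
    assert(componentCount >= 1 && componentCount <= 4);
    if (componentCount == 4)
        return expr; // already a 4-vector, pass through unchanged

    std::string result = "vec4(" + expr;
    for (int i = componentCount; i < 4; ++i)
        result += ", float(0)";
    return result + ")";
}

// Hypothetical helper: pick a layout(...) format for a float image based
// on its channel count. GLSL has no three-channel 32-bit float image
// format, so three components fall back to rgba32f here.
std::string floatImageFormat(int componentCount)
{
    switch (componentCount)
    {
    case 1:  return "r32f";
    case 2:  return "rg32f";
    default: return "rgba32f";
    }
}
```

For the single-channel case in the commit message, `padToVec4("x", 1)` yields the same `vec4(x, float(0), float(0), float(0))` expression shown above, paired with the `r32f` format.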
2018-07-31	Feature/attributed binding (#621)	jsmall-nvidia
	* Typo fix, and added dxc to command line documentation. * Fix small typos. Added support for Scope to lexer. Fix bug in Token ctor. * Add support for attribute names that are scoped. * Added GLSLBindingAttribute. Make binding work through core.meta.slang. * Allow [[gl::binding(binding, set)]] [[vk::binding(binding,set)]]
2018-07-26	Fix translation of RWTexture subscript operations for Vulkan (#618)	Tim Foley
	Partially fixes #615 There's kind of a mess going on here, and it is difficult to be sure which of the changes here are strictly necessary. Also, our testing isn't set up to run tests that use `RWTexture2D`, so the only testing I can really run is manual tests using Falcor. The most basic issue here is that in an earlier change I added `ref` accessors for the subscript operation on various `RW` types in the standard library, and that included `RWTexture2D` (and the other `RWTexture` types). The compiler ended up favoring a `ref` accessor over a `set` accessor even when the `set` would suffice, but only the `set` accessor could be lowered to GLSL/SPIR-V. This change ends up implementing two different fixes for the same problem: * Logic has been added to try and favor a `set` accessor over a `ref` accessor in the cases where either could be used (but still require a `ref` accessor to be used when it is really needed) * The `ref` accessor for `RWTexture*` has been removed, since it turns out that the operations that might have benefited from it (atomics, and component-granularity stores) aren't actually allowed on typed UAVs anyway. There is a deeper issue here that somebody needs to go through and rationalize our representation and handling of accessors like this, but I'm not going to be able to do that in the time I can put into this PR.
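The accessor preference described in this commit might be sketched as follows. The types and the `chooseAccessor` function are hypothetical and only model the stated policy (prefer `get`/`set` when they suffice, and require `ref` only when an address is really needed); the compiler's real data structures are not shown in the commit message.

```cpp
#include <cassert>

// Hypothetical model of the accessors a subscript declaration provides.
struct AvailableAccessors
{
    bool hasGet;
    bool hasSet;
    bool hasRef;
};

// How the subscript result is used at a particular site.
enum class Use
{
    Read,    // value is only loaded
    Write,   // whole element is stored
    Address  // a true pointer is required (e.g. atomics)
};

enum class Accessor { None, Get, Set, Ref };

// Prefer the simplest accessor that satisfies the use, falling back to
// `ref` only when nothing simpler is available.
Accessor chooseAccessor(const AvailableAccessors& a, Use use)
{
    switch (use)
    {
    case Use::Read:
        if (a.hasGet) return Accessor::Get;
        if (a.hasRef) return Accessor::Ref;
        break;
    case Use::Write:
        if (a.hasSet) return Accessor::Set;
        if (a.hasRef) return Accessor::Ref;
        break;
    case Use::Address:
        if (a.hasRef) return Accessor::Ref;
        break;
    }
    return Accessor::None; // no suitable accessor: a diagnostic would be needed
}
```

Under this policy a plain store to a type that declares all three accessors picks `set`, which is the only form the commit says could be lowered to GLSL/SPIR-V.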
2018-07-26	Fix implicit flat interpolation for GLSL output (#619)	Tim Foley
	There was some logic called `maybeEmitGLSLFlatModifier()` that was supposed to emit an implicit `flat` modifier for any varying shader parameter with an integer type that wasn't qualified as `nointerpolation` in the input HLSL/Slang (where `nointerpolation` is the equivalent of `flat`). This wasn't being triggered because I apparently added code to only emit the implicit modifier if there was no explicit one, but then I had this code: ```c++ bool anyModifiers = false; anyModifiers = true; ... if(!anyModifiers && ...) { maybeEmitGLSLFlatModifier(); } ``` Unsurprisingly, the `anyModifiers = true` line meant that things never actually triggered. Once I fixed that issue the next problem that arose was that the `maybeEmitGLSLFlatModifier()` logic was being applied to any varying integer parameter, which includes fragment outputs, but GLSL forbids the `flat` modifier on fragment outputs and so gave an error on a shader that wrote to an integer target. I fixed up the logic to take computed layout for a shader parameter into account, and only emit the `flat` modifier for fragment inputs. As the `TODO` comment at that location notes, there may be some arcane rules about how a vertex shader also needs to use `flat` when declaring the matching output, so we may need to make that test more careful down the road. For now the shader that originally surfaced this problem now works under Vulkan.
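A minimal sketch of the corrected decision, assuming the three facts named in the commit message are available as booleans. `VaryingInfo` and `shouldEmitImplicitFlat` are hypothetical names, not the emitter's actual code.

```cpp
#include <cassert>

// Hypothetical summary of what the emitter knows about one varying
// shader parameter.
struct VaryingInfo
{
    bool isIntegerType;            // scalar/vector of int or uint
    bool hasExplicitInterpolation; // e.g. `nointerpolation` in the source
    bool isFragmentInput;          // derived from the computed layout
};

// Emit an implicit `flat` only when the user wrote no interpolation
// modifier, the type is an integer type, and the parameter is a fragment
// input (GLSL rejects `flat` on fragment outputs).
bool shouldEmitImplicitFlat(const VaryingInfo& info)
{
    if (info.hasExplicitInterpolation)
        return false;
    if (!info.isIntegerType)
        return false;
    return info.isFragmentInput;
}
```

The vertex-output case mentioned in the `TODO` is deliberately left out of this sketch.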
2018-07-17	Fix problem when doing options parse, that failure doesn't leave appropriate message in diagnostic string. (#612)	jsmall-nvidia
2018-07-06	spCompile/spProcessCommandLineArguments return SlangResult (#610)	jsmall-nvidia
	* Make spCompile return SlangResult * Make spProcessCommandLineArguments return SlangResult (and not internally exit) * Remove calls to exit() * Fix typos * Make all output from spProcessCommandLineArguments get sent to diagnostic sink.
2018-06-28	Fix up definitions of half and double in stdlib (#608)	Tim Foley
	An earlier change made sure that the `half` and `double` types properly conform to the `__BuiltinFloatingPointType` and `__BuiltinRealType` interfaces, but somehow that change modified only the generated source file (`core.meta.slang.h`) and not the source that fed into the generator (`core.meta.slang`). This meant that when building the compiler, we'd end up with spurious diffs because we'd run the generation logic and clobber the (correct) output file with freshly generated (wrong) code. This change adds the missing lines to the source file to fix up the issue.
2018-06-27	Support for Tessellation (#607)	jsmall-nvidia
	* Fix typo OuptutTopologyAttribute -> OutputTopologyAttribute First pass support for handling tessellation shaders - domain and hull. * Added attribute PatchConstantFuncAttribute * Added visitHLSLPatchType(HLSLPatchType* type) such that the patch type template parameters are handled * Added IRNotePatchConstantFunc - such that the patch constant function is referenced within IR * Added support for outputting typical tessellation attributes (although minimal validation is performed) * Added findFunctionDeclByName * Small improvements to diagnostic. * Improved diagnostics and checking for geometry shader attributes. * Added diagnostic if patchconstantfunc is not found Handle assert failure when outputting a domain shader alone and therefore attr->patchConstantFuncDecl is not set. * Simple script tess.hlsl to test out domain/hull shaders. * Added url for where hull shader attributes are defined. * Fix unsigned/signed comparison warning. * Restore removal of fix in "Improve generic argument inference for builtins (#598)" * Update tessellation test case to compare against fxc The test was previously comparing against fixed expected DXBC output, but this caused problems when the test runner tried to execute the test on Linux (where there is no fxc to invoke...), and would also be a potential source of problems down the road if different users run using different builds of fxc. The simple solution here is to convert the test to compare against fxc output generated on the fly. That test type is already filtered out on non-Windows builds, so it eliminates the portability issue (in a crude way). I also changed the test to compile both entry points in one compiler invocation, just to streamline things into fewer distinct tests. * Eliminate unnecessary call to `lowerFuncDecl` In a very obscure case this could cause a bug, if the patch-constant function had somehow already been lowered (because it was called somewhere else in the code). 
The call should not be needed because `ensureDecl` will lower a declaration on-demand if required, so eliminating it causes no problems for code that wouldn't be in that extreme corner case.
2018-06-22	Expose macros/functionality for defining interfaces (#604)	jsmall-nvidia
	* Added Result definitions to the slang.h * Removed slang-result.h and added slang-com-helper.h * Move slang-com-ptr.h to be publicly available. * Add SLANG_IUNKNOWN macros to simplify implementing interfaces. Use the SLANG_IUNKNOWN macros in slang.c * Removed slang-defines.h added outstanding defines to slang.h
2018-06-14	Add support for "blobs" and a file-system callback (#596)	Tim Foley
	* Add support for "blobs" and a file-system callback The most obvious change here is that the Slang header now includes a few COM-style interfaces that can be used for communication between the application and compiler. In order to support the declaration of COM-like interfaces, several platform-detection macros were lifted out of `slang-defines.h` and into the public `slang.h` header. As it exists right now, this change makes the Slang API C++-only, but a C-compatible version can be defined later with the help of lots of macros (and/or something like an IDL compiler). The two big interfaces introduced are: * The `ISlangBlob` interface, which is compatible with `ID3DBlob`, `IDxcBlob`, etc. This is used to pass ownership of source/compiled code across the API boundary without copies. New versions of various entry points have been added to allow passing blobs: e.g., `spAddTranslationUnitSourceBlob` and `spGetEntryPointCodeBlob`. * The `ISlangFileSystem` interface, which is used to allow applications to intercept any attempt by the Slang compiler to load a file (input source files, include files, etc.). This is not the same as the `IDxcIncludeHandler` interface, because it assumes UTF-8 encoded path names, instead of the 16-bit encoding that dxc/Windows prefer. It is also not very similar to `ID3DInclude` as used by fxc, because this callback interface is not responsible for handling the search through include paths, etc. - it is just a file-system abstraction layer. Internally, a few different parts of the compiler were changed to either store data in blob form all the time, or to be able to synthesize a blob on-demand. Because our internal `String` type is a reference-counted copy-on-write type, using a `SlangStringBlob` to hold string data should achieve transfer of ownership back to the application without extraneous copies. There is plenty of room to clean up the architecture of some of these internal pieces if they know that their data will end up in a blob. 
The existing Slang testing doesn't touch any of the APIs introduced here, so they can only confirm that existing functionality hasn't been broken. The new ability to return code blobs has been tested by integration of that feature into Falcor, but there has been zero testing of the ability to pass in source code as blobs, and the ability to hook file loading. Future changes will need to add test coverage for the new features. * fixup: define SLANG_NO_THROW for non-Windows builds * fixup: header copy-paste error caught by clang/gcc * Cleanup: return reference-counted objects via output parameters Returning a reference-counted object through the API as a raw pointer creates challenges. The "obvious" answer is that the returned pointer should have an added reference (it is returned at "+1"), and the caller is responsible for releasing that reference. This makes sense when using raw pointers on the calling side: ```c++ IFoo* foo = spGetFoo(...); ... foo->Release(); ``` However, as soon as smart pointers start getting involved (to handle releasing reference counts when we are done with things), the picture gets more complicated: ```c++ MySmartPtr<IFoo> foo = spGetFoo(...); ... ``` The intention of code like that is that `foo` gets released when the smart pointer goes out of scope, but this probably doesn't happen with most smart pointer implementations. If the `MySmartPtr` constructor that takes a raw pointer retains it, then the destructor will only release that reference, and so the object will leak. It is possible that the user will have a smart pointer type where the constructor that takes a raw pointer doesn't retain it, but in general such types introduce the potential for errors of their own, and no matter what the Slang API shouldn't go in assuming any particular policy. This change makes it so that any reference-counted objects that are logically returned from a call are returned through output pointers. 
This design makes the leak-free cases easy (enough) to implement with raw pointers or smart pointers: ```c++ // raw pointer IFoo* foo = nullptr; spGetFoo(..., &foo); ... foo->Release(); // smart pointer MySmartPtr<IFoo> foo; spGetFoo(..., foo.writableRef()); ... ``` The only assumption here is that any COM smart-pointer type needs to provide an operation like `writableRef` that is suitable for using that pointer as an output parameter. Given that COM loves output parameters, this seems like a safe assumption (at the very least, anybody who interacts with COM would be used to this convention). Future changes might introduce inline convenience methods for various operations that return results more directly, possibly by introducing a minimal smart-pointer type in the `slang.h` header (without prescribing that clients must use it...). * fixup: another error caught by gcc/clang
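The output-parameter convention above can be made concrete with a self-contained sketch. Everything here is hypothetical and stands in for the placeholders in the commit message: `Foo` plays the role of `IFoo`, `getFoo` the role of `spGetFoo`, and `SmartPtr` the role of `MySmartPtr`. None of it is the Slang API. A live-object counter is included only so that leaks are observable.

```cpp
#include <cassert>

static int gLiveObjects = 0; // number of Foo objects currently alive

// Hypothetical reference-counted object; it is created holding one
// reference (that is, returned at "+1").
class Foo
{
public:
    Foo() { ++gLiveObjects; }
    void addRef() { ++m_refCount; }
    void release() { if (--m_refCount == 0) delete this; }
private:
    ~Foo() { --gLiveObjects; }
    int m_refCount = 1;
};

// Hypothetical smart pointer that owns exactly the references handed to it.
template<typename T>
class SmartPtr
{
public:
    SmartPtr() = default;
    SmartPtr(const SmartPtr&) = delete;
    SmartPtr& operator=(const SmartPtr&) = delete;
    ~SmartPtr() { if (m_ptr) m_ptr->release(); }

    // Drop any held reference and expose the slot as an output parameter.
    T** writableRef()
    {
        if (m_ptr) { m_ptr->release(); m_ptr = nullptr; }
        return &m_ptr;
    }
    T* get() const { return m_ptr; }
private:
    T* m_ptr = nullptr;
};

// Hypothetical API entry point: the object comes back through `outFoo`,
// already carrying the reference the caller must eventually release.
int getFoo(Foo** outFoo)
{
    assert(outFoo != nullptr);
    *outFoo = new Foo();
    return 0;
}

// Smart-pointer usage: the single reference is released at scope exit.
int liveObjectsAfterSmartPointerUse()
{
    {
        SmartPtr<Foo> foo;
        getFoo(foo.writableRef());
    }
    return gLiveObjects;
}
```

Because the smart pointer never takes an extra reference of its own, the destructor's single `release()` balances the reference created by `getFoo`, and no policy about raw-pointer constructors is assumed.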
2018-06-14	Improve generic argument inference for builtins (#598)	Tim Foley
	Fixes #487 The basic problem here is that the user writes something like: ```hlsl float invSqrt2 = 1 / sqrt(2); ``` In this case the user knows that `sqrt()` is only defined for floating-point types, so they expect this to compile something like: ```hlsl float invSqrt2 = float(1) / sqrt(float(2)); ``` The challenge this creates for the Slang compiler is that we use generics to streamline our declarations of all the builtins, so that the scalar `sqrt()` function is actually declared as: ```hlsl T sqrt<T:__BuiltinFloatingPointType>(T value); ``` The `__BuiltinFloatingPointType` is an `interface` defined as part of the standard library, such that only built-in floating-point types conform to it (that is, `half`, `float`, and `double`). When generic argument inference applies to a call like `sqrt(2)`, we see an argument of type `int`, and try to infer `T=int`, which leads to a failure because `int` does not conform to `__BuiltinFloatingPointType`. The point where this currently fails is in the logic to "join" two types for inference, which is supposed to pick the best type that can represent both of two input types. E.g., a join between `float` and `int3` would be `float3`, since both of those types can convert to it, and it is the "minimal" type with that property. So, the goal here is simple: we want a "join" between `int` and `__BuiltinFloatingPointType` to yield the `float` type. The way we handle that in this change is to special case the join of a basic scalar type and an interface, by enumerating all the basic scalar types, filtering them for ones that support the chosen interface and can be implicitly converted from the argument type, and then picking the "best" of them (the comments in the code explain what "best" means in this context). 
The technique used here could be generalized in the future to deal with user-defined types or more cases, but that would risk slowing down overload resolution even more, which is already the most expensive part of our semantic checking pass. A test case has been added for the specific case of `sqrt()` applied to an `int` argument.
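The enumerate, filter, and pick steps described for this join can be sketched in C++. The scalar set, the conformance check, and especially the conversion costs below are assumptions made for illustration: the commit message defers the real meaning of "best" to comments in the compiler source, so the ranking here (lowest hypothetical cost wins) is a stand-in.

```cpp
#include <cassert>
#include <optional>

// Hypothetical subset of the basic scalar types.
enum class Scalar { Int, UInt, Half, Float, Double };

// Stand-in for "conforms to __BuiltinFloatingPointType".
bool conformsToFloatingPoint(Scalar t)
{
    return t == Scalar::Half || t == Scalar::Float || t == Scalar::Double;
}

// ASSUMED implicit-conversion costs from `int`; lower is better. These
// numbers are invented for the sketch and are not Slang's real costs.
std::optional<int> conversionCostFromInt(Scalar to)
{
    switch (to)
    {
    case Scalar::Int:    return 0;
    case Scalar::UInt:   return 1;
    case Scalar::Float:  return 2;
    case Scalar::Double: return 3;
    case Scalar::Half:   return 4;
    }
    return std::nullopt;
}

// Join `int` with an interface: enumerate the scalar types, keep those
// that conform and are reachable by implicit conversion, pick the cheapest.
std::optional<Scalar> joinIntWithInterface(bool (*conforms)(Scalar))
{
    const Scalar candidates[] = {
        Scalar::Int, Scalar::UInt, Scalar::Half, Scalar::Float, Scalar::Double };

    std::optional<Scalar> best;
    int bestCost = 0;
    for (Scalar candidate : candidates)
    {
        if (!conforms(candidate))
            continue;
        std::optional<int> cost = conversionCostFromInt(candidate);
        if (!cost)
            continue;
        if (!best || *cost < bestCost)
        {
            best = candidate;
            bestCost = *cost;
        }
    }
    return best; // empty when no scalar type satisfies the interface
}
```

With these assumed costs, joining `int` against the floating-point interface selects `float`, matching the outcome the commit message aims for with `sqrt(2)`.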
2018-06-13	Make render-test use Slang for all shader compilation (#597)	Tim Foley
	* Make render-test use Slang for all shader compilation This streamlines the code for render-test by having all its shader compilation go through the Slang API, so that it doesn't have to deal with custom logic to compile HLSL->DXBC and HLSL->DXIL. We were already leaning on Slang to generate SPIR-V for Vulkan, so this makes all the paths more consistent. My original plan with this change was to make the D3D12 render path start using DXIL at this point, since the change would make that easy, but it turns out that some aspects of how we handle parameter binding are not compatible with that right now, so it would need to come as a later change. There are a lot of details here, so I will try to walk through the changes, including the incidental ones: * Add logic to `premake5.lua` so that we copy the necessary libraries for HLSL shader compilation to our target directory from the Windows SDK. This is necessary so that our tests can actually invoke `dxcompiler.dll` * Re-run Premake to generate new project files. This moves around a few files that I manually added in previous changes without re-running Premake. * When invoking `fxc` as a pass-through compiler, be sure to pass along any macro definitions made via API or command-line. This isn't a strictly required change with how things worked out, but it is a positive one anyway, because it makes `slangc -pass-through fxc` more useful. * Don't print output from a downstream `fxc` invocation if it produces warnings but no errors. The main reason for this is so that our tests don't fail because of `fxc` warnings on Slang's output (which then don't match the baselines), but it can also be rationalized as not wanting to confuse users with warnings that don't come from the "real" compiler they are using. This probably needs fine-tuning as a policy. * Add the HLSL `NonUniformResourceIndex` function. 
This was an oversight because it isn't documented as a builtin on MSDN, and only gets mentioned obliquely when they talk about resource indexing. * Add `glsl_<version>` profiles to match our `sm_<version>` profiles, so that it is easy for a user to use the profile mechanism to request a specific GLSL version without also specifying a stage name. * Update the render-test logic so that there is a single `ShaderCompiler` implementation that always uses Slang, and get rid of all of the renderer-specific `ShaderCompiler` implementations. * Update logic in render-test `main.cpp` to select the options to use for the eventual Slang compile based on the choice of renderer and input language. I didn't change the options that render-test exposes, even though they are getting increasingly silly (e.g., `-glsl-rewrite` doesn't use GLSL as its input...). * Note: the D3D12 renderer will still use fxc, DXBC, and SM 5.0 for now, since trying to update it to switch to dxc, DXIL, and SM 6.0 didn't work well at the time. * Add a bit of supporting D3D12 code to make sure that we don't allocate a structured buffer when a buffer has a format. * Make sure to also define the `__HLSL__` macro when compiling Slang code, because otherwise a bunch of tests don't work (I'm not clear on how it worked before...). * fixup: missing file
2018-06-13	Fix some issues around codegen for l-values and assignment (#601)	Tim Foley
	The problem here arose when a complicated l-value was formed like: ```hlsl struct Foo { float4 a; } RWStructuredBuffer<Foo> gBuffer; gBuffer[index].a.xz += whatever; ``` In this case the `gBuffer[index].a.xz` expression is a complex l-value in multiple ways: * The `gBuffer[index]` subscript could be routed to either a `get` accessor or a `ref` accessor (and maybe also a `set` accessor if we add one to the stdlib definition), and we defer the choice of which to call until as late as possible in codegen today. * The `_.a` part then becomes a "bound member access" because we can't actually produce a direct pointer until we've resolved how to implement the subscript operation. * The `_.xz` part becomes a "swizzled l-value" because there is no way to materialize it as a pointer to contiguous storage in the original object (the `x` and `z` components of a vector aren't contiguous). Recent changes to support atomic operations on buffer elements introduced the `ref` accessor on `RWStructuredBuffer`, which made it possible to form a pointer to a buffer element in the IR. This interacted with some code for the "bound member" case that was trying to only introduce a temporary when absolutely necessary, and was doing so by assuming anything with an address didn't need to be moved into a temporary. The first fix is to clean up that logic in the bound-member case for assignment: always create a temporary, rather than do it conditionally. The second fix here is more systemic: we add logic to try to coerce the representation of an l-value during codegen into being a simple address, and employ that in cases where we know an address is desired. In a case like the above this helps to get things into the form that is required, so that a swizzled store can be issued. There is still some potential for cleanup in this logic, but I don't want to introduce more changes than seem necessary to fix the original problem.
2018-06-13	Fixes related to handling of empty types (#600)	Tim Foley
	PR #577 tries to eliminate empty `struct` types by replacing them with a `LegalType::tuple` with zero elements, but this seems to run into problems in some cases, where we end up trying to match up `::none` values with empty `::tuple`s. An alternative way to handle this is to never create empty `LegalType::tuple`s (and the same for `LegalVal::tuple`), and instead create `LegalType::none` and `LegalVal::none`. PR #577 avoided this because there were various cases in the legalization logic that didn't robustly handle `LegalType::Flavor::none`. This PR thus includes two main changes: 1. Construct a `::none` type when we have an empty `struct` type. 2. Survey all places that handle the `::tuple` case and extend them to handle the `::none` case if it was missing. This fixes an issue filed in Falcor's internal GitLab as number 424.
2018-06-12	Initial support for enum declarations (#599)	Tim Foley
	Slang `enum` declarations will always be scoped, e.g.: ```hlsl enum Color { Red, Green = 2, Blue, } Color c = Color.Red; // Not just `Red` ``` A user can write `enum class` as a placebo for now (to ease sharing of headers with C++). Slang does not currently support the `::` operator for static member lookup, so it must be `Color.Green` and not `Color::Green`. Support for `::` as an alternate syntax could be added later if there is strong user demand. An `enum` type can have a declared "tag type" using syntax like C++ `enum class`: ```hlsl enum MyThings : uint { First = 0, // ... } ``` The `enum` cases will store their values using that type. An `enum` that doesn't declare a tag type will use the type `int` by default. Enum cases are assigned values just like in C/C++: cases can have explicit values, but otherwise default to one more than the previous case, or zero for the first case. All `enum` types will automatically conform to a standard-library `interface` called `__EnumType`, which is used so that basic operators like equality testing can be defined generically for all `enum` types. This change only adds one operator at first (the `==` comparison), but others should be added later. An `enum` case needs to be explicitly converted to an integer where needed (e.g., `int(Color.Red)`). This is implemented by having the main integer types (`int` and `uint`) support built-in initializers that can work for any `enum` type (or rather, anything conforming to `__EnumType`). Eventually these will be restricted so that an `enum` type can only be converted to its associated tag type. IR code generation completely eliminates `enum` types and their cases. The `enum` type will be replaced with its tag type, and the cases will be replaced with the tag values. Currently this could leave some mess in the IR where cast operations are applied between values that actually have the same type.
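The value-assignment rule for enum cases (explicit value if given, otherwise one more than the previous case, or zero for the first) can be sketched as below. `EnumCase` and `assignCaseValues` are hypothetical names, and the sketch assumes the default `int` tag type.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical description of one enum case as written in source.
struct EnumCase
{
    std::string name;
    std::optional<int> explicitValue; // empty when no `= value` was written
};

// Assign a tag value to every case, C/C++ style.
std::vector<int> assignCaseValues(const std::vector<EnumCase>& cases)
{
    std::vector<int> values;
    int next = 0; // the first case defaults to zero
    for (const EnumCase& c : cases)
    {
        int value = c.explicitValue.value_or(next);
        values.push_back(value);
        next = value + 1; // the following case defaults to previous + 1
    }
    return values;
}
```

Applied to the `Color` example above (`Red`, `Green = 2`, `Blue`), this rule gives `Red` the value 0, `Green` the value 2, and `Blue` the value 3.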
2018-06-06	Add basic support for Shader Model 6.3 profiles (#594)	Tim Foley
	* Add basic support for Shader Model 6.3 profiles This adds `vs_6_3` and friends as available profiles, but doesn't add any new builtins specific to Shader Model 6.3. In order to better support the ray tracing shader stages, Slang will now automatically map any attempt to compile a DXR shader up to SM 6.3 (the shader model officially required for these stages) and to the `lib_` profiles (because there are no stage-specific profiles for these cases). As an added detail, when invoking `dxcompiler.dll` to generate DXIL for DXR shaders, specify an empty entry-point name, since that is expected for `lib_` profiles. * Fixup: don't drop [shader(...)] attributes The previous change makes the "effective profile" for DXR compiles no longer include a stage, but we had been using the stage stored on the effective profile in exactly one place: when determining what to output for a `[shader("...")]` attribute. This fixup makes it so that we use the stage from the profile on the entry-point layout instead, which seems like the right choice anyway, if we are ever going to emit multiple entry points at once.
2018-06-05	Fix atomic operations on RWBuffer (#593)	Tim Foley
	* Fix atomic operations on RWBuffer An earlier change added support for passing true pointers to `__ref` parameters to fix the global `Interlocked()` functions when applied to `groupshared` variables or `RWStructuredBuffer<T>` elements. That change didn't apply to `RWBuffer<T>` or `RWTexture2D<T>`, etc. because those types had so far only declared `get` and `set` accessors, but not any `ref` accessors (which return a pointer). The main fixes here are: add `ref` accessors to the subscript operations on the `RW` resource types, and adjust the logic for emitting calls to subscript accessors so that we don't get quite as eager about invoking a `ref` accessor, and instead try to invoke just a `get` or `set` accessor when these will suffice. This is important for Vulkan cross-compilation, where we don't yet support the semantics of our `ref` accessors. * Add a test case for atomics on a `RWBuffer` * Fix up `render-test` so that we can specify a format for a buffer resource, which allows us to use things other than `StructuredBuffer` and `ByteAddressBuffer`. The work there is probably not complete; I just did what I could to get the test working. * A bunch of files got whitespace edits thanks to the fact that I'm using editorconfig and others on the project seemingly aren't... * fixup: remove ifdefed-out code
2018-06-04	Emit directives to control matrix layout (#590)	Tim Foley
	The HLSL/GLSL output by Slang should try to be robust against whatever flags somebody uses to compile it. Therefore, we will go ahead and output a target-language-specific directive to control the default matrix layout mode so that we can override whatever might be specified via flags. Also, as long as we are at it, this change goes ahead and makes Slang unconditionally emit row/column-major layout modifiers on all matrices (and arrays of matrices) whereas before these were only being output sometimes (the code to do it seemed buggy to me...).
2018-05-31	Add options to control matrix layout rules (#583)	Tim Foley
	* Add options to control matrix layout rules Up to this point, the Slang compiler has assumed that the default matrix layout conventions for the target API will be used. This means column-major layout for D3D, and row major layout for GL/Vulkan (note that while GL/Vulkan describe the default as "column major" there is an implicit swap of "row" and "column" when mapping HLSL conventions to GLSL). This commit introduces two main changes: 1. The default layout convention is switched to column-major on all targets, to ensure that D3D and GL/Vulkan can easily be driven by the same application logic. I would prefer to make the default be row-major (because this is the "obvious" convention for matrices), but I don't want to deviate from the defaults in existing HLSL compilers. 2. Command-line and API options are introduced for setting the matrix layout convention to use (by default) for each code generation target. It is still possible for explicit qualifiers like `row_major` to change the layout from within shader code. I also added an API to query the matrix layout convention that was used for a type layout (which should be of the `SLANG_TYPE_KIND_MATRIX` kind), but this isn't yet exercised. I added a reflection test case to make sure that the offsets/sizes we compute for matrix-type fields are appropriately modified by the flag that gets passed in. In a future change we could possibly switch the default convention to row-major, if we also changed our testing to match, since there are currently not many clients to be adversely impacted by the change. * Fixup: silence 64-bit build warning
2018-05-29	Fix global atomic functions (#582)	Tim Foley
	Fixes #581 This change adds a new parameter passing mode `__ref` to exist alongside `in`, `out`, and `inout`. The `__ref` modifier indicates true by-reference parameter passing (whereas `inout` is copy-in-copy-out). This is not intended to be something that users interact with directly, but rather a low-level feature that lets us provide a correct signature for the `Interlocked*()` operations in the standard library. Most of the support for passing what are logically addresses around already exists in the IR, so the majority of the work here is just in introducing the new type `Ref<T>` and then using it appropriately when lowering `__ref` parameters/arguments to the IR.
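The difference between copy-in-copy-out and true by-reference passing only becomes observable when the argument's storage is touched through another path while the callee runs. The C++ sketch below illustrates that point only: `addInOut` and `addByRef` are hypothetical stand-ins (not Slang or HLSL functions), and the "other writer" is modeled by a callback rather than a second GPU thread. It does not model hardware atomicity, just why the callee needs the real address.

```cpp
#include <functional>

// Models `inout`: copy the argument in, work on the local copy, copy it back out.
// `interfere` stands in for another writer touching the same memory mid-call.
inline void addInOut(int& dest, int value, const std::function<void()>& interfere)
{
    int local = dest;   // copy-in
    interfere();        // a concurrent update lands on `dest` here...
    local += value;
    dest = local;       // ...and is overwritten by the copy-out
}

// Models `__ref`: the callee operates directly on the caller's storage.
inline void addByRef(int& dest, int value, const std::function<void()>& interfere)
{
    interfere();        // the concurrent update lands on `dest`
    dest += value;      // and is preserved, because the update happens in place
}
```

With a starting value of 0, an interfering `+= 10`, and `value == 1`, the copy-in-copy-out version loses the interfering update while the by-reference version keeps it.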
2018-05-25	Fixes 574. Eliminate empty structs during type legalization (#577)	Yong He

2018-05-24	A bunch of work to resolve #569 (#576)	Tim Foley
	* render-test should not fail on HLSL compiler warnings The logic in `render-test` that invokes `D3DCompile` was causing a test to fail if it produced any warnings (not just if compilation fails). Warning output can be dealt with by the test runner, since it will compare output between runs anyway, and it is useful to be able to run something through `render-test` that compiles with warnings. * Be more careful about deleting IR instructions There was an `IRInst::deallocate()` method that had a precondition that the instruction should already be removed from its parent and have all its operands cleared before the call, but it wasn't checking this and the few call sites weren't doing things right either. I consolidated things on `IRInst::removeAndDeallocate()` which does all the things: removes from the parent, clears out operands, and then deallocates. I also made sure to clear out the type operand. This clears up some crashing issues where passes were removing instructions but those instructions would still show up as users of other instructions. * Don't emit bitwise not for non-Boolean types It seems like the logic in `emit.cpp` messed things up and decided that `Not` (the IR instruction that is equivalent to `!` in the AST) should emit as `!` for Boolean types and `~` for other types, but this makes no sense (e.g., `~(a & 1)` is very different from `!(a & 1)`, even when interpreted as a condition). It seems like this logic was intended for the `BitNot` case, where `~a` and `!a` are actually equivalent for Boolean values (but a target language might not like `~a` on `bool` values). Maybe the original plan was that the `Not` instruction should only apply to Boolean values in the first place, and that other values should be converted to `bool` (or a vector of `bool`) before applying `Not`, but even in that case the emit logic makes no sense. This caused an actual problem for one of my test cases, so it was important to fix it now. 
* Fix issue with cached resolution for overloaded operators The basic problem was that the lookup logic was forming a key based on the first definition it found for the overloaded operator, but that means that when processing a prefix `++a` call we might look up the postfix definition of `operator++` and decide to use its opcode as the key. This "fixes" the logic by looking for the first definition with a "compatible" definition (e.g., a `__prefix` function if we are checking a `PrefixExpr`), and then uses its opcode. A better fix in the long run would be to make the cache just be keyed on the operator name and the "fixity" of the expression (prefix, postfix, or infix). * Introduce an intermediate structured control-flow representation The code previously used a single function called `emitIRStmtsForBlocks` in `emit.cpp` that would take a logical sub-graph of the CFG and emit it as high-level statements. It would do this by recognizing operations like conditional branches that it could turn into high-level `if` statements, etc. The main problem with this function was that it mixed together the logic for how we restructure the program with the logic for how we emit high-level code from that structure. This change splits those two parts of the algorithm by introducing an intermediate data structure: a tree of `Region`s, which represent single-entry regions of the CFG. There are subclasses of `Region` corresponding to various structured control-flow constructs, and then a leaf case that wraps a single `IRBlock`. The new function `generateRegionsForIRBlocks()` (in `ir-restructure.cpp`) now handles the restructuring work, by building one or more `Region`s to represent a sub-graph, while `emitRegion()` handles emitting HLSL/GLSL source code from a region. 
Splitting things in this way opens up some opportunities for future changes: * We can expand the set of IR control-flow constructs allowed, so long as we can still generate structured `Region`s from them, without having to mess with the emit logic (e.g., we could start to support multi-level `break` by introducing temporaries as needed). In the limit we can generate our `Region`s using something like the "Relooper" algorithm. * We can emit to other representations while retaining the same control-flow restructuring support. E.g., if we drop the structured information from the IR, then emitting to SPIR-V for Vulkan would require us to use the structured control-flow information from these `Region`s. * We can do analysis that needs to understand `Region` structure. This is relevant to issue #569, which was what prompted me to start on this work. Now that we have a representation of the nesting of `Region`s, we can use it to reason about visibility of values between blocks. During development of this change I ran into a gotcha, in that I had been assuming each IR block would map to a single `Region`, forgetting that our current lowering of "continue clauses" in `for` loops leads to them being duplicated. The `Region` representation handles this by having a linked-list struct mapping IR blocks to the `SimpleRegion`s that represent them. I added a test case that includes a `for` loop with a continue clause that is reached along multiple paths just to make sure that we continue to support that case. The compiler output should not change as a result of this work; this is supposed to be a pure refactoring change. * Add a pass to resolve scoping issues in generated code Fixes #569 The basic problem arises because the structured control flow that we output in high-level HLSL/GLSL doesn't match the "scoping" rules of an SSA IR. 
In particular, SSA says that a value can be used in any block that is dominated by the definition, but in the presence of `break` and `continue` statements it is easy to construct cases where a block dominates something that is not in its scope for structured control flow. Consider: ```hlsl for(;;) { int a = xyz; if(a) { int b = a; break; } int c = a; } int d = b; ``` This program is invalid as HLSL, because the variable `b` is referenced outside of its scope, but if we look at the CFG for this function, it is clear that the block that computes `b` dominates the block that computes `d`. IR optimizations can easily create code like this, so we need to be ready for it. The previous change added an explicit `Region` structure to represent the structured control flow that we re-form out of the IR, and this change adds a pass that exploits the structuring information to detect cases like the above and introduce temporaries to fix the scoping issue. For example, the pass would change the earlier code block into something like: ```hlsl int tmp; for(;;) { int a = xyz; if(a) { int b = a; tmp = b; break; } int c = a; } int d = tmp; ``` That is, we introduce a new `tmp` variable at a scope "above" both the definition and use of `b`, and then we copy `b` into that temporary right where it is computed, and then use the temporary instead of the original `b` at the use site. A few details that came up during the implementation: * Downstream compilers may get confused by code like the above, and complain that `tmp` may be used before it is initialized, even though the very definition of dominators in a CFG means we don't have to worry about it. Still, I introduced some one-off code to initialize the temporaries just to silence spurious warnings coming from fxc. 
* We need to be careful not to apply this logic to "phi nodes" (the parameters of basic blocks) since they will already be turned into temporaries by the emit logic, and trying to introduce temporaries with this pass led to broken code (I still need to investigate why). It may be that a future version of this pass should also take the code out of SSA form, so that we can introduce both kinds of temporaries in a single pass (and maybe eliminate some unnecessary variables by doing basic register allocation). There is another transformation that could fix some issues of this kind, by moving code out of a structured control-flow construct and to the "join point" after it. For example, we could turn our loop from the start of this commit message into: ```hlsl for(;;) { int a = xyz; if(a) { break; } int c = a; } int b = a; int d = b; ``` Moving the definition of `b` to after the loop is possible because there is no way to get out of the loop without executing that code anyway. Now the scoping issue for `d`'s use of `b` has gone away, but of course we've introduced a new scoping issue for `a`, when it gets used by `b`. Adding a pass to re-arrange control flow like this could reduce the cases where we have to apply the current pass, but it wouldn't eliminate them entirely. That means such a pass can be deferred to future work. This change includes a test case that reproduces the original issue, so that we can confirm the fix works.
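The claim in this commit message that `~(a & 1)` differs from `!(a & 1)` even when read as a condition can be checked with a few lines of C++. This is only a sketch of the arithmetic: the function names are hypothetical and are not part of `emit.cpp`.

```cpp
// Logical negation, the meaning of the IR `Not` instruction per the message above.
inline bool logicalNot(int value) { return !value; }

// What emitting `~` for a non-Boolean operand would compute instead,
// interpreted afterwards as a condition (nonzero means true).
inline bool bitwiseNotAsCondition(int value) { return (~value) != 0; }
```

For an operand of 1, logical negation gives false, while `~1` is a nonzero bit pattern and therefore reads as true. The two only agree for operands such as 0, which is why emitting `~` for `Not` on integer types changes program behavior.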
2018-05-23	Fix Slang->GLSL translation for entry point with multiple `out` parameters ↵	Tim Foley
	(#573) Fixes #568 The problem occurs when an entry point declares multiple `out` parameters: ```hlsl void myVS( out float4 a : A, out float4 b : B ) { ... a = whatever; b = somethingElse; ... if(done) { return; // explicit return } ... // implicit return } ``` Slang translates code like this by introducing a GLSL global `out` parameter for each of `a` and `b`, rewriting the logic inside the entry point to use a local temporary instead of the real parameters, and then assigning from the locals to the globals at every `return` site: ```glsl out vec4 g_a; out vec4 g_b; void main() { // insertion location (1) vec4 t_a; vec4 t_b; ... t_a = whatever; t_b = somethingElse; ... if(done) { // insertion location(2) g_a = t_a; g_b = t_b; return; // explicit return } ... // insertion location (3) g_a = t_a; g_b = t_b; // implicit return } ``` Note that there are three different places (for this example) where code gets inserted to make the translation work. We insert declarations of local variables at the top of the function, and then insert the copy from the temporaries to the globals at each `return` site (implicit or explicit). The bug in this case was that the pass was setting the insertion location to (1) outside of the loop for parameters, so that when it was done with `a` and moved on to `b`, it would end up inserting the temporary `t_b` at the last location used (location (3) in this example), and this would result in invalid code, because `t_b` gets used before it is declared. This bug has been around for a while, but it has largely been masked by the fact that so few shaders use multiple `out` parameters, and also because Slang's SSA-ification pass would often be able to eliminate the local variable anyway, so that the bug never bites the user. The reason it surfaced now for a user shader was because we introduced `swizzledStore`, which currently inhibits SSA-ification, so that some temporaries that used to get eliminated are now retained so that they can break things. 
The fix in this case is small: we use the existing `IRBuilder` only for insertions at location (1) and construct a new builder on the fly for all the insertions at `return` sites. I have not included a test case yet, because our end-to-end Vulkan testing is not yet ready, so this may regress again in the future.
2018-05-23	When outputting a vector type with a size of 1 in GLSL, it needs to be output ↵	jsmall-nvidia
	as the underlying type. For example vector<float,1> should be output as float in GLSL. (#572)
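A minimal sketch of the mapping this commit describes, assuming a hypothetical helper named `glslVectorTypeName` (not a function in the Slang code base). GLSL has no one-component vector types, so a count of 1 falls back to the scalar type name. The sketch does not validate that the count is in the range GLSL supports.

```cpp
#include <string>

// Hypothetical helper: GLSL type name for an HLSL-style vector<T, N>.
// GLSL has no one-component vector types, so N == 1 maps to the scalar.
inline std::string glslVectorTypeName(const std::string& scalarType, int count)
{
    if (count == 1) return scalarType;
    std::string prefix;
    if (scalarType == "int") prefix = "i";
    else if (scalarType == "uint") prefix = "u";
    else if (scalarType == "bool") prefix = "b";
    else if (scalarType == "double") prefix = "d";
    // "float" uses the unprefixed vecN names.
    return prefix + "vec" + std::to_string(count);
}
```

Under these assumptions `vector<float,1>` becomes `float`, while `vector<float,4>` and `vector<int,3>` become `vec4` and `ivec3`.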
2018-05-21	Handle structure initializers in IR type legalization (#567)	Tim Foley
	Fixes #566 The basic problem here is that the front-end translates a structure initializer-list expression into a `makeStruct` instruction (with one argument per field), but the IR type legalization logic wasn't handling the case where a `makeStruct` is used to construct a struct value that needs to get split by legalization. The implementation is relatively straightforward, and like the other cases of instruction legalization for compound types, it follows the shape of the `LegalType`/`LegalVal` cases. The one interesting bit is that we need to be a bit careful and filter the single argument list for `makeStruct` into two in the case where we generate a "pair" type for something that has both "ordinary" and "special" (resource) fields. Luckily the `PairInfo` data that was generated by type legalization has exactly the information we need (by design). This change does not address several issues that could be handled in follow-on changes: * The `makeArray` instruction will face similar issues if it is applied to a type that requires legalization: we'd need to turn an array of `LegalVal`s into a bunch of distinct arrays. * The error message when we hit the unimplemented case here isn't great. Ideally we should provide the line number of the instruction that fails in an error message when legalization fails. This change tries to focus narrowly on the bug at hand, and leave these issues for later changes.
2018-05-11	Generate Visual Studio projects using Premake (#557)	Tim Foley
	* Generate Visual Studio projects using Premake This change adds a `premake5.lua` file that allows us to generate our Visual Studio solution using Premake 5 (https://premake.github.io/). The existing Visual Studio solution/projects are now replaced with the Premake-generated ones, and project contributors will be expected to update these by running premake after adding/removing files. I have not changed the Linux `Makefile` build at all, because that file is also used for things like running our tests, so that clobbering it with a premake-generated `Makefile` would break our continuous testing. Hopefully future changes can switch to a generated `Makefile` and perhaps even add an Xcode project as well. Notes: * The `build/slang-build.props` file is no longer needed/used, so it has been removed. * The `slang-eval-test` test fixture wasn't following our naming conventions for its directory path, so it was updated to streamline the Premake build configuration work. This required changes to the `Makefile` as well. * Some seemingly unnecessary preprocessor definitions that were specified for `core` and `slang-glslang` have been dropped. We will see if anything breaks from that. * Possible fixup for Premake vpath issue Premake's `vpath` feature seems to be nondeterministic about the order it applies filters (because Lua isn't deterministic about the order of entries in a key/value table), and as a result we can end up in a weird case where it decides that a `foo.cpp.h` file matches the `.cpp` filter (I'm not sure why) before it tests against the `.h` filter. This change uses an (undocumented) Premake facility to set `vpath` using a list of singleton tables, which seems to fix the order in which things get tested. * Remove support for "single-file" build of Slang The `hello` example was the only bit of code that uses the "single-file" way of building Slang, and this had already run up against limitations of the Visual Studio compilers in its Debug|x64 build. 
Rather than mess with Premake to make it pass through the `/bigobj` compiler flag that is needed to work around the issue, it makes more sense just to stop using/supporting the feature since we wouldn't want users to depend on it anyway (our documentation no longer refers to it). While I was at it I went ahead and made sure that the `SLANG_DYNAMIC` flag doesn't need to be set manually, so that instead there is a non-default `SLANG_STATIC` option (not that we have a static-library build of Slang at the moment).
2018-05-11	Cleanups around behavior when the compiler fails (#553)	Tim Foley
	* Cleanups around behavior when the compiler fails * Add another case where we try to `noteInternalErrorLoc()` if an exception is thrown. This one is in the logic for emitting an IR instruction. This could be improved by adding another layer at the function level (as a catch-all for instructions with no location), but something is better than nothing. * Change a bunch of `assert()`s over to `SLANG_ASSERT()`s, so that we can theoretically take more control over them (e.g., make release builds with asserts enabled) * Some other small cleanups around the assertions we perform. In the survey I made, I didn't really see many obvious "smoking gun" cases where we could produce a significantly better error message for some of the unimplemented/unexpected paths, other than to actually implement the missing functionality. * fixup
2018-05-11	Fixes #559 (#560)	Yong He

2018-05-10	Workaround for cases where we emit illegal-but-unused types	Tim Foley
	This is a quick workaround to deal with cases where we try to emit an unreferenced IR type that contains references to pre-legalization types (which might have been removed from the IR even though they are still referenced). The basic fix is to not add types to our global order of instructions to emit by default, and only add them on demand as they are referenced by other instructions. This is not a real fix for the underlying issue, which is that type legalization is only being applied to a subset of global instructions instead of all of them. A more detailed fix for that problem will need to be devised next. This fix also doesn't address the question of why an unreferenced `struct` type came to be present in the IR code passed to the back-end in the first place. It would be good to understand how this scenario is arising.
2018-05-04	Re-enable emission of #line directives and clean up output (#554)	Tim Foley
	This was based on feedback from Falcor users, who felt like changing the default to have no line directives didn't work out well. Since I'd only made them disabled by default based on what I perceived to be Falcor's needs, I'm happy to turn this back on by default. I also added a few changes to clean up the output: * Don't emit a directive for a sub-expression, since that breaks up the code too much. The only directives inside a function body will be on top-level instructions that didn't get folded into a use site. * Add logic to emit a directive for top-level declarations (globals, functions, structs), and clean up their printing so that they put any extra space after the declaration rather than before (so the line numbers can be accurate) * Don't emit the file path part of a directive if it would be the same as the previous directive. This makes the output less noisy, at the cost of having to work your way backward to find the file if you are looking directly at the output. There are certainly more cleanups possible, but these make the output decent enough to be useful for working backwards from a downstream compiler error to the offending code.
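The rule of omitting the file path when it matches the previous directive can be sketched as follows. `LineDirectiveEmitter` is a hypothetical type invented for illustration and is not Slang's emitter. The sketch also assumes that an initially empty "last file" is acceptable, which means an empty file name would never be printed.

```cpp
#include <string>

// Hypothetical emitter state: remembers the file named by the last directive.
struct LineDirectiveEmitter
{
    std::string lastFile;

    // Returns the directive text for (line, file), omitting the path
    // when it matches the file of the previously emitted directive.
    std::string emit(int line, const std::string& file)
    {
        std::string text = "#line " + std::to_string(line);
        if (file != lastFile)
        {
            text += " \"" + file + "\"";
            lastFile = file;
        }
        return text;
    }
};
```

A reader of the generated code who lands on a bare `#line 12` has to scan backward to the most recent directive that carries a path, which is the trade-off the message describes.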
2018-05-04	Allow more complex compound expressions when emitting from IR (#552)	Tim Foley
	The emit logic already had an idea of when an instruction should be "folded" into its use site(s), and this change just expands on that logic to try to be more aggressive. The basic idea is that instead of outputting this: ```hlsl float4 _S3 = a_0 + b_0; float4 _S4 = c_0 * _S3; d_0 = _S4; ``` we can hopefully output something like this: ```hlsl d_0 = c_0 * (a_0 + b_0); ``` The way this works is that after dealing with the various special cases that decide an instruction `I` must/cannot be folded in, we look and see if it has the following properties: * `I` has no side effects * `I` has a single user, `U` * `I` and `U` are in the same block (and `I` comes before `U` in that block) * for every instruction `X` between `I` and `U` (exclusive), `X` has no side effects If all of these conditions are true, then `I` can be folded in as a sub-expression when we emit `U`. This change doesn't affect most of our test output, but there is still a single test with SPIR-V output that we compare against a GLSL baseline, and so that baseline had to be modified to match the GLSL we now generate. Similar to #547, this change is not meant to provide a complete solution, but rather to take a concrete but low-risk step toward improving our output. Opportunities to improve the results further include: * We can/should ensure that when outputting sub-expressions we keep extra parentheses to a minimum. The old logic for emitting from an AST had support for "unparsing" expressions with minimal parentheses, and we should try to do the same. This can be error-prone, because omitting parentheses can lead to silent failures, so it must be done carefully. * We could try to be more aggressive about detecting what operations might have side effects. The most interesting case is function calls, where we should try to check if the callee is a function known to be side-effect-free. We could start by annotating most builtin functions with an attribute/decoration that indicates freedom from side effects. 
Deriving this attribute for user functions could be interesting, but we'd have to be careful since "nontermination" is technically a side effect. * We could try to be more aggressive about determining what side effects in instructions `X` are "safe" for the instruction `I` to move across. For example, if `I` is a load from variable `a` and `X` is a store to variable `b`, then that would seem to be safe. This starts to get into issues of instruction scheduling, though, and that is probably beyond what we want Slang to be doing.
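The four folding conditions listed in this commit message can be written as one predicate. The sketch below is an assumption-laden simplification: the `Inst` record and `canFoldIntoUse` are hypothetical, a block is just a vector, and the "special cases that decide an instruction must/cannot be folded in" are left out entirely.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction record for one basic block.
struct Inst
{
    bool hasSideEffects; // may this instruction write memory, etc.?
    int  userCount;      // number of instructions that use this value
    int  userIndex;      // index of the single user in the same block, or -1
};

// Returns true if block[i] can be emitted as a sub-expression of its user.
inline bool canFoldIntoUse(const std::vector<Inst>& block, std::size_t i)
{
    const Inst& inst = block[i];
    if (inst.hasSideEffects) return false;           // `I` has no side effects
    if (inst.userCount != 1) return false;           // `I` has a single user `U`
    if (inst.userIndex < 0) return false;            // `U` is in another block
    std::size_t u = static_cast<std::size_t>(inst.userIndex);
    if (u <= i || u >= block.size()) return false;   // `I` must come before `U`
    for (std::size_t x = i + 1; x < u; ++x)          // everything strictly between
        if (block[x].hasSideEffects) return false;   // must be side-effect free
    return true;
}
```

In the `_S3`/`_S4` example from the message, both temporaries have a single user later in the same block with nothing side-effecting in between, so both fold into the final store.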
2018-05-03	Add a pass for computing dominator trees (#541)	Tim Foley
	This code is currently not used by anything, but I wanted to check in a first pass at an implementation of dominator tree construction so that we don't have to keep avoiding implementing algorithms that rely on having dominator information available. The algorithm used to construct the dominator tree is taken from "A Simple, Fast Dominance Algorithm" by Keith D. Cooper, Timothy J. Harvey, and Ken Kennedy. This is not the "best" algorithm in terms of asymptotic performance, but it is among the simplest algorithms for computing a dominator tree that still outperforms naive iterative set-based methods. The actual data structure and API for the dominator tree has a bit of "cleverness" in it to try to make the common queries reasonably fast (e.g., you can check whether A dominates B in constant time). My hope is that even if we implement a more advanced algorithm for constructing the dominator tree, we can retain compatibility with passes that might make use of this API. Because no code is currently using this logic, I have done only minimal testing by stepping through this code and validating the results on paper for some very small CFGs. More serious testing/debugging may need to wait until we have an optimization pass that needs the dominator tree we compute here. One open question I have is how best to introduce traditional unit testing into Slang, since this is an example of code that would benefit greatly from being unit tested.
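For readers unfamiliar with the cited paper, here is a compact sketch of the Cooper, Harvey, and Kennedy iteration together with one possible way to answer "does A dominate B" in constant time. Everything about the interface is an assumption for illustration: the CFG is a plain adjacency list with block 0 as the entry, and the constant-time query uses entry/exit numbering of the dominator tree, which may or may not be the "cleverness" the commit refers to.

```cpp
#include <cassert>
#include <vector>

struct DominatorTree
{
    std::vector<int> idom;         // immediate dominator (-1 if unreachable; entry maps to itself)
    std::vector<int> enter, leave; // DFS interval of each block in the dominator tree

    // A dominates B iff B lies in A's subtree, which is an O(1) interval test.
    bool dominates(int a, int b) const
    {
        return enter[a] >= 0 && enter[b] >= 0 && enter[a] <= enter[b] && leave[b] <= leave[a];
    }
};

namespace detail
{
    inline void postorder(const std::vector<std::vector<int>>& succ, int b,
                          std::vector<bool>& seen, std::vector<int>& order)
    {
        seen[b] = true;
        for (int s : succ[b]) if (!seen[s]) postorder(succ, s, seen, order);
        order.push_back(b);
    }

    inline void number(const std::vector<std::vector<int>>& kids, int b, int& clock,
                       DominatorTree& tree)
    {
        tree.enter[b] = clock++;
        for (int k : kids[b]) number(kids, k, clock, tree);
        tree.leave[b] = clock++;
    }
}

// succ[b] lists the successors of block b; block 0 is assumed to be the entry.
inline DominatorTree computeDominatorTree(const std::vector<std::vector<int>>& succ)
{
    const int n = static_cast<int>(succ.size());
    assert(n > 0);
    std::vector<bool> seen(n, false);
    std::vector<int> order; // reachable blocks in postorder; the entry comes last
    detail::postorder(succ, 0, seen, order);
    const int count = static_cast<int>(order.size());

    std::vector<int> postIndex(n, -1);
    for (int i = 0; i < count; ++i) postIndex[order[i]] = i;

    std::vector<std::vector<int>> pred(n);
    for (int b = 0; b < n; ++b)
        if (seen[b]) for (int s : succ[b]) pred[s].push_back(b);

    DominatorTree tree;
    tree.idom.assign(n, -1);
    tree.idom[0] = 0; // the entry is its own idom while iterating

    // Walk both "fingers" up the partial tree until they meet.
    auto intersect = [&](int a, int b) {
        while (a != b)
        {
            while (postIndex[a] < postIndex[b]) a = tree.idom[a];
            while (postIndex[b] < postIndex[a]) b = tree.idom[b];
        }
        return a;
    };

    for (bool changed = true; changed; )
    {
        changed = false;
        for (int i = count - 2; i >= 0; --i) // reverse postorder, skipping the entry
        {
            int b = order[i], newIdom = -1;
            for (int p : pred[b])
            {
                if (tree.idom[p] < 0) continue; // predecessor not processed yet
                newIdom = (newIdom < 0) ? p : intersect(p, newIdom);
            }
            if (newIdom != tree.idom[b]) { tree.idom[b] = newIdom; changed = true; }
        }
    }

    std::vector<std::vector<int>> kids(n);
    for (int b = 1; b < n; ++b) if (tree.idom[b] >= 0) kids[tree.idom[b]].push_back(b);
    tree.enter.assign(n, -1);
    tree.leave.assign(n, -1);
    int clock = 0;
    detail::number(kids, 0, clock, tree);
    return tree;
}
```

Small hand-checked graphs, such as a diamond followed by a tail block, are also a natural starting point for the unit tests the message asks about.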
2018-05-03	Pass through original names for most declarations (#547)	Tim Foley
	The basic idea here is that when lowering to the IR, the front-end will attach a "name hint" to the IR instruction(s) that represent a given declaration, and then the passes that work on the IR will try to preserve and propagate those names, and then finally the emit logic will use them in place of mangled or unique names when available. This change does not try to deal with the issues that arise when we try to use those variable names in the output without any modification (e.g., handling cases where they might clash with keywords or builtins in the target language). Instead, it tries to establish baseline behavior for propagating through names, so that a later change can concentrate on the issue of using those names exactly when it is legal to do so. In order to avoid issues around the name "hints" causing problems we take two main steps: 1. We "scrub" each name to reduce it down to the allowed set of identifier characters in C-like languages, and then ensure that it doesn't do things that would be illegal in some downstream languages (e.g., consecutive underscores are not allowed in GLSL) or could clash with Slang's mangled names. This process isn't guaranteed to give distinct results for distinct inputs (it isn't a mangling scheme, after all). 2. We generate a unique ID for each occurrence of a given name and always use that as a suffix. This means that even if a name happens to overlap with a keyword (if you somehow have a variable named `do`), we will still add a suffix that makes it not a problem (we'd output `do_0` which is fine). The logic for generating these names is mostly straightforward. For simple variables, we use their given name directly, while for other declarations we try to form a name that includes their parent declaration (e.g. `SomeType.someMethod`). Various IR passes need to propagate or preserve this information. 
The most interesting is type legalization, when we take a variable with an aggregate type and split some of the fields out into their own variables. In that case we generate "dotted" names like `someVar.someTexture` and rely on the emit logic to turn that into `someVar_someTexture`. During SSA generation, if we are promoting a variable to SSA temporaries, we will try to propagate the name of the variable over to the temporaries (unless they already have a name from some other place). The same applies to block parameters ("phi nodes"). Many of the test changes need their expected output to be updated for this change. Luckily in most cases the output has gotten easier to understand.
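The two-step scheme in this commit message (scrub the hint, then append a unique suffix) can be sketched as below. `NameGenerator` is a hypothetical type; the exact scrubbing rules are assumptions chosen to match the examples in the message, and the sketch makes no attempt to avoid clashes with Slang's mangled names.

```cpp
#include <cctype>
#include <map>
#include <string>

// Hypothetical name generator following the two steps described above.
struct NameGenerator
{
    std::map<std::string, int> nextId; // per-scrubbed-name counter

    // Step 1: keep only [A-Za-z0-9_], map everything else to '_',
    // and never produce two underscores in a row.
    static std::string scrub(const std::string& hint)
    {
        std::string out;
        for (unsigned char c : hint)
        {
            char mapped = std::isalnum(c) ? static_cast<char>(c) : '_';
            if (mapped == '_' && !out.empty() && out.back() == '_') continue;
            out.push_back(mapped);
        }
        // Trim trailing underscores so the suffix join cannot create "__".
        while (!out.empty() && out.back() == '_') out.pop_back();
        // Identifiers cannot start with a digit.
        if (!out.empty() && std::isdigit(static_cast<unsigned char>(out[0])))
            out.insert(out.begin(), '_');
        return out;
    }

    // Step 2: always append a unique per-name suffix.
    std::string generate(const std::string& hint)
    {
        std::string base = scrub(hint);
        int id = nextId[base]++;
        return base + "_" + std::to_string(id);
    }
};
```

Because the suffix is always added, a hint such as `do` never reaches the output unmodified, and a "dotted" hint such as `someVar.someTexture` comes out as `someVar_someTexture` plus its suffix.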
2018-05-02	Merge branch 'master' into master	Yong He

2018-05-02	Speedup type checking using cached overload resolution results.	Yong He
	This change adds caches to built-in operator overload resolution and type coercion to avoid running these time-consuming operations every time. - Adds `TypeCheckingCache` type, which is defined in check.cpp, that contains two dictionaries for the cached results of `ResolveInvoke` and `CanCoerce` calls. - Add `destroyTypeCheckingCache` and `getTypeCheckingCache` methods to `Session` class to reuse these cached results over the entire session.
2018-05-02	Add support for "swizzled stores" (#544)	Tim Foley
	This was a known issue in our IR representation, which was now biting a user. The basic problem is that in code like the following: ```hlsl RWStructuredBuffer<float4> buffer; ... buffer[index].xz = value; ``` we ideally want to be able to reproduce the original HLSL code exactly, but that requires directly encoding the way that this code writes to two elements of a vector, but not the others. The current lowering strategy we had produced IR something like: ```hlsl float4 tmp = buffer[index]; tmp.xz = value; buffer[index] = tmp; ``` That transformation might seem valid, but it has some big problems: * It generates UAV reads that are not needed, which could impact performance * It performs read-modify-write operations on memory that the programmer didn't explicitly write, which could create data races The fix here is somewhat obvious: if the "base" of a swizzle operation on a left-hand side resolves to a pointer in our IR, then we can output a "swizzled store" instead of the read-modify-write dance. We currently keep the read-modify-write around since it is potentially needed as a fallback in the general case. Along the way I also tried to make sure that we handle the case where we have a swizzle of a swizzle on the left-hand side: ```hlsl buffer[index].xz.y = value; ``` That code should behave the same as `buffer[index].z = value`. I am currently detecting and cleaning up this logic in the lowering path for `SwizzleExpr`, because that is the only place in the lowering logic that "swizzled l-values" currently get created.
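The "swizzle of a swizzle" cleanup amounts to composing two index lists. The following is a minimal sketch with a hypothetical `composeSwizzles` helper, using 0 through 3 for the components x, y, z, and w; it is not the actual lowering code for `SwizzleExpr`.

```cpp
#include <cassert>
#include <vector>

// Compose two swizzles applied left to right: `base.inner.outer`.
// Each swizzle is a list of component indices (x=0, y=1, z=2, w=3).
inline std::vector<int> composeSwizzles(const std::vector<int>& inner,
                                        const std::vector<int>& outer)
{
    std::vector<int> result;
    for (int index : outer)
    {
        // The outer swizzle may only name components the inner one produced.
        assert(index >= 0 && index < static_cast<int>(inner.size()));
        result.push_back(inner[index]); // outer selects among inner's picks
    }
    return result;
}
```

For `.xz.y`, the inner list is {0, 2} and the outer list is {1}, so the composition is {2}, which is `.z` as the message states.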
2018-05-02	Add support for explicit register space bindings (#542)	Tim Foley
	This change adds support for specifying explicit register spaces, like: ```hlsl // Bind to texture register #2 in space #1 Texture2D t : register(t2, space1); ``` I added a test case to confirm that the register space is properly propagated through the Slang reflection API. This change also adds proper error messages for some error/unsupported cases that weren't being diagnosed: * Specifying a completely bogus register "class" (e.g., `register(bad99)`) * Failing to specify a register index (`register(u)`) * Specifying a component mask (`register(t0.x)`) * Using `packoffset` bindings I added test cases to cover all of these, as well as the new errors around support for register `space` bindings. In order to get the existing tests to pass, I had to remove explicit `packoffset` bindings from some DXSDK test shaders. None of these `packoffset` bindings were semantically significant (they matched what the compiler would do anyway, for both Slang and the standard HLSL compiler). Removing them is required for Slang now that we give an explicit error about our lack of `packoffset` support. In a future change we might add logic to either detect semantically insignificant `packoffset`s, or to just go ahead and support them properly (as a general feature on `struct` types).
2018-05-02	Fix emit logic when "terminators" occur in the middle of a block (#540)	Tim Foley
	Fixes #527

	There were a few problem cases for the IR emit logic. The most obvious, which came up in #527, is that a function body with multiple `return` statements would generate invalid code:

	```hlsl
	int foo()
	{
	    return 1;
	    int x = 2;
	    return x;
	}
	```

	In that case the IR for `foo` would have a single block that has two `return` instructions, which is invalid.

	Another case that seems to be arising more often, but that had less obvious consequences, was when one arm of an `if` statement ends in a `return`:

	```hlsl
	if(a)
	{
	    return b;
	}
	else
	{
	    int c = 0;
	}
	int d = 0;
	```

	In that case, the `return` instruction for `return b` would be followed by a branch to the end of the `if` (the `int d = 0;` line), because that would be the normal control flow without the early `return`.

	The fix implemented here is to have the IR lowering logic be a bit more careful on two fronts:

	1. When emitting a branch, check if the block we are emitting into has already been terminated, and if so just don't emit the branch (since we are logically at an unreachable point in the CFG).
	2. Whenever we are about to emit code for a (non-empty) statement, ensure that the current block being built is unterminated. If the current block is terminated, then start a new one.

	Case (2) will only matter when there is unreachable code (e.g., in the function `foo()`, the declaration of `x` and the second `return` can never be reached), so I added a warning in that case, and included a test case that triggers the new warning (with a function like `foo()` above).
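The two rules above can be sketched in a few lines of C++. This is only an illustration under simplifying assumptions: `Block`, `Builder`, `emitBranch`, and `emitStmt` are hypothetical names, instructions are modeled as opcode strings, and the warning is reduced to a counter. It is not the actual lowering code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified IR: a block is just a list of opcode names.
struct Block
{
    std::vector<std::string> insts;

    // A block is terminated once its last instruction is a terminator.
    bool isTerminated() const
    {
        if (insts.empty()) return false;
        const std::string& last = insts.back();
        return last == "return" || last == "branch";
    }
};

struct Builder
{
    std::vector<Block> blocks;
    int unreachableWarnings = 0;

    Builder() { blocks.emplace_back(); }

    Block& current() { return blocks.back(); }

    // Rule 1: skip the branch if the current block is already terminated.
    void emitBranch()
    {
        if (current().isTerminated()) return;
        current().insts.push_back("branch");
    }

    // Rule 2: before emitting a statement, make sure the current block is
    // unterminated; otherwise warn and start a fresh (unreachable) block.
    void emitStmt(const std::string& op)
    {
        if (current().isTerminated())
        {
            ++unreachableWarnings;
            blocks.emplace_back();
        }
        current().insts.push_back(op);
    }
};
```

Feeding the body of `foo()` through this sketch (`return`, a declaration, `return`) yields two blocks with one terminator each and a single unreachable-code warning, instead of one block with two `return` instructions.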
2018-05-01	Diagnose attempts to write to fields in methods (#530)	Tim Foley
	* Diagnose attempts to write to fields in methods Work on #529 This helps to avoid the case where a Slang user writes a struct with helpful `setter` methods, and finds that it doesn't work as expected because the `this` parameter is currently handled like an `in` parameter (passed by value, but mutable in the callee). Fixing this issue actually involved making a more broad fix to how l-value-ness is propagated. The existing checking logic was assuming that l-value-ness is just a property of a particular member declaration (e.g., a field is either mutable or not), and didn't take into account whether the "base expression" was mutable. This change fixes that oversight, which might lead to additional errors being issued if we aren't correctly making things mutable when we should. A `ThisExpr` was already immutable by default, so that part didn't actually need to change. Just propagating its immutability through was enough. As an additional assistance to users, I have added an extra diagnostic that triggers when a "destination of assignment is not an l-value" error occurs and the left-hand-side expression seems to be based on `this` (whether implicitly or explicitly). This will ideally help users to understand that the "setter" idiom is not yet supported. * Fixed setRadius typo
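The propagation rule can be illustrated with a small C++ sketch. It assumes a toy expression tree with hypothetical names (`ExprKind`, `Expr`, `isLValue`, `isBasedOnThis`) and is not the checker's real AST: a member access is an l-value only if both the member declaration is mutable and its base expression is an l-value, and `this` is never an l-value.

```cpp
#include <cassert>
#include <memory>

// Hypothetical, simplified expression kinds (not Slang's real AST).
enum class ExprKind { This, Var, Member };

struct Expr
{
    ExprKind kind;
    bool declIsMutable;          // used by Var and Member
    std::shared_ptr<Expr> base;  // used by Member only
};

// L-value-ness depends on the declaration AND on the base expression.
bool isLValue(const Expr& e)
{
    switch (e.kind)
    {
    case ExprKind::This:   return false;  // `this` is immutable by default
    case ExprKind::Var:    return e.declIsMutable;
    case ExprKind::Member: return e.declIsMutable && e.base && isLValue(*e.base);
    }
    return false;
}

// Used to decide whether to add the extra "setter idiom" diagnostic.
bool isBasedOnThis(const Expr& e)
{
    if (e.kind == ExprKind::This) return true;
    return e.kind == ExprKind::Member && e.base && isBasedOnThis(*e.base);
}
```

In this sketch, a mutable field accessed through `this` is not an l-value and is flagged as based on `this`, while the same field accessed through a mutable local variable is an l-value.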
2018-05-01	Cleanups (#539)	Tim Foley
	* Cleanup: remove unused files from project * Cleanup: move IRModule forward declaration into correct namespace
2018-04-28	Remove unused local variable in vm.cpp (#533)	Jeremie St-Amand
	Unused local variable prevents compiling when warnings are treated as errors
2018-04-25	Fix for global generic parameter substitution (#512)	Tim Foley
	The problem here arises when multiple entry points are compiled in one pass. Each entry point has its own arguments for global generic parameters, and leads to us emitting a `bindGlobalGenericParameter(p, val)`. But once the first entry point's substitutions are applied, the second entry point's code gives `bindGlobalGenericParameter(val, val)` and the compiler crashes (in debug builds) because `val` is not a global generic parameter. This change just applies a quick fix. If we see `bindGlobalGenericParameter(x,y)` during specialization, and `x` is not a global generic parameter, then we skip it. The right long-term fix is to change the compiler's representation of global generic arguments so that they live on a `CompileRequest` instead of an `EntryPointRequest`. That is a more significant change (with impact on the public API), so I'm inclined to leave it as a cleanup for another day (given that no customers are using global generic parameters today).
2018-04-23	Improve SSA promotion for arrays and structs (#521)	Tim Foley
	* Improve SSA promotion for arrays and structs

	  Fixes #518

	  The existing SSA pass would only handle `load(v)` and `store(v,...)` where `v` is the variable instruction, and would bail out if `v` was used as an operand in any other fashion. The new pass adds support for `load(ac)` where `ac` is an "access chain" with a grammar like:

	      ac :: v | getElementPtr(ac, ...) | getFieldAddress(ac, ...)

	  What this means in practical terms is that we can promote a local variable of array or structure type to an SSA temporary even if there are loads of individual elements/fields, as long as any assignment to the variable assigns the whole thing.

	  I've added a test case to confirm that this change fixes passing of arrays as function parameters for Vulkan.

	* Fixup: disable test on Vulkan because render-test isn't ready

	  This is a fix for Vulkan, but I don't think our testing setup is ready for it.

	* Fixup: error in unreachable return case, caught by clang

	* Fixups based on testing

	  These are fixes found when testing the original changes against the user code that originated the bug report.

	  * `emit.cpp`: Make sure to handle array-of-texture types when deciding whether to declare a temporary as a local variable in GLSL output
	  * `ir-legalize-types.cpp`: Make a note of a source of validation failures that we need to clean up sooner or later (just not in scope for this bug fix change).
	  * `ir-ssa.cpp`:
	    * When checking if something is an access chain with a promotable var at the end, make sure the recursive case recurses into the "access chain" logic instead of the leaf case
	    * Add some assertions to guard the assumption that any access chain we apply has been scheduled for removal
	    * Correctly emit an element extract instead of getting an element address when promoting an element access into an array being promoted
	    * Eliminate a wrapper routine that was setting up an `IRBuilder` and use the one from the block being processed in the SSA pass (since it was set up for stuff just like this)
	  * `ir-validate.cpp`:
	    * Add a hack to avoid validation failures when running IR validation on the stdlib code. This case triggers for an initializer (`__init`) declaration inside an interface, since the logical "return type" is the interface type itself, which has no representation at the IR level and thus yields a null result type in a `FuncType` instruction.
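The access-chain grammar maps directly onto a recursive check. The sketch below is an assumption-laden illustration rather than the code in `ir-ssa.cpp`: `Op`, `Inst`, and `isAccessChain` are hypothetical names, and each instruction keeps only its base operand. Note that the recursive case calls the access-chain check itself, which is the property one of the fixups above restores.

```cpp
#include <cassert>

// Hypothetical, simplified IR opcodes and instruction layout.
enum class Op { Var, GetElementPtr, GetFieldAddress, Other };

struct Inst
{
    Op op;
    const Inst* base;  // operand that the address is computed from
};

// ac :: v | getElementPtr(ac, ...) | getFieldAddress(ac, ...)
// Returns true if `inst` is an access chain rooted at `var`.
bool isAccessChain(const Inst* inst, const Inst* var)
{
    if (!inst) return false;
    if (inst == var) return true;  // leaf case: the variable itself
    if (inst->op == Op::GetElementPtr || inst->op == Op::GetFieldAddress)
        return isAccessChain(inst->base, var);  // recurse into the chain
    return false;
}
```

A chain such as `getElementPtr(getFieldAddress(v, ...), ...)` is accepted for `v`, while any other use of `v` as an operand is rejected, which is when promotion has to bail out.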
2018-04-23	Fix successor computation for `switch` instruction (#520)	Tim Foley
	Fixes #519 The code was leaving out the `default` label from the successor list, which would break any passes that require an accurate CFG (with the big one right now being the SSA-formation pass).
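As a rough illustration of the bug, consider a successor query over a simplified `switch` instruction. The `SwitchInst` layout and `getSuccessors` below are hypothetical (block labels are modeled as plain integers), not the real IR classes; the point is only that the `default` label has to be reported alongside the case labels.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified `switch` terminator: labels are block ids.
struct SwitchInst
{
    int defaultLabel;
    std::vector<int> caseLabels;
};

// Every block that control can transfer to, including `default`.
std::vector<int> getSuccessors(const SwitchInst& inst)
{
    std::vector<int> successors;
    successors.push_back(inst.defaultLabel);  // the entry that was missing
    successors.insert(successors.end(),
                      inst.caseLabels.begin(), inst.caseLabels.end());
    return successors;
}
```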
2018-04-20	Better diagnostics when compilation is aborted (#517)	Tim Foley
	* Improve messages when compilation is aborted.

	  Make sure to include the information from any `Slang::Exception` that was thrown, so that the poor user can at least point us at our own message string from an assertion failure. This doesn't provide them line-number information in their code or the Slang codebase, so there is still work to be done in making the compiler more friendly about this stuff.

	* When aborting compilation, try to note what source location we were working on

	  This is handled by having exception handlers on the stack at key bottleneck points in semantic checking and IR generation, which can then emit a diagnostic to note what we were working on when things failed. This is not intended to be an indication to the user that their code is at fault for a compiler crash (it is always our fault), but might give them a chance to work around whatever bug is blocking them.
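The "exception handler at a bottleneck point" pattern might look roughly like the C++ sketch below. It uses `std::exception` and a plain vector of strings in place of `Slang::Exception` and the real diagnostic sink, and `withSourceLocation` is a hypothetical helper name, so treat it as an outline of the idea only.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Run `work`; if it throws, record which source location was being
// processed, then rethrow so the outer handler can abort compilation.
template <typename Work>
void withSourceLocation(const std::string& location,
                        std::vector<std::string>& diagnostics,
                        Work&& work)
{
    try
    {
        work();
    }
    catch (const std::exception& e)
    {
        diagnostics.push_back(
            "note: compilation aborted while processing " + location +
            ": " + e.what());
        throw;
    }
}
```

Because the handler rethrows, nesting several of these at different levels (declaration, function body, statement) would produce a trail of notes from the innermost location outward.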
2018-04-20	Diagnose use of an implicit cast as an argument for an `out` parameter (#516)	Tim Foley
	Work on #499

	Two big fixes here:

	* The logic for checking constraints on `out` arguments wasn't actually triggering because it relied on function parameters being given an `OutType` if they are marked `out`, but the code wasn't actually doing that. Fixing the computation of types for functions resolved that issue.
	* Next, I added a specific diagnostic to follow up the "expected an l-value" error to let the user know that their argument was implicitly converted, and that is why it doesn't count as an l-value in Slang's rules.

	I've added a test case to ensure that we retain this diagnostic until we can do a true fix for the issue.

	The right long-term fix is to have an AST representation of all the implicit casts involved (e.g., in both directions for an `inout` parameter), and then have the IR generate explicit code for the conversions in each direction (the `LoweredVal` representation can handle this sort of thing).