summaryrefslogtreecommitdiffstats
path: root/source/slang/ir-insts.h
Commit message (Collapse)AuthorAge
* Use slang- prefix on slang compiler and core source (#973)jsmall-nvidia2019-05-31
| | | | | | | | | | | | * Prefixing source files in source/slang with slang- * Prefix source in source/slang with slang- prefix. * Rename core source files with slang- prefix. * Update project files. * Fix problems from automatic merge.
* Hotfix/improve glsl semantic conversion review (#968)jsmall-nvidia2019-05-22
| | | | | | | | | | | | | | | | * Small changes based on review * Remove the explicit 'nominal' tests * Made isValueEqual and isEqual on on IRConstant take a pointer * Small improvements to comments, and clarity of using 'nominal' * Simplify comparison by just using isTypeOperandEqual as basis for isTypeEqual * Use cross compile to test half-texture.slang on glsl * Don't need half-texture.slang.expected * Fix handling of nominal comparison based on review, ensuring that for nominal insts, they can only be compared by pointer.
* Allow interface types to be used inside of structs (#966)Tim Foley2019-05-21
| | | | | | | | | | | | | | | | | | | | | | | Previously, interface types were allowed to be used directly as function parameters, local variables, and global shader parameters. Using an interface type as a field of a `struct` type or a `cbuffer` declaration was not implemented. This change adds that support, and fixes several unrelated issues that caused problems in doing so. * The most important work here was adding a case for `IRStructType` to `maybeSpecializeBindExistentialsType` that creates a specialized variant of a `struct` type on-demand based on specialization operands. This logic loops over the fields of the original struct, and creates new fields by binding the existentials/interfaces in the type of each field. Caching is used to ensure that the same `struct` type specialized to the same operands should yield the same result. * To allow subsequent specialization to occur when a `struct` with interface-type fields is used, it was also necessary to specialize field-address and field-extract instructions in cases where the value that the field is being extracted from is a `wrapExistential`. * Similarly, we neede to make sure that the logic for specializing called functions based on the concrete types for interfaces in the argument list would also take into account `struct` types with existential-type fields inside of them. * Doing the above changes revealed some serious flaws in how the `ir-specialize.cpp` logic was tracking which instructions still needed to be processed. It had previously been assuming that it could assume any relevant instructions were on its work list, and when the work list went empty it could exit. This runs into two problems: (1) sometimes we create new instructions when specializing, and it may be impossible to ensure that all the new instructions (e.g., those created by utility routines in other files) get added to the work list, and (2) sometimes the instruction(s) that need to be re-visited when we specialize something aren't its direct users, but instead somethign that transitively depends on the instruction. These issues were fixed by two changes to the pass: (1) we now maintain a list of known "clean" instructions instead of implicitly using the work-list as a list of "dirty" instructions (so that implicitly any new instruction is dirty), and periodically iterating over all instructions to add the non-clean ones to the work list for processing, and (2) when an instruction is specialized/replaced we mark everything that transitively depends on it "dirty" (by removing it from the "clean" list). * Added some logic to "fix up" the type of an IR function after changes that might modify its parameter list. Failing to have this logic meant that certain types were still live (because they were referenced by a function type) that couldn't actually be emitted as legal HLSL/GLSL. * Added some special cases to IR instruction creation for `wrapExistential` and `BindExistentialsType` so that they act as no-ops when there are no "slots" providing specialization information. This helps avoid some special cases when specializing structure fields (since some fields specialization and others don't, so in general there are zero or more operands specific to each field). * Added a test case that uses an interface type in a `cbuffer`, as well as an interface type in a `struct` passed as an entry-point `uniform` parameter. * Fixed up some parts of the `.natvis` files to reflect naming changes from a previous PR and thus restore some of the useful Visual Studio debugging experience for Slang.
* String/List closer to conventions, and use Index type (#959)jsmall-nvidia2019-04-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * List made members m_ Tweaked types to closer match conventions. * Use asserts for checking conditions on List. Other small improvements. * List<T>.Count() -> getSize() * List<T> Add -> add First -> getFirst Last -> getLast RemoveLast -> removeLast ReleaseBuffer -> detachBuffer GetArrayView -> getArrayView * List<T>:: AddRange -> addRange Capacity -> getCapacity Insert -> insert InsertRange -> insertRange AddRange -> addRange RemoveRange -> removeRange RemoveAt -> removeAt Remove -> remove Reverse -> reverse FastRemove -> fastRemove FastRemoveAt -> fastRemoveAt Clear -> clear * List<T> FreeBuffer -> _deallocateBuffer Free -> clearAndDeallocate SwapWith -> swapWith * List<T> SetSize -> setSize Reserve -> reserve GrowToSize growToSize * UnsafeShrinkToSize -> unsafeShrinkToSize Compress -> compress FindLast -> findLastIndex FindLast -> findLastIndex Simplify Contains * List<T> Removed m_allocator (wasn't used) Swap -> swapElements Sort -> sort Contains -> contains ForEach -> forEach QuickSort -> quickSort InsertionSort -> insertionSort BinarySearch -> binarySearch Max -> calcMax Min -> calcMin * Initializer::Initialize -> initialize List<T>:: Allocate -> _allocate Init -> _init IndexOf -> indexOf * * Put #include <assert.h> in common.h, and remove unneeded inclusions * Small refactor of ArrayView - remove stride as not used * getSize -> getCount setSize -> setCount unsafeShrinkToSize->unsafeShrinkToCount growToSize -> growToCount m_size -> m_count * Some tidy up around Allocator. * Use Index type on List. * Refactor of IntSet. First tentative look at using Index. * Made Index an Int Did preliminary fixes. Made String use Index. * Partial refactor of String. * String::Buffer -> getBuffer ToWString -> toWString * Small improvements to String. String:: Buffer() -> getBuffer() Equals() -> equals * Try to use Index where appropriate. * Fix warnings on windows x86 builds.
* Initial support for the `precise` keyword (#958)Tim Foley2019-04-29
| | | | | | | | | | | | | | | | | | | | | | Fixes #858 The `precise` keyword exists in both HLSL and GLSL and when applied to a variable declaration is supposed to indicate that all computations that contribute to the value of that variable should not be altered based on "fast-math" optimizations. The main examples are that separate multiply and add operations should not be turned into fused multiply-add (fma) operations, and that operations cannot ignore the possibility of infinity or not-a-number values (e.g., by assuming that `x * 0.0f` is always `0.0f`). (Aside: it is possible that my understanding of what the semantics of `precise` are in HLSL and GLSL is imperfect so that either the GLSL variant isn't sufficient to provide the semantics of the HLSL keyword, or that the definition of "all computations that contribute" to a value isn't actually correct. We may need to revise this implementation based on subsequent learnings.) The basic idea here is to turn the AST `precise` keyword into a `[precise]` decoration in the IR and then emit that as a `precise` keyword again in the output. The main catch is that whereas most of our existing IR decorations apply to things like global shader parameters or `struct` members that usually stick around for the duration of compilation, `[precise]` will get slapped on local variables that will often get optimized away by our SSA pass. There are two ways a variable can get eliminated/replaced during the SSA pass: 1. A use of the variable can be replaced with an ordinary instruction that computes its value. 2. A use of the variable can be replaced with a reference to a "phi node" that will take on the appropriate value based on control flow. These two cases already had logic to copy a "name hint" decoration from the variable over to an instruction that will replace it, and I simply extended them to also propagate over a `[precise]` decoration. The test case added with this change intentionally constructs a case where `[precise]` needs to be propagated over to an SSA "phi node" in order to generate correct output code. The other gotcha is that we can emit variable declarations in various places in `emit.cpp`, and all of these needed to handle `[precise]`. Not only do we have actually local variables (`IRVar`), but we also have SSA phi nodes (`IRParam`), and then there are cases where an intermediate computation (an ordinary instruction) should be `[precise]` and thus we need to emit it as a temporary (not folding it into its use sites) and make sure that the temporary itself gets the `precise` keyword. I have manually confirmed that in the output SPIR-V, this change results in the `NoContraction` SPIR-V decoration being added to the relevant operations, and the output DXBC contains a multiply and an add in place of a multiply-add. The output DXIL does not show any obvious changes due to `precise`, although the exact order and operands of the math instructions emitted does differ when `precise` is added/removed. In all cases the output is equivalent to hand-written HLSL/GLSL with a `precise`-qualified local variable.
* Add better control over image formats for GLSL/SPIR-V targets (#939)Tim Foley2019-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Add better control over image formats for GLSL/SPIR-V targets Currently Slang emits GLSL code assuming all R/W images need to have explicit formats, and thus we try to infer a format from the element type of the image. E.g., given a `RWTexture2D<half4>` we might infer that a qualifier of `layout(rgba16f)` should be used. This strategy has two notable shortcomings: * Sometimes the user will want a format that doesn't match an existing HLSL type. E.g., if they want the equivalent of `layout(r11f_g11f_b10f)`, then what should they put in their `RWTexture2D<...>` to make the inference do what they need? * Sometimes the user knows that they don't need to specify a format *at all*, because using the `GL_EXT_shader_image_load_formatted` extension, they can still perform non-atomic load/store on images with no format specified in the SPIR-V. This change adds two features directed at these challenges. First, we add an explicit `[format(...)]` attribute that can be used to specify an explicit image format, including ones that don't match any HLSL type. An example of using this new attribute is: ```hlsl [format("r11f_g11f_b10f")] RWTexture2D<float3> myImage; ``` For simplicity in initial bring-up, the new formats all use the same naming as formats in GLSL (this should make it easy for a programmer who knows what they expect to get in the GLSL output). We can change the naming convention for formats at a later time, so long as we keep these existing names in as a compatibility feature. Note that this is *not* given a `vk::` prefix since the attribute should signal the programmer's intent to provide an image with that format on *all* targets (although only some targets might act on that information). Also note that the attribute takes a string (`[format("rgba8")`) instead of a bare identifier (`[format(rgba8)]`) because this is consistent with the existing convention for attributes in HLSL. When `[format(...)]` is left off, the default compiler behavior will still be to infer a format, but this behavior can be overidden for a single image using an explicit format of `"unknown"`: ```hlsl [format("unknown")] RWTexture2D<float4> mysteryMachine; ``` The second new feature is that if a user knows they are coding for a GPU that supports the `"unknown"` format in all non-atomic cases, then they can opt into making that the default for images without an explicit `[format(...)]`, using the new `-default-image-format-unknown` command-line option for `slangc`. The new test case included with this change confirms that we correctly see the explicit formats in the output GLSL and *no* formats for images without explicit `[format(...)]` when using the new command-line option. The test stresses images declared at global scope, in parameter blocks, and in entry-point parameter lists, to try and make sure that all the relevant IR passes in the compiler preserve the format information. * fixup: missing file
* Improve support for interfaces as shader parameters (#886)Tim Foley2019-03-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Improve support for interfaces as shader parameters This change adds two main things over the existing support: 1. It is now possible to plug in concrete types that actually contain (uniform/ordinary) fields for the existential type parameters introduced by interface-type shader parameters. The `interface-shader-param2.slang` test shows that this works. 2. There is a limited amount of support for doing correct layout computation and generating output code that matches that layout, so that interface and ordinary-type fields can be interleaved to a limited extent. The `interface-shader-param3.slang` test confirms this behavior. There are several moving pieces in the change. * When it comes to terminology, we try to draw a more clear distinction between existial type parameters/arguments and existential/object value parametes/arguments. A simple way to look at it is that an `IFoo[3]` shader parameter introduces a single existential type parameter (so that a concrete type argument like `SomeThing` can be plugged in for the `IFoo`) but introduces three existential object/value parameters (to represent the concrete values for the array elements). * At the IR level, we support a few new operations. A `BindExistentialsType` can take a type that is not itself an interface/existential type but which depends on interfaces/existentials (e.g., `ConstantBuffer<IFoo>`) and plug in the concrete types to be used for its existential type slots. * Then a `wrapExistentials` instruction can take a type with all the existentials plugged in (possibly by `BindExistentialsType`) and wrap it into a value of the existential-using type (e.g., turn `ConstantBuffer<SomeThing>` into a `ConstantBuffer<IFoo>`). * The IR passes for doing generic/existential specialization have been updated to be able to desugar uses of these new operations just enough so that a `ConstantBuffer<IFoo>` can be used. * When we specialize an IR parameter of an interface type like `IFoo` based on a concrete type `SomeThing`, we turn the parameter into an `ExistentialBox<SomeThing>` to reflect the fact that we are conceptually referring to `SomeThing` indirectly (it shouldn't be factored into the layout of its surrounding type). * Parameter binding was updated so that it passes along the bound existential type arguments in a `Program` or `EntryPoint` to type layout, so that we can take them into account. The type layout code needs to do a little work to pass the appropriate range of arguments along to sub-fields when computing layout for aggregate types. * Type layout was updated to have a notion of "pending" items, which represent the concrete types of data that are logically being referenced by existential value slots. The basic idea is that these values aren't included in the layout of a type by default, but then they get "flushed" to come after all the non-existential-related data in a constant buffer, parameter block, etc. * The logic for computing a parameter group (`ConstantBuffer` or `ParameterBlock`) layout was updated to always "flush" the pending items on the element type of the group, so that the resource usage of specialized existential slots would be taken into account. * The type legalization pass has been adapted so that we can derive two different passes from it. One does resource-type legalization (which is all that the original pass did). The new pass uses the same basic machinery to legalize `ExistentialBox<T>` types by moving them out of their containing type(s), and then turning them into ordinary variables/parameters of type `T`. Big things missing from this change include: - Nothing is making sure that "pending" items at the global or entry-point level will get proper registers/bindings allocated to them. For the uniform case, all that matters in the current compiler is that we declare them in the right order in the output HLSL/GLSL, but for resources to be supported we will need to compute this layout information and start associating it with the existential/interface-type fields. - Nothing is being done to support `BindExistentials<S, ...>` where `S` is a `struct` type that might have existential-type fields (or nested fields...). Eventually we need to desugar a type like this into a fresh `struct` type that has the same field keys as `S`, but with fields replaced by suitable `BindExistentials` as needed. (The hard part of this would seem to be computing which slots go to which fields). As a practial matter, this missing feature means that interface-type members of `cbuffer` declarations won't work. The current tests carefully avoid both of these problems. They don't declare any buffer/texture fields in the concrete types, and they don't make use of `cbuffer` declarations or `ConstantBuffer`s over structure types with interface-type fields. * fixup: add override to methods * fixup: typos
* First steps toward supporting interface-type parameters on shaders (#852)Tim Foley2019-02-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * First steps toward supporting interface-type parameters on shaders What's New ---------- From the perspective of a user, the main thing this change adds is the ability to declare top-level shader parameters (either at global scope, or in an entry-point parameter list) with interface types. For example, the following becomes possible: ```hlsl // Define an interface to modify values interface IModifier { float4 modify(float4 val); } // Define some concrete implementations struct Doubler : IModifier { float4 modify(float4 val) { return val + val; } } struct Squarer : IModifier { ... } // Define a global shader parameter of interface type IModifier gGlobalModifier; // Define an entry point with an interface-type `uniform` parameter void myShader( unifrom IModifier entryPointModifier, float4 inColor : COLOR, out float4 outColor : SV_Target) { // Use the interface-type parameters to compute things float4 color = inColor; color = gGlobalModifier.modify(color); color = entryPointModifier.modify(color); outColor = color; } ``` The user can specialize that shader by specifying the concrete types to use for global and entry-point parameters of interface types (e.g., plugging in `Doubler` for `gGlobalModifier` and `Squarer` for `entryPointModifier`). The "plugging in" process is done in terms of a concept of both global and local "existential slots" which are a new `LayoutResourceKind` that represents the holes where concrete types need to be plugged in for existential/interface types. In simple cases like the above, each interface-type parameter will yield a single existential slot in either the global or entry-point parameter layout. Users can query the start slot and number of slots for each shader parameter, just like they would for any other resource that a parameter can consume. Before generating specialized code, the user plugs in the name of the concrete type they would like to use for each slot using `spSetTypeNameForGlobalExistentialSlot` and/or `spSetTypeNameForEntryPointExistentialSlot`. There are some major limitations to the implementation in this first change: * Parameters must be of interface type (e.g., `IFoo`) and not an array (`IFoo[3]`), or buffer (`ConstantBuffer<IFoo>`) over an interface type. Similarly, `struct` types with interface-type fields still don't work. * The work on interface-type function parameters still doesn't include support for `out` or `inout` parameters, nor for functions that return interface types (that isn't technically related to this change, but affects its usefullness). * No work is being done to correctly lay out shader parameters once the concrete types for existential slots are known, so that this change really only works when the concrete type that gets plugged in is empty. These limitations are severe enough that this feature isn't really usable as implemented in this change, and this merely represents a stepping stone toward a more complete implementation. Implementation -------------- The API side of thing largely mirrors what was already done to support passing strings for the type names to use for global/entry-point generic arguments, so there should be no major surprises there. The logic in `check.cpp` computes the list of existential slots when creating unspecialized `Program`s and `EntryPoint`s (this is logically the "front end" of the compiler), and then checks the supplied argument types against what is expected in each slot when creating specialized `Program`s and `EntryPoint`s. This again mirrors how generic arguments are handled. Type layout was extended to compute the number of existential slots that a type consumes, and will thus automatically assign ranges of slots to top-level and entry-point shader parameters in the same way it already allocates `register`s and `binding`s. The big missing feature is the ability to specialize a layout to account for the concrete types plugged into the existential-type slots. IR generation for specialized programs and entry points was slightly extended so that it attaches information about the concrete types plugged into the existential slots, and the witness tables that show how they conform to the interface for that slot. The linking step needed some small tweaks to make sure that information gets copied over to the target-specific program when we start code generation. The meat of the IR-level work is in `ir-bind-existentials.cpp`, which takes the information that was placed in the IR module by the generation/linking steps and uses it to rewrite shader parameters. For example, if there is a shader parameter `p` of type `IModifier`, and the corresponding existential slot has the type `Doubler` in it, we will rewrite the parameter to have type `Doubler`, and rewrite any uses of `p` to instead use `makeExistential(p, /*witness that Doubler conforms to IModifier*/)`. Once the replacement is done on the parameters, the existing work for specializing existential-based code when the input type(s) are known kicks in and does the rest. Testing ------- A single compute test is added to validate that this feature works. It is narrowly tailored to not require any of the features not supported by the initial implementation (e.g., all of the concrete types used have no members). The test case *does* include use of an associated type through one of these existential-type parameters, which has exposed a subtle bug in how "opening" of existential values is implemented in the front-end. Rather than fix the underlying problem, I cleaned up the code in the front-end to special case when the existential value being opened is a variable bound with `let`, to directly use a reference to that variable rather than introduce a temporary. Similarly, in the IR generation step, I added an optimization to make variables declared with `let` skip introducing an IR-level variable and just use the SSA value of their initializer directly instead. * fixup: missing files * fixup: incorrect type for unreachable return * fixup: actually comment ir-bind-existentials.cpp
* Split front- and back-ends (#846)Tim Foley2019-02-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Split front- and back-ends This change is a major refactor of several of the types that provide the behind-the-scenes implementation of the public C API. The goal of this refactor is primarily to allow for future API services that let the user operate both the front- and back-ends of the compiler in a more complex fashion. For example, as user should be able to compile a bunch of source code into modules, look up types, functions, etc. in those modules, specialize generic types/functions to the types they've looked up, and then finally request target code to be gernerated for specialized entry points. The back-end code generation they trigger should re-use the front-end compilation work (parsing, semantic checking, IR generation) that was already performed. The most visible change is that `CompileRequest` has been split up into several smaller types that take responsibility for parts of what it did: * The `Linkage` type owns the storage for `import`ed modules, and well as the `TargetRequest`s that represent code-generation targets. The intention is that an application could use a single `Linkage` for the duration of its runtime (so long as it was okay with the memory usage), so that each `import`ed module only gets loaded once. For now, this type needs to manage the search paths, file system, and source manager, because of its responsibility for loading files. * A `FrontEndCompileRequest` owns the stuff related to parsing, semantic checking, and initial IR generation. This most notably includes the `TranslationUnitRequest`s and the `FrontEndEntryPointRequest`s (which used to be just `EntryPointRequest`s). It's main job is to produce AST and IR modules for each translation unit, and to find and validate the entry points. The front-end request does *not* interact with generic arguments for global or entry-point generic parameters. * The main output of both `import` operations and front-end translation units is the `Module` type, which is just a simple container for both the AST module (to service the reflection/layout APIs, and also for semantic checking of code that `import`s the module) and the IR module (for linking and code generation). This type captures the commonalities between the old `LoadedModule` (which is now just an alias for `Module`) and `TranslationUnitRequest` (which now owns a `Module`). * The secondary output of front-end compilation is a `Program`, which comprises a list of referenced `Module`s and validated `EntryPoint`s that will be used together. Layout and code generation both need a `Program` to tell them what modules and entry points will be used together (we don't want to just code-gen everythin that has ever been loaded into the linakge). The `Program`s created by the front-end do not include generic arguments, so they may provide incomplete layout information and/or be unsuitable for code generation. * A `BackEndCompileRequest` owns stuff related to turning a `Program` into output kernels for the targets of a `Linkage`. Most of the data it owns beyond the `Program` to be compiled is minor, so this is a good candidate for demotion from a heap-allocated object to just a `struct` of options that gets passed around. * The `CompileRequestBase` type is an attempt to wrap up the common functionality of both front-end and back-end compile requests. Most of it is just exposing the availability of a linkage and `DiagnosticSink`, so this type is a good candidate for subsequent removal. The main interesting thing it has is the flags related to dumping and validation of IR, so there is probably a good refactoring still to be made around deciding how options should be handled going forward. * Behind the scenes, the `Program` type is set up to handle some level of on-line compilation and layout work. The `Program` knows the `Linkage` it belongs to, and allows for a `TargetProgram` to be looked up based on a specific `TargetRequest`. A `TargetProgram` then allows layout information and compiled kernel code to be asked for on-demand, in order to support eventual "live" compilation scenarios. * The `EndToEndCompileRequest` type is a composition/coordination type that replaces the old `CompileRequest` in a way that uses the services of the various other types. It owns a few pieces of state that only make sense in the context of an end-to-end compile (e.g., there is really no way to "pass through" code when the front- and back-ends are run separately) or a command-line compile (everything to do with specifying output paths for files is really just for the benefit of `slangc`, and might even be moved there over time). * One important detail is that the `EndToEndCompilRequest` owns all of the string-based generic arguments for both global and entry-point generic parameters. The logic in `check.cpp` for dealing with those arguments has been heavily refactored to separate out the parsings steps that are specific to end-to-end compilation with string-based type arguments, and the semantic checking steps that result in a specialized `Program` (which can be exposed through new APIs that aren't tied to end-to-end compilation). It is perhaps not surprising that this change had a lot of consequences, so I'll briefly run over some of the main categories of changes required: * I changed the way that global generic arguments are passed via API (use `spSetGlobalGenericArgs` instead of the generic arguments for `spAddEntryPointEx`, which are not just for entry-point generics), which has been a change that we've needed for a long time. This is technically a breaking API change, although we should have very few client applications that care about it. * A bunch of places that used to take "big" objects like `CompileRequest` now just take the sub-pieces they care about (e.g., a function might have only needed a `Linkage` and a `DiagnosticSink`). This makes many subroutines or "context" struct types more generally useful, at the cost of taking more parameters. * In a few cases the conceptually clean separation of the layers breaks down (often for edge-case or compatibility features), and so we may pass along additional objects that are allowed to be null, but are used when present. A big example of this is how the back-end code generation routines accept an `EndToEndCompileRequest` that is optional, and only used to check whether "pass through" compilation is needed. We should probably look into cleaning this kind of logic up over time so that we don't need to violate the apparent separation of phases of compilation. * In cases where separation of layers was being broken for the sake of GLSL features, I went ahead and ripped them out, since all of that should be dead code anyway. * In many cases I increased the encapsulation of data in the core types to help track down use sites and make sure they are following invariants better. * In cases where code was doing, e.g., `context->shared->compileRequest->session->getThing()` I have tried to introduce convenience routines so that the usage site is just `context->getThing()` to improve encapsulation and allow changes to be made more easily going forward. * The `noteInternalErrorLoc` functionality was moved off of the compile request and into `DiagnosticSink`, since that is the one type you can rely on having around when you want to note an internal error. We may consider going forward if (and how) it should reset the counter used for noting locations on internal errors. * A few APIs now take `DiagnosticSink*` arguments where they didn't before, and as a result some public APIs need to create `DiagnosticSink`s to pass in, before going ahead and ignoring the messages. In the future there should be variations of these APIs that accept an `ISlangBlob**` parameter for the output. * fixup: missing include for compilers with accurate template checking (non-VS) * fixup: review feedback
* Initial support for uniform parameters on entry points (#815)Tim Foley2019-01-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Initial support for uniform parameters on entry points The basic feature this work adds is the ability to define a shader entry point like: ```hlsl [shader("fragment")] float4 main( uniform Texture2D t, uniform SamplerState s, float2 uv : UV) { return t.Sample(s,uv); } ``` In this example, the `uniform` keyword is used to mark that the given entry point parameters are *not* varying input/output flowing through the pipeline, but rather uniform shader parameters that should function as if the shader was declared more like: ```hlsl Texture2D t, SamplerState s, [shader("fragment")] float4 main( float2 uv : UV) { return t.Sample(s,uv); } ``` Allowing `uniform` parameters on entry points makes it easier to define multiple entry points in one file without accidentally polluting the global scope with shader parameters that only certain entry points care about. This feature is also more or less a prerequisite for allowing generic type parameters directly on entry point functions, since the main use case for those type parameters is for determining what goes in various `ConstantBuffer`s or `ParameterBlock`s. There are two main pieces to the implementation. First, we need to be able to compute appropriate layout information for entry points that include `uniform` parameters. Second, we need to transform the entry point function to move any `uniform` parameters to be ordinary global-scope shader parameters, to make sure that all other back-end passes don't need to worry about this special case. The latter piece of the implementation is, relatively speaking, simpler. The pass in `ir-entry-point-uniforms.{h,cpp}` converts entry point parameters that are determined to be uniform (using the already-computed layout information) into fields of a `struct` type and then declares a global shader parameter based on that `struct` type (and applies already-computed layout information to that parameter). After that, the remaining IR passes (notably including type legalization) will handle things just as for any other global shader parameter. The changes to the layout step are more significant, but most of the changes are just cleanups and fixes to enable the feature. The two major changes that enable entry-point `uniform` parameters are: * In `collectEntryPointParameters` we now dispatch out to a new `computeEntryPointParameterTypeLayout` function, which decided whether to compute the type layout for a `uniform` parameter, or for a varying parameter (what used to be the default behavior handled by `processEntryPointParameterDecl`). * The main `generateParameterBindings` routine was extended so that it allocates registers/bindings to the resources required by each entry point (using `completeBindingsForParameter`) after it has allocated registers/binding to all of the global-scope parameters (this addition is mirrored in `specializeProgramLayout`). The effect of these changes is that the `uniform` parameters of any entry points specified in a compile request will be laid out after the global-scope parameters, in the order the entry points were specified in the compile request. A bunch of smaller changes were made around parameter layout that are worth enumerating so that the diffs make some sense: * The `EntryPointLayout` type was changed so that instead of trying to *be* a `StructTypeLayout`, it instead *owns* one, in the same fashion as `ProgramLayout`. This commonality was factored into a base class `ScopeLayout`, and a bunch of edits followed from that change. * Because `uniform` parameters are moved out of the entry point parameter list early in the IR transformations, the logic in `ir-glsl-legalize.cpp` that tried to look up parameter layout information by index would no longer work if the entry point parameter list had been altered. Instead, that logic now looks for the decorations directly on the parameters. * The `UsedRange` type in `parameter-binding.cpp` was tracking the existing parameter associated with a range using a `ParameterInfo*` (which accounts for the possibility of multiple `VarDecl`s mapping to the same logical shader parameter), when just using a `VarLayout*` is sufficient for all current use cases. The overhead of allocating a `ParameterInfo` seems like overkill for entry-point parameters, where there can't possibly be multiple declarations of the "same" parameter, so avoiding these overheads was a focus when trying to deduplicate code between the global and entry-point parameter cases. * A bunch of parameter binding logic that was specific to GLSL input has been deleted completely. There was no way to even execute this code in the compiler today, and there is pretty much zero chance of us needing (or wanting) to deal with GLSL input in the future. This includes custom `UsedRangeSet`s specific to each translation unit, which were only needed for global-scope `in` and `out` varying declarations in GLSL. * A bunch of functions with `EntryPointParameter` in their names were renamed to use `EntryPointVaryingParameter` to help distinguish that they only apply to the varying case, while entry point `uniform` parameters are handled elsewhere. * The `completeBindingsForParameter` function was re-worked into something that can be used for both global-scope shader parameters (where we have a `ParameterInfo` and possibly explicit bindings) and entry-point parameters (where we expect to have neither). This helps unify the (fairly subtle) logic for how we allocate and assign bindings for resources, constant buffers, parameter blocks, etc. * A small change was made so that the entry-point stage is attached directly to top-level parameters of the entry point, and *not* recursively to every field along the way. This could be a breaking change for some applications, but it makes more logical sense (to me); we'll have to check if this affects Falcor. This change produces different output for several of the reflection tests, but the changes are consistent with no longer attaching stage information to sub-fields of varying `struct`-type parameters. * Because there is a bunch of repeated logic in `parameter-binding.cpp` that has to do with computing a `struct` layout for ordinary/uniform data, I tried to factor that into a single `ScopeLayoutBuilder` type, which handles computing the offsets for any parameters with ordinary data, and then also handles wrapping up the layout in a constant buffer layout if there was any ordinary data at the end. * A similar convenience routine `maybeAllocateConstantBufferBinding` was added because I noticed multiple places in `parameter-binding.cpp` that were trying to allocate a constant buffer binding for global uniforms, and they were wildly inconsistent (and in most cases used logic that would only work for D3D). * The main `generateParameterBindings` routine is significantly shortened by using all of these utilities that were introduced. I tried to comment the places that changed to explain the overall flow correctly. * The `specializeProgramLayout` routine (used to take a `ProgramLayout` from `generateParameterBindings` and specialize it based on knowledge of global generic arguments) had basically been rewritten with more explicit commenting/rationale for what happens in each step. It makes use of the same shared utilities as `generateParameterBindings` and `collectEntryPointParameters`. In terms of testing: * I added a test case to specifically test the new behavior, and in particular I made sure to include a mix of both global and entry-point parameters and also to have entry-point parameters of both ordinary and resource/object types. * I tweaked an existing test for global type parameters to use an entry-point `uniform` parameter instead of a global one, in an effort to migrate it toward being able to use an explicitly generic entry point. * fixups from merge
* Initial support for dynamic dispatch using "tagged union" types (#772)Tim Foley2019-01-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Initial support for dynamic dispatch using "tagged union" types Suppose a user declares some generic shader code, like the following: ```hlsl interface IFrobnicator { ... } type_param T : IFrobincator; ParameterBlock<T : IFrobnicator> gFrobnicator; ... gFrobincator.frobnicate(value); ``` and then they have some concrete implementations of the required interface: ```hlsl struct A : IFrobnicator { ... } struct B : IFrobnicator { ... } ``` The current Slang compiler allows them to generate distinct compiled kernels for the case of `T=A` and the case of `T=B`. This means that the decision of which implementation to use must be made at or before the time when a shader gets bound in the application. This change adds a new ability where the Slang compiler can generate code to handle the case where `T` might be *either* `A` or `B`, and which case it is will be determined dynamically at runtime. This means a single compiled kernel can handle both cases, and the decision about which code path to run can be made any time before the shader executes. This new option is supported by defining a *tagged union* type. Via the API, the user specifies that `T` should be specialized to `__TaggedUnion(A,B)` (the double underscore indicates that this is an experimental and unsupported feature at present). We refer to the types `A` and `B` here as the "case" types of the tagged union. Conceptually, the compiler synthesizes a type something like: ```hlsl struct TU { union { A a; B b; } payload; uint tag; } ``` The user can then allocate a constant buffer to hold their tagged union type, and when they pick a concrete type to use (say `B`), they fill in the first `sizeof(B)` bytes of their buffer with data describing a `B` instance, and then set the `tag` field to the appopriate 0-based index of the case type they chose (in this case the `B` case gets the tag value `1`). Actually implementing tagged unions takes a few main steps: * Type parsing was extended to special-case `__TaggedUnion` as a contextual keyword. This is really only intended to be used when parsing types from the API or command-line, and Bad Things are likely to happen if a user ever puts it directly in their code. Eventually construction of tagged unions should be an API feature and not part of the language syntax. * Semantic checking was extended to recognize that a tagged union like `__TaggedUnion(A,B)` shoud support an interface like `IFrobnicator` whenever all of the case types suport it, as long as the interface is "safe" for use with tagged unions (which means it doesn't use a few of the advancd langauge features like associated types). * The IR was extended with instructions to represent tagged union types and to extract their tag and the payload for the different cases as needed. * IR generation was extended to synthesize implementations of interface methods for any interface that a tagged union needs to support. Right now the implementation is simplistic and only handles simple method requirements, which it does by emitting a `switch` instruction to pick between the different cases. * A new IR pass was introduced to "desugar" any tagged union types used in the code. The downstream HLSL and GLSL compilers don't support `union`s, so we have to instead emit a tagged union as a "bag of bits" and implement loading the data for particular cases from it manually. * Final code emit mostly Just Works after the above steps, but we had to introduce an explicit IR instruction for bit-casting to handle the output of the desugaring pass. There are a bunch of gaps and caveats in this implementation, but that seems reasonable for something that is an experimental feature. The various `TODO` comments and assertion failures in unimplemented cases are intended, so that this work can be checked in even if it isn't feature-complete. * fixup: missing files * fixup: typos
* Improve handling of {} initializer list expressions (#778)Tim Foley2019-01-16
| | | | | | | | | | | | | | | | | | Fixes #775 It was reported (in #775) that Slang doesn't handle initializer-list syntax when initializing matrix variables. When starting on a fix for that it became apparent that the time was right to fix two broad issues in the compiler's current handling of `{}`-enclosed initializer lists. The first issue was that the front-end checking of initializer lists wasn't handling the C-style behavior where an initializer list can either contain nested `{}`-enclosed lists for sub-arrays/-structures, or directly contain "leaf" values for initializing those aggregates. For example, the following two variable declarations ought to be equivalent: ```hlsl int4 a[] = { {1, 2, 3, 4}, {5, 6, 7, 8} }; int4 b[] = { 1, 2, 3, 4, 5, 6, 7, 8 }; ``` Getting this distinction right is important because we want to support initializing a matrix either from a list of vectors for its rows, or a list of scalars for its elements (in row-major order). The front-end semantic checking logic for initializer lists was revamped so that it conceptually tries to "read" an expression of a desired type from the initializer list, and decides at each step whether to consume a single expression by coercing it to the desired type, or to recursively read multiple sub-values to construct the type as an aggregate. The logic for deciding between direct vs aggregate initialization could potentially use some tweaking, but luckily it should always handle the case where users introduce explicit `{}`-enclosed sub-lists to make their intention clear, so that existing Slang code should continue to work as before. The second issue was that initializers without the expected number of elements weren't implemented in code generation, so they would lead to internal compiler errors. This change revamps the codegen logic for initializer lists so that it can synthesize default values for fields/elements that were left out during initialization. This includes an attempt to support default initialization of `struct` fields based on explicitly written initialization expressions.
* Refactor several IR passes (#761)Tim Foley2018-12-19
| | | | | | | | | | | | | | | | | | | | | | | | | * Refactor several IR passes This change takes some IR passes that lived together in `ir.cpp` and moves them into their own files to improve clarity. In most cases these were passes introduced early in the life of the IR, so that it didn't seem like a big deal to have them all in one file, but now that `ir.cpp` has grown unwieldly this seems like an important cleanup to make. To give a quick rundown of the passes involved: * The IR "linking" step has been pulled out to `ir-link.{h,cpp}`. This code for this pass is pretty much identical to what was in `ir.cpp`, and no attempt has been made to clean up or refactor it in the current change. * The GLSL legalization step has been pulled out to `ir-glsl-legalize.{h,cpp}`. This used to be invoked directly from the linking step, but has been made a new top-level pass invoked from `emit.cpp`. Just like with the linking, the code in the new file is just a copy-paste of what was in `ir.cpp`, and no attempt at cleanup has been made. Also note that it might be a good idea to move this pass later in the overall sequence, but this PR doesn't attempt to do that as it could change results. * The generic specialization step has been pulled out to `ir-specialize.{h,cpp}`. The file name does not explicitly reference *generic* specialization because I anticipate this pass having to perform other kinds of specialization as well. The code in this case amounts to a heavy cleanup/refactoring pass and thus deserves careful scrutiny. The reason for the cleanup is that the generic specialization step used to be part of the "linking" step long ago, and continued to share infrastructure with it long after that stopped making sense. The newly cleaned up pass has much simpler logic that should be easy enough to follow from the comments. * In order to reduce code dulication, the IR "cloning" part of the `ir-specialize-resources.{h,cpp}` pass was pulled into its own files (`ir-clone.{h,cpp}`) that both the generic specialization step and the resource-based specialization step now share. The remaining changes then pertain to deleting a bunch of code out of `ir.cpp` and adding the new files to the build. The only test that needed updating was `vkray/raygen`, where some subtle ordering change in the refactored generic specialization logic has lead to the relative order of the specialized `TraceRay` and `saturate` functions beind reversed. * fixup: typo in assert * fixup: typos in comments
* First step toward supporting use of interfaces as existential types (#716)Tim Foley2018-12-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * First step toward supporting use of interfaces as existential types Traditional generics involve universal quantification. E.g., a declaration like: ``` void drive<T : IVehicle>(T vehicle); ``` indicates for *for all* types `T` that implement the `IVehicle` interface, the `drive()` function is available. In contrast, whend directly using an interface type like: ``` IVehicle v = ...; v.doSomething(); ``` we only know that there *exists* some concrete type (we could call it `E`) such that `v` refers to a value of type `E`, and `E` implements the `IVehicle` interface. In order to perform an operation like `v.doSomething()` we need to "open" the existential value so that we can look at the concrete type and how it implements the `IVehicle.doSomething` requirement. This change adds a very explicit representation of existentials to Slang's IR. An operation like `e = makeExistential(v, w)` creates a value of some existential type (interfaces being our only existential types for now), by wrapping a concrete value `v` (the type of `v` can be seen as an implicit operand) and a witness table `w` showing that the type of `v` implements the requirements of the chosen interface type. In turn, opening of an existential is handled with operations `extractExistential{Value|Type|WitnessTable}` which pull the corresponding piece of information out of a value of existential type (which somewhere in the code had to have been created with `makeExistential`). The change includes a trivial simplification pass that can detect cases where an `extractExistential*` operation is applied direclty to a `makeExistential` operation, so that there is only one possible result that could be extracted. This allows for simplification of existential types used in trivial ways for local variables (this is mostly so I can check in a functional test, rather than to actually support useful code involving interfaces right now). The logic in the semantic checking phase of the compiler is comparatively more complex. When we are about to perform member lookup given an expression like `obj.member` we will first check if `obj` has an existential type, and if it does we will construct a suitable local context in which we extract the value, type, and witness table from the existential (these all become explicit AST expression nodes), and then use the extracted value as the base of the lookup operation. The nature of existential values is that two different values with the same existential (interface) type could wrap concrete values with differnt types, so that we need to carefully refer only to the extracted type/value/witness-table of specific *values*. We handle this right now by conceptually moving the existential-type value into a local variable (by introducing a `LetExpr` that amounts to `let v = <init> in <body>`) and then require that the extract expressions must refer to the (immutable) variable declaration from which they are extracting a value. (Eventually we should expand this so that when using an immutable local variable of existential type we just use that variable as-is rather than introduce a new temporary) A simple test case is included that uses an interface type in an almost trivial way for a local variable; this test can be run and produces the expected results. A more complex test case that passes an existential into a function is included, but left disabled because a more aggressive simplification approach is required to generate working code from it. * Add missing file for expected test output * Fixups for merge from top-of-tree
* Specialize away resource-type function parameters (#759)Tim Foley2018-12-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Specialize away resource-type function parameters Work on #397. Introduction ------------ Suppose a user writes a function that takes a resource type as a parameter: ```hlsl float4 getThing(RWStructuredBuffer<float4> buffer, int index) { return buffer[index]; } ``` This function creates challenges when generating code for GLSL-based targets, because a global shader parameter of type `RWStructuredBuffer`: ```hlsl RWStructuredBuffer<float4> gBuffer; ``` translates to a global GLSL `buffer` declaration: ```hlsl buffer _S0 { float4 _data[]; } gBuffer; ``` There is no equivalent to that `buffer` declaration that can be used in function parameter position, and it is illegal in GLSL to pass `gBuffer` into a function. (Aside: yes, we could in principle translate a function parameter like `RWStructuredBuffer<float4> buffer` to `float4 buffer[]`, but that will not in turn generalize to arrays of structured buffers; it is a dead-end strategy) The solution employed by many shader compilers is to "inline everything" to eliminate the need for parameters of resource types, and then rely on dataflow optimization to eliminate locals of resource types. This strategy can of course lead to an increase in code size, and it also means that call stacks are lost when doing step-through debugging. Another serious issue is that an "early `return`" from a function can turn into the equivalent of a multi-level `break` when inlined, and not all of our targets support multi-level `break`. The solution implemented in this change works around some, but not all, of the problems with full inlining. The approach here generates specialized versions of a function like `getThing`, adapted to the actual arguments provided at different call sites. Thus if we have code like: ```hlsl RWStructuredBuffer<float4> gA; RWStructuredBuffer<float4> gB[10]; ... getThing(gA, x); getThing(gA, y); getThing(gB[someVal], z); ``` we will generate two specializations of `getThing`: one specialized for the `buffer` parameter being `gA` and the other for `gB`: ```hlsl float4 getThing_gA(int index) { return gA[index]; } float4 getThing_gB(int _val, int index) { return gB[_val][index]; } ``` and the call sites will change to match: ```hlsl getThing_gA(x); getThing_gA(y); getThing_gB(someVal, z); ``` Note how in the case where the argument being passed in was obtained by indexing into an array of resources, the callee is specialized to the identity of the global shader parameter (`gB`), and now accepts a new parameter to indicate the array index into it. While this description motivates the change based on GLSL output, the same basic issue can arise for other targets. For example, while current HLSL has added the `ConstantBuffer<T>` type, it is not supported on older targets, and it turns out that even dxc does not allow functions to have `ConstantBuffer<T>` parameters. Longer-term, we will likely need to do even more aggressive specialization both in order to generate SPIR-V output directly, and also to deal with function that have return values or `out` parameters of resource types. Implementation -------------- The meat of the change is in `ir-specialize-resources.{h,cpp}`, where we have a pass that looks at all call sites (`IRCall` instructions) in the program, and attempts to replace them with calls to specialized functions, where the specializations are generated on-demand. The code in this pass is heavily commented, so hopefully it serves to explain itself all right. After specialization is complete, we may still have functions like the original `getThing` that will produce invalid code when emitted as GLSL, so we need a way to make sure they don't appear in the output. To date we've had some very ad hoc approaches for ignoring IR constructs that we don't want to affect emitted code, but this change goes ahead and adds a more real dead code elimination (DCE) pass in `ir-dce.{h,cpp}`. This pass follows a straightforward approach of tagging instructions that are "live" and then propagating liveness through the whole program, before making a single pass to delete anything that isn't live. When I first added the DCE pass it eliminated *everything* because there were no "roots" for liveness. I solved this for now by adding a new decoration, `IREntryPointDecoration`, to mark shader entry points in the IR which should always be live (as should anything they depend on). A secondary problem that arose was that for GLSL ray tracing shaders it is possible for the incoming/outgoing payload or attributes parameters to be unused, but eliminating them as dead would change the signature of a shader an potential break the rules for how ray tracing programs communicate. I added a very simple `IRDependsOnDecoration` that allows one IR instruction to keep another alive *as if* it used it, without actually using it. There's also a fixup in the IR dumping logic where I was forgetting to store anything in the mapping from instruction to their names, so that the name of an instruction was getting incremented each time it was referenced. Testing ------- There are three different tests added as part of this change: * The `compute/func-resource-param` test covers the basic `RWStructuredBuffer` case above, which we expect to work fine for D3D11/12, but fail for Vulkan without specialization. * The `cross-compile/func-resource-param-array` test covers the case where we don't just have one resource, but an array of them. This is not an end-to-end compute test primarily because our `render-test` application doesn't yet handle arrays of resources correctly in its binding logic. * The `compute/func-cbuffer-param` test covers the case of a function with a `ConstantBuffer<T>` parameter, which requires specialization to become valid for any of our targets. * fixup: warnings/errors from other compilers * fixup: typos and cleanup * fixup: typos
* Represent global shader parameters explicitly in the IR (#756)Tim Foley2018-12-14
| | | | | | | | | | | | | | | | | | Before this change, global shader parameters were represented in the IR as just being ordinary global variables. The only indication that a particular global represented a parameter was when it got a layotu attached to it (as part of back-end processing), and we've had a number of bugs related to layouts being dropped so that what should have been a shader parameter turned into an ordinary global variable in the output. This change is more strongly motivated by the fact that making shader parameters look like globals means that we cannot easily reason about their value when doing IR transformations. If we see two `load`s from the same global variable can we assume they yield the same value? In the general case we cannot, and this means that any transformation that wants to rely on the fact that an input `Texture2D` shader parameter can't actually change over the life of the program needs to do extra work. The fix here is to introduce a new kind of IR instruction that represents a global shader parameter directly (not a pointer to it as a global would), at which point there isn't even such a notion as a "load" from the parameter, since it represents the value directly. In several cases logic that used to apply to global variables in case they were shader parameters (by looking for a layout) is now moved to apply to these global parameters. The biggest source of issues in this change was that switching from pointers to plain values to represent these shader parameters stresses different cases in type legalization. I also had to deal with the case of legalization for GLSL where we actually *do* need global shader parameters that are writable (since varying output goes in the global scope), but in that case I borrowed the use of pointer-like `Out<...>` and `InOut<...>` types to represent that intent, which we were already using for function parameters representing outputs. A few tests started failing because the changes lead to a slightly different order of code emission, which in some HLSL tests resulted in a function parameter named `s` getting emitted before a global parameter named `s`, leading to the latter getting the name `s_1` instead of `s_0`. A few SPIR-V tests started failing because the new approach means that we no longer end up performing a load from all varying input parameters at the start of `main` and instead reference the varying inputs directly. The resulting code is more idomatic, but it differed from the baselines for those tests.
* Move mangled name out of IRGlobalValue (#752)Tim Foley2018-12-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Move mangled name out of IRGlobalValue Previously the `IRGlobalValue` type was used as a root for all IR instructions that can have "linkage," in the sense that a definition in one module can satisfy a use in another module. The mangled symbol name was stored in state directly on each `IRGlobalValue`, which created some complications, and also forced IR instructions that wanted to support linkage to wedge into the hierarchy at that specific point. This change moves the mangled name out into a decoration: either an `IRImportDecoration` or an `IRExportDecoration`, both of which inherit from `IRLinkageDecoration` which exposes the mangled name. This change has a few benefits: * We can now have any kind of instruction be exported/imported, without having to inherit from `IRGlobalValue`. This could potentially let `IRStructType` and `IRWitnessTable` be simplified to just have operand lists instead of dummy chldren as they do today. * We can now easily have "global values" like functions that explicitly *don't* get linkage, instead of using a null or empty mangled name as a marker. * We can use the exact opcode on a linkage decoration to distinguish imports from exports, which could be used to more accurately resolve symbols during the linking step. Other than adding the decorations and making sure that AST->IR lowering adds them, the main changes here are around any code that used `IRGlobalValue`. Variables and parameters of type `IRGlobalValue*` were changed to `IRInst*` easily, so the main challenge was around code that *casts* to `IRGlobalValue*. In cases where a cast to `IRGlobalValue` also performed a test for the mangled name being non-null/non-empty, we simply switched the code to check for the presence of an `IRLinkageDecoration`, since that is the new way of indicating a value with linakge. Most of the serious complications arose in `ir.cpp` around the "linking"/target-specialization and generic specialization steps. The "linking" logic was checking for `IRGlobalValue` to opt into some more complicated cloning logic, and just checking for a linkage decoration here wasn't sufficient since the front-end *does* produce global values without linkage in some cases (e.g., for a function-`static` variable we produce a global variable without linkage). This logic was updated to just check for the cases that used to amount to `IRGlobalValue`s directly by opcode. It might be simpler in the short term to have kept `IRGlobalValue` around to make the existing casts Just Work, but I'm confident that this logic could actually be rewritten for much greater clarity and simplicity and that is the better way forward. The generic specialization logic was using some really messy code to generate a new mangled name to represent the specialized symbol, and then searching for an existing match for that name. The original idea there was that an IR module could include "pre-specialized" versions of certain generics to speed up back-end compilation by eliminating the need to specialize in some cases, but this feature has never been implemented so the overhead here is just a waste. Instead, I moved generic specialization to use a simpler dictionary to map the operands to a `specialize` instruction over to the resulting specialized value. This allows for some simplifications in the name mangling logic, because it no longer needs to figure out how to produce mangled names from IR instructions representing types/values. As part of this change I also overhauled the IR emit logic to produce cleaner output by default, borrowing some of the ideas from the logic in `emit.cpp`. IR values are now automatically given names based on their "name hint" decoration, if any, to make the code easier to follow, and I also made it so that types and literals get collapsed into their use sites in a new "simplified" IR dump mode (which is currently the default, with no way to opt into the other mode without tweaking the code). The resulting IR dumps are much nicer to look at, but as a result the one test that involves IR dumping (`ir/string-literal`) doesn't really test what it used to. One weird issue that came up during testing is that the `transitive-interface` test had previously been producing output that made no sense (that is, the expected output file wasn't really sensible), and somehow these changes were altering its behavior. Changing the test to use `int` values instead of `float` was enough to make the output be what I'd expect, and hand inspection of generating DXBC has me convinced we were compiling the `float` case correctly too. There appears to be some issue around tests with floating-point outputs that we should investigate. * fixup: C++ declaration order
* Decorations are instructions (#748)Tim Foley2018-12-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Make a test case use IR serialization * Make all IR instructions usable as parents This makes it so that every `IRInst` has the list of children that used to be on `IRParentInst` and eliminates `IRParentInst`. Most places in the code were only checking against `IRParentInst` so that they could know whether there were child instructions to iterate over. This change bloats the size of every instruction by two pointers, but we hope to be able to eliminate that overhead with a better encoding later. * Change IR decorations to be instructions. The main change here is that `IRDecoration` now inherits from `IRInst`, and `IRInst` now has a single linked list that holds both decorations *and* children. At each point where code used to loop over `getChildren()` on an `IRInst`, I checked whether it made sense to leave the operation as processing just the children, or if it should process both decorations and children. The thorniest bit was making sure the logic for inserting an instruction into a parent is correct. For the most part, once IR code is built all insertions are explicitly before/after another instruction, so the ordering can't get messed up. The sticking point is any code that does an explicit `insertAtStart` or `insertAtEnd`, but I surveyed those to make sure they are correct in context, and I also made all insertions bottleneck through one routine that does a better job of asserting the preconditions than what was there before. We may still want a "smart" insertion function at some point so that if somebody does `someDecoration->insertAtEnd(someInst)` the decoration intelligently goes to the end of the decoration list, and not the entire decorations-and-children list. All of the existing decoration types were refactored to provide accessors for their operands, rather than directly exposing fields. In most cases the operands are required to be `IRConstant` nodes of fixed types. Not all of these types need to be kept around in the new approach, but they were left in so that as much existing code as possible can be kept working. The `IRBuilder` was extended with factory functions to make the various decoration types and attach them. All the fields in concrete decorations that were using `StringRepresentation` or `Name` pointers are now using IR-level string operands which provide their value as an `UnownedStringSlice`, so logic that was working with those decoration values needed to be updated here and there. I also needed to add the logic to clone string-literal values to the IR cloning pass, since they are now being used in almost every piece of code. A new type of constant IR instruction for literal pointers was added, to handle the cases where an IR decoration needs an operand that is a raw AST-level pointer. These are even being serialized, although we obviously should not rely on them to round-trip through serialization in the future. Ideally, a follow-on change should add a cleanup pass where we remove any decorations from a module that shouldn't be allowed in the serialized code. The biggest overall cleanup is in the serialization logic, where a lot of code just disappears because it can process the raw "decorations and children" list as the logical children of an IR instruction. The only special cases left are literals (which seem like they will always need special-casing) and global values (because they have a mangled name, which we plan to move into a decoration). One other example of a simplification made possible by this change: the `IRNotePatchConstantFunc` instruction was implemented as an instruction only because it couldn't be encoded as a decoration at the time (it needed to have an operand that referenced an IR function). The IR dumping logic was also updated (which meant a change to the `ir/string-literal` test) to try to make it print out all decorations a bit more systematically now that they are encoded like other instructions. The formatting isn't quite perfect, but it is good enough to be able to read what is going on. I didn't include updates to the validation logic to ensure that decorations are being added in ways that follow the invariants, but that would be a nice thing to add next. * fixup: 64-bit issues * fixup: forward declaration issues
* Add support for globallycoherent modifier (#732)Tim Foley2018-11-29
| | | | | | | | | The `globallycoherent` modifier indicates that resource might be read or written by threads outside of the current thread group, so that any memory barriers that affect it should guarantee coherency at the global memory scope, and not just thread-group scope. The equivalent GLSL modifier appears to be `coherent`. This change adds the front-end modifier, transforms it into an IR-level decoration during lowering, and then checks for the modifier during code emit. Note: this logic may not behave correctly when `globallycoherent` is added to a field in a `struct`, since the modifier would then need to be propagated to any variables created during type legalization. Checking up on that is left to future work. Note: it isn't entirely clear if `globallycoherent` should be treated as a declaration modifier or a type modifier. The point is moot for now because Slang doesn't have any support for type modifiers, but when we get around to that we will need to make a decision.
* Feature/early depth stencil (#727)jsmall-nvidia2018-11-21
| | | | | | | | | | | | | | * First pass support for early depth stencil. * Add a simple test to check if output has attributes. * Use cross compilation to test [earlydepthstencil] on glsl. * If target is dxil, use dxc to test against. Add hlsl to test earlydepthstencil against. * * Added spSessionHasCompileTargetSupport * Made slang-test use spSessionHasCompileTargetSupport to ignore tests that cannot run
* Add callable shader support for Vulkan ray tracing (#718)Tim Foley2018-11-12
| | | | | | | | | | | | | | | | | | * Add callable shader support for Vulkan ray tracing This change extends the previous work to update Vulkan ray tracing support for the finished `GL_NV_ray_tracing` spec. One of the features missing in the experimental extension that was added to the final spec is "callable shaders," which allow ray tracing shaders to call other shaders as general-purpose subroutines. Most of the implementation work here mirrors what was done for the `TraceRay()` function to map it to `traceNV()`. We map the generic `CallShader<P>` function to the non-generic `executeCallableNV`, with a payload identifier that indicates a specific global variable of type `P` (the global variable being generated from a `static` local in `CallShader`). A new modifier is added to identify the payload structure, and the parameter binding/layout logic introduces a new resource kind for callable-shader payload data (where previously the logic had assumed ray and callable payloads should use the same resource kind). Two test shaders are included: one for the callable shader (`callable.slang`) and one for a ray generation shader that calls it (`callable-caller.slang`). Just for kicks, the payload data type is defined in a shared file so that we can be sure the two agree (trying to emulate what might be good practice, and ensure that ray tracing support works together with other Slang mechanisms). * Typo fix: assocaited->associated One instance was found in review, but I went ahead and fixed a bunch since I seem to make this typo a lot. * Typo fix: defintiion->definition
* Update Vulkan ray tracing support to final extension spec (#717)Tim Foley2018-11-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Update version of glslang used * Update VK raytracing support for final extension spec A lot of this change is just plain renaming: The `NVX` suffixes become just `NV`, and the extension name changes from `GL_NVX_raytracing` to `GL_NV_ray_tracing`. The Slang standard library and the GLSL baselines for the tests are consistently updated. The other detail is that the final spec requires the "payload" identifier in a `traceNV()` call to be a compile-time constant, which means it cannot be defined as a local variable first, as in: ```glsl int payloadID = 0; traceNV(..., payloadID); // ERROR ``` In terms of how the original support was implemented, the payload ID is being computed via a special builtin function that maps each global GLSL payload variable to a unique ID. There are a few ways we could try to resolve the problem here: 1. We could aspire to put our equivalent of the `constexpr` modifier on the output of the function, so that the GLSL variable gets declared `const` and thus fits the GLSL rules for a constant expression. 2. We could introduce a pass to replace the payload-location instructions with literal integers. 3. We could use a special-purpose instruction instead of a builtin function call, and have that instruction indicate that it doesn't have side effects (so it can be folded into the call site) 4. We could somehow mark the builtin function as not having side effects. We choose option (4) simply because it provides a feature that could have other applications. This change adds a `[__readNone]` attribute that can be applied to function declarations to express a promise on the part of the programmer that the given function has no side effects and computes its result strictly from the bits of its input arguments (and not things they point to, etc.). This mirrors an equivalent function attribute in LLVM. We mark the function that computes a ray payload location with this attribute, and propagate the attribute through the layers of the IR, so that when the emit logic asks if an operation has side effects (to see if it can be folded into the arguments of a subsequent expression), we get an affirmative response. This change should get all of the features that were present in the experiemntal `NVX` extension working with the final extension spec. It does not address callable shaders, which will come as a subsequent change.
* Fix a crash on function-static variables with initializers (#703)Tim Foley2018-10-30
| | | | | This code path hadn't been used, and it had a crash due to not inserting the basic blocks it created (for initializing the variable) into the parent function. The fix adds a bit more smarts to the `IRBuilder` to help with inserting basic blocks into the flow of a function. The actual user issue was around `static const` declarations, and it is clear that the code is incorrectly treating a function local `static const` as if it were just `static`. That will need to be fixed in another change.
* Feature/serial string pool refactor (#702)jsmall-nvidia2018-10-30
| | | | | | | | | | | | | | | | * Ongoing serialization for full debug work. * Use StringRepresentationCache and StringSlicePool for serialization. * Removed some older path handling for serialization which had some wrong underlying assumptions. * Builds with refactored use of SubStringPool in ir-serialize. * Removed prohibitedCategories because not used anywhere. * Add category 'compatibility-issue' * Remove work in progress on debug serialization.
* Vulkan implicit sampler fixups (#686)Tim Foley2018-10-19
| | | | | | | | | | | | | | | | * Fix sampler-less texture functions (#685) * Fix sampler-less texeture functions I'm honestly not sure how the original work on this feature in #648 worked at all (probably insufficient testing). We have these front-end modifiers to indicate that a particular function definition requires a certain GLSL version, or a GLSL extension in order to be used, and they are supposed to be automatically employed by the logic in `emit.cpp` to output `#extension` lines in the output GLSL. However, it turns out that nothing is actually wired up right now, so that adding the modifiers to a declaration is a placebo. This change propagates the modifiers through as decorations, and then uses them during GLSL code emit, which allows the functions that require `EXT_samplerless_texture_functions` to work. * fixup: 32-bit warning * Add serialization support for GLSL extension/version decorations
* Add a warning on missing return, and initial SCCP pass (#671)Tim Foley2018-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Add a warning on missing return, and initial SCCP pass The user-visible feature added here is a diagnostic for functions with non-`void` return type where control flow might fall off the end. This *sounds* like a trivial diagnostic to add as part of the front-end AST checking, but that can run afoul of really basic stuff like: ```hlsl int thisFunctionisOkay(int a) { while(true) { if(a > 10) return a; a = a*2 + 1; } // no return here! } ``` This function "obviously" doesn't need to have a `return` statement at the end there, but realizing this fact relies on the compiler to understand that the `while(true)` loop can't exit normally, and doesn't contain any `break` statement. One can write "obvious" examples that need more and more complex analysis to rule out. The answer Slang uses for stuff like this is to do the analysis at the IR level right after initial code generation (this would be before serialization, BTW, so that attached `IRHighLevelDeclDecoration`s can be used). When lowering the AST to the IR, we always emit a `missingReturn` instruction (a subtype of `IRUnreachable`) at the end of its body if it isn't already terminated. The IR analysis pass to detect missing `return` statements is then as simple as just walking through all the functions in the module and making sure they don't contain `missingReturn` instructions. For that simple pass to work, we first need to make some effort to remove dead blocks that control flow can never reach. This change adds a very basic initial implementation of Spare Conditional Constant Propagation (SCCP), which is a well-known SSA optimization that combines constant propagation over SSA form with dead code elimination over a CFG to achieve optimizations that are not possible with either optimization along. For the moment, we don't actually implement any constant *folding* as part of the SCCP pass, so we can eliminate the dead block in a case like the function above (and those in the test case added in this change), but will not catch things like a `while(0 < 1)` loop. Handling more "obvious" cases like that is left for future work. * fixup: warning on unreachable code * Handle case where user of an inst isn't in same function/code The code as assuming any instruction in the SSA work list has to come from the function/code being processed, but this misses the case where an instruction in a generic has a use inside the function that the generic produces. This change adds code to guard against that case.
* Support cross-compilation of ray tracing shaders to Vulkan (#663)Tim Foley2018-10-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Move to newer glslang * Support cross-compilation of ray tracing shaders to Vulkan This change allows HLSL shaders authored for DirectX Raytracing (DXR) to be cross-compiled to run with the experimental `GL_NVX_raytracing` extension (aka "VKRay"). * The GLSL extension spec is marked as experimental, so that any shaders written using this support should be ready for breaking changes when the spec is finalized. * "Callable shaders" are not exposed throug the GLSL extension, so this feature of DXR will not be cross-compiled. * The experimental Vulkan raytracing extension does not have an equivalent to DXR's "local root signature" concept. This does not visibly impact shader translation (because the local/global root signature mapping is handled outside of the HLSL code), but in practice it means that applications which rely on local root signatures on their DXR path will not be able to use the translation in this change as-is; more work will be needed. The simplest part of the implementation was to go into the Slang standard library and start adding GLSL translations for the various DXR operations. In some cases, like mapping `IgnoreHit()` to `ignoreIntersectionNVX()` this is almost trivial. The various functions to query system-provided values (e.g., `RayTMin()`) were also easy, with the only gotcha being that they map to variables rather than function calls in GLSL, and our handling of `__target_intrinsic` assumes that a bare identifier represents a replacement function name, and not a full expression, so we have to wrap these definitions in parentheses. The tricky operations are then `TraceRay<P>()` and `ReportHit<A>()`, because these two are generics/templates in HLSL. GLSL doesn't support generics, even for "standard library" functions, so the raytracing extension implements a slightly complex workaround: the matching operations `traceNVX()` and `reportIntersectionNVX()` pass the payload/attributes argument data via a global variable. That is, shader code for the GLSL extensions writes to the global variable and then calls the intrinsic function. The linkage between the call site and the global is established by a modifier keyword (`rayPayloadNVX` and `hitAttributeNVX`, respectively) and in the case of ray payload also uses `location` number to identify which payload global to use (since a single shader can trace rays with multiple payload types). Our translation strategy in Slang tries to leverage standard language mechanisms instead of special-case logic. For example, to translate the `ReportHit<A>()` function, we provide both a default declaration that will work for HLSL (where the operation is built-in with the signature given), and a *definition* marked with the `__specialized_for_target(glsl)` modifier. The GLSL definition declares a function `static` variable that will fill the role of the required global, and then does what the GLSL spec requires: assigns to the global, and then calls the `reportIntersectionNVX` builtin (which we declare as a separate builtin). Our ordinary lowering process will turn that `static` variable into an ordinary global in the IR, and the `[__vulkanHitAttributes]` attribute on the variable will be emitted as `hitAttributeNVX` in the output. There is no additional cross-compilation logic in Slang specific to `ReportHit<A>()` - the target-specific definition in the standard library Just Works. The case for `TraceRay<P>()` is a bit more complicated, simply because the GLSL `traceNVX()` function needs to be passed the `location` for the payload global. We implement the payload global as a function-`static` variable, with the knowledge that every unique specialization of `TraceRay<P>()` will generate a unique global variable of type `P` to implement our function-`static` variable. We then add a slightly magical builtin function `__rayPayloadLocation()` that can map such a variable to its generated `location`; the logic for this is implemented in `emit.cpp` and described below. We also changed the `RayDesc` and `BuiltinTriangleIntersectionAttributes` types from "magic" intrinsic types over to ordinary types (because the GLSL output needs to declare them as ordinary `struct` types). This ends up removing some cases in the AST and IR type representations. By itself this change would break HLSL emit, because in that case the types really are intrinsic. We added a `__target_intrinsic` modifier to these types to make them intrinsic for HLSL, and then updated the downstream passes to handle the notion of target-intrinsic types. The logic for binding/layout of entry point inputs and outputs was updated so that raytracing stages don't follow the default logic for varying input/output parameters. This is because the input/output parameters of a raytracing entry point aren't really "varying" in the same sense as those in the rasterization pipeline. In particular, the SPIR-V model for raytracing input and output treats "ray payload" and "hit attributes" parameters as being in a distinct storage class from `in` or `out` parameters. We also detect cases where a ray tracing stage declares inputs/outputs that it shouldn't have. This logic could conceivably be extended to other stages (e.g., to give an error on a compute shader with user-defined varying input/output). The type layout logic added cases for handling raytracing payload and hit-attribute data, but this is currently just a stub implementation that follows the same logic as for varying `in` and `out` parameters (it cannot give meaningful byte sizes/offsets right now). To my knowledge the GLSL spec doesn't currently specify anything about layout, and I haven't read the DXR spec language carefully enough to know what it says about layout. A future change should update the layout logic to allow for byte-based layout of ray payloads, etc. so that we can query this information via reflection. The GLSL legalization logic in `ir.cpp` was updated to factor out the per-entry-point-parameter code into its own function, and then that function was updated to special-case the input/output of a ray-tracing shader. While for rasterization stages we typically want to take the user-declared input/output and "scalarize" it for use in GLSL (in part to deal with language limitations, and in part to tease system values apart from user-defined input/output), the GLSL spec for raytracing requires payload and hit attribute parameters to be declared as single variables. There is also the issue that even for an `in out` parameter, a ray payload parameter should only turn into a single global, whereas the handling for varying `in out` parameters generates both an `in` and an `out` global for the GLSL case. Other than the handling of entry point parameters, the GLSL legalization pass doesn't need to do anything special for ray tracing shaders. The trickiest change in the `emit.cpp` logic is that we now generate `location`s for ray payload arguments (the outgoing from a `TraceRay()` call) on demand during code generation. This is a bit hacky, and it would be nice to handle it as a separate pass on the IR rather than clutter up the emit logic, but this approach was expedient. Basically, any of the global variables that got generated from the `static` declarations in the standard library implementation of `TraceRay()` will trigger the logic to assign them a `location`. The logic for emitting intrinsic operations added a few new `$`-based escape sequences. The `$XP` case handles emitting the location of a generated ray payload variable; this is how we emit the matching location at the site where we call `traceNVX`. The `$XT` case emits the appropriate translation for `RayTCurrent()` in HLSL, because it maps to something different depending on the target stage. All of the test cases here consist of a pair of an HLSL/Slang shader written to the DXR spec, plus a matching GLSL shader for a baseline. The GLSL shaders are carefully designed so that when fed into glslang they will produce the same SPIR-V as our cross-compilation process. This kind of testing is quite fragile, but it seems to be the best we can do until our testing framework code supports *both* DXR and VKRay. A bunch of the core changes ended up being blocked on issues in the rest of the compiler, so some additional features go implemented or fixed along the way: The first big wall this work ran into was that the `__specialized_for_target` modifier hasn't actually been working correctly for a while. It turns out that for the one function that is using it, `saturate()`, we have been outputting the workaround GLSL function in *all* cases (including for HLSL output) rather than only on GLSL targets. The problem here is that for a generic function with a `__specialized_for_target` modifier or a `__target_intrinsic` modifier, the IR-level decoration will end up attached to the `IRFunc` instruction nested in the `IRGeneric`, but the logic for comparing IR declarations to see which is more specialized (via `getTargetSpecializationLevel()`) was looking only at decorations on the top-level value (the generic). The quick (hacky) fix here is to make `getTargetSpecializationLevel()` try to look at the return value of a generic rather than the generic itself, so that it can see the decorations that indicate target-specific functions. A more refined fix would be to attach target-specificity decorations to the outer-most generic (to simplify the "linking" logic). The only reason not to fold that into the current fix is that the `__target_intrinsic` modifier currently serves double-duty as a marker of target specialization *and* information to drive emit logic. The latter (the emit-related stuff) currently needs to live on the `IRFunc`, and moving it to the generic could easily break a lot of code. This needs more work in a follow-on fix, but for now target specialization should again be working. The other big gotcha that the simple "just use the standard library" strategy ran into was that function-`static` variables weren't actually implemented yet, and in particular function-`static` variables inside of generic functions required some careful coding. The logic in `lower-to-ir.cpp` has this `emitOuterGenerics()` function that is supposed to take a declaration that might be nested inside of zero or more levels of AST generics, and emit corresponding IR generics for all those levels. This is needed because two different AST functions nested inside a single generic `struct` declaration should turn into distinct `IRFunc`s nested in distinct `IRGeneric`s. The tricky bit to making that all work is that the same AST-level generic type parameter will then map to *different* IR-level instructions (the parameters of distinct `IRGeneric`s) when lowering each function. The existing logic handled this in an idiomatic way by making "sub-builders" and "sub-contexts." This change refactors some of the repeated logic into a `NestedContext` type to help simplify the pattern, and applies it consistently throughout the `lower-to-ir.cpp` file. Besides that cleanup, the major change is `lowerFunctionStaticVarDecl` which, unsurprisingly, handles lower of function-`static` variables to IR globals. The careful handling of nested contexts here is needed because if we are in the middle of lowering a generic function, then a `static` variable should turn into its *own* `IRGeneric` wrapping an `IRGlobalVar`. The body of the function should refer to the global variable by specializing the global variable's `IRGeneric` to the parameters of the *functions* `IRGeneric`. This tricky detail is handled by `defaultSpecializeOuterGenerics`. An additional subtlety not actually required for this raytracing work (and thus not properly tested right now) is handling function-`static` variables with initializers. These can't just be lowered to globals with initializers, because HLSL follows the C rule that function-`static` variables are initialized when the declaration statement is first executed (and this could be visible in the presence of side-effects). The lowering strategy here translates any `static` variable with an initializer into *two* globals: one for the actual storage, plus a second `bool` variable to track whether it has been initialized yet. There are some opportunities to optimize this case, especially for `static const` data, but that will need to wait for future changes. We've slowly been shifting away from the model where a user thinks of a "profile" as including both a stage and a feature level. Instead, the user should think about selecting a profile that only describes a feature level (e.g., `sm_6_1`, `glsl_450`, etc.), and then separately specifying a stage (`vertex`, `raygeneration, etc.) for each entry point. The challenge here is that the command-line processing still only had a single `-profile` switch, and no way to specify the stage. Adding the `-stage` option was relatively easy, but making it work with the existing validation logic for command-line arguments was tricky, because of the complex model that `slangc` supports for compiling multiple entry points in a single pass. * In `slang.h` add new reflection parameter categories for ray payloads and hit attributes, as part of entry point input/output signatures. * A previous change already updated our copy of glslang to one that supports the `GL_NVX_raytracing` extension, so in `slang-glslang.cpp` we just needed to map Slang's `enum` values for the raytracing stage names to their equivalents in the glslang code. * Moved the logic for looking up a stage by name (`findStageByName()`) out of `check.cpp` and into `compiler.cpp`, with a declaration in `profile.h` * Added a `$z` suffix to the GLSL translation of `Texture*.SampleLevel()`, to handle cases where the texture element type is not a 4-component vector. Note that this fix should actually be applied to *all* these texture-sampling operations, but I didn't want to add a bunch of changes that are (clearly) not being tested right now. * The layout logic for entry points was updated to correctly skip producing a `TypeLayout` for an entry point result of type `void`, which meant that the related emit logic now needs to guard against a null value for the result layout. * In `ir.cpp`, dump decorations on every instruction instead of just selected ones, so that our IR dump output is more complete. * Added a command-line `-line-directive-mode` option so that we can easily turn off `#line` directives in the output when debugging. Not all cases where plumbed through because the `none` case is realistically the most important. * Parser was fixed to properly initialize parent links for "scope" declarations used for statements, so that we can walk backwards from a function-scope variable (including a `static`) and see the outer function/generics/etc. * Added GLSL 460 profile, since it is required for ray tracing. Also updated the logic for computing the "effective" profile to use to recognize that GLSL raytracing stages require GLSL 460. * Added some conventional ray-tracing shader suffixes to the handling in `slang-test`. This code isn't actually used, but was relevant when I started by copy-pasting some existing VKRay shaders as the starting point for my testing. * Fixup: typos
* Support for IRStringLit (#645)jsmall-nvidia2018-09-19
| | | | | | | | * * Added support for strings in IR with IRStringLit - with storage of chars after it * Added kIRDecorationOp_Transitory - can be used for detecting instructions constructed on stack * Made IRConstant hashing work off type * Fix comment that is out of date about how an instruction is determines to hold a transitory string.
* Remove IRDeclRef as recommended in comments for PR #635 as no longer used. ↵jsmall-nvidia2018-09-17
| | | | (#638)
* Improvements around IR representation and memory usage (#635)jsmall-nvidia2018-09-14
| | | | | | | | | | | | | | | | | | * * Remove dispose from IRInst * Use MemoryArena instead of MemoryPool * Make all IRInst not require Dtor - by having ref counted array store ptrs that need freeing * Increase block size - typically compilation is 2Mb of IR space(!) * Fix issues around StringRepresentation::equal because null has special meaning. * Don't bother to construct as String to compare StringRepresentation, just used UnownedStringSlice. * Added fromLiteral support to UnownedStringSlice and use instead of strlen version. * Use more conventional way to test StringRepresentation against a String. * Fix gcc/clang template problem with cast.
* Support for Tessellation (#607)jsmall-nvidia2018-06-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Fix typo OuptutTopologyAttribute -> OutputTopologyAttribute First pass support for handing tesselation shaders - domain and hull. * Added attribute PatchConstantFuncAttribute * Added visitHLSLPatchType(HLSLPatchType* type) such that the patch type template parameters are handled * Added IRNotePatchConstantFunc - such that the patch constant function is referenced within IR * Added support for outputing typical tesselation attributes (although minimal validation is performed) * Added findFunctionDeclByName * Small improvements to diagnostic. * Improved diagnostics and checking for geometry shader attributes. * Added diagnostic if patchconstantfunc is not found Handle assert failure when outputing a domain shader alone and therefore attr->patchConstantFuncDecl is not set. * Simple script tess.hlsl to test out domain/hull shaders. * Added url for where hull shader attributes are defined. * Fix unsigned/signed comparison warning. * Restore removal of fix in "Improve generic argument inference for builtins (#598)" * Update tessellation test case to compare against fxc The test was previously comparing against fixed expected DXBC output, but this caused problems when the test runner tried to execute the test on Linux (where there is no fxc to invoke...), and would also be a potential source of problems down the road if different users run using different builds of fxc. The simple solution here is to convert the test to compare against fxc output generated on the fly. That test type is already filtered out on non-Windows builds, so it eliminates the portability issue (in a crude way). I also changed the test to compile both entry points in one compiler invocation, just to streamline things into fewer distinct tests. * Eliminate unnecessary call to `lowerFuncDecl` In a very obscure case this could cause a bug, if the patch-constant function had somehow already been lowered (because it was called somewhere else in the code). The call should not be needed because `ensureDecl` will lower a declaration on-demand if required, so eliminating it causes no problems for code that wouldn't be in that extreme corner case.
* Fix global atomic functions (#582)Tim Foley2018-05-29
| | | | | | | | | Fixes #581 This change adds a new parameter passing mode `__ref` to exist alongisde `in`, `out`, and `inout`. The `__ref` modifier indicates true by-reference parameter passing (whereas `inout` is copy-in-copy-out). This is not intended to be something that users interact with directly, but rather a low-level feature that lets us provide a correct signature for the `Interlocked*()` operations in the standard library. Most of the support for passing what are logically addresses around already exists in the IR, so the majority of the work here is just in introducing the new type `Ref<T>` and then using it appropriately when lowering `__ref` parameters/arguments to the IR.
* Cleanups around behavior when the compiler fails (#553)Tim Foley2018-05-11
| | | | | | | | | | | | | | * Cleanups around behavior when the compiler fails * Add another case where we try to `noteInternalErrorLoc()` if an exception in thrown. This one is the in the logic for emitting an IR instruciton. This could be improved by adding another layer at the function level (as a catch-all for instructions with no location), but something is better than nothing. * Change a bunch of `assert()`s over to `SLANG_ASSERT()`s, so that we can theoretically take more control over them (e.g., make release builds with asserts enabled) * Some other small cleanups around the assertions we perform. In the survey I made, I didn't really see many obvious "smoking gun" cases where we could produce a significantly better error message for some of the unimplemented/unexpected paths, other than to actually implement the missing functionality. * fixup
* Pass through original names for most declarations (#547)Tim Foley2018-05-03
| | | | | | | | | | | | | | | | | | | The basic idea here is that when lowering to the IR, the front-end will attach a "name hint" to the IR instruction(s) that represent a given declaration, and then the passes that work on the IR will try to preserve and propagate those names, and then finally the emit logic will use them in place of mangled or unique names when available. This change does *not* try to deal with the issues that arise when we try to use those variable names in the output without any modification (e.g., handling cases where they might clash with keywords or builtins in the target language). Instead, it tries to establish baseline behavior for propagating through names, so that a later change can concentrate on the issue of using those names exactly when it is legal to do so. In order to avoid issues around the name "hints" causing problems we take two main steps: 1. We "scrub" each name to reduce it down to the allowed set of identifier characters in C-like languages, and then ensure that it doesn't do things that would be illegal in some downstream languages (e.g., consecutive underscores are not allowed in GLSL) or could clash with Slang's mangled names. This process isn't guaranteed to give distinct results for distinct inputs (it isn't a mangling scheme, after all). 2. We generate a unique ID for each occurence of a given name and always use that as a suffix. This means that even if a name happens to overlap with a keyword (if you somehow have a variable named `do`), we will still add a suffix that makes it not a problem (we'd output `do_0` which is fine). The logic for generating these names is mostly straightforward. For simple variables, we use their given name directly, while for other declarations we try to form a name that includes their parent declaration (e.g. `SomeType.someMethod`). Various IR passes need to propagate or preserve this information. The most interesting is type legalization, when we take a variable with an aggregate type and split some of the fields out into their own variables. In that case we generate "dotted" names like `someVar.someTexture` and rely on the emit logic to turn that into `someVar_someTexture`. During SSA generation, if we are promoting a variable to SSA temporaries, we will try to propagate the name of the variable over to the temporaries (unless they already have a name from some other place). The same applies to block parameters ("phi nodes"). Many of the test changes need their expected output to be updated for this change. Luckily in most cases the output has gotten easier to understand.
* Add support for "swizzled stores" (#544)Tim Foley2018-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This was a known issue in our IR representation, which was now biting a user. The basic problem is that in code like the following: ```hlsl RWStructureBuffer<float4> buffer; ... buffer[index].xz = value; ``` we ideally want to be able to reproduce the original HLSL code exactly, but that requires directly encoding the way that this code writes to two elements of a vector, but not the others. The currently lowering strategy we had produced IR something like: ```hlsl float4 tmp = buffer[index]; tmp.xz = value; buffer[index] = tmp; ``` That transformation might seem valid, but it has some big problems: * It generates UAV reads that are not needed, which could impact performance * It performs read-modify-write operations on memory that the programmer didn't explicitly write, which could create data races The fix here is somewhat obvious: if the "base" of a swizzle operation on a left-hand side resolves to a pointer in our IR, then we can output a "swizzled store" instead of the read-modify-write dance. We currently keep the read-modify-write around since it is potentially needed as a fallback in the general case. Along the way I also tried to make sure that we handle the case where we have a swizzle of a swizzle on the left-hand side: ```hlsl buffer[index].xz.y = value; ``` That code should behave the same as `buffer[index].z = value`. I am currently detecting and cleaning up this logic in the lowering path for `SwizzleExpr`, because that is the only place in the lowering logic that "swizzled l-values" currently get created.
* Introduce an IR-level type system (#481)Tim Foley2018-04-11
| | | | | | | | | | | | | | | | | * Introduce an IR-level type system Up to this point, the Slang IR has used the front-end type system to represent types in the IR. As a result (but ultimately more importantly) the IR representation of generics and specialization has used AST-level concepts embedded in the IR. For example, to express the specialization of `vector<T,N>` to a concrete type `float` for `T`, we needed an IR operation that could represent the specialization, with operands that somehow represented the type argument `float`. The whole thing was very complicated. The big idea of this change is to introduce a new representation in which types in the IR are just ordinary instructions, so that using them as operands makes sense. The hierarchy of IR types closely mirrors the AST-side hierarchy for now, and that will probably be something we should maintain going forward. In order to make these changes work, though, I also had to do major overhauls of things like the way substitutions are performed, how we check interface conformances, the way lookup through interface types is done, etc. etc. This is a big change, and unfortunately any attempt to summarize it in the commit message wouldn't do it justice. * Fix 64-bit build warning * Fix up some clang warnings/errors
* Falcor fixes (#479)Tim Foley2018-04-05
| | | | | | | | | | | | | | | | | | | * Don't emit interpolation modifiers on GLSL fields The previous change that started passing through interpolation modifiers didn't account for the fact that the GLSL grammar doesn't allow interpolation modifiers on `struct` fields, so we shouldn't emit them in that case (and our legalization strategy for GLSL guarantees that varying input will never use a `struct` type anyway). * Try to handle SV_Position semantic on GS input HLSL allows `SV` semantics to be used for ordinary inter-stage dataflow in some cases (e.g., a VS can output `SV_Position` and it can then be read from a GS). GLSL allows similar things with `gl_Position`, but there are some wrinkles. One fix here is to correctly identify that `gl_FragCoord` should only be used as the translation of `SV_Position` for a fragment shader input (and not in the general case of *any* input). The other "fix" here is a kludge to handle the fact that the right translation for a GS input is not to an array called `gl_Position`, but to the syntax `gl_in[index].gl_Position` (array-of-structs style). I am doing this by attaching a custom decoration to the global variable we create for `gl_Position` and then intercepting it during code emit. I'm not proud of this, and would like to do something better given time. * Fix GLSL output for matrix-scalar multiplication The output logic was assuming that any use of `operator*` in the input code that yields a value of matrix type must be translated to a `matrixCompMult()` call in GLSL, but this should really only apply if both of the *operands* are matrices (not just based on the result type). As a result matrix-times-scalar operations were emitting a call to `matrixCompMult()` and GLSL was complaining because it won't implicitly promote scalars to matrices.
* IR: next phase of "everything is an instruction" (#433)Tim Foley2018-03-03
| | | | | | | | | | | | | The main practical change here is that things that used to be `IRValue`s, like literals, are now being expressed as instructions in the global scope. In order to validate that things are actually being handled correctly, this change introduces an explicit "validation" pass that can be run on the IR to check for different invariants (although it doesn't check many of the important ones right now). I've left the validation pass turned off by default, but with a command-line flag to enable it. We may want to make it be on by default in debug builds, just to keep us honest. The main invariant for the moment is that when on IR instruction is used as an operand to another, it had better come from the same IR module. Some of the existing passes were violating this rule, in particular when it came to cloning of witness tables related to global generic parameter substitution. Those features can in theory be handled better now by allowing `specialize` instructions at other scopes, but I didn't want to over-complicate this change, so I make just enough fixes to ensure that these steps always clone witness tables they get from the "symbols" on an IR specialization context. In order for this to work when recursively specializing, I had to ensure that the logic for generic specialization had a notion of a "parent" specialization context that it would fall back to to perform cloning when necessary. This change keeps the logic that was caching and re-using the instructions for literal values within a module, but adds some logic that isn't really being tested right now for picking the right parent instruction to insert a constant instruction into. This logic doesn't trigger right now because all of the cases we are using it on have zero operands (and so they always get "hoisted" to the global scope), but eventually for things like types we want to be able to support instructions with operands (e.g., `vector<float, 4>`) and handle the case where some of those operands come from different scopes (e.g., when nested inside a generic). The final change here is mostly cosmetic: the `IRBuilder` is now more abstract about where insertion occurs: it tracks a single `IRParentInst` to insert into, and then an optional `IRInst` to insert before. In the common case, that parent is an `IRBlock`, but it could conceivably also be the global scope, or a witness table, etc. Use sites where we used to change those fields directly now use distinct methods `setInsertInto(parent)` and `setInsertBefore(inst)` which capture the two cases we care about. Accessors are also defined to extract the current block (if the current parent is a block), and the current "function" (global value with code, if the current parent is a global value with code, or a block inside one). With this work in place, it should be possible for a follow-on change to start putting `specialize` instructions at the global scope and thus clean up some of the on-the-fly specialization work. This work should also help with some of the requirements around a distinct IR-level type system and more explicit generics.
* IR: "everything is an instruction" (#432)Tim Foley2018-03-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * IR: "everything is an instruction" This change tries to streamline the representation of the IR in the following ways: * Every IR value is an instruction (there is no `IRValue` type any more) * All IR values that can contain other values share a single base (`IRParentInstruction`) * Dynamic casts to specific IR instruction types can be accomplished with a new `as<Type>(inst)` operation, that uses the IR opcode to implement casts. The biggest change in terms of number of lines is getting rid of `IRValue`. The diff here could probably be smaller if I'd just done `typedef IRInst IRValue;`. Along the way I also renamed the `getArg`/`getArgs`/`getArgCount` combination over to `getOperand`/`getOperands`/`getOperandCount` to avoid being confusing when we have something like a `call` instruction where the "arguments" of the call don't line up with the operands of the instruction. I also tried to clean up the representation of lists of child instructions to try to make it easier to iterate over them with C++ range-based `for` loops. Developers still need to be careful about mutating the contents of a block while iterating over it in this fashion (e.g. if you remove the "current" element, the iteration will end prematurely). Probably the thorniest change here is that parameters are now just represented as the first N instructions in a block, which means: * We need to perform a linear search to find the end of the parameter list. This is probably not often a problem, because usually you would be iterating over the parameters anyway, and that will be linear in the number of parameters. * Algorithms that iterate over a block either need to ignore parameters, treat parameters just like other instructions, or somehow cleave the list into the range of parameters, and the range of "ordinary" instructions (which involves the same linear search above). * When inserting into a block, we need to be careful not to insert instructions at invalid locations (e.g., insert a temporary before the parameters, or insert a parameter in the middle of the code). I can't pretend that I've handled the details of that here. (This is no different than having to make the same adjustments for phi nodes in a typical SSA representation) * One possible future-proof approach is to implement a pass that sorts the instructions in a block so that parameters always come first. That would let us implement passes without caring about this detail, and then clean up right before any pass that cares about the relative order of parameters and other instructions. The current change is missing any work to make literals and other instructions that used to be `IRValue`s properly nest inside of their parent module. Right now these instructions are just left unparented, and may actually end up being shared between distinct modules. Fixing that will need a follow-up change. The biggest challenge there is that it introduces instructions at the global scope that aren't `IRGlobalValue`s. This change doesn't try to take advantage of any of the new flexibility (e.g., by nesting `specialize` instructions inside of witness tables). The goal is to do exactly what we were doing before, just with a different representation. * Warning fix
* Make `IRGlobalValue::mangledName` a `Name*`Yong He2018-02-22
| | | | This allows us to get rid of `IRGlobalValue::dispose()`.
* Initial work on validating "constexpr"-ness in IR (#420)Tim Foley2018-02-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Initial work on validating "constexpr"-ness in IR The underlying issue here is that certain operations in the target shading languages constrain their operands to be compile-time constants. A notable example is the optional texel offset parameter to the `Texture2D.Sample` operation. When calling these operations in GLSL, the user is required to pass a "constant expression," and any variables in that expression must therefore be marked with the `const` qualifier (and themselves be initialized with constant expressions). Any GLSL output we generate must of course respect these rules. When calling these operations in HLSL, the user is not so constrained. Instead, they can pass an arbitrary expression, which may involve ordinary variables with no particular markup, and then the compiler is responsible for determining if the actual value after simplification works out to be a constant. In some cases, the requirement that a value be constant might actually trigger things like loop unrolling. Also, it is okay to use a function parameter to determine such a constant expression, as long as the argument turns out to be a constant at all call sites. The way we have decided to tackle these challenges in Slang is that we we propagate a notion of `constexpr`-ness through the IR. This is currently being tackled in `ir-constexpr.cpp` with a combination of forward and backward iterative dataflow: * When the operands to an instruction are all `constexpr`, and the opcode is one we believe can be constant-folded, then we infer that the instruction *can* be evaluated as `constexpr` * When instruction is required to be `constexpr`, then we infer that all of its operands are also required to be `constexpr`. If this process ever infers that a function parameter is required to be `constexpr`, then we might have to continue propagation at all the call sites to that function. If after all the propagation is done, there are any cases where an instruction is *required* to be `constexpr`, but it *can't* be `constexpr` (we weren't able to infer `constexpr`-ness for its operands), then we issue an error. This implementation encodes the idea of `constexpr`-ness in the IR as part of the type system, using a simplified notion of rates. This change adds a `RateQualifiedType` that can represent `@R T`, and then introduces a `ConstExprRate` that can be used for `R`. Many accessors for the type information on IR nodes were updated to distinguish when one wants the "full" type of an IR value (which might include rate information) vs. just the "data" type. A `constexpr` qualifier was added in the front-end, and is being used to decorate the texel offset parameter for `Texture2D.Sample`. Lowering from AST to IR looks for this qalifier and infers when a function parameter must be typed as `@ConstExpr T` instead of just `T`. There are lots of limitations and gotchas in the implementation so far: * The `@ConstExpr` rate is the only one added in this change, but it seems clear that the conceptual `ThreadGroup` rate that was added to represent `groupshared` should probably get folded into the representation. * I'm not 100% pleased with how many places in the IR I have to special-case for rate-qualified types. At the same type, pulling out rate as a distinct field on `IRValue` would probably require that we pay attention to rate everywhere. * I've added a test case to show that we can issue errors when users fail to provide a constant expression for the texel offset, but the actual error message isn't great because it doesn't indicate *why* a constant expression was required. Realistically the "initial IR" should contain a few more decorations we can use to relate error conditions back to the original code (even if this is in a side-band structure). * I've added a test case that is supposed to show that we can back-propagate `constexpr`-ness to local variables, and I've manually confirmed that it works for Vulkan/SPIR-V output, but the level of Vulkan support in `render_test` today means I can't enable the test for check-in. * While I'm attempting to propagate `@ConstExpr` information from callees to callers, I haven't implemented any logic to specialize callee functions based on values at call sites. * In a similar vein, there is no handling of control-flow dependence in the current code. If we infer that a phi (block parameter) needs to be `@ConstExpr`, then it isn't actually enough to require that the inputs to the phi (arguments from predecessor blocks) are all `@ConstExpr` because we also need any control-flow decisions that pick which incoming edge we take to be `@ConstExpr` as well. * As a practical matter, implicit propagation of `@ConstExpr` from a function body to a function parameter should only be allowed for functions that are "local" to a module. Any function that might be accessed from outside of a module should really have had its `@ConstExpr` parameter marked manually, and our pass should validate that they follow their own rules. Right now we have no kind of visibility (`public` vs `private`) system, so I'm kind of ignoring this issue. While that is a lot of gaps, this is also just enough code to get the Falcor MultiPassPostProcess example working, so I'm inclined to get it checked in. * Fixup: missing expected output for test * Fixup: disable test that relies on [unroll] for now
* more to fixing memory leaksYong He2018-02-19
| | | | | 1. reorder destruction order of several key classes to avoid using deleted IR objects when destroying Types 2. remove Session::canonicalTypes and make each Type own a RefPtr to the canonicalType, to allow types to be destroyed along with each IRModule it belongs to.
* Fix IR memory leaks.Yong He2018-02-19
| | | | | | | | | 1, make IRModule class own a memory pool for all IR object allocations 2. For now, we allow IR objects to own other (externally) heap allocated objects, such as String, List and RefPtrs by tracking all IR objects that has been allocated for the IRModule in a list named `IRModule::irObjectsToFree`. and call destructor for all these objects upon the destruction of the IRModule. In the long term, we should eliminate the use of all these externally allocated types in IR system and get rid of this tracking and explicit destructor calls. 3. remove non-generic `createValueImpl` functions and retain only generic versions in IRBulider so we can properly call the constructor of the IR types to set up virtual tables correctly for destructor dispatching. 4. add `MemoryPool` class for allocation of the IR objects. 5. Make sure we are disposing IRSpecContexts when we are done with the specialized IR module. 6. Add `_CrtDumpMemoryLeaks()` calls to check memory leaks upon destruction of a Slang session. If we are to support multiple sessions at a time, this call should probably be replaced with the more advanced MemoryState versions of the memory leak checker.
* Implement IR-level translation of system values for GLSL (#413)Tim Foley2018-02-16
| | | | | The approach here isn't ideal. We already have a pass that transforms HLSL varying input/output types into GLSL global variables. This change makes it so that when those inputs/outputs have system-value semantics, we generate a global variable declaration with the appropriate `gl_*` name (leaving the type the same for now). Later, when emitting code, we just skip emitting declarations for declarations with mangled names that start with `gl_*`. A more complete implementation will be needed later on, which handles cases where the translation requires types to be changed (so that conversion code needs to be inserted).
* Fix a bug in IR use-def information (#406)Tim Foley2018-02-13
| | | | | | | | | | | The basic problem here is that when unlinking an `IRUse` from the linked list of uses, there were several cases where I was failing to set the `prevLink` field of the next node to match the `prevLink` field of the node being removed. That doesn't show up when walking the linked list of uses forward, but it breaks it whenever you have subsequent unlinking operations. This change fixes the bugs of that kind I could find, and also adds a debug validation method to try to avoid breaking it again. I also made more access to `IRUse` go through accessor methods rather than using fields directly, to try to avoid this kind of error. I stopped short of making anything `private`, because I tend to find that it creates more hassles than it avoids. A few other fixes along the way: - Made the `List<T>` type default-initialize elements when you resize it. I hadn't realized we weren't doing that. - Add a standalone `dumpIR(IRGlobalValue*)` so help when debugging issues.
* Basic IR support for `static const` globals (#404)Tim Foley2018-02-08
| | | | | | | | | | | | | | * Basic IR support for `static const` globals Our strategy for lowering global *variables* can fall back to putting their initialization into a function, but that isn't really appropriate for global constants (it also isn't appropriate for arrays, but we'll need to deal with that seaprately). This change adds a distinct case for global constants (rather than treating them as variables), and forces the emission logic to always emit them as a single expression. Doing this makes assumptions about how the IR for these constants gets emitted (and what optimziations might do to it). In order to make things work, I had to switch the handling of initializer-list expressions to not be lowered via temporaries and mutation (since that isn't a good fit for reverting to a single expression). I've added a single test case to ensure that this works in the simplest scenario. My next priority will be to see if this unblocks my work in Falcor. * Fixup: bug fixes
* Generate SSA form for IR functions (#400)Tim Foley2018-02-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Generate SSA form for IR functions The basic idea here is simple: in the front-end after we have lowered the AST to initial IR we will apply a set of "mandatory" optimization passes. The first of these is to attempt to translate the all functions into SSA form so that they are amenable to subsequent dataflow optimizations. Eventually, the mandatory optimization passes would include diagnostic passes that make sure variables aren't used when undefined, etc. Just doing basic SSA generation already cleans up a lot of the messiness in our IR today, because constructs that used to involve many local variables can now be handled via SSA temporaries. The implementation of SSA generation is in `ir-ssa.cpp`, and it follows the approach of Braun et al.'s "Simple and Efficient Construction of Static Single Assignment Form." I used this instead of the more well-known Cytron et al. algorithm because Braun's algorith mis very simple to code, and does not require auxiliary analyses to generate the dominance frontier. The main wrinkle in our SSA representation right now is that instead of using ordinary phi nodes, we instead allow basic blocks to have parameters, where predecessor blocks pass in different parameter values. This encodes information equivalent to traditional phi nodes, but has two (small) benefits: 1. There is no fixed relationship between the order of phi operands and predecessor blocks, so we don't have to worry about breaking the phis when we alter the order in which predecessors are stored. This is important for us because predecessors are being stored implicitly. 2. It is easy to operationalize a "branch with arguments" either when lowering to other languages, or when interpreting the IR. A branch with arguments is implemented as a sequence of stores from the arguments to the parameters of the target block (very similar to a call), followed by a jump to the block. Relevant to the above, this change also adds an interface for enumerating the predecessors or successors of a block in our CFG. Rather than use an auxliary structure, we directly use the information already encoded in the IR: * The sucessors of a block are the target label operands of its terminator instruction. In our IR this is a contiguous range of `IRUse`s, possible with a stride (to account for the way `switch` interleaves values and blocks). * The predecessors of a block are a subset of the uses of the block's value. Specifically, they are any uses that are on a terminator instruction, and within the range of values that represent the successor list of that instruction. One important limitation of the "blocks with arguments" model for handling phis is that it is really only convenient to stash extra arguments on an unconditional terminator instruction. This change works around this prob lem by breaking any "critical edges" - edges between a block with multiple successors and one with multiple predecessors. We assume that "phi" nodes will only ever be needed on a block with multiple predecessors, and because critical edges are broken, each of these predecessors will then have only a single successor, so its branch instruction can handle the extra arguments. This change introduces a notion of an "undefined" instruction in the IR. This is handled as an instruction rather than a value because I anticipate that we will want to distinguish different undefined values when it comes time to start issuing error messages (those messages will need to point to the variable that was used when undefined). * Fix expected test output. Another change was merged that enabled the `glsl-parameter-blocks` test, and its output is affected by our IR optimization work.
* Support __target_intrinsic modifiers in IR codegen (#401)Tim Foley2018-02-07
| | | | | | | | | | | The standard library already has a bunch of these decorations, since they were added to support Slang->Vulkan codegen on the AST-to-AST path. This change makes the IR code generator able to exploit the modifiers so that we pick up a bunch of Vulkan support "for free" in the short term. The basic change is in `lower-to-ir.cpp` where we copy over any `TargetIntrinsicModifier`s to become `IRTargetIntrinsicDecoration`s with the same information. We then need a bit of logic in `ir.cpp` to make sure we clone them as needed. The core work of using the modifiers is in `emit.cpp`, where I basically just copy-pasted the existing logic that applied in the AST path (all the AST-related code there is dead, and we should clean it up soon). The big change that comes with this logic is that when dealing with a member function, the numbering of the argument used in the intrinsic definition string changes, so that `$0` refers to the base object (whereas before the base object was looked up via the base expression of a `MemberExpr` used for the function). This requires a bunch of the definitions in the library to be updated; hopefully I caught them all. For kicks, I've re-enabled a cross-compilation test just to confirm that we are generating valid SPIR-V for code that performs texture-fetch operations. I don't expect us to keep that test enabled as-is in the long term, though, because it would be much better to instead use render-test to do the same thing. Alas, beefing up the Vulkan support in render-test is an outstanding work item, and I didn't want to pollute this change with more work along those lines.
* Improvements and bug fixes for global type parametersYong He2018-01-21
| | | | | | 1. allow spReflection_FindTypeByName to accept arbitrary type expression string 2. allow const int generic value to be used as expression value, and as array size 3. various bug fixes in witness table specialization / function cloning during specializeIRForEntryPoint to avoid creating duplicate global values, not copying the right definition of a function from the other module, not cloning witness tables that are required by specializeGenerics etc.
* Support nested genericsYong He2018-01-12
| | | | fixes #362