path: root/source
Age  Commit message  Author
2020-09-18  Serialization fixes based on review of #1547 (#1551)  jsmall-nvidia
* Test if blob is returned.
* Rename serialize files so can be grouped.
* StringRepresentationCache -> SerialStringTable
* Split out SerialStringTable from slang-serialize-ir
* First pass at reorganizing serialization/containers. Remain some issues about debug info.
* Fix bug in calculating sourceloc.
* Improve calcFixSourceLoc
* Make allocations for payload RiffContainer align to at least 8 bytes. This is important for read, if the payload can contain 8 byte aligned data. Note this has no effect on Riff file format alignment rules.
* Improve comments around RiffContainer and alignment.
* Remove SerialStringTable, can just use StringSlicePool instead.
* Add flags to control what is output in SerialContainer. Turn off AST output for obfuscated code. Lazily create astClasses when doing write container serialization.
* Typo fix for Clang/Linux.
* Fixes that came out of review
* TranslationUnit -> Module
* TargetModule -> TargetComponent
* PAYLOAD_MIN_ALIGNMENT -> kPayloadMinAlignment
2020-09-18  Control container serialization with SerialOptionFlags (#1550)  jsmall-nvidia
* Test if blob is returned.
* Rename serialize files so can be grouped.
* StringRepresentationCache -> SerialStringTable
* Split out SerialStringTable from slang-serialize-ir
* First pass at reorganizing serialization/containers. Remain some issues about debug info.
* Fix bug in calculating sourceloc.
* Improve calcFixSourceLoc
* Make allocations for payload RiffContainer align to at least 8 bytes. This is important for read, if the payload can contain 8 byte aligned data. Note this has no effect on Riff file format alignment rules.
* Improve comments around RiffContainer and alignment.
* Remove SerialStringTable, can just use StringSlicePool instead.
* Add flags to control what is output in SerialContainer. Turn off AST output for obfuscated code. Lazily create astClasses when doing write container serialization.
* Typo fix for Clang/Linux.
2020-09-17  Initial attempt to enable CUDA dynamic dispatch codegen (#1549)  Yong He
* Front-load cuda module loading to fill in RTTI pointers.
* Enable dynamic dispatch codegen for CUDA.
2020-09-17  Share debug information between AST and IR (#1547)  jsmall-nvidia
* Test if blob is returned.
* Rename serialize files so can be grouped.
* StringRepresentationCache -> SerialStringTable
* Split out SerialStringTable from slang-serialize-ir
* First pass at reorganizing serialization/containers. Remain some issues about debug info.
* Fix bug in calculating sourceloc.
* Improve calcFixSourceLoc
* Make allocations for payload RiffContainer align to at least 8 bytes. This is important for read, if the payload can contain 8 byte aligned data. Note this has no effect on Riff file format alignment rules.
* Improve comments around RiffContainer and alignment.
* Remove SerialStringTable, can just use StringSlicePool instead.
* Typo fix for Clang/Linux.

Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-09-17  Embed default prelude for CUDA (#1546)  Tim Foley
* Embed default prelude for CUDA

Slang supports the notion of a "prelude" that gets prepended to the source code we generate in a given target language. For some targets, a prelude is not necessary (e.g., we compile to HLSL/GLSL and then on to DXBC/DXIL/SPIR-V just fine without a prelude), but some targets have been implemented in a way that makes a prelude necessary (notably CPU and CUDA).

For the targets that require a prelude, the Slang codebase includes usable preludes under the `prelude/` directory. Prior to this change, if a user was compiling for such a target (whether via command-line or API), they had to take responsibility for specifying the prelude to use (usually by passing in the contents of the prelude file(s) already included in the Slang distribution).

It is reasonable for a user to expect an out-of-the-box experience where compilation to CUDA PTX or native CPU code should Just Work, similarly to how compilation to SPIR-V Just Works. This change is a step in the direction of providing a user experience that Just Works for common cases.

The main addition here is a tool called `slang-embed` that we run during our build to turn the `prelude/*.h` files into `prelude/*.h.cpp` files that embed the contents of the original `.h` file as a `const` variable. By compiling and linking in the generated `.h.cpp` file for the CUDA prelude, we are then able to set the default prelude to use for CUDA at the time a session/linkage is created. That default prelude will be used unless the user manually specifies their own prelude (which current users of the CUDA back-end must be doing).

This change only sets up a default prelude for CUDA because of the way that the CPU prelude is split across multiple files. A strategy that provides a good default prelude for CPU may take more work, but that work might also be unnecessary if we switch to a strategy of using LLVM to generate native code.

The implementation of the `slang-embed` tool is intentionally simple, and it will likely run into issues if/when we need to embed binary files or larger text files. The assumption being made here is that we can address those issues when they arise, and there is no reason to over-engineer the tool right now.

The way that `slang-embed` is integrated into our build process is likely to require some iteration to make sure that it works across all platforms. I expect that this change will have multiple follow-up fixes related to trying to get the build to work as expected across all targets on CI.

* fixup: trying to ensure that embedded prelude gets compiled into slang
* fixup: properly clean up allocations in slang-embed
* fixup: fix double free introduced by previous change
* fixup: off-by-one allocation error
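A minimal Python sketch of what an embedding step like this could look like, assuming the generated file simply defines a `const` C string. The function name `embed_as_cpp`, the variable name `kPrelude`, and the exact output layout are hypothetical; the real `slang-embed` tool may emit something different.

```python
def escape_for_c_string(text):
    """Escape characters that cannot appear raw inside a C string literal."""
    replacements = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    return "".join(replacements.get(ch, ch) for ch in text)


def embed_as_cpp(contents, variable_name):
    """Return C++ source that defines `variable_name` as a const string.

    Hypothetical output format: one adjacent string literal per input line,
    which the C++ compiler concatenates into a single string.
    """
    lines = ["const char* %s =" % variable_name]
    pieces = contents.split("\n")
    for index, piece in enumerate(pieces):
        is_last = index == len(pieces) - 1
        text = piece if is_last else piece + "\n"
        if text:
            lines.append('    "%s"' % escape_for_c_string(text))
    if len(lines) == 1:
        lines.append('    ""')  # empty input still needs one literal
    lines[-1] += ";"
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(embed_as_cpp('#include "cuda.h"\nint a;\n', "kPrelude"))
```

Emitting one literal per source line keeps the generated file readable, but as the message notes, a text-only approach like this would not cope with binary files.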
2020-09-17  Fix an issue with double-counting uniform data for CUDA/CPU (#1545)  Tim Foley
The `SimpleScopeLayoutBuilder` helper that is used to build up binding information for entry-point parameter lists has logic to try to support both explicit and implicit binding of parameters. This logic was added as part of supporting dual-source color blending on Vulkan. The basic approach is similar to that used for the global scope, where parameters with explicit binding first "carve out" the ranges they claim via a `UsedRangeSet`, and then parameters without explicit binding allocate space from what is left.

The logic is (seemingly by accident) also applied to uniform/ordinary data, which creates a problem because the `ScopeLayoutBuilder` base type is also responsible for computing a layout for uniform/ordinary data that is 100% implicit (while dealing with all the relevant alignment restrictions). That logic goes on to add the computed uniform/ordinary resource usage to the computed type layout, but because such a layout has already been computed (albeit without taking alignment into account), the result is that the uniform/ordinary usage is reported at approximately double what it should be.

The fix here is to skip uniform/ordinary resource usage when doing the explicit/implicit dance in `SimpleScopeLayoutBuilder`. This approach means that explicit bindings on entry-point `uniform` parameters will only apply to resources (which matches our rules for the global scope, where we don't allow for explicit binding on uniform/ordinary parameters). This is appropriate since the only reason we support explicit layout at all is for dual-source color blending (in general, we only support explicit `register` and `[[vk::binding(...)]]` modifiers on global parameters; users are stuck with our computed layouts in all other cases).

Co-authored-by: Yong He <yonghe@outlook.com>
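The "carve out, then fill in" approach described above can be sketched roughly as follows. This is a simplified Python model, not the Slang implementation: the class below only borrows the name `UsedRangeSet`, and `assign_bindings` with its tuple format is a hypothetical helper. Parameters of kind `"uniform"` are skipped, mirroring the fix.

```python
class UsedRangeSet:
    """Tracks claimed half-open [begin, end) ranges of binding slots."""

    def __init__(self):
        self.ranges = []  # sorted, non-overlapping (begin, end) pairs

    def add(self, begin, end):
        """Claim [begin, end), merging with any range it overlaps or touches."""
        kept = []
        for b, e in self.ranges:
            if e < begin or b > end:
                kept.append((b, e))
            else:
                begin, end = min(begin, b), max(end, e)
        kept.append((begin, end))
        self.ranges = sorted(kept)

    def allocate(self, count):
        """Claim the lowest free run of `count` slots and return its start."""
        candidate = 0
        for b, e in self.ranges:
            if candidate + count <= b:
                break
            candidate = max(candidate, e)
        self.add(candidate, candidate + count)
        return candidate


def assign_bindings(params):
    """params: (name, kind, count, explicit_index_or_None) tuples."""
    used = {}  # one range set per resource kind
    result = {}
    # Uniform/ordinary data is laid out elsewhere, so leave it out entirely.
    resources = [p for p in params if p[1] != "uniform"]
    # Pass 1: explicit bindings carve out the ranges they claim.
    for name, kind, count, explicit in resources:
        if explicit is not None:
            used.setdefault(kind, UsedRangeSet()).add(explicit, explicit + count)
            result[name] = explicit
    # Pass 2: implicit bindings take the lowest space that is left.
    for name, kind, count, explicit in resources:
        if explicit is None:
            result[name] = used.setdefault(kind, UsedRangeSet()).allocate(count)
    return result
```

Running both passes over uniform data as well would account for it a second time on top of the base builder's own layout, which is the double-counting the commit removes.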
2020-09-16  Search for multiple NVRTC versions (#1543)  Tim Foley
* Search for multiple NVRTC versions

The main change here is that when locating the NVRTC compiler we try multiple library names and take the first one that loads successfully (with an ordering that means we try newer versions before older ones).

In order to support this change, I needed to fix the wrapping logic that invokes the downstream compiler "locator" function, so that it does not report every failed dynamic library load as an error diagnostic (leading to compilation failure), but instead only reports such failures once the locator has reported failure.

The form of the diagnostic output for failures is also changed, in that we now report a single umbrella error about failing to load a downstream compiler, and then report the actual dynamic library load failures as notes on that diagnostic instead of errors of their own. This choice seems appropriate since for cases like NVRTC it is *not* the case that each failed library load is a compilation error. We only need one of the listed libraries to be loadable, so that reporting them all as errors risks confusing users.

One wrinkle that arose during testing is that the 11.0 release of NVRTC dropped support for the `compute_30` target, which had previously been the minimum and default. I had to add logic to check for versions of 11 or greater and switch to `compute_35` as the default. Similar changes may be required as part of supporting newer NVRTC versions if support for more architectures gets deprecated and removed.

A more complete implementation of this logic might try to load multiple NVRTC versions such that the Slang compiler can identify a suitable compiler based on the minimum feature level that code actually requires. That kind of cleanup is left as future work, since for most users the current approach will be sufficient.

* testing: use verbose mode for running tests by default
* fixup: guard against null diagnostic sink
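The locator behavior described above can be sketched in Python as follows. The function names, the diagnostic wording, and the `try_load` callback are all assumptions made for illustration; the sketch only mirrors the stated policy (first successful load wins, failures become notes under a single error, and the default target depends on the major version).

```python
def locate_compiler(candidate_names, try_load):
    """Try each library name in order (newest first) and return the first
    handle that loads, plus the diagnostics to report.

    `try_load` is a hypothetical callback that returns a handle or raises
    OSError. Individual load failures are only reported if every candidate
    fails, and then only as notes attached to one umbrella error.
    """
    notes = []
    for name in candidate_names:
        try:
            return try_load(name), []
        except OSError as error:
            notes.append("note: failed to load '%s': %s" % (name, error))
    return None, ["error: failed to load downstream compiler"] + notes


def default_compute_target(major_version):
    """NVRTC 11.0 dropped `compute_30`, so newer versions default to `compute_35`."""
    return "compute_35" if major_version >= 11 else "compute_30"
```

Deferring the notes until the whole search has failed is what keeps a missing newer library from turning into a compilation error when an older one is available.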
2020-09-14  Support shader parameters that are an array of existential type. (#1542)  Yong He
* Support shader parameters that are an array of existential type.
* Rename to getFirstNonExistentialValueCategory

Co-authored-by: Yong He <yhe@nvidia.com>
2020-09-14  Dynamic dispatch bug fixes. (#1541)  Yong He
Co-authored-by: Yong He <yhe@nvidia.com>
2020-09-14  Change the layout we compute/store for parameter groups (#1540)  Tim Foley
The type layouts we store for parameter group types (`ConstantBuffer<T>` and `ParameterBlock<T>`) have a somewhat complicated internal structure, which we've slowly built up and evolved as we learned more about what was actually needed:

* There is the outer `ParameterGroupTypeLayout` that represents the whole type (e.g., the `ParameterBlock<Something>`), and has resource usage/bindings based on what needs to be accounted for by anything (like a program) that uses the type. E.g., for a parameter block on Vulkan, the resource usage at this level would usually just be a single descriptor `set`.

* There is the inner "element" layout, which represents the layout for the `Something` in `ParameterBlock<Something>`. This was initially just stored as a type layout (and an extra type layout is stored for backwards compatibility), but we later realized we needed to store a `VarLayout` for the element, to deal with the fact that it might have a non-zero offset.

* Finally, there is the inner "container" layout, which represents the resource usage/bindings that are introduced by the block/buffer/group itself. In the case of a `ParameterBlock<Something>` this would include any "default" constant buffer that is needed in order to store the uniform/ordinary data from type `Something`. On targets like Vulkan and D3D12 such a buffer would not show up as part of the resource usage of the overall `ParameterBlock`, nor would it be expected to show as part of the "element" layout.

The above is just setting the stage so that we can cover the design choice that this change centers around: for each of the above layouts, what should the *type* stored with that layout be?

The answers seem simple at first:

* The type for the outer `ParameterGroupTypeLayout` should clearly be the whole type (e.g., the `ParameterBlock<Something>`)

* The type for the inner "element" layout should clearly be the element type (e.g., the `Something` in `ParameterBlock<Something>`)

* The type for the inner "container" layout should be... hmm...

That last question is the thorny one. There are two main options, each with trade-offs:

1. What is being done in the code before this change is to store the whole type (e.g., `ParameterGroup<Something>`) as the type of the "container" layout. This makes some superficial sense (the type of the container should be a container type).

2. What this change switches to is the type of the "container" layout being null (it could equivalently be any sentinel that represents the absence of a meaningful type).

While option (1) seems like it would make sense, it risks creating an infinite regress for client application code. If they have a recursive routine that walks the Slang reflection hierarchy, then it will probably key off of the kind of each type it visits. Such a recursive walk would end up trying to treat the outer layout and the inner "container" layout equivalently, when they aren't really representing the same concepts. Even if it seems like this approach defends against null-pointer crashes in client code, it really only delays them, since the inner "container" layout would yield a null type layout when asked for its element layout.

In contrast, option (2) more accurately reflects the reality that the container layout is a `VarLayout` and `TypeLayout` that correspond to no variable/type in practice. Clients of the Slang reflection API already have to deal with `VarLayout`s that have no variable, so it is reasonable for them to deal with `TypeLayout`s that have no type.

While the above statements may sound strange, it really comes down to the fact that a "type layout" is really just a way of encoding the "size" of something (where size can encapsulate all the different kinds of resources something can consume on our various targets), and a "variable layout" is really just a way of encoding the "offset" of something (again, where there can be different offsets per consumable resource). In that light, it makes sense that the "container" layout for a parameter group is really just a way of representing the resource allocation of the container itself, and is not associated with any type or variable.

This change is technically a breaking change for clients of the reflection API, so it will need to be rolled out with an appropriate change to our version number.
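To make the client-side consequence of option (2) concrete, here is a small Python model of a recursive reflection walk. The `TypeLayout` class and its fields are hypothetical simplifications (the real API stores the element as a `VarLayout`, for instance); the only point illustrated is that a layout with no type is treated as pure resource usage and ends the recursion.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TypeLayout:
    # `type_name` is None for the "container" layout, which stands for the
    # resources the block itself introduces rather than for any type.
    type_name: Optional[str]
    usage: Dict[str, int] = field(default_factory=dict)
    element: Optional["TypeLayout"] = None
    container: Optional["TypeLayout"] = None


def describe(layout: TypeLayout, depth: int = 0) -> List[str]:
    """Recursively list resource usage, keyed off the type when there is one."""
    indent = "  " * depth
    name = layout.type_name if layout.type_name is not None else "<no type>"
    lines = ["%s%s: %s" % (indent, name, layout.usage)]
    if layout.type_name is None:
        return lines  # nothing to key off of, so there is no regress
    for child in (layout.container, layout.element):
        if child is not None:
            lines += describe(child, depth + 1)
    return lines


block = TypeLayout(
    "ParameterBlock<Something>",
    {"set": 1},
    element=TypeLayout("Something", {"uniform": 16}),
    container=TypeLayout(None, {"constant_buffer": 1}),
)
```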
2020-09-11  Remove some "do what I mean" logic from reflection API (#1539)  Tim Foley
The reflection API had a bit of DWIM (Do What I Mean) logic in that a client could query the resource usage/bindings of a `ParameterBlock<X>` and see not only the register `space` or descriptor `set` for the block itself, but also the constant buffer `register` or `binding` for its default constant buffer (if any).

The reason for this behavior was that there was existing client code in Falcor that relied on that behavior for parameter blocks, and even after changing the way that parameter block layouts were computed and stored we sought to maintain backwards compatibility with that client code.

The trouble is that the weird behavior then goes on to cause confusion for other clients of the Slang reflection API. This change removes the special-case logic, and fixes up our reflection tests to mirror the new (correct) information that we return.

When this change is released, it will be a breaking change for any client code that still relies on the old behavior. We will need to coordinate with client application developers to fix their reflection logic. Note that all the same information can still be accessed, simply by using new reflection API that we have added.
2020-09-10  Allow existential types in `StructuredBuffer` element type. (#1536)  Yong He
* Allow existential types in `StructuredBuffer` element type.
* Handle StructuredBuffer.Load/.Consume methods
* Clean up unnecessary changes
* Code cleanup
* Update test comment
2020-09-10  Add a pass to support resource return values (#1537)  Tim Foley
A long-standing problem for the Slang implementation has been that some targets (notably GLSL/SPIR-V) do not support treating resources (textures, buffers, samplers, etc.) as first-class types. Resource types on such platforms are restricted so that they may not be used as the type of:

1. fields of aggregate types (`struct`s)
2. local variables
3. function results or `out`/`inout` parameters

Issue (1) is handled by our "type legalization" pass today, by splitting aggregates that contain resources into separate fields/variables/parameters.

Issue (2) is worked around by putting code into SSA form and promoting local variables to SSA temporaries when possible; the net result is that many local variables of texture type are eliminated (that pass is not perfect, though, and it is possible for users to get errors when it doesn't fully clean up local variables of texture type).

Issue (3) is a much more complicated matter, and it is what this change is concerned with.

A typical solution to issue (3) is to simply inline all of the code in a program, at which point function results and `out`/`inout` parameters will no longer exist to cause problems. We reject such solutions for two reasons. First, there are limitations on control-flow structure in HLSL/GLSL/SPIR-V that mean they cannot express certain programs after inlining has been performed. Second, and more importantly, the philosophy of the Slang compiler is to perform as little duplication of code as possible, so that we do not accidentally contribute to binary size bloat.

Instead, this change tackles the problem of functions that output resource types by adding a new specialization pass. The pass detects functions that ought to be specialized (because they have resource-type outputs), and inspects their bodies to see if the values they output have a predictable structure that can be replicated outside of the function body. The same logic that inspects the function body also rewrites (a copy of) the function to not have the offending outputs. Finally, all the call sites to a function that is rewritten in this way also get rewritten so that instead of using output values from the function itself, they reproduce the expected output value(s) in their own code.

The pass as presented here is intentionally limited in the scope of what it can optimize away (and the test case only touches on that specific functionality). The goal is to get a basic version of this pass in place and evaluated, and then to expand on its functionality incrementally over time.
2020-09-08  Test if blob is returned. (#1535)  jsmall-nvidia
2020-09-04  Allow mixing unspecialized and specialized existential parameters. (#1533)  Yong He
* Allow mixing unspecialized and specialized existential parameters.
* Fixes.
2020-09-03  Fix a crashing issue for non-end-to-end compilation (#1532)  Tim Foley
The refactorings that added support for multiple entry points in an output file seemingly introduced a regression such that we crash on compilation that is not "end-to-end." Unfortunately, all of our testing only covers end-to-end compilation, and many users only use that mode.

I've added a fix for the issue I ran into, but I haven't addressed the testing gap in this change. Without adding testing for non-end-to-end compilation, I expect further regressions to slip in over time.

Co-authored-by: Yong He <yonghe@outlook.com>
2020-09-03  Rework type layout for ExistentialSpecializedType (#1531)  Yong He
2020-09-02  Allow unspecialized existential shader parameters (dynamic dispatch). (#1529)  Yong He
* Allow unspecialized existential shader parameters (dynamic dispatch).
* Fixes.
* Fixes
* disable cuda test
2020-09-02  Add support for (undocumented) HLSL 16-bit bit-cast ops (#1528)  Tim Foley
As of SM 6.2, the dxc compiler added support for a set of 16-bit bit-cast operations to mirror the `asuint`, `asfloat`, and `asint` operations that were provided for 32-bit scalar types. These operations are not publicly documented, so we didn't think to add them.

It should be noted that there was already a similar operation in HLSL, called `f32tof16`, that took as input a `float` and then packed a half-precision version of it into the low bits of a `uint`. The problem is that using that operation for `half`->`uint16_t` conversion required a round trip through a `float`, and downstream compilers seemingly can't optimize away that conversion.

This change adds the new operations along with a test that tries to make use of them to ensure the results are what is expected. There are enough cases to cover that I had to write the test in a way where each thread only writes out a subset of the required output.

There are two other changes here that are not directly related to the main feature:

First, it seems like the `[__forceInlineEarly]` attribute on some of these overloads interacts poorly with generics, and results in an `IRVectorType` appearing at local scope in the output code. That is semantically reasonable given our IR model, but it would ideally be something that gets eliminated as a result of deduplication of types. For now I've introduced a slight hack to make types always get inlined into their use sites during emission, which should handle the case of locally-defined types. I'm not 100% happy with that solution, but it seemed better than introducing a bunch of unrelated fixes into this PR.

Second, the way that conversion operations were being declared for matrix types seems to have been incorrect: we had a single *explicit* initializer added to matrix types via an `extension` that allowed them to be initialized from other matrix types with the same size and *any* element type. In order to support implicit conversions of matrix types, I cribbed the code we were already using to introduce implicit conversion operations for vector types.
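The difference between the `f32tof16` route and a direct 16-bit bit-cast can be modeled in Python with the standard `struct` module (format `e` is IEEE 754 half precision). The helper names other than `f32tof16`/`f16tof32` are hypothetical, and a half value is represented here by its two raw bytes since Python has no native half type.

```python
import struct


def f32tof16(value):
    """Model of HLSL `f32tof16`: convert a float to half precision and
    return the half's bit pattern in the low 16 bits of an unsigned int."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def f16tof32(bits):
    """Model of HLSL `f16tof32`: interpret the low 16 bits as a half."""
    return struct.unpack("<e", struct.pack("<H", bits & 0xFFFF))[0]


def pack_half(value):
    """Hypothetical helper: the two raw bytes of a half-precision value."""
    return struct.pack("<e", value)


def bitcast_half_to_u16(half_bytes):
    """A direct 16-bit bit-cast: reinterpret the bytes, with no float step."""
    return struct.unpack("<H", half_bytes)[0]


def bitcast_u16_to_half(bits):
    """The inverse bit-cast, back to the raw bytes of a half."""
    return struct.pack("<H", bits & 0xFFFF)
```

In this model both routes agree on the bit pattern; the bit-cast simply avoids widening to a 32-bit float and narrowing again.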
2020-09-01  Mark f32tof16 and f16tof32 as HLSL intrinsics (#1526)  Tim Foley
Fixes GitLab issue 85

These functions are intrinsic for HLSL, but were not marked as such, leading to emitting code that manually loops for the vector case. The looping code resulted in lower performance for some users, because apparently dxc was unable (or unwilling?) to unroll the loop, and ended up generating temporary ("stack-allocated") arrays for the vectors produced.

As a longer-term solution, we may need to consider how the `VECTOR_MAP...` and `MATRIX_MAP...` idioms used in the stdlib get lowered, so that we can emit fully-unrolled versions in cases where the vector/matrix shape is known at the time we generate code. This PR does not attempt to address that larger issue.
2020-09-01  Support dynamic existential shader parameters in render-test (#1525)  Yong He
* Support dynamic existential shader parameters in render-test
* Fix linux build error.
* Fixes.
* Fix code review issues.
* Fix gcc error.
* More fixes.
* More fixes.
2020-08-31  AST Serialization in Modules (#1524)  jsmall-nvidia
* First pass at filter for AST serial writing.
* Serialization of AST for modules.
* Removed some commented out source.

Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-08-28  Add OrderedDictionary to core. (#1523)  Yong He
2020-08-28  Avoid nondeterministic ordering of output (#1522)  Tim Foley
Most people agree that it is a Good Thing when compilers are deterministic: the exact same input bits produce the exact same output bits every time the compiler is run. Bonus points are awarded if the results are independent of the platform the compiler was compiled for and run on.

One of the easiest kinds of nondeterminism to have sneak into a compiler is for it to produce the "same" code inside functions, but sometimes emit functions or other global symbols in a different order from run to run. Right now, the Slang compiler has some of this kind of nondeterminism.

The main way (but not necessarily the only way) that a compiler ends up producing output with a different ordering across runs is by iterating over the contents of a hash-based container (in our codebase, a `Dictionary` or `HashSet`), where the keys make use of pointers. Most operating systems intentionally try to randomize the address space of processes across runs (as a security feature), so that exact pointer values are not stable across runs, and thus hash values are not stable across runs, and thus the ordering of entries is not stable across runs.

This change identifies a few cases of iterating over dictionaries or sets that could have produced output non-determinism:

* The `HLSLIntrinsicSet` was using a `Dictionary` to store intrinsics that had been referenced, and would later produce a linear list of those intrinsics based on their order in the dictionary.

* The `WitnessTable`s produced by the front-end stored a `Dictionary` of requirements, and lowering from AST->IR was iterating over that dictionary to ensure that everything got emitted.

* The `SharedSemanticsContext` was tracking a `HashSet` of modules that were imported into scope (so that their `extension`s should be visible), and an iteration over that list was used when producing candidate extensions during lookup. This case is unlikely to cause any nondeterminism in final output, but could lead to nondeterministic ordering in diagnostic messages for ambiguous reference/overload cases.

* The IR linker maintains a `Dictionary` of symbols based on their mangled names, and iterates over it in code that clones all witness tables into the linked IR whether or not they are referenced.

For most of these cases the fix is simple:

* Keep both a `Dictionary`/`HashSet` and a `List` of the appropriate type
* Whenever adding to the hash-based container also add to the list
* Whenever iterating, iterate over the list

In the final case of the IR linker, the relevant code was marked with a `TODO` comment noting that it shouldn't actually be needed, so I simply dropped it and the change doesn't seem to break any of our tests. I've been fairly confident that code wasn't needed for a while.

This change isn't exactly elegant, and a better long term solution might be to introduce two new types, `OrderedDictionary` and `OrderedSet`, which are similar to `Dictionary` and `HashSet` except that they guarantee a deterministic order of enumeration of their contents, based on insertion order. (Note that a `SortedDictionary` and/or `SortedSet` that use something like a binary tree to produce a "deterministic" sorted order wouldn't actually help here, because sorting entries by pointer values wouldn't solve the underlying problem that the pointer values aren't stable across runs.) I've chosen to avoid adding new types to `core` in the interest of making the change as small as possible. If we all agree that new types are warranted, it should be easy to clean up these use cases.

Testing this change is difficult, because we can't produce a reliable test to rule out nondeterminism. I have done best-effort testing by hand by crafting shaders that show output nondeterminism, and then compiling them both with and without these changes.
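The "hash container plus list" pattern described above can be sketched in Python as follows. The class name echoes the `OrderedSet` suggested in the message, but this is an illustrative model, not code from the Slang `core` library; `Symbol` is a hypothetical stand-in for an object whose default hash is identity-based and therefore not stable across runs.

```python
class OrderedSet:
    """A set that enumerates its members in insertion order.

    Membership tests use the hash-based set; iteration uses the list, so the
    order never depends on hash values (and hence never on addresses).
    """

    def __init__(self):
        self._members = set()
        self._order = []

    def add(self, item):
        """Add `item` if it is new; return True when it was inserted."""
        if item in self._members:
            return False
        self._members.add(item)
        self._order.append(item)
        return True

    def __contains__(self, item):
        return item in self._members

    def __iter__(self):
        return iter(self._order)

    def __len__(self):
        return len(self._order)


class Symbol:
    """Hypothetical stand-in for a pointer-keyed object (identity hash)."""

    def __init__(self, name):
        self.name = name


def emission_order(symbols):
    """Deduplicate `symbols` and return their names in first-seen order."""
    seen = OrderedSet()
    for symbol in symbols:
        seen.add(symbol)
    return [symbol.name for symbol in seen]
```

The cost is storing every entry twice, which is the inelegance the message acknowledges.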
2020-08-28  Enable lower-generics pass universally. (#1518)  Yong He
* Enable lower-generics pass universally.
* Exclude builtin interfaces and functions from lower-generics pass.
* Update stdlib.
* Fixup.
* Fixes handling of nested intrinsic generic functions.
* Fixes.
* Fixes.
2020-08-27  Enable simple extensions of interface types (#1521)  Tim Foley
The big picture here is that an `extension` can now apply to an interface type and provide convenience methods for all types that implement that interface.

Suppose you have an interface for counters:

    interface ICounter { [mutating] void add(int val); }

and a type that implements it:

    struct SimpleCounter : ICounter { int _state = 0; ... }

If a common operation in your codebase is to increment a counter by adding one, you would be faced with the problem of either:

* Add the `increment()` operation to `ICounter`, and force every implementation to implement the new requirement

* Add the `increment()` operation to concrete counter types as needed, and thus not be able to use it in generic code

* Make `increment()` a global ("free") function, and force clients of counters to have to know which operations use member syntax (`c.add(...)`) and which use global function call syntax (`increment(c)`).

The whole idea of `extension`s is to allow for another option that is better than all of the above:

    extension ICounter { [mutating] void increment() { this.add(1); } }

The core of the implementation is relatively straightforward, and consists of two complementary pieces.

The first piece is that when emitting a concrete IR entity (function/type/whatever) we treat any enclosing `interface` type (or `extension` thereof) a bit like an enclosing `GenericDecl`, and introduce an `IRGeneric` to wrap things. The generic `IRGeneric` has parameters representing the `This` type for the interface, along with the witness table that shows how `This` conforms to the interface itself. We thus end up with an IR version of `increment()` something like:

    void increment<This : ICounter>(This this) { this.add(1); }

The second (complementary) fix is that when there is code that references this `increment()` operation, we don't treat it like an interface requirement (look up based on its key), and instead treat it like a generic (since that is how it is lowered now) and specialize it to the information we can glean from the `ThisTypeSubstitution`.

A related fix that is required here is that within the body of `increment`, when we perform `this.add`, we need to ensure that the lookup of `add` in the base interface properly takes into account the subtype relationship (`This : ICounter`) and encodes it into the lookup result, so that we get `((ICounter) this).add`, and properly generate code that looks up the `add` method in the witness table for `This`.
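The lowering described above, where `increment()` becomes a generic that takes a witness table for `This : ICounter`, can be modeled in Python by passing the witness table explicitly. This is a conceptual sketch only: the dictionary-based witness tables and the `SaturatingCounter` type are assumptions for illustration and do not reflect how Slang IR represents them.

```python
class SimpleCounter:
    def __init__(self):
        self.state = 0


class SaturatingCounter:
    """Hypothetical second implementation, to show the code is shared."""

    def __init__(self, limit):
        self.state = 0
        self.limit = limit


def simple_counter_add(this, val):
    this.state += val


def saturating_counter_add(this, val):
    this.state = min(this.state + val, this.limit)


# Witness tables: how each concrete type satisfies the ICounter requirements.
SIMPLE_COUNTER_IS_ICOUNTER = {"add": simple_counter_add}
SATURATING_COUNTER_IS_ICOUNTER = {"add": saturating_counter_add}


def icounter_increment(witness, this):
    """Model of `increment<This : ICounter>(This this)`: the extension method
    is written once and finds `add` through the witness table for `This`."""
    witness["add"](this, 1)
```

For example, calling `icounter_increment(SIMPLE_COUNTER_IS_ICOUNTER, SimpleCounter())` dispatches to `simple_counter_add` without `increment` ever being added to the interface's requirements.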
2020-08-27Clean up the way that lookup "through" a base type is encoded (#1519)Tim Foley
* Clean up the way that lookup "through" a base type is encoded In order to undestand this change, it is important to undestand how lookup through base interfaces works prior to this change. In order to understand *that* it helps to be reminded of how inheritance relationships get encoded in the AST. Suppose the user writes: struct Base { int val; } struct Derived : Base { ... } ... Derived d = ...; int v = d.val; The question is how an expression like `d.val` gets semantically checked, and how it is encoded into the IR after semantic checking. You might assume it gets checked and encoded so that we end up with: int v = ((Base) d).val; and that seems like it should Just Work... so of course that isn't what Slang has been doing. Instead, we relied on the fact that the inheritance relationship `Derived : Base` is represented as an `InheritanceDecl` member of the `Derived` type, and we ended up checking the code into something like: int v = d.<anonymous>.val; where `<anonymous>` stands in for the name of the `InheritanceDecl` that represents inheritance from `Base`. This design choice makes a limited amount of sense when you consider how inheritance would typically be lowered to a C-like output language: // struct Derived : Base { ... } // => struct Derived { Base base; ... } The problem with that encoding is that it really doesn't make sense for almost any other scenario. In particular, if you have a generic type parameter `T` that was constrianed with `T : ISomething`, then the constraint isn't even technically a *member* of the type parameter `T`, so expressing thing as a member reference in the AST is completely incorrect. Unfortunately, by the time it was clear that we needed something better, a bunch of implementation work was done based on the existing representation. 
This change tries to clean things up so that lookup of a super-type member through a value of a sub-type does the obvious thing: cast the value to the super-type and then look up the member (as in `((Base) d).val`). The core of the change is that in lookup, instead of creating `Constraint` breadcrumbs whenever we are looking up in a super-type (with a reference to the `TypeConstraintDecl` being used) we instead use `SuperType` breadcrumbs (with a reference to a `SubtypeWitness`). Then when we create the expression from a `LookupResultItem`, we translate any `SuperType` breadcrumbs into `CastToSuperTypeExpr`s (an expression type that already existed). This change also adds support for lookup through the `This` type in the context of an interface, and in order for that to work we need a new kind of subtype witness to represent the knowledge that a `This` type is a subtype of the enclosing interface. Making that work forces us to change the representation of `TransitiveSubtypeWitness` so that it takes a pair of subtype witnesses (and not one subtype witness plus one `TypeConstraintDecl`). For the most part this is a small change, but it raises the possibility that some pieces of the code aren't going to be robust against all possible shapes of subtype witnesses. The IR lowering logic has relied on the weird `d.<anonymous>` representation in order to ensure that when looking up interface members we weren't always casting to the interface type (which would create a `makeExistential` instruction), and then calling using that. Basically, the IR lowering would ignore the `d.<anonymous>` part and just emit `d`, but we can't do that for `((Base) d)` or `((IThing) d)` because whehter or not we should actually perform the cast depends on context. For now we solve that problem by adding specific logic to ignore up-casts to interface types when they appear in member expressions or method calls. 
A more robust solution might be needed down the line, but this seems to work in practice. All of this work is cleanup that I found was needed in order to make `extension`s of `interface` types workable. * fixup: disable an incorrect test
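The breadcrumb-to-cast translation described in this commit can be pictured with a small standalone sketch. It is not the compiler's code: `Breadcrumb`, `buildMemberExpr`, and the string-based "expressions" are hypothetical stand-ins chosen for illustration, and only the `SuperType` case is modeled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a lookup breadcrumb. Only the super-type case
// is modeled; every other kind is lumped into `Other`.
struct Breadcrumb
{
    enum class Kind { SuperType, Other };
    Kind kind;
    std::string superTypeName; // type reached through the subtype witness (assumed)
};

// Build a printable form of `base.member`, wrapping `base` in one up-cast
// per SuperType breadcrumb, in the order the lookup recorded them.
std::string buildMemberExpr(
    const std::string& base,
    const std::vector<Breadcrumb>& breadcrumbs,
    const std::string& member)
{
    assert(!member.empty() && "a member name is required");
    std::string expr = base;
    for (const Breadcrumb& crumb : breadcrumbs)
    {
        if (crumb.kind == Breadcrumb::Kind::SuperType)
            expr = "((" + crumb.superTypeName + ") " + expr + ")";
    }
    return expr + "." + member;
}
```

With a single `SuperType` breadcrumb for `Base`, the sketch yields the `((Base) d).val` form that the commit message describes as the obvious encoding; with no breadcrumbs it degenerates to a plain `d.val`.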
2020-08-27Allow use of lib6_6 profile for DXC compilation. (#1520)jsmall-nvidia
2020-08-27Removed use of pthreads in glslang (#1517)jsmall-nvidia
* Try removing pthreads from glslang. * Update slang-binaries to use glslang that doesn't use pthreads. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-08-26Reorder existential tuple elements. (#1516)Yong He
Co-authored-by: Tim Foley <tim.foley.is@gmail.com>
2020-08-26Added more Atomic support for int64 types on RWByteAddressBuffer (#1515)jsmall-nvidia
* Support for more 64 bit atomics on ByteAddressBuffer. * min max 64bit test. * Disable CUDA version of min max 64 bit test - as produces the wrong output. * Update target-compatibility.md with added 64 bit atomics. Co-authored-by: Yong He <yonghe@outlook.com>
2020-08-25Export witness table and RTTI objects in compiled libraries. (#1514)Yong He
* Export witness table objects in compiled code. - Ensure that witness tables are preceded with `extern "C"` modifier in the generated C++ code. - RTTI objects use the mangled name of the type directly, so that can be queried using the type's mangled name directly from the resulting DLL. - Expose `Linkage::getTypeConformanceWitnessMangledName` to return the mangled name of witness tables to the host. - Ensure that all witness tables (including those for associated types) have proper mangled names. * Fix GCC error in Slang generated code.
2020-08-24RWByteAddressBuffer::InterlockedCompareExchangeU64 (#1513)jsmall-nvidia
* First pass at incorporating nvapi into test harness. * D3d12 Atomic Float Add via NVAPI working * Dx12 atomic float appears to work. * Atomic float add on Dx12. * Added atomic64 feature addition to vk. Fix correct output for atomic-float-byte-address.slang * Disable atomic float failing tests. * Upgraded VK headers. * Detect atomic float availability on VK. * Try to get test working for int64 atomic. * Made HLSL prelude controlled via the render-test requirements. * Added -enable-nvapi to premake. * Fix D3D12Renderer when NVAPI is not available. * Small improvements to VKRenderer. * Improve atomic documentation in target-compatibility.md. * Fixed NVAPI working on D3D12. * Test for specific NVAPI features. * Remove requiredFeatures from Renderer::Desc as was ignored. Tried to document more around nvapiExtnSlot. * Readded requiredFeatures to Renderer::Desc * Improve comments in the tests. * Rename Fp32 -> F32 Added cas-int64-byte-address-buffer.slang test Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-08-21Vulkan update/NVAPI support (#1511)jsmall-nvidia
* First pass at incorporating nvapi into test harness. * D3d12 Atomic Float Add via NVAPI working * Dx12 atomic float appears to work. * Atomic float add on Dx12. * Added atomic64 feature addition to vk. Fix correct output for atomic-float-byte-address.slang * Disable atomic float failing tests. * Upgraded VK headers. * Detect atomic float availability on VK. * Try to get test working for int64 atomic. * Made HLSL prelude controlled via the render-test requirements. * Added -enable-nvapi to premake. * Fix D3D12Renderer when NVAPI is not available. * Small improvements to VKRenderer. * Improve atomic documentation in target-compatibility.md.
2020-08-21Fix stdlib declarations for texture Gather() (#1510)Tim Foley
Fixes #1507 These operations were failing to take into account the way that array textures require an extra coordinate to be passed in for the primary location (but not the additional offsets). Adding `isArray` to the component count is the existing solution used for similar intrinsics elsewhere in the stdlib, and it is adopted here. Because our test framework isn't really set up to do a lot of texture testing (including having no support for texture arrays), the test added here is just a cross-compilation test that compares output with fxc for comparable input.
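The component-count rule from this fix can be written down as a tiny sketch. The function names are hypothetical and this is not the stdlib source; it only restates that the primary location gains one coordinate for array textures while the offsets do not.

```cpp
#include <cassert>

// Components in the primary location argument: array textures carry one
// extra coordinate, so `isArray` is added to the base count.
int gatherLocationComponents(int baseCoordCount, bool isArray)
{
    assert(baseCoordCount >= 1 && baseCoordCount <= 3);
    return baseCoordCount + (isArray ? 1 : 0);
}

// Components in each additional offset argument: the extra array
// coordinate is not included here.
int gatherOffsetComponents(int baseCoordCount, bool /*isArray*/)
{
    assert(baseCoordCount >= 1 && baseCoordCount <= 3);
    return baseCoordCount;
}
```

Under these assumptions a 2D array texture takes a three-component location and two-component offsets.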
2020-08-21Another fix for overriding property decls (#1509)Tim Foley
* Another fix for overriding property decls The central problem we keep running into with `property` decls in `interface`s comes down to two choices: 1. When a member lookup `obj.someName` or a simple lookup for `someName` produces an overloaded result, we make no attempt to resolve the overloading right away, and instead postpone disambiguation until the point where that expression gets *used*, in case the context where it gets used can help in disambiguation (a notable case being when there is a call expression `obj.someName(...)` or `someName(...)`). 2. When looking up members in the scope of a type (either for `obj.someName` or `someName` in the context of a method), we include all results from base types in the set of overloads returned, even in cases where the type has a direct member that "overrides" the inherited one. The combination of these factors means that when a `struct` type implements a `property` to satisfy a requirement of an inherited `interface`, then references to `obj.someProp` end up being ambiguous between the property in the concrete `struct` type and the property it inherits through the `interface`. There is no quick fix possible for issue (2). It might seem that we could just skip over members inherited through `interface`s when doing lookup in a type, but that solution wouldn't apply to inheritance from another `struct` type, or any future scenario where we support default implementations of methods in interfaces. The simple idea of saying that a derived-type member named `M` hides all inherited members named `M` is possible, but would lead to a bad user interface when a type wants to support both a core "bottleneck" method and a bunch of convenience overloads with the same name. That leaves us with issue (1), and trying to find a reasonable fix for it. 
The common case is that any expression `e` eventually gets used in a context where it will be subject to disambiguation: * If we form a call expression `e(...)`, then the overload resolution logic will (obviously) work to disambiguate which `e` was meant. * If `e` is used as an argument to another call (`f(... e ...)` or `... + e`), then `e` will be coerced to the expected parameter type for its argument position, and that coercion will disambiguate it (this is the bit that was fixed in #1501) * If `e` is used in another context where a type is expected/known, it will also be coerced: `if(e)`, `int v = e`, etc. The problem case that is left behind is any scenario where `e` is not subject to one of the above resolution cases, which mostly amounts to cases where an expression is never coerced to a single fixed type. There are a few important cases where this occurs today: * When the expression is used as the left-hand side of an assignment (`e = ...`). * When an expression is used to initialize a variable with an implicit type (`let v = e`). * When inferring generic arguments from the value arguments at a call site (`f(e)` where `f` is defined as `f<T>(T v)`) The key connecting thread in each of these cases is that the front-end needs to determine the type of `e` to make progress. 
Our semantic checking logic already has functions that try to draw a distinction between the two cases: * The `CheckTerm()` operation is supposed to be used when we expect that we will eventually coerce or otherwise disambiguate the term, and also in cases where we don't yet know if a term should name a type or a value * The `CheckExpr()` operation is supposed to be used when we do not expect that we will apply coercion/disambiguation to a term, and need to have assurances that it has been coerced into a non-overloaded expression with a reasonable type The simple part of the fix made here is to make `CheckExpr()` actually do part of what it is supposed to (attempt to disambiguate overloaded terms), and then audit all the call sites to `CheckExpr()` to make sure they are actually ones that intend to opt into that logic. The messier part of the fix is dealing with generic argument inference, because we need to extract the type of the disambiguated expression for the purposes of inference, but we don't want to disturb the actual argument list at a call site (because type coercion of the arguments is supposed to handle the disambiguation). This part is done with a bit of special-casing in the overload-resolution context, by adding a method that gets the type of an argument after disambiguation (when possible). * fixup Co-authored-by: Yong He <yonghe@outlook.com>
2020-08-21Allow calling a generic function with an existential value (dynamic dispatch) (#1508)Yong He
* Allow calling a generic function with an existential value (dynamic dispatch). * Fixes per review comments. * Clean up implementation by having `openExistential` return `ExtractExistentialType` instead of a DeclRef to the interface with a `ThisTypeSubstitution`. * More cleanups Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com> Co-authored-by: Yong He <yhe@nvidia.com>
2020-08-20Initial support for a using construct (#1506)Tim Foley
The basic idea is that if you have a namespace: namespace MyCoolNamespace { void f() { ... } ... } then you can bring the declarations from that namespace into scope with: using MyCoolNamespace; f(); The `using` construct is allowed in any scope where declarations are allowed. As an additional feature, the construct allows and then ignores the keyword `namespace` if it occurs right after `using`: using namespace MyCoolNamespace; Note that unlike in C++, `using` a namespace inside another namespace doesn't implicitly make the symbols available to clients of that namespace: namespace hidden { void secret() {...} ... } namespace api { using hidden; ... } api.secret(); // ERROR: `secret()` isn't a member of `api` The implementation of this feature was relatively simple, although it does leave out more advanced features that might be desirable in the future: * No support for `using MCN = MyCoolNamespace` sorts of tricks to define a short name * No support for `using` anything that isn't a namespace (e.g., to make the members of a type available without qualification) * No support for cases where multiple visible modules have a namespace of the same name (or dealing with overloaded namespaces in general)
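The visibility rule for `using` inside a namespace can be modeled with a short sketch. Everything here (`Namespace`, `visibleInside`, `isMemberOf`, `makeExample`) is hypothetical and is not how the compiler stores scopes; in particular, following only one level of `using` is an assumption made to keep the model small.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a namespace: its own declarations plus the
// namespaces it names with `using`.
struct Namespace
{
    std::set<std::string> members;
    std::vector<std::string> usings;
};

using Program = std::map<std::string, Namespace>;

// Unqualified lookup from code written *inside* `scope`: own members plus
// members of namespaces brought in with `using` (one level only, by assumption).
bool visibleInside(const Program& program, const std::string& scope, const std::string& name)
{
    auto it = program.find(scope);
    assert(it != program.end() && "unknown namespace");
    if (it->second.members.count(name) != 0)
        return true;
    for (const std::string& used : it->second.usings)
    {
        auto usedIt = program.find(used);
        if (usedIt != program.end() && usedIt->second.members.count(name) != 0)
            return true;
    }
    return false;
}

// Qualified lookup `scope.name` by a client: only the namespace's own members.
bool isMemberOf(const Program& program, const std::string& scope, const std::string& name)
{
    auto it = program.find(scope);
    assert(it != program.end() && "unknown namespace");
    return it->second.members.count(name) != 0;
}

// The `hidden` / `api` example from the commit message.
Program makeExample()
{
    Program program;
    program["hidden"].members = {"secret"};
    program["api"].usings = {"hidden"};
    return program;
}
```

In this model `secret` resolves for code inside `api`, while the qualified reference `api.secret` fails, which matches the error case shown in the commit message.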
2020-08-19Remove IncludeHandler. (#1505)jsmall-nvidia
nvAPI -> NVAPI nvAPIPath -> nvapiPath DxcIncludeHandler don't reference count. nv-api-path -> nvapi-path Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-08-19Int64 atomic add RWByteAddressBuffer support (#1504)jsmall-nvidia
* Fix premake5.lua so it uses the new path needed for OpenCLDebugInfo100.h * Keep including the includes directory. * Added the spirv-tools-generated files. * We don't need to include the spirv/unified1 path because the files needed are actually in the spirv-tools-generated folder. * Put the build_info.h glslang generated files in external/glslang-generated. Alter premake5.lua to pick up that header. * First pass at documenting how to build glslang and spirv-tools. * Improved glsl/spir-v tools README.md * Added revision.h * Change how gResources is calculated. Update about revision.h * Update docs a little. * Split out spirv-tools into a separate project for building glslang. This was not necessary on linux, but *is* necessary on windows, because there is a file disassemble.cpp in spirv-tools and in glslang, and this leads to VS choosing only one. With the separate library, the problem is resolved. * Fix direct-spirv-emit output. * Update to latest version of spirv headers and spirv-tools. * Upgrade submodule version of glslang in external. * Add fPIC to build options of slang-spirv-tools * WIP adding support for InterlockedAddFp32 * Upgrade slang-binaries to have new glslang. * Fix issues with Windows slang-glslang binaries, via update of slang-binaries used. * WIP - atomicAdd. This solution can't work as we can't do (float*) in glsl. * WIP on atomic float ops. * Added checking for multiple decls that takes into account __target_intrinsic and __specialized_for_target. First pass impl of atomic add on float for glsl. * Split __atomicAdd so extensions are applied appropriately. * Made Dxc/Fxc support includes. Use HLSL prelude to pass the path to nvapi Added -nv-api-path * Refactor around IncludeHandler and impl of IncludeSystem * slang-include-handler -> slang-include-system Have IncludeHandler/Impl defined in slang-preprocessor * Small comment improvements. * Document atomic float add addition in target-compatibility.md. 
* CUDA float atomic support on RWByteAddressBuffer. * Add atomic-float-byte-address-buffer-cross.slang * Removed inappropriate-once.slang - the test is no longer valid when a file is loaded and has a unique identity by default. A test could be made, but would require an API call to create the file (so no unique id). Improved handling of loadFile - uses uniqueId if has one. * Work around for testing target overlaps - to avoid exceptions on adding targets. Simplify PathInfo setup. Modify single-target-intrinsic.slang - it no longer failed because there were no longer multiple definitions for the same target. * Int64 atomic add RwByteAddressBuffer support. * Fix typo in stdlib for int atomic ByteAddressBuffer. * Small fixes to int64 atomic test. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-08-18Support initializing an existential value from a generic value. (#1503)Yong He
* Support initializing an existential value from a generic value. * Remove trailing spaces and clean up debugging code.
2020-08-18Support for float atomics on RWByteAddressBuffer (#1502)jsmall-nvidia
* Fix premake5.lua so it uses the new path needed for OpenCLDebugInfo100.h * Keep including the includes directory. * Added the spirv-tools-generated files. * We don't need to include the spirv/unified1 path because the files needed are actually in the spirv-tools-generated folder. * Put the build_info.h glslang generated files in external/glslang-generated. Alter premake5.lua to pick up that header. * First pass at documenting how to build glslang and spirv-tools. * Improved glsl/spir-v tools README.md * Added revision.h * Change how gResources is calculated. Update about revision.h * Update docs a little. * Split out spirv-tools into a separate project for building glslang. This was not necessary on linux, but *is* necessary on windows, because there is a file disassemble.cpp in spirv-tools and in glslang, and this leads to VS choosing only one. With the separate library, the problem is resolved. * Fix direct-spirv-emit output. * Update to latest version of spirv headers and spirv-tools. * Upgrade submodule version of glslang in external. * Add fPIC to build options of slang-spirv-tools * WIP adding support for InterlockedAddFp32 * Upgrade slang-binaries to have new glslang. * Fix issues with Windows slang-glslang binaries, via update of slang-binaries used. * WIP - atomicAdd. This solution can't work as we can't do (float*) in glsl. * WIP on atomic float ops. * Added checking for multiple decls that takes into account __target_intrinsic and __specialized_for_target. First pass impl of atomic add on float for glsl. * Split __atomicAdd so extensions are applied appropriately. * Made Dxc/Fxc support includes. Use HLSL prelude to pass the path to nvapi Added -nv-api-path * Refactor around IncludeHandler and impl of IncludeSystem * slang-include-handler -> slang-include-system Have IncludeHandler/Impl defined in slang-preprocessor * Small comment improvements. * Document atomic float add addition in target-compatibility.md. 
* CUDA float atomic support on RWByteAddressBuffer. * Add atomic-float-byte-address-buffer-cross.slang * Removed inappropriate-once.slang - the test is no longer valid when a file is loaded and has a unique identity by default. A test could be made, but would require an API call to create the file (so no unique id). Improved handling of loadFile - uses uniqueId if has one. * Work around for testing target overlaps - to avoid exceptions on adding targets. Simplify PathInfo setup. Modify single-target-intrinsic.slang - it no longer failed because there were no longer multiple definitions for the same target. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-08-17Attempt to fix lookup for members that "override" (#1501)Tim Foley
Our current lookup process always finds *all* members of a type, which can include both an inherited member (e.g., from an `interface`) and one that logically overrides/implements it. If something downstream doesn't filter this result down and favor the derived member, then an ambiguity error will result. To date, this has mostly been a non-issue because we haven't emphasized inheritance, and the main case we did support (`struct` types implemented `interface` methods) gets disambiguated as part of overload resolution for function calls. Recent changes to support `property` declarations to `interface`s add the possibility for ambiguity between a "base" and "derived" declaration that can't rely on overload resolution for disambiguation. The approach in this PR is to add disambiguation logic to the other main place where the results of lookup get used. If a lookup result is being assigned to a variable, passed to a function, or otherwise used in a case where a value of a specific type is needed, it will be "coerced" to the desired type. This change makes it so that the first step in the coercion logic is to try to disambiguate the expression that is being coerced. In order to ensure that an overloaded expression can be detected and resolved even when just checking if coercion is possible, I needed to update the `canCoerce*()` functions to also take the expression that is being tested for coercibility, and not just its type. There is only one case (that I saw) where coercion checks were being made without an expression value available, and that case didn't actually need/want to handle overloading. In order to test the fixes here, I added logic to the `property`-in-`interface` test to make sure that the critical cases work as expected (references to a derived member using "dot syntax" and "implicit `this`" syntax). 
Alternatives Considered ----------------------- The first attempt at this fix took a simpler approach: I added the disambiguation logic as a post-process on member lookup. That is, given `obj.foo` I would take the `LookupResult` for `foo` and immediately try to filter it to include only the most-derived members. This approach has the major benefit of catching even more use cases of values (and thus helping to ensure that we don't spend forever chasing down more of these ambiguity errors), but it also has two critical problems: 1. If we only trigger disambiguation when looking up `obj.foo`, then we can't do anything to help when `foo` is looked up as an ordinary identifier, but is actually equivalent to `this.foo`. A full fix would require doing this disambiguation on *every* name lookup, which leads to the second issue: 2. It is important that for a method call like `obj.m(...)` we do *not* disambiguate when looking up `obj.m`, and instead let the overload resolution for the call resolve things. That choice is what makes it possible to call an inherited `m` declaration even when there is a derived `m` with a different signature. Issue (1) is covered by the test case that was added here, but we should probably have a test case for (2) to make sure we don't break that use case. Caveats ------- An important case that we don't solve in this PR is when the result of a lookup is captured in a variable without an explicit type: let f = obj.foo; That case also needs disambiguation, and should be addressed in a later change. A secondary issue is that our approach to prioritizing declarations during lookup is still quite naive. We really need a way for lookup to attach information about nesting of scopes to results (to be clear that results from inner scopes should be preferred over those from outer scopes), as well as have a robust mechanism for comparing the priority of members based on the inheritance graph of a type. 
This change doesn't do anything to make the situation better or worse.
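One way to picture "favor the derived member before coercing" is the sketch below. It is only an illustration: `Candidate`, `tryDisambiguate`, and the idea of ranking by an integer inheritance depth are assumptions, and the commit message itself notes that the real prioritization is still naive.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical lookup candidate.
struct Candidate
{
    std::string owner;    // type that declares the member
    int inheritanceDepth; // 0 = declared in the type itself (assumed ranking)
};

// Return the single most-derived candidate, or nothing when two candidates
// tie at the best depth (the expression stays ambiguous in that case).
std::optional<Candidate> tryDisambiguate(const std::vector<Candidate>& candidates)
{
    std::optional<Candidate> best;
    bool tie = false;
    for (const Candidate& candidate : candidates)
    {
        assert(candidate.inheritanceDepth >= 0);
        if (!best || candidate.inheritanceDepth < best->inheritanceDepth)
        {
            best = candidate;
            tie = false;
        }
        else if (candidate.inheritanceDepth == best->inheritanceDepth)
        {
            tie = true;
        }
    }
    if (tie)
        return std::nullopt;
    return best;
}
```

A coercion step built on this would call `tryDisambiguate` first and only report an ambiguity when it returns nothing for a non-empty candidate list.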
2020-08-17GPU Foreach Loop (#1498)Dietrich Geisler
* GPU Foreach Loop This PR introduces the completed GPU foreach loop and updates the heterogeneous-hello-world example to use it. This PR builds on the previous introduction of the GPU Foreach loop parsing and semantic checking PR (#1482) by introducing IR lowering and emitting. The new feature can be used by having a GPU_Foreach loop interacting with a named non-CPP entry point, and using the -heterogeneous flag. * Fix to path Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-08-14Fix tuple type lowering (#1499)Yong He
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-08-14Lower existential types. (#1497)Yong He
Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-08-14Fix an issue with explicit enum tag types (#1495)Tim Foley
The basic problem here was that in a declaration like: ```hlsl enum Color : uint { Red, Orange, ... } ``` The `: uint` bit is represented as an `InheritanceDecl`, because that is what we use to represent the syntactic form of inheritance clauses like that. At the point where we parse the `InheritanceDecl` we don't yet know whether it represents a base interface or a "tag type" like `uint` in this case. The root problem that is then created is: an `enum` type is *not* a subtype of its "tag type," and treating it like a subtype can create problems. The main problem that arises is that looking in a type like `Color` will find both the members of `Color` *and* the members of `uint`. In the case of things like `__init` declarations, that creates a problem where the `Color` type has two different `__init`s that take a `uint`: * The one it inherits from `uint` via that `InheritanceDecl` (even though it shouldn't) * The one it gets via an extension just for conforming to `__EnumType` (a non-user-exposed `interface` in the standard library) Because both of those `__init`s are inherited, neither is preferred over the other one and they create an ambiguity if somebody tries to write: ```hlsl uint u = ...; Color c = Color(u); ``` The solution used in this PR is to add a compiler-internal modifier to the `InheritanceDecl` that introduces a "tag type" to an `enum`, in an early phase of checking (one of the ones that occurs before it is legal to enumerate the bases of a type). Then the lookup process is modified to ignore `InheritanceDecl`s with that modifier when doing lookup in super-types (since the declaration does *not* indicate a subtype/supertype relationship). This appears to get the basic feature working again, although it is possible that there are other parts of the compiler that use `InheritanceDecl`s and mistakenly assume that all `InheritanceDecl`s introduce subtype/supertype relationships. 
We probably need to do a significant audit of the code to start being more clear about the nature of the relationships such declarations introduce. Such steps are left to future changes. Co-authored-by: Yong He <yonghe@outlook.com>
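The effect of the tag-type modifier on lookup can be sketched as follows. This is a simplified model, not the compiler's data structures: `InheritanceDecl::isEnumTagType` stands in for the compiler-internal modifier, and the `__init` that `Color` gets through an extension is modeled as a plain member of `__EnumType`.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of an inheritance clause.
struct InheritanceDecl
{
    std::string base;
    bool isEnumTagType; // stands in for the compiler-internal modifier
};

struct TypeDecl
{
    std::set<std::string> members;
    std::vector<InheritanceDecl> inheritance;
};

using TypeTable = std::map<std::string, TypeDecl>;

// Collect the types in which `member` is found, starting at `typeName` and
// walking super-types, but skipping clauses marked as enum tag types.
void collectOwners(
    const TypeTable& types,
    const std::string& typeName,
    const std::string& member,
    std::vector<std::string>& owners)
{
    auto it = types.find(typeName);
    assert(it != types.end() && "unknown type");
    if (it->second.members.count(member) != 0)
        owners.push_back(typeName);
    for (const InheritanceDecl& decl : it->second.inheritance)
    {
        if (decl.isEnumTagType)
            continue; // a tag type is not a super-type
        collectOwners(types, decl.base, member, owners);
    }
}

std::vector<std::string> lookupOwners(
    const TypeTable& types, const std::string& typeName, const std::string& member)
{
    std::vector<std::string> owners;
    collectOwners(types, typeName, member, owners);
    return owners;
}

// `enum Color : uint`, with or without the tag-type marking applied.
TypeTable makeColorExample(bool markTagType)
{
    TypeTable types;
    types["uint"].members = {"__init"};
    types["__EnumType"].members = {"__init"};
    types["Color"].inheritance = {
        InheritanceDecl{"uint", markTagType},
        InheritanceDecl{"__EnumType", false}};
    return types;
}
```

Without the marking, a lookup of `__init` in `Color` reaches two equally ranked declarations; with it, only the one associated with `__EnumType` remains.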
2020-08-13Support property declarations in interfaces (#1494)Tim Foley
There are two main features in this change. First, we allow for `interface`s to declare `property` requirements, which can be satisfied by matching `property` declarations in a type that conforms to the interface: interface IRectangle { property float width { get; } property float height { get; } } struct Square : IRectangle { float size; property float width { get { return size; } } property float height { get { return size; } } } Second, we allow a type to satisfy a `property` requirement with an ordinary field of the same name: struct Rectangle : IRectangle { float width; float height; // no explicit `property` declarations needed } The implementation of these features is mostly in `slang-check-decl.cpp` in the logic for checking conformance of a type to an interface. The first feature simply requires adding logic to checking whether a candidate satisfying `property` declaration matches a required `property` declaration. To do so, it must have the same type, and an accessor to satisfy each of the required accessors. The second feature requires adding logic to synthesize an AST `property` declaration for a type, based on a required `property` declaration and its accessors. This means that, more or less, any type where `this.name` yields a storage location that does what is needed can satisfy a property requirement (there is no specific rule that says the storage needs to be a field, although that is the most likely case). The way that witnesses are stored for property declarations probably merits some description. During IR lowering, an abstract storage declaration like a subscript or `property` more or less desugars away, so that the actual interface requirements correspond to the accessors within it (the `get`, `set`, etc.). This means that a witness table should have entries/keys corresponding to the accessors and not the property itself. 
The process of finding/recording witnesses for `property` requirements thus installs entries for the individual accessors (with care taken to only install accessor witnesses once we are sure we have witnesses for all the requirements). Currently, the code also installs an entry for the property itself, although that is not strictly required, and might not be something we continue to do long-term. (Aside: it was somewhat surprising that an end-to-end test of `property` declarations in `interface`s Just Worked without any changes to IR lowering.) As we continue to write more code that synthesizes and checks AST expressions/statements, it becomes necessary to refactor the semantic checking logic so that it splits the recursive part (e.g., checking the operands of an assignment) from the validation part (e.g., checking that the assignment itself is valid). It is probably too big of a change to justify at this point, but it might be valuable in the future to have distinct hierarchies that represent unchecked and checked ASTs, with semantic checking mostly being a transformation from one to the other. The benefit of such a change is we could factor out a distinct "builder" API for constructing validated/checked AST nodes, with both semantic checking and AST synthesis being clients of that API.
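The two conformance rules from this change (match an explicit `property`, or synthesize one from a field) can be sketched in isolation. The types and function names below are hypothetical, types are compared as plain strings, and the sketch assumes a field can back every required accessor, which the real check would have to verify.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <set>
#include <string>

// Hypothetical model of a `property` declaration.
struct PropertyDecl
{
    std::string type;
    std::set<std::string> accessors; // e.g. "get", "set"
};

// Hypothetical model of an ordinary field.
struct FieldDecl
{
    std::string type;
};

// First feature: a candidate property satisfies a requirement when the types
// match and it provides an accessor for each required accessor.
bool satisfiesRequirement(const PropertyDecl& candidate, const PropertyDecl& required)
{
    assert(!required.accessors.empty() && "a requirement needs at least one accessor");
    if (candidate.type != required.type)
        return false;
    return std::includes(
        candidate.accessors.begin(), candidate.accessors.end(),
        required.accessors.begin(), required.accessors.end());
}

// Second feature: synthesize a property from a field of the same name, copying
// the required accessors (assumes the field can serve each of them).
std::optional<PropertyDecl> synthesizeFromField(const FieldDecl& field, const PropertyDecl& required)
{
    if (field.type != required.type)
        return std::nullopt;
    return PropertyDecl{required.type, required.accessors};
}
```

In terms of the commit's example, `Square` goes through the first path with its explicit `get` accessors, and `Rectangle` goes through the second because its `float width` field matches the required `property float width { get; }`.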
2020-08-13IR support for Tuple types. (#1492)Yong He
* Tuple types. * Fix x86 warning * Improved deduplication Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-08-13Allow both traditional and modern property syntax (#1487)Tim Foley
The initial change to introduce `property` declarations tied them to a "modern" syntax: property width : float { ... } In practice, a great majority of users assume that properties in Slang will be declared like those in C#: property float height { ... } This change allows both options to parse correctly. The choice made here is to only parse as the "modern" syntax when it can be detected from lookahead (an identifier followed by a `:`), and fall back to the "traditional" syntax otherwise. That choice might not produce the best diagnostic messages around syntax errors in codebases that use the modern syntax, but it is the easiest trade-off to make. We also add similar disambiguation logic for the `newValue` parameter of a `set` declaration (and other "modern"-style parameters). This strategy cannot be applied to all function parameters in general, because traditional parameter lists can still use `:` to introduce a semantic. Note: the same disambiguation strategy applied here could be used for `let` and `var` declarations: let a : int = 1; let int b = 2; This change does not try to introduce flexibility like that, because it seems unlikely for users to care.
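The lookahead rule (an identifier followed by `:` means modern syntax) can be sketched over a pre-split token list. The token representation and the names `classifyPropertyDecl` and `isIdentifier` are assumptions for illustration; the real parser works on its own token stream.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

enum class PropertySyntax { Modern, Traditional };

// Simplified identifier test: a letter or underscore, followed by letters,
// digits, or underscores.
bool isIdentifier(const std::string& token)
{
    if (token.empty())
        return false;
    unsigned char first = static_cast<unsigned char>(token[0]);
    if (!(std::isalpha(first) || first == '_'))
        return false;
    for (unsigned char ch : token)
    {
        if (!(std::isalnum(ch) || ch == '_'))
            return false;
    }
    return true;
}

// `tokens` holds what follows the `property` keyword. Two tokens of
// lookahead decide the syntax; anything else falls back to traditional.
PropertySyntax classifyPropertyDecl(const std::vector<std::string>& tokens)
{
    assert(!tokens.empty() && "expected tokens after `property`");
    if (tokens.size() >= 2 && isIdentifier(tokens[0]) && tokens[1] == ":")
        return PropertySyntax::Modern;
    return PropertySyntax::Traditional;
}
```

Falling back to the traditional form on any other token sequence mirrors the trade-off noted in the commit: a malformed modern declaration gets diagnosed as if it were a traditional one.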