slang.git - Making it easier to work with shaders

	Commit message (Collapse)	Author	Age
...
*	Lower existential types. (#1497)	Yong He	2020-08-14
\| \| \|	Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
*	Change the policy for entry-point uniform parameters on Vulkan (#1476)	Tim Foley	2020-08-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Entry point `uniform` parameters were a feature of the original Cg and HLSL, but have not been used much in production shader code. One of our goals on Slang is to reduce the (ab)use of the global scope, so bringing entry point `uniform` parameters up to a greater level of usability is an important goal. Some policy choices about how global vs. entry-point `uniform` parameters behave have already been made, that shape decisions looking forward: * For DXBC/DXIL, it makes the most sense to follow the lead of fxc/dxc, by treating entry point `uniform` parameters as a kind of syntax sugar for global shader parameters. Any parameters of "ordinary" types are bundles up into an implicit constant buffer, and all the resources (including the implicit constant buffer) are assigned `register`s just as for globals. It is up to the application to decide how to bind those parameters via a root signature (using root descriptors, root constants, descriptor tables, local vs. global root signature, etc.) * For CPU, it makes sense to pass global vs. entry-point parameters as two different pointers, although the details of what we do for CPU are the least constrained across all current targets. * For CUDA compute, it makes the most sense to map global shader parameters to `__constant__` global data, and entry-point `uniform` parameters to kernel parameters. This choice ensures that the signature of a kernel when translated from Slang->CUDA follows the Principle of Least Surprise, at the cost of making entry-point vs. global parameters be passed via different mechanisms. * For OptiX ray tracing, it makes sense to expand on the precedent from CUDA compute: pass global parameters via global `__constant__` data (as is already expected by OptiX for whole-launch parameters), and pass entry-point `uniform` parameters via the "shader record." This establishes a precedent that for ray-tracing shaders, global-scope parameters map to the "global root signature" concept from DXR, while entry-point `uniform` parameters map to a "local root signature" or "shader record." * For Vulkan ray tracing, the precedent from OptiX then argues that entry-point `uniform` parameters should map to the Vulkan "shader record" concept (and thus cannot support things like resource types). * The remaining interesting case is what to do for non-ray-tracing shaders on Vulkan. The dev team agrees that the most reasonable choice to make for non-ray-tracing Vulkan shaders is to map entry-point `uniform` parameters to "push constants." In particular, this makes it easy to express the case of a compute kernel with direct parameters of ordinary/value types in the way that will be implemented most efficiently. The big picture is then that a kernel like: ```hlsl void computeMain(uniform float someValue) { ... } ``` will map to output GLSL like: ```glsl layout(push_constant) uniform { float someValue; } U; void main() { ... } ``` If the user really wanted a constant-buffer binding to be created instead, they can easily change their input to make the buffer explicit: ```hlsl struct Params { float someValue; } void computeMain(uniform ConstantBuffer<Params> params) { ... } ``` (Forcing the user to be explicit about the desire for a buffer here creates a nice symmetry between Vulkan and CUDA; in the first case the user sets up the data in host memory and passes it to the GPU by copy, while in the second case the user must allocate and set up a device-memory buffer for the data. This symmetry extends to D3D if the application chooses to map entry-point `uniform` parameters to root constants.) This change implements logic in the "parameter binding" part of the Slang compiler to make sure that entry-point `uniform` parameters are wrapped up in a push-constant buffer rather than an ordinary constant buffer for non-ray-tracing shaders on Vulkan (and in a shader record "buffer" for the ray-tracing case). The majority of the actual work was in adding support for root/push constants to the test framework and the graphics API abstraction it uses. To be clear about that support: * Root constant ranges are (perhaps confusingly) treated as a new kind of "slot" that can appear on a descriptor set. This choice ensures that the implicit numbering of registers/spaces used by the back-ends can account for these ranges correctly. * The `TEST_INPUT` lines are extended to allow a `root_constants` case that behaves more or less like `cbuffer` * The CPU and CUDA paths can treat a `root_constants` input identically to a `cbuffer`. They already allocate the actual buffers based on reflection, and just use `cbuffer` as a directive that causes bytes to be copied in. * On D3D12 and Vulkan, a descriptor set allocates a `List<char>` to hold the bytes of root constant data assigned into it, and these bytes are flushed to the command list when the table is actually bound (usually right before rendering). * On D3D11, a descriptor set treats a root constant range more or less like a constant buffer range (with a single buffer), except that it also automatically allocates a buffer to hold the data. Assigning "root constant" data automatically copies it into that buffer. The small number of tests that used entry-point `uniform` parameters of ordinary types were updated to use the new `root_constant` input type, and the bugs that surfaced were fixed. A new test to confirm that entry-point `uniform` parameters map to the shader record for VK ray tracing was added. An important but technically unrelated change is the removal of the `DescriptorSetImpl::Binding` type and related function from the Vulkan implementation of `Renderer`. That type was created to ensure that objects that are bound into a descriptor set don't get released while the descriptor set is still alive, but the implementation relied on a complicated linear search to check for existing bindings, which could create a performance issue for descriptor sets that include large arrays of descriptors. The new implementation makes use of the approach already present in the various `Renderer` implementations (including the Vulkan one) for assigning ranges in a descriptor set a flat/linear index for where their pertinent data is to be bound. As a result, the Vulkan `DescriptorSetImpl` now uses a single flat array of `RefPtr`s to track bound objects, and has no need for linear search when binding. Co-authored-by: Yong He <yonghe@outlook.com>
*	`AnyValue` based dynamic dispatch code gen (#1477)	Yong He	2020-08-05
\| \| \| \| \|	* AnyValue based dynamic code gen * Fix aarch64 build error
*	Support associatedtype local variables and return values in dynamic dispatch ↵	Yong He	2020-07-16
\| \| \| \| \| \| \| \| \|	code (#1444) * Refactor lower-generics pass into separate subpasses. * IR pass to generate witness table wrappers. * Support associatedtype local variables and return values in dynamic dispatch code.
*	Dynamic code gen for functions returning generic types. (#1439)	Yong He	2020-07-13
\| \| \| \| \|	* Dynamic code gen for functions returning generic types. * Add expected test result.
*	Dynamic code gen for generic local variables. (#1434)	Yong He	2020-07-10
\| \| \| \| \| \| \|	* Dynamic code gen for generic local variables. * Fixes to function calls with generic typed `in` argument. * Fixes per code review comments
*	Add a test case for dynamic dispatch with `This` type in interface decl. (#1431)	Yong He	2020-07-07
\| \| \| \| \| \| \| \| \|	* Add a test case for dynamic dispatch with `This` type in interface decl. * Update comments * fix typo in comments Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
*	Emit pointers for CPU target. (#1418)	Yong He	2020-07-03
\| \| \|	Co-authored-by: Yong He <yhe@nvidia.com>
*	Disable dynamic dispatch tests on CUDA - as fails with exception about ↵	jsmall-nvidia	2020-07-01
\| \| \| \|	unhandled op. (#1425)
*	Fix `lowerFuncType` and small bug fixes.	Yong He	2020-06-24
\|
*	Dynamic dispatch for generic interface requirements.	Yong He	2020-06-24
\| \| \| \| \| \| \| \|	-Lower interfaces into actual `IRInterfaceType` insts. -Lower `DeclRef<AssocTypeDecl>` into `IRAssociatedType` -Generate proper IRType for generic functions. -Add a test case exercising dynamic dispatching a generic static function through an associated type. -Bug fixes for the test case.
*	Dynamic dispatch for static member functions of associatedtypes. (#1404)	Yong He	2020-06-19
\|
*	Merge branch 'master' into dyndispatch	Tim Foley	2020-06-18
\|\
\| *	Improvements around C++ code generation (#1396)	jsmall-nvidia	2020-06-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* * Remove UniformState and UniformEntryPointParams types * Put all output C++ source in an anonymous namespace * If SLANG_PRELUDE_NAMESPACE is set, make what it defines available in generated file. * Fix signature issue in performance-profile.slang * Context -> KernelContext to avoid ambiguity. * Fix issues around dynamic dispatch and anonymous namespace. * Fix typo.
* \|	Dynamic dipatch non-static functions.	Yong He	2020-06-17
\|/
*	Generate dynamic C++ code for the minimal test case. (#1391)	Yong He	2020-06-17
\| \| \| \| \| \| \| \| \| \| \| \| \|	* Add IR pass to lower generics into ordinary functions. * Fix project files * Emit dynamic C++ code for simple generics and witness tables. Fixes #1386. * Remove -dump-ir flag. * Fixups.
*	Generate IRType for interfaces, and reference them as `operand[0]` in ↵	Yong He	2020-06-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	IRWitnessTable values (#1387) * Generate IRType for interfaces, and use them as the type of IRWitnessTable values. This results the following IR for the included test case: ``` [export("_S3tu010IInterface7Computep1pii")] let %1 : _ = key [export("_ST3tu010IInterface")] [nameHint("IInterface")] interface %IInterface : _(%1); [export("_S3tu04Impl7Computep1pii")] [nameHint("Impl.Compute")] func %Implx5FCompute : Func(Int, Int) { block %2( [nameHint("inVal")] param %inVal : Int): let %3 : Int = mul(%inVal, %inVal) return_val(%3) } [export("_SW3tu04Impl3tu010IInterface")] witness_table %4 : %IInterface { witness_table_entry(%1,%Implx5FCompute) } ``` * Fixes per code review comments. Moved interface type reference in IRWitnessTable from their type to operand[0]. * Fix typo in comment.
*	Filter lookup results from interfaces in `visitMemberExpr`.	Yong He	2020-06-05
\| \| \| \|	Fixes #1377
*	Small improvements around atomics (#1333)	jsmall-nvidia	2020-04-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Use the original value in the test. Run test on VK. * Added RWBuffer and Buffer types to C++ prelude. * Add vk to atomics.slang tests * Update target-compatibility around atomics. When tests disabled in atomics-buffer.slang explained why. * tabs -> spaces. * Small docs improvement.
*	CUDA global scope initialization of arrays without function calls. (#1320)	jsmall-nvidia	2020-04-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Fix CUDA output of a static const array if values are all literals. * Fix bug in Convert definition. * Output makeArray such that is deconstructed on CUDA to fill in based on what the target type is. Tries to expand such that there are no function calls so that static const global scope definitions work. * Fix unbounded-array-of-array-syntax.slang to work correctly on CUDA. * Remove tabs. * Check works with static const vector/matrix. * Fix typo in type comparison. * Shorten _areEquivalent test. * Rename _emitInitializerList. Some small comment fixes. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
*	Change rules for layout of buffers/blocks containing only interface types ↵	Tim Foley	2020-04-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(#1318) TL;DR: This is a tweak the rules for layout that only affects a corner case for people who actually use `interface`-type shader parameters (which for now is just our own test cases). The tweaked rules seem like they make it easier to write the application code for interfacing with Slang, but even if we change our minds later the risk here should be low (again: nobody is using this stuff right now). Slang already has a rule that a constant buffer that contains no ordinary/uniform data doesn't actually allocate a constant buffer `binding`/`register`: struct A { float4 x; Texture2D y; } // has uniform/ordinary data struct B { Texture2D u; SamplerState v; } // has none ConstantBuffer<A> gA; // gets a constant buffer register/binding ConstantBuffer<B> gB; // does not There is similar logic for `ParameterBlock`, where the feature makes more sense. A user would be somewhat surprised if they declared a parmaeter block with a texture and a sampler in it, but then the generating code reserved Vulkan `binding=0` for a constant buffer they never asked for. The behavior in the case of a plain `ConstantBuffer` is chosen to be consistent with the parameter block case. (Aside: all of this is a non-issue for targets with direct support for pointers, like CUDA and CPU. On those platforms a constant buffer or parameter block always translates to a pointer to the contained data.) Now, suppose the user declares a constant buffer with an interface type in it: interface IFoo { ... } ConstantBuffer<IFoo> gBuffer; When the layout logic sees the declaration of `gBuffer` it doesn't yet know what type will be plugged in as `IFoo` there. Will it contain uniform/ordinary data, such that a constant buffer is needed? The existing logic in the type layout step implemented a complicated rule that amounted to: * A `ConstantBuffer` or `cbuffer` that only contains `interface`/existential-type data will not be allocated a constant buffer `register`/`binding` during the initial layout process (on unspecialized code). That means that any resources declared after it will take the next consecutive `register`/`binding` without leaving any "gap" for the `ConstantBuffer` variable. * After specialization (e.g., when we know that `Thing` should be plugged in for `IFoo`), if we discover that there is uniform/ordinary data in `Thing` then we will allocate a constant buffer `register`/`binding` for the `ConstantBuffer`, but that register/binding will necessarily come after any `register`s/`binding`s that were allocated to parameters during the first pass. * Parameter blocks were intended to work the same when when it comes to whether or not they allocate a default `space`/`set`, but that logic appears to not have worked as intended. These rules make some logical sense: a `ConstantBuffer` declaration only pays for what the element type actually needs, and if that changes due to specialization then the new resource allocation comes after the unspecialized resources (so that the locations of unspecialized parameters are stable across specializations). The problem is that in practice it is almost impossible to write client application code that uses the Slang reflection API and makes reasonable choices in the presence of these rules. A general-purpose `ShaderObject` abstraction in application code ends up having to deal with multiple possible states that an object could be in: 1. An object where the element type `E` contains no uniform/ordinary data, and no interface/existential fields, so a constant buffer doesn't need to be allocated or bound. 2. An object where the element type `E` contains no uniform/ordinary data, but has interace/existential fields, with two sub-cases: a. When no values bound to interface/existential fields use uniform/ordinary dat, then the parent object must not bind a buffer b. When the type of value bound to an interface/existential field uses uniform/ordinary data, then the parent object needs to have a buffer allocated, and bind it. 3. When the element type `E` contains uniform/ordinary data, then a buffer should be allocated and bound (although its size/contents may change as interface/existential fields get re-bound) Needing to deal with a possible shift between cases (2a) and (2b) based on what gets bound at runtime is a mess, and it is important to note that even though both (2a) and (3) require a buffer to be bound, the rules about where the buffer gets bound aren't consistent (so that the application needs to undrestand the distinction between "primary" and "pending" data in a type layout). This change introduces a different rule, which seems to be more complicated to explain, but actually seems to simplify things for the application: * A `ConstantBuffer` or `cbuffer` that only contains `interface`/existential-type data always has a constant buffer `register`/`binding` allocated for it "just in case." * If after specialization there is any uniform/ordinary data, then that will use the buffer `register`/`binding` that was already allocated (that's easy enough). * If after speciazliation there isn't any uniform/ordinary data, then the generated HLSL/GLSL shader code won't declare a buffer, but the `register`/`binding` is still claimed. * A `ParameterBlock` behaves equivalently, so that if it contains any `interface`/existential fields, then it will always allocate a `space`/`set` "just in case" The effect of these rules is to streamline the cases that an application needs to deal with down to two: 1. If the element type `E` of a shader object contains no uniform/ordinary or interface/existential fields, then no buffer needs to be allocated or bound 2. If the element type `E` contains any uniform/ordinary or interface/existential fields, then it is always safe to allocate and bind a buffer (even in the cases where it might be ignored). Furthermore, the reflection data for the constant buffer `register`/`binding` becomes consistent in case (2), so that the application can always expect to find it in the same way.
*	Literal folding on other operators (#1314)	jsmall-nvidia	2020-04-09
\| \| \| \| \| \| \| \| \|	* Fold prefix operators if they prefix an int literal. * Make test case a bit more convoluted. * Remove ++ and -- as not appropriate for folding of literals. * Set output buffer name.
*	Optimize creation of memberDictionary (#1305)	jsmall-nvidia	2020-04-02
\| \| \| \| \| \| \| \| \| \|	* Improve performance of building members dictionary by adding when needed. * Fix unbounded-array-of-array-syntax.slang, that DISABLE_TEST now uses up an index. Use IGNORE_TEST. * Improve variable name. Small improvements. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
*	Fix a bug in exiting SSA form for loops (#1293)	Tim Foley	2020-03-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The Slang compiler was bit by a known issue when translating from SSA form back to straight-line code. Give code like the following: int x = 0; int y = 1; while(...) { ... int t = x; x = y; y = t; } ... The SSA construction pass will eliminate the temporary `t` and yield code something like: br(b, 0, 1); block b(param x : Int, param y : Int): ... br(b, y, x); The loop-dependent variables have become parameters of the loop block, and the branchs to the top of the loop pass the appropriate values for the next iteration (e.g., the jump that starts the loop sends in `0` and `1`). The problem comes up when translating the back-edge the continues the loop out of SSA form. Our generated code will re-introduce temporaries for `x` and `y`: int x; int y; // jump into loop becomes: x = 0; y = 1; for(;;) { ... // back-edge becomes x = y; y = x; continue; } The problem there is that we've naively translated a branch like `br(b, <a>, <b>)` into `x = <a>; y = <b>;` but that doesn't work correctly in the case where `<b>` is `x`, because we will have already clobbered the value of `x` with `<a>`. The simplest fix is to introduce a temporary (just like the input code had), and generate: // back-edge becomes int t = x; x = y; y = t; This change modifies the `emitPhiVarAssignments()` function so that it detects bad cases like the above and emits temporaries to work around the problem. A new test case is included that produced incorrect output before the change, and now produces the expected results. A secondary change is folded in here that tries to guard against a more subtle version of the problem: for(...) { ... int x1 = x + 1; int y1 = y + 1; x = y1; y = x1; } In this more complicated case, each of `x` and `y` is being assigned to a value derived from the other, but neither is being set using a block parameter directly, so the changes to `emitPhiVarAssignments()` do not apply. The problem in this case would be if the `shouldFoldInstIntoUseSites()` logic decided to fold the computation of `x1` or `y1` into the branch instruction, resulting in: x = y + 1; y = x + 1; which would again violate the semantics of the original code, because now there is an assignment to `x` before the computation of `x + 1`. Right now it seems impossible to force this case to arise in practice, due to implementation details in how we generate IR code for loops. In particular, the block that computes the `x+1` and `y+1` values is currently always distinct from the block that branches back to the top of the loop, and we do not allow "folding" of sub-expressions from different blocks. It is possible, however, that future changes to the compiler could change the form of the IR we generate and make it possible for this problem to arise. The right fix for this issue would be to say that we should introduce a temporary for any branch argument that "involves" a block parameter (whether directly using it or using it as a sub-expression). Unfortunately, the ad hoc approach we use for folding sub-expressions today means that testing if an operand "involves" something would be both expensive and unwieldy. A more expedient fix is to disallow all folding of sub-expressions into unconditional branch instructions (the ones that can pass arguments to the target block), which is what I ended up implementing in this change. Making that defensive change alters the GLSL we output for some of our cross-compilation tests, in a way that required me to update the baseline/gold GLSL. A better long-term fix for this whole space of issues would be to have the "de-SSA" operation be something we do explicitly on the IR. Such an IR pass would still need to be careful about the first issue addressed in this change, but the second one should (in principle) be a non-issue given that our emit/folding logic already handles code with explicit mutable local variables correctly.
*	Unroll target improvements (#1291)	jsmall-nvidia	2020-03-25
\| \| \| \| \| \| \| \| \| \| \|	* Add unroll support for CUDA, and preliminary for C++. Document [unroll] support. * Fix loop-unroll to run on CPU, and test on CPU and elsewhere. Fix bug in emitting loop unroll condition. * Improved comment. * Added support for vk/glsl loop unrolling.
*	Better diagnostics on failure on CUDA. (#1288)	jsmall-nvidia	2020-03-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Better diagnostics on failure on CUDA. * Catch exceptions in render-test * * Added ability to disable reporting on CUDA failures * Stopped using exception for reporting (just write to StdWriter::out() * Removed CUDAResult type * Don't set arch type on nvrtc to see if fixes CI issues. * Try compute_30 on CUDA. * Added ability to IGNORE_ a test DIsabled rw-texture-simple and texture-get-dimensions * Disable tests that require CUDA SM7.0 Use DISABLE_ prefix to disable tests. * Disable signalUnexpectedError doing printf.
*	CPU Texture GetDimensions support (#1283)	jsmall-nvidia	2020-03-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Added CPU support for GetDimensions on C++/CPU target. Added texture-get-dimension.slang test * Fix some typos. * Update CUDA docs. * Fix output of GetDimensions on glsl when has an array. Disabled VK - because VK renderer doesn't support createTextureView * Fix typo. * Fix typo. * Fix bad-operator-call diagnostics output.
*	Add a basc inlining facility for use in the stdlib (#1271)	Tim Foley	2020-03-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The main feature visible to the stdlib here is the `[__unsafeForceInlineEarly]` attribute, which can be attached to a function definition and forces calls to that function to be inlined immediately after initial IR lowering. The pass is implemented in `slang-ir-inline.{h,cpp}` and currently only handles the completely trivial case of a function with no control flow that ends with a single `return`. The lack of support for any other cases motivates the `__unsafe` prefix on the attribute. In order to test that the pass works, I modified the "comma operator" in the standard library to be defined directly (rather than relying on special-case handling in IR lowering), and then added a test that uses that operator to make sure it generates code as expected. The compute version of the test confirms that we generate semantically correct code for the operator, while the SPIR-V cross-compilation test confirms that our output matches GLSL where the comma operator has been inlined, rather than turned into a subroutine. Notes for the future: * With this change it should be possible (in principle) to redefine all the compound operators in the stdlib to instead be ordinary functions with the new attribute, removing the need for `slang-compound-intrinsics.h`. * Once the compound intrinsics are defined in the stdlib, it should be easier/possible to start making built-in operators like `+` be ordinary functions from the standpoint of the IR * The attribute and pass here could be extended to include an alternative inlining attribute that happens later in compilation (after linking) but otherwise works the same. This could in theory be used for functions where we don't want to inline the definition into generated IR, but still want to inline things berfore generating final HlSL/GLSL/whatever. * The inlining pass itself could be generalized to work for less trivial functions pretty easily; for the most part it would just mean "splitting" the block with the call site and then inserting clones of the blocks in the callee. Any `return` instructions in the clone would become unconditional branches (with arguments) to the block after the call (which would get a parameter to represent the returned value). * The "hard" part for such an inlining pass would be handling cases where the control flow that results from inlining can't be handled by our later restructuring passes. The long-term fix there is to implement something like the "relooper" algorithm to restructure control flow as required for specific targets.
*	Support for RWTexture types on CPU and CUDA (#1243)	jsmall-nvidia	2020-02-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Added FloatTextureData as a mechanism to enable CPU based Texture writes. * Add [] RWTexture access for CPU. * Fixed rw-texture-simple.slang.expected.txt * WIP: CUDA stdlib has support for [] surface access. * Made IRWTexture class able to take different locations. Doing a Texture2d access on CUDA works. * Fix bug in outputing UniformState - was missing out padding. Support RWTexture with array. Support RWTexture3D. * Use * for locations for read only textures, so only need a ITexture interface. * Fix problem around application of set/get for CUDA on subscript Texture types.
*	Add surface syntax for "this type" (#1236)	Tim Foley	2020-02-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Within the context of an aggregate type (or an `extension` of one), the programmer can use `this` to refer to the "current" instance of the surrounding type, but there is no easy way to utter the name of the type itself. This is especially relevant inside of an `interface`, where the type of `this` isn't actually the `interface` type, but rather a placeholder for the as-yet-unknown concrete type that will implement the interface. This change adds a keyword `This` that works similarly to `this`, but names the current type instead of the current instance. It can be used to declare things like binary methods or factory functions in an interface: ``` interface IBasicMathType { This absoluteValue(); This sumWith(This left); } T doSomeMath<T:IBasicMathType>(T value) { return value.sumWith(value.absoluteValue()); } ``` The `This` type is consistent with the type named `Self` in Rust and Swift (where Rust/Swift use `self` instead of `this`). Other names could be considered (e.g., `ThisType`) if we find that users don't like the name in this change.
*	Initial support for explicit default initializers (#1235)	Tim Foley	2020-02-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change makes it so that for a suitable type `MyType`, a variable declaration like: MyType v; is treated as if it were written: MyType v = MyType(); The definition of "suitable" here is that `MyType` needs to have an available `__init` declaration that can be invoked with zero arguments. I've added a test to confirm that the new behavior works in this specific case. There are a bunch of caveats to the feature as it stands today: * Just because `MyType` has a zero-parameter `__init`, that doesn't mean an array type like `MyType[10]` does, so arrays currently remain uninitialized by default. Fixing this gap requires careful consideration because some, but not all, array types should be default-initializable. * The change here should mean that a `struct` type with a field like `MyType f;` should count as having a default initial-value expression for that field, but I haven't confirmed that. * Even if a `struct` provides initial values for all its fields (e.g., `struct S { float f = 0; }`), that doesn't mean it has a default `__init` right now, so those `struct` types will still be left uninitialized by default. Converging all this behavior is still TBD. Just to be clear: there is no provision or plan in Slang to support destructors, RAII, copy constructors, move constructors, overloaded assignment operations, or any other features that buy heavily into the C++ model of how construction and destruction of values gets done. In fact, I'm not even 100% sure I like having this change in place at all, and I think we should reserve the right to revert it and say that only specific stdlib types get to opt in to default initialization along these lines.
*	WIP on RWTexture types on CUDA/CPU (#1234)	jsmall-nvidia	2020-02-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* CUDA support for array of resources. * * Add support for Texture2DArray on CPU * Expand texture-simple.slang to test Texture2DArray * Reorganise CUDAComputeUtil to split out createTextureResource. * Add TextureCubeArray support for CPU/CUDA targets. * Pulled out CUDAResource Renamed derived classes to reflect that change. * Creation of SurfObject type. * Functions to return read/write access for simplifying future additions. * WIP for RWTexture access on CPU/CUDA. * CUsurfObject cannot have mips. * Ability to set number of mips on test data. Preliminary support for CUsurfObj and RWTexture1D on CUDA. CUDA docs improvements. * Fix typo.
*	Initial support for user-defined initializer/constructor declarations (#1233)	Tim Foley	2020-02-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The basic idea is that the user can write: ```hlsl struct MyThing { int a; float b; __init(int x, float y) { a = x; b = y; } } ``` and after that point, they can create an intstance of their `MyThing` type as simply as `MyThing(123, 4.56f)`. There was already a large amount of infrastructure laying around that is shared between ininitializers and ordinary functions, so enabling this feature mostly amounted to tying up some loose ends: * In the parser, make sure to properly push/pop the scope for an `__init` (or `__subscript`) declaration, so parameters would be visible to the body * In semantic checking, make sure that declaration "header" checking properly bottlenecks all the function-like cases into a base routine * In semantic checking, make sure that the logic for checking function bodies applies to every `FunctionDeclBase` with a body, and not just `FuncDecl`s * Update semeantic checking for statements to allow for any `FunctionDeclBase` as the parent declaration, not just a `FuncDecl` * In lookup, treat the `this` parameter of an `__init` (well, not actually a parameter in this case) as being mutable, just like for a `[mutating]` method * In IR codegen, don't just assume that all `__init`s are intrinsics, and narrow the scope of that hack to just `__init`s without bodies * In IR codegen, detect when we are emitting an IR function for an `__init`, and in that case create a local variable to represent the `this` value, and implicitly return that value at the end of the body. From that point on the rest of the compiler Just Works and IR codegen doesn't have to think of an `__init` as being any different than if the user had declared a `static MyThing make(...)` function. Caveats: * C++ users might like to use that naming convention (so `MyThing` as the name instead of `__init`). We can consider that later. * Everybody else might prefer a keyword other than `__init` (e.g., just `init` as in Swift), but I'm keeping this as a "preview" feature for now, rather than something officially supported * Early `return`s from the body of an `__init` aren't going to work right now. * There is currently no provision for automatically synthesizing initializers for `struct` types based on their fields. This seems like a reasonable direction to take in the future. * There is no provision for routing `{}`-based initializer lists over to initializer calls. The two syntaxes probably need to be unified at some point so that doing `MyType x = { a, b, c }` and `let x = MyType(a, b, c)` are semantically equivalent. It is possible that as a byproduct of this change user-defined `__subscript`s might Just Work, but I am guessing there will still be loose ends on that front as well, so I will refrain from looking into that feature until we have a use case that calls for it.
*	CUDA/CPU support for 1D, 2D, CubeArray (#1232)	jsmall-nvidia	2020-02-20
\| \| \| \| \| \| \| \| \| \| \|	* CUDA support for array of resources. * * Add support for Texture2DArray on CPU * Expand texture-simple.slang to test Texture2DArray * Reorganise CUDAComputeUtil to split out createTextureResource. * Add TextureCubeArray support for CPU/CUDA targets.
*	First pass Texture Array support on CUDA/CPU (#1225)	jsmall-nvidia	2020-02-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add cubemap support. * Add CUDA fence instrinsics. * Added Gather for CUDA. * Use the CUDA driver API as much as possible. * * Support 1D texture on CPU * WIP on 1D texture on CUDA * Added simplified texture test * Fix test. * Improve texture-simple tests. * * Add CPU support for 3d textures * Add support for mip maps to CUDA * Disable warnings in nvrtc * Update CUDA docs * WIP on 3d texture support. * Add support for 3d textures for CPU and CUDA. * CPU and CUDA support for cube maps. * Add CPU support for Texture1DArray. * Support CUDA Layered/Array type in meta library.
*	CUDA/CPU resource coverage (#1224)	jsmall-nvidia	2020-02-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add cubemap support. * Add CUDA fence instrinsics. * Added Gather for CUDA. * Use the CUDA driver API as much as possible. * * Support 1D texture on CPU * WIP on 1D texture on CUDA * Added simplified texture test * Fix test. * Improve texture-simple tests. * * Add CPU support for 3d textures * Add support for mip maps to CUDA * Disable warnings in nvrtc * Update CUDA docs * WIP on 3d texture support. * Add support for 3d textures for CPU and CUDA.
*	Feature/cuda coverage (#1223)	jsmall-nvidia	2020-02-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add cubemap support. * Add CUDA fence instrinsics. * Added Gather for CUDA. * Use the CUDA driver API as much as possible. * * Support 1D texture on CPU * WIP on 1D texture on CUDA * Added simplified texture test * Fix test. * Improve texture-simple tests. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
*	Nvrtc disable warnings/Float literal improvements (#1220)	jsmall-nvidia	2020-02-12
\| \| \| \| \| \| \|	* Added 'truncate' for fixing floats, for floats near the max value (as opposed to making infinite). Put AreNearlyEqual into Math * Test for ::make static method.
*	CUDA barrier/atomic support (#1218)	jsmall-nvidia	2020-02-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* * Improved fastRemoveAt * Fixed off by one bug * Fixed const safeness with List<> * Made List begin and end const safe. * Revert to previous RefPtr usage. * Fix bug with casting. * Tabs -> spaces. Small fixes/improvements to List. * Improve comment on List. * Group shared/atomic test works on CUDA. * * Enabled CUDA tests for atomics tests * Enabled DX12 test for atomics-buffer.slang Not clear just yet how to implement that for CUDA - it will work with StructuredBuffer. * hasContent -> isNonEmpty * Remove unneeded comment.
*	Fixes to make all CPU compute shaders work on CUDA (#1211)	jsmall-nvidia	2020-02-08
\| \| \| \| \| \| \| \| \| \| \| \| \|	* Launch CUDA test taking into account dispatch size. * Enable isCPUOnly hack to work on CUDA. * Rename 'isCPUOnly' hack to 'onlyCPULikeBinding'. * Add $T special type. Support SampleLevel on CUDA. * Fix typo.
*	Literal handling improvements (#1202)	jsmall-nvidia	2020-02-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* WIP: 64 literal diagnostic and truncation. * Improve how integer truncation is handled/supported. Added literal-int64.slang test. Set a suffix on all literals. Fixed problem on C++ based targets where l suffix was not the same as int() cast. So on C++ derived emitters, int() is used instead of l suffix to have same behavior across targets. * Add literal diagnostic testing. * Allow lexer to lex - in front of literals. * Fix lexing and converting int literal with -. * Too large small values of floats become inf. Handling writing inf types out on different targets. Add function to deterimine if a float literals kind. * Roll back the support of lexer lexing negative literals. * Fixed tests broken because of diagnostics numbers. Improved _isFinite * Fix compilation on linux. * Fix problem with abs on linux - use Math::Abs. * Fix typo. * * Improve warnings for float literals zeroed * Improved 64 bit type documentation * Handle half * Improved comments * Fixed tests broken * Use capital letters for suffixes. * Make default behavior on outputting a int literal that is an 'int32_t' is cast (not suffix) to avoid platform inconsistencies. Improve documentation for 64 bit types. Make tests cover material in docs. * Fixed tests. * Rename FloatKind::Normal -> Finite * Fix half zero check.
*	Fix layout for structured buffers of matrices (#1184)	Tim Foley	2020-01-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When using row-major layout (via command-line or API option), the following sort of declaration: ```hlsl StructuredBuffer<float4x4> gBuffer; ... gBuffer[i] ... ``` Generates unexpected results when compiled to DXBC via fxc or DXIL via dxc, because the fxc/dxc compilers do not respect the matrix layout mode in this specific case (a structured buffer of matrices). Instead, they always use column-major layout, even if row-major was requested by the user. A user can work around this behavior by wrapping the matrix in a `struct`: ```hlsl struct Wrapper { float4x4 wrapped; } SturcturedBuffer<Wrapper> gBuffer; ... gBuffer[i].wrapped ... ``` This change simply automates that workaround when compiling for an HLSL-based downstream compiler, so that we get the same behavior across all our backends. The change adds a test case to confirm the behavior across multiple targets, but it turns out we also had a test checked in that confirmed the buggy (or at least surprising) fxc/dxc behavior, so that one had its baselines changed and can work as a regression test for this fix as well.
*	Synthesizing CUDA tests (#1183)	jsmall-nvidia	2020-01-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* When using setUniform clamp the amount of data written to the buffer size. * CUDA implement StructuredBuffer/ByteAddressBuffer as pointer/count as is on CPU. Allow bounds check to zero index. Update docs. * Synthesize tests. * Fix bug in CUDA output. * Fixing more tests to run on CUDA. * Added BaseType for layout of Vector and Matrix - as they are held as int32_t vector array types. * Enable unbound array support on CUDA. * Added unsized array support for CUDA documentation.
*	WIP HLSL intrinsic coverage (#1171)	jsmall-nvidia	2020-01-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Added hlsl-intrinsic test folder. Enabled ceil as works across targets. * log10 support. * Fix float % on CPU/CUDA to match HLSL which is fmod (not fremainder). * Added log10 tests back to scalar-float.slang * Don't add the ( for $Sx - it's clearer what's going on without it. * Works on CUDA/CPU. Problem with asint/asuint do not seem to be found. * Only asuint exists for double. * Support countbits on CUDA and C++. * Fix typo in C++ population count. * First pass at int vector intrinsic tests. * Swizzle for int. * Bit cast tests on CUDA. * Fix warning on gcc. * Fix bit-cast-double execution on CUDA. * scalar-int test working on gcc release.
*	HLSL intrinsic coverage (#1169)	jsmall-nvidia	2020-01-21
\| \| \| \| \| \| \| \| \| \| \| \| \|	* Added hlsl-intrinsic test folder. Enabled ceil as works across targets. * log10 support. * Fix float % on CPU/CUDA to match HLSL which is fmod (not fremainder). * Added log10 tests back to scalar-float.slang * Don't add the ( for $Sx - it's clearer what's going on without it.
*	CUDA support improvements (#1168)	jsmall-nvidia	2020-01-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add test result for compile-to-cuda * Add RAII for some CUDA types to simplify usage. * First pass handling of some instrinsics on CUDA (for example transcendentals) * CUDA working with built in intrinsics. * Add missing CUDA prelude intrinsics. * CUDA matches CPU output on simple-cross-compile.slang * First pass at hlsl-scalar-float-intrinsic.slang test. * Fix smoothstep impl on CUDA and CPU. * Fixed step intrinsic on CUDA/CPU. * Added operator[] to Matrix for C++, to allow row access. Needs a fix for CUDA. * Fixed warning on clang build.
*	Fix invocation of `[mutating]` methods (#1156)	Tim Foley	2019-12-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The logic for invoking methods (member functions) in `slang-lower-to-ir.cpp` was failing to take into account whether the callee was `[mutating]` or not. Instead, it would always lower the `base` expression in something like `base.f(...)` as an r-value expression, consistent with a non-`[mutating]` method. The incorrect code generation strategy somehow turned out to work in many cases, but it broke in cases where a `[mutating]` method was called on an `inout` parameter. E.g., in this code: ```hlsl struct Stuff { [mutating] void doThing() { ... } } void broken(inout Stuff s) { s.doThing(); } ``` The `broken` function would fail to write back the value mutated by `doThing` to its `s` parameter before returning. The crux of the fix here is inside `visitInvokeExpr()`. Instead of directly calling `lowerRValueExpr` on the base expression of a method/member-function call, we instead compute the "direction" of the `this` parameter in the callee, and use that to emit the argument expression appropriately. In order to enable that change, there are several refactorings included: * The existing `ParameterDirection` and `getParameterDirection()` calls were lifted out from the declaration visitor to the global scope, so that they could be shared between lowering of functions and their call sites. * The logic for determining the "direction" of a `this` parameter was factored out of `collectParameterLists()` into its own `getThisParamDirection()` subroutine (again so that functions and call sites can share matching logic). * The logic for turning an AST expression used as a call argument into IR argument(s)* was pulled out into its own `addCallArgsForParam` and was refactored to rely on a `ParameterDirection` instead of directly inspecting the modifiers on a `ParamDecl`. This allows the function to be used for ordinary/direct arguments and the `this` argument, and also ensures that the caller and callee will agree on the direction of parameters. Fixing the way that `[mutating]` methods are called actually broke some test cases, specifically in the cases where a `[mutating]` method was being called on a value with an interface-constrained generic type: ```hlsl interface IThing { [mutating] void doStuff(); } void myFunc<T : IThing>(inout T thing) { thing.doStuff(); } ``` Our argument passing for `inout` parameters currently requires that we make a temp copy of `thing` into a local, and then pass that local as argument for the `inout` parameter, before copying back. The issue that arose was that a simple version of the logic uses the type of the `base` expression in `base.someMethod(...)` as the type of the local variable, but for an interface method call the base expression will have been cast to the interface type (we effectively have `((IThing) thing).doStuff()`. The fix here was to query the this type through the member function we are calling, and to share that logic between the function-call and function-declaration cases, to try and make sure they match, which meant even more logic got hoisted out of the declaration-emission logic and to the top level. Note: This change does not clean up any other clarity or performance concerns around `out` and `inout` parameters; it is only focused on correctness.
*	Support conversion from int/uint to enum types (#1147)	Tim Foley	2019-12-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Support conversion from int/uint to enum types The basic feature here is tiny, and is summarized in the code added to the stdlib: ``` extension __EnumType { __init(int val); __init(uint val); } ``` The front-end already makes all `enum` types implicitly conform to `__EnumType` behind the scenes, and this `extension` makes it so that all such types inherit some initializers (`__init` declarations, aka. "constructors") that take `int` and `uint`. (Note: right now all `__init` declarations in Slang are assumed to be implemented as intrinsics using `kIROp_Construct`. This obviously needs to change some day, especially so that we can support user-defined initializers.) Actually making this work required a bit of fleshing out pieces of the compiler that had previously been a bit ad hoc to be a bit more "correct." Most of the rest of this description is focused on those details, since the main feature is not itself very exciting. When overload resolution sees an attempt to "call" a type (e.g., `MyType(3.0)`) it needs to add appropriate overload candidates for the initializers in that type, which may take different numbers and types of parameters. The existing code for handling this case was using an ad hoc approach to try to enumerate the initializer declarations to consider, which might be found via inheritance, `extension` declarations, etc. In practice, the ad hoc logic for looking up initializers was just doing a subset of the work that already goes into doing member lookup. Changing the code so that it effectively does lookup for `MyType.__init` allows us to look up initializers in a way that is consistent with any other case of member lookup. Generalizing this lookup step brings us one step closer to being able to go from an `enum` type `E` to an initializer defined on an `extension` of an `interface` that `E` conforms to. One casualty of using the ordinary lookup logic for initializers is that we used to pass the type being constructed down into the logic that enumerated the initializers, which made it easier to short-circuit the part of overload resolution that usually asks "what type does this candidate return." It might seem "obvious" that an initializer/constructor on type `Foo` should return a value of type `Foo`, but that isn't necessarily true. Consider the `__BuiltinFloatingPointType` interface, which requires all the built-in floating-point types (`float`, `double`, `half`) to have an initializer that can take a `float`. If we call that interface in a generic context for `T : __BuiltinFloatingPointType`, then we want to treat that initializer as returning `T` and not `__BuiltinFloatingPointType`. Without the ad hoc logic in initializer overload resolution, this is the exact problem that surfaced for the stdlib definition of `clamp`. The solution to the "what type does an initializer return" problem was to introduce a notion of a `ThisType`, which refers to the type of `this` in the body of an interface. More generally, we will eventually want to have the keyword `This` be the type-level equivalent of `this`, and be usable inside any type. The `calcThisType` function introduced here computes a reasonable `Type` to represent the value of `This` within a given declaration. Inside of concrete type it refers to the type itself, while in an `interface` it will always be a `ThisType`. The existing `ThisTypeSubstitution`s, previously only applied to associated types, now apply to `ThisType`s as well, in the same situations. The next roadblock for making the simple declarations for `__EnumType` work was that the lookup logic was only doing lookup through inheritance relationships when the type being looked up in was an `interface`. The logic in play was reasonable: if you are doing lookup in a type `T` that inherits from `IFoo`, then why bother looking for `IFoo::bar` when there must be a `T::bar` if `T` actually implements the interface? The catch in this case is that `IFoo::bar` might not be a requirement of `IFoo`, but rather a concrete method added via an `extension`, in which case `T` need not have its own concrete `bar`. The simple/obvious fix here was to make the lookup logic always include inherited members, even when looking up through a concrete type. Of course, if we allow lookup to see `IFoo::bar` when looking up on `T`, then we have the problem that both `T::bar` and `IFoo::bar` show up in the lookup results, and potentially lead to an "ambiguous overload" error. This problem arises for any interface rquirement (so both methods and associated types right now). In order to get around it, I added a somewhat grungy check for comparing overload candidates (during overload resolution) or `LookupResultItem`s (during resolution of simple overloaded identifiers) that considers a member of a concrete type as automatically "better" than a member of an interface. The Right Way to solve this problem in the long run requires some more subtlety, but for now this check should Just Work. One final wrinkle is that due to our IR lowering pass being a bit overzealous, we currently end up trying to emit IR for those new `__init` declarations, which ends up causing us to try and emit IR for a `ThisType`. That is a case that will require some subtlty to handle correctly down the line, for for now we do the expedient thing and emit the `ThisType` for `IFoo` as `IFoo` itself, which is not especially correct, but doesn't matter since the concrete initializer won't ever be called. * testing: add more debug output to Unix process launch function * testing: increase timeout when running command-line tests
*	Fix bug in calcSafeRadians. (#1138)	jsmall-nvidia	2019-12-02
\|
*	Remove support for explicit register/binding syntax on TEST_INPUT (#1132)	Tim Foley	2019-11-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The `TEST_INPUT` facility allows textual Slang test cases to provide two kinds of information to the `render-test` tool: 1. Information on what shader inputs exist 2. Information on what values/objects to bind into those shader inputs Under the first category of information, there exists supporting for attaching a `dxbinding(...)` annotation to a `TEST_INPUT` which seemingly indicates what HLSL `register` the input uses. There is a similar `glbinding(...)` annotation, used for OpenGL and Vulkan. It turns out that these annotations were, in practice, completely ignored and had no bearing on how `render-test` allocates or bindings graphics API objects. There was some amount of code attempting to validate that explicit registers/bindings were being set appropriately, but the actual values were being ignored. The visible consequence of the `dxbinding` and `glbinding` annotations being ignored is issue #1036: the order of `TEST_INPUT` lines was de facto determining the registers/bindings that were being used by `render-test`. This change simply removes the placebo features and strips things down to what is implemented in practice: the `TEST_INPUT` lines do not need target-API-specific binding/register numbers, because their order in the file implicitly defines them. I added logic to the parsing of `TEST_INPUT` lines to make sure I got an error message on any leftover annotations, and went ahead and systematicaly deleted all of the placebo annotations from our test cases. If we decide to make `TEST_INPUT` lines not depend on order of declaration in the future, we can build it up as a new and better considered feature. The main alternative I considered was to keep the annotations in place, and change `render-test` and the `gfx` abstraction layer to properly respect them, but that path actually creates much more opportunity for breakage (since every single test case would suddenly be specifying its root signature / pipeline layout via a different path using data that has never been tested). The approach in this change has the benefit of giving me high confidence that all the test cases continue to work just as they had before.