slang.git - Making it easier to work with shaders

	Commit message (Collapse)	Author	Age
*	Initial implementation of SP#015 `DescriptorHandle<T>`. (#6028)	Yong He	2025-01-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Initial implementation of `ResourcePtr<T>`. * Update docs * Fix build error. * Add more discussion. * Update documentation. * Update TOC. * Fix. * Fix. * Add test case for custom `getResourceFromBindlessHandle`. * Add namehint to generated descriptor heap param. * Fix. * Fix. * format code * Rename to `DescriptorHandle`, and add `T.Handle` alias. * Fix compiler error. * Fix. * Fix build. * Renames. * Fix documentation. * Documentation fix. --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>
*	format	Ellie Hermaszewska	2024-10-29
\| \| \| \| \| \| \|	* format * Minor test fixes * enable checking cpp format in ci
*	Remove use of `G0` and `__target_intrinsic` in stdlib. (#4170)	Yong He	2024-05-14
\| \| \| \| \| \| \|	* Remove use of `G0` and `__target_intrinsic` in stdlib. * Fix. * Fix calling intrinsic in global scope.
*	Add `target_switch` and `intrinsic_asm` statement. (#3154)	Yong He	2023-08-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add `target_switch` and `__intrinsic_asm` statement. * Cleanup. * WaveGetActiveMask, WaveGetActiveMask, WaveCountBits. * WaveIsFirstLane. * More wave intrinsics. * wave intrinsics. * merge fix. * Fix. * Fix. * Update test. * update test. * Fix. --------- Co-authored-by: Yong He <yhe@nvidia.com>
*	Lower all ByteAddressBuffer uses for SPIRV. (#3143)	Yong He	2023-08-23
\| \| \|	Co-authored-by: Yong He <yhe@nvidia.com>
*	Add support for emitting cuda kernel and host functions. (#2712)	Yong He	2023-03-17
\| \| \| \| \| \| \| \| \| \| \|	* Add support for emitting cuda kernel and host functions. * Update test. * Fix cuda preamble emit. --------- Co-authored-by: Yong He <yhe@nvidia.com>
*	More control flow simplifications. (#2673)	Yong He	2023-02-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* More control flow and Phi param simplifications. * Fix. * Fix gcc error. * Fix. * More IR cleanup. * Fix bug in phi param dce + ifelse simplify. * Propagate and DCE side-effect-free functions. * Enhance CFG simplifcation to remove loops with no side effects. * Fix. * Fixes. * Fix tests. Add [__AlwaysFoldIntoUseSite] for rayPayloadLocation. * More cleanup. * Fixes. * Fix. --------- Co-authored-by: Yong He <yhe@nvidia.com>
*	Remove `SharedIRBuilder`. (#2657)	Yong He	2023-02-16
\| \| \|	Co-authored-by: Yong He <yhe@nvidia.com>
*	Overhaul global inst deduplication and cpp/cuda backend. (#2654)	Yong He	2023-02-16
\| \| \| \| \| \| \| \| \|	* Overhaul global inst deduplication and cpp/cuda backend. * Update IR documentation. --------- Co-authored-by: Yong He <yhe@nvidia.com>
*	Allow `class` to implement COM interface, [DLLExport] (#2338)	Yong He	2022-07-25
\| \| \| \| \| \| \|	* Allow `class` to implement COM interface, [DLLExport] * Fix [COM] usage in tests and examples with UUIDs. Co-authored-by: Yong He <yhe@nvidia.com>
*	Actual global support (#2262)	jsmall-nvidia	2022-06-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * Use TerminatedUnownedStringSlice for literals in output C++. * Remove Escape/Unescape functions used in slang-token-reader.cpp Add target type of 'host-cpp' etc to map to the target types. * Fix some corner cases around string encoding. * Added unit test for string escaping. Fixed some assorted escaping bugs. * Updated test output. * Added decode test. * Stop using hex output, to get around 'greedy' aspect. Use octal instead. * Added HostHostCallable Small changes to use ArtifactDesc/Info instead of large switches. * Fix C++ emit to handle arbitrary function export. * Add options handling for callable without an output being specified. * Can compile with COM interface. Added example using com interface. * Use the IR Ptr type instead of hack in C++ emit for interfaces. * Fix issue with outputting the COM call when ptr is used. * Fix crash issue on compilation failure. * Add support for __global. * Added `ActualGlobalRate` Added special handling around globals and COM interfaces. Tested out in cpu-com-example. * Fix typo in NodeBase. * Support for accessing globals by name working. * Check that actual global initialization is working. * Refactor the com replacement such that it doesn't need a cache or do anything special with GlobalVar. * Remove context. Only create replacement if needed. * Split out COM host-callable into a unit-test. * host-callable com testing on C++and llvm. * Comment around the COM ptr replacement. * Disable com test on vs 32 bit. Fix C++ prelude * Disable 32 bit targets testing com host-callable. * Use JSON parsing to locate VS version. * Need platform detection in C++prelude. * Fix com host callable test for LLVM. * Work around for not being able to include "targetConditionals.h"
*	COM interfaces with host callable (#2258)	jsmall-nvidia	2022-06-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * Use TerminatedUnownedStringSlice for literals in output C++. * Remove Escape/Unescape functions used in slang-token-reader.cpp Add target type of 'host-cpp' etc to map to the target types. * Fix some corner cases around string encoding. * Added unit test for string escaping. Fixed some assorted escaping bugs. * Updated test output. * Added decode test. * Stop using hex output, to get around 'greedy' aspect. Use octal instead. * Added HostHostCallable Small changes to use ArtifactDesc/Info instead of large switches. * Fix C++ emit to handle arbitrary function export. * Add options handling for callable without an output being specified. * Can compile with COM interface. Added example using com interface. * Use the IR Ptr type instead of hack in C++ emit for interfaces. * Fix issue with outputting the COM call when ptr is used. * Fix crash issue on compilation failure.
*	Refactor prelude emit (#2236)	jsmall-nvidia	2022-05-17
\| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * Refactor how prelude output works in emit. * Small improvement to emit output. * Move around comment on target specific language directives based on review. Co-authored-by: Theresa Foley <10618364+tangent-vector@users.noreply.github.com>
*	Initial support for COM interface in host code. (#2230)	Yong He	2022-05-10
\| \| \| \|	Co-authored-by: Yong He <yhe@nvidia.com> Co-authored-by: Theresa Foley <10618364+tangent-vector@users.noreply.github.com>
*	Support `[DllImport]` (#2181)	Yong He	2022-04-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Support `[DllImport]` * Fix. * Fix. * Fix array type emit in cpp. * Fix. * Fix. * Fix Co-authored-by: Yong He <yhe@nvidia.com>
*	CUDA half comparison support (#1834)	jsmall-nvidia	2021-05-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * Split out StringEscapeUtil. * Added StringEscapeUtil. * Fix typo in unix quoting type. * Small comment improvements. * Try to fix linux linking issue. * Fix typo. * Attempt to fix linux link issue. * Update VS proj even though nothing really changed. * Fix another typo issue. * Fix for windows issue. Fixed bug. * Make separate Utils for escaping. * Fix typo. * Split out into StringEscapeHandler. * Windows shell does handle removing quotes (so remove code to remove them). * Handle unescaping if not initiating using the shell. * Slight improvement around shell like decoding. * Simplify command extraction. * Add shared-library category type. * Fix bug in command extraction. * Typo in transcendental category. * Enable unit-test on in smoke test category. * Make parsing failing output as a failing test. * Fixes for transcendental tests. Disable tests that do not work. * Changed category parsing. * Removed the TestResult parameter from _gatherTestsForFile. Made testsList only output. * Remove testing if all tests were disabled. * Make args of CommandLine always unescaped. * Add category. * Don't need escaping on unix/linux. * Remove some no longer used functions. * Add requireSMVersion to CUDAExtensionTracker. * half-calc.slang now works for CUDA. * bit-cast-16-bit works on CUDA. * WIP handling of CUDA vector<half> types. * Half swizzle CUDA. * Half vector test. * Fix swizzle half bug. * Fix compilation issue with narrowing to Index. * Add unary ops. * Add some vector scalar maths ops. * Add half vector conversions for CUDA. * Fix erroneous comment. * Support for half comparisons. * First pass test for half compare. * Fix bug in CUDA specialized emit control. Updated tests to have pre and post inc/dec. * Removed unneeded parts of the cuda prelude. * Half structured buffer works on CUDA. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
*	Heterogeneous Flag Error Visibility (#1642)	Dietrich Geisler	2020-12-18
\| \| \| \| \| \| \| \| \| \|	* PR to fix issue #1638. This change introduces a diagnostic sink to the emitModule function, and updates all associated calls to that function. Additionally, this commit updates the heterogeneous hello world example to not need the entry and stage flags for simplicity. * Updated emit-cpp per suggested changes Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
*	Initial attempt to enable CUDA dynamic dispatch codegen (#1549)	Yong He	2020-09-17
\| \| \| \| \|	* Front-load cuda module loading to fill in RTTI pointers. * Enable dynamic dispatch codegen for CUDA.
*	Dynamic dispatch bug fixes. (#1541)	Yong He	2020-09-14
\| \| \|	Co-authored-by: Yong He <yhe@nvidia.com>
*	IR pass to generate witness table wrappers. (#1443)	Yong He	2020-07-15
\| \| \| \| \| \| \| \| \|	* Refactor lower-generics pass into separate subpasses. * IR pass to generate witness table wrappers. * Re-generate vs project files. * Fix x86 build error.
*	Remove KernelContext wrapper from CPU/CUDA emit (#1440)	Tim Foley	2020-07-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Remove KernelContext wrapper from CPU/CUDA emit Currently, the CPU and CUDA C++ targets rely on a `KernelContext` type that is generated during emit, as a way to provide implicit access to things that were global in the input Slang code, but that can't actually be emitted as globals in the target language (because the semantics of global declarations differ). For example, input like: ```hlsl ConstantBuffer<Stuff> gStuff; // shader parameter groupshared int gData[1024]; // thread-group shared variable static int gCounter = 0; // "thread-local" global-scope variable void subroutine() { ... } [shader("compute")] void computeMain() { ... } ``` would translate to output C++ for CPU a bit like: ```c++ struct KernelContext { ConstantBuffer<Stuff> gStuff; int gData[1024]; int gCounter = 0; void subroutine() { ... } void computeMain() { ... } }; ``` Note that both `computeMain()` and `subroutine()` are non-`static` members functions on `KernelContext`, so they have an implicit `this` parameter of type `KernelContext`, which allows the bodies of those functions to implicitly reference `gStuff`, etc. by name in their bodies. Because `KernelContext::computeMain()` is a member function, we end up emitting an additional global-scope function to expose the entry point to the outside world, and that function is responsible for declaring a local `KernelContext` and invoking the generated entry point on it. This approach has several important drawbacks: * It complicates the emit logic for CPU and CUDA, with many special cases around when/how things get emitted * It complicates the implementation of dynamic dispatch, because what seems like a function pointer in Slang IR needs to be a pointer-to-member-function in C++. * It makes it difficult to have a non-kernel-oriented mode of compilation for CPU where a Slang function with a given signature gets output as a C++ CPU function with the "same" signature (not wrapped up as a member function of `KernelContext`. This change makes a step toward addressing these issues by making the introducing of the `KernelContext` type be something that is done in an explicit IR pass instead of being handled as part of the last-mile emit logic. The most important change is the removal of code related to `KernelContext` from the `slang-emit-{cpp,cuda}.{h,cpp}` files, with the equivalent logic instead being handled in a new pass in `slang-ir-explicit-global-context.{h,cpp}`. It should be noted that further cleanups to the emit logic should now be possible; in particular, both the CPU and CUDA emit paths are manually sequencing the `EmitAction`s instead of relying on the default logic, but at this point they should be able to just use the default. The additional cleanups are left for future work. The explicit IR pass does more or less what one would expect: it identifies global-scope entities (global variables and parameters) that need to be wrapped and turns them into fields of a `KernelContext` type. It then modifies all entry points to initialize a `KernelContext` as part of their startup. Finally, any code that used to refer to the global entities is changed to refer to a field of the context, with the context passed via new function parameters (the new parameter is only added to functions that need it for now). Transforming global variables into fields of a `KernelContext` type in the IR pass ends up dropping their initial-value expressions (since those were attached as basic blocks on the `IRGlobalVar`). To avoid breaking code that relies on global-scope (but thread-local) variables, this change also adds an explicit pass that takes the initialization logic on all global variables and moves it to explicit logic that runs at the start of every entry point in a linked module (`slang-ir-explicit-global-init.{h,cpp}`). This pass would also be useful when we get back to direct SPIR-V emit, since SPIR-V also requires initialization logic for globals to be emitted into entry points. One complication that arises when the IR is introducing the types for entry-point parameters, global-scope parameters, and the `KernelContext` type is that it becomes harder for the emit logic to utter the names of those types (they might not even have names, since `IRNameHint`s might get stripped). This created a problem since the wrapper operations that were being generated for CPU were taking `void` parameters and casting them to the appropriate type. To work around this issue, we have added an explicit IR pass (`slang-ir-entry-point-raw-ptr-params.{h,cpp}`) that transforms the signature of entry points so that any pointer parameters instead become raw pointer (`void`) parameters, with the casting being handled inside the entry point itself. One consequence of all the above changes is that for the CUDA target we no longer need a wrapper function to invoke the generated entry point any more, because the IR function for the entry point ends up having the correct/expected signature already. This is also the case for CPU when it comes to the `_Thread` wrapper function, but this change doesn't try to eliminate the wrapper because of a belief that the `_Thread`-level interface is going away anyway. Because the IR is now responsible for ensuring the signature of the IR entry point for CUDA and CPU is what is expected, I needed to modify the `slang-ir-entry-point-uniforms` pass to always create an explicit parameter for the entry point uniforms when compiling for CUDA/CPU, even if there were no `uniform` parameters on the entry point as written. This also ended up requiring some tweaks to the parameter layout logic to ensure that CPU/CUDA targets always treat `ConstantBuffer<T>` as a `T` even in the case where `T` is an empty `struct` type (which happens when we construct a `struct` type to represent the uniform parameters of an entry point with no uniform parameters...). There are several future changes that can/should build on this work: We should change the generated signatures for CUDA kernels, so that they don't rely on `KernelContext` for global-scope parameters. At that point we can avoid generating a `KernelContext` at all for CUDA, except when a program uses global-scope thread-local variables. * We should figure out how to make the "ABI" for dynamic-dispatch calls ensure that the kernel context is either always passed, or always not passed. Making a hard-and-fast rule as part of the calling convention for dynamic calls would ensure that they access through the context continues to work with dynamic calls (this change might break it in some cases). * We should figure out how to handle the layout for the `KernelContext` in cases where a program is composed of multiple separately-compiled modules. Right now the layout of the `KernelContext` requires global knowledge (as does the pass that introduces explicit initialization for global-scope thread-locals). * We should try to further clean up the CPU/CUDA C++ emit logic to fall back on the default emit behavior more, now that the various special-case approaches that were taken are no longer needed * fixup: restore build files to default configuration
*	Dynamic code gen for generic local variables. (#1434)	Yong He	2020-07-10
\| \| \| \| \| \| \|	* Dynamic code gen for generic local variables. * Fixes to function calls with generic typed `in` argument. * Fixes per code review comments
*	Add support for global uniform shader parameters (#1433)	Tim Foley	2020-07-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Adding support for global uniform shader parameters This change adds support for Slang programmers to declare shader parameters of "ordinary" types at global scope: ```hlsl uniform float gScaleFactor; void main() { ... = gScaleFactor; ... } ``` The generated HLSL/GLSL/DXIL/SPIR-V output will be something along the lines of: ```hlsl struct GlobalParams { float gScaleFactor; } cbuffer globalParams { GlobalParams globalParams; } void main() { ... = globalParams.gScaleFactor; ... } ``` The binding information used for the implicit `globalParams` constant buffer will be determined by the existing implicit parameter binding logic (which already had support for this kind of transformation). The reason this change is being pursued right now is because it is one step toward removing the implicit `KernelContext` type that is used to wrap the generated code for our CPU and CUDA C++ targets. Handling global-scope parameters of ordinary type requires an IR pass that synthesizes the `GlobalParams` structure type above, and that step ends up removing the need for the similar `UniformState` structure that was being used in the CPU/CUDA emit logic. A more detailed guide to the changes included follows: * The diagnostic for a global-scope variable that is implicitly a shader parameter was kept, but changed to a warning. Users can opt out of the warning by decorating their parameter as a `uniform` (since that keyword is already being used to mark entry-point parameters that should be treated as uniform shader parameters). * To simplify the task of finding the global shader parameters, the `CLikeSourceEmitter` type has been given an `m_irModule` member. The previous emit logic for `UniformState` was having to do a roundabout solution involving the `EmitAction`s to deal with not having direct access to the module. * Removed a few dead declarations in the emit logic (related to a much earlier point where emit was based on the AST instead of the IR). * Made the computation of type names in C++ emit take into account `ConstantBuffer<T>` and `ParameterBlock<T>`. As far as I can tell, these were being handled with some special-case hacks in the emit logic instead of being supported more fundamentally. It might actually be good to pass these through as `ConstantBuffer<T>` and `ParameterBlock<T>` in the C++ output, and allow the prelude to customize their translation (defaulting to defining them as `T`). Removed the special-case C++ emit logic for references to global shader parameters. There are now at most two global shader parameters to deal with, and the default emit logic (referring to them by name) does the Right Thing. * Changed the handling of entry points for C++ (both CPU and CUDA) so that it handles the bundled-up shader paameters for the global and entry-point scopes the same way. The main complication here is OptiX, where parameter data is passed very differently than it is for CUDA compute kernels. * Reverted changes to `ir-entry-point-uniforms` that had made its logic depend on the compilation target. The parameter binding logic was already responsible for deciding if a given target needed to wrap up its entry-point parameters in a constant buffer, and the IR pass was respecting that layout information. The current workaround had been removing the `ConstantBuffer<T>` indirection from this IR pass for CPU/CUDA, but then reintroducing the same indirection later on in the emit step. * Added an explicit IR pass with the task of collecting global-scope parameters of uniform/ordinary type and packaging them up into a `struct`, and then optionally packaging that `struct` up in a constant buffer. This pass bases its decisions on the IR layout information that was already computed, so it should match whatever policy choices were made at the layout level. * Changed the "key" operand on IR `struct` layout information to not assume an `IRStructKey`. The problem here is that the global scope gets a `StructTypeLayout` to represent its members, and this is convenient (rather than having to always special-case logic that handles the global scope), but the "fields" of that struct are global variables which do not have `IRStructKey`s associated with them. The simplest solution is to use the variables themselves as the keys, which required removing the assumption in the IR encoding. * Updated the IR layout process to compute a layout for the global scope of an entire program, and to attach that to the `IRModule` via a decoration. Updated the IR linking process to carry through that decoration to the linked output. This is necessary so that the IR pass that transforms global parameters can access the global-scope layout information. An important concern with this approach is that the contents and layout of the monolithic `GlobalParams` structure depends on the exact set of modules that were linked (and the order in which they were specified, in some cases). This isn't really a new thing with this change, but it becomes more important as we start to think of how to generalize things to better support separate compilation and linking. There are changes that can (and should) be made to the way that IR layouts are computed for programs (e.g., so that we compute layout per-module and then combine them rather than as a whole-program step). In this case, the problem of forming the combined/linked global layout can be moved down the IR level and not be reliant on AST-level information. Just changing the way layout and linking interact would not change the fundamental problem that global shader parameters as they currently exist in Slang/HLSL/GLSL are not readily compatible with true separate compilation. We either need to find a solution strategy that we can apply to allow existing shaders to work with separate compilation or we need to incrementally work toward removing support for global-scope shader parameters in favor of explicit entry-point parameters in all cases. * fixup: missing files * fixup: comment the new code
*	Emit pointers for CPU target. (#1418)	Yong He	2020-07-03
\| \| \|	Co-authored-by: Yong He <yhe@nvidia.com>
*	Dynamic dispatch for generic interface requirements.	Yong He	2020-06-24
\| \| \| \| \| \| \| \|	-Lower interfaces into actual `IRInterfaceType` insts. -Lower `DeclRef<AssocTypeDecl>` into `IRAssociatedType` -Generate proper IRType for generic functions. -Add a test case exercising dynamic dispatching a generic static function through an associated type. -Bug fixes for the test case.
*	Dynamic dipatch non-static functions.	Yong He	2020-06-17
\|
*	Generate dynamic C++ code for the minimal test case. (#1391)	Yong He	2020-06-17
\| \| \| \| \| \| \| \| \| \| \| \| \|	* Add IR pass to lower generics into ordinary functions. * Fix project files * Emit dynamic C++ code for simple generics and witness tables. Fixes #1386. * Remove -dump-ir flag. * Fixups.
*	Generate IRType for interfaces, and reference them as `operand[0]` in ↵	Yong He	2020-06-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	IRWitnessTable values (#1387) * Generate IRType for interfaces, and use them as the type of IRWitnessTable values. This results the following IR for the included test case: ``` [export("_S3tu010IInterface7Computep1pii")] let %1 : _ = key [export("_ST3tu010IInterface")] [nameHint("IInterface")] interface %IInterface : _(%1); [export("_S3tu04Impl7Computep1pii")] [nameHint("Impl.Compute")] func %Implx5FCompute : Func(Int, Int) { block %2( [nameHint("inVal")] param %inVal : Int): let %3 : Int = mul(%inVal, %inVal) return_val(%3) } [export("_SW3tu04Impl3tu010IInterface")] witness_table %4 : %IInterface { witness_table_entry(%1,%Implx5FCompute) } ``` * Fixes per code review comments. Moved interface type reference in IRWitnessTable from their type to operand[0]. * Fix typo in comment.
*	Unroll target improvements (#1291)	jsmall-nvidia	2020-03-25
\| \| \| \| \| \| \| \| \| \| \|	* Add unroll support for CUDA, and preliminary for C++. Document [unroll] support. * Fix loop-unroll to run on CPU, and test on CPU and elsewhere. Fix bug in emitting loop unroll condition. * Improved comment. * Added support for vk/glsl loop unrolling.
*	Clean-ups related to expanded standard library coverage (#1269)	Tim Foley	2020-03-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change continues the work already started in moving the definitions of many built-in functions to the standard library. The main focus in this change was reducing the number of operations that had to be special-cased on the CPU and CUDA targets by making sure that the scalar cases of built-in functions map to the proper names in the prelude (e.g., `F32_sin()`) via the ordinary `__target_intrinsic` mechanism. In some cases this cleanup meant that special-case logic that was constructing definitions for those functions using C++ code could be scrapped. Additional changes made along the way: * A few scalar functions that were missing in the CPU/CUDA preludes got added: `round`, hyperbolic trigonometric functions, `frexp`, `modf`, and `fma` * The floating-point `min()` and `max()` definitions in the preludes were changed to use intrinsic operations on the target (which are likely to follow IEEE semantics, while our definitions did not) * For the CUDA target, many of the functions had their names translated during code emit from, e.g., `sin` to `sinf`. This change makes the CUDA target more closely match the C++/CPU target in using names like `F32_sin` consistently. * For the CUDA target, a few additional functions have intrinsics that don't exist (portably) on CPU: `sincos()` and `rsqrt()`. * For the Slang stdlib definitions to work, a new `$P` replacement was defined for `__targert_intrinsic` that expands to a type based on the first operand of the function (e.g., `F32` for `float`). * I removed the dedicated opcodes for matrix-matrix, matrix-vector, and vector-matrix multiplication, and instead turned them into ordinary functions with definitions and `__target_intrinsic` modifiers to map them appropriately for HLSL and GLSL. This is realistically how we would have implemented these if we'd had `__target_intrinsic` from the start. Notes about possible follow-on work: * The `ldexp` function is still left in the Slang stdlib because it has to account for a floating-point exponent and the `math.h` version only handles integers for the exponent. It is possible that we can/should define another overload for `ldexp` (and `frexp`) that uses an integer for exponent, and then have that one be a built-in on CPU/CUDA, with the HLSL `frexp` being defined in the stdlib to delegate to the correct `frexp` for those targets. * The `firstbithigh` and related functions are missing for our CPU and CUDA targets, and will need to be added. It is worth nothing that `firstbithigh` apparently has some very odd functionality around signed integer arguments (which are supported, despite MSDN being unclear on that point). General cleanup will be required for those functions. * Maxing the various matrix and vector products no longer be intrinsic ops might affect how we emit code for them as sub-expressions (both whether we fold them into use sites and how we parenthize them). This doesn't seem to affect any of our existing tests, but we could consider marking these functions with `[__readNone]` to ensure they can be folded, and then also adding whatever modifier(s) we might invent to control precdence and parentheses insertion during emit.
*	Expand range of definitions that can be moved into stdlib (#1259)	Tim Foley	2020-03-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The actual definitions that got moved into the stdlib here are pretty few: * `clip()` * `cross()` * `dxx()`, `ddy()` etc. * `degrees()` * `distance()` * `dot()` * `faceforward()` The meat of the change is infrastructure changes required to support these new declarations * Generic versions of the standard operators (e.g., `operator+`) were added that are generic for a type `T` that implements the matching `__Builtin`-prefixed interface. An open question is whether we can now drop the non-generic versions in favor of just having these generic operators. * A `__BuiltinLogicalType` interface was added to capture the commonality between integers and `bool` * `__BuiltinArithmeticType` was extended so that implementations must support initialization from an `int` * `__BuiltinFloatingPointType` was extended to require an accessor that returns the value of pi for the given type, and the concrete floating-point types were extended to provide definitions of this value. * It turns out that our logic for checking if two functions have the same signature (and should thus count as redeclarations/redefinitions) wasn't taking generic constraints into account at all. That was fixed with a stopgap solution that checks if the generic constraints are pairwise identical, but I didn't implement the more "correct" fix that would require canonicalizing the constraints. * When doing overload resolution and considering potential callees, logic was added so that a non-generic candidate should always be selected over a generic one (generally the Right Thing to do), and also so that a generic candidate with fewer parameters will be selected over one with more (an approximation of the much more complicated rule we'd ideally have). * The formatting of declarations/overloads for "ambiguous overload" errors was fleshed out a bit to include more context (the "kind" of declaration where appropriate, the return type for function declarations) and to properly space thing when outputting specialization of operator overloads that end with `<` (so that we print `func < <int>(int, int)` instead of just `func <<int,int>(int,int)`). * The core lookup routines were heavily refactored and reorganized to try to make them bottleneck more effectively so that all paths handle all the nuances of inheritance, extensions, etc. * Because of the refactoring to lookup logic, the semantic checking logic related to checking if a type conforms to an interface was updated to be driven based on the `Type` that is supposed to be conforming, rather than a `DeclRef` to the type's declaration. This allows it to use the type-based lookup entry point and eliminates one special-case entry point for lookup. In addition to the various core changes, this change also refactors some of the existing stdlib code to favor writing more things in actual Slang syntax, and less in C++ code that uses `StringBuilder` to construct the Slang syntax. There is a lot more that could be done along those lines, but even pushing this far is showing that the current approach that `slang-generate` takes for how to separate meta-level C++ and Slang code isn't really ideal, so a revamp of the generator code is probably needed before I continue pushing. One surprising casualty of the refactoring of lookup is that we no longer have the `lookedUpDecls` field in `LookupResult`. That field probably didn't belong there anyway, but the role it served was important. The idea of `lookedUpDecls` was to avoid looking up in the same interface more than once in cases where a type might have a "diamond" inheritance pattern. Removing that field doesn't appear to affect correctness of any of our existing tests, but by adding a specific test for "diamond" inheritance I could see that the refactoring introduced a regression and made looking up a member inherited along multiple paths ambiguous. Rather than add back `lookedUpDecls` I went for a simpler (but arguably even hackier) solution where when ranking candidates from a `LookupResult` we check for identical `DeclRef`s and arbitrarily favor one over the other. One complication that arises here is that when comparing `DeclRef`s inherited along different paths they might have a `ThisTypeSubstitution` for the same type, but with different subtype witnesses (because different inheritance paths could lead to different transitive subtype witnesses: e.g., `A : B : D` and `A : C : D`).
*	CUDA/C++ backend improvements (#1198)	jsmall-nvidia	2020-02-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* WIP with vector float test. * vector-float test working. * Fixed remaing tests broken with init changes. * Improve 64bit-type-support.md * Disable tests broken on CI system for Dx. * WIP: Make type available for comparison. * Moved type conversion into TypeTextUtil. * Add text/type conversions from DownstreamCompiler to TypeTextUtil. * Allow compaison taking into account type. * Removed quantize in vector-float.slang test.
*	Feature/fix cuda function preamble (#1187)	jsmall-nvidia	2020-01-29
\| \| \| \| \| \|	* Fix tests/compute/global-init.slang by handling some other cases where functions are emitted. * Fix comment.
*	Matrix indexing (#1172)	jsmall-nvidia	2020-01-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Added hlsl-intrinsic test folder. Enabled ceil as works across targets. * log10 support. * Fix float % on CPU/CUDA to match HLSL which is fmod (not fremainder). * Added log10 tests back to scalar-float.slang * Don't add the ( for $Sx - it's clearer what's going on without it. * Works on CUDA/CPU. Problem with asint/asuint do not seem to be found. * Only asuint exists for double. * Support countbits on CUDA and C++. * Fix typo in C++ population count. * First pass at int vector intrinsic tests. * Swizzle for int. * Bit cast tests on CUDA. * Fix warning on gcc. * Fix bit-cast-double execution on CUDA. * scalar-int test working on gcc release. * GetAt working on CUDA/C++ * Split out runtime index into it's own test. * Removed SetAt, as can use assignment with GetAt. * Allowing getAt to be used on matrices. * Don't need [] on matrix type any longer because use getAt. * Enable clamp on matrix-int. * Fix matrix-int.slang test - because clamp behavior varied if min and max were say inverted. Added runtime indexing version of matrix-int.
*	CUDA support improvements (#1168)	jsmall-nvidia	2020-01-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add test result for compile-to-cuda * Add RAII for some CUDA types to simplify usage. * First pass handling of some instrinsics on CUDA (for example transcendentals) * CUDA working with built in intrinsics. * Add missing CUDA prelude intrinsics. * CUDA matches CPU output on simple-cross-compile.slang * First pass at hlsl-scalar-float-intrinsic.slang test. * Fix smoothstep impl on CUDA and CPU. * Fixed step intrinsic on CUDA/CPU. * Added operator[] to Matrix for C++, to allow row access. Needs a fix for CUDA. * Fixed warning on clang build.
*	WIP: CPU like CUDA binding (#1164)	jsmall-nvidia	2020-01-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* CUDA generated first test compiles. * WIP on enabling CUDA in render-test. * Detect CUDA_PATH environmental variable to build build cuda support into render-test. Added WIP cuda-compute-util.cpp/h Added CUDA as a renderer type. * Fix libraries needed for cuda in premake. * Added -enable-cuda premake option. Defaults to false. * Creates CUDA device, loads PTX and finds entry point. * Fix some erroneous cruft from slang-cuda-prelude.h * Made CUDA use C++ like ABI for generated code. Fix small bug in C++ output semantics.
*	CUDA generated first test compiles. (#1161)	jsmall-nvidia	2020-01-08
\|
*	Fix scoping issue around use of IRTypeSet (#1160)	jsmall-nvidia	2020-01-06
\| \| \| \| \| \| \| \| \| \|	* WIP use IRTypeSet in CPPSourceEmitter - doesn't work because of a cloning issue, causing a crash on exit. * Fix destruction of module issue for IRTypeSet usage in CPPEmitter. * Fix out definition emitting ordering that was removed. * Disable cuda output test.
*	HLSLIntrinsicSet (#1159)	jsmall-nvidia	2019-12-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* CPPCompiler -> DownstreamCompiler * Added DownstreamCompileResult to start abstraction such that we don't need files. * * Split out slang-blob.cpp * Made CompileResult hold a DownstreamCompileResult - for access to binary or ISlangSharedLibrary * Keep temporary files in scope. * Add a hash to the hex dump stream. * Move all file tracking into DownstreamCompiler. * WIP support for nvrtc. * WIP: Adding support for nvrtc compiler. Adding enum types, wiring up the nvrtc into slang. * Fix remaining CPPCompiler references. * Fix order issue on target string matching. * Use ISlangSharedLibrary for nvrtc. * Use DownstreamCompiler for nvrtc. * WIP first pass at compilation win nvrtc. * Added testing if file is on file system into CommandLineDownstreamCompiler. Added sourceContentsPath. * Make test cuda-compile.cu work by just compiling not comparing output. * Genearlize DownstreamCompiler usage. * Fix warning on clang. * Remove CompilerType from DownstreamCompiler. * Use DownstreamCompiler interface for all compilers. NOTE for FXC, DXC and GLSLANG this doesn't mean using 'compile' - it's still extracting functions from shared library. * Replace DownstreamCompiler::SourceType -> SlangSourceLanguage * Replace _canCompile with something data driven. * Fix compiling on gcc/clang for DownstreamCompiler. * Moved some text conversions into DownstreamCompiler. * Fix problem on non-vc builds with not having return on locateCompilers for VS. * Change so no warning for code not reachable on locateCompilers for vs. * WIP: CUDA code generation - currently just using CPU layout and HLSL. * emitXXXForEntryPoint -> emitEntryPointSource emitSourceForEntryPoint -> emitEntryPointSourceFromIR Fix up generating cuda to get PTX. * WIP emitting cuda for IR. * Small improvements to CUDA ouput. * Disable the CUDA emit test, as output not currently compilable. * Split out IRTypeSet to simplify CPPSourceEmitter and other Emitters that rely on determining unique use of type and/or need to generate types in order to output code. * First pass at HLSLIntrinsicSet. * Small improvements to HLSLIntrinsicSet. * Use HLSLIntrinsicSet in CPPSourceEmitter. * Small improvements to checking of HLSLIntrinsic construction. * Deallocate intrinsic copy if a match was found.
*	Split out IRTypeSet (#1158)	jsmall-nvidia	2019-12-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* CPPCompiler -> DownstreamCompiler * Added DownstreamCompileResult to start abstraction such that we don't need files. * * Split out slang-blob.cpp * Made CompileResult hold a DownstreamCompileResult - for access to binary or ISlangSharedLibrary * Keep temporary files in scope. * Add a hash to the hex dump stream. * Move all file tracking into DownstreamCompiler. * WIP support for nvrtc. * WIP: Adding support for nvrtc compiler. Adding enum types, wiring up the nvrtc into slang. * Fix remaining CPPCompiler references. * Fix order issue on target string matching. * Use ISlangSharedLibrary for nvrtc. * Use DownstreamCompiler for nvrtc. * WIP first pass at compilation win nvrtc. * Added testing if file is on file system into CommandLineDownstreamCompiler. Added sourceContentsPath. * Make test cuda-compile.cu work by just compiling not comparing output. * Genearlize DownstreamCompiler usage. * Fix warning on clang. * Remove CompilerType from DownstreamCompiler. * Use DownstreamCompiler interface for all compilers. NOTE for FXC, DXC and GLSLANG this doesn't mean using 'compile' - it's still extracting functions from shared library. * Replace DownstreamCompiler::SourceType -> SlangSourceLanguage * Replace _canCompile with something data driven. * Fix compiling on gcc/clang for DownstreamCompiler. * Moved some text conversions into DownstreamCompiler. * Fix problem on non-vc builds with not having return on locateCompilers for VS. * Change so no warning for code not reachable on locateCompilers for vs. * WIP: CUDA code generation - currently just using CPU layout and HLSL. * emitXXXForEntryPoint -> emitEntryPointSource emitSourceForEntryPoint -> emitEntryPointSourceFromIR Fix up generating cuda to get PTX. * WIP emitting cuda for IR. * Small improvements to CUDA ouput. * Disable the CUDA emit test, as output not currently compilable. * Split out IRTypeSet to simplify CPPSourceEmitter and other Emitters that rely on determining unique use of type and/or need to generate types in order to output code.
*	Don't use mangled names when emitting code (#1096)	Tim Foley	2019-10-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Small cleanup to how standard-library code gets marked Declarations in the standard library used to be individualy marked with `FromStdLibModifier` so that downstream passes (notably IR lowering) could treat them specially. At some point we simplified things by just looking for `FromStdLibModifier` in the ancestor list of a declaration, so that we could handle nested declarations without having to recursively attach the modifier to everything. This change simplifies things a bit further in that the `FromStdLibModifier` now only get attached to the `ModuleDecl`, since that will be on the ancestor chain of all the declarations anyway. The second change here is that we attach this marker modifier before we parse and do semantic checking on the module, rather than after. There is no code in this change that relies on that difference, but changing the timing of when we attach the modifier means that the semantic checking logic can now reliably detect if something represents a declaration in the standard library. * Change implementation of `sign()` for GLSL target Issue #602 related to the `sign()` function having a different return type between GLSL and HLSL/SLang. At the time, the issue was fixed by adding special-case logic in the emit pass that looked for the `sign()` function by name. This change cleans up that logic to work using the existing `__target_intrinsic` mechanism instead. * Make sure code emit doesn't depend on mangled names There was existing logic in the HLSL/GLSL emit path (that got copy-pasted into the C/C++ path) where if a builtin function didn't have an application `__target_intrinsic` modifier (`IRTargetIntrinsicDecoration` at the IR level), we used some fairly fragile logic to take the mangled symbol name for a function and unmangle it to recover the original name, as well as the number of parameters (to detect member functions). This approach had always been a kludge, and we have reached a point where it actively hinders forward progress on some valuable new features. The big idea of this change is fairly simple: in the IR lowering pass we detect when we have a stdlib function that ought to have an intrinsic definition but doesn't have a "catch-all" `__target_intrinsic` modifier applied. If we discover such a function, we go ahead and emit it to the IR with just such a catch-all modifier. (The main alternative here would be to require that all functions in the stdlib be manually marked `__target_intrinsic` and make this a front-end issue where we get errors if we write stdlib functions wrong, but this workaround is more expedient for now) Some of the logic around target intrinsics already handled the idea of having catch-all intrinsic modifiers/decorations with an empty target name, but that support wasn't 100% complete so this change strives to get it working. One detail that was only being handled along this mangled-name path was support for emitting calls using member-function syntax. I updated the generic handling of `__target_intrinsic`-based calls to support specifying a member function name using a prefix `.` on the definition string. With this work in place, it is possible to clean up a bunch of logic in the `emit` code that had to assume that any function without a body/definition must be an intrinsic. Instead, with this change we can now use the presence of a suitable `IRTargetIntrinsicDecoration` as the indicator that a function is an intrinsic. This should make the emit logic a bit more robust. One wrinkle that remains with this change is the special handling of functions that get the name `operator[]` (that is, the `get`/`set`/etc. accessors for `__subscript` declarations). The logic no longer depends on the mangled name, but it still feels a bit gross to have this kind of string-based special case (especially since it is our main/only special case like this). Another wrinkle is that the C/C++ back-end logic for handling intrinsic functions largely mirrors the HLSL/GLSL case, but can't just use the same exact code because it has to be intercepted by the logic that generates some of the required functions on-demand. There's still a net cleanup with this change, but there is probably an opportunity to remove even more duplication down the road.
*	Remove EntryPointLayout* use in emit logic. (#1071)	jsmall-nvidia	2019-10-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Split out EntryPointParamDecoration. * Add profile to EntryPointDecoration. * WIP for GS handling for GLSL. * WIP for StreamOut GLSL * Fixed GLSL geometry output. * Clean up - remove unneeded/commented out code from the entry point change. * Use Op nums to identify GeometryTypeDecorations (as opposed to contained enum). * Remove setSampleRateFlag & doSampleRateInputCheck * Remove EntryPointLayout from emit. * Change to force CI.
*	IR types for subset of Attributes (#1067)	jsmall-nvidia	2019-10-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* IROutputControlPointsDecoration * IROutputTopologyDecoration * IRPartitioningDecoration * IRDomainDecoration * Use IRPatchConstantDecoration alone for hlsl output. * IRMaxVertexCountDecoration * IRInstanceDecoration * Removed _emitHLSLAttributeSingleString and _emitHLSLAttributeSingleInt Removed GLSLBindingAttribute and just use NumThreadsAttribute * Added IRNumThreadsDecoration. * Added IRNumThreadsDecoration * Fix build problem on x86. Improve diagnostic text based on review.
*	Clean up some behavior of operator% (#1060)	Tim Foley	2019-09-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Work on #1059 The `%` operator in the Slang implementation had several issues, and this change tries to address some of them: * Renamed most occurences of "mod" describing this operator to be "rem" for "remainder" to better match its semantics in HLSL * Split the operator into distinct integer and floating-point variants (`IRem` and `FRem`) to simplify having different codegen for the two * Added floating-point variants of `operator%` and `operator%=` to the stdlib. * Added custom C++ codegen for `kIROp_FRem` such that it maps to the standard C/C++ `remainder()` function * Added custom GLSL codegen so that `kIROp_FRem` maps to the GLSL `mod()` function (which isn't correct...) * Added a test case to confirm that D3D11, D3D12, and CPU targets all agree on the definition of floating-point `%` * Fixed `render-test-tool` to allow a negative integer in a `data=...` specification. This didn't end up being used in the final test, but still seems like a good fix. * Added a customized baseline for the Vulkan flavor of that test to confirm that we are not compiling correctly to SPIR-V just yet Addressing the correctness of the output for GLSL/SPIR-V will have to come as a later change given that the operation we want is not exposed directly by unextended GLSL.
*	CPU ABI improvements (#1056)	jsmall-nvidia	2019-09-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	* WIP: Improving CPU performance/ABI * Optionally output code on CPU for groupThreadID and groupID. * Added ability to set compute dispatch size on command line for render-test. Dispatch compute tests taking into account dispatch size. Added test for semantics are working. * Test using GroupRange. * Fix problem with adding \n for externa diagnostic - to do it if there isn't a \n at the end. Change the ouput order (put result before) so last value is diagnostic string.
*	CPU Performance/Testing improvements (#1055)	jsmall-nvidia	2019-09-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* First pass of render-test refactor. * Make window construction a function that can choose an implementation. * Remove OpenGL as currently has windows dependency. * Disable Vulkan as Renderer impl has dependency on windows. * Pass Window in as parameter of 'update'. * Add win-window.cpp as was missing. * Fix warning on windows about signs during comparison. * * Added mechanism to add random arrays as buffer inputs and select type * Improved RenderGenerator to generate more types, and to be more careful around int32 ranges. * Added support for security checks (for Visual Studio C++) * Disable Execption handling being on by default when compiling kernels * Added a 'Group' version of the entry point that will evaluate all threads in a group in a single call. In test code use this method if available. * Added -compile-arg to be able to pass arguments to the compile within render-test * Add documention for the _Group execution feature. * Fix some typos in cpu-target.md
*	WIP: CPU compute coverage (#1030)	jsmall-nvidia	2019-08-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add support for '=' when defining a name in test. * Add support for double intrinsics. * Add support for asdouble Add findOrAddInst - used instead of findOrEmitHoistableInst, for nominal instructions. Support cloning of string literals. C++ working on more compute tests. * Constant buffer support in reflection. Fixed debugging into source for generated C++. buffer-layout.slang works. * Added cpu test result. * Remove some commented out code. Comment on next fixes. * Improvements to reflection CPU code. * C++ working with ByteAddressBuffer. * Enabled more compute tests for CPU. * Enabled more compute tests on CPU. Added support for [] style access to a vector. * Enabled more CPU compute tests. * Handling of buffer-type-splitting.slang Named buffers can be paths to resources * Fix some warnings, remove some dead code. * Fix problem with verification of number of operands for asuint/asint as they can have 1 or 3 operands. asdouble takes 2. * Fix handling in MemoryArena around aligned allocations. That _allocateAlignedFromNewBlock assumed the block allocated has the aligment that was requested and so did not correct the start address.
*	WIP: Preliminary Slang -> C++ code generation (#1009)	jsmall-nvidia	2019-08-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Expanded prelude for some other resource types. Disable C++ output for ParameterGroup. * WIP: Layout for CPU. * Fixes to CPU layout. * WIP: The uniform is output, but the variable definition is not. * WIP: Entry point parameters to global scope in C++. Handling of resource types (in so far as outputting) * Some discussion of ABI and different input types. * WIP: More C++ support around resource types. * WIP: Split up variables into different structures on emit. * WIP: Emitting C++ with wrapping up of 'Context' * WIP: C++ code has access to semantic values. Wrap in struct so can use method calls to pass shared state. Disable legalizeResourceTypes and legalizeExistentialTypeLayout * Fix structured buffer layout for CPU. * Remove testing/handling of global uniforms on CPU path. Typo fix. Changed CPU tests to use new CPU calling convention. * Check globals are working. Initalize context to zero globals. * Order the global parameters for C++ ouput by their layout. Note - that layout isn't quite working correctly because the StructuredBuffer<int> the int seems to be consuming uniform space. * Work around for reflection not having all data needed for layout ordering for C++ code. * Output constant buffers as pointers. * Entry point parameters accessed through pointer to struct. * WIP: Layout for CPU is reasonable for test case. * Only output 'f' after float literal if type marks as a float. * Cast construction works on C++. * Made IntrinsicOp::ConvertConstruct to make intent clearer. * C++ handling construction from scalar. Handle access of a scalar with .x. Check default initialization. * Comment about need for split of kIROp_construct. Release build works. * Added support from constructVectorFromScalar to C/C++ target. * Handling of in/out in C/C++. * First pass documentation CPU support. * Improvements to C++/C slang code generation documentation. * Small doc change to include need for mechansim to specify cpp compiler path. * Better handling of swizzling - allow swizzling a scalar into a vector.
*	Change how global-scope constants are handled (#1001)	Tim Foley	2019-07-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before this change, global and function-scope `static const` declarations were represented as instructions of type `IRGlobalConstant`, which was represented similarly to an `IRGlobalVar`: with a "body" block of instructions that compute/return the initial value. This representation inhibited optimizations (because a reference to a global constant would not in general be replaced with a reference to its value), and also caused problems for resource type legalization because the logic for type legalization did not (and still does not) handle initializers on globals (so global variables that contain resource types are still unsupported). The change here is simple at the high level: we get rid of `IRGlobalConstant` and instead handle global-scope constants as "ordinary" instructions at the global scope. E.g., if we have a declaration like: static const int a[] = { ... } that will be represented in the IR as a `makeArray` instruction at the global scope, referencing other global-scope instructions that represent the values in the array. This simple choice addresses both of the main limitations. A `static const` variable of integer/float/whatever type is now represented as just a reference to the given IR value and thus enables all the same optimizations. When a `static const` variable uses a type with resources, the existing legalization logic (which can handle most of the "ordinary" instructions already) applies. Another secondary benefit of this approach is that the hacky `IREmitMode` enumeration is no longer needed to help us special-case source code emit for `static const` variables. Beyond just removing `IRGlobalConstant`, and updating the lowering logic to use the initializer direclty, the main change here is to the emit logic to make it properly handle "ordinary" instructions that might appear at global scope. One open issue with this change, that could be addressed in a follow-up change, is that "extern" global constants that need to be imported from another module (but which might not have a known value when the current module is compiled) aren't supported - we don't have a way to put a linkage decoration on them. A future change might re-introduce global constants as a distinct IR instruction type that just references the value as an operand (if it is available). We would then need to replace references to an IR constant with references to its value right after linking.
*	WIP: slang to C++ code generation (#997)	jsmall-nvidia	2019-07-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* WIP: Emitting Cpp * Added HLSLType instead of using IRInst - because they don't seem to be deduped. * Removed need for lexer to take a String. Added mechansim to lookup intrinsic functions on C++. * A c/c++ cross compilation test. * WIP Cpp output using cloning and slang types. * More work to generate mul funcs. * WIP: Outputting some simple C++. * Expose findOrEmitHoistableInst to IRBuilder to aid cloning, * Simplification for checking for BasicTypes. Test infrastructure compiles output C++ code. * Dot and mat/vec multiplication output. * First pass at swizzling. * First support for binary ops. * Builtin binary and unary functions. * Any and all. * WIP adding support for other functions. Added code to generate function signature. * Add scalar functions to slang-cpp-prelude.h * Support for most built in operations. * Tested first ternary. * Checking the emitting of corner cases functions - normalize, length, any, all, normalize, reflect. * Check asfloat etc work. * Fmod support. * WIP Array handling in C++. * First stage in being able to handl arbitrary type output for CLikeSourceEmitter * Removed Handler/Emitter split - so can implement more easily complex type naming. * Array passing by value first pass. * Rename Array -> FixedArray * Outputs structs in C++. * Emit the thread config. * Dimension -> TypeDimension * SpecializedOperation -> SpecializedIntrinsic Operation -> IntrinsicOp Use shared impl of isNominalOp Commented use of m_uniqueModule etc. * Add code to test slang->cpp when compiled doesn't have errors. Does so by building shared library and exporting the entry point. * Fix linux clang/gcc compile error about override not being specified. * Make sure c-cross-compile is run on linux targets/smoke. * Remove c-cross-compile.slang from smoke. * Fix running tests/cross-compile/c-cross-compile.slang on Ubuntu 16.04 * Only add -std=c++11 for C++ source.