slang.git - Making it easier to work with shaders

	Commit message (Collapse)	Author	Age
*	Remove generated header files (#1264)	jsmall-nvidia	2020-03-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Update slang-binaries to verison with SPIR-V version support. * Support vec and matrix Wave intrinsics on vk. Added wave-vector.slang test Add wave-diverge.slang test Add support for more wave intrinsics to vk. * Test out Wave intrinsic support for matrices. * Remove matrix glsl intrinsics -> not available. Fix some typo. * Remove generated slang generated headers.
*	Wave intrinsics for Vector and Matrix types (#1262)	jsmall-nvidia	2020-03-06
\| \| \| \| \| \| \| \| \| \| \| \| \|	* Update slang-binaries to verison with SPIR-V version support. * Support vec and matrix Wave intrinsics on vk. Added wave-vector.slang test Add wave-diverge.slang test Add support for more wave intrinsics to vk. * Test out Wave intrinsic support for matrices. * Remove matrix glsl intrinsics -> not available. Fix some typo.
*	Expand range of definitions that can be moved into stdlib (#1259)	Tim Foley	2020-03-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The actual definitions that got moved into the stdlib here are pretty few: * `clip()` * `cross()` * `dxx()`, `ddy()` etc. * `degrees()` * `distance()` * `dot()` * `faceforward()` The meat of the change is infrastructure changes required to support these new declarations * Generic versions of the standard operators (e.g., `operator+`) were added that are generic for a type `T` that implements the matching `__Builtin`-prefixed interface. An open question is whether we can now drop the non-generic versions in favor of just having these generic operators. * A `__BuiltinLogicalType` interface was added to capture the commonality between integers and `bool` * `__BuiltinArithmeticType` was extended so that implementations must support initialization from an `int` * `__BuiltinFloatingPointType` was extended to require an accessor that returns the value of pi for the given type, and the concrete floating-point types were extended to provide definitions of this value. * It turns out that our logic for checking if two functions have the same signature (and should thus count as redeclarations/redefinitions) wasn't taking generic constraints into account at all. That was fixed with a stopgap solution that checks if the generic constraints are pairwise identical, but I didn't implement the more "correct" fix that would require canonicalizing the constraints. * When doing overload resolution and considering potential callees, logic was added so that a non-generic candidate should always be selected over a generic one (generally the Right Thing to do), and also so that a generic candidate with fewer parameters will be selected over one with more (an approximation of the much more complicated rule we'd ideally have). * The formatting of declarations/overloads for "ambiguous overload" errors was fleshed out a bit to include more context (the "kind" of declaration where appropriate, the return type for function declarations) and to properly space thing when outputting specialization of operator overloads that end with `<` (so that we print `func < <int>(int, int)` instead of just `func <<int,int>(int,int)`). * The core lookup routines were heavily refactored and reorganized to try to make them bottleneck more effectively so that all paths handle all the nuances of inheritance, extensions, etc. * Because of the refactoring to lookup logic, the semantic checking logic related to checking if a type conforms to an interface was updated to be driven based on the `Type` that is supposed to be conforming, rather than a `DeclRef` to the type's declaration. This allows it to use the type-based lookup entry point and eliminates one special-case entry point for lookup. In addition to the various core changes, this change also refactors some of the existing stdlib code to favor writing more things in actual Slang syntax, and less in C++ code that uses `StringBuilder` to construct the Slang syntax. There is a lot more that could be done along those lines, but even pushing this far is showing that the current approach that `slang-generate` takes for how to separate meta-level C++ and Slang code isn't really ideal, so a revamp of the generator code is probably needed before I continue pushing. One surprising casualty of the refactoring of lookup is that we no longer have the `lookedUpDecls` field in `LookupResult`. That field probably didn't belong there anyway, but the role it served was important. The idea of `lookedUpDecls` was to avoid looking up in the same interface more than once in cases where a type might have a "diamond" inheritance pattern. Removing that field doesn't appear to affect correctness of any of our existing tests, but by adding a specific test for "diamond" inheritance I could see that the refactoring introduced a regression and made looking up a member inherited along multiple paths ambiguous. Rather than add back `lookedUpDecls` I went for a simpler (but arguably even hackier) solution where when ranking candidates from a `LookupResult` we check for identical `DeclRef`s and arbitrarily favor one over the other. One complication that arises here is that when comparing `DeclRef`s inherited along different paths they might have a `ThisTypeSubstitution` for the same type, but with different subtype witnesses (because different inheritance paths could lead to different transitive subtype witnesses: e.g., `A : B : D` and `A : C : D`).
*	Feature/glslang spirv version (#1256)	jsmall-nvidia	2020-03-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* WIP add support for __spirv_version . * Added IRRequireSPIRVVersionDecoration * SPIR-V version passed to glslang. Enable VK wave tests. Split ExtensionTracker out, so can be cast and used externally to emit. Added SourceResult. * Fix warning on Clang. * Missing hlsl.meta.h * Refactor communication/parsing of __spirv_version with glslang. * Fix some debug typos. Be more precise in handling of substring handling. * Make glslang forwards and backwards binary compatible. * Small comment improvements. * Added slang-spirv-target-info.h/cpp * Fix for major/minor on gcc. * Another fix for gcc/clang. * VS projects include slang-spirv-target-info.h/cpp * Removed SPIRVTargetInfo Added SemanticVersion. Don't bother with passing a target to glslang. Should be separate from 'version'. * Renamed slang-emit-glsl-extension-tracker.cpp/.h -> slang-glsl-extension-tracker.cpp/.h Fixed some VS project issues. * Fix a comment. * Added slang-semantic-version.cpp/.h * Added slang-glsl-extension-tracker.cpp/.h * Added split that can check for input has all been parsed. * Fix problem on x86 win build.
*	__spirv_version Decoration (#1255)	jsmall-nvidia	2020-03-03
\| \| \| \| \| \|	* WIP add support for __spirv_version . * Added IRRequireSPIRVVersionDecoration
*	Move definitions of simple vector/matrix builtins to stdlib. (#1247)	Tim Foley	2020-03-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some of the functions declared in the Slang standard library are built in on some targets (almost always the case for HLSL) but aren't available on other targets (often the case for GLSL, CUDA, and CPU). To date, the CUDA and CPU targets have worked around this issue by synthesizing definitions of the missing functions on the fly as part of output code generation, at the cost of some amount of code complexity in the emit pass. This change adds definitions inside the stdlib itself for a large number of built-in HLSL functions that act element-wise over both vectors and matrices (e.g., `sin()`, `sqrt()`, etc.), and changes the CPU/CUDA codegen path to not synthesize C++ code for those functions (instead relying on code generated from the Slang definitions). The element-wise vector/matrix function bodies are being defined using macros in the stdlib, so that we can more easily swap out the definitions en masse if we find an implementation strategy we like better. This could involve defining special-case syntax just for vector/matrix "map" operations that can lower directly to the IR and theoretically generate cleaner code after specialization is complete. As a byproduct of this change, the matrix versions of these functions should in principle now be available to GLSL (GLSL only defines vector versions of functions like `sin()`, and leaves out matrix ones). No testing has been done to confirm this fix. In some cases builtins were being declared with multiple declarations to split out the HLSL and GLSL cases, and this change tries to unify these as much as possible into single declarations to keep the stdlib as small as possible. Two functions -- `sincos()` and `saturate()` -- were simple enough that their full definitions could be given in the stdlib so that even the scalar cases wouldn't need to be synthesized, so the corresponding enumerants were removed in `slang-hlsl-intrinsic-set.h`. In the case of `saturate()` the pre-existing definition used for GLSL codegen could have been used for CPU/CUDA all along. In some cases functions that can and should be defined in the future have had commented-out bodies added as an outline for what should be inserted in the future. Most of these functions cannot be implemented directly in the stdlib today because basic operations like `operator+` are currently not defined for `T : __BuiltinArithmeticType`, etc. Adding such declarations should be straightforward, but brings risks of creating unexpected breakage, so it seemed best to leave for a future change. This change does not try to address making vector or matrix versions of builtin functions that map to single `IROp`s, because the existing mechanisms for target-based specialization, etc., do not apply for such cases. In the future we will either have to make those operations into ordinary functions (eliminating many `IROp`s) so that stdlib definitions can apply, or add an explicit IR pass to deal with legalizing vector/matrix ops for targets that don't support them natively. The right path for this is not yet clear, so this change doesn't wade into it. This change does not touch the `Wave*` functions added in Shader Model 6, despite many of these having vector/matrix versions that could benefit from the same default mapping. It is expected that these functions will have GLSL/Vulkan translation added soon, and it probably makes sense to know what cases are directly supported on Vulkan before adding the hand-written definitions. Because of the limitations on what could be ported into the stdlib, it is not yet possible to remove any of the infrastructure for synthesizing builtin function definitions in the CPU and CUDA back-ends.
*	Feature/glsl wave intrinsic (#1253)	jsmall-nvidia	2020-03-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Test for some wave intrinsics. More wave intrinsic support on CUDA. * Use shfl_xor_sync. * Improvements around wave intrinsics. Fix built in integer types belong to __BuiltinIntegerType. * Improvements and fixes around Wave intrinsics. * Added WaveIsFirstLane test. No longer use __wavemask_lt, as appears not available as an intrinsic. * Small fixes to CUDA prelude. * Add wave-active-product test. Handle the special case for arbitray sums. * Used macro to implement CUDA wave intrinsics. * First pass at glsl wave intrinsics. Doesn't work in practice because require mechanism to set spir-v version Replace use of _lanemask_lt() for CUDA.
*	Additional Wave Intrinsic Support (#1252)	jsmall-nvidia	2020-03-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Test for some wave intrinsics. More wave intrinsic support on CUDA. * Use shfl_xor_sync. * Improvements around wave intrinsics. Fix built in integer types belong to __BuiltinIntegerType. * Improvements and fixes around Wave intrinsics. * Added WaveIsFirstLane test. No longer use __wavemask_lt, as appears not available as an intrinsic. * Small fixes to CUDA prelude. * Add wave-active-product test. Handle the special case for arbitray sums. * Used macro to implement CUDA wave intrinsics.
*	WIP on RWTexture types on CUDA/CPU (#1234)	jsmall-nvidia	2020-02-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* CUDA support for array of resources. * * Add support for Texture2DArray on CPU * Expand texture-simple.slang to test Texture2DArray * Reorganise CUDAComputeUtil to split out createTextureResource. * Add TextureCubeArray support for CPU/CUDA targets. * Pulled out CUDAResource Renamed derived classes to reflect that change. * Creation of SurfObject type. * Functions to return read/write access for simplifying future additions. * WIP for RWTexture access on CPU/CUDA. * CUsurfObject cannot have mips. * Ability to set number of mips on test data. Preliminary support for CUsurfObj and RWTexture1D on CUDA. CUDA docs improvements. * Fix typo.
*	Don't allocate a default space for a VK push-constant buffer (#1231)	Tim Foley	2020-02-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a shader only uses `ParameterBlock`s plus a single buffer for root constants: ```hlsl ParameterBlock<A> a; ParameterBlock<B> b; [[vk::push_constant]] cbuffer Stuff { ... } ``` we expect the push-constant buffer should not affect the `space` allocated to the parameter blocks (so `a` should get `space=0`). This behavior wasn't being implemented correctly in `slang-parameter-binding.cpp`. There was logic to ignore certain resource kinds in entry-point parameter lists when computing whether a default space is needed, but the equivalent logic for the global scope only considered parameters that consuem whole register spaces/sets. This change shuffles the code around and makes sure it considers a global push-constant buffer as not needing a default space/set. Note that this change will have no impact on D3D targets, where `Stuff` above would always get put in `space0` because for D3D targets a push-constant buffer is no different from any other constant buffer in terms of register/space allocation. One unrelated point that this change brings to mind is the `[[vk::push_constant]]` should ideally also be allowed to apply to an entry point (where it would modify the default/implicit constant buffer). In fact, it could be argued that push-constant allocation should be the default for (non-RT) entry point `uniform` parameters (while `[[vk::shader_record]]` should be the default for RT entry point `uniform` parameters).
*	Initial partial support for WaveXXX intrinsics on CUDA (#1228)	jsmall-nvidia	2020-02-19
\| \| \| \| \| \| \|	* Start work on wave intrinsics for CUDA. * Add prelimary CUDA support for some Wave intrinsics. Document the issue around WaveGetLaneIndex
*	Fixes for DXR 1.1 RayQuery type (#1227)	Tim Foley	2020-02-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The previous change that added `RayQuery` to the standard library didn't mark it as any kind of intrinsic, so the first fix here was to add the appopriate attribtue to the stdlib declaration of `RayQuery`. Next I found that the legalization pass was obliterating the `RayQuery` type because it had no members, and thus looked like an empty `struct` (which we eliminate for a variety of reasons). I fixed that by adding a check for a target-intrinsic decoration in type legalization. Next I found that the type wasn't emitted correctly because our generic specialization was turning `RayQuery<0>` into a new type `RayQuery_0` (which is what our specialization is designed to do, after all). I then disabled generic specialization for types that are marked as target intrinsics (which probably renders the preceding fix moot). Finally, I found that the emit logic for types in HLSL wasn't handling the case of a generic intrinsic type that didn't also use its own dedicated opcode. I fixes that up by adding a specific case for `IRSpecialize` as a late catch-all. After all these changes, a declaration of a `RayQuery` variable seems to Just Work (even without any new/improved behavior for handling default constructors). One potential gotcha looking forward is that my checks for `IRTargetIntrinsicDecoration` aren't checking what target the decoration is for. This is fine for now because there are only two types using the decoration right now (`RayDesc` and `RayQuery`), and the special cases above are reasonable for both of them. If/when we have more target-intrinsic types with this decoration, and some of them are only intrinsic for specific targets, then we will need to revisit this choice and either: * make these checks perform filtering based on the "current" target (similar to what the emit logic has today), or * (more likely) make the linking and target-specialization step strip out any target-intrinsic decorations that aren't the right one(s) for the current target Note that this change doesn't include a test case yet because I don't have a DXR 1.1 ready version of dxc to test against. I have manually confirmed that appropriate Slang input seems to be producing reasonable HLSL output when using these functions, but I cannot yet try to check that in (using an HLSL file for the expected output would be quite fragile).
*	Feature/cuda coverage (#1223)	jsmall-nvidia	2020-02-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add cubemap support. * Add CUDA fence instrinsics. * Added Gather for CUDA. * Use the CUDA driver API as much as possible. * * Support 1D texture on CPU * WIP on 1D texture on CUDA * Added simplified texture test * Fix test. * Improve texture-simple tests. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
*	Add a bunch of stdlib declarations for SM 6.4 and 6.5 (#1221)	Tim Foley	2020-02-14
\| \| \| \| \| \| \|	The main thing this adds is the `RayQuery` type with its `TraceRayInline` method for DXR 1.1. None of these new functions/types/constants have been tested, and many of them are not expected to work at all (e.g., we don't actually have any mesh shader support, so adding them as stage types is just for completeness at the API level). I would like to write some test cases after this is checked in by looking for existing DXR 1.1 examples. We currently have an issue around default initialization that means we probably can't run any DXR 1.1 shaders right now with just the stdlib changes.
*	Support for isinf/isfinite/isnan/ldexp (#1219)	jsmall-nvidia	2020-02-12
\| \| \| \| \| \| \| \| \| \| \| \| \|	* Added support ldexp. * Added classify-float.slang test Fixed glsl output. * Added classify-double.slang * Added ldexp test to scalar-double.slang * isnan, isinf, isfinite are macros (on some targets) so remove :: prefix.
*	CUDA barrier/atomic support (#1218)	jsmall-nvidia	2020-02-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* * Improved fastRemoveAt * Fixed off by one bug * Fixed const safeness with List<> * Made List begin and end const safe. * Revert to previous RefPtr usage. * Fix bug with casting. * Tabs -> spaces. Small fixes/improvements to List. * Improve comment on List. * Group shared/atomic test works on CUDA. * * Enabled CUDA tests for atomics tests * Enabled DX12 test for atomics-buffer.slang Not clear just yet how to implement that for CUDA - it will work with StructuredBuffer. * hasContent -> isNonEmpty * Remove unneeded comment.
*	Add support for RWBuffer writes on GLSL/SPIR-V target (#1199)	Tim Foley	2020-02-05
\| \| \| \| \|	This appears to have been an oversight in the work that added support for `imageStore` as well as atomics when writing to `RWTexture*` and friends. The HLSL/Slang `RWBuffer` type maps to GLSL as an `imageBuffer`, which is effectively just another case of writable texture image (bonus points to anybody who can explain to me the meaningful distinction between an `imageBuffer` and an `image1D`). This change copies the handling of subscript access (`operator[]`) from textures over to buffers, and adds a test case to confirm that the new handling works for the simple case of setting a buffer element.
*	HLSL intrinsic coverage (#1169)	jsmall-nvidia	2020-01-21
\| \| \| \| \| \| \| \| \| \| \| \| \|	* Added hlsl-intrinsic test folder. Enabled ceil as works across targets. * log10 support. * Fix float % on CPU/CUDA to match HLSL which is fmod (not fremainder). * Added log10 tests back to scalar-float.slang * Don't add the ( for $Sx - it's clearer what's going on without it.
*	CUDA support improvements (#1168)	jsmall-nvidia	2020-01-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add test result for compile-to-cuda * Add RAII for some CUDA types to simplify usage. * First pass handling of some instrinsics on CUDA (for example transcendentals) * CUDA working with built in intrinsics. * Add missing CUDA prelude intrinsics. * CUDA matches CPU output on simple-cross-compile.slang * First pass at hlsl-scalar-float-intrinsic.slang test. * Fix smoothstep impl on CUDA and CPU. * Fixed step intrinsic on CUDA/CPU. * Added operator[] to Matrix for C++, to allow row access. Needs a fix for CUDA. * Fixed warning on clang build.
*	Clean up the concept of "pseudo ops" (#1136)	Tim Foley	2019-11-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Clean up the concept of "pseudo ops" Built-in functions in the Slang standard library can be marked with `__intrinsic_op(...)` to indicate that they should not lower to functions in the IR, and that instead call sites to those functions should be translated directly to the IR. There are two cases where `__intrinsic_op(...)` gets used: 1. In the case where the argument to `__intrinsic_op(...)` is an actual IR instruction opcode, the IR lowering logic directly translates a call into an instruction with the given opcode. The arguments to the call become the operands of the instruction. 2. In the case where the argument to `__intrinsic_op(...)` is one of a set of "pseudo" instruction opcodes, the IR lowering logic directly handles the lowering to IR with dedicated code. The operands to the call might be handled differently depending on the kind of operation. The compound operators like `+=` are the most important example of these "pseudo" instructions. It doesn't make sense to handle them as true function calls (although that would work semantically), nor does it make sense to have a single IR instruction with such complicated semantics. An earlier version of the compiler used the same enumeration for both the true IR instruction opcodes and these "pseudo" opcodes, with the simple constraint that the pseudo opcodes were all negative while the real opcodes were positive. That design got changed up over a few refactorings, and because there was never a good explanation in the code itself of what "pseudo" opcodes were, we eventually ended up in a place where the in-memory and serialized IR encodings included logic to try to deal with the possibility of these "pseudo" opcodes, even though the entire design of the lowering pass meant that they'd never appear in generated IR. This change tries to clean up the mess in a few ways: * The terminology is now that these are "compound" intrinsic ops, to differentiate them from the more common case of intrinsic ops that map one-to-one to IR instructions. * The declaration of the compound intrinsic ops is no longer in a file related to the IR, and doesn't use the `IR` naming prefix, so somebody looking at the IR opcodes cannot become confused and think the compound ops are allowed there. * The IR encoding in memory and when serialized is updated to not account for or worry about the possibility of "pseudo" ops. * The compound ops are declared in such a way that ensures their enumerant values are all negative, so that they are yet again trivially disjoint from the true IR opcodes. A more drastic change might have split `__intrinsic_op` into two different modifier types: one for the trivial single-instruction case and one for the compound case. Doing this would make the change more invasive, though, because there are places in the meta-code that generates the standard library that intentionally handle both single-instruction and compound ops (because built-in operators can translate to either case). * fixup: missing file * cleanups based on review feedback
*	Don't use mangled names when emitting code (#1096)	Tim Foley	2019-10-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Small cleanup to how standard-library code gets marked Declarations in the standard library used to be individualy marked with `FromStdLibModifier` so that downstream passes (notably IR lowering) could treat them specially. At some point we simplified things by just looking for `FromStdLibModifier` in the ancestor list of a declaration, so that we could handle nested declarations without having to recursively attach the modifier to everything. This change simplifies things a bit further in that the `FromStdLibModifier` now only get attached to the `ModuleDecl`, since that will be on the ancestor chain of all the declarations anyway. The second change here is that we attach this marker modifier before we parse and do semantic checking on the module, rather than after. There is no code in this change that relies on that difference, but changing the timing of when we attach the modifier means that the semantic checking logic can now reliably detect if something represents a declaration in the standard library. * Change implementation of `sign()` for GLSL target Issue #602 related to the `sign()` function having a different return type between GLSL and HLSL/SLang. At the time, the issue was fixed by adding special-case logic in the emit pass that looked for the `sign()` function by name. This change cleans up that logic to work using the existing `__target_intrinsic` mechanism instead. * Make sure code emit doesn't depend on mangled names There was existing logic in the HLSL/GLSL emit path (that got copy-pasted into the C/C++ path) where if a builtin function didn't have an application `__target_intrinsic` modifier (`IRTargetIntrinsicDecoration` at the IR level), we used some fairly fragile logic to take the mangled symbol name for a function and unmangle it to recover the original name, as well as the number of parameters (to detect member functions). This approach had always been a kludge, and we have reached a point where it actively hinders forward progress on some valuable new features. The big idea of this change is fairly simple: in the IR lowering pass we detect when we have a stdlib function that ought to have an intrinsic definition but doesn't have a "catch-all" `__target_intrinsic` modifier applied. If we discover such a function, we go ahead and emit it to the IR with just such a catch-all modifier. (The main alternative here would be to require that all functions in the stdlib be manually marked `__target_intrinsic` and make this a front-end issue where we get errors if we write stdlib functions wrong, but this workaround is more expedient for now) Some of the logic around target intrinsics already handled the idea of having catch-all intrinsic modifiers/decorations with an empty target name, but that support wasn't 100% complete so this change strives to get it working. One detail that was only being handled along this mangled-name path was support for emitting calls using member-function syntax. I updated the generic handling of `__target_intrinsic`-based calls to support specifying a member function name using a prefix `.` on the definition string. With this work in place, it is possible to clean up a bunch of logic in the `emit` code that had to assume that any function without a body/definition must be an intrinsic. Instead, with this change we can now use the presence of a suitable `IRTargetIntrinsicDecoration` as the indicator that a function is an intrinsic. This should make the emit logic a bit more robust. One wrinkle that remains with this change is the special handling of functions that get the name `operator[]` (that is, the `get`/`set`/etc. accessors for `__subscript` declarations). The logic no longer depends on the mangled name, but it still feels a bit gross to have this kind of string-based special case (especially since it is our main/only special case like this). Another wrinkle is that the C/C++ back-end logic for handling intrinsic functions largely mirrors the HLSL/GLSL case, but can't just use the same exact code because it has to be intercepted by the logic that generates some of the required functions on-demand. There's still a net cleanup with this change, but there is probably an opportunity to remove even more duplication down the road.
*	GetDimension on GLSL for StructuredBuffer (#1081)	jsmall-nvidia	2019-10-15
\| \| \| \| \| \|	* Fix GetDimensions for glsl. * Add test for Load on RWStructuredBuffer as part of GetDimension.
*	Update signatures for Shader Model 6 functions (#1029)	Tim Foley	2019-08-22
\| \| \| \| \| \| \| \| \| \| \| \|	* Update signatures for Shader Model 6 functions We originally culled out list of Shader Model 6 function prototypes from some combination of MSDN and the dxc wiki, but this was when things were in a pre-release state and the names of many functions changed (as well as some functions being added/removed). For this change, I initially tried to look at MSDN, but the documentation there was internally inconsistent and included things like misspellings in function names (which I hoped weren't accurate to the real implementation). Instead, I looked at the meta-program input that dxc uses to generate its intrinsic definitions and used that to discover the updated function names along with what was added/removed, a few signature differences, and some extra functions that don't even seem to be documented at all on MSDN> These have not been tested in any meaningful way, so this is really a best-effort thing. Given that most of the old functions had the wrong names there was no way for them to be used in working code anyway, so this is unlikely to cause breakage. * fixup: quadLaneID is a uint
*	Fix declaration of scalar sincos() function. (#996)	Tim Foley	2019-07-03
\| \| \| \| \|	The function was accidentally defined with a generic `int` parameter copy-pasted from the vector definition, but that made the scalar version impossible to call with inferred generic arguments, because there wasn't a way to infer `N` when it isn't used in the parameter list. Includes a simple test case to confirm that the front-end no longer chokes on calls to scalar `sincos()`.
*	Translate .Load() to imageLoad() for Vulkan (#967)	Tim Foley	2019-05-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Translate .Load() to imageLoad() for Vulkan We were already emitting calls to `imageLoad()` and `imageStore` when a `RWTexture` was used with `operator[]`: ```hlsl RWTexture2D<float> myTex; ... float value = myTex[xy]; // becomes an imageLoad myTex[xy] = value; // becomes an imageStore ``` However, we were not* correctly handling the translation of an explicit `.Load()` operation: ```hlsl float value = myTex.Load(xy); ``` The `.Load()` operation was being translated to a GLSL `texelFetch` as it would be a for a `Texture2D`, and not to an `imageLoad()` as would make sense for a `RWTexture2D` (which becomes a GLSL `image2D`). This fix is confined to the stdlib, and is mostly a matter of emitting either `texelFetch` or `imageLoad` as the GLSL function name depending on the "access" of the resource type. It is messy code, but straightforward. One extra detail was that there had been logic to emit a `, 0` argument in the `texelFetch` calls in the non-read-only case, because `texelFetch` usualy requires an explicit mip-level argument and `.Load()` on a `RWTexture` doesn't recieve an LOD parameter. This is a non-issue now that we are calling `imageLoad()` instead, because `imageLoad` doesn't need/want the extra argument. fixup: change test baseline based on recent GLSL output changes * fixup: review feedback
*	Hotfix/dx12 tests use hardware (#920)	jsmall-nvidia	2019-03-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Disable Dx12 half tests. The half-calc test runs, but is not actually doing any half maths. If the code is changed such that it is, the device fails when the shader is used. This can be seen by looking at dxil-asm. * Fix using software driver for dx12 even when hardware is requested. * * Refactor Dx12 _createAdapter such that it doesn't have side effects and stores desc information * Disable half on dx12 software renderer because it crashes * * Disable erroneous warnings from dx12 * Test for adapter creation * Identify warp specifically * Structured buffer test now works on dx12. * Fix intemittent crash on dx12. Due to if a resource was initialized with data, the actual resource constructed might be larger, for alignment issues. This led to a memcpy potentially copying from after the allocated source data and therefore a crash. Now only copies the non aligned amount of data. * * Rename the test to use - style * Disable TextureCube lookup in tests, as does not produce the correct result in dx12 (will fix in future PR) * Updated hlsl.meta.slang.h that has rcp for glsl.
*	Add support for scalar rcp() intrinsic for GLSL (#918)	Robert Stepinski	2019-03-20
\|
*	Fix handling of arrays of resources in type legalization (#896)	Tim Foley	2019-03-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before type legalization we might have code like: (using pseudo-Slang-IR): struct P { ... Texture2D<float>[] t; } global_param p : ParameterBlock<P>; ... // p.t[someIndex].Load(...); // let ptrToArrayOfTextures = getFieldPtr(p, "t") : Ptr<Texture2D<float>[]>; let ptrToTexture = getElementPtr(ptrToArrayOfTextures, someIndex) : Ptr<Texture2D<float>>; let texture = load(ptrToTexture) : Texture2D<float>; let result = call(loadFunc, texture, ...) : float; Legalization needs to move the `t` array there out of the `p` parameter block, so the global declarations become something like: struct P_Ordinary { ... }; // no more "t" field global_param p_ordinary : ParameterBlock<p_ordinary); global_param p_t : Texture2D<float>[]; In terms of the code to access `p.t[someIndex]` the problem is that `p_t` has one less level of indirection than `p.t` had. We solve this in the type legalization pass using "pseudo-types" and "pseudo-values," where one of the cases is `implicitIndirect` which holds a value of type `T`, but indicates that it should act like a value of type `T`. We then use some basic rules for dealing with `implicitIndirect` values, such as: load(implicitDeref(x)) : T => x : T getFieldPtr(implicitDeref(s), f) => implicitDeref(getField(s, f)) getElementPtr(implicitDeref(a), i) => implicitDeref(getElement(a, i)) The bug here was that for the `getFieldPtr` and `getElementPtr` cases, we weren't computing the type of the `getField` or `getElement` instruction correctly. We were copying the type from the `getFieldPtr` or `getElementPtr` operation over directly, but those will be pointer* types and we need the type of whatever they point to. Once the types are fixed, we can properly generate legalized IR for `p.t[someIndex].Load(...) that looks like: let arrayOfTextures = p_t : Texture2D<float>[]; let texture = getElement(arrayOfTextures, someIndex) : Texture2D<float>; let result = call(loadFunc, texture, ...) : float; The old was giving the `texture` intermediate a type of `Ptr<Texure2D<float>>`. That didn't actually trip up too many things, because we mostly just went on to emit code from something with slightly incorrect types for intermediates that never show up in the generated HLSL/GLSL. Where this caused a problem is for some of the intrinsic function definitions for the GLSL/Vulkan back-end, because those do things that inspect operand types. In particular the `$z` opcode in our intrinsic function strings triggers logic that looks at a texture operand, and uses its type to try to find the appropriate swizzle to get from a 4-component vector to the appropriate type for the operation (e.g., for a load from a `Texture2D<float>` we need to swizzle with `.x` to get a single scalar out of the matching GLSL texture fetch operation). The main fix in this change is thus to make `getElementPtr` and `getFieldPtr` legalization properly account for the fact that when switching to `getElement` or `getField` we need a result type that is the "pointee" of the original result. There was already logic to extract the pointed-to type from a pointer in `ir-specialize.cpp`, so I extracted that to a re-usable function in the IR as `tryGetPointedToType` (returns null if the type isn't actually a pointer). This logic needed to be extended for type legalization, to deal with the various "pseudo-type" cases. There is another fix in this change which is marking the `NonUniformResourceIndex` function as `[__readNone]`, which enables it to be more aggressively folded into use sites. Without that fix, we risk emitting code like: ```glsl int tmp = nonUniformEXT(someIndex); vec4 result = texelFetch(arrayOfTextures[tmp], ...); ``` The problem with that code is that (at least by my reading of the spec), assigning to the variable `tmp` that isn't declared with the `nonUniformEXT` qualifier effectively loses that qualifier, and drivers are free to assume that `tmp` is uniform when used to index into `arrayOfTextures`. Marking the `NonUniformResourceIndex` function as `[__readNone]` indicates that it has no side effects, which should mean that our emit logic no longer needs to emit it was its own line of code to be safe. The effects of this change are confirmed by both the new test case added, and the existing `non-uniform-indexing` test.
*	On doing complete rebuild hlsl.meta.slang.h appears to be slightly out of ↵	jsmall-nvidia	2019-03-07
\| \| \| \|	step. (#884)
*	Fix rsqrt intrinsic for GLSL (#881)	Robert Stepinski	2019-03-06
\| \| \| \| \| \|	* Add support for glsl inversesqrt intrinsic * fixup for test failure
*	Add GLSL intrinsics for f32tof16() and f16tof32() (#883)	Robert Stepinski	2019-03-06
\|
*	Hotfix/fix stdlib error reporting (#866)	jsmall-nvidia	2019-02-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* * Add 'identity' version of bit casts (asint, asuint, asfloat) for scalar and vector * Added identity bit casts for matrix (cos no op). We don't support matrix asint on glsl targets * Added tests in bit-cast.slang * Use kIRPseudoOp_Pos for identity asuint/asint/asfloat casts. * * Stop crash if error in stdlib * Use the buildin source manager when compiling stdlib - fixes that line numbers are displayed * Typo fix * Output line directives for 'meta' slang souce files into stdlib * Improve comments and function names.
*	Use kIRPseudoOp_Pos for identity asuint/asint/asfloat casts. (#865)	jsmall-nvidia	2019-02-27
\| \| \| \| \| \| \| \|	* * Add 'identity' version of bit casts (asint, asuint, asfloat) for scalar and vector * Added identity bit casts for matrix (cos no op). We don't support matrix asint on glsl targets * Added tests in bit-cast.slang * Use kIRPseudoOp_Pos for identity asuint/asint/asfloat casts.
*	* Add 'identity' version of bit casts (asint, asuint, asfloat) for scalar ↵	jsmall-nvidia	2019-02-27
\| \| \| \| \| \|	and vector (#864) * Added identity bit casts for matrix (cos no op). We don't support matrix asint on glsl targets * Added tests in bit-cast.slang
*	* Add cross compile test (#849)	jsmall-nvidia	2019-02-14
\| \| \|	* Add intrinsic for StructuredBuffer.Load
*	Add support for user defined attributes.	Yong He	2019-01-29
\|
*	Merge branch 'master' into texture-fix	Yong He	2019-01-28
\|\
\| *	Feature/bit cast glsl (#808)	jsmall-nvidia	2019-01-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* First attempt at asint, asuint, asfloat intrinsics. * Test with countbits * Placing glsl definitions first makes them get picked up. * Some more improvements around asint. * Add support for vector versions of asint/asunit * Fix some typos in asuint/asint intrinsics for glsl. Simplified and increased coverage of as/u/int tests. * Added bit-cast-double test. Added notional support for asdouble bit casts to glsl - but couldn't test because glslang doesn't seem to support the extension. * Try to get double bit casts working - doesn't work cos of block issue. * Only output parents on intrinic replacement if return type is not void.
* \|	Add GLSL translation rules for `SampleCmp`, `asint` and `asfloat`.	Yong He	2019-01-25
\|/
*	Fix for byte-address buffers on Vulkan (#760)	Tim Foley	2018-12-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Fix output comparison for compute tests There was some vestigial logic there that was first doing a string-based comparison of actual/expected output, and then falling back to a path that parsed the expected output as a float, then converted that to an integer, then printed that integer in hex, and did the comparison with the result of that conversion. I'm not even clear on what that code was trying to accomplish, but its main effect was allowing a test failure to slide by unnoticed becaues somehow an all-zeroes actual output file was matching an expected output file with no zeros. My understanding is that it went something like this: * The first line of expected output was `A` (as in hexidecimal for the decimal integer `10`), and the first line of actual output was `0`. * The `StringToFloat` function was failing on the input string `"A"` and returned `0.0` to indicate failure (rather than reporting any kind of error) * We then converted the `0.0` to integer `0` and converted it to a base-16 string `"0"` * The comparison to the actual output passed, and then a careless early exit in the comparison loop meant that a full test would pass as soon as one line of output passed according to this "second change" logic. This change removes the broken code in the test runner since nothing was relying on it, other than the one broken test case we wanted to fix anyway. * Fix the declarations of byte-address buffer methods for Vulkan The HLSL `ByteAddressBuffer` and `RWByteAddressBuffer` types have methods `Load` and `Store` that take byte offsets from the start of the buffer, but we translate them into GLSL that uses `uint[]` array, so that indexing into that array will be off by a factor of four. Somehow the code for mutable byte address buffers was written to add 4, 8, and 12 bytes to the base offset of a vector to get to its subsequent components, but I never thought about the implications this would have for the base address itself. This change includes the following fixes: * Any place in the translation of a byte-address `Load` or `Store` method that was using the address/offset value has been changed to use `$1 / 4` instead of `$1`. * The offsets of 4, 8, and 12 have been changed to 1, 2, and 3 since they are now being added to an index instead of a byte offset * The `GetDimensions` methods have introduced a factor of `* 4` to account for the fact that they need to return a byte size and not a count of elements. With this change the existing `byte-address-buffer` test now produces the desired output under Vulkan.
*	Change how buffers are emitted (#741)	Tim Foley	2018-12-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Change how buffers are emitted This is a change with a lot of pieces, which can't always be separated out cleanly. I'm going to walk through them in what I hope is a logical order. The main goal of this change was to allow arrays of structured buffers to translate to Vulkan. Consider two declarations of structured buffers in HLSL/Slang: ```hlsl StructuredBuffer<X> single; StructuredBuffer<Y> multiple[10]; ``` The current translation logic was handling `single` by translating it into an unnamed GLSL `buffer` block like: ```glsl layout(std430) buffer _S1 { X single[]; }; ``` That syntax allows an expression like `single[i]` in Slang to be translated simply as `single[i]` in GLSL. But that naive translating doesn't work for `multiple`, since we need to declare a array of blocks in GLSL, which requires giving the whole thing a name: ```glsl layout(std430) buffer _S2 { Y _data[]; } multiple[10]; ``` Now a reference to `multiple[i][j]` in Slang needs to become `multiple[i]._data[j]` in GLSL. To avoid having way too many special cases around single structured buffers vs. arrays, it makes sense to allows emit things in the latter form, so that we instead lower `single` as: ```glsl layout(std430) buffer _S1 { X _data[]; } single; ``` So that now a reference to `single[i]` becomes `single._data[i]` in GLSL. Most of that can be handled in the standard library translation of the structured buffer indexing operations. The only wrinkle there is that there were some old special-case instructions in the IR intended to handle buffer load/store operations (these were added back when I was trying to keep the "VM" path working). These aren't really needed to have structured-buffer operations work; they can be handled as ordinary functions as far as the stdlib is concerned. I removed the old instructions. Along the way, it became clear that a few other cases follow the same pattern. Byte-addressed buffers are an obvious case. We were lowering HLSL/Slang: ```hlsl ByteAddressBuffer b; ... uint x = b.Load(0); ``` to GLSL like: ```glsl layout(std430) buffer _S1 { uint b[]; }; ... uint x = b[0]; ``` That logic would fail for arrays the same way that the structured buffer case was failing. The fix is the same: use named `buffer` blocks and then introduce an explicit `_data` field: ```glsl layout(std430) buffer _S1 { uint _data[]; } b; ... uint x = b._data[0]; ``` Just like with structured buffers, all of the VK translation for operations on byte-addressed buffers can be implemented directly in teh stdlib, so once the emit logic was changed it was just a matter of adding `._data` to a bunch of VK tranlsations. It turns out that arrays of constant buffers have more or less the same problem, and furthermore we have some problems with any code that directly uses the modern HLSL `ConstantBuffer<T>` type. Note: the emit logic around constant buffers sometimes refers to "parameter groups" because that is being used in the compiler as a catch-all term for constant buffers, texture buffers, and parameter blocks. The existing code was going out of its way to reproduce the way that constant buffer declarations are implicitly referenced in HLSL: ```hlsl cbuffer C { float f; } ... float tmp = f; // No reference to `C` here ``` This can be seen in the emit logic with the `isDerefBaseImplicit` function, which is used to take the internal IR representation for a reference to `f` (which is closer to the expression `(C).f` or `C->f`) and leave off any reference to `C` so that we emit just `f`. That kind of logic just flat out doesn't work in some important cases. Arrays of constant buffers are a clear one: ```hlsl ConstantBuffer<X> cbArray[3]; ... X x = cbArray[0]; ``` There is no way to translate that to an ordinary `cbuffer` declaration at all. The same problem can be created without arrays, though: ```hlsl ConstantBuffer<X> singleCB; ... X x = singleCB; ``` The current strategy for translating constant buffers was translating `singleCB` into a `cbuffer` declaration that reproduced the fields of `X` as its members, which just wouldn't work: ```hlsl cbuffer singleCB { float f; // field of `X` } ... X x = singleCB; // ERROR: there is nothing named `singleCB` in this HLSL ``` The new strategy is more consistent. We still generate a `cbuffer` declaration for a single constant buffer, but we always give it a single field of the chosen element type: ```hlsl cbuffer singleCB { X singleCB; } ... X x = singleCB; // this works fine! ``` And in the array case we generate code that uses the explicit `ConstantBuffer<T>` type: ```hlsl ConstantBuffer<X> cbArray[3]; ... X x = cbArray[0]; ``` The GLSL output is more complicated because unlike with HLSL there is no implicit conversion from a uniform block to its element type (there is no notion of an element type). The array case thus needs a `_data` field similar to what we do for structured buffers: ```glsl layout(std140) uniform _S3 { X _data; } cbArray[3]; ... X x = cbArray[0]._data; ``` And then the non-array case needs to have a similar `_data` field for consistency: ```glsl layout(std140) uniform _S1 { X _data; } singleCB; ... X x = singleCB._data; ``` This is handled by inserting the necessary reference to `_data` whenever we dereference a constant buffer, either as part of a load instruction (loading from the whole CB as a pointer), or an `IRFieldAddress` instruction which forms a pointer into the CB (e.g., `&(singleCB->f)` becomes `singleCB._data.f`). The current emit logic handles `ParameterBlock<X>` differently from `ConstantBuffer<X>`, but really only to allow parameter blocks to be explicitly named in the output, while constant buffers were left implicit by default. Thus the only difference was a legacy one (from back when trying to exactly reproduce the HLSL text we got as input was considered an important goal), and the new approach to emitting constant buffers would get rid of it. I removed the separate logic for emitting `ParameterBlock<X>` and just let the handling for constant buffers deal with it. Note that any resource types inside of a `ParameterBlock<X>` would have been moved out as part of legalization, so that a parameter block is 100% equivalent to a constant buffer when it comes time to emit code. Unsurprisingly, changing the way we generate HLSL and GLSL output for all these buffer types meant that any tests that were directly comparing the output of `slangc` against `fxc`, `dxc`, or `glslang` broke. The basic approach to fixing the breakage in GLSL tests was to update the GLSL baseline to reflect the new output startegy. In some cases I used macros to name the various `_S<digits>` temporaries so that future renaming will hopefully be easier (it would be great if we auto-generated temporary names with a bit more context). There was one GLSL test (`tests/bugs/vk-structured-buffer-binding`) that was using raw GLSL expected output, and this was changed to use a GLSL baseline to generate SPIR-V for comparison. For HLSL tests we were sometimes running the same input file through `slangc` and `fxc`/`dxc`, and in these cases I macro-ized the various `cbuffer` declarations to generate different declarations depending on the compiler. I completely dropped the tests coming from the D3D SDK because they aren't providing much coverage, and updating them would change them so far from the original code that the purported benefit (using a body of existing shaders) would be lost. I also dropped the explicit matrix layout qualifiers in the `matrix-layout` test because the new output strategy breaks those for GLSL (you can't put matrix layout qualifiers on `struct` fields, and now the body of every constant buffer is inside a `struct`). This isn't as big of a loss as it seems, because our handling of those qualifiers wasn't really right to begin with. Slang users should only be setting the matrix layout mode globally (and we should probably switch to error out on the explicit qualifiers for now). The other thing that got dropped is tests involving `packoffset` modifiers. Slang already warns that it doesn't support these, and the way they were used in the test cases is actually misleading. For the binding/layout-related tests, the goal was to show that Slang reproduces the same layout as fxc, in which case explicitly enforcing a layout via `packoffset` seems like cheating (are we sure we enforced the layout fxc would have produced?). The real reason was that Slang used to emit explicit `packoffset` on every* field of a `cbuffer` it would output, because of an `fxc` bug where you couldn't use `register` on textures/samplers declared inside a `cbuffer` unless every field in the `cbuffer` used a `register` or `packoffset` modifier. Slang hasn't required that behavior in a while because it now splits textures and samplers, and the one test case where we needed `packoffset` to work around the `fxc` bug in the baseline HLSL has been macro-ified even more to work around the bug. The amount of churn in the test cases is unfortunate, but it continues to point at the weakness of any testing strategy that checks for exact equivalent between Slang's output and that of other compilers. We need to keep working to replace these tests with better alternatives. In `check.cpp` there is logic to perform implicit dereferencing, so that if you write `obj.f` where `obj` is a `ConstantBuffer<X>` (or some other "pointer-like" type) and `f` is a field in `X`, then this effectively translates as `(obj).f`. That is, we dereference the value of type `ConstantBuffer<X>` to get a value of type `X`, and then refer to the field of the `X` value. There was a problem where the logic to insert that kind of implicit dereference operation was using a reference (`auto& type = ...`) for the type of the expression being dereferenced, and then clobbering it. This would mean that an expression of type `ConstantBuffer<X>` would have its type overwritten to be just `X` and then codegen would break later on. I'm not sure how we haven't run into that before. The `array-of-buffers` test case was added to confirm that we now support arrays of constant, structured, and byte-address buffers for both DXIL and SPIR-V output. Okay, so that was a lot of stuff, but hopefully it is clear how this all works to make the output of the compiler more consistent and explicit, while also supporting the required new functionality. fixup: review feedback
*	A few extra functions cross-compiled to Vulkan. (#738)	Tim Foley	2018-12-03
\| \| \| \| \|	This is building on the work done in #737, and just borrows translations from [this](https://anteru.net/blog/2016/mapping-between-hlsl-and-glsl/index.html) blog post. One thing that I don't try to get right here, and that seems to be a recurring pattern is that there are something HLSL functions that operate on matrix types where the GLSL equivalents only work on vectors. These can be handled in "user space" by writing an explicit (profile-specific) overload that just calls the vector function on each row, but that adds complexity in the stdlib that I'm avoiding for now (since almost nobody performs these operations on matrices anyway).
*	Add Vulkan translations for HLSL barrier operations (#737)	Tim Foley	2018-12-03
\| \| \| \| \| \| \| \| \|	Fixes #730 The most basic of these already had translations added, and the Slang approach to cross-compilation made adding the new ones almost trivial. I also took the time to confirm that using "operator comma" for the sequencing (since the single HLSL function call needs to expand to a sequence of GLSL function calls) works fine with glslang. I added a test case to confirm that all of these operations can compile down to SPIR-V. I am not personally confident that these translations are perfect, but they are based on what was [linked](https://anteru.net/blog/2016/mapping-between-hlsl-and-glsl/index.html) from #730. The one difference is where that blog post was listing `memoryBarrierImage, memoryBarrierImage` I assumed they meant `memoryBarrierImage, memoryBarrierBuffer`.
*	Add Vulkan cross-compilation for byte-address buffers (#721)	Tim Foley	2018-11-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add Vulkan cross-compilation for byte-address buffers This covers `ByteAddressBuffer`, `RWByteAddressBuffer`, and `RasterizerOrderedByteAddressBuffer`. A declaration of any of these types translates to a GLSL `buffer` declaration with a single `uint` array of data. Most of the methods on these types then have straightforward translations to operations on the array. The overall translation is similar to what was already being done for structured buffers. While implementing GLSL translation for the various atomic (`Interlocked`) methods, I discovered that some of these included declarations that aren't actually included in HLSL. I cleaned these up, including in the declarations of the global `Interlocked` functions. The test case that is included here covers only the most basic functionality: `Load`, `Load2`, `Load4` and `Store`. We should try to back-fill tests for the remaining methods when we have time. Two large caveats with this work: 1. We don't handle arrays of byte-address buffers, just as we don't handle arrays of structured buffers. That will take additional work. 2. We don't handle byte-address (or structured) buffers being passed as function parameters, since the parameter would need to be declared as a bare `uint[]` array. * Fixup: don't lump raytracing acceleration structures in with buffers Raytracing acceleration structures share a common base class with byte-address buffers (they are both buffer resources without a specific element type), and I was mistakently matching on this base class in an attempt to have a catch-all that applied to all byte-address buffers. The fix here was to add a distinct base class for all byte-address buffers and catch that instead. * Fixup: typos
*	Add callable shader support for Vulkan ray tracing (#718)	Tim Foley	2018-11-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add callable shader support for Vulkan ray tracing This change extends the previous work to update Vulkan ray tracing support for the finished `GL_NV_ray_tracing` spec. One of the features missing in the experimental extension that was added to the final spec is "callable shaders," which allow ray tracing shaders to call other shaders as general-purpose subroutines. Most of the implementation work here mirrors what was done for the `TraceRay()` function to map it to `traceNV()`. We map the generic `CallShader<P>` function to the non-generic `executeCallableNV`, with a payload identifier that indicates a specific global variable of type `P` (the global variable being generated from a `static` local in `CallShader`). A new modifier is added to identify the payload structure, and the parameter binding/layout logic introduces a new resource kind for callable-shader payload data (where previously the logic had assumed ray and callable payloads should use the same resource kind). Two test shaders are included: one for the callable shader (`callable.slang`) and one for a ray generation shader that calls it (`callable-caller.slang`). Just for kicks, the payload data type is defined in a shared file so that we can be sure the two agree (trying to emulate what might be good practice, and ensure that ray tracing support works together with other Slang mechanisms). * Typo fix: assocaited->associated One instance was found in review, but I went ahead and fixed a bunch since I seem to make this typo a lot. * Typo fix: defintiion->definition
*	Update Vulkan ray tracing support to final extension spec (#717)	Tim Foley	2018-11-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Update version of glslang used * Update VK raytracing support for final extension spec A lot of this change is just plain renaming: The `NVX` suffixes become just `NV`, and the extension name changes from `GL_NVX_raytracing` to `GL_NV_ray_tracing`. The Slang standard library and the GLSL baselines for the tests are consistently updated. The other detail is that the final spec requires the "payload" identifier in a `traceNV()` call to be a compile-time constant, which means it cannot be defined as a local variable first, as in: ```glsl int payloadID = 0; traceNV(..., payloadID); // ERROR ``` In terms of how the original support was implemented, the payload ID is being computed via a special builtin function that maps each global GLSL payload variable to a unique ID. There are a few ways we could try to resolve the problem here: 1. We could aspire to put our equivalent of the `constexpr` modifier on the output of the function, so that the GLSL variable gets declared `const` and thus fits the GLSL rules for a constant expression. 2. We could introduce a pass to replace the payload-location instructions with literal integers. 3. We could use a special-purpose instruction instead of a builtin function call, and have that instruction indicate that it doesn't have side effects (so it can be folded into the call site) 4. We could somehow mark the builtin function as not having side effects. We choose option (4) simply because it provides a feature that could have other applications. This change adds a `[__readNone]` attribute that can be applied to function declarations to express a promise on the part of the programmer that the given function has no side effects and computes its result strictly from the bits of its input arguments (and not things they point to, etc.). This mirrors an equivalent function attribute in LLVM. We mark the function that computes a ray payload location with this attribute, and propagate the attribute through the layers of the IR, so that when the emit logic asks if an operation has side effects (to see if it can be folded into the arguments of a subsequent expression), we get an affirmative response. This change should get all of the features that were present in the experiemntal `NVX` extension working with the final extension spec. It does not address callable shaders, which will come as a subsequent change.
*	Translate NonUniformResourceIndex() calls to Vulkan GLSL (#713)	Tim Foley	2018-11-06
\| \| \| \| \|	These calls translate to uses of the `nonuniformEXT` qualifier introduced by the `GL_EXT_nonuniform_qualifier` extension. The standard library changes in this case are straightforward uses of existing compiler mechanisms. The test case is one of the less pleasant ones where we compare SPIR-V output against SPIR-V generated from a hand-coded GLSL baseline. This is a case where a simpler test type that just checks for specific textual matches in the output (and not whole files) would be better, but that is out of scope for this change.
*	Fix declarations of InterlockedMin/Max (#700)	Tim Foley	2018-10-30
\| \| \| \| \|	Fixed #699 These functions were declared with an `in out` parameter (copy in, copy out) where they should have used `__ref` (true by-reference parameter passing).
*	Support cross-compilation of ray tracing shaders to Vulkan (#663)	Tim Foley	2018-10-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Move to newer glslang * Support cross-compilation of ray tracing shaders to Vulkan This change allows HLSL shaders authored for DirectX Raytracing (DXR) to be cross-compiled to run with the experimental `GL_NVX_raytracing` extension (aka "VKRay"). * The GLSL extension spec is marked as experimental, so that any shaders written using this support should be ready for breaking changes when the spec is finalized. * "Callable shaders" are not exposed throug the GLSL extension, so this feature of DXR will not be cross-compiled. * The experimental Vulkan raytracing extension does not have an equivalent to DXR's "local root signature" concept. This does not visibly impact shader translation (because the local/global root signature mapping is handled outside of the HLSL code), but in practice it means that applications which rely on local root signatures on their DXR path will not be able to use the translation in this change as-is; more work will be needed. The simplest part of the implementation was to go into the Slang standard library and start adding GLSL translations for the various DXR operations. In some cases, like mapping `IgnoreHit()` to `ignoreIntersectionNVX()` this is almost trivial. The various functions to query system-provided values (e.g., `RayTMin()`) were also easy, with the only gotcha being that they map to variables rather than function calls in GLSL, and our handling of `__target_intrinsic` assumes that a bare identifier represents a replacement function name, and not a full expression, so we have to wrap these definitions in parentheses. The tricky operations are then `TraceRay<P>()` and `ReportHit<A>()`, because these two are generics/templates in HLSL. GLSL doesn't support generics, even for "standard library" functions, so the raytracing extension implements a slightly complex workaround: the matching operations `traceNVX()` and `reportIntersectionNVX()` pass the payload/attributes argument data via a global variable. That is, shader code for the GLSL extensions writes to the global variable and then calls the intrinsic function. The linkage between the call site and the global is established by a modifier keyword (`rayPayloadNVX` and `hitAttributeNVX`, respectively) and in the case of ray payload also uses `location` number to identify which payload global to use (since a single shader can trace rays with multiple payload types). Our translation strategy in Slang tries to leverage standard language mechanisms instead of special-case logic. For example, to translate the `ReportHit<A>()` function, we provide both a default declaration that will work for HLSL (where the operation is built-in with the signature given), and a definition marked with the `__specialized_for_target(glsl)` modifier. The GLSL definition declares a function `static` variable that will fill the role of the required global, and then does what the GLSL spec requires: assigns to the global, and then calls the `reportIntersectionNVX` builtin (which we declare as a separate builtin). Our ordinary lowering process will turn that `static` variable into an ordinary global in the IR, and the `[__vulkanHitAttributes]` attribute on the variable will be emitted as `hitAttributeNVX` in the output. There is no additional cross-compilation logic in Slang specific to `ReportHit<A>()` - the target-specific definition in the standard library Just Works. The case for `TraceRay<P>()` is a bit more complicated, simply because the GLSL `traceNVX()` function needs to be passed the `location` for the payload global. We implement the payload global as a function-`static` variable, with the knowledge that every unique specialization of `TraceRay<P>()` will generate a unique global variable of type `P` to implement our function-`static` variable. We then add a slightly magical builtin function `__rayPayloadLocation()` that can map such a variable to its generated `location`; the logic for this is implemented in `emit.cpp` and described below. We also changed the `RayDesc` and `BuiltinTriangleIntersectionAttributes` types from "magic" intrinsic types over to ordinary types (because the GLSL output needs to declare them as ordinary `struct` types). This ends up removing some cases in the AST and IR type representations. By itself this change would break HLSL emit, because in that case the types really are intrinsic. We added a `__target_intrinsic` modifier to these types to make them intrinsic for HLSL, and then updated the downstream passes to handle the notion of target-intrinsic types. The logic for binding/layout of entry point inputs and outputs was updated so that raytracing stages don't follow the default logic for varying input/output parameters. This is because the input/output parameters of a raytracing entry point aren't really "varying" in the same sense as those in the rasterization pipeline. In particular, the SPIR-V model for raytracing input and output treats "ray payload" and "hit attributes" parameters as being in a distinct storage class from `in` or `out` parameters. We also detect cases where a ray tracing stage declares inputs/outputs that it shouldn't have. This logic could conceivably be extended to other stages (e.g., to give an error on a compute shader with user-defined varying input/output). The type layout logic added cases for handling raytracing payload and hit-attribute data, but this is currently just a stub implementation that follows the same logic as for varying `in` and `out` parameters (it cannot give meaningful byte sizes/offsets right now). To my knowledge the GLSL spec doesn't currently specify anything about layout, and I haven't read the DXR spec language carefully enough to know what it says about layout. A future change should update the layout logic to allow for byte-based layout of ray payloads, etc. so that we can query this information via reflection. The GLSL legalization logic in `ir.cpp` was updated to factor out the per-entry-point-parameter code into its own function, and then that function was updated to special-case the input/output of a ray-tracing shader. While for rasterization stages we typically want to take the user-declared input/output and "scalarize" it for use in GLSL (in part to deal with language limitations, and in part to tease system values apart from user-defined input/output), the GLSL spec for raytracing requires payload and hit attribute parameters to be declared as single variables. There is also the issue that even for an `in out` parameter, a ray payload parameter should only turn into a single global, whereas the handling for varying `in out` parameters generates both an `in` and an `out` global for the GLSL case. Other than the handling of entry point parameters, the GLSL legalization pass doesn't need to do anything special for ray tracing shaders. The trickiest change in the `emit.cpp` logic is that we now generate `location`s for ray payload arguments (the outgoing from a `TraceRay()` call) on demand during code generation. This is a bit hacky, and it would be nice to handle it as a separate pass on the IR rather than clutter up the emit logic, but this approach was expedient. Basically, any of the global variables that got generated from the `static` declarations in the standard library implementation of `TraceRay()` will trigger the logic to assign them a `location`. The logic for emitting intrinsic operations added a few new `$`-based escape sequences. The `$XP` case handles emitting the location of a generated ray payload variable; this is how we emit the matching location at the site where we call `traceNVX`. The `$XT` case emits the appropriate translation for `RayTCurrent()` in HLSL, because it maps to something different depending on the target stage. All of the test cases here consist of a pair of an HLSL/Slang shader written to the DXR spec, plus a matching GLSL shader for a baseline. The GLSL shaders are carefully designed so that when fed into glslang they will produce the same SPIR-V as our cross-compilation process. This kind of testing is quite fragile, but it seems to be the best we can do until our testing framework code supports both DXR and VKRay. A bunch of the core changes ended up being blocked on issues in the rest of the compiler, so some additional features go implemented or fixed along the way: The first big wall this work ran into was that the `__specialized_for_target` modifier hasn't actually been working correctly for a while. It turns out that for the one function that is using it, `saturate()`, we have been outputting the workaround GLSL function in all cases (including for HLSL output) rather than only on GLSL targets. The problem here is that for a generic function with a `__specialized_for_target` modifier or a `__target_intrinsic` modifier, the IR-level decoration will end up attached to the `IRFunc` instruction nested in the `IRGeneric`, but the logic for comparing IR declarations to see which is more specialized (via `getTargetSpecializationLevel()`) was looking only at decorations on the top-level value (the generic). The quick (hacky) fix here is to make `getTargetSpecializationLevel()` try to look at the return value of a generic rather than the generic itself, so that it can see the decorations that indicate target-specific functions. A more refined fix would be to attach target-specificity decorations to the outer-most generic (to simplify the "linking" logic). The only reason not to fold that into the current fix is that the `__target_intrinsic` modifier currently serves double-duty as a marker of target specialization and information to drive emit logic. The latter (the emit-related stuff) currently needs to live on the `IRFunc`, and moving it to the generic could easily break a lot of code. This needs more work in a follow-on fix, but for now target specialization should again be working. The other big gotcha that the simple "just use the standard library" strategy ran into was that function-`static` variables weren't actually implemented yet, and in particular function-`static` variables inside of generic functions required some careful coding. The logic in `lower-to-ir.cpp` has this `emitOuterGenerics()` function that is supposed to take a declaration that might be nested inside of zero or more levels of AST generics, and emit corresponding IR generics for all those levels. This is needed because two different AST functions nested inside a single generic `struct` declaration should turn into distinct `IRFunc`s nested in distinct `IRGeneric`s. The tricky bit to making that all work is that the same AST-level generic type parameter will then map to different IR-level instructions (the parameters of distinct `IRGeneric`s) when lowering each function. The existing logic handled this in an idiomatic way by making "sub-builders" and "sub-contexts." This change refactors some of the repeated logic into a `NestedContext` type to help simplify the pattern, and applies it consistently throughout the `lower-to-ir.cpp` file. Besides that cleanup, the major change is `lowerFunctionStaticVarDecl` which, unsurprisingly, handles lower of function-`static` variables to IR globals. The careful handling of nested contexts here is needed because if we are in the middle of lowering a generic function, then a `static` variable should turn into its own `IRGeneric` wrapping an `IRGlobalVar`. The body of the function should refer to the global variable by specializing the global variable's `IRGeneric` to the parameters of the functions `IRGeneric`. This tricky detail is handled by `defaultSpecializeOuterGenerics`. An additional subtlety not actually required for this raytracing work (and thus not properly tested right now) is handling function-`static` variables with initializers. These can't just be lowered to globals with initializers, because HLSL follows the C rule that function-`static` variables are initialized when the declaration statement is first executed (and this could be visible in the presence of side-effects). The lowering strategy here translates any `static` variable with an initializer into two globals: one for the actual storage, plus a second `bool` variable to track whether it has been initialized yet. There are some opportunities to optimize this case, especially for `static const` data, but that will need to wait for future changes. We've slowly been shifting away from the model where a user thinks of a "profile" as including both a stage and a feature level. Instead, the user should think about selecting a profile that only describes a feature level (e.g., `sm_6_1`, `glsl_450`, etc.), and then separately specifying a stage (`vertex`, `raygeneration, etc.) for each entry point. The challenge here is that the command-line processing still only had a single `-profile` switch, and no way to specify the stage. Adding the `-stage` option was relatively easy, but making it work with the existing validation logic for command-line arguments was tricky, because of the complex model that `slangc` supports for compiling multiple entry points in a single pass. * In `slang.h` add new reflection parameter categories for ray payloads and hit attributes, as part of entry point input/output signatures. * A previous change already updated our copy of glslang to one that supports the `GL_NVX_raytracing` extension, so in `slang-glslang.cpp` we just needed to map Slang's `enum` values for the raytracing stage names to their equivalents in the glslang code. * Moved the logic for looking up a stage by name (`findStageByName()`) out of `check.cpp` and into `compiler.cpp`, with a declaration in `profile.h` * Added a `$z` suffix to the GLSL translation of `Texture.SampleLevel()`, to handle cases where the texture element type is not a 4-component vector. Note that this fix should actually be applied to all* these texture-sampling operations, but I didn't want to add a bunch of changes that are (clearly) not being tested right now. * The layout logic for entry points was updated to correctly skip producing a `TypeLayout` for an entry point result of type `void`, which meant that the related emit logic now needs to guard against a null value for the result layout. * In `ir.cpp`, dump decorations on every instruction instead of just selected ones, so that our IR dump output is more complete. * Added a command-line `-line-directive-mode` option so that we can easily turn off `#line` directives in the output when debugging. Not all cases where plumbed through because the `none` case is realistically the most important. * Parser was fixed to properly initialize parent links for "scope" declarations used for statements, so that we can walk backwards from a function-scope variable (including a `static`) and see the outer function/generics/etc. * Added GLSL 460 profile, since it is required for ray tracing. Also updated the logic for computing the "effective" profile to use to recognize that GLSL raytracing stages require GLSL 460. * Added some conventional ray-tracing shader suffixes to the handling in `slang-test`. This code isn't actually used, but was relevant when I started by copy-pasting some existing VKRay shaders as the starting point for my testing. * Fixup: typos
*	Update DXR API definitions for final spec. (#659)	Tim Foley	2018-10-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Update DXR API definitions for final spec. The final version of the DXR API has changed the result type of the `DispatchRaysIndex()` and `DispatchRaysDimensions()` builtins to `uint3` (from `uint2`). * Add updates for DXR object<->world transformations The `ObjectToWorld()` and `WorldToObject()` functions were renamed to `ObjectToWorld3x4()` and `WorldToObject3x4()`, resepctively, and then new functions `ObjectToWorld4x3()` and `WorldToObject4x3()` were added to give convenient access to the transpose of these matrices. (No, I'm not clear on why user's couldn't just call `transpose()`, either) I've left the old function names in the standard library as forwarding functions just so that we don't break existing DXR code that relied on the old names.