slang.git - Making it easier to work with shaders

Age	Commit message (Collapse)	Author
2020-06-11	Fix problem with C++ extractor ernoneous concating of type tokens (#1382)	jsmall-nvidia
	* Try to fix problem with C++ extractor concating tokens producing an erroneous result. * Improve naming/comments around C++ extractor fix. * Another small improvement around space concating when outputing token list. * Handle some more special cases for consecutive tokens for C++ extractor concat of tokens.
2020-06-05	Fix FindTypeByName reflection API not finding stdlib types.	Yong He

2020-05-28	Small improvements to documentation and code around DiagnosticSink (#1359)	jsmall-nvidia

2020-05-26	Synthesize "active mask" for CUDA (#1352)	Tim Foley
	* Synthesize "active mask" for CUDA The Big Picture =============== The most important change here is to `hlsl.meta.slang`, where the declaration of `WaveGetActiveMask()` is changed so that instead of mapping to `__activemask()` on CUDA (which is semantically incorrect) it maps to a dedicated IR instruction. The other `WaveActive()` intrinsics that make use of the implicit "active mask" concept had already been changed in #1336 so that they explicitly translate to call the equivalent `WaveMask()` intrinsic with the result of `WaveGetActiveMask()`. As a result, all of the `WaveActive()` functions are now no different from a user-defined function that uses `WaveGetActiveMask()`. The bulk of the work in this change goes into an IR pass to replace the new instruction for getting the active mask gets replaced with appropriately computed values before we generate output CUDA code. That work is in `slang-ir-synthesize-active-mask.{h,cpp}`. Utilities ========= There are a few pieces of code that were helpful in writing the main pass but that can be explained separately: IR instructions were added corresponding to the Slang `WaveMaskBallot()` and `WaveMaskMatch()` functions, which map to the CUDA `__ballot_sync()` and `__match_any_sync()` operations, respectively. These are only implemented for the CUDA target because they are only being generated as part of our CUDA-only pass. * The `IRDominatorTree` type was updated to make it a bit more robust in the presence of unreachable blocks in the CFG. It is possible that the same ends could be achieved more efficiently by folding the corner cases into the main logic, but I went ahead and made things very explicit for now. * I added an `IREdge` utility type to better encapsulate the way that certain code operating on the predecessors/successors of an `IRBlock` were using an `IRUse` to represent a control-flow edge. The `IREdge` type makes the logic of those operations more explicit. A future change should proably change it so that `IRBlock::getPredecessors()` and `getSuccessors()` are instead `getIncomingEdges()` and `getOutgoingEdges()` and work as iterators over `IREdge` values, given the way that the predecessor and successor lists today can contain duplicates. Using the above `IREdge` type, the logic for detecting and break critical edges was broken down into something that is a bit more clear (I hope), and that also factors out the breaking of an edge (by inserting a block along it) into a reusable subroutine. The Main Pass ============= The implementation of the new pass is in `slang-ir-synthesize-active-mask.cpp`, and that file attempts to include enough comments to make the logic clear. A brief summary for the benefit of the commit history: * The first order of business is to identify functions that need to have the active mask value piped into them, and to add an additional parameter to them so that the active mask is passed down explicitly. Call sites are adjusted to pass down the active mask which can then result in new functions being identified as needing the active mask. * The next challenge is for a function that uses the active mask, to compute the active mask value to use in each basic block. The entry block can easily use the active mask value that was passed in, while other blocks need more work. * When doing a conditional branch, we can compute the new mask for the block we branch to as a function of the existing mask and the branch condition. E.g., the value `WaveMaskBallot(existingMask, condition)` can be used as the mask for the "then" block of an `if` statement. * When control flow paths need to "reconverge" at a point after a structured control-flow statement, we need to insert logic to synchronize and re-build the mask that will execute after the statement, while also excluding any lanes/threads that exited the statement in other ways (e.g., an early `return` from the function). The explanation here is fairly hand-wavy, but the actual pass uses much more crisp definitions, so the code itself should be inspected if you care about the details. Tests ===== The tests for the new feature are all under `tests/hlsl-intrinsic/active-mask/`. Most of them stress a single control-flow construct (`if`, `switch`, or loop) and write out the value of `WaveGetActiveMask()` at various points in the code. In practice, our definition of the active mask doesn't always agree with what D3D/Vulkan implementations seem to produce in practice, and as a result a certain amount of effort has gone into adding tweaks to the tests that force them to produce the expected output on existing graphics APIs. These tweaks usually amount to introducing conditional branches that aren't actually conditional in practice (the branch condition is always `true` or always `false` at runtime), in order to trick some simplistic analysis approaches that downstream compilers seem to employ. One test case currently fails on our CUDA target (`switch-trivial-fallthrough.slang`) and has been disabled. This is an expected failure, because making it produce the expected value requires a bit of detailed/careful coding that would add a lot of additional complexity to this change. It seemed better to leave that as future work. Future Work =========== * As discussed under "Tests" above, the handling of simple `switch` statements in the current pass is incomplete. * There's an entire can of worms to be dealt with around the handling of fall-through for `switch`. * The current work also doesn't handle `discard` statements, which is unimportant right now (CUDA doesn't have fragment shaders), but might matter if we decide to synthesize masks for other targets. Similar work would probably be needed if we ever have `throw` or other non-local control flow that crosses function boundaries. * An important optimization opportunity is being left on the floor in this change. When block that comes "after" a structured control-flow region (which is encoded explicitly in Slang IR and SPIR-V) post-dominates the entry block of the region, then we know that the active mask when exiting the region must be the same as the mask when entering the region, and there is no need to insert explicit code to cause "re-convergence." This should be addressed in a follow-on change once we add code to Slang for computing a post-dominator tree from a function CFG. * Related to the above, the decision-making around whether a basic block "needs" the active mask is perhaps too conservative, since it decides that any block that precedes one needing the active mask also needs it. This isn't true in cases where the active mask for a merge block can be inferred by post-dominance (as described above), so that the blocks that branch to it don't need to compute an active mask at all. * If/when we extend the CPU target to support these operations (along with SIMD code generation, I assume), we will also need to synthesize an active mask on that platform, but the approach taken here (which pretty much relies on support for CUDA "cooperative groups") wouldn't seem to apply in the SIMD case. * Similarly, the approach taken to computing the active mask here requires a new enough CUDA SM architecture version to support explicit cooperative groups. If we want to run on older CUDA-supporting architectures, we will need a new and potentially very different strategy. * Because the new pass here changes the signature of functions that require the active mask (and not those that don't), it creates possible problems for generating code that uses dynamic dispatch (via function pointers). In principle, we need to know at a call site whether or not the callee uses the active mask. There are multiple possible solutions to this problem, and they'd need to be worked through before we can make the implicit active mask and dynamic dispatch be mutually compatible. * Related to changing function signatures: no effort is made in this pass to clean up the IR type of the functions it modifies, so there could technically be mismatches between the IR type of a function and its actual signature. If/when this causes problems for downstream passes we probably need to do some cleanup. * fixup: backslash-escaped lines I did some "ASCII art" sorts of diagrams to explain cases in the CFG, and some of those diagrams used backslash (`\`) characters as the last character on the line, causing them to count as escaped newlines for C/C++. The gcc compiler apparently balked at those lines, since they made some of the single-line comments into multi-line comments. I solved the problem by adding a terminating column of `\|` characters at the end of each line that was part of an ASCII art diagram. * fixup: typos Co-authored-by: jsmall-nvidia <jsmall@nvidia.com>
2020-05-26	Improvements around hashing (#1355)	jsmall-nvidia
	* Fields from upper to lower case in slang-ast-decl.h * Lower camel field names in slang-ast-stmt.h * Fix fields in slang-ast-expr.h * slang-ast-type.h make fields lowerCamel. * slang-ast-base.h members functions lowerCamel. * Method names in slang-ast-type.h to lowerCamel. * GetCanonicalType -> getCanonicalType * Substitute -> substitute * Equals -> equals ToString -> toString * ParentDecl -> parentDecl Members -> members * * Make hash code types explicit * Use HashCode as return type of GetHashCode * Added conversion from double to int64_t * Split Stable from other hash functions * toHash32/64 to convert a HashCode to the other styles. GetHashCode32/64 -> getHashCode32/64 GetStableHashCode32/64 -> getStableHashCode32/64 * Other Get/Stable/HashCode32/64 fixes * GetHashCode -> getHashCode * Equals -> equals * CreateCanonicalType -> createCanonicalType * Catches of polymorphic types should be through references otherwise slicing can occur. * Fixes for newer verison of gcc. Fix hashing problem on gcc for Dictionary. * Another fix for GetHashPos * Fix signed issue around GetHashPos
2020-05-20	AST dumping via C++ Extractor reflection (#1348)	jsmall-nvidia
	* Add support for parsing array types to C++ extractor. * C++ extractor looks for 'balanced tokens'. Use for extracting array suffixes. * First pass at field dumping. * Update project for field dumping. * WIP AST Dumper. * More AST dump compiling. * Fix bug in StringSlicePool where it doesn't use the copy of the UnownedStringSlice in the map. * Add support for SLANG_RELFECTED and SLANG_UNREFLECTED More AST dump support. * Support for hierarchical dumping/flat dumping. Use SourceWriter to dump. * Add -dump-ast command line option. * Add fixes to VS project to incude AST dump. * Fix compilation on gcc. * Add fix for type ambiguity issue on x86 VS. * Fixes from merge of reducing Token size. * Fix comment about using SourceWriter.
2020-05-19	Reduce the size of Token (#1349)	jsmall-nvidia
	* Token size on 64 bits is 24 bytes (from 40). On 32 bits is 16 bytes from 24. * Added hasContent method to Token. Some other small improvements around Token.
2020-05-18	Use 'balance' for extracting array suffix in C++ extractor (#1346)	jsmall-nvidia
	* Add support for parsing array types to C++ extractor. * C++ extractor looks for 'balanced tokens'. Use for extracting array suffixes.
2020-05-14	Add support for parsing array types to C++ extractor. (#1343)	jsmall-nvidia

2020-05-08	AST nodes using C++ Extractor (#1341)	jsmall-nvidia
	* Extractor builds without any reference to syntax (as it will be helping to produce this!). * Change macros to include the super class. * WIP replacing defs files. * Added indexOf(const UnownedSubString& in) to UnownedSubString. Refactored extractor * Output a macro for each type with the extracted info - can be used during injection in class * Simplify the header file - as can get super type and last from macro now * Store the 'origin' of a definition * Some small tidy ups to the extractor. * Improve comments on the extractor options. * Made CPPExtractor own SourceOrigins * Small fixes around SourceOrigin. * Small tidy up around macroOrign * WIP Visitor seems now to work correctly. Split out types used by ast into slang-ast-support-types.h * Fix remaining problems with C++ extractor being used with AST nodes. Add CountOf to extractor type ids. Added ReflectClassInfo::getInfo to turn an ASTNodeType into a ReflectClassInfo * Fix compiling on linux. Fix typo in memset. * Small tidy up around comments/layout. Moved NodeBase casting to NodeBase. * Make premake generate project that builds with cpp-extractor for AST. * Get the source directory from the filter in premake. * Fix typo in source path * Explicitly set the source path for premake generation for AST. * Special case handling of override to apease Clang. * Use a more general way to find the slang-ast-reflect.h file to run the extractor. * Appveyor is not triggering slang-cpp-extractor - try putting dependson together. * Put building slang-cpp-extractor first. * Disable some project options to stop MSBuild producing internal compiler errors. * Try reordering the projects in premake5.lua * Hack to try and make slang-cpp-extractor built on appveyor. * Disable flags - not required for MSBuild on appveyor. * Disable flags not required for build on AppVeyor. * Updated Visual Studio projects with slang-cpp-extractor. * Added Visual Studio slang-cpp-extractor project.
2020-05-07	Enhanced C++ extractor (#1340)	jsmall-nvidia
	* Extractor builds without any reference to syntax (as it will be helping to produce this!). * Change macros to include the super class. * Added indexOf(const UnownedSubString& in) to UnownedSubString. Refactored extractor * Output a macro for each type with the extracted info - can be used during injection in class * Simplify the header file - as can get super type and last from macro now * Store the 'origin' of a definition * Some small tidy ups to the extractor. * Improve comments on the extractor options. * Made CPPExtractor own SourceOrigins * Small fixes around SourceOrigin. * Small tidy up around macroOrign
2020-05-04	C++ Extractor (#1337)	jsmall-nvidia
	* WIP: Doing texing using slangs lexer for cpp-extractor * Node tree for C++ extraction. * Bug fixing. Add dump of hierarchy. * First pass at extracting fields. * Parse template types. * Use diagnostics defs for C++ extractor. * Simplify Diagnostic Defs. * Remove the brace stack. * Added IdentifierLookup. * Add handling for >> style template close. * Improved identifier handling/keywords. * Added ability to check if reader is at cursor position. * Handling of an unspecified root type. * Parsing code comments. Tidy up some parsing - to use advanceIf functions more. * Improve path handling. * Fixes around changes to Path interface. * Working Range, Type and Scope header. * Extract the middle part of marker and put in output. Gives more flexibility at macro injection, and in class definitions. * Split DERIVED types into it's own macro, to provide way to generate for derived types. * Fix clang/g++ compile issue. * Tabs -> spaces. * Fix small bug in getFileNameWithoutExt * Small improvement around naming. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-04-22	Disable OptiX tests by default. (#1331)	Tim Foley
	When running `slang-test`, the OptiX tests will be skipped by default for now, and must be explicitly enabled by adding `-category optix` on the command line. I will need to add a better discovery mechanism down the line, closer to how support for different graphics APIs is being tested, but for now this should be enough to unblock our CI builds.
2020-04-21	Small Improvements around Wave Intrinsics (#1328)	jsmall-nvidia
	* Fix issues in wave-mask/wave.slang tests. WaveGetActiveMask -> WaveGetConvergedMask. Update target-compatibility.md * First pass at wave-intrinsics.md documentation. Write up around WaveMaskSharedSync. * Added more of the Wave intrinsics as WaveMask intrinsics. Improvements to documentation around wave-intrinsics. * Add the Wave intrinsics for SM6.5 for WaveMask Expand WaveMask intrinsics Improve WaveMask documentation * Added WaveMaskIsFirstLane. * Added WaveGetConvergedMask for glsl and hlsl. Added wave-get-converged-mask.slang test. * WaveGetActiveMask/Multi and WageGetConvergedMask/Multi * Improve Wave intrinsics docs. Adde WaveGetActveMulti WaveGetConvergedMulti, WaveGetActiveMask (for vk/hlsl). * Enable GLSL WaveMultiPrefixBitAnd. * Re-add definitions of f16tof32 and f32to16 from #1326 * Remove multiple definition of f32tof16 Disable optix call to Ray trace test, if OPTIX not available. * Improve wave intrinsics documetnation - remove the __generic as part of definitions, small improvements. * Change comment to try and trigger build.
2020-04-17	Add support for global shader parameters to OptiX path (#1323)	Tim Foley
	There are two main pieces here. First, we specialize the code generaiton for CUDA kernels to account for the way that shader parameters are passed differently for ordinary compute kernels vs. ray-tracing kernels. Both global and entry-point shader parameters in Slang are translated to kernel function parameters for CUDA compute kernels, while for OptiX ray tracing kernels we need to use a global `__constant__` variable for the global parameters, and the SBT data (accessed via an OptiX API function) for entry-point shader parameters. This choice bakes in a few pieces of policy when it comes to how Slang ray-tracing shaders translate to OptiX: * It fixes the name used for the global `__constant__` variable for global shader parameters to be `SLANG_globalParams`. Since that name has to be specified when creating a pipeline with the OptiX API, the choice of name effectively becomes an ABI contract for Slang's code generation. * It fixes the choice that global parameters in Slang map to per-launch parameters in OptiX, and entry-point parameters in Slang map to SBT-backed parameters in OptiX. This is a reasonable policy, and it is also one that we are likely to need to codify for Vulkan as well, but it is always a bit unfortunate to bake policy choices like this into the compiler (especially when shaders compiled for D3D can often decouple the form of their HLSL/Slang code from how things are bound in the API). The second piece is a lot of refactoring of the logic in `render-test/cuda/cuda-compute-util.cpp`, so that the logic for setting up (and reading back) the buffers of parameter data can be shared between the compute and ray-tracing paths. The result may not be a true global optimum for how the code is organized, but it at least serves the goal of not duplicating the parameter-binding logic between compute and ray-tracing.
2020-04-16	Workaround for matching of dxc diagnostics (#1324)	jsmall-nvidia
	* Specialized handling for comparison of dxc output that ignores line/column numbers. * Simplify areAllEqualWithSplit.
2020-04-10	Fix CUDA build of render-test (#1316)	Tim Foley
	The CUDA build of the render-test tool had been broken in a fixup change to #1307 (which was ostensibly adding features for the CUDA path). The fix is a simple one-liner.
2020-04-08	Initial work to support OptiX output for ray tracing shaders (#1307)	Tim Foley
	* Initial work to support OptiX output for ray tracing shaders This change represents in-progress work toward allowing Slang/HLSL ray-tracing shaders to be cross-compiled for execution on top of OptiX. The work as it exists here is incomplete, but the changes are incremental and should not disturb existing supported use cases. One major unresolved issue in this work is that the OptiX SDK does not appear to set an environment variable Changes include: * Modified the premake script to support new options for adding OptiX to the build. Right now the default path to the OptiX SDK is hard-coded because the installer doesn't seem to set an environment variable. We will want to update that to have a reasonable default path for both Windows and Unix-y platforms in a later chance. * I ran the premake generator on the project since I added new options, which resulted in a bunch of diffs to the Visual Studio project files that are unrelated to this change. Many of the diffs come from previous edits that added files using only the Visual Studio IDE rather than by re-running premake, so it is arguably better to have the checked-in project files more accurately reflect the generated files used for CI builds. * The "downstream compiler" abstraction was extended to have an explicit notion of the kind of pipeline that shaders are being compiled for (e.g., compute vs. rasterization vs. ray tracing). This option is used to tell the NVRTC case when it needs to include the OptiX SDK headers in the search path for shader compilation (and also when it should add a `#define` to make the prelude pull in OptiX). This code again uses a hard-coded default path for the OptiX SDK; we will need to modify that to have a better discovery approach and also to support an API or command-line override. * One note for the future is that instead of passing down a "pipeline type" we could instead pass down the list/set of stages for the kernels being compiled, and the OptiX support could be enabled whenever there is any ray tracing entry point present in a module. That approach would allow mixing RT and compute kernels during downstream compilation. We will need to revisit these choices when we start supporting code generation for multiple entry points at a time. * The CUDA emit logic is currently mostly unchanged. The biggest difference is that when emitting a ray-tracing entry point we prefix the name of the generated `__global__` function with a marker for its stage type, as required by the OptiX runtime (e.g., a `__raygen__` prefix is required on all ray-generation entry points). * The `Renderer` abstraction had a bare minimum of changes made to be able to understand that ray-tracing pipelines exist, and also that some APIs will require the name of each entry point along with its binary data in order to create a program. * The `ShaderCompileRequest` type was updated so that only a single "source" is supported (rather than distinct source for each entry point), and also the entry points have been turned into a single list where each entry identifies its stage instead of a fixed list of fields for the supported entry-point types. * The CUDA compute path had a lot of code added to support execution for the new ray-tracing pipeline type. The logic is mostly derived from the `optixHello` example in the OptiX SDK, and at present only supports running a single ray-generation shader with no parameters. The code here is not intended to be ready for use, but represents a signficiant amount of learning-by-doing. * The `slang-support.cpp` file in `render-test` was updated so that instead of having separate compilation logic for compute vs. rasterization shaders (which would mean adding a third path for ray tracing), there is now a single flow to the code that works for all pipeline types and any kind of entry points. * Implicit in the new code is dropping support for the way GLSL was being compiled for pass-through render tests, which means pass-through GLSL render tests will no longer work. It seems like we didn't have any of those to begin with, though, so it is no great loss. * Also implicit are some new invariants about how shaders without known/default entry points need to be handled. For example, the ray tracing case intentionally does not fill in entry points on the `ShaderCompileRequest` and instead fully relies on the Slang compiler's support for discovering and enumerating entry points via reflection. As a consequence of those edits the `-no-default-entry-point` flag on `render-test` is probably not working, but it seems like we don't have any test cases that use that flag anyway. Given the seemingly breaking changes in those last two bullets, I was surprised to find that all our current tests seem to pass with this change. If there are things that I'm missing, I hope they will come up in review. * fixup: issues from review and CI * Some issues noted during the review process (e.g., a missing `break`) * Fix logic for render tests with `-no-default-entry-point`. I had somehow missed that we had tests reliant on that flag. This required a bit of refactoring to pass down the relevant flag (luckily the function in question was already being passed most of what was in `Options`, so that just passing that in directly actually simplifies the call sites a bit. * There was a missing line of code to actually add the default compute entry points to the compile request. I think this was a problem that slipped in as part of some pre-PR refactoring/cleanup changes that I failed to re-test.
2020-03-31	Improve diagnostic parsing from GCC. (#1303)	jsmall-nvidia
	Enable x86_64 CPU tests on TC.
2020-03-30	CUDA version handling (#1301)	jsmall-nvidia
	* render feature for CUDA compute model. * Use SemanticVersion type. * Enable CUDA wave tests that require CUDA SM 7.0. Provide mechanism for DownstreamCompiler to specify version numbers. * Enabled wave-equality.slang * Make CUDA SM version major version not just a single digit. * Fix assert. * DownstreamCompiler::Version -> CapabilityVersion
2020-03-26	Disable CPU tests on TC. (#1295)	jsmall-nvidia
	Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-03-25	Better diagnostics on failure on CUDA. (#1288)	jsmall-nvidia
	* Better diagnostics on failure on CUDA. * Catch exceptions in render-test * * Added ability to disable reporting on CUDA failures * Stopped using exception for reporting (just write to StdWriter::out() * Removed CUDAResult type * Don't set arch type on nvrtc to see if fixes CI issues. * Try compute_30 on CUDA. * Added ability to IGNORE_ a test DIsabled rw-texture-simple and texture-get-dimensions * Disable tests that require CUDA SM7.0 Use DISABLE_ prefix to disable tests. * Disable signalUnexpectedError doing printf.
2020-03-21	CPU Texture GetDimensions support (#1283)	jsmall-nvidia
	* Added CPU support for GetDimensions on C++/CPU target. Added texture-get-dimension.slang test * Fix some typos. * Update CUDA docs. * Fix output of GetDimensions on glsl when has an array. Disabled VK - because VK renderer doesn't support createTextureView * Fix typo. * Fix typo. * Fix bad-operator-call diagnostics output.
2020-03-20	Diagnostic compare special case for stdlib line number changes (#1286)	jsmall-nvidia
	* Added diagnostic compare which ignores line number changes in std library. * Change stdlib just to make sure it's all working.
2020-03-02	Renamed UnownedStringSlice::size to getLength to make match String. (#1254)	jsmall-nvidia

2020-03-02	Feature/profile tool (#1251)	jsmall-nvidia
	* WIP slang-profile * Turn on symbols needed for profile. * Remove calls to slang API from core as doing so broke profiling information. Fix premake so slang-profile works on VS.
2020-02-26	Support for RWTexture types on CPU and CUDA (#1243)	jsmall-nvidia
	* Added FloatTextureData as a mechanism to enable CPU based Texture writes. * Add [] RWTexture access for CPU. * Fixed rw-texture-simple.slang.expected.txt * WIP: CUDA stdlib has support for [] surface access. * Made IRWTexture class able to take different locations. Doing a Texture2d access on CUDA works. * Fix bug in outputing UniformState - was missing out padding. Support RWTexture with array. Support RWTexture3D. * Use * for locations for read only textures, so only need a ITexture interface. * Fix problem around application of set/get for CUDA on subscript Texture types.
2020-02-20	WIP on RWTexture types on CUDA/CPU (#1234)	jsmall-nvidia
	* CUDA support for array of resources. * * Add support for Texture2DArray on CPU * Expand texture-simple.slang to test Texture2DArray * Reorganise CUDAComputeUtil to split out createTextureResource. * Add TextureCubeArray support for CPU/CUDA targets. * Pulled out CUDAResource Renamed derived classes to reflect that change. * Creation of SurfObject type. * Functions to return read/write access for simplifying future additions. * WIP for RWTexture access on CPU/CUDA. * CUsurfObject cannot have mips. * Ability to set number of mips on test data. Preliminary support for CUsurfObj and RWTexture1D on CUDA. CUDA docs improvements. * Fix typo.
2020-02-20	CUDA/CPU support for 1D, 2D, CubeArray (#1232)	jsmall-nvidia
	* CUDA support for array of resources. * * Add support for Texture2DArray on CPU * Expand texture-simple.slang to test Texture2DArray * Reorganise CUDAComputeUtil to split out createTextureResource. * Add TextureCubeArray support for CPU/CUDA targets.
2020-02-18	Added support for Targets to TypeTextUtil. (#1226)	jsmall-nvidia
	* Added support for Targets to TypeTextUtil. * Made Function names 'get' and 'find' instead of 'as' in TypeTextUtil.
2020-02-18	First pass Texture Array support on CUDA/CPU (#1225)	jsmall-nvidia
	* Add cubemap support. * Add CUDA fence instrinsics. * Added Gather for CUDA. * Use the CUDA driver API as much as possible. * * Support 1D texture on CPU * WIP on 1D texture on CUDA * Added simplified texture test * Fix test. * Improve texture-simple tests. * * Add CPU support for 3d textures * Add support for mip maps to CUDA * Disable warnings in nvrtc * Update CUDA docs * WIP on 3d texture support. * Add support for 3d textures for CPU and CUDA. * CPU and CUDA support for cube maps. * Add CPU support for Texture1DArray. * Support CUDA Layered/Array type in meta library.
2020-02-18	CUDA/CPU resource coverage (#1224)	jsmall-nvidia
	* Add cubemap support. * Add CUDA fence instrinsics. * Added Gather for CUDA. * Use the CUDA driver API as much as possible. * * Support 1D texture on CPU * WIP on 1D texture on CUDA * Added simplified texture test * Fix test. * Improve texture-simple tests. * * Add CPU support for 3d textures * Add support for mip maps to CUDA * Disable warnings in nvrtc * Update CUDA docs * WIP on 3d texture support. * Add support for 3d textures for CPU and CUDA.
2020-02-14	Feature/cuda coverage (#1223)	jsmall-nvidia
	* Add cubemap support. * Add CUDA fence instrinsics. * Added Gather for CUDA. * Use the CUDA driver API as much as possible. * * Support 1D texture on CPU * WIP on 1D texture on CUDA * Added simplified texture test * Fix test. * Improve texture-simple tests. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-02-12	Nvrtc disable warnings/Float literal improvements (#1220)	jsmall-nvidia
	* Added 'truncate' for fixing floats, for floats near the max value (as opposed to making infinite). Put AreNearlyEqual into Math * Test for ::make static method.
2020-02-11	Fix ref counting for sesion - simple test had a path to not releasing ↵	jsmall-nvidia
	compile request. (#1217)
2020-02-11	Make fixes for the attribute-error test case. (#1215)	Tim Foley
	There are two main fixes here: The first is to remove a memory leak in the reflection test tool, in the case where Slang compilation fails. There is no real reason to be using the reflection test tool for tests that produce diagnostics (we have the slangc tool for that), but it makes sense to go ahead and fix the leak rather than work around it. This was one of those leaks that could have been avoided with smart pointers and a COM-lite API. The second issue was that the logic for constructing an `AttributeDecl` based on a user-defined `struct` was not correctly setting the parent of the attribute decl. The code in question was a little hard to follow and had a few steps that didn't seem strictly necessary given its goals, so I went ahead and scrubbed+commented it to just do what made sense to me (and the tests still pass...). I'm not entirely happy with the design and implementation approach for user-defined attributes, so we might want to take another stab at it sooner or later. This change is not meant to address any design issues, and is just about fixing bugs in what is already there.
2020-02-08	Fixes to make all CPU compute shaders work on CUDA (#1211)	jsmall-nvidia
	* Launch CUDA test taking into account dispatch size. * Enable isCPUOnly hack to work on CUDA. * Rename 'isCPUOnly' hack to 'onlyCPULikeBinding'. * Add $T special type. Support SampleLevel on CUDA. * Fix typo.
2020-02-06	Float matrix intrinsic test/fixes (#1203)	jsmall-nvidia
	* Fix CPP construct when matrix type. * Test intrinsics on float matrices. * Fix typo in _areNearlyEqual test. Increased default sensitivity. Added matrix-float test.
2020-02-06	Literal handling improvements (#1202)	jsmall-nvidia
	* WIP: 64 literal diagnostic and truncation. * Improve how integer truncation is handled/supported. Added literal-int64.slang test. Set a suffix on all literals. Fixed problem on C++ based targets where l suffix was not the same as int() cast. So on C++ derived emitters, int() is used instead of l suffix to have same behavior across targets. * Add literal diagnostic testing. * Allow lexer to lex - in front of literals. * Fix lexing and converting int literal with -. * Too large small values of floats become inf. Handling writing inf types out on different targets. Add function to deterimine if a float literals kind. * Roll back the support of lexer lexing negative literals. * Fixed tests broken because of diagnostics numbers. Improved _isFinite * Fix compilation on linux. * Fix problem with abs on linux - use Math::Abs. * Fix typo. * * Improve warnings for float literals zeroed * Improved 64 bit type documentation * Handle half * Improved comments * Fixed tests broken * Use capital letters for suffixes. * Make default behavior on outputting a int literal that is an 'int32_t' is cast (not suffix) to avoid platform inconsistencies. Improve documentation for 64 bit types. Make tests cover material in docs. * Fixed tests. * Rename FloatKind::Normal -> Finite * Fix half zero check.
2020-02-04	CUDA/C++ backend improvements (#1198)	jsmall-nvidia
	* WIP with vector float test. * vector-float test working. * Fixed remaing tests broken with init changes. * Improve 64bit-type-support.md * Disable tests broken on CI system for Dx. * WIP: Make type available for comparison. * Moved type conversion into TypeTextUtil. * Add text/type conversions from DownstreamCompiler to TypeTextUtil. * Allow compaison taking into account type. * Removed quantize in vector-float.slang test.
2020-01-29	Feature/test for double behavior (#1186)	jsmall-nvidia
	* Split out binding writing. * Pass in the entry type. * Take into account output type with -output-using-type Added GPULikeBindRoot Added dxbc-double-problem test. * Add the dxbc-double-problem test.
2020-01-28	Synthesizing CUDA tests (#1183)	jsmall-nvidia
	* When using setUniform clamp the amount of data written to the buffer size. * CUDA implement StructuredBuffer/ByteAddressBuffer as pointer/count as is on CPU. Allow bounds check to zero index. Update docs. * Synthesize tests. * Fix bug in CUDA output. * Fixing more tests to run on CUDA. * Added BaseType for layout of Vector and Matrix - as they are held as int32_t vector array types. * Enable unbound array support on CUDA. * Added unsized array support for CUDA documentation.
2020-01-27	When using setUniform clamp the amount of data written to the buffer size. ↵	jsmall-nvidia
	(#1181)
2020-01-27	CUDA implement StructuredBuffer/ByteAddressBuffer as pointer/count as is on ↵	jsmall-nvidia
	CPU. (#1182) Allow bounds check to zero index. Update docs.
2020-01-24	Texture Sample available in CUDA (#1176)	jsmall-nvidia
	* WIP: Trying to figure out how texturing will work with CUDA. * WIP: Fixes for CUDA layout. Initial CUDA texture test. * WIP: Outputs something compilable by CUDA for TextureND.Sample * 2d texture working with CUDA. * Fix how binding for SamplerState occurs in CUDA. * Small tidy up of comments.
2020-01-22	WIP HLSL intrinsic coverage (#1171)	jsmall-nvidia
	* Added hlsl-intrinsic test folder. Enabled ceil as works across targets. * log10 support. * Fix float % on CPU/CUDA to match HLSL which is fmod (not fremainder). * Added log10 tests back to scalar-float.slang * Don't add the ( for $Sx - it's clearer what's going on without it. * Works on CUDA/CPU. Problem with asint/asuint do not seem to be found. * Only asuint exists for double. * Support countbits on CUDA and C++. * Fix typo in C++ population count. * First pass at int vector intrinsic tests. * Swizzle for int. * Bit cast tests on CUDA. * Fix warning on gcc. * Fix bit-cast-double execution on CUDA. * scalar-int test working on gcc release.
2020-01-21	CUDA support improvements (#1168)	jsmall-nvidia
	* Add test result for compile-to-cuda * Add RAII for some CUDA types to simplify usage. * First pass handling of some instrinsics on CUDA (for example transcendentals) * CUDA working with built in intrinsics. * Add missing CUDA prelude intrinsics. * CUDA matches CPU output on simple-cross-compile.slang * First pass at hlsl-scalar-float-intrinsic.slang test. * Fix smoothstep impl on CUDA and CPU. * Fixed step intrinsic on CUDA/CPU. * Added operator[] to Matrix for C++, to allow row access. Needs a fix for CUDA. * Fixed warning on clang build.
2020-01-17	Slang -> CUDA kernel runs correctly in test infrastructure (#1167)	jsmall-nvidia
	* First pass at BindLocation. * Added BindSet::init - for initializing with two input constant buffers. Needs better name, and perhaps should be another class. * Fix handling of constant buffer stripping. Improved initialization. * Trying to generalize BindLocation a little more. Split out CPULikeBindRoot. * More work to make BindLocation et al work with non uniform bindings. * Added parsing to a location. * WIP: Trying to get CPU working with BindLocation. * Describe problem of knowing the type of the reference point in the binding table. * More ideas on getBindings fix. * Remove BindSet as member of BindLocation. * Added BindLocation::Invalid * Made BindLocation able to be key in hash * Use BindLocation for bindings on BindingSet. * Added cuda and nvrtc categories to test infrastructure. Disabled CUDA synthetic tests by default. Fixed such that all tests now produce something in BindLocation style. * Use m_userIndex instead of m_userData on Resource. Move the binding setup out of cpu-compute-util (as no longer CPU specific) * Removed CPUBinding - used BindLocation/BindSet instead. Fixed some bugs around indexOf around uniform indirection. * Renamed BindSet::Resource -> BindSet::Value. * Document BindLocation. * Fixes for Clang/GCC Improve invariant requirement handling when constructing from BindPoints. * WIP: First attempt to run CUDA kernel. * Fix some issues around doing CUDA kernel launch. * Fix issues around use of cudaMemCpy . * Better cuda runtime error checking mechanism. * Fixed bug in passing parameters to cuda kernel launch. Simplified initialisation of context. * WIP: Fix CUDA runtime issues. * Add explicit CUDA synchronize so failures don't appear on implicit ones. * Fix problem emitting non shared variable on CUDA. * Fix some typos in CUDA layout. Use just a pointer for now for CUDA StucturedBuffer. * Arg order for CUDA launch was wrong. * First compute kernel runs on CUDA.
2020-01-15	Bind Location (#1166)	jsmall-nvidia
	* First pass at BindLocation. * Added BindSet::init - for initializing with two input constant buffers. Needs better name, and perhaps should be another class. * Fix handling of constant buffer stripping. Improved initialization. * Trying to generalize BindLocation a little more. Split out CPULikeBindRoot. * More work to make BindLocation et al work with non uniform bindings. * Added parsing to a location. * WIP: Trying to get CPU working with BindLocation. * Describe problem of knowing the type of the reference point in the binding table. * More ideas on getBindings fix. * Remove BindSet as member of BindLocation. * Added BindLocation::Invalid * Made BindLocation able to be key in hash * Use BindLocation for bindings on BindingSet. * Added cuda and nvrtc categories to test infrastructure. Disabled CUDA synthetic tests by default. Fixed such that all tests now produce something in BindLocation style. * Use m_userIndex instead of m_userData on Resource. Move the binding setup out of cpu-compute-util (as no longer CPU specific) * Removed CPUBinding - used BindLocation/BindSet instead. Fixed some bugs around indexOf around uniform indirection. * Renamed BindSet::Resource -> BindSet::Value. * Document BindLocation. * Fixes for Clang/GCC Improve invariant requirement handling when constructing from BindPoints.
2020-01-08	Cover a few corner cases in reflection API (#1163)	Tim Foley
	This change adds some new entry points to the reflection API to cover corner cases that a majority of applications won't care about. These are most likely to come up for users who want to make a complete copy of the Slang reflection information into a data format of their own design. All of the information is stuff that we already computed as part of layout, and just hadn't exposed: * Alignment information for type layouts. This is only useful for ordinary/uniform data; in all other cases alignment is always one. Even for uniform/ordinary data, it is unlikely that any application would actually make use of it. * Layout information for the result of an entry point function. This would be useful for applications that need to enumerate the varying outputs (user- or system-defined) of a shader. Having information available for `out` parameters but not the function result was inconsistent. * The "element type" of a parameter block type (e.g., going from `ParameterBlock<X>` to `X`). This seems to have been an oversight since `ConstantBuffer<X>` appears to have been implemented, and the case for a type layout was handled. * The "container" variable layout for a parameter block or constant buffer. It took a while for us to arrive at the current representation of layout for parameter groups, and most client code continues to use the original API that requires us to generated kludged "do what I mean" data. However, if we don't expose the more useful new representation fully, there is no way for users to take advantage of it! The reflection test tool has been updated to print the new information where it makes sense, which provides us some level of coverage for the new code. Unfortunately, this led to some cascading changes: * First, a bunch of the tests had their output changed since they include new information. That's the easy bit. * Next, the "container" and "element" var layouts don't actually have names (because there is no actual variable underlying them), which means that the code to emit variable names in the JSON dump needed to be condition. * Making the `"name"` output conditional messed up a lot of the delicate logic that had been dealing with when to emit commas for the output JSON (JSON uses commas as separators, and doesn't allow trailing commas). I added a bit of new infrastructure to make it simple(-ish) to track when a comma actually needs to be output.