slang.git - Making it easier to work with shaders

	Commit message (Collapse)	Author	Age
...
*	DownstreamDiagnostic::Type -> Severity (#1687)	jsmall-nvidia	2021-02-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * WIP diagnostics for line number output. * Small param naming change * Use x macro for pass through compile human name lookup/getting. * WIP on parsing downstream compiler output. * Split out parsing into ParseDiagnosticUtil. Added test result of single line. * Dump out the std output on fail to parse diagnostics. * Change test type for syntax-error-intrinsic.slang be TEST not TEST_DIAGNOSTIC * Use Index for StringUtil. * WIP: First pass support for parsing Slang diagnostics. * WIP Testing comparing with ParseDiagnosticUtil with previous ad-hoc mechanism. * Use the new parsing mechanism for diagnostic comparisons. * Improvements to diagnostics parsing. Better error handling, and fallback handling. Added ability to parse downstream compilers without a prefix. Added ability to parse Slang with a prefix. * DownstreamDiagnostic::Type -> Severity and related fixes. * Small fixes around moving from DownstreamDiagnostic::Type -> Severity * Small comment fixes. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
*	Diagnostic comparison using parsing (#1683)	jsmall-nvidia	2021-02-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * WIP diagnostics for line number output. * Small param naming change * Use x macro for pass through compile human name lookup/getting. * WIP on parsing downstream compiler output. * Split out parsing into ParseDiagnosticUtil. Added test result of single line. * Dump out the std output on fail to parse diagnostics. * Change test type for syntax-error-intrinsic.slang be TEST not TEST_DIAGNOSTIC * Use Index for StringUtil. * WIP: First pass support for parsing Slang diagnostics. * WIP Testing comparing with ParseDiagnosticUtil with previous ad-hoc mechanism. * Use the new parsing mechanism for diagnostic comparisons. * Improvements to diagnostics parsing. Better error handling, and fallback handling. Added ability to parse downstream compilers without a prefix. Added ability to parse Slang with a prefix.
*	Downstream compiler line number test (#1682)	jsmall-nvidia	2021-02-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * WIP diagnostics for line number output. * Small param naming change * Use x macro for pass through compile human name lookup/getting. * WIP on parsing downstream compiler output. * Split out parsing into ParseDiagnosticUtil. Added test result of single line. * Dump out the std output on fail to parse diagnostics. * Change test type for syntax-error-intrinsic.slang be TEST not TEST_DIAGNOSTIC
*	Obfuscation naming issue fix (#1676)	jsmall-nvidia	2021-01-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * Work around for issue with obfuscation (and lack of name hints) leading to names in output not being correctly uniquified. * Improve appendChar Remove unrequired memory juggling to scrub names. * Remove test code. * Small fixes in comments and method called. * Remove linkage decoration on functions that are specialized. * Obfuscation naming with specialization test. * Fix instruction deletion. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
*	Improved NVRTC location finding (#1674)	jsmall-nvidia	2021-01-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * WIP more sophisticated mechanism to find NVRTC. * Improve nvrtc searching to include PATH. * Make getting an extension able to differentiate between no extension, and just a . * Add comment. * Add support for searching instance path. * Small improvements around scope and finding NVRTC. * Improve documentation around NVRTC loading.
*	Add nvrtc shared library/dll names (#1673)	jsmall-nvidia	2021-01-22
\| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * Add other NVRTC versions. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
*	Fix existential specialization of mutable buffer loads. (#1671)	Yong He	2021-01-22
\| \| \| \| \| \| \|	* Fix existential specialization of mutable buffer loads. * fix Co-authored-by: Yong He <yhe@nvidia.com>
*	Add `StructuredBuffer` support in `gfx`. (#1666)	Yong He	2021-01-21
\|
*	COM-ify all slang-gfx interfaces. (#1656)	Yong He	2021-01-14
\| \| \|	* COM-ify all slang-gfx interfaces.
*	LZ4 compression support (#1654)	jsmall-nvidia	2021-01-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * Testing out use of lz4. * Added ICompressionSystem, and LZ4 implementation. * Add support for deflate compression. Simplify compression interface - to make more easily work across apis. * WIP on CompressedFileSystem. * ImplicitDirectoryCollector * SubStringIndexMap - > StringSliceIndexMap. * WIP save stdlib in different containers. * Support for different archive types for stdlib. * Fix project. * CompressedFileSystem -> ArchiveFileSystem. Added CompressionSystemType::None * Added ArchiveFileSystem * Fix problem RiffFileSystem load withoug compression system. * Test archive types. Improve diagnostic message. * Fix typo in testing file system archives. * Split out archive detection. * Fix gcc warning issue. * Fix warning. * RiffArchiveFileSystem -> RiffFileSystem Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
*	Use "capability" system to select VKRT extension (#1647)	Tim Foley	2021-01-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Use "capability" system to select VKRT extension Slang currently supports translation of ray tracing shader code to Vulkan GLSL code that uses the `GL_NV_ray_tracing` extension. A multi-vendor equivalent of that extension has been released as `GL_EXT_ray_tracing` and we want Slang to support that extension as well. At the simplest, making the change from one extension to the other is just a matter of changing a few strings, since it does not appear that anything of significance was changed at the GLSL level (or even in SPIR-V). Where this gets trickier is when we have users who want us to support both extensions, and to be able to switch between them. The solution we've implemented here more or less amounts to: * If you don't tell the compiler which extension to use, it will default to `GL_EXT_ray_tracing` (the newer multi-vendor one). * If you explicitly want the older extension, you can opt into it using the `-profile` option or via a new API for explicitly adding capabilities to your target. Making that work required a few different kinds of changes: * The options parsing and public API needed ways to add optional capabilities to a target. * During GLSL code emit, we can check the capabilities that were added to the target to see if the `GL_NV_ray_tracing` extension was explicitly enabled and, if not, default to using the `GL_EXT_ray_tracing` names for things. This step is needed because some of the modifiers/attributes involved in the extension have to be handled explicitly in the code generator rather than implicitly as part of mapping intrinsic functions. * We add two different translations to the relevant operatiosn in the stdlib, one marked with each of the extensions. If profile/capability-based overload resolution can be relied on to pick the right one, this should Just Work. * Next, a bunch of work had to go into making capability-based overloading Just Work for the purposes of this change. There's been a nearly complete reworking of the implementation of `CapabilitySet` here to make it more suitable for our needs. * The tests that were using ray tracing translation for Vulkan needed to be updated. For some of them I updated their baselines to use `GL_EXT_ray_tracing` so that they can test the new path. For others, I updated the command line for the test case so that it explicitly opts into using `GL_NV_ray_tracing`. The result is that we have some coverage of each extension. I would have liked to have each test run in both modes, but our pass-through glslang support doesn't support `-D` options, so I couldn't take that step easily. This change does not add support for `GL_EXT_ray_query`, the extension that supports "DXR 1.1" style queries under Vulkan. Adding support for that extension should hopefully be a smaller step because it doesn't have the same multiple-extensions issue. This change does not address a lot of possible avenues for improvement or cleanup around the capability system. It focuses only on those changes that are necessary to make the ray tracing feature work and leaves the rest for future work. * fixup: infinite loop * Comment-only change to retrigger TC build
*	Building with embedded stdlib (#1634)	jsmall-nvidia	2020-12-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * Move reflection to reflection-api. * Slight reorg to pull out potentially Slang internal functions from the reflection API impls. * Remove visual studio projects * Fix for slang-binaries copy. * Add the visual studio projects in build/visual-studio * Remove miniz project. * Differentiate the linePath from the filePath. * Improve comment in premake5.lua + to kick of CI. * Kick CI. * Use COM compile request for calls to functions inside api-less-slang. Add static-slang project. * Fix const typo issue. * Don't include 'core' link in 'api-less-slang' * Removed static-slang lib causes problems on linux with linking. Embed Slang stdlib Added StaticBlob Added dumpSourceBytes Use ConstArrayView for the archive. At startup allow loading of zip with stdlib. Made -save-stdlib -load-stdlib take a name Added '-save-stdlib-bin-source' to save out serialized stdlib as source. * Ability enable/disable stdlib embedding. * Fix problem with moduleDecl not having module pointer set when serialized in. * Set of debugdir for slang-test and examples. * Add slang-stdlib-api.cpp * Update slang filters for VS. * Try to use pic, and -mcmodel=medium * Some more efforts ot make premake work. * WIP premake5.lua from previously working version. * Remove api-less-slang project. * Disable dllexport on gcc/clang. * Embed via slangc-bootstrap. * Fix slang-profile. Always compiles without stdlib. * Use pic "On" * Remove slangc-bootstrap and embed-stdlib-generator if embedding not required. Make bootstrap run the generators. * Improve comments in premake5.lua. Kick off another CI build. * Remove generation of stdlib source from std-lib-serialize.slang
*	Projects in 'build' and Slang API separation (#1624)	jsmall-nvidia	2020-12-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * Move reflection to reflection-api. * Slight reorg to pull out potentially Slang internal functions from the reflection API impls. * Remove visual studio projects * Fix for slang-binaries copy. * Add the visual studio projects in build/visual-studio * Remove miniz project. * Differentiate the linePath from the filePath. * Improve comment in premake5.lua + to kick of CI. * Kick CI.
*	Zip FileSystem support (#1617)	jsmall-nvidia	2020-12-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * Add miniz * Fix for separator in CacheFileSystem. Add compression unit test for zip. * Put zip compression into core. * Remove delimiter stripping if simplifying a path - as stripping will fix delimiters. * ZipFileSystem WIP. * More ZipFileSystem working. * Added isEmpty. Fixed small bug is contains. * First pass support for mutability on zip. * Improvements to File::read/writeAllBytes * Can access and save archive - but has memory leaks. * Fix memory leak. * Some ZIP compression tests. * Fix memory leak on ScopedAllocation. Fix off by one bug on UIntSet * Bug fix in UIntSet * Fix remaining ZipFileSystem issues. Adde stand alone unit-test. * Turn tabs to spaces in slang-io.h * Renamed mode ReadWrite (instead of just Write) * Make miniz it's own project. * Fix windows warning on win32. * Remove warnings needed when miniz was included as a header library. * Set the C++ standard via 'flags' in premake. * Add support for 'implicit' paths. * Add testing for implicit directories. Better handling of implicit directories. * Improve comments in ZipFileSystem. * Update comment around reader/writer transformation.
*	Bug fixes: Memory leak/off by one on UIntSet (#1616)	jsmall-nvidia	2020-11-20
\| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * Fix memory leak on ScopedAllocation. Fix off by one bug on UIntSet
*	File system refactor (#1611)	jsmall-nvidia	2020-11-19
\| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * WIP FileSystem refactor. * Made loadFile load the file in binary mode. * Fixed some comments. Fixed typo in RelativePath - not used 'fixedPath'.
*	Test for serializing out and reading back Stdlib (#1605)	jsmall-nvidia	2020-11-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * Mangling/module name extraction for GenericDecl * Add comment on SerialFilter to explain re-enabling Stmt. * Support setting up SyntaxDecl when reconstructed after deserialization. * Improvements to setup SyntaxDecl. * Fix typo so can read compressed SourceLocs. * Fix issue with SourceManger. * Simple test for serializing out stdlib and reading back in. * Fix calling convention. * Add override to StdLib impls. * Fix typo. * Apply testing to an actual compute test when using load-stdlib Make -load/compile-stdlib processable by Slang Move out testing into util into TestToolUtil so can be shared. * Slightly more concise setup of session. * Fix some errors introduced with session handling. * Made setup for compile same across slangc and slangc-tool.
*	Include hierarchy output (#1595)	jsmall-nvidia	2020-11-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * Improve diagnostic for token pasting. * Token paste location test. * Output include hierarchy. * WIP on includes hierarchy. * Improved include hierarchy output - to handle source files without tokens. Improved test case. * Small comment improvements. Fixed a typo with not returning a reference. * Slight simplification of the ViewInitiatingHierarchy, by adding GetOrAddValue to Dictionary. * Remove the need for ViewInitiatingHierarchy type. * Improve output of path in diagnostic for includes hierarchy. * Remove comment in diagnostic for token-paste-location.slang * Update command line docs to include `-output-includes` Co-authored-by: Yong He <yonghe@outlook.com>
*	Fix IR serialization to use variable length encoding for opcode. (#1599)	Yong He	2020-11-10
\| \| \| \| \| \| \| \| \|	* Fix IR serialization to use 16bits for opcode. * Undo accidental comment change. * Use variable length encoding for opcode. * Fixing issues
*	Standard library save/loadable (#1592)	jsmall-nvidia	2020-11-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * Fix handling of access modifiers inside type definition. * Fix access problem for AST node. Make dumping produce a single function with switch, to potentially make available without Dump specific access. * WIP on serialization design doc. * Remove project references to previously generated files. * More docs on serialization design. * Improve serialization documentation. Remove unused function from IRSerialReader. * Small fixes around naming. Remove long comment from slang-serialize.h - as covered in serialization.md * Remove long comment in slang-serialize.h as covered in serialization.md * More information about doing replacements on read for AST and problems surrounding. * Typo fix. * Spelling fixes. * Value serialize. * Value types with inheritence. * Use value reflection serial conversion for more AST types * Use automatic serialization on more of AST. * Get the types via decltype, simplifies what the extractor has to do. * Update the serialization.md for the value serialization. * Small doc improvements. * Update project. * Remove ImportExternalDecl type Added addImportSymbol and ImportSymbol type Fixed bug in container which meant it wouldn't read back AST module * Because of change of how imports and handled, store objects as SerialPointers. * First pass symbol lookup from mangled names. * Cache current module looked up from mangled name. * Fix SourceLoc bug. Improve comments. * Added diagnostic on mangled symbol not being found * Fix typo. * WIP serializing stdlib. * WIP serializing stdlib in. * Fix problem serializing arrays that hold data that is already serialized. * Remove clash of names in MagicTypeModifier. * Make conversion from char to String explicit. Fix reference count issue with SerialReader. * Add code to save/load stdlib. * Use return code to avoid warning - SerialContainerUtil::write(module, options, &stream)) * Make all String numeric ctors explicit. Added isChar to UnownedStringSlice. Added operator== for UnownedStringSlice to String to avoid need to convert to String and allocate. * Add error check to readAllText. * tabs -> spaces on String.h * tab -> spaces String.cpp * Remove msg for StringBuilder, just build inplace for exceptions. * Check SerialClasses - for name clashes. Renamed Modifier::name as Modifier::keywordName * Handling of extensions when deserializing AST - updating the moduleDecl->mapTypeToCandidateExtensions Co-authored-by: Tim Foley <tim.foley.is@gmail.com>
*	Single pass C++ extraction (#1583)	jsmall-nvidia	2020-10-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * Added CharUtil. Added TypeSet to extractor. First pass at being able to specify all headers for multiple output headers. * Fix includes for new C++ extractor convension. Update premake5 to use new extractor mechanisms. * Small improvements around StringUtil. * Split out NameConventionUtil. * Use a 'convert' to convert between convention types. * Fix output of build message for C++ extractor. Improve NameConventionUtil interface. * Improve comments. * Fix warning on gcc. * Fix clang warning. * Fix some typos in NameConventionUtil. * Small fix to premake5.lua * Fix generated includes. * Remove m_reflectType as no longer applicable with TypeSet. * Fix .gitignore for slang-generated-* files. Added getConvention to determine convention from slice. Add versions of split and convert that infer the from convention * Fix typo in spliting camel. * LineWhitespace -> HorizontalWhitespace * Improve CharUtil comments.
*	Fix saving Repro files on Linux (#1581)	jsmall-nvidia	2020-10-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* #include an absolute path didn't work - because paths were taken to always be relative. * Ascii mode not always set on FileStream. Remove this-> if not needed. Simplify setting of m_fileAccess. * Fix typo. * Fix typo. * Clear up default FileAccess calculation. * Convert tabs to spaces. * Small naming improvements in FileStream::seek.
*	Simplify workflow when using NVAPI (#1556)	Tim Foley	2020-09-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In some cases, functionality is available as either a GLSL extension for Vulkan/SPIR-V, or through the NVAPI system for D3D. This situation creates complications because while GLSL extensions are generally all supported by the open-source glslang compiler (which we can bundle and ship), NVAPI operations are exposed through a specific header (`nvHLSLExtns.h`) that ships as part of the NVAPI SDK. When a user wants to explicitly use NVAPI-provided operations in their shader code, there are no major complications for Slang; the user sets up their include paths, `#include`s the relevant header, calls functions in it, and lets Slang deal with the details of compilation. The challenge for Slang arises when we want to provide a cross-platform interface in our standard library (e.g., the `RWByteAddressBuffer.InterlockedAddF32` method that was recently added) that uses either a GLSL extension (when compiling for Vulkan/SPIR-V) or an NVAPI (when compiling to DXBC or DXIL). In that case, the code generated by Slang now has a dependency on NVAPI, and we need to somehow emit a `#include` directive that pulls it in when invoking fxc or dxc. Because we do not (and seemingly cannot) bundle the NVAPI header with the compiler, we have to rely on ther user to have it available and to somehow communicate to Slang where it is. Exposing portable routines that sometimes use NVAPI currently creates two main challenges: 1. The user is forced to interact with the "prelude" mechanism in the compiler, which allows the programmer to define code in a given target language that gets prepended to the Slang-generated code. While the prelude mechanism is powerful, it is also hard for users to integrate into their workflow, and our experience so far is that users want something that Just Works. 2. If the user writes code that uses some of our abstract operations that layer on NVAPI and they also want to use NVAPI explicitly, they end up with two copies of the NVAPI header (one included by the Slang front-end, and another included by the downstream fxc/dxc compiler). This puts the user in the situation of (a) having to ensure that they set the defines like `NV_SHADER_EXTN_SLOT` consistently both when invoking Slang and when adding their prelude, and (b) even if they do make the definitions consistent, they run into the problem that fxc/dxc complain about overlapping register bindings on the two copies of the `g_NvidiaExt` global shader paraemter that the NVAPI header declares. This change attempts to resolve both issues by adding a lot of "do what I mean" logic to the compiler to try to ease things in the common case. In particular: 1. The user no longer needs to use the "prelude" mechanism when using NVAPI. The compiler now embeds a default prelude for HLSL output, which will `#include` the NVAPI header if and only if the generated code needs NVAPI access because of portable standard library routines that were used. 2. The user can mix-and-match explicit NVAPI use and stdlib functions that compile to use NVAPI. The register/space to be used by NVAPI when included via prelude is now set based on whatever the user set via the preprocessor so that it should automatically be consistent between both cases. Furthermore, the code we emit for the declaration of `g_NvidiaExt` when compiling explicit NVAPI use is set up to be conditional, so that it is skipped in the case where the prelude will pull in its own declaration of that parameter. The way all this is achieved involves a lot of moving pieces: * We now have an HLSL prelude, which mostly just serves to `#include "nvHLSLExtns.h"` in the case where NVAPI support is needed downstream. * Standard library operations that require NVAPI for their implementation on HLSL include a new `[__requiresNVAPI]` attribute. * The preprocessor has been extended so that after tokenizing an input file it looks up the NVAPI-relevant macros in the resulting environment, and if they are set it attached a modifier (`NVAPISlotModifier1) to the AST `ModuleDecl` that is based on their values. Logic is added to detect if multiple input files specify values for the macros in ways that conflict. * The semantic checking step is extended so that it detects the "magic" NVAPI declarations (the `g_NvidiaExt` paramter and the `NvShaderExtnStruct` type that it uses) and attaches a modifier to them so that they can be identified as such in later steps. * Parameter binding is extended to collect a list of the AST modifiers that reflect NVAPI binding, and to reserve the relevant register(s) so that ordinary user-defined parameters cannot conflict with them. * IR lowering translates the three new AST modifiers related to NVAPI over to IR equivalents. * IR linking is extended to make sure that it clones any `IRNVAPISlotDecoration`s attached to the input modules. The pass intentionally does not care where the modifiers came from; it just collects them all and leaves it to downstream code to sort out what they mean. * Emit logic is extended to have a notion of "prelude directives" which are preprocessor directives that should come before the prelude in the generated code, because they can impact the way that the prelude compiles. This is done so that we don't have to introduce ad hoc logic for each downstream compiler to set any relevant `-D` flags (e.g., both fxc and dxc would need to duplicate such logic for NVAPI support). * The HLSL source emitter is extended to track whether it emits any operations that require NVAPI support. * The HLSL source emitter is extended to emit prelude directives based on whether NVAPI is needed and, if it is, to also set the register and space that NVAPI should use based on what was stored in the decoration(s) on the IR module. * The HLSL source emitter is extended so that it detects global instructions that represent "magic" NVAPI constructs , and emit them as conditional definitions so that they are skipped when NVAPI is included via the prelude. * The handling of requires capabilities during emit logic was cleaned up a bit so that more logic is shared across targets, and also so that the same logic is used both when emitting a function declaration/definition and when emitting a call to an instrinsic function (which won't get declared/defined).
*	Serialization fixes based on review of #1547 (#1551)	jsmall-nvidia	2020-09-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Test if blob is returned. * Rename serialize files so can be grouped. * StringRepresentationCache -> SerialStringTable * Split out SerialStringTable from slang-serialize-ir * First pass at reorganizing serialization/containers. Remain some issues about debug info. * Fix bug in calculating sourceloc. * Improve calcFixSourceLoc * Make allocations for payload RiffContainer align to at least 8 bytes. This is important for read, if the payload can contain 8 byte aligned data. Note this has no effect on Riff file format alignment rules. * Improve comments around RiffContainer and alignment. * Remove SerialStringTable, can just use StringSlicePool instead. * Add flags to control what is output in SerialContainer. Turn off AST output for obfuscated code. Lazily create astClasses when doing write container serialization. * Typo fix for Clang/Linux. * Fixes that came out of review * TranslationUnit -> Module * TargetModule -> TargetComponent * PAYLOAD_MIN_ALIGNMENT -> kPayloadMinAlignment
*	Share debug information between AST and IR (#1547)	jsmall-nvidia	2020-09-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Test if blob is returned. * Rename serialize files so can be grouped. * StringRepresentationCache -> SerialStringTable * Split out SerialStringTable from slang-serialize-ir * First pass at reorganizing serialization/containers. Remain some issues about debug info. * Fix bug in calculating sourceloc. * Improve calcFixSourceLoc * Make allocations for payload RiffContainer align to at least 8 bytes. This is important for read, if the payload can contain 8 byte aligned data. Note this has no effect on Riff file format alignment rules. * Improve comments around RiffContainer and alignment. * Remove SerialStringTable, can just use StringSlicePool instead. * Typo fix for Clang/Linux. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
*	Embed default prelude for CUDA (#1546)	Tim Foley	2020-09-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Embed default prelude for CUDA Slang supports the notion of a "prelude" that gets prepended to the source code we generate in language. For some targets, a prelude is not necessary (e.g., we compile to HLSL/GLSL and then on to DXBC/DXIL/SPIR-V just fine without a prelude), but some targets have been implemented in a way that makes a prelude necessary (notably CPU and CUDA). For the targets that require a prelude, the Slang codebase includes usable preludes under the `prelude/` directory. Prior to this change, if a user was compiling for such a target (whether via command-line or API), there had to take responsibility for specifying the prelude to use (usually by passing in the contents of the prelude file(s) already included in the Slang distribution). It is reasonable for a user to expect an out-of-the-box experience where compilation to CUDA PTX or native CPU code should Just Work, similarly to how compilation to SPIR-V Just Works. This change is a step in the direction of providing a user experiene that Just Works for common cases. The main addition here is a tool called `slang-embed` that we run during our build to turn the `prelude/.h` files into `prelude/.h.cpp` files that embed the contents of the original `.h` file as a `const` variable. By compiling and linking in the generated `.h.cpp` file for the CUDA prelude, we are then able to set the default prelude to use for CUDA at the time a session/linkage is created. That default prelude will be used unless the user manually specifies their own prelude (which current users of the CUDA back-end must be doing). This change only sets up a default prelude for CUDA because of the way that the CPU prelude is split across multiple files. A strategy that provides a good default prelude for CPU may take more work, but that work might also be unnecessary if we switch to a strategy of using LLVM to generate native code. The implementation of the `slang-embed` tool is intentionally simple, and it will likely run into issues if/when we need to embed binary files or larger text files. The assumption being made here is that we can address those issues when they arise, and there is no reason to over-engineer the tool right now. The way that `slang-embed` is integrated into our build process is likely to require some iteration to make sure that it works across all platforms. I expect that this change will have multiple follow-up fixes related to trying to get the build to work as expected across all targets on CI. * fixup: trying to ensure that embedded prelude gets compiled into slang * fixup: properly clean up allocations in slang-embed * fixup: fix double free introduced by previous change * fixup: off-by-one allocation error
*	Search for multiple NVRTC versions (#1543)	Tim Foley	2020-09-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Search for multiple NVRTC versions The main change here is that when locating the NVRTC compiler we try multiple library names and take the first one that loads successfully (with an ordering that means we try newer versions before older ones). In order to support this change, I needed to fix the wrapping logic that invokes the downstream compiler "locator" function, so that it does not report every failed dynamic library load as an error diagnostic (leading to compilation failure), but instead only reports such failures once the locator has reported failure. The form of the diagnostic output for failures is also changed, in that we now report a single umbrella error about failing to load a downstream compiler, and then report the actuall dynamic library load failures as notes on that diagnostic instead of errors of their own. This choice seems appropriate since for cases like NVRTC it is not the case that each failed library load is a compilation error. We only need one of the listed libraries to be loadable, so that reporting them all as errors risks confusing users. One wrinkle that arose during testing is that the 11.0 release of NVRTC dropped support for the `compute_30` target, which had previously been the minimum and default. I had to add logic to check for versions of 11 or greater and switch to `compute_35` as the default. Similar changes may be required as part of supporting newer NVRTC versions if support for more architectures gets deprecated and removed. A more complete implementation of this logic might try to load multiple NVRTC versions such that the Slang compiler can identify a suitable compiler based on the minimum feature level that code actually requires. That kind of cleanup is left as future work, since for most users the current approach will be sufficient. * testing: use verbose mode for running tests by default * fixup: guard against null diagnostic sink
*	Support dynamic existential shader parameters in render-test (#1525)	Yong He	2020-09-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Support dynamic existential shader parameters in render-test * Fix linux build error. * Fixes. * Fix code review issues. * Fix gcc error. * More fixes. * More fixes.
*	Add OrderedDictionary to core. (#1523)	Yong He	2020-08-28
\|
*	Vulkan update/NVAPI support (#1511)	jsmall-nvidia	2020-08-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* First pass at incorporating nvapi into test harness. * D3d12 Atomic Float Add via NVAPI working * Dx12 atomic float appears to work. * Atomic float add on Dx12. * Added atomic64 feature addition to vk. Fix correct output for atomic-float-byte-address.slang * Disable atomic float failing tests. * Upgraded VK headers. * Detect atomic float availability on VK. * Try to get test working for in64 atomic. * Made HLSL prelude controlled via the render-test requirements. * Added -enable-nvapi to premake. * Fix D3D12Renderer when NVAPI is not available. * Small improvements to VKRenderer. * Improve atomic documentation in target-compatibility.md.
*	Remove IncludeHandler. (#1505)	jsmall-nvidia	2020-08-19
\| \| \| \| \| \| \| \|	nvAPI -> NVAPI nvAPIPath -> nvapiPath DxcIncludeHandler don't reference count. nv-api-path -> nvapi-path Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
*	Support for float atomics on RWByteAddressBuffer (#1502)	jsmall-nvidia	2020-08-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Fix premake5.lua so it uses the new path needed for OpenCLDebugInfo100.h * Keep including the includes directory. * Added the spirv-tools-generated files. * We don't need to include the spirv/unified1 path because the files needed are actually in the spirv-tools-generated folder. * Put the build_info.h glslang generated files in external/glslang-generated. Alter premake5.lua to pick up that header. * First pass at documenting how to build glslang and spirv-tools. * Improved glsl/spir-v tools README.md * Added revision.h * Change how gResources is calculated. Update about revision.h * Update docs a little. * Split out spirv-tools into a separate project for building glslang. This was not necessary on linux, but is necessary on windows, because there is a file disassemble.cpp in spirv-tools and in glslang, and this leads to VS choosing only one. With the separate library, the problem is resolved. * Fix direct-spirv-emit output. * Update to latest version of spirv headers and spirv-tools. * Upgrade submodule version of glslang in external. * Add fPIC to build options of slang-spirv-tools * WIP adding support for InterlockedAddFp32 * Upgrade slang-binaries to have new glslang. * Fix issues with Windows slang-glslang binaries, via update of slang-binaries used. * WIP - atomicAdd. This solution can't work as we can't do (float) in glsl. WIP on atomic float ops. * Added checking for multiple decls that takes into account __target_intrinsic and __specialized_for_target. First pass impl of atomic add on float for glsl. * Split __atomicAdd so extensions are applied appropriately. * Made Dxc/Fxc support includes. Use HLSL prelude to pass the path to nvapi Added -nv-api-path * Refactor around IncludeHandler and impl of IncludeSystem * slang-include-handler -> slang-include-system Have IncludeHandler/Impl defined in slang-preprocessor * Small comment improvements. * Document atomic float add addition in target-compatibility.md. * CUDA float atomic support on RWByteAddressBuffer. * Add atomic-float-byte-address-buffer-cross.slang * Removed inappropriate-once.slang - the test is no longer valid when a file is loaded and has a unique identity by default. A test could be made, but would require an API call to create the file (so no unique id). Improved handling of loadFile - uses uniqueId if has one. * Work around for testing target overlaps - to avoid exceptions on adding targets. Simplify PathInfo setup. Modify single-target-intrinsic.slang - it no longer failed because there were no longer multiple definitions for the same target. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
*	Refactor enumerating directory contents (#1478)	jsmall-nvidia	2020-08-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Use m_style for OSFindFilesResult * Refactor of FindFilesResult. * Fixes on linux for FindFiles. * Simplify FindFilesState, and linux support for pattern matching. * Small fixes to linux FindFilesState * Fix typo on linux FindFiles * Fix typo in linux FindFiles. * Renamed some variables, and improved comments on FindFiles. * Improve comments on FildFiles * Small improvements around FindFiles. * Refactor FindFiles again.. into a visitor and function in Path. * Fix some problems on linux. * Fix linux typo. * Renamed os -> find-file-util * find-file-utl -> directory-util Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
*	Fix for bug where memory that has been allocated with new T[] (within a ↵	jsmall-nvidia	2020-07-31
\| \| \| \|	list) is freed with free in the RiffContainer. (#1473)
*	CUDA/CPU varying compute inputs as IR pass (#1438)	Tim Foley	2020-07-10
\| \| \| \| \| \| \| \| \| \| \| \| \|	The main change here is that the CPU and CUDA C++ emit paths now rely on an earlier IR pass to legalize the varying parameter list of a kernel and translate references to varying parameters with semantics like `SV_DispatchThreadID`. Doing so removes a lot of special-case logic from the emit passes. This work moves us even closer to being able to eliminate `KernelContext` from the CPU/CUDA emit logic, because it removes the issue of state related to varying inputs being stored in `KernelContext`. The new pass that handles the legalization is in `slang-ir-legalize-varying-params.cpp`, and it borrows heavily from the existing `slang-ir-glsl-legalize.cpp` pass. The new pass factors out the target-independent and target-dependent logic, so that both CPU and CUDA can share much of the same code despite having very different rules for how the system-value parameters are being provided. An eventual goal is to have the new pass also handle the GLSL case, but doing so requires copying even more logic out of the GLSL-specific pass, and doing so seemed like a step to far for what was meant to be a stepping-stone change as part of other work. As a result of the incomplete nature of the pass, certain cases don't work for compute shader inputs for CPU/CUDA (e.g., wrapping your varying inputs in a `struct` type parameter), but those were cases that also didn't work in the existing `emit`-based logic. One major consequence of this change is that the logic for emitting the various different functions that represent an entry point for our CPU back-end has been streamlined and simplified. The original logic had a fair bit of cleverness built in to try and avoid unnecessary math ops when computing the various IDs/indices, while the new logic is much more simplistic (the main dispatch function loops over threadgroups with a triply-nested `for` and then delegates to the group-level function loops over threads with its own nested `for`s). Longer term, it will be important to simplify the CPU functions we emit further, by eliminating things like the `_Thread` function that should never really be exposed to users (the minimum granularity of invoking a CPU compute kernel should be a single threadgroup). We may eventually decide to synthesize all of the extra code that is being generated in the `emit` pass as IR instead.
*	ShortList<T> and core.natvis improvements. (#1430)	Yong He	2020-07-06
\| \| \| \| \| \| \|	* ShortList<T> and core.natvis improvements. * Fix gcc build. * add `getBuffer()` accessor to `GetArrayViewResult`
*	Clean up unused code for IR object ownership (#1416)	Tim Foley	2020-06-30
\| \| \| \| \| \| \| \| \|	There was a small but non-trivial amount of code across `IRModule`, the `ObjectScopeManager`, and `StringRepresentationCache` that had to do with managing the lifetimes of `RefObject`s that might be referenced by IR instructions (and thus need to be kept alive for the lifetime of the IR module). We have long since migrated to a model where IR instruction do not include owned references to `RefObject`s, so these facilities weren't actually needed. This streamlines `IRModule`'s declaration, and trims code that we aren't actually using. One note for the future is that the `StringRepresentationCache` no longer does what its name implies (it is not a cache of `StringRepresentation`s), so we should consider giving it a more narrowly scoped name. I didn't include that in this change because I wanted to keep the diffs narrow and easy to review. A follow-on renaming change should be trivial if/when we can agree on what the type should be called at this point. Alternatively, we could simply bake the functionality of `StringRepresentationCache` into he IR deserialiation logic itself, since that is the only code using it.
*	Prelude is associated with SourceLanguage (#1398)	jsmall-nvidia	2020-06-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Associate a downstream compiler for prelude lookup even if output is source. * Remove LanguageStyle and just use SourceLanguage instread. * Added set/getPrelude. Made prelude work on source language. * Fix typo in method name replacement. get/SetPrelude get/setLanguagePrelude * Fix issue because of method name change. * Remove getPreludeDownstreamCompilerForTarget
*	Improvements around hashing (#1355)	jsmall-nvidia	2020-05-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Fields from upper to lower case in slang-ast-decl.h * Lower camel field names in slang-ast-stmt.h * Fix fields in slang-ast-expr.h * slang-ast-type.h make fields lowerCamel. * slang-ast-base.h members functions lowerCamel. * Method names in slang-ast-type.h to lowerCamel. * GetCanonicalType -> getCanonicalType * Substitute -> substitute * Equals -> equals ToString -> toString * ParentDecl -> parentDecl Members -> members * * Make hash code types explicit * Use HashCode as return type of GetHashCode * Added conversion from double to int64_t * Split Stable from other hash functions * toHash32/64 to convert a HashCode to the other styles. GetHashCode32/64 -> getHashCode32/64 GetStableHashCode32/64 -> getStableHashCode32/64 * Other Get/Stable/HashCode32/64 fixes * GetHashCode -> getHashCode * Equals -> equals * CreateCanonicalType -> createCanonicalType * Catches of polymorphic types should be through references otherwise slicing can occur. * Fixes for newer verison of gcc. Fix hashing problem on gcc for Dictionary. * Another fix for GetHashPos * Fix signed issue around GetHashPos
*	Remove non-ascii characters from source file.	Yong He	2020-05-25
\|
*	AST dumping via C++ Extractor reflection (#1348)	jsmall-nvidia	2020-05-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add support for parsing array types to C++ extractor. * C++ extractor looks for 'balanced tokens'. Use for extracting array suffixes. * First pass at field dumping. * Update project for field dumping. * WIP AST Dumper. * More AST dump compiling. * Fix bug in StringSlicePool where it doesn't use the copy of the UnownedStringSlice in the map. * Add support for SLANG_RELFECTED and SLANG_UNREFLECTED More AST dump support. * Support for hierarchical dumping/flat dumping. Use SourceWriter to dump. * Add -dump-ast command line option. * Add fixes to VS project to incude AST dump. * Fix compilation on gcc. * Add fix for type ambiguity issue on x86 VS. * Fixes from merge of reducing Token size. * Fix comment about using SourceWriter.
*	Enhanced C++ extractor (#1340)	jsmall-nvidia	2020-05-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Extractor builds without any reference to syntax (as it will be helping to produce this!). * Change macros to include the super class. * Added indexOf(const UnownedSubString& in) to UnownedSubString. Refactored extractor * Output a macro for each type with the extracted info - can be used during injection in class * Simplify the header file - as can get super type and last from macro now * Store the 'origin' of a definition * Some small tidy ups to the extractor. * Improve comments on the extractor options. * Made CPPExtractor own SourceOrigins * Small fixes around SourceOrigin. * Small tidy up around macroOrign
*	C++ Extractor (#1337)	jsmall-nvidia	2020-05-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* WIP: Doing texing using slangs lexer for cpp-extractor * Node tree for C++ extraction. * Bug fixing. Add dump of hierarchy. * First pass at extracting fields. * Parse template types. * Use diagnostics defs for C++ extractor. * Simplify Diagnostic Defs. * Remove the brace stack. * Added IdentifierLookup. * Add handling for >> style template close. * Improved identifier handling/keywords. * Added ability to check if reader is at cursor position. * Handling of an unspecified root type. * Parsing code comments. Tidy up some parsing - to use advanceIf functions more. * Improve path handling. * Fixes around changes to Path interface. * Working Range, Type and Scope header. * Extract the middle part of marker and put in output. Gives more flexibility at macro injection, and in class definitions. * Split DERIVED types into it's own macro, to provide way to generate for derived types. * Fix clang/g++ compile issue. * Tabs -> spaces. * Fix small bug in getFileNameWithoutExt * Small improvement around naming. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
*	Add support for generic load/store on byte-addressed buffers (#1334)	Tim Foley	2020-04-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add support for generic load/store on byte-addressed buffers Introduction ============ The HLSL `ByteAddressBuffer` types originaly only supported loading/storing `uint` values or vectors of the same, using `Load`/`Load2`/`Load3`/`Load4` or `Store`/`Store2`/`Store3`/`Store4`. More recent versions of dxc have added support for generic `Load<T>` and `Store<T>`, which adds a two main pieces of functionality for users. The first and more fundamental feature is that `T` can be a type that isn't 32 bits in size (or a vector with elements of such a type), thus exposing a capability that is difficult or impossible to emulate on top of 32-bit load/store (depending on what guarantees `StructuredBuffer` makes about the atomicity of loads/stores). The secondary benefit of having a generic `Load<T>` and `Store<T>` is that it becomes possible to load/store types like `float` without manual bit-casting, and also becomes possible to load/store `struct` types so long as all the fields are loadable/storable. This change adds generic `Load<T>` and `Store<T>` to the Slang standard library definition of byte-address buffers, and tries to bring those same benefits to as many targets as possible. In particular, the secondary benefits become available on all targets, including DXBC: byte-address buffers can be used to directly load/store types other than `uint`, including user-defined `struct` types, so long as all of the fields of those types can be loaded/stored. The ability to load/store non-32-bit types depends on target capabilities, and so is only available where direct support for those types is available. For 16-bit types like `half` this includes both Vulkan and D3D12 DXIL with appropriate extensions or shader models. The implementation is somewhat involved, so I will try to explain the pieces here. Standard Library ================ The changes to the Slang standard library in `hlsl.meta.slang` are pretty simple. We add new `Load<T>` and `Store<T>` generic methods to `ByteAddressBuffer`, and route them through to a new IR opcode. Right now the generic `Load<T>` and `Store<T>` do not* place any constraints on the type `T`, although in practice they should only work when `T` is a fixed-size type that only contains "first class" uniform/ordinary data (so no resources, unless the target makes resource types first class). Our front-end checking cannot currently represent first-class-ness and validate it (nor can it represent fixed-size-ness), so these gaps will have to do for now. Rather than directly translate `Load<T>` or `Store<T>` calls into a single instruction, we instead bottleneck them through internal-use-only subroutines. The design choice here is intended to ensure that for some large user-defined type like `MassiveMaterialStruct` we only emit code for loading all of its fields once in the output HLSL/GLSL rather than once per load site. While downstream compilers are likely to inline all of this logic anyway, we are doing what we can to avoid generating bloated code. Emit and C++/CUDA ================= Over in `slang-emit-c-like.cpp` we translate the new ops into output code in a straightforward way. A call like `obj.Load<Foo>(offset)` will eventually output as a call like `obj.Load<Foo>(offset)` in the generated code, by default. For the CPU C++ and CUDA C++ codegen paths, this is enough to make a workable implementation, and we add suitable templated `Load<T>` and `Store<T>` declarations to the prelude for those targets. Legalization ============ For targets like DXBC and GLSL there is no way to emit a load operation for an aggregate type like a `struct`, so we introduce a legalization pass on the IR that will translate our byte-address-buffer load/store ops into multiple ops that are legal for the target. Scalarization ------------- The big picture here is easy enough to understand: when we see a load of a `struct` type from a byte-address buffer, we translate that into loads for each of the fields, and then assemble a new `struct` value from the results. We do similar things for arrays, matrices, and optionally for vectors (depending on the target). Bit Casting ----------- After scalarization alone, we might have a load of a `float` or a `float3` that isn't legal for D3D11/DXBC, but that would be legal if we just loaded a `uint` or `uint3` and then bit-casted it. The legalization pass thus includes an option to allow for loads/stores to be translated to operate on a same-size unsigned integer type and then to bit-cast. To make this work actually usable, I had to add some more details to the implementation of the bit-cast op during HLSL emit and, more importantly, I had to customize the way that the byte-address buffer load/store ops get emitted to HLSL so that it prefers to use the existing operations like `Load`/`Load2`/`Load3`/`Load4` instead of the generic one, whenever operating on `uint`s or vectors of `uint`. Translation to Structured Buffers --------------------------------- Even after scalarizing all byte-address-buffer loads/stores, we still have a problem for GLSL targets, because a single global `buffer` declaration used to back a byte-address buffer can only have a single element type (currently always `uint`), so the granularity of loads/stores it can express is fixed at declaration time. If we want to load a `half` from a byte-address buffer, we need a dedicated `buffer` declaration in the output GLSL with an element type of `half`. The solution we employ here is to translate all byte-address buffer loads into "equivalent" structured-buffer ops when targetting GLSL. We add logic to find the underlying global shader parameter that was used for a load/store and introduce a new structured-buffer parameter with the desired element type (e.g., `half`) and then rewrite the load/store op to use that buffer instead. We copy layout information from the original buffer to the new one, so that in the output GLSL all the various `buffer`s will use a single `binding` and thus alias "for free." We don't want to create a new global buffer for every load/store, so we try to cache these "equivalent" structured buffers as best as we can. For the caching I ended up needing a pair to use as a key, so I tweaked the `KeyValuePair<K,V>` type in `core` so that it could actually work for that purpose. Because we are working at the level of IR instructions instead of stdlib functions at this work I had to add new IR opcodes to represent structured-buffer load/store that only (currently) apply to GLSL. Layout ====== In order to translate a load/store of a `struct` type into per-field load/store we need a way to access layout information for the types of the fields. Previously layout information has been an AST-level concern that then gets passed down to the IR only when needed and only on global parameters, so layout information isn't always available in cases like this, at the actual load/store point. As an expedient move for now I've introduced a dedicated module that does IR-level layout and caches its results on the IR types themselves. This approach only supports the "natural" layout of a type, and thus is usable for structured buffers and byte-address buffers (or general pointer load/store on targets that support it), but which is not usable for things like constant buffer layout. We've known for a while that the Right Way to do layout going forward is to have an IR-based layout system, and this could either be seen as a first step toward it, or else as a gross short-term hack. YMMV. Details ======= The GLSL "extension tracker" stuff around type support needed to be tweaked to recognize that types like `int16_t` aren't actually available by default. I switched it from using a "black list" of unavailable types at initialization time over to using a "white list" of types that are known to always be available without any extensions. Tests ===== There are two tests checked in here: one for the basic case of a `struct` type that has fields that should all be natively loadable, and one that stresses 16-bit types. Each test uses both load and store operations. Future Directions ================= Right now we translate vector load/store to GLSL as load/store of individual scalars, which means the assumed alignment is just that of the scalars (consistent with HLSL byte-address buffer rules). We could conceivably introduce some controls to allow outputting the vector load/store ops more directly to GLSL (e.g., declaring a `buffer` of `float4`s), which might enable more efficient load/store based on the alignment rules for `buffer`s. The IR layout work has a number of rough edges, but the most worrying is probably the assumption that all matrices are laid out in row-major order. Slang really needs an overhaul of its handling of matrices and matrix layout, so I don't know if we can do much better in the near term. At some point the IR-based layout system needs to be reconciled with our current AST-base layout, and we need to figure out how "natural" layout and the currently computed layouts co-exist (in particular, we need to make sure that the IR-based layout and the existing layout logic for structured buffers will agree). This probably needs to come along once we have moved the core layout logic to operate on IR types instead of AST types (a change we keep talking about). As part of this work I had to touch the implementation of bit-casting for HLSL, and it seems like that logic has some serious gaps. We really ought to consider a separate legalization pass that can turn IR bitcast instructions into the separate ops that a target actually supports so that we can implement `uint64_t`<->`double` and other conersions that are technically achievable, but which are hard to express in HLSL today. * fixup: missing files
*	Fix typo in TypeTextUtil::getCompileTargetName(SlangCompileTarget target) ↵	jsmall-nvidia	2020-04-23
\| \| \| \|	(#1332)
*	Workaround for matching of dxc diagnostics (#1324)	jsmall-nvidia	2020-04-16
\| \| \| \| \|	* Specialized handling for comparison of dxc output that ignores line/column numbers. * Simplify areAllEqualWithSplit.
*	Initial work to support OptiX output for ray tracing shaders (#1307)	Tim Foley	2020-04-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Initial work to support OptiX output for ray tracing shaders This change represents in-progress work toward allowing Slang/HLSL ray-tracing shaders to be cross-compiled for execution on top of OptiX. The work as it exists here is incomplete, but the changes are incremental and should not disturb existing supported use cases. One major unresolved issue in this work is that the OptiX SDK does not appear to set an environment variable Changes include: * Modified the premake script to support new options for adding OptiX to the build. Right now the default path to the OptiX SDK is hard-coded because the installer doesn't seem to set an environment variable. We will want to update that to have a reasonable default path for both Windows and Unix-y platforms in a later chance. * I ran the premake generator on the project since I added new options, which resulted in a bunch of diffs to the Visual Studio project files that are unrelated to this change. Many of the diffs come from previous edits that added files using only the Visual Studio IDE rather than by re-running premake, so it is arguably better to have the checked-in project files more accurately reflect the generated files used for CI builds. * The "downstream compiler" abstraction was extended to have an explicit notion of the kind of pipeline that shaders are being compiled for (e.g., compute vs. rasterization vs. ray tracing). This option is used to tell the NVRTC case when it needs to include the OptiX SDK headers in the search path for shader compilation (and also when it should add a `#define` to make the prelude pull in OptiX). This code again uses a hard-coded default path for the OptiX SDK; we will need to modify that to have a better discovery approach and also to support an API or command-line override. * One note for the future is that instead of passing down a "pipeline type" we could instead pass down the list/set of stages for the kernels being compiled, and the OptiX support could be enabled whenever there is any ray tracing entry point present in a module. That approach would allow mixing RT and compute kernels during downstream compilation. We will need to revisit these choices when we start supporting code generation for multiple entry points at a time. * The CUDA emit logic is currently mostly unchanged. The biggest difference is that when emitting a ray-tracing entry point we prefix the name of the generated `__global__` function with a marker for its stage type, as required by the OptiX runtime (e.g., a `__raygen__` prefix is required on all ray-generation entry points). * The `Renderer` abstraction had a bare minimum of changes made to be able to understand that ray-tracing pipelines exist, and also that some APIs will require the name of each entry point along with its binary data in order to create a program. * The `ShaderCompileRequest` type was updated so that only a single "source" is supported (rather than distinct source for each entry point), and also the entry points have been turned into a single list where each entry identifies its stage instead of a fixed list of fields for the supported entry-point types. * The CUDA compute path had a lot of code added to support execution for the new ray-tracing pipeline type. The logic is mostly derived from the `optixHello` example in the OptiX SDK, and at present only supports running a single ray-generation shader with no parameters. The code here is not intended to be ready for use, but represents a signficiant amount of learning-by-doing. * The `slang-support.cpp` file in `render-test` was updated so that instead of having separate compilation logic for compute vs. rasterization shaders (which would mean adding a third path for ray tracing), there is now a single flow to the code that works for all pipeline types and any kind of entry points. * Implicit in the new code is dropping support for the way GLSL was being compiled for pass-through render tests, which means pass-through GLSL render tests will no longer work. It seems like we didn't have any of those to begin with, though, so it is no great loss. * Also implicit are some new invariants about how shaders without known/default entry points need to be handled. For example, the ray tracing case intentionally does not fill in entry points on the `ShaderCompileRequest` and instead fully relies on the Slang compiler's support for discovering and enumerating entry points via reflection. As a consequence of those edits the `-no-default-entry-point` flag on `render-test` is probably not working, but it seems like we don't have any test cases that use that flag anyway. Given the seemingly breaking changes in those last two bullets, I was surprised to find that all our current tests seem to pass with this change. If there are things that I'm missing, I hope they will come up in review. * fixup: issues from review and CI * Some issues noted during the review process (e.g., a missing `break`) * Fix logic for render tests with `-no-default-entry-point`. I had somehow missed that we had tests reliant on that flag. This required a bit of refactoring to pass down the relevant flag (luckily the function in question was already being passed most of what was in `Options`, so that just passing that in directly actually simplifies the call sites a bit. * There was a missing line of code to actually add the default compute entry points to the compile request. I think this was a problem that slipped in as part of some pre-PR refactoring/cleanup changes that I failed to re-test.
*	Improve diagnostic parsing from GCC. (#1303)	jsmall-nvidia	2020-03-31
\| \| \|	Enable x86_64 CPU tests on TC.
*	CUDA version handling (#1301)	jsmall-nvidia	2020-03-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* render feature for CUDA compute model. * Use SemanticVersion type. * Enable CUDA wave tests that require CUDA SM 7.0. Provide mechanism for DownstreamCompiler to specify version numbers. * Enabled wave-equality.slang * Make CUDA SM version major version not just a single digit. * Fix assert. * DownstreamCompiler::Version -> CapabilityVersion
*	Better diagnostics on failure on CUDA. (#1288)	jsmall-nvidia	2020-03-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Better diagnostics on failure on CUDA. * Catch exceptions in render-test * * Added ability to disable reporting on CUDA failures * Stopped using exception for reporting (just write to StdWriter::out() * Removed CUDAResult type * Don't set arch type on nvrtc to see if fixes CI issues. * Try compute_30 on CUDA. * Added ability to IGNORE_ a test DIsabled rw-texture-simple and texture-get-dimensions * Disable tests that require CUDA SM7.0 Use DISABLE_ prefix to disable tests. * Disable signalUnexpectedError doing printf.