slang.git - Making it easier to work with shaders

Age	Commit message (Collapse)	Author
2020-11-05	Refactor the flow of type legalization (#1594)	Tim Foley
	The existing type legalization logic worked as a single preorder pass over the IR tree. This could create problems in cases where an instruction might be processed before one of its operands (e.g., a function that references a global shader parameter is processed before that parameter). This change makes it so that type legalization uses a work list, and only adds instructions to the work list once their parent, type, and operands have been processed. As a result, we should be able to guarantee that an instruction will only be processed once all of its operands have been. One wrinkle here is that in the current IR it is possible to end up with a cycle of uses for global-scope instructions, specifically around interface types and their list of requirements. This change includes a short-term kludge to break those cycles and allow the pass to complete. As it stands, this is simply a refactoring pass and no new functionality is introduced. The changes are necessary to unblock work in a feature branch that depends on type legalization being more robust against IR that might use an unexpected ordering.
2020-11-05	Standard library save/loadable (#1592)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Fix handling of access modifiers inside type definition. * Fix access problem for AST node. Make dumping produce a single function with switch, to potentially make available without Dump specific access. * WIP on serialization design doc. * Remove project references to previously generated files. * More docs on serialization design. * Improve serialization documentation. Remove unused function from IRSerialReader. * Small fixes around naming. Remove long comment from slang-serialize.h - as covered in serialization.md * Remove long comment in slang-serialize.h as covered in serialization.md * More information about doing replacements on read for AST and problems surrounding. * Typo fix. * Spelling fixes. * Value serialize. * Value types with inheritence. * Use value reflection serial conversion for more AST types * Use automatic serialization on more of AST. * Get the types via decltype, simplifies what the extractor has to do. * Update the serialization.md for the value serialization. * Small doc improvements. * Update project. * Remove ImportExternalDecl type Added addImportSymbol and ImportSymbol type Fixed bug in container which meant it wouldn't read back AST module * Because of change of how imports and handled, store objects as SerialPointers. * First pass symbol lookup from mangled names. * Cache current module looked up from mangled name. * Fix SourceLoc bug. Improve comments. * Added diagnostic on mangled symbol not being found * Fix typo. * WIP serializing stdlib. * WIP serializing stdlib in. * Fix problem serializing arrays that hold data that is already serialized. * Remove clash of names in MagicTypeModifier. * Make conversion from char to String explicit. Fix reference count issue with SerialReader. * Add code to save/load stdlib. * Use return code to avoid warning - SerialContainerUtil::write(module, options, &stream)) * Make all String numeric ctors explicit. Added isChar to UnownedStringSlice. Added operator== for UnownedStringSlice to String to avoid need to convert to String and allocate. * Add error check to readAllText. * tabs -> spaces on String.h * tab -> spaces String.cpp * Remove msg for StringBuilder, just build inplace for exceptions. * Check SerialClasses - for name clashes. Renamed Modifier::name as Modifier::keywordName * Handling of extensions when deserializing AST - updating the moduleDecl->mapTypeToCandidateExtensions Co-authored-by: Tim Foley <tim.foley.is@gmail.com>
2020-11-04	Improve insertion location for "hoistable" instructions (#1593)	Tim Foley
	The Slang IR builder has a notion of "hoistable" instructions, which are basically those instructions that represent a pure side-effect-free operation on their operands, and which can and should be deduplicated. Most types are "hoistable" instructions. In order to make deduplication of hoistable instructions work, we need to emit them at the right location. Consider if we had: ```hlsl void myFunc<T>(...) { if(someCondition) { vector<T, 4> a = ...; ... } else { vector<T, 4> b = ...; } } ``` The IR instruction that represents `vector<T,4>` can't be inserted at the global scope, because then the parameter `T` would not be visible to it. That instruction also shouldn't be inserted into the same block that declares `a`, because then the instruction itself wouldn't be visible at the point where `b` is declared. The IR builder already has logic to pick the right parent instruction. In the example given, the IR instruction for `vector<T,4>` should be inserted into the body of the IR generic, but outside of the IR function that represents `myFunc`. The problem this change fixes is that while the logic was picking the parent for a hoistable instruction correctly, it wasn't putting much care into pick the insertion location. The existing strategy amounted to: * If the IR builder was set with an insertion location inside the chosen parent, then use that insertion location * Otherwise, insert at the end of the chosen parent Neither of those options is perfect. Either could lead to an instruction being inserted after one of its uses, and the second option could even lead to a type being inserted after the `return` instruction in a function/generic, which violates another structural invariant of our IR (that every block must end with a terminator, and terminators must only appear at the end of blocks). This change updates the rules as follows: * If the type of the instruction being created, or any of its operands are in the chosen parent, then insert immediately after whichever of those instructions is last in that parent. * Otherwise, insert before the first non-decoration, non-parameter child of the chosen parent The combined effect of these two rules is now that we insert any hoistable instruction as early as we can in its parent, without violating the structural validity rules. (One small exception to these rules is that if the parent is the module then we don't worry about ordering and just insert at the end, since order-of-declaration isn't significant at module scope in our IR) All of our existing tests work with this new behavior, although there could conceivably be future cases that lead to complicated breakage. For example, if a pass looks at the first "ordinary" instruction in a block and saves it to use as an insertion point for parameter, and then proceeds to manipulate code in the block before going back and inserting parameters at the chosen location, there is a chance that a hoistable instruction might have been inserted before the chosen insertion point, leading to a parameter being inserted after an ordinary instruction. In general, though, code that works like that would already be playing a dangerous game in that it is manipulating instructions in a block while assuming the first instruction will remain fixed. This change is currently just a refactor, but the underlying issue surfaced as a bug when I made other changes in a feature branch.
2020-10-29	Generate `switch` based dynamic dispatch logic. (#1591)	Yong He
	Co-authored-by: Tim Foley <tim.foley.is@gmail.com>
2020-10-29	Handling imported/exporting symbols from serialized modules (#1589)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Fix handling of access modifiers inside type definition. * Fix access problem for AST node. Make dumping produce a single function with switch, to potentially make available without Dump specific access. * WIP on serialization design doc. * Remove project references to previously generated files. * More docs on serialization design. * Improve serialization documentation. Remove unused function from IRSerialReader. * Small fixes around naming. Remove long comment from slang-serialize.h - as covered in serialization.md * Remove long comment in slang-serialize.h as covered in serialization.md * More information about doing replacements on read for AST and problems surrounding. * Typo fix. * Spelling fixes. * Value serialize. * Value types with inheritence. * Use value reflection serial conversion for more AST types * Use automatic serialization on more of AST. * Get the types via decltype, simplifies what the extractor has to do. * Update the serialization.md for the value serialization. * Small doc improvements. * Update project. * Remove ImportExternalDecl type Added addImportSymbol and ImportSymbol type Fixed bug in container which meant it wouldn't read back AST module * Because of change of how imports and handled, store objects as SerialPointers. * First pass symbol lookup from mangled names. * Cache current module looked up from mangled name. * Fix SourceLoc bug. Improve comments. * Added diagnostic on mangled symbol not being found * Fix typo. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-10-28	Add sequential ID cache in Linkage for witness tables and RTTI objects. (#1590)	Yong He

2020-10-26	Value type serialization via C++ Extractor (#1588)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Fix handling of access modifiers inside type definition. * Fix access problem for AST node. Make dumping produce a single function with switch, to potentially make available without Dump specific access. * WIP on serialization design doc. * Remove project references to previously generated files. * More docs on serialization design. * Improve serialization documentation. Remove unused function from IRSerialReader. * Small fixes around naming. Remove long comment from slang-serialize.h - as covered in serialization.md * Remove long comment in slang-serialize.h as covered in serialization.md * More information about doing replacements on read for AST and problems surrounding. * Typo fix. * Spelling fixes. * Value serialize. * Value types with inheritence. * Use value reflection serial conversion for more AST types * Use automatic serialization on more of AST. * Get the types via decltype, simplifies what the extractor has to do. * Update the serialization.md for the value serialization. * Small doc improvements. * Update project.
2020-10-23	Serialization design doc first pass (#1587)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * WIP on serialization design doc. * More docs on serialization design. * Improve serialization documentation. Remove unused function from IRSerialReader. * Small fixes around naming. Remove long comment from slang-serialize.h - as covered in serialization.md * Remove long comment in slang-serialize.h as covered in serialization.md * More information about doing replacements on read for AST and problems surrounding. * Typo fix. * Spelling fixes.
2020-10-23	C++ extractor fix for access modifiers (#1586)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Fix handling of access modifiers inside type definition. * Fix access problem for AST node. Make dumping produce a single function with switch, to potentially make available without Dump specific access. * Remove project references to previously generated files.
2020-10-22	Generate `if` based dispatch logic on GPU targets. (#1585)	Yong He

2020-10-22	Single pass C++ extraction (#1583)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Added CharUtil. Added TypeSet to extractor. First pass at being able to specify all headers for multiple output headers. * Fix includes for new C++ extractor convension. Update premake5 to use new extractor mechanisms. * Small improvements around StringUtil. * Split out NameConventionUtil. * Use a 'convert' to convert between convention types. * Fix output of build message for C++ extractor. Improve NameConventionUtil interface. * Improve comments. * Fix warning on gcc. * Fix clang warning. * Fix some typos in NameConventionUtil. * Small fix to premake5.lua * Fix generated includes. * Remove m_reflectType as no longer applicable with TypeSet. * Fix .gitignore for slang-generated-* files. Added getConvention to determine convention from slice. Add versions of split and convert that infer the from convention * Fix typo in spliting camel. * LineWhitespace -> HorizontalWhitespace * Improve CharUtil comments.
2020-10-20	Bottleneck interface dispatch calls through a single function. (#1584)	Yong He

2020-10-20	Small improvement in AST serialization (#1582)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Make AST serialization types, marker include _AST_. Ie SLANG_CLASS -> SLANG_AST_CLASS and SLANG_ABSTRACT_CLASS -> SLANG_ABSTRACT_AST_CLASS
2020-10-19	Fix saving Repro files on Linux (#1581)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Ascii mode not always set on FileStream. Remove this-> if not needed. Simplify setting of m_fileAccess. * Fix typo. * Fix typo. * Clear up default FileAccess calculation. * Convert tabs to spaces. * Small naming improvements in FileStream::seek.
2020-10-19	Hotfix: Crash due to ContainerDecl->members being altered whislt iterated ↵	jsmall-nvidia
	over (#1580) * #include an absolute path didn't work - because paths were taken to always be relative. * Access the members iteration in _ensureAllDeclsRec via indices to avoid a change in the array invalidating the list. * Fix another iterator of members in SemanticVisitor * Slight improvements to comments - main purpose is to kick a new build.
2020-10-15	Fix a bug in IR lowering (#1578)	Tim Foley
	The basic problem here is that when a function has multiple declarations with matching signatures (e.g., a forward declaration and then a later definition with a body), the IR lowering logic would lower all declarations whenever the first one was encountered, but then would only register an IR value as the lowered version of the first declaration. Other matching declarations would then run the risk of being lowered again, and in the case where they included features like loops with break/continue labels, that would create the risk of keys getting inserted into certain dictionaries more than one, leading to exceptions. This change ensures that when lowering a function that has multiple matching declarations to IR, we register an IR value for all of those declarations and not just the first. I have added a test case that leads to a crash without this change, to ensure that we don't introduce a regression down the line.
2020-10-15	Fix Vk leak (#1579)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Handle scope of VkShaderModule. * Fix tabbing issue.
2020-10-14	Add reflection API access to global params type layout (#1577)	Tim Foley
	This change adds a single new entry point to our reflection API that allows an application to query the `TypeLayout` that represents the global-scope shader parameters. This can be used by the application in order to detect when the global parameters have required allocation of a default constant buffer, or simply to unify the handling of the global scope with handling of other kinds of parameters.
2020-10-13	Repro test that loads repro (#1576)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Slang repro test that reloads and runs compiled code.
2020-10-09	Support CUDA bindless texture in dynamic dispatch code. (#1575)	Yong He

2020-10-09	Make RTTI objects __constant__ in CUDA (#1573)	Yong He
	Co-authored-by: Yong He <yhe@nvidia.com>
2020-10-09	NVAPI support doc (#1574)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Split out NVAPI documentation. Attempt to describe updated usage. * Discuss downstream compiler include paths issues. * Fix links . * Apparently github supports relative links... * Fix typo.
2020-10-07	Fix C++ emit for `bit_cast` inst. (#1570)	Yong He
	Co-authored-by: Yong He <yhe@nvidia.com>
2020-10-06	Use Reflection for (Serial)RefObject Serialization (#1567)	jsmall-nvidia
	* First pass at generalizing serializer. * Split out ReflectClassInfo * Use the general ReflectClassInfo * Fix some typos in debug generalized serialization. * Add calculation of classIds. Make distinct addCopy/add on SerialClasses. * Write up of more generalized serialization * WIP to transition from ASTSerialReader/Writer etc to generalized SerialReader/Writer and associated types. * Improvements to SerialExtraObjects. Keep RefObjects in scope in factory * Compiles with Serial refactor - doesn't quite work yet. * First pass serialization appears to work with refector. * Split out type info for general slang types. * Split out slang-serialize-misc-type-info.h * DebugSerialData -> SerialSourecLocData DebugSerialReader -> SerialSourceLocReader DebugSerialWriter -> SerialSourceLocWriter * Remove unused template that only compiles on VS. * Fix warning around unused function on non-VS. * Improve output of type names that are in scopes in C++ extractor. Update premake5.lua to run generation for RefObject derived types. * C++ extractor working on RefObject type. * Split out serialization functionality that spans different types into slang-serialization-factory.cpp/.h Put AST type info into header. Removed RefObjectSerialSubType - use RefObjectType Add filtering for RefObject derived types Remove construction and filteringhacks. * Set up field serialization for SerialRefObject derived types. * Fix template problem compiling on Clang/Gcc * Work in progress to make Value types work. * Added slang-value-reflect.cpp
2020-10-06	InterlockedExchangeU64 support on RWByteAddressBuffer (#1572)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Added [__requiresNVAPI] to functions that need nvapi support. * Added support for InterlockedExchangeU64 Added exchange-int64-byte-address-buffer test Fixed typo in cas-int64-byte-address-buffer test * Improve comment around NVAPI usage in hlsl.meta.slang
2020-10-06	Added [__requiresNVAPI] to functions missing it (#1571)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Added [__requiresNVAPI] to functions that need nvapi support.
2020-10-05	Small fixes for CUDA code emit (#1564)	Tim Foley
	* Small fixes for CUDA code emit * Add a CUDA translation to `GroupMemoryBarrierWithWaveSync()`. We map this to `__syncwarp()` for CUDA (with no mask, implying a full-warp sync). * Consistently use `SLANG_PRELUDE_ASSERT` for assertions introduced in code emit (rather than just using the bare `assert(...)` function, which is not included by our CUDA prelude by default) * Add a new `SLANG_CUDA_STRUCTURED_BUFFER_NO_COUNT` flag to the CUDA prelude that allows the `count` field to be omitted from `(RW)StructuredBuffer<T>`. This is a bit of a hacky because the computed layouts will still assume the `count` field is present, but this feature is required by at least one client application for now. A better long-term fix will take more time to design and implement. * fixup: CUDA prelude code fix for pedantic compilers Co-authored-by: Tim Foley <tim.foley.is@gmail.com> Co-authored-by: Yong He <yonghe@outlook.com>
2020-10-05	Update the type of a call inst during specialization. (#1569)	Yong He

2020-10-04	Handle partial existential parameter type specialization. (#1568)	Yong He
	* Specialize exsitentials parameters in struct fields. * Cleanup. * Handle partial existential parameter type specialization. Co-authored-by: Yong He <yhe@nvidia.com>
2020-10-02	Use new vulkan debug layer. (#1566)	Yong He
	* Use new vulkan debug layer. * Try use VK_LAYER_KHRONOS_validation when it exists. Co-authored-by: Tim Foley <tim.foley.is@gmail.com>
2020-10-02	Specialize exsitentials parameters in struct fields. (#1565)	Yong He
	* Specialize exsitentials parameters in struct fields. * Cleanup. Co-authored-by: Yong He <yhe@nvidia.com>
2020-09-30	Generalizing Serialization (#1563)	jsmall-nvidia
	* First pass at generalizing serializer. * Split out ReflectClassInfo * Use the general ReflectClassInfo * Fix some typos in debug generalized serialization. * Add calculation of classIds. Make distinct addCopy/add on SerialClasses. * Write up of more generalized serialization * WIP to transition from ASTSerialReader/Writer etc to generalized SerialReader/Writer and associated types. * Improvements to SerialExtraObjects. Keep RefObjects in scope in factory * Compiles with Serial refactor - doesn't quite work yet. * First pass serialization appears to work with refector. * Split out type info for general slang types. * Split out slang-serialize-misc-type-info.h * DebugSerialData -> SerialSourecLocData DebugSerialReader -> SerialSourceLocReader DebugSerialWriter -> SerialSourceLocWriter * Remove unused template that only compiles on VS. * Fix warning around unused function on non-VS.
2020-09-26	Add API for whole program compilation. (#1562)	Yong He
	* Add API for whole program compilation. This change exposes a new target flag: `SLANG_TARGET_FLAG_GENERATE_WHOLE_PROGRAM` that can be set on a target with `spSetTargetFlags`. When this flag is set, `spCompile` function generates target code for the entire input module instead of just the specified entrypoints. The resulting code will include all the entrypoints defined in the input source. The resulting whole program code can be retrieved with two new functions: `spGetTargetCodeBlob` and `spGetTargetHostCallable`. This change also cleans up the unnecessary `entryPointIndices` parameter of `TargetProgram::getOrCreateWholeProgramResult`, and modifies the `cpu-hello-world` example to make use of the new whole-program compilation API to simplify its logic. * Update comments.
2020-09-24	Enable default cpp prelude. (#1560)	Yong He
	* Enable default cpp prelude. * Print the "#include" line as a normal source if the file does not exist. * Bug fix * Fix. * Fix c++ prelude header. * Remove unnecessary fopen call.
2020-09-24	Refactor preprocessor API to avoid coupling (#1559)	Tim Foley
	Based on review feedback from #1556, this change updates the Slang preprocessor so that it is no longer coupled to policy details from higher levels of the software stack. In particular, the preprocessor used to: * Deal with updating the list of file paths that a `Module` depends on. * (As of #1556) detect NVAPI-related macro definitions and use them to construct an AST-level `Modifier` attached to the `ModuleDecl`. This change introduces a callback interface where the `Preprocessor` calls out to a `PreprocessorHandler` at certain points during execution, allowing the handler to introduce custom logic that suits a particular high-level use case. This change also removes the dependence of the preprocessor on the `Linkage`, because in practice only a small number of its sub-objects were needed. As a convenience, a wrapper function that takes a `Linkage` was left in place so that the existing call sites didn't have to change very much.
2020-09-23	Fix GLSL output for byte-address loads of vectors (#1558)	Tim Foley
	While working on #1557, it became clear that something was going wrong when using `*ByteAddressBuffer.Load<T>` to load a vector type on GLSL/SPIR-V targets. The root problem was that the IR-level layout logic (which computes the "natural" layout of a type) had not yet been extended to handle vectors. The fix is simple enough, but it highlights the fact that we probably need to go ahead and "complete" that layout logic sooner or later. This change includes a test case that covers the behavior added here, as well as the case that #1557 fixes. Unfortunately, due to CI system limitations, the HLSL/dxc part of the test is not yet enabled.
2020-09-23	Simplify workflow when using NVAPI (#1556)	Tim Foley
	In some cases, functionality is available as either a GLSL extension for Vulkan/SPIR-V, or through the NVAPI system for D3D. This situation creates complications because while GLSL extensions are generally all supported by the open-source glslang compiler (which we can bundle and ship), NVAPI operations are exposed through a specific header (`nvHLSLExtns.h`) that ships as part of the NVAPI SDK. When a user wants to explicitly use NVAPI-provided operations in their shader code, there are no major complications for Slang; the user sets up their include paths, `#include`s the relevant header, calls functions in it, and lets Slang deal with the details of compilation. The challenge for Slang arises when we want to provide a cross-platform interface in our standard library (e.g., the `RWByteAddressBuffer.InterlockedAddF32` method that was recently added) that uses either a GLSL extension (when compiling for Vulkan/SPIR-V) or an NVAPI (when compiling to DXBC or DXIL). In that case, the code generated by Slang now has a dependency on NVAPI, and we need to somehow emit a `#include` directive that pulls it in when invoking fxc or dxc. Because we do not (and seemingly cannot) bundle the NVAPI header with the compiler, we have to rely on ther user to have it available and to somehow communicate to Slang where it is. Exposing portable routines that sometimes use NVAPI currently creates two main challenges: 1. The user is forced to interact with the "prelude" mechanism in the compiler, which allows the programmer to define code in a given target language that gets prepended to the Slang-generated code. While the prelude mechanism is powerful, it is also hard for users to integrate into their workflow, and our experience so far is that users want something that Just Works. 2. If the user writes code that uses some of our abstract operations that layer on NVAPI and they also want to use NVAPI explicitly, they end up with two copies of the NVAPI header (one included by the Slang front-end, and another included by the downstream fxc/dxc compiler). This puts the user in the situation of (a) having to ensure that they set the defines like `NV_SHADER_EXTN_SLOT` consistently both when invoking Slang and when adding their prelude, and (b) even if they do make the definitions consistent, they run into the problem that fxc/dxc complain about overlapping register bindings on the two copies of the `g_NvidiaExt` global shader paraemter that the NVAPI header declares. This change attempts to resolve both issues by adding a lot of "do what I mean" logic to the compiler to try to ease things in the common case. In particular: 1. The user no longer needs to use the "prelude" mechanism when using NVAPI. The compiler now embeds a default prelude for HLSL output, which will `#include` the NVAPI header if and only if the generated code needs NVAPI access because of portable standard library routines that were used. 2. The user can mix-and-match explicit NVAPI use and stdlib functions that compile to use NVAPI. The register/space to be used by NVAPI when included via prelude is now set based on whatever the user set via the preprocessor so that it should automatically be consistent between both cases. Furthermore, the code we emit for the declaration of `g_NvidiaExt` when compiling explicit NVAPI use is set up to be conditional, so that it is skipped in the case where the prelude will pull in its own declaration of that parameter. The way all this is achieved involves a lot of moving pieces: * We now have an HLSL prelude, which mostly just serves to `#include "nvHLSLExtns.h"` in the case where NVAPI support is needed downstream. * Standard library operations that require NVAPI for their implementation on HLSL include a new `[__requiresNVAPI]` attribute. * The preprocessor has been extended so that after tokenizing an input file it looks up the NVAPI-relevant macros in the resulting environment, and if they are set it attached a modifier (`NVAPISlotModifier1) to the AST `ModuleDecl` that is based on their values. Logic is added to detect if multiple input files specify values for the macros in ways that conflict. * The semantic checking step is extended so that it detects the "magic" NVAPI declarations (the `g_NvidiaExt` paramter and the `NvShaderExtnStruct` type that it uses) and attaches a modifier to them so that they can be identified as such in later steps. * Parameter binding is extended to collect a list of the AST modifiers that reflect NVAPI binding, and to reserve the relevant register(s) so that ordinary user-defined parameters cannot conflict with them. * IR lowering translates the three new AST modifiers related to NVAPI over to IR equivalents. * IR linking is extended to make sure that it clones any `IRNVAPISlotDecoration`s attached to the input modules. The pass intentionally does not care where the modifiers came from; it just collects them all and leaves it to downstream code to sort out what they mean. * Emit logic is extended to have a notion of "prelude directives" which are preprocessor directives that should come before the prelude in the generated code, because they can impact the way that the prelude compiles. This is done so that we don't have to introduce ad hoc logic for each downstream compiler to set any relevant `-D` flags (e.g., both fxc and dxc would need to duplicate such logic for NVAPI support). * The HLSL source emitter is extended to track whether it emits any operations that require NVAPI support. * The HLSL source emitter is extended to emit prelude directives based on whether NVAPI is needed and, if it is, to also set the register and space that NVAPI should use based on what was stored in the decoration(s) on the IR module. * The HLSL source emitter is extended so that it detects global instructions that represent "magic" NVAPI constructs , and emit them as conditional definitions so that they are skipped when NVAPI is included via the prelude. * The handling of requires capabilities during emit logic was cleaned up a bit so that more logic is shared across targets, and also so that the same logic is used both when emitting a function declaration/definition and when emitting a call to an instrinsic function (which won't get declared/defined).
2020-09-23	Fix a bug around byte-address buffer loads of vectors (#1557)	Tim Foley
	This problem is only visible when * using `RWByteAddressBuffer.Load<V>`, * when `V` is `vector<T,N>`, * and `V` is a non-32-bit type In such a case, the Slang compiler generates output HLSL like: someBuffer.Load<vector<T,N>>(someOffset); and dxc balks because it fails to parse the `>>` as closing the generics, and instead parses it as a shift operator. The solution here is simple: add a space before the closing `>` when emitting a `.Load<T>` operation. Note that this change does not come with a test fix yet because writing the test case exposed a more complicated issue with GLSL codegen for this same scenario. This change includes the simple single-byte fix, to unblock users while we work on a fix for the GLSL case (and that fix will include the test coverage).
2020-09-21	Allow #include of absolute paths (#1555)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Improve comments. * Small comment improvement.
2020-09-21	Enable all dynamic dispatch tests on CUDA. (#1552)	Yong He
	* Enable all dynamic dispatch tests on CUDA. * Fix expected cross-compile test results.
2020-09-18	Serialization fixes based on review of #1547 (#1551)	jsmall-nvidia
	* Test if blob is returned. * Rename serialize files so can be grouped. * StringRepresentationCache -> SerialStringTable * Split out SerialStringTable from slang-serialize-ir * First pass at reorganizing serialization/containers. Remain some issues about debug info. * Fix bug in calculating sourceloc. * Improve calcFixSourceLoc * Make allocations for payload RiffContainer align to at least 8 bytes. This is important for read, if the payload can contain 8 byte aligned data. Note this has no effect on Riff file format alignment rules. * Improve comments around RiffContainer and alignment. * Remove SerialStringTable, can just use StringSlicePool instead. * Add flags to control what is output in SerialContainer. Turn off AST output for obfuscated code. Lazily create astClasses when doing write container serialization. * Typo fix for Clang/Linux. * Fixes that came out of review * TranslationUnit -> Module * TargetModule -> TargetComponent * PAYLOAD_MIN_ALIGNMENT -> kPayloadMinAlignment
2020-09-18	Control container serialization with SerialOptionFlags (#1550)	jsmall-nvidia
	* Test if blob is returned. * Rename serialize files so can be grouped. * StringRepresentationCache -> SerialStringTable * Split out SerialStringTable from slang-serialize-ir * First pass at reorganizing serialization/containers. Remain some issues about debug info. * Fix bug in calculating sourceloc. * Improve calcFixSourceLoc * Make allocations for payload RiffContainer align to at least 8 bytes. This is important for read, if the payload can contain 8 byte aligned data. Note this has no effect on Riff file format alignment rules. * Improve comments around RiffContainer and alignment. * Remove SerialStringTable, can just use StringSlicePool instead. * Add flags to control what is output in SerialContainer. Turn off AST output for obfuscated code. Lazily create astClasses when doing write container serialization. * Typo fix for Clang/Linux.
2020-09-17	Initial attempt to enable CUDA dynamic dispatch codegen (#1549)	Yong He
	* Front-load cuda module loading to fill in RTTI pointers. * Enable dynamic dispatch codegen for CUDA.
2020-09-17	Front-load cuda module loading to fill in RTTI pointers. (#1548)	Yong He
	Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-09-17	Share debug information between AST and IR (#1547)	jsmall-nvidia
	* Test if blob is returned. * Rename serialize files so can be grouped. * StringRepresentationCache -> SerialStringTable * Split out SerialStringTable from slang-serialize-ir * First pass at reorganizing serialization/containers. Remain some issues about debug info. * Fix bug in calculating sourceloc. * Improve calcFixSourceLoc * Make allocations for payload RiffContainer align to at least 8 bytes. This is important for read, if the payload can contain 8 byte aligned data. Note this has no effect on Riff file format alignment rules. * Improve comments around RiffContainer and alignment. * Remove SerialStringTable, can just use StringSlicePool instead. * Typo fix for Clang/Linux. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-09-17	Embed default prelude for CUDA (#1546)	Tim Foley
	* Embed default prelude for CUDA Slang supports the notion of a "prelude" that gets prepended to the source code we generate in language. For some targets, a prelude is not necessary (e.g., we compile to HLSL/GLSL and then on to DXBC/DXIL/SPIR-V just fine without a prelude), but some targets have been implemented in a way that makes a prelude necessary (notably CPU and CUDA). For the targets that require a prelude, the Slang codebase includes usable preludes under the `prelude/` directory. Prior to this change, if a user was compiling for such a target (whether via command-line or API), there had to take responsibility for specifying the prelude to use (usually by passing in the contents of the prelude file(s) already included in the Slang distribution). It is reasonable for a user to expect an out-of-the-box experience where compilation to CUDA PTX or native CPU code should Just Work, similarly to how compilation to SPIR-V Just Works. This change is a step in the direction of providing a user experiene that Just Works for common cases. The main addition here is a tool called `slang-embed` that we run during our build to turn the `prelude/.h` files into `prelude/.h.cpp` files that embed the contents of the original `.h` file as a `const` variable. By compiling and linking in the generated `.h.cpp` file for the CUDA prelude, we are then able to set the default prelude to use for CUDA at the time a session/linkage is created. That default prelude will be used unless the user manually specifies their own prelude (which current users of the CUDA back-end must be doing). This change only sets up a default prelude for CUDA because of the way that the CPU prelude is split across multiple files. A strategy that provides a good default prelude for CPU may take more work, but that work might also be unnecessary if we switch to a strategy of using LLVM to generate native code. The implementation of the `slang-embed` tool is intentionally simple, and it will likely run into issues if/when we need to embed binary files or larger text files. The assumption being made here is that we can address those issues when they arise, and there is no reason to over-engineer the tool right now. The way that `slang-embed` is integrated into our build process is likely to require some iteration to make sure that it works across all platforms. I expect that this change will have multiple follow-up fixes related to trying to get the build to work as expected across all targets on CI. * fixup: trying to ensure that embedded prelude gets compiled into slang * fixup: properly clean up allocations in slang-embed * fixup: fix double free introduced by previous change * fixup: off-by-one allocation error
2020-09-17	Fix an issue with double-counting uniform data for CUDA/CPU (#1545)	Tim Foley
	The `SimpleScopeLayoutBuilder` helper that is used to build up binding information for entry-point parameter lists has logic to try to support both explicit and implicit binding of parameters. This logic was added as part of supporting dual-source color blending on Vulkan. The basic approach is similar to that used for the global scope, where parameters with explicit binding first "carve out" the ranges they claim via a `UsedRangeSet`, and then parameters without explicit binding allocate space from what is left. The logic is (seemingly by accident) also applied to uniform/ordinary data, which creates a problem because the `ScopeLayoutBuilder` base type is also responsible for computing a layout for uniform/ordinary data that is 100% implicit (while dealing with all the relevant alignment restrictions). That logic goes on to add the computed uniform/ordinary resource usage to the computed type layout, but because such a layout has already been computed (albeit without taking alignment into account), the result is that the uniform/ordinary usage is reported at approximately double what it should be. The fix here is to skip uniform/ordinary resource usage when doing the explicit/implicit dance in `SimpleScopeLayoutBuilder`. This approach means that explicit bindings on entry-point `uniform` parameters will only apply to resources (which matches our rules for the global scope, where we don't allow for explicit binding on uniform/ordinary parameters). This is appropriate since the only reason we are supported explicit layout at all is for dual-source color blending (in general, we only support explicit `register` and `[[vk::binding(...)]]` modifiers on global parameters; users are stuck with our computed layouts in all other cases. Co-authored-by: Yong He <yonghe@outlook.com>
2020-09-16	Fix some issues around dim3 for CUDA (#1544)	Tim Foley
	The logic we use to compute `SV_DispatchThreadID` and friends for CUDA makes use of the `gridDim` and `blockDim` built-in variables in CUDA. These variables have type `dim3` which is similar to `uint3` but is considered a distinct type for some reason. The logic for computing the `SV`s currently pretends that `gridDim` and `blockDim` are `uint3`s, and this means that the code they emit doesn't always compile cleanly (although it does in our existing test cases...). This change adds a few overloads that work on `dim3` to the CUDA prelude and that seem to make the code we emit work again. Note: This change should be seen as a somewhat hacky quick fix rather than a real resolution to the underlying issue. It is probably better if we emit code that replaces uses of `gridDim` with `uint3(gridDim.x, gridDim.y, gridDim.z)` instead, to ensure that we get the typing correct, even if the result looks less idiomatic.
2020-09-16	Search for multiple NVRTC versions (#1543)	Tim Foley
	* Search for multiple NVRTC versions The main change here is that when locating the NVRTC compiler we try multiple library names and take the first one that loads successfully (with an ordering that means we try newer versions before older ones). In order to support this change, I needed to fix the wrapping logic that invokes the downstream compiler "locator" function, so that it does not report every failed dynamic library load as an error diagnostic (leading to compilation failure), but instead only reports such failures once the locator has reported failure. The form of the diagnostic output for failures is also changed, in that we now report a single umbrella error about failing to load a downstream compiler, and then report the actuall dynamic library load failures as notes on that diagnostic instead of errors of their own. This choice seems appropriate since for cases like NVRTC it is not the case that each failed library load is a compilation error. We only need one of the listed libraries to be loadable, so that reporting them all as errors risks confusing users. One wrinkle that arose during testing is that the 11.0 release of NVRTC dropped support for the `compute_30` target, which had previously been the minimum and default. I had to add logic to check for versions of 11 or greater and switch to `compute_35` as the default. Similar changes may be required as part of supporting newer NVRTC versions if support for more architectures gets deprecated and removed. A more complete implementation of this logic might try to load multiple NVRTC versions such that the Slang compiler can identify a suitable compiler based on the minimum feature level that code actually requires. That kind of cleanup is left as future work, since for most users the current approach will be sufficient. * testing: use verbose mode for running tests by default * fixup: guard against null diagnostic sink
2020-09-14	Support shader parameters that are an array of existential type. (#1542)	Yong He
	* Support shader parameters that are an array of existential type. * Rename to getFirstNonExistentialValueCategory Co-authored-by: Yong He <yhe@nvidia.com>