slang.git - Making it easier to work with shaders

Age	Commit message (Collapse)	Author
2020-11-18	Test for serializing out and reading back Stdlib (#1605)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Mangling/module name extraction for GenericDecl * Add comment on SerialFilter to explain re-enabling Stmt. * Support setting up SyntaxDecl when reconstructed after deserialization. * Improvements to setup SyntaxDecl. * Fix typo so can read compressed SourceLocs. * Fix issue with SourceManger. * Simple test for serializing out stdlib and reading back in. * Fix calling convention. * Add override to StdLib impls. * Fix typo. * Apply testing to an actual compute test when using load-stdlib Make -load/compile-stdlib processable by Slang Move out testing into util into TestToolUtil so can be shared. * Slightly more concise setup of session. * Fix some errors introduced with session handling. * Made setup for compile same across slangc and slangc-tool.
2020-11-11	Include hierarchy output (#1595)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Improve diagnostic for token pasting. * Token paste location test. * Output include hierarchy. * WIP on includes hierarchy. * Improved include hierarchy output - to handle source files without tokens. Improved test case. * Small comment improvements. Fixed a typo with not returning a reference. * Slight simplification of the ViewInitiatingHierarchy, by adding GetOrAddValue to Dictionary. * Remove the need for ViewInitiatingHierarchy type. * Improve output of path in diagnostic for includes hierarchy. * Remove comment in diagnostic for token-paste-location.slang * Update command line docs to include `-output-includes` Co-authored-by: Yong He <yonghe@outlook.com>
2020-11-10	Use integer RTTI/witness handles in existential tuples. (#1598)	Yong He
	* Use integer RTTI/witness handles in existential tuples. * Fix clang error. * Fix IR serialization to use 16bits for opcode. * Undo accidental comment change. * Use variable length encoding for opcode. * Fix compile error. * Fixing issues * Fix code review issues.
2020-11-05	Standard library save/loadable (#1592)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Fix handling of access modifiers inside type definition. * Fix access problem for AST node. Make dumping produce a single function with switch, to potentially make available without Dump specific access. * WIP on serialization design doc. * Remove project references to previously generated files. * More docs on serialization design. * Improve serialization documentation. Remove unused function from IRSerialReader. * Small fixes around naming. Remove long comment from slang-serialize.h - as covered in serialization.md * Remove long comment in slang-serialize.h as covered in serialization.md * More information about doing replacements on read for AST and problems surrounding. * Typo fix. * Spelling fixes. * Value serialize. * Value types with inheritence. * Use value reflection serial conversion for more AST types * Use automatic serialization on more of AST. * Get the types via decltype, simplifies what the extractor has to do. * Update the serialization.md for the value serialization. * Small doc improvements. * Update project. * Remove ImportExternalDecl type Added addImportSymbol and ImportSymbol type Fixed bug in container which meant it wouldn't read back AST module * Because of change of how imports and handled, store objects as SerialPointers. * First pass symbol lookup from mangled names. * Cache current module looked up from mangled name. * Fix SourceLoc bug. Improve comments. * Added diagnostic on mangled symbol not being found * Fix typo. * WIP serializing stdlib. * WIP serializing stdlib in. * Fix problem serializing arrays that hold data that is already serialized. * Remove clash of names in MagicTypeModifier. * Make conversion from char to String explicit. Fix reference count issue with SerialReader. * Add code to save/load stdlib. * Use return code to avoid warning - SerialContainerUtil::write(module, options, &stream)) * Make all String numeric ctors explicit. Added isChar to UnownedStringSlice. Added operator== for UnownedStringSlice to String to avoid need to convert to String and allocate. * Add error check to readAllText. * tabs -> spaces on String.h * tab -> spaces String.cpp * Remove msg for StringBuilder, just build inplace for exceptions. * Check SerialClasses - for name clashes. Renamed Modifier::name as Modifier::keywordName * Handling of extensions when deserializing AST - updating the moduleDecl->mapTypeToCandidateExtensions Co-authored-by: Tim Foley <tim.foley.is@gmail.com>
2020-10-26	Value type serialization via C++ Extractor (#1588)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Fix handling of access modifiers inside type definition. * Fix access problem for AST node. Make dumping produce a single function with switch, to potentially make available without Dump specific access. * WIP on serialization design doc. * Remove project references to previously generated files. * More docs on serialization design. * Improve serialization documentation. Remove unused function from IRSerialReader. * Small fixes around naming. Remove long comment from slang-serialize.h - as covered in serialization.md * Remove long comment in slang-serialize.h as covered in serialization.md * More information about doing replacements on read for AST and problems surrounding. * Typo fix. * Spelling fixes. * Value serialize. * Value types with inheritence. * Use value reflection serial conversion for more AST types * Use automatic serialization on more of AST. * Get the types via decltype, simplifies what the extractor has to do. * Update the serialization.md for the value serialization. * Small doc improvements. * Update project.
2020-10-23	C++ extractor fix for access modifiers (#1586)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Fix handling of access modifiers inside type definition. * Fix access problem for AST node. Make dumping produce a single function with switch, to potentially make available without Dump specific access. * Remove project references to previously generated files.
2020-10-22	Single pass C++ extraction (#1583)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Added CharUtil. Added TypeSet to extractor. First pass at being able to specify all headers for multiple output headers. * Fix includes for new C++ extractor convension. Update premake5 to use new extractor mechanisms. * Small improvements around StringUtil. * Split out NameConventionUtil. * Use a 'convert' to convert between convention types. * Fix output of build message for C++ extractor. Improve NameConventionUtil interface. * Improve comments. * Fix warning on gcc. * Fix clang warning. * Fix some typos in NameConventionUtil. * Small fix to premake5.lua * Fix generated includes. * Remove m_reflectType as no longer applicable with TypeSet. * Fix .gitignore for slang-generated-* files. Added getConvention to determine convention from slice. Add versions of split and convert that infer the from convention * Fix typo in spliting camel. * LineWhitespace -> HorizontalWhitespace * Improve CharUtil comments.
2020-10-15	Fix Vk leak (#1579)	jsmall-nvidia
	* #include an absolute path didn't work - because paths were taken to always be relative. * Handle scope of VkShaderModule. * Fix tabbing issue.
2020-10-09	Support CUDA bindless texture in dynamic dispatch code. (#1575)	Yong He

2020-10-06	Use Reflection for (Serial)RefObject Serialization (#1567)	jsmall-nvidia
	* First pass at generalizing serializer. * Split out ReflectClassInfo * Use the general ReflectClassInfo * Fix some typos in debug generalized serialization. * Add calculation of classIds. Make distinct addCopy/add on SerialClasses. * Write up of more generalized serialization * WIP to transition from ASTSerialReader/Writer etc to generalized SerialReader/Writer and associated types. * Improvements to SerialExtraObjects. Keep RefObjects in scope in factory * Compiles with Serial refactor - doesn't quite work yet. * First pass serialization appears to work with refector. * Split out type info for general slang types. * Split out slang-serialize-misc-type-info.h * DebugSerialData -> SerialSourecLocData DebugSerialReader -> SerialSourceLocReader DebugSerialWriter -> SerialSourceLocWriter * Remove unused template that only compiles on VS. * Fix warning around unused function on non-VS. * Improve output of type names that are in scopes in C++ extractor. Update premake5.lua to run generation for RefObject derived types. * C++ extractor working on RefObject type. * Split out serialization functionality that spans different types into slang-serialization-factory.cpp/.h Put AST type info into header. Removed RefObjectSerialSubType - use RefObjectType Add filtering for RefObject derived types Remove construction and filteringhacks. * Set up field serialization for SerialRefObject derived types. * Fix template problem compiling on Clang/Gcc * Work in progress to make Value types work. * Added slang-value-reflect.cpp
2020-10-02	Use new vulkan debug layer. (#1566)	Yong He
	* Use new vulkan debug layer. * Try use VK_LAYER_KHRONOS_validation when it exists. Co-authored-by: Tim Foley <tim.foley.is@gmail.com>
2020-09-24	Enable default cpp prelude. (#1560)	Yong He
	* Enable default cpp prelude. * Print the "#include" line as a normal source if the file does not exist. * Bug fix * Fix. * Fix c++ prelude header. * Remove unnecessary fopen call.
2020-09-17	Front-load cuda module loading to fill in RTTI pointers. (#1548)	Yong He
	Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-09-17	Embed default prelude for CUDA (#1546)	Tim Foley
	* Embed default prelude for CUDA Slang supports the notion of a "prelude" that gets prepended to the source code we generate in language. For some targets, a prelude is not necessary (e.g., we compile to HLSL/GLSL and then on to DXBC/DXIL/SPIR-V just fine without a prelude), but some targets have been implemented in a way that makes a prelude necessary (notably CPU and CUDA). For the targets that require a prelude, the Slang codebase includes usable preludes under the `prelude/` directory. Prior to this change, if a user was compiling for such a target (whether via command-line or API), there had to take responsibility for specifying the prelude to use (usually by passing in the contents of the prelude file(s) already included in the Slang distribution). It is reasonable for a user to expect an out-of-the-box experience where compilation to CUDA PTX or native CPU code should Just Work, similarly to how compilation to SPIR-V Just Works. This change is a step in the direction of providing a user experiene that Just Works for common cases. The main addition here is a tool called `slang-embed` that we run during our build to turn the `prelude/.h` files into `prelude/.h.cpp` files that embed the contents of the original `.h` file as a `const` variable. By compiling and linking in the generated `.h.cpp` file for the CUDA prelude, we are then able to set the default prelude to use for CUDA at the time a session/linkage is created. That default prelude will be used unless the user manually specifies their own prelude (which current users of the CUDA back-end must be doing). This change only sets up a default prelude for CUDA because of the way that the CPU prelude is split across multiple files. A strategy that provides a good default prelude for CPU may take more work, but that work might also be unnecessary if we switch to a strategy of using LLVM to generate native code. The implementation of the `slang-embed` tool is intentionally simple, and it will likely run into issues if/when we need to embed binary files or larger text files. The assumption being made here is that we can address those issues when they arise, and there is no reason to over-engineer the tool right now. The way that `slang-embed` is integrated into our build process is likely to require some iteration to make sure that it works across all platforms. I expect that this change will have multiple follow-up fixes related to trying to get the build to work as expected across all targets on CI. * fixup: trying to ensure that embedded prelude gets compiled into slang * fixup: properly clean up allocations in slang-embed * fixup: fix double free introduced by previous change * fixup: off-by-one allocation error
2020-09-14	Support shader parameters that are an array of existential type. (#1542)	Yong He
	* Support shader parameters that are an array of existential type. * Rename to getFirstNonExistentialValueCategory Co-authored-by: Yong He <yhe@nvidia.com>
2020-09-02	[slang-cpp-extractor] Don't modify generated source if there is no change. ↵	Yong He
	(#1530)
2020-09-02	Allow unspecialized existential shader parameters (dynamic dispatch). (#1529)	Yong He
	* Allow unspecialized existential shader parameters (dynamic dispatch). * Fixes. * Fixes * disable cuda test
2020-09-01	Support dynamic existential shader parameters in render-test (#1525)	Yong He
	* Support dynamic existential shader parameters in render-test * Fix linux build error. * Fixes. * Fix code review issues. * Fix gcc error. * More fixes. * More fixes.
2020-08-24	NVAPI improvements (#1512)	jsmall-nvidia
	* First pass at incorporating nvapi into test harness. * D3d12 Atomic Float Add via NVAPI working * Dx12 atomic float appears to work. * Atomic float add on Dx12. * Added atomic64 feature addition to vk. Fix correct output for atomic-float-byte-address.slang * Disable atomic float failing tests. * Upgraded VK headers. * Detect atomic float availability on VK. * Try to get test working for in64 atomic. * Made HLSL prelude controlled via the render-test requirements. * Added -enable-nvapi to premake. * Fix D3D12Renderer when NVAPI is not available. * Small improvements to VKRenderer. * Improve atomic documentation in target-compatibility.md. * Fixed NVAPI working on D3D12. * Test for specific NVAPI features. * Remove requiredFeatures from Renderer::Desc as was ignored. Tried to document more around nvapiExtnSlot. * Readded requiredFeatures to Renderer::Desc * Improve comments in the tests.
2020-08-21	Vulkan update/NVAPI support (#1511)	jsmall-nvidia
	* First pass at incorporating nvapi into test harness. * D3d12 Atomic Float Add via NVAPI working * Dx12 atomic float appears to work. * Atomic float add on Dx12. * Added atomic64 feature addition to vk. Fix correct output for atomic-float-byte-address.slang * Disable atomic float failing tests. * Upgraded VK headers. * Detect atomic float availability on VK. * Try to get test working for in64 atomic. * Made HLSL prelude controlled via the render-test requirements. * Added -enable-nvapi to premake. * Fix D3D12Renderer when NVAPI is not available. * Small improvements to VKRenderer. * Improve atomic documentation in target-compatibility.md.
2020-08-21	Fix stdlib declarations for texture Gather() (#1510)	Tim Foley
	Fixes #1507 These operations were failing to take into account the way that array textures require an extra coordinate to be passed in for the primary location (but not the additional offsets). Adding `isArray` to the component count is the existing solution used for similar intrinsics elsewhere in the stdlib, and it is adopted here. Because our test framework isn't really set up to do a lot of texture testing (including having no support for texture arrays), the test added here is just a cross-compilation test that compares output with fxc for comparable input.
2020-08-19	Remove IncludeHandler. (#1505)	jsmall-nvidia
	nvAPI -> NVAPI nvAPIPath -> nvapiPath DxcIncludeHandler don't reference count. nv-api-path -> nvapi-path Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-08-18	Support for float atomics on RWByteAddressBuffer (#1502)	jsmall-nvidia
	* Fix premake5.lua so it uses the new path needed for OpenCLDebugInfo100.h * Keep including the includes directory. * Added the spirv-tools-generated files. * We don't need to include the spirv/unified1 path because the files needed are actually in the spirv-tools-generated folder. * Put the build_info.h glslang generated files in external/glslang-generated. Alter premake5.lua to pick up that header. * First pass at documenting how to build glslang and spirv-tools. * Improved glsl/spir-v tools README.md * Added revision.h * Change how gResources is calculated. Update about revision.h * Update docs a little. * Split out spirv-tools into a separate project for building glslang. This was not necessary on linux, but is necessary on windows, because there is a file disassemble.cpp in spirv-tools and in glslang, and this leads to VS choosing only one. With the separate library, the problem is resolved. * Fix direct-spirv-emit output. * Update to latest version of spirv headers and spirv-tools. * Upgrade submodule version of glslang in external. * Add fPIC to build options of slang-spirv-tools * WIP adding support for InterlockedAddFp32 * Upgrade slang-binaries to have new glslang. * Fix issues with Windows slang-glslang binaries, via update of slang-binaries used. * WIP - atomicAdd. This solution can't work as we can't do (float) in glsl. WIP on atomic float ops. * Added checking for multiple decls that takes into account __target_intrinsic and __specialized_for_target. First pass impl of atomic add on float for glsl. * Split __atomicAdd so extensions are applied appropriately. * Made Dxc/Fxc support includes. Use HLSL prelude to pass the path to nvapi Added -nv-api-path * Refactor around IncludeHandler and impl of IncludeSystem * slang-include-handler -> slang-include-system Have IncludeHandler/Impl defined in slang-preprocessor * Small comment improvements. * Document atomic float add addition in target-compatibility.md. * CUDA float atomic support on RWByteAddressBuffer. * Add atomic-float-byte-address-buffer-cross.slang * Removed inappropriate-once.slang - the test is no longer valid when a file is loaded and has a unique identity by default. A test could be made, but would require an API call to create the file (so no unique id). Improved handling of loadFile - uses uniqueId if has one. * Work around for testing target overlaps - to avoid exceptions on adding targets. Simplify PathInfo setup. Modify single-target-intrinsic.slang - it no longer failed because there were no longer multiple definitions for the same target. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-08-05	Refactor enumerating directory contents (#1478)	jsmall-nvidia
	* Use m_style for OSFindFilesResult * Refactor of FindFilesResult. * Fixes on linux for FindFiles. * Simplify FindFilesState, and linux support for pattern matching. * Small fixes to linux FindFilesState * Fix typo on linux FindFiles * Fix typo in linux FindFiles. * Renamed some variables, and improved comments on FindFiles. * Improve comments on FildFiles * Small improvements around FindFiles. * Refactor FindFiles again.. into a visitor and function in Path. * Fix some problems on linux. * Fix linux typo. * Renamed os -> find-file-util * find-file-utl -> directory-util Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-08-05	Change the policy for entry-point uniform parameters on Vulkan (#1476)	Tim Foley
	Entry point `uniform` parameters were a feature of the original Cg and HLSL, but have not been used much in production shader code. One of our goals on Slang is to reduce the (ab)use of the global scope, so bringing entry point `uniform` parameters up to a greater level of usability is an important goal. Some policy choices about how global vs. entry-point `uniform` parameters behave have already been made, that shape decisions looking forward: * For DXBC/DXIL, it makes the most sense to follow the lead of fxc/dxc, by treating entry point `uniform` parameters as a kind of syntax sugar for global shader parameters. Any parameters of "ordinary" types are bundles up into an implicit constant buffer, and all the resources (including the implicit constant buffer) are assigned `register`s just as for globals. It is up to the application to decide how to bind those parameters via a root signature (using root descriptors, root constants, descriptor tables, local vs. global root signature, etc.) * For CPU, it makes sense to pass global vs. entry-point parameters as two different pointers, although the details of what we do for CPU are the least constrained across all current targets. * For CUDA compute, it makes the most sense to map global shader parameters to `__constant__` global data, and entry-point `uniform` parameters to kernel parameters. This choice ensures that the signature of a kernel when translated from Slang->CUDA follows the Principle of Least Surprise, at the cost of making entry-point vs. global parameters be passed via different mechanisms. * For OptiX ray tracing, it makes sense to expand on the precedent from CUDA compute: pass global parameters via global `__constant__` data (as is already expected by OptiX for whole-launch parameters), and pass entry-point `uniform` parameters via the "shader record." This establishes a precedent that for ray-tracing shaders, global-scope parameters map to the "global root signature" concept from DXR, while entry-point `uniform` parameters map to a "local root signature" or "shader record." * For Vulkan ray tracing, the precedent from OptiX then argues that entry-point `uniform` parameters should map to the Vulkan "shader record" concept (and thus cannot support things like resource types). * The remaining interesting case is what to do for non-ray-tracing shaders on Vulkan. The dev team agrees that the most reasonable choice to make for non-ray-tracing Vulkan shaders is to map entry-point `uniform` parameters to "push constants." In particular, this makes it easy to express the case of a compute kernel with direct parameters of ordinary/value types in the way that will be implemented most efficiently. The big picture is then that a kernel like: ```hlsl void computeMain(uniform float someValue) { ... } ``` will map to output GLSL like: ```glsl layout(push_constant) uniform { float someValue; } U; void main() { ... } ``` If the user really wanted a constant-buffer binding to be created instead, they can easily change their input to make the buffer explicit: ```hlsl struct Params { float someValue; } void computeMain(uniform ConstantBuffer<Params> params) { ... } ``` (Forcing the user to be explicit about the desire for a buffer here creates a nice symmetry between Vulkan and CUDA; in the first case the user sets up the data in host memory and passes it to the GPU by copy, while in the second case the user must allocate and set up a device-memory buffer for the data. This symmetry extends to D3D if the application chooses to map entry-point `uniform` parameters to root constants.) This change implements logic in the "parameter binding" part of the Slang compiler to make sure that entry-point `uniform` parameters are wrapped up in a push-constant buffer rather than an ordinary constant buffer for non-ray-tracing shaders on Vulkan (and in a shader record "buffer" for the ray-tracing case). The majority of the actual work was in adding support for root/push constants to the test framework and the graphics API abstraction it uses. To be clear about that support: * Root constant ranges are (perhaps confusingly) treated as a new kind of "slot" that can appear on a descriptor set. This choice ensures that the implicit numbering of registers/spaces used by the back-ends can account for these ranges correctly. * The `TEST_INPUT` lines are extended to allow a `root_constants` case that behaves more or less like `cbuffer` * The CPU and CUDA paths can treat a `root_constants` input identically to a `cbuffer`. They already allocate the actual buffers based on reflection, and just use `cbuffer` as a directive that causes bytes to be copied in. * On D3D12 and Vulkan, a descriptor set allocates a `List<char>` to hold the bytes of root constant data assigned into it, and these bytes are flushed to the command list when the table is actually bound (usually right before rendering). * On D3D11, a descriptor set treats a root constant range more or less like a constant buffer range (with a single buffer), except that it also automatically allocates a buffer to hold the data. Assigning "root constant" data automatically copies it into that buffer. The small number of tests that used entry-point `uniform` parameters of ordinary types were updated to use the new `root_constant` input type, and the bugs that surfaced were fixed. A new test to confirm that entry-point `uniform` parameters map to the shader record for VK ray tracing was added. An important but technically unrelated change is the removal of the `DescriptorSetImpl::Binding` type and related function from the Vulkan implementation of `Renderer`. That type was created to ensure that objects that are bound into a descriptor set don't get released while the descriptor set is still alive, but the implementation relied on a complicated linear search to check for existing bindings, which could create a performance issue for descriptor sets that include large arrays of descriptors. The new implementation makes use of the approach already present in the various `Renderer` implementations (including the Vulkan one) for assigning ranges in a descriptor set a flat/linear index for where their pertinent data is to be bound. As a result, the Vulkan `DescriptorSetImpl` now uses a single flat array of `RefPtr`s to track bound objects, and has no need for linear search when binding. Co-authored-by: Yong He <yonghe@outlook.com>
2020-08-04	Fix leaks in slang-generate (#1472)	Yong He
	Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-08-03	First pass support for Sampler Feedback (#1470)	jsmall-nvidia
	* Add the Feedback texture types. Depreciate SLANG_RESOURCE_EXT_SHAPE_MASK. * Starting point to test sampler feedback. * WIP on FeedbackSampler. * Use __target_intrinsic to override the output of sampler feedback types. * Use newer generic syntax for FeedbackTexture. * Reflects Feedback type. * SLANG_TYPE_KIND_TEXTURE_FEEDBACK -> SLANG_TYPE_KIND_FEEDBACK * Added reflection test. * Reneable issue with generics in sampler-feedback-basic.slang * Add methods to FeedbackTexture2D/Array. Make test cover test cases. * Sampler feedback produces DXC code. * Disabled Sampler feedback test - as requires newer version of DXC. * Fix bug in reflection tool output. * Fix problem with direct-spirv-emit.slang.expected due to update to glslang. * Fix direct-spirv-emit.slang * Use SLANG_RESOURCE_EXT_SHAPE_MASK again * Make Feedback be emitted as a textue type prefix. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-07-28	Change parameter passing convention for CUDA (#1463)	Tim Foley
	The Big Picture =============== Given input Slang code like: ```hlsl Texture2D gA; [shader("compute")] void kernelFunc(uniform Texture2D b, uint3 tid : SV_DispatchThreadID) { ... } ``` the existing CUDA code generation strategy would always generate a kernel with a signature like: ```c++ struct GlobalParams { Texture2D gA; } struct EntryPointParams { Texture2D b; } extern "C" __global__ void kernelFunc(EntryPointParams* entryPointParams, GlobalParams* globalParams) { ... } ``` This choice was consistent with the conventions of the CPU kernel target, and shares the advantage that it is easy for the user to data-drive the logic for filling in parameters and then invoking a kernel. However, the approach outlined above has two serious problems when used for CUDA kernels: * First, it defies the programmer's expectation about what an "equivalent" CUDA kernel signature would be, which makes it awkward for a developer to invoke this kernel from CUDA C++ host code (especially in the context of an app that might also run hand-written CUDA kernels). * Second, the performance of this approach suffers because every access to a global or entry point parameter turns into a load from global memory. In contrast, a typical hand-written CUDA kernel passes its parameters via an implementation-specific path that (for current CUDA platforms) seems to be equivalent to `__constant__` memory in performance. This change alters the convention so that the Slang compiler takes the code from the top of this message and translates it into something like: ```c++ struct GlobalParams { Texture2D gA; } __constant__ GlobalParams SLANG_globalParams; extern "C" __global__ void kernelFunc( Texture2D b ) { ... } ``` This translation alleviates both problems with the current translation: * The signature of the generated CUDA kernel function is as close to that of the original as is possible (we had to eliminate the `SV_`-semantic varying inputs), and should directly match what the programmer would expect in common cases. Entry-point parameters are passed via CUDA kernel parameters, and should thus match in performance. Global parameters are passed via a variable in `__constant__` memory, and thus should also perform as well as possible/expected. Detailed Changes ================ * Disable the `collectEntryPointUniformParams` pass for CUDA, so that entry-point `uniform` parameters are not bundles into a single `struct` and/or `ConstantBuffer`. * When targeting CUDA, disable the logic for generating an entry-point parameter for passing in the global shader parameter(s) * Allow `CLikeSourceEmitter` subclasses to override the name generated for entry-point symbols, and use this to add the required prefix for each OptiX kernel type when translating a ray-tracing kernel. * Add logic to emit "parameter groups" in a specialized way for CUDA (this is the same approach that allows us to generate `cbufffer { ... }` declarations for fxc). A global-scope parameter group will turn into a global `__constant__` variable called `SLANG_globalParams` (that name becomes part of the ABI for Slang-compiled shaders). * Update the logic in `render-test` for loading and invoking CUDA kernels to handle the new policy. The last bullet there merits expansion, since it is indicative of the work a client using Slang would have to go through to use our generated kernels with the new policy: * When loading a CUDA module with one or more kernels, we also use `cuModuleGetGlobal` to query the address of the `SLANG_globalParams` symbol in that CUDA module. That pointer needs to be used when setting global parameter values to be used by kernels in that CUDA odule. * Because our existing `BindPoint` logic for CUDA always sets up parameter data in GPU memory, we end up having to copy the entry-point parameter data from GPU memory to host memory. This step would ideally be skipped in a codebase that understands the correct policy, but it is a bit unfortunate that it is no longer trivially correct for an application to store all parameter data in GPU memory. * Before invoking the kernel, we need to use a `cudaMemcpyAsync` to copy from the prepared GPU memory for global parameters over to the `SLANG_globalParams` symbol associated with the kernel to be invoked. Because this operations is issued on the same CUDA stream as the kernel call, it is guaranteed to not overlap with GPU kernel execution. * When invoking the kernel, we take advantage of the seldom-used `CU_LAUNCH_PARAM_BUFFER_POINTER` facility to specify a contiguous memory region with all the entry-point parameters in it instead of passing each entry-point parameter separately. Given Slang reflection it is also possible to query the offset of each entry-point parameter in the buffer, so we could invoke the kernel in the traditional fashion as well. The choice here is up to the application. Caveats ======= * This is a breaking change, and any subsequent release will need to reflect that fact. Any customers who rely on Slang's current CUDA codegen strategy are likely to be surprised by this change, and I don't see an easy way to give them a more gentle transition. * This change does not remove the logic that introduces a `KernelContext` type for code that requires it. That means that things like `static` global variables can continue to work on CUDA for now, but we know that those are not going to be something we can support in the long-term with separate compilation. * While the policy implemented in this change is a reasonable default, it is still not going to perfectly match expecations for some developers. In particular, some developers who are familiar with both D3D and CUDA will likely wonder why a global `cbuffer` in Slang translates to a global-memory pointer in the output CUDA instead of one global `__constant__` variable per `cbuffer`. A more detailed alternate translation would generate a distinct global `__constant__` variable for each top-level constant buffer or parameter block. We may need to refine the translation even more based on feedback from users who care about how we handle global-scope parameters. * Recent changes in Slang have broken the logic that handles the OptiX "shader record" as an alternative mechanism for passing entry-point parameters. In order to get any level of OptiX support up and running we will have to change the IR passes that run on CUDA kernels to actually run the "collection" of `uniform` parameters for ray tracing stages, and then to replace references to the resulting parameter with a call to the function to access the shader record. * The use of `SLANG_globalParams` here works well enough in the case of whole-program compilation; every `CUmodule` ends up with (zero or) one parameter with this name, and an application can just hard-code it. As a mechanism it wouldn't work in the presence of separately-compiled modules that might introduce their own global parameters (including cases like constant lookup tables that really want to be at the global scope). An alternative approach would have Slang generate output PTX for each module, where a module has an optional global symbol for its own global-scope parameters (with a mangled name that is based on the module name), and then a linked CUDA binary has all of those distinct symbols. Such an approach would be compatible with module-at-a-time reflection and parameter binding, but would lead to another breaking change down the line for code that switches to `SLANG_globalParams`.
2020-07-24	Test frame work improvements (#1452)	jsmall-nvidia
	* Add -hide-ignored Made API filter when enbled filter out non API tests. * Add ability to set categories at file level. Added wave, wave-mask and wave-active categories. * Added -api-only flag. * Don't synthesize tests from only CPU tests. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-07-23	Fix for vulkan tests failing (#1456)	jsmall-nvidia
	* Clean up device when VKRenderer dtor is run. Added destroy methods to VulkanSwapChain & VulkanDeviceQueue * Small fixes around testing if DeviceQueue is valid. * Disable active-mask tests. Different drivers appear to change the results.
2020-07-16	Running generators as separate premake project step (#1441)	jsmall-nvidia
	* Put the running of generators into a separate project, to try and sure the generated products are available for other dependencies when compiling with multiple threads on linux. * Made paths Strings in slang-generate. Made paths use / for path separators (rather than \ on windows which causes some problems with #line). * Make the run-generators proj a utility step. * Made run-generators a StaticLib. * Fix problem with generating when not necessary. * Trying to get abspath to work on linux. * Add run-generator-main.cpp dummy file. * Add comment about the issues around linux and correct build triggering. * Add updated projects. * Remove the run-generators-main.cpp as no longer needed for 'run-generators' tool. Removed the adding of files by default from baseSlangProject Made the run generators project use slang-string.cpp as the file it builds from core. * Add the run-generators VS project.
2020-07-10	CUDA/CPU varying compute inputs as IR pass (#1438)	Tim Foley
	The main change here is that the CPU and CUDA C++ emit paths now rely on an earlier IR pass to legalize the varying parameter list of a kernel and translate references to varying parameters with semantics like `SV_DispatchThreadID`. Doing so removes a lot of special-case logic from the emit passes. This work moves us even closer to being able to eliminate `KernelContext` from the CPU/CUDA emit logic, because it removes the issue of state related to varying inputs being stored in `KernelContext`. The new pass that handles the legalization is in `slang-ir-legalize-varying-params.cpp`, and it borrows heavily from the existing `slang-ir-glsl-legalize.cpp` pass. The new pass factors out the target-independent and target-dependent logic, so that both CPU and CUDA can share much of the same code despite having very different rules for how the system-value parameters are being provided. An eventual goal is to have the new pass also handle the GLSL case, but doing so requires copying even more logic out of the GLSL-specific pass, and doing so seemed like a step to far for what was meant to be a stepping-stone change as part of other work. As a result of the incomplete nature of the pass, certain cases don't work for compute shader inputs for CPU/CUDA (e.g., wrapping your varying inputs in a `struct` type parameter), but those were cases that also didn't work in the existing `emit`-based logic. One major consequence of this change is that the logic for emitting the various different functions that represent an entry point for our CPU back-end has been streamlined and simplified. The original logic had a fair bit of cleverness built in to try and avoid unnecessary math ops when computing the various IDs/indices, while the new logic is much more simplistic (the main dispatch function loops over threadgroups with a triply-nested `for` and then delegates to the group-level function loops over threads with its own nested `for`s). Longer term, it will be important to simplify the CPU functions we emit further, by eliminating things like the `_Thread` function that should never really be exposed to users (the minimum granularity of invoking a CPU compute kernel should be a single threadgroup). We may eventually decide to synthesize all of the extra code that is being generated in the `emit` pass as IR instead.
2020-07-06	ShortList<T> and core.natvis improvements. (#1430)	Yong He
	* ShortList<T> and core.natvis improvements. * Fix gcc build. * add `getBuffer()` accessor to `GetArrayViewResult`
2020-07-02	Bug fix in C++ extractor (#1429)	jsmall-nvidia
	* Fix bug from change in diagnostics. * Catch exceptions and display a message on problem with C++ extractor.
2020-07-02	Only call m_api functions if m_api has been validly set on dtor of ↵	jsmall-nvidia
	VulkanDeviceQueue. (#1426)
2020-07-01	Ignore tests that don't have all the rendering APIs they require available. ↵	jsmall-nvidia
	(#1419)
2020-06-24	Heterogeneous example (#1399)	Dietrich Geisler
	* Introduced heterogeneous example. Example includes C++ source and header files, and does not currently make use of the associated slang file when building. The intent of this commit is to introduce the example as a baseline for later updates as the heterogeneous model is expanded. * Changing namespace * Renamed and rewrote README * Updated example to account for compiler updates * Updated path Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-06-18	Prelude is associated with SourceLanguage (#1398)	jsmall-nvidia
	* Associate a downstream compiler for prelude lookup even if output is source. * Remove LanguageStyle and just use SourceLanguage instread. * Added set/getPrelude. Made prelude work on source language. * Fix typo in method name replacement. get/SetPrelude get/setLanguagePrelude * Fix issue because of method name change. * Remove getPreludeDownstreamCompilerForTarget
2020-06-18	Improvements around C++ code generation (#1396)	jsmall-nvidia
	* * Remove UniformState and UniformEntryPointParams types * Put all output C++ source in an anonymous namespace * If SLANG_PRELUDE_NAMESPACE is set, make what it defines available in generated file. * Fix signature issue in performance-profile.slang * Context -> KernelContext to avoid ambiguity. * Fix issues around dynamic dispatch and anonymous namespace. * Fix typo.
2020-06-17	Generate dynamic C++ code for the minimal test case. (#1391)	Yong He
	* Add IR pass to lower generics into ordinary functions. * Fix project files * Emit dynamic C++ code for simple generics and witness tables. Fixes #1386. * Remove -dump-ir flag. * Fixups.
2020-06-11	Fix problem with C++ extractor ernoneous concating of type tokens (#1382)	jsmall-nvidia
	* Try to fix problem with C++ extractor concating tokens producing an erroneous result. * Improve naming/comments around C++ extractor fix. * Another small improvement around space concating when outputing token list. * Handle some more special cases for consecutive tokens for C++ extractor concat of tokens.
2020-06-05	Fix FindTypeByName reflection API not finding stdlib types.	Yong He

2020-05-28	Small improvements to documentation and code around DiagnosticSink (#1359)	jsmall-nvidia

2020-05-26	Synthesize "active mask" for CUDA (#1352)	Tim Foley
	* Synthesize "active mask" for CUDA The Big Picture =============== The most important change here is to `hlsl.meta.slang`, where the declaration of `WaveGetActiveMask()` is changed so that instead of mapping to `__activemask()` on CUDA (which is semantically incorrect) it maps to a dedicated IR instruction. The other `WaveActive()` intrinsics that make use of the implicit "active mask" concept had already been changed in #1336 so that they explicitly translate to call the equivalent `WaveMask()` intrinsic with the result of `WaveGetActiveMask()`. As a result, all of the `WaveActive()` functions are now no different from a user-defined function that uses `WaveGetActiveMask()`. The bulk of the work in this change goes into an IR pass to replace the new instruction for getting the active mask gets replaced with appropriately computed values before we generate output CUDA code. That work is in `slang-ir-synthesize-active-mask.{h,cpp}`. Utilities ========= There are a few pieces of code that were helpful in writing the main pass but that can be explained separately: IR instructions were added corresponding to the Slang `WaveMaskBallot()` and `WaveMaskMatch()` functions, which map to the CUDA `__ballot_sync()` and `__match_any_sync()` operations, respectively. These are only implemented for the CUDA target because they are only being generated as part of our CUDA-only pass. * The `IRDominatorTree` type was updated to make it a bit more robust in the presence of unreachable blocks in the CFG. It is possible that the same ends could be achieved more efficiently by folding the corner cases into the main logic, but I went ahead and made things very explicit for now. * I added an `IREdge` utility type to better encapsulate the way that certain code operating on the predecessors/successors of an `IRBlock` were using an `IRUse` to represent a control-flow edge. The `IREdge` type makes the logic of those operations more explicit. A future change should proably change it so that `IRBlock::getPredecessors()` and `getSuccessors()` are instead `getIncomingEdges()` and `getOutgoingEdges()` and work as iterators over `IREdge` values, given the way that the predecessor and successor lists today can contain duplicates. Using the above `IREdge` type, the logic for detecting and break critical edges was broken down into something that is a bit more clear (I hope), and that also factors out the breaking of an edge (by inserting a block along it) into a reusable subroutine. The Main Pass ============= The implementation of the new pass is in `slang-ir-synthesize-active-mask.cpp`, and that file attempts to include enough comments to make the logic clear. A brief summary for the benefit of the commit history: * The first order of business is to identify functions that need to have the active mask value piped into them, and to add an additional parameter to them so that the active mask is passed down explicitly. Call sites are adjusted to pass down the active mask which can then result in new functions being identified as needing the active mask. * The next challenge is for a function that uses the active mask, to compute the active mask value to use in each basic block. The entry block can easily use the active mask value that was passed in, while other blocks need more work. * When doing a conditional branch, we can compute the new mask for the block we branch to as a function of the existing mask and the branch condition. E.g., the value `WaveMaskBallot(existingMask, condition)` can be used as the mask for the "then" block of an `if` statement. * When control flow paths need to "reconverge" at a point after a structured control-flow statement, we need to insert logic to synchronize and re-build the mask that will execute after the statement, while also excluding any lanes/threads that exited the statement in other ways (e.g., an early `return` from the function). The explanation here is fairly hand-wavy, but the actual pass uses much more crisp definitions, so the code itself should be inspected if you care about the details. Tests ===== The tests for the new feature are all under `tests/hlsl-intrinsic/active-mask/`. Most of them stress a single control-flow construct (`if`, `switch`, or loop) and write out the value of `WaveGetActiveMask()` at various points in the code. In practice, our definition of the active mask doesn't always agree with what D3D/Vulkan implementations seem to produce in practice, and as a result a certain amount of effort has gone into adding tweaks to the tests that force them to produce the expected output on existing graphics APIs. These tweaks usually amount to introducing conditional branches that aren't actually conditional in practice (the branch condition is always `true` or always `false` at runtime), in order to trick some simplistic analysis approaches that downstream compilers seem to employ. One test case currently fails on our CUDA target (`switch-trivial-fallthrough.slang`) and has been disabled. This is an expected failure, because making it produce the expected value requires a bit of detailed/careful coding that would add a lot of additional complexity to this change. It seemed better to leave that as future work. Future Work =========== * As discussed under "Tests" above, the handling of simple `switch` statements in the current pass is incomplete. * There's an entire can of worms to be dealt with around the handling of fall-through for `switch`. * The current work also doesn't handle `discard` statements, which is unimportant right now (CUDA doesn't have fragment shaders), but might matter if we decide to synthesize masks for other targets. Similar work would probably be needed if we ever have `throw` or other non-local control flow that crosses function boundaries. * An important optimization opportunity is being left on the floor in this change. When block that comes "after" a structured control-flow region (which is encoded explicitly in Slang IR and SPIR-V) post-dominates the entry block of the region, then we know that the active mask when exiting the region must be the same as the mask when entering the region, and there is no need to insert explicit code to cause "re-convergence." This should be addressed in a follow-on change once we add code to Slang for computing a post-dominator tree from a function CFG. * Related to the above, the decision-making around whether a basic block "needs" the active mask is perhaps too conservative, since it decides that any block that precedes one needing the active mask also needs it. This isn't true in cases where the active mask for a merge block can be inferred by post-dominance (as described above), so that the blocks that branch to it don't need to compute an active mask at all. * If/when we extend the CPU target to support these operations (along with SIMD code generation, I assume), we will also need to synthesize an active mask on that platform, but the approach taken here (which pretty much relies on support for CUDA "cooperative groups") wouldn't seem to apply in the SIMD case. * Similarly, the approach taken to computing the active mask here requires a new enough CUDA SM architecture version to support explicit cooperative groups. If we want to run on older CUDA-supporting architectures, we will need a new and potentially very different strategy. * Because the new pass here changes the signature of functions that require the active mask (and not those that don't), it creates possible problems for generating code that uses dynamic dispatch (via function pointers). In principle, we need to know at a call site whether or not the callee uses the active mask. There are multiple possible solutions to this problem, and they'd need to be worked through before we can make the implicit active mask and dynamic dispatch be mutually compatible. * Related to changing function signatures: no effort is made in this pass to clean up the IR type of the functions it modifies, so there could technically be mismatches between the IR type of a function and its actual signature. If/when this causes problems for downstream passes we probably need to do some cleanup. * fixup: backslash-escaped lines I did some "ASCII art" sorts of diagrams to explain cases in the CFG, and some of those diagrams used backslash (`\`) characters as the last character on the line, causing them to count as escaped newlines for C/C++. The gcc compiler apparently balked at those lines, since they made some of the single-line comments into multi-line comments. I solved the problem by adding a terminating column of `\|` characters at the end of each line that was part of an ASCII art diagram. * fixup: typos Co-authored-by: jsmall-nvidia <jsmall@nvidia.com>
2020-05-26	Improvements around hashing (#1355)	jsmall-nvidia
	* Fields from upper to lower case in slang-ast-decl.h * Lower camel field names in slang-ast-stmt.h * Fix fields in slang-ast-expr.h * slang-ast-type.h make fields lowerCamel. * slang-ast-base.h members functions lowerCamel. * Method names in slang-ast-type.h to lowerCamel. * GetCanonicalType -> getCanonicalType * Substitute -> substitute * Equals -> equals ToString -> toString * ParentDecl -> parentDecl Members -> members * * Make hash code types explicit * Use HashCode as return type of GetHashCode * Added conversion from double to int64_t * Split Stable from other hash functions * toHash32/64 to convert a HashCode to the other styles. GetHashCode32/64 -> getHashCode32/64 GetStableHashCode32/64 -> getStableHashCode32/64 * Other Get/Stable/HashCode32/64 fixes * GetHashCode -> getHashCode * Equals -> equals * CreateCanonicalType -> createCanonicalType * Catches of polymorphic types should be through references otherwise slicing can occur. * Fixes for newer verison of gcc. Fix hashing problem on gcc for Dictionary. * Another fix for GetHashPos * Fix signed issue around GetHashPos
2020-05-20	AST dumping via C++ Extractor reflection (#1348)	jsmall-nvidia
	* Add support for parsing array types to C++ extractor. * C++ extractor looks for 'balanced tokens'. Use for extracting array suffixes. * First pass at field dumping. * Update project for field dumping. * WIP AST Dumper. * More AST dump compiling. * Fix bug in StringSlicePool where it doesn't use the copy of the UnownedStringSlice in the map. * Add support for SLANG_RELFECTED and SLANG_UNREFLECTED More AST dump support. * Support for hierarchical dumping/flat dumping. Use SourceWriter to dump. * Add -dump-ast command line option. * Add fixes to VS project to incude AST dump. * Fix compilation on gcc. * Add fix for type ambiguity issue on x86 VS. * Fixes from merge of reducing Token size. * Fix comment about using SourceWriter.
2020-05-19	Reduce the size of Token (#1349)	jsmall-nvidia
	* Token size on 64 bits is 24 bytes (from 40). On 32 bits is 16 bytes from 24. * Added hasContent method to Token. Some other small improvements around Token.
2020-05-18	Use 'balance' for extracting array suffix in C++ extractor (#1346)	jsmall-nvidia
	* Add support for parsing array types to C++ extractor. * C++ extractor looks for 'balanced tokens'. Use for extracting array suffixes.
2020-05-14	Add support for parsing array types to C++ extractor. (#1343)	jsmall-nvidia

2020-05-08	AST nodes using C++ Extractor (#1341)	jsmall-nvidia
	* Extractor builds without any reference to syntax (as it will be helping to produce this!). * Change macros to include the super class. * WIP replacing defs files. * Added indexOf(const UnownedSubString& in) to UnownedSubString. Refactored extractor * Output a macro for each type with the extracted info - can be used during injection in class * Simplify the header file - as can get super type and last from macro now * Store the 'origin' of a definition * Some small tidy ups to the extractor. * Improve comments on the extractor options. * Made CPPExtractor own SourceOrigins * Small fixes around SourceOrigin. * Small tidy up around macroOrign * WIP Visitor seems now to work correctly. Split out types used by ast into slang-ast-support-types.h * Fix remaining problems with C++ extractor being used with AST nodes. Add CountOf to extractor type ids. Added ReflectClassInfo::getInfo to turn an ASTNodeType into a ReflectClassInfo * Fix compiling on linux. Fix typo in memset. * Small tidy up around comments/layout. Moved NodeBase casting to NodeBase. * Make premake generate project that builds with cpp-extractor for AST. * Get the source directory from the filter in premake. * Fix typo in source path * Explicitly set the source path for premake generation for AST. * Special case handling of override to apease Clang. * Use a more general way to find the slang-ast-reflect.h file to run the extractor. * Appveyor is not triggering slang-cpp-extractor - try putting dependson together. * Put building slang-cpp-extractor first. * Disable some project options to stop MSBuild producing internal compiler errors. * Try reordering the projects in premake5.lua * Hack to try and make slang-cpp-extractor built on appveyor. * Disable flags - not required for MSBuild on appveyor. * Disable flags not required for build on AppVeyor. * Updated Visual Studio projects with slang-cpp-extractor. * Added Visual Studio slang-cpp-extractor project.