Slang CPU Target Support
========================

Slang has preliminary support for producing CPU source and binaries.

# Features

* Can compile C/C++/Slang to binaries (executables and/or shared libraries)
* Can compile Slang source into C++ source code
* Supports compute style shaders
* The C/C++ backend abstracts the command line options and parses compiler errors/output, so that the output of all supported compilers is available in the same format
* Once compilation is complete, can optionally access and run CPU code directly

# Limitations

These limitations apply to Slang source. With C/C++ source, the limitations are whatever the compiler requires.

* Barriers are not supported (making these work would require an ABI change)
* Atomics are not supported
* Complex resource types (such as Texture2D) are work in progress
* Out of bounds access to resources has undefined behavior
* ParameterBlocks are not currently supported

For current C++ source output, the compiler needs to support partial specialization.

# How it works

The initial version works by adding 'back end' compiler support for C/C++ compilers. Currently this is tested to work with Visual Studio, Clang and G++/GCC on Windows and Linux. The C/C++ backend can be accessed directly, much like 'dxc', 'fxc' or 'glslang' can, using the pass-through mechanism with the following new backends:

```
SLANG_PASS_THROUGH_CLANG,           ///< Clang C/C++ compiler
SLANG_PASS_THROUGH_VISUAL_STUDIO,   ///< Visual studio C/C++ compiler
SLANG_PASS_THROUGH_GCC,             ///< GCC C/C++ compiler
SLANG_PASS_THROUGH_GENERIC_C_CPP,   ///< Generic C or C++ compiler, which is decided by the source type
```

Sometimes it is not important which C/C++ compiler is used, and this can be specified via the 'Generic C/C++' option. This will aim to use the compiler that is most likely binary compatible with the compiler that was used to build the slang binary being used.

To make it possible for slang to produce CPU code, we now need a mechanism to convert slang code into C/C++.
The first iteration only supports C++ generation. If source is desired instead of a binary, this can be specified via the `SlangCompileTarget`. These can be specified on the slangc command line as `-target c` or `-target cpp`.

Note that when using the 'pass through' mode for a CPU based target, it is currently necessary to set an entry point, even though it's basically ignored.

In the API the `SlangCompileTarget`s are

```
SLANG_C_SOURCE             ///< The C language
SLANG_CPP_SOURCE           ///< The C++ language
```

If a CPU binary is required, this can be specified as a `SlangCompileTarget` of

```
SLANG_EXECUTABLE           ///< Executable (for hosting CPU/OS)
SLANG_SHARED_LIBRARY       ///< A shared library/Dll (for hosting CPU/OS)
SLANG_HOST_CALLABLE        ///< A CPU target that makes the compiled code available to be run immediately
```

These can also be specified on the slang command line as `-target exe` and `-target dll` or `-target sharedlib`. `-target callable` or `-target host-callable` is also possible, but is typically not very useful from the command line, other than to test that such code is available for host execution.

In order to be able to use the slang code on CPU, there needs to be binding via values passed to a function that the C/C++ code will produce and export. How this works is described in the ABI section.

If a binary target is requested, the binary contents will be returned in an `ISlangBlob` just like for other targets. To use the CPU binary, it typically must be saved as a file and then potentially marked for execution by the OS before executing. It may be possible to load shared libraries or DLLs from memory, but this is a non-standard feature that requires unusual workarounds.

Under the covers, when slang is used to generate a binary via a C/C++ compiler, it must do so through the file system. Currently this means that the source (say generated by slang) and the binary (produced by the C/C++ compiler) must all be files. To make this work slang uses temporary files.
The reasoning for hiding this mechanism, and not returning say filenames, is so that in the future when binaries are produced directly (for example with LLVM), nothing will need to change.

Executing CPU Code
==================

In typical slang operation, when code is compiled it produces either source or a binary that can then be loaded by another API such as a rendering API. With CPU code the binary produced could be saved to a file and then executed as an exe or a shared library/dll. In practice though it is not uncommon to want to be able to execute compiled code immediately. Having to save off to a file and then load again can be awkward. It is also not necessarily the case that code needs to be saved to a file to be executed.

To be able to call code directly, code can be compiled using the `SLANG_HOST_CALLABLE` code target type. To access the code that has been produced use the function

```
SLANG_API SlangResult spGetEntryPointHostCallable(
    SlangCompileRequest*    request,
    int                     entryPointIndex,
    int                     targetIndex,
    ISlangSharedLibrary**   outSharedLibrary);
```

This outputs an `ISlangSharedLibrary`. While it remains in scope, any contained functions remain available (even if the request or session go out of scope). The contained functions can then be accessed via the `findFuncByName` method on the `ISlangSharedLibrary` interface.

The returned function pointer should be cast to the appropriate function signature before calling. For entry points, the function will appear under the same name as the entry point name. See the ABI section for the appropriate signature for entry points.

For pass through compilation of C/C++ this mechanism allows any functions marked for export to be directly queried.

ABI
===

Say we have some slang source like the following.

```
struct Thing { int a; int b; }

Texture2D tex;
SamplerState sampler;

[numthreads(4, 1, 1)]
void computeMain(
    uint3 dispatchThreadID : SV_DispatchThreadID,
    uniform Thing thing,
    uniform Thing thing2)
{
    // ...
}
```

When it is compiled into a shared library/dll, how is it invoked? The entry point is exported with the signature

```
void computeMain(ComputeVaryingInput* varyingInput, UniformState* uniformState);
```

The `UniformState` struct typically varies by shader, and it holds all of the bindings. Where these are located can be determined by reflection. For example

```
struct _S1
{
    Thing_0 thing_0;
    Thing_0 thing2_0;
};

struct UniformState
{
    Thing* thing3;
    RWStructuredBuffer outputBuffer;
    Texture2D tex;
    SamplerState sampler;
    _S1* _S2;
};
```

For C++ targets, the templated types are defined in the `slang-cpp-prelude.h` that is included. Note that `slang-cpp-prelude.h` *MUST* currently be within the search path passed to the compiler. By default with CPU code generation, the file path to the slang file is included as a 'system' include path, such that placing the `slang-cpp-prelude.h` file in the same directory as the slang source file should mean that it is found.

ConstantBuffers become pointers to the type they hold (as `thing3` is in the above structure).

`StructuredBuffer<T>`/`RWStructuredBuffer<T>`/`ByteAddressBuffer`/`RWByteAddressBuffer` become in effect the following (where for ByteAddressBuffers `T` is `uint32_t`).

```
T* data;
size_t count;
```

Resource types become pointers to interfaces that implement their features. For example `Texture2D` becomes a pointer to an `ITexture2D` interface that has to be implemented in client side code. Similarly `SamplerState` and `SamplerComparisonState` become `ISamplerState` and `ISamplerComparisonState`.

The `_S1` struct in the example above (which may have a different name) is a struct that holds all of the entry point uniforms if there are any, in this case `thing` and `thing2`. Note that the pointer to this struct (`_S2` above) is not directly reflected (although the layout of the uniform parameters in the struct is). Currently this pointer is just placed after all the other reflected bindings.
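
To make the layout described above more concrete, the following is a minimal, self-contained sketch of how host code might fill in bindings and call an entry point through an untyped function pointer. It is not taken from the Slang sources. `IntBuffer`, `EntryPointParams`, `runEntryPoint`, the three-field `UniformState` and the stand-in `computeMain` are hypothetical, and `ComputeVaryingInput` is left as an empty placeholder because its real definition lives in `slang-cpp-prelude.h`. In a real program the `void*` would be expected to come from `findFuncByName` rather than from a locally defined function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for types the prelude / generated code would provide.
struct ComputeVaryingInput {};     // placeholder: the real fields come from the prelude

struct Thing { int a; int b; };

// A buffer binding is, in effect, a pointer plus an element count.
struct IntBuffer { int32_t* data; size_t count; };

// Entry point uniforms gathered into one struct (plays the role of `_S1`).
struct EntryPointParams { Thing thing; Thing thing2; };

struct UniformState
{
    Thing* thing3;                  // constant buffer -> pointer to its contents
    IntBuffer outputBuffer;         // structured buffer -> data + count
    EntryPointParams* entryParams;  // placed after the other bindings
};

// Signature shared by exported entry points.
typedef void (*EntryPointFunc)(ComputeVaryingInput*, UniformState*);

// Stand-in for generated code, so that the sketch runs without Slang.
extern "C" void computeMain(ComputeVaryingInput*, UniformState* state)
{
    for (size_t i = 0; i < state->outputBuffer.count; ++i)
        state->outputBuffer.data[i] = state->thing3->a + state->entryParams->thing.b;
}

// Host side: fill in the bindings, cast the untyped pointer, and call.
int runEntryPoint(void* rawFunc)
{
    assert(rawFunc != nullptr);

    Thing thing3 = { 1, 2 };
    EntryPointParams params = { { 10, 20 }, { 30, 40 } };
    int32_t results[4] = { 0, 0, 0, 0 };

    UniformState state = { &thing3, { results, 4 }, &params };
    ComputeVaryingInput varying;

    EntryPointFunc func = reinterpret_cast<EntryPointFunc>(rawFunc);
    func(&varying, &state);
    return results[3];              // thing3.a + thing.b
}
```

With the stand-in above, `runEntryPoint(reinterpret_cast<void*>(&computeMain))` returns 21. The real field order and types of `UniformState` should be taken from reflection and the generated source, not from this sketch.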

It may be useful to be able to include `slang-cpp-prelude.h` in C++ code to access the types that are used in the generated code. This introduces a problem in that the types used in the generated code might clash with types in client code. To work around this problem, you can wrap all of the types defined in the prelude with a namespace of your choosing. For example

```
#define SLANG_PRELUDE_NAMESPACE CPPPrelude
#include "../../tests/cross-compile/slang-cpp-prelude.h"
```

would wrap all the slang prelude types in the namespace `CPPPrelude`.

Language aspects
================

# Arrays passed by Value

Slang follows the HLSL convention that arrays are passed by value. This is in contrast to C/C++, where an array argument decays to a pointer and is therefore in effect passed by reference. To make generated C/C++ follow the HLSL convention, an array is turned into a 'FixedArray' struct type. Since structs and classes are passed by value by default in C/C++, the wrapped array is also passed by value.

To get something more similar to C/C++ operation, the array can be marked `in out` or `inout` to make it passed by reference.

Limitations
===========

# Out of bounds access

In HLSL code, if an access is made out of bounds of a StructuredBuffer, execution proceeds. If an out of bounds read is performed, a zeroed value is returned. If an out of bounds write is performed it's effectively a noop, as the value is discarded. On the CPU target this behaviour is *NOT* supported. For a debug CPU build an out of bounds access will assert; for a release build the behaviour is undefined.

The reason for this is that such an access is quite difficult and/or slow to implement on the CPU. The underlying reason is that `operator[]` typically returns a reference to the contained value. If the index is out of bounds, it's not clear what to return, in particular because the value may be read or written, and moreover individual elements of the type might be written. In practice this means a global zeroed value cannot be returned.
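
The problem with returning a shared zeroed value can be seen in a small sketch. `Float4`, `NaiveBuffer` and `demoSharedZeroIsCorrupted` are hypothetical names used only for illustration, not prelude types. The buffer returns a reference to one static 'zero' element for any out of bounds index.

```cpp
#include <cassert>
#include <cstddef>

struct Float4 { float x, y, z, w; };

// Hypothetical buffer: out of bounds indices all alias one shared 'zero' element.
struct NaiveBuffer
{
    Float4& operator[](size_t index)
    {
        static Float4 zero = { 0.0f, 0.0f, 0.0f, 0.0f };
        return index < count ? data[index] : zero;
    }
    Float4* data;
    size_t count;
};

// Returns what an out of bounds read sees after an out of bounds write.
float demoSharedZeroIsCorrupted()
{
    Float4 storage[2] = {};
    NaiveBuffer buffer = { storage, 2 };

    buffer[5].x = 10.0f;             // out of bounds write lands in the shared element
    assert(storage[0].x == 0.0f);    // the in-bounds data is untouched
    return buffer[7].x;              // a later out of bounds read no longer sees zero
}
```

Here the out of bounds write is not discarded: it modifies the shared element, so every later out of bounds read observes the written value instead of zero, which differs from the HLSL behaviour described above.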

This could be supported if code gen worked as follows. Say we have the Slang code

```
RWStructuredBuffer<float4> values;
values[3].x = 10;
```

This would produce

```
template <typename T>
struct RWStructuredBuffer
{
    T& at(size_t index, T& defValue) { return index < size ? values[index] : defValue; }

    T* values;
    size_t size;
};

RWStructuredBuffer<float4> values;

// ...
Vector<float, 4> defValue = {};     // Zero initialize such that read access returns default values
values.at(3, defValue).x = 10;
```

Note that `[]` would be turned into the `at` function, which takes the default value as a parameter provided by the caller. If this is then written to, only the `defValue` is corrupted. Even this mechanism may not be quite right, because if we write and then read again from the out of bounds reference, in HLSL we may expect that 0 is returned, whereas here we get the value that was last written.

TODO
====

# Main

* Complete support (in terms of interfaces) for 'complex' resource types, such as Texture
* Parameter block support (the difficulty is around layout)
* Split out entry point uniforms into a separate pointer passed to the entry point
* Test system executes and tests for CPU targets, for example compute tests run on CPU
* Improve documentation
* Output of header files

# Internal Slang compiler features

These issues are more internal Slang features/improvements

* Slang compute tests work (where appropriate)
* Currently only generates C++ code; it would be fairly straightforward to support C (especially if we have 'intrinsic definitions')
* Have 'intrinsic definitions' in the standard library, such that they can be generated where appropriate
  + This will simplify the C/C++ code generation, as it means the slang language will generate most of the appropriate code
* Currently the 'construct' IR inst is supported as is; we may want to split it out into separate instructions for specific scenarios
* Refactoring around swizzle. Currently in emit it has to check for a variety of scenarios. This could be simplified with an IR pass and perhaps more specific instructions.