slang.git - Making it easier to work with shaders

	Commit message (Collapse)	Author	Age
*	Running tests in slang-test process (#740)	jsmall-nvidia	2018-12-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* First pass at having an interface to write text to that can be replaced. Simplifed and made more rigerous the interface used to write formatted strings. * Added AppContext to simplify setting up and parsing around of streams. * Added more simplified way to get the std error/out from AppContext. * Work in progress using dll for tools to speed up testing. * First pass at ISlangWriter interface. * Added support for writing VaArgs. Added NullWriter. * Use ISlangWriter for output. * Use ISlangWriter for output - replacing OutputCallback. Make IRDump go to ISlangWriter * SlangWriterTargetType -> SlangWriterChannel Improvements around AppContext * Shared library working with slang-reflection-test. * Dll testing working for render-test. * Include va_list definintion from header. * Fix errors from clang. * Fix typo for linux. * Added -usexes option * Fix typo. * Fix arguments problem on linux. * Fix typo for linux. * Add windows tool shared library projects. * Fix warning from x86 win build. Fix signed warning from slang-test/main.cpp * First attempt at getting premake to work on travis, and run tests. * Try moving build out into script. * Invoke bash scripts so they don't have to be executable. * Drive configuration/tests from env parameters set by travis * Try using source to run travis tests. * Remove the build.linux directory - but doing so will overwrite Makefile. * Made -fno-delete-null-pointer-checks gcc only. * Try to fix warning from -fno-delete-null-pointer-checks * Turn of warnings for unknown switches. * Try to make premake choose the correct tooling. * Disabled missing braces warning. * Disable -Wundefined-var-template on clang. * -Wunused-function disabled for clang. * Fix typo due to SlangBool. * Remove this nullptr tests. * "-Wno-unused-private-field" for clang. * Added "-Wno-undefined-bool-conversion" * Add DominatorList::end fix. * Split scripts into travis_build.sh travis_test.sh * Fix gcc/clang template pre-declaration issue around QualType. * Fix premake to build such that pthread correctly links with slang-glslang
*	* Renamed spSessionHasCompileTargetSupport to ↵	jsmall-nvidia	2018-11-28
\| \| \| \| \|	spSessionCheckCompileTargetSupport. (#728) * Improved return codes from spSessionCheckCompileTargetSupport
*	Feature/early depth stencil (#727)	jsmall-nvidia	2018-11-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	* First pass support for early depth stencil. * Add a simple test to check if output has attributes. * Use cross compilation to test [earlydepthstencil] on glsl. * If target is dxil, use dxc to test against. Add hlsl to test earlydepthstencil against. * * Added spSessionHasCompileTargetSupport * Made slang-test use spSessionHasCompileTargetSupport to ignore tests that cannot run
*	Feature/shared library refactor (#712)	jsmall-nvidia	2018-11-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* * Added ISlangSharedLibraryLoader and ISlangSharedLibrary * Implemented default implementations * Added slang API function to get/set the ISlangSharedLibraryLoader on the session * Put function caching onto the Session - so that if the loader is chaged, its easy to reset the shared libraries, and functions * Run premake. * Fix problem with setting null, would cause an unnecessary function/shared lib flush. * * Unload SharedLibrary when DefaultSharedLibrary is deleted. * Make SharedLibrary handle unload safely if already unloaded. * Refactor SharedLibrary, such that it becomes a utility class - simplifying it's semantics. * Simplified ISlangSharedLibrary such that doesn't have unload and isLoaded so easier to implement. Use updated SharedLibrary impl. * Disable aarch64 on windows * Premake windows files without aarch64 build. * Moved slang-shared-library to core (so can be used in code outside of main slang) Fixed problem in premake5 where on windows projects were incorrectly constructed * Allowed RefObject to base class of com types Added ConfigurableSharedLibraryLoader Added -dxc-path -fxc-path -glslang-path Fix problem with dxc-path not honoring it's path when loading dxil * Added documentation for command line control of dll loading paths. * Remove some tabbing issues. * Change name of include guard.
*	Add support for a "strict" floating-point mode (#709)	Tim Foley	2018-11-01
\| \| \| \|	This change adds an API function and command line options for controlling the default floating-point behavior for a target, with options for "fast" and "precise" computation. The "precise" option gets mapped to the "IEEE strictness" mode in `fxc` and `dxc` (there is currently no equivalent option for glslang that I could find).
*	Fix handling of DXR profiles (#704)	Tim Foley	2018-10-30
\| \| \| \| \| \| \| \| \| \| \|	The logic in `getEffectiveProfile()` function was mapping these to use `Stage::Unknown` in an early attempt to handle the way that dxc requires the `lib_` profile for DXR shaders, instead of anything that mentions the stage name (in constrast to, e.g., `vs_5_1`). At the same time, the `GetHLSLProfileName()` function was updated to explicitly handle the DXR shaders and map anything it doesn't expect (including `Stage::Unknown`) to a profile named `unknown`, which dxc obviously doesn't like. This change tries to fix both issues by: Having `getEffectiveProfile()` no longer clobber the stage part of a profile for DXR shaders. * Having `GetHLSLProfileName()` map all unhandled cases to the `lib_*` profiles, since that seems likely to be how any future stages will need to be handled as well (based on the precedent with DXR) Along the way, I also fixed a bug where invoking command-line `slangc` with no `-stage` options and then relying on `[shader(...)]` attributes to pick up the entry points would lead to a crash since the array of per-entry-point output paths on each target would not be sized appropriately.
*	Rework command-line options handling for entry points and targets (#697)	Tim Foley	2018-10-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Rework command-line options handling for entry points and targets Overview: * The biggest functionality change is that the implicit ordering constraints when multiple `-entry` options are reversed: any `-stage` option affects the `-entry` to its left instead of to its right as it used to. This is technically a breaking change, but I expect most users aren't using this feature. * The options parsing tries to handle profile versions and stages as distinct data (rather than using the combined `Profile` type all over), and treats a `-profile` option that specifies both a profile version and a stage (e.g., `-profile ps_5_0`) as if it were sugar for both a `-profile` and a `-stage` (e.g., `-profile sm_5_0 -stage fragment`). * We now technically handle multiple `-target` options in one invocation of `-slangc`, but do not advertise that fact in the documentation because it might be confusing for users. Similar to the relationship between `-stage` and `-entry`, any `-profile` option affects the most recent `-target` option unless there is only one `-target`. * The logic for associating `-o` options with corresponding entry points and targets has been beefed up. The rule is that a `-o` option for a compiled kernel binds to the entry point to its left, unless there is only one entry point (just like for `-stage`). The associated target for a `-o` option is found via a search, however, because otherwise it would be impossible to specify `-o` options for both SPIR-V and DXIL in one pass. * The handling of output paths for entry points in the internal compiler structures was changed, because previously it could only handle one output path per entry point (even when there are multiple targets). The new logic builds up a per-target mapping from an entry point to its desired output path (if any). Details: * Support for formatting profile versions, stages, and compile targets (formats) was added to diagnostic printing, so that we can make better error messages. This is fairly ad hoc, and it would be nice to have all of the string<->enum stuff be more data-driven throughout the codebase. * Test cases were added for (almost) all of the error conditions in the current options validation. The main one that is missing is around specifying an `-entry` option before any source file when compiling multiple files. This is because the test runner is putting the source file name first on the command line automatically, so we can't reproduce that case. * Several reflection-related tests now reflect entry points where they didn't before, because the logic for detecting when to infer a default `main` entry point have been made more loose * On the dxc path, beefed up the handling of mapping from Slang `Profile`s to the coresponding string to use when invoking dxc. * A bunch of tests cases were in violation of the newly imposed rules, so those needed to be cleaned up. * There were also a bunch of test cases that had accidentally gotten "disabled" at some point because there were comparing output from `slangc` both with and without a `-pass-through` option, but that meant that any errors in command-line parsing produced the same error output in both the Slang and pass-through cases. This change updates `slang-test` to always expect a successful run for these tests, and then manually updates or disables the various test cases that are affected. * When merging the updated test for matrix layout mode, I found that the new command-line logic was failing to propagate a matrix layout mode passed to `render-test` into the compiler. This was because the `-matrix-layout` options were implemented as per-target, but the target was being set by API while the option came in via command line (passed through the API). It seems like we want matrix layout mode to be a global option anyway (rather than per-target), so I made that change here. Add missing expected output files * A 64-bit fix * Remove commented-out code noted in review
*	Feature/file system cache (#692)	jsmall-nvidia	2018-10-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* First pass at caching file system. * default-file-system -> slang-file-system fix problem with location("build.linux") confusing windows build for now. * Added CompressedResult Fix problem in Result construction with it being unsigned * Add support for Path simplification. * Testing for Path::Simplify. * Refactored CacheFileSystem - automatically handles ISlangFileSystem or ISlangFileSystemExt appropriately. Removed WrapFileSystem - because wasn't possible to emulate some of the behavior if just loadFile is implemented. Split out StringBlob - so that no need to convert between ISlangBlob and String repeatidly. * Remove unwanted code in ~CompileRequest
*	Fix problem with __import not working (#688)	jsmall-nvidia	2018-10-22
\| \| \| \| \| \| \| \| \| \| \|	* Added getPathType to ISlangFileSystemExt. This is needed so that when searching for a file it's existance can be tested without loading the file. On some platforms a getCanonicalPath can do this - but depending on how getCanonicalPath is implemented, it may not do. This test is made after the relative path is produced before finding the canonical path. * Test for importing along search path. * Added comment to explain the issue around WrapFileSystem impl of getPathType. * Make search path use / not \
*	IncludeFileSystem -> DefaultFileSystem (#677)	jsmall-nvidia	2018-10-17
\| \| \| \| \|	Improvements in 'singleton'ness of DefaultFileSystem Made WrapFileSystem a stand alone type - to remove 'odd' aspects of deriving from DefaultFileSystem (such as inheriting getSingleton method/fixing ref counting) Simplified CompileRequest::loadFile - becauce fileSystemExt is always available.
*	Feature/include refactor (#675)	jsmall-nvidia	2018-10-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Refactor of path handling. * Added PathInfo * Changed ISlangFileSystem - such that has separate concepts of reading a file, getting a relative path and getting a canonical path * Added support for getting a canonical path for windows/linux * Made maps/testing around canonicalPaths * User output remains around 'foundPath' - which is the same as before * Small improvements around PathInfo * Added a type and make constructors to make clear the different 'path' uses * Fixed bug in findViewRecursively * Checking and reporting for ignored #pragma once. * Removed SLANG_PATH_TYPE_NONE as doesn't serve any useful purpose. * Improve comments in slang.h aroung ISlangFileSystem * Remove the need for <windows.h> in slang-io.cpp * Ran premake5. * Improvements and fixes around PathInfo. * Fix typo on linix GetCanonical * Make the ISlangFileSystem the same as before, and ISlangFileSystem contain the new methods. Internally it always uses the ISlangFileSystemExt, and will wrap a ISlangFileSystem with WrapFileSystem, if it is determined (via queryInterface) that it doesn't implement the full interface.
*	Feature/source loc improvements (#673)	jsmall-nvidia	2018-10-15
\| \| \| \| \| \|	* Fix comment to better explain usage. * For getting the type string use a temporary SourceManager.
*	Feature/source loc review (#672)	jsmall-nvidia	2018-10-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Fixes/improvements based around review comments. * SourceUnit -> SourceView * * Removed the HumaneSourceLoc as it's POD-like ness seemed to make that unnecessary * Made exposed member variables in SourceManager protected - so make clear where/how can be accesed * Improved description about SourceLoc and associated structures * Changed SourceLocType to 'Actual' and 'Nominal'. * Improved a comment.
*	Feature/source loc refactor (#668)	jsmall-nvidia	2018-10-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* * Remove the need for IRHighLevelDecoration in Emit * Use the IRLayoutDecoration for GeometryShaderPrimitiveTypeModifier * Initial look at at variable byte encoding, and simple unit test. * Fixing problems with comparison due to naming differences with slang/fxc. * * More tests and perf improvements for byte encoding. * Mechanism to detect processor and processor features in main slang header. * Split out cpu based defines into slang-cpu-defines.h so do not polute slang.h * Support for variable byte encoding on serialization. * Removed unused flag. * Fix warning. * Fix calcMsByte32 for 0 values without using intrinsic. * Fix a mistake in calculating maximum instruction size. * Introduced the idea of SourceUnit. * Small improvements around naming. Add more functionality - including getting the HumaneLoc. * Add support for #line default * Compiling with new SourceLoc handling. * Fix off by one on #line directives. * Can use 32bits for SourceLoc. Fix serialize to use that. * Small fixes and comment on usage. * Premake run. * Fix signed warning. * Fix typo on StringSlicePool::has found in review.
*	Added -serial-ir command line option (#664)	jsmall-nvidia	2018-10-09
\| \| \| \| \| \| \|	* Added -serial-ir option, to make generateIR always serialize in and out before further processing. Testing out serialization, and adding a kind of 'firewall' between compiler front end and backend. * Reduce peak memory usage, by discarding IR when stored in serialized form. Typo fix.
*	Support cross-compilation of ray tracing shaders to Vulkan (#663)	Tim Foley	2018-10-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Move to newer glslang * Support cross-compilation of ray tracing shaders to Vulkan This change allows HLSL shaders authored for DirectX Raytracing (DXR) to be cross-compiled to run with the experimental `GL_NVX_raytracing` extension (aka "VKRay"). * The GLSL extension spec is marked as experimental, so that any shaders written using this support should be ready for breaking changes when the spec is finalized. * "Callable shaders" are not exposed throug the GLSL extension, so this feature of DXR will not be cross-compiled. * The experimental Vulkan raytracing extension does not have an equivalent to DXR's "local root signature" concept. This does not visibly impact shader translation (because the local/global root signature mapping is handled outside of the HLSL code), but in practice it means that applications which rely on local root signatures on their DXR path will not be able to use the translation in this change as-is; more work will be needed. The simplest part of the implementation was to go into the Slang standard library and start adding GLSL translations for the various DXR operations. In some cases, like mapping `IgnoreHit()` to `ignoreIntersectionNVX()` this is almost trivial. The various functions to query system-provided values (e.g., `RayTMin()`) were also easy, with the only gotcha being that they map to variables rather than function calls in GLSL, and our handling of `__target_intrinsic` assumes that a bare identifier represents a replacement function name, and not a full expression, so we have to wrap these definitions in parentheses. The tricky operations are then `TraceRay<P>()` and `ReportHit<A>()`, because these two are generics/templates in HLSL. GLSL doesn't support generics, even for "standard library" functions, so the raytracing extension implements a slightly complex workaround: the matching operations `traceNVX()` and `reportIntersectionNVX()` pass the payload/attributes argument data via a global variable. That is, shader code for the GLSL extensions writes to the global variable and then calls the intrinsic function. The linkage between the call site and the global is established by a modifier keyword (`rayPayloadNVX` and `hitAttributeNVX`, respectively) and in the case of ray payload also uses `location` number to identify which payload global to use (since a single shader can trace rays with multiple payload types). Our translation strategy in Slang tries to leverage standard language mechanisms instead of special-case logic. For example, to translate the `ReportHit<A>()` function, we provide both a default declaration that will work for HLSL (where the operation is built-in with the signature given), and a definition marked with the `__specialized_for_target(glsl)` modifier. The GLSL definition declares a function `static` variable that will fill the role of the required global, and then does what the GLSL spec requires: assigns to the global, and then calls the `reportIntersectionNVX` builtin (which we declare as a separate builtin). Our ordinary lowering process will turn that `static` variable into an ordinary global in the IR, and the `[__vulkanHitAttributes]` attribute on the variable will be emitted as `hitAttributeNVX` in the output. There is no additional cross-compilation logic in Slang specific to `ReportHit<A>()` - the target-specific definition in the standard library Just Works. The case for `TraceRay<P>()` is a bit more complicated, simply because the GLSL `traceNVX()` function needs to be passed the `location` for the payload global. We implement the payload global as a function-`static` variable, with the knowledge that every unique specialization of `TraceRay<P>()` will generate a unique global variable of type `P` to implement our function-`static` variable. We then add a slightly magical builtin function `__rayPayloadLocation()` that can map such a variable to its generated `location`; the logic for this is implemented in `emit.cpp` and described below. We also changed the `RayDesc` and `BuiltinTriangleIntersectionAttributes` types from "magic" intrinsic types over to ordinary types (because the GLSL output needs to declare them as ordinary `struct` types). This ends up removing some cases in the AST and IR type representations. By itself this change would break HLSL emit, because in that case the types really are intrinsic. We added a `__target_intrinsic` modifier to these types to make them intrinsic for HLSL, and then updated the downstream passes to handle the notion of target-intrinsic types. The logic for binding/layout of entry point inputs and outputs was updated so that raytracing stages don't follow the default logic for varying input/output parameters. This is because the input/output parameters of a raytracing entry point aren't really "varying" in the same sense as those in the rasterization pipeline. In particular, the SPIR-V model for raytracing input and output treats "ray payload" and "hit attributes" parameters as being in a distinct storage class from `in` or `out` parameters. We also detect cases where a ray tracing stage declares inputs/outputs that it shouldn't have. This logic could conceivably be extended to other stages (e.g., to give an error on a compute shader with user-defined varying input/output). The type layout logic added cases for handling raytracing payload and hit-attribute data, but this is currently just a stub implementation that follows the same logic as for varying `in` and `out` parameters (it cannot give meaningful byte sizes/offsets right now). To my knowledge the GLSL spec doesn't currently specify anything about layout, and I haven't read the DXR spec language carefully enough to know what it says about layout. A future change should update the layout logic to allow for byte-based layout of ray payloads, etc. so that we can query this information via reflection. The GLSL legalization logic in `ir.cpp` was updated to factor out the per-entry-point-parameter code into its own function, and then that function was updated to special-case the input/output of a ray-tracing shader. While for rasterization stages we typically want to take the user-declared input/output and "scalarize" it for use in GLSL (in part to deal with language limitations, and in part to tease system values apart from user-defined input/output), the GLSL spec for raytracing requires payload and hit attribute parameters to be declared as single variables. There is also the issue that even for an `in out` parameter, a ray payload parameter should only turn into a single global, whereas the handling for varying `in out` parameters generates both an `in` and an `out` global for the GLSL case. Other than the handling of entry point parameters, the GLSL legalization pass doesn't need to do anything special for ray tracing shaders. The trickiest change in the `emit.cpp` logic is that we now generate `location`s for ray payload arguments (the outgoing from a `TraceRay()` call) on demand during code generation. This is a bit hacky, and it would be nice to handle it as a separate pass on the IR rather than clutter up the emit logic, but this approach was expedient. Basically, any of the global variables that got generated from the `static` declarations in the standard library implementation of `TraceRay()` will trigger the logic to assign them a `location`. The logic for emitting intrinsic operations added a few new `$`-based escape sequences. The `$XP` case handles emitting the location of a generated ray payload variable; this is how we emit the matching location at the site where we call `traceNVX`. The `$XT` case emits the appropriate translation for `RayTCurrent()` in HLSL, because it maps to something different depending on the target stage. All of the test cases here consist of a pair of an HLSL/Slang shader written to the DXR spec, plus a matching GLSL shader for a baseline. The GLSL shaders are carefully designed so that when fed into glslang they will produce the same SPIR-V as our cross-compilation process. This kind of testing is quite fragile, but it seems to be the best we can do until our testing framework code supports both DXR and VKRay. A bunch of the core changes ended up being blocked on issues in the rest of the compiler, so some additional features go implemented or fixed along the way: The first big wall this work ran into was that the `__specialized_for_target` modifier hasn't actually been working correctly for a while. It turns out that for the one function that is using it, `saturate()`, we have been outputting the workaround GLSL function in all cases (including for HLSL output) rather than only on GLSL targets. The problem here is that for a generic function with a `__specialized_for_target` modifier or a `__target_intrinsic` modifier, the IR-level decoration will end up attached to the `IRFunc` instruction nested in the `IRGeneric`, but the logic for comparing IR declarations to see which is more specialized (via `getTargetSpecializationLevel()`) was looking only at decorations on the top-level value (the generic). The quick (hacky) fix here is to make `getTargetSpecializationLevel()` try to look at the return value of a generic rather than the generic itself, so that it can see the decorations that indicate target-specific functions. A more refined fix would be to attach target-specificity decorations to the outer-most generic (to simplify the "linking" logic). The only reason not to fold that into the current fix is that the `__target_intrinsic` modifier currently serves double-duty as a marker of target specialization and information to drive emit logic. The latter (the emit-related stuff) currently needs to live on the `IRFunc`, and moving it to the generic could easily break a lot of code. This needs more work in a follow-on fix, but for now target specialization should again be working. The other big gotcha that the simple "just use the standard library" strategy ran into was that function-`static` variables weren't actually implemented yet, and in particular function-`static` variables inside of generic functions required some careful coding. The logic in `lower-to-ir.cpp` has this `emitOuterGenerics()` function that is supposed to take a declaration that might be nested inside of zero or more levels of AST generics, and emit corresponding IR generics for all those levels. This is needed because two different AST functions nested inside a single generic `struct` declaration should turn into distinct `IRFunc`s nested in distinct `IRGeneric`s. The tricky bit to making that all work is that the same AST-level generic type parameter will then map to different IR-level instructions (the parameters of distinct `IRGeneric`s) when lowering each function. The existing logic handled this in an idiomatic way by making "sub-builders" and "sub-contexts." This change refactors some of the repeated logic into a `NestedContext` type to help simplify the pattern, and applies it consistently throughout the `lower-to-ir.cpp` file. Besides that cleanup, the major change is `lowerFunctionStaticVarDecl` which, unsurprisingly, handles lower of function-`static` variables to IR globals. The careful handling of nested contexts here is needed because if we are in the middle of lowering a generic function, then a `static` variable should turn into its own `IRGeneric` wrapping an `IRGlobalVar`. The body of the function should refer to the global variable by specializing the global variable's `IRGeneric` to the parameters of the functions `IRGeneric`. This tricky detail is handled by `defaultSpecializeOuterGenerics`. An additional subtlety not actually required for this raytracing work (and thus not properly tested right now) is handling function-`static` variables with initializers. These can't just be lowered to globals with initializers, because HLSL follows the C rule that function-`static` variables are initialized when the declaration statement is first executed (and this could be visible in the presence of side-effects). The lowering strategy here translates any `static` variable with an initializer into two globals: one for the actual storage, plus a second `bool` variable to track whether it has been initialized yet. There are some opportunities to optimize this case, especially for `static const` data, but that will need to wait for future changes. We've slowly been shifting away from the model where a user thinks of a "profile" as including both a stage and a feature level. Instead, the user should think about selecting a profile that only describes a feature level (e.g., `sm_6_1`, `glsl_450`, etc.), and then separately specifying a stage (`vertex`, `raygeneration, etc.) for each entry point. The challenge here is that the command-line processing still only had a single `-profile` switch, and no way to specify the stage. Adding the `-stage` option was relatively easy, but making it work with the existing validation logic for command-line arguments was tricky, because of the complex model that `slangc` supports for compiling multiple entry points in a single pass. * In `slang.h` add new reflection parameter categories for ray payloads and hit attributes, as part of entry point input/output signatures. * A previous change already updated our copy of glslang to one that supports the `GL_NVX_raytracing` extension, so in `slang-glslang.cpp` we just needed to map Slang's `enum` values for the raytracing stage names to their equivalents in the glslang code. * Moved the logic for looking up a stage by name (`findStageByName()`) out of `check.cpp` and into `compiler.cpp`, with a declaration in `profile.h` * Added a `$z` suffix to the GLSL translation of `Texture.SampleLevel()`, to handle cases where the texture element type is not a 4-component vector. Note that this fix should actually be applied to all* these texture-sampling operations, but I didn't want to add a bunch of changes that are (clearly) not being tested right now. * The layout logic for entry points was updated to correctly skip producing a `TypeLayout` for an entry point result of type `void`, which meant that the related emit logic now needs to guard against a null value for the result layout. * In `ir.cpp`, dump decorations on every instruction instead of just selected ones, so that our IR dump output is more complete. * Added a command-line `-line-directive-mode` option so that we can easily turn off `#line` directives in the output when debugging. Not all cases where plumbed through because the `none` case is realistically the most important. * Parser was fixed to properly initialize parent links for "scope" declarations used for statements, so that we can walk backwards from a function-scope variable (including a `static`) and see the outer function/generics/etc. * Added GLSL 460 profile, since it is required for ray tracing. Also updated the logic for computing the "effective" profile to use to recognize that GLSL raytracing stages require GLSL 460. * Added some conventional ray-tracing shader suffixes to the handling in `slang-test`. This code isn't actually used, but was relevant when I started by copy-pasting some existing VKRay shaders as the starting point for my testing. * Fixup: typos
*	Major overhaul of Renderer abstraction, to support a new example (#624)	Tim Foley	2018-08-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The original goal here was to bring up a second example program: `model-viewer`. While the existing `hello-world` example is enough to get somebody up to speed with the basics of the Slang API (as a drop-in replacement for `D3DCompile` or similar), it doesn't really show any of the big-picture stuff that Slang is meant to enable. There wasn't any use of D3D12/Vulkan descriptor tables/sets, and there wasn't any use of interfaces, generics, or `ParameterBlock`s in the shader code. The `model-viewer` example addresses these issues. Its shader code involves generics, interfaces, and multiple `ParameterBlock`s, and the host-side code demonstrates a few key things for working with Slang: * There is an application-level abstraction for parameter blocks, that combines the graphics-API descriptor set object with Slang type information * There is a shader cache layer used to look up an appropriate variant of a rendering effect by using parameter block types to "plug in" global type variables * There is a clear separation between the phases of compilation: a first phase that does semantic checking and enables reflection-based allocation of graphics API objects, followed by one or more code generation passes for specialized kernels. This example is certainly not perfect, and it will need to be revamped more going forward. In particular: * The output picture is ugly as sin. We need a plan for how to get this to load better content, perhaps even popping up an error message to note that the required input data isn't present in the basic repository. * The shader code is too simplistic. There isn't any real material variety, and the `IMaterial` abstraction is completely wrong. * The use of parameter blocks is facile because there are no resource parameters right now. Fixing that will likely expose issues around interfacing with Slang's reflection API. * The whole example exposes the issue that Slang's current APIs aren't really designed for the benefit of two-phase compilation (since our many client application has been stuck on one-phase compilation). * Global type parameters are actually a Bad Idea that we only did for compatibility with existing codebases. We should not be showing them off in an example of the Right Way to use Slang, but the language support for type parameters on entry points is still not complete. Of course, the majority of the changes here are not inside the example applications, and instead involve a major overhaul of the `Renderer` abstraction that is used for both tests and examples. The main thrust of the change is to make the abstraction layer be closer to the D3D12/Vulkan model than to a D3D11-style model. This is important for the `model-viewer` example, since it aspires to show how Slang can be incorporated into a renderer that targets a modern API. The most important bit is actually the use of descriptor sets and "pipeline layouts" a la Vulkan, since without these Slang's `ParameterBlock` abstraction won't make a lot of sense. Implementation of the abstraction for the various APIs has very much been on an as-needed basis. The current implementation is just enough for the two examples to work, plus enough to get all the tests to pass in both debug and release builds on Windows. A big missing feature in the API abstraction right now is memory lifetime management. The code had been trending toward something D3D11-like where a constant buffer could be mapped per-frame with the implementation doing behind-the-scenes allocation for targets like D3D12/Vulkan. I'd like to shift more toward a model of just exposing "transient" allocations that are only valid for one frame, because these are more representation of how an efficient renderer for next-generation APIs will work. That transition isn't actually complete, though, so there are problems with the existing examples where `hello-world` is actually scribbling into memory that the GPU might still be using, while `model-viewer` is doing full-on heavy-weight allocations on a per-frame basis with no real concern for the performance implications. All together, there are a lot of things here that need more work, but this branch has been way too long-lived already, and so I'd like to get this checked in as long as all the tests pass.
*	spCompile/spProcessCommandLineArguments return SlangResult (#610)	jsmall-nvidia	2018-07-06
\| \| \| \| \| \| \| \| \|	* * Make spCompile return SlangResult * Make spProcessCommandLineArguments return SlangResult (and not internally exit) * Remove calls to exit() * Fix typos * Make all output from spProcessCommandLineArguments get sent to diagnostic sink.
*	Expose macros/functionality for defining interfaces (#604)	jsmall-nvidia	2018-06-22
\| \| \| \| \| \| \| \| \| \| \| \| \|	* Added Result definitions to the slang.h * Removed slang-result.h and added slang-com-helper.h * Move slang-com-ptr.h to be publically available. * Add SLANG_IUNKNOWN macros to simplify implementing interfaces. Use the SLANG_IUNKNOWN macros to in slang.c * Removed slang-defines.h added outstanding defines to slang.h
*	Add support for "blobs" and a file-system callback (#596)	Tim Foley	2018-06-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add support for "blobs" and a file-system callback The most obvious change here is that the Slang header now includes a few COM-style interfaces that can be used for communication between the application and compiler. In order to support the declaration of COM-like interfaces, several platform-detection macros were lifted out of `slang-defines.h` and into the public `slang.h` header. As it exists right now, this change makes the Slang API C++-only, but a C-compatible version can be defined later with the help of lots of macros (and/or something like an IDL compiler). The two big interfaces introduced are: * The `ISlangBlob` interface, which is compatible with `ID3DBlob`, `IDxcBlob`, etc. This is used to pass ownership of source/compiled code across the API boundary without copies. New versions of various entry points have been added to allow passing blobs: e.g., `spAddTranslationUnitSourceBlob` and `spGetEntryPointCodeBlob`. * The `ISlangFileSystem` interface, which is used to allow applications to intercept any attempt by the Slang compiler to load a file (input source files, include files, etc.). This is not the same as the `IDxcIncludeHandler` interface, because it assumes UTF-8 encoded path names, instead of the 16-bit encoding that dxc/Windows prefer. It is also not very similar to `ID3DInclude` as used by fxc, because this callback interface is not responsible for handling the search through include paths, etc. - it is just a file-system abstraction layer. Internally, a few different parts of the compiler were changed to either store data in blob form all the time, or to be able to synthesize a blob on-demand. Because our internal `String` type is a reference-counted copy-on-write type, using a `SlangStringBlob` to hold string data should achieve transfer of ownership back to the application without extraneous copies. There is plenty of room to clean up the architecture of some of these internal pieces if they know that their data will end up in a blob. The existing Slang testing doesn't touch any of the APIs introduced here, so they can only confirm that existing functionality hasn't been broken. The new ability to return code blobs has been tested by integration of that feature into Falcor, but there has been zero testing of the ability to pass in source code as blobs, and the ability to hook file loading. Future changes will need to add test coverage for the new features. * fixup: define SLANG_NO_THROW for non-Windows builds * fixup: header copy-paste error caught by clang/gcc * Cleanup: return reference-counted objects via output parameters Returning a reference-counted object through the API as a raw pointer creates challenges. The "obvious" answer is that the returned pointer should have an added reference (it is returned at "+1"), and the caller is responsible for releasing that reference. This makes sense when using raw pointers on the calling side: ```c++ IFoo* foo = spGetFoo(...); ... foo->Release(); ``` However, as soon as smart pointers start getting involved (to handle releasing reference counts when we are done with things), the picture gets more complicated: ```c++ MySmartPtr<IFoo> foo = spGetFoo(...); ... ``` The intention of code like that is that `foo` gets released when the smart pointer goes out of scope, but this probably doesn't happen with most smart pointer implementations. If the `MySmartPtr` constructor that takes a raw pointer retains it, then the destructor will only release that reference, and so the object will leak. It is possible that the user will have a smart pointer type where the constructor that takes a raw pointer doesn't retain it, but in general such types introduce the potential for errors of their own, and no matter what the Slang API shouldn't go in assuming any particular policy. This change makes it so that any reference-counted objects that are logically returned from a call are returned through output pointers. This design makes the leak-free cases easy (enough) to implement with raw pointers or smart pointers: ```c++ // raw pointer IFoo* foo = nullptr; spGetFoo(..., &foo); ... foo->Release(); // smart pointer MySmartPtr<IFoo> foo; spGetFoo(..., foo.writeableRef()); ... ``` The only assumption here is that any COM smart-pointer type needs to provide an operation like `writableRef` that is suitable for using that pointer as an output parameter. Given that COM loves output parameters, this seems like a safe assumption (at the very least, anybody who interacts with COM would be used to this convention). Future changes might introduce inline convenience methods for various operations that return results more directly, possibly by introducing a minimal smart-pointer type in the `slang.h` header (without prescribing that clients must use it...). * fixup: another error caught by gcc/clang
*	Add basic support for Shader Model 6.3 profiles (#594)	Tim Foley	2018-06-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add basic support for Shader Model 6.3 profiles This adds `vs_6_3` and friends as available profiles, but doesn't add any new builtins specific to Shader Model 6.3. In order to better support the ray tracing shader stages, Slang will not automatically map any attempt to compile a DXR shader up to SM 6.3 (the shader model officially required for these stages) and to the `lib_` profiles (because there are no stage-specific profiles for these cases). As an added detail, when invoking `dxcompiler.dll` to generate DXIL for DXR shaders, specify an empty entry-point name, since that is expected for `lib_` profiles. * Fixup: don't drop [shader(...)] attributes The previous change makes the "effective profile" for DXR compiles no longer include a stage, but we had been using the stage stored on the effective profile in exactly one place: when determining what to output for a `[shader("...")]` attribute. This fixup makes it so that we use the stage from the profile on the entry-point layout instead, which seems like the right choice anyway, if we are ever going to emit multiple entry points at once.
*	Add options to control matrix layout rules (#583)	Tim Foley	2018-05-31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add options to control matrix layout rules Up to this point, the Slang compiler has assumed that the default matrix layout conventions for the target API will be used. This means column-major layout for D3D, and row major layout for GL/Vulkan (note that while GL/Vulkan describe the default as "column major" there is an implicit swap of "row" and "column" when mapping HLSL conventions to GLSL). This commit introduces two main changes: 1. The default layout convention is switched to column-major on all targets, to ensure that D3D and GL/Vulkan can easily be driven by the same application logic. I would prefer to make the default be row-major (because this is the "obvious" convention for matrices), but I don't want to deviate from the defaults in existing HLSL compilers. 2. Command-line and API options are introduced for setting the matrix layout convention to use (by default) for each code generation target. It is still possible for explicit qualifiers like `row_major` to change the layout from within shader code. I also added an API to query the matrix layout convention that was used for a type layout (which should be of the `SLANG_TYPE_KIND_MATRIX` kind), but this isn't yet exercised. I added a reflection test case to make sure that the offsets/sizes we compute for matrix-type fields are appropriately modified by the flag that gets passed in. In a future change we could possibly switch the default convention to row-major, if we also changed our testing to match, since there are currently not many clients to be adversely impacted by the change. * Fixup: silence 64-bit build warning
*	Speedup type checking using cached overload resolution results.	Yong He	2018-05-02
\| \| \| \| \| \| \|	This change adds caches to built-in operator overload resolution and type coersion to avoid running these time-consuming operations every time. - Adds `TypeCheckingCache` type, which is defined in check.cpp, that contains two dictionaries for the cached results of `ResolveInvoke` and `CanCoerce` calls. - Add `destroyTypeCheckingCache` and `getTypeCheckingCache` methods to `Session` class to reuse these cached results over the entire session.
*	Better diagnostics when compilation is aborted (#517)	Tim Foley	2018-04-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Improve messages when compilation is aborted. Make sure to include the information from any `Slang::Exception` that was thrown, so that the poor user can at least point us at our own message string from an assertion failure. This doesn't provide them line-number information in their code or the Slang codebase, so there is still work to be done in making the compiler more friendly about this stuff. * When aborting compilation, try to note what source location we were working on This is handled by having exception handlers on the stack at key bottleneck points in semantic checking and IR generation, which can then emit a diagnostic to note what we were working on when things failed. This is not intended to be an indiciation to the user that their code is at fault for a compiler crash (it is always our fault), but might give them a chance to work around whatever bug is blocking them.
*	Propagate diagnostics when imported module has errors (#485)	Tim Foley	2018-04-13
\| \| \| \| \| \| \|	A previous fix avoided crashes when an `import`ed module has errors by making the "failed to import" error a fatal one. Unfortunately, the code path that handles fatal errors was failing to copy diagnostic output from the sink over to the member variable on the `CompileRequest` that exposes the output through the API. This meant that API users lost all context on error messages in `import`ed code. This change fixes the immediate issue by plumbing through the error output, but doesn't fix the more fundamental issue: the front-end should not crash when an `import` fails, by any means.
*	Introduce an IR-level type system (#481)	Tim Foley	2018-04-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Introduce an IR-level type system Up to this point, the Slang IR has used the front-end type system to represent types in the IR. As a result (but ultimately more importantly) the IR representation of generics and specialization has used AST-level concepts embedded in the IR. For example, to express the specialization of `vector<T,N>` to a concrete type `float` for `T`, we needed an IR operation that could represent the specialization, with operands that somehow represented the type argument `float`. The whole thing was very complicated. The big idea of this change is to introduce a new representation in which types in the IR are just ordinary instructions, so that using them as operands makes sense. The hierarchy of IR types closely mirrors the AST-side hierarchy for now, and that will probably be something we should maintain going forward. In order to make these changes work, though, I also had to do major overhauls of things like the way substitutions are performed, how we check interface conformances, the way lookup through interface types is done, etc. etc. This is a big change, and unfortunately any attempt to summarize it in the commit message wouldn't do it justice. * Fix 64-bit build warning * Fix up some clang warnings/errors
*	Add support for DirectX Raytracing (DXR) (#451)	Tim Foley	2018-03-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add support for DirectX Raytracing (DXR) This is an initial pass to add support to Slang for the shader stages introduced by DirectX Raytracing (DXR). * Add declarations for DXR intrinsic types and functions to the Slang standard library. The way our compilation works, these will then get propagated through the IR as intrinsics and get spit back out again as-is during HLSL code emission. * Declare the DXR-related stages. This is the main work that affects the compiler's C++ implementation rather than being something we can add via the standard library today. * Switch around the encoding of the `Profile` type so that the stage is in the low bits, allowing API users to pass an ordinary `SlangStage` to operations that expect a `SlangProfileID`. - This represents a direction I'd like to push in long term, where the user specifies stage and "feature level" separately rather than using composite profiles like `vs_6_0`. The introduction of these new stages seems like a good point to try and make a clean break here and not introduce, e.g., `rgs_6_1` for ray generatin shaders. * Upgrade "effective profile" computation so that it advances the required version based on the specified stage (e.g., DXR stages seem to require at least shader model 6.1). - This is a bit of a kludge overall, but ideally we don't want a typical user to have to think about "feature level" stuff much at all. The ideal workflow is that they just hand us a source file and we work out entry points and their required feature levels in the compiler (and let the user query it when we are done). Until we implement that for real, stopgaps like this are required. Overall these are relatively small changes for supporting some major new API behavior. Slang's design helps out here, by allowing a lot of things to be specified in the stdlib (including generic intrinsic functions), but some of this is also owed to the DXIL-influenced design of DXR - e.g., the use of global functions in place of `SV_` semantics. fixup: typos * Fixup: use `pixel` instead of `fragment` as primary stage name This is to match HLSL conventions when generating output code, even if the Slang project officially favors the more correct term "fragment shader."
*	Entry point attribute (#447)	Tim Foley	2018-03-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Typo * Add [shader(...)] and clean up some literal handling * Add supporting for validating the `[shader(...)]` attribute, by checking that its argument is a string literal that names a known shader stage. * Split the `ConstantExpr` class into distinct subclasses rooted at `LiteralExpr`, so we have `BoolLiteralExpr`, `IntegerLiteralExpr`, `FloatingPointLiteralExpr`, and `StringLiteralExpr` * Add a `String` type to the stdlib, to be used as the type of a string literal. This change allows code using `[shader(...)]` to be accepted by the front-end again, but it does nothing about emitting it in final HLSL. * Allow entry points to be specified via [shader(...)] Before this change, the compiler would track a list of `EntryPointRequest` objects, based on what the suer specified via API and/or command-line options. Each entry point request would get matched up with an AST `FuncDecl` as part of semantic checking, and then the back end steps (layout, codegen, etc.) would work from that information. This change makes the compiler modal, in that it can either continue to use an explicit list of entry point requests (this is the mode when the list is non-empty), or it can rely on user-supplied attributes on entry point functions to drive codegen (this is the mode when the list is empty). User-specified `[shader(...)]` attributes are processed at the same place where the association from `EntryPointRequest`s to `FuncDecl`s would otherwise be made, and basically does the same thing in the opposite direction: looks for `FuncDecl`s with the appropriate attribute and synthesizes an `EntryPointRequest` for them. Subsequent processing should ideally not know where a given `EntryPointRequest` came from, and should handle both methods of specifying the entry points equivalently. One design choice that might not make immediate sense is that we do not process a function as an entry point (applying further validation, etc.) just because it has a `[shader(...)]` modifier, unless we are in the appropriate mode (which in this case is the mode where the user didn't specify their own entry points via API or command line). This is to handle cases where the user wants to explicitly compile only one entry point, so that they (1) don't want us to spend time validating code they don't care about, (2) don't want do get output they don't expect, and (3) might actually be presenting us with code that violates the language rules due to a combination of `#define`s in effect (e.g., they might have a `[shader("vertex")]` function that transitively executes a `discard` because of how the preprocessor was configured, but they don't care because they are compiling a fragment entry point). This decision might be something we revisit over time. As part of this work, I had to add some logic to pick a "profile version" to use for a combination of a target and stage (because when you specify `[shader("vertex")]` the compiler can't tell if you want `vs_5_0`, `vs_5_1`, etc.). This isn't really complete right now, because something like `-target dxbc` also doesn't determine a profile, so there is a bit of a kludge at present. We need to figure out a good long-term plan here, which might involve keeping target format, feature level/version, and pipeline stage as truly orthogonal concepts, rather than conflating them. That would involve more work in the API and command-line layers to de-compose things when the user specifies, e.g., `vs_5_1`, but might make downstream logic easier to manage. * Emit [shader(...)] attribute on entry point for SM 6.1 and later This should help ensure that the output from Slang can be compiled with dxc `lib_` profiles. Fix warning
*	Stop compilation when a imported module contains errors. (#440)	Yong He	2018-03-12
\| \| \| \| \| \|	* Stop compilation when a important module contains errors. * Fixup test cases
*	Initial work on validating "constexpr"-ness in IR (#420)	Tim Foley	2018-02-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Initial work on validating "constexpr"-ness in IR The underlying issue here is that certain operations in the target shading languages constrain their operands to be compile-time constants. A notable example is the optional texel offset parameter to the `Texture2D.Sample` operation. When calling these operations in GLSL, the user is required to pass a "constant expression," and any variables in that expression must therefore be marked with the `const` qualifier (and themselves be initialized with constant expressions). Any GLSL output we generate must of course respect these rules. When calling these operations in HLSL, the user is not so constrained. Instead, they can pass an arbitrary expression, which may involve ordinary variables with no particular markup, and then the compiler is responsible for determining if the actual value after simplification works out to be a constant. In some cases, the requirement that a value be constant might actually trigger things like loop unrolling. Also, it is okay to use a function parameter to determine such a constant expression, as long as the argument turns out to be a constant at all call sites. The way we have decided to tackle these challenges in Slang is that we we propagate a notion of `constexpr`-ness through the IR. This is currently being tackled in `ir-constexpr.cpp` with a combination of forward and backward iterative dataflow: * When the operands to an instruction are all `constexpr`, and the opcode is one we believe can be constant-folded, then we infer that the instruction can be evaluated as `constexpr` * When instruction is required to be `constexpr`, then we infer that all of its operands are also required to be `constexpr`. If this process ever infers that a function parameter is required to be `constexpr`, then we might have to continue propagation at all the call sites to that function. If after all the propagation is done, there are any cases where an instruction is required to be `constexpr`, but it can't be `constexpr` (we weren't able to infer `constexpr`-ness for its operands), then we issue an error. This implementation encodes the idea of `constexpr`-ness in the IR as part of the type system, using a simplified notion of rates. This change adds a `RateQualifiedType` that can represent `@R T`, and then introduces a `ConstExprRate` that can be used for `R`. Many accessors for the type information on IR nodes were updated to distinguish when one wants the "full" type of an IR value (which might include rate information) vs. just the "data" type. A `constexpr` qualifier was added in the front-end, and is being used to decorate the texel offset parameter for `Texture2D.Sample`. Lowering from AST to IR looks for this qalifier and infers when a function parameter must be typed as `@ConstExpr T` instead of just `T`. There are lots of limitations and gotchas in the implementation so far: * The `@ConstExpr` rate is the only one added in this change, but it seems clear that the conceptual `ThreadGroup` rate that was added to represent `groupshared` should probably get folded into the representation. * I'm not 100% pleased with how many places in the IR I have to special-case for rate-qualified types. At the same type, pulling out rate as a distinct field on `IRValue` would probably require that we pay attention to rate everywhere. * I've added a test case to show that we can issue errors when users fail to provide a constant expression for the texel offset, but the actual error message isn't great because it doesn't indicate why a constant expression was required. Realistically the "initial IR" should contain a few more decorations we can use to relate error conditions back to the original code (even if this is in a side-band structure). * I've added a test case that is supposed to show that we can back-propagate `constexpr`-ness to local variables, and I've manually confirmed that it works for Vulkan/SPIR-V output, but the level of Vulkan support in `render_test` today means I can't enable the test for check-in. * While I'm attempting to propagate `@ConstExpr` information from callees to callers, I haven't implemented any logic to specialize callee functions based on values at call sites. * In a similar vein, there is no handling of control-flow dependence in the current code. If we infer that a phi (block parameter) needs to be `@ConstExpr`, then it isn't actually enough to require that the inputs to the phi (arguments from predecessor blocks) are all `@ConstExpr` because we also need any control-flow decisions that pick which incoming edge we take to be `@ConstExpr` as well. * As a practical matter, implicit propagation of `@ConstExpr` from a function body to a function parameter should only be allowed for functions that are "local" to a module. Any function that might be accessed from outside of a module should really have had its `@ConstExpr` parameter marked manually, and our pass should validate that they follow their own rules. Right now we have no kind of visibility (`public` vs `private`) system, so I'm kind of ignoring this issue. While that is a lot of gaps, this is also just enough code to get the Falcor MultiPassPostProcess example working, so I'm inclined to get it checked in. * Fixup: missing expected output for test * Fixup: disable test that relies on [unroll] for now
*	make CompileRequest retain specailized IR module.	Yong He	2018-02-20
\| \| \| \|	This is to workaround with the issue that the Types returned in ProgramLayout may reference to IRWitnessTables via GlobalGenericParamSubstitution.
*	more to fixing memory leaks	Yong He	2018-02-19
\| \| \| \| \|	1. reorder destruction order of several key classes to avoid using deleted IR objects when destroying Types 2. remove Session::canonicalTypes and make each Type own a RefPtr to the canonicalType, to allow types to be destroyed along with each IRModule it belongs to.
*	Fix IR memory leaks.	Yong He	2018-02-19
\| \| \| \| \| \| \| \| \|	1, make IRModule class own a memory pool for all IR object allocations 2. For now, we allow IR objects to own other (externally) heap allocated objects, such as String, List and RefPtrs by tracking all IR objects that has been allocated for the IRModule in a list named `IRModule::irObjectsToFree`. and call destructor for all these objects upon the destruction of the IRModule. In the long term, we should eliminate the use of all these externally allocated types in IR system and get rid of this tracking and explicit destructor calls. 3. remove non-generic `createValueImpl` functions and retain only generic versions in IRBulider so we can properly call the constructor of the IR types to set up virtual tables correctly for destructor dispatching. 4. add `MemoryPool` class for allocation of the IR objects. 5. Make sure we are disposing IRSpecContexts when we are done with the specialized IR module. 6. Add `_CrtDumpMemoryLeaks()` calls to check memory leaks upon destruction of a Slang session. If we are to support multiple sessions at a time, this call should probably be replaced with the more advanced MemoryState versions of the memory leak checker.
*	IR/Vulkan fixes (#412)	Tim Foley	2018-02-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Fix bugs around IR legalization of GLSL input/output - Add case to handle assignment of one `ScalarizedVal::Flavor::address` to another (still need to make sure we are handling all the possible cases there) - Revamp logic for creating global variable declarations for varying inputs/outputs. - Actually handle creating array declarations (not sure if binding locations will be correct) - Properly deal with offsetting of locations for nested fields - Only create varying input/output layout information as needed for the separate `in` and `out` variables we create to represent a single HLSL `inout` varying * During SSA generation, recursively remove trivial phis This is actually written up in the original paper I used as a reference, but I hadn't implemented the case yet. When you eliminate one phi as trivial (because its only operands were itself and at most one other value), you might find that another phi becomes trivial (because it had this phi as an operand, but now it will have the other value...). The one thing that made any of this tricky is that our "phi" nodes are really block parameters, and thus they don't technically have operands (`IRUse`s). The `IRUse`s for each phi were being tracked in a separate array, and had their `user` field set to null. With this change, I set their `user` to be the corresponding `IRParam` for the phi (and that means I changed `IRParam` to inherit from `IRUser` even though it shouldn't really be required). * Re-build SSA form after specialization/legalization The main reason to do this is that legalization might scalarize types, and thus might allow us to clean up resource-type local variables that we were not able to clean up when they were part of an aggregate. Note: we shouldn't really need to do this, because the front-end should actually be guaranteeing that types that include resources are used in "safe" ways, but we currently don't have the analyses required to support that. * Give an error message if we get GLSL input The API and command-line interface still recognize and nominally support GLSL input files, because they need to be supported in the "pass-through" mode. This change just adds an error message if we encounter a GLSL input file in anything other than "pass-through" mode.
*	Falcor fixes (#402)	Tim Foley	2018-02-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Re-define deprecated compile flags By including these flags in the header file, with a value of zero, we can allow some existing code to compile even after the major changes to the implementation. * The `SLANG_COMPILE_FLAG_NO_CHECKING` option will effectively be ignored, since checking is always enabled. * The `SLANG_COMPILE_FLAG_SPLIT_MIXED_TYPES` option will now act as if it is always enabled (and indeed some of the code has been relying on this flag being set always). * Make subscript operators writable for writable textures This even had a `TODO` comment saying that we needed to fix it, and now I'm seeing semantic checking failures because we didn't define these and so we find assignment to non l-values. * Fix definitions of any() and all() intrinsics These should always return a scalar `bool` value, but they were being defined wrong in two ways: 1. They were using their generic type parameter `T` in the return type 2. They were returning a vector in the vector case, and a matrix in the matrix case. This change just alters the return type to be `bool` in all cases. * Fix bug in SSA construction When eliminating a trivial phi node, it is possible that the phi is still recorded as the "latest" value for a local variable in its block. When later code queries that value from the block (which can happen whenever another block looks up a variable in its predecessors), it would get the old phi and not the replacement value. I simply added a loop that checks if the value we look up is a phi that got replaced, and then continues with the replacement value (which might itself be a phi...). A more advanced solution might try to get clever and have the map itself hold `IRUse` values so that we can replace them seamlessly. * Simplify IR control flow representation This change gets rid of various special-case operations for conditional and unconditional branches, and instead requires emit logic to recognize when a direct branch is targetting a `break` or `continue` label. The new approach here isn't perfect, but it seems beter than what we had before, because it can actually work in the presence of control-flow optimizations (including our current critical-edge-splitting step). * Load from groupshared isn't groupshared When loading from a `groupshared` variable, the resulting temporary shouldn't have the `groupshared` qualifier on it. This might eventually need to generalize to a better understanding of storage modifiers in the IR, but I don't really want to deal with that right now. * Don't emit references to typedefs in output code Now that we are using the IR for all codegen, we shouldn't be dealing with surface-level things like `typedef` declarations in the output code; just use the type that was being referred to in the first place. * Fix floating-point literal printing for IR The IR was calling `emit()` instead of `Emit()` (we really need to normalize our convention here), and was implicitly invoking a default constructor on `String` that takes a `double` (that constructor should really be marked `explicit`), and which doesn't meet our requirements for printing floating-point values. * Fix error when importing module that doesn't parse We already added a case to bail out if semantic checking fails, but neglected to add a case if there is an error during parsing of a module to be imported. Note: this logic doesn't correctly register the module as being loaded (but still in error), so users could see multiple error messages if there are multiple `import`s for the same module. * Improve error message for overload resolution failure - Drop debugging info from the candidate printing - Add cases to print `double` and `half` types properly * Fixup: switch loopTest to ifElse in expected IR output
*	Remove non-IR codegen paths (#398)	Tim Foley	2018-02-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The basic change is simple: remove support for all code generation paths other than the IR. There is a lot of vestigial code left, but the main logic in `ast-legalize.` is gone. Doing this breaks a lot* of tests, for various reasons: - We can no longer guarantee exactly matching DXBC or SPIR-V output after things pass through out IR - Many builtins don't have matching versions defined for GLSL output via IR (even when they had versions defined via the earlier approach that worked with the AST) - A lot of code creates intermediate values of opaque types in the IR, which turn into opaque-type temporaries that aren't allowed (this breaks many GLSL tests, but also some HLSL) I implemented some small fixes for issues that I could get working in the time I had, but most of the above are larger than made sense to fix in this commit. For now I'm disabling the tests that cause problems, but we will need to make a concerted effort to get things working on this new substrate if we are going to make good on our goals.
*	Remove support for the -no-checking flag (#392)	Tim Foley	2018-02-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Remove support for the -no-checking flag Fixes #381 Fixes #383 Work on #382 - No longer expose flag through API (`SLANG_COMPILE_FLAG_NO_CHECKING`) and command-line (`-no-checking`) options - Remove all logic in `check.cpp` that was withholding diagnostics (including errors) when the no-checking mode was enabled - Remove `HiddenImplicitCastExpr`, which was only created to support no-checking mode (it represented an implicit cast that our checking through was needed, but couldn't emit because it might be wrong) - Remove logic for storing function bodies as raw token lists when checking is turned off. I'm leaving in the `UnparsedStmt` AST node in case we ever need/want to lazily parse and check function bodies down the line. - Remove a few of the code-generation paths we had to contend with, but keep the comment about them in place. - Remove GLSL-based tests that can't meaningfully work with the new approach. - Fix other tests that used a GLSL baseline so that their GLSL compiles with `-pass-through glslang` instead of invoking `slang` with the `-no-checking` flag. - Remove tests that were explicitly added to test the "rewriter + IR" path, since that is no longer supported. There is more cleanup that can be done here, now that we know that AST-based rewrite and IR will never co-exist, but it is probably easier to deal with that as part of removing the AST-based rewrite path. We've lost some test coverage here, but actually not too much if we consider that we are dropping GLSL input anyway. * Fixup: test runner was mis-counting ignored tests * Fixup: turn on dumping on test failure under Travis * Fixup: enable extensions in Linux build of glslang
*	Fix a bug in import handling (#394)	Tim Foley	2018-02-01
\| \| \| \| \|	The recent change that removed `#import` accidentally introduced a regression that made any code that imports the same module in more than one place fail. I'm just fixing the bug for now to unblock users, but this should really get a regression test.
*	Remove #import directive (#389)	Tim Foley	2018-01-29
\| \| \| \| \| \| \|	Fixes #380 The `#import` directive was a stopgap measure to allow a macro-heavy shader library to incrementally adopt `import`, but it has turned out to cause as many problems as it fixes (not least because users have never been able to form a good mental model around which kind of import to do when). This change yanks support for the feature.
*	Fix handling of errors in imported modules (#387)	Tim Foley	2018-01-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Fix handling of errors in imported modules - If a semantic error is detected in an imported module, then don't try to generate IR code for it - Also, if a module (transitively) imports itself, then report that as an error - The way I'm checking for this is a bit hacky (I'm adding the module to the map of loaded modules, but in an "unfinished" state, and then using that unfinished state to detect the import of a module already being imported). This isn't a 100% complete solution for any of the related problems, but it improves the user experience for the common case. * Remove #import test. The feature is slated to be removed, so it isn't worth fixing up this test case.
*	Fix some crashing bugs around local variable declarations. (#385)	Tim Foley	2018-01-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The basic problem here arises when a local variable is used either before its own declaration: ```hlsl int a = b; ... int b = 0; ``` or when a local variable is used in its own decalration: ```hlsl int b = b; ``` In each case, Slang considers the scope of the `{}`-enclosed function body (or nested statement) as a whole, and so the lookup can "see" the declaration even if it is later in the same function. This behavior isn't really correct for HLSL semantics, so the right long-term fix is to change our scoping rules, but for now users really just want the compiler to not crash on code like this, and give an error message that points at the issue. This change makes both of the above examples print an error message saying that variable `b` was used before its declaration, which is accurate to the way that Slang is interpreting those code examples. This is currently treated as a fatal error, so that compilation aborts right away, to avoid all of the downstream crashes that these cases were causing.
*	Improvements and bug fixes for global type parameters	Yong He	2018-01-21
\| \| \| \| \| \|	1. allow spReflection_FindTypeByName to accept arbitrary type expression string 2. allow const int generic value to be used as expression value, and as array size 3. various bug fixes in witness table specialization / function cloning during specializeIRForEntryPoint to avoid creating duplicate global values, not copying the right definition of a function from the other module, not cloning witness tables that are required by specializeGenerics etc.
*	Allow arbitrary type string as type argument in spAddEntryPointEx.	Yong He	2018-01-19
\|
*	Bug fixes for Slang integration (#356)	Yong He	2018-01-04
\| \| \| \| \| \| \| \| \| \| \| \|	* fix #353 * move validateEntryPoint to after all entrypoints has been checked * bug fix: DeclRefType::SubstituteImpl should change ioDiff * bug fix: generic resource usage should have count of 1 instead of 0. * update test case
*	Fix type lookup of global type arguments	Yong He	2018-01-03
\| \| \| \|	Global type argument lookup should be done in both loaded modules and current trnaslation units. This is the same as the logic of spReflection_FindTypeByName, so it is extracted into `CompileRequest::lookupGlobalDecl(Name*)` method and reused in places.
*	no-codegen compile flag and global generics reflection (#347)	Yong He	2018-01-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* no-codegen compile flag and global generics reflection 1. Add SLANG_COMPILE_FLAG_NO_CODEGEN (-no-codegen) compiler flag to skip code generation stage, so that a shader that uses global generic type parmameters can be parsed, checked and introspected without knowing the final specialization. 2. Add reflection API to query for global generic type parameters, global parameters of generic type, and the generic type parameter index related to a global generic parameter. 3. Add a reflection test case for global generic type parameters. * add expected result for global-type-params test case. * fix reflection json output. * fix branch condition errors * fix expected result for global-type-params.slang * fix expected test case output
*	Enable HLSL/GLSL "rewrite" + IR-based Slang codegen (#300)	Tim Foley	2017-11-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The big picture here is that the AST-to-AST pass in `ast-legalize` will now detect when a declaration being referenced comes from an `import`ed module, and (if IR codegen is enabled), it will trigger cloning of the IR for the chosen symbol into an IR module that will sit alongside the legalized AST. Then, during HLSL/GLSL code emit, we emit all the IR-based code first, and then the AST-based code. Whenever the AST code references a symbol that was lowered via IR (we keep track of these) we emit the mangled name of the IR symbol. Notes/details: - A lot of the logic for cloning IR symbols referenced by the AST matches the same logic that would clone them for completely IR-based codegen, so I tried to hoist out the common logic and share it (e.g., so that we apply the same guaranteed transformations in both cases). This required basically rewriting the logic in `emit.cpp` that decomposed the various cases. - There is a new compute test case added to test this functionality. `tests/compute/rewriter.hlsl` confirms that we can use the `-no-checking` mode for the HLSL code, but still make use of a library of Slang code that employs generics, etc. - Adding this test case required adding a new compute test mode that invokes `render-test` with the `-hlsl-rewrite` flag. - It turns out that the existing `tests/render/cross-compile0.hlsl` test should have been using this functionality already. It was opting into the use of the IR via `-use-ir`, and the `render-test` application already tries to set `-no-checking` for non-Slang input languages by default. Fixing the code path this test triggers means that it is now a second test of rewriter+IR codegen. - The `translateDeclRef` logic in `ast-legalize.cpp` seemed sloppy in places, and would potentially clone declarations, when declaration references were desired. I tried to clean a bit of this up, so some call sites are now changed. - This change tries to clean up some work around cloning of global values - All global value kinds (not just functions) now go through the logic of trying to pick a "best" definition, so that they can be used when we are linking multiple modules - The logic for registering cloned values has been unified a bit, so that clients always pass in an `IROriginalValuesForClone` that either wraps a single value (maybe just null), or an `IRSpecSymbol` that gives a list of values to regsiter the new value as a clone for. - I made one piece of code that was cloning witness tables as part of generic specializations not* register a clone. I think this is correct because we may specialize the same generic multiple ways, so registering any values we clone is not the right idea, but I might be missing something... - I also reorganized this logic so that it would be easier to clone a global value when we only know its mangled name (which is the case when it is the AST that triggers cloning) - I made sure that when loading a module via `import`, the translation unit for the new module copies the `-use-ir` flag from the overall compile request, if it is present (otherwise we wouldn't generate IR for loaded modules at all... oops). - Note that `getSpecializedGlobalValueForDeclRef()`, which is the main routine used by the AST legalization to trigger cloning of an IR value does not currently handle declaration references that require specialization. - This change does not deal with trying to unify the type legalization logic between the AST-to-AST rewriter and the IR-based codegen, so if you call an imported function with types that require legalization, Bad Things are expected to happen right now.
*	Generate IR per-module for loaded modules (#299)	Tim Foley	2017-11-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The basic idea here is that for each module that gets loaded via `import`, we should also generate the initial IR for the declarations in that module at the time it gets loaded. Furthermore, when we generate initial IR for a module, we will only generate IR declarations (not definitions) for any functions/variables in modules it imports. Later, when cloning IR to begin code generation for an entry point, we will effectively "link" all of the loadedm modules together, so that a given global value can get its definition from any of the IR modules present. - Change the `loadedModulesList` and related data structures to hold a new `LoadedModule` type, instead of just the AST (and then have a `LoadedModule` own both the AST and the IR module) - Share some logic between the `import` and `#import` cases, so that we always try to generate IR for modules we load. - Make sure that IR generation always gets skipped if the command-line flags tell us not to use the IR. - A few small fixups for cases that didn't arise in IR lowering so far, but come up when we try to actually generate IR for things like the stdlib. There are some notable gaps in this work right now: - The stdlib modules are exempted from this behavior; we always generate IR for stdlib functions in any user module that calls them. This is just a workaround for the fact that the stdlib modules don't show up in the list of imported modules right now. - We don't currently have logic that does the "linking" step for global variables like we do for functions. We really need to look up the symbols with the same mangled name, and favor any one of them that has a definition (if there is one) - Similarly, the handling of witness tables is incomplete. During initial IR generation, we should probably be generating empty witness tables for any conformances that were declared in other modules (but are being used locally in this module), and then the "linking" step should favor non-empty witness tables over empty ones. Still, all the test cases pass with the code like this, and this seems like an important step in the right direction.
*	Add support for global generic parameters (#285)	Yong He	2017-11-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Add support for global generic parameters (In-progress work) This commit include: 1. Update Slang API to allow specification of generic type arguments in an `EntryPointRequest` 2. Add parsing of `__generic_param` construct, which becomes a GlobalGenericParamDecl, contains members of `GenericTypeConstraintDecl`. 3. Semantics checking will check whether the provided type arguments conform to the interfaces as defined by the generic parameter, and store SubtypeWitness values in the EntryPointRequest, which will be used by `specializeIRForEntryPoint` when generating final IR. 4. Add a new type of substitution - `GlobalGenericParamSubstitution` for subsittuting references to `__generic_param` decls or to its member `GenericTypeConsraintDecl` with the actual type argument or witness tables. 5. Update `IRSpecContext` to apply `GlobalGenericParamSubstitution` when specializing the IR for an EntryPointRequest. 6. Update `render-test` to take additional `type` inputs, which specifies the type arguments to substitute into the global `__generic_param` types. This commit does not include ProgramLayout specialization. * IR: pass through `[unroll]` attribute (#284) The initial lowering was adding an `IRLoopControlDecoration` to the instruction at the head of a loop, but this was getting dropped when the IR gets cloned for a particular entry point. The fix was simply to add a case for loop-control decorations to `cloneDecoration`. * fix warnings * IR: support `CompileTimeForStmt` (#286) This statement type is a bit of a hack, to support loops that must be unrolled. The AST-to-AST pass handles them by cloning the AST for the loop body N times, and it was easy enough to do the same thing for the IR: emit the instructions for the body N times. The only thing that requires a bit of care is that now we might see the same variable declarations multiple times, so we need to play it safe and overwrite existing entries in our map from declarations to their IR values. Of course a better answer long-term would be to do the actual unrolling in the IR. This is especially true because we might some day want to support compile-time/must-unroll loops in functions, where the loop counter comes in as a parameter (but must still be compile-time-constant at every call site). * Add support for global generic parameters (In-progress work) This commit include: 1. Update Slang API to allow specification of generic type arguments in an `EntryPointRequest` 2. Add parsing of `__generic_param` construct, which becomes a GlobalGenericParamDecl, contains members of `GenericTypeConstraintDecl`. 3. Semantics checking will check whether the provided type arguments conform to the interfaces as defined by the generic parameter, and store SubtypeWitness values in the EntryPointRequest, which will be used by `specializeIRForEntryPoint` when generating final IR. 4. Add a new type of substitution - `GlobalGenericParamSubstitution` for subsittuting references to `__generic_param` decls or to its member `GenericTypeConsraintDecl` with the actual type argument or witness tables. 5. Update `IRSpecContext` to apply `GlobalGenericParamSubstitution` when specializing the IR for an EntryPointRequest. 6. Update `render-test` to take additional `type` inputs, which specifies the type arguments to substitute into the global `__generic_param` types. progress on parameter binding * Add a more contrived test case for specializing parameter bindings * update render-test to align buffers to 256 bytes (to get rid of D3D complains on minimal buffer size). * adding one more test case for parameter binding specialization. * Cleanup according to @tfoleyNV 's suggestions. * fix a bug introduced in the cleanup
*	Parameter block work (#276)	Tim Foley	2017-11-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Don't auto-enable IR use for compute tests The `COMPARE_COMPUTE` and `COMPARE_RENDER_COMPUTE` test fixtures were set up to always enable the `-use-ir` flag on Slang, which precludes having any tests that confirm functionality on the old non-IR path (which is still required by our main customer). This change adds the `-xslang -use-ir` flags explicitly to any compute test cases that left them out, and makes the fixture no longer add it by default. * Continue building out parameter block support The initial front-end logic for parameter blocks was already added, but they are still missing a bunch of functionality. This change addresses some of the known issues: - Bug fix: don't try to emit HLSL `register` bindings for variables that consume whole register spaces/sets - Overhaul type layout logic so that it can make decisions based on a given code generation target (currently passed in as a `TargetRequest`), which allows us to decide whether or not a parameter block should get its own register set on a per-target basis. - Always use a register space/set for Vulkan - Never use a register space/set for HLSL SM 5.0 and lower - By default, don't use register spaces/sets for HLSL output - Add a command-line flag and some "target flags" to enable register-space usage for D3D targets - Hackily add initial support for parameter blocks in the AST-to-AST path - This just blindly lowers `ParameterBlock<T>` to `T`, which shouldn't quite work - A more complete overhaul will probably need to wait until the AST-to-AST legalization is changed to use the `LegalType`s from the IR legalization pass. - Add a compute-based test case to actually run code using parameter blocks - This file runs test cases both with and without the IR