slang.git - Making it easier to work with shaders

Age	Commit message (Collapse)	Author
2019-03-07	Fix problems with synthesized tests and inconsitent render-test command ↵	jsmall-nvidia
	lines (#885) * * Check for inconsistent command line options for renderer * Moved RenderApiUtil into core so can be used in slang-test * Make it use the ShaderdLibrary for API testsing * Added some simplifying functions to StringUtil for spliting/comparisons * Refactored the synthesis of rendering tests so that inconsistent combinations are not produced * Add missing slang-render-api-util.cpp & .h * Stop warning on linux about _canLoadSharedLibrary not being used.
2019-03-05	Hotfix/texture2d gather (#876)	jsmall-nvidia
	* First pass test to see if GatherRed works. * Add support for generating R_Float32 textures. * Set default texture format. * * Alter the texture2d-gather to work with a R_Float32 texture * Add support for scalar Texture2d types with GatherXXX in stdlib * Remove some left over commented out test code from texture2d-gather.hlsl
2019-03-02	#include not using search paths (#873)	jsmall-nvidia
	* Fix warnings from visual studio due to coercion losing data. * Removed searchDirectories from FrontEndCompileRequest and use the one in Linkage as that is the one that is changed via Slang API. * * Add searchPaths back to FrontEndRequest * Add comments to explain the issue * Add a test to check include paths
2019-02-27	Hotfix/fix stdlib error reporting (#866)	jsmall-nvidia
	* * Add 'identity' version of bit casts (asint, asuint, asfloat) for scalar and vector * Added identity bit casts for matrix (cos no op). We don't support matrix asint on glsl targets * Added tests in bit-cast.slang * Use kIRPseudoOp_Pos for identity asuint/asint/asfloat casts. * * Stop crash if error in stdlib * Use the buildin source manager when compiling stdlib - fixes that line numbers are displayed * Typo fix * Output line directives for 'meta' slang souce files into stdlib * Improve comments and function names.
2019-02-27	Hotfix/device check review (#862)	jsmall-nvidia
	* Fix typo on return type. * * Inverted order of FlagCombiner (to make more 'nested for' like) * On Dx12 just use D3D_FEATURE_LEVEL_11_0 * Fix typo on dll name
2019-02-26	Dx11 & Dx12 device startup (#861)	jsmall-nvidia
	* Added CombinationUtil to produce combinations of flags Used in Dx11 device creation making it fall back to release driver if debug driver is not found * Made dx12 renderer startup similar to dx11 - testing multiple configs. * Small improvements in naming. * * Moved functionality to gfx from core * Use FlagCombiner to simplify construction, and can be iterated over, without need for array * Share DeviceCheckFlags * Improve comments. * Re-add the comment about combinations tested to set up dx11 device. * More comment improvements.
2019-02-19	First steps toward supporting interface-type parameters on shaders (#852)	Tim Foley
	* First steps toward supporting interface-type parameters on shaders What's New ---------- From the perspective of a user, the main thing this change adds is the ability to declare top-level shader parameters (either at global scope, or in an entry-point parameter list) with interface types. For example, the following becomes possible: ```hlsl // Define an interface to modify values interface IModifier { float4 modify(float4 val); } // Define some concrete implementations struct Doubler : IModifier { float4 modify(float4 val) { return val + val; } } struct Squarer : IModifier { ... } // Define a global shader parameter of interface type IModifier gGlobalModifier; // Define an entry point with an interface-type `uniform` parameter void myShader( unifrom IModifier entryPointModifier, float4 inColor : COLOR, out float4 outColor : SV_Target) { // Use the interface-type parameters to compute things float4 color = inColor; color = gGlobalModifier.modify(color); color = entryPointModifier.modify(color); outColor = color; } ``` The user can specialize that shader by specifying the concrete types to use for global and entry-point parameters of interface types (e.g., plugging in `Doubler` for `gGlobalModifier` and `Squarer` for `entryPointModifier`). The "plugging in" process is done in terms of a concept of both global and local "existential slots" which are a new `LayoutResourceKind` that represents the holes where concrete types need to be plugged in for existential/interface types. In simple cases like the above, each interface-type parameter will yield a single existential slot in either the global or entry-point parameter layout. Users can query the start slot and number of slots for each shader parameter, just like they would for any other resource that a parameter can consume. Before generating specialized code, the user plugs in the name of the concrete type they would like to use for each slot using `spSetTypeNameForGlobalExistentialSlot` and/or `spSetTypeNameForEntryPointExistentialSlot`. There are some major limitations to the implementation in this first change: * Parameters must be of interface type (e.g., `IFoo`) and not an array (`IFoo[3]`), or buffer (`ConstantBuffer<IFoo>`) over an interface type. Similarly, `struct` types with interface-type fields still don't work. * The work on interface-type function parameters still doesn't include support for `out` or `inout` parameters, nor for functions that return interface types (that isn't technically related to this change, but affects its usefullness). * No work is being done to correctly lay out shader parameters once the concrete types for existential slots are known, so that this change really only works when the concrete type that gets plugged in is empty. These limitations are severe enough that this feature isn't really usable as implemented in this change, and this merely represents a stepping stone toward a more complete implementation. Implementation -------------- The API side of thing largely mirrors what was already done to support passing strings for the type names to use for global/entry-point generic arguments, so there should be no major surprises there. The logic in `check.cpp` computes the list of existential slots when creating unspecialized `Program`s and `EntryPoint`s (this is logically the "front end" of the compiler), and then checks the supplied argument types against what is expected in each slot when creating specialized `Program`s and `EntryPoint`s. This again mirrors how generic arguments are handled. Type layout was extended to compute the number of existential slots that a type consumes, and will thus automatically assign ranges of slots to top-level and entry-point shader parameters in the same way it already allocates `register`s and `binding`s. The big missing feature is the ability to specialize a layout to account for the concrete types plugged into the existential-type slots. IR generation for specialized programs and entry points was slightly extended so that it attaches information about the concrete types plugged into the existential slots, and the witness tables that show how they conform to the interface for that slot. The linking step needed some small tweaks to make sure that information gets copied over to the target-specific program when we start code generation. The meat of the IR-level work is in `ir-bind-existentials.cpp`, which takes the information that was placed in the IR module by the generation/linking steps and uses it to rewrite shader parameters. For example, if there is a shader parameter `p` of type `IModifier`, and the corresponding existential slot has the type `Doubler` in it, we will rewrite the parameter to have type `Doubler`, and rewrite any uses of `p` to instead use `makeExistential(p, /witness that Doubler conforms to IModifier/)`. Once the replacement is done on the parameters, the existing work for specializing existential-based code when the input type(s) are known kicks in and does the rest. Testing ------- A single compute test is added to validate that this feature works. It is narrowly tailored to not require any of the features not supported by the initial implementation (e.g., all of the concrete types used have no members). The test case does include use of an associated type through one of these existential-type parameters, which has exposed a subtle bug in how "opening" of existential values is implemented in the front-end. Rather than fix the underlying problem, I cleaned up the code in the front-end to special case when the existential value being opened is a variable bound with `let`, to directly use a reference to that variable rather than introduce a temporary. Similarly, in the IR generation step, I added an optimization to make variables declared with `let` skip introducing an IR-level variable and just use the SSA value of their initializer directly instead. * fixup: missing files * fixup: incorrect type for unreachable return * fixup: actually comment ir-bind-existentials.cpp
2019-02-15	Split front- and back-ends (#846)	Tim Foley
	* Split front- and back-ends This change is a major refactor of several of the types that provide the behind-the-scenes implementation of the public C API. The goal of this refactor is primarily to allow for future API services that let the user operate both the front- and back-ends of the compiler in a more complex fashion. For example, as user should be able to compile a bunch of source code into modules, look up types, functions, etc. in those modules, specialize generic types/functions to the types they've looked up, and then finally request target code to be gernerated for specialized entry points. The back-end code generation they trigger should re-use the front-end compilation work (parsing, semantic checking, IR generation) that was already performed. The most visible change is that `CompileRequest` has been split up into several smaller types that take responsibility for parts of what it did: * The `Linkage` type owns the storage for `import`ed modules, and well as the `TargetRequest`s that represent code-generation targets. The intention is that an application could use a single `Linkage` for the duration of its runtime (so long as it was okay with the memory usage), so that each `import`ed module only gets loaded once. For now, this type needs to manage the search paths, file system, and source manager, because of its responsibility for loading files. * A `FrontEndCompileRequest` owns the stuff related to parsing, semantic checking, and initial IR generation. This most notably includes the `TranslationUnitRequest`s and the `FrontEndEntryPointRequest`s (which used to be just `EntryPointRequest`s). It's main job is to produce AST and IR modules for each translation unit, and to find and validate the entry points. The front-end request does not interact with generic arguments for global or entry-point generic parameters. * The main output of both `import` operations and front-end translation units is the `Module` type, which is just a simple container for both the AST module (to service the reflection/layout APIs, and also for semantic checking of code that `import`s the module) and the IR module (for linking and code generation). This type captures the commonalities between the old `LoadedModule` (which is now just an alias for `Module`) and `TranslationUnitRequest` (which now owns a `Module`). * The secondary output of front-end compilation is a `Program`, which comprises a list of referenced `Module`s and validated `EntryPoint`s that will be used together. Layout and code generation both need a `Program` to tell them what modules and entry points will be used together (we don't want to just code-gen everythin that has ever been loaded into the linakge). The `Program`s created by the front-end do not include generic arguments, so they may provide incomplete layout information and/or be unsuitable for code generation. * A `BackEndCompileRequest` owns stuff related to turning a `Program` into output kernels for the targets of a `Linkage`. Most of the data it owns beyond the `Program` to be compiled is minor, so this is a good candidate for demotion from a heap-allocated object to just a `struct` of options that gets passed around. * The `CompileRequestBase` type is an attempt to wrap up the common functionality of both front-end and back-end compile requests. Most of it is just exposing the availability of a linkage and `DiagnosticSink`, so this type is a good candidate for subsequent removal. The main interesting thing it has is the flags related to dumping and validation of IR, so there is probably a good refactoring still to be made around deciding how options should be handled going forward. * Behind the scenes, the `Program` type is set up to handle some level of on-line compilation and layout work. The `Program` knows the `Linkage` it belongs to, and allows for a `TargetProgram` to be looked up based on a specific `TargetRequest`. A `TargetProgram` then allows layout information and compiled kernel code to be asked for on-demand, in order to support eventual "live" compilation scenarios. * The `EndToEndCompileRequest` type is a composition/coordination type that replaces the old `CompileRequest` in a way that uses the services of the various other types. It owns a few pieces of state that only make sense in the context of an end-to-end compile (e.g., there is really no way to "pass through" code when the front- and back-ends are run separately) or a command-line compile (everything to do with specifying output paths for files is really just for the benefit of `slangc`, and might even be moved there over time). * One important detail is that the `EndToEndCompilRequest` owns all of the string-based generic arguments for both global and entry-point generic parameters. The logic in `check.cpp` for dealing with those arguments has been heavily refactored to separate out the parsings steps that are specific to end-to-end compilation with string-based type arguments, and the semantic checking steps that result in a specialized `Program` (which can be exposed through new APIs that aren't tied to end-to-end compilation). It is perhaps not surprising that this change had a lot of consequences, so I'll briefly run over some of the main categories of changes required: * I changed the way that global generic arguments are passed via API (use `spSetGlobalGenericArgs` instead of the generic arguments for `spAddEntryPointEx`, which are not just for entry-point generics), which has been a change that we've needed for a long time. This is technically a breaking API change, although we should have very few client applications that care about it. * A bunch of places that used to take "big" objects like `CompileRequest` now just take the sub-pieces they care about (e.g., a function might have only needed a `Linkage` and a `DiagnosticSink`). This makes many subroutines or "context" struct types more generally useful, at the cost of taking more parameters. * In a few cases the conceptually clean separation of the layers breaks down (often for edge-case or compatibility features), and so we may pass along additional objects that are allowed to be null, but are used when present. A big example of this is how the back-end code generation routines accept an `EndToEndCompileRequest` that is optional, and only used to check whether "pass through" compilation is needed. We should probably look into cleaning this kind of logic up over time so that we don't need to violate the apparent separation of phases of compilation. * In cases where separation of layers was being broken for the sake of GLSL features, I went ahead and ripped them out, since all of that should be dead code anyway. * In many cases I increased the encapsulation of data in the core types to help track down use sites and make sure they are following invariants better. * In cases where code was doing, e.g., `context->shared->compileRequest->session->getThing()` I have tried to introduce convenience routines so that the usage site is just `context->getThing()` to improve encapsulation and allow changes to be made more easily going forward. * The `noteInternalErrorLoc` functionality was moved off of the compile request and into `DiagnosticSink`, since that is the one type you can rely on having around when you want to note an internal error. We may consider going forward if (and how) it should reset the counter used for noting locations on internal errors. * A few APIs now take `DiagnosticSink` arguments where they didn't before, and as a result some public APIs need to create `DiagnosticSink`s to pass in, before going ahead and ignoring the messages. In the future there should be variations of these APIs that accept an `ISlangBlob` parameter for the output. fixup: missing include for compilers with accurate template checking (non-VS) * fixup: review feedback
2019-02-14	Fix for Dx12 to stop crash when dxil cannot be handled by driver (#851)	jsmall-nvidia
	* If there is a problem with initialize RenderTestApp::initialize constructing the pipeline, this was not being reported as an error causing a later crash. * Use same style to return error.
2019-02-11	MemoryArena rewindability/Improved IRInst construction (#837)	jsmall-nvidia
	* Make MemoryArena rewindable. * Add test for rewinding MemoryArena * Use memory rewinding in IRInst lookup instead of malloc/free. * Small tidy. * Don't bother recreating instruction if after lookup it's found it's new. * Fix 32 bit signed compare issue.
2019-02-07	Hotfix/remove null this work around (#831)	jsmall-nvidia
	* Re-enable warnings around null this. * Remove testing for nullptr in Substitution::Equals tests * Fix ref counting problem in vulkan render. * * Remove SLANG_ASSERT(this) in mthods * Place asserts conservatively at method call sites where appropriate.
2019-01-29	Add support for user defined attributes.	Yong He

2019-01-21	Feature/file unique identity (#789)	jsmall-nvidia
	* * Fix memory bug around expanding va_args - needed buffer to have space for terminating 0 * Fix problem with FileWriter defaults being globals, as memory they allocate, will only be freed after return from main - work around by making StdWriters RefObject derived, and kept in scope such the writers are destroyed before checks for leaks is found * Added SimplifyPathAndHash mode for CacheFileSystem - will simplify the path and see if simplified path is in cache before reading file (limiting amout of underlying file requests) * * Added calcReplaceChar * Renamed DefaultFileSystem to OSFileSystem * Made OSFileSystem convert windows \ to / on linux * Simplified logic for caching in CacheFileSystem. * Added pragma-once-c to add extra test, but also so there is an 'include' directory in preprocessor tests. * Small fixes in pragma once test. * Simplified cache handling path, so that paths/simplified paths area always added. * Improve naming of methods for different caches. * Removed references to 'canonicalPath' and made 'uniqueIdentity' * * Re-add support for canonicalPath to ISlangFileSystem -> not for uniqueIdentifier but as a way to display 'canonicalPath' * Added peliminary support for being able to display verbose paths in a diagnostic * Added 'clearCache' support * Added verbose path support to SourceManager (now needs a ISlangFileSystemExt to do this) * Added support for '-verbose-path' option to slangc and slang-test.
2019-01-21	Path simplification/hash mode, plus bug fixes (#788)	jsmall-nvidia
	* * Fix memory bug around expanding va_args - needed buffer to have space for terminating 0 * Fix problem with FileWriter defaults being globals, as memory they allocate, will only be freed after return from main - work around by making StdWriters RefObject derived, and kept in scope such the writers are destroyed before checks for leaks is found * Added SimplifyPathAndHash mode for CacheFileSystem - will simplify the path and see if simplified path is in cache before reading file (limiting amout of underlying file requests) * * Added calcReplaceChar * Renamed DefaultFileSystem to OSFileSystem * Made OSFileSystem convert windows \ to / on linux * Simplified logic for caching in CacheFileSystem. * Added pragma-once-c to add extra test, but also so there is an 'include' directory in preprocessor tests. * Small fixes in pragma once test. * Simplified cache handling path, so that paths/simplified paths area always added. * Improve naming of methods for different caches.
2019-01-17	Feature/hash for source identity (#786)	jsmall-nvidia
	* * Added COMMAND_LINE_SIMPLE test type * Made how spawning works controllable by paramter/type SpawnType * Made break-outside-loop and global-uniform run command line slangc * calcRelativePath -> calcCombinedPath * Add 64 bit version of GetHash. * Add support for Hash based mode for CacheFileSystem.
2019-01-16	Feature/external compiler reporting (#776)	jsmall-nvidia
	* Added support for converting SlangResult to string in PlatformUtil. * * Added reportExternalCompilerError * Made external compilers use this * Made DiagnosticSink accept UnownedStringSlice * Made emitXXX compiler functions return SlangError * Use smart pointers to handle life of Com interfaces * * Make SlangResult compatible with HRESULT for some common cases. * Make PlatformUtil::appendResult return SlangResult * Compile check SLANG_RESULT. * Add tests for checking diagnostics from external compilers. * * Make external compiler tests only run on windows for now. * Added 'windows' and 'unix' categories * Added categories based on what backends are available. Will make more tests run on linux and handle case where dxcompiler is not available on appveyor. * * Added spSessionCheckPassThroughSupport * Use to determine whats available for categories for tests * Add support for outputting source filename/s when using pass through.
2019-01-16	Add proper IR codegen support for local static const variables (#779)	Tim Foley
	Previously the IR codegen logic was treating function-scope `static const` variables just like `static` variables, which results in them generating less efficient output HLSL/GLSL. This change special-cases function-local `static const` variables with logic that mirrors how we handle global-scope `static const` variables. The approach in this change attempts to find a simpler solution to deal with `static const` variables inside of generic functions than what is currently done for `static` variables in generic functions, but I haven't tested whether that works in practice, so I didn't apply the same approach to the plain `static` case. That would make a good follow-on change. I've included a single test case to demonstrate that with this fix the Slang compiler generates output DXBC that uses an indexable "immediate" constant buffer, whereas without the fix it generates an array in local memory (slow).
2019-01-11	Fix some subtle bugs in D3D constant buffer layout (#771)	Tim Foley
	* Fix some subtle bugs in D3D constant buffer layout The root of the issue here is that the D3D constant buffer layout rules require 16-byte alignment for arrays and structures, but they do not round up the size of an array/structure type to account for that alignment. That means that in cases like the following: ```hlsl cbuffer C0 { float3 a[2]; float c0; } struct A { float4 x; float3 y; }; cbuffer C1 { A a; float c1; } ``` The `c0` and `c1` fields get an offset of 28 and not 32 like you might expect if the preceding array/structure field `a` had been padded out to match its 16-byte alignment. The actual fix here is relatively simple, and mostly amount to shuffling around some code in `type-layout.cpp` to ensure that the D3D constant buffer layout don't inherit the logic that was rounding up array/structure sizes. Along the way I took the opportunity to clean up the inheritance hierarchy by making the GLSL-family layout rules not try to share anythign with the D3D family (not that there is very little to share), which in turn allowed for some simplification of the GLSL side of things. Fixing this behavior changed the output of a few reflection tests. In the case of `tests/reflection/arrays.hlsl` the output confirmed that we had been producing bad reflection information in these kinds of cases. The output for `tests/reflection/matrix-layout.slang` also showed some bugs in our reflection, but these were overall more minor: we mis-reported the size of certain matrices as 64 bytes instead of 60, and as a result also computed the size of the overall constant buffer as 4 bytes bigger than needed. In all of these cases I double-checked the expected output against dxc to make sure that the new offsets/sizes are what we should have been producing in the first place. I also updated the reflection test harness to start outputting layout information for the element type of a structured buffer, which changed the output of `tests/reflection/structured-buffer.slang`, but this didn't show any change in what we reported: it is just information that wasn't in the output to begin with. Finally, I added two new tests around these subtle cases of buffer layout behavior (especially subtle because it varies across target APIs). The `tests/compute/buffer-layout.slang` test simply sets up a type to ilustrate the troublesome scenarios and then embeds it in both a constant buffer and structured buffer that will be backed by memory with sequential `int` values. We then read out the value of a field as a way to probe its de facto offset at runtime. This test doesn't really stress the Slang compiler (except for our ability to pass through the same type declarations to downstream compilers), but it is useful to confirm our expectations about where things land in memory. The `tests/reflection/buffer-layout.slang` test then uses the reflection test infrastructure to confirm that the same type declarations used in the compute test produce the expected offsets in our reported reflection information. Before the fixes in this change this test showed us producing dangerously incorrect results in our D3D reflection information, which has now been fixed to match the empirically-determined offsets from the compute test. * fixups based on review feedback
2019-01-07	Feature/unique tool source names (#766)	jsmall-nvidia
	* Remove AppContext. Use StdChannels to hold writers, and TestToolUtil to hold test tool specific functionality. * StdChannels -> StdWriters * getStdOut -> getOut, getStdError -> getError * Renamed main.cpp files of tools to try and stop visual studio getting confused between files - such that clicking on an error takes editor to the right location.
2018-12-21	Feature/remove app context (#765)	jsmall-nvidia
	* Remove AppContext. Use StdChannels to hold writers, and TestToolUtil to hold test tool specific functionality. * StdChannels -> StdWriters * getStdOut -> getOut, getStdError -> getError
2018-12-21	* Made 'sub command' git-like - with parameters before going to slang-test ↵	jsmall-nvidia
	and after going to the tool (#764) * Document some of the changes to command line invocation * Make -v option display the effective command that is being used to run the test
2018-12-20	Feature/lex memory reduction (#762)	jsmall-nvidia
	* Only do scrubbing if needed. When allocating content try to limit size (with scrubbing each token takes up 1k), now it's 16 bytes min size. * Don't allocate for every call to write on the CallbackWriter - use the m_appendBuffer. * Don't allocate memory for CallbackWriter use m_appendBuffer. * Use UnownedStringSlice for suffix output for parsing float/int literals. Fix typo in invalidFloatingPointLiteralSuffix * Using memory arena to hold tokens that are not in SourceManager. * Improve comment on lexing. * Make UnownedStringSlice allocation simpler on SourceManager. * Fix error on gcc around UnownedStringSlice - because VC converted string + UnownedStringSlice automatically into a String. * Fix generateName needing concat string for gcc. * When constructing a Token in parseAttributeName - because it's a Identifier, we have to set the Name. * Remove translation through String on getIntrinsicOp * Make func-cbuffer-param disablable with -exclude compatibility-issue * Move memory leak in render-test. * From review - can just use "?:" instead of performing a concat.
2018-12-18	Fix for byte-address buffers on Vulkan (#760)	Tim Foley
	* Fix output comparison for compute tests There was some vestigial logic there that was first doing a string-based comparison of actual/expected output, and then falling back to a path that parsed the expected output as a float, then converted that to an integer, then printed that integer in hex, and did the comparison with the result of that conversion. I'm not even clear on what that code was trying to accomplish, but its main effect was allowing a test failure to slide by unnoticed becaues somehow an all-zeroes actual output file was matching an expected output file with no zeros. My understanding is that it went something like this: * The first line of expected output was `A` (as in hexidecimal for the decimal integer `10`), and the first line of actual output was `0`. * The `StringToFloat` function was failing on the input string `"A"` and returned `0.0` to indicate failure (rather than reporting any kind of error) * We then converted the `0.0` to integer `0` and converted it to a base-16 string `"0"` * The comparison to the actual output passed, and then a careless early exit in the comparison loop meant that a full test would pass as soon as one line of output passed according to this "second change" logic. This change removes the broken code in the test runner since nothing was relying on it, other than the one broken test case we wanted to fix anyway. * Fix the declarations of byte-address buffer methods for Vulkan The HLSL `ByteAddressBuffer` and `RWByteAddressBuffer` types have methods `Load` and `Store` that take byte offsets from the start of the buffer, but we translate them into GLSL that uses `uint[]` array, so that indexing into that array will be off by a factor of four. Somehow the code for mutable byte address buffers was written to add 4, 8, and 12 bytes to the base offset of a vector to get to its subsequent components, but I never thought about the implications this would have for the base address itself. This change includes the following fixes: * Any place in the translation of a byte-address `Load` or `Store` method that was using the address/offset value has been changed to use `$1 / 4` instead of `$1`. * The offsets of 4, 8, and 12 have been changed to 1, 2, and 3 since they are now being added to an index instead of a byte offset * The `GetDimensions` methods have introduced a factor of `* 4` to account for the fact that they need to return a byte size and not a count of elements. With this change the existing `byte-address-buffer` test now produces the desired output under Vulkan.
2018-12-17	Feature/test tool shared libraries (#758)	jsmall-nvidia
	* Remove circular reference to renderer on Vk & D3D12 DescriptorSetImpl * Refactor Stbi image loading such that memory is correctly freed when goes out of scope. Added Crt memory dump at termination. Reduced erroneous reporting by scoping TestContext. * Used capitalized acronym for STBImage to keep Tim happy. * Split out TestReporter - to just handle reporting test results Split out Options Made TestContext hold options, and the reporter Removed remaining memory leaks. * Small optimization for rawWrite, such that it directly writes over print.. * Improve comments on TestCategorySet * Fix typos in TestCategorySet * Made slangc a cpp file as part of slang-test (removing need for separate project/shared library). * * Made all test tools only available as dlls. * Made possible to invoke test tool dll from command line slang-test slangc [--bindir xxx] options to slangc * Fix Visual Studio projects that are no longer needed.
2018-12-14	Fix memory leaks around slang-test (#757)	jsmall-nvidia
	* Remove circular reference to renderer on Vk & D3D12 DescriptorSetImpl * Refactor Stbi image loading such that memory is correctly freed when goes out of scope. Added Crt memory dump at termination. Reduced erroneous reporting by scoping TestContext. * Used capitalized acronym for STBImage to keep Tim happy. * Split out TestReporter - to just handle reporting test results Split out Options Made TestContext hold options, and the reporter Removed remaining memory leaks. * Small optimization for rawWrite, such that it directly writes over print.. * Improve comments on TestCategorySet * Fix typos in TestCategorySet
2018-12-13	Remove memory and resource leaks (#754)	jsmall-nvidia
	* Remove circular reference to renderer on Vk & D3D12 DescriptorSetImpl * Refactor Stbi image loading such that memory is correctly freed when goes out of scope. Added Crt memory dump at termination. Reduced erroneous reporting by scoping TestContext. * Used capitalized acronym for STBImage to keep Tim happy.
2018-12-12	Running tests in slang-test process (#740)	jsmall-nvidia
	* First pass at having an interface to write text to that can be replaced. Simplifed and made more rigerous the interface used to write formatted strings. * Added AppContext to simplify setting up and parsing around of streams. * Added more simplified way to get the std error/out from AppContext. * Work in progress using dll for tools to speed up testing. * First pass at ISlangWriter interface. * Added support for writing VaArgs. Added NullWriter. * Use ISlangWriter for output. * Use ISlangWriter for output - replacing OutputCallback. Make IRDump go to ISlangWriter * SlangWriterTargetType -> SlangWriterChannel Improvements around AppContext * Shared library working with slang-reflection-test. * Dll testing working for render-test. * Include va_list definintion from header. * Fix errors from clang. * Fix typo for linux. * Added -usexes option * Fix typo. * Fix arguments problem on linux. * Fix typo for linux. * Add windows tool shared library projects. * Fix warning from x86 win build. Fix signed warning from slang-test/main.cpp * First attempt at getting premake to work on travis, and run tests. * Try moving build out into script. * Invoke bash scripts so they don't have to be executable. * Drive configuration/tests from env parameters set by travis * Try using source to run travis tests. * Remove the build.linux directory - but doing so will overwrite Makefile. * Made -fno-delete-null-pointer-checks gcc only. * Try to fix warning from -fno-delete-null-pointer-checks * Turn of warnings for unknown switches. * Try to make premake choose the correct tooling. * Disabled missing braces warning. * Disable -Wundefined-var-template on clang. * -Wunused-function disabled for clang. * Fix typo due to SlangBool. * Remove this nullptr tests. * "-Wno-unused-private-field" for clang. * Added "-Wno-undefined-bool-conversion" * Add DominatorList::end fix. * Split scripts into travis_build.sh travis_test.sh * Fix gcc/clang template pre-declaration issue around QualType. * Fix premake to build such that pthread correctly links with slang-glslang
2018-12-10	Remove the "VM" and "bytecode" features (#745)	Tim Foley
	* Remove the "VM" and "bytecode" features The "bytecode" in `bc.{h,cpp}` was an initial attempt at a serialized encoding for the Slang IR, but we now have the `ir-serialize.{h,cpp}` approach which was has been kept up to date much better. Similarly, the "VM" in `vm.{h,cpp}` was intended to be a system for interpreting Slang code in the bytecode format directly (so that you could load and evaluate code in a Slang module in a lightweight fashion). This never got used past a single test, which we eventually disabled. There are good ideas in some of this code, but at this point the implementations have bit-rotted to a point where trying to maintain it is more costly than it would be to re-created it if/when we ever decide these features are important again. * fixup: remove slang-eval-test from Makefile
2018-11-29	Use hardware if available for Dx11 testing (#733)	jsmall-nvidia
	* Try using hardware device before reference on dx11 * Output error string on renderer construction failure
2018-11-29	Fix uses of dynamic_cast on types in reflection API (#731)	Tim Foley
	The `Type` infrastructure uses a class hierarchy, but blindly `dynamic_cast`ing to a desired case doesn't always give the expected result, because a `Type` could represent a `typedef` (a `NamedExpressionType`) that itself resolves to, e.g, a vector type (a `VectorExpressionType`). In that case a `dynamic_cast<VectorExpressionType*>(someType)` would fail, even though the type logically represents a vector. The `Type::As<T>()` method is designed to handle this case, by "looking through" simple `typedef`s to get at the real definition of a type. The fix in this case is to use `Type::As<T>()` at various points in the reflection code (`reflection.cpp`) instead of `dynamic_cast`. This problem surfaced with a `StructuredBuffer<float2>` not reflecting correctly, because the element type (`float2`) is actually a `typedef` (for `vector<float,2>`), so I've included a test case that stresses that case. Getting the right output in the test required tweaking the `slang-reflection-test` tool to produce additional output for resource types (currently narrowed down to only affect structured buffers to avoid large diffs in expected test outputs).
2018-11-28	* Renamed spSessionHasCompileTargetSupport to ↵	jsmall-nvidia
	spSessionCheckCompileTargetSupport. (#728) * Improved return codes from spSessionCheckCompileTargetSupport
2018-11-21	Feature/early depth stencil (#727)	jsmall-nvidia
	* First pass support for early depth stencil. * Add a simple test to check if output has attributes. * Use cross compilation to test [earlydepthstencil] on glsl. * If target is dxil, use dxc to test against. Add hlsl to test earlydepthstencil against. * * Added spSessionHasCompileTargetSupport * Made slang-test use spSessionHasCompileTargetSupport to ignore tests that cannot run
2018-11-21	Add support for unbounded arrays as shader parameters (#725)	Tim Foley
	* Add support for unbounded arrays as shader parameters With this change, Slang shaders can use unbounded-size arrays as parameters, e.g.: ```hlsl Texture2D t[] : register(t3, space2); SamplerState s[]; ``` As shown in the above example, Slang supports both explicit `register` declarations on unbounded-size arrays and also implicit binding. When doing automatic parmaeter binding, Slang will allocate a full register space to an unbounded-size array of textures/smaplers, starting at register zero. Note that for the Vulkan target, an array of descriptors of any size (including unbounded size) consumes only a single `bindign`, so much of this logic is specific to D3D targets. Details on the changes made: * The single biggest change is a new `LayoutSize` type that is used to store a value that can either be a finite unsigned integer or a dedicated "infinite" value (which is stored as the all-bits-set `-1` value). This is used in places where a size could either be a finite value or an "unbounded" value, to both try to make standard math robust against the infinite case, and also to force code to deal with both the finite and infinite cases more explicitly when they care about the difference. * The public API was documented so that unbounded-size arrays report their size as `-1`. We should probably change this function to return a signed value instead of `size_t`, but that would technically be a source-breaking change, so we want to make sure we stage it appropriately. * The code that invokes fxc was updated so that it passes the appropriate flag to enable unbounded arrays of descriptors. I haven't looked yet at whether dxc needs such a flag, so there may need to be a follow-on change to add that. * The logic in the `UsedRanges::Add` method for tracking what registers have been claimed was rewritten because the previous version had some subtle bugs. The new version includes more detailed comments that attempt to explain why I think the new logic works. * The top-level logic for auto-assigning bindings to parameters has been overhauled to deal with the fact that a parameter that needs "infinite" amounts of a resource should be claiming a full register space for those resources instead. Whenever a parameter allocates any register spaces we want them all to be contiguous, so we have a loop that counts the requirements and allocates the spaces before we go along and dole them out. * When computing the layout for an array type, we need to carefully deal with unbounded-size arrays. In the case of an unbounded array of a "simple" resource type (e.g., `Texture2D[]`), we opt to expose the type layout as consuming an infinite number of the appropriate register, while in the case of a complex type (say, a `struct` with two texture fields), we need to instead allocate whole spaces for those fields. The logic here is more subtle than I would like, and interacts with the existing code that "adjusts" the element type of an array in order to make standard indexing math Just Work. * Similarly, when a `struct` type has unbounded-array fields, then we need to transform any field with infinite register requirements to instead consume a space in the resulting aggregate type. This case is comparatively easier than the array case. * The test case for unbounded arrays covers both explicit and implicit bindings, and also the case of an unbounded array over a `struct` type (it does not cover the case of a `struct` contianing unbounded arrays, so that will need to be added later). For this test we are both validation the output reflection data and that we produce the same code as fxc (with explicit bindings in the fxc case). * The reflection test app was modified to use the new API contract and detect when a parameter consumes `SLANG_UNBOUNDED_SIZE` resources. * Fixup: ensure unbounded size is defined at right bit width
2018-11-20	Keep resources in scope for Vk in DescriptorSet. (#726)	jsmall-nvidia

2018-11-09	Feature/teamcity output (#715)	jsmall-nvidia
	* First pass support for TeamCity compatible output. * Remove reset of counters on starting suite - so summary is over all suites.
2018-10-30	Feature/serial string pool refactor (#702)	jsmall-nvidia
	* Ongoing serialization for full debug work. * Use StringRepresentationCache and StringSlicePool for serialization. * Removed some older path handling for serialization which had some wrong underlying assumptions. * Builds with refactored use of SubStringPool in ir-serialize. * Removed prohibitedCategories because not used anywhere. * Add category 'compatibility-issue' * Remove work in progress on debug serialization.
2018-10-29	Rework command-line options handling for entry points and targets (#697)	Tim Foley
	* Rework command-line options handling for entry points and targets Overview: * The biggest functionality change is that the implicit ordering constraints when multiple `-entry` options are reversed: any `-stage` option affects the `-entry` to its left instead of to its right as it used to. This is technically a breaking change, but I expect most users aren't using this feature. * The options parsing tries to handle profile versions and stages as distinct data (rather than using the combined `Profile` type all over), and treats a `-profile` option that specifies both a profile version and a stage (e.g., `-profile ps_5_0`) as if it were sugar for both a `-profile` and a `-stage` (e.g., `-profile sm_5_0 -stage fragment`). * We now technically handle multiple `-target` options in one invocation of `-slangc`, but do not advertise that fact in the documentation because it might be confusing for users. Similar to the relationship between `-stage` and `-entry`, any `-profile` option affects the most recent `-target` option unless there is only one `-target`. * The logic for associating `-o` options with corresponding entry points and targets has been beefed up. The rule is that a `-o` option for a compiled kernel binds to the entry point to its left, unless there is only one entry point (just like for `-stage`). The associated target for a `-o` option is found via a search, however, because otherwise it would be impossible to specify `-o` options for both SPIR-V and DXIL in one pass. * The handling of output paths for entry points in the internal compiler structures was changed, because previously it could only handle one output path per entry point (even when there are multiple targets). The new logic builds up a per-target mapping from an entry point to its desired output path (if any). Details: * Support for formatting profile versions, stages, and compile targets (formats) was added to diagnostic printing, so that we can make better error messages. This is fairly ad hoc, and it would be nice to have all of the string<->enum stuff be more data-driven throughout the codebase. * Test cases were added for (almost) all of the error conditions in the current options validation. The main one that is missing is around specifying an `-entry` option before any source file when compiling multiple files. This is because the test runner is putting the source file name first on the command line automatically, so we can't reproduce that case. * Several reflection-related tests now reflect entry points where they didn't before, because the logic for detecting when to infer a default `main` entry point have been made more loose * On the dxc path, beefed up the handling of mapping from Slang `Profile`s to the coresponding string to use when invoking dxc. * A bunch of tests cases were in violation of the newly imposed rules, so those needed to be cleaned up. * There were also a bunch of test cases that had accidentally gotten "disabled" at some point because there were comparing output from `slangc` both with and without a `-pass-through` option, but that meant that any errors in command-line parsing produced the same error output in both the Slang and pass-through cases. This change updates `slang-test` to always expect a successful run for these tests, and then manually updates or disables the various test cases that are affected. * When merging the updated test for matrix layout mode, I found that the new command-line logic was failing to propagate a matrix layout mode passed to `render-test` into the compiler. This was because the `-matrix-layout` options were implemented as per-target, but the target was being set by API while the option came in via command line (passed through the API). It seems like we want matrix layout mode to be a global option anyway (rather than per-target), so I made that change here. Add missing expected output files * A 64-bit fix * Remove commented-out code noted in review
2018-10-26	Work around dxc matrix layout behavior (#694)	Tim Foley
	The Slang compiler allows the default matrix layout convention (row-major vs. column-major) to be specified via the command line or API. When generating output HLSL, Slang emits a `#pragma pack_matrix` directive for the chosen default convention, so that a user can generate plain HLSL output and still have it encode their desired defaults. The problem that has arisen is that many released versions of dxc (including those in the most recent Windows SDK at this time) ignore the `#pragma pack_matrix` directive (the feature has since been added to top-of-tree dxc). The main fix here is to instead pass the `-Zpr` option in to dxc when invoking it if the row-major (non-default) convention is requested. This will solve the problem for clients that use Slang to generate DXIL, but not for clients who use Slang to generate plain HLSL that they then pass into dxc (those clients are assumed to be able to work around the problem for themselves). In order to test the change, I added a test that fills a constant buffer with sequential integers, and then reads out the rows/columns of an `int3x4` matrix with both row- and column-major layout, as well as an integer placed after the matrix, so we can see the offset it was given. The `render-test` application did not yet support generating code via dxc/DXIL, so I added an option for that. This ends up assuming that anybody who is running the D3D12 tests will also have a version of dxc available.
2018-10-26	Feature/file system cache (#692)	jsmall-nvidia
	* First pass at caching file system. * default-file-system -> slang-file-system fix problem with location("build.linux") confusing windows build for now. * Added CompressedResult Fix problem in Result construction with it being unsigned * Add support for Path simplification. * Testing for Path::Simplify. * Refactored CacheFileSystem - automatically handles ISlangFileSystem or ISlangFileSystemExt appropriately. Removed WrapFileSystem - because wasn't possible to emulate some of the behavior if just loadFile is implemented. Split out StringBlob - so that no need to convert between ISlangBlob and String repeatidly. * Remove unwanted code in ~CompileRequest
2018-10-25	Feature/premake linux (#689)	jsmall-nvidia
	* Premake work in progress for linux. * Added dump function. * Remove examples on linux Small warning fix. * * Don't build render-test on linux * Removed work around virtual destructor warning, and just used virtual dtor for simplicity * Git ignore obj directories * Fix premake working on windows. * * Fix sprintf_s functions * Make generates arg parsing more robust * Added FloatIntUnion to avoid type punning/strong aliasing issues, and repeated union definitions. * Work around problems building on linux with getClass claiming a strict aliasing issue. * Fix for targetBlock appearing potentiall used unintialized to gcc. * Linux slang link options -fPIC to make dll. * Add -fPIC to build options on linux. * Add -ldl for linux on slang. * Fixes to try and get premake working with .so on linux. * Make core compile with -fPIC * Try to fix linux linking with --no-as-needed before -ldl * Add rpath back. * Remove render-gl from linux build. * Re-add location for linux. * Don't include <malloc.h> except on windows. * Remove unused line to fix warning on osx. * Remove ambiguity on OSX for operator <<. * Fixing ambiguity with operator overloading and Int types for OSX. * Fix ambiguity around UInt and operator * Fix ambiguity of UInt conversion for OSX. * Added UnambiguousInt and UnambiguousUInt to make it easier to work around OSX integer coercion for UInt/Int types.
2018-10-16	Feature/include refactor (#675)	jsmall-nvidia
	* Refactor of path handling. * Added PathInfo * Changed ISlangFileSystem - such that has separate concepts of reading a file, getting a relative path and getting a canonical path * Added support for getting a canonical path for windows/linux * Made maps/testing around canonicalPaths * User output remains around 'foundPath' - which is the same as before * Small improvements around PathInfo * Added a type and make constructors to make clear the different 'path' uses * Fixed bug in findViewRecursively * Checking and reporting for ignored #pragma once. * Removed SLANG_PATH_TYPE_NONE as doesn't serve any useful purpose. * Improve comments in slang.h aroung ISlangFileSystem * Remove the need for <windows.h> in slang-io.cpp * Ran premake5. * Improvements and fixes around PathInfo. * Fix typo on linix GetCanonical * Make the ISlangFileSystem the same as before, and ISlangFileSystem contain the new methods. Internally it always uses the ISlangFileSystemExt, and will wrap a ISlangFileSystem with WrapFileSystem, if it is determined (via queryInterface) that it doesn't implement the full interface.
2018-10-12	Add a warning on missing return, and initial SCCP pass (#671)	Tim Foley
	* Add a warning on missing return, and initial SCCP pass The user-visible feature added here is a diagnostic for functions with non-`void` return type where control flow might fall off the end. This sounds like a trivial diagnostic to add as part of the front-end AST checking, but that can run afoul of really basic stuff like: ```hlsl int thisFunctionisOkay(int a) { while(true) { if(a > 10) return a; a = a2 + 1; } // no return here! } ``` This function "obviously" doesn't need to have a `return` statement at the end there, but realizing this fact relies on the compiler to understand that the `while(true)` loop can't exit normally, and doesn't contain any `break` statement. One can write "obvious" examples that need more and more complex analysis to rule out. The answer Slang uses for stuff like this is to do the analysis at the IR level right after initial code generation (this would be before serialization, BTW, so that attached `IRHighLevelDeclDecoration`s can be used). When lowering the AST to the IR, we always emit a `missingReturn` instruction (a subtype of `IRUnreachable`) at the end of its body if it isn't already terminated. The IR analysis pass to detect missing `return` statements is then as simple as just walking through all the functions in the module and making sure they don't contain `missingReturn` instructions. For that simple pass to work, we first need to make some effort to remove dead blocks that control flow can never reach. This change adds a very basic initial implementation of Spare Conditional Constant Propagation (SCCP), which is a well-known SSA optimization that combines constant propagation over SSA form with dead code elimination over a CFG to achieve optimizations that are not possible with either optimization along. For the moment, we don't actually implement any constant folding* as part of the SCCP pass, so we can eliminate the dead block in a case like the function above (and those in the test case added in this change), but will not catch things like a `while(0 < 1)` loop. Handling more "obvious" cases like that is left for future work. * fixup: warning on unreachable code * Handle case where user of an inst isn't in same function/code The code as assuming any instruction in the SSA work list has to come from the function/code being processed, but this misses the case where an instruction in a generic has a use inside the function that the generic produces. This change adds code to guard against that case.
2018-10-10	Feature/source loc refactor (#668)	jsmall-nvidia
	* * Remove the need for IRHighLevelDecoration in Emit * Use the IRLayoutDecoration for GeometryShaderPrimitiveTypeModifier * Initial look at at variable byte encoding, and simple unit test. * Fixing problems with comparison due to naming differences with slang/fxc. * * More tests and perf improvements for byte encoding. * Mechanism to detect processor and processor features in main slang header. * Split out cpu based defines into slang-cpu-defines.h so do not polute slang.h * Support for variable byte encoding on serialization. * Removed unused flag. * Fix warning. * Fix calcMsByte32 for 0 values without using intrinsic. * Fix a mistake in calculating maximum instruction size. * Introduced the idea of SourceUnit. * Small improvements around naming. Add more functionality - including getting the HumaneLoc. * Add support for #line default * Compiling with new SourceLoc handling. * Fix off by one on #line directives. * Can use 32bits for SourceLoc. Fix serialize to use that. * Small fixes and comment on usage. * Premake run. * Fix signed warning. * Fix typo on StringSlicePool::has found in review.
2018-10-09	Feature/var byte encoding (#665)	jsmall-nvidia
	* * Remove the need for IRHighLevelDecoration in Emit * Use the IRLayoutDecoration for GeometryShaderPrimitiveTypeModifier * Initial look at at variable byte encoding, and simple unit test. * Fixing problems with comparison due to naming differences with slang/fxc. * * More tests and perf improvements for byte encoding. * Mechanism to detect processor and processor features in main slang header. * Split out cpu based defines into slang-cpu-defines.h so do not polute slang.h * Support for variable byte encoding on serialization. * Removed unused flag. * Fix warning. * Fix calcMsByte32 for 0 values without using intrinsic. * Fix a mistake in calculating maximum instruction size.
2018-10-04	Support cross-compilation of ray tracing shaders to Vulkan (#663)	Tim Foley
	* Move to newer glslang * Support cross-compilation of ray tracing shaders to Vulkan This change allows HLSL shaders authored for DirectX Raytracing (DXR) to be cross-compiled to run with the experimental `GL_NVX_raytracing` extension (aka "VKRay"). * The GLSL extension spec is marked as experimental, so that any shaders written using this support should be ready for breaking changes when the spec is finalized. * "Callable shaders" are not exposed throug the GLSL extension, so this feature of DXR will not be cross-compiled. * The experimental Vulkan raytracing extension does not have an equivalent to DXR's "local root signature" concept. This does not visibly impact shader translation (because the local/global root signature mapping is handled outside of the HLSL code), but in practice it means that applications which rely on local root signatures on their DXR path will not be able to use the translation in this change as-is; more work will be needed. The simplest part of the implementation was to go into the Slang standard library and start adding GLSL translations for the various DXR operations. In some cases, like mapping `IgnoreHit()` to `ignoreIntersectionNVX()` this is almost trivial. The various functions to query system-provided values (e.g., `RayTMin()`) were also easy, with the only gotcha being that they map to variables rather than function calls in GLSL, and our handling of `__target_intrinsic` assumes that a bare identifier represents a replacement function name, and not a full expression, so we have to wrap these definitions in parentheses. The tricky operations are then `TraceRay<P>()` and `ReportHit<A>()`, because these two are generics/templates in HLSL. GLSL doesn't support generics, even for "standard library" functions, so the raytracing extension implements a slightly complex workaround: the matching operations `traceNVX()` and `reportIntersectionNVX()` pass the payload/attributes argument data via a global variable. That is, shader code for the GLSL extensions writes to the global variable and then calls the intrinsic function. The linkage between the call site and the global is established by a modifier keyword (`rayPayloadNVX` and `hitAttributeNVX`, respectively) and in the case of ray payload also uses `location` number to identify which payload global to use (since a single shader can trace rays with multiple payload types). Our translation strategy in Slang tries to leverage standard language mechanisms instead of special-case logic. For example, to translate the `ReportHit<A>()` function, we provide both a default declaration that will work for HLSL (where the operation is built-in with the signature given), and a definition marked with the `__specialized_for_target(glsl)` modifier. The GLSL definition declares a function `static` variable that will fill the role of the required global, and then does what the GLSL spec requires: assigns to the global, and then calls the `reportIntersectionNVX` builtin (which we declare as a separate builtin). Our ordinary lowering process will turn that `static` variable into an ordinary global in the IR, and the `[__vulkanHitAttributes]` attribute on the variable will be emitted as `hitAttributeNVX` in the output. There is no additional cross-compilation logic in Slang specific to `ReportHit<A>()` - the target-specific definition in the standard library Just Works. The case for `TraceRay<P>()` is a bit more complicated, simply because the GLSL `traceNVX()` function needs to be passed the `location` for the payload global. We implement the payload global as a function-`static` variable, with the knowledge that every unique specialization of `TraceRay<P>()` will generate a unique global variable of type `P` to implement our function-`static` variable. We then add a slightly magical builtin function `__rayPayloadLocation()` that can map such a variable to its generated `location`; the logic for this is implemented in `emit.cpp` and described below. We also changed the `RayDesc` and `BuiltinTriangleIntersectionAttributes` types from "magic" intrinsic types over to ordinary types (because the GLSL output needs to declare them as ordinary `struct` types). This ends up removing some cases in the AST and IR type representations. By itself this change would break HLSL emit, because in that case the types really are intrinsic. We added a `__target_intrinsic` modifier to these types to make them intrinsic for HLSL, and then updated the downstream passes to handle the notion of target-intrinsic types. The logic for binding/layout of entry point inputs and outputs was updated so that raytracing stages don't follow the default logic for varying input/output parameters. This is because the input/output parameters of a raytracing entry point aren't really "varying" in the same sense as those in the rasterization pipeline. In particular, the SPIR-V model for raytracing input and output treats "ray payload" and "hit attributes" parameters as being in a distinct storage class from `in` or `out` parameters. We also detect cases where a ray tracing stage declares inputs/outputs that it shouldn't have. This logic could conceivably be extended to other stages (e.g., to give an error on a compute shader with user-defined varying input/output). The type layout logic added cases for handling raytracing payload and hit-attribute data, but this is currently just a stub implementation that follows the same logic as for varying `in` and `out` parameters (it cannot give meaningful byte sizes/offsets right now). To my knowledge the GLSL spec doesn't currently specify anything about layout, and I haven't read the DXR spec language carefully enough to know what it says about layout. A future change should update the layout logic to allow for byte-based layout of ray payloads, etc. so that we can query this information via reflection. The GLSL legalization logic in `ir.cpp` was updated to factor out the per-entry-point-parameter code into its own function, and then that function was updated to special-case the input/output of a ray-tracing shader. While for rasterization stages we typically want to take the user-declared input/output and "scalarize" it for use in GLSL (in part to deal with language limitations, and in part to tease system values apart from user-defined input/output), the GLSL spec for raytracing requires payload and hit attribute parameters to be declared as single variables. There is also the issue that even for an `in out` parameter, a ray payload parameter should only turn into a single global, whereas the handling for varying `in out` parameters generates both an `in` and an `out` global for the GLSL case. Other than the handling of entry point parameters, the GLSL legalization pass doesn't need to do anything special for ray tracing shaders. The trickiest change in the `emit.cpp` logic is that we now generate `location`s for ray payload arguments (the outgoing from a `TraceRay()` call) on demand during code generation. This is a bit hacky, and it would be nice to handle it as a separate pass on the IR rather than clutter up the emit logic, but this approach was expedient. Basically, any of the global variables that got generated from the `static` declarations in the standard library implementation of `TraceRay()` will trigger the logic to assign them a `location`. The logic for emitting intrinsic operations added a few new `$`-based escape sequences. The `$XP` case handles emitting the location of a generated ray payload variable; this is how we emit the matching location at the site where we call `traceNVX`. The `$XT` case emits the appropriate translation for `RayTCurrent()` in HLSL, because it maps to something different depending on the target stage. All of the test cases here consist of a pair of an HLSL/Slang shader written to the DXR spec, plus a matching GLSL shader for a baseline. The GLSL shaders are carefully designed so that when fed into glslang they will produce the same SPIR-V as our cross-compilation process. This kind of testing is quite fragile, but it seems to be the best we can do until our testing framework code supports both DXR and VKRay. A bunch of the core changes ended up being blocked on issues in the rest of the compiler, so some additional features go implemented or fixed along the way: The first big wall this work ran into was that the `__specialized_for_target` modifier hasn't actually been working correctly for a while. It turns out that for the one function that is using it, `saturate()`, we have been outputting the workaround GLSL function in all cases (including for HLSL output) rather than only on GLSL targets. The problem here is that for a generic function with a `__specialized_for_target` modifier or a `__target_intrinsic` modifier, the IR-level decoration will end up attached to the `IRFunc` instruction nested in the `IRGeneric`, but the logic for comparing IR declarations to see which is more specialized (via `getTargetSpecializationLevel()`) was looking only at decorations on the top-level value (the generic). The quick (hacky) fix here is to make `getTargetSpecializationLevel()` try to look at the return value of a generic rather than the generic itself, so that it can see the decorations that indicate target-specific functions. A more refined fix would be to attach target-specificity decorations to the outer-most generic (to simplify the "linking" logic). The only reason not to fold that into the current fix is that the `__target_intrinsic` modifier currently serves double-duty as a marker of target specialization and information to drive emit logic. The latter (the emit-related stuff) currently needs to live on the `IRFunc`, and moving it to the generic could easily break a lot of code. This needs more work in a follow-on fix, but for now target specialization should again be working. The other big gotcha that the simple "just use the standard library" strategy ran into was that function-`static` variables weren't actually implemented yet, and in particular function-`static` variables inside of generic functions required some careful coding. The logic in `lower-to-ir.cpp` has this `emitOuterGenerics()` function that is supposed to take a declaration that might be nested inside of zero or more levels of AST generics, and emit corresponding IR generics for all those levels. This is needed because two different AST functions nested inside a single generic `struct` declaration should turn into distinct `IRFunc`s nested in distinct `IRGeneric`s. The tricky bit to making that all work is that the same AST-level generic type parameter will then map to different IR-level instructions (the parameters of distinct `IRGeneric`s) when lowering each function. The existing logic handled this in an idiomatic way by making "sub-builders" and "sub-contexts." This change refactors some of the repeated logic into a `NestedContext` type to help simplify the pattern, and applies it consistently throughout the `lower-to-ir.cpp` file. Besides that cleanup, the major change is `lowerFunctionStaticVarDecl` which, unsurprisingly, handles lower of function-`static` variables to IR globals. The careful handling of nested contexts here is needed because if we are in the middle of lowering a generic function, then a `static` variable should turn into its own `IRGeneric` wrapping an `IRGlobalVar`. The body of the function should refer to the global variable by specializing the global variable's `IRGeneric` to the parameters of the functions `IRGeneric`. This tricky detail is handled by `defaultSpecializeOuterGenerics`. An additional subtlety not actually required for this raytracing work (and thus not properly tested right now) is handling function-`static` variables with initializers. These can't just be lowered to globals with initializers, because HLSL follows the C rule that function-`static` variables are initialized when the declaration statement is first executed (and this could be visible in the presence of side-effects). The lowering strategy here translates any `static` variable with an initializer into two globals: one for the actual storage, plus a second `bool` variable to track whether it has been initialized yet. There are some opportunities to optimize this case, especially for `static const` data, but that will need to wait for future changes. We've slowly been shifting away from the model where a user thinks of a "profile" as including both a stage and a feature level. Instead, the user should think about selecting a profile that only describes a feature level (e.g., `sm_6_1`, `glsl_450`, etc.), and then separately specifying a stage (`vertex`, `raygeneration, etc.) for each entry point. The challenge here is that the command-line processing still only had a single `-profile` switch, and no way to specify the stage. Adding the `-stage` option was relatively easy, but making it work with the existing validation logic for command-line arguments was tricky, because of the complex model that `slangc` supports for compiling multiple entry points in a single pass. * In `slang.h` add new reflection parameter categories for ray payloads and hit attributes, as part of entry point input/output signatures. * A previous change already updated our copy of glslang to one that supports the `GL_NVX_raytracing` extension, so in `slang-glslang.cpp` we just needed to map Slang's `enum` values for the raytracing stage names to their equivalents in the glslang code. * Moved the logic for looking up a stage by name (`findStageByName()`) out of `check.cpp` and into `compiler.cpp`, with a declaration in `profile.h` * Added a `$z` suffix to the GLSL translation of `Texture.SampleLevel()`, to handle cases where the texture element type is not a 4-component vector. Note that this fix should actually be applied to all* these texture-sampling operations, but I didn't want to add a bunch of changes that are (clearly) not being tested right now. * The layout logic for entry points was updated to correctly skip producing a `TypeLayout` for an entry point result of type `void`, which meant that the related emit logic now needs to guard against a null value for the result layout. * In `ir.cpp`, dump decorations on every instruction instead of just selected ones, so that our IR dump output is more complete. * Added a command-line `-line-directive-mode` option so that we can easily turn off `#line` directives in the output when debugging. Not all cases where plumbed through because the `none` case is realistically the most important. * Parser was fixed to properly initialize parent links for "scope" declarations used for statements, so that we can walk backwards from a function-scope variable (including a `static`) and see the outer function/generics/etc. * Added GLSL 460 profile, since it is required for ray tracing. Also updated the logic for computing the "effective" profile to use to recognize that GLSL raytracing stages require GLSL 460. * Added some conventional ray-tracing shader suffixes to the handling in `slang-test`. This code isn't actually used, but was relevant when I started by copy-pasting some existing VKRay shaders as the starting point for my testing. * Fixup: typos
2018-09-27	First pass implementation of IR serialization (#653)	jsmall-nvidia
	* * Change the layout of IROp such that 'main' IROps are 0-x. * Removed MANUAL_RANGE instuction types, as no longer needed. * Work in prog on optimizing. * * Constant time lookup for IROpInfo * Refactor and document a little more the IROp layout * Mark ops that use 'other' bits * Fix typo in definition of kIROpFlag_UseOther * First pass at working out serialization structure. * Work in progress on ir-serialize * Storing strings in IRSerialInfo Split out IRSerialInfo from the IRSerializer - to make more explicit what is actually saved. * First pass at serializing out data. * First pass at serialize reading. * Fix riff fourcc mark order. * First pass at reconstructing IRInst / IRDecoration from serialized data. * Handling of TextureBaseType * Deserializing of constants. * Small changes around ir serialization. * Changed StringIndex indexing to not be an offset into the m_strings array, but an index into strings in order. Doing so makes cache lookup much faster, and makes the 'indicies' themselves smaller and therefore more compressible. * Removed the need for m_arena in IRSerialWriter. Previously it's purpose was to store the string contents that were being used to lookup UnownedStringSlice. Now we keep the StringRepresentation in scope and reference that, and so don't need the copy. * Don't need to construct the IRModuleInst as is created and set on createModule call. * Remove test code for testing serialization. * Fix problem with release build in ir-serialize causing warning. * Use SLANG_OFFSET_OF for offsets in non pod classes to avoid gcc/clang warning. Give storage to integral static variables to avoid linkage problems with gcc/clang. * Fix warnings under x86 win32 debug.
2018-09-21	Remove the "hack sampler" workaround (#648)	Tim Foley
	* Update glslang version * Fix build for new glslang The latest glslang required a few changes to our manual build for their code (because we are not taking a dependency on CMake). * Rebuild project files using premake, which picks up a few files added to glslang, but also a few diffs in Slang's own project files in cases where they were edited manually instead of using premake. * Fix up the declaration our our device limits (which are inentionally set to not limit what code passes through our glslang), because the underlying structure definition in glslang has changed. This is a kludgy bit of glslang's design, but it doesn't make sense for us to invest in a more serious workaround. * Remove the "hack sampler" workaround When the `GL_KHR_vulkan_glsl` spec was introduced to allow GLSL to be compiled for Vulkan SPIR-V, it made an annoying mistake by leaving a few builtins as taking `sampler2D`, etc. when the equivalent SPIR-V operations only require a `texture2D`, etc. The relevant builtins are: * `textureSize` * `textureQueryLevels` * `textureSamples` * `texelFetch` * `texelFetchOffset` This means that shader code that wanted to use those operations needed to conspire to have a `sampler` handy so they could write, e.g.: ```glsl vec4 val = texelFetch(sampler2D(myTexture, someRandomSampler), p, lod); ``` when what they really wanted was this: ```glsl vec4 val = texelFetch(myTexture, p, lod); ``` That is annoying but probably something each to work around for a GLSL programmer, but when cross-compiling from HLSL, you might have an operation like: ```hlsl float4 val = myTexure.Load(p); ``` in which case a cross-compiler needs to manufacture a sampler out of thin air. If the shader happened to use a sampler for something else you could snag that, but in the worse case you had to cross-compile to GLSL that declared a new sampler. Slang did this by declaring a sampler called `SLANG_hack_samplerForTexelFetch` (because `texelFetch` is the operation that first surfaced the issue). For complex reasons we always define this sampler, even if we turn out not to need it in a particular output kernel. This choice has a bunch of annoying consequences: * There is always a sampler defined in descriptor set zero, because that's where we put the hack sampler, so a user-defined parameter block always has a set number of 1 or greater (see #646). * The hack sampler shows up in reflection output because users need to size their descriptor sets appropriately to pass along this sampler that won't actually be used if they don't want to get debug spew from the validation layers. We filed an issue on glslang about this problem, and eventually some kind folks from the gamedev community (who also saw the same problem) defined an extension spec (`GL_EXT_samplerless_texture_functions`) to fix the underlying issue and contributed a patch to glslang to make it support that extension. This change just backs the hack out of Slang now that we have a glslang version that supports the extension to get past the defect in the original GLSL-for-Vulkan definition. Besides yanking out the code for the hack, we also change the relevant builtins to declare that they require this new GLSL extension (so that we properly request it from glslang when the builtins are used), and fix some reflection test cases that exposed the existence of the "hack sampler." * Fixup: syntax error in stdlib generator files * Remove more code for hack sampler There was logic to ensure we always have a "default" register space/set when cross-compiling, because the hack sampler would need it. This is no longer necessary once we remove the hack sampler. * Fix expected test output. Fixing the root cause of issue #646 means that one of our test cases that tickles that issue now produces different output (luckily it can now be used as a regression test for the issue).
2018-09-17	Control unit tests being run with -category -exclude and using prefix. (#637)	jsmall-nvidia
	Unit tests appear in unit-test category Unit tests 'appear' in a directory unit-tests Removed the -unitTests option
2018-09-17	Hotfix/fixing warnings (#636)	jsmall-nvidia
	* * Remove dispose from IRInst * Use MemoryArena instead of MemoryPool * Make all IRInst not require Dtor - by having ref counted array store ptrs that need freeing * Increase block size - typically compilation is 2Mb of IR space(!) * Fix issues around StringRepresentation::equal because null has special meaning. * Don't bother to construct as String to compare StringRepresentation, just used UnownedStringSlice. * Added fromLiteral support to UnownedStringSlice and use instead of strlen version. * Use more conventional way to test StringRepresentation against a String. * Fix gcc/clang template problem with cast. * Fix warnings.
2018-09-13	Feature/memory arena improvements (#633)	jsmall-nvidia
	* First pass at MemoryArena. * First pass at RandomGenerator. * Extract TestContext into external source file. * Fix warning on printf. * Use enum classes for Test enums. OutputMode -> TestOutputMode. * First pass at FreeList unit test. * Auto registering tests. Improvements to RandomGenerator. * Remove the need for unitTest headers - cos can use registering. * Added unitTest for MemoryArena. * Do unit tests. * Fix typo. * Fix problem limiting errors from TestContext. * Refactor of MemoryArena * Removed the ability to rewind (to improve memory usage/simplify) * Better memory usage - around oversized blocks + Will keep allocating from a normal block if more than 1/3 memory left, or an oversided block is allocated * Better unitTest coverage for MemoryArena. * Fixes based on code review * Remove e prefix from enum class types for TestContext * Added extra checking for allocations sizes * Fixed some typos * Added std::is_pod test to allocateAndCopyArray * Add include for is_pod needed for linux build.