slang.git - Making it easier to work with shaders

Age	Commit message (Collapse)	Author
2020-02-11	Small improvements around List (#1216)	jsmall-nvidia
	* * Improved fastRemoveAt * Fixed off by one bug * Fixed const safeness with List<> * Made List begin and end const safe. * Revert to previous RefPtr usage. * Fix bug with casting. * Tabs -> spaces. Small fixes/improvements to List. * Improve comment on List. * hasContent -> isNonEmpty
2020-02-11	Make fixes for the attribute-error test case. (#1215)	Tim Foley
	There are two main fixes here: The first is to remove a memory leak in the reflection test tool, in the case where Slang compilation fails. There is no real reason to be using the reflection test tool for tests that produce diagnostics (we have the slangc tool for that), but it makes sense to go ahead and fix the leak rather than work around it. This was one of those leaks that could have been avoided with smart pointers and a COM-lite API. The second issue was that the logic for constructing an `AttributeDecl` based on a user-defined `struct` was not correctly setting the parent of the attribute decl. The code in question was a little hard to follow and had a few steps that didn't seem strictly necessary given its goals, so I went ahead and scrubbed+commented it to just do what made sense to me (and the tests still pass...). I'm not entirely happy with the design and implementation approach for user-defined attributes, so we might want to take another stab at it sooner or later. This change is not meant to address any design issues, and is just about fixing bugs in what is already there.
2020-02-10	Fix output GLSL for primitive ID in a geometry shader (#1214)	Tim Foley
	We had been translating an `SV_PrimitiveID` input in a shader over to `gl_PrimitiveID` in GLSL. That translation seemed to work just fine for users, so we thought it was correct. It turns out that `gl_PrimitiveID` is the correct GLSL for a primitive ID input in every stage except a geometry shader. In a geometry shader, `gl_PrimitiveID` is a primitive ID output, and if you want the input case you have to write `gl_PrimitiveIDIn` (note the `In` suffix). This change sets aside my bewilderment at the above long enough to implement a workaround in the GLSL legalization step. I also modified our current geometry shader cross compilation test to make use of an input primitive ID.
2020-02-10	Add attributes to enable dual-source blending on Vulkan (#1210)	Tim Foley
	This change adds support for the `[[vk::location(...)]]` and `[[vk::index(...)]]` attributes, which can be used together to mark up shader outputs for dual-source blending on Vulkan. HLSL/Slang code like the following: ```hlsl struct Output { [[vk::location(0)]] float4 a : SV_Target0; [[vk::location(0), vk::index(1)]] float4 b : SV_Target1; } [shader("fragment")] Output main(...) { ...} ``` can be used to set up dual-source blending on both D3D and Vulkan APIs. The output GLSL for the above will look something like: ```glsl layout(location = 0) out vec4 a; layout(location = 0, index = 1) out vec4 b; void main() { ... } ``` The more or less straightforward parts of this change were: * Added new `attribute_syntax` declarations to the stdlib, for `[[vk::location(...)]]` and `[[vk::index(...)]]` * Added new AST node types for the new attribute cases, sharing a base class so that argument checking can be shared * Added checks for the arguments to the new attributes in `slang-check-modifier.cpp` (eventually this kind of logic shouldn't be needed for new attributes) * Updated GLSL emit logic so that it treats the `index`/`space` parts of a variable layout as the `location`/`index` for varying parameters. * Updated GLSL legalization so that when it translates entry-point parameters into globals (and scalarizes structures) it handles both a binding index and space for the parameters. * Added a cross-compilation test case to verify that the basics of the feature work The remaining work is all in `slang-parameter-binding.cpp`. There is some work that isn't technically related to this change (and which could be reverted if it causes problems), around the detection and handling of fragment shader outputs with `SV_Target` semantics. The basic changes (which could be backed out and then merged separately) are: * Made the special-case `SV_Target` logic only trigger for fragment shaders (that is the only place where `SV_Target` should appear, but we weren't guarding against it) * Made the logic to reserve a `u<N>` register for `SV_Target<N>` only trigger for D3D Shader Model 5.0 and below (since it is not required for SM 5.1 and up). This could be a breaking change for some users, but that seems unlikely. * Fixed one test case that relied on the behavior of reserving `u0` for `SV_Target0` even though it was a SM6.0 test. * Also added more comments to the system-value handling logic. The more interesting changes come up starting in `processEntryPointVaryingParameterDecl()`. The basic issue is that we have so far only supported implicit layout for varying parameters on GLSL/Vulkan, but the `[[vk::location(...)]]` attribute is a form of explicit layout annotation. Rather than try to kludge something that only works in narrow cases, I instead opted to try to fix things more generally. In `processEntryPointVaryingParameterDecl()` we now check for the `location` and `index` attributes when we are on "Khronos" targets (Vulkan/OpenGL/GLSL) and immediately add them to the variable layout being constructed if they are found. There is nothing in this logic specific to fragment-shader outputs, so this feature now applies to any varying input/output on Khronos targets. Allowing explicit layouts creates the potential for mixing implicit and explicit layout. For example, consider: ```hlsl struct Output { float4 color : COLOR; [[vk::location(0)]] float3 normal : NORMAL; } ``` What `location` should `color` get? Should this code be an error? There are two cases where this conundrum can come up: when working with `struct` types used for varying parameters, and the entry-point parameter list itself. For the varying `struct` case we currently make an expedient choice. We handle fields with both implicit or explicit layotu with appropriate logic, but logic that doesn't account for the case of mixing the two. Then at the end of layout for the `struct` we issue an error if there was a mix of implicit and explicit layout (such that our results aren't likely to be valid). For the entry point varying parameter case, things were already using a `ScopeLayoutBuilder` type (that encapsulates some logic shared between entry-point and global parameters). The entry-point-specific bits were moved out into a `SimpleScopeLayoutBuilder` and it was updated so that rather than assuming all parameters use implicit layout it does a two-phase layout approach similar to what we use for the global scope: * First all parameters are enumerated to collect explicit bindings and mark certain ranges as "used" * Next the parameters are enumerated again and those without explicit bindings get allocated space using a "first fit" algorithm In principle we could extend the two-phase approach to apply to `struct` types as well, but that would be best saved for a future refactoring of some of this parameter binding logic, since I would like to exploit more of the opportunities for sharing code across the uniform/varying and struct/entry-point/global cases. By moving the point where entry point parameters get their offsets assigned, it was necessary to move around some of the logic that removes varying parameter usage (and other things that shouldn't "leak" out of an entry point) to a different point in the entry point layout process. While adding these various pieces does not quite enable us to support explicit bindings on entry point parameters (e.g., putting `uniform Texture2D t : register(t0)` in an entry point parameter list) or in `struct` types (e.g., explicit `packoffset` annotations on fields), it starts to provide some of the infrastructure that we'd need in order to support those cases.
2020-02-08	Fixes to make all CPU compute shaders work on CUDA (#1211)	jsmall-nvidia
	* Launch CUDA test taking into account dispatch size. * Enable isCPUOnly hack to work on CUDA. * Rename 'isCPUOnly' hack to 'onlyCPULikeBinding'. * Add $T special type. Support SampleLevel on CUDA. * Fix typo.
2020-02-07	Code standard changes for Lexer (#1209)	jsmall-nvidia
	* Upper camel -> lowerCamel m_ prefix members where appropriate _ prefix module local functions * m_ prefix members in Lexer. Fit's standard because type has methods/ctor.
2020-02-07	HLSL Intrinsic coverage test improvements (#1206)	jsmall-nvidia
	* Fix CPP construct when matrix type. * Test intrinsics on float matrices. * Fix typo in _areNearlyEqual test. Increased default sensitivity. Added matrix-float test. * Matrix double test. Fixed some issues with CUDA. * Added reduced intrinsic version of matrix-double test. * Improve matrix double coverage. Test reflect/length etc on vector float. * * Added literal-float test. * Added vector double test * Improved coverage of vector/matrix tests * Disable Dx11 double-vector test because fails on CI. * Disable literal-float, because on CI fails.
2020-02-07	Change handling of strings for HLSL/GLSL targets (#1204)	Tim Foley
	* Change handling of strings for HLSL/GLSL targets This change switches our handling of string literals and `getStringHash` to something that is more streamlined at the cost of potentially being less general/flexible. * `String` is now allowed as a parameter type in user-defined functions * `getStringHash` is now allowed to apply to `String`-type values that aren't literals * The list of strings in an IR module is now generated during IR lowering as part of lowering a string literal expression, rather than being defined by recursively walking the IR of the module looking for `getStringHash` calls. The public API still refers to these as "hashed" strings, but they are realistically now "static strings." * When emitting code for HLSL/GLSL, the `String` type emits as `int`, and `getStringHash(x)` emits as `x`. In terms of implementation, the choice was whether to translate `String` over to `int` in an explicit IR pass, or to lump it into the emit pass. While adding the logic to emit clutters up an already complicated step, it is ultimately much easier to make the change there than to write a clean IR pass to eliminate all `String` use. Note that other targets that can handle a more full-featured `String` type are not addressed by this change and thus do not support `String` at all. It may be woth emitting `String` as `const char` on those targets, and emitting string literals directly, but the `getStringHash` function would need to be implemented in the "prelude" then, and we probably want to pick a well-known/-documented hash algorithm before we go that far. This change also brings along some some clean-ups to the `gpu-printing` example, since it can now take advantage of the new functionality of `String`. Fix up tests for new string handling * Add global string literal list to string-literal test (since we now list all static string literals and not just those passed to `getStringHash`) * Disable `getStringHash` test on CPU, since we don't have a working `String` on that platform right now (only HLSL/GLSL) Co-authored-by: Tim Foley <tim.foley.is@gmail.com>
2020-02-06	Float matrix intrinsic test/fixes (#1203)	jsmall-nvidia
	* Fix CPP construct when matrix type. * Test intrinsics on float matrices. * Fix typo in _areNearlyEqual test. Increased default sensitivity. Added matrix-float test.
2020-02-06	Literal handling improvements (#1202)	jsmall-nvidia
	* WIP: 64 literal diagnostic and truncation. * Improve how integer truncation is handled/supported. Added literal-int64.slang test. Set a suffix on all literals. Fixed problem on C++ based targets where l suffix was not the same as int() cast. So on C++ derived emitters, int() is used instead of l suffix to have same behavior across targets. * Add literal diagnostic testing. * Allow lexer to lex - in front of literals. * Fix lexing and converting int literal with -. * Too large small values of floats become inf. Handling writing inf types out on different targets. Add function to deterimine if a float literals kind. * Roll back the support of lexer lexing negative literals. * Fixed tests broken because of diagnostics numbers. Improved _isFinite * Fix compilation on linux. * Fix problem with abs on linux - use Math::Abs. * Fix typo. * * Improve warnings for float literals zeroed * Improved 64 bit type documentation * Handle half * Improved comments * Fixed tests broken * Use capital letters for suffixes. * Make default behavior on outputting a int literal that is an 'int32_t' is cast (not suffix) to avoid platform inconsistencies. Improve documentation for 64 bit types. Make tests cover material in docs. * Fixed tests. * Rename FloatKind::Normal -> Finite * Fix half zero check.
2020-02-06	Improve checks and diagnostics around redeclarations (#1201)	Tim Foley
	* Improve checks and diagnostics around redeclarations This change turns checking for redeclarations into a dedicated phase of semantic checking, and ensures that it applies to the main categories of declarations: functions, types, and variables. Note that "variables" here includes function parameters and `struct` fields in addition to the more obvious global and local variables. Some of the logic for checking redeclarations already existed for functions, and was refactored to deal with other cases of declarations. The checking for functions still needs to be special-cased because functions are much more flexible about the kinds of redeclarations that are allowed. In addition to improving the diagnosis of redeclaration itself, this change also changes the error message that is produced when referencing a symbol that is ambiguous due to begin redeclared. This is a small quality-of-life fix, and has the benefit of being much easier to implement than robust tracking of what variables have had redeclaration errors issued so that we can skip emitting an ambiguity error at the use site. A new test case was added to cover the redeclaration cases for variables (but not types or functions), and the test for function parameters was updated to account for the new more universal diagnostic message (since function parameters used to have special-case redeclaration checking). * fixup: missing file
2020-02-05	Improve behavior when undefined identifier is a contextual keyword (#1200)	Tim Foley
	The HLSL language has keywords with very common names like `triangle`, and Slang doesn't want to preclude users from using such names for their variables/functions/etc. In addition, Slang adds new keywords on top of HLSL (like `extension`) and we don't want those to prevent us from compiling existing code. As a result, almost all keywords in Slang are contextual keywords, and they can be shadowed by user varaibles. The down-side to making all keywords contextual is that in a case like this: ``` int test() { return triangle; } ``` The identifier `triangle` is not undefined as far as lookup (it is defined as a modifier keyword), so the existing "undefined identifier" logic gets bypassed, and instead we ran into an internal compiler error trying to construct an expression that refers to a modifier keyword. Fortunately, the internal compiler error in that case was overkill, and the compiler already had defensive logic to produce an expression with an error type if it couldn't figure out what the type of a declaration reference should be. The main fix here is thus to emit an "undefined identifier" error instead of an internal compiler error at the point where we see an attempt to reference a declaration that shouldn't be available in an expression context. In order to improve the quality of the diagnostic, the code for constructing declaration references was updated to pass along a source location to be used in error messages.
2020-02-05	Add support for RWBuffer writes on GLSL/SPIR-V target (#1199)	Tim Foley
	This appears to have been an oversight in the work that added support for `imageStore` as well as atomics when writing to `RWTexture*` and friends. The HLSL/Slang `RWBuffer` type maps to GLSL as an `imageBuffer`, which is effectively just another case of writable texture image (bonus points to anybody who can explain to me the meaningful distinction between an `imageBuffer` and an `image1D`). This change copies the handling of subscript access (`operator[]`) from textures over to buffers, and adds a test case to confirm that the new handling works for the simple case of setting a buffer element.
2020-02-04	CUDA/C++ backend improvements (#1198)	jsmall-nvidia
	* WIP with vector float test. * vector-float test working. * Fixed remaing tests broken with init changes. * Improve 64bit-type-support.md * Disable tests broken on CI system for Dx. * WIP: Make type available for comparison. * Moved type conversion into TypeTextUtil. * Add text/type conversions from DownstreamCompiler to TypeTextUtil. * Allow compaison taking into account type. * Removed quantize in vector-float.slang test.
2020-01-31	Some Slang API additions (#1195)	Tim Foley
	* Some Slang API additions These are additions to the public Slang API that came up while I was trying to write an example to demonstrate GPU printing. The additions aren't strictly necessary for the example, but I found these to be missing services when writing the code the way I wanted to. The main public changes are: * There is a new distinct `IEntryPoint` interface which inherit from `IComponentType` (much like `IModule` before). For now this doesn't expose any new functions on top of `IComponentType`, but I expect it to do so eventually. * It is now possible to get the `IModule` for a specific translation unit in a compile request with `spCompileRequest_getModule`. Even for a compile request that had only one translation unit this is not the same object as gets returned by `spCompileRequest_getProgram`. The latter should probably be called `_getLinkedProgram` because it returns a composite component type that links the module with everything it `import`s. An `IModule` can look up entry points declared in the module. Eventually a module should support looking up most of the declarations in the module (e.g., types) by name, but entry points are an obvious case. * A new `link()` operation is added to `IComponentType`. It is possible to have component types that have unsatisfied dependencies, such that trying to generate kernel code from them will fail. The `link()` operation tries to produce a new composite component type that combines a component with its dependencies, to enable code generation. The implementation of end-to-end compilation was using a function like this internally, but it hadn't been exposed to the API. Notes on the implementation: * The list of entry points declared in a given translation unit has moved from `TranslationUnitRequest` to the `Module` inside of it. * `EntryPoint` now has to do a song and dance much like `Module` to both inherit from `ComponentType` and support the `IEntryPoint` interface. * The `Session* m_session` member in `Linkage` (in terms of public API, this is the `slang::ISession` holding a pointer to the `slang::IGlobalSession`) has been changed to a `RefPtr`. Without this change an application can't just hold onto a `ComPtr<slang::ISession>`; they also need to retain the `IGlobalSession` or things will crash. The new behavior seems more correct, but I worry that it might introduce a leak. * The `asInternal` operation for `IComponentType` had to be updated to not just perform a cast. A type like `Module` has two `IComponentType` sub-objects, and only one of these is at the same address as the `ComponentType` base. * Similarly, the `Module::getInterface` logic was changed to fall back to `Super::getInterface` for all the cases other than `IID_IModule`, so that it would be guaranteed to return the `IComponentType` at the same address as `ComponentType` in response to a `queryInterface`. * Fixes for memory retain cycles As part of the earlier change, I made the `Linkage` type hold a `RefPtr` to the `Session`. The motivation there is it lets a user hang onto just a `slang::ISession` without having to also retain the `slang::IGlobalSession` for no immediately apparent reason. There are two problems that this surfaced, one pre-existing and one new. The new problem was that `Session` already held a `RefPtr<Linkage> m_builtinLinkage` for the linkage that holds the stdlib code. I solved that problem by splitting the parent pointer in `Linkage` into two pointers: a raw pointer that is used to actually locate the parent session, and a ref-counted one that can be used to optionally retain the parent session. The builtin linkage is then set up to explicitly not retain its parent, thus breaking the cycle. The second problem was a pre-existing one, where every `ComponentType` was holding a retained pointer to its parent `Linkage`, but in turn the `Linkage` was holding retained pointers to many `ComponentTypes` (and subclasses thereof). For this case I used the more expedient fix of making the parent pointer into a raw pointer, and figuring that it is a reasonable rule to expect user to retain the `Linkage` (aka `slang::ISession`) that owns a component type if they want to be able to use the component type. I might need/want to investigate a better fix for the latter issue, but for now this seems to clean up the issues I was seeing in the tests. Fingers crossed.
2020-01-31	Fix for a macro expansion bug (#1192)	Tim Foley
	* Fix for a macro expansion bug The work in #1177 added a notion of a macro being "busy" to prevent errors around recursively-defined macros: #define BAR(X) BAR(0) BAR(A) /* should yield BAR(0) and not infinite-loop / The fix was to place an `isBusy` flag on each macro, setting it to `true` when an expansion of that macro gets pushed onto the stack of input streams, and back to `false` when the expansion gets popped from the stack of input streams. That approach meant that we had to pre-expand macro arguments to avoid incorrectly treating a macro as busy for nested-but-not-recursive invocations: #define NEG(X) (-X) NEG(NEG(3)) // should expand to (-(-3)) and not (-NEG(3)) The subtle bug that arose with the `isBusy` flag is that the current preprocessor design doesn't always pop input streams when we might expect. For input like the following: BAR(A) BAR(B) The preprocessor can be in a state where the stack of input streams look like: // top input stream: BAR(X) => BAR(0) ----------------^ // bottom input stream BAR(A) BAR(B) ------^ That is, the first macro expansion is still "open," even though it is at the end of its input. When the preprocesor looks ahead and sees `BAR` as the next token and asks whether it should be expanded, the `isBusy` state on the BAR macro hasn't yet been unset, so expansion gets skipped. Aside: it is completely reasonable to ask at this point whether we should just "fix" the behavior of not popping input streams that are at their end. We should consider doing that, but the current code goes to some lengths to preserve the current behavior and I do not* recall why we had found it necessary. Any attempt to change that behavior should come along with lots of testing. This change instead adds a different approach for tracking the busy-ness of macros that doesn't rely on mutable state in the macro, and thus works even without pre-expansion of macro arguments. In the Slang preprocessor, macro names are always looked up in environments, where each environment maps names to macros, and has an option parent environment. There is a global environment used for most cases, but when expanding a function-like macro we would create an "argument environment" for it that maps the parameter names to their argument tokens. By expanding the environment type to include an (optional) field for a macro that is made "busy" by that environment, we can check if a macro is busy in a given environment by checking if the given macro has been associated with any of the environments on the chain of parent environments. A function-like macro expansion could then attach the macro to the "argument environment" and automatically make itself busy for the purposes of expansion of its body. To extend the same functionality to all macros, the "argument environment" was changes to an "expansion environment" that always gets used for a macro expansion and always has the identity of the macro attached. Because the arguments to a function-like macro have their own environment for expansion that bypasses the argument/expansion environment of the function-like macro, the macro will show as busy in its own body, but not in the expansion of its arguments. Thus the pre-expansion of macro arguments is not needed with this change. Aside: it might not be clear how this design avoids the problem of the unpopped input streams that aren't at their end. The answer is that the "current" environment for the preprocessor is already taken as the environment of the top-most input stream that is not at its end. This means that the new test for busy-ness benefits from the existing logic that was in place to deal with non popping input streams as soon as then end. (This is not to say that the current input-stream behavior isn't questionable) This change includes a new test case for the behavior that was broken, and expands the test case from #1177 to also test object-like macros. * Fix to handle the mutually-recursive case The revised "busy" logic was unable to handle a mutually-recursive macro like: #define ABC XYZ #define XYZ ABC This change restores the ability to handle mutually recursive cases, and makes sure that we test that case. The crux of the fix relies on splitting the environment and busy-macro tracking into more separate data structures. Rather than the list of environments directly trakcing busy-ness via the chain of parent environments, an environment now stores a separate linked list of macros that should be considered busy in that environment. Decoupling the two lists allows for an important change: when expanding an ordinary macro (either object-like or function-like, but not a function argument) the parent environment comes from the point where the macro was defined (this is required for lookup to make sense), but the parent list of busy macros comes from the current input stream at the point where expansion takes place. That change ensures that non-argument macros always properly "stack up" the list of busy macros.
2020-01-31	64 bit types doc (#1194)	jsmall-nvidia
	* * For integer literals add postfix, and use unsigned/signed output appropriately * Extend GLSL extension handling by type, and for adding 64 bit int extensions * Added tests for int/uint64 types * Add explicit Int/UInt64 emit functions to avoid ambiguity. * Fix uint64_t intrinsics on CUDA/C++. * WIP 64 bit types documentation. * Testing int64 intrinsic support. * Dx12 Dxil sm6.0 does actually support int64_t.
2020-01-30	Support for 64 bit integer types (#1191)	jsmall-nvidia
	* * For integer literals add postfix, and use unsigned/signed output appropriately * Extend GLSL extension handling by type, and for adding 64 bit int extensions * Added tests for int/uint64 types * Add explicit Int/UInt64 emit functions to avoid ambiguity.
2020-01-29	Feature/target intrinsic fold (#1190)	jsmall-nvidia
	* When checking if an instruction can be folded, take into account if it's called by a target intrinsic, because if it is we need to check if the parameter is accessed multiple times to see if it's worth allowing to fold. * Tidy up code around folding/target intrinsics. * Fix texture-load.slang . * Fix typo in assert.
2020-01-29	Feature/fix cuda function preamble (#1187)	jsmall-nvidia
	* Fix tests/compute/global-init.slang by handling some other cases where functions are emitted. * Fix comment.
2020-01-28	Fix layout for structured buffers of matrices (#1184)	Tim Foley
	When using row-major layout (via command-line or API option), the following sort of declaration: ```hlsl StructuredBuffer<float4x4> gBuffer; ... gBuffer[i] ... ``` Generates unexpected results when compiled to DXBC via fxc or DXIL via dxc, because the fxc/dxc compilers do not respect the matrix layout mode in this specific case (a structured buffer of matrices). Instead, they always use column-major layout, even if row-major was requested by the user. A user can work around this behavior by wrapping the matrix in a `struct`: ```hlsl struct Wrapper { float4x4 wrapped; } SturcturedBuffer<Wrapper> gBuffer; ... gBuffer[i].wrapped ... ``` This change simply automates that workaround when compiling for an HLSL-based downstream compiler, so that we get the same behavior across all our backends. The change adds a test case to confirm the behavior across multiple targets, but it turns out we also had a test checked in that confirmed the buggy (or at least surprising) fxc/dxc behavior, so that one had its baselines changed and can work as a regression test for this fix as well.
2020-01-28	Synthesizing CUDA tests (#1183)	jsmall-nvidia
	* When using setUniform clamp the amount of data written to the buffer size. * CUDA implement StructuredBuffer/ByteAddressBuffer as pointer/count as is on CPU. Allow bounds check to zero index. Update docs. * Synthesize tests. * Fix bug in CUDA output. * Fixing more tests to run on CUDA. * Added BaseType for layout of Vector and Matrix - as they are held as int32_t vector array types. * Enable unbound array support on CUDA. * Added unsized array support for CUDA documentation.
2020-01-27	CUDA implement StructuredBuffer/ByteAddressBuffer as pointer/count as is on ↵	jsmall-nvidia
	CPU. (#1182) Allow bounds check to zero index. Update docs.
2020-01-24	Fix for infinite recursion with macro invocation (#1177)	jsmall-nvidia
	* First pass fix of macro expansion logic to stop recursive application (causting a recursive loop), whilst also allowing application on parameters to a macro. * Added recursive-macro test. Fixed macro application example.
2020-01-24	Texture Sample available in CUDA (#1176)	jsmall-nvidia
	* WIP: Trying to figure out how texturing will work with CUDA. * WIP: Fixes for CUDA layout. Initial CUDA texture test. * WIP: Outputs something compilable by CUDA for TextureND.Sample * 2d texture working with CUDA. * Fix how binding for SamplerState occurs in CUDA. * Small tidy up of comments.
2020-01-23	Fix a bug in handling explicit register space bindings (#1175)	Tim Foley
	The logic for handling explicit `space`/`set` bindings on shader parameters for parameter blocks was not correctly marking the `space`/`set` that gets grabbed as used, and as a result it was possible for another parameter block that relies on implicit assignment to end up with a conflicting space. This change fixes the original oversight, and adds a test case to prevent against regression.
2020-01-22	Matrix indexing (#1172)	jsmall-nvidia
	* Added hlsl-intrinsic test folder. Enabled ceil as works across targets. * log10 support. * Fix float % on CPU/CUDA to match HLSL which is fmod (not fremainder). * Added log10 tests back to scalar-float.slang * Don't add the ( for $Sx - it's clearer what's going on without it. * Works on CUDA/CPU. Problem with asint/asuint do not seem to be found. * Only asuint exists for double. * Support countbits on CUDA and C++. * Fix typo in C++ population count. * First pass at int vector intrinsic tests. * Swizzle for int. * Bit cast tests on CUDA. * Fix warning on gcc. * Fix bit-cast-double execution on CUDA. * scalar-int test working on gcc release. * GetAt working on CUDA/C++ * Split out runtime index into it's own test. * Removed SetAt, as can use assignment with GetAt. * Allowing getAt to be used on matrices. * Don't need [] on matrix type any longer because use getAt. * Enable clamp on matrix-int. * Fix matrix-int.slang test - because clamp behavior varied if min and max were say inverted. Added runtime indexing version of matrix-int.
2020-01-22	WIP HLSL intrinsic coverage (#1171)	jsmall-nvidia
	* Added hlsl-intrinsic test folder. Enabled ceil as works across targets. * log10 support. * Fix float % on CPU/CUDA to match HLSL which is fmod (not fremainder). * Added log10 tests back to scalar-float.slang * Don't add the ( for $Sx - it's clearer what's going on without it. * Works on CUDA/CPU. Problem with asint/asuint do not seem to be found. * Only asuint exists for double. * Support countbits on CUDA and C++. * Fix typo in C++ population count. * First pass at int vector intrinsic tests. * Swizzle for int. * Bit cast tests on CUDA. * Fix warning on gcc. * Fix bit-cast-double execution on CUDA. * scalar-int test working on gcc release.
2020-01-21	HLSL intrinsic coverage (#1169)	jsmall-nvidia
	* Added hlsl-intrinsic test folder. Enabled ceil as works across targets. * log10 support. * Fix float % on CPU/CUDA to match HLSL which is fmod (not fremainder). * Added log10 tests back to scalar-float.slang * Don't add the ( for $Sx - it's clearer what's going on without it.
2020-01-21	CUDA support improvements (#1168)	jsmall-nvidia
	* Add test result for compile-to-cuda * Add RAII for some CUDA types to simplify usage. * First pass handling of some instrinsics on CUDA (for example transcendentals) * CUDA working with built in intrinsics. * Add missing CUDA prelude intrinsics. * CUDA matches CPU output on simple-cross-compile.slang * First pass at hlsl-scalar-float-intrinsic.slang test. * Fix smoothstep impl on CUDA and CPU. * Fixed step intrinsic on CUDA/CPU. * Added operator[] to Matrix for C++, to allow row access. Needs a fix for CUDA. * Fixed warning on clang build.
2020-01-17	Slang -> CUDA kernel runs correctly in test infrastructure (#1167)	jsmall-nvidia
	* First pass at BindLocation. * Added BindSet::init - for initializing with two input constant buffers. Needs better name, and perhaps should be another class. * Fix handling of constant buffer stripping. Improved initialization. * Trying to generalize BindLocation a little more. Split out CPULikeBindRoot. * More work to make BindLocation et al work with non uniform bindings. * Added parsing to a location. * WIP: Trying to get CPU working with BindLocation. * Describe problem of knowing the type of the reference point in the binding table. * More ideas on getBindings fix. * Remove BindSet as member of BindLocation. * Added BindLocation::Invalid * Made BindLocation able to be key in hash * Use BindLocation for bindings on BindingSet. * Added cuda and nvrtc categories to test infrastructure. Disabled CUDA synthetic tests by default. Fixed such that all tests now produce something in BindLocation style. * Use m_userIndex instead of m_userData on Resource. Move the binding setup out of cpu-compute-util (as no longer CPU specific) * Removed CPUBinding - used BindLocation/BindSet instead. Fixed some bugs around indexOf around uniform indirection. * Renamed BindSet::Resource -> BindSet::Value. * Document BindLocation. * Fixes for Clang/GCC Improve invariant requirement handling when constructing from BindPoints. * WIP: First attempt to run CUDA kernel. * Fix some issues around doing CUDA kernel launch. * Fix issues around use of cudaMemCpy . * Better cuda runtime error checking mechanism. * Fixed bug in passing parameters to cuda kernel launch. Simplified initialisation of context. * WIP: Fix CUDA runtime issues. * Add explicit CUDA synchronize so failures don't appear on implicit ones. * Fix problem emitting non shared variable on CUDA. * Fix some typos in CUDA layout. Use just a pointer for now for CUDA StucturedBuffer. * Arg order for CUDA launch was wrong. * First compute kernel runs on CUDA.
2020-01-15	Bind Location (#1166)	jsmall-nvidia
	* First pass at BindLocation. * Added BindSet::init - for initializing with two input constant buffers. Needs better name, and perhaps should be another class. * Fix handling of constant buffer stripping. Improved initialization. * Trying to generalize BindLocation a little more. Split out CPULikeBindRoot. * More work to make BindLocation et al work with non uniform bindings. * Added parsing to a location. * WIP: Trying to get CPU working with BindLocation. * Describe problem of knowing the type of the reference point in the binding table. * More ideas on getBindings fix. * Remove BindSet as member of BindLocation. * Added BindLocation::Invalid * Made BindLocation able to be key in hash * Use BindLocation for bindings on BindingSet. * Added cuda and nvrtc categories to test infrastructure. Disabled CUDA synthetic tests by default. Fixed such that all tests now produce something in BindLocation style. * Use m_userIndex instead of m_userData on Resource. Move the binding setup out of cpu-compute-util (as no longer CPU specific) * Removed CPUBinding - used BindLocation/BindSet instead. Fixed some bugs around indexOf around uniform indirection. * Renamed BindSet::Resource -> BindSet::Value. * Document BindLocation. * Fixes for Clang/GCC Improve invariant requirement handling when constructing from BindPoints.
2020-01-10	WIP: CPU like CUDA binding (#1164)	jsmall-nvidia
	* CUDA generated first test compiles. * WIP on enabling CUDA in render-test. * Detect CUDA_PATH environmental variable to build build cuda support into render-test. Added WIP cuda-compute-util.cpp/h Added CUDA as a renderer type. * Fix libraries needed for cuda in premake. * Added -enable-cuda premake option. Defaults to false. * Creates CUDA device, loads PTX and finds entry point. * Fix some erroneous cruft from slang-cuda-prelude.h * Made CUDA use C++ like ABI for generated code. Fix small bug in C++ output semantics.
2020-01-09	Catch exceptions inside FindTypeByName()	Tim Foley
	If `spReflection_FindTypeByName` is called with a string that refers to the name of a variable instead of a type, then it ends up causing a fatal error to be diagnosed, leading to an `AbortCompilationException` being thrown. This exception then propagates up to the application that called `spReflection_FindTypeByName()`, which actually violates the contract for our `extern "C"` API. This change adds a wrapper `try`/`catch` around the meat of the operation and turns any exception into a null result (this function has no other way to report errors to the caller). It would be nice for a separate change to make the error message that caused the problem non-fatal, but that is orthogonal to this fix.
2020-01-08	Cover a few corner cases in reflection API (#1163)	Tim Foley
	This change adds some new entry points to the reflection API to cover corner cases that a majority of applications won't care about. These are most likely to come up for users who want to make a complete copy of the Slang reflection information into a data format of their own design. All of the information is stuff that we already computed as part of layout, and just hadn't exposed: * Alignment information for type layouts. This is only useful for ordinary/uniform data; in all other cases alignment is always one. Even for uniform/ordinary data, it is unlikely that any application would actually make use of it. * Layout information for the result of an entry point function. This would be useful for applications that need to enumerate the varying outputs (user- or system-defined) of a shader. Having information available for `out` parameters but not the function result was inconsistent. * The "element type" of a parameter block type (e.g., going from `ParameterBlock<X>` to `X`). This seems to have been an oversight since `ConstantBuffer<X>` appears to have been implemented, and the case for a type layout was handled. * The "container" variable layout for a parameter block or constant buffer. It took a while for us to arrive at the current representation of layout for parameter groups, and most client code continues to use the original API that requires us to generated kludged "do what I mean" data. However, if we don't expose the more useful new representation fully, there is no way for users to take advantage of it! The reflection test tool has been updated to print the new information where it makes sense, which provides us some level of coverage for the new code. Unfortunately, this led to some cascading changes: * First, a bunch of the tests had their output changed since they include new information. That's the easy bit. * Next, the "container" and "element" var layouts don't actually have names (because there is no actual variable underlying them), which means that the code to emit variable names in the JSON dump needed to be condition. * Making the `"name"` output conditional messed up a lot of the delicate logic that had been dealing with when to emit commas for the output JSON (JSON uses commas as separators, and doesn't allow trailing commas). I added a bit of new infrastructure to make it simple(-ish) to track when a comma actually needs to be output.
2020-01-08	Setup of runtime cuda device (#1162)	jsmall-nvidia
	* CUDA generated first test compiles. * WIP on enabling CUDA in render-test. * Detect CUDA_PATH environmental variable to build build cuda support into render-test. Added WIP cuda-compute-util.cpp/h Added CUDA as a renderer type. * Fix libraries needed for cuda in premake. * Added -enable-cuda premake option. Defaults to false. * Creates CUDA device, loads PTX and finds entry point. * Fix some erroneous cruft from slang-cuda-prelude.h
2020-01-08	CUDA generated first test compiles. (#1161)	jsmall-nvidia

2020-01-06	Fix scoping issue around use of IRTypeSet (#1160)	jsmall-nvidia
	* WIP use IRTypeSet in CPPSourceEmitter - doesn't work because of a cloning issue, causing a crash on exit. * Fix destruction of module issue for IRTypeSet usage in CPPEmitter. * Fix out definition emitting ordering that was removed. * Disable cuda output test.
2019-12-20	HLSLIntrinsicSet (#1159)	jsmall-nvidia
	* CPPCompiler -> DownstreamCompiler * Added DownstreamCompileResult to start abstraction such that we don't need files. * * Split out slang-blob.cpp * Made CompileResult hold a DownstreamCompileResult - for access to binary or ISlangSharedLibrary * Keep temporary files in scope. * Add a hash to the hex dump stream. * Move all file tracking into DownstreamCompiler. * WIP support for nvrtc. * WIP: Adding support for nvrtc compiler. Adding enum types, wiring up the nvrtc into slang. * Fix remaining CPPCompiler references. * Fix order issue on target string matching. * Use ISlangSharedLibrary for nvrtc. * Use DownstreamCompiler for nvrtc. * WIP first pass at compilation win nvrtc. * Added testing if file is on file system into CommandLineDownstreamCompiler. Added sourceContentsPath. * Make test cuda-compile.cu work by just compiling not comparing output. * Genearlize DownstreamCompiler usage. * Fix warning on clang. * Remove CompilerType from DownstreamCompiler. * Use DownstreamCompiler interface for all compilers. NOTE for FXC, DXC and GLSLANG this doesn't mean using 'compile' - it's still extracting functions from shared library. * Replace DownstreamCompiler::SourceType -> SlangSourceLanguage * Replace _canCompile with something data driven. * Fix compiling on gcc/clang for DownstreamCompiler. * Moved some text conversions into DownstreamCompiler. * Fix problem on non-vc builds with not having return on locateCompilers for VS. * Change so no warning for code not reachable on locateCompilers for vs. * WIP: CUDA code generation - currently just using CPU layout and HLSL. * emitXXXForEntryPoint -> emitEntryPointSource emitSourceForEntryPoint -> emitEntryPointSourceFromIR Fix up generating cuda to get PTX. * WIP emitting cuda for IR. * Small improvements to CUDA ouput. * Disable the CUDA emit test, as output not currently compilable. * Split out IRTypeSet to simplify CPPSourceEmitter and other Emitters that rely on determining unique use of type and/or need to generate types in order to output code. * First pass at HLSLIntrinsicSet. * Small improvements to HLSLIntrinsicSet. * Use HLSLIntrinsicSet in CPPSourceEmitter. * Small improvements to checking of HLSLIntrinsic construction. * Deallocate intrinsic copy if a match was found.
2019-12-19	Split out IRTypeSet (#1158)	jsmall-nvidia
	* CPPCompiler -> DownstreamCompiler * Added DownstreamCompileResult to start abstraction such that we don't need files. * * Split out slang-blob.cpp * Made CompileResult hold a DownstreamCompileResult - for access to binary or ISlangSharedLibrary * Keep temporary files in scope. * Add a hash to the hex dump stream. * Move all file tracking into DownstreamCompiler. * WIP support for nvrtc. * WIP: Adding support for nvrtc compiler. Adding enum types, wiring up the nvrtc into slang. * Fix remaining CPPCompiler references. * Fix order issue on target string matching. * Use ISlangSharedLibrary for nvrtc. * Use DownstreamCompiler for nvrtc. * WIP first pass at compilation win nvrtc. * Added testing if file is on file system into CommandLineDownstreamCompiler. Added sourceContentsPath. * Make test cuda-compile.cu work by just compiling not comparing output. * Genearlize DownstreamCompiler usage. * Fix warning on clang. * Remove CompilerType from DownstreamCompiler. * Use DownstreamCompiler interface for all compilers. NOTE for FXC, DXC and GLSLANG this doesn't mean using 'compile' - it's still extracting functions from shared library. * Replace DownstreamCompiler::SourceType -> SlangSourceLanguage * Replace _canCompile with something data driven. * Fix compiling on gcc/clang for DownstreamCompiler. * Moved some text conversions into DownstreamCompiler. * Fix problem on non-vc builds with not having return on locateCompilers for VS. * Change so no warning for code not reachable on locateCompilers for vs. * WIP: CUDA code generation - currently just using CPU layout and HLSL. * emitXXXForEntryPoint -> emitEntryPointSource emitSourceForEntryPoint -> emitEntryPointSourceFromIR Fix up generating cuda to get PTX. * WIP emitting cuda for IR. * Small improvements to CUDA ouput. * Disable the CUDA emit test, as output not currently compilable. * Split out IRTypeSet to simplify CPPSourceEmitter and other Emitters that rely on determining unique use of type and/or need to generate types in order to output code.
2019-12-19	WIP CUDA source emit (#1157)	jsmall-nvidia
	* CPPCompiler -> DownstreamCompiler * Added DownstreamCompileResult to start abstraction such that we don't need files. * * Split out slang-blob.cpp * Made CompileResult hold a DownstreamCompileResult - for access to binary or ISlangSharedLibrary * Keep temporary files in scope. * Add a hash to the hex dump stream. * Move all file tracking into DownstreamCompiler. * WIP support for nvrtc. * WIP: Adding support for nvrtc compiler. Adding enum types, wiring up the nvrtc into slang. * Fix remaining CPPCompiler references. * Fix order issue on target string matching. * Use ISlangSharedLibrary for nvrtc. * Use DownstreamCompiler for nvrtc. * WIP first pass at compilation win nvrtc. * Added testing if file is on file system into CommandLineDownstreamCompiler. Added sourceContentsPath. * Make test cuda-compile.cu work by just compiling not comparing output. * Genearlize DownstreamCompiler usage. * Fix warning on clang. * Remove CompilerType from DownstreamCompiler. * Use DownstreamCompiler interface for all compilers. NOTE for FXC, DXC and GLSLANG this doesn't mean using 'compile' - it's still extracting functions from shared library. * Replace DownstreamCompiler::SourceType -> SlangSourceLanguage * Replace _canCompile with something data driven. * Fix compiling on gcc/clang for DownstreamCompiler. * Moved some text conversions into DownstreamCompiler. * Fix problem on non-vc builds with not having return on locateCompilers for VS. * Change so no warning for code not reachable on locateCompilers for vs. * WIP: CUDA code generation - currently just using CPU layout and HLSL. * emitXXXForEntryPoint -> emitEntryPointSource emitSourceForEntryPoint -> emitEntryPointSourceFromIR Fix up generating cuda to get PTX. * WIP emitting cuda for IR. * Small improvements to CUDA ouput. * Disable the CUDA emit test, as output not currently compilable.
2019-12-19	Fix invocation of `[mutating]` methods (#1156)	Tim Foley
	The logic for invoking methods (member functions) in `slang-lower-to-ir.cpp` was failing to take into account whether the callee was `[mutating]` or not. Instead, it would always lower the `base` expression in something like `base.f(...)` as an r-value expression, consistent with a non-`[mutating]` method. The incorrect code generation strategy somehow turned out to work in many cases, but it broke in cases where a `[mutating]` method was called on an `inout` parameter. E.g., in this code: ```hlsl struct Stuff { [mutating] void doThing() { ... } } void broken(inout Stuff s) { s.doThing(); } ``` The `broken` function would fail to write back the value mutated by `doThing` to its `s` parameter before returning. The crux of the fix here is inside `visitInvokeExpr()`. Instead of directly calling `lowerRValueExpr` on the base expression of a method/member-function call, we instead compute the "direction" of the `this` parameter in the callee, and use that to emit the argument expression appropriately. In order to enable that change, there are several refactorings included: * The existing `ParameterDirection` and `getParameterDirection()` calls were lifted out from the declaration visitor to the global scope, so that they could be shared between lowering of functions and their call sites. * The logic for determining the "direction" of a `this` parameter was factored out of `collectParameterLists()` into its own `getThisParamDirection()` subroutine (again so that functions and call sites can share matching logic). * The logic for turning an AST expression used as a call argument into IR argument(s)* was pulled out into its own `addCallArgsForParam` and was refactored to rely on a `ParameterDirection` instead of directly inspecting the modifiers on a `ParamDecl`. This allows the function to be used for ordinary/direct arguments and the `this` argument, and also ensures that the caller and callee will agree on the direction of parameters. Fixing the way that `[mutating]` methods are called actually broke some test cases, specifically in the cases where a `[mutating]` method was being called on a value with an interface-constrained generic type: ```hlsl interface IThing { [mutating] void doStuff(); } void myFunc<T : IThing>(inout T thing) { thing.doStuff(); } ``` Our argument passing for `inout` parameters currently requires that we make a temp copy of `thing` into a local, and then pass that local as argument for the `inout` parameter, before copying back. The issue that arose was that a simple version of the logic uses the type of the `base` expression in `base.someMethod(...)` as the type of the local variable, but for an interface method call the base expression will have been cast to the interface type (we effectively have `((IThing) thing).doStuff()`. The fix here was to query the this type through the member function we are calling, and to share that logic between the function-call and function-declaration cases, to try and make sure they match, which meant even more logic got hoisted out of the declaration-emission logic and to the top level. Note: This change does not clean up any other clarity or performance concerns around `out` and `inout` parameters; it is only focused on correctness.
2019-12-16	If a locator is not set for a DownstreamCompiler type, that doesn't ↵	jsmall-nvidia
	necessarily mean that there isn't one, for example with the GENERIC_C_CPP option. This could also be fixed by say giving this type an empty locator, or special casing. The option chosen here, is to allow lookup even if there isn't a locator. Note that there is some special case handling, where a generic lookup, will prime all of the specific types. (#1154)
2019-12-12	Feature/source downstream compiler (#1153)	jsmall-nvidia
	* CPPCompiler -> DownstreamCompiler * Added DownstreamCompileResult to start abstraction such that we don't need files. * * Split out slang-blob.cpp * Made CompileResult hold a DownstreamCompileResult - for access to binary or ISlangSharedLibrary * Keep temporary files in scope. * Add a hash to the hex dump stream. * Move all file tracking into DownstreamCompiler. * WIP support for nvrtc. * WIP: Adding support for nvrtc compiler. Adding enum types, wiring up the nvrtc into slang. * Fix remaining CPPCompiler references. * Fix order issue on target string matching. * Use ISlangSharedLibrary for nvrtc. * Use DownstreamCompiler for nvrtc. * WIP first pass at compilation win nvrtc. * Added testing if file is on file system into CommandLineDownstreamCompiler. Added sourceContentsPath. * Make test cuda-compile.cu work by just compiling not comparing output. * Genearlize DownstreamCompiler usage. * Fix warning on clang. * Remove CompilerType from DownstreamCompiler. * Use DownstreamCompiler interface for all compilers. NOTE for FXC, DXC and GLSLANG this doesn't mean using 'compile' - it's still extracting functions from shared library. * Replace DownstreamCompiler::SourceType -> SlangSourceLanguage * Replace _canCompile with something data driven. * Fix compiling on gcc/clang for DownstreamCompiler. * Moved some text conversions into DownstreamCompiler. * Fix problem on non-vc builds with not having return on locateCompilers for VS. * Change so no warning for code not reachable on locateCompilers for vs.
2019-12-12	Use DownstreamCompiler for all downstream compilers (#1152)	jsmall-nvidia
	* CPPCompiler -> DownstreamCompiler * Added DownstreamCompileResult to start abstraction such that we don't need files. * * Split out slang-blob.cpp * Made CompileResult hold a DownstreamCompileResult - for access to binary or ISlangSharedLibrary * Keep temporary files in scope. * Add a hash to the hex dump stream. * Move all file tracking into DownstreamCompiler. * WIP support for nvrtc. * WIP: Adding support for nvrtc compiler. Adding enum types, wiring up the nvrtc into slang. * Fix remaining CPPCompiler references. * Fix order issue on target string matching. * Use ISlangSharedLibrary for nvrtc. * Use DownstreamCompiler for nvrtc. * WIP first pass at compilation win nvrtc. * Added testing if file is on file system into CommandLineDownstreamCompiler. Added sourceContentsPath. * Make test cuda-compile.cu work by just compiling not comparing output. * Genearlize DownstreamCompiler usage. * Fix warning on clang. * Remove CompilerType from DownstreamCompiler. * Use DownstreamCompiler interface for all compilers. NOTE for FXC, DXC and GLSLANG this doesn't mean using 'compile' - it's still extracting functions from shared library. * Fix compiling on gcc/clang for DownstreamCompiler. * Fix problem on non-vc builds with not having return on locateCompilers for VS. * Change so no warning for code not reachable on locateCompilers for vs.
2019-12-12	Slang compiles CUDA source via NVRTC (#1151)	jsmall-nvidia
	* CPPCompiler -> DownstreamCompiler * Added DownstreamCompileResult to start abstraction such that we don't need files. * * Split out slang-blob.cpp * Made CompileResult hold a DownstreamCompileResult - for access to binary or ISlangSharedLibrary * Keep temporary files in scope. * Add a hash to the hex dump stream. * Move all file tracking into DownstreamCompiler. * WIP support for nvrtc. * WIP: Adding support for nvrtc compiler. Adding enum types, wiring up the nvrtc into slang. * Fix remaining CPPCompiler references. * Fix order issue on target string matching. * Use ISlangSharedLibrary for nvrtc. * Use DownstreamCompiler for nvrtc. * WIP first pass at compilation win nvrtc. * Added testing if file is on file system into CommandLineDownstreamCompiler. Added sourceContentsPath. * Make test cuda-compile.cu work by just compiling not comparing output. * Fix warning on clang.
2019-12-10	DownstreamCompiler abstraction (#1149)	jsmall-nvidia
	* CPPCompiler -> DownstreamCompiler * Added DownstreamCompileResult to start abstraction such that we don't need files. * * Split out slang-blob.cpp * Made CompileResult hold a DownstreamCompileResult - for access to binary or ISlangSharedLibrary * Keep temporary files in scope. * Add a hash to the hex dump stream. * Move all file tracking into DownstreamCompiler.
2019-12-06	Support conversion from int/uint to enum types (#1147)	Tim Foley
	* Support conversion from int/uint to enum types The basic feature here is tiny, and is summarized in the code added to the stdlib: ``` extension __EnumType { __init(int val); __init(uint val); } ``` The front-end already makes all `enum` types implicitly conform to `__EnumType` behind the scenes, and this `extension` makes it so that all such types inherit some initializers (`__init` declarations, aka. "constructors") that take `int` and `uint`. (Note: right now all `__init` declarations in Slang are assumed to be implemented as intrinsics using `kIROp_Construct`. This obviously needs to change some day, especially so that we can support user-defined initializers.) Actually making this work required a bit of fleshing out pieces of the compiler that had previously been a bit ad hoc to be a bit more "correct." Most of the rest of this description is focused on those details, since the main feature is not itself very exciting. When overload resolution sees an attempt to "call" a type (e.g., `MyType(3.0)`) it needs to add appropriate overload candidates for the initializers in that type, which may take different numbers and types of parameters. The existing code for handling this case was using an ad hoc approach to try to enumerate the initializer declarations to consider, which might be found via inheritance, `extension` declarations, etc. In practice, the ad hoc logic for looking up initializers was just doing a subset of the work that already goes into doing member lookup. Changing the code so that it effectively does lookup for `MyType.__init` allows us to look up initializers in a way that is consistent with any other case of member lookup. Generalizing this lookup step brings us one step closer to being able to go from an `enum` type `E` to an initializer defined on an `extension` of an `interface` that `E` conforms to. One casualty of using the ordinary lookup logic for initializers is that we used to pass the type being constructed down into the logic that enumerated the initializers, which made it easier to short-circuit the part of overload resolution that usually asks "what type does this candidate return." It might seem "obvious" that an initializer/constructor on type `Foo` should return a value of type `Foo`, but that isn't necessarily true. Consider the `__BuiltinFloatingPointType` interface, which requires all the built-in floating-point types (`float`, `double`, `half`) to have an initializer that can take a `float`. If we call that interface in a generic context for `T : __BuiltinFloatingPointType`, then we want to treat that initializer as returning `T` and not `__BuiltinFloatingPointType`. Without the ad hoc logic in initializer overload resolution, this is the exact problem that surfaced for the stdlib definition of `clamp`. The solution to the "what type does an initializer return" problem was to introduce a notion of a `ThisType`, which refers to the type of `this` in the body of an interface. More generally, we will eventually want to have the keyword `This` be the type-level equivalent of `this`, and be usable inside any type. The `calcThisType` function introduced here computes a reasonable `Type` to represent the value of `This` within a given declaration. Inside of concrete type it refers to the type itself, while in an `interface` it will always be a `ThisType`. The existing `ThisTypeSubstitution`s, previously only applied to associated types, now apply to `ThisType`s as well, in the same situations. The next roadblock for making the simple declarations for `__EnumType` work was that the lookup logic was only doing lookup through inheritance relationships when the type being looked up in was an `interface`. The logic in play was reasonable: if you are doing lookup in a type `T` that inherits from `IFoo`, then why bother looking for `IFoo::bar` when there must be a `T::bar` if `T` actually implements the interface? The catch in this case is that `IFoo::bar` might not be a requirement of `IFoo`, but rather a concrete method added via an `extension`, in which case `T` need not have its own concrete `bar`. The simple/obvious fix here was to make the lookup logic always include inherited members, even when looking up through a concrete type. Of course, if we allow lookup to see `IFoo::bar` when looking up on `T`, then we have the problem that both `T::bar` and `IFoo::bar` show up in the lookup results, and potentially lead to an "ambiguous overload" error. This problem arises for any interface rquirement (so both methods and associated types right now). In order to get around it, I added a somewhat grungy check for comparing overload candidates (during overload resolution) or `LookupResultItem`s (during resolution of simple overloaded identifiers) that considers a member of a concrete type as automatically "better" than a member of an interface. The Right Way to solve this problem in the long run requires some more subtlety, but for now this check should Just Work. One final wrinkle is that due to our IR lowering pass being a bit overzealous, we currently end up trying to emit IR for those new `__init` declarations, which ends up causing us to try and emit IR for a `ThisType`. That is a case that will require some subtlty to handle correctly down the line, for for now we do the expedient thing and emit the `ThisType` for `IFoo` as `IFoo` itself, which is not especially correct, but doesn't matter since the concrete initializer won't ever be called. * testing: add more debug output to Unix process launch function * testing: increase timeout when running command-line tests
2019-12-06	Add a custom RTTI implementation for the AST (#1148)	Tim Foley
	* Add a custom RTTI implementation for the AST Profiling was showing that the internal routines behind `dynamic_cast` were the worst offenders in the whole codebase, and a lot of this was being driven by casting inside the semantic checking logic. This change takes advantage of the fact that we already had a custom RTTI structure built up for the classes of syntax nodes that was previously being used to implement string->class lookup and factory services to support "magic" types and modifier in the stdlib (e.g., the way that a modifier declaration in the stdlib just lists the name of a C++ class that should be instantiated for it). That RTTI information already included a pointer from each syntax class to its base class (if any), based on the restriction that the AST node types form a single-inheritance hierarchy. The existing code already had a virtual `getClass()` routine on AST nodes, and an `isSubClassOf` query on the class information. Putting those pieces together to implement the `dynamicCast` and `as` routines was a cinch. The work in previous PRs to layer an abstraction over all the existing `dynamic_cast` call sites and to support type-specific `dynamicCast` implementations inside `RefPtr<>` pays off greatly here. * fixup: refactor implementation to appease more pedantic/correct compilers
2019-12-06	Remove legacy feature for merging global shader parameters (#1139)	Tim Foley
	* Remove legacy feature for merging global shader parameters There is a fair amount of special-case code in the Slang compiler today to deal with the scenario where a programmer declares the "same" shader parameter across two different translation units: ```hlsl // a.hlsl Texture2D a; cbuffer C { float4 c; } ``` ```hlsl // b.hlsl cbuffer C { float4 c; } Texture2D b; ``` An important note here is that the declaration of `C` may be in a header file that both `a.hlsl` and `b.hlsl` `#include`, because from the standpoint of the parser and later stages of the compiler, there is no difference between `C` being in an included file vs. it being copy-pasted across both `a.hlsl` and `b.hlsl`. When a user invokes `slangc a.hlsl b.hlsl` (or the equivalent via the API), then they may decide that it is "obvious" that the shader parameter `C` is the "same" in both `a.hlsl` and `b.hlsl`. Knowing that the parameter is the "same" may lead them to make certain assumptions: * They may assume that generated code for entry points in `a.hlsl` and `b.hlsl` will both agree on the exact `register`/`binding` occupied by `C`. * They may assume that reflection information for their program will only reflect `C` once, and it will reflect it in a way that is applicable to entry points in both `a.hlsl` and `b.hlsl` * They may assume that the compiler can and should handle this use case even when `C` contains fields with `struct` types that are declared in both `a.hlsl` and `b.hlsl` that have the "same" definition. * They may assume that in cases where `C` is declared inconsistently between `a.hlsl` and `b.hlsl` the compiler can and will diagnose an error. Making these assumptions work in practice required a lot of special-case code: * When composing/linking programs was `ComponentType`s we had to include a special case `LegacyProgram` type that could provide these "do what I mean" semantics, since they are not what one would want in the general case for a `CompositeComponentType`. * During enumeration of global shader parameter in a `LegacyProgram`, we had to detect parameters from distinct modules (translation units) with the same name, and then enforce that they must have the "same" type (via an ad hoc recursive structural type match). No other semantic checking logic needs or uses that kind of structural check. * During parameter binding generation, we need to handle the case where a single global shader parameter might have multiple declarations, and make sure to collect explicit bindings from all of them (checking for inconsistency) and also to apply generated bindings to all of them. * The `mapVarToLayout` member in `StructTypeLayout` is a concession to the fact that we might have multiple `VarDecl`s for each field of the struct that represents the global scope, we might need to look up a field and its layout using any of those declarations (much of the need for this field had gone away now that IR passes are largely using IR-based layout). All of these different special cases added more complex code in many places in the compiler, all to support a scenario that isn't especially common. Most users won't be affected by the original issue, because they will do one of several things that rule it out: * Anybody using `slangc` like a stand-in for `fxc` or `dxc` and compiling one translation unit at a time will not suffer from any problems. If/when such users want consistent bindings across translation units, they already use either explicit binding or rely on consistent ordering and implicit binding. * Anybody who puts all the entry points that get combined into a pass/pipeline in a single file will not have problems. They will automatically get consistent bindings because of Slang's guarantees, and there can't be duplicated declarations when there is only one translation unit. * Anybody using `import` to factor out common declarations while compiling multiple translation units at once will not be affected. Parameters declared in an `import`ed module are the "same" in a much deeper way that it is trivial for Slang to support. Only users of the Falcor framework are likely to be affected by this, and they have two easy migration paths: either put related entry points into the same file, or factor common parameters into an `import`ed module. (It is also worth noting that for command-line `slangc`, it is possible to have a single module with multiple `.slang` files in it, which can all see global declarations like parameters across all the files. Anybody who buys into doing things the Slang Way should have no problem avoiding duplicated declarations) With the rationale out of the way, the actual change mostly just amounts to deleting lots of code that is no longer needed. An astute reviewer might notice several `assert`-fail conditions where complex Slang features were never actually made to work correctly with this legacy behavior. A small number of test cases broke with the code changes, but these were tests that specifically exercised the behavior being removed. In the case of the tests around binding/reflection generating, I rewrote the tests to use one of the idomatic workarounds (putting the shared parameters into an `import`ed module), but doing so required me to add support for `#include` when doing pass-through compilation with `fxc`. That logic added a bit more cruft than I had originally hoped to this commit, but having `#include` support when doing pass-through compilation is probably a net win. * fixup: 64-bit warning