slang.git - Making it easier to work with shaders

Age	Commit message (Collapse)	Author
2020-02-21	dxil-asm target output was dxil code not asm. (#1238)	jsmall-nvidia

2020-02-21	Add some missing support for Shader Model 6.4/6.5 (#1237)	Tim Foley
	Just adding the enumerants for the new stages/profiles didn't automatically update the code that computes a profile string for passing to fxc/dxc.
2020-02-21	Add surface syntax for "this type" (#1236)	Tim Foley
	Within the context of an aggregate type (or an `extension` of one), the programmer can use `this` to refer to the "current" instance of the surrounding type, but there is no easy way to utter the name of the type itself. This is especially relevant inside of an `interface`, where the type of `this` isn't actually the `interface` type, but rather a placeholder for the as-yet-unknown concrete type that will implement the interface. This change adds a keyword `This` that works similarly to `this`, but names the current type instead of the current instance. It can be used to declare things like binary methods or factory functions in an interface: ``` interface IBasicMathType { This absoluteValue(); This sumWith(This left); } T doSomeMath<T:IBasicMathType>(T value) { return value.sumWith(value.absoluteValue()); } ``` The `This` type is consistent with the type named `Self` in Rust and Swift (where Rust/Swift use `self` instead of `this`). Other names could be considered (e.g., `ThisType`) if we find that users don't like the name in this change.
2020-02-21	Initial support for explicit default initializers (#1235)	Tim Foley
	This change makes it so that for a suitable type `MyType`, a variable declaration like: MyType v; is treated as if it were written: MyType v = MyType(); The definition of "suitable" here is that `MyType` needs to have an available `__init` declaration that can be invoked with zero arguments. I've added a test to confirm that the new behavior works in this specific case. There are a bunch of caveats to the feature as it stands today: * Just because `MyType` has a zero-parameter `__init`, that doesn't mean an array type like `MyType[10]` does, so arrays currently remain uninitialized by default. Fixing this gap requires careful consideration because some, but not all, array types should be default-initializable. * The change here should mean that a `struct` type with a field like `MyType f;` should count as having a default initial-value expression for that field, but I haven't confirmed that. * Even if a `struct` provides initial values for all its fields (e.g., `struct S { float f = 0; }`), that doesn't mean it has a default `__init` right now, so those `struct` types will still be left uninitialized by default. Converging all this behavior is still TBD. Just to be clear: there is no provision or plan in Slang to support destructors, RAII, copy constructors, move constructors, overloaded assignment operations, or any other features that buy heavily into the C++ model of how construction and destruction of values gets done. In fact, I'm not even 100% sure I like having this change in place at all, and I think we should reserve the right to revert it and say that only specific stdlib types get to opt in to default initialization along these lines.
2020-02-20	WIP on RWTexture types on CUDA/CPU (#1234)	jsmall-nvidia
	* CUDA support for array of resources. * * Add support for Texture2DArray on CPU * Expand texture-simple.slang to test Texture2DArray * Reorganise CUDAComputeUtil to split out createTextureResource. * Add TextureCubeArray support for CPU/CUDA targets. * Pulled out CUDAResource Renamed derived classes to reflect that change. * Creation of SurfObject type. * Functions to return read/write access for simplifying future additions. * WIP for RWTexture access on CPU/CUDA. * CUsurfObject cannot have mips. * Ability to set number of mips on test data. Preliminary support for CUsurfObj and RWTexture1D on CUDA. CUDA docs improvements. * Fix typo.
2020-02-20	Initial support for user-defined initializer/constructor declarations (#1233)	Tim Foley
	The basic idea is that the user can write: ```hlsl struct MyThing { int a; float b; __init(int x, float y) { a = x; b = y; } } ``` and after that point, they can create an intstance of their `MyThing` type as simply as `MyThing(123, 4.56f)`. There was already a large amount of infrastructure laying around that is shared between ininitializers and ordinary functions, so enabling this feature mostly amounted to tying up some loose ends: * In the parser, make sure to properly push/pop the scope for an `__init` (or `__subscript`) declaration, so parameters would be visible to the body * In semantic checking, make sure that declaration "header" checking properly bottlenecks all the function-like cases into a base routine * In semantic checking, make sure that the logic for checking function bodies applies to every `FunctionDeclBase` with a body, and not just `FuncDecl`s * Update semeantic checking for statements to allow for any `FunctionDeclBase` as the parent declaration, not just a `FuncDecl` * In lookup, treat the `this` parameter of an `__init` (well, not actually a parameter in this case) as being mutable, just like for a `[mutating]` method * In IR codegen, don't just assume that all `__init`s are intrinsics, and narrow the scope of that hack to just `__init`s without bodies * In IR codegen, detect when we are emitting an IR function for an `__init`, and in that case create a local variable to represent the `this` value, and implicitly return that value at the end of the body. From that point on the rest of the compiler Just Works and IR codegen doesn't have to think of an `__init` as being any different than if the user had declared a `static MyThing make(...)` function. Caveats: * C++ users might like to use that naming convention (so `MyThing` as the name instead of `__init`). We can consider that later. * Everybody else might prefer a keyword other than `__init` (e.g., just `init` as in Swift), but I'm keeping this as a "preview" feature for now, rather than something officially supported * Early `return`s from the body of an `__init` aren't going to work right now. * There is currently no provision for automatically synthesizing initializers for `struct` types based on their fields. This seems like a reasonable direction to take in the future. * There is no provision for routing `{}`-based initializer lists over to initializer calls. The two syntaxes probably need to be unified at some point so that doing `MyType x = { a, b, c }` and `let x = MyType(a, b, c)` are semantically equivalent. It is possible that as a byproduct of this change user-defined `__subscript`s might Just Work, but I am guessing there will still be loose ends on that front as well, so I will refrain from looking into that feature until we have a use case that calls for it.
2020-02-19	Don't allocate a default space for a VK push-constant buffer (#1231)	Tim Foley
	When a shader only uses `ParameterBlock`s plus a single buffer for root constants: ```hlsl ParameterBlock<A> a; ParameterBlock<B> b; [[vk::push_constant]] cbuffer Stuff { ... } ``` we expect the push-constant buffer should not affect the `space` allocated to the parameter blocks (so `a` should get `space=0`). This behavior wasn't being implemented correctly in `slang-parameter-binding.cpp`. There was logic to ignore certain resource kinds in entry-point parameter lists when computing whether a default space is needed, but the equivalent logic for the global scope only considered parameters that consuem whole register spaces/sets. This change shuffles the code around and makes sure it considers a global push-constant buffer as not needing a default space/set. Note that this change will have no impact on D3D targets, where `Stuff` above would always get put in `space0` because for D3D targets a push-constant buffer is no different from any other constant buffer in terms of register/space allocation. One unrelated point that this change brings to mind is the `[[vk::push_constant]]` should ideally also be allowed to apply to an entry point (where it would modify the default/implicit constant buffer). In fact, it could be argued that push-constant allocation should be the default for (non-RT) entry point `uniform` parameters (while `[[vk::shader_record]]` should be the default for RT entry point `uniform` parameters).
2020-02-19	Fix a reference-counting bug in one of the session creation routines. (#1230)	Tim Foley
	This is pretty straightforward, because we were calling `Session::init` (which can retain/release the session) on a `Session*` (no reference held). The catch is that our current tests use the older form of the Slang API, while Falcor relies on the newer API, and so the recent change to our reference-counting logic introduced a regression that we didn't detect in testing. This change just fixes the direct issue but doesn't address the gap in testing. A better long-term fix would be to fully define our "1.0" API, shift our testing to it, and layer the old API on top of it (to try and avoid regressions for client code).
2020-02-19	Initial partial support for WaveXXX intrinsics on CUDA (#1228)	jsmall-nvidia
	* Start work on wave intrinsics for CUDA. * Add prelimary CUDA support for some Wave intrinsics. Document the issue around WaveGetLaneIndex
2020-02-19	Fixes for DXR 1.1 RayQuery type (#1227)	Tim Foley
	The previous change that added `RayQuery` to the standard library didn't mark it as any kind of intrinsic, so the first fix here was to add the appopriate attribtue to the stdlib declaration of `RayQuery`. Next I found that the legalization pass was obliterating the `RayQuery` type because it had no members, and thus looked like an empty `struct` (which we eliminate for a variety of reasons). I fixed that by adding a check for a target-intrinsic decoration in type legalization. Next I found that the type wasn't emitted correctly because our generic specialization was turning `RayQuery<0>` into a new type `RayQuery_0` (which is what our specialization is designed to do, after all). I then disabled generic specialization for types that are marked as target intrinsics (which probably renders the preceding fix moot). Finally, I found that the emit logic for types in HLSL wasn't handling the case of a generic intrinsic type that didn't also use its own dedicated opcode. I fixes that up by adding a specific case for `IRSpecialize` as a late catch-all. After all these changes, a declaration of a `RayQuery` variable seems to Just Work (even without any new/improved behavior for handling default constructors). One potential gotcha looking forward is that my checks for `IRTargetIntrinsicDecoration` aren't checking what target the decoration is for. This is fine for now because there are only two types using the decoration right now (`RayDesc` and `RayQuery`), and the special cases above are reasonable for both of them. If/when we have more target-intrinsic types with this decoration, and some of them are only intrinsic for specific targets, then we will need to revisit this choice and either: * make these checks perform filtering based on the "current" target (similar to what the emit logic has today), or * (more likely) make the linking and target-specialization step strip out any target-intrinsic decorations that aren't the right one(s) for the current target Note that this change doesn't include a test case yet because I don't have a DXR 1.1 ready version of dxc to test against. I have manually confirmed that appropriate Slang input seems to be producing reasonable HLSL output when using these functions, but I cannot yet try to check that in (using an HLSL file for the expected output would be quite fragile).
2020-02-18	Added support for Targets to TypeTextUtil. (#1226)	jsmall-nvidia
	* Added support for Targets to TypeTextUtil. * Made Function names 'get' and 'find' instead of 'as' in TypeTextUtil.
2020-02-18	First pass Texture Array support on CUDA/CPU (#1225)	jsmall-nvidia
	* Add cubemap support. * Add CUDA fence instrinsics. * Added Gather for CUDA. * Use the CUDA driver API as much as possible. * * Support 1D texture on CPU * WIP on 1D texture on CUDA * Added simplified texture test * Fix test. * Improve texture-simple tests. * * Add CPU support for 3d textures * Add support for mip maps to CUDA * Disable warnings in nvrtc * Update CUDA docs * WIP on 3d texture support. * Add support for 3d textures for CPU and CUDA. * CPU and CUDA support for cube maps. * Add CPU support for Texture1DArray. * Support CUDA Layered/Array type in meta library.
2020-02-18	CUDA/CPU resource coverage (#1224)	jsmall-nvidia
	* Add cubemap support. * Add CUDA fence instrinsics. * Added Gather for CUDA. * Use the CUDA driver API as much as possible. * * Support 1D texture on CPU * WIP on 1D texture on CUDA * Added simplified texture test * Fix test. * Improve texture-simple tests. * * Add CPU support for 3d textures * Add support for mip maps to CUDA * Disable warnings in nvrtc * Update CUDA docs * WIP on 3d texture support. * Add support for 3d textures for CPU and CUDA.
2020-02-14	Feature/cuda coverage (#1223)	jsmall-nvidia
	* Add cubemap support. * Add CUDA fence instrinsics. * Added Gather for CUDA. * Use the CUDA driver API as much as possible. * * Support 1D texture on CPU * WIP on 1D texture on CUDA * Added simplified texture test * Fix test. * Improve texture-simple tests. Co-authored-by: Tim Foley <tfoleyNV@users.noreply.github.com>
2020-02-14	Add a bunch of stdlib declarations for SM 6.4 and 6.5 (#1221)	Tim Foley
	The main thing this adds is the `RayQuery` type with its `TraceRayInline` method for DXR 1.1. None of these new functions/types/constants have been tested, and many of them are not expected to work at all (e.g., we don't actually have any mesh shader support, so adding them as stage types is just for completeness at the API level). I would like to write some test cases after this is checked in by looking for existing DXR 1.1 examples. We currently have an issue around default initialization that means we probably can't run any DXR 1.1 shaders right now with just the stdlib changes.
2020-02-13	* Fix for unary - on glsl (#1222)	jsmall-nvidia
	* Test to check fix
2020-02-12	Nvrtc disable warnings/Float literal improvements (#1220)	jsmall-nvidia
	* Added 'truncate' for fixing floats, for floats near the max value (as opposed to making infinite). Put AreNearlyEqual into Math * Test for ::make static method.
2020-02-12	Support for isinf/isfinite/isnan/ldexp (#1219)	jsmall-nvidia
	* Added support ldexp. * Added classify-float.slang test Fixed glsl output. * Added classify-double.slang * Added ldexp test to scalar-double.slang * isnan, isinf, isfinite are macros (on some targets) so remove :: prefix.
2020-02-12	CUDA barrier/atomic support (#1218)	jsmall-nvidia
	* * Improved fastRemoveAt * Fixed off by one bug * Fixed const safeness with List<> * Made List begin and end const safe. * Revert to previous RefPtr usage. * Fix bug with casting. * Tabs -> spaces. Small fixes/improvements to List. * Improve comment on List. * Group shared/atomic test works on CUDA. * * Enabled CUDA tests for atomics tests * Enabled DX12 test for atomics-buffer.slang Not clear just yet how to implement that for CUDA - it will work with StructuredBuffer. * hasContent -> isNonEmpty * Remove unneeded comment.
2020-02-11	Small improvements around List (#1216)	jsmall-nvidia
	* * Improved fastRemoveAt * Fixed off by one bug * Fixed const safeness with List<> * Made List begin and end const safe. * Revert to previous RefPtr usage. * Fix bug with casting. * Tabs -> spaces. Small fixes/improvements to List. * Improve comment on List. * hasContent -> isNonEmpty
2020-02-11	Make fixes for the attribute-error test case. (#1215)	Tim Foley
	There are two main fixes here: The first is to remove a memory leak in the reflection test tool, in the case where Slang compilation fails. There is no real reason to be using the reflection test tool for tests that produce diagnostics (we have the slangc tool for that), but it makes sense to go ahead and fix the leak rather than work around it. This was one of those leaks that could have been avoided with smart pointers and a COM-lite API. The second issue was that the logic for constructing an `AttributeDecl` based on a user-defined `struct` was not correctly setting the parent of the attribute decl. The code in question was a little hard to follow and had a few steps that didn't seem strictly necessary given its goals, so I went ahead and scrubbed+commented it to just do what made sense to me (and the tests still pass...). I'm not entirely happy with the design and implementation approach for user-defined attributes, so we might want to take another stab at it sooner or later. This change is not meant to address any design issues, and is just about fixing bugs in what is already there.
2020-02-10	Fix output GLSL for primitive ID in a geometry shader (#1214)	Tim Foley
	We had been translating an `SV_PrimitiveID` input in a shader over to `gl_PrimitiveID` in GLSL. That translation seemed to work just fine for users, so we thought it was correct. It turns out that `gl_PrimitiveID` is the correct GLSL for a primitive ID input in every stage except a geometry shader. In a geometry shader, `gl_PrimitiveID` is a primitive ID output, and if you want the input case you have to write `gl_PrimitiveIDIn` (note the `In` suffix). This change sets aside my bewilderment at the above long enough to implement a workaround in the GLSL legalization step. I also modified our current geometry shader cross compilation test to make use of an input primitive ID.
2020-02-10	Add attributes to enable dual-source blending on Vulkan (#1210)	Tim Foley
	This change adds support for the `[[vk::location(...)]]` and `[[vk::index(...)]]` attributes, which can be used together to mark up shader outputs for dual-source blending on Vulkan. HLSL/Slang code like the following: ```hlsl struct Output { [[vk::location(0)]] float4 a : SV_Target0; [[vk::location(0), vk::index(1)]] float4 b : SV_Target1; } [shader("fragment")] Output main(...) { ...} ``` can be used to set up dual-source blending on both D3D and Vulkan APIs. The output GLSL for the above will look something like: ```glsl layout(location = 0) out vec4 a; layout(location = 0, index = 1) out vec4 b; void main() { ... } ``` The more or less straightforward parts of this change were: * Added new `attribute_syntax` declarations to the stdlib, for `[[vk::location(...)]]` and `[[vk::index(...)]]` * Added new AST node types for the new attribute cases, sharing a base class so that argument checking can be shared * Added checks for the arguments to the new attributes in `slang-check-modifier.cpp` (eventually this kind of logic shouldn't be needed for new attributes) * Updated GLSL emit logic so that it treats the `index`/`space` parts of a variable layout as the `location`/`index` for varying parameters. * Updated GLSL legalization so that when it translates entry-point parameters into globals (and scalarizes structures) it handles both a binding index and space for the parameters. * Added a cross-compilation test case to verify that the basics of the feature work The remaining work is all in `slang-parameter-binding.cpp`. There is some work that isn't technically related to this change (and which could be reverted if it causes problems), around the detection and handling of fragment shader outputs with `SV_Target` semantics. The basic changes (which could be backed out and then merged separately) are: * Made the special-case `SV_Target` logic only trigger for fragment shaders (that is the only place where `SV_Target` should appear, but we weren't guarding against it) * Made the logic to reserve a `u<N>` register for `SV_Target<N>` only trigger for D3D Shader Model 5.0 and below (since it is not required for SM 5.1 and up). This could be a breaking change for some users, but that seems unlikely. * Fixed one test case that relied on the behavior of reserving `u0` for `SV_Target0` even though it was a SM6.0 test. * Also added more comments to the system-value handling logic. The more interesting changes come up starting in `processEntryPointVaryingParameterDecl()`. The basic issue is that we have so far only supported implicit layout for varying parameters on GLSL/Vulkan, but the `[[vk::location(...)]]` attribute is a form of explicit layout annotation. Rather than try to kludge something that only works in narrow cases, I instead opted to try to fix things more generally. In `processEntryPointVaryingParameterDecl()` we now check for the `location` and `index` attributes when we are on "Khronos" targets (Vulkan/OpenGL/GLSL) and immediately add them to the variable layout being constructed if they are found. There is nothing in this logic specific to fragment-shader outputs, so this feature now applies to any varying input/output on Khronos targets. Allowing explicit layouts creates the potential for mixing implicit and explicit layout. For example, consider: ```hlsl struct Output { float4 color : COLOR; [[vk::location(0)]] float3 normal : NORMAL; } ``` What `location` should `color` get? Should this code be an error? There are two cases where this conundrum can come up: when working with `struct` types used for varying parameters, and the entry-point parameter list itself. For the varying `struct` case we currently make an expedient choice. We handle fields with both implicit or explicit layotu with appropriate logic, but logic that doesn't account for the case of mixing the two. Then at the end of layout for the `struct` we issue an error if there was a mix of implicit and explicit layout (such that our results aren't likely to be valid). For the entry point varying parameter case, things were already using a `ScopeLayoutBuilder` type (that encapsulates some logic shared between entry-point and global parameters). The entry-point-specific bits were moved out into a `SimpleScopeLayoutBuilder` and it was updated so that rather than assuming all parameters use implicit layout it does a two-phase layout approach similar to what we use for the global scope: * First all parameters are enumerated to collect explicit bindings and mark certain ranges as "used" * Next the parameters are enumerated again and those without explicit bindings get allocated space using a "first fit" algorithm In principle we could extend the two-phase approach to apply to `struct` types as well, but that would be best saved for a future refactoring of some of this parameter binding logic, since I would like to exploit more of the opportunities for sharing code across the uniform/varying and struct/entry-point/global cases. By moving the point where entry point parameters get their offsets assigned, it was necessary to move around some of the logic that removes varying parameter usage (and other things that shouldn't "leak" out of an entry point) to a different point in the entry point layout process. While adding these various pieces does not quite enable us to support explicit bindings on entry point parameters (e.g., putting `uniform Texture2D t : register(t0)` in an entry point parameter list) or in `struct` types (e.g., explicit `packoffset` annotations on fields), it starts to provide some of the infrastructure that we'd need in order to support those cases.
2020-02-08	Fixes to make all CPU compute shaders work on CUDA (#1211)	jsmall-nvidia
	* Launch CUDA test taking into account dispatch size. * Enable isCPUOnly hack to work on CUDA. * Rename 'isCPUOnly' hack to 'onlyCPULikeBinding'. * Add $T special type. Support SampleLevel on CUDA. * Fix typo.
2020-02-07	Code standard changes for Lexer (#1209)	jsmall-nvidia
	* Upper camel -> lowerCamel m_ prefix members where appropriate _ prefix module local functions * m_ prefix members in Lexer. Fit's standard because type has methods/ctor.
2020-02-07	HLSL Intrinsic coverage test improvements (#1206)	jsmall-nvidia
	* Fix CPP construct when matrix type. * Test intrinsics on float matrices. * Fix typo in _areNearlyEqual test. Increased default sensitivity. Added matrix-float test. * Matrix double test. Fixed some issues with CUDA. * Added reduced intrinsic version of matrix-double test. * Improve matrix double coverage. Test reflect/length etc on vector float. * * Added literal-float test. * Added vector double test * Improved coverage of vector/matrix tests * Disable Dx11 double-vector test because fails on CI. * Disable literal-float, because on CI fails.
2020-02-07	Change handling of strings for HLSL/GLSL targets (#1204)	Tim Foley
	* Change handling of strings for HLSL/GLSL targets This change switches our handling of string literals and `getStringHash` to something that is more streamlined at the cost of potentially being less general/flexible. * `String` is now allowed as a parameter type in user-defined functions * `getStringHash` is now allowed to apply to `String`-type values that aren't literals * The list of strings in an IR module is now generated during IR lowering as part of lowering a string literal expression, rather than being defined by recursively walking the IR of the module looking for `getStringHash` calls. The public API still refers to these as "hashed" strings, but they are realistically now "static strings." * When emitting code for HLSL/GLSL, the `String` type emits as `int`, and `getStringHash(x)` emits as `x`. In terms of implementation, the choice was whether to translate `String` over to `int` in an explicit IR pass, or to lump it into the emit pass. While adding the logic to emit clutters up an already complicated step, it is ultimately much easier to make the change there than to write a clean IR pass to eliminate all `String` use. Note that other targets that can handle a more full-featured `String` type are not addressed by this change and thus do not support `String` at all. It may be woth emitting `String` as `const char` on those targets, and emitting string literals directly, but the `getStringHash` function would need to be implemented in the "prelude" then, and we probably want to pick a well-known/-documented hash algorithm before we go that far. This change also brings along some some clean-ups to the `gpu-printing` example, since it can now take advantage of the new functionality of `String`. Fix up tests for new string handling * Add global string literal list to string-literal test (since we now list all static string literals and not just those passed to `getStringHash`) * Disable `getStringHash` test on CPU, since we don't have a working `String` on that platform right now (only HLSL/GLSL) Co-authored-by: Tim Foley <tim.foley.is@gmail.com>
2020-02-06	Float matrix intrinsic test/fixes (#1203)	jsmall-nvidia
	* Fix CPP construct when matrix type. * Test intrinsics on float matrices. * Fix typo in _areNearlyEqual test. Increased default sensitivity. Added matrix-float test.
2020-02-06	Literal handling improvements (#1202)	jsmall-nvidia
	* WIP: 64 literal diagnostic and truncation. * Improve how integer truncation is handled/supported. Added literal-int64.slang test. Set a suffix on all literals. Fixed problem on C++ based targets where l suffix was not the same as int() cast. So on C++ derived emitters, int() is used instead of l suffix to have same behavior across targets. * Add literal diagnostic testing. * Allow lexer to lex - in front of literals. * Fix lexing and converting int literal with -. * Too large small values of floats become inf. Handling writing inf types out on different targets. Add function to deterimine if a float literals kind. * Roll back the support of lexer lexing negative literals. * Fixed tests broken because of diagnostics numbers. Improved _isFinite * Fix compilation on linux. * Fix problem with abs on linux - use Math::Abs. * Fix typo. * * Improve warnings for float literals zeroed * Improved 64 bit type documentation * Handle half * Improved comments * Fixed tests broken * Use capital letters for suffixes. * Make default behavior on outputting a int literal that is an 'int32_t' is cast (not suffix) to avoid platform inconsistencies. Improve documentation for 64 bit types. Make tests cover material in docs. * Fixed tests. * Rename FloatKind::Normal -> Finite * Fix half zero check.
2020-02-06	Improve checks and diagnostics around redeclarations (#1201)	Tim Foley
	* Improve checks and diagnostics around redeclarations This change turns checking for redeclarations into a dedicated phase of semantic checking, and ensures that it applies to the main categories of declarations: functions, types, and variables. Note that "variables" here includes function parameters and `struct` fields in addition to the more obvious global and local variables. Some of the logic for checking redeclarations already existed for functions, and was refactored to deal with other cases of declarations. The checking for functions still needs to be special-cased because functions are much more flexible about the kinds of redeclarations that are allowed. In addition to improving the diagnosis of redeclaration itself, this change also changes the error message that is produced when referencing a symbol that is ambiguous due to begin redeclared. This is a small quality-of-life fix, and has the benefit of being much easier to implement than robust tracking of what variables have had redeclaration errors issued so that we can skip emitting an ambiguity error at the use site. A new test case was added to cover the redeclaration cases for variables (but not types or functions), and the test for function parameters was updated to account for the new more universal diagnostic message (since function parameters used to have special-case redeclaration checking). * fixup: missing file
2020-02-05	Improve behavior when undefined identifier is a contextual keyword (#1200)	Tim Foley
	The HLSL language has keywords with very common names like `triangle`, and Slang doesn't want to preclude users from using such names for their variables/functions/etc. In addition, Slang adds new keywords on top of HLSL (like `extension`) and we don't want those to prevent us from compiling existing code. As a result, almost all keywords in Slang are contextual keywords, and they can be shadowed by user varaibles. The down-side to making all keywords contextual is that in a case like this: ``` int test() { return triangle; } ``` The identifier `triangle` is not undefined as far as lookup (it is defined as a modifier keyword), so the existing "undefined identifier" logic gets bypassed, and instead we ran into an internal compiler error trying to construct an expression that refers to a modifier keyword. Fortunately, the internal compiler error in that case was overkill, and the compiler already had defensive logic to produce an expression with an error type if it couldn't figure out what the type of a declaration reference should be. The main fix here is thus to emit an "undefined identifier" error instead of an internal compiler error at the point where we see an attempt to reference a declaration that shouldn't be available in an expression context. In order to improve the quality of the diagnostic, the code for constructing declaration references was updated to pass along a source location to be used in error messages.
2020-02-05	Add support for RWBuffer writes on GLSL/SPIR-V target (#1199)	Tim Foley
	This appears to have been an oversight in the work that added support for `imageStore` as well as atomics when writing to `RWTexture*` and friends. The HLSL/Slang `RWBuffer` type maps to GLSL as an `imageBuffer`, which is effectively just another case of writable texture image (bonus points to anybody who can explain to me the meaningful distinction between an `imageBuffer` and an `image1D`). This change copies the handling of subscript access (`operator[]`) from textures over to buffers, and adds a test case to confirm that the new handling works for the simple case of setting a buffer element.
2020-02-04	CUDA/C++ backend improvements (#1198)	jsmall-nvidia
	* WIP with vector float test. * vector-float test working. * Fixed remaing tests broken with init changes. * Improve 64bit-type-support.md * Disable tests broken on CI system for Dx. * WIP: Make type available for comparison. * Moved type conversion into TypeTextUtil. * Add text/type conversions from DownstreamCompiler to TypeTextUtil. * Allow compaison taking into account type. * Removed quantize in vector-float.slang test.
2020-01-31	Some Slang API additions (#1195)	Tim Foley
	* Some Slang API additions These are additions to the public Slang API that came up while I was trying to write an example to demonstrate GPU printing. The additions aren't strictly necessary for the example, but I found these to be missing services when writing the code the way I wanted to. The main public changes are: * There is a new distinct `IEntryPoint` interface which inherit from `IComponentType` (much like `IModule` before). For now this doesn't expose any new functions on top of `IComponentType`, but I expect it to do so eventually. * It is now possible to get the `IModule` for a specific translation unit in a compile request with `spCompileRequest_getModule`. Even for a compile request that had only one translation unit this is not the same object as gets returned by `spCompileRequest_getProgram`. The latter should probably be called `_getLinkedProgram` because it returns a composite component type that links the module with everything it `import`s. An `IModule` can look up entry points declared in the module. Eventually a module should support looking up most of the declarations in the module (e.g., types) by name, but entry points are an obvious case. * A new `link()` operation is added to `IComponentType`. It is possible to have component types that have unsatisfied dependencies, such that trying to generate kernel code from them will fail. The `link()` operation tries to produce a new composite component type that combines a component with its dependencies, to enable code generation. The implementation of end-to-end compilation was using a function like this internally, but it hadn't been exposed to the API. Notes on the implementation: * The list of entry points declared in a given translation unit has moved from `TranslationUnitRequest` to the `Module` inside of it. * `EntryPoint` now has to do a song and dance much like `Module` to both inherit from `ComponentType` and support the `IEntryPoint` interface. * The `Session* m_session` member in `Linkage` (in terms of public API, this is the `slang::ISession` holding a pointer to the `slang::IGlobalSession`) has been changed to a `RefPtr`. Without this change an application can't just hold onto a `ComPtr<slang::ISession>`; they also need to retain the `IGlobalSession` or things will crash. The new behavior seems more correct, but I worry that it might introduce a leak. * The `asInternal` operation for `IComponentType` had to be updated to not just perform a cast. A type like `Module` has two `IComponentType` sub-objects, and only one of these is at the same address as the `ComponentType` base. * Similarly, the `Module::getInterface` logic was changed to fall back to `Super::getInterface` for all the cases other than `IID_IModule`, so that it would be guaranteed to return the `IComponentType` at the same address as `ComponentType` in response to a `queryInterface`. * Fixes for memory retain cycles As part of the earlier change, I made the `Linkage` type hold a `RefPtr` to the `Session`. The motivation there is it lets a user hang onto just a `slang::ISession` without having to also retain the `slang::IGlobalSession` for no immediately apparent reason. There are two problems that this surfaced, one pre-existing and one new. The new problem was that `Session` already held a `RefPtr<Linkage> m_builtinLinkage` for the linkage that holds the stdlib code. I solved that problem by splitting the parent pointer in `Linkage` into two pointers: a raw pointer that is used to actually locate the parent session, and a ref-counted one that can be used to optionally retain the parent session. The builtin linkage is then set up to explicitly not retain its parent, thus breaking the cycle. The second problem was a pre-existing one, where every `ComponentType` was holding a retained pointer to its parent `Linkage`, but in turn the `Linkage` was holding retained pointers to many `ComponentTypes` (and subclasses thereof). For this case I used the more expedient fix of making the parent pointer into a raw pointer, and figuring that it is a reasonable rule to expect user to retain the `Linkage` (aka `slang::ISession`) that owns a component type if they want to be able to use the component type. I might need/want to investigate a better fix for the latter issue, but for now this seems to clean up the issues I was seeing in the tests. Fingers crossed.
2020-01-31	Fix for a macro expansion bug (#1192)	Tim Foley
	* Fix for a macro expansion bug The work in #1177 added a notion of a macro being "busy" to prevent errors around recursively-defined macros: #define BAR(X) BAR(0) BAR(A) /* should yield BAR(0) and not infinite-loop / The fix was to place an `isBusy` flag on each macro, setting it to `true` when an expansion of that macro gets pushed onto the stack of input streams, and back to `false` when the expansion gets popped from the stack of input streams. That approach meant that we had to pre-expand macro arguments to avoid incorrectly treating a macro as busy for nested-but-not-recursive invocations: #define NEG(X) (-X) NEG(NEG(3)) // should expand to (-(-3)) and not (-NEG(3)) The subtle bug that arose with the `isBusy` flag is that the current preprocessor design doesn't always pop input streams when we might expect. For input like the following: BAR(A) BAR(B) The preprocessor can be in a state where the stack of input streams look like: // top input stream: BAR(X) => BAR(0) ----------------^ // bottom input stream BAR(A) BAR(B) ------^ That is, the first macro expansion is still "open," even though it is at the end of its input. When the preprocesor looks ahead and sees `BAR` as the next token and asks whether it should be expanded, the `isBusy` state on the BAR macro hasn't yet been unset, so expansion gets skipped. Aside: it is completely reasonable to ask at this point whether we should just "fix" the behavior of not popping input streams that are at their end. We should consider doing that, but the current code goes to some lengths to preserve the current behavior and I do not* recall why we had found it necessary. Any attempt to change that behavior should come along with lots of testing. This change instead adds a different approach for tracking the busy-ness of macros that doesn't rely on mutable state in the macro, and thus works even without pre-expansion of macro arguments. In the Slang preprocessor, macro names are always looked up in environments, where each environment maps names to macros, and has an option parent environment. There is a global environment used for most cases, but when expanding a function-like macro we would create an "argument environment" for it that maps the parameter names to their argument tokens. By expanding the environment type to include an (optional) field for a macro that is made "busy" by that environment, we can check if a macro is busy in a given environment by checking if the given macro has been associated with any of the environments on the chain of parent environments. A function-like macro expansion could then attach the macro to the "argument environment" and automatically make itself busy for the purposes of expansion of its body. To extend the same functionality to all macros, the "argument environment" was changes to an "expansion environment" that always gets used for a macro expansion and always has the identity of the macro attached. Because the arguments to a function-like macro have their own environment for expansion that bypasses the argument/expansion environment of the function-like macro, the macro will show as busy in its own body, but not in the expansion of its arguments. Thus the pre-expansion of macro arguments is not needed with this change. Aside: it might not be clear how this design avoids the problem of the unpopped input streams that aren't at their end. The answer is that the "current" environment for the preprocessor is already taken as the environment of the top-most input stream that is not at its end. This means that the new test for busy-ness benefits from the existing logic that was in place to deal with non popping input streams as soon as then end. (This is not to say that the current input-stream behavior isn't questionable) This change includes a new test case for the behavior that was broken, and expands the test case from #1177 to also test object-like macros. * Fix to handle the mutually-recursive case The revised "busy" logic was unable to handle a mutually-recursive macro like: #define ABC XYZ #define XYZ ABC This change restores the ability to handle mutually recursive cases, and makes sure that we test that case. The crux of the fix relies on splitting the environment and busy-macro tracking into more separate data structures. Rather than the list of environments directly trakcing busy-ness via the chain of parent environments, an environment now stores a separate linked list of macros that should be considered busy in that environment. Decoupling the two lists allows for an important change: when expanding an ordinary macro (either object-like or function-like, but not a function argument) the parent environment comes from the point where the macro was defined (this is required for lookup to make sense), but the parent list of busy macros comes from the current input stream at the point where expansion takes place. That change ensures that non-argument macros always properly "stack up" the list of busy macros.
2020-01-31	64 bit types doc (#1194)	jsmall-nvidia
	* * For integer literals add postfix, and use unsigned/signed output appropriately * Extend GLSL extension handling by type, and for adding 64 bit int extensions * Added tests for int/uint64 types * Add explicit Int/UInt64 emit functions to avoid ambiguity. * Fix uint64_t intrinsics on CUDA/C++. * WIP 64 bit types documentation. * Testing int64 intrinsic support. * Dx12 Dxil sm6.0 does actually support int64_t.
2020-01-30	Support for 64 bit integer types (#1191)	jsmall-nvidia
	* * For integer literals add postfix, and use unsigned/signed output appropriately * Extend GLSL extension handling by type, and for adding 64 bit int extensions * Added tests for int/uint64 types * Add explicit Int/UInt64 emit functions to avoid ambiguity.
2020-01-29	Feature/target intrinsic fold (#1190)	jsmall-nvidia
	* When checking if an instruction can be folded, take into account if it's called by a target intrinsic, because if it is we need to check if the parameter is accessed multiple times to see if it's worth allowing to fold. * Tidy up code around folding/target intrinsics. * Fix texture-load.slang . * Fix typo in assert.
2020-01-29	Feature/fix cuda function preamble (#1187)	jsmall-nvidia
	* Fix tests/compute/global-init.slang by handling some other cases where functions are emitted. * Fix comment.
2020-01-28	Fix layout for structured buffers of matrices (#1184)	Tim Foley
	When using row-major layout (via command-line or API option), the following sort of declaration: ```hlsl StructuredBuffer<float4x4> gBuffer; ... gBuffer[i] ... ``` Generates unexpected results when compiled to DXBC via fxc or DXIL via dxc, because the fxc/dxc compilers do not respect the matrix layout mode in this specific case (a structured buffer of matrices). Instead, they always use column-major layout, even if row-major was requested by the user. A user can work around this behavior by wrapping the matrix in a `struct`: ```hlsl struct Wrapper { float4x4 wrapped; } SturcturedBuffer<Wrapper> gBuffer; ... gBuffer[i].wrapped ... ``` This change simply automates that workaround when compiling for an HLSL-based downstream compiler, so that we get the same behavior across all our backends. The change adds a test case to confirm the behavior across multiple targets, but it turns out we also had a test checked in that confirmed the buggy (or at least surprising) fxc/dxc behavior, so that one had its baselines changed and can work as a regression test for this fix as well.
2020-01-28	Synthesizing CUDA tests (#1183)	jsmall-nvidia
	* When using setUniform clamp the amount of data written to the buffer size. * CUDA implement StructuredBuffer/ByteAddressBuffer as pointer/count as is on CPU. Allow bounds check to zero index. Update docs. * Synthesize tests. * Fix bug in CUDA output. * Fixing more tests to run on CUDA. * Added BaseType for layout of Vector and Matrix - as they are held as int32_t vector array types. * Enable unbound array support on CUDA. * Added unsized array support for CUDA documentation.
2020-01-27	CUDA implement StructuredBuffer/ByteAddressBuffer as pointer/count as is on ↵	jsmall-nvidia
	CPU. (#1182) Allow bounds check to zero index. Update docs.
2020-01-24	Fix for infinite recursion with macro invocation (#1177)	jsmall-nvidia
	* First pass fix of macro expansion logic to stop recursive application (causting a recursive loop), whilst also allowing application on parameters to a macro. * Added recursive-macro test. Fixed macro application example.
2020-01-24	Texture Sample available in CUDA (#1176)	jsmall-nvidia
	* WIP: Trying to figure out how texturing will work with CUDA. * WIP: Fixes for CUDA layout. Initial CUDA texture test. * WIP: Outputs something compilable by CUDA for TextureND.Sample * 2d texture working with CUDA. * Fix how binding for SamplerState occurs in CUDA. * Small tidy up of comments.
2020-01-23	Fix a bug in handling explicit register space bindings (#1175)	Tim Foley
	The logic for handling explicit `space`/`set` bindings on shader parameters for parameter blocks was not correctly marking the `space`/`set` that gets grabbed as used, and as a result it was possible for another parameter block that relies on implicit assignment to end up with a conflicting space. This change fixes the original oversight, and adds a test case to prevent against regression.
2020-01-22	Matrix indexing (#1172)	jsmall-nvidia
	* Added hlsl-intrinsic test folder. Enabled ceil as works across targets. * log10 support. * Fix float % on CPU/CUDA to match HLSL which is fmod (not fremainder). * Added log10 tests back to scalar-float.slang * Don't add the ( for $Sx - it's clearer what's going on without it. * Works on CUDA/CPU. Problem with asint/asuint do not seem to be found. * Only asuint exists for double. * Support countbits on CUDA and C++. * Fix typo in C++ population count. * First pass at int vector intrinsic tests. * Swizzle for int. * Bit cast tests on CUDA. * Fix warning on gcc. * Fix bit-cast-double execution on CUDA. * scalar-int test working on gcc release. * GetAt working on CUDA/C++ * Split out runtime index into it's own test. * Removed SetAt, as can use assignment with GetAt. * Allowing getAt to be used on matrices. * Don't need [] on matrix type any longer because use getAt. * Enable clamp on matrix-int. * Fix matrix-int.slang test - because clamp behavior varied if min and max were say inverted. Added runtime indexing version of matrix-int.
2020-01-22	WIP HLSL intrinsic coverage (#1171)	jsmall-nvidia
	* Added hlsl-intrinsic test folder. Enabled ceil as works across targets. * log10 support. * Fix float % on CPU/CUDA to match HLSL which is fmod (not fremainder). * Added log10 tests back to scalar-float.slang * Don't add the ( for $Sx - it's clearer what's going on without it. * Works on CUDA/CPU. Problem with asint/asuint do not seem to be found. * Only asuint exists for double. * Support countbits on CUDA and C++. * Fix typo in C++ population count. * First pass at int vector intrinsic tests. * Swizzle for int. * Bit cast tests on CUDA. * Fix warning on gcc. * Fix bit-cast-double execution on CUDA. * scalar-int test working on gcc release.
2020-01-21	HLSL intrinsic coverage (#1169)	jsmall-nvidia
	* Added hlsl-intrinsic test folder. Enabled ceil as works across targets. * log10 support. * Fix float % on CPU/CUDA to match HLSL which is fmod (not fremainder). * Added log10 tests back to scalar-float.slang * Don't add the ( for $Sx - it's clearer what's going on without it.
2020-01-21	CUDA support improvements (#1168)	jsmall-nvidia
	* Add test result for compile-to-cuda * Add RAII for some CUDA types to simplify usage. * First pass handling of some instrinsics on CUDA (for example transcendentals) * CUDA working with built in intrinsics. * Add missing CUDA prelude intrinsics. * CUDA matches CPU output on simple-cross-compile.slang * First pass at hlsl-scalar-float-intrinsic.slang test. * Fix smoothstep impl on CUDA and CPU. * Fixed step intrinsic on CUDA/CPU. * Added operator[] to Matrix for C++, to allow row access. Needs a fix for CUDA. * Fixed warning on clang build.
2020-01-17	Slang -> CUDA kernel runs correctly in test infrastructure (#1167)	jsmall-nvidia
	* First pass at BindLocation. * Added BindSet::init - for initializing with two input constant buffers. Needs better name, and perhaps should be another class. * Fix handling of constant buffer stripping. Improved initialization. * Trying to generalize BindLocation a little more. Split out CPULikeBindRoot. * More work to make BindLocation et al work with non uniform bindings. * Added parsing to a location. * WIP: Trying to get CPU working with BindLocation. * Describe problem of knowing the type of the reference point in the binding table. * More ideas on getBindings fix. * Remove BindSet as member of BindLocation. * Added BindLocation::Invalid * Made BindLocation able to be key in hash * Use BindLocation for bindings on BindingSet. * Added cuda and nvrtc categories to test infrastructure. Disabled CUDA synthetic tests by default. Fixed such that all tests now produce something in BindLocation style. * Use m_userIndex instead of m_userData on Resource. Move the binding setup out of cpu-compute-util (as no longer CPU specific) * Removed CPUBinding - used BindLocation/BindSet instead. Fixed some bugs around indexOf around uniform indirection. * Renamed BindSet::Resource -> BindSet::Value. * Document BindLocation. * Fixes for Clang/GCC Improve invariant requirement handling when constructing from BindPoints. * WIP: First attempt to run CUDA kernel. * Fix some issues around doing CUDA kernel launch. * Fix issues around use of cudaMemCpy . * Better cuda runtime error checking mechanism. * Fixed bug in passing parameters to cuda kernel launch. Simplified initialisation of context. * WIP: Fix CUDA runtime issues. * Add explicit CUDA synchronize so failures don't appear on implicit ones. * Fix problem emitting non shared variable on CUDA. * Fix some typos in CUDA layout. Use just a pointer for now for CUDA StucturedBuffer. * Arg order for CUDA launch was wrong. * First compute kernel runs on CUDA.