slang.git - Making it easier to work with shaders

	Commit message (Collapse)	Author	Age
*	Perform some transformations on IR to legalize for GLSL (#200)	Tim Foley	2017-10-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When outputting GLSL from a Slang or HLSL entry point, we need to translate any parameters or results of an entry-point function into global declarations of `in` or `out` parameters, as needed by GLSL. This change adds these transformations at the IR level, so that they don't need to complicate the emit logic. More detailed changes: - Make `render0` test use IR. It passes out of the box. - Fix test runner to not always dump diffs on failures I accidentally initialized an option to `true` instead of `false` when working on debugging the Travis CI failures. - Special-case output for component-wise multiplication to handle GLSL `matrixCompMul()` - Handle GLSL vs. HLSL output for calls to `mul()` - Output proper `layout(std140)` on GLSL constant buffer declarations - Require appropriate GLSL extension when emitting explicit `layout(offset = ...)` on constant buffer members - TODO: Need to avoid requiring this extension in cases where the offsets are what would be computed anyway. Realistically, should probably be emitting code with explicit padding, etc. to guarantee layouts. - Add an IR-based pass to translate entry point functions by eliminating their input/output parameters and replacing them with global variables. - Demangle names when calling target intrinsics The lowering to the IR will turn a call like `sin(foo)` into a call to a function declaration with a mangled name like `_S3sin...`. This works fine when the user is calling their own functions, since the name mangling will apply to both the definition and use sites, but for builtin functions it obviously isn't what we want. This change makes it so that we demangle the name of an instrinsic function just enough so that we can extract the original simple name, and make a call using that. These changes do nor provide 100% of what we need when translating to GLSL, so the `cross-compile-entry-point` test still hasn't been flipped over to use the IR (even though that is the test case I've been using to develop these changes).
*	IR: overhaul IR design/implementation (#195)	Tim Foley	2017-10-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* IR: overhaul IR design/implementation Closes #192 Closes #188 This is a major overhaul of how the IR is implemented, with the primary goal of just using the AST-level type representation as the IR's type representation, rather than inventing an entire shadow set of types (as captured in issue #192). One consequence of this choice is that types in the IR are no longer explicit "instructions" and are not represented as ordinary operands (so a bunch of `+ 1` cases end up going away when enumerating ordinary operands). Along the way I also got rid of the embedded IDs in the IR (issue #188) because this wasn't too hard to deal with at the same time. Another related change was to split the `IRValue` and `IRInst` cases, so that there are values that are not also instructions. Non-instruction values are now used to represent literals, references to declarations, and would eventually be used for an `undef` value if we need one. IR functions, global variables, and basic blocks are all values (because they can appear as operands), but not instructions. The main benefit of this approach is that the top-level structure of a bytecode (BC) module is much simpler to understand and walk, and BC-level types are represented much more directly (such that we could conceivably use them for reflection soon). * fixup: 64-bit build fix * fixup: try to silence clang's pedantic dependent-type errors * fixup: bug in VM loading of constants
*	First attempt at a Linux build (#193)	Tim Foley	2017-09-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* First attempt at a Linux build - Fix up places where C++ idioms were written assuming lenient behavior of Microsoft's compiler - Add a few more alternatives for platform-specific behavior where Windows was the only platform accounted for. - Add a basic Makefile that can at least invoke our build, even if it isn't going good dependency tracking, etc. - Build `libslang.so` and `slangc` that depends on it, using a relative `RPATH` to make the binary portable (I hope) - Add an initial `.travis.yml` to see if we can trigger their build process. * Fixup: const bug in `List::Sort` I'm not clear why this gets picked up by the gcc and clang that Travis uses, but not the (newer) gcc I'm using on Ubuntu here, but I'm hoping it is just some missing `const` qualifiers. * Fixup: reorder specialization of "class info" Clang complains about things being specialized after being instantiated (implicilty), and I hope it is just the fact that I generate the class info for the roots of the hierarchy after the other cases. We'll see. * Fixup: add `platform.cpp` to unified/lumped build * Fixup: Windows uses `FreeLibrary` and not `UnloadLibrary` * Fixup: fix Windows project file to include new source file This obviously points to the fact that we are going to need to be generating these files sooner or later.
*	Initial work on a "VM" for Slang code (#189)	Tim Foley	2017-09-21
	At a high level, this commit adds two things: 1. A "bytecode" format for serializing Slang IR instructions and related structure (functions, "registers") 2. A virtual machine that can load and then execute code in that bytecode format. The reason for kicking off this work right now is that we need a way to run tests on Slang code generation that doesn't rely on having a GPU present (given that our CI runs on VM instances without GPUs), nor on textual comparison to the output of other compilers. With these features I've implemented a slapdash `slang-eval-test` test fixture that can run a (trivial) compute shader to very our compilation flow through to bytecode. Some key design constraints/challenges: - The bytecode format should be "position independent" so that a user can just load a blob of data and then inspect it without having to deserialize into another format, allocate memory, etc. Eventually the bytecode format might be a replacement for out current reflection API (we used to base reflection off a similar format, but the cost/benefit wasn't there at the time and we switched to just using the AST). - The VM should be able to execute bytecode functions without doing any per-operation translation, JIT, etc. (translation of more coarse-grained symbols is okay). For now the VM is just being used to run tests, but eventually I'd like it to be viable for: - Running Slang-based code in the context of the compiler itself. This starts with stuff like constant-folding in the front-end, but could expand to more general metaprogramming features. - Running Slang-based ocde within a runtime application (e.g., a game engine) that wants to be able to run things like "parameter shader" code, or even just evaluate compute-like code on CPU (e.g., when supporting particles on both CPU and GPU). - Finally, the bytecode format should ideally be able to round-trip back to the IR without unacceptable loss of information. This requirement and the previous one play off of each other, because things like a traditional SSA phi operation is ugly when you have to actually execute it. This doesn't matter right now when we don't have SSA yet, but it might be part of the decision-making here. The actual implementation is centralized in `bytecode.{h,cpp}` and `vm.{h.cpp}`. Big picture notes: - The space of opcodes is shared between IR and bytecode (BC), with the hope that this makes translation of operations between the two easy. - The actual bytecode instruction stream relies on a variable-length encoding for integer values, including opcodes and operand numbers, so that the common case is single-byte encoding. - In the long term I intend to have a rule that if you use a single-byte encoding for an opcode, then all operands are required to use single-byte encodings too. Operations that need multi-byte operands would then be forced to use a multi-byte encoding of the op, and would be sent down a slower path in the interpeter. - The "bytecode"'s outer structure is based on ordinary data structures linked with pointers, but they are "relative pointers" so the actual structure is position-independent. - There are two main kinds of operands: registers and "constants." An operand is a signed integer where non-negatie values indicate registers (with `index == operandVal`) and negative values indicate constants (with `index == ~operandVal`). - Registers are stored in the "stack frame" for a VM function call, and each has a fixed offset based on the size of the type and those that come before it. Conceptually, registers are allowed to overlap if they aren't live at the same time, and we manage this with a simple stack model: every register is supposed to identify the register that comes directly before it (this isn't implemented yet). - "Constants" are more realistically a representation of "captured" values, but they are currently also how constants come in. Basically we can use a compact range of indices in the bytecode for a function, and each of these indices indirectly refers to some value in the next outer scope. - The actual encoding of bytecode instructions right now is largely ad-hoc and very wasteful (we encode the type on everything, and we also encode everything as if it had varargs). - In some cases, an instruction needs to know the types of the values involved (e.g., because it needs to load an array element, which means copying a number of bytes based on the size). The way the VM works we have types attached to our registers, so we currently get sneaky and look at those types in some ops. Longer term is makes sense to encode the required type info directly in the BC. - There's a whole lot of hand-waving going on with how the actual top-level bytecode module gets loaded, because of the way we currently treat the top-level module as an instruction stream in the IR. This means that we try to represent the loaded module as a "stack frame" for a call to the module as a function, but that approach as serious problems, and isn't realistically what we want to do.