slang.git/source/slang/slang-serialize-ir.cpp, branch master

Improve performance of AST deserialization (#7935)

2025-08-07T02:30:57+00:00

* Improve performance of AST deserialization The primary goal of these changes is to reduce the total time spent in the global session's `loadBuiltinModule()`, which gets called as part of global session creation to load the core module, and thus impacts every invocation of `slangc` and every user of the Slang compiler API. The majority of the time is spent simply deserializing the core module's AST and IR and, of those two, the AST takes significantly longer to load than the IR (in the ballpark of 5x the time). This change is focused on the serialization infrastructure but, given the performance situation described above, the focus is first and foremost on *deserialization* performance for the Slang *AST*, when using the *fossil* format. That focus shows through in the changes that have been implemented. Change serialization framework to use `template` instead of `virtual` ===================================================================== The recently-introduced serialization framework in `slang-serialize.h` was centered around a dynamically-dispatched `ISerializerImpl` interface. As a result, every single invocation of a `serialize(...)` call ultimately went through `virtual` function dispatch. While the overhead of the `virtual` calls themselves does not have a major impact on the total deserialization performance, those calls end up serving as a barrier to further optimization. This change changes operations that used to take a `Serializer const&` (which wraps an `ISerializerImpl*`), to instead declare a template parameter `` and take an `S const&`. The main consequence of the change is that `serialize()` functions for user-defined types will need to be template functions, and thus either be defined in headers (alongside the type that they serialize) or else in the specific source file that handles serialization (as is currently being done for the AST-related types in `slang-serialize-ast.cpp`). Note that if we later decide that we want the ability to perform serialization through a dynamically-dispatched interface (e.g., to easily toggle between different serialization back-ends), it will be easier to layer a dynamically-dispatched implementation on top of the statically-dispatched `template` version than the other way around. Generous use of `SLANG_FORCE_INLINE` ==================================== In order to unlock further optimizations, a bunch of operations were marked with `SLANG_FORCE_INLINE`. It is important to note that forcing inlining like this is a big hammer, and needs to be approached with at least a little caution. The simplest cases are: * trivial wrapper function that just delegate to another function * functions that only have a single call site (but exist to keep abstractions clean) Externalize Scope for `begin`/`end` Operations ============================================== The old `ISerializerImpl` interface had a bunch of paired begin/end operations that define the hierarchical structure of data being read. Most serializer implementations (whether for reading or writing) use these operations to help maintain some kind of internal stack for tracking state in the hierarchy. The overhead of maintaining such a stack with something like a `List` amortizes out over many operations, but even that overhead is unnecessary when the begin/end pairs are *already* mirroring the call stack of the code invoking serialization. This change modifies the `ScopedSerializerFoo` types so that they each provide a piece of stack-allocated storage to the serializer back-end's `beginFoo()` and `endFoo()` operations. Currently only the `Fossil::SerialReader` is making use of that facility, but the other implementations of readers and writers in the codebase could be adapted if we ever wanted to. Streamline `Fossil::SerialReader` ================================= The most significant performance gains came from changes to the `Fossil::SerialReader` type, aimed at minimizing the cycles spent in the core `_readValPtr()` routine. That function used to have a large-ish `switch` statement that implemented superficially very different reading logic depending on the outer container/object being read from. The new logic pushes more work back on the `begin` and `end` operations (which get invoked far less frequently than simple scalar/pointer values get read), so that they always set up the state of the reader with direct pointers to the data and layout for the next fossilized value to be read. The remaining work in `_readValPtr()` has been factored into a differnt subroutine - `_advanceCursor()` - that takes responsibility for advancing the data pointer, and updating the various other fields. The `_advanceCursor()` routine is still messier than is ideal, because it has to deal with the various different kinds of logic required for navigating to the next value. Various other conditionals inside the `SerialReader` implementation were streamlined, mostly by collapsing the `State::Type` enumeration down to only represent the cases that are truly semantically distinct. Evaluated: Streamline Layout Rules for Fossil ============================================= One potential approach that I implemented but then reverted (after finding it had little to no performance impact) was changing the fossil format to always write things with 4-byte alignment/granularity. That would mean values smaller than 4 bytes would get inflated to a full 4 bytes, and scalar values larger than 4 bytes get written with only 4-byte alignment (requiring unaligned loads to read them). I found that the only way to take advantage of the simplified layout rules to improve read performance would be to more-or-less eliminate the use of the layout information embedded in the fossil data, which would make it very difficult to validate that the data is correctly structured. Possible Future Work: Further Type Specialization ================================================= As it stands, the biggest overhead remaining on the critical path of `_readValPtr()` is the way the `_advanceCursor()` logic needs to take different approaches depending on the type of the surrounding context (advancing through elements of a container is very different than advancing through fields of a `struct`, for example). The interesting thing to note is that at the use site within a `serialize()` function, it is usually manifestly obvious which case something is in. If the code uses `SLANG_SCOPED_SERIALIZER_ARRAY` it is in a container, while if it uses `SLANG_SCOPED_SERIALIZER_STRUCT` it is in a struct. This means that the contextual information is staticaly available, but just isn't exposed in a way that lets the core reading logic take advantage of it. A logical extension of the work here would be to expand on the `Scope` idea added in this change such that most of the serialization operations (`handleInt32`, `handleString`, etc.) are actually dispatched through the scope, and then have each of the `SLANG_SCOPED_SERIALIZER_...` macros instantiate a *different* scope type (still dependent on the serializer). * fixup * format code * typo --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>

Perf improvements to IR serialization (#7751)

2025-07-17T06:59:33+00:00

* option to use riff as serialization backend * option to use riff as serialization backend * perf * shuffle code * perf improvements to deserialization * formatting * remove bit_cast * correct IR verification * neaten serialized format * fix peek module info * formatting * remove temporary profiling code * cleanup * fix wasm build * more explicit sizes * deserialize via fossil on 32 bit wasm * Make serialized modules Int size agnostic * reorder stable names to allow range based check for 64 bit constants * format * review comments * fix build * fix * c++17 compat slang-common.h

Stable names and backwards compat for serialized IR modules (#7644)

2025-07-09T06:41:19+00:00

* stable names * tests, options and ci for stable names * Add back compat design document * fix warnings * formatting * comment * neaten * regenerate command line reference * consolidate ci scripts * faster ci * remove libreadline * Move new function to end of interface --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>

Use fossil for IR serialization (#7619)

2025-07-08T02:36:52+00:00

* bottleneck ir module reading and writing * compute/simple working * more complex tests working * neaten * factor out SourceLoc serialization * document changes * Appease clang * Correct name serialization * remove unnecessary code * neaten * neaten

extend fiddle to allow custom lua splices in more places (#7559)

2025-07-01T19:03:41+00:00

* Add fkYAML submodule * Generate slang-ir-inst-defs.h from slang-ir-inst-defs.yaml * generate ir-inst-defs.h * neaten things * neaten inst def parser * add rapidyaml submodule * remove fkyaml * remove fkyaml submodule * remove use of ir-inst-defs.h * format and warnings * fix wasm build * tidy * remove rapidyaml * Extend fiddle to allow custom splices in more places * Use lua to describe ir insts * fix * neaten * neaten * neaten * spelling * neaten * comment comment out assert * merge

Remove some cruft/complexity from IR serialization (#7483)

2025-07-01T01:20:33+00:00

* Remove some cruft/complexity from IR serialization This is a very simple cleanup to unnecessary code paths and remove some flexibility that isn't actually needed, to hopefully simplify the task of more completely overhauling the approach to IR serialization in a later change. The concrete feature that gets removed here is a debug-only feature (which thus shouldn't be affecting any users of Slang) that was added long ago in the life of the compiler as we were working to truly separate the front- and back-ends. At the time there was a lot of code in the compiler back-end that still made use of AST-level data structures, and thus got in the way of our goal to support separate compilation and linking (such that final code generation can only depend on the IR, and not the AST). The option was used to cause the Slang IR to be serialized out and then read back in as part of compilation, to try and enforce that only the wanted constructs could pass through that bottleneck. The idea was only ever half implemented, however, because it made use of a secondary implementation path in IR serialization that supported serializing the "raw" source locations (which are heavily dependent on AST-level information, even down to the number of bytes in source files). This change removes the feature entirely, since it is no longer useful for its intended purpose, and its presence causes there to be entire second code path for source locations in IR serialization that would need to have test coverage if we wanted to be sure it kept working. In addition, our pre-existing infrastructure for module serialization had various options that have either stopped being useful, or were not really useful at the time they were introduced. For example: there are no places in the code today where we attempt to serialize out a module without including both the serialized AST and IR. If that was a feature that we ever supported, the relevant code got removed at some preceding point without breaking any of our tests or (seemingly) upsetting users. Similarly, the options being passed into writing of a serialized module included both a flag to control whether source locations should be serialized *and* a pointer to the `SourceManager` to use in that case... but it was only ever meaningful to set both, or neither. The option has been changed to just be the `SourceManager` pointer, and the name has been updated to reflect its very narrow intended use case. * format code * fixup * regenerate command line reference --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com> Co-authored-by: Yong He

Cleanups related to RIFF support (#7041)

2025-05-12T17:28:05+00:00

Remove support for ad hoc Slang IR compression (#6834)

2025-04-16T19:28:39+00:00

* Remove support for ad hoc Slang IR compression This change is part of a larger effort to clean up the approach to serialization in the Slang compiler. The overall goal is to simplify and streamline all of the serialization-related logic, so that we are left with code that is less "clever," and easier to understand for contributors to the codebase. Removing support for compression of serialized Slang IR has benefits that include: * Reduction in code complexity: consider things like the subtle way that the `FOURCC`s for compressed chunks were being computed from the uncompressed versions, and the mental overhead that goes into understanding that, for anybody who would dare to touch this code. * Reduction in testing burden: there have been, de facto, two very different code paths for serialization of the Slang IR, and it is not clear that the existing test corpus for Slang has sufficient coverage for both options. By having only a single code path, every test that performs any amount of IR serialization helps with test coverage of that one path. * Opportunity to explore alternatives. This is perhaps a reiteration of the first point, but once the code is stripped down to the simplest thing that could possibly work (I am not claiming it has reached that point yet), it becomes easier for contributors to understand, and it becomes more tractable for somebody to come along with an improved approach that performs better (in either compression ratio or performance) while still being maintainable. In my own local setup, I found that removing support for Slang IR compression led to the `slang-core-module-generated.h` file increasing in size from 46.1MB to 47.4MB. This increase in the `.h` file size for the core library binary only resulted in a release build of `slang.dll` increasing from 20.0MB to 20.2MB. Removing the ad hoc compression support has almost no impact on the size of actual binary Slang modules *so long* as the additional LZ4 compression step is being applied to them. * format code --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com>

Improve performance when compiling small shaders. (#6396)

2025-02-23T18:31:05+00:00

Improve performance when compiling small shaders. Avoid copying witness table entries that are not getting used during linking. Avoid copying auto-diff related decorations and derivative functions during linking, if the user modules doesn't use autodiff. Cache operator overload resolution results on global session, so each new Session doesn't need to repetitively run through overload resolution from scratch.

Move switch statement bodies to their own lines (#5493)

2024-11-05T17:47:26+00:00

* Move switch statement bodies to their own lines * format --------- Co-authored-by: Yong He