slang.git/source/core/slang-riff.h, branch master

Add a memory-mappable binary serialization format (#7222)

2025-05-30T17:00:38+00:00

The files `slang-fossil.{h,cpp}` define a new serialization format that is designed to support data being memory-mapped in and then traversed as-is. The `docs/design/serialization.md` document was updated with details on this new format. The `slang-serialize-fossil.{h,cpp}` files define implementations of the recently introduced `ISerializerImpl` interface for reading/writing this new binary format. The overall structure of these implementations is heavily based on the existing RIFF implementation from `slang-serialize-riff.{h,cpp}`. Switching the AST serialization over to use this format required almost no changes to `slang-serialize-ast.cpp`. The new format is more space-efficient than the RIFF-based format in memory (by factor of over 2x), but is actually *worse* than the RIFF-based format in terms of how it affects the size of `slang.dll`, because the new format is seemingly less amenable to LZ4 compression. A few pieces of utility code were added or moved as part of this work: * The `core/slang-internally-linked-list.*` implementation is just a type that was used as part of `core/slang-riff.*`, but that wasn't really RIFF-specific. * The `core/slang-blob-builder.*` files implement a low-level utility for building a binary format in memory out of "chunks". The overall structure of this type is based on the RIFF-specific builder implementation, but has been generalized so that it should apply to other kinds of binary serialization. * The `core/slang-relative-ptr.h` file implements a simple relative pointer type, which is currently only used by the `slang-fossil.h` format. If there are concerns about adopting the new format immediately for the AST, this change could be modified to introduce all the new code, but leave the AST serialization using the previous RIFF-based format.

Fix nullptr_t compiling failure (#7240)

2025-05-26T22:36:11+00:00

Fixes when building with Clang 14.0

Generalize serialization system used for AST (#7126)

2025-05-21T04:55:39+00:00

This change takes the new approach to serialization that was used for the AST and generalizes it in a few ways: * The new approach is no longer tangled up with the RIFF format. The serialization system supports multiple different implementations of the underlying format. The existing RIFF format is now supported as one back-end, but support for others will follow in subsequent changes. * The new approach is no longer deeply specialized to AST serialization. The old code had things like serialization for `List`s and `Dictionary`s, but it was embedded inside the `AST{Encoding|Decoding}Context`, and thus couldn't be leveraged for other serialization tasks. This change factors out a completely AST-independent `Serializer` implementation, with an `ASTSerializer` layered on top of it to provide the additional context needed. * There is less duplication of code between reading and writing of serialized data. The old code had both the `ASTEncodingContext` and `ASTDecodingContext`, with serialization logic for most types being implemented in both, but with the constraint that those implementations needed to be kept in sync to avoid serialization-related runtime failures. A key property of the revamped approach is that a single `serialize()` method for a type implements both the reading and writing directions of serialization.

Cleanups related to RIFF support (#7041)

2025-05-12T17:28:05+00:00

A new approach to AST serialization (#6854)

2025-04-22T20:26:57+00:00

* A new approach to AST serialization This change completely overhauls the way that AST nodes are being serialized, and the offline source-code generation steps that enable that serialization. In practice, this ends up being a complete overhaul of the way that *modules* are being serialized (not just the AST part), although things like the serialization format for the Slang IR and for source locations are not affected. The rest of this commit message is broken down in to sections, in an attempt to help guide anybody looking at the code in how to make sense of all the changes. The Old C++ Extractor --------------------- AST serialization used to be driven by information scraped using the `slang-cpp-extractor` tool, which did an ad hoc parse of the C++ declarations of the AST node types and then generated a set of "X macros" that could be for macro-based code generation within the rest of the compiler. While the existing approach was functional, it wasn't easy to understand or maintain, and it has been getting in the way of forward progress on other features we'd like to work on in the language and compiler. This change removes the `slang-cpp-extractor` tool entirely. Marking Up the AST Declarations ------------------------------- The most notable change that contributors to the compiler may notice is the large number of invocations of a macro `FIDDLE()` on the declarations of the AST node types. The basic idea is that only declarations (namespaces, types, fields) that are preceded by `FIDDLE()` are visible to the code generator tool. So if somebody is working with the AST and wondering why a new node type isn't working, or why a field they added isn't being serialized correctly, it is probably because they need to add `FIDDLE()` in front of it. Generating the Boilerplate Code ------------------------------- The file `slang-ast-boilerplate.cpp` provides a good example of how the information extracted from the marked-up AST declarations gets used. In that file, the `FIDDLE TEMPLATE` construct is used to generate type information for each of the AST node types. Similar logic is used in `slang-ast-forward-declarations.h` to generate the declaration of the `ASTNodeType` enumeration, and forward-declare all the AST node classes. For many parts of the code, simply including that file replaces the need for the old `slang-generated-*.h` files. Replacing Visitors and Related Logic ------------------------------------ The old visitor types for the AST used the macros that were generated by `slang-cpp-extractor`, so something new was needed to replace them. The same goes for the `SLANG_AST_NODE_VIRTUAL_CALL` macros. The core of the solution implemented here is in `slang-ast-dispatch.h`. Given a "dispatchable" AST node type (say, `Expr`), a call like: ``` ASTNodeDispatcher(expr, [&](auto e) { return doSomething(e); }) ``` is an expression of type `R`, which does the equivalent of something like: ``` switch(expr->getTag()) { case ASTNodeType::VarExpr: return doSomething(static_cast(expr)); // ... } ``` The `SLANG_AST_NODE_VIRTUAL_CALL` macro is now implemented in terms of `ASTNodeDispatcher`. The implementation of the visitor types is more involved. The code in this change retains some of the macro names from the original version, just to try and make the parallels more clear. The visitor types are all implemented on top of the `ASTNodeDispatcher` approach, and use `FIDDLE TEMPLATE` to generate all the boilerplate `visit*()` method declarations. Refactoring of `Linkage` Module Loading --------------------------------------- Needing to revisit all the places where modules get deserialized made it clear that there is a lot of complexity and apparent duplication in the core routines on the `Linkage` that get used for loading modules. This change tries to clean up some of that logic, but it is worth noting that there are two legacy features that get in the way of making things as clean as they should be: * The `LoadedModuleDictionary` type that gets passed around a lot exists entirely to handle the corner case where somebody uses the Slang API to perform a compilation with multiple `TranslationUnitRequest`s in the same `FrontEndCompileRequest`, and one of the translation units `import`s the module defined by another of the translation units. * There are a lot of special-case behaviors and routines entirely there to support the `ModuleLibrary` feature, although that feature should be considered deprecated (or at least subject to getting entirely re-designed down the line). The basic idea of the cleanup is that all of the (non-deprecated) ways load a module from a serialized binary, or compile one from source should now bottleneck through `loadModuleImpl`, which then bifurcates into `loadSourceModuleImpl` for the compilation case and `loadBinaryModuleImpl` for the deserialization case. High-Level Serialization Approach --------------------------------- The old serialization logic used the [RIFF](https://en.wikipedia.org/wiki/Resource_Interchange_File_Format) format to encode the high-level structure of things, and this change retains that usage (and actually doubles down on the RIFF usage). The old serialization system relied on the idea that for any given type `Foo` that wants to support serialization, there should be something like a `SerialFooData` type in C++, that can represent the state of a `Foo`, and then the actual serialization applied to that `SerialFooData`. This means that in most cases there are four pieces of code written: * During serialization: * Copying the data of a `Foo` in memory over to a `SerialFooData` in memory * Writing the state of a `SerialFooData` into the serialized data stream * During deserialization: * Reading the state of a `SerialFooData` from a serialized data stream * Copying the data of the `SerialFooData` in memory over to a `Foo` The new logic gets rid of the intermediate `SerialFooData`. In the serialization direction, we take a `Foo` and write it to the `RIFFContainer` directly, or using some other utilities layered on top of it. In the deserialization direction, we have additional flexibility. Given a `RIFFContainer::Chunk*` that represents a serialized `Foo`, we often navigate through the in-memory representation of the RIFF data to get to the parts of the serialized value that we actually want/need, without needing to deserialize the entire `Foo`. To support this kind of operation, this change introduces a few helper types like `ContainerChunkRef` an `ModuleChunkRef`, that are little more than typed wrappers around a `RIFFContainer::Chunk*`. The Module "Container" Part --------------------------- A serialized `Module` is encoded as a RIFF chunk, using logic in `slang-serialize-container.cpp` - both before and after this change. This change reorganizes a lot of the code in that file, to account for the way that eliminating the intermediate `SerialContainerData` type streamlines the overall task of writing out the parts of the module. In the deserialization logic... there isn't really much to do in `slang-serialize-container.cpp`. Most of the logic in `slang.cpp` and `slang-module-library.cpp` that pertains to deserializing modules uses the `ModuleChunkRef`-based approach, and simply extracts the pieces of the serialized module that it needs. The Actual Serialization of the AST ----------------------------------- The actual AST serialization logic is in `slang-serialize-ast.cpp`. The basic approach in both the writing and reading directions is: * Use the `FIDDLE TEMPLATE` system to generate a set of functions, one for each AST node type, that recursively invoke the read/write logic on each field of that node (after recursively invoking the case for its direct superclass) * Use the `ASTNodeDispatcher` system to dispatch out to those functions whene reading or writing anything derived from `NodeBase` * For now, handle all types *not* derived from `NodeBase` by hand. There's a lot of room for improvement around that last item: it should be just as easy to generate the serialization and deserialization logic for other types that don't inherit from `NodeBase`, but the current change tries to err on the side of making the logic as explicit and simplistic as possible, rather than trying to get too clever too soon. The actual serialization *format* used for the AST is almost comically simplistic: the code uses hierarchical RIFF chunks to emulate a JSON-like structure. This is a very wasteful representation (e.g., a `bool` or a null pointer each take up *8 bytes*), but the goal for now is to start with the simplest thing that could possibly work, and only add more cleverness once we are sure it won't get in the way of important future improvements (like lazy/on-demand deserialization or IR and AST, to improve compiler startup times). The files `slang-serialize.{h,cpp}` have been co-opted to define a new pair of types `Encoder` and `Decoder` that are used for a more-or-less stream-oriented way or reading or writing RIFF chunks for the JSON-like structure. Almost everything related to the actual AST serialization could do with a cleanup pass, and some time spent on picking good/better names for everything. Smaller Stuff ------------- * Cleaned up a lot of code that was using bare `ASTNodeType` or the extractor's `ReflectClassInfo` type to consistently use `SyntaxClass`. * Fixed an apparent bug in how the destination-driven code genarator was handling `TryExpr`s * Fixed an apparent bug in how the GLSL legalization pass was handling translation of certain `SV_*` semantics. * format code * fixup: template errors caught by non-VS compilers * format code * fixup: more template errors * fixup: more stuff VS didn't catch * fixup: it's amazing VS doesn't catch these... * fixup: yet more template stuff VS ignores * fixup: more VS template nonsense * fixup: unreachable return macro usage * fixup: more unreacable returns * fixup: unused parameter * fixup: strict aliasing * fixup: allow missing entry point list chunk * fixup: wasm build script * fixup: AST changes since this PR was created --------- Co-authored-by: slangbot <186143334+slangbot@users.noreply.github.com> Co-authored-by: Yong He

format

2024-10-29T06:49:26+00:00

* format * Minor test fixes * enable checking cpp format in ci

Improvements to repro diagnostics (#2039)

2021-12-03T14:46:08+00:00

* #include an absolute path didn't work - because paths were taken to always be relative. * Improvements to repro diagnostics. * Fix typo.

LZ4 compression support (#1654)

2021-01-11T20:24:11+00:00

* #include an absolute path didn't work - because paths were taken to always be relative. * Testing out use of lz4. * Added ICompressionSystem, and LZ4 implementation. * Add support for deflate compression. Simplify compression interface - to make more easily work across apis. * WIP on CompressedFileSystem. * ImplicitDirectoryCollector * SubStringIndexMap - > StringSliceIndexMap. * WIP save stdlib in different containers. * Support for different archive types for stdlib. * Fix project. * CompressedFileSystem -> ArchiveFileSystem. Added CompressionSystemType::None * Added ArchiveFileSystem * Fix problem RiffFileSystem load withoug compression system. * Test archive types. Improve diagnostic message. * Fix typo in testing file system archives. * Split out archive detection. * Fix gcc warning issue. * Fix warning. * RiffArchiveFileSystem -> RiffFileSystem Co-authored-by: Tim Foley

Serialization fixes based on review of #1547 (#1551)

2020-09-18T17:35:45+00:00

* Test if blob is returned. * Rename serialize files so can be grouped. * StringRepresentationCache -> SerialStringTable * Split out SerialStringTable from slang-serialize-ir * First pass at reorganizing serialization/containers. Remain some issues about debug info. * Fix bug in calculating sourceloc. * Improve calcFixSourceLoc * Make allocations for payload RiffContainer align to at least 8 bytes. This is important for read, if the payload can contain 8 byte aligned data. Note this has no effect on Riff file format alignment rules. * Improve comments around RiffContainer and alignment. * Remove SerialStringTable, can just use StringSlicePool instead. * Add flags to control what is output in SerialContainer. Turn off AST output for obfuscated code. Lazily create astClasses when doing write container serialization. * Typo fix for Clang/Linux. * Fixes that came out of review * TranslationUnit -> Module * TargetModule -> TargetComponent * PAYLOAD_MIN_ALIGNMENT -> kPayloadMinAlignment

Share debug information between AST and IR (#1547)

2020-09-17T20:47:57+00:00

* Test if blob is returned. * Rename serialize files so can be grouped. * StringRepresentationCache -> SerialStringTable * Split out SerialStringTable from slang-serialize-ir * First pass at reorganizing serialization/containers. Remain some issues about debug info. * Fix bug in calculating sourceloc. * Improve calcFixSourceLoc * Make allocations for payload RiffContainer align to at least 8 bytes. This is important for read, if the payload can contain 8 byte aligned data. Note this has no effect on Riff file format alignment rules. * Improve comments around RiffContainer and alignment. * Remove SerialStringTable, can just use StringSlicePool instead. * Typo fix for Clang/Linux. Co-authored-by: Tim Foley