| author | jsmall-nvidia <jsmall@nvidia.com> | 2020-10-23 16:39:18 -0400 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2020-10-23 16:39:18 -0400 |
| commit | e702b704e15a3f0dcbcac6ae82b7cb3c10a4ced2 (patch) | |
| tree | 61d16e88b301ae021b301338851d3b6ccd274efb /source/slang | |
| parent | 051b20c218124e9ffc72ae31b95529b35aa9a43c (diff) | |
Serialization design doc first pass (#1587)
* #include an absolute path didn't work - because paths were taken to always be relative.
* WIP on serialization design doc.
* More docs on serialization design.
* Improve serialization documentation.
Remove unused function from IRSerialReader.
* Small fixes around naming. Remove long comment from slang-serialize.h - as covered in serialization.md
* Remove long comment in slang-serialize.h as covered in serialization.md
* More information about doing replacements on read for AST and problems surrounding.
* Typo fix.
* Spelling fixes.
Diffstat (limited to 'source/slang')
| -rw-r--r-- | source/slang/slang-serialize-ir.h | 3 |
| -rw-r--r-- | source/slang/slang-serialize-source-loc.h | 4 |
| -rw-r--r-- | source/slang/slang-serialize-types.h | 4 |
| -rw-r--r-- | source/slang/slang-serialize.h | 235 |
4 files changed, 7 insertions, 239 deletions
diff --git a/source/slang/slang-serialize-ir.h b/source/slang/slang-serialize-ir.h
index c3c3bcf19..038335a60 100644
--- a/source/slang/slang-serialize-ir.h
+++ b/source/slang/slang-serialize-ir.h
@@ -68,9 +68,6 @@ struct IRSerialReader
 {
     typedef IRSerialData Ser;
 
-    /// Read potentially multiple modules from a stream
-    static Result readStreamModules(Stream* stream, Session* session, SourceManager* manager, List<RefPtr<IRModule>>& outModules, List<FrontEndCompileRequest::ExtraEntryPointInfo>& outEntryPoints);
-
     /// Read a stream to fill in dataOut IRSerialData
     static Result readContainer(RiffContainer::ListChunk* module, SerialCompressionType containerCompressionType, IRSerialData* outData);
 
diff --git a/source/slang/slang-serialize-source-loc.h b/source/slang/slang-serialize-source-loc.h
index c8f06d6eb..5ebd264cc 100644
--- a/source/slang/slang-serialize-source-loc.h
+++ b/source/slang/slang-serialize-source-loc.h
@@ -141,7 +141,7 @@ public:
 class SerialSourceLocReader : public RefObject
 {
 public:
-    static const SerialExtraType kExtraType = SerialExtraType::DebugReader;
+    static const SerialExtraType kExtraType = SerialExtraType::SourceLocReader;
 
     Index findViewIndex(SerialSourceLocData::SourceLoc loc);
 
@@ -170,7 +170,7 @@ protected:
 class SerialSourceLocWriter : public RefObject
 {
 public:
-    static const SerialExtraType kExtraType = SerialExtraType::DebugWriter;
+    static const SerialExtraType kExtraType = SerialExtraType::SourceLocWriter;
 
     class Source : public RefObject
     {
diff --git a/source/slang/slang-serialize-types.h b/source/slang/slang-serialize-types.h
index 8df2f362f..9bb84e290 100644
--- a/source/slang/slang-serialize-types.h
+++ b/source/slang/slang-serialize-types.h
@@ -14,8 +14,8 @@ namespace Slang {
 // An enumeration of types that can be set
 enum class SerialExtraType
 {
-    DebugReader,
-    DebugWriter,
+    SourceLocReader,
+    SourceLocWriter,
     CountOf,
 };
 
diff --git a/source/slang/slang-serialize.h b/source/slang/slang-serialize.h
index 4c6a57b34..0e7fdd68a 100644
--- a/source/slang/slang-serialize.h
+++ b/source/slang/slang-serialize.h
@@ -19,238 +19,9 @@ namespace Slang
 class Linkage;
 
 /*
-General Serialization Overview
-==============================
+A discussion of the serialization system design can be found in
 
-The AST node types are generally types derived from the NodeBase. The C++ extractor is used to associate an ASTNodeType with
-every NodeBase type, such that casting is fast and simple and we have a simple integer to uniquely identify those types. The
-extractor also performs another task of associating with the type name all of the fields held in just that type. The definition
-of the fields is stored in an 'x macro' which is in the slang-ast-generated-macro.h file, for example
-
-```
-#define SLANG_FIELDS_ASTNode_DeclRefExpr(_x_, _param_)\
-    _x_(scope, (RefPtr<Scope>), _param_)\
-    _x_(declRef, (DeclRef<Decl>), _param_)\
-    _x_(name, (Name*), _param_)
-``
-
-For the type DeclRefExpr, this holds all of the fields held in just DeclRefExpr in this case `scope`, `declRef` and `name`.
-DeclRefExpr derives from Expr and this might hold other fields and so forth.
-
-The implementation makes a distinction between the 'native' types, the regular C++ in memory types and 'serial' types.
-Each serializable C++ type has an associated 'serial' type - with the distinction that it can be written out and (with perhaps some other data)
-read back in to recreate the C++ type. The serial type can be a C++ type, but is such it can be written and read from disk and still
-represent the same data.
-
-We need a mechanism to be able to do do a conversion between native and serial types. To make the association we use the template
-
-```
-template <typename T>
-struct SerialTypeInfo;
-```
-
-and specialize it for each native type. The specialization holds
-
-SerialType - The type that will be used to represent the native type
-NativeType - The native type
-SerialAlignment - A value that holds what kind of alignment the SerialType needs to be serializable (it may be different from SLANG_ALIGN_OF(SerialType)!)
-toSerial - A function that with the help of ASTSerialWriter convert the NativeType into the SerialType
-toNative - A function that with the help of ASTSerialReader convert the SerialType into the NativeType
-
-It is useful to have a structure that holds the type information, so it can be stored. That is achieved with
-
-```
-template <typename T>
-struct SerialGetType;
-```
-
-This template can be specialized for a specific native types - but all it holds is just a function getType, which returns a SerialType*,
-which just holds the information held in the SerialTypeInfo template, but additionally including the size of the SerialType.
-
-So we need to define a specialized SerialTypeInfo for each type that can be a field in a NodeBase/RefObject derived type. We don't need to define
-anything explicitly for the NodeBase derived types, as we will just generate the layout from the fields. How do we know the fields? We just
-used the macros generated from the C++ extractor.
-
-So first a few things to observe...
-
-1) Some types don't need any conversion to be serializable - int8_t, or float the bits can just be written out and read in (1)
-2) Some types need a conversion but it's very simple - for example an enum without explicit size, being written as an explicit size
-3) Some types can be written out but would not be directly readable or usable with different targets/processors, so need converting
-4) Some types require complex conversions that require programmer code - like Dictionary/List
-
-For types that need no conversion (1), we can just use the template SerialIdentityTypeInfo
-
-```
-template <>
-struct SerialTypeInfo<SomeType> : public SerialIdentityTypeInfo<SomeType> {};
-```
-
-This specialization means that SomeType can be written out and read in across targets/compilers without problems.
-
-For (2) we have another template that will do the conversion for us
-
-```
-template <typename NATIVE_T, typename SERIAL_T>
-struct SerialConvertTypeInfo;
-```
-
-That we can use as above, and specify the native and serial types.
-
-For (3) there are a few scenarios. For any field in a serial type we must store in the serialized type such that the representation
-will work across all processors/compilers. So one problematic type is `bool`. It's not specified how it's laid out in memory - and
-some compiles have stored it as a word. Most recently it's been stored as a byte. To make sure bool is ok for serialization therefore
-we store as a uint8_t.
-
-Another example would be double. It's 64 bits, but on some arches/compilers it's SLANG_ALIGN_OF is 4 and on others it's 8. On some
-arches a non aligned read will lead to a fault. To work around this problem therefore we have to ensure double has the alignment that
-will work across all targets - and that alignment is 8. In that specific case that issue is handled via SerialBasicTypeInfo, which
-makes the SerialAlignment the sizeof the type.
-
-For (4) there are a few things to say. First a type can always implement a custom version of how to do a conversion by specializing
-`SerialTypeInfo`. But there remains another nagging issue - types which allocate/use other memory that changes at runtime. Clearly
-we cannot define 'any size of memory' in a fixed SerialType defined in a specialization of SerialTypeInfo. The mechanism to work around
-this is to allow arbitrary arrays to be stored, that can be accessed via an SerialIndex. This will be discussed more once we discuss
-a little more about the file system, and SerialIndex.
-
-Serialization Format
-====================
-
-The serialization format used is 'stream-like' with each 'object' stored in order. Each object is given an index starting from 1.
-0 is used to be in effect nullptr. The stream looks like
-
-```
-SerialInfo::Entry (for index 1)
-Payload for type in entry
-
-SerialInfo::Entry (for index 2)
-Payload for type in entry
-
-...
-...
-
-That when writing we have an array that maps each index to a pointer to the associated header. We also have a map that maps native pointers
-to their indices. The Payload *is* the SerialType for thing saved. The payload directly follows the Entry data.
-
-Each object in this list can only be a few types of things
-
-* NodeBase derived type
-* RefObject derived type
-* String
-* Array
-
-The actual Entry followed by the payloads are allocated and stored when writing in a MemoryArena. When we want to write into a stream, we
-can just iterate over each entry in order and write it out.
-
-You may have spotted a problem here - that some Entry types can be stored without alignment (for example a string - which stores the length
-VarInt encoded followed by the characters). Others require an alignment - for example an NodeBase derived type that contains a int64_t will
-*require* 8 byte alignment. That as a feature of the serialization format we want to be able to just map the data into memory, and be able
-to access all the SerialType as is on the CPU. For that to work we *require* that the payload for each entry has the right alignment for
-the associated SerialType.
-
-To achieve this we store in the Entry it's alignment requirement *AND* the next entries alignment. With this when we read, as we as stepping
-through the entries we can find where the next Entry starts. Because the payload comes directly after the Entry - the Entrys size must be
-a modulo of the largest alignment the payload can have.
-
-For the code that does the conversion between native and serial types it uses either the SerialWriter or SerialReader. This provides
-the mechanism to turn a pointer into a serializable ASTSerialIndex and vice versa. There are some special functions for turning string like
-types to and forth.
-
-The final mechanism is that of 'Arrays'. An array allows reading or writing a chunk of data associated with a ASTSerialIndex. The chunk of
-data *must* hold data that is serializable. If the array holds pointers - then the serialized array must hold SerialIndices that
-represent those pointers. When reading back in they are converted back.
-
-Arrays are the escape hatch that allows for more complex types to serialize. Dictionaries for example are saved as a serial type that is
-two SerialIndices one to a keys array and one to a values array.
-
-Note that writing has two phases, serializing out into an SerialWriter, and then secondly writing out to a stream.
-
-Object/Reference Types
-======================
-
-When talking about Object/Reference types this means types that can be referenced natively as pointers. Currently that means NodeBase and
-some RefObject derived types.
-
-The SerialTypeInfo mechanism is generally for *fields* of object types. That for derived types we use the C++ extractors
-field list to work out the native fields offsets and types. With this we can then calculate the layout for NodeBase types such that they
-follow the requirements for serialization - such as alignment and so forth.
-
-This information is held in the SerialClasses, which for a given TypeKind/SubType gives a SerialClassInfo, that specifies fields for
-just that type.
-
-Reading
-=======
-
-Due to the care in writing reading is relatively simple. We can just take the contents of the file and put in memory, as long as in memory
-it has an alignment of at least MAX_ALIGNMENT. Then we can build up an entries table by stepping through the data and writing the pointer.
-
-The toNative functions take an SerialReader - this allows the implementation to ask for pointers and arrays from other parts of the serialized
-data. It also allows for types to be lazily reconstructed if necessary.
-
-Lazy reconstruction may be useful in the future to partially reconstruct a sub part of the serialized data. In the current implementation, lazy
-evaluation is used on Strings. The m_objects array holds all of the recreated native 'objects'. Since the objects can be derived from different
-base classes the associated Entry will describe what it really is.
-
-For the String type, we initially store the object pointer as null. If a string is requested from that index, we see if the object pointer is null,
-if it is we have to construct the StringRepresentation that will be used.
-
-An extra wrinkle is that we allow accessing of a serialized String as a Name or a string or a UnownedSubString. Fortunately a Name just holds a string,
-and a Name remains in scope as long as it's NamePool does which is passed in.
-
-Other Reading issues
-====================
-
-## SourceLoc
-
-SourceLoc present a problem. If we follow the simple mechanism described above, then we require two things
-
-1) That the SourceLoc information is blossomed before anything that defines a SourceLoc
-2) That the structure for accessing SourceLoc information is conveniently available.
-
-This was sidestepped previously because the SourceLoc information was held in a different structure, and a separate Riff section. It was deserialized
-before anything else took place.
-
-That *is* a strategy we could use here. That we could make the SourceLoc information generally serialized. On loading locate it in a Riff section
-deserialize it (perhaps with general serialization), then deserialize the rest using this structure.
-
-## IRModule
-
-In this case we may want to have IRModule serialized in someway unlike the generalized serialization (for example supporting compression). In other
-frameworks this aspect might be handing by 'read/writeReplacing'. Doing so would significantly complicate the simple reading mechanism - because instead
-of just constructing and referencing we would have to care about construction order. That this could perhaps be achieved by having any reference access
-be handled lazily. Note that SourceLoc would still require being handled specially because it requires construction before any SourceLoc is referenced,
-and SourceLocs *aren't* pointers.
-
-## Modified reading
-
-We could modify reading as follows.
-
-1) Don't construct anything at the start
-2) Find 'root's they must be created and deserialized first
-  . Any read/writeReplace is a root
-  . Any marked (like SourceLocData) is a root. (When deconstructed it also needs to add information to the Reader)
-  . The root of the objects (note we could just deserialize first to last if not already constructed)
-3) During deserialization pointer references and constructed on demand
-4) Extra code is needed to make sure there aren't cycles. Any object is either Pre/Created/Deserialized.
-
-For now we might want to just do this with Riff sections for simplicity
-
-Other Issues
-============
-
-A final issue is around the special extra types needed for serializing or deserializing. SourceLoc information (on reading and writing),
-but it could be other types in the future.
-
-We probably don't want to have them as specific types on the SerialReader/SerialWriter, as doing so requires exposing the types to this interface.
-What we really want is a mechanism for the Reader/Writer where it's possible to get a pointer based on some type. We want this to be fairly fast
-because every SourceLoc reference will have to do this lookup.
-
-We could use an enum, and just have an array of pointers on the reader and writer. How that pointer is interpreted is dependent on the Reader/Writer.
-This would be very fast, extendable without making types specific. On debug builds we could do a dynamic cast to make sure it is the expected type.
-
-Rich Information
-================
-
-Nothing is done here about versioning, patching, backward or forward compatibility.
+docs/design/serialization.md
 */
 
 // Predeclare
@@ -636,7 +407,7 @@ struct SerialField
     static SerialField make(const char* name, T* in);
 
     const char* name; ///< The name of the field
-    const SerialFieldType* type; ///< The type of the field
+    const SerialFieldType* type; ///< The type of the field
     uint32_t nativeOffset; ///< Offset to field from base of type
     uint32_t serialOffset; ///< Offset in serial type
 };
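
The comment removed from `slang-serialize.h` pairs each native type with a serial type through a `SerialTypeInfo` specialization. The following is a minimal, self-contained sketch of that idea, not the code in the Slang repository. The names `SerialTypeInfo` and `SerialIdentityTypeInfo` come from the removed comment. The member signatures (which leave out the reader/writer arguments the comment mentions), the `bool` specialization body, and the `alignUp` and `roundTrip` helpers are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified form of the pattern in the removed comment.
// The real toSerial/toNative take a writer/reader; that is omitted here.
template <typename T>
struct SerialTypeInfo;

// Case (1): the bits can be written out and read back unchanged.
template <typename T>
struct SerialIdentityTypeInfo
{
    typedef T NativeType;
    typedef T SerialType;
    // Use the size as the alignment so the layout matches across targets.
    enum { SerialAlignment = sizeof(T) };
    static void toSerial(const NativeType& src, SerialType& dst) { dst = src; }
    static void toNative(const SerialType& src, NativeType& dst) { dst = src; }
};

template <> struct SerialTypeInfo<int32_t> : SerialIdentityTypeInfo<int32_t> {};
template <> struct SerialTypeInfo<double> : SerialIdentityTypeInfo<double> {};

// Case (3): bool has no guaranteed layout, so it is stored as a uint8_t.
template <>
struct SerialTypeInfo<bool>
{
    typedef bool NativeType;
    typedef uint8_t SerialType;
    enum { SerialAlignment = sizeof(SerialType) };
    static void toSerial(const NativeType& src, SerialType& dst) { dst = src ? 1 : 0; }
    static void toNative(const SerialType& src, NativeType& dst) { dst = (src != 0); }
};

// Round an offset up to a power-of-two alignment (hypothetical helper).
inline size_t alignUp(size_t offset, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Convert native -> serial -> native (hypothetical helper).
template <typename T>
T roundTrip(const T& value)
{
    typename SerialTypeInfo<T>::SerialType serial;
    SerialTypeInfo<T>::toSerial(value, serial);
    T result;
    SerialTypeInfo<T>::toNative(serial, result);
    return result;
}
```

In this sketch `roundTrip(true)` passes through a `uint8_t` holding 1, and `alignUp(13, 8)` returns 16, the next offset at which an 8-byte-aligned payload could start.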
