GSoC Week 10: Serialization and Deserialization
23 Aug 2022This blog post is related to my GSoC 2022 project.
In the previous week, I was able to get a little bit of work done on the serialization and deserialization process over
the weekend. I implemented functions to serialize and deserialize integral and floating types, and variant
.
There are two kinds of serialization: in-place and allocating. The in-place functions directly take a byte stream to write
to, and assume that there is enough space in the stream beginning from the provided pointer to store the serialized form
of the data. The allocating functions make a std::vector<std::byte>
and after resizing to the required
size, pass it to the in-place functions then return it.
The deserialization functions do not have to deal with in-place or allocating functions, they simply start reading the
byte stream from the provided pointer and return the result by value. All of the work is neatly encapsulated behind
functions and a set of macros (ENIGMA_INTERNAL_OBJECT_*
, which assume code with particular variable declarations).
There is some type magic going on behind the scenes to uniformly represent serialization for these three types of types.
At compile time, depending on the type of the provided argument, the correct routine is chosen using the if constexpr
construct and the std::is_same_v
type trait (to test equivalence of types). These routines then do type-specific
work.
For serialization of types whose size is known statically, there is a base type used which the given value is bitwise cast
to, i.e. a memcpy
from the value to an object of the base type. This base type is chosen such that bitwise
operations can be done on it, and they do not introduce any problems. For example, for signed integers, their unsigned form
is chosen, because trying to shift right with signed integers can cause sign extension, which is not the case with unsigned
integers. For floating point numbers, no bitwise operations are possible, so the equally-sized unsigned integer types are
used. For variant
, the type tag is written before the data, to know how to deserialize it. As it stands currently,
after each value is converted to their base type, the base type is just written LSB-first to the byte stream.
There are three helper functions provided, resize_buffer_for_*
, to automate the process of resizing the buffer
to the correct size before writing data to it. Together with the serialization routines, these form the basis for serialization
of any data using the macros given earlier.
These methods are used in the object heirarchy which represents the game entities in terms of using inheritance for “components”, the heirarchy being:
-
struct OBJ_base
(the type defined for the user by the compiler depending on the object name) -
struct object_locals
-
struct event_parent
-
struct object_collisions
-
struct object_transform
-
struct object_graphics
-
struct object_timelines
-
struct object_planar
-
struct object_basic
Most of these nine classes have data which needs to be serialized and the lower six classes (4 - 9) are quite trivial. This
is because these are statically defined within ENIGMAsystem/SHELL/Universal_System/Object_Tiers/
, whereas
the top three classes are dynamically generated for the game in IDE_EDIT_objectdeclarations.h
and
IDE_EDIT_evparent.h
. Of these, the top two are the hardest, because they require keeping track of user defined
variables and potentially undeclared variables which must be instantiated at runtime. object_locals
keeps track
of access to undeclared variables, and OBJ_base
stores the user-defined variables defined within the event.
I have begun work on generating the code for OBJ_base
, which currently uses the dectrip
system
to represent variable types, which makes analysis harder to do. This is because dectrip
represents the type
as three parts: the type specifier (referred to as type
, the part of the declarator which occcurs before the
variable name (referred to as prefix
), and the part of the declarator which appears after the variable name
(referred to as suffix
). This means that it does not store the structure of the declaration itself,
only the string representation of it. Thus, it is not as easy to tell when a value is a pointer or a reference, or neither.
While this system will likely get ported to the new enigma::parsing::FullType
, there is not enough time left
in the project to finish the conversion. Thus, I will likely finish the serialization and deserialization work using the
old system, and look at porting it over after GSoC ends. For now, the focus is on getting the generated code to work for
the basic types, and also implement the serialization and deserialization for the main var
, which is likely
used more than any other.
In this week, I will probably spend the next 2-3 days working on the code generation if I get an idea of the implementation, after which I will have my exams (from August 28 - September 3). Thus, I will only be available after this period, and at that point only 9 days will be remaining in the GSoC timeline for my project.