GSoC Week 3: Master Of Buffers
04 Jul 2022This blog post is related to my GSoC 2022 project.
This week, I worked on Part I of my GSoC project, i.e. the buffer functions. In this time I refactored most of the functions, and found out how to make them conform to how it works in GameMaker Studio (GMS).
Firstly, to make serialization and deserialization of data more consistent and easy to refactor, I
implemented two functions
to do the job: serialize_to_type()
and deserialize_from_type()
.
These two functions are pretty simple:
-
serialize_to_type()
: takes avariant
, the type it is storing, and serializes it as a vector of bytes. For integers, the serialization is done in big endian form, where the most significant byte of the integer comes first in the stream, and the least significant byte comes last.There is also an interesting issue here: as all numeric values in ENIGMA are stored as
double
s, large values of signed and unsigned integers have to be deserialized specially. Before commit #2e8303486, the value stored inrval.d
was converted to a signed integer, always. However, the problem lies in that when converting from a double to a signed integer, the top bit of the integer is reserved for the sign bit of the double. This causes an issue where unsigned values greater than the signed maximum value (which have the topmost bit set) get divided by half when converting to signed integers as that top bit is reserved for the sign bit.The fix is fairly straightforward: check if
rval.d
is greater than the signed maximum value, and if so, convert to unsigned instead. This does not cause any issues, asrval.d
will only be greater than the maximum value if an unsigned value is stored, whereas a signed integer would wrap around to a negative value and thenrval.d
would be less than the maximum value.Unlike GMS, both
buffer_string
andbuffer_text
are treated as being equivalent, where both consist of a stream of characters terminated by a null terminator. In GMS, onlybuffer_text
is null-terminated. -
deserialize_from_type()
: takes a pair ofstd::vector<std::byte>::iterator
s and the type being converted to, and returns avariant
containing the deserialized value. Checks if the pair of iterators exactly represent the type being deserialized.
After this, I did some refactoring, where I renamed the macros get_buffer
and get_bufferr
to
GET_BUFFER
and GET_BUFFER_R
respectively. These are the macros used to access buffers, and in
debug mode, they are responsible for checking if the buffer being accessed actually exists. Then, I changed the global
enigma::buffers
variable to be a AssetArray<BinaryBufferAsset>
instead of a
std::vector<BinaryBuffer*>
.
AssetArray
is effectively to ENIGMA what std::vector
is to the C++ standard library. It allows
making a container of assets, which require three methods to be implemented:
-
getAssetTypeName()
-
isDestroyed()
-
destroy()
These methods make it possible to delete elements from the sequence without having to use the erase-remove idiom, which
can be quite slow as it is O(n) in terms of time complexity. To implement these methods without changing the BinaryBuffer
class, I created the BinaryBufferAsset
class. This class simply wraps a BinaryBuffer
inside a
std::unique_ptr
, and the methods manipulate the smart pointer. The nullptr
value is considered
as the BinaryBuffer
within being “destroyed”.
I also gave types to three of the enums which are used for the bufer constants:
-
buffer_data_t
: this stores the various data types which the buffer can store, likebuffer_u8
,buffer_u16
,buffer_string
and so on. -
buffer_seek_t
: this stores the three ways in which a buffer’s can seek to a position, namelybuffer_seek_start
,buffer_seek_relative
andbuffer_seek_end
. -
buffer_type_t
: this stores the type of the buffer itself, i.e. how it behaves when it comes to accessing data and resizing itself. This enum consists ofbuffer_grow
,buffer_wrap
,buffer_fixed
andbuffer_fast
. The constantbuffer_vbuffer
is currently not stored in this enum.
Previously, all these values were just passed around as standard integers. The problem with this was that you could pass the wrong type of value to a function and there would be no error as the enum value would be implicitly cast to an int. In fact, it was possible to not even realize that there was an error, as the erroneous value could have had the same integer value as the correct type of value.
After that, I added the ability to have sub-directories of tests within the SimpleTests/
directory. Previously,
the test harness would only check for directories ending with .sog
in the top level directory. However, as I
have decided to make the tests for each buffer function its own separate test, having it all in the main tests directory would
lead to more crowding and would make it harder to navigate through the tests. Therefore, when the test harness finds a directory
ending with the .multi
extension, it considers all the directories within that directory as tests too. This means
that I can group all the buffer tests, each of which begin with the “buffer_” prefix into the buffers.multi
sub-directory.
Finally, after having set all this up, I began work on the buffer functions themselves. The problems here did not come from the implementation being complex (in fact the functions are quite trivial in terms of functionality), they came from wanting to conform with how GMS works so that transitioning from it to ENIGMA would be easier.
So, a quick changelog of each function:
-
buffer_exists()
: no noteworthy changes -
buffer_create()
: refactor into a call tomake_new_buffer
, which deals withAssetArray
-
buffer_delete()
: useAssetArray
’s “destroy()” method -
buffer_read()
: align to buffer’s alignment before reading -
buffer_write()
: align to buffer’s alignment before writing -
buffer_fill()
: make it not completely broken, align to buffer’s alignment before writing -
buffer_seek()
: seek toGetSize() - 1 + offset
when usingbuffer_seek_end
-
buffer_tell()
: no noteworthy changes -
buffer_peek()
: usedeserialize_from_type
, read integral types asdouble
instead oflong
-
buffer_poke()
: add flag to control automatic resizing ofbuffer_grow
, refactor into a call towrite_to_buffer()
-
buffer_load()
: make it not completely broken, switch to a call tostd::transform
instead ofstd::ifstream::read
-
buffer_load_ext()
: switch to a call tostd::transform
instead ofstd::ifstream::read
, refactor into a call towrite_to_buffer()
-
buffer_get_type()
: no noteworthy changes -
buffer_get_alignment()
: no noteworthy changes -
buffer_get_address()
: no noteworthy changes -
buffer_get_size()
: no noteworthy changes -
buffer_resize()
: no noteworthy changes -
buffer_sizeof()
: no noteworthy changes
Out of all of these, buffer_fill()
has caused the most pain till now. While not really difficult to implement,
getting it to conform (mostly) with GMS’s implementation was not fun. In fact, I have had 6 commits till now, updating it
whenever I find a new edge case and test it in GMS to make sure my implementation conforms properly.
I think the reason that I have been having these issues is mainly because of buffer alignment and how I think about it versus how GMS thinks about it. Initially, I implemented buffer alignment by adding padding bytes after the written data, so that the next element being written was always aligned.
What is interesting is that GMS does the opposite: it writes
padding bytes until the data being written would be aligned, and only then does it write the data itself. This means that
instead of writing padding for the next element, GMS writes padding for the current element. Logically, it makes more sense
as it uses less space, however it was confusing when I first encountered it when seeing buffer_fill()
not write
zeroes after it wrote data to a buffer.
I have tried to validate the way these functions work as much as I can, by running GMS inside a Windows VM to test how the
reference implementation works, and also by writing as many tests as I can under the buffers.multi
group. Hopefully,
these tests cover enough edge cases so that the buffer functions work properly in normal use. The functions currently remaining
for refactoring are:
-
buffer_save()
-
buffer_save_ext()
-
buffer_copy()
-
buffer_get_surface()
-
buffer_set_surface()
The functions currently not implemented at all are:
-
buffer_create_from_vertex_buffer()
-
buffer_create_from_vertex_buffer_ext()
-
buffer_save_async()
-
buffer_load_async()
-
buffer_load_partial()
-
buffer_compress()
-
buffer_decompress()
-
buffer_async_group_begin()
-
buffer_async_group_option()
-
buffer_async_group_end()
-
buffer_copy_from_vertex_buffer()
-
buffer_md5()
-
buffer_sha1()
-
buffer_crc32()
-
buffer_base64_encode()
-
buffer_base64_decode()
-
buffer_base64_decode_ext()
-
buffer_set_used_size()
Of these, the ones I will definitely implement are:
-
buffer_compress()
-
buffer_decompress()
-
buffer_md5()
-
buffer_sha1()
-
buffer_crc32()
-
buffer_base64_encode()
-
buffer_base64_decode()
-
buffer_base64_decode_ext()
I am also interested in buffer_load_partial()
and buffer_set_used_size()
. Hopefully, I can finish
these functions this week, and get back to working on the EDL parser, so that I can have as much time as possible to work on
Part II of my GSoC project before college begins.
Incompatibilities with GMS:
-
buffer_text
: not null-terminated, considered equivalent tobuffer_string
-
buffer_grow
: GMS resizes by multiplying previous size by 2, ENIGMA resizes by adding +1 to previous size -
buffer_get_size()
: GMS does not store any extra bytes, ENIGMA always has an extra byte at the end where the next byte will be written because of howBinaryBuffer::Seek()
works -
buffer_poke()
: resizesbuffer_grow
whenresize
flag passed -
buffer_save()
: saves the whole buffer, instead of GMS which seems to only save the buffer up to the point where the last byte was written