GSoC Week 5: What is full_type ? Oh, grammar don't hurt me, no more
18 Jul 2022
This blog post is related to my GSoC 2022 project.
In the previous week, I finished up Part I of my GSoC project and began preparing for Part II, where @Josh and I
redo the EDL compiler.
There were a few small nags remaining in the buffer functions, relating to incorrect calculations of truncated byte counts
and out-of-bounds access in a few functions. After fixing these, I wrote up documentation for most of the buffer functions,
except for game_load_buffer and game_save_buffer, as these are not done yet.
As the wiki is still not working, I had to resort to writing the documentation in the form of Doxygen comments
for the functions declared in buffer.h. If the wiki does get fixed, moving these docs to it will be quite trivial,
as it will just be a matter of copying them from the source and pasting them into the wiki.
As it stands currently, the Part I pull request is ready for reviews and merging, with tests written for pretty much all the functions. This ends Part I of my GSoC project.
After this, I started work on the EDL compiler again to finish up parsing declarations, which was much trickier than I
expected. I created a pull request to track my progress on the parsers for expressions, statements and declarations.
So far, expression parsing is mostly done; only the few expressions that depend on type parsers (like cast, sizeof and new
expressions) remain.
Statement parsing is also mostly done, with only the for-loop, switch-statement and block-statement remaining.
As the for-loop can be somewhat ambiguous to parse, I have left it to be looked at later.
I have not done switch statements mainly because I forgot about them, and block statements rely on the declaration parser,
which is not yet complete.
Unfortunately, much of my time this week was spent trying to understand how the declaration grammar works and how
JDI implements parsing it, as the EDL compiler will use its full_type class to represent parsed types. As a result,
I was not able to commit much useful code to my PR.
The problem comes mainly from how the type specifier / declarator grammar is structured. The sheer number of nested rules is confusing at first, and it is unclear how to parse them because many of those rules begin with the same tokens and only diverge halfway through. Simply put, declarations work like so:
- Any declaration consists of a sequence of type specifiers and then a sequence of comma-separated declarators, each of which may optionally be followed by an initializer.
- A type specifier can be a type name (something built in like int or user-defined like foo), a modifier (const, signed, short etc.), a template type (an identifier followed by angle brackets containing template arguments) or a nested name (something followed by ‘::’).
- A declarator effectively declares the syntax with which a variable is supposed to be used. For instance, a pointer is prefixed with * in its declarator because to use a pointer you have to dereference it using the * operator. There are two different kinds of declarators: abstract and… non-abstract? Abstract declarators do not name any variable, whereas non-abstract ones do. The non-abstract declarators are used in declarations, as they obviously have to declare a variable.
  This design comes from C, where Dennis Ritchie had the idea of declarations mirroring usage. Over time it has led to many generations of programmers scratching their heads about which * and which parentheses go where, and modern languages like Rust and Go have done away with this archaic system entirely. However, as EDL is supposed to be a gentle introduction to C++, we use this syntax.
- Initializers are really simple: either an equals sign (‘=’) followed by an expression, or a () / {} initializer.
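To make the "declarations mirror usage" idea concrete, here is a small sketch in plain C++ (not EDL). All of the names in it (value, p, pp, arr, fn, twice, use_like_declared) are made up for illustration. Each static_assert checks that writing the declarator as an expression gives back the base type named by the type specifier.

```cpp
#include <cassert>
#include <type_traits>

// All names below are hypothetical and exist only for illustration.
int twice(int n) { return 2 * n; }

int value = 42;
int *p = &value;         // declarator "*p":         the expression *p is an int
int **pp = &p;           // declarator "**pp":       the expression **pp is an int
int arr[3] = {1, 2, 3};  // declarator "arr[3]":     the expression arr[i] is an int
int (*fn)(int) = twice;  // declarator "(*fn)(int)": the expression (*fn)(n) is an int

// Dereferencing and indexing yield lvalues, hence "int &"; the call yields a plain int.
static_assert(std::is_same<decltype(*p), int &>::value, "*p is an int lvalue");
static_assert(std::is_same<decltype(**pp), int &>::value, "**pp is an int lvalue");
static_assert(std::is_same<decltype(arr[0]), int &>::value, "arr[0] is an int lvalue");
static_assert(std::is_same<decltype((*fn)(0)), int>::value, "(*fn)(0) is an int");

// Uses every variable exactly the way its declarator spells it.
int use_like_declared() {
    return *p + **pp + arr[1] + (*fn)(3);
}
```

Calling use_like_declared() adds up four ints, each obtained by repeating the shape of the corresponding declarator as an expression.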
A few examples of declarations:
- const std::vector<int> x { 1, 2, 3, 4, 5 };
  Type specifiers: const and std::vector<int> (nested name followed by template type)
  Declarators: x
  Initializer: { 1, 2, 3, 4, 5 }
- const unsigned int **y = &foo;
  Type specifiers: const, unsigned and int
  Declarators: **y
  Initializer: &foo
- float *x = nullptr, **const y = &bar, (Foo::*z)(int, int) = nullptr;
  Type specifiers: float
  Declarators: *x, **const y and (Foo::*z)(int, int) (pointer to member function)
  Initializers: nullptr, &bar and nullptr
Examples of abstract declarators are just declarators without the names: * in place of *x, **const in place of **const y
and (Foo::*)(int, int) in place of (Foo::*z)(int, int).
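As a rough illustration in plain C++ (not EDL), abstract declarators show up wherever a type is named without declaring a variable, such as template arguments and sizeof. The struct Foo, its add method and the constant pointer_size are assumptions made up for this sketch; the three variables follow the third example above.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical class so that the pointer-to-member example has something to point at.
struct Foo {
    float add(int a, int b) { return static_cast<float>(a + b); }
};

// Non-abstract declarators: each one names a variable.
float *x = nullptr;
float **const y = &x;
float (Foo::*z)(int, int) = &Foo::add;

// Abstract declarators: the same shapes with the names removed, used to spell a type.
static_assert(std::is_same<decltype(x), float *>::value, "x has type float *");
static_assert(std::is_same<decltype(y), float **const>::value, "y has type float **const");
static_assert(std::is_same<decltype(z), float (Foo::*)(int, int)>::value,
              "z has type float (Foo::*)(int, int)");

// sizeof also accepts a type specifier followed by an abstract declarator.
constexpr std::size_t pointer_size = sizeof(float **const);
```

A call through z looks like (f.*z)(2, 3) for some Foo f, which again mirrors the declarator (Foo::*z)(int, int).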
The really tricky part comes mainly from two places: the <declarator> rule vs. the <abstract-declarator> rule, and the
<qualified-id> rule vs. the <ptr-operator> rule. In the case of <declarator>, there is no way of knowing if we are parsing
a normal declarator or an abstract one until we reach the identifier being declared. This causes issues when parsing
function parameters, for example:
int foo(int **, float *const x);
Here the first parameter is an abstract declarator (consisting of **), and the second one is a non-abstract declarator
(consisting of *const x). However, this information is only available after parsing the declarator itself. The
solution then lies in somehow merging the two rules, handling both as an abstract declarator optionally followed by
a qualified or unqualified id. This is not really possible or worth doing in the grammar, but it might be possible in the
parser source itself.
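A minimal sketch of what such a merged rule could look like follows. It is not the EDL or JDI parser: the token representation (plain strings), the ParsedDeclarator struct and the helper names are all assumptions made for illustration, and it only handles pointer operators followed by an optional unqualified name (no parentheses, arrays, functions or qualified ids).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type; it does not model JDI's full_type.
struct ParsedDeclarator {
    std::vector<std::string> ptr_ops;  // e.g. {"*", "* const"}
    std::string name;                  // empty means the declarator was abstract
    bool is_abstract() const { return name.empty(); }
};

bool is_cv(const std::string &tok) { return tok == "const" || tok == "volatile"; }

bool is_identifier(const std::string &tok) {
    if (tok.empty() || is_cv(tok)) return false;
    unsigned char first = static_cast<unsigned char>(tok[0]);
    return std::isalpha(first) != 0 || first == '_';
}

// Parses tokens starting at pos as: ptr-operator* identifier?
// The same routine serves both rules; whether the declarator was abstract
// is only known once the optional identifier has been looked for.
ParsedDeclarator parse_declarator(const std::vector<std::string> &tokens, std::size_t &pos) {
    ParsedDeclarator result;
    while (pos < tokens.size() && tokens[pos] == "*") {
        std::string op = "*";
        ++pos;
        while (pos < tokens.size() && is_cv(tokens[pos])) {
            op += " " + tokens[pos];
            ++pos;
        }
        result.ptr_ops.push_back(op);
    }
    if (pos < tokens.size() && is_identifier(tokens[pos])) {
        result.name = tokens[pos];
        ++pos;
    }
    return result;
}
```

With the two parameters of foo above, the token list {"*", "*"} yields an abstract declarator with two pointer operators, while {"*", "const", "x"} yields one pointer operator "* const" and the name x.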
The story for <qualified-id> is similar, for example:
int (foo::bar) = 0;
// versus
int (foo::*bar) = nullptr;
In the first case, the declarator (foo::bar) comes under the qualified id rule (a nested name specifier followed
by an identifier), whereas in the second case, the declarator (foo::*bar) comes under the pointer operator rule
(a nested name specifier followed by *) followed by an identifier.
The problem here comes from the <ptr-operator> rule including an alternative <nested-name-specifier> * <cv-qualifier-seq>?,
which clashes with the <nested-name-specifier> <unqualified-id> rule of <qualified-id>.
To resolve this, it might be possible in both rules’ parsers to check whether a * follows the nested name specifier,
and use that information to decide which rule to apply.
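The lookahead idea could be sketched roughly as below. Again, this is not the actual EDL parser: the string tokens, the AfterScope enum, the ScopedPrefix struct and classify_after_scope are hypothetical names, and template arguments inside the nested name specifier are ignored.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical classification of what follows a nested name specifier.
enum class AfterScope { PtrOperator, QualifiedId, Error };

struct ScopedPrefix {
    std::vector<std::string> scopes;      // e.g. {"foo"} for "foo::"
    AfterScope kind = AfterScope::Error;
    std::string name;                     // set only for a qualified id
};

bool is_identifier(const std::string &tok) {
    if (tok.empty()) return false;
    unsigned char first = static_cast<unsigned char>(tok[0]);
    return std::isalpha(first) != 0 || first == '_';
}

// Consumes (identifier "::")+ and then looks at the next token:
// a "*" selects the ptr-operator alternative, an identifier selects qualified-id.
ScopedPrefix classify_after_scope(const std::vector<std::string> &tokens, std::size_t &pos) {
    ScopedPrefix result;
    while (pos + 1 < tokens.size() && is_identifier(tokens[pos]) && tokens[pos + 1] == "::") {
        result.scopes.push_back(tokens[pos]);
        pos += 2;
    }
    if (result.scopes.empty() || pos >= tokens.size()) return result;

    if (tokens[pos] == "*") {
        result.kind = AfterScope::PtrOperator;  // the declared name, if any, is left to the caller
        ++pos;
    } else if (is_identifier(tokens[pos])) {
        result.kind = AfterScope::QualifiedId;
        result.name = tokens[pos];
        ++pos;
    }
    return result;
}
```

For the two declarators above, {"foo", "::", "bar"} is classified as a qualified id named bar, and {"foo", "::", "*", "bar"} as a pointer operator, leaving bar for the declarator parser to pick up.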
Until this issue is resolved, I am spending time trying to figure out how to use full_type properly and seeing
how the various definition types are used within JDI. I expect my output to start dropping somewhat now, as
I head to college tomorrow and classes start the day after. Hopefully, by the end of this month, the parser issues are sorted
out and I can work on Part II of my GSoC project.