GSoC Week 5: What is full_type? Oh, grammar don't hurt me, no more

18 Jul 2022

This blog post is related to my GSoC 2022 project.

In the previous week, I finished up Part I of my GSoC project and began preparing for Part II, where @Josh and I redo the EDL compiler.

There were a few small nags remaining in the buffer functions, relating to incorrect calculations of truncated byte counts and out-of-bounds access in a few functions. After fixing these, I wrote up documentation for most of the buffer functions, except for game_load_buffer and game_save_buffer as these are not done yet.

As the wiki is still not working, I had to resort to writing the documentation in the form of doxygen comments for the functions declared in buffer.h. If the wiki does get fixed, moving these docs to it will be quite trivial, as it will just be a matter of copying them from the source and pasting them into the wiki.

As it stands currently, the Part I pull request is ready for reviews and merging, with tests written for pretty much all the functions. This ends Part I of my GSoC project.

After this, I started work on the EDL compiler again to finish up parsing declarations, which was much trickier than I expected. I created a pull request to track my progress on the parsers for expressions, statements and declarations. So far, expression parsing is mostly done; only the few expressions that depend on type parsers (like cast, sizeof and new expressions) remain. Statement parsing is also mostly done, with only the for-loop, switch statement and block statement remaining. As the for-loop can be somewhat ambiguous to parse, I have left it for later. I have not done switch statements mainly because I forgot about them, and block statements rely on the declaration parser, which is not yet complete.

Unfortunately, much of my time this week was spent trying to understand how the declaration grammar works, and how JDI implements parsing it, as the EDL compiler will use its full_type class to represent parsed types. Thus, I was not really able to commit a lot of useful code to my PR.

The problem comes mainly from how the type specifier / declarator grammar is structured. It has so many nested rules that it is immediately confusing, and it is hard to see how to parse it because many of those rules begin with the same tokens and only diverge once you are halfway through them. Simply put, declarations work like so:

Any declaration consists of a sequence of type specifiers and then a sequence of comma-separated declarators, each of which may optionally be followed by an initializer.
A type specifier can be a type name (something built in like int or user-defined like foo), a modifier (const, signed, short etc.), a template type (an identifier followed by angle brackets containing template arguments) or a nested name (something followed by ‘::’).
A declarator effectively declares the syntax with which a variable is supposed to be used. For instance, a pointer is prefixed with * in its declarator because to use a pointer you have to dereference it using the * operator. There are two different kinds of declarators: abstract and… non-abstract? Abstract declarators do not name any variable, whereas non-abstract ones do. The non-abstract declarators are used in declarations, as they obviously have to declare a variable.

Initializers are really simple: either an equals sign (‘=’) followed by an expression or a {} list, or a () / {} initializer on its own.

The declarator design comes from C, where Dennis Ritchie had the idea of declarations mirroring usage. Over time it has led to many generations of programmers scratching their heads about which * and which parentheses go where, and modern languages like Rust and Go have done away with this archaic system entirely. However, as EDL is supposed to be a gentle introduction to C++, we use this syntax.
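To make the "declarations mirror usage" idea concrete, here is a small sketch in plain C++ (not EDL). The names add_one and use_like_declared are made up for illustration. Each declarator is written the same way the variable is later used, and the static_asserts check that using it that way really does give back the int named by the type specifier.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper, only here so the function pointer has a target.
inline int add_one(int v) { return v + 1; }

inline int use_like_declared() {
    int value = 41;
    int *p = &value;            // declared as "*p is an int"
    int **pp = &p;              // declared as "**pp is an int"
    int (*fn)(int) = &add_one;  // declared as "(*fn)(int) is an int"
    int arr[3] = {1, 2, 3};     // declared as "arr[i] is an int"

    // Using each name exactly as it was declared yields an int
    // (an int lvalue, hence int&, for everything except the call).
    static_assert(std::is_same_v<decltype(*p), int &>);
    static_assert(std::is_same_v<decltype(**pp), int &>);
    static_assert(std::is_same_v<decltype((*fn)(0)), int>);
    static_assert(std::is_same_v<decltype(arr[0]), int &>);

    assert(*p == 41 && **pp == 41 && arr[2] == 3);
    return (*fn)(**pp);  // calls add_one(41)
}
```

Calling use_like_declared() goes through the double pointer and the function pointer using the same syntax that appears in their declarations.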

A few examples of declarations:

const std::vector<int> x { 1, 2, 3, 4, 5 };

Type specifiers: const and std::vector<int> (a nested name followed by a template type)

Declarators: x

Initializer: { 1, 2, 3, 4, 5 }

const unsigned int **y = &foo;

Type specifiers: const, unsigned and int

Declarators: **y

Initializer: &foo

float *x = nullptr, **const y = &bar, (Foo::*z)(int, int) = nullptr;

Type specifiers: float

Declarators: *x, **const y and (Foo::*z)(int, int) (the last one is a pointer to a member function)

Initializers: nullptr, &bar and nullptr

Examples of abstract declarators are just declarators without the names: * in place of *x, **const in place of **const y and (Foo::*)(int, int) in place of (Foo::*z)(int, int).
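As a rough check of the relationship between named and abstract declarators, the sketch below declares the three variables from the last example and compares each variable's type against the type spelled with the matching abstract declarator. Foo::average, bar and named_vs_abstract are hypothetical names. Unlike the example above, z is initialized to a real member function so that it can be called.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical class, only here so the member pointer has a target.
struct Foo {
    float average(int a, int b) { return (a + b) / 2.0f; }
};

inline float named_vs_abstract() {
    float value = 1.5f;
    float *bar = &value;

    // One type specifier (float), three named declarators.
    float *x = nullptr, **const y = &bar, (Foo::*z)(int, int) = &Foo::average;

    // Dropping the name from each declarator gives the abstract declarator,
    // which together with the specifier spells the variable's type.
    static_assert(std::is_same_v<decltype(x), float *>);
    static_assert(std::is_same_v<decltype(y), float **const>);
    static_assert(std::is_same_v<decltype(z), float (Foo::*)(int, int)>);

    assert(x == nullptr && **y == 1.5f);

    Foo foo;
    return (foo.*z)(2, 3);  // call through the pointer to member function
}
```

The abstract forms are what show up in casts, sizeof and unnamed function parameters, which is why those expression parsers depend on the type parser.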

The really tricky part comes mainly from two places: the <declarator> rule vs. the <abstract-declarator> rule, and the <qualified-id> rule vs. the <ptr-operator> rule. In the case of <declarator>, there is no way of knowing whether we are parsing a normal declarator or an abstract one until we reach the point where the declared identifier would appear. This causes issues when parsing function parameters, for example:

int foo(int **, float *const x);

Here the first parameter is an abstract declarator (consisting of **) and the second is a non-abstract declarator (consisting of *const x), but this is only known after parsing the declarator itself. The solution seems to lie in merging the two rules, treating every declarator as an abstract declarator optionally followed by a qualified or unqualified id. Doing that in the grammar is not really possible or worthwhile, but it might be possible in the parser source itself.
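A minimal sketch of that merged approach, assuming the input has already been split into string tokens. It only handles plain * pointer operators with cv-qualifiers and an unqualified name. ParsedDeclarator, parse_declarator and the helpers are hypothetical names; this is not how JDI or the EDL compiler actually do it, and full_type is not modelled here.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical result type for illustration only.
struct ParsedDeclarator {
    std::vector<std::string> ptr_operators;  // e.g. "*", "* const"
    std::optional<std::string> name;         // empty for an abstract declarator
};

inline bool is_cv_qualifier(const std::string &tok) {
    return tok == "const" || tok == "volatile";
}

inline bool is_identifier(const std::string &tok) {
    if (tok.empty() || is_cv_qualifier(tok)) return false;
    const unsigned char first = static_cast<unsigned char>(tok[0]);
    return std::isalpha(first) != 0 || tok[0] == '_';
}

// Parses the pointer operators first (the "abstract" part), then an optional
// identifier. Whether the declarator was abstract is decided only at the end.
inline ParsedDeclarator parse_declarator(const std::vector<std::string> &tokens,
                                         std::size_t &pos) {
    ParsedDeclarator result;
    while (pos < tokens.size() && tokens[pos] == "*") {
        std::string op = "*";
        ++pos;
        while (pos < tokens.size() && is_cv_qualifier(tokens[pos])) {
            op += " " + tokens[pos];
            ++pos;
        }
        result.ptr_operators.push_back(op);
    }
    if (pos < tokens.size() && is_identifier(tokens[pos])) {
        result.name = tokens[pos];
        ++pos;
    }
    return result;
}
```

For the two parameters of foo above, the token lists {"*", "*"} and {"*", "const", "x"} go through the same function. The first comes back with no name (abstract) and the second with the name x.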

The story for <qualified-id> is similar, for example:

int (foo::bar) = 0;
// versus
int (foo::*bar) = nullptr;

In the first case, the declarator (foo::bar) comes under the qualified id rule (a nested name specifier followed by an identifier), whereas in the second case, the declarator (foo::*bar) comes under the pointer operator rule (a nested name specifier followed by *) followed by an identifier.

The problem here comes from the <ptr-operator> rule including an alternative <nested-name-specifier> * <cv-qualifier-seq>?, which clashes with the <nested-name-specifier> <unqualified-id> rule of <qualified-id>. To resolve this, it might be possible in both rules’ parsers to see if a * follows the nested name specifier or not, and use that information to decide what rule to apply.
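The lookahead idea could be sketched roughly as follows, again over pre-split string tokens. It skips a nested name specifier made only of simple identifier :: pairs (no templates, no leading ::) and then checks whether a * follows. NestedNameKind and classify_nested_name are hypothetical names, not anything from JDI.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical classification, for illustration only.
enum class NestedNameKind { QualifiedId, PointerToMember, Neither };

inline bool is_name(const std::string &tok) {
    if (tok.empty()) return false;
    const unsigned char first = static_cast<unsigned char>(tok[0]);
    return std::isalpha(first) != 0 || tok[0] == '_';
}

// Looks past "name ::" pairs starting at pos (taken by value, so nothing is
// consumed) and reports which rule the tokens should be parsed with.
inline NestedNameKind classify_nested_name(const std::vector<std::string> &tokens,
                                           std::size_t pos) {
    bool saw_scope = false;
    while (pos + 1 < tokens.size() && is_name(tokens[pos]) &&
           tokens[pos + 1] == "::") {
        pos += 2;
        saw_scope = true;
    }
    if (!saw_scope || pos >= tokens.size()) return NestedNameKind::Neither;
    if (tokens[pos] == "*") return NestedNameKind::PointerToMember;  // foo::*bar
    if (is_name(tokens[pos])) return NestedNameKind::QualifiedId;    // foo::bar
    return NestedNameKind::Neither;
}
```

With this, {"foo", "::", "bar"} is classified as a qualified id and {"foo", "::", "*", "bar"} as a pointer to member, so each rule's parser can decide up front whether it applies.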

Until this issue is resolved, I am spending time trying to figure out how to use full_type properly and seeing how the various definition types are used within JDI. I expect my output to start dropping somewhat now, as I head to college tomorrow and classes start the day after. Hopefully, by the end of this month, the parser issues are sorted out and I can work on Part II of my GSoC project.

dc03
