GSoC Week 5: What is full_type? Oh, grammar don't hurt me, no more

This blog post is related to my GSoC 2022 project.

In the previous week, I finished up Part I of my GSoC project and began preparing for Part II, in which @Josh and I will redo the EDL compiler.

A few small issues remained in the buffer functions: incorrect calculations of truncated byte counts, and out-of-bounds accesses in a few functions. After fixing these, I wrote documentation for most of the buffer functions, except for game_load_buffer and game_save_buffer, as these are not done yet.

As the wiki is still not working, I had to resort to writing the documentation as Doxygen comments on the functions declared in buffer.h. If the wiki does get fixed, moving these docs over will be trivial: it is just a matter of copying them from the source and pasting them into the wiki.

As it stands, the Part I pull request is ready for review and merging, with tests written for nearly all of the functions. This ends Part I of my GSoC project.

After this, I went back to the EDL compiler to finish parsing declarations, which turned out to be much trickier than I expected. I created a pull request to track my progress on the parsers for expressions, statements and declarations. So far, expression parsing is mostly done; the only expressions remaining are those that depend on type parsers (such as cast, sizeof and new expressions). Statement parsing is also mostly done, with only for-loops, switch statements and block statements remaining. As the for-loop can be somewhat ambiguous to parse, I have left it to be looked at later. I have not done switch statements mainly because I forgot about them, and block statements rely on the declaration parser, which is not yet complete.

Unfortunately, much of my time this week was spent trying to understand how the declaration grammar works, and also how JDI implements parsing it, since the EDL compiler will use its full_type class to represent parsed types. As a result, I was not able to commit much useful code to my PR.

The problem comes mainly from how the type-specifier/declarator grammar is structured. There are so many nested rules that it is confusing right away, and it is hard to decide how to parse them, because many of those rules begin with the same tokens and only diverge once you have parsed them halfway. So, simply put, declarations work like so:

A few examples of declarations:
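As an illustrative sketch in standard C++17 (not code from the EDL compiler or JDI; the names Foo, x, p, r, arr, fp and pmf are made up), here is a single declaration in which one decl-specifier-seq, int, is shared by several declarators, each wrapping the base type differently. The static_assert lines check the type each declarator produces.

```cpp
#include <type_traits>

struct Foo {
    int add(int a, int b) { return a + b; }
};

int x = 42;

// One declaration: the decl-specifier-seq "int" is shared, and each
// declarator (pointer, reference, array, pointer to function, pointer to
// member function) builds a different type around it.
int *p = &x, &r = x, arr[3] = {1, 2, 3}, (*fp)(int) = nullptr,
    (Foo::*pmf)(int, int) = &Foo::add;

static_assert(std::is_same_v<decltype(p), int *>);
static_assert(std::is_same_v<decltype(r), int &>);
static_assert(std::is_same_v<decltype(arr), int[3]>);
static_assert(std::is_same_v<decltype(fp), int (*)(int)>);
static_assert(std::is_same_v<decltype(pmf), int (Foo::*)(int, int)>);
```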

Abstract declarators are just declarators without the names, for example: * in place of *x, **const in place of **const y and (Foo::*)(int, int) in place of (Foo::*z)(int, int).
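A small standard C++17 sketch of that correspondence, using the same three examples (Foo is an empty placeholder struct, and the aliases X, Y and Z are invented for illustration): a type-id such as the right-hand side of a using-alias contains the abstract declarator, and it names exactly the type that the named declarator gives its variable.

```cpp
#include <type_traits>

struct Foo {};

// Named declarators, as they would appear in an ordinary declaration.
int *x = nullptr;
int **const y = nullptr;
int (Foo::*z)(int, int) = nullptr;

// The same declarators with the name removed are abstract declarators,
// which is what a type-id (here, the right-hand side of a using-alias) contains.
using X = int *;                   // abstract declarator: *
using Y = int **const;             // abstract declarator: **const
using Z = int (Foo::*)(int, int);  // abstract declarator: (Foo::*)(int, int)

static_assert(std::is_same_v<decltype(x), X>);
static_assert(std::is_same_v<decltype(y), Y>);
static_assert(std::is_same_v<decltype(z), Z>);
```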

The really tricky part comes mainly from two places: the <declarator> rule vs. the <abstract-declarator> rule, and the <qualified-id> rule vs. the <ptr-operator> rule. In the case of <declarator>, there is no way of knowing whether we are parsing a normal declarator or an abstract one until we reach the point where the identifier being declared would appear. This causes issues when parsing function parameters, for example:

int foo(int **, float *const x);

Here the first parameter has an abstract declarator (consisting of **), and the second one has a non-abstract declarator (consisting of *const x), but we only find this out after parsing the declarator itself. The solution, then, lies in somehow merging the two rules, handling every such declarator as an abstract declarator optionally followed by a qualified or unqualified id. This is not really possible (or worth doing) in the grammar itself, but it might be possible in the parser source.
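One way the merged rule could look in a hand-written parser is sketched below. This is a hypothetical, heavily simplified C++ sketch, not the EDL compiler's or JDI's code: tokens are plain strings, and ParsedDeclarator, is_identifier and parse_param_declarator are invented names. It consumes pointer/reference operators first and only afterwards looks for an optional identifier, so the abstract/non-abstract decision is made at the end instead of up front.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: the pointer/reference operators seen, plus an
// optional name. An empty name means the declarator turned out to be abstract.
struct ParsedDeclarator {
    std::vector<std::string> ptr_ops;  // e.g. {"*", "* const"}
    std::string name;                  // empty => abstract declarator
    bool is_abstract() const { return name.empty(); }
};

// True for plain identifiers; cv-qualifier keywords are excluded.
bool is_identifier(const std::string &tok) {
    if (tok.empty() || tok == "const" || tok == "volatile") return false;
    if (!std::isalpha(static_cast<unsigned char>(tok[0])) && tok[0] != '_') return false;
    for (char c : tok) {
        if (!std::isalnum(static_cast<unsigned char>(c)) && c != '_') return false;
    }
    return true;
}

// Parses tokens[pos..] as "<abstract-declarator> <identifier>?", advancing pos.
// Only '*', '&', '&&', cv-qualifiers after '*' and a plain identifier are
// handled; parentheses, arrays, function suffixes and qualified ids are not.
ParsedDeclarator parse_param_declarator(const std::vector<std::string> &tokens,
                                        std::size_t &pos) {
    ParsedDeclarator result;
    while (pos < tokens.size() &&
           (tokens[pos] == "*" || tokens[pos] == "&" || tokens[pos] == "&&")) {
        std::string op = tokens[pos++];
        // cv-qualifiers bind to the '*' that was just consumed.
        while (op[0] == '*' && pos < tokens.size() &&
               (tokens[pos] == "const" || tokens[pos] == "volatile")) {
            op += ' ';
            op += tokens[pos++];
        }
        result.ptr_ops.push_back(op);
    }
    // Only at this point do we learn whether the declarator was abstract.
    if (pos < tokens.size() && is_identifier(tokens[pos])) {
        result.name = tokens[pos++];
    }
    return result;
}

// Convenience overload that parses from the start of the token list.
ParsedDeclarator parse_param_declarator(const std::vector<std::string> &tokens) {
    std::size_t pos = 0;
    return parse_param_declarator(tokens, pos);
}

// The two parameters of: int foo(int **, float *const x);
void example_usage() {
    ParsedDeclarator first = parse_param_declarator({"*", "*"});
    ParsedDeclarator second = parse_param_declarator({"*", "const", "x"});
    assert(first.is_abstract() && first.ptr_ops.size() == 2);
    assert(!second.is_abstract() && second.name == "x");
}
```

With this shape, the parameter parser never has to pick between the two rules in advance; the caller simply checks is_abstract() once the declarator has been parsed.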

The story for <qualified-id> is similar, for example:

int (foo::bar) = 0;
// versus
int (foo::*bar) = nullptr;

In the first case, the declarator (foo::bar) falls under the qualified-id rule (a nested name specifier followed by an identifier), whereas in the second case, the declarator (foo::*bar) falls under the pointer operator rule (a nested name specifier followed by *), followed by an identifier.
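To make the difference concrete, here is a small standard C++17 sketch in which both declarations above appear in the same file (the struct foo with a static member bar is an assumption made for this illustration). They compile to two unrelated entities: the first defines the static data member foo::bar, the second declares a new global variable bar of pointer-to-member type.

```cpp
#include <type_traits>

struct foo {
    static int bar;  // declared here, defined below through a qualified-id
};

// Declarator "(foo::bar)": a parenthesized qualified-id, so this line
// defines the static data member foo::bar.
int (foo::bar) = 0;

// Declarator "(foo::*bar)": the ptr-operator "foo::*" followed by the
// identifier bar, so this line declares a namespace-scope variable ::bar
// of type "pointer to int member of foo".
int (foo::*bar) = nullptr;

static_assert(std::is_same_v<decltype(foo::bar), int>);
static_assert(std::is_same_v<decltype(::bar), int foo::*>);
```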

The problem here comes from the <ptr-operator> rule including the alternative <nested-name-specifier> * <cv-qualifier-seq>?, which clashes with the <nested-name-specifier> <unqualified-id> alternative of <qualified-id>. To resolve this, it might be possible for both rules’ parsers to check whether a * follows the nested name specifier, and use that to decide which rule to apply.
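A hypothetical sketch of that lookahead check, again over plain string tokens and with invented names (NestedNameUse, classify_after_nested_name), assuming a simplified nested-name-specifier made only of identifier :: pairs (no leading ::, no templates):

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Which of the two clashing alternatives a nested-name-specifier starts.
enum class NestedNameUse { PtrOperator, QualifiedId, Unknown };

bool is_identifier(const std::string &tok) {
    if (tok.empty()) return false;
    if (!std::isalpha(static_cast<unsigned char>(tok[0])) && tok[0] != '_') return false;
    for (char c : tok) {
        if (!std::isalnum(static_cast<unsigned char>(c)) && c != '_') return false;
    }
    return true;
}

// Skips a simplified nested-name-specifier starting at pos, then peeks at the
// next token without consuming anything:
//   "*"        -> <nested-name-specifier> * <cv-qualifier-seq>?  (ptr-operator)
//   identifier -> <nested-name-specifier> <unqualified-id>      (qualified-id)
// Anything else (destructors, operator names, templates...) is Unknown here.
NestedNameUse classify_after_nested_name(const std::vector<std::string> &toks,
                                         std::size_t pos) {
    std::size_t i = pos;
    bool saw_scope = false;
    while (i + 1 < toks.size() && is_identifier(toks[i]) && toks[i + 1] == "::") {
        i += 2;
        saw_scope = true;
    }
    if (!saw_scope || i >= toks.size()) return NestedNameUse::Unknown;
    if (toks[i] == "*") return NestedNameUse::PtrOperator;
    if (is_identifier(toks[i])) return NestedNameUse::QualifiedId;
    return NestedNameUse::Unknown;
}

// The declarators of "int (foo::bar)" and "int (foo::*bar)", after the '('.
void example_usage() {
    assert(classify_after_nested_name({"foo", "::", "bar", ")"}, 0) ==
           NestedNameUse::QualifiedId);
    assert(classify_after_nested_name({"foo", "::", "*", "bar", ")"}, 0) ==
           NestedNameUse::PtrOperator);
}
```

Because the check only peeks at the token after the last ::, either rule's parser could call it before committing to an alternative, without having to backtrack.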

Until this issue is resolved, I am spending time figuring out how to use full_type properly and how the various definition types are used within JDI. I expect my output to drop somewhat from now on, as I head to college tomorrow and classes start the day after. Hopefully, by the end of this month, the parser issues will be sorted out and I can work on Part II of my GSoC project.