GSoC Week 7: 'Temporary Initialization' more like 'Permanent Pain'
02 Aug 2022This blog post is related to my GSoC 2022 project.
In the previous week, I worked on getting a new system for storing types into place and then using that system to implement temporary object initialization.
The class I created for this purpose is enigma::parsing::FullType
, which stores all the data necessary to
represent a C++-style declaration, including the type specifier, the declarator, and any flags attached to the declaration.
This class is mostly identical to the previous jdi::full_type
, except that the way declarators are stored has
been redone.
jdi::full_type
uses the jdi::ref_stack
class, which I found lacking for my purposes
involving const
/volatile
pointers and nested declarators (i.e. *(*x)
). In the case
of the latter, the way ref_stack
works is that the components (i.e. the *
s, &
s etc.)
making up the nested declarator (the **
in *(**x)[10]
) are pushed onto the stack after the outer
components are parsed. The design has been done this way so that dereferencing the stack involves just popping off the top
component from the stack. Therefore, if we consider *(**x)[10]
, the stack will look like the following:
Location | Type | Meaning | Component |
---|---|---|---|
TOP -> | RT_POINTERTO | pointer to object | `*` |
RT_POINTERTO | pointer to object | `*` | |
RT_ARRAYBOUND | array bound, size=10 | `*` | |
BOTTOM -> | RT_POINTERTO | pointer to object | `*` |
While this system makes processing and generating the declarator easier, it does not preserve the previous structure
of the declarator so there is no easy way to convert a declarator to an expression, which comes into play later on. In the
new system that I designed, declarators instead of being a stack like before are instead an array consisting of “nodes”
i.e. components. These components are stored under one main type (DeclaratorNode
), and consist of the following
types:
-
PointerNode
(representing*
(pointer to object) and::*
(pointer to member)) -
ReferenceNode
(representing&
(l-value reference) and&&
(r-value or forwarding reference)) -
ArrayBoundNode
(representing[x]
(array bound size) wherex
is an arbitrary expression) -
FunctionParameterNode
(representing(int y, ...)
(function parameters along with default values)) -
NestedNode
(representing the*x
in*(*x)
)
In this system, instead of storing nested declarators after the outer components, they are
stored before them under the special NestedNode
class which stores a declarator inside itself. This also means
that a NestedNode
can store a NestedNode
within itself, which means that arbitrarily nested declarators
can be represented this way. Therefore, the structure of the declarator is preserved. For example, the previous *(**x)[10]
will look like this:
Index | Type | Meaning | Component |
---|---|---|---|
0 | Kind::POINTER_TO | pointer to object | `*` |
1 | Kind::NESTED (Kind::POINTER_TO -> Kind::POINTER_TO) | nested node (containing: pointer to object, pointer to object) | `*` |
2 | Kind::ARRAY_BOUND | array bound, size=10 | `*` |
While this makes processing the declarator harder, it preserves the structure of it exactly so that reproducing it later
on is easier. If the jdi::ref_stack
semantics are required, there is a to_jdi_refstack()
method
defined so that conversion between the two is easier. Another reason this is required is to make storing default values of
function parameters easier: the jdi::ref_stack
class requires a jdi::AST
to be stored for default
values whereas EDL uses enigma::parsing::AST
, so conversion functions would be required to convert the EDL
AST into a JDI AST.
However, the main reason for making this class, as noted above, was converting declarators to expressions. This has to be
done as the parser I am writing tries to parse a declarator first, before bailing out when encountering an expression. For
example, when parsing int(*(*x + 4))
, upto the +
operator, the input is treated as a declarator
consisting of *(*x)
. On encountering the +
, it bails and tries to convert the previously parsed
declarator to an expresion (in this case *x
) then pass it as an argument to TryParseExpression
which takes care of the rest. It then has to somehow signal to the outer (caller) declarator parser that this inner declarator
is actually an expression, which requires storing expressions within the DeclaratorNode
class created to store
the declarator components.
Note, though, that this only requires changing two inner classes and not DeclaratorNode
as a whole, these two
being NestedNode
and FunctionParameterNode
, as these are the only two places in the declarator
where expressions can occur. This is because the places where expressions can occur are:
-
Before the declarator itself, like in
int(++x)
-
Within a parenthesised nested declarator, like in
int(*x + 4)
-
After a parenthesised nested declarator, like in
int((*x) + 5)
-
After a declarator, like in
int(foo::bar + 5)
-
Within function arguments, like in
foo(int(*x + 5))
Therefore, the parser needs special cases to handle all of these possibilities. The first four cases are handled within
TryParseNoPtrDeclarator
and the last case is handled within TryParseParametersAndQualifiers
. There
is a little more code to handle what should happen if an expression is found when the inner declarator is not an expression
however that is all the code needed to handle functional casts.
The major downfall of this system is the inflicted complexity: it makes the code path for the <noptr-declarator>
rule more complex and harder to read because the declarator and expression parsing code is combined into one. Before implementing
this system, @Josh had talked about doing things differently: parse everything as an expression, and filter out whatever
declarators appear within them.
At the time, I did not really understand this system, as I did not really understand
that declarators themselves are generally expressions and I wanted to stick by the rules to interpret the BNF
grammar literally. With his system, it would require two passes: one to parse the expression and another to verify if
the expression can be a declarator. This would be a lot easier with this system, as it would just involve checking if
the expression is either unary-prefix (*
, &
and probably &&
) or binary (with the previous
operators, and []
and ()
) with a type as the left hand operand, and recursively apply that to
the nested expressions. It would also give functional casts by default, switching to declarators wherever needed. In hindsight,
this system would give cleaner code, but would introduce a second pass into parsing, which would incur additional complexity
elsewhere. It would also involve a fair bit of AST rewriting when transforming the expression into the declaration node.
So, would it be better than the current system I made? It is hard to say. The current system is still not too hard to read,
as it is only the one rule that is affected. It is also still a simple, one-pass system which does not rely on any AST
rewriting. This week, I will plumb this new code into TryParseOperand()
and TryReadStatement()
,
and after a few changes the parser will basically be done and I will be ready to start Part II of my GSoC project. I will
not be handling non-type template arguments for now as they involve converting the EDL AST to a JDI one and array bounds
expressions which require implementing the AST::Node::eval()
function for all the various nodes.