LPG AST Generation

Posted by mb0 on October 26th, 2007

I could not find any specific documentation (except this) about the automatic AST generation of LPG in Eclipse IMP, so I am noting some observations here.

In one %options line of the default parser skeleton you see this:

parent_saved,automatic_ast=toplevel,visitor=preorder, ast_directory=./Ast,ast_type=ASTNode

parent_saved saves the parent node in the AST base class. When this option is omitted, the getParent() method throws an UnsupportedOperationException.
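To make the difference concrete, here is a small hand-written Java sketch. It is not actual LPG output: the class names NodeWithParent and NodeWithoutParent, the two helper methods and the exception message are made up for illustration. Only getParent() and the exception type come from the behaviour described above.

```java
// Hand-written sketch, NOT generated by LPG. All class names are hypothetical.
class ParentSavedSketch {

    // Approximates a base class generated with parent_saved.
    static class NodeWithParent {
        private NodeWithParent parent;

        void setParent(NodeWithParent parent) { this.parent = parent; }

        NodeWithParent getParent() { return parent; }
    }

    // Approximates a base class generated without parent_saved.
    static class NodeWithoutParent {
        NodeWithoutParent getParent() {
            // The message text is an assumption; only the exception type is from the post.
            throw new UnsupportedOperationException("parent was not saved");
        }
    }

    // Counts how many getParent() steps lead from the node to the root.
    static int depth(NodeWithParent node) {
        int steps = 0;
        for (NodeWithParent p = node.getParent(); p != null; p = p.getParent()) {
            steps++;
        }
        return steps;
    }

    // Returns true if calling getParent() on this node is unsupported.
    static boolean parentUnsupported(NodeWithoutParent node) {
        try {
            node.getParent();
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        NodeWithParent root = new NodeWithParent();
        NodeWithParent child = new NodeWithParent();
        child.setParent(root);
        System.out.println(depth(child));
        System.out.println(parentUnsupported(new NodeWithoutParent()));
    }
}
```

Code that walks up the tree (as depth does here) only works when the grammar was generated with parent_saved, so it is worth deciding early whether you need it.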

automatic_ast specifies whether and how the AST classes are generated. When it is omitted, no AST is generated and the other options have no effect. It accepts one of two values: toplevel generates each class in its own file, and nested generates the AST classes as nested types of the generated parser class. Without a value it defaults to nested.

visitor specifies the mode of the generated abstract visitor class. When it is omitted, no visitor is generated. When specified without a value, it creates three visitor files: ArgumentVisitor, ResourceVisitor and AbstractVisitor. With the value preorder, it generates only the AbstractVisitor, with a more convenient interface (generic preVisit and postVisit, specific visit and endVisit). There might be other values.
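The following hand-written Java sketch shows how such a preorder-style interface is typically used. It is an approximation, not the generated code: the node classes Sum and Leaf, the accept/enter split and the trace helper are my own assumptions. Only the four hook names (preVisit, postVisit, visit, endVisit) are taken from the generated visitor.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written sketch of a preorder-style visitor; node classes are hypothetical.
class PreorderVisitorSketch {

    interface Visitor {
        boolean preVisit(Node n);   // generic hook, called before any node
        void postVisit(Node n);     // generic hook, called after any node
        boolean visit(Sum n);       // specific hook; returning false skips the children
        void endVisit(Sum n);
        boolean visit(Leaf n);
        void endVisit(Leaf n);
    }

    static abstract class Node {
        final void accept(Visitor v) {
            if (!v.preVisit(this)) return;
            enter(v);
            v.postVisit(this);
        }

        abstract void enter(Visitor v);
    }

    static final class Leaf extends Node {
        final String text;

        Leaf(String text) { this.text = text; }

        void enter(Visitor v) {
            v.visit(this);
            v.endVisit(this);
        }
    }

    static final class Sum extends Node {
        final Node left;
        final Node right;

        Sum(Node left, Node right) {
            this.left = left;
            this.right = right;
        }

        void enter(Visitor v) {
            if (v.visit(this)) {
                left.accept(v);
                right.accept(v);
            }
            v.endVisit(this);
        }
    }

    // Records the order in which the specific hooks fire.
    static List<String> trace(Node root) {
        final List<String> log = new ArrayList<>();
        root.accept(new Visitor() {
            public boolean preVisit(Node n) { return true; }
            public void postVisit(Node n) { }
            public boolean visit(Sum n) { log.add("visit Sum"); return true; }
            public void endVisit(Sum n) { log.add("endVisit Sum"); }
            public boolean visit(Leaf n) { log.add("visit " + n.text); return true; }
            public void endVisit(Leaf n) { log.add("endVisit " + n.text); }
        });
        return log;
    }

    public static void main(String[] args) {
        System.out.println(trace(new Sum(new Leaf("a"), new Leaf("b"))));
    }
}
```

The parent's visit fires before its children and its endVisit fires after them, which is what makes this shape convenient for things like scope handling or pretty printing.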

ast_directory is straightforward: a path where the (toplevel) AST classes are generated. Relative paths are resolved against the directory of the grammar file.

ast_type is the name used for the AST base classes: ast_type=ASTNode results in ASTNode, ASTNodeToken and ASTNodeList.

If you generate the example grammar that is created by the IMP wizard, you will see that the rule declaration results in the class declaration and the interface Ideclaration. To preserve coding standards I advise you to use rule names starting with an upper-case letter. For each concrete node a class is created (e.g. declaration). This class extends the base AST class and implements the corresponding interface Ideclaration. The interface hierarchy reflects the chain of rules. The basic concept is described in the LPG documentation better than I could summarize it.

In the default example there are seven expression classes (expression0..N). These are ugly. If you look at how they differ, you will see the next naming convention: generated classes use the names of the tokens/nodes as field names with an underscore prefix.
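As a rough picture of the class, interface and field naming, here is a hand-written Java approximation of the node for a rule like expression ::= expression "+" term. It is not real LPG output. The constructor, the toString methods, the Iterm interface and the decision to let the token class implement it are simplifying assumptions of mine.

```java
// Hand-written approximation of a generated node, NOT real LPG output.
class GeneratedNodeSketch {

    // One interface per rule name, prefixed with "I".
    interface Iexpression { }

    // Assumed: a term can stand in wherever an expression is expected.
    interface Iterm extends Iexpression { }

    // Stand-ins for the base classes produced by ast_type=ASTNode.
    static class ASTNode { }

    // Simplification: the token class doubles as a leaf term here.
    static class ASTNodeToken extends ASTNode implements Iterm {
        private final String text;

        ASTNodeToken(String text) { this.text = text; }

        @Override public String toString() { return text; }
    }

    // Node for: expression ::= expression "+" term
    static class expression extends ASTNode implements Iexpression {
        private final Iexpression _expression;
        private final ASTNodeToken _PLUS;
        private final Iterm _term;

        expression(Iexpression _expression, ASTNodeToken _PLUS, Iterm _term) {
            this._expression = _expression;
            this._PLUS = _PLUS;
            this._term = _term;
        }

        Iexpression getexpression() { return _expression; }
        ASTNodeToken getPLUS() { return _PLUS; }
        Iterm getterm() { return _term; }

        @Override public String toString() {
            return "(" + _expression + " " + _PLUS + " " + _term + ")";
        }
    }

    public static void main(String[] args) {
        ASTNodeToken a = new ASTNodeToken("a");
        ASTNodeToken b = new ASTNodeToken("b");
        System.out.println(new expression(a, new ASTNodeToken("+"), b));
    }
}
```

Note how the lower-case rule name leaks into the class name and the getter names, which is the reason for the advice about upper-case rule names.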

So expression ::= expression "+" term will generate the class expression with the fields _expression, _PLUS and _term as well as corresponding getters and setters (e.g. getexpression(), getPLUS()). Because the operator field name differs from rule to rule, seven classes are generated. However, there are some tricks:

expression$AdditiveExpression  ::=  expression "+"  term
expression$AdditiveExpression  ::=  expression "-"  term
The concrete class will be named AdditiveExpression. The generated class will have both operator fields, _PLUS and _Minus.
expression  ::=  expression "+"$  term
The operator token will be omitted from the node.
expression  ::=  expression "+"$Operator  term
expression  ::=  expression "-"$Operator  term
The operator tokens will be saved in the _Operator field instead of _PLUS and _Minus.
exprList$$expr ::= expr | exprList expr
With a double dollar sign ($$) you can tell LPG to construct a list of the given AST nodes that extends the base class ASTNodeList.

And for completeness, some short examples of how common automata notation is written in LPG:
Optional: R? <=> R2 ::= %empty | R
Zero or more: R* <=> R2 ::= %empty | R2 R
One or more: R+ <=> R2 ::= R | R2 R
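To see why the left-recursive form works well with list nodes, here is a hand-written Java sketch of how a rule like exprList$$expr ::= expr | exprList expr can fill a single list, one element per reduction. This is an illustration only: the reduceFirst and reduceNext helpers are hypothetical, elements are plain strings, and the methods of the ASTNodeList stand-in are assumptions, not the generated API.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written sketch, NOT LPG output: how a left-recursive list rule
//     exprList$$expr ::= expr | exprList expr
// can collect its elements into one list node.
class ListRuleSketch {

    // Hypothetical stand-in for the generated list base class.
    static class ASTNodeList {
        private final List<String> elements = new ArrayList<>();

        void add(String element) { elements.add(element); }

        int size() { return elements.size(); }

        String getElementAt(int index) { return elements.get(index); }
    }

    // Reduction for the first alternative: exprList ::= expr
    static ASTNodeList reduceFirst(String expr) {
        ASTNodeList list = new ASTNodeList();
        list.add(expr);
        return list;
    }

    // Reduction for the second alternative: exprList ::= exprList expr
    static ASTNodeList reduceNext(ASTNodeList list, String expr) {
        list.add(expr);   // the existing list is reused, so source order is kept
        return list;
    }

    public static void main(String[] args) {
        ASTNodeList list = reduceNext(reduceNext(reduceFirst("a"), "b"), "c");
        for (int i = 0; i < list.size(); i++) {
            System.out.println(list.getElementAt(i));
        }
    }
}
```

Because the recursion is on the left, the parser reduces the first element first and each later reduction simply appends, so the elements end up in source order without any reversal.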

That's it for now. But because I am very interested in LPG and IMP, and my Google page rank is impressive for these keywords, I might write more posts.
