How Groovy compiles

In a previous post, I briefly described how the Groovy compiler takes advantage of ANTLR [1] in the compilation process. In this post, I would like to elaborate on that. Simply put, Groovy goes through the following general phases:

Parsing

Groovy, in the past, had its own internal parser. When ANTLR was added to Groovy, it seems the old representation was kept in the pipeline: the source is first read into a structure called the CST (concrete syntax tree), and the ANTLR parser plugin is then used to read it and adapt it to the newer version. After the source code is parsed, some facility converts the CST into the ANTLR AST [2].

Groovy introduces its own complete set of data structures to represent a compilation unit, which is usually a script file containing at least one class or some Groovy statements. The CST is built to represent the whole structure of the script, and it is then converted to the AST. This part needs more clarification.
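To make the difference between the two trees concrete, here is a minimal sketch in Java. It is not Groovy's actual code: CstNode, Num, Add and toAst are hypothetical names invented for this illustration, and the "language" only has numbers, plus and parentheses. The point is only that a CST keeps everything that was in the source text (including punctuation), while the conversion step keeps just the structure that matters.

```java
import java.util.List;

class CstToAstSketch {

    // Hypothetical CST node: untyped, and it mirrors the source text closely.
    static final class CstNode {
        final String kind;
        final String text;
        final List<CstNode> children;

        CstNode(String kind, String text, CstNode... children) {
            this.kind = kind;
            this.text = text;
            this.children = List.of(children);
        }
    }

    // Hypothetical AST: typed nodes, no punctuation left.
    interface Ast {}

    static final class Num implements Ast {
        final int value;
        Num(int value) { this.value = value; }
        @Override public String toString() { return "Num(" + value + ")"; }
    }

    static final class Add implements Ast {
        final Ast left;
        final Ast right;
        Add(Ast left, Ast right) { this.left = left; this.right = right; }
        @Override public String toString() { return "Add(" + left + ", " + right + ")"; }
    }

    // Converts a CST into an AST by dropping tokens that carry no meaning.
    static Ast toAst(CstNode node) {
        switch (node.kind) {
            case "number":
                return new Num(Integer.parseInt(node.text));
            case "parens":
                // Children are "(", the inner expression, ")"; keep only the middle one.
                return toAst(node.children.get(1));
            case "plus":
                return new Add(toAst(node.children.get(0)), toAst(node.children.get(1)));
            default:
                throw new IllegalArgumentException("Unexpected CST node: " + node.kind);
        }
    }

    // CST for the source text "(1 + 2)".
    static CstNode sampleCst() {
        return new CstNode("parens", "",
                new CstNode("token", "("),
                new CstNode("plus", "+",
                        new CstNode("number", "1"),
                        new CstNode("number", "2")),
                new CstNode("token", ")"));
    }

    public static void main(String[] args) {
        System.out.println(toAst(sampleCst())); // prints Add(Num(1), Num(2))
    }
}
```

How Groovy really maps its CST nodes to AST nodes is more involved than this, so treat the sketch as a picture of the idea rather than of the implementation.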

Byte Code Generation

When you are first introduced to ANTLR, the recommended approach to using it for programming language development is [3]:

  1. Develop a parser grammar from which ANTLR generates the lexer and the parser for the language. It is recommended to use ANTLR's AST output option, so that the parser produces a tree for the input program.
  2. Develop a tree grammar that matches the parser grammar, so that when a sample program is fed in, you can traverse the structure of the program and attach the required actions to its different nodes.
  3. Develop actions, or take advantage of string templates, to produce some form of output such as a translation to lower-level opcodes.

Well, Groovy does not follow the recommended approach. Instead, and because of the CST, it relies heavily on the Visitor pattern [4]. Groovy introduces a GroovyCodeVisitor that declares a visit method for every construct of the Groovy language that the compile process needs to handle. One implementation of this visitor is AsmClassGenerator. As its name says, it uses ASM to generate byte code, and it is itself built on the visitor pattern.

Specifically, when byte code generation begins, AsmClassGenerator receives an instance of ClassNode, which is the root of the whole source unit that was parsed and converted to the AST. It traverses the children of the root node and visits every node in the tree. At every node, Groovy takes advantage of ASM's ClassVisitor, which is also based on the visitor pattern. So, for instance, at the level of a statement or a class declaration, the Groovy class generator calls the corresponding visit methods of ASM's ClassVisitor to generate the byte code for the current node of the AST. (In ASM, the instructions of a method body go through the MethodVisitor that ClassVisitor.visitMethod returns.) The concrete ClassVisitor that Groovy uses is ClassWriter, which produces byte code for the different constructs according to the JVM byte code specification.

So, in the end, once the starting class node has been completely visited, ASM's class visitor on the other side of the story holds all the byte code for the whole class.
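The interplay described above, where one visitor walks the AST and pushes output into a second object as it goes, can be sketched in plain Java without any ASM dependency. Everything here is an assumption made for illustration: CodeVisitor stands in for GroovyCodeVisitor, Generator for AsmClassGenerator, and InstructionSink for ASM's ClassWriter. The "PUSH" and "ADD" strings are made-up mnemonics, not real JVM opcodes.

```java
import java.util.ArrayList;
import java.util.List;

class VisitorSketch {

    // Hypothetical stand-in for GroovyCodeVisitor: one visit method per node type.
    interface CodeVisitor {
        void visitConstant(Constant node);
        void visitAdd(Add node);
    }

    // Hypothetical AST node: each node dispatches to its own visit method.
    interface Node {
        void accept(CodeVisitor visitor);
    }

    static final class Constant implements Node {
        final int value;
        Constant(int value) { this.value = value; }
        @Override public void accept(CodeVisitor visitor) { visitor.visitConstant(this); }
    }

    static final class Add implements Node {
        final Node left;
        final Node right;
        Add(Node left, Node right) { this.left = left; this.right = right; }
        @Override public void accept(CodeVisitor visitor) { visitor.visitAdd(this); }
    }

    // Hypothetical stand-in for ASM's ClassWriter: it just collects what is emitted.
    static final class InstructionSink {
        private final List<String> instructions = new ArrayList<>();
        void emit(String instruction) { instructions.add(instruction); }
        List<String> toList() { return List.copyOf(instructions); }
    }

    // Plays the role of AsmClassGenerator: walks the tree and feeds the sink.
    static final class Generator implements CodeVisitor {
        private final InstructionSink sink;
        Generator(InstructionSink sink) { this.sink = sink; }

        @Override public void visitConstant(Constant node) {
            sink.emit("PUSH " + node.value);
        }

        @Override public void visitAdd(Add node) {
            // Visit the operands first, then emit the operation itself.
            node.left.accept(this);
            node.right.accept(this);
            sink.emit("ADD");
        }
    }

    // Visits the whole tree; afterwards the sink holds all generated instructions.
    static List<String> generate(Node root) {
        InstructionSink sink = new InstructionSink();
        root.accept(new Generator(sink));
        return sink.toList();
    }

    public static void main(String[] args) {
        // Tree for the expression 1 + (2 + 3).
        Node tree = new Add(new Constant(1), new Add(new Constant(2), new Constant(3)));
        generate(tree).forEach(System.out::println);
    }
}
```

For the tree built in main, the sink ends up with PUSH 1, PUSH 2, PUSH 3, ADD, ADD. The real generator deals with far more node types, and the real sink is itself a visitor, but the shape is the same: once the root has been fully visited, the collecting side already holds the complete output.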

References

  [1] http://www.antlr.org/
  [2] http://www.antlr.org/wiki/display/ANTLR3/Interfacing+AST+with+Java
  [3] Terence Parr, The Definitive ANTLR Reference
  [4] http://en.wikipedia.org/wiki/Visitor_pattern