Combined unparser/parser generator

There are several parser generators that include an implementation of an unparser. One of them is the nearley parser generator for context-free grammars.

It is also possible to implement bidirectional transformations of source code using definite clause grammars. In SWI-Prolog, the phrase/2 predicate can convert an input text into a parse tree and vice-versa.

I have implemented a set of invertible parser combinators in Java and Kotlin. A parser is written in roughly LL(1) style and provides both a parse method and a print method; the print method serves as the pretty printer.

You can find the project here: https://github.com/searles/parsing

Here is a tutorial: https://github.com/searles/parsing/blob/master/tutorial.md

And here is a parser/pretty printer for mathematical expressions: https://github.com/searles/parsing/blob/master/src/main/java/at/searles/demo/DemoInvert.kt
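To illustrate the general idea of one syntax description that both parses and prints, here is a minimal Python sketch. It is not the API of the library linked above; the names `Number`, `SepBy`, `parse`, and `unparse` are hypothetical and chosen only for this example. Each element knows how to turn text into a value and how to turn that value back into text.

```python
def skip_ws(text, pos):
    """Return the index of the first non-whitespace character at or after pos."""
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos


class Number:
    """A non-negative integer literal."""

    def parse(self, text, pos):
        pos = skip_ws(text, pos)
        end = pos
        while end < len(text) and text[end].isdigit():
            end += 1
        return (int(text[pos:end]), end) if end > pos else None

    def unparse(self, value):
        return str(value) if isinstance(value, int) and value >= 0 else None


class SepBy:
    """One or more items separated by a fixed separator string."""

    def __init__(self, item, sep):
        self.item, self.sep = item, sep

    def parse(self, text, pos):
        first = self.item.parse(text, pos)
        if first is None:
            return None
        values, pos = [first[0]], first[1]
        while True:
            after_ws = skip_ws(text, pos)
            if not text.startswith(self.sep, after_ws):
                return values, pos
            nxt = self.item.parse(text, after_ws + len(self.sep))
            if nxt is None:
                return values, pos
            values.append(nxt[0])
            pos = nxt[1]

    def unparse(self, values):
        parts = [self.item.unparse(v) for v in values]
        if not parts or None in parts:
            return None  # the value cannot be printed by this syntax
        return f" {self.sep} ".join(parts)


sum_syntax = SepBy(Number(), "+")
tree, end = sum_syntax.parse("1+  22 +3", 0)
print(tree)                      # [1, 22, 3]
print(sum_syntax.unparse(tree))  # 1 + 22 + 3
```

Parsing the printed text again yields the same list, which is the round-trip property an invertible description is meant to give.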

Take a look at the paper "Invertible syntax descriptions: Unifying parsing and pretty printing".

Our DMS Software Reengineering Toolkit does precisely this (and provides a lot of additional support for analyzing/transforming code). It does this by decorating a language grammar with additional attributes, producing what is called an attribute grammar. We use a special DSL that makes these rules convenient to write.

It helps to know that DMS produces a tree based directly on the grammar.

Each DMS grammar rule is paired with a so-called "prettyprinting" rule. Each prettyprinting rule describes how to "prettyprint" the syntactic element and sub-elements recognized by its corresponding grammar rule. The prettyprinting process essentially manufactures or combines rectangular boxes of text horizontally or vertically (with optional indentation), with leaves producing unit-height boxes containing the literal value of the leaf (keyword, operator, identifier, constant, etc.).

As an example, one might write the following DMS grammar rule and matching prettyprinting rule:

    statement = 'for' '(' assignment ';' assignment ';' conditional_expression ')'
                '{' sequence_of_statements '}' ;
    <<PrettyPrinter>>:
        { V(H('for','(',assignment[1],';',assignment[2],';',conditional_expression,')'),
            H('{', I(sequence_of_statements)),
            '}');

This will parse the following:

    for ( i=x*2;
       i--;  i>-2*x ) {  a[x]+=3; 
      b[x]=a[x]-1; }

(using additional grammar rules for statements and expressions) and prettyprint it (using additional prettyprinting rules for those additional grammar rules) as follows:

    for (i=x*2;i--;i>-2*x)
    {   a[x]+=3;
        b[x]=a[x]-1;
    }
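The box composition described above can be imitated in a few lines of Python. This is a minimal sketch under stated assumptions, not DMS's implementation or its rule syntax: a box is modeled as a list of text lines, `H` joins boxes side by side with their tops aligned, `V` stacks them, and `I` indents by an explicit width. The spacing after `for` and the indent width of 3 are chosen by hand here so that the result matches the output shown above.

```python
def as_box(item):
    """Treat a plain string as a unit-height box; copy any other box."""
    return [item] if isinstance(item, str) else list(item)


def H(*items):
    """Place boxes side by side, top-aligned, padding shorter rows."""
    result = [""]
    for item in items:
        box = as_box(item)
        width = max(len(line) for line in result)
        height = max(len(result), len(box))
        result = [
            (result[i] if i < len(result) else "").ljust(width)
            + (box[i] if i < len(box) else "")
            for i in range(height)
        ]
    return [line.rstrip() for line in result]


def V(*items):
    """Stack boxes vertically."""
    return [line for item in items for line in as_box(item)]


def I(item, width):
    """Indent every line of a box by width spaces."""
    return [" " * width + line for line in as_box(item)]


body = V("a[x]+=3;", "b[x]=a[x]-1;")
stmt = V(
    H("for ", "(", "i=x*2", ";", "i--", ";", "i>-2*x", ")"),
    H("{", I(body, 3)),
    "}",
)
print("\n".join(stmt))
```

Because every construct returns a box, the same three operations nest to any depth, which is why one prettyprinting rule per grammar rule is enough to lay out a whole tree.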

DMS also captures comments, attaches them to AST nodes, and regenerates them on output. The implementation is a bit exotic because most parsers don't handle comments, but using it is easy, even "free"; comments will be automatically inserted in the prettyprinted result in their original places.

DMS can also print in "fidelity" mode. In this form, it tries to preserve the shape of each token (e.g., number radix, identifier character capitalization, which keyword spelling was used) and the column offset (into the line) of each parsed token. This causes the original text (or something so close that you don't think it is different) to be regenerated.

More details about what prettyprinters must do are provided in my SO answer on Compiling an AST back to source code. DMS addresses all of those topics cleanly.

This capability has been used by DMS on some 40+ real languages, including full IBM COBOL, PL/SQL, Java 1.8, C# 5.0, C (many dialects) and C++14.

By writing a sufficiently interesting set of prettyprinter rules, you can build things like JavaDoc extended to include hyperlinked source code.
