Parsing a simple text grammar with Superpower

Step 1 of writing any Superpower parser is to figure out what the token kinds are. You have something like:

// ECL - Elevator Control Language ;-)
enum EclToken {
    LParen,
    RParen,
    UpKeyword,
    DownKeyword,
    WaitKeyword,
    AtSymbol,
    Number,
    Comma
}

Step 2, write a Tokenizer<EclToken>. This is left as a direct programming task by Superpower v1 - there aren't many helpers to lean on, you just need to write the code as in the examples.

The tokenizer takes the input string, strips out the whitespace, and figures out what the sequence of tokens is.

For your example input, the first line will be:

// (UP 100),
LParen, UpKeyword, Number, RParen, Comma

For tokens like Number that contain content, the span associated with the Result<EclToken> will point to the portion of the input string corresponding to the token. In this line, the number will be a TextSpan covering 100.

Step 3 is to figure out what you want to parse the input into. For programming languages with nested expressions, this is usually an AST. In the case of the ECL sample, it's pretty simple so you might cut it down to:

struct ElevatorCommand {        
    public int Distance; // + or -
    public bool IsRelative;
}

Step 4, the parser. This is usually embedded in a static class. The job of the parser is to build up more complex results (an ElevatorCommand[], here), from simpler results (number, movement).

This is where Superpower does the heavy lifting, particularly with respect to expectations and errors.

static class EclParser 
{
    static TokenListParser<EclToken, int> Number =
        Token.EqualTo(EclToken.Number).Apply(Numerics.IntegerInt32);
}

The first thing we do is define the parser for numbers; this one applies a built-in TextParser<int> to the content of an EclToken.Number span.

You can see more parsing machinery in this example.

A few more clues to help you find the way (not syntax checked, let alone compiled/tested):

    static TokenListParser<EclToken, ElevatorCommand> Up =
        from _ in Token.EqualTo(EclToken.UpKeyword)
        from distance in Number
        select new ElevatorCommand {
            Distance = distance,
            IsRelative = false
        };

    static TokenListParser<EclToken, ElevatorCommand> Command =
        from lp in Token.EqualTo(EclToken.LParen)
        from command in Up // .Or(Down).Or(Wait)
        from rp in Token.EqualTo(EclToken.RParen)
        select command;

    static TokenListParser<EclToken, ElevatorCommand[]> Commands =
        Command.ManyDelimitedBy(Token.EqualTo(EclToken.Comma));
}

Commands is the completed parser that you can apply to the input.

It's best to build up the parser incrementally, testing each smaller parser on chunks of input they're expected to parse.


OK, I have finally managed to get it. It wasn't so difficult with @Nicholas Blumhardt's guidance :)

I have created a project in GitHub to illustrate the scenario. Since the classes are big for a post, I'm linking to the files:

  • This is the tokenizer
  • This is the class with the parsers.