Example: Calculator

This example focuses on the practical aspect of using a Pratt parser to parse expressions using pest. To illustrate this, we build a parser for simple equations, and construct an abstract syntax tree.

Precedence and associativity

In a simple equation multiplication and division are evaluated first, which means they have a higher precedence. e.g. 1 + 2 * 3 is evaluated as 1 + (2 * 3), if the precedence was equal it would be (1 + 2) * 3. For our system we have the following operands:

highest precedence: multiplication & division
lowest precedence: addition & subtraction

In the expression 1 + 2 - 3, no operator is inherently more important than the other. Addition, subtraction, multiplication and division are evaluated from left to right, e.g. 1 - 2 + 3 is evaluated as (1 - 2) + 3. We call this property left associativity. Operators can also be right associative. For example, we usually evaluate the statement x = y = 1 by first assigning y = 1 and x = 1 (or x = y) afterwards.

Associativity only matters if two operators have the same precedence, as is the case with addition and subtraction for example. This means that if we have an expression with only additions and subtractions, we can just evaluate it from left to right. 1 + 2 - 3 is equal to (1 + 2) - 3. And 1 - 2 + 3 is equal to (1 - 2) + 3.

To go from a flat list of operands separated by operators, it suffices to define a precedence and associativity for each operator. With these definitions an algorithm such as Pratt parsing is able to construct a corresponding expression tree.

If you are curious to know more about how Pratt parsing is implemented, Aleksey Kladov has a great tutorial on implementing it from scratch using Rust.

Calculator example

We want our calculator to be able to parse simple equations that consist of integers and simple binary operators. Additionally, we want to support parenthesis and unary minus. For example:


1 + 2 * 3
-(2 + 5) * 16

Grammar

We start by defining our atoms, bits of self-contained syntax that cannot be split up into smaller parts. For our calculator we start with just simple integers:


// No whitespace allowed between digits
integer = @{ ASCII_DIGIT+ }

atom = _{ integer }

Next, our binary operators:


bin_op = _{ add | subtract | multiply | divide }
    add = { "+" }
    subtract = { "-" }
    multiply = { "*" }
    divide = { "/" }

These two rules will be the input to the PrattParser. It expects to receive atoms separated by operators, like so: atom, bin_op, atom, bin_op, atom, ....

Corresponding to this format, we define our rule for expressions:


expr = { atom ~ (bin_op ~ atom)* }

And finally, we define our WHITESPACE and equation rule:


WHITESPACE = _{ " " }

// We can't have SOI and EOI on expr directly, because it is used
// recursively (e.g. with parentheses)
equation = _{ SOI ~ expr ~ EOI }

This defines the grammar which generates the required input for the Pratt parser.

Abstract Syntax Tree

We want to convert our input into an abstract syntax tree. For this we define the following types:


#[derive(Debug)]
pub enum Expr {
    Integer(i32),
    BinOp {
        lhs: Box<Expr>,
        op: Op,
        rhs: Box<Expr>,
    },
}

#[derive(Debug)]
pub enum Op {
    Add,
    Subtract,
    Multiply,
    Divide,
}

Note the Box<Expr> required because Rust does not allow unboxed recursive types.

There is no separate atom type, any atom is also a valid expression.

Pratt parser

The precedence of operations is defined in the Pratt parser.

An easy approach is to define the PrattParser as global using lazy_static.

Adhering to standard rules of arithmetic, we will define addition and subtraction to have lower priority than multiplication and division, and make all operators left associative.


lazy_static::lazy_static! {
    static ref PRATT_PARSER: PrattParser<Rule> = {
        use pest::pratt_parser::{Assoc::*, Op};
        use Rule::*;

        // Precedence is defined lowest to highest
        PrattParser::new()
            // Addition and subtract have equal precedence
            .op(Op::infix(add, Left) | Op::infix(subtract, Left))
            .op(Op::infix(multiply, Left) | Op::infix(divide, Left))
    };
}

We are almost there, the only thing that's left is to use our Pratt parser. For this the map_primary, map_infix, and parse functions are used, the first two take functions and the third one takes an iterator over pairs. map_primary is executed for every primary (atom), and map_infix is executed for every binop with its new left-hand and right-hand side according to the precedence rules defined earlier. In this example, we create an AST in the Pratt parser.


pub fn parse_expr(pairs: Pairs<Rule>) -> Expr {
    PRATT_PARSER
        .map_primary(|primary| match primary.as_rule() {
            Rule::integer => Expr::Integer(primary.as_str().parse::<i32>().unwrap()),
            rule => unreachable!("Expr::parse expected atom, found {:?}", rule)
        })
        .map_infix(|lhs, op, rhs| {
            let op = match op.as_rule() {
                Rule::add => Op::Add,
                Rule::subtract => Op::Subtract,
                Rule::multiply => Op::Multiply,
                Rule::divide => Op::Divide,
                rule => unreachable!("Expr::parse expected infix operation, found {:?}", rule),
            };
            Expr::BinOp {
                lhs: Box::new(lhs),
                op,
                rhs: Box::new(rhs),
            }
        })
        .parse(pairs)

}

Here's an example of how to use the parser.


fn main() -> io::Result<()> {
    for line in io::stdin().lock().lines() {
        match CalculatorParser::parse(Rule::equation, &line?) {
            Ok(mut pairs) => {
                println!(
                    "Parsed: {:#?}",
                    parse_expr(
                        // inner of expr
                        pairs.next().unwrap().into_inner()
                    )
                );
            }
            Err(e) => {
                eprintln!("Parse failed: {:?}", e);
            }
        }
    }
    Ok(())
}

With this, we can parse the following simple equation:


> 1 * 2 + 3 / 4
Parsed: BinOp {
    lhs: BinOp {
        lhs: Integer( 1 ),
        op: Multiply,
        rhs: Integer( 2 ),
    },
    op: Add,
    rhs: BinOp {
        lhs: Integer( 3 ),
        op: Divide,
        rhs: Integer( 4 ),
    },
}

Unary minus and parenthesis

So far, our calculator can parse fairly complicated expressions, but it will fail if it encounters explicit parentheses or a unary minus sign. Let's fix that.

Parentheses

Consider the expression (1 + 2) * 3. Clearly removing the parentheses would give a different result, so we must support parsing such expressions. Luckily, this can be a simple addition to our atom rule:


- atom = _{ integer }
+ atom = _{ integer | "(" ~ expr ~ ")" }

Earlier we said that atoms should be simple token sequences that cannot be split up further, but now an atom can contain arbitrary expressions! The reason we are okay with this is that the parentheses mark clear boundaries for the expression, it will not make ambiguous what operators belong to the inner expression and which to the outer one.

Unary minus

We can currently only parse positive integers, eg 16 or 2342. But we also want to do calculations with negative integers. To do this, we introduce the unary minus, so we can make -4 and -(8 + 15). We need the following change to grammar:


+ unary_minus = { "-" }
+ primary = _{ integer | "(" ~ expr ~ ")" }
- atom = _{ integer | "(" ~ expr ~ ")" }
+ atom = _{ unary_minus? ~ primary }

For these last changes we've omitted the small changes to the AST and parsing logic (using map_prefix).

You can find all these details in the repository: github.com/pest-parser/book/tree/master/examples/pest-calculator.

Keyboard shortcuts

A thoughtful introduction to the pest parser