Tutorial

This chapter gives a complete example of a project implemented using the Wisent parser generator. Here we will implement a simple calculator.

Creating The Grammar File

We create a file calculator.wi which contains the following grammar.

expr: _additive ;

_additive: sum | difference | _multiplicative ;
sum: _additive '+' _multiplicative ;
difference: _additive '-' _multiplicative ;

_multiplicative: product | quotient | _primary ;
product: _multiplicative '*' _primary ;
quotient: _multiplicative '/' _primary ;

_primary: NUMBER
       | brackets
       | function ;
brackets: '(' _additive ')' ;
function: SYMBOL '(' _additive ')' ;

This file describes the structure of the input for our calculator. The format of grammar files is describes in the section Describing the Format of your Input. The reason for the fact that some symbol names are prefixed by _ is explained in the section about Transparent Symbols.

Running Wisent

We use Wisent to generate a parser. Execute the following command to convert the grammar file calculator.wi into python code:

$ wisent -o parser.py -e calc.py calculator.wi

Detail about using the wisent program are describe in the section Running Wisent. The program call generates a (long) file parser.py which contains the generated parser. You can use pythons introspection abilities, e.g. by calling pydoc parser on the command line. The generated parser is described in more detail in the section The Parser class.

The -e option instructs Wisent to generate example code in the file calc.py. This code will not be useful in itself, but it serves as an illustration of how to use the generated parser. It looks as follows:

from sys import stderr

from parser import Parser

def print_tree(tree, terminals, indent=0):
    """Print a parse tree to stdout."""
    prefix = "    "*indent
    if tree[0] in terminals:
        print prefix + repr(tree)
    else:
        print prefix + unicode(tree[0])
        for x in tree[1:]:
            print_tree(x, terminals, indent+1)

input = [ ('NUMBER',), ('/',), ('SYMBOL',), ('(',), ('NUMBER',), ('*',),
          ('NUMBER',), ('+',), ('NUMBER',), ('/',), ('NUMBER',), (')',),
          ('*',), ('SYMBOL',), ('(',), ('NUMBER',), ('-',), ('NUMBER',),
          ('-',), ('NUMBER',), (')',) ]

p = Parser()
try:
    tree = p.parse(input)
except p.ParseErrors, e:
    for token,expected in e.errors:
        if token[0] == p.EOF:
            print >>stderr, "unexpected end of file"
            continue

        found = repr(token[0])
        if len(expected) == 1:
            msg = "missing %s (found %s)"%(repr(expected[0]), found)
        else:
            msg1 = "parse error before %s, "%found
            l = sorted([ repr(s) for s in expected ])
            msg2 = "expected one of "+", ".join(l)
            msg = msg1+msg2
        print >>stderr, msg
    raise SystemExit(1)

print_tree(tree, p.terminals)

The input of the generated parser (input in the example code) must be a sequence (or any iterable) of Python tuples where the first element gives the type of input token. It must be one of the terminal symbols +, -, *, /, (, ), NUMBER, and SYMBOL from the grammar. The remaining elements of the tuples (not present in the example code) don’t affect the parsing process, they are directly copied into the resulting output tree. We will use these elements to store the value of input numbers and function names.

The program just outputs the parse tree returned by p.parse:

expr
    difference
        ('NUMBER',)
        ('-',)
        quotient
            function
                ('SYMBOL',)
                ('(',)
                ('NUMBER',)
                (')',)
            ('/',)
            ('NUMBER',)

Adding a Tokenizer

The next step is to edit the file calc.py to add a tokenizer which can convert a string typed by the user into a list of input tokens for p.parse:

def tokenize(str):
    from re import match

    res = []
    while str:
        if str[0].isspace():
            str = str[1:]
            continue

        m = match('[0-9.]+', str)
        if m:
            res.append(('NUMBER', float(m.group(0))))
            str = str[m.end(0):]
            continue

        m = match('[a-z]+', str)
        if m:
            res.append(('SYMBOL', m.group(0)))
            str = str[m.end(0):]
            continue

        res.append((str[0],))
        str = str[1:]

    return res

As required, this function breaks a string into a list of tokens:

>>> tokenize("fn(1+2)")
[('SYMBOL', fn'), ('(',), ('NUMBER', 1.0), ('+',), ('NUMBER', 2.0), (')',)]

Computing the Result from the Parse Tree

Write a function which recursively examines the parse tree and computes the numerical result:

def eval_tree(tree):
    if tree[0] == 'expr':
        return eval_tree(tree[1])
    elif tree[0] == 'sum':
        return eval_tree(tree[1]) + eval_tree(tree[3])
    elif tree[0] == 'difference':
        return eval_tree(tree[1]) - eval_tree(tree[3])
    elif tree[0] == 'product':
        return eval_tree(tree[1]) * eval_tree(tree[3])
    elif tree[0] == 'quotient':
        return eval_tree(tree[1]) / eval_tree(tree[3])
    elif tree[0] == 'NUMBER':
        return tree[1]
    elif tree[0] == 'brackets':
        return eval_tree(tree[2])
    elif tree[0] == 'function':
        fn = getattr(math, tree[1][1])
        return fn(eval_tree(tree[3]))

Details about the format of the parse tree are contained in the section Parse Trees.

Putting it all Together

The final result of our tutorial is the following python file calculator.wi:

#! /usr/bin/env python

from sys import stderr
import math

from parser import Parser

def tokenize(str):
    from re import match

    res = []
    while str:
        if str[0].isspace():
            str = str[1:]
            continue

        m = match('[0-9.]+', str)
        if m:
            res.append(('NUMBER', float(m.group(0))))
            str = str[m.end(0):]
            continue

        m = match('[a-z]+', str)
        if m:
            res.append(('SYMBOL', m.group(0)))
            str = str[m.end(0):]
            continue

        res.append((str[0],))
        str = str[1:]

    return res

def eval_tree(tree):
    if tree[0] == 'expr':
        return eval_tree(tree[1])
    elif tree[0] == 'sum':
        return eval_tree(tree[1]) + eval_tree(tree[3])
    elif tree[0] == 'difference':
        return eval_tree(tree[1]) - eval_tree(tree[3])
    elif tree[0] == 'product':
        return eval_tree(tree[1]) * eval_tree(tree[3])
    elif tree[0] == 'quotient':
        return eval_tree(tree[1]) / eval_tree(tree[3])
    elif tree[0] == 'NUMBER':
        return tree[1]
    elif tree[0] == 'brackets':
        return eval_tree(tree[2])
    elif tree[0] == 'function':
        fn = getattr(math, tree[1][1])
        return fn(eval_tree(tree[3]))

p = Parser()
while True:
    try:
        s = raw_input("calc: ")
    except EOFError:
        print
        break
    input = tokenize(s)

    try:
        tree = p.parse(input)
    except p.ParseErrors, e:
        for token,expected in e.errors:
            if token[0] == p.EOF:
                print >>stderr, "unexpected end of file"
                continue

            found = repr(token[0])
            if len(expected) == 1:
                msg = "missing %s (found %s)"%(repr(expected[0]), found)
            else:
                msg1 = "parse error before %s, "%found
                l = sorted([ repr(s) for s in expected ])
                msg2 = "expected one of "+", ".join(l)
                msg = msg1+msg2
            print >>stderr, msg
        continue

    print eval_tree(tree)

This file, together with the parser generated by Wisent, implements a rudimentary calculator including some error handling:

$ ./calc.py
calc: 1
1.0
calc: 1*(2+3)
5.0
calc: 1+*2
parse error before '*', expected one of '(', 'NUMBER', 'SYMBOL'
calc: 4*sin(6)
-1.1176619928

Exercise: This calculator can not yet deal with negative numbers:

calc: -3
parse error before '-', expected one of '(', 'NUMBER', 'SYMBOL'

Make the required changes to the code to fix this.

Table Of Contents

Previous topic

Introduction

Next topic

Documentation for Wisent