Tutorial¶
This chapter gives a complete example of a project implemented using the Wisent parser generator. Here we will implement a simple calculator.
Creating The Grammar File¶
We create a file calculator.wi which contains the following
grammar.
expr: _additive ;
_additive: sum | difference | _multiplicative ;
sum: _additive '+' _multiplicative ;
difference: _additive '-' _multiplicative ;
_multiplicative: product | quotient | _primary ;
product: _multiplicative '*' _primary ;
quotient: _multiplicative '/' _primary ;
_primary: NUMBER
| brackets
| function ;
brackets: '(' _additive ')' ;
function: SYMBOL '(' _additive ')' ;
This file describes the structure of the input for our calculator.
The format of grammar files is describes in the section
Describing the Format of your Input. The reason for the fact that some symbol names
are prefixed by _ is explained in the section about
Transparent Symbols.
Running Wisent¶
We use Wisent to generate a parser. Execute the following command to
convert the grammar file calculator.wi into python code:
$ wisent -o parser.py -e calc.py calculator.wi
Detail about using the wisent program are describe in the
section Running Wisent. The program call generates a (long)
file parser.py which contains the generated parser. You can
use pythons introspection abilities, e.g. by calling pydoc
parser on the command line. The generated parser is described in
more detail in the section The Parser class.
The -e option instructs Wisent to generate example code in the
file calc.py. This code will not be useful in itself, but
it serves as an illustration of how to use the generated parser.
It looks as follows:
from sys import stderr
from parser import Parser
def print_tree(tree, terminals, indent=0):
"""Print a parse tree to stdout."""
prefix = " "*indent
if tree[0] in terminals:
print prefix + repr(tree)
else:
print prefix + unicode(tree[0])
for x in tree[1:]:
print_tree(x, terminals, indent+1)
input = [ ('NUMBER',), ('/',), ('SYMBOL',), ('(',), ('NUMBER',), ('*',),
('NUMBER',), ('+',), ('NUMBER',), ('/',), ('NUMBER',), (')',),
('*',), ('SYMBOL',), ('(',), ('NUMBER',), ('-',), ('NUMBER',),
('-',), ('NUMBER',), (')',) ]
p = Parser()
try:
tree = p.parse(input)
except p.ParseErrors, e:
for token,expected in e.errors:
if token[0] == p.EOF:
print >>stderr, "unexpected end of file"
continue
found = repr(token[0])
if len(expected) == 1:
msg = "missing %s (found %s)"%(repr(expected[0]), found)
else:
msg1 = "parse error before %s, "%found
l = sorted([ repr(s) for s in expected ])
msg2 = "expected one of "+", ".join(l)
msg = msg1+msg2
print >>stderr, msg
raise SystemExit(1)
print_tree(tree, p.terminals)
The input of the generated parser (input in the example code) must
be a sequence (or any iterable) of Python tuples where the first
element gives the type of input token. It must be one of the terminal
symbols +, -, *, /, (, ), NUMBER, and
SYMBOL from the grammar. The remaining elements of the tuples
(not present in the example code) don’t affect the parsing process,
they are directly copied into the resulting output tree. We will use
these elements to store the value of input numbers and function names.
The program just outputs the parse tree returned by p.parse:
expr
difference
('NUMBER',)
('-',)
quotient
function
('SYMBOL',)
('(',)
('NUMBER',)
(')',)
('/',)
('NUMBER',)
Adding a Tokenizer¶
The next step is to edit the file calc.py to add a tokenizer
which can convert a string typed by the user into a list of input
tokens for p.parse:
def tokenize(str):
from re import match
res = []
while str:
if str[0].isspace():
str = str[1:]
continue
m = match('[0-9.]+', str)
if m:
res.append(('NUMBER', float(m.group(0))))
str = str[m.end(0):]
continue
m = match('[a-z]+', str)
if m:
res.append(('SYMBOL', m.group(0)))
str = str[m.end(0):]
continue
res.append((str[0],))
str = str[1:]
return res
As required, this function breaks a string into a list of tokens:
>>> tokenize("fn(1+2)")
[('SYMBOL', fn'), ('(',), ('NUMBER', 1.0), ('+',), ('NUMBER', 2.0), (')',)]
Computing the Result from the Parse Tree¶
Write a function which recursively examines the parse tree and computes the numerical result:
def eval_tree(tree):
if tree[0] == 'expr':
return eval_tree(tree[1])
elif tree[0] == 'sum':
return eval_tree(tree[1]) + eval_tree(tree[3])
elif tree[0] == 'difference':
return eval_tree(tree[1]) - eval_tree(tree[3])
elif tree[0] == 'product':
return eval_tree(tree[1]) * eval_tree(tree[3])
elif tree[0] == 'quotient':
return eval_tree(tree[1]) / eval_tree(tree[3])
elif tree[0] == 'NUMBER':
return tree[1]
elif tree[0] == 'brackets':
return eval_tree(tree[2])
elif tree[0] == 'function':
fn = getattr(math, tree[1][1])
return fn(eval_tree(tree[3]))
Details about the format of the parse tree are contained in the section Parse Trees.
Putting it all Together¶
The final result of our tutorial is the following python file
calculator.wi:
#! /usr/bin/env python3
from sys import stderr
import math
from parser import Parser
def tokenize(str):
from re import match
res = []
while str:
if str[0].isspace():
str = str[1:]
continue
m = match('[0-9.]+', str)
if m:
res.append(('NUMBER', float(m.group(0))))
str = str[m.end(0):]
continue
m = match('[a-z]+', str)
if m:
res.append(('SYMBOL', m.group(0)))
str = str[m.end(0):]
continue
res.append((str[0],))
str = str[1:]
return res
def eval_tree(tree):
if tree[0] == 'expr':
return eval_tree(tree[1])
elif tree[0] == 'sum':
return eval_tree(tree[1]) + eval_tree(tree[3])
elif tree[0] == 'difference':
return eval_tree(tree[1]) - eval_tree(tree[3])
elif tree[0] == 'product':
return eval_tree(tree[1]) * eval_tree(tree[3])
elif tree[0] == 'quotient':
return eval_tree(tree[1]) / eval_tree(tree[3])
elif tree[0] == 'NUMBER':
return tree[1]
elif tree[0] == 'brackets':
return eval_tree(tree[2])
elif tree[0] == 'function':
fn = getattr(math, tree[1][1])
return fn(eval_tree(tree[3]))
p = Parser()
while True:
try:
s = input("calc: ")
except EOFError:
print()
break
tokens = tokenize(s)
try:
tree = p.parse(tokens)
except p.ParseErrors as e:
for token,expected in e.errors:
if token[0] == p.EOF:
print("unexpected end of file", file=stderr)
continue
found = repr(token[0])
if len(expected) == 1:
msg = "missing %s (found %s)"%(repr(expected[0]), found)
else:
msg1 = "parse error before %s, "%found
l = sorted([ repr(s) for s in expected ])
msg2 = "expected one of "+", ".join(l)
msg = msg1+msg2
print(msg, file=stderr)
continue
print(eval_tree(tree))
This file, together with the parser generated by Wisent, implements a rudimentary calculator including some error handling:
$ ./calc.py
calc: 1
1.0
calc: 1*(2+3)
5.0
calc: 1+*2
parse error before '*', expected one of '(', 'NUMBER', 'SYMBOL'
calc: 4*sin(6)
-1.1176619928
Exercise: This calculator can not yet deal with negative numbers:
calc: -3
parse error before '-', expected one of '(', 'NUMBER', 'SYMBOL'
Make the required changes to the code to fix this.