Implementing a language interpreter in Haskell
A couple of good papers I've finally found:
- Building Interpreters by Composing Monads
- Monad Transformers Step by Step - how incrementally build tiny interpreter using monad transformers
- How to build a monadic interpreter in one day
- Monad Transformers and Modular Interpreters
If you are new to writing this kind of processors, I would recommend to put off using monads for a while and first focus on getting a barebones implementation without any bells or whistles.
The following may serve as a minitutorial.
I assume that you have already tackled the issue of parsing the source text of the programs you want to write an interpreter for and that you have some types for capturing the abstract syntax of your language. The language that I use here is very simple and only consists of integer expressions and some basic statements.
Preliminaries
Let us first import some modules that we will use in just a bit.
import Data.Function
import Data.List
The essence of an imperative language is that it has some form of mutable variables. Here, variables simply represented by strings:
type Var = String
Expressions
Next, we define expressions. Expressions are constructed from integer constants, variable references, and arithmetic operations.
infixl 6 :+:, :-:
infixl 7 :*:, :/:
data Exp
= C Int -- constant
| V Var -- variable
| Exp :+: Exp -- addition
| Exp :-: Exp -- subtraction
| Exp :*: Exp -- multiplication
| Exp :/: Exp -- division
For example, the expression that adds the constant 2 to the variable x
is represented by V "x" :+: C 2
.
Statements
The statement language is rather minimal. We have three forms of statements: variable assignments, while loops, and sequences.
infix 1 :=
data Stmt
= Var := Exp -- assignment
| While Exp Stmt -- loop
| Seq [Stmt] -- sequence
For example, a sequence of statements for "swapping" the values of the variables x
and y
can be represented by Seq ["tmp" := V "x", "x" := V "y", "y" := V "tmp"]
.
Programs
A program is just a statement.
type Prog = Stmt
Stores
Now, let us move to the actual interpreter. While running a program, we need to keep track of the values assigned to the different variables in the programs. These values are just integers and as a representation of our "memory" we just use lists of pairs consisting of a variable and a value.
type Val = Int
type Store = [(Var, Val)]
Evaluating expressions
Expressions are evaluated by mapping constants to their value, looking up the values of variables in the store, and mapping arithmetic operations to their Haskell counterparts.
eval :: Exp -> Store -> Val
eval (C n) r = n
eval (V x) r = case lookup x r of
Nothing -> error ("unbound variable `" ++ x ++ "'")
Just v -> v
eval (e1 :+: e2) r = eval e1 r + eval e2 r
eval (e1 :-: e2) r = eval e1 r - eval e2 r
eval (e1 :*: e2) r = eval e1 r * eval e2 r
eval (e1 :/: e2) r = eval e1 r `div` eval e2 r
Note that if the store contains multiple bindings for a variable, lookup
selects the bindings that comes first in the store.
Executing statements
While the evaluation of an expression cannot alter the contents of the store, executing a statement may in fact result in an update of the store. Hence, the function for executing a statement takes a store as an argument and produces a possibly updated store.
exec :: Stmt -> Store -> Store
exec (x := e) r = (x, eval e r) : r
exec (While e s) r | eval e r /= 0 = exec (Seq [s, While e s]) r
| otherwise = r
exec (Seq []) r = r
exec (Seq (s : ss)) r = exec (Seq ss) (exec s r)
Note that, in the case of assignments, we simply push a new binding for the updated variable to the store, effectively shadowing any previous bindings for that variable.
Top-level Interpreter
Running a program reduces to executing its top-level statement in the context of an initial store.
run :: Prog -> Store -> Store
run p r = nubBy ((==) `on` fst) (exec p r)
After executing the statement we clean up any shadowed bindings, so that we can easily read off the contents of the final store.
Example
As an example, consider the following program that computes the Fibonacci number of the number stored in the variable n
and stores its result in the variable x
.
fib :: Prog
fib = Seq
[ "x" := C 0
, "y" := C 1
, While (V "n") $ Seq
[ "z" := V "x" :+: V "y"
, "x" := V "y"
, "y" := V "z"
, "n" := V "n" :-: C 1
]
]
For instance, in an interactive environment, we can now use our interpreter to compute the 25th Fibonacci number:
> lookup "x" $ run fib [("n", 25)]
Just 75025
Monadic Interpretation
Of course, here, we are dealing with a very simple and tiny imperative language. As your language gets more complex, so will the implementation of your interpreter. Think for example about what additions you need when you add procedures and need to distinguish between local (stack-based) storage and global (heap-based) storage. Returning to that part of your question, you may then indeed consider the introduction of monads to streamline the implementation of your interpreter a bit.
In the example interpreter above, there are two "effects" that are candidates for being captured by a monadic structure:
- The passing around and updating of the store.
- Aborting running the program when a run-time error is encountered. (In the implementation above, the interpreter simply crashes when such an error occurs.)
The first effect is typically captured by a state monad, the second by an error monad. Let us briefly investigate how to do this for our interpreter.
We prepare by importing just one more module from the standard libraries.
import Control.Monad
We can use monad transformers to construct a composite monad for our two effects by combining a basic state monad and a basic error monad. Here, however, we simply construct the composite monad in one go.
newtype Interp a = Interp { runInterp :: Store -> Either String (a, Store) }
instance Monad Interp where
return x = Interp $ \r -> Right (x, r)
i >>= k = Interp $ \r -> case runInterp i r of
Left msg -> Left msg
Right (x, r') -> runInterp (k x) r'
fail msg = Interp $ \_ -> Left msg
Edit 2018: The Applicative Monad Proposal
Since the Applicative Monad Proposal (AMP) every Monad must also be an instance of Functor and Applicative. To do this we can add
import Control.Applicative -- Otherwise you can't do the Applicative instance.
to the imports and make Interp an instance of Functor and Applicative like this
instance Functor Interp where
fmap = liftM -- imported from Control.Monad
instance Applicative Interp where
pure = return
(<*>) = ap -- imported from Control.Monad
Edit 2018 end
For reading from and writing to the store, we introduce effectful functions rd
and wr
:
rd :: Var -> Interp Val
rd x = Interp $ \r -> case lookup x r of
Nothing -> Left ("unbound variable `" ++ x ++ "'")
Just v -> Right (v, r)
wr :: Var -> Val -> Interp ()
wr x v = Interp $ \r -> Right ((), (x, v) : r)
Note that rd
produces a Left
-wrapped error message if a variable lookup fails.
The monadic version of the expression evaluator now reads
eval :: Exp -> Interp Val
eval (C n) = do return n
eval (V x) = do rd x
eval (e1 :+: e2) = do v1 <- eval e1
v2 <- eval e2
return (v1 + v2)
eval (e1 :-: e2) = do v1 <- eval e1
v2 <- eval e2
return (v1 - v2)
eval (e1 :*: e2) = do v1 <- eval e1
v2 <- eval e2
return (v1 * v2)
eval (e1 :/: e2) = do v1 <- eval e1
v2 <- eval e2
if v2 == 0
then fail "division by zero"
else return (v1 `div` v2)
In the case for :/:
, division by zero results in an error message being produced through the Monad
-method fail
, which, for Interp
, reduces to wrapping the message in a Left
-value.
For the execution of statements we have
exec :: Stmt -> Interp ()
exec (x := e) = do v <- eval e
wr x v
exec (While e s) = do v <- eval e
when (v /= 0) (exec (Seq [s, While e s]))
exec (Seq []) = do return ()
exec (Seq (s : ss)) = do exec s
exec (Seq ss)
The type of exec
conveys that statements do not result in values but are executed only for their effects on the store or the run-time errors they may trigger.
Finally, in the function run
we perform a monadic computation and process its effects.
run :: Prog -> Store -> Either String Store
run p r = case runInterp (exec p) r of
Left msg -> Left msg
Right (_, r') -> Right (nubBy ((==) `on` fst) r')
In the interactive environment, we can now revisit the interpretation of our example program:
> lookup "x" `fmap` run fib [("n", 25)]
Right (Just 75025)
> lookup "x" `fmap` run fib []
Left "unbound variable `n'"