How do I do python-style indent/dedent tokens with alex/haskell?

Note that in other whitespace-sensitive languages -- like Haskell -- the layout handling is indeed done in the lexer. GHC in fact implements layout handling in Alex. Here's the source:

https://github.com/ghc/ghc/blob/master/compiler/GHC/Parser/Lexer.x

There are some serious errors in your question that lead you astray, as jrockway points out. "I can't store any global data anywhere with Haskell" is on the wrong track. Firstly, you can have global state, secondly, you should not be using global state here, when Alex fully supports state transitions in rules in a safe manner.

Look at the AlexState structure that Alex provides, letting you thread state through your lexer. Then, look at how the state is used in GHC's layout implementation to implement indent/unindent of the layout rules. (Search for "-- Layout processing" in GHC's lexer to see how the state is pushed and popped).

I can't store any global data anywhere with Haskell

This is not true; in most cases something like the State monad is sufficient, but there is also the ST monad.

You don't need global state for this task, however. Writing a parser consists of two parts; lexical analysis and syntax analysis. The lexical analysis just turns a stream of characters into a stream of meaningful tokens. The syntax analysis turns tokens into an AST; this is where you should deal with indentation.

As you are interpreting the indentation, you will call a handler function as the indentation level changes -- when it increases (nesting), you call your handler function (perhaps with one arg incremented, if you want to track the indentation level); when the level decreases, you simply return the relevant AST portion from the function.

(As an aside, using a global variable for this is something that would not occur to me in an imperative language either -- if anything, it's an instance variable. The State monad is very similar conceptually to this.)

Finally, I think the phrase "I can't put all my lexing rules inside any monad" indicates some sort of misunderstanding of monads. If I needed to parse and keep global state, my code would look like:

data AST = ...
type Step = State Int AST

parseFunction :: Stream -> Step
parseFunction s = do
   level <- get
   ...
   if anotherFunction then put (level + 1) >> parseFunction ...
   else parseWhatever
   ...
   return node

parse :: Stream -> Step
parse s = do
   if looksLikeFunction then parseFunction ...

main = runState parse 0 -- initial nesting of 0

Instead of combining function applications with (.) or ($), you combine them with (>>=) or (>>). Other than that, the algorithm is the same. (There is no "monad" to be "inside".)

Finally, you might like applicative functors:

eval :: Environment -> Node -> Evaluated
eval e (Constant x) = Evaluated x
eval e (Variable x) = Evaluated (lookup e x)
eval e (Function f x y) = (f <$> (`eval` x) <*> (`eval` y)) e

(or

eval e (Function f x y) = ((`eval` f) <*> (`eval` x) <*> (`eval` y)) e

if you have something like "funcall"... but I digress.)

There is plenty of literature on parsing with applicative functors, monads, and arrows; all of which have the potential to solve your problem. Read up on those and see what you get.

How do I do python-style indent/dedent tokens with alex/haskell?

Tags:

Parsing

Haskell

Related

Recent Posts