March 2021
yacc
,bison
or happy
)This lecture: writing top-down recursive descent parsers using monadic parsing combinators.
A parser consumes an input string and yields pairs of result and remaining input.
Read this out loud as a rhyme:
A parser for things
Is a function from strings
To lists of pairs
Of things and strings
-- run a parser on some input
runParser :: Parser a -> String -> [(a, String)]
runParser (Parser f) = f
A parser that accepts a single next character.
instance Monad Parser where
return a = Parser (\cs -> [(a, cs)])
p >>= k = Parser (\cs ->
concat [runParser (k a) cs'
| (a,cs')<-runParser p cs])
return
yields a value without consuming input>>=
applies a continuation to the result of the first parser-- | a parser that always fails
empty :: Parser a
empty = Parser (\_ -> [])
-- | combine results of two parsers
(+++) :: Parser a -> Parser a -> Parser a
p +++ q = Parser (\cs -> runParser p cs ++
runParser q cs)
The combinator +++
returns all results of both parsers; we often care only about the first result.
Let us define an infix operator for deterministic choice.
(<|>) :: Parser a -> Parser a -> Parser a
p <|> q = Parser (\cs -> case runParser p cs of
[] -> runParser q cs
(x:_) -> [x])
Both +++
and <|>
are associative with neutral element empty
.
-- | parse a char satisfying a predicate
satisfy :: (Char -> Bool) -> Parser Char
satisfy p = do c <- next
if p c then return c else empty
-- | parse a specific char
char :: Char -> Parser Char
char c = satisfy (==c)
-- | parse a specific string
string :: String -> Parser String
string "" = return ""
string (c:cs)
= do char c; string cs; return (c:cs)
-- | repeat a parser zero or more times
many :: Parser a -> Parser [a]
many p = many1 p <|> return []
-- | repeat a parser one or more times
many1 :: Parser a -> Parser [a]
many1 p = do a<-p; as<-many p; return (a:as)
Remember that <|>
is left-biased; cue example…
-- | parse many whitespace charaters
spaces :: Parser String
spaces = many (satisfy isSpace)
-- | parse a token using a parser `p`
-- discarding any initial spaces
token :: Parser a -> Parser a
token p = spaces >> p
-- | parse a string token
symbol :: String -> Parser String
symbol s = token (string s)
,
)\n
Example:
"Name", "Sex", "Age", "Height(in)", "Weight(lbs)"
"Alex", "M", 41, 74, 170
"Bert", "M", 42, 68, 166
"Carl", "M", 32, 70, 155
integer :: Parser Integer
integer = token $ do
s <- many1 (satisfy isDigit)
return (read s)
stringLit :: Parser String
stringLit = token $ do
char '\"'
s <- many (satisfy (/='\"'))
char '\"'
return s
NB: some cases missing (negative numbers, escape chars)
Use a sum type to tag different results:
type Item = Either Integer String
item :: Parser Item
item = do n <- integer; return (Left n)
<|>
do s <- stringLit; return (Right s)
Alternativa: we could define our own data type.
data Item = Number Integer | Text String
item :: Parser Item
item = do n <- integer; return (Number n)
<|>
do s <- stringLit; return (Text s)
Need to “factor out” the first item because of the separators (commas).
row :: Parser [Item]
row = do v<-item; vs<-rest; return (v:vs)
<|>
return []
where
rest = many (comma >> item) <|> return []
comma :: Parser Char
comma = token (char ',')
row
parser is a bit repetitivemany
because commas are separators not terminators-- | repeated applications of `p'
-- separated by applications of `sep'
sepBy :: Parser a -> Parser b -> Parser [a]
p `sepBy` sep = (p `sepBy1` sep) <|> return []
sepBy1 :: Parser a -> Parser b -> Parser [a]
p `sepBy1` sep = do a <- p
as <- many (sep >> p)
return (a:as)
The row parser is much simpler using sepBy
:
We can define the parser for rows using the previous parser and many1
:
We can even inline the definition of row
directly:
+
, -
, *
, /
and parenthesesExample:
\[ \begin{array}{rcl} \text{expr} &::=& \text{expr} ~ \text{addop}~ \text{term} \quad|\quad \text{term} \\ \text{term} &::=& \text{term}~ \text{mulop}~ \text{factor} \quad|\quad \text{factor} \\ \text{factor} &::=& \text{integer} \quad|\quad \mathtt{(}~\text{expr}~\mathtt{)} \\ \text{addop} &::=& \mathtt{+} \quad|\quad \mathtt{-} \\ \text{mulop} &::=& \mathtt{*} \quad|\quad \mathtt{/}\\ \text{integer} &::=& \cdots \end{array} \]
factor :: Parser Integer
factor = integer <|> parens expr
-- `expr' will be defined later
integer :: Parser Integer -- as before
-- add parenthesis around a parser
parens :: Parser a -> Parser a
parens p = do
symbol "("
a <- p
symbol ")"
return a
term :: Parser Integer
term = do t<-factor; cont t
where
cont t1 = do op <- mulop
t2 <- factor
cont (t1 `op` t2)
<|> return t1
mulop :: Parser (Integer -> Integer -> Integer)
mulop = do symbol "*"; return (*)
<|>
do symbol "/"; return div
expr :: Parser Integer
expr = do t<-term; cont t
where
cont t1 = do op <- addop
t2 <- term
cont (t1 `op` t2)
<|> return t1
addop :: Parser (Integer -> Integer -> Integer)
addop = do symbol "+"; return (+)
<|>
do symbol "-"; return (-)
expr
and term
are similar
expr
allows + and -term
allows * and /sepBy
because we need to combine acording to the operator-- | chain applications of a parser
-- using some operator associated to the left
chainl :: Parser a
-> Parser (a -> a -> a) -> a -> Parser a
chainl p op a = (p `chainl1` op) <|> return a
chainl1 :: Parser a
-> Parser (a -> a -> a) -> Parser a
chainl1 p op = do a <- p; cont a
where cont a = do f <- op
b <- p
cont (f a b)
<|> return a
expr = term `chainl1` addop
term = factor `chainl1` mulop
addop = do symbol "+"; return (+)
<|>
do symbol "-"; return (-)
mulop = do symbol "*"; return (*)
<|>
do symbol "/"; return div
factor = integer <|> parens expr
Parsec is an “industrial strength” monadic parsing library:
https://hackage.haskell.org/package/parsec