SlideShare a Scribd company logo
Parser Combinators
                              Writing external DSLs




Wednesday, November 9, 2011
DSLs

                   • DSL is an ultimate abstraction
                   • You create a language that precisely
                          describes the problem you are trying to
                          solve.




Wednesday, November 9, 2011
Internal & External

                   • DSL can be internal - made using the host
                          programming language.
                   • Or external - you invent a little language
                          for solving your problem.




Wednesday, November 9, 2011
Internal DSL: Good
                   • No external tool dependencies
                   • Nice IDE support
                   • Type checking for free
                   • Easy to implement
                   • Gives full power of your language
                   • Parsing is free
Wednesday, November 9, 2011
Internal DSL: Bad

                   • It requires expressive language
                   • If the language is not expressive enough
                          DSL will be verbose and not pleasant to
                          use
                   • It can produce obscure error messages

Wednesday, November 9, 2011
External DSL: Good

                   • Platform independent
                   • Nicer syntax
                   • External tool can perform static analysis
                   • Good error messages

Wednesday, November 9, 2011
External DSLs: Bad


                   • No IDE support
                   • Hard to implement


Wednesday, November 9, 2011
External DSLs

                   • External DSLs can be implemented using
                          interpretation or code generation.
                   • Parse
                   • Analyze
                   • Interpret | Generate Code

Wednesday, November 9, 2011
parseJSON :: String -> Either Error JSONObject




Wednesday, November 9, 2011
JSON Grammar-like stuff
               object := '{' fields? '}'
               fields := field (',' field)*
               field := id ':' value
               value := literal | object | array




Wednesday, November 9, 2011
Bison (C)
                               ANTLR (Java and many others)
                              You can encode grammar by hand




Wednesday, November 9, 2011
function parseObject() {
       match('{')
       if (lookahead(ID))
         parseFields()
       match('}')
     }




Wednesday, November 9, 2011
function parseFields() {
       parseField()
       while (lookahead(',')) {
         consume()
         parseField()
       }
     }




Wednesday, November 9, 2011
• Parser can either consume some input or
                          not.

                   • Parser can fail or succeed.
                   • Parser can return some value.


Wednesday, November 9, 2011
• Usually sequences are just sequential calls.
                   • Optional parts of grammar are just IFs
                   • Zero or more elements is WHILE
                   • Grammar rule is a function


Wednesday, November 9, 2011
function parseFields() {
       separatedBy(',', parseField)
     }

     function separatedBy(ch, f) {
       f()
       while (lookahead(ch)) {
         consume()
         f()
       }
     }



Wednesday, November 9, 2011
function sepBy(ch, p) {
       return
         seq(p, many(seq(ch, p)))
     }




Wednesday, November 9, 2011
Combinator pattern




Wednesday, November 9, 2011
Haskell in 60 seconds



Wednesday, November 9, 2011
main = putStrLn "Hello World!"

     times x n = x * n
     times = x n -> x * n
     times = x -> n -> x * n
     times = (*)
     times 2 3
     2 `times` 3
     data Point = Point Int Int
     data Either a b = Left a | Right b
     Point 10 20
     Left “Hello”


Wednesday, November 9, 2011
import Text.Parsec

     parse ::
         Parser
      -> SourceName
      -> String
      -> Either ParseError a




Wednesday, November 9, 2011
p = satisfy (x -> x == 'a')
     main = print (parse p "<test>" "abc")

     > Right 'a'

     p = satisfy (x -> x == 'a')
     main = print (parse p "<test>" "bcd")

     > Left "<test>" (line 1, column 1):
     > unexpected "b"



Wednesday, November 9, 2011
-- There is something like this in Parsec
     char c = satisfy (x -> x == c)

     -- Example
     p = char 'a'
     main = print (parse p "<test>" "bcd")

     > Left "<test>" (line 1, column 1):
     > unexpected "b"
     > expecting "a"



Wednesday, November 9, 2011
char c = satisfy (==c)   <?> show [c]




Wednesday, November 9, 2011
p = char 'a' >>= x ->
         char 'b' >>= y ->
         return [x, y]
     main = print (parse p "<test>" "abc")

     > Right "ab"




Wednesday, November 9, 2011
p = do char 'a'
            char 'b'
            return "ab"
     main = print (parse p "<test>" "abc")

     > Right "ab"




Wednesday, November 9, 2011
p = do char 'a'
            char 'b' <|> char 'q'
            char 'c' <|> char 'd'

     main = print (parse p "<test>" "abc")




Wednesday, November 9, 2011
letter = satisfy isAlpha <?> "letter"
     digit = satisfy isDigit <?> "digit"
     spaces = skipMany space <?> "white space"

     many p = ... censored ...




Wednesday, November 9, 2011
p = do letter
            many (letter <|> digit)
     main = print (parse p "<test>" "hello123")

     > Right "ello123"




Wednesday, November 9, 2011
p = do x <- letter
            xs <- many (letter <|> digit)
            return (x:xs)
     main = print (parse p "<test>" "hello123")

     > Right "hello123"




Wednesday, November 9, 2011
ident = do x <- letter
                xs <- many (letter <|> digit)
                return (x:xs)
     p = ident
     main = print (parse p "<test>" "123hello")

     > Left "<test>" (line 1, column 1):
     > unexpected "1"
     > expecting letter




Wednesday, November 9, 2011
ident = do x <- letter
                xs <- many (letter <|> digit)
                return (x:xs)
     p = ident <?> "variable name"
     main = print (parse p "<test>" "123hello")

     > Left "<test>" (line 1, column 1):
     > unexpected "1"
     > expecting variable name




Wednesday, November 9, 2011
ident = do x <- letter
                xs <- many (letter <|> digit)
                return (x:xs)
             <?> "variable name"
     p = ident
     main = print (parse p "<test>" "123hello")




Wednesday, November 9, 2011
ident = do x <- letter
                xs <- many (letter <|> digit)
                return (x:xs)
             <?> "variable name"
     letKeyword = string "let"
     p = ident <|> letKeyword
     main = print (parse p "<test>" "letter")

     > Right "letter"




Wednesday, November 9, 2011
ident = do x <- letter
                xs <- many (letter <|> digit)
                return (x:xs)
             <?> "variable name"
     letKeyword = do s <- string "let"
                     notFollowedBy (letter <|> digit)
                     spaces
                     return s
     p = try(letKeyword) <|> ident
     main = print (parse p "<test>" "letter")
     -----
     Right "letter"




Wednesday, November 9, 2011
Lexer-like example



Wednesday, November 9, 2011
type Name = String
     data Expr = Number Integer
               | Var Name
               | Let Name Expr
               | Seq [Expr]
               deriving Show




Wednesday, November 9, 2011
myDef = emptyDef {
         identStart = letter <|> char '_',
         identLetter = alphaNum <|> char '_',
         opStart = opLetter myDef,
         opLetter = oneOf "=,;",
         reservedOpNames = [
            ",", ";", "="
         ],
         reservedNames = ["let"],
         caseSensitive = True
       }




Wednesday, November 9, 2011
tokenParser = PT.makeTokenParser myDef

     parens = PT.parens tokenParser
     naturalNumber = PT.natural tokenParser
     keyword = PT.reserved tokenParser
     identifier = PT.identifier tokenParser
     op = PT.reservedOp tokenParser
     commaSep = PT.commaSep tokenParser
     commaSep1 = PT.commaSep1 tokenParser




Wednesday, November 9, 2011
simple = numberLiteral <|> var

     numberLiteral = do n <- naturalNumber
                        return $ Number n

     var = do s <- identifier
              return $ Var s




Wednesday, November 9, 2011
letStmt = do keyword "let"
                  defs <- commaSep1 def
                  op ";"
                  return $ Seq defs

     def = do name <- identifier
              op "="
              value <- simple
              return $ Let name value




Wednesday, November 9, 2011
main = print (
         parse letStmt "<test>" "let one = 1, two = 2;"
       )

     ----

     Right (
       Seq [
         Let "one" (Number 1),
         Let "two" (Number 2)
       ]
     )



Wednesday, November 9, 2011
CSV Parser Example



Wednesday, November 9, 2011
import Text.Parsec

     csvFile                  = line `endBy1` eol
     line                     = cell `sepBy` (char ',')
     cell                     = wsopt >> many (noneOf ",n")
     wsopt                    = many (oneOf " t")
     eol                      = char 'n'
     main                     = print (parse csvFile "<test>" input)
       where                  input = "aa, b,cnd,e,fn"




     > Right [["aa","b","c"],["d","e","f"]]


Wednesday, November 9, 2011
import scala.util.parsing.combinator._

     object CSVParser extends RegexParsers {
       override def skipWhitespace = false
       def whitespace = """[ t]*"""r
       def csv = rep1sep(row, "n") <~ "n"
       def row = rep1sep(field, ",")
       def field = opt(whitespace) ~> """[^n,]""".r
       def parse(s: String) = parseAll(csv, s)
     }
     println(CSVParser.parse("a,b,cnd,e, fn"))



     > [3.1] parsed: List(List(a, b, c), List(d, e, f))


Wednesday, November 9, 2011
Pysec for Python
     Spirit for C++
     In standard library of Scala
     A lot of others...




Wednesday, November 9, 2011
Stuff to read

                   • Language Implementation Patterns:
                          Create Your Own Domain-Specific and
                          General Programming Languages
                          by Terence Parr




Wednesday, November 9, 2011
Stuff to read


                   • Domain Specific Languages
                          by Martin Fowler




Wednesday, November 9, 2011
Stuff to read


                   • http://learnyouahaskell.com/
                          Learn You a Haskell for a Great Good




Wednesday, November 9, 2011
The End



Wednesday, November 9, 2011

More Related Content

PDF
JavaSE 7
PDF
Swift, functional programming, and the future of Objective-C
PPTX
Bioinformatics p1-perl-introduction v2013
PDF
Scala: Functioneel programmeren in een object georiënteerde wereld
PDF
A Prelude of Purity: Scaling Back ZIO
PPT
Basic perl programming
PPTX
Lviv EDGE 2 - NoSQL
JavaSE 7
Swift, functional programming, and the future of Objective-C
Bioinformatics p1-perl-introduction v2013
Scala: Functioneel programmeren in een object georiënteerde wereld
A Prelude of Purity: Scaling Back ZIO
Basic perl programming
Lviv EDGE 2 - NoSQL

Similar to Parser combinators (20)

PDF
Text Manipulation with/without Parsec
PDF
Java SE 7 - The Platform Evolves, Dalibor Topić (Oracle)
PDF
Acceptance & Integration Testing With Behat (PHPNw2011)
PDF
Javascript the Language of the Web
PDF
Ruby struct
PDF
De vuelta al pasado con SQL y stored procedures
PDF
Lecture 22
PDF
A Tour Through the Groovy Ecosystem
PDF
Web Scraping using Diazo!
PDF
PPTX
codin9cafe[2015.02.25]Open course(programming languages) - 장철호(Ch Jang)
PPTX
Open course(programming languages) 20150225
PDF
Perl 5.10
PDF
Dynamic languages, for software craftmanship group
PDF
D-Talk: What's awesome about Ruby 2.x and Rails 4
PDF
Localizing iOS Apps
PPTX
P3 2018 python_regexes
KEY
PDF
international PHP2011_ilia alshanetsky_Hidden Features of PHP
PDF
The Not Java That's Not Scala
Text Manipulation with/without Parsec
Java SE 7 - The Platform Evolves, Dalibor Topić (Oracle)
Acceptance & Integration Testing With Behat (PHPNw2011)
Javascript the Language of the Web
Ruby struct
De vuelta al pasado con SQL y stored procedures
Lecture 22
A Tour Through the Groovy Ecosystem
Web Scraping using Diazo!
codin9cafe[2015.02.25]Open course(programming languages) - 장철호(Ch Jang)
Open course(programming languages) 20150225
Perl 5.10
Dynamic languages, for software craftmanship group
D-Talk: What's awesome about Ruby 2.x and Rails 4
Localizing iOS Apps
P3 2018 python_regexes
international PHP2011_ilia alshanetsky_Hidden Features of PHP
The Not Java That's Not Scala
Ad

Recently uploaded (20)

PDF
Encapsulation theory and applications.pdf
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PDF
Network Security Unit 5.pdf for BCA BBA.
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PDF
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PPT
Teaching material agriculture food technology
PDF
Machine learning based COVID-19 study performance prediction
PPTX
Cloud computing and distributed systems.
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PDF
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
DOCX
The AUB Centre for AI in Media Proposal.docx
PDF
Electronic commerce courselecture one. Pdf
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PDF
MIND Revenue Release Quarter 2 2025 Press Release
Encapsulation theory and applications.pdf
Advanced methodologies resolving dimensionality complications for autism neur...
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Chapter 3 Spatial Domain Image Processing.pdf
Reach Out and Touch Someone: Haptics and Empathic Computing
Network Security Unit 5.pdf for BCA BBA.
20250228 LYD VKU AI Blended-Learning.pptx
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
Teaching material agriculture food technology
Machine learning based COVID-19 study performance prediction
Cloud computing and distributed systems.
Mobile App Security Testing_ A Comprehensive Guide.pdf
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
“AI and Expert System Decision Support & Business Intelligence Systems”
The AUB Centre for AI in Media Proposal.docx
Electronic commerce courselecture one. Pdf
The Rise and Fall of 3GPP – Time for a Sabbatical?
MIND Revenue Release Quarter 2 2025 Press Release
Ad

Parser combinators

  • 1. Parser Combinators Writing external DSLs Wednesday, November 9, 2011
  • 2. DSLs • DSL is an ultimate abstraction • You create a language that precisely describes the problem you are trying to solve. Wednesday, November 9, 2011
  • 3. Internal & External • DSL can be internal - made using the host programming language. • Or external - you invent a little language for solving your problem. Wednesday, November 9, 2011
  • 4. Internal DSL: Good • No external tool dependencies • Nice IDE support • Type checking for free • Easy to implement • Gives full power of your language • Parsing is free Wednesday, November 9, 2011
  • 5. Internal DSL: Bad • It requires expressive language • If the language is not expressive enough DSL will be verbose and not pleasant to use • It can produce obscure error messages Wednesday, November 9, 2011
  • 6. External DSL: Good • Platform independent • Nicer syntax • External tool can perform static analysis • Good error messages Wednesday, November 9, 2011
  • 7. External DSLs: Bad • No IDE support • Hard to implement Wednesday, November 9, 2011
  • 8. External DSLs • External DSLs can be implemented using interpretation or code generation. • Parse • Analyze • Interpret | Generate Code Wednesday, November 9, 2011
  • 9. parseJSON :: String -> Either Error JSONObject Wednesday, November 9, 2011
  • 10. JSON Grammar-like stuff object := '{' fields? '}' fields := field (',' field)* field := id ':' value value := literal | object | array Wednesday, November 9, 2011
  • 11. Bison (C) ANTLR (Java and many others) You can encode grammar by hand Wednesday, November 9, 2011
  • 12. function parseObject() { match('{') if (lookahead(ID)) parseFields() match('}') } Wednesday, November 9, 2011
  • 13. function parseFields() { parseField() while (lookahead(',')) { consume() parseField() } } Wednesday, November 9, 2011
  • 14. • Parser can either consume some input or not. • Parser can fail or succeed. • Parser can return some value. Wednesday, November 9, 2011
  • 15. • Usually sequences are just sequential calls. • Optional parts of grammar are just IFs • Zero or more elements is WHILE • Grammar rule is a function Wednesday, November 9, 2011
  • 16. function parseFields() { separatedBy(',', parseField) } function separatedBy(ch, f) { f() while (lookahead(ch)) { consume() f() } } Wednesday, November 9, 2011
  • 17. function sepBy(ch, p) { return seq(p, many(seq(ch, p))) } Wednesday, November 9, 2011
  • 19. Haskell in 60 seconds Wednesday, November 9, 2011
  • 20. main = putStrLn "Hello World!" times x n = x * n times = x n -> x * n times = x -> n -> x * n times = (*) times 2 3 2 `times` 3 data Point = Point Int Int data Either a b = Left a | Right b Point 10 20 Left “Hello” Wednesday, November 9, 2011
  • 21. import Text.Parsec parse :: Parser -> SourceName -> String -> Either ParseError a Wednesday, November 9, 2011
  • 22. p = satisfy (x -> x == 'a') main = print (parse p "<test>" "abc") > Right 'a' p = satisfy (x -> x == 'a') main = print (parse p "<test>" "bcd") > Left "<test>" (line 1, column 1): > unexpected "b" Wednesday, November 9, 2011
  • 23. -- There is something like this in Parsec char c = satisfy (x -> x == c) -- Example p = char 'a' main = print (parse p "<test>" "bcd") > Left "<test>" (line 1, column 1): > unexpected "b" > expecting "a" Wednesday, November 9, 2011
  • 24. char c = satisfy (==c) <?> show [c] Wednesday, November 9, 2011
  • 25. p = char 'a' >>= x -> char 'b' >>= y -> return [x, y] main = print (parse p "<test>" "abc") > Right "ab" Wednesday, November 9, 2011
  • 26. p = do char 'a' char 'b' return "ab" main = print (parse p "<test>" "abc") > Right "ab" Wednesday, November 9, 2011
  • 27. p = do char 'a' char 'b' <|> char 'q' char 'c' <|> char 'd' main = print (parse p "<test>" "abc") Wednesday, November 9, 2011
  • 28. letter = satisfy isAlpha <?> "letter" digit = satisfy isDigit <?> "digit" spaces = skipMany space <?> "white space" many p = ... censored ... Wednesday, November 9, 2011
  • 29. p = do letter many (letter <|> digit) main = print (parse p "<test>" "hello123") > Right "ello123" Wednesday, November 9, 2011
  • 30. p = do x <- letter xs <- many (letter <|> digit) return (x:xs) main = print (parse p "<test>" "hello123") > Right "hello123" Wednesday, November 9, 2011
  • 31. ident = do x <- letter xs <- many (letter <|> digit) return (x:xs) p = ident main = print (parse p "<test>" "123hello") > Left "<test>" (line 1, column 1): > unexpected "1" > expecting letter Wednesday, November 9, 2011
  • 32. ident = do x <- letter xs <- many (letter <|> digit) return (x:xs) p = ident <?> "variable name" main = print (parse p "<test>" "123hello") > Left "<test>" (line 1, column 1): > unexpected "1" > expecting variable name Wednesday, November 9, 2011
  • 33. ident = do x <- letter xs <- many (letter <|> digit) return (x:xs) <?> "variable name" p = ident main = print (parse p "<test>" "123hello") Wednesday, November 9, 2011
  • 34. ident = do x <- letter xs <- many (letter <|> digit) return (x:xs) <?> "variable name" letKeyword = string "let" p = ident <|> letKeyword main = print (parse p "<test>" "letter") > Right "letter" Wednesday, November 9, 2011
  • 35. ident = do x <- letter xs <- many (letter <|> digit) return (x:xs) <?> "variable name" letKeyword = do s <- string "let" notFollowedBy (letter <|> digit) spaces return s p = try(letKeyword) <|> ident main = print (parse p "<test>" "letter") ----- Right "letter" Wednesday, November 9, 2011
  • 37. type Name = String data Expr = Number Integer | Var Name | Let Name Expr | Seq [Expr] deriving Show Wednesday, November 9, 2011
  • 38. myDef = emptyDef { identStart = letter <|> char '_', identLetter = alphaNum <|> char '_', opStart = opLetter myDef, opLetter = oneOf "=,;", reservedOpNames = [ ",", ";", "=" ], reservedNames = ["let"], caseSensitive = True } Wednesday, November 9, 2011
  • 39. tokenParser = PT.makeTokenParser myDef parens = PT.parens tokenParser naturalNumber = PT.natural tokenParser keyword = PT.reserved tokenParser identifier = PT.identifier tokenParser op = PT.reservedOp tokenParser commaSep = PT.commaSep tokenParser commaSep1 = PT.commaSep1 tokenParser Wednesday, November 9, 2011
  • 40. simple = numberLiteral <|> var numberLiteral = do n <- naturalNumber return $ Number n var = do s <- identifier return $ Var s Wednesday, November 9, 2011
  • 41. letStmt = do keyword "let" defs <- commaSep1 def op ";" return $ Seq defs def = do name <- identifier op "=" value <- simple return $ Let name value Wednesday, November 9, 2011
  • 42. main = print ( parse letStmt "<test>" "let one = 1, two = 2;" ) ---- Right ( Seq [ Let "one" (Number 1), Let "two" (Number 2) ] ) Wednesday, November 9, 2011
  • 43. CSV Parser Example Wednesday, November 9, 2011
  • 44. import Text.Parsec csvFile = line `endBy1` eol line = cell `sepBy` (char ',') cell = wsopt >> many (noneOf ",n") wsopt = many (oneOf " t") eol = char 'n' main = print (parse csvFile "<test>" input) where input = "aa, b,cnd,e,fn" > Right [["aa","b","c"],["d","e","f"]] Wednesday, November 9, 2011
  • 45. import scala.util.parsing.combinator._ object CSVParser extends RegexParsers { override def skipWhitespace = false def whitespace = """[ t]*"""r def csv = rep1sep(row, "n") <~ "n" def row = rep1sep(field, ",") def field = opt(whitespace) ~> """[^n,]""".r def parse(s: String) = parseAll(csv, s) } println(CSVParser.parse("a,b,cnd,e, fn")) > [3.1] parsed: List(List(a, b, c), List(d, e, f)) Wednesday, November 9, 2011
  • 46. Pysec for Python Spirit for C++ In standard library of Scala A lot of others... Wednesday, November 9, 2011
  • 47. Stuff to read • Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages by Terence Parr Wednesday, November 9, 2011
  • 48. Stuff to read • Domain Specific Languages by Martin Fowler Wednesday, November 9, 2011
  • 49. Stuff to read • http://learnyouahaskell.com/ Learn You a Haskell for a Great Good Wednesday, November 9, 2011