Instaduction to Instaparse
CAP-CLUG - 2019-03-13
Instaparse is a Clojure library for building
parsers from context-free grammars
What is a parser?
● A program that takes some input data (usually a string) and produces a
data structure (usually a parse tree), based on some grammar (usually a
context-free grammar)
What’s a context-free grammar?
Formal definition:
V = finite set of non-terminals, or variables. Each variable represents a clause, a
phrase, or a syntactic category
𝚺 = finite set of terminals. The set of terminals is the alphabet of the language
R = finite relation from V to (V ∪ 𝚺)*. Each member of R is a rewrite rule, or
production
S = the starting symbol, which must be an element of V
Adapted from https://en.wikipedia.org/wiki/Context-free_grammar#Formal_definitions
What’s a context-free grammar?
The “context-free” bit means that a rule can be applied to its non-terminal
wherever it appears, regardless of the rest of the string (the context).
There are other kinds of grammars, some more powerful than CFGs and some less.
See the Chomsky hierarchy for more
A simple CFG
S ::= AB*
AB ::= A B
A ::= ‘a’+
B ::= ‘b’+
A simple CFG
S ::= AB*
AB ::= A B
A ::= ‘a’+
B ::= ‘b’+
Non-terminals: S, AB, A, B
Terminals: ‘a’, ‘b’
Productions: each of the four rules above
Starting symbol: S
Productions
Each production is a rule.
You can replace the symbol on the left with
the symbol(s) on the right.
‘+’ means “one or more”; ‘*’ means “zero or
more”
Non-terminals can be defined recursively, and
can appear on both the left and right sides of rules
Terminals only appear on the right side of a rule
If you imagine a tree, non-terminals are interior
nodes and terminals are leaf nodes (more on this
later)
S ::= AB*
AB ::= A B
A ::= ‘a’+
B ::= ‘b’+
Productions example
S ::= AB*
AB ::= A B
A ::= ‘a’+
B ::= ‘b’+
Derivation, one rewrite per step:
S
AB
A B
aaaaaa B
aaaaaabb
Some other strings that this grammar can generate
aaabbbbababaaaabbbb
abbbbbbbbb
abababababababab
aabbaabbbbbaaaaaab
S ::= AB*
AB ::= A B
A ::= ‘a’+
B ::= ‘b’+
Running a grammar “forwards”
generates strings that conform to a
grammar
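As a rough sketch of running the grammar "forwards", each rule can be written as a small Clojure function that emits a string. The function names (`rand-count`, `gen-a`, `gen-b`, `gen-ab`, `gen-s`) and the caps on repetition (at most 4 letters per run, at most 3 AB pairs) are assumptions made for illustration; none of this is instaparse API.

```clojure
(require '[clojure.string :as str])
(import 'java.util.Random)

;; Hypothetical helper: pick a repetition count between lo and hi, inclusive.
(defn- rand-count [^Random rng lo hi]
  (+ lo (.nextInt rng (inc (- hi lo)))))

;; One generator per production of the slide grammar.
(defn gen-a [rng] (str/join (repeat (rand-count rng 1 4) "a")))  ; A ::= 'a'+
(defn gen-b [rng] (str/join (repeat (rand-count rng 1 4) "b")))  ; B ::= 'b'+
(defn gen-ab [rng] (str (gen-a rng) (gen-b rng)))                ; AB ::= A B
(defn gen-s [rng]                                                 ; S ::= AB*
  (str/join (repeatedly (rand-count rng 0 3) #(gen-ab rng))))

;; Five sample strings from a seeded generator, so runs are repeatable.
(def samples
  (let [rng (Random. 42)]
    (vec (repeatedly 5 #(gen-s rng)))))
```

Every string in `samples` is a run of a's followed by a run of b's, repeated zero or more times, so the empty string can show up too.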
Running a grammar “backwards” over a
string tells us if that string is valid,
according to the grammar
Running the CFG backwards
S ::= AB*
AB ::= A B
A ::= ‘a’+
B ::= ‘b’+
Reduction, one rewrite per step:
aaaaaabbaab
A bbaab
A B A B
AB AB
S
VALID!
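The reduction on these slides can be imitated with plain string rewriting. This is only a sketch for this one grammar: `reduce-steps` and `valid?` are made-up names, and the approach (replace runs of letters with rule names, then collapse) would not generalize to arbitrary CFGs.

```clojure
(require '[clojure.string :as str])

(defn reduce-steps
  "Rewrite s bottom-up and return every intermediate form as a vector."
  [s]
  (let [;; A ::= 'a'+ and B ::= 'b'+ : collapse each run into its rule name
        step1 (-> s
                  (str/replace #"a+" "A ")
                  (str/replace #"b+" "B ")
                  str/trim)
        ;; AB ::= A B
        step2 (str/replace step1 #"A B" "AB")
        ;; S ::= AB* : zero or more ABs, and nothing else, reduce to S
        step3 (str/replace step2 #"^(AB( AB)*)?$" "S")]
    [s step1 step2 step3]))

(defn valid?
  "True when s uses only a and b and reduces all the way to S."
  [s]
  (boolean (and (re-matches #"[ab]*" s)
                (= "S" (peek (reduce-steps s))))))

(reduce-steps "aaaaaabbaab")
;; => ["aaaaaabbaab" "A B A B" "AB AB" "S"]
(valid? "ba")
;; => false
```

The `[ab]*` guard matters: without it, an input that already contains the text "S" or "AB" would be mistaken for a finished reduction.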
Let’s see that again, but without overwriting
the string
Running the CFG backwards
S ::= AB*
AB ::= A B
A ::= ‘a’+
B ::= ‘b’+
aaaaaa bb aa b
A      B  A  B
AB        AB
S
Parse Tree (leaves at the top, root S at the bottom)
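To make the parse tree concrete before instaparse enters the picture, here is a hand-rolled sketch that builds the same kind of nested-vector tree for this one grammar. The name `parse-as-and-bs` is hypothetical, and the regex shortcut only works because this particular language happens to be regular.

```clojure
(defn parse-as-and-bs
  "Return a tree shaped like [:S [:AB [:A ...] [:B ...]] ...],
  or nil when s is not in the language of the slide grammar."
  [s]
  (when (re-matches #"(?:a+b+)*" s)
    (into [:S]
          ;; each match is one AB: a run of a's followed by a run of b's
          (for [[_ as bs] (re-seq #"(a+)(b+)" s)]
            [:AB
             (into [:A] (map str as))     ; one leaf per character
             (into [:B] (map str bs))]))))

(parse-as-and-bs "aaaaaabbaab")
;; => [:S
;;     [:AB [:A "a" "a" "a" "a" "a" "a"] [:B "b" "b"]]
;;     [:AB [:A "a" "a"] [:B "b"]]]
```

Interior nodes are vectors that start with the rule name; leaves are the matched characters.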
What is a parser?
● A program that takes some input data (usually a string) and produces a
data structure (usually a parse tree), based on some grammar (usually a
context-free grammar)
Instaparse is a Clojure library for building
parsers from context-free grammars
A grammar that recognizes runs of a’s and b’s
S ::= AB*
AB ::= A B
A ::= ‘a’+
B ::= ‘b’+
EBNF notation for instaparse (the default input format; ABNF is also supported)
Hello, instaparse
instaparse-talk.core=> (require '[instaparse.core :as insta])
nil
instaparse-talk.core=> (def as-and-bs
                  #_=>   (insta/parser
                  #_=>     "S = AB*
                  #_=>      AB = A B
                  #_=>      A = 'a'+
                  #_=>      B = 'b'+"))
#'instaparse-talk.core/as-and-bs
Hello, instaparse
instaparse-talk.core=> (as-and-bs "aaabbbaabb")
[:S [:AB [:A "a" "a" "a"] [:B "b" "b" "b"]] [:AB [:A "a" "a"] [:B "b" "b"]]]
instaparse-talk.core=> (pprint *1)
[:S
 [:AB [:A "a" "a" "a"] [:B "b" "b" "b"]]
 [:AB [:A "a" "a"] [:B "b" "b"]]]
nil
instaparse-talk.core=> (insta/visualize (as-and-bs "aaabbbaabb"))
nil
Walking the parse tree
Parse trees are just Clojure data! We have a TON of great ways to handle them
● Recursive or iterative processing using case or core.match (pattern
matching)
● Zippers (functional navigation and “editing” of trees)
● insta/transform
● seq/tree-seq
● Enlive (CSS-style selectors for Clojure data structures)
● Any other way that you want to walk nested vectors in Clojure!
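As one small illustration of the tree-seq option, the sketch below walks a literal tree of the shape shown in the earlier REPL output. Only `clojure.core` is used; the var names (`tree`, `nodes`, `leaves`, `run-lengths`) are made up for the example.

```clojure
;; A literal tree in the shape as-and-bs returned for "aaabbbaabb".
(def tree
  [:S
   [:AB [:A "a" "a" "a"] [:B "b" "b" "b"]]
   [:AB [:A "a" "a"] [:B "b" "b"]]])

;; Depth-first walk: vectors are branches, and a branch's children are
;; everything after its tag.
(def nodes (tree-seq vector? rest tree))

;; The leaves, read left to right, spell out the original input.
(def leaves (filter string? nodes))

;; How long is each run of a's and b's?
(def run-lengths
  (for [[tag & children] (filter vector? nodes)
        :when (#{:A :B} tag)]
    [tag (count children)]))

(apply str leaves)
;; => "aaabbbaabb"
run-lengths
;; => ([:A 3] [:B 3] [:A 2] [:B 2])
```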
Example: replacing a node with a zipper
instaparse-talk.core=> (require '[clojure.zip :as zip])
nil
instaparse-talk.core=> (-> (zip/vector-zip (as-and-bs "aaaabbbbaabbabbb"))
                           pprint)
[[:S
  [:AB [:A "a" "a" "a" "a"] [:B "b" "b" "b" "b"]]
  [:AB [:A "a" "a"] [:B "b" "b"]]
  [:AB [:A "a"] [:B "b" "b" "b"]]]
 nil]
nil
Example: replacing a node with a zipper
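The code from this slide is not in the transcript, so the following is a guess at the general idea rather than the original example. It uses `clojure.zip` on a literal tree; `replace-first` is a hypothetical helper name, and collapsing a `:B` node into a single string is just one possible edit.

```clojure
(require '[clojure.zip :as zip])

(def tree
  [:S
   [:AB [:A "a" "a"] [:B "b" "b"]]
   [:AB [:A "a"] [:B "b" "b" "b"]]])

(defn replace-first
  "Walk the tree depth-first; replace the first node satisfying pred
  with (f node) and return the rebuilt tree."
  [tree pred f]
  (loop [loc (zip/vector-zip tree)]
    (cond
      (zip/end? loc)        (zip/root loc)              ; nothing matched
      (pred (zip/node loc)) (zip/root (zip/edit loc f)) ; edit, then zip back up
      :else                 (recur (zip/next loc)))))

;; Collapse the first :B node's letters into one string.
(def edited
  (replace-first tree
                 #(and (vector? %) (= :B (first %)))
                 (fn [[tag & bs]] [tag (apply str bs)])))

edited
;; => [:S [:AB [:A "a" "a"] [:B "bb"]] [:AB [:A "a"] [:B "b" "b" "b"]]]
```

The original `tree` is untouched; the zipper returns a new tree that shares the unchanged parts.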
Example: infix to postfix using case statements
Infix: 1 + 2 * 3 - 4 / 5
Postfix: 1 2 3 * + 4 5 / -
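The grammar and code from the original slides are not in the transcript. As a sketch of the technique, assume some arithmetic grammar has already produced a tree with the hypothetical tags `:add`, `:sub`, `:mul`, `:div`, and `:number`; a recursive function with `case` can then emit postfix order.

```clojure
(require '[clojure.string :as str])

;; Assumed tree for 1 + 2 * 3 - 4 / 5; the tag names are made up for this sketch.
(def tree
  [:sub
   [:add [:number "1"] [:mul [:number "2"] [:number "3"]]]
   [:div [:number "4"] [:number "5"]]])

(def op-symbol {:add "+" :sub "-" :mul "*" :div "/"})

(defn postfix
  "Return the tokens of an expression tree in postfix order."
  [[tag & args]]
  (case tag
    :number args                               ; a leaf: just its digits
    (:add :sub :mul :div)                      ; any binary operator
    (let [[left right] args]
      (concat (postfix left) (postfix right) [(op-symbol tag)]))))

(str/join " " (postfix tree))
;; => "1 2 3 * + 4 5 / -"
```

Operator precedence is already encoded in the shape of the tree, so the walk itself needs no precedence logic. An unknown tag makes `case` throw, which is a reasonable way to notice a grammar change.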
Example: insta/transform
Apply this fn to nodes that match
magic!
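To take some of the magic out of it, here is a simplified stand-in written in plain Clojure. It is not instaparse's implementation; `transform-tree` is a hypothetical name, and it only handles nested-vector trees. With instaparse on the classpath, `(insta/transform rules tree)` takes the same kind of map from tags to functions and should behave similarly on a tree like this one.

```clojure
(defn transform-tree
  "Bottom-up rewrite: transform the children first, then apply the fn
  registered for the node's tag (if any) to the transformed children."
  [rules tree]
  (if (vector? tree)
    (let [[tag & children] tree
          children (map #(transform-tree rules %) children)]
      (if-let [f (rules tag)]
        (apply f children)          ; "apply this fn to nodes that match"
        (into [tag] children)))     ; no rule: keep the node as it was
    tree))                          ; leaves pass through unchanged

(def tree
  [:S
   [:AB [:A "a" "a" "a"] [:B "b" "b" "b"]]
   [:AB [:A "a" "a"] [:B "b" "b"]]])

;; Map from node tag to the fn that replaces nodes with that tag.
(def rules
  {:A  (fn [& as] (count as))
   :B  (fn [& bs] (count bs))
   :AB (fn [a b] {:as a :bs b})
   :S  (fn [& pairs] (vec pairs))})

(transform-tree rules tree)
;; => [{:as 3, :bs 3} {:as 2, :bs 2}]
```

Each function receives the already-transformed children of its node as arguments, which is why the `:AB` rule sees two numbers rather than two vectors.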
Wrapping up
● Parsers turn text into trees
● Clojure is great at walking through trees
● Instaparse makes it easy to parse things
○ Programming languages
○ Config files
○ Data
○ Lots more!
The docs for instaparse are amazing. A lot of my examples were lifted straight
from them. Read the docs. They’re great. Everyone on the project did a fantastic job.
https://github.com/Engelberg/instaparse
Thanks!
A cool way to visualize a CFG is with a railroad
diagram
