Parsing Expression Grammars:
A Recognition-Based Syntactic Foundation
Bryan Ford
Massachusetts Institute of Technology
January 14, 2004
Designing a Language Syntax
Textbook Method:
1. Formalize syntax via context-free grammar
2. Write a YACC parser specification
3. Hack on grammar until “near-LALR(1)”
4. Use generated parser
Pragmatic Method:
1. Specify syntax informally
2. Write a recursive descent parser

What exactly does a CFG describe?
Short answer:
a rule system to generate language strings
Example CFG:
S → aaS
S → ε
Start symbol: S
Derivation: S ⇒ aaS ⇒ aaaaS ⇒ ...
Output strings: aa, aaaa, ...
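The generative reading of the example CFG can be sketched in a few lines of Python. This is only an illustration, assuming the two rules S → aaS and S → ε; the function name `generate` and the `max_steps` bound are hypothetical and not part of the talk.

```python
def generate(max_steps):
    """Enumerate strings derivable from S -> aaS | ε, applying
    S -> aaS at most max_steps times (hypothetical helper)."""
    results = []
    sentential = "S"  # start symbol
    for _ in range(max_steps + 1):
        # Apply S -> ε: finish the derivation and emit a terminal string.
        results.append(sentential.replace("S", ""))
        # Apply S -> aaS: keep deriving a longer sentential form.
        sentential = sentential.replace("S", "aaS")
    return results


print(generate(2))
```

The grammar is used here as a producer of strings: nothing in it says how to decide whether a given input belongs to the language.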
What exactly do we want to describe?
Proposed answer:
a rule system to recognize language strings
Parsing Expression Grammar (PEG)
models recursive descent parsing practice
Example PEG:
S ← aaS / ε
Input string: a a a a
Derive structure: S[ a a S[ a a S[ ε ] ] ]
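Read as a recognizer, the example PEG maps directly onto one recursive-descent function. The sketch below assumes the rule S ← aaS / ε; the name `parse_S` and the convention of returning the end position of the consumed prefix are illustrative choices, not taken from the talk.

```python
def parse_S(text, pos=0):
    """Recognize S <- 'aa' S / ε starting at pos.

    Returns the position just past the consumed prefix. This rule
    never fails, because the ε alternative always succeeds.
    """
    # First alternative: 'aa' followed by S.
    if text.startswith("aa", pos):
        return parse_S(text, pos + 2)
    # Second alternative: ε, which consumes nothing.
    return pos


print(parse_S("aaaa"))
```

On an odd number of a's the function stops one character short, since S only ever consumes pairs.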
Take-Home Points
Key benefits of PEGs:
● Simplicity, formalism, analyzability of CFGs
● Closer match to syntax practices
– More expressive than deterministic CFGs (LL/LR)
– More of the “right kind” of expressiveness:
prioritized choice, greedy rules, syntactic predicates
– Unlimited lookahead, backtracking
● Linear-time parsing for any PEG
What kind of recursive descent parsing?
Key assumptions:
● Parsing functions are stateless:
depend only on input string
● Parsing functions make decisions locally:
return at most one result (success/failure)
Parsing Expression Grammars
Consists of: (Σ, N, R, e_S)
– Σ: finite set of terminals (character set)
– N: finite set of nonterminals
– R: finite set of rules of the form “A ← e”,
where A ∈ N and e is a parsing expression.
– e_S: a parsing expression called the start expression.
Parsing Expressions
ε           the empty string
a           terminal (a ∈ Σ)
A           nonterminal (A ∈ N)
e1 e2       a sequence of parsing expressions
e1 / e2     prioritized choice between alternatives
e?, e*, e+  optional, zero-or-more, one-or-more
&e, !e      syntactic predicates
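Each parsing expression form above can be modeled as a small function from (input, position) to either a new position or failure. The following is a minimal sketch under that assumption; all names (`eps`, `lit`, `seq`, `choice`, `star`, `plus`, `opt`, `and_`, `not_`, `ref`) are hypothetical helpers, and `None` stands for failure.

```python
def eps():                      # ε: always succeeds, consumes nothing
    return lambda s, i: i

def lit(t):                     # terminal (or string of terminals)
    return lambda s, i: i + len(t) if s.startswith(t, i) else None

def ref(rules, name):           # nonterminal, looked up lazily in a rule table
    return lambda s, i: rules[name](s, i)

def seq(*es):                   # e1 e2 ...
    def parse(s, i):
        for e in es:
            i = e(s, i)
            if i is None:
                return None
        return i
    return parse

def choice(*es):                # e1 / e2 ...: first success wins
    def parse(s, i):
        for e in es:
            j = e(s, i)
            if j is not None:
                return j
        return None
    return parse

def star(e):                    # e*: greedy, never fails
    def parse(s, i):
        while True:
            j = e(s, i)
            # Stop on failure; the j == i guard is an added safety
            # check against expressions that match the empty string.
            if j is None or j == i:
                return i
            i = j
    return parse

def plus(e):                    # e+  ==  e e*
    return seq(e, star(e))

def opt(e):                     # e?  ==  e / ε
    return choice(e, eps())

def and_(e):                    # &e: succeed if e does, consume nothing
    return lambda s, i: i if e(s, i) is not None else None

def not_(e):                    # !e: succeed if e fails, consume nothing
    return lambda s, i: i if e(s, i) is None else None


# Usage: the earlier example rule S <- 'aa' S / ε
rules = {}
rules["S"] = choice(seq(lit("aa"), ref(rules, "S")), eps())
print(rules["S"]("aaaa", 0))
```

Representing failure as `None` keeps every function stateless and single-valued, which matches the two key assumptions about recursive descent stated earlier.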
How PEGs Express Languages
Given input string s, a parsing expression either:
– Matches and consumes a prefix s' of s.
– Fails on s.
Example:
S ← bad
S matches “badder”
S matches “baddest”
S fails on “abad”
S fails on “babe”
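The prefix-matching behaviour can be checked with a one-line recognizer. This sketch assumes the rule S ← bad; `match_S` is a hypothetical name, and it returns the consumed prefix or `None` on failure.

```python
def match_S(text):
    """S <- 'bad': return the consumed prefix, or None on failure."""
    # Matching is anchored at the start of the input; it is not a search.
    return "bad" if text.startswith("bad") else None


for word in ("badder", "baddest", "abad", "babe"):
    print(word, match_S(word))
```

Note that “abad” fails even though it contains “bad”, because a parsing expression only ever matches a prefix of its input.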
Prioritized Choice with Backtracking
S ← A / B means:
“To parse an S, first try to parse an A.
If A fails, then backtrack and try to parse a B.”
Example:
S ← if C then S else S /
    if C then S
S matches “if C then S foo”
S matches “if C then S1 else S2”
S fails on “if C else S”
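The if/then/else rule can be sketched as a token-level recognizer that tries the longer alternative first and backtracks to the same start position on failure. Two things here are assumptions added for illustration: the condition C is the single token `c`, and a third alternative accepts a plain statement token `s` so that the recursion has a base case. All function names are hypothetical.

```python
def expect(toks, i, word):
    """Consume one token equal to word; propagate failure (None)."""
    if i is not None and i < len(toks) and toks[i] == word:
        return i + 1
    return None

def parse_C(toks, i):
    return expect(toks, i, "c")        # assumed: a condition is the token 'c'

def parse_S(toks, i):
    """S <- 'if' C 'then' S 'else' S / 'if' C 'then' S / 's'"""
    if i is None:
        return None
    # Alternative 1: if C then S else S
    j = expect(toks, i, "if")
    j = parse_C(toks, j)
    j = expect(toks, j, "then")
    j = parse_S(toks, j)
    j = expect(toks, j, "else")
    j = parse_S(toks, j)
    if j is not None:
        return j
    # Alternative 2: backtrack to i and try if C then S
    j = expect(toks, i, "if")
    j = parse_C(toks, j)
    j = expect(toks, j, "then")
    j = parse_S(toks, j)
    if j is not None:
        return j
    # Alternative 3 (assumed base case): a plain statement token
    return expect(toks, i, "s")


print(parse_S("if c then if c then s else s".split(), 0))
```

Because the alternative with `else` is tried first at every level, a dangling `else` attaches to the innermost `if` without any extra disambiguation rule.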
Example from the C++ standard:
“An expression-statement ... can be indistinguishable
from a declaration ... In those cases the statement is a
declaration.”
statement ← declaration /
            expression-statement
Greedy Option and Repetition
A ← e?  equivalent to  A ← e / ε
A ← e*  equivalent to  A ← e A / ε
A ← e+  equivalent to  A ← e e*
Example:
I ← L+
L ← a / b / c / ...
I matches “foobar”
I matches “foo(bar)”
I fails on “123”
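Greedy repetition can be sketched as a loop that consumes as much as possible and never gives anything back. The code below assumes L stands for any ASCII letter; `parse_I` and `parse_star_then_a` are hypothetical names. The second function illustrates a well-known consequence of greediness: the expression a* a can never match.

```python
def parse_I(text, pos=0):
    """I <- L+, with L assumed to be any ASCII letter.
    Returns the end position of the match, or None on failure."""
    end = pos
    while end < len(text) and text[end].isascii() and text[end].isalpha():
        end += 1
    return end if end > pos else None

def parse_star_then_a(text, pos=0):
    """A <- 'a'* 'a': the greedy star consumes every 'a',
    so the trailing 'a' always fails."""
    while text.startswith("a", pos):
        pos += 1
    return pos + 1 if text.startswith("a", pos) else None


print(parse_I("foo(bar)"), parse_star_then_a("aaa"))
```

Unlike a regular-expression engine, the repetition does not retry with fewer iterations when what follows it fails.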
Syntactic Predicates
And-predicate: &e succeeds whenever e does,
but consumes no input [Parr '94, '95]
Not-predicate: !e succeeds whenever e fails
Example:
A ← foo &(bar)
B ← foo !(bar)
A matches “foobar”
A fails on “foobie”
B matches “foobie”
B fails on “foobar”
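The two predicates amount to lookahead checks that leave the input position unchanged. A minimal sketch for the rules A ← foo &(bar) and B ← foo !(bar), with hypothetical function names:

```python
def parse_A(text, pos=0):
    """A <- 'foo' &'bar': 'bar' must follow, but is not consumed."""
    if not text.startswith("foo", pos):
        return None
    end = pos + 3
    return end if text.startswith("bar", end) else None

def parse_B(text, pos=0):
    """B <- 'foo' !'bar': 'bar' must not follow."""
    if not text.startswith("foo", pos):
        return None
    end = pos + 3
    return end if not text.startswith("bar", end) else None


print(parse_A("foobar"), parse_B("foobie"))
```

In both successful cases the returned position is just past “foo”, which shows that the predicate itself consumed nothing.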
Example:
C ← B I* E
I ← !E (C / T)
B ← (*
E ← *)
T ← [any terminal]
C matches “(*ab*)cd”
C matches “(*a(*b*)c*)”
C fails on “(*a(*b*)”
➔ B: begin marker
➔ I*: internal elements
➔ E: end marker
➔ !E: only if an end marker doesn't start here...
➔ (C / T): ...consume a nested comment,
or else consume any single character.
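The nested-comment grammar translates almost line for line into two mutually recursive functions. This is a sketch under the grammar shown above; `parse_C` and `parse_I` are hypothetical names that return the end position or `None`.

```python
def parse_C(text, pos=0):
    """C <- B I* E, with B = '(*' and E = '*)'."""
    if not text.startswith("(*", pos):        # B: begin marker
        return None
    pos += 2
    while True:                                # I*: internal elements
        nxt = parse_I(text, pos)
        if nxt is None:
            break
        pos = nxt
    if text.startswith("*)", pos):             # E: end marker
        return pos + 2
    return None

def parse_I(text, pos):
    """I <- !E (C / T)"""
    if text.startswith("*)", pos):             # !E: stop at an end marker
        return None
    nested = parse_C(text, pos)                # C: a nested comment
    if nested is not None:
        return nested
    if pos < len(text):                        # T: any single character
        return pos + 1
    return None


print(parse_C("(*a(*b*)c*)"))
```

The unterminated input “(*a(*b*)” fails because the inner comment consumes the only end marker and the outer one never finds its own.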
Unified Grammars
PEGs can express both lexical and hierarchical
syntax of realistic languages in one grammar
● Example (in paper):
Complete self-describing PEG in 2/3 column
● Example (on web):
Unified PEG for Java language
Lexical/Hierarchical Interplay
Unified grammars create new design opportunities
Example:
To get Unicode “∀”,
instead of “\u2200”,
write “\(0x2200)”
or “\(8704)”
or “\(FOR_ALL)”
E ← S / ( E ) / ...      General-purpose expression syntax
S ← " C* "               String literals
C ← \( E ) /             Quotable characters
    !" !\ T
T ← [any terminal]
Formal Properties of PEGs
● Express all deterministic languages: LR(k)
● Closed under union, intersection, complement
● Some non-context-free languages, e.g., a^n b^n c^n
● Undecidable whether L(G) = ∅
● Predicate operators can be eliminated
– ...but the process is non-trivial!
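The non-context-free language a^n b^n c^n (n ≥ 1) is commonly recognized with the PEG S ← &(A c) a+ B !. together with A ← a A? b and B ← b B? c. The sketch below hand-codes that grammar; the function names are hypothetical, and the and-predicate is what lets the a/b counts and the b/c counts be checked over the same input.

```python
def parse_A(s, i):
    """A <- 'a' A? 'b'   (matches a^n b^n, n >= 1)"""
    if not s.startswith("a", i):
        return None
    i += 1
    j = parse_A(s, i)          # A? is greedy: keep the result if it matched
    if j is not None:
        i = j
    return i + 1 if s.startswith("b", i) else None

def parse_B(s, i):
    """B <- 'b' B? 'c'   (matches b^n c^n, n >= 1)"""
    if not s.startswith("b", i):
        return None
    i += 1
    j = parse_B(s, i)
    if j is not None:
        i = j
    return i + 1 if s.startswith("c", i) else None

def matches_anbncn(s):
    """S <- &(A 'c') 'a'+ B !."""
    # &(A 'c'): look ahead for a^n b^n followed by 'c', consuming nothing.
    j = parse_A(s, 0)
    if j is None or not s.startswith("c", j):
        return False
    # 'a'+: skip the a's (at least one exists, since A matched).
    i = 0
    while s.startswith("a", i):
        i += 1
    # B: the b's and c's must balance.
    i = parse_B(s, i)
    # !. : nothing may remain after the match.
    return i is not None and i == len(s)


print(matches_anbncn("aabbcc"), matches_anbncn("aabbc"))
```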
Minimalist Forms
Predicate-free PEG
⇩
TS [Birman '70/'73]
TDPL [Aho '72]
A ← ε
A ← a
A ← f
A ← BC / D
⇦⇨
Any PEG
⇩
gTS [Birman '70/'73]
GTDPL [Aho '72]
A ← ε
A ← a
A ← f
A ← B[C, D]
Formal Contributions
● Generalize TDPL/GTDPL with more expressive
structured parsing expression syntax
● Negative syntactic predicate: !e
● Predicate elimination transformation
– Intermediate stages depend on
generalized parsing expressions
● Proof of equivalence of TDPL and GTDPL
What can't PEGs express directly?
● Ambiguous languages
That's what CFGs were designed for!
● Globally disambiguated languages?
– {a,b}^n a {a,b}^n ?
● State- or semantic-dependent syntax
– C, C++ typedef symbol tables
– Python, Haskell, ML layout
Generating Parsers from PEGs
Recursive-descent parsing
☞ Simple & direct, but exponential-time if not careful
Packrat parsing [Birman '70/'73, Ford '02]
☞ Linear-time, but can consume substantial storage
Classic LL/LR algorithms?
☞ Grammar restrictions, but both time- & space-efficient
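The time/space trade-off between plain recursive descent and packrat parsing can be sketched by memoizing each parsing function on its input position. The grammar below (A ← P + A / P - A / P, P ← ( A ) / x) is an illustrative choice that re-parses P several times per position when not memoized; `make_parser` is a hypothetical helper, and `functools.lru_cache` stands in for a packrat result table.

```python
from functools import lru_cache

def make_parser(text, memoize):
    """Build a recognizer for
         A <- P '+' A / P '-' A / P
         P <- '(' A ')' / 'x'
    and count how often the body of P runs."""
    calls = {"P": 0}

    def A(i):
        for op in ("+", "-"):
            j = P(i)                       # each alternative re-parses P
            if j is not None and text.startswith(op, j):
                k = A(j + 1)
                if k is not None:
                    return k
        return P(i)

    def P(i):
        calls["P"] += 1
        if text.startswith("(", i):
            j = A(i + 1)
            if j is not None and text.startswith(")", j):
                return j + 1
        return i + 1 if text.startswith("x", i) else None

    if memoize:
        # Packrat idea: remember the result of each (rule, position) pair.
        # Rebinding A and P makes the recursive calls use the cached versions.
        A = lru_cache(maxsize=None)(A)
        P = lru_cache(maxsize=None)(P)
    return A, calls


text = "((((x))))"
naive, naive_calls = make_parser(text, memoize=False)
packrat, packrat_calls = make_parser(text, memoize=True)
naive(0)
packrat(0)
print(naive_calls["P"], packrat_calls["P"])
```

The memoized version evaluates P once per input position, at the cost of keeping one cached entry per rule and position, which is the storage overhead the slide refers to.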
Conclusion
PEGs model common parsing practices
– Prioritized choice, greedy rules, syntactic predicates
PEGs naturally complement CFGs
– CFG: generative system, for ambiguous languages
– PEG: recognition-based, for unambiguous languages
For more info:
http://pdos.lcs.mit.edu/~baford/packrat
(or Google for “Packrat Parsing”)
