Recursive Descent Parsing
In practice with PHP
Plan for the next 40 mins
1. Walk through creating a Parsing Expression Grammar and a scannerless predictive recursive descent parser for a subset of print_r output.
2. Talk about why anyone would want to do such a thing.
Source Code: https://bit.ly/dpc14rdp
Disclaimer: I am not …
I am Boy Baukema
Senior Software Engineer @ ibuildings.nl
print_r
(PHP 4, PHP 5)
print_r — Prints human-readable information about a
variable
An example
Array
(
    [Talk] => Array
        (
            [Title] => Ansible: Orchestrate
            [Type] => 3
Just one problem…
It’s unparsable.
No escaping
> print_r(array("a"=>"\n    [b] => evil"));
Array
(
    [a] => 
    [b] => evil
)
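The ambiguity can be checked directly. The sketch below is not from the talk; the variable names are made up for illustration. It builds two different arrays and compares the text print_r returns for each.

```php
<?php
// Two different arrays: one key holding a string with an embedded newline,
// versus two separate keys.
$withNewline = array("a" => "\n    [b] => evil");
$twoKeys     = array("a" => "", "b" => "evil");

// Passing true as the second argument makes print_r return the text
// instead of printing it.
$first  = print_r($withNewline, true);
$second = print_r($twoKeys, true);

// If the two texts are identical, no parser can tell the inputs apart.
var_dump($first === $second);
```

Because print_r applies no escaping, both calls produce the text shown on the slide, so the original structure cannot be recovered from the output alone.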
print_r
* for anything non-trivial
–Martin Fowler
“…it’s a technique that isn't as widely known as it
should be. Many people are under the impression
that using it is quite hard. I think that this fear often
comes from the fact that Syntax-Directed
Translation is usually described in the context of
parsing a general-purpose language—which
introduces a lot of complexities that you don't face
with a DSL.”
V1 - An empty array
> print_r(array());
Array
(
)
ARRAY       <- ARRAY_START
               LF
               PAREN_OPEN
               LF
               PAREN_CLOSE
               LF
ARRAY_START <- 'Array'
LF          <- "\n"
PAREN_OPEN  <- '('
PAREN_CLOSE <- ')'
PrintRLang\V1\RecursiveDescentParser
- $content : string
+ __construct ( string $content )	
+ consume ( string $terminal )	
+ lookAhead ( string $terminal )
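A minimal sketch of how this low-level class could look, based only on the outline above. The method bodies, the return types and the choice of RuntimeException are assumptions, not the code from the linked repository.

```php
<?php
// Sketch of the low-level parser: it only knows about raw text.
class RecursiveDescentParser
{
    /** @var string Input that has not been consumed yet. */
    private $content;

    public function __construct(string $content)
    {
        $this->content = $content;
    }

    // Peek: does the remaining input start with $terminal? Nothing is consumed.
    public function lookAhead(string $terminal): bool
    {
        return strncmp($this->content, $terminal, strlen($terminal)) === 0;
    }

    // Remove $terminal from the front of the input, or fail loudly.
    public function consume(string $terminal): string
    {
        if (!$this->lookAhead($terminal)) {
            // The exception type is an assumption made for this sketch.
            throw new RuntimeException(
                'Expected ' . json_encode($terminal)
                . ' but found ' . json_encode(substr($this->content, 0, 10))
            );
        }
        $this->content = (string) substr($this->content, strlen($terminal));
        return $terminal;
    }
}

$parser = new RecursiveDescentParser("Array\n(\n)\n");
$parser->consume('Array');
var_dump($parser->lookAhead("\n")); // checks whether a line feed comes next
```

Keeping only the unconsumed text in $content matches the way the later slides show the remaining input shrinking after each call.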
PrintRLang\V1\ArrayParser
- $parser : RecursiveDescentParser
+ __construct(RecursiveDescentParser $parser)	
+ parse(): array	
+ arrayStart()	
+ lf()	
+ braceOpen()	
+ braceClose()
$parser = new PrintRLang\ArrayParser(
    new PrintRLang\RecursiveDescentParser(
        "Array\n(\n)\n"
    )
);
$parser->parse();
public function parse() {
    $this->arrayStart();
    $this->lf();
    $this->braceOpen();
    $this->lf();
    $this->braceClose();
    $this->lf();
    return array();
}
A r r a y \n ( \n ) \n
public function arrayStart() {
    $this->parser->consume('Array');
}
\n ( \n ) \n
public function lf() {
    $this->parser->consume("\n");
}
( \n ) \n
public function braceOpen() {
    $this->parser->consume('(');
}
\n ) \n
public function lf() {
    $this->parser->consume("\n");
}
) \n
public function braceClose() {
    $this->parser->consume(')');
}
\n
public function lf() {
    $this->parser->consume("\n");
}
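Put together, the V1 walkthrough above can be sketched as one runnable file. The ArrayParser methods follow the slides; the compact RecursiveDescentParser is a stand-in whose body is an assumption (consume() throws on a mismatch).

```php
<?php
// Compact stand-in for the low-level parser (assumed behaviour).
class RecursiveDescentParser
{
    private $content;

    public function __construct(string $content) { $this->content = $content; }

    public function consume(string $terminal): string
    {
        if (strncmp($this->content, $terminal, strlen($terminal)) !== 0) {
            throw new RuntimeException('Expected ' . json_encode($terminal));
        }
        $this->content = (string) substr($this->content, strlen($terminal));
        return $terminal;
    }
}

// One method per grammar rule; parse() mirrors the ARRAY rule top to bottom.
class ArrayParser
{
    private $parser;

    public function __construct(RecursiveDescentParser $parser) { $this->parser = $parser; }

    public function parse(): array
    {
        $this->arrayStart();
        $this->lf();
        $this->braceOpen();
        $this->lf();
        $this->braceClose();
        $this->lf();
        return array();
    }

    public function arrayStart() { $this->parser->consume('Array'); }
    public function lf()         { $this->parser->consume("\n"); }
    public function braceOpen()  { $this->parser->consume('('); }
    public function braceClose() { $this->parser->consume(')'); }
}

$parser = new ArrayParser(new RecursiveDescentParser("Array\n(\n)\n"));
var_dump($parser->parse() === array());
```

Any input other than the text of an empty array makes one of the consume() calls throw, which is the only error handling this version has.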
V2 - Array of strings
Array
(
    [Room] => E104
    [Difficulty] => 2
    [Type] => 1
)
ARRAY <- ARRAY_START
         LF
         PAREN_OPEN
         LF
         ARRAY_ASSIGN*
         PAREN_CLOSE
         LF
Kleene star
ARRAY_ASSIGN* translates to:
while (lookAhead(' '))
    $result = arrayAssign($result)
ARRAY_ASSIGN <- SPACE+
                ARRAY_KEY
                SPACE
                FAT_ARROW
                SPACE
                ARRAY_VALUE
                LF
Kleene plus
SPACE+ === SPACE SPACE*
Kleene plus implemented
space()
while (lookAhead(' '))
    space()
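The two repetition operators can also be written as small generic helpers. This is a sketch only: zeroOrMore() and oneOrMore() are hypothetical names that do not appear in the talk, and the closures over $input stand in for the parser's lookAhead() and space() methods.

```php
<?php
// Kleene star: apply $rule for as long as the lookahead says it can start.
function zeroOrMore(callable $canStart, callable $rule): array
{
    $results = array();
    while ($canStart()) {
        $results[] = $rule();
    }
    return $results;
}

// Kleene plus: X+ is X followed by X*, so the first match is mandatory.
function oneOrMore(callable $canStart, callable $rule): array
{
    $first = $rule(); // throws if the rule does not match at least once
    return array_merge(array($first), zeroOrMore($canStart, $rule));
}

// Stand-ins for lookAhead(' ') and space(), working on a plain string.
$input = "   [Room]";
$lookAheadSpace = function () use (&$input) {
    return strncmp($input, ' ', 1) === 0;
};
$space = function () use (&$input) {
    if (strncmp($input, ' ', 1) !== 0) {
        throw new RuntimeException('Expected a space');
    }
    $input = (string) substr($input, 1);
    return ' ';
};

$spaces = oneOrMore($lookAheadSpace, $space);
var_dump(count($spaces), $input);
```

Writing the loops inline, as the slides do, is just as valid; helpers like these mostly pay off once a grammar has many repeated rules.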
ARRAY_KEY   <- BRACKET_OPEN
               KEY_VALUE
               BRACKET_CLOSE
KEY_VALUE   <- (!BRACKET_CLOSE .)*
ARRAY_VALUE <- (!LF .)*
PrintRLang\V2\RecursiveDescentParser
- $content : string
+ __construct ( string $content )
+ consume ( string $terminal )
+ consumeRegex ( string $regex )
+ lookAhead ( string $terminal )
+ lookAheadRegex ( string $regex )
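One way the two regex methods could work is sketched below. It assumes callers pass a delimited PCRE pattern without modifiers, and it appends the A (anchored) modifier so a match must start at the current position. That convention is an assumption of this sketch, not something the slides specify.

```php
<?php
class RecursiveDescentParser
{
    private $content;

    public function __construct(string $content) { $this->content = $content; }

    // Peek: does $regex match at the very start of the remaining input?
    public function lookAheadRegex(string $regex): bool
    {
        return preg_match($regex . 'A', $this->content) === 1;
    }

    // Consume whatever $regex matches at the start of the input and return it.
    public function consumeRegex(string $regex): string
    {
        if (preg_match($regex . 'A', $this->content, $matches) !== 1) {
            throw new RuntimeException('Expected a match for ' . $regex);
        }
        $this->content = (string) substr($this->content, strlen($matches[0]));
        return $matches[0];
    }
}

// KEY_VALUE: everything up to, but not including, the closing bracket.
$parser = new RecursiveDescentParser("Room] => E104\n");
$key = $parser->consumeRegex('/[^\]]*/');
var_dump($key);
```

Without anchoring, preg_match would happily find a match further along in the input, which would silently skip text instead of reporting a parse error.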
PrintRLang\V2\ArrayParser
- $parser : RecursiveDescentParser
...
+ arrayAssign( array $result )
+ arrayKey() : string
+ arrayValue() : string
+ space()
+ fatArrow()
...
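The V2 pieces can be combined into a runnable sketch. Method names follow the outline above; the bodies, the inlined ARRAY_START and paren terminals, and the anchored-regex convention in consumeRegex() are assumptions made for illustration.

```php
<?php
class RecursiveDescentParser
{
    private $content;

    public function __construct(string $content) { $this->content = $content; }

    public function lookAhead(string $terminal): bool
    {
        return strncmp($this->content, $terminal, strlen($terminal)) === 0;
    }

    public function consume(string $terminal): string
    {
        if (!$this->lookAhead($terminal)) {
            throw new RuntimeException('Expected ' . json_encode($terminal));
        }
        $this->content = (string) substr($this->content, strlen($terminal));
        return $terminal;
    }

    // Assumption: $regex is a delimited pattern; A anchors it at the current position.
    public function consumeRegex(string $regex): string
    {
        if (preg_match($regex . 'A', $this->content, $m) !== 1) {
            throw new RuntimeException('Expected a match for ' . $regex);
        }
        $this->content = (string) substr($this->content, strlen($m[0]));
        return $m[0];
    }
}

class ArrayParser
{
    private $parser;

    public function __construct(RecursiveDescentParser $parser) { $this->parser = $parser; }

    public function parse(): array
    {
        $result = array();
        $this->parser->consume('Array');
        $this->lf();
        $this->parser->consume('(');
        $this->lf();
        while ($this->parser->lookAhead(' ')) {   // ARRAY_ASSIGN*
            $result = $this->arrayAssign($result);
        }
        $this->parser->consume(')');
        $this->lf();
        return $result;
    }

    public function arrayAssign(array $result): array
    {
        $this->space();                           // SPACE+ is SPACE SPACE*
        while ($this->parser->lookAhead(' ')) {
            $this->space();
        }
        $key = $this->arrayKey();
        $this->space();
        $this->fatArrow();
        $this->space();
        $result[$key] = $this->arrayValue();
        $this->lf();
        return $result;
    }

    public function arrayKey(): string
    {
        $this->parser->consume('[');
        $key = $this->parser->consumeRegex('/[^\]]*/'); // up to BRACKET_CLOSE
        $this->parser->consume(']');
        return $key;
    }

    public function arrayValue(): string { return $this->parser->consumeRegex('/[^\n]*/'); }
    public function space()    { $this->parser->consume(' '); }
    public function fatArrow() { $this->parser->consume('=>'); }
    public function lf()       { $this->parser->consume("\n"); }
}

$text = "Array\n(\n    [Room] => E104\n    [Difficulty] => 2\n    [Type] => 1\n)\n";
$parser = new ArrayParser(new RecursiveDescentParser($text));
print_r($parser->parse());
```

Every value comes back as a string, because print_r output carries no type information.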
V3 - Array of Arrays
Array
(
    [Talk] => Array
        (
            [Title] => Ansible: Orchestrate
            [Type] => 3
        )
)
ARRAY_VALUE <- ARRAY /
               STRING
STRING      <- (!LF .)*
ARRAY <- ARRAY_START
         LF
         SPACE*
         PAREN_OPEN
         LF
         ARRAY_ASSIGN*
         SPACE*
         PAREN_CLOSE
PrintRLang\V3\ArrayParser
- $parser : RecursiveDescentParser
...
+ string()	
...
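A sketch of the V3 idea, where an array value may itself be an array and the ARRAY rule calls itself. The method bodies and the lookahead patterns are assumptions. The input follows the simplified layout on the slide; real print_r output also emits a blank line after a nested array, which this sketch does not handle.

```php
<?php
class RecursiveDescentParser
{
    private $content;

    public function __construct(string $content) { $this->content = $content; }

    public function lookAhead(string $terminal): bool
    {
        return strncmp($this->content, $terminal, strlen($terminal)) === 0;
    }

    // Assumption: patterns are delimited; A anchors them at the current position.
    public function lookAheadRegex(string $regex): bool
    {
        return preg_match($regex . 'A', $this->content) === 1;
    }

    public function consume(string $terminal): string
    {
        if (!$this->lookAhead($terminal)) {
            throw new RuntimeException('Expected ' . json_encode($terminal));
        }
        $this->content = (string) substr($this->content, strlen($terminal));
        return $terminal;
    }

    public function consumeRegex(string $regex): string
    {
        if (preg_match($regex . 'A', $this->content, $m) !== 1) {
            throw new RuntimeException('Expected a match for ' . $regex);
        }
        $this->content = (string) substr($this->content, strlen($m[0]));
        return $m[0];
    }
}

class ArrayParser
{
    private $parser;

    public function __construct(RecursiveDescentParser $parser) { $this->parser = $parser; }

    // ARRAY <- ARRAY_START LF SPACE* PAREN_OPEN LF ARRAY_ASSIGN* SPACE* PAREN_CLOSE
    public function parse(): array
    {
        $result = array();
        $this->parser->consume('Array');
        $this->parser->consume("\n");
        $this->parser->consumeRegex('/ */');
        $this->parser->consume('(');
        $this->parser->consume("\n");
        // An assignment starts with spaces and "["; a closing paren may be indented too,
        // so a plain lookAhead(' ') is no longer enough to predict ARRAY_ASSIGN.
        while ($this->parser->lookAheadRegex('/ +\[/')) {
            $result = $this->arrayAssign($result);
        }
        $this->parser->consumeRegex('/ */');
        $this->parser->consume(')');
        return $result;
    }

    public function arrayAssign(array $result): array
    {
        $this->parser->consumeRegex('/ +/');
        $this->parser->consume('[');
        $key = $this->parser->consumeRegex('/[^\]]*/');
        $this->parser->consume('] => '); // BRACKET_CLOSE SPACE FAT_ARROW SPACE in one step
        $result[$key] = $this->arrayValue();
        $this->parser->consume("\n");
        return $result;
    }

    // ARRAY_VALUE <- ARRAY / STRING: predict the array alternative first, else a string.
    public function arrayValue()
    {
        if ($this->parser->lookAheadRegex('/Array\n *\(\n/')) {
            return $this->parse(); // recursion: the ARRAY rule parses the nested array
        }
        return $this->string();
    }

    public function string(): string
    {
        return $this->parser->consumeRegex('/[^\n]*/');
    }
}

$text = "Array\n(\n    [Talk] => Array\n        (\n"
      . "            [Title] => Ansible: Orchestrate\n"
      . "            [Type] => 3\n        )\n)\n";
$parser = new ArrayParser(new RecursiveDescentParser($text));
print_r($parser->parse());
```

The ordered choice is resolved by lookahead alone, so the parser never has to backtrack; the price is that a string value that happens to look like the start of an array is treated as one.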
Why?
– Steve Yegge, Rich Programmer Food
“If you don't know how parsing works, you'll do it
badly with regular expressions, or if you don't know
those, then with hand-rolled state machines that are
thousands of lines of incomprehensible code that
doesn't actually work.”
Mail::RFC822::Address
(?:(?:rn)?[ t])*(?:(?:(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t]
)+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:
rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(
?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[
t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-0
31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*
](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+
(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:
(?:rn)?[ t])*))*|(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z
|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)
?[ t])*)*<(?:(?:rn)?[ t])*(?:@(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:
rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[
t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)
?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t]
)*))*(?:,@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[
t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*
)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t]
)+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*)
*:(?:(?:rn)?[ t])*)?(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+
|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:r
n)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:
rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t
]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031
]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](
?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?
:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?
:rn)?[ t])*))*>(?:(?:rn)?[ t])*)|(?:[^()<>@,;:".[] 000-031]+(?:(?
:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?
[ t]))*"(?:(?:rn)?[ t])*)*:(?:(?:rn)?[ t])*(?:(?:(?:[^()<>@,;:".[]
000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|
.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>
@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"
(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t]
)*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:
".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?
:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[
]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*|(?:[^()<>@,;:".[] 000-
031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(
?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)*<(?:(?:rn)?[ t])*(?:@(?:[^()<>@,;
:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([
^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:"
.[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[
]r]|.)*](?:(?:rn)?[ t])*))*(?:,@(?:(?:rn)?[ t])*(?:[^()<>@,;:".
[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]
r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[]
000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]
|.)*](?:(?:rn)?[ t])*))*)*:(?:(?:rn)?[ t])*)?(?:[^()<>@,;:".[] 0
00-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|
.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,
;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?
:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*
(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".
[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[
^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]
]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*>(?:(?:rn)?[ t])*)(?:,s*(
?:(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:
".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(
?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[
["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t
])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t
])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?
:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|
Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*|(?:
[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[
]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)*<(?:(?:rn)
?[ t])*(?:@(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["
()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)
?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>
@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*(?:,@(?:(?:rn)?[
t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,
;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t]
)*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:
".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*)*:(?:(?:rn)?[ t])*)?
(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".
[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:
rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[[
"()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])
*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])
+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:
.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z
|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*>(?:(
?:rn)?[ t])*))*)?;s*)
Useful applications I’ve seen
REST API with CQL querying (MediaMosa.org)
Migrating wiki content
Parsing log files
Parsing obscure specifications (ARF)
Concise configuration files
Domain Specific Languages
–Martin Fowler, Domain Specific Languages
“a DSL is a front-end to a library providing a different style of manipulation to the command-query API.”
Rules for building a parser
Consider using an existing parser.
Consider porting one from another language.
Consider XML or the new XMLs: JSON / YAML
Consider working around it.
Then and only then consider building your own
parser
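As an illustration of the JSON rule above: when the producing side can be changed, the value that broke print_r earlier in the talk survives a JSON round trip unchanged. This is a small sketch, not from the slides, using PHP's built-in json_encode and json_decode.

```php
<?php
// The value that made print_r output ambiguous.
$data = array("a" => "\n    [b] => evil");

// json_encode escapes the newline, so the structure stays unambiguous.
$json = json_encode($data);

// Passing true as the second argument decodes objects as associative arrays.
$back = json_decode($json, true);

echo $json, "\n";
var_dump($back === $data);
```

No custom parser is needed here, which is the point of considering an existing format first.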
Where to from here?
Let’s build a parser!
http://protalk.me/dpcradio-lets-build-a-parser
Thank you for your time and attention!
Questions?
Tweet to @relaxnow
Rate @ https://joind.in/10859
Slides @ https://joind.in/10859
Code @ https://bit.ly/dpc14rdp
