Writing parsers in C#
(“Projecting arbitrary character streams into C# objects using monadic
parser combinators”)
Speaker: Alexey Golub @Tyrrrz
What is a parser?
• To parse — to resolve text into logical syntactic components
• i.e. IEnumerable<T> Parse(IEnumerable<char> text)
• e.g. double.Parse, XDocument.Parse
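As a rough illustration of the two built-in parsers named above, the sketch below calls double.Parse and XDocument.Parse. The sample inputs and variable names are made up for this example.

```csharp
using System;
using System.Globalization;
using System.Xml.Linq;

// double.Parse projects a character sequence into a single double.
// InvariantCulture is passed so that '.' is always the decimal separator.
double number = double.Parse("3.14", CultureInfo.InvariantCulture);
Console.WriteLine(number.ToString(CultureInfo.InvariantCulture)); // 3.14

// XDocument.Parse projects text into a tree of XML nodes.
XDocument doc = XDocument.Parse("<talk><title>Writing parsers in C#</title></talk>");
Console.WriteLine(doc.Root!.Element("title")!.Value); // Writing parsers in C#

// Malformed input is rejected rather than silently accepted.
bool ok = double.TryParse("3.1.4", NumberStyles.Float, CultureInfo.InvariantCulture, out _);
Console.WriteLine(ok); // False
```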
Where are parsers used?
• Data deserialization (JSON, XML, YAML)
• Static code analysis (ReSharper, TSLint)
• Syntax highlighting (VS Code, Highlight.js)
• Compilers, transpilers, interpreters (Roslyn, Markdig, Babel, SQL)
• Template engines (Razor, Liquid, Scriban)
• Natural language processing (Spellchecking, Translation)
What do parsers do?
• Disambiguate text into domain objects
• Assert that the text is well-formed
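Example: in “123 456,93” the digit groups are numeric literals, the space is a thousands separator, and the comma is a decimal separator; the whole string forms one numeric literal.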
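The “123 456,93” example can be reproduced with the built-in number parser. This is a minimal sketch: the NumberFormatInfo below is an assumption chosen to match the example (plain space for thousands, comma for decimals), not a format taken from the talk.

```csharp
using System;
using System.Globalization;

// Assumed format for this example: a plain space groups thousands
// and a comma separates the fractional part.
var format = new NumberFormatInfo
{
    NumberGroupSeparator = " ",
    NumberDecimalSeparator = ","
};

// NumberStyles.Number allows both group and decimal separators.
double value = double.Parse("123 456,93", NumberStyles.Number, format);
Console.WriteLine(value.ToString(CultureInfo.InvariantCulture)); // 123456.93

// Under a different format (the invariant culture) the same text is not well-formed.
bool wellFormed = double.TryParse("123 456,93", NumberStyles.Number,
    CultureInfo.InvariantCulture, out _);
Console.WriteLine(wellFormed); // False
```

The same characters are disambiguated differently depending on the rules in force, which is why the parser has to know the format up front.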
Formal language theory
• Alphabet – set of allowed characters
• Language – set of words made from characters in alphabet
• Grammar – set of rules that define how words are generated
Grammar types
• Regular grammar – RHS of a production rule is a single terminal, or a
terminal followed by a single non-terminal
• Context-free grammar – LHS of a production rule is a single non-terminal;
RHS is a finite sequence of terminals and/or non-terminals
Rules of thumb
• If a language needs nested (self-embedding) recursion, e.g. balanced brackets – it’s not regular
• Regular grammar can be represented with regular expressions
• Context-free grammar cannot be directly represented with regular
expressions (in .NET)
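To make the rules of thumb concrete, here is a small sketch contrasting a regular language (unsigned integer literals, matched with a regular expression) with a context-free one (nested brackets, matched with a recursive function). The Brackets class and the bracket grammar are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Regular language: an unsigned integer literal fits a regular expression.
Console.WriteLine(Regex.IsMatch("12345", @"^[0-9]+$")); // True
Console.WriteLine(Regex.IsMatch("12a45", @"^[0-9]+$")); // False

// Context-free language: arbitrarily nested brackets need recursion.
Console.WriteLine(Brackets.IsBalanced("[[][]]")); // True
Console.WriteLine(Brackets.IsBalanced("[[]"));    // False

static class Brackets
{
    // Grammar assumed for this example: S -> "[" S "]" S | empty
    public static bool IsBalanced(string text)
    {
        int pos = 0;
        return ParseS(text, ref pos) && pos == text.Length;
    }

    private static bool ParseS(string text, ref int pos)
    {
        // Each iteration handles one "[" S "]" group; the loop covers the trailing S.
        while (pos < text.Length && text[pos] == '[')
        {
            pos++;                                     // consume "["
            if (!ParseS(text, ref pos)) return false;  // nested S (the recursive rule)
            if (pos >= text.Length || text[pos] != ']') return false;
            pos++;                                     // consume "]"
        }
        return true; // the empty alternative always succeeds
    }
}
```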
Syntax trees
• Primary goal of a parser is to break down text into syntactic
components
• Syntactic structure of context-free languages is represented by a
syntax tree
• Program can then further evaluate the syntax tree as required
[Diagram: syntax tree with a Root node, a Non-terminal node, and Terminal nodes as leaves]
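One possible way to model such a tree in C# is a small hierarchy of records, with leaves for terminals and composite nodes for non-terminals. The node names (Number, Add, Multiply) and the Evaluator class are assumptions made for this sketch; they do not come from the talk.

```csharp
using System;

// Syntax tree for the expression (1 + 2) * 3.
Node tree = new Multiply(new Add(new Number(1), new Number(2)), new Number(3));
Console.WriteLine(Evaluator.Evaluate(tree)); // 9

abstract record Node;
record Number(double Value) : Node;             // terminal (leaf) node
record Add(Node Left, Node Right) : Node;       // non-terminal node
record Multiply(Node Left, Node Right) : Node;  // non-terminal node

static class Evaluator
{
    // Walks the tree bottom-up: leaves yield values, inner nodes combine them.
    public static double Evaluate(Node node) => node switch
    {
        Number n => n.Value,
        Add a => Evaluate(a.Left) + Evaluate(a.Right),
        Multiply m => Evaluate(m.Left) * Evaluate(m.Right),
        _ => throw new ArgumentException($"Unknown node type: {node.GetType().Name}")
    };
}
```

Evaluation is only one thing a program might do with the tree; the same structure could just as well be pretty-printed or analyzed.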
Example AST produced by C-like code
Approaches
• Loop/stack-based manual parsers
  • Loop through all characters in the input
  • Maintain context on a stack
• Parser generators
  • Custom language that defines grammar
  • Compiles into code that you can execute
• Parser combinators
  • Each parser is a delegate
  • Parsers can be combined into higher-order parsers
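The "each parser is a delegate" idea can be sketched as follows. The Parser<T> delegate, the Result<T> record, and the Char and Then helpers are hypothetical shapes invented for this example; real libraries such as Sprache define their own types.

```csharp
using System;

// Usage: a parser for the letter 'a' followed by the letter 'b'.
Parser<char> a = Parsers.Char('a');
Parser<char> b = Parsers.Char('b');
Parser<(char, char)> ab = Parsers.Then(a, b);

var hit = ab("abc", 0);
Console.WriteLine(hit is not null); // True
Console.WriteLine(hit?.Next);       // 2
var miss = ab("acb", 0);
Console.WriteLine(miss is null);    // True

// A successful parse: the produced value and the position right after it.
record Result<T>(T Value, int Next);

// Assumed parser shape: takes the input and a start position,
// returns a Result on success or null on failure.
delegate Result<T>? Parser<T>(string input, int position);

static class Parsers
{
    // Primitive parser: matches exactly one expected character.
    public static Parser<char> Char(char expected) => (input, pos) =>
        pos < input.Length && input[pos] == expected
            ? new Result<char>(expected, pos + 1)
            : null;

    // Higher-order parser: runs 'first', then 'second' where 'first' stopped.
    public static Parser<(A, B)> Then<A, B>(Parser<A> first, Parser<B> second) =>
        (input, pos) =>
        {
            var r1 = first(input, pos);
            if (r1 is null) return null;
            var r2 = second(input, r1.Next);
            if (r2 is null) return null;
            return new Result<(A, B)>((r1.Value, r2.Value), r2.Next);
        };
}
```

Because Then returns another Parser, its result can be passed to further combinators, which is what makes the approach compose.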
Example from JSON.net (manual parser)
ANTLR (parser generator)
Sprache (parser combinator)
Parser combinators
• Start by building simple parsers
• Combine them into more complex parsers
• Repeat until you reach the root
• Hierarchy of parsers should resemble target syntax tree
Parser combinators (illustrated)
Input: 10 + 5
Building blocks: NumberParser, WhiteSpaceParser, SignParser
OperatorParser = NumberParser THEN WhiteSpaceParser THEN SignParser THEN WhiteSpaceParser THEN NumberParser
Result: PlusOperator with operands Number (10) and Number (5)
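The THEN chain for "10 + 5" could be written roughly like this. It is a hand-rolled sketch, not Sprache's API: the Parser<T> delegate, Result<T>, the Then and Return helpers, and the PlusOperator record are all assumptions made for illustration. Here Then passes each parsed value on to a function that chooses the next parser, which is the monadic style the talk's subtitle refers to.

```csharp
using System;

var result = ExpressionParsers.Operator("10 + 5", 0);
Console.WriteLine(result?.Value.Left);   // 10
Console.WriteLine(result?.Value.Right);  // 5
Console.WriteLine(ExpressionParsers.Operator("10 - 5", 0) is null); // True

record Result<T>(T Value, int Next);
delegate Result<T>? Parser<T>(string input, int position);
record PlusOperator(int Left, int Right);

static class ExpressionParsers
{
    // NumberParser: one or more ASCII digits (no overflow handling in this sketch).
    public static readonly Parser<int> Number = (input, pos) =>
    {
        int end = pos;
        while (end < input.Length && input[end] >= '0' && input[end] <= '9') end++;
        return end > pos
            ? new Result<int>(int.Parse(input.Substring(pos, end - pos)), end)
            : null;
    };

    // WhiteSpaceParser: zero or more whitespace characters, so it never fails.
    public static readonly Parser<string> WhiteSpace = (input, pos) =>
    {
        int end = pos;
        while (end < input.Length && char.IsWhiteSpace(input[end])) end++;
        return new Result<string>(input.Substring(pos, end - pos), end);
    };

    // SignParser: the '+' character.
    public static readonly Parser<char> Sign = (input, pos) =>
        pos < input.Length && input[pos] == '+' ? new Result<char>('+', pos + 1) : null;

    // THEN: run 'first', then hand its value to 'next' to obtain the following parser.
    public static Parser<B> Then<A, B>(this Parser<A> first, Func<A, Parser<B>> next) =>
        (input, pos) =>
        {
            var r = first(input, pos);
            return r is null ? null : next(r.Value)(input, r.Next);
        };

    // A parser that consumes nothing and always succeeds with 'value'.
    public static Parser<T> Return<T>(T value) => (input, pos) => new Result<T>(value, pos);

    // OperatorParser: Number THEN WhiteSpace THEN Sign THEN WhiteSpace THEN Number.
    public static readonly Parser<PlusOperator> Operator =
        Number.Then(left =>
        WhiteSpace.Then(ws1 =>
        Sign.Then(sign =>
        WhiteSpace.Then(ws2 =>
        Number.Then(right =>
        Return(new PlusOperator(left, right)))))));
}
```

The nesting of the parser definitions mirrors the shape of the resulting tree: two Number leaves under one PlusOperator node.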
Coding challenge
Let’s develop a basic JSON parser
Further reading
• Formal grammar on Wikipedia –
https://en.wikipedia.org/wiki/Formal_grammar
• Parsing in C# by Federico Tomassetti –
https://tomassetti.me/parsing-in-csharp
Thank you!
@Tyrrrz

Alexey Golub - Writing parsers in c# | 3Shape Meetup
