Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 10: Compiler I: Syntax Analysis slide 1
www.nand2tetris.org
Building a Modern Computer From First Principles
Compiler I: Syntax Analysis
Course map
[Figure: course map. Each abstract interface is implemented on top of the one below it by the component shown in parentheses.]
Software hierarchy: Human Thought → (Abstract design, Chapters 9, 12) → H.L. Language & Operating Sys. → (Compiler, Chapters 10 - 11) → Virtual Machine → (VM Translator, Chapters 7 - 8) → Assembly Language → (Assembler, Chapter 6) → Machine Language
Hardware hierarchy: Machine Language → (Computer Architecture, Chapters 4 - 5) → Hardware Platform → (Gate Logic, Chapters 1 - 3) → Chips & Logic Gates → (Electrical Engineering) → Physics
Motivation: Why study compilers?
Because compilers …
Are an essential part of applied computer science
Are very relevant to computational linguistics
Are implemented using classical programming techniques
Employ important software engineering principles
Train you in developing software for transforming one structure to another (programs, files, transactions, …)
Train you to think in terms of "description languages".
The big picture
[Figure: the big picture. Compilers (the Jack compiler, some other compilers) translate high-level languages (Jack, some other languages) into intermediate code. VM implementations over the Hack, CISC, and RISC platforms translate the intermediate code into Hack, CISC, and RISC machine language; a VM emulator written in a high-level language runs it on any computer. Other digital platforms can join, each equipped with its VM implementation. HW lectures: Projects 1-6. VM lectures: Projects 7-8. Compiler lectures: Projects 10, 11.]
Modern compilers are two-tiered:
Front-end: from high-level language to some intermediate language
Back-end: from the intermediate language to binary code.
Compiler architecture (front end)
[Figure: the big-picture diagram again, with the Jack compiler expanded: Jack program (source) → Tokenizer → Parser → Code Generation → VM code (target). Tokenizer and Parser form the Syntax Analyzer (Chapter 10), which can emit XML code; Code Generation is covered in Chapter 11.]
Syntax analysis: understanding the semantics implied by the source code
Tokenizing: creating a stream of "atoms"
Parsing: matching the atom stream with the language grammar
XML output = one way to demonstrate that the syntax analyzer works
Code generation: reconstructing the semantics using the syntax of the target code.
Tokenizing / Lexical analysis
Remove white space
Construct a token list (language atoms)
Things to worry about:
Language-specific rules: e.g. how to treat "++"
Language-specific classifications: keyword, symbol, identifier, integerConstant, stringConstant, ...
While we are at it, we can have the tokenizer record not only the token, but also its lexical classification (as defined by the source language grammar).
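The tokenizing steps above can be sketched in a few lines of Python. This is only an illustration, not the course's JackTokenizer: the names `KEYWORDS`, `TOKEN_RE`, and `tokenize` are made up for this example, and the keyword and symbol sets are taken from the Jack language specification rather than from this slide.

```python
import re

# Jack keywords (assumed from the Jack language specification).
KEYWORDS = {
    "class", "constructor", "function", "method", "field", "static", "var",
    "int", "char", "boolean", "void", "true", "false", "null", "this",
    "let", "do", "if", "else", "while", "return",
}

# One alternative per lexical class. Comments and white space are matched
# too, so that they can be skipped.
TOKEN_RE = re.compile(r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<space>\s+)
  | (?P<integerConstant>\d+)
  | (?P<stringConstant>"[^"\n]*")
  | (?P<word>[A-Za-z_]\w*)
  | (?P<symbol>[{}()\[\].,;+\-*/&|<>=~])
""", re.VERBOSE | re.DOTALL)


def tokenize(source):
    """Return a list of (classification, token) pairs for a Jack source string."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind in ("comment", "space"):
            continue  # white space and comments produce no tokens
        if kind == "word":
            kind = "keyword" if text in KEYWORDS else "identifier"
        elif kind == "stringConstant":
            text = text[1:-1]  # drop the enclosing double quotes
        tokens.append((kind, text))
    return tokens
```

Under these rules "++" is not a token of its own: `tokenize("count++;")` yields two separate `+` symbols, which is one possible answer to the "how to treat ++" question for a language that has no increment operator.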
Jack Tokenizer
Source code:
if (x < 153) {let city = "Paris";}
Tokenizer's output:
<tokens>
<keyword> if </keyword>
<symbol> ( </symbol>
<identifier> x </identifier>
<symbol> &lt; </symbol>
<integerConstant> 153 </integerConstant>
<symbol> ) </symbol>
<symbol> { </symbol>
<keyword> let </keyword>
<identifier> city </identifier>
<symbol> = </symbol>
<stringConstant> Paris </stringConstant>
<symbol> ; </symbol>
<symbol> } </symbol>
</tokens>
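As a rough sketch of how such a token listing could be produced, the function below turns a list of (classification, token) pairs into the XML form shown on this slide. The function name `tokens_to_xml` and the pair representation are assumptions made for this example; `escape` is the standard-library helper that turns `<`, `>`, and `&` into XML entities.

```python
from xml.sax.saxutils import escape


def tokens_to_xml(tokens):
    """Render (classification, token) pairs as a <tokens> XML listing."""
    lines = ["<tokens>"]
    for kind, text in tokens:
        # Symbols such as < must be written as entities (&lt;) to keep the XML valid.
        lines.append(f"<{kind}> {escape(text)} </{kind}>")
    lines.append("</tokens>")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [("keyword", "if"), ("symbol", "("), ("identifier", "x"), ("symbol", "<")]
    print(tokens_to_xml(sample))
```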
Parsing
The tokenizer discussed thus far is part of a larger program called a parser
Each language is characterized by a grammar.
The parser is implemented to recognize this grammar in given texts
The parsing process:
A text is given and tokenized
The parser determines whether or not the text can be generated from the grammar
In the process, the parser performs a complete structural analysis of the text
The text can be an expression in a:
Natural language (English, …)
Programming language (Jack, …).
Parsing examples
Jack: (5+3)*2 - sqrt(9*4)
[Figure: parse tree. The root is "-"; its left child is "*" (over "+" with leaves 5 and 3, and the leaf 2); its right child is "sqrt" (over "*" with leaves 9 and 4).]
English: she discussed sex with her doctor
[Figure: two parse trees, parse 1 and parse 2, both rooted at "discussed". They differ in whether "with her doctor" attaches to "sex" or to "discussed".]
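To make the Jack example concrete, here is a small sketch that writes the parse tree of (5+3)*2 - sqrt(9*4) as nested Python tuples and evaluates it by walking the tree. The tuple encoding and the names `TREE` and `evaluate` are assumptions of this illustration, and `sqrt` is taken to be an integer square root.

```python
import math

# Parse tree for (5+3)*2 - sqrt(9*4).
# Inner nodes are (operator, child, ...); leaves are plain integers.
TREE = ("-",
        ("*", ("+", 5, 3), 2),
        ("sqrt", ("*", 9, 4)))


def evaluate(node):
    """Evaluate a parse tree bottom-up: children first, then the operator."""
    if isinstance(node, int):
        return node
    op, *children = node
    values = [evaluate(child) for child in children]
    if op == "+":
        return values[0] + values[1]
    if op == "-":
        return values[0] - values[1]
    if op == "*":
        return values[0] * values[1]
    if op == "sqrt":
        return math.isqrt(values[0])  # integer square root (assumption)
    raise ValueError(f"unknown operator {op!r}")
```

Once the structure is explicit, the parentheses of the source text are no longer needed: the nesting of the tree alone fixes the order of evaluation.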
More examples of challenging parsing
We gave the monkeys the bananas because they were hungry
We gave the monkeys the bananas because they were over-ripe
I never said she stole my money (the meaning changes with the word that is stressed)
Time flies like an arrow
Simple (terminal) forms / complex (non-terminal) forms
Grammar = set of rules on how to construct complex forms from simpler forms
Highly recursive.
A typical grammar of a typical C-like language
while (expression) {
   if (expression)
      statement;
   while (expression) {
      statement;
      if (expression)
         statement;
   }
   while (expression) {
      statement;
      statement;
   }
}
if (expression) {
   statement;
   while (expression)
      statement;
   statement;
}
if (expression)
   if (expression)
      statement;
Code sample
program:           statement;
statement:         whileStatement
                 | ifStatement
                 | // other statement possibilities ...
                 | '{' statementSequence '}'
whileStatement:    'while' '(' expression ')' statement
ifStatement:       simpleIf
                 | ifElse
simpleIf:          'if' '(' expression ')' statement
ifElse:            'if' '(' expression ')' statement
                   'else' statement
statementSequence: '' // null, i.e. the empty sequence
                 | statement ';' statementSequence
expression:        // definition of an expression comes here
// more definitions follow
Grammar
Parse tree
Input text:
while (count<=100) {
   /** demonstration */
   count++;
   // ...
Tokenized:
while ( count <= 100 ) { count ++ ; ...
Grammar (excerpt):
program:        statement;
statement:      whileStatement
              | ifStatement
              | // other statement possibilities ...
              | '{' statementSequence '}'
whileStatement: 'while' '(' expression ')' statement
...
[Figure: parse tree over the token stream. The root statement derives a whileStatement, whose parts are 'while', '(', an expression covering count <= 100, ')', and a statement. That statement derives '{' statementSequence '}', and the statementSequence derives a statement covering count ++, then ';', then a further statementSequence.]
Recursive descent parsing
Parser implementation: a set of parsing methods, one for each rule:
parseStatement()
parseWhileStatement()
parseIfStatement()
parseStatementSequence()
parseExpression().
Highly recursive
LL(1) grammars: the next token determines which rule applies
In other grammars you have to look ahead more than one token
Jack is almost LL(1).
Code sample:
while (expression) {
   statement;
   statement;
   while (expression) {
      while (expression)
         statement;
      statement;
   }
}
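The one-method-per-rule idea can be sketched as follows. This is a simplified illustration, not the course's parser: the grammar is cut down to while, if/else, brace blocks, and a placeholder simple statement; the words `expression` and `statement` are treated as single tokens, exactly as they appear in the code sample; and the class name `Parser`, the tuple-shaped parse tree, and the helper `parse` are all made up for this example.

```python
class Parser:
    """Recursive descent parser for a tiny while/if grammar (illustrative only).

    statement:      whileStatement | ifStatement | block | 'statement' ';'
    whileStatement: 'while' '(' 'expression' ')' statement
    ifStatement:    'if' '(' 'expression' ')' statement ('else' statement)?
    block:          '{' statement* '}'
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, found {self.peek()!r}")
        self.pos += 1

    def parse_statement(self):
        # The next token alone decides which rule applies.
        token = self.peek()
        if token == "while":
            return self.parse_while_statement()
        if token == "if":
            return self.parse_if_statement()
        if token == "{":
            return self.parse_block()
        self.expect("statement")
        self.expect(";")
        return ("simpleStatement",)

    def parse_expression(self):
        self.expect("expression")
        return ("expression",)

    def parse_while_statement(self):
        self.expect("while")
        self.expect("(")
        condition = self.parse_expression()
        self.expect(")")
        return ("whileStatement", condition, self.parse_statement())

    def parse_if_statement(self):
        self.expect("if")
        self.expect("(")
        condition = self.parse_expression()
        self.expect(")")
        then_branch = self.parse_statement()
        if self.peek() == "else":
            self.expect("else")
            return ("ifElse", condition, then_branch, self.parse_statement())
        return ("simpleIf", condition, then_branch)

    def parse_block(self):
        self.expect("{")
        body = []
        while self.peek() != "}":
            body.append(self.parse_statement())
        self.expect("}")
        return ("block", body)


def parse(text):
    """Parse white-space-separated tokens; reject trailing input."""
    parser = Parser(text.split())
    tree = parser.parse_statement()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

Nested constructs need no special handling: `parse_while_statement` calls `parse_statement`, which may call `parse_while_statement` again, so the recursion in the code mirrors the recursion in the grammar.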
A linguist's view on parsing
Parsing:
One of the mental processes involved
in sentence comprehension, in which
the listener determines the syntactic
categories of the words, joins them
up in a tree, and identifies the
subject, object, and predicate, a
prerequisite to determining who did
what to whom from the information in
the sentence.
(Steven Pinker,
The Language Instinct)
The Jack grammar
'x': x appears verbatim
x: x is a language construct
x?: x appears 0 or 1 times
x*: x appears 0 or more times
x|y: either x or y appears
(x,y): x appears, then y.
The Jack grammar (cont.)
'x': x appears verbatim
x: x is a language construct
x?: x appears 0 or 1 times
x*: x appears 0 or more times
x|y: either x or y appears
(x,y): x appears, then y.
Jack syntax analyzer in action
Source code:
class Bar {
   method Fraction foo(int y) {
      var int temp; // a variable
      let temp = (xxx+12)*-63;
      ...
...
Syntax analyzer
Using the language grammar, a programmer can write a syntax analyzer program (parser)
The syntax analyzer takes a source text file and attempts to match it against the language grammar
If successful, it can generate a parse tree in some structured format, e.g. XML.
Syntax analyzer's output (excerpt):
<varDec>
  <keyword> var </keyword>
  <keyword> int </keyword>
  <identifier> temp </identifier>
  <symbol> ; </symbol>
</varDec>
<statements>
  <letStatement>
    <keyword> let </keyword>
    <identifier> temp </identifier>
    <symbol> = </symbol>
    <expression>
      <term>
        <symbol> ( </symbol>
        <expression>
          <term>
            <identifier> xxx </identifier>
          </term>
          <symbol> + </symbol>
          <term>
            <integerConstant> 12 </integerConstant>
          </term>
        </expression>
        ...
The syntax analyzer's algorithm shown in this slide:
If xxx is non-terminal, output:
<xxx>
Recursive code for the body of xxx
</xxx>
If xxx is terminal (keyword, symbol, constant, or identifier), output:
<xxx>
xxx value
</xxx>
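The two output rules can be sketched as one recursive function. The tree encoding is an assumption of this example: a non-terminal is a pair (name, list of children) and a terminal is a pair (classification, value string); the names `to_xml` and `VAR_DEC` are likewise made up for illustration.

```python
from xml.sax.saxutils import escape


def to_xml(node, depth=0):
    """Return the XML lines for a parse-tree node, indented by nesting depth."""
    tag, body = node
    indent = "  " * depth
    if isinstance(body, list):
        # Non-terminal: opening tag, recursive output of the body, closing tag.
        lines = [f"{indent}<{tag}>"]
        for child in body:
            lines.extend(to_xml(child, depth + 1))
        lines.append(f"{indent}</{tag}>")
        return lines
    # Terminal: the tag names the lexical class and wraps the token's value.
    return [f"{indent}<{tag}> {escape(body)} </{tag}>"]


# Parse tree for the declaration: var int temp;
VAR_DEC = ("varDec", [
    ("keyword", "var"),
    ("keyword", "int"),
    ("identifier", "temp"),
    ("symbol", ";"),
])

if __name__ == "__main__":
    print("\n".join(to_xml(VAR_DEC)))
```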
JackTokenizer: a tokenizer for the Jack language (proposed implementation)
JackTokenizer (cont.)
CompilationEngine: a recursive top-down parser for Jack
The CompilationEngine effects the actual compilation output.
It gets its input from a JackTokenizer and emits its parsed structure into an output file/stream.
The output is generated by a series of compilexxx() routines, one for every syntactic element xxx of the Jack grammar.
The contract between these routines is that each compilexxx() routine should read the syntactic construct xxx from the input, advance() the tokenizer exactly beyond xxx, and output the parsing of xxx.
Thus, compilexxx() may only be called if indeed xxx is the next syntactic element of the input.
In the first version of the compiler, which we now build, this module emits a structured printout of the code, wrapped in XML tags (defined in the specs of project 10). In the final version of the compiler, this module generates executable VM code (defined in the specs of project 11).
In both cases, the parsing logic and module API are exactly the same.
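A minimal sketch of the compilexxx() contract, for one routine only. Everything here is hypothetical scaffolding: `ListTokenizer` is a stand-in for JackTokenizer that walks a pre-tokenized list, its method names are invented for this example, and the output is collected in a list rather than written to a file. The rule for varDec ('var' type varName (',' varName)* ';') is taken from the Jack grammar.

```python
from xml.sax.saxutils import escape


class ListTokenizer:
    """Hypothetical stand-in for JackTokenizer over (classification, token) pairs."""

    def __init__(self, tokens):
        self._tokens = tokens
        self._index = 0

    def has_more_tokens(self):
        return self._index < len(self._tokens)

    def current(self):
        return self._tokens[self._index]

    def advance(self):
        self._index += 1


class CompilationEngine:
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer
        self.output = []

    def _eat(self, kind=None, text=None):
        """Check the current token, emit it as a terminal, and advance past it."""
        if not self.tokenizer.has_more_tokens():
            raise SyntaxError("unexpected end of input")
        tok_kind, tok_text = self.tokenizer.current()
        if (kind is not None and tok_kind != kind) or (text is not None and tok_text != text):
            raise SyntaxError(f"unexpected token {tok_text!r}")
        self.output.append(f"<{tok_kind}> {escape(tok_text)} </{tok_kind}>")
        self.tokenizer.advance()

    def compile_var_dec(self):
        """varDec: 'var' type varName (',' varName)* ';'"""
        self.output.append("<varDec>")
        self._eat("keyword", "var")
        self._eat()  # type: a keyword (int, char, boolean) or a class name
        self._eat("identifier")
        while (self.tokenizer.has_more_tokens()
               and self.tokenizer.current() == ("symbol", ",")):
            self._eat("symbol", ",")
            self._eat("identifier")
        self._eat("symbol", ";")
        self.output.append("</varDec>")
```

After `compile_var_dec()` returns, the tokenizer sits on the first token after the `;`, so the caller can decide which routine to invoke next by inspecting that token. Swapping the XML lines for VM commands would change only what the routines emit, not how they consume tokens.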
CompilationEngine (cont.)
Summary and next step
[Figure: the Jack compiler. Jack program → Tokenizer → Parser → Code Generation → VM code. Tokenizer and Parser form the Syntax Analyzer (Chapter 10), which emits XML code; Code Generation is covered in Chapter 11.]
Syntax analysis: understanding syntax
Code generation: constructing semantics
The code generation challenge:
Extend the syntax analyzer into a full-blown compiler that, instead of
generating passive XML code, generates executable VM code
Two challenges: (a) handling data, and (b) handling commands.
Perspective
The parse tree can be constructed on the fly
Syntax analyzers can be built using:
Lex tool for tokenizing
Yacc tool for parsing
Do everything from scratch (our approach ...)
The Jack language is intentionally simple:
Statement prefixes: let, do, ...
No operator priority
No error checking
Basic data types, etc.
Richer languages require more powerful compilers
The Jack compiler: designed to illustrate the key ideas that underlie modern
compilers, leaving advanced features to more advanced courses
Industrial-strength compilers:
Have good error diagnostics
Generate tight and efficient code
Support parallel (multi-core) processors.


Lecture 10 compiler i

  • 1. Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 10: Compiler I: Syntax Analysis slide 1 www.nand2tetris.org Building a Modern Computer From First Principles Compiler I: Syntax Analysis
  • 2. Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 10: Compiler I: Syntax Analysis slide 2 Course map Assembler Chapter 6 H.L. Language & Operating Sys. abstract interface Compiler Chapters 10 - 11 VM Translator Chapters 7 - 8 Computer Architecture Chapters 4 - 5 Gate Logic Chapters 1 - 3 Electrical Engineering Physics Virtual Machine abstract interface Software hierarchy Assembly Language abstract interface Hardware hierarchy Machine Language abstract interface Hardware Platform abstract interface Chips & Logic Gates abstract interface Human Thought Abstract design Chapters 9, 12
  • 3. Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 10: Compiler I: Syntax Analysis slide 3 Motivation: Why study about compilers? Because Compilers … Are an essential part of applied computer science Are very relevant to computational linguistics Are implemented using classical programming techniques Employ important software engineering principles Train you in developing software for transforming one structure to another (programs, files, transactions, …) Train you to think in terms of ”description languages”.
  • 4. Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 10: Compiler I: Syntax Analysis slide 4 The big picture . . . RISC machine other digital platforms, each equipped with its VM implementation RISC machine language Hack computer Hack machine language CISC machine language CISC machine . . . written in a high-level language Any computer . . . HW lectures (Projects 1-6) Intermediate code VM implementation over CISC platforms VM imp. over RISC platforms VM imp. over the Hack platform VM emulator VM lectures (Projects 7-8) Some Other language Jack language Some compiler Some Other compiler Jack compiler . . .Some language . . . Compiler lectures (Projects 10,11) Modern compilers are two-tiered: Front-end: from high-level language to some intermediate language Back-end: from the intermediate language to binary code.
  • 5. Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 10: Compiler I: Syntax Analysis slide 5 Compiler architecture (front end) . . . Intermediate code RISC machine language Hack machine language CISC machine language . . . written in a high-level language . . . VM implementation over CISC platforms VM imp. over RISC platforms VM imp. over the Hack platform VM emulator Some Other language Jack language Some compiler Some Other compiler Jack compiler . . .Some language . . . Syntax analysis: understanding the semantics implied by the source code Code generation: reconstructing the semantics using the syntax of the target code. Tokenizing: creating a stream of “atoms” Parsing: matching the atom stream with the language grammar XML output = one way to demonstrate that the syntax analyzer works (Chapter 11)Jack Program Toke- nizer Parser Code Gene -ration Syntax Analyzer Jack Compiler VM code XML code (Chapter 10) (source) (target)
  • 6. Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 10: Compiler I: Syntax Analysis slide 6 Tokenizing / Lexical analysis Remove white space Construct a token list (language atoms) Things to worry about: Language specific rules: e.g. how to treat “++” Language-specific classifications: keyword, symbol, identifier, integerCconstant, stringConstant,... While we are at it, we can have the tokenizer record not only the token, but also its lexical classification (as defined by the source language grammar).
  • 7. Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 10: Compiler I: Syntax Analysis slide 7 Jack Tokenizer if (x < 153) {let city = ”Paris”;}if (x < 153) {let city = ”Paris”;} Source code <tokens> <keyword> if </keyword> <symbol> ( </symbol> <identifier> x </identifier> <symbol> &lt; </symbol> <integerConstant> 153 </integerConstant> <symbol> ) </symbol> <symbol> { </symbol> <keyword> let </keyword> <identifier> city </identifier> <symbol> = </symbol> <stringConstant> Paris </stringConstant> <symbol> ; </symbol> <symbol> } </symbol> </tokens> <tokens> <keyword> if </keyword> <symbol> ( </symbol> <identifier> x </identifier> <symbol> &lt; </symbol> <integerConstant> 153 </integerConstant> <symbol> ) </symbol> <symbol> { </symbol> <keyword> let </keyword> <identifier> city </identifier> <symbol> = </symbol> <stringConstant> Paris </stringConstant> <symbol> ; </symbol> <symbol> } </symbol> </tokens> Tokenizer’s output Tokenizer
  • 8. Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 10: Compiler I: Syntax Analysis slide 8 Parsing The tokenizer discussed thus far is part of a larger program called parser Each language is characterized by a grammar. The parser is implemented to recognize this grammar in given texts The parsing process: A text is given and tokenized The parser determines weather or not the text can be generated from the grammar In the process, the parser performs a complete structural analysis of the text The text can be in an expression in a : Natural language (English, …) Programming language (Jack, …).
  • 9. Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 10: Compiler I: Syntax Analysis slide 9 Parsing examples (5+3)*2 – sqrt(9*4) she discussed sex with her doctor - 5 sqrt + * 3 2 9 4 * Jack English discussed she sex with her doctor parse 1 discussed she with her doctor parse 2 sex
  • 10. Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 10: Compiler I: Syntax Analysis slide 10 More examples of challenging parsing We gave the monkeys the bananas because they were hungry We gave the monkeys the bananas because they were over-ripe I never said she stole my money I never said she stole my money I never said she stole my money I never said she stole my money I never said she stole my money I never said she stole my money I never said she stole my money I never said she stole my money Time flies like an arrow
  • 11. Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 10: Compiler I: Syntax Analysis slide 11 Simple (terminal) forms / complex (non-terminal) forms Grammar = set of rules on how to construct complex forms from simpler forms Highly recursive. A typical grammar of a typical C-like language while (expression) { if (expression) statement; while (expression) { statement; if (expression) statement; } while (expression) { statement; statement; } } if (expression) { statement; while (expression) statement; statement; } if (expression) if (expression) statement; } while (expression) { if (expression) statement; while (expression) { statement; if (expression) statement; } while (expression) { statement; statement; } } if (expression) { statement; while (expression) statement; statement; } if (expression) if (expression) statement; } Code sample program: statement; statement: whileStatement | ifStatement | // other statement possibilities ... | '{' statementSequence '}' whileStatement: 'while' '(' expression ')' statement ifStatement: simpleIf | ifElse simpleIf: 'if' '(' expression ')' statement ifElse: 'if' '(' expression ')' statement 'else' statement statementSequence: '' // null, i.e. the empty sequence | statement ';' statementSequence expression: // definition of an expression comes here // more definitions follow program: statement; statement: whileStatement | ifStatement | // other statement possibilities ... | '{' statementSequence '}' whileStatement: 'while' '(' expression ')' statement ifStatement: simpleIf | ifElse simpleIf: 'if' '(' expression ')' statement ifElse: 'if' '(' expression ')' statement 'else' statement statementSequence: '' // null, i.e. the empty sequence | statement ';' statementSequence expression: // definition of an expression comes here // more definitions follow Grammar
  • 12. Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 10: Compiler I: Syntax Analysis slide 12 Parse tree while . . .( )count <= 100 { count ++ statement whileStatement expression statementSequence statement ; statement statementSequence Input Text: while (count<=100) { /** demonstration */ count++; // ... Tokenized: while ( count <= 100 ) { count ++ ; ... program: statement; statement: whileStatement | ifStatement | // other statement possibilities ... | '{' statementSequence '}' whileStatement: 'while' '(' expression ')' statement ... program: statement; statement: whileStatement | ifStatement | // other statement possibilities ... | '{' statementSequence '}' whileStatement: 'while' '(' expression ')' statement ...
  • 13. Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 10: Compiler I: Syntax Analysis slide 13 Recursive descent parsing Parser implementation: a set of parsing methods, one for each rule: parseStatement() parseWhileStatement() parseIfStatement() parseStatementSequence() parseExpression(). Highly recursive LL(0) grammars: the first token determines in which rule we are In other grammars you have to look ahead 1 or more tokens Jack is almost LL(0). while (expression) { statement; statement; while (expression) { while (expression) statement; statement; } } while (expression) { statement; statement; while (expression) { while (expression) statement; statement; } } code sample
  • 14. Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 10: Compiler I: Syntax Analysis slide 14 A linguist view on parsing Parsing: One of the mental processes involved in sentence comprehension, in which the listener determines the syntactic categories of the words, joins them up in a tree, and identifies the subject, object, and predicate, a prerequisite to determining who did what to whom from the information in the sentence. (Steven Pinker, The Language Instinct)
  • 15. The Jack grammar
    Notation:
        'x': x appears verbatim
        x: x is a language construct
        x?: x appears 0 or 1 times
        x*: x appears 0 or more times
        x|y: either x or y appears
        (x,y): x appears, then y.
  • 16. The Jack grammar (cont.)
    Notation:
        'x': x appears verbatim
        x: x is a language construct
        x?: x appears 0 or 1 times
        x*: x appears 0 or more times
        x|y: either x or y appears
        (x,y): x appears, then y.
  • 17. Jack syntax analyzer in action
    Input (Jack source):
        class Bar {
            method Fraction foo(int y) {
                var int temp; // a variable
                let temp = (xxx+12)*-63;
                ...
                ...
    Using the language grammar, a programmer can write a syntax analyzer program (parser). The syntax analyzer takes a source text file and attempts to match it against the language grammar. If successful, it can generate a parse tree in some structured format, e.g. XML.
    Output (XML):
        <varDec>
          <keyword> var </keyword>
          <keyword> int </keyword>
          <identifier> temp </identifier>
          <symbol> ; </symbol>
        </varDec>
        <statements>
          <letStatement>
            <keyword> let </keyword>
            <identifier> temp </identifier>
            <symbol> = </symbol>
            <expression>
              <term>
                <symbol> ( </symbol>
                <expression>
                  <term>
                    <identifier> xxx </identifier>
                  </term>
                  <symbol> + </symbol>
                  <term>
                    <integerConstant> 12 </integerConstant>
                  </term>
                </expression>
                ...
    The syntax analyzer's algorithm shown in this slide:
        If xxx is non-terminal, output: <xxx> recursive code for the body of xxx </xxx>
        If xxx is terminal (keyword, symbol, constant, or identifier), output: <xxx> xxx value </xxx>
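    The two output rules of this slide can be sketched as one short recursive function. This is an illustrative sketch only: the tuple representation of the parse tree and the name to_xml are assumptions, not part of the course material.

```python
from xml.sax.saxutils import escape


def to_xml(node, depth=0):
    """Render a parse tree node as a list of XML lines.

    Assumed representation: a non-terminal is (name, [children]) and a
    terminal is (kind, value), where value is a string.
    """
    tag, body = node
    pad = "  " * depth
    if isinstance(body, list):
        # Non-terminal: <xxx>, then the body rendered recursively, then </xxx>.
        lines = [f"{pad}<{tag}>"]
        for child in body:
            lines.extend(to_xml(child, depth + 1))
        lines.append(f"{pad}</{tag}>")
        return lines
    # Terminal: <xxx> value </xxx>, escaping characters that are special in XML.
    return [f"{pad}<{tag}> {escape(body)} </{tag}>"]


var_dec = ("varDec", [
    ("keyword", "var"),
    ("keyword", "int"),
    ("identifier", "temp"),
    ("symbol", ";"),
])
print("\n".join(to_xml(var_dec)))
# <varDec>
#   <keyword> var </keyword>
#   <keyword> int </keyword>
#   <identifier> temp </identifier>
#   <symbol> ; </symbol>
# </varDec>
```

    Escaping matters for symbols such as < and &, which would otherwise produce malformed XML.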
  • 18. JackTokenizer: a tokenizer for the Jack language (proposed implementation)
  • 19. JackTokenizer (cont.)
  • 20. CompilationEngine: a recursive top-down parser for Jack
    The CompilationEngine effects the actual compilation output. It gets its input from a JackTokenizer and emits its parsed structure into an output file/stream.
    The output is generated by a series of compilexxx() routines, one for every syntactic element xxx of the Jack grammar.
    The contract between these routines is that each compilexxx() routine should read the syntactic construct xxx from the input, advance() the tokenizer exactly beyond xxx, and output the parsing of xxx. Thus, compilexxx() may only be called if xxx is indeed the next syntactic element of the input.
    In the first version of the compiler, which we now build, this module emits a structured printout of the code, wrapped in XML tags (defined in the specs of project 10). In the final version of the compiler, this module generates executable VM code (defined in the specs of project 11). In both cases, the parsing logic and module API are exactly the same.
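    The contract between the compilexxx() routines can be illustrated with a small Python sketch for one construct, a local variable declaration of the form 'var' type varName (',' varName)* ';'. The class TinyEngine, its helper methods, and the pre-tokenized list of (kind, value) pairs are hypothetical stand-ins for the CompilationEngine and JackTokenizer, not the proposed API.

```python
class TinyEngine:
    """Sketch of the compilexxx() contract for a single construct (varDec)."""

    def __init__(self, tokens):
        self.tokens = tokens  # assumed input: list of (kind, value) pairs
        self.pos = 0          # index of the current token
        self.out = []         # emitted XML lines

    def _current(self):
        return self.tokens[self.pos]

    def _emit_terminal(self, expected=None):
        """Output the current token as <kind> value </kind> and advance past it."""
        kind, value = self._current()
        if expected is not None and value != expected:
            raise SyntaxError(f"expected {expected!r}, got {value!r}")
        self.out.append(f"<{kind}> {value} </{kind}>")
        self.pos += 1

    def compile_var_dec(self):
        # Precondition: the current token is 'var', the start of a varDec.
        self.out.append("<varDec>")
        self._emit_terminal("var")
        self._emit_terminal()          # type
        self._emit_terminal()          # varName
        while self._current()[1] == ",":
            self._emit_terminal(",")
            self._emit_terminal()      # next varName
        self._emit_terminal(";")
        self.out.append("</varDec>")
        # Postcondition: the current token is the first one after the varDec.


tokens = [("keyword", "var"), ("keyword", "int"), ("identifier", "temp"),
          ("symbol", ";"), ("keyword", "let")]
engine = TinyEngine(tokens)
engine.compile_var_dec()
print(engine.tokens[engine.pos])  # ('keyword', 'let')
```

    Because the routine stops exactly beyond the declaration, the caller can inspect the current token (here, let) and decide which routine to call next without any bookkeeping of its own.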
  • 21. CompilationEngine (cont.)
  • 22. CompilationEngine (cont.)
  • 23. CompilationEngine (cont.)
  • 24. Summary and next step
    [Figure: Jack Compiler pipeline. Jack Program, then Tokenizer, then Parser, then Code Generation, then VM code. The Tokenizer and Parser form the Syntax Analyzer (Chapter 10), which outputs XML code; Code Generation is covered in Chapter 11.]
    Syntax analysis: understanding syntax
    Code generation: constructing semantics
    The code generation challenge: extend the syntax analyzer into a full-blown compiler that, instead of generating passive XML code, generates executable VM code.
    Two challenges: (a) handling data, and (b) handling commands.
  • 25. Perspective
    The parse tree can be constructed on the fly.
    Syntax analyzers can be built using:
      - The Lex tool for tokenizing
      - The Yacc tool for parsing
      - Doing everything from scratch (our approach ...)
    The Jack language is intentionally simple:
      - Statement prefixes: let, do, ...
      - No operator priority
      - No error checking
      - Basic data types, etc.
    Richer languages require more powerful compilers.
    The Jack compiler: designed to illustrate the key ideas that underlie modern compilers, leaving advanced features to more advanced courses.
    Industrial-strength compilers:
      - Have good error diagnostics
      - Generate tight and efficient code
      - Support parallel (multi-core) processors.