PARSING
Submitted by:
Kartika
URN-1805192 CRN-1815040 CSA2
PARSING?
● Also called syntactic analysis or syntax analysis.
● Syntactic analysis, or parsing, is the third phase of NLP.
● The word ‘parsing’ originates from the Latin word ‘pars’, meaning ‘part’.
● By comparing the text against the rules of a formal grammar, syntax analysis checks it for meaningfulness.
● A sentence like “Give me hot ice-cream”, for example, would be rejected by a parser or syntactic analyzer.
● The purpose of this phase is to draw the exact meaning, or dictionary meaning, from the text.
It may be defined as the process of analyzing strings of symbols in natural language conforming to the rules of a formal grammar.
Syntactic analysis in NLP
ROLE OF PARSER
➢ To report any syntax error.
➢ To recover from commonly occurring errors so that processing of the remainder of the program can continue.
➢ To create the parse tree.
➢ To create the symbol table.
➢ To produce intermediate representations (IR).
DEEP VS SHALLOW PARSING
● Deep parsing (also called full parsing): the search strategy assigns a complete syntactic structure to the sentence. It is suitable for complex NLP applications, e.g. dialogue systems and summarisation.
● Shallow parsing (also called chunking): only a limited part of the syntactic information is parsed from the given text. It is used for less complex applications, e.g. information extraction and text mining.
TYPES OF PARSING
Parsing is classified into two categories: top-down parsing and bottom-up parsing.
Top-down parsing is based on the leftmost derivation; bottom-up parsing is based on the rightmost derivation in reverse.
DERIVATION
A derivation is a sequence of production-rule applications used to obtain the input string. During parsing we have to make two decisions:
● which non-terminal is to be replaced, and
● which production rule the non-terminal will be replaced by.
Types of derivation: leftmost and rightmost.
LEFT MOST DERIVATION
In the leftmost derivation, the leftmost non-terminal of the sentential form is replaced at each step, so the input string is generated from left to right.
Example:
Production rules:
1. S = S + S
2. S = S - S
3. S = a | b | c
Input:
a - b + c
EXAMPLE OF LEFT MOST DERIVATION
1. S = S + S
2. S = S - S + S
3. S = a - S + S
4. S = a - b + S
5. S = a - b + c
RIGHT MOST DERIVATION
In the rightmost derivation, the rightmost non-terminal of the sentential form is replaced at each step, so the input string is generated from right to left.
EXAMPLE:-
1. S = S - S
2. S = S - S + S
3. S = S - S + c
4. S = S - b + c
5. S = a - b + c
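Both derivations above can be replayed by simple string rewriting: the leftmost derivation always expands the first remaining S, the rightmost derivation the last. A small sketch (the helper functions are illustrative, not part of any parser library):

```python
def leftmost(steps):
    """Expand the leftmost 'S' with each right-hand side in turn."""
    form = "S"
    for rhs in steps:
        i = form.find("S")                     # leftmost non-terminal
        form = form[:i] + rhs + form[i + 1:]
    return form

def rightmost(steps):
    """Expand the rightmost 'S' with each right-hand side in turn."""
    form = "S"
    for rhs in steps:
        i = form.rfind("S")                    # rightmost non-terminal
        form = form[:i] + rhs + form[i + 1:]
    return form

# The step sequences from the two examples above:
print(leftmost(["S+S", "S-S", "a", "b", "c"]))   # a-b+c
print(rightmost(["S-S", "S+S", "c", "b", "a"]))  # a-b+c
```

Both reach the same string a - b + c; only the order in which non-terminals are expanded differs.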
TOP DOWN PARSING
The process of constructing the parse tree which starts from the root and goes down to the leaf is Top-Down
Parsing.
● Top-Down Parsers constructs from the Grammar which is free from ambiguity and left recursion.
● Top Down Parsers uses leftmost derivation to construct a parse tree.
● It allows a grammar which is free from Left Factoring.
EXAMPLE OF LEFT FACTORING-
S → iEtS / iEtSeS / a
E → b
WORKING OF TOP DOWN PARSER
EXAMPLE:-
S -> aABe
A -> Abc | b
B -> d
Input –
abbcde
Now let us see how the top-down approach works, i.e. how the input string can be generated from the grammar.
● Start with S -> aABe; this matches the input string’s a at the beginning and e at the end.
● We now need to generate abbcde.
● Expand A -> Abc and expand B -> d.
● The string is now aAbcde, while the input string is abbcde.
● Expand A -> b.
● The final string is abbcde, which matches the input.
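The expansion steps above can be replayed as string rewriting, expanding the leftmost occurrence of the chosen non-terminal at each step (a sketch; a real top-down parser chooses expansions by comparing predictions against the input):

```python
# Replay the top-down expansions for the grammar S -> aABe, A -> Abc | b, B -> d.
steps = [("S", "aABe"), ("A", "Abc"), ("B", "d"), ("A", "b")]

form = "S"
for lhs, rhs in steps:
    form = form.replace(lhs, rhs, 1)  # expand the leftmost occurrence of lhs
    print(form)
# Prints: aABe, aAbcBe, aAbcde, abbcde
print(form == "abbcde")  # True: the derived string matches the input
```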
BOTTOM UP PARSERS OR SHIFT REDUCE PARSERS
Bottom-up parsers (shift-reduce parsers) build the parse tree from the leaves to the root. Bottom-up parsing can be defined as an attempt to reduce the input string w to the start symbol of the grammar by tracing out the rightmost derivation of w in reverse.
BOTTOM UP PARSER
There are two unique steps for bottom-up parsing. These steps are known as shift-step and
reduce-step.
● Shift step: The shift step refers to the advancement of the input pointer to the next input
symbol, which is called the shifted symbol. This symbol is pushed onto the stack. The shifted
symbol is treated as a single node of the parse tree.
● Reduce step : When the parser finds a complete grammar rule (RHS) and replaces it to
(LHS), it is known as reduce-step. This occurs when the top of the stack contains a handle. To
reduce, a POP function is performed on the stack which pops off the handle and replaces it
with LHS non-terminal symbol.
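The shift and reduce steps can be sketched as a small backtracking recognizer, here using the grammar from the earlier top-down example (S -> aABe, A -> Abc | b, B -> d). This is an illustrative sketch: a practical shift-reduce parser is driven by a parse table rather than blind search.

```python
# Backtracking shift-reduce recognizer sketch.
RULES = [("S", "aABe"), ("A", "Abc"), ("A", "b"), ("B", "d")]

def parse(stack, rest):
    if stack == "S" and not rest:
        return True                      # reduced to the start symbol
    # Reduce step: if the top of the stack matches some rule's RHS (a handle),
    # pop the handle and push the LHS non-terminal.
    for lhs, rhs in RULES:
        if stack.endswith(rhs) and parse(stack[:-len(rhs)] + lhs, rest):
            return True
    # Shift step: push the next input symbol onto the stack.
    return bool(rest) and parse(stack + rest[0], rest[1:])

print(parse("", "abbcde"))  # True: abbcde reduces to S
print(parse("", "abcde"))   # False: not in the language
```

The successful trace mirrors the rightmost derivation in reverse: b is reduced to A, Abc to A, d to B, and finally aABe to S.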
LR PARSER
A general shift reduce parsing is LR parsing. The L stands for scanning the input from left to right and R stands for
constructing a rightmost derivation in reverse.
Benefits of LR parsing:
1. Many programming languages using some variations of an LR parser. It should be noted that C++ and Perl are
exceptions to it.
2. LR Parser can be implemented very efficiently.
3. Of all the Parsers that scan their symbols from left to right, LR Parsers detect syntactic errors, as soon as
possible.
ISSUES IN BASIC PARSING
√ Never explores trees that aren’t potential solutions, ones with the wrong
kind of root node.
X But explores trees that do not match the input sentence (predicts input
before inspecting input).
X Naive top-down parsers never terminate if G contains recursive rules like X
→ X Y (left recursive rules).
X Backtracking may discard valid constituents that have to be re-discovered
later (duplication of effort)
EARLEY ALGORITHM
The Earley parsing algorithm is an efficient top-down parsing algorithm that avoids some of the inefficiency associated with purely naive search under the same top-down strategy.
In naive search, top-down parsing is inefficient because structures are created over and over again. In the Earley algorithm:
● Intermediate solutions are created only once and stored in a chart (dynamic programming).
● The left-recursion problem is solved by examining the input.
● Earley is not picky about the type of grammar it accepts, i.e., it accepts arbitrary CFGs.
function Earley-Parse(words, grammar) returns chart
    Enqueue((γ → •S, [0,0]), chart[0])
    for i ← from 0 to Length(words) do
        for each state in chart[i] do
            if Incomplete?(state) and Next-Cat(state) is not POS then
                Predictor(state)
            elseif Incomplete?(state) and Next-Cat(state) is POS then
                Scanner(state)
            else
                Completer(state)
        end
    end
    return(chart)
A state consists of:
1) a subtree corresponding to a grammar rule, e.g. S → NP VP
2) information about the progress made towards completing this subtree, e.g. S → NP • VP
3) the position of the subtree with respect to the input, e.g. S → NP • VP, [0, 3]
4) in the case of a parser (rather than a recognizer), pointers to all contributing states
A dotted rule is a data structure used in top-down parsing to record partial solutions towards discovering a constituent.
Earley: fundamental operations
1) Predict sub-structure (based on grammar)
2) Scan partial solutions for a match
3) Complete a sub-structure (i.e., build constituents)
How to represent progress towards finding an S node?
Add a dummy rule to grammar: γ → • S
This seeds the chart as the base case for recursion.
Earley's dot notation: given a production X → αβ, the notation X → α • β represents a condition in which α has already
been parsed and β is expected.
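The pseudocode above can be made concrete as a compact Earley recognizer. The toy grammar, lexicon, and test sentences below are illustrative assumptions, not taken from the slides:

```python
# Compact Earley recognizer sketch (recognizer only: no back-pointers).
GRAMMAR = {
    "S":  [("NP", "VP")],
    "NP": [("Det", "N")],
    "VP": [("V", "NP")],
}
LEXICON = {"the": "Det", "dog": "N", "cat": "N", "chased": "V"}

def earley(words):
    # A state is (LHS, RHS, dot position, start index).
    chart = [set() for _ in range(len(words) + 1)]
    chart[0].add(("GAMMA", ("S",), 0, 0))          # dummy rule γ → •S seeds the chart
    for i in range(len(words) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, start = agenda.pop()
            if dot < len(rhs) and rhs[dot] in GRAMMAR:           # Predictor
                for prod in GRAMMAR[rhs[dot]]:
                    st = (rhs[dot], prod, 0, i)
                    if st not in chart[i]:
                        chart[i].add(st)
                        agenda.append(st)
            elif dot < len(rhs):                                 # Scanner (next is a POS)
                if i < len(words) and LEXICON.get(words[i]) == rhs[dot]:
                    chart[i + 1].add((lhs, rhs, dot + 1, start))
            else:                                                # Completer
                for l2, r2, d2, s2 in list(chart[start]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        st = (l2, r2, d2 + 1, s2)
                        if st not in chart[i]:
                            chart[i].add(st)
                            agenda.append(st)
    return ("GAMMA", ("S",), 1, 0) in chart[len(words)]

print(earley("the dog chased the cat".split()))  # True
print(earley("the dog the".split()))             # False
```

Because each state is stored at most once per chart entry, duplicated sub-structures from naive search are avoided, and left recursion cannot cause an infinite loop.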
CYK ALGORITHM
The CYK algorithm is a parsing algorithm for context-free grammars. In order to apply the CYK algorithm, the grammar must be in Chomsky Normal Form. It uses dynamic programming to tell whether a string is in the language of the grammar.
Worked example: grammar S -> AB, A -> BB | a, B -> AB | b, input string “abbb”
(cell (i, j) holds the non-terminals that derive the substring from position i to position j)

Length-1 substrings: a, b, b, b
a (cell 1,1): a can be derived from A
b (cells 2,2 / 3,3 / 4,4): b can be derived from B

Length-2 substrings:
“ab” (cells 1,1 + 2,2): concatenating A and B gives AB, which can be derived from {S, B}
“bb” (cells 2,2 + 3,3, and 3,3 + 4,4): concatenating B and B gives BB, which can be derived from {A}

Length-3 substrings:
“abb” (cells 1,1 + 2,3, or 1,2 + 3,3): AA (no rule), or SB (no rule) and BB; BB gives {A}
“bbb” (cells 2,2 + 3,4, or 2,3 + 4,4): BA (no rule), or AB, which gives {S, B}

Length-4 substring:
“abbb” (cells 1,1 + 2,4, or 1,2 + 3,4, or 1,3 + 4,4): AB, SA/BA (no rule), AB, giving {S, B}
Since S is in the top cell, “abbb” is in the language.

Completed table:
        a (1)   b (2)   b (3)   b (4)
a (1)   {A}     {S,B}   {A}     {S,B}
b (2)           {B}     {A}     {S,B}
b (3)                   {B}     {A}
b (4)                           {B}
Exercise: grammar S -> AB | BC, A -> BA | a, B -> CC | b, C -> AB | a, input string “baaba”. Fill in the table:
        b (1)   a (2)   a (3)   b (4)   a (5)
b (1)
a (2)
a (3)
b (4)
a (5)
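A minimal CYK recognizer sketch in Python; it reproduces the top cell {S, B} for “abbb” and can be used to check the exercise above:

```python
# CYK recognizer sketch. table[l][i] holds the non-terminals deriving the
# substring of length l+1 starting at position i (0-based).
def cyk(word, rules):
    n = len(word)
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):                      # length-1 substrings
        for lhs, rhss in rules.items():
            if (ch,) in rhss:
                table[0][i].add(lhs)
    for l in range(1, n):                              # longer substrings
        for i in range(n - l):
            for k in range(l):                         # split point
                for lhs, rhss in rules.items():
                    for rhs in rhss:
                        if (len(rhs) == 2 and rhs[0] in table[k][i]
                                and rhs[1] in table[l - k - 1][i + k + 1]):
                            table[l][i].add(lhs)
    return table[n - 1][0]                             # top cell

G1 = {"S": [("A", "B")], "A": [("B", "B"), ("a",)], "B": [("A", "B"), ("b",)]}
G2 = {"S": [("A", "B"), ("B", "C")], "A": [("B", "A"), ("a",)],
      "B": [("C", "C"), ("b",)], "C": [("A", "B"), ("a",)]}

print(sorted(cyk("abbb", G1)))       # ['B', 'S']: S present, "abbb" is accepted
print("S" in cyk("baaba", G2))       # True
```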
Parsing using Probabilistic Context
Free Grammars
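A PCFG attaches a probability to each rule, and the probability of a parse tree is the product of the probabilities of the rules it uses. A toy sketch (the grammar and probabilities below are illustrative assumptions):

```python
from math import prod

# Toy PCFG: rule -> probability. Illustrative numbers only; probabilities of
# all rules sharing a left-hand side must sum to 1.
PCFG = {
    ("S",  ("NP", "VP")): 1.0,
    ("NP", ("she",)):     0.4,
    ("NP", ("apples",)):  0.6,
    ("VP", ("V", "NP")):  1.0,
    ("V",  ("eats",)):    1.0,
}

# Rules used in the (single) parse tree of "she eats apples".
tree = [("S", ("NP", "VP")), ("NP", ("she",)), ("VP", ("V", "NP")),
        ("V", ("eats",)), ("NP", ("apples",))]

p = prod(PCFG[r] for r in tree)
print(round(p, 6))  # 0.24, i.e. 1.0 * 0.4 * 1.0 * 1.0 * 0.6
```

When a sentence has several parses, a PCFG parser picks the tree with the highest probability, which is how ambiguity is resolved.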
REFERENCES
Basics of PCFG:
https://www.youtube.com/watch?v=wSONlMwa9rE&list=PLlQBy7xY8mbKypSJe_AjVtCuXXsdODiDi&index=3
https://www.youtube.com/watch?v=DjwH9wzCFzg&list=PLlQBy7xY8mbKypSJe_AjVtCuXXsdODiDi&index=4
CYK algorithm:
https://www.youtube.com/watch?v=xRMn6HK84io
https://www.geeksforgeeks.org/cyk-algorithm-for-context-free-grammar/
Parsing:
https://www.tutorialspoint.com/compiler_design/compiler_design_types_of_parsing.htm
PCFG notes: http://www.cs.columbia.edu/~mcollins/courses/nlp2011/notes/pcfgs.pdf