1
Learning for Semantic Parsing Using
Statistical Syntactic Parsing
Techniques
Ruifang Ge
Ph.D. Final Defense
Supervisor: Raymond J. Mooney
Machine Learning Group
Department of Computer Science
The University of Texas at Austin
2
Semantic Parsing
 Semantic Parsing: Transforming natural
language (NL) sentences into completely
formal meaning representations (MRs)
 Sample application domains where MRs
are directly executable by another
computer system to perform some task
 CLang: RoboCup Coach Language
 Geoquery: A Database Query Application
3
CLang (RoboCup Coach Language)
 In RoboCup Coach competition, teams compete to coach
simulated players
 The coaching instructions are given in a formal language
called CLang
Simulated soccer field
Coach: If our player 2 has the ball, then position our player 5 in the midfield.
Semantic Parsing
CLang: ((bowner (player our {2})) (do (player our {5}) (pos (midfield))))
4
GeoQuery: A Database Query Application
 Query application for U.S. geography
database [Zelle & Mooney, 1996]
User: What are the rivers in Texas?
Semantic Parsing
Query: answer(x1, (river(x1), loc(x1,x2), equal(x2,stateid(texas))))
Database: Angelina, Blanco, …
5
Motivation for Semantic
Parsing
 Theoretically, it answers the question of how
people interpret language
 Practical applications
 Question answering
 Natural language interface
 Knowledge acquisition
 Reasoning
6
Motivating Example
Semantic parsing is a compositional process.
Sentence structures are needed for building meaning representations.
((bowner (player our {2})) (do our {4} (pos (half our))))
If our player 2 has the ball, our player 4 should stay in our half
bowner: ball owner
pos: position
7
Syntax-Based Approaches
 Meaning composition follows the tree structure of a
syntactic parse
 Composing the meaning of a constituent from the
meanings of its sub-constituents in a syntactic parse
 Hand-built approaches (Woods, 1970, Warren and Pereira,
1982)
 Learned approaches
 Miller et al. (1996): Conceptually simple sentences
 Zettlemoyer & Collins (2005)): hand-built Combinatory
Categorial Grammar (CCG) template rules
8
Example
our player 2 has the ball
(S (NP (PRP$ our) (NN player) (CD 2)) (VP (VB has) (NP (DT the) (NN ball))))
MR: bowner(player(our,2))
Use the structure of a syntactic parse
9
Example
our player 2 has the ball
(S (NP (PRP$-our our) (NN-player(_,_) player) (CD-2 2)) (VP (VB-bowner(_) has) (NP (DT-null the) (NN-null ball))))
MR: bowner(player(our,2))
Assign semantic concepts to words
10
Example
our player 2 has the ball
(S (NP-player(our,2) (PRP$-our our) (NN-player(_,_) player) (CD-2 2)) (VP (VB-bowner(_) has) (NP (DT-null the) (NN-null ball))))
MR: bowner(player(our,2))
Compose meaning for the internal nodes
11
Example
our player 2 has the ball
(S (NP-player(our,2) (PRP$-our our) (NN-player(_,_) player) (CD-2 2)) (VP-bowner(_) (VB-bowner(_) has) (NP-null (DT-null the) (NN-null ball))))
MR: bowner(player(our,2))
Compose meaning for the internal nodes
12
Example
our player 2 has the ball
(S-bowner(player(our,2)) (NP-player(our,2) (PRP$-our our) (NN-player(_,_) player) (CD-2 2)) (VP-bowner(_) (VB-bowner(_) has) (NP-null (DT-null the) (NN-null ball))))
MR: bowner(player(our,2))
Compose meaning for the internal nodes
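To make the composition step concrete, here is a minimal sketch (not from the thesis) of the bottom-up process in this example. A node's meaning is either null, a finished MR string, or a predicate still waiting for arguments; the helper names are assumptions for illustration.

```python
# Hedged sketch of the bottom-up meaning composition in the example above.
# A node's meaning is None, a finished MR string, or a function that is still
# waiting for its remaining arguments (an "open" predicate such as bowner(_)).
def fill(open_pred, child_meanings):
    """Apply an open predicate to whichever children carry a meaning;
    with no meaningful children, the predicate stays open."""
    args = [m for m in child_meanings if m is not None]
    return open_pred(*args) if args else open_pred

player = lambda team, unum: f"player({team},{unum})"   # player(_,_)
bowner = lambda p: f"bowner({p})"                      # bowner(_)

np_player = fill(player, ["our", "2"])    # NP-player(our,2)
np_ball   = None                          # NP-null ("the ball")
vp        = fill(bowner, [np_ball])       # VP-bowner(_): still open
s         = fill(vp, [np_player])         # S-bowner(player(our,2))
print(s)                                  # bowner(player(our,2))
```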
13
Semantic Grammars
 Non-terminals in a semantic grammar
correspond to semantic concepts in
application domains
 Hand-built approaches (Hendrix et al., 1978)
 Learned approaches
 Tang & Mooney (2001), Kate & Mooney (2006), Wong &
Mooney (2006)
14
Example
our player 2 has the ball
Semantic-grammar derivation: bowner covers the whole sentence; player covers "our player 2", with our and 2 as its sub-constituents
MR: bowner(player(our,2))
bowner → player has the ball
15
Thesis Contributions
 Introduce two novel syntax-based
approaches to semantic parsing
 Theoretically well-founded in computational
semantics (Blackburn and Bos, 2005)
 Great opportunity: leverage the significant
progress made in statistical syntactic parsing
for semantic parsing (Collins, 1997; Charniak and
Johnson, 2005; Huang, 2008)
16
Thesis Contributions
 SCISSOR: a novel integrated syntactic-
semantic parser
 SYNSEM: exploits an existing syntactic parser
to produce disambiguated parse trees that
drive the meaning composition process
 Investigate when the knowledge of syntax
can help
17
Representing Semantic Knowledge in
Meaning Representation Language Grammar
(MRLG)
Production                        Predicate
CONDITION → (bowner PLAYER)       P_BOWNER
PLAYER → (player TEAM {UNUM})     P_PLAYER
UNUM → 2                          P_UNUM
TEAM → our                        P_OUR
 Assumes a meaning representation language (MRL)
is defined by an unambiguous context-free
grammar.
 Each production rule introduces a single predicate
in the MRL.
 The parse of an MR gives its predicate-argument
structure.
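As a concrete illustration, below is a minimal sketch (assumptions for illustration, not the thesis implementation) of how the productions above and the single predicate each one introduces might be represented, and how walking an MR's parse tree reads off its predicate-argument structure.

```python
# Hedged sketch: MRLG productions paired with the predicate each one introduces.
MRLG = {
    "CONDITION": [("(bowner PLAYER)", "P_BOWNER")],
    "PLAYER":    [("(player TEAM {UNUM})", "P_PLAYER")],
    "UNUM":      [("2", "P_UNUM")],
    "TEAM":      [("our", "P_OUR")],
}

def predicates_of(mr_parse):
    """Walk a parsed MR, given as nested (non_terminal, rhs_used, children)
    tuples, and read off its predicate-argument structure."""
    nonterm, rhs_used, children = mr_parse
    predicate = dict(MRLG[nonterm])[rhs_used]
    return (predicate, [predicates_of(c) for c in children])

# Parse of the MR "(bowner (player our {2}))" under the grammar above:
mr_parse = ("CONDITION", "(bowner PLAYER)",
            [("PLAYER", "(player TEAM {UNUM})",
              [("TEAM", "our", []), ("UNUM", "2", [])])])
print(predicates_of(mr_parse))
# -> ('P_BOWNER', [('P_PLAYER', [('P_OUR', []), ('P_UNUM', [])])])
```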
18
Roadmap
 SCISSOR
 SYNSEM
 Future Work
 Conclusions
19
 Semantic Composition that Integrates Syntax
and Semantics to get Optimal Representations
 Integrated syntactic-semantic parsing
 Allows both syntax and semantics to be used
simultaneously to obtain an accurate combined
syntactic-semantic analysis
 A statistical parser is used to generate a
semantically augmented parse tree (SAPT)
SCISSOR
20
Syntactic Parse
our player 2 has the ball
(S (NP (PRP$ our) (NN player) (CD 2)) (VP (VB has) (NP (DT the) (NN ball))))
21
SAPT
our player 2 has the ball
(S-P_BOWNER (NP-P_PLAYER (PRP$-P_OUR our) (NN-P_PLAYER player) (CD-P_UNUM 2)) (VP-P_BOWNER (VB-P_BOWNER has) (NP-NULL (DT-NULL the) (NN-NULL ball))))
Non-terminals now have both syntactic and semantic labels
Semantic labels: the dominant predicates in the sub-trees
22
SAPT
our player 2 has the ball
(S-P_BOWNER (NP-P_PLAYER (PRP$-P_OUR our) (NN-P_PLAYER player) (CD-P_UNUM 2)) (VP-P_BOWNER (VB-P_BOWNER has) (NP-NULL (DT-NULL the) (NN-NULL ball))))
MR: P_BOWNER(P_PLAYER(P_OUR,P_UNUM))
23
SCISSOR Overview
TRAINING: SAPT Training Examples → learner → Integrated Semantic Parser
24
SCISSOR Overview
TESTING: NL Sentence → Integrated Semantic Parser → SAPT → Compose MR → MR
25
Extending Collins’ (1997) Syntactic
Parsing Model
 Find a SAPT with the maximum probability
 A lexicalized head-driven syntactic parsing
model
 Extending the parsing model to generate
semantic labels simultaneously with syntactic
labels
26
Why Extend Collins’ (1997) Syntactic
Parsing Model
 Suitable for incorporating semantic
knowledge
 Head dependency: predicate-argument relation
 Syntactic subcategorization: a set of arguments
that a predicate appears with
 Bikel (2004) implementation: easily
extendable
27
Parser Implementation
 Supervised training on annotated SAPTs is
just frequency counting
 Testing: a variant of standard CKY chart-
parsing algorithm
 Details in the thesis
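The thesis gives the actual algorithm; purely to illustrate the family of algorithms being adapted, here is a generic Viterbi CKY sketch over binarized rules. In SCISSOR the chart labels would be the combined syntactic-semantic labels and the probabilities would come from the lexicalized head-driven model; none of that detail is shown here, and the grammar format is an assumption.

```python
# Hedged sketch of Viterbi CKY over binarized rules; in SCISSOR the chart labels
# would be combined syntactic-semantic labels learned from SAPTs.
def cky(words, lexical, binary):
    """lexical: {(label, word): prob}; binary: {(parent, left, right): prob}."""
    n = len(words)
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for (label, word), p in lexical.items():
            if word == w and p > chart[i][i + 1].get(label, 0.0):
                chart[i][i + 1][label] = p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            k = i + span
            for j in range(i + 1, k):
                for (parent, left, right), p in binary.items():
                    if left in chart[i][j] and right in chart[j][k]:
                        score = p * chart[i][j][left] * chart[j][k][right]
                        if score > chart[i][k].get(parent, 0.0):
                            chart[i][k][parent] = score
    return chart[0][n]   # best score for each label spanning the whole sentence
```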
28
Smoothing
 Each label in SAPT is the combination of a
syntactic label and a semantic label
 Increases data sparsity
 Break the parameters down
Ph(H | P, w)
= Ph(Hsyn, Hsem | P, w)
= Ph(Hsyn | P, w) × Ph(Hsem | P, w, Hsyn)
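A count-based sketch of this factorization (variable names are assumptions, and the additional back-off smoothing a real model would need is omitted): since supervised training is frequency counting, each factor can be estimated by relative frequency.

```python
from collections import Counter

# Hedged sketch: estimate Ph(Hsyn | P, w) * Ph(Hsem | P, w, Hsyn) from counts
# gathered off annotated SAPTs.
syn_counts, syn_sem_counts, parent_counts = Counter(), Counter(), Counter()

def observe(parent, word, h_syn, h_sem):
    parent_counts[(parent, word)] += 1
    syn_counts[(parent, word, h_syn)] += 1
    syn_sem_counts[(parent, word, h_syn, h_sem)] += 1

def p_head(parent, word, h_syn, h_sem):
    p_syn = syn_counts[(parent, word, h_syn)] / parent_counts[(parent, word)]
    p_sem = syn_sem_counts[(parent, word, h_syn, h_sem)] / syn_counts[(parent, word, h_syn)]
    return p_syn * p_sem

observe("S-P_BOWNER", "has", "VP", "P_BOWNER")
print(p_head("S-P_BOWNER", "has", "VP", "P_BOWNER"))   # 1.0 on this toy count
```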
29
Experimental Corpora
 CLang (Kate, Wong & Mooney, 2005)
 300 pieces of coaching advice
 22.52 words per sentence
 Geoquery (Zelle & Mooney, 1996)
 880 queries on a geography database
 7.48 words per sentence
 MRL: Prolog and FunQL
30
Prolog vs. FunQL (Wong, 2007)
Prolog:
answer(x1, (river(x1), loc(x1,x2), equal(x2,stateid(texas))))
What are the rivers in Texas?
FunQL:
answer(river(loc_2(stateid(texas))))
x1: river; x2: texas
Logical forms: widely used as MRLs in
computational semantics, support reasoning
31
Prolog:
answer(x1, (river(x1), loc(x1,x2), equal(x2,stateid(texas))))
What are the rivers in Texas?
FunQL:
answer(river(loc_2(stateid(texas))))
Prolog: flexible order of conditions
FunQL: strict order
Better generalization on Prolog
Prolog vs. FunQL (Wong, 2007)
32
Experimental Methodology
 Standard 10-fold cross-validation
 Correctness
 CLang: exactly matches the correct MR
 Geoquery: retrieves the same answers as the
correct MR
 Metrics
 Precision: % of the returned MRs that are correct
 Recall: % of NLs with their MRs correctly returned
 F-measure: harmonic mean of precision and recall
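For concreteness, a small sketch of the three metrics; the example counts are made up, chosen only to roughly reproduce one row of the later results tables.

```python
def evaluate(num_correct, num_returned, num_sentences):
    """Precision, recall, and F-measure as defined on this slide."""
    precision = num_correct / num_returned
    recall = num_correct / num_sentences
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# e.g. 221 correct MRs out of 247 returned, over 300 test sentences:
print(evaluate(221, 247, 300))   # roughly (0.895, 0.737, 0.808)
```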
33
Compared Systems
 COCKTAIL (Tang & Mooney, 2001)
 Deterministic, inductive logic programming
 WASP (Wong & Mooney, 2006)
 Semantic grammar, machine translation
 KRISP (Kate & Mooney, 2006)
 Semantic grammar, string kernels
 Z&C (Zettleymoyer & Collins, 2007)
 Syntax-based, combinatory categorial grammar (CCG)
 LU (Lu et al., 2008)
 Semantic grammar, generative parsing model
34
Compared Systems
 COCKTAIL (Tang & Mooney, 2001)
 Deterministic, inductive logic programming
 WASP (Wong & Mooney, 2006)
 Semantic grammar, machine translation
 KRISP (Kate & Mooney, 2006)
 Semantic grammar, string kernels
 Z&C (Zettleymoyer & Collins, 2007)
 Syntax-based, combinatory categorial grammar (CCG)
 LU (Lu et al., 2008)
 Semantic grammar, generative parsing model
Hand-built
lexicon
for Geoquery
Manual CCG
Template rules
35
Compared Systems
 COCKTAIL (Tang & Mooney, 2001)
 Deterministic, inductive logic programming
 WASP (Wong & Mooney, 2006)
 Semantic grammar, machine translation
 KRISP (Kate & Mooney, 2006)
 Semantic grammar, string kernels
 Z&C (Zettleymoyer & Collins, 2007)
 Syntax-based, combinatory categorial grammar (CCG)
 LU (Lu et al., 2008)
 Semantic grammar, generative parsing model
λ-WASP,
handling logical
forms
36
Results on CLang
            Precision  Recall  F-measure
COCKTAIL    -          -       -           (memory overflow)
SCISSOR     89.5       73.7    80.8
WASP        88.9       61.9    73.0
KRISP       85.2       61.9    71.7
Z&C         -          -       -           (not reported)
LU          82.4       57.7    67.8
(LU: F-measure after reranking is 74.4%)
37
Results on CLang
Precision Recall F-measure
SCISSOR 89.5 73.7 80.8
WASP 88.9 61.9 73.0
KRISP 85.2 61.9 71.7
LU 82.4 57.7 67.8
(LU: F-measure after reranking is 74.4%)
38
Results on Geoquery
            Precision  Recall  F-measure
SCISSOR     92.1       72.3    81.0        (FunQL)
WASP        87.2       74.8    80.5        (FunQL)
KRISP       93.3       71.7    81.1        (FunQL)
LU          86.2       81.8    84.0        (FunQL)
COCKTAIL    89.9       79.4    84.3        (Prolog)
λ-WASP      92.0       86.6    89.2        (Prolog)
Z&C         95.5       83.2    88.9        (Prolog)
(LU: F-measure after reranking is 85.2%)
39
Results on Geoquery (FunQL)
Precision Recall F-measure
SCISSOR 92.1 72.3 81.0
WASP 87.2 74.8 80.5
KRISP 93.3 71.7 81.1
LU 86.2 81.8 84.0
(LU: F-measure after reranking is 85.2%)
competitive
40
Why Knowledge of Syntax
does not Help
 Geoquery: 7.48 words per sentence
 Short sentence
 Sentence structure can be feasibly learned from
NLs paired with MRs
 Gain from knowledge of syntax vs.
flexibility loss
41
Limitation of Using Prior
Knowledge of Syntax
What state is the smallest
(traditional parse tree with internal nodes N1 and N2)
answer(smallest(state(all)))
Traditional syntactic analysis
42
Limitation of Using Prior
Knowledge of Syntax
What state is the smallest
answer(smallest(state(all)))
Traditional syntactic analysis vs. semantic grammar (two parse trees with internal nodes N1, N2)
Semantic grammar: isomorphic syntactic structure with the MR
Better generalization
43
Why Prior Knowledge of
Syntax does not Help
 Geoquery: 7.48 words per sentence
 Short sentence
 Sentence structure can be feasibly learned from
NLs paired with MRs
 Gain from knowledge of syntax vs.
flexibility loss
 LU vs. WASP and KRISP
 Decomposed model for semantic grammar
44
Detailed CLang Results on Sentence Length
(bar chart of F-measure by sentence-length bin; bins: 0-10 words (7% of sentences), 11-20 (33%), 21-30 (46%), 31-40 (13%))
45
SCISSOR Summary
 Integrated syntactic-semantic parsing
approach
 Learns accurate semantic interpretations
by utilizing the SAPT annotations
 knowledge of syntax improves performance
on long sentences
46
Roadmap
 SCISSOR
 SYNSEM
 Future Work
 Conclusions
47
SYNSEM Motivation
 SCISSOR requires extra SAPT annotation for
training
 Must learn both syntax and semantics from
same limited training corpus
 High performance syntactic parsers are
available that are trained on existing large
corpora (Collins, 1997; Charniak & Johnson,
2005)
48
SCISSOR Requires SAPT Annotation
our player 2 has the ball
(S-P_BOWNER (NP-P_PLAYER (PRP$-P_OUR our) (NN-P_PLAYER player) (CD-P_UNUM 2)) (VP-P_BOWNER (VB-P_BOWNER has) (NP-NULL (DT-NULL the) (NN-NULL ball))))
Time consuming. Automate it!
49
Part I: Syntactic Parse
our player 2 has the ball
(S (NP (PRP$ our) (NN player) (CD 2)) (VP (VB has) (NP (DT the) (NN ball))))
Use a statistical syntactic parser
50
Part II: Word Meanings
our player 2 has the ball
Word meanings: our → P_OUR, player → P_PLAYER, 2 → P_UNUM, has → P_BOWNER, the → NULL, ball → NULL
Use a word alignment model (Wong and Mooney, 2006)
51
Learning a Semantic Lexicon
 IBM Model 5 word alignment (GIZA++)
 top 5 word/predicate alignments for each training
example
 Assume each word alignment and syntactic parse
defines a possible SAPT for composing the correct
MR
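A minimal sketch of turning the retained word/predicate alignments into a semantic lexicon. Running IBM Model 5 itself is not shown, and the alignment format (a list of (word, predicate) pairs per kept alignment) is an assumption about how GIZA++ output would be post-processed.

```python
from collections import defaultdict

# Hedged sketch: collect candidate word -> predicate pairings from the
# top-5 alignments kept for each training example.
def build_lexicon(alignments):
    """alignments: list of [(word, predicate), ...], one list per kept alignment."""
    lexicon = defaultdict(set)
    for alignment in alignments:
        for word, predicate in alignment:
            if predicate is not None:          # unaligned words are labeled NULL later
                lexicon[word].add(predicate)
    return dict(lexicon)

example = [[("our", "P_OUR"), ("player", "P_PLAYER"), ("2", "P_UNUM"),
            ("has", "P_BOWNER"), ("the", None), ("ball", None)]]
print(build_lexicon(example))
```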
52
Introducing λ variables in semantic labels for missing arguments (a1: the first argument)
our player 2 has the ball
Leaf labels: our → P_OUR, player → λa1λa2P_PLAYER, 2 → P_UNUM, has → λa1P_BOWNER, the → NULL, ball → NULL
(the internal S, NP, VP, NP nodes are not yet semantically labeled)
53
Part III: Internal Semantic Labels
our player 2 has the ball
Leaf labels: our → P_OUR, player → λa1λa2P_PLAYER, 2 → P_UNUM, has → λa1P_BOWNER, the → NULL, ball → NULL
MR predicate-argument structure: P_BOWNER(P_PLAYER(P_OUR, P_UNUM))
How to choose the dominant predicates for the internal nodes?
54
Learning Semantic Composition Rules
player 2
The node over "player 2" has children λa1λa2P_PLAYER and P_UNUM; what is its label?
Rule: λa1λa2P_PLAYER + P_UNUM → {λa1P_PLAYER, a2=c2}
(c2: child 2)
55
Learning Semantic Composition Rules
our player 2 has the ball
The NP over "player 2" gets λa1P_PLAYER; the NP over "our player 2" is still undetermined (?)
Rule used: λa1λa2P_PLAYER + P_UNUM → {λa1P_PLAYER, a2=c2}
56
Learning Semantic Composition Rules
our player 2 has the ball
The NP over "our player 2" has children P_OUR and λa1P_PLAYER; it gets P_PLAYER
Rule: P_OUR + λa1P_PLAYER → {P_PLAYER, a1=c1}
57
Learning Semantic Composition Rules
our player 2 has the ball
The NP over "the ball" gets NULL; the VP gets λa1P_BOWNER; the S node is still undetermined (?)
58
Learning Semantic Composition Rules
our player 2 has the ball
The S node has children P_PLAYER and λa1P_BOWNER; it gets P_BOWNER, completing P_BOWNER(P_PLAYER(P_OUR, P_UNUM))
Rule: P_PLAYER + λa1P_BOWNER → {P_BOWNER, a1=c1}
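A minimal sketch of what applying these extracted rules might look like; the rule and node representations are assumptions for illustration (a1/a2 are argument slots of the head predicate, c1/c2 the children that fill them).

```python
# Hedged sketch of applying learned composition rules such as
#   λa1λa2P_PLAYER + P_UNUM -> {λa1P_PLAYER, a2=c2}
# A node meaning is (predicate, [arg1, arg2, ...]) with None for open slots.
def apply_rule(children, head_index, bindings):
    """bindings maps an argument position of the head child's predicate to the
    index of the child that fills it, e.g. {1: 1} for a2=c2 (0-based here)."""
    predicate, args = children[head_index]
    args = list(args)
    for arg_pos, child_idx in bindings.items():
        args[arg_pos] = children[child_idx]
    return (predicate, args)

player_open = ("P_PLAYER", [None, None])             # λa1λa2P_PLAYER
unum        = ("P_UNUM", [])
our         = ("P_OUR", [])
step1 = apply_rule([player_open, unum], 0, {1: 1})   # a2 = c2 -> λa1P_PLAYER
step2 = apply_rule([our, step1], 1, {0: 0})          # a1 = c1 -> P_PLAYER
print(step2)
# -> ('P_PLAYER', [('P_OUR', []), ('P_UNUM', [])])
```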
59
Ensuring Meaning Composition
What state is the smallest
(traditional parse tree with internal nodes N1, N2)
answer(smallest(state(all)))
Non-isomorphism
60
Ensuring Meaning Composition
 Non-isomorphism between NL parse and MR
parse
 Various linguistic phenomena
 Machine translation between NL and MRL
 Use automated syntactic parses
 Introduce macro-predicates that combine
multiple predicates.
 Ensure that MR can be composed using a
syntactic parse and word alignment
61
SYNSEM Overview
Before training & testing: each training/test sentence S is run through a syntactic parser to obtain its syntactic parse tree T
Training (inputs: the unambiguous CFG of the MRL; training set {(S, T, MR)}):
semantic knowledge acquisition → semantic lexicon & composition rules
parameter estimation → probabilistic parsing model
Testing: input sentence parse T → semantic parsing → output MR
62
SYNSEM Overview
Before training & testing: each training/test sentence S is run through a syntactic parser to obtain its syntactic parse tree T
Training (inputs: the unambiguous CFG of the MRL; training set {(S, T, MR)}):
semantic knowledge acquisition → semantic lexicon & composition rules
parameter estimation → probabilistic parsing model
Testing: input sentence S → semantic parsing → output MR
63
Parameter Estimation
• Apply the learned semantic knowledge to all training
examples to generate possible SAPTs
• Use a standard maximum-entropy model similar to that
of Zettlemoyer & Collins (2005), and Wong & Mooney
(2006)
• Training finds parameters that (approximately)
maximize the conditional log-likelihood of
the training set, including syntactic parses
• Incomplete data since SAPTs are hidden variables
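Schematically, the hidden-variable objective for one example can be sketched as below; derivations_for, features, and yields_correct_mr are placeholders for the real SAPT enumeration, feature map, and correctness check, so this is only an illustration of the form of the objective.

```python
import math

# Hedged sketch of the per-example conditional log-likelihood when the SAPT
# (derivation) is a hidden variable.
def log_likelihood(example, weights, derivations_for, features, yields_correct_mr):
    derivs = list(derivations_for(example))
    def score(d):
        return sum(weights.get(f, 0.0) * v for f, v in features(example, d).items())
    def logsumexp(xs):
        m = max(xs)
        return m + math.log(sum(math.exp(x - m) for x in xs))
    all_scores = [score(d) for d in derivs]
    good_scores = [s for d, s in zip(derivs, all_scores) if yields_correct_mr(example, d)]
    # log p(correct MR | sentence, syntactic parse) = logsumexp(good) - logsumexp(all)
    return logsumexp(good_scores) - logsumexp(all_scores)
```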
64
Features
 Lexical features:
 Unigram features: # that a word is assigned a
predicate
 Bigram features: # that a word is assigned a
predicate given its previous/subsequent word.
 Rule features: # a composition rule applied in
a derivation
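A minimal sketch of counting these features over one derivation, assuming the derivation is available as the per-word predicate assignments plus the list of composition rules applied; the exact feature encoding is an assumption.

```python
from collections import Counter

# Hedged sketch of the unigram, bigram, and rule features for one derivation.
def extract_features(words, word_predicates, rules_applied):
    feats = Counter()
    for i, (w, p) in enumerate(zip(words, word_predicates)):
        feats[("unigram", w, p)] += 1
        prev = words[i - 1] if i > 0 else "<s>"
        feats[("bigram_prev", prev, w, p)] += 1
        nxt = words[i + 1] if i + 1 < len(words) else "</s>"
        feats[("bigram_next", nxt, w, p)] += 1
    for rule in rules_applied:
        feats[("rule", rule)] += 1
    return feats
```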
65
Handling Logical Forms
What are the rivers in Texas?
answer(x1, (river(x1), loc(x1,x2), equal(x2,stateid(texas))))
λv1P_ANSWER(x1)
(λv1P_RIVER(x1), λv1λv2P_LOC(x1,x2), λv1P_EQUAL(x2))
Handle shared logical variables
Use Lambda Calculus (v: variable)
66
Prolog Example
answer(x1, (river(x1), loc(x1,x2), equal(x2,stateid(texas))))
What are the rivers in Texas?
λv1P_ANSWER(x1)
(λv1P_RIVER(x1) λv1 λv2P_LOC(x1,x2) λv1P_EQUAL(x2))
Handle shared logical variables
Use Lambda Calculus (v: variable)
67
Prolog Example
answer(x1, (river(x1), loc(x1,x2), equal(x2,stateid(texas))))
What are the rivers in Texas?
λv1P_ANSWER(x1)
(λv1P_RIVER(x1) λv1λv2P_LOC(x1,x2) λv1P_EQUAL(x2))
Handle shared logical variables
Use Lambda Calculus (v: variable)
68
Prolog Example
What are the rivers in Texas
(SBARQ (WHNP What) (SQ (VBP are) (NP (NP the rivers) (PP (IN in) (NP Texas)))))
answer(x1, (river(x1), loc(x1,x2), equal(x2,stateid(texas))))
Start from a syntactic parse
69
Prolog Example
What are the rivers in Texas
Word predicates: What → λv1λa1P_ANSWER, are/the → NULL, rivers → λv1P_RIVER, in → λv1λv2P_LOC, Texas → λv1P_EQUAL
Add predicates to words
answer(x1, (river(x1), loc(x1,x2), equal(x2,stateid(texas))))
70
Prolog Example
What are the rivers in Texas
Word predicates as above; the node over "in Texas" gets λv1P_LOC
answer(x1, (river(x1), loc(x1,x2), equal(x2,stateid(texas))))
Learn a rule with variable unification:
λv1λv2P_LOC(x1,x2) + λv1P_EQUAL(x2) → λv1P_LOC
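A minimal sketch of the variable unification behind this rule, treating a meaning as a set of λ-bound variables plus conjuncts and renaming the argument's variable to the head's shared variable; the representation is an assumption for illustration, not the thesis machinery.

```python
# Hedged sketch of the unification rule
#   λv1λv2 P_LOC(x1,x2) + λv1 P_EQUAL(x2) -> λv1 P_LOC
# A meaning is (free_vars, conjuncts); combining unifies the shared variable.
def combine(head, arg, head_slot):
    """Unify the argument's first λ-variable with the head's variable named
    head_slot; the result keeps only the head's remaining free variables."""
    head_vars, head_conjs = head
    arg_vars, arg_conjs = arg
    renamed = [c.replace(arg_vars[0], head_slot) for c in arg_conjs]
    remaining = [v for v in head_vars if v != head_slot]
    return (remaining, head_conjs + renamed)

loc   = (["x1", "x2"], ["loc(x1,x2)"])            # λv1λv2 P_LOC(x1,x2)
equal = (["y1"], ["equal(y1,stateid(texas))"])    # λv1 P_EQUAL(x2)
print(combine(loc, equal, "x2"))
# -> (['x1'], ['loc(x1,x2)', 'equal(x2,stateid(texas))'])
```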
71
Experimental Results
 CLang
 Geoquery (Prolog)
72
Syntactic Parsers (Bikel,2004)
 WSJ only
 CLang(SYN0): F-measure=82.15%
 Geoquery(SYN0) : F-measure=76.44%
 WSJ + in-domain sentences
 CLang(SYN20): 20 sentences, F-measure=88.21%
 Geoquery(SYN40): 40 sentences, F-measure=91.46%
 Gold-standard syntactic parses (GOLDSYN)
73
Questions
 Q1. Can SYNSEM produce accurate semantic
interpretations?
 Q2. Can more accurate Treebank syntactic parsers
produce more accurate semantic parsers?
 Q3. Does it also improve on long sentences?
 Q4. Does it improve on limited training data due
to the prior knowledge from large treebanks?
 Q5. Can it handle syntactic errors?
74
Results on CLang
            Precision  Recall  F-measure
GOLDSYN     84.7       74.0    79.0        (SYNSEM)
SYN20       85.4       70.0    76.9        (SYNSEM)
SYN0        87.0       67.0    75.7        (SYNSEM)
SCISSOR     89.5       73.7    80.8        (SAPTs)
WASP        88.9       61.9    73.0
KRISP       85.2       61.9    71.7
LU          82.4       57.7    67.8
(LU: F-measure after reranking is 74.4%)
GOLDSYN > SYN20 > SYN0
75
Questions
 Q1. Can SynSem produce accurate semantic
interpretations? [yes]
 Q2. Can more accurate Treebank syntactic parsers
produce more accurate semantic parsers? [yes]
 Q3. Does it also improve on long sentences?
76
Detailed CLang Results on Sentence Length
(bar chart of F-measure by sentence-length bin; bins: 0-10 words (7% of sentences), 11-20 (33%), 21-30 (46%), 31-40 (13%))
Prior knowledge + flexibility + syntactic error = ?
77
Questions
 Q1. Can SynSem produce accurate semantic
interpretations? [yes]
 Q2. Can more accurate Treebank syntactic parsers
produce more accurate semantic parsers? [yes]
 Q3. Does it also improve on long sentences? [yes]
 Q4. Does it improve on limited training data due
to the prior knowledge from large treebanks?
78
Results on CLang
(training size = 40)
            Precision  Recall  F-measure
GOLDSYN     61.1       35.7    45.1        (SYNSEM)
SYN20       57.8       31.0    40.4        (SYNSEM)
SYN0        53.5       22.7    31.9        (SYNSEM)
SCISSOR     85.0       23.0    36.2        (SAPTs)
WASP        88.0       14.4    24.7
KRISP       68.35      20.0    31.0
The quality of the syntactic parser is critically important!
79
Questions
 Q1. Can SynSem produce accurate semantic
interpretations? [yes]
 Q2. Can more accurate Treebank syntactic parsers
produce more accurate semantic parsers? [yes]
 Q3. Does it also improve on long sentences? [yes]
 Q4. Does it improve on limited training data due
to the prior knowledge from large treebanks? [yes]
 Q5. Can it handle syntactic errors?
80
Handling Syntactic Errors
 Training ensures meaning composition from
syntactic parses with errors
 For test NLs that generate correct MRs, measure
the F-measures of their syntactic parses
 SYN0: 85.5%
 SYN20: 91.2%
If DR2C7 is true
then players 2 , 3 , 7 and 8 should pass to player 4
81
Questions
 Q1. Can SynSem produce accurate semantic
interpretations? [yes]
 Q2. Can more accurate Treebank syntactic parsers
produce more accurate semantic parsers? [yes]
 Q3. Does it also improve on long sentences? [yes]
 Q4. Does it improve on limited training data due
to the prior knowledge of large treebanks? [yes]
 Q5. Is it robust to syntactic errors? [yes]
82
Results on Geoquery (Prolog)
            Precision  Recall  F-measure
GOLDSYN     91.9       88.2    90.0        (SYNSEM)
SYN40       90.2       86.9    88.5        (SYNSEM)
SYN0        81.8       79.0    80.4        (SYNSEM)
COCKTAIL    89.9       79.4    84.3
λ-WASP      92.0       86.6    89.2
Z&C         95.5       83.2    88.9
SYN0 does not perform well
All other recent systems perform competitively
83
SYNSEM Summary
 Exploits an existing syntactic parser to drive
the meaning composition process
 Prior knowledge of syntax improves
performance on long sentences
 Prior knowledge of syntax improves
performance on limited training data
 Handles syntactic errors
84
Discriminative Reranking for
Semantic Parsing
 Adapt global features used for reranking
syntactic parsing for semantic parsing
 Improvement on CLang
 No improvement on Geoquery, where sentences
are short and global features are less likely
to help
85
Roadmap
 SCISSOR
 SYNSEM
 Future Work
 Conclusions
86
Future Work
 Improve SCISSOR
 Discriminative SCISSOR (Finkel, et al., 2008)
 Handling logical forms
 SCISSOR without extra annotation (Klein and
Manning, 2002, 2004)
 Improve SYNSEM
 Utilizing syntactic parsers with improved accuracy
and in other syntactic formalisms
87
Future Work
 Utilizing wide-coverage semantic
representations (Curran et al., 2007)
 Better generalizations for syntactic variations
 Utilizing semantic role labeling (Gildea and
Palmer, 2002)
 Provides a layer of correlated semantic
information
88
Roadmap
 SCISSOR
 SYNSEM
 Future Work
 Conclusions
89
Conclusions
 SCISSOR: a novel integrated syntactic-semantic
parser.
 SYNSEM: exploits an existing syntactic parser to
produce disambiguated parse trees that drive the
meaning composition process.
 Both produce accurate semantic interpretations.
 Using the knowledge of syntax improves
performance on long sentences.
 SYNSEM also improves performance on limited
training data.
90
Thank you!
 Questions?