A Transition-Based Directed Acyclic Graph Parser
for Universal Conceptual Cognitive Annotation
Daniel Hershcovich, Omri Abend and Ari Rappoport
ACL 2017
TUPA — Transition-based UCCA Parser
The first parser to support the combination of three properties:
1. Non-terminal nodes — entities and events over the text
2. Reentrancy — allows argument sharing
3. Discontinuity — conceptual units may be split
— needed for many semantic schemes (e.g. AMR, UCCA).

[Example: UCCA graph of "You want to take a long bath"]
Introduction
Linguistic Structure Annotation Schemes
• Syntactic dependencies
• Semantic dependencies (Oepen et al., 2016)
• Semantic role labeling (PropBank, FrameNet)
• AMR (Banarescu et al., 2013)
• UCCA (Abend and Rappoport, 2013)
• Other semantic representation schemes¹

[Figure: bilexical dependency graphs for "You want to take a long bath" —
syntactic (UD), with arcs such as nsubj, xcomp, mark, dobj, det, amod,
and semantic (DM), with arcs such as top, ARG1, ARG2, BV]

Semantic representation schemes attempt to abstract away from
syntactic detail that does not affect meaning:
. . . bathed = . . . took a bath

¹ See the recent survey (Abend and Rappoport, 2017).
The UCCA Semantic Representation Scheme
Universal Conceptual Cognitive Annotation (UCCA)
Cross-linguistically applicable (Abend and Rappoport, 2013).
Stable in translation (Sulem et al., 2015).
[Figure: parallel UCCA graphs of an English sentence and its Hebrew translation]
Universal Conceptual Cognitive Annotation (UCCA)
Rapid and intuitive annotation interface (Abend et al., 2017).
Usable by non-experts. ucca-demo.cs.huji.ac.il
Facilitates semantics-based human evaluation of machine
translation (Birch et al., 2016). ucca.cs.huji.ac.il/mteval
Graph Structure
UCCA generates a directed acyclic graph (DAG).
Text tokens are terminals, complex units are non-terminal nodes.
Remote edges enable reentrancy for argument sharing.
Phrases may be discontinuous (e.g., multi-word expressions).
[Figure: UCCA graph of "You want to take a long bath"; solid lines mark
primary edges, dashed lines mark remote edges]
Edge labels: P process, A participant, C center, D adverbial, F function.
Transition-based UCCA Parsing
Transition-Based Parsing
First used for dependency parsing (Nivre, 2004).
Parse text w1 ... wn into a graph G incrementally by applying
transitions to the parser state: stack, buffer and constructed graph.
Initial state:
stack: (empty)    buffer: You want to take a long bath
TUPA transitions:
{Shift, Reduce, Node_X, Left-Edge_X, Right-Edge_X,
Left-Remote_X, Right-Remote_X, Swap, Finish}
These support non-terminal nodes, reentrancy and discontinuity.
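A minimal sketch of the state and transition semantics in Python. This is an illustration of the mechanics described above, not the actual TUPA implementation (see github.com/danielhers/tupa); the names Node, State and apply are hypothetical.

from dataclasses import dataclass, field

@dataclass
class Node:
    text: str = None                              # terminal token, or None for a non-terminal
    outgoing: list = field(default_factory=list)  # (label, child, is_remote) triples

@dataclass
class State:
    stack: list
    buffer: list
    finished: bool = False

def apply(state, transition, label=None):
    # Apply one transition (simplified: no validity checks, no root node).
    if transition == "Shift":              # move the buffer head to the stack
        state.stack.append(state.buffer.pop(0))
    elif transition == "Reduce":           # discard the stack top
        state.stack.pop()
    elif transition == "Node":             # new non-terminal parent of the stack top,
        parent = Node()                    # placed at the head of the buffer
        parent.outgoing.append((label, state.stack[-1], False))
        state.buffer.insert(0, parent)
    elif transition == "Right-Edge":       # primary edge stack[-2] -> stack[-1]
        state.stack[-2].outgoing.append((label, state.stack[-1], False))
    elif transition == "Left-Edge":        # primary edge stack[-1] -> stack[-2]
        state.stack[-1].outgoing.append((label, state.stack[-2], False))
    elif transition == "Right-Remote":     # remote edge stack[-2] -> stack[-1]
        state.stack[-2].outgoing.append((label, state.stack[-1], True))
    elif transition == "Left-Remote":      # remote edge stack[-1] -> stack[-2]
        state.stack[-1].outgoing.append((label, state.stack[-2], True))
    elif transition == "Swap":             # send stack[-2] back to the buffer
        state.buffer.insert(0, state.stack.pop(-2))
    elif transition == "Finish":           # terminate, mark the graph complete
        state.finished = True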
Example
Parsing "You want to take a long bath", step by step. Each line shows the
transition applied and the terminal tokens visible on the stack and buffer
afterwards; non-terminal nodes created by Node transitions also occupy
stack and buffer positions, and the graph is constructed incrementally
(the complete graph appears on the Training slide below).

⇒ Shift            stack: You         buffer: want to take a long bath
⇒ Right-Edge_A     stack: You         buffer: want to take a long bath
⇒ Shift            stack: You want    buffer: to take a long bath
⇒ Swap             stack: want        buffer: You to take a long bath
⇒ Right-Edge_P     stack: want        buffer: You to take a long bath
⇒ Reduce           stack:             buffer: You to take a long bath
⇒ Shift            stack: You         buffer: to take a long bath
⇒ Shift            stack: You to      buffer: take a long bath
⇒ Node_F           stack: You to      buffer: take a long bath
⇒ Reduce           stack: You         buffer: take a long bath
⇒ Shift            stack: You         buffer: take a long bath
⇒ Shift            stack: You take    buffer: a long bath
⇒ Node_C           stack: You take    buffer: a long bath
⇒ Reduce           stack: You         buffer: a long bath
⇒ Shift            stack: You         buffer: a long bath
⇒ Right-Edge_P     stack: You         buffer: a long bath
⇒ Shift            stack: You a       buffer: long bath
⇒ Right-Edge_F     stack: You a       buffer: long bath
⇒ Reduce           stack: You         buffer: long bath
⇒ Shift            stack: You long    buffer: bath
⇒ Swap             stack: You long    buffer: bath
⇒ Right-Edge_D     stack: You long    buffer: bath
⇒ Reduce           stack: You         buffer: bath
⇒ Swap             stack: You         buffer: bath
⇒ Right-Edge_A     stack: You         buffer: bath
⇒ Reduce           stack: You         buffer: bath
⇒ Reduce           stack: You         buffer: bath
⇒ Shift            stack: You         buffer: bath
⇒ Shift            stack: You         buffer: bath
⇒ Left-Remote_A    stack: You         buffer: bath
⇒ Shift            stack: You bath    buffer:
⇒ Right-Edge_C     stack: You bath    buffer:
⇒ Finish
Training
An oracle provides the transition sequence given the correct graph:

[Gold UCCA graph of "You want to take a long bath", as on the Graph Structure slide]
⇓
Shift, Right-Edge_A, Shift, Swap, Right-Edge_P, Reduce, Shift,
Shift, Node_F, Reduce, Shift, Shift, Node_C, Reduce, Shift,
Right-Edge_P, Shift, Right-Edge_F, Reduce, Shift, Swap,
Right-Edge_D, Reduce, Swap, Right-Edge_A, Reduce, Reduce, Shift,
Shift, Left-Remote_A, Shift, Right-Edge_C, Finish
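Sketched as a training loop, assuming a static oracle that is followed at every step; apply is the function from the transition sketch above, while initial_state, features and oracle are hypothetical helpers, not the actual TUPA API.

def train_example(classifier, tokens, gold_graph, oracle):
    state = initial_state(tokens)                  # empty stack, all tokens on the buffer
    while not state.finished:
        gold = oracle(state, gold_graph)           # correct (transition, label) for this state
        classifier.update(features(state), gold)   # supervised update toward the oracle
        apply(state, *gold)                        # follow the oracle transition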
TUPA Model
Learn to greedily predict the next transition based on the current state.
Experimenting with three classifiers:
Sparse: Perceptron with sparse features (Zhang and Nivre, 2011).
MLP: Embeddings + feedforward NN (Chen and Manning, 2014).
BiLSTM: Embeddings + deep bidirectional LSTM + MLP
(Kiperwasser and Goldberg, 2016).
Features: words, POS, syntactic dependencies and existing edge labels
from the stack and buffer + parents, children, grandchildren;
ordinal features (height, number of parents and children).
The BiLSTM encodes effective "lookahead" in the representation.

[Figure: a deep bidirectional LSTM runs over "You want to take a long
bath"; the encodings of the current stack and buffer elements feed an MLP
that scores the next transition. In the state shown, with "You take" on
the stack and "a long bath" on the buffer, the prediction is Node_C]
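An illustrative sketch of the BiLSTM variant, written in PyTorch rather than the paper's own implementation: encode the sentence once, then at each step concatenate the encodings of a few stack/buffer positions and score all transitions with an MLP. The dimensions and the choice of four feature positions are assumptions made for the sketch.

import torch
import torch.nn as nn

class BiLSTMTransitionClassifier(nn.Module):
    def __init__(self, vocab_size, n_transitions, d_emb=100, d_lstm=200, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_emb)
        self.encoder = nn.LSTM(d_emb, d_lstm, num_layers=layers,
                               bidirectional=True, batch_first=True)
        # Four feature positions (e.g. two stack tops, two buffer heads),
        # each represented by a 2*d_lstm bidirectional encoding.
        self.mlp = nn.Sequential(nn.Linear(4 * 2 * d_lstm, 200),
                                 nn.ReLU(),
                                 nn.Linear(200, n_transitions))

    def forward(self, token_ids, feature_positions):
        # token_ids: (1, n) tensor of word indices for the whole sentence.
        encoded, _ = self.encoder(self.embed(token_ids))  # (1, n, 2*d_lstm)
        features = torch.cat([encoded[0, i] for i in feature_positions])
        return self.mlp(features)                         # scores per transition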
Experiments
Experimental Setup
• UCCA Wikipedia corpus (train: 4268 + dev: 454 + test: 503 sentences).
• Out-of-domain: English part of English-French parallel corpus,
Twenty Thousand Leagues Under the Sea (506 sentences).
Baselines
No existing UCCA parsers ⇒ conversion-based approximation.
Bilexical DAG parsers (allow reentrancy):
• DAGParser (Ribeyre et al., 2014): transition-based.
• TurboParser (Almeida and Martins, 2015): graph-based.
Tree parsers (all transition-based):
• MaltParser (Nivre et al., 2007): bilexical tree parser.
• Stack LSTM Parser (Dyer et al., 2015): bilexical tree parser.
• uparse (Maier, 2015): allows non-terminals, discontinuity.
[Figure: bilexical DAG approximation of the UCCA graph for "You want to
take a long bath", with edge labels A, F, D and C]
UCCA bilexical DAG approximation (for trees, delete remote edges).
Bilexical Graph Approximation
1. Convert UCCA to bilexical dependencies.
2. Train bilexical parsers and apply to test sentences.
3. Reconstruct UCCA graphs and compare with gold standard.
[Figure: UCCA graph and its converted bilexical dependency graph for
"After graduation, Joe moved to Paris", with labels L, U, A, H, R, P, C]
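For illustration, a heavily hedged sketch of step 1. The paper defines its own head rules for the conversion; here a unit's head is simply the head of its first center (C) child when one exists, which is one plausible choice, not the paper's exact procedure. Node is the structure from the parsing sketch above.

def head_terminal(node):
    # A terminal is its own head; otherwise prefer a C (center) child.
    if node.text is not None:
        return node
    children = sorted(node.outgoing, key=lambda e: e[0] != "C")
    return head_terminal(children[0][1])

def to_bilexical(root):
    # Yield (head_token, dependent_token, label) triples for primary edges.
    for label, child, remote in root.outgoing:
        if not remote:
            h, d = head_terminal(root), head_terminal(child)
            if h is not d:
                yield (h.text, d.text, label)
            yield from to_bilexical(child)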
Evaluation
Comparing graphs over the same sequence of tokens:
• Match edges by their terminal yield and label.
• Calculate labeled precision, recall and F1 scores.
• Separate primary and remote edges.
[Figure: gold and predicted UCCA graphs for "After graduation, Joe moved
to Paris"; the prediction differs from gold in several edge labels (e.g.,
S instead of P on "graduation", F instead of R on "to") and in one extra
remote edge]

Primary:  LP = 6/9 = 67%   LR = 6/10 = 60%   LF = 64%
Remote:   LP = 1/2 = 50%   LR = 1/1 = 100%   LF = 67%
Results
TUPA_BiLSTM obtains the highest F-scores in all metrics:

                   Primary edges          Remote edges
                  LP    LR    LF         LP    LR    LF
TUPA_Sparse      64.5  63.7  64.1       19.8  13.4  16
TUPA_MLP         65.2  64.6  64.9       23.7  13.2  16.9
TUPA_BiLSTM      74.4  72.7  73.5       47.4  51.6  49.4
Bilexical DAG    (conversion upper bound: 91)   (58.3)
DAGParser        61.8  55.8  58.6        9.5   0.5   1
TurboParser      57.7  46    51.2       77.8   1.8   3.7
Bilexical tree   (conversion upper bound: 91)   –
MaltParser       62.8  57.7  60.2         –     –     –
Stack LSTM       73.2  66.9  69.9         –     –     –
Tree             (conversion upper bound: 100)  –
uparse           60.9  61.2  61.1         –     –     –

Results on the Wiki test set.
Results
Comparable results on the out-of-domain test set:

                   Primary edges          Remote edges
                  LP    LR    LF         LP    LR    LF
TUPA_Sparse      59.6  59.9  59.8       22.2   7.7  11.5
TUPA_MLP         62.3  62.6  62.5       20.9   6.3   9.7
TUPA_BiLSTM      68.7  68.5  68.6       38.6  18.8  25.3
Bilexical DAG    (conversion upper bound: 91.3)  (43.4)
DAGParser        56.4  50.6  53.4         –    0     0
TurboParser      50.3  37.7  43.1       100    0.4   0.8
Bilexical tree   (conversion upper bound: 91.3)  –
MaltParser       57.8  53    55.3         –     –     –
Stack LSTM       66.1  61.1  63.5         –     –     –
Tree             (conversion upper bound: 100)   –
uparse           52.7  52.8  52.8         –     –     –

Results on the 20K Leagues out-of-domain set.
Conclusion
• UCCA’s semantic distinctions require a graph structure
including non-terminals, reentrancy and discontinuity.
• TUPA is an accurate transition-based UCCA parser, and the
first to support UCCA and any DAG over the text tokens.
• Outperforms strong conversion-based baselines.
Future Work:
• More languages (German corpus construction is underway).
• Parsing other schemes, such as AMR.
• Compare semantic representations through conversion.
• Text simplification, MT evaluation and other applications.
Code: github.com/danielhers/tupa
Demo: bit.ly/tupademo
Corpora: cs.huji.ac.il/~oabend/ucca.html
Thank you!
References I
Abend, O. and Rappoport, A. (2013).
Universal Conceptual Cognitive Annotation (UCCA).
In Proc. of ACL, pages 228–238.
Abend, O. and Rappoport, A. (2017).
The state of the art in semantic representation.
In Proc. of ACL.
to appear.
Abend, O., Yerushalmi, S., and Rappoport, A. (2017).
UCCAApp: Web-application for syntactic and semantic phrase-based annotation.
In Proc. of ACL: System Demonstration Papers.
to appear.
Almeida, M. S. C. and Martins, A. F. T. (2015).
Lisbon: Evaluating TurboSemanticParser on multiple languages and out-of-domain data.
In Proc. of SemEval, pages 970–973.
Banarescu, L., Bonial, C., Cai, S., Georgescu, M., Griffitt, K., Hermjakob, U., Knight, K., Palmer, M., and
Schneider, N. (2013).
Abstract Meaning Representation for sembanking.
In Proc. of the Linguistic Annotation Workshop.
Birch, A., Abend, O., Bojar, O., and Haddow, B. (2016).
HUME: Human UCCA-based evaluation of machine translation.
In Proc. of EMNLP, pages 1264–1274.
Chen, D. and Manning, C. (2014).
A fast and accurate dependency parser using neural networks.
In Proc. of EMNLP, pages 740–750.
References II
Dyer, C., Ballesteros, M., Ling, W., Matthews, A., and Smith, N. A. (2015).
Transition-based dependency parsing with stack long short-term memory.
In Proc. of ACL, pages 334–343.
Kiperwasser, E. and Goldberg, Y. (2016).
Simple and accurate dependency parsing using bidirectional LSTM feature representations.
TACL, 4:313–327.
Maier, W. (2015).
Discontinuous incremental shift-reduce parsing.
In Proc. of ACL, pages 1202–1212.
Nivre, J. (2004).
Incrementality in deterministic dependency parsing.
In Keller, F., Clark, S., Crocker, M., and Steedman, M., editors, Proceedings of the ACL Workshop
Incremental Parsing: Bringing Engineering and Cognition Together, pages 50–57, Barcelona, Spain.
Association for Computational Linguistics.
Nivre, J., Hall, J., Nilsson, J., Chanev, A., Eryiğit, G., Kübler, S., Marinov, S., and Marsi, E. (2007).
MaltParser: A language-independent system for data-driven dependency parsing.
Natural Language Engineering, 13(02):95–135.
Oepen, S., Kuhlmann, M., Miyao, Y., Zeman, D., Cinková, S., Flickinger, D., Hajič, J., Ivanova, A., and Urešová, Z. (2016).
Towards comparability of linguistic graph banks for semantic parsing.
In LREC.
Ribeyre, C., Villemonte de la Clergerie, E., and Seddah, D. (2014).
Alpage: Transition-based semantic graph parsing with syntactic features.
In Proc. of SemEval, pages 97–103.
References III
Sulem, E., Abend, O., and Rappoport, A. (2015).
Conceptual annotations preserve structure across translations: A French-English case study.
In Proc. of S2MT, pages 11–22.
Zhang, Y. and Nivre, J. (2011).
Transition-based dependency parsing with rich non-local features.
In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human
Language Technologies, pages 188–193.
Backup
UCCA Corpora
                      Wiki                      20K
                 Train     Dev     Test      Leagues
# passages         300       34      33         154
# sentences       4268      454     503         506
# nodes        298,993   33,704  35,718      29,315
% terminal       42.96    43.54   42.87       42.09
% non-term.      58.33    57.60   58.35       60.01
% discont.        0.54     0.53    0.44        0.81
% reentrant       2.38     1.88    2.15        2.03
# edges        287,914   32,460  34,336      27,749
% primary        98.25    98.75   98.74       97.73
% remote          1.75     1.25    1.26        2.27
Average per non-terminal node:
# children        1.67     1.68    1.66        1.61

Corpus statistics.
Evaluation
Mutual edges between predicted graph $G_p = (V_p, E_p, \ell_p)$ and gold
graph $G_g = (V_g, E_g, \ell_g)$, both over terminals $W = \{w_1, \ldots, w_n\}$:

$$M(G_p, G_g) = \{(e_1, e_2) \in E_p \times E_g \mid y(e_1) = y(e_2) \wedge \ell_p(e_1) = \ell_g(e_2)\}$$

The yield $y(e) \subseteq W$ of an edge $e = (u, v)$ in either graph is the set
of terminals in $W$ that are descendants of $v$; $\ell$ is the edge label.
Labeled precision, recall and F-score are then defined as:

$$LP = \frac{|M(G_p, G_g)|}{|E_p|}, \qquad LR = \frac{|M(G_p, G_g)|}{|E_g|}, \qquad LF = \frac{2 \cdot LP \cdot LR}{LP + LR}.$$

Two variants: one for primary edges, and another for remote edges.
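These definitions translate directly into a few lines of code. A minimal sketch, assuming edges are represented as (yield, label) pairs with the yield given as a frozenset of terminal positions; this is a hypothetical representation, not the official evaluation script.

def labeled_scores(predicted, gold):
    # predicted, gold: sets of (frozenset_of_terminal_indices, label) pairs.
    mutual = predicted & gold                      # same yield and same label
    lp = len(mutual) / len(predicted) if predicted else 0.0
    lr = len(mutual) / len(gold) if gold else 0.0
    lf = 2 * lp * lr / (lp + lr) if lp + lr else 0.0
    return lp, lr, lf

# Computed separately over primary and over remote edges, as on the
# Evaluation slide: e.g. 6 matching of 9 predicted and 10 gold primary
# edges gives LP = 67%, LR = 60%.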