SPADE: Evaluation Dataset for Monolingual Phrase Alignment

Yuki Arase*† and Junichi Tsujii†◊
*Osaka University, Japan
†Artificial Intelligence Research Center (AIRC), AIST, Japan
◊NaCTeM, School of Computer Science, University of Manchester, UK

Created and released a dataset annotating phrase alignments on parse trees of paraphrases
[Figure: parse trees of the paraphrase pair "her life is excellent and wonderful…" and "she also has a very splendid… life" (node labels S, VP, NP, ADJP, COOD), with phrase alignments marked by Annotators #1, #2, and #3]
15,721 alignments

https://catalog.ldc.upenn.edu/LDC2018T09

Phrasal (N-gram) Paraphrases
♠Phrasal paraphrases of N-grams have been useful for NLP applications
• Semantic parsing (Berant and Liang, 2014)
• Automatic QA (Dong et al., 2017)
♠PPDB (Ganitkevitch et al., 2013) is widely used as an abundant resource

Are N-grams Sufficient?
♠Syntactic structures are important in modeling phrases/sentences
• Semantic relatedness (Tai et al., 2015)
• Phrase embedding (Wieting et al., 2015)
♠Part of PPDB provides phrasal paraphrases under a synchronous context-free grammar (SCFG)
♠SCFG captures only a fraction of paraphrasing phenomena (Weese et al., 2014)
• Only 9.1% of paraphrases were reachable using SCFG

Phrase Alignment on Paraphrases
♠Phrasal paraphrases under a linguistically motivated grammar would deliver richer syntactic information
♠For systematic research,
• SPADE annotates phrase alignments under the head-driven phrase structure grammar (Pollard and Sag, 1994)
• Evaluation metrics are proposed for benchmarking

Annotation Target
Paraphrases extracted from MT evaluation corpora
♠Paraphrases by linguistic operations
♠Paraphrases with simple summarization
Relying on team spirit, expedition members defeated difficulties.
Members of the scientific team overcame challenges living on Mars through teamwork.

Approach
1. Gold-tree annotation by a linguistic expert
2. Phrase alignment annotation
• Three annotators independently identified phrase alignments using a provided annotation tool
• Annotators referred to the tree structures when helpful

Gold-Tree Annotation
[Figure: gold parse trees of a sentence pair; node labels S, VP, NP, ADJP, COOD]

Phrase Alignment Annotation
[Figure: phrase alignments annotated between the two parse trees; node labels S, VP, NP, ADJP, COOD]

SPADE Statistics
                               Dev      Test
# of sentence pairs             50       151
# of tokens                  2,494     7,276
# of types                     736     1,573
# of phrases (w/o tokens)    5,201    15,075
# of alignments (∪)          3,932    11,789
# of alignments (∩)          2,518     7,134
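As a quick arithmetic check on the table above, the sketch below sums the Dev and Test columns. The union (∪) total comes to 15,721, which matches the "15,721 alignments" figure quoted on the overview slide. The dictionary keys and the `intersection_share` helper are illustrative names, not part of the dataset, and the ratio it computes is only a rough indication of annotator overlap, not a metric defined in the slides. Types are left out of the totals because the two splits may share vocabulary.

```python
# Counts copied from the SPADE Statistics table as (Dev, Test).
stats = {
    "sentence pairs": (50, 151),
    "tokens": (2494, 7276),
    "phrases (w/o tokens)": (5201, 15075),
    "alignments (union)": (3932, 11789),
    "alignments (intersection)": (2518, 7134),
}

# Dev + Test totals for every additive row.
totals = {name: dev + test for name, (dev, test) in stats.items()}


def intersection_share(split_index: int) -> float:
    """Share of union alignments that are also in the intersection.

    split_index: 0 for Dev, 1 for Test (hypothetical helper for illustration).
    """
    union = stats["alignments (union)"][split_index]
    inter = stats["alignments (intersection)"][split_index]
    return inter / union


for name, total in totals.items():
    print(f"{name}: {total}")
print(f"Dev intersection share:  {intersection_share(0):.3f}")
print(f"Test intersection share: {intersection_share(1):.3f}")
```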

Evaluation Metric
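♠ALIR (ALIgnment Recall) evaluates how well gold alignments ($\mathbb{G}$ and $\mathbb{G}'$) can be replicated by automatic alignments ($\mathbb{H}_a$)
$$\mathrm{ALIR} = \frac{\left|\{\, h \mid h \in \mathbb{H}_a \wedge h \in \mathbb{G} \cap \mathbb{G}' \,\}\right|}{\left|\mathbb{G} \cap \mathbb{G}'\right|}$$
♠ALIP (ALIgnment Precision) evaluates how much automatic alignments overlap with alignments that at least one annotator aligned
$$\mathrm{ALIP} = \frac{\left|\{\, h \mid h \in \mathbb{H}_a \wedge h \in \mathbb{G} \cup \mathbb{G}' \,\}\right|}{\left|\mathbb{H}_a\right|}$$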
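The two metrics reduce to set operations, which the following minimal sketch illustrates. It assumes each alignment can be represented as a hashable pair of phrase identifiers. The `(start, end)` token-span encoding, the function names `alir` and `alip`, and the toy data are assumptions made for illustration; they are not the format of the released dataset. Returning 0.0 for an empty denominator is also a choice made here, not something the slides specify.

```python
from typing import Set, Tuple

# Hypothetical encoding: a phrase is a (start, end) token span, and an
# alignment pairs a phrase in one sentence with a phrase in the other.
Span = Tuple[int, int]
Alignment = Tuple[Span, Span]


def alir(h_a: Set[Alignment], g: Set[Alignment], g_prime: Set[Alignment]) -> float:
    """ALIR: fraction of alignments in G ∩ G' that the system reproduced."""
    agreed = g & g_prime
    if not agreed:
        return 0.0  # assumption: define recall as 0 when nothing is agreed
    return len(h_a & agreed) / len(agreed)


def alip(h_a: Set[Alignment], g: Set[Alignment], g_prime: Set[Alignment]) -> float:
    """ALIP: fraction of system alignments found in G ∪ G'."""
    either = g | g_prime
    if not h_a:
        return 0.0  # assumption: define precision as 0 for empty output
    return len(h_a & either) / len(h_a)


# Toy alignments (made up for this example).
a1: Alignment = ((0, 2), (0, 1))
a2: Alignment = ((2, 5), (1, 4))
a3: Alignment = ((0, 5), (0, 4))
a4: Alignment = ((3, 5), (2, 4))
a5: Alignment = ((1, 2), (3, 4))

gold = {a1, a2, a3}        # annotator G
gold_prime = {a2, a3, a4}  # annotator G'
system = {a2, a4, a5}      # automatic alignments H_a

print(alir(system, gold, gold_prime))  # 0.5: one of the two agreed alignments
print(alip(system, gold, gold_prime))  # 2 of 3 system alignments are in G ∪ G'
```

In this toy case the system recovers only `a2` from the agreed set {`a2`, `a3`}, so ALIR is 0.5, while `a2` and `a4` both appear in the union, so ALIP is 2/3. Scoring recall against the intersection and precision against the union means a system is asked to recover what annotators agree on, and is not penalized for producing an alignment that any single annotator accepted.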

Benchmark
[Figure: bar chart of ALIR and ALIP (%) for Human and (Arase and Tsujii, 2017); data labels as extracted: 90.65, 88.21, 83.64, 78.91; y-axis from 70 to 95]
Y. Arase and J. Tsujii. 2017. Monolingual Phrase Alignment on Parse Forests, in Proc. of EMNLP, pp. 1-11.

Future Directions
Expand the dataset
1. Size
• Working on annotating 5k more paraphrase pairs
2. Linguistic phenomena in paraphrases
• SPADE used reference translations as paraphrases
• It covers relatively simple paraphrases because of constraints imposed by the source sentences

Future Directions (Cont’d)
2. Linguistic phenomena in paraphrases
• Annotate paraphrases from other datasets
• Microsoft Research Paraphrase Corpus (Dolan et al., 2004)
• Twitter URL corpus (Lan et al., 2017)
• Cover diverse linguistic phenomena of paraphrases in the wild
Ex) Paraphrases involving inferences/entailments
Scientists overcame challenges living on Mars.
Scientists overcame water and oxygen scarcity on the red planet.
