Do Neural Models Learn Transitivity of
Veridical Inference?
Hitomi Yanaka1,2
Koji Mineshima3
Kentaro Inui4,2
1 University of Tokyo, 2 RIKEN, 3 Keio University, 4 Tohoku University
NALOMA2021@Online
1
Generalization concerns about neural models
• Deep neural network models (BERT [Devlin+,2018]) pretrained with
large-scale data have achieved high performance in language
understanding benchmark tasks (GLUE, SuperGLUE [Wang+, 2019]).
• However, many recent analyses [Liu+, 2019][McCoy+, 2020] show
that high performance on standard benchmarks does not always
mean that a model has the intended ability to understand
language (“understanding language like humans”).
2
1.Introduction
Systematicity (Fodor & Pylyshyn, 1988)
Systematicity of language/thought:
● If you understand “John loves the girl”, then you must also
understand “The girl loves John”.
Systematicity of inference:
● If you infer A from A&B, then you must also infer A&B from
A&B&C, etc.
3
1.Introduction
Question
To what extent can neural models learn the systematicity of
inference from training instances?
Systematicity in NLI
● Goal: Study the systematic generalization ability of neural
models on Natural Language Inference (NLI) [Dagan+, 2013].
● NLI: the task of judging whether a premise P entails a hypothesis H.
4
P: John knew that there was a wild deer jumping a fence
H: There was a deer jumping over a fence Entailment
1.Introduction
Related work on analyzing whether neural models can learn
systematicity
● Monotonicity inference involving quantifiers and negation
[Goodwin+2020][Geiger+ 2020][Yanaka+ 2020]
● Semantic parsing task
artificial language: SCAN[Lake and Baroni 2017]
natural language: COGS[Kim and Linzen 2020], SyGNS[Yanaka+ 2021]
● Inductive reasoning task
CLUTRR[Sinha+ 2019]
Related work
5
1.Introduction
Transitivity: a key challenge for systematicity of NLI
● If you infer B from A and C from B, then you must also be able to
infer C from A.
A → B    B → C
――――――――――――――
     A → C
● Syllogism/Cut Rule (in modern proof theory)
● Meta-logical inference ability:
○ The challenge is not to perform/learn a single pattern of inference
but to combine multiple patterns of inference.
○ Given that the sentence pairs (A, B) and (B, C) are labeled entailment, you
should also be able to judge that (A, C) is entailment.
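As a minimal formal illustration (mine, not from the slides), the cut rule can be written and machine-checked in Lean when entailment is modeled as implication between propositions:

```lean
-- Illustration only: entailment modeled as implication between propositions.
-- Cut rule: from A → B and B → C, conclude A → C.
theorem cut {A B C : Prop} (hab : A → B) (hbc : B → C) : A → C :=
  fun a => hbc (hab a)
```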
6
1.Introduction
Transitivity inference: Challenge
● If a model learns basic patterns A → B and B → C, it must be
able to compose these two and draw a new inference A → C.
● If a model lacks this generalization ability, it must memorize an
exponential number of inference combinations independently.
● How to create an NLI dataset for transitivity inference?
7
1.Introduction
Veridicality
Veridicality of clause-embedding verbs [Karttunen+,2012]
8
● A verb V is veridical when “x V that P” entails that P is true
P: John knows that [there was a deer jumping a fence]
H: There was a deer jumping a fence Entailment
● A verb V is non-veridical when “x V that P” does not entail that P is true
P: John hopes that [there was a deer jumping a fence]
H: There was a deer jumping a fence Non-entailment
1.Introduction
Transitivity inference involving veridicality
● Veridical inference lets us compose transitivity inferences at scale by
embedding various inferences under clause-embedding verbs.
● Simple heuristics (e.g., word overlap) fail on the composite inferences.
9
1.Introduction
Our work and contributions
Evaluate the systematic generalization ability of neural models on
transitivity inferences that combine veridical inferences with
various inferences
1. Provide analysis methods with two transitivity inference
datasets: a synthetic dataset and a naturalistic dataset
https://github.com/verypluming/transitivity
2. Use our datasets to analyze two standard NLI models (LSTM
and BERT) on various combination patterns
3. Analyze whether data augmentation with new combination
patterns helps models learn transitivity
10
How to test transitivity
Training
Basic 1. veridical inference: f(s1) → s1
Premise: John {knew/hoped} that Bob and Ann left. [f(s1)]
Hypothesis: Bob and Ann left. [s1] (Entailment/Non-entailment)
Basic 2. various inference patterns (e.g., Boolean): s1 → s2
Premise: Bob and Ann left. [s1]
Hypothesis: Ann left. [s2] (Entailment)
Test composite inference: f(s1) → s2
Premise: John {knew/hoped} that Bob and Ann left. [f(s1)]
Hypothesis: Ann left. [s2] (Entailment/Non-entailment)
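A minimal sketch of this setup (hypothetical helper names, not the authors' generation code): the two basic patterns go into training and the composite pattern into the test set.

```python
# Build the three premise/hypothesis pairs for one clause-embedding verb and one
# basic inference s1 -> s2. Helper and variable names are illustrative.
def embed(verb: str, sentence: str) -> str:
    """Wrap a sentence under a clause-embedding verb: f(s) = "John VERB that s"."""
    return f"John {verb} that {sentence}"

def transitivity_split(verb: str, s1: str, s2: str):
    f_s1 = embed(verb, s1)
    train = [
        (f_s1, s1),  # Basic 1, veridical inference: f(s1) -> s1
        (s1, s2),    # Basic 2, e.g. Boolean inference: s1 -> s2
    ]
    test = [
        (f_s1, s2),  # composite inference: f(s1) -> s2
    ]
    return train, test

train_pairs, test_pairs = transitivity_split("knew", "Bob and Ann left", "Ann left")
print(train_pairs)  # [('John knew that Bob and Ann left', 'Bob and Ann left'), ('Bob and Ann left', 'Ann left')]
print(test_pairs)   # [('John knew that Bob and Ann left', 'Ann left')]
```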
11
2. Method
Entailment/non-entailment labels
12
Basic (Train): f(s1) → s1   | Basic (Train): s1 → s2 | Composite (Test): f(s1) → s2
entailment (f: veridical)   | entailment             | entailment
entailment (f: veridical)   | neutral                | neutral
neutral (f: non-veridical)  | entailment             | neutral
neutral (f: non-veridical)  | neutral                | neutral
• Entailment labels of the basic patterns f(s1) → s1 are determined by
rules (e.g., if the embedding verb is veridical, the label for f(s1) → s1
is entailment).
• Entailment labels of the composite inferences f(s1) → s2 are fixed by the
composition rules in the table above: entailment only when f is veridical
and s1 entails s2, neutral otherwise.
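The composition rule in the table can be written down directly; the sketch below is illustrative (not the dataset-generation code itself).

```python
# Gold label of the composite inference f(s1) -> s2, given the veridicality of the
# embedding verb f and the gold label of the basic inference s1 -> s2.
def composite_label(f_is_veridical: bool, s1_s2_label: str) -> str:
    """Entailment only when f is veridical and s1 entails s2; neutral otherwise."""
    if f_is_veridical and s1_s2_label == "entailment":
        return "entailment"
    return "neutral"

assert composite_label(True, "entailment") == "entailment"   # e.g. "knew" + and-elimination
assert composite_label(True, "neutral") == "neutral"
assert composite_label(False, "entailment") == "neutral"     # e.g. "hoped": non-veridical
assert composite_label(False, "neutral") == "neutral"
```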
2. Method
13
Dataset creation: synthetic and naturalistic datasets
● Synthetic dataset: embeds Boolean inferences (and, or, not) generated by
CFG rules with meaning representations (simple Montague Grammar!).
Entailment labels of the Boolean inferences are checked with a theorem
prover (a toy stand-in check is sketched below).
f(s1): Someone knew that [Bob and Ann found Tom, Jim and Fred]
s1: Bob and Ann found Tom, Jim and Fred
s2: Bob found Jim
● Naturalistic dataset: embeds inferences from the SICK dataset
[Marelli+, 2014], a collection of lexical/structural inferences.
f(s1): Someone sees that [a person is brushing a cat]
s1: A person is brushing a cat
s2: A person is combing the fur of a cat
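The paper checks Boolean entailment labels with a theorem prover over Montague-style meaning representations; the toy stand-in below only illustrates the idea with a brute-force truth-table check over propositional formulas (representation and names are my own).

```python
from itertools import product

# A formula is either an atom (a string) or a tuple ("and" | "or" | "not", ...args).
def atoms(f):
    """Collect the atomic sentences occurring in a formula."""
    if isinstance(f, str):
        return {f}
    _, *args = f
    return set().union(*(atoms(a) for a in args))

def evaluate(f, valuation):
    """Evaluate a formula under a truth assignment to its atoms."""
    if isinstance(f, str):
        return valuation[f]
    op, *args = f
    if op == "and":
        return all(evaluate(a, valuation) for a in args)
    if op == "or":
        return any(evaluate(a, valuation) for a in args)
    if op == "not":
        return not evaluate(args[0], valuation)
    raise ValueError(f"unknown connective: {op}")

def entails(premise, hypothesis):
    """True iff every valuation making the premise true also makes the hypothesis true."""
    props = sorted(atoms(premise) | atoms(hypothesis))
    for values in product([True, False], repeat=len(props)):
        valuation = dict(zip(props, values))
        if evaluate(premise, valuation) and not evaluate(hypothesis, valuation):
            return False
    return True

# "Bob and Ann found Tom, Jim and Fred" decomposes into atomic "x found y" facts,
# one of which is "Bob found Jim", so s1 entails s2.
s1 = ("and", "bob_found_tom", "bob_found_jim", "bob_found_fred",
             "ann_found_tom", "ann_found_jim", "ann_found_fred")
s2 = "bob_found_jim"
print(entails(s1, s2))                                      # True
print(entails(("or", "bob_left", "ann_left"), "ann_left"))  # False
```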
2. Method
Dataset creation: clause-embedding verbs
● Choose 30 clause-embedding verbs from previous verb veridicality
datasets [White+, 2018][Ross and Pavlick, 2019]
● Insert a clause-embedding verb f into the template “Someone f
that s1” to make the main clause in f(s1)
14
Examples of how to create naturalistic datasets with SICK
f(s1): Someone sees that [a person is brushing a cat]
s1: A person is brushing a cat
s2: A person is combing the fur of a cat
Patterns: f(s1) → s1, s1 → s2, f(s1) → s2
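A minimal sketch (hypothetical helper, not the authors' code) of wrapping a SICK premise with a clause-embedding verb to obtain the three inference patterns above:

```python
def wrap(verb: str, s1: str) -> str:
    """Build f(s1) via the template "Someone VERB that s1" (naively lowercasing the first letter)."""
    return f"Someone {verb} that {s1[0].lower()}{s1[1:]}"

s1 = "A person is brushing a cat"            # SICK premise
s2 = "A person is combing the fur of a cat"  # SICK hypothesis (here s1 entails s2)
f_s1 = wrap("sees", s1)

pairs = [
    (f_s1, s1),  # basic veridical pattern:  f(s1) -> s1
    (s1, s2),    # basic SICK pattern:       s1 -> s2
    (f_s1, s2),  # composite test pattern:   f(s1) -> s2
]
for premise, hypothesis in pairs:
    print(premise, "=>", hypothesis)
```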
2. Method
Experimental setting
● Two neural NLI models
○ LSTM [Hochreiter and Schmidhuber 1997]
○ BERT [Devlin+ 2018]
● Datasets: see the table below for splits and sizes
● Evaluation metric: average accuracy over 5 runs (a minimal BERT
prediction sketch follows the table)
15
3. Experiments
Split | Pattern    | entail:non-entail | Synthetic | Naturalistic
Train | f(s1) → s1 | 1:1               | 6,000     | 30,000
Train | s1 → s2    | 1:1               | 3,000     | 1,000
Test  | f(s1) → s2 | 1:3               | 6,000     | 30,000
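A hedged sketch of how a premise/hypothesis pair is encoded and scored with BERT for two-way classification, using the Hugging Face transformers API (the label order and the two-way setup are assumptions; in the experiments the model would first be fine-tuned on the basic training patterns above):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # assumed mapping: 0 = entailment, 1 = non-entailment
)
model.eval()

premise = "Someone knew that Bob or Ann left."  # f(s1), f veridical
hypothesis = "Ann left."                        # s2, not entailed by s1

# BERT receives the pair as one sequence: [CLS] premise [SEP] hypothesis [SEP]
inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print("predicted label id:", logits.argmax(dim=-1).item())
```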
● The models do not perform well on the composite inferences where the
verb f is veridical but the embedded sentence s1 does not entail s2.
● They appear to judge only by the veridicality of f (see the breakdown sketch below).
Premise: Someone knew that Bob or Ann left. [f(s1)]
Hypothesis: Ann left. [s2]
(Gold: Non-entailment, Prediction: Entailment)
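One way to surface this failure pattern is to break test accuracy down by the veridicality of f and by whether s1 entails s2; the sketch below uses hypothetical field names for illustration.

```python
from collections import defaultdict

def accuracy_breakdown(examples, predictions):
    """examples: dicts with 'veridical' (bool), 's1_entails_s2' (bool), and 'gold' label;
    predictions: model labels aligned with examples."""
    correct, total = defaultdict(int), defaultdict(int)
    for ex, pred in zip(examples, predictions):
        key = (ex["veridical"], ex["s1_entails_s2"])
        total[key] += 1
        correct[key] += int(pred == ex["gold"])
    return {key: correct[key] / total[key] for key in total}

# Toy usage: a model that only checks the veridicality of f gets the first case wrong.
examples = [
    {"veridical": True, "s1_entails_s2": False, "gold": "non-entailment"},
    {"veridical": True, "s1_entails_s2": True, "gold": "entailment"},
]
predictions = ["entailment", "entailment"]
print(accuracy_breakdown(examples, predictions))  # {(True, False): 0.0, (True, True): 1.0}
```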
16
Results: LSTM and BERT fail to perform transitivity inference
(Result plots: Synthetic data / Naturalistic data)
3. Experiments
● (1) Use various templates to generate the main clause in f(s1)
● We manually select 40 clauses from the verb veridicality dataset [Ross and
Pavlick, 2019] and provide additional templates.
17
Is the poor performance on transitivity inference due to
overfitting to verbs? Two additional settings (1)
Type           | Template
Pronoun        | At that moment, we f that s1
Specific group | Some economists f that s1
Proper noun    | Hanson f that s1
3. Experiments
● (1) Use various templates to generate the main clause in f(s1)
● We manually select 40 clauses from the verb veridicality dataset [Ross and
Pavlick, 2019] and provide additional templates.
● The results show the same trends: the models fail on the cases where the
verb f is veridical but the embedded sentence s1 does not entail s2.
18
Is the poor performance on transitivity inference due to
overfitting to verbs? Two additional settings (1)
(Plot legend: yes = entailment, unk = neutral)
3. Experiments
● (2) Flip the gold labels of 10% of the veridical inference examples (sketched below)
● The results show the same trends; even when the complexity of
veridical inference is taken into account, the models fail to
consistently perform composite inferences.
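A minimal sketch of the label-flipping control (the data format is assumed): 10% of the basic f(s1) → s1 examples get the opposite label, so veridicality alone no longer determines the gold label deterministically.

```python
import random

def flip_labels(examples, ratio=0.1, seed=0):
    """examples: list of dicts with a 'label' in {'entailment', 'neutral'}.
    Returns a copy with the labels of a random 10% (by default) flipped."""
    rng = random.Random(seed)
    flipped = [dict(ex) for ex in examples]  # copy so the originals stay intact
    for i in rng.sample(range(len(flipped)), k=int(len(flipped) * ratio)):
        flipped[i]["label"] = (
            "neutral" if flipped[i]["label"] == "entailment" else "entailment"
        )
    return flipped
```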
19
Is the poor performance on transitivity inference due to
overfitting to verbs? Two additional settings (2)
(Plot legend: yes = entailment, unk = neutral)
3. Experiments
Data augmentation improved performance on the transitivity test sets.
20
Does data augmentation with a subset of combination
patterns help models learn transitivity?
3. Experiments
(Case shown in the plots: f veridical, i.e., f(s1) → s1: entail; s1 → s2: non-entail; f(s1) → s2: non-entail)
Data augmentation improved performance on the transitivity test sets.
However, accuracy improved even without training on the s1 → s2 patterns...
21
Does data augmentation with a subset of combination
patterns help models learn transitivity?
3. Experiments
Data augmentation improved performance on the transitivity test sets.
However, accuracy improved even without training on the s1 → s2 patterns...
Models do not “combine” basic inferences to perform transitivity
inference (a sketch of the augmentation setup follows below).
22
Does data augmentation with a subset of combination
patterns help models learn transitivity?
3. Experiments
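A hedged sketch of the augmentation probe (helper names and the split fraction are my own): add composite f(s1) → s2 examples for a subset of combination patterns to training and hold out the rest for testing; a model that truly combined basic inferences should not need the added composite examples.

```python
import random

def augment_training(train_basic, composite_pool, fraction=0.2, seed=0):
    """train_basic: basic-pattern examples; composite_pool: composite f(s1) -> s2 examples.
    Returns (augmented training set, held-out composite test set)."""
    rng = random.Random(seed)
    pool = list(composite_pool)
    rng.shuffle(pool)
    cut = int(len(pool) * fraction)
    return train_basic + pool[:cut], pool[cut:]
```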
Humans generally follow the distinction between veridical and
non-veridical verbs, as well as the transitivity of the entailment relation.
23
How well do humans perform transitivity inference?
(Results on naturalistic data)
3. Experiments
24
How well do humans perform transitivity inference?
● Humans tend to fail on the composite inferences where the verb f is
non-veridical and s1 entails s2.
● They often neglect the non-veridicality of the embedding verb
(veridicality bias [Ross and Pavlick, 2019]).
Premise: Someone believed that a man is jumping off a low wall. [f(s1)]
Hypothesis: A man is jumping a low wall. [s2]
(Gold: Non-entailment, Prediction: Entailment)
(Results on naturalistic data)
3. Experiments
Conclusion
Motivation
Evaluate the systematic generalization ability of neural NLI
models on transitivity inferences
Approach
Analyze models with synthetic and naturalistic transitivity
inference datasets involving veridicality
Main results
25
● Current models fail to consistently perform transitivity inference
● Models can memorize composite inference examples, but do not
have the intended ability to combine basic inferences
Thanks! Hitomi Yanaka hyanaka@is.s.u-tokyo.ac.jp
Data and Code: https://github.com/verypluming/transitivity