The document investigates whether neural models, specifically BERT and LSTM architectures, learn transitivity in natural language inference (NLI) involving veridical inferences: if a model accepts each of two component inferences (e.g., "X knows that S" entails S, and S entails S'), consistency requires it to also accept their composition. The study finds that although these models perform well on standard benchmarks, they struggle with such composite inferences, indicating a failure to generalize transitivity. To evaluate this systematic generalization ability, it presents synthetic and naturalistic datasets designed specifically for these composed inferences.
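As a minimal sketch of the composition being probed (all names here are hypothetical; `nli` is a stand-in for any premise–hypothesis classifier such as a fine-tuned BERT, not the paper's actual evaluation code), the example below shows how accepting both component inferences commits a consistent model to the composed inference P ⇒ H2:

```python
from typing import Callable

# Stand-in for any NLI classifier (e.g., a fine-tuned BERT or LSTM);
# assumed to return "entailment", "neutral", or "contradiction".
NLIModel = Callable[[str, str], str]

def transitivity_probe(nli: NLIModel, p: str, h1: str, h2: str) -> bool:
    """Check whether a model that accepts both component inferences
    also accepts the composed inference P => H2."""
    step1 = nli(p, h1)     # veridical step, e.g. "X knows that S" => S
    step2 = nli(h1, h2)    # second step, e.g. a lexical inference
    composed = nli(p, h2)  # the transitive composition
    if step1 == "entailment" and step2 == "entailment":
        return composed == "entailment"  # consistency requires True here
    return True  # composition is not forced when a component step fails

# Toy oracle that only "knows" the component pairs, mimicking the
# reported failure mode: components accepted, composition rejected.
def toy_model(premise: str, hypothesis: str) -> str:
    known = {
        ("Jo knows that Ann bought a hat", "Ann bought a hat"),
        ("Ann bought a hat", "Ann bought something"),
    }
    return "entailment" if (premise, hypothesis) in known else "neutral"

print(transitivity_probe(
    toy_model,
    "Jo knows that Ann bought a hat",  # P
    "Ann bought a hat",                # H1 (veridical complement)
    "Ann bought something",            # H2
))  # -> False: the composed inference is not accepted
```

A probe of this shape can be applied uniformly to synthetic and naturalistic premise–hypothesis chains, which is the kind of systematic evaluation the datasets described above are built for.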