Tutorial on Coreference Resolution
by Anirudh Jayakumar (ajayaku2), Sili Hui (silihui2)
Prepared as an assignment for CS410: Text Information Systems in Spring 2016
Agenda
We will address three questions:
1. What is the Coreference Resolution problem?
2. What are the existing approaches to it?
3. What are the future directions?
What is the Coreference Resolution problem?
Suppose you are given a sample text:
I did not vote for Donald Trump because I think he is…
How can a program tell that he refers to Donald Trump?
Definition
• Coreference resolution is the task of finding all noun phrases (NPs) that refer to the same real-world entity.
• In the previous example, he == Donald Trump (a toy illustration of this output follows below)
• One of the classical NLP problems
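A toy illustration (ours, not from any of the papers cited in this deck) of what a resolver's output looks like: the mentions found in the text, grouped into clusters, one cluster per real-world entity. The mention list and indices are made up for this example.

```python
# Toy coreference output: mentions grouped into entity clusters.
# Indices refer to positions in the `mentions` list.
text = "I did not vote for Donald Trump because I think he is..."
mentions = ["I", "Donald Trump", "I", "he"]  # noun phrases found in `text`

# Each cluster is one real-world entity: {0, 2} -> the speaker,
# {1, 3} -> Donald Trump (so "he" == "Donald Trump").
clusters = [{0, 2}, {1, 3}]
```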
Why do we care?
• Suppose your boss asks you to gauge the general opinion of Donald Trump from a corpus of collected text data; how do you finish the job?
• I did not vote for Donald Trump because I think he is…
• What comes after “he is…” tells us the sentiment of this person towards “he”, but what does he refer to?
• If we know “he” refers to “Donald Trump”, we know more about this person! (they either like or, most likely, dislike Donald Trump)
– A small dataset can be labeled by hand (time-consuming but workable)
– What if we have gigabytes of text data?
Why do we care?
• This is where coreference resolution comes into play
– We learn which entities are associated with which words
• There are many potential real-world use cases:
– information extraction
– information retrieval
– question answering
– machine translation
– text summarization
A brief history of the mainstream…
• 1970s - 1990s
– Mostly linguistic approaches
– Parse trees, semantic analysis, etc.
• 1990s - 2000s
– More machine learning approaches to the problem
– Mostly supervised machine learning approaches
• Late 2000s - now
– More unsupervised machine learning approaches came out
– Other models (ILP, Markov Logic Networks, etc.) were proposed
How to evaluate?
• How can I tell my approach is better than yours?
• Many well-established datasets and benchmarks
– ACE
– MUC
• Evaluate performance on these datasets using F1 score, precision, etc. (a toy link-based scorer is sketched below)
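A minimal sketch of link-based scoring over predicted vs. gold coreference links. This is our simplification for illustration; the official MUC and ACE scorers are more involved (MUC, for instance, counts the minimum number of links missing per gold cluster).

```python
# Toy link-based coreference scorer (our simplification; not the
# official MUC/ACE metric). Links are (mention, antecedent) index pairs.
def precision_recall_f1(predicted_links, gold_links):
    predicted, gold = set(predicted_links), set(gold_links)
    correct = len(predicted & gold)
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# e.g. the system found the he->Trump link (3, 1) but missed I->I (2, 0)
print(precision_recall_f1({(3, 1)}, {(3, 1), (2, 0)}))  # (1.0, 0.5, 0.667)
```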
Taxonomy of ideas
• In this tutorial, we will focus on two families of approaches:
– Linguistic approaches
– Machine learning approaches
• Supervised approaches
• Unsupervised approaches
• Other approaches will be briefly addressed towards the end
Linguistic Approach
• Appeared in the 1980s
• One of the very first approaches to the problem
• Takes advantage of the linguistic structure of the text
– parse trees
– syntactic constraints
– semantic analysis
• Requires domain-specific knowledge
Linguistic Approach
• The centering approach to pronouns was proposed by S.E. Brennan, M.W. Friedman, and C.J. Pollard in 1987
• Centering theory was proposed to model the relationships among
– a) focus of attention
– b) choice of referring expression
– c) perceived coherence of utterances
• An entity is an object that could be the target of a referring expression
• An utterance is the basic unit, which could be a sentence, a clause, or a phrase
• Each utterance is assigned a set of forward-looking centers, Cf(U), and a single backward-looking center, Cb(U)
Linguistic Approach
• The algorithm consists of four main steps (see the skeleton below)
– Construct all possible <Cb, Cf> pairs by taking the cross-product of the Cb and Cf lists
– Filter these pairs by applying certain constraints
– Classify each pair based on the transition type, and rank the pairs
– Choose the best-ranked pair
• The goal of the algorithm design was conceptual clarity rather than efficiency
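A minimal Python skeleton of the four steps above. This is our paraphrase of the Brennan-Friedman-Pollard pipeline, not their code; the constraint checks and the transition classifier are left as placeholder callables.

```python
# Skeleton of the BFP centering pipeline (our sketch, not the original).
from itertools import product

# Standard centering preference: CONTINUE > RETAIN > SHIFT.
TRANSITION_RANK = {"CONTINUE": 0, "RETAIN": 1, "SHIFT": 2}

def resolve(cb_list, cf_list, constraints, transition_of):
    # 1. Construct all <Cb, Cf> pairs via the cross-product of the lists.
    pairs = list(product(cb_list, cf_list))
    # 2. Filter out pairs that violate hard constraints
    #    (agreement, contra-indexing, ...).
    pairs = [p for p in pairs if all(ok(p) for ok in constraints)]
    # 3. Classify each pair by transition type and rank accordingly.
    ranked = sorted(pairs, key=lambda p: TRANSITION_RANK[transition_of(p)])
    # 4. Choose the best-ranked pair (None if nothing survives filtering).
    return ranked[0] if ranked else None
```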
Machine Learning Approaches
• More ML approaches have appeared since the 1990s
• We consider two classical categories of ML approaches:
– Supervised learning
• Train on labeled data and predict on unlabeled data
– Unsupervised learning
• Feed in unlabeled data and the algorithm will (hopefully) do the right thing for you
Supervised Learning
• Supervised learning is the machine learning task of inferring a function from labeled training data.
• The training data consist of a set of training examples.
• A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples.
Supervised Paper 1
• Evaluating automated and manual acquisition of anaphora resolution strategies - Chinatsu Aone and Scott William Bennett
• The paper describes an approach to building an automatically trainable anaphora resolution system
• Uses Japanese newspaper articles tagged with discourse information as training examples for a C4.5 decision tree algorithm
• The training features include lexical (e.g. category), syntactic (e.g. grammatical role), semantic (e.g. semantic class), and positional (e.g. distance between anaphor and antecedent) features
Supervised Paper 1 cont.
• The method uses three training techniques with different parameters
– The anaphoric chain parameter is used in selecting positive and negative training examples
– With the anaphoric type identification parameter,
• answer "no" when a pair of an anaphor and a possible antecedent is not co-referential
• answer with the anaphoric type when they are co-referential
– The confidence factor parameter (0-100) is used in pruning decision trees; a higher confidence factor means less pruning
• Using anaphoric chains without anaphoric type identification helps improve the learning algorithm
• With a 100% confidence factor, the tree overfits the examples, leading to spurious uses of features (a decision-tree sketch follows below)
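As a rough illustration, here is a minimal sketch of training a decision tree on pairwise anaphor-antecedent features. We use scikit-learn's CART as a stand-in for the paper's C4.5, and the feature rows are invented for the example; `ccp_alpha` only loosely mirrors the paper's confidence-factor pruning knob.

```python
# Toy decision-tree anaphora-pair classifier (scikit-learn CART as a
# stand-in for the paper's C4.5; features and values are invented).
from sklearn.tree import DecisionTreeClassifier

# Each row: [lexical category match, grammatical-role match,
#            semantic-class match, distance in sentences]
X = [[1, 1, 1, 0],
     [1, 0, 1, 2],
     [0, 0, 0, 5],
     [1, 1, 0, 1]]
y = [1, 1, 0, 1]  # 1 = anaphor/antecedent pair is co-referential

# ccp_alpha > 0 prunes the tree; loosely analogous to lowering the
# paper's confidence factor to avoid overfitting spurious features.
clf = DecisionTreeClassifier(ccp_alpha=0.01).fit(X, y)
print(clf.predict([[1, 1, 1, 1]]))
```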
Supervised Paper 2
• A Machine Learning Approach to Coreference Resolution of Noun Phrases: Wee Meng Soon, Hwee Tou Ng and Daniel Chung Yong Lim
• A learning approach for unrestricted text that learns from a small annotated corpus
• All markables in the training set are determined by a pipeline of NLP modules consisting of tokenization, sentence segmentation, morphological processing, part-of-speech tagging, noun phrase identification, named entity recognition, nested noun phrase extraction, and semantic class determination
• The feature vector consists of 12 features derived from two extracted markables, i and j, where i is the potential antecedent and j is the anaphor
Supervised Paper 2 cont.
• The learning algorithm used in their coreference engine is C5, an updated version of C4.5
• For each j, the algorithm considers every markable i before j as a potential antecedent; for each pair (i, j), a feature vector is generated and given to the decision tree classifier (a sketch of this scheme follows below)
• The coreference engine achieves a recall of 58.6% and a precision of 67.3%, yielding a balanced F-measure of 62.6% on MUC-6
• On MUC-7, the recall is 56.1%, the precision is 65.5%, and the balanced F-measure is 60.4%
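A minimal sketch of Soon et al.'s instance creation and closest-first decoding as summarized above; the 12-feature extraction and the trained classifier are abstracted behind callables.

```python
# Sketch of Soon et al.-style training-pair generation and decoding
# (our paraphrase; extraction of the 12 features is elided).
def training_pairs(antecedent_of):
    """antecedent_of maps anaphor index j -> closest antecedent index i.
    Positive instance: (i, j); negatives: j paired with every markable
    strictly between i and j."""
    pairs = []
    for j, i in antecedent_of.items():
        pairs.append((i, j, 1))
        pairs.extend((k, j, 0) for k in range(i + 1, j))
    return pairs

def resolve(markables, is_coreferent):
    """Closest-first decoding: for each j, scan right-to-left and link j
    to the first i the classifier accepts."""
    links = {}
    for j in range(1, len(markables)):
        for i in range(j - 1, -1, -1):
            if is_coreferent(markables[i], markables[j]):
                links[j] = i
                break
    return links
```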
Supervised Paper 3
• Conditional models of identity uncertainty with application to proper noun coreference: A. McCallum and B. Wellner
• The paper introduces several discriminative, conditional-probability models for coreference analysis
• No assumption that pairwise coreference decisions should be made independently of each other
• Model 1:
– A very general discriminative model where the dependency structure is unrestricted
– The model treats the coreference decisions and the attributes of entities as random variables, conditioned on the entity mentions
– The feature functions depend on the coreference decisions, y, the set of attributes, a, as well as the mentions of the entities, x
Supervised Paper 3 cont.
• Model 2: The authors remove the dependence on the coreference variable y by replacing it with a binary-valued random variable Yij for every pair of mentions (a scoring sketch follows below)
• Model 3: The third model does not include attributes as random variables, and is otherwise similar to the second model
• The model performs a little better than the approach of Ng and Cardie (2002)
• The F1 result for NP coreference on the MUC-6 dataset is only about 73%
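A minimal sketch (ours) of the pairwise log-linear scoring behind a Model 2-style formulation: each pair of mentions gets a binary variable y_ij whose local score is a weighted sum of feature functions. The full model additionally couples the pairs (e.g. through consistency constraints such as transitivity), which this toy version omits.

```python
# Toy pairwise log-linear model in the spirit of Model 2 (our sketch;
# the real model couples pairs globally, which is omitted here).
import math

def pair_score(weights, features, x_i, x_j, y_ij):
    # Weighted sum of feature functions f(x_i, x_j, y_ij).
    return sum(w * f(x_i, x_j, y_ij) for w, f in zip(weights, features))

def p_coreferent(weights, features, x_i, x_j):
    # Local logistic normalization over y_ij in {0, 1}.
    s1 = pair_score(weights, features, x_i, x_j, 1)
    s0 = pair_score(weights, features, x_i, x_j, 0)
    return math.exp(s1) / (math.exp(s1) + math.exp(s0))

# Hypothetical feature: string match counts as evidence when y = 1.
features = [lambda a, b, y: float(y == 1 and a.lower() == b.lower())]
print(p_coreferent([2.0], features, "Trump", "trump"))  # ~0.88
```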
Supervised Paper 4
• Kernel-Based Pronoun Resolution with Structured Syntactic Knowledge: Xiaofeng Yang, Jian Su and Chew Lim Tan
• A kernel-based method that automatically mines syntactic information from parse trees for pronoun resolution
• For each pronominal anaphor encountered, a positive instance is created by pairing the anaphor with its closest antecedent
• A set of negative instances is formed by pairing the anaphor with each of the non-coreferential candidates
• The learning algorithm used in this work is an SVM, which allows kernels to incorporate the structured features (a precomputed-kernel sketch follows below)
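A minimal sketch of plugging a custom structural kernel into an SVM via scikit-learn's precomputed-kernel interface. The toy "tree kernel" below (label-set intersection) is our stand-in; the paper uses a proper convolution tree kernel over parse-tree substructures.

```python
# Toy SVM with a precomputed structural kernel (our stand-in for the
# paper's convolution tree kernel; "trees" here are just label tuples).
import numpy as np
from sklearn.svm import SVC

def tree_kernel(t1, t2):
    # Intersection kernel over node labels: crude but a valid PSD kernel.
    return float(len(set(t1) & set(t2)))

trees = [("NP", "PRP"), ("S", "NP", "PRP"), ("VP", "NN")]  # toy structures
labels = [1, 1, 0]  # 1 = anaphor/candidate pair is coreferential

# Train on the Gram matrix of pairwise kernel values.
K = np.array([[tree_kernel(a, b) for b in trees] for a in trees])
clf = SVC(kernel="precomputed").fit(K, labels)

# Predict with kernel values between a test structure and training ones.
K_test = np.array([[tree_kernel(("NP", "PRP"), t) for t in trees]])
print(clf.predict(K_test))
```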
Supervised Paper 4 cont.
• The study examines three possible structured features
• Min-Expansion records the minimal structure covering both the pronoun and the candidate in the parse tree
• Simple-Expansion captures the syntactic properties of the candidate or the pronoun
• Full-Expansion focuses on the whole tree structure between the candidate and the pronoun
• Hobbs’ algorithm obtains 66%-72% success rates on the three domains, while the baseline system obtains 74%-77%
Unsupervised learning
• Runs on top of your data with no supervision about right or wrong; most methods are iterative
• Generally preferred over supervised learning
– Does not generally need labeled data
– Does not generally need prior knowledge
– Is not subject to dataset limitations
– Often scales better than supervised approaches
• Yet it has come a long way…
Unsupervised Paper 1
• The first notable unsupervised learning algorithm came out in 2007, by Aria Haghighi and Dan Klein
– It presents a generative model
– The objective is to maximize the posterior probability of entities given the collection of variables for the current mention
– It also discusses adding features to the collection, such as gender, plurality, and entity activation (how often the entity is mentioned)
Unsupervised Paper 1 cont.
• Results in an over-complicated generative model
• Achieves 72.5 F1 on MUC-6
• Sets a good standard for later algorithms
Unsupervised Paper 2
• Inspired by the previous papers
• Another unsupervised method was proposed by Vincent Ng in 2008
– Uses a new but simpler generative model
– Considers all pairs of mentions in a document
– Models the probability of a pair of mentions using 7 context features (gender, plurality, etc.)
– Uses the classical EM algorithm to iteratively update the parameters (an EM skeleton follows below)
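A bare EM skeleton (our sketch) showing the iterative structure the slide refers to; Ng's actual E-step and M-step, which involve the seven context features over document-level mention pairs, are abstracted behind callables.

```python
# Generic EM skeleton (our sketch; the model-specific E/M steps are
# abstracted behind the e_step and m_step callables).
def em(mention_pairs, init_params, e_step, m_step, iterations=50):
    params = init_params
    for _ in range(iterations):
        # E-step: expected coreference assignments for each mention
        # pair under the current parameters.
        expectations = e_step(mention_pairs, params)
        # M-step: re-estimate the feature parameters from those
        # expected assignments.
        params = m_step(mention_pairs, expectations)
    return params
```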
Unsupervised Paper 2 cont.
• Greatly simplifies the previous generative model
• Can be applied at the document level instead of the collection level
• Beat the performance of the previous model on the ACE dataset (by a small margin)
Unsupervised Paper 3
• The previous methods emphasize generative models
• Why not use Markov networks?
• Proposed by Hoifung Poon and Pedro Domingos
– Formulates the problem as a Markov Logic Network (MLN); the generic MLN probability is sketched below
– Defines rules and clauses, gradually building from a base model by adding rules
– Leverages sampling algorithms in the training and inference steps
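For reference, the standard MLN scoring formula (this is the generic Markov Logic form, not the paper's specific rule set): a world x is scored by the weighted counts of satisfied groundings of each first-order rule.

```latex
% Generic MLN probability: w_i is the weight of rule i and n_i(x) the
% number of its true groundings in world x; Z normalizes over worlds.
P(x) = \frac{1}{Z} \exp\Big( \sum_{i} w_i \, n_i(x) \Big)
```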
Unsupervised Paper 3 cont.
• Pioneering work in applying Markov Logic to the coreference resolution problem
• Beat the generative model proposed by Haghighi and Klein by a large margin on the MUC-6 dataset
• The authors are pioneers of Markov Logic Networks; this paper may be partly a “showcase” of their work and what MLNs can do
Related Work
• There are many other related works:
– Formulation of an equivalent ILP problem (a sketch follows this list)
• Pascal Denis and Jason Baldridge
– Enforcing the transitivity property in the ILP
• Finkel, Jenny Rose, and Christopher D. Manning
– Latent structure prediction approach
• Kai-Wei Chang, Rajhans Samdani, Dan Roth (professor @ UIUC)
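A minimal sketch (ours) of the ILP view with transitivity constraints in the Finkel-Manning spirit, using the PuLP library. The pairwise scores w are hypothetical coreference log-odds, and real formulations add these constraints for every ordered triple of mentions.

```python
# Toy coreference ILP with transitivity constraints (our sketch using
# PuLP; the pairwise scores below are invented log-odds).
import pulp

w = {(0, 1): 2.0, (0, 2): -1.0, (1, 2): 1.5}  # hypothetical pair scores

prob = pulp.LpProblem("coref", pulp.LpMaximize)
x = {p: pulp.LpVariable(f"x_{p[0]}_{p[1]}", cat="Binary") for p in w}
prob += pulp.lpSum(w[p] * x[p] for p in w)  # maximize total link score

# Transitivity for the triple (0, 1, 2): any two links force the third.
for a, b, c in [((0, 1), (1, 2), (0, 2)),
                ((0, 1), (0, 2), (1, 2)),
                ((0, 2), (1, 2), (0, 1))]:
    prob += x[a] + x[b] - x[c] <= 1

prob.solve()
print({p: int(x[p].value()) for p in w})  # expect all three links = 1
```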
Future Direction
• After this survey, we see some major directions for future work
• More standardized, updated benchmarks
– Coreference resolution research should converge on a standard set of corpora; that way, results will be comparable
• First-order and cluster-based features will play an important role
– Their use has given the field a much-needed push, and they will likely remain a staple of future state-of-the-art systems
• Combination of linguistic ideas with modern models
– Combining the strengths of the two themes: using the richer machine learning models together with the linguistic ideas
Conclusion
• Thanks for going through our tutorial!
• Major take-aways:
– Coreference Resolution remains an active research area
– Modern research tends to diverge from pure linguistic analysis
– Generally, the performance (evaluated on well-established datasets) of state-of-the-art algorithms is still not good enough for industrial uses that require precise labels
– For general purposes, modern unsupervised learning approaches can achieve decent accuracy compared to supervised approaches
– Future machine learning approaches will leverage more linguistic knowledge (features) in their models
References
• Brennan, Susan E., Marilyn W. Friedman, and Carl J. Pollard. "A centering approach to pronouns." Proceedings of the 25th Annual Meeting of the Association for Computational Linguistics, 1987.
• Aone, Chinatsu, and Scott William Bennett. "Evaluating automated and manual acquisition of anaphora resolution strategies." Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, 1995.
• Ge, Niyu, John Hale, and Eugene Charniak. "A statistical approach to anaphora resolution." Proceedings of the Sixth Workshop on Very Large Corpora, 1998.
• Soon, Wee Meng, Hwee Tou Ng, and Daniel Chung Yong Lim. "A machine learning approach to coreference resolution of noun phrases." Computational Linguistics 27.4 (2001): 521-544.
• McCallum, Andrew, and Ben Wellner. "Toward conditional models of identity uncertainty with application to proper noun coreference." 2003.
• Yang, Xiaofeng, Jian Su, and Chew Lim Tan. "Kernel-based pronoun resolution with structured syntactic knowledge." Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, 2006.
• Haghighi, Aria, and Dan Klein. "Unsupervised coreference resolution in a nonparametric Bayesian model." Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, 2007.
• Ng, Vincent. "Unsupervised models for coreference resolution." Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2008.
• Poon, Hoifung, and Pedro Domingos. "Joint unsupervised coreference resolution with Markov logic." Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2008.
• Finkel, Jenny Rose, and Christopher D. Manning. "Enforcing transitivity in coreference resolution." Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Short Papers, 2008.
• Chang, Kai-Wei, Rajhans Samdani, and Dan Roth. "A constrained latent variable model for coreference resolution." 2013.
• Denis, Pascal, and Jason Baldridge. "Joint determination of anaphoricity and coreference resolution using integer programming." HLT-NAACL, 2007.
