Tutorial on Coreference Resolution
by Anirudh Jayakumar (ajayaku2), Sili Hui (silihui2)
Prepared as an assignment for CS410: Text Information Systems in Spring 2016
Agenda
We will address three questions:
1. What is the Coreference Resolution problem?
2. What are the existing approaches to it?
3. What are the future directions?
What is the Coreference Resolution problem?
Suppose you are given a sample text:
I did not vote for Donald Trump because I think he is…
How can a program tell that he refers to Donald Trump?
Definition
• Coreference resolution is the task of finding all noun phrases (NPs) that refer to the same real-world entity.
• In the previous example, he == Donald Trump (a toy illustration of this output follows below)
• One of the classical NLP problems
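A toy illustration (ours, not from any of the papers cited in this deck) of what a resolver's output looks like: the mentions found in the text, grouped into clusters, one cluster per real-world entity. The mention list and indices are made up for this example.

```python
# Toy coreference output: mentions grouped into entity clusters.
# Indices refer to positions in the `mentions` list.
text = "I did not vote for Donald Trump because I think he is..."
mentions = ["I", "Donald Trump", "I", "he"]  # noun phrases found in `text`

# Each cluster is one real-world entity: {0, 2} -> the speaker,
# {1, 3} -> Donald Trump (so "he" == "Donald Trump").
clusters = [{0, 2}, {1, 3}]
```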
Why do we care?
• Suppose your boss asks you to gauge the general opinion of Donald Trump from a corpus of collected text data; how do you finish the job?
• I did not vote for Donald Trump because I think he is…
• What comes after “he is…” tells us the sentiment of this person towards “he”, but what does he refer to?
• If we know “he” refers to “Donald Trump”, we know more about this person! (they either like or, most likely, dislike Donald Trump)
– A small dataset can be labeled by hand (time-consuming but workable)
– What if we have gigabytes of text data?
Why do we care?
• This is where coreference resolution comes into play
– We learn which entities are associated with which words
• There are many potential real-world use cases:
– information extraction
– information retrieval
– question answering
– machine translation
– text summarization
A brief history of the mainstream…
• 1970s - 1990s
– Mostly linguistic approaches
– Parse trees, semantic analysis, etc.
• 1990s - 2000s
– More machine learning approaches to the problem
– Mostly supervised machine learning approaches
• Late 2000s - now
– More unsupervised machine learning approaches came out
– Other models (ILP, Markov Logic Networks, etc.) were proposed
How to evaluate?
• How can I tell my approach is better than yours?
• Many well-established datasets and benchmarks
– ACE
– MUC
• Evaluate performance on these datasets using F1 score, precision, etc. (a toy link-based scorer is sketched below)
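A minimal sketch of link-based scoring over predicted vs. gold coreference links. This is our simplification for illustration; the official MUC and ACE scorers are more involved (MUC, for instance, counts the minimum number of links missing per gold cluster).

```python
# Toy link-based coreference scorer (our simplification; not the
# official MUC/ACE metric). Links are (mention, antecedent) index pairs.
def precision_recall_f1(predicted_links, gold_links):
    predicted, gold = set(predicted_links), set(gold_links)
    correct = len(predicted & gold)
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# e.g. the system found the he->Trump link (3, 1) but missed I->I (2, 0)
print(precision_recall_f1({(3, 1)}, {(3, 1), (2, 0)}))  # (1.0, 0.5, 0.667)
```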
Taxonomy of ideas
• In this tutorial, we will focus on two families of approaches:
– Linguistic approaches
– Machine learning approaches
• Supervised approaches
• Unsupervised approaches
• Other approaches will be briefly addressed towards the end
Linguistic Approach
• Appeared in the 1980s
• One of the very first approaches to the problem
• Takes advantage of the linguistic structure of the text
– parse trees
– syntactic constraints
– semantic analysis
• Requires domain-specific knowledge
Linguistic Approach
• The centering approach to pronouns was proposed by S.E. Brennan, M.W. Friedman, and C.J. Pollard in 1987
• Centering theory was proposed to model the relationships among
– a) focus of attention
– b) choice of referring expression
– c) perceived coherence of utterances
• An entity is an object that could be the target of a referring expression
• An utterance is the basic unit, which could be a sentence, a clause, or a phrase
• Each utterance is assigned a set of forward-looking centers, Cf(U), and a single backward-looking center, Cb(U)
Linguistic Approach
• The algorithm consists of four main steps (see the skeleton below)
– Construct all possible <Cb, Cf> pairs by taking the cross-product of the Cb and Cf lists
– Filter these pairs by applying certain constraints
– Classify each pair based on the transition type, and rank the pairs
– Choose the best-ranked pair
• The goal of the algorithm design was conceptual clarity rather than efficiency
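A minimal Python skeleton of the four steps above. This is our paraphrase of the Brennan-Friedman-Pollard pipeline, not their code; the constraint checks and the transition classifier are left as placeholder callables.

```python
# Skeleton of the BFP centering pipeline (our sketch, not the original).
from itertools import product

# Standard centering preference: CONTINUE > RETAIN > SHIFT.
TRANSITION_RANK = {"CONTINUE": 0, "RETAIN": 1, "SHIFT": 2}

def resolve(cb_list, cf_list, constraints, transition_of):
    # 1. Construct all <Cb, Cf> pairs via the cross-product of the lists.
    pairs = list(product(cb_list, cf_list))
    # 2. Filter out pairs that violate hard constraints
    #    (agreement, contra-indexing, ...).
    pairs = [p for p in pairs if all(ok(p) for ok in constraints)]
    # 3. Classify each pair by transition type and rank accordingly.
    ranked = sorted(pairs, key=lambda p: TRANSITION_RANK[transition_of(p)])
    # 4. Choose the best-ranked pair (None if nothing survives filtering).
    return ranked[0] if ranked else None
```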
Machine Learning Approaches
• More ML approaches have appeared since the 1990s
• We consider two classical categories of ML approaches:
– Supervised learning
• Train on labeled data and predict on unlabeled data
– Unsupervised learning
• Feed in unlabeled data and the algorithm will (hopefully) do the right thing for you
Supervised Learning
• Supervised learning is the machine learning task of inferring a function from labeled training data.
• The training data consist of a set of training examples.
• A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples.
Supervised Paper 1
• Evaluating automated and manual acquisition of anaphora resolution strategies - Chinatsu Aone and Scott William Bennett
• The paper describes an approach to building an automatically trainable anaphora resolution system
• Uses Japanese newspaper articles tagged with discourse information as training examples for a C4.5 decision tree algorithm
• The training features include lexical (e.g. category), syntactic (e.g. grammatical role), semantic (e.g. semantic class), and positional (e.g. distance between anaphor and antecedent) features
Supervised Paper 1 cont.
• The method uses three training techniques with different parameters
– The anaphoric chain parameter is used in selecting positive and negative training examples
– With the anaphoric type identification parameter,
• answer "no" when a pair of an anaphor and a possible antecedent is not co-referential
• answer with the anaphoric type when they are co-referential
– The confidence factor parameter (0-100) is used in pruning decision trees; a higher confidence factor means less pruning
• Using anaphoric chains without anaphoric type identification helps improve the learning algorithm
• With a 100% confidence factor, the tree overfits the examples, leading to spurious uses of features (a decision-tree sketch follows below)
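As a rough illustration, here is a minimal sketch of training a decision tree on pairwise anaphor-antecedent features. We use scikit-learn's CART as a stand-in for the paper's C4.5, and the feature rows are invented for the example; `ccp_alpha` only loosely mirrors the paper's confidence-factor pruning knob.

```python
# Toy decision-tree anaphora-pair classifier (scikit-learn CART as a
# stand-in for the paper's C4.5; features and values are invented).
from sklearn.tree import DecisionTreeClassifier

# Each row: [lexical category match, grammatical-role match,
#            semantic-class match, distance in sentences]
X = [[1, 1, 1, 0],
     [1, 0, 1, 2],
     [0, 0, 0, 5],
     [1, 1, 0, 1]]
y = [1, 1, 0, 1]  # 1 = anaphor/antecedent pair is co-referential

# ccp_alpha > 0 prunes the tree; loosely analogous to lowering the
# paper's confidence factor to avoid overfitting spurious features.
clf = DecisionTreeClassifier(ccp_alpha=0.01).fit(X, y)
print(clf.predict([[1, 1, 1, 1]]))
```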
Supervised Paper 2
• A Machine Learning Approach to Coreference Resolution of Noun Phrases: Wee Meng Soon, Hwee Tou Ng and Daniel Chung Yong Lim
• A learning approach for unrestricted text that learns from a small annotated corpus
• All markables in the training set are determined by a pipeline of NLP modules consisting of tokenization, sentence segmentation, morphological processing, part-of-speech tagging, noun phrase identification, named entity recognition, nested noun phrase extraction, and semantic class determination
• The feature vector consists of 12 features derived from two extracted markables, i and j, where i is the potential antecedent and j is the anaphor
Supervised Paper 2 cont.
• The learning algorithm used in their coreference engine is C5, an updated version of C4.5
• For each j, the algorithm considers every markable i before j as a potential antecedent; for each pair (i, j), a feature vector is generated and given to the decision tree classifier (a sketch of this scheme follows below)
• The coreference engine achieves a recall of 58.6% and a precision of 67.3%, yielding a balanced F-measure of 62.6% on MUC-6
• On MUC-7, the recall is 56.1%, the precision is 65.5%, and the balanced F-measure is 60.4%
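A minimal sketch of Soon et al.'s instance creation and closest-first decoding as summarized above; the 12-feature extraction and the trained classifier are abstracted behind callables.

```python
# Sketch of Soon et al.-style training-pair generation and decoding
# (our paraphrase; extraction of the 12 features is elided).
def training_pairs(antecedent_of):
    """antecedent_of maps anaphor index j -> closest antecedent index i.
    Positive instance: (i, j); negatives: j paired with every markable
    strictly between i and j."""
    pairs = []
    for j, i in antecedent_of.items():
        pairs.append((i, j, 1))
        pairs.extend((k, j, 0) for k in range(i + 1, j))
    return pairs

def resolve(markables, is_coreferent):
    """Closest-first decoding: for each j, scan right-to-left and link j
    to the first i the classifier accepts."""
    links = {}
    for j in range(1, len(markables)):
        for i in range(j - 1, -1, -1):
            if is_coreferent(markables[i], markables[j]):
                links[j] = i
                break
    return links
```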
Supervised Paper 3
• Conditional models of identity uncertainty with application to proper noun coreference: A. McCallum and B. Wellner
• The paper introduces several discriminative, conditional-probability models for coreference analysis
• No assumption that pairwise coreference decisions should be made independently of each other
• Model 1:
– A very general discriminative model where the dependency structure is unrestricted
– The model treats the coreference decisions and the attributes of entities as random variables, conditioned on the entity mentions
– The feature functions depend on the coreference decisions, y, the set of attributes, a, as well as the mentions of the entities, x
Supervised Paper 3 cont.
• Model 2: The authors remove the dependence on the coreference variable y by replacing it with a binary-valued random variable Yij for every pair of mentions (a scoring sketch follows below)
• Model 3: The third model does not include attributes as random variables, and is otherwise similar to the second model
• The model performs a little better than the approach of Ng and Cardie (2002)
• The F1 result for NP coreference on the MUC-6 dataset is only about 73%
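A minimal sketch (ours) of the pairwise log-linear scoring behind a Model 2-style formulation: each pair of mentions gets a binary variable y_ij whose local score is a weighted sum of feature functions. The full model additionally couples the pairs (e.g. through consistency constraints such as transitivity), which this toy version omits.

```python
# Toy pairwise log-linear model in the spirit of Model 2 (our sketch;
# the real model couples pairs globally, which is omitted here).
import math

def pair_score(weights, features, x_i, x_j, y_ij):
    # Weighted sum of feature functions f(x_i, x_j, y_ij).
    return sum(w * f(x_i, x_j, y_ij) for w, f in zip(weights, features))

def p_coreferent(weights, features, x_i, x_j):
    # Local logistic normalization over y_ij in {0, 1}.
    s1 = pair_score(weights, features, x_i, x_j, 1)
    s0 = pair_score(weights, features, x_i, x_j, 0)
    return math.exp(s1) / (math.exp(s1) + math.exp(s0))

# Hypothetical feature: string match counts as evidence when y = 1.
features = [lambda a, b, y: float(y == 1 and a.lower() == b.lower())]
print(p_coreferent([2.0], features, "Trump", "trump"))  # ~0.88
```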
Supervised Paper 4
• Kernel-Based Pronoun Resolution with Structured Syntactic Knowledge: Xiaofeng Yang, Jian Su and Chew Lim Tan
• A kernel-based method that automatically mines syntactic information from parse trees for pronoun resolution
• For each pronominal anaphor encountered, a positive instance is created by pairing the anaphor with its closest antecedent
• A set of negative instances is formed by pairing the anaphor with each of the non-coreferential candidates
• The learning algorithm used in this work is an SVM, which allows kernels to incorporate the structured features (a precomputed-kernel sketch follows below)
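A minimal sketch of plugging a custom structural kernel into an SVM via scikit-learn's precomputed-kernel interface. The toy "tree kernel" below (label-set intersection) is our stand-in; the paper uses a proper convolution tree kernel over parse-tree substructures.

```python
# Toy SVM with a precomputed structural kernel (our stand-in for the
# paper's convolution tree kernel; "trees" here are just label tuples).
import numpy as np
from sklearn.svm import SVC

def tree_kernel(t1, t2):
    # Intersection kernel over node labels: crude but a valid PSD kernel.
    return float(len(set(t1) & set(t2)))

trees = [("NP", "PRP"), ("S", "NP", "PRP"), ("VP", "NN")]  # toy structures
labels = [1, 1, 0]  # 1 = anaphor/candidate pair is coreferential

# Train on the Gram matrix of pairwise kernel values.
K = np.array([[tree_kernel(a, b) for b in trees] for a in trees])
clf = SVC(kernel="precomputed").fit(K, labels)

# Predict with kernel values between a test structure and training ones.
K_test = np.array([[tree_kernel(("NP", "PRP"), t) for t in trees]])
print(clf.predict(K_test))
```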
Supervised Paper 4 cont.
• The study examines three possible structured features
• Min-Expansion records the minimal structure covering both the pronoun and the candidate in the parse tree
• Simple-Expansion captures the syntactic properties of the candidate or the pronoun
• Full-Expansion focuses on the whole tree structure between the candidate and the pronoun
• Hobbs’ algorithm obtains 66%-72% success rates on the three domains, while the baseline system obtains 74%-77%
Unsupervised learning
• Runs on top of your data with no supervision about right or wrong; most methods are iterative
• Generally preferred over supervised learning
– Does not generally need labeled data
– Does not generally need prior knowledge
– Is not subject to dataset limitations
– Often scales better than supervised approaches
• Yet it has come a long way…
Unsupervised Paper 1
• The first notable unsupervised learning algorithm came out in 2007, by Aria Haghighi and Dan Klein
– It presents a generative model
– The objective is to maximize the posterior probability of entities given the collection of variables for the current mention
– It also discusses adding features to the collection, such as gender, plurality, and entity activation (how often the entity is mentioned)
Unsupervised Paper 1 cont.
• Results in an over-complicated generative model
• Achieves 72.5 F1 on MUC-6
• Sets a good standard for later algorithms
Unsupervised Paper 2
• Inspired by the previous papers
• Another unsupervised method was proposed by Vincent Ng in 2008
– Uses a new but simpler generative model
– Considers all pairs of mentions in a document
– Models the probability of a pair of mentions using 7 context features (gender, plurality, etc.)
– Uses the classical EM algorithm to iteratively update the parameters (an EM skeleton follows below)
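A bare EM skeleton (our sketch) showing the iterative structure the slide refers to; Ng's actual E-step and M-step, which involve the seven context features over document-level mention pairs, are abstracted behind callables.

```python
# Generic EM skeleton (our sketch; the model-specific E/M steps are
# abstracted behind the e_step and m_step callables).
def em(mention_pairs, init_params, e_step, m_step, iterations=50):
    params = init_params
    for _ in range(iterations):
        # E-step: expected coreference assignments for each mention
        # pair under the current parameters.
        expectations = e_step(mention_pairs, params)
        # M-step: re-estimate the feature parameters from those
        # expected assignments.
        params = m_step(mention_pairs, expectations)
    return params
```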
Unsupervised Paper 2 cont.
• Greatly simplifies the previous generative model
• Can be applied at the document level instead of the collection level
• Beat the performance of the previous model on the ACE dataset (by a small margin)
Unsupervised Paper 3
• The previous methods emphasize generative models
• Why not use Markov networks?
• Proposed by Hoifung Poon and Pedro Domingos
– Formulates the problem as a Markov Logic Network (MLN); the generic MLN probability is sketched below
– Defines rules and clauses, gradually building from a base model by adding rules
– Leverages sampling algorithms in the training and inference steps
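For reference, the standard MLN scoring formula (this is the generic Markov Logic form, not the paper's specific rule set): a world x is scored by the weighted counts of satisfied groundings of each first-order rule.

```latex
% Generic MLN probability: w_i is the weight of rule i and n_i(x) the
% number of its true groundings in world x; Z normalizes over worlds.
P(x) = \frac{1}{Z} \exp\Big( \sum_{i} w_i \, n_i(x) \Big)
```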
Unsupervised Paper 3 cont.
• Pioneering work in applying Markov Logic to the coreference resolution problem
• Beat the generative model proposed by Haghighi and Klein by a large margin on the MUC-6 dataset
• The authors are pioneers of Markov Logic Networks; this paper may be partly a “showcase” of their work and what MLNs can do
Related Work
• There are many other related works:
– Formulation of an equivalent ILP problem (a sketch follows this list)
• Pascal Denis and Jason Baldridge
– Enforcing the transitivity property in the ILP
• Finkel, Jenny Rose, and Christopher D. Manning
– Latent structure prediction approach
• Kai-Wei Chang, Rajhans Samdani, Dan Roth (professor @ UIUC)
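A minimal sketch (ours) of the ILP view with transitivity constraints in the Finkel-Manning spirit, using the PuLP library. The pairwise scores w are hypothetical coreference log-odds, and real formulations add these constraints for every ordered triple of mentions.

```python
# Toy coreference ILP with transitivity constraints (our sketch using
# PuLP; the pairwise scores below are invented log-odds).
import pulp

w = {(0, 1): 2.0, (0, 2): -1.0, (1, 2): 1.5}  # hypothetical pair scores

prob = pulp.LpProblem("coref", pulp.LpMaximize)
x = {p: pulp.LpVariable(f"x_{p[0]}_{p[1]}", cat="Binary") for p in w}
prob += pulp.lpSum(w[p] * x[p] for p in w)  # maximize total link score

# Transitivity for the triple (0, 1, 2): any two links force the third.
for a, b, c in [((0, 1), (1, 2), (0, 2)),
                ((0, 1), (0, 2), (1, 2)),
                ((0, 2), (1, 2), (0, 1))]:
    prob += x[a] + x[b] - x[c] <= 1

prob.solve()
print({p: int(x[p].value()) for p in w})  # expect all three links = 1
```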
Future Direction
• After this survey, we see some major directions for future work
• More standardized, updated benchmarks
– Coreference resolution research should converge on a standard set of corpora; that way, results will be comparable
• First-order and cluster-based features will play an important role
– Their use has given the field a much-needed push, and they will likely remain a staple of future state-of-the-art systems
• Combination of linguistic ideas with modern models
– Combining the strengths of the two themes: using the richer machine learning models together with the linguistic ideas
Conclusion
• Thanks for going through our tutorial!
• Major take-aways:
– Coreference Resolution remains an active research area
– Modern research tends to diverge from pure linguistic analysis
– Generally, the performance (evaluated on well-established datasets) of state-of-the-art algorithms is still not good enough for industrial uses that require precise labels
– For general purposes, modern unsupervised learning approaches can achieve decent accuracy compared to supervised approaches
– Future machine learning approaches will leverage more linguistic knowledge (features) in their models
References
• Brennan, Susan E., Marilyn W. Friedman, and Carl J. Pollard. "A centering approach to pronouns." Proceedings of the 25th Annual Meeting of the Association for Computational Linguistics, 1987.
• Aone, Chinatsu, and Scott William Bennett. "Evaluating automated and manual acquisition of anaphora resolution strategies." Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, 1995.
• Ge, Niyu, John Hale, and Eugene Charniak. "A statistical approach to anaphora resolution." Proceedings of the Sixth Workshop on Very Large Corpora, 1998.
• Soon, Wee Meng, Hwee Tou Ng, and Daniel Chung Yong Lim. "A machine learning approach to coreference resolution of noun phrases." Computational Linguistics 27.4 (2001): 521-544.
• McCallum, Andrew, and Ben Wellner. "Toward conditional models of identity uncertainty with application to proper noun coreference." 2003.
• Yang, Xiaofeng, Jian Su, and Chew Lim Tan. "Kernel-based pronoun resolution with structured syntactic knowledge." Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, 2006.
• Haghighi, Aria, and Dan Klein. "Unsupervised coreference resolution in a nonparametric Bayesian model." Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, 2007.
• Ng, Vincent. "Unsupervised models for coreference resolution." Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2008.
• Poon, Hoifung, and Pedro Domingos. "Joint unsupervised coreference resolution with Markov logic." Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2008.
• Finkel, Jenny Rose, and Christopher D. Manning. "Enforcing transitivity in coreference resolution." Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Short Papers, 2008.
• Chang, Kai-Wei, Rajhans Samdani, and Dan Roth. "A constrained latent variable model for coreference resolution." 2013.
• Denis, Pascal, and Jason Baldridge. "Joint determination of anaphoricity and coreference resolution using integer programming." HLT-NAACL, 2007.
