Reference Scope Identification
in Citing Sentences
         Authors:
                 Amjad Abu-Jbara, Dragomir Radev
                           (University of Michigan)
            Conference:
                                      NAACL 2012
            Expositor:
                                  Akihiro Kameda
              (Aizawa Lab. The University of Tokyo)
Abstract
●   Problem:
    ●   Multiple citations in one sentence
    ●   There are many POS taggers developed using
        different techniques for many major languages such
        as transformation-based error-driven learning (Brill,
        1995), decision trees (Black et al., 1992), Markov
        model (Cutting et al., 1992), maximum entropy
        methods (Ratnaparkhi, 1996) etc for English.
●   Approach: Preprocessing
         and 2+1+2*3+1=10 methods
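The count of 10 methods can be spelled out as a small enumeration. This is only a sketch: the mapping of each term in 2+1+2*3+1 to a method family is an assumption read off the later slides (two word classifiers, one CRF sequence labeler, two segmentations times three labeling rules, and AR2011 as the remaining one), and the method names are hypothetical labels.

```python
from itertools import product

# Assumed breakdown of "2+1+2*3+1"; the names are illustrative labels only.
word_classifiers = ["SVM", "LR"]                      # Approach 1: 2
sequence_labelers = ["CRF"]                           # Approach 2: 1
segment_labelers = [                                  # Approach 3: 2 * 3
    f"CRF-{seg}-{rule}"
    for seg, rule in product(["S1", "S2"], ["R1", "R2", "R3"])
]
baselines = ["AR2011"]                                # assumed to be the final +1

methods = word_classifiers + sequence_labelers + segment_labelers + baselines
print(len(methods), methods)
```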
Preprocessing & Methods
Reference Preprocessing
    (tagging, grouping, non-syntactical element removal)
●   These constraints can be lexicalized (REF.1; REF.2),
    unlexicalized (REF.3; TREF.4) or automatically learned
    (REF.5; REF.6).

●   These constraints can be lexicalized (GREF.1), unlexicalized
    (GTREF.2) or automatically learned (GREF.3).

●   (GTREF.1) apply fuzzy techniques for integrating source
    syntax into hierarchical phrase-based systems (REF.2).
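The grouping step in the examples above can be sketched with a regular expression. This is a minimal illustration, not the authors' implementation: it assumes references are already tagged as `REF.n` / `TREF.n` inside parentheses, that a group is a semicolon-separated run within one pair of parentheses, and that placeholders are renumbered left to right. The function name `group_references` is hypothetical.

```python
import re

# One parenthesised run of REF/TREF tags, e.g. "(REF.3; TREF.4)".
PLACEHOLDER = re.compile(r"\((T?REF\.\d+(?:;\s*T?REF\.\d+)*)\)")

def group_references(sentence: str) -> str:
    """Collapse multi-reference runs into GREF/GTREF and renumber all tags."""
    counter = 0

    def replace(match: re.Match) -> str:
        nonlocal counter
        counter += 1
        tags = [t.strip() for t in match.group(1).split(";")]
        has_target = any(t.startswith("TREF") for t in tags)
        if len(tags) > 1:
            name = "GTREF" if has_target else "GREF"
        else:
            name = "TREF" if has_target else "REF"
        return f"({name}.{counter})"

    return PLACEHOLDER.sub(replace, sentence)

tagged = ("These constraints can be lexicalized (REF.1; REF.2), "
          "unlexicalized (REF.3; TREF.4) or automatically learned "
          "(REF.5; REF.6).")
print(group_references(tagged))
```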
Approach 1(SVM,LR)
●   Word classification
    ●   with an SVM or a logistic regression classifier
●   Features: Distance, Position (Before/After), In Segment
    (delimiters: , . ; and, but, for, nor, or, so, yet), POS tag,
    Dependency Distance, Dependency Relations, Common Ancestor
    Node, Syntactic Distance
●   Problem Example:
    ●   There are many POS taggers developed using different
        techniques for many major languages such as transformation-
        based error-driven learning (Brill, 1995), decision trees (Black et
        al., 1992), Markov model (Cutting et al., 1992), maximum entropy
        methods (Ratnaparkhi, 1996) etc for English.
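Three of the listed features (distance, position, and same segment) need only the token sequence, so they can be sketched without a parser. This is an illustration under assumptions: tokens are pre-split, the target reference is a single token at `target_index`, and a delimiter token opens a new segment. The function name and feature keys are hypothetical.

```python
# Segment delimiters as listed on the slide.
SEGMENT_DELIMITERS = {",", ".", ";", "and", "but", "for", "nor", "or", "so", "yet"}

def surface_features(tokens, target_index):
    """Return one feature dict per token, relative to the target reference."""
    # Assign a segment id to every token; a delimiter starts a new segment.
    segment_ids, current = [], 0
    for token in tokens:
        if token.lower() in SEGMENT_DELIMITERS:
            current += 1
        segment_ids.append(current)

    features = []
    for i, token in enumerate(tokens):
        if i < target_index:
            position = "before"
        elif i > target_index:
            position = "after"
        else:
            position = "target"
        features.append({
            "token": token,
            "distance": abs(i - target_index),
            "position": position,
            "same_segment": segment_ids[i] == segment_ids[target_index],
        })
    return features

tokens = ["decision", "trees", "TREF", ",", "Markov", "model", "REF"]
for row in surface_features(tokens, target_index=2):
    print(row)
```

The remaining features (POS tag, dependency distance and relations, common ancestor node, syntactic distance) would come from a tagger and parser and are left out of this sketch.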
Approach 2(CRF)
●   Sequence Labeling with CRF
    ●   features are the same as in Approach 1
Approach 3-S1-* (CRF/segment)
●   segmentation (1)
    ●   punctuation marks
    ●   coordinating conjunctions
        –   and, but, for, nor, or, so, yet
    ●   a set of special expressions
        –   "for example", "for instance", "including", "includes",
            "such as", "like", etc.
●   [Rerankers have been successfully applied to numerous
    NLP tasks such as] [parse selection (GTREF)], [parse
    reranking (GREF)], [question-answering (REF)].
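The first segmentation method can be approximated with a delimiter regex. This is a sketch, assuming that punctuation is dropped while a conjunction or special expression stays at the end of the segment it closes (as "such as" does in the bracketed example). It would also split on the period inside numbered tags such as `REF.1`, so the example uses unnumbered tags. The name `segment` is hypothetical.

```python
import re

SPECIAL = ["for example", "for instance", "including", "includes", "such as", "like"]
CONJUNCTIONS = ["and", "but", "for", "nor", "or", "so", "yet"]
PUNCTUATION = {",", ";", ":", "."}

# Special expressions come first so "for example" wins over plain "for".
_words = "|".join(re.escape(w) for w in SPECIAL + CONJUNCTIONS)
DELIMITER = re.compile(rf"\b(?:{_words})\b|[,;:.]", re.IGNORECASE)

def segment(sentence):
    """Split a sentence into segments at the delimiters listed above."""
    segments, start = [], 0
    for match in DELIMITER.finditer(sentence):
        # Punctuation is dropped; word delimiters stay with the segment they end.
        end = match.start() if match.group() in PUNCTUATION else match.end()
        piece = sentence[start:end].strip()
        if piece:
            segments.append(piece)
        start = match.end()
    tail = sentence[start:].strip()
    if tail:
        segments.append(tail)
    return segments

sentence = ("Rerankers have been successfully applied to numerous NLP tasks "
            "such as parse selection (GTREF), parse reranking (GREF), "
            "question-answering (REF).")
for piece in segment(sentence):
    print(f"[{piece}]")
```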
Approach 3-S2-* (CRF/segment)
●   segmentation (2)
    ●   chunking tool
        –   noun groups
        –   verb groups
        –   preposition groups
        –   adjective groups
        –   adverb groups
        –   each remaining part forms a segment by itself
●   [To] [score] [the output] [of] [the coreference models], [we]
    [employ] [the commonly-used MUC scoring program (REF)]
    [and] [the recently-developed CEAF scoring program (TREF)].
Approach 3-*-R1,2,3 (CRF/segment)
●   R1: majority label of the words it contains
●   R2: inside if any word is inside
●   R3: outside if any word is outside
    ●   [I O O O O] [I I I] [O O]
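The three rules for turning word labels into one segment label can be written out directly. This is a minimal sketch: labels are assumed to be the strings "I" (inside) and "O" (outside), and the tie-break for R1 (a tie goes to "O") is an assumption, since the slide does not say how ties are resolved.

```python
from collections import Counter

def segment_label(word_labels, rule):
    """Aggregate per-word I/O labels into a single label for the segment."""
    if rule == "R1":  # majority label of the words in the segment
        counts = Counter(word_labels)
        return "I" if counts["I"] > counts["O"] else "O"  # tie -> "O" (assumed)
    if rule == "R2":  # inside if any word is inside
        return "I" if "I" in word_labels else "O"
    if rule == "R3":  # outside if any word is outside
        return "O" if "O" in word_labels else "I"
    raise ValueError(f"unknown rule: {rule}")

segments = [list("IOOOO"), list("III"), list("OO")]
for rule in ("R1", "R2", "R3"):
    print(rule, [segment_label(seg, rule) for seg in segments])
```

On the example segments, R2 differs from R1 and R3 only on the first segment, where a single inside word is enough to mark the whole segment as inside.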
AR2011
the link grammar parser (Sleator and Temperley, 1991)
Experiment
Data
●   ACL Anthology Network Corpus
●   3300 sentences, each with ≥ 2 citations


             Annotation agreement
●   500 of 3300,
    ●   Preprocessing is perfect
    ●   Kappa coefficient of scope is
        $$K = \frac{P(A) - P(E)}{1 - P(E)} = 2P(A) - 1 = 0.61$$
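The kappa arithmetic can be checked numerically. The editor's note for this slide gives P(E) = 1/2, which is what reduces the formula to 2P(A) - 1. The value P(A) = 0.805 used below is not stated on the slide; it is back-solved from K = 0.61 and agrees with the note's "a little over 80%".

```python
def kappa(p_agree: float, p_chance: float) -> float:
    """Kappa coefficient: observed agreement corrected for chance agreement."""
    return (p_agree - p_chance) / (1.0 - p_chance)

p_chance = 0.5    # P(E), from the editor's note
p_agree = 0.805   # P(A), assumed: back-solved from K = 0.61

k = kappa(p_agree, p_chance)
print(round(k, 2))
# With P(E) = 1/2 the general formula and the shortcut 2P(A) - 1 coincide.
print(abs(k - (2 * p_agree - 1)) < 1e-12)
```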
Tools
●   Edinburgh Language Technology Text
    Tokenization Toolkit (LT-TTT)
    ●   text tokenization, part-of-speech tagging, chunking,
        and noun phrase head identification.
●   Stanford parser
    ●   syntactic and dependency parsing
●   LibSVM with linear kernel
●   Weka
    ●   logistic regression classification
Tools
●   Machine Learning for Language Toolkit
    (MALLET)
    ●   CRF

                    Validation
●   10-fold cross validation
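A 10-fold split over the 3300 sentences can be sketched as follows. The slides do not say whether the data was shuffled or stratified, so the interleaved fold assignment here is only an assumption, and `k_fold_indices` is a hypothetical helper rather than a function from any of the listed toolkits.

```python
def k_fold_indices(n_items, k=10):
    """Yield (train, test) index lists for k-fold cross validation."""
    # Interleaved assignment: item i goes to fold i % k (assumed scheme).
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [idx for j, fold in enumerate(folds) if j != held_out for idx in fold]
        yield train, test

for fold_number, (train, test) in enumerate(k_fold_indices(3300), start=1):
    print(fold_number, len(train), len(test))
```

Each sentence is held out exactly once, so every fold trains on 2970 sentences and tests on 330.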
Experiment (Preprocessing)
●   These constraints can be lexicalized (REF.1; REF.2),
    unlexicalized (REF.3; TREF.4) or automatically learned
    (REF.5; REF.6).
    ●   Tagging: 98.3% precision and 93.1% recall
●   These constraints can be lexicalized (GREF.1), unlexicalized
    (GTREF.2) or automatically learned (GREF.3).
    ●   Grouping: Perfect!
●   (GTREF.1) apply fuzzy techniques for integrating source
    syntax into hierarchical phrase-based systems (REF.2).
    ●   Non-syntactic reference removal: 90.08% precision and
        90.1% recall
Experiment (Main)
●   CRF
●   Chunking
●   Majority
Feature Analysis
●   Features: Distance, Position (Before/After), Same
    segment (delimiters: , . ; and, but, for, nor, or, so, yet),
    POS tag, Dependency Distance, Dependency
    Relations, Common Ancestor Node, Syntactic
    Distance
Summary
●   Identified the reference scope in sentences that
    contain multiple citations
●   CRF
●   Chunking
●   Majority

Editor's Notes

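  • #3: The related work section notes that Nanba and the authors themselves have addressed scope identification for cases where a citation is explained across multiple sentences. Applications include summarization.
  • #13: Since there are two annotators, the chance-agreement probability P(E) is 1/2. P(A) is a little over 80%.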