Automatically Detecting Scientific Misinformation
Isabelle Augenstein*
augenstein@di.ku.dk
@IAugenstein
http://isabelleaugenstein.github.io/
*partial slide credit: Dustin Wright
CONSTRAINT Workshop @ ACL, 27 May 2022
Introduction
Supporting the Life Cycle of Research
26/05/2022 3
[Diagram: the life cycle of research. Stages: Information Discovery, Conducting Experiments, Paper Writing, Peer Review, Research Impact Tracking. Supporting tasks: Information Extraction, Summarisation, Citation Prediction, Writing Assistance, Reviewing Support, Reviewer Matching, Review Score Prediction, Citation Analysis, Citation Trend Analysis]
Fact Checking
Focus on veracity
What about more subtle
forms of misinformation?
[Diagram: a press release propagating to news outlets (BBC, Daily Mail, The Express, etc.)]
Credibility and Veracity of Science Communication
Knowledge-Intensive Tasks
[Diagram: CLAIMS linked to EVIDENCE via factuality, faithfulness, credit assignment, etc.]
Goal: automatically ensure the credibility and veracity of scientific information
Fundamental Unit of Information: Claims
• Focus: tasks involving scientific claims, i.e. assertions
about the world which are factual in nature
Current treatment options for ALS are based on symptom
management and respiratory support with the only approved
medications in widespread use, Riluzole and Edaravone,
providing only modest benefits and only in some patients.
• Challenges
• Supervised learning is hard: annotation is expensive, requiring
domain experts
• Language used is diverse across fields
• Different modalities
• Scientific claims are complex
• Meta-data also important
Overview of Today’s Talk
• Introduction
• The Life Cycle of Scientific Research
• Part 1: Claim detection and generation
• Cite-worthiness detection
• Scientific claim generation for zero-shot scientific fact checking
• Part 2: Exaggeration Detection
• Exaggeration detection of health science press releases
• Conclusion
• Future research challenges
CiteWorth: Cite-Worthiness Detection
for Improved Scientific Document
Understanding
Dustin Wright, Isabelle Augenstein
ACL 2021 (Findings)
Claims Should be Properly Credited
Current treatment options for ALS are based on symptom
management and respiratory support with the only approved
medications in widespread use, Riluzole and Edaravone,
providing only modest benefits and only in some patients.
Cite-worthiness detection
Citances in Machine Learning
We use the model from the original BERT paper (Devlin et al. 2019).
Cite-worthiness: Is this a citance? Yes
Recommendation: What paper should be cited? Devlin et al. (2019)
Influence: Was this an influential paper? Yes
Intent: What is the purpose of the citation? Method
Cite-Worthiness Datasets
• Tend to be small and limited to only a few domains
(e.g. Computer Science)
• No attention paid to how clean the data is
We use the model from Devlin et al. (2019) as a baseline.
e.g. ungrammatical phrases
Our Research Questions
• How can a dataset for cite-worthiness detection be
automatically curated with low noise?
• What methods are most effective for automatically
detecting cite-worthy sentences?
• How does domain affect learning cite-worthiness
detection?
• Can large scale cite-worthiness data be used to
perform transfer learning to downstream scientific text
tasks?
CiteWorth: Dataset Curation
RQ1: How can a dataset for cite-worthiness detection be automatically curated with low noise?
• Source data: S2ORC1 – millions of extracted scientific documents from Semantic Scholar
• We limit citances as follows:
• Parenthetical author/year and bracketed numerical citations only
• Citations must be at the end of a sentence
Example: “We use the model from the original BERT paper (Devlin et al. 2019).” / “We use the model from the original BERT paper [1].”
1. https://github.com/allenai/s2orc
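The two filtering rules above can be sketched as a small amount of regex code. This is a hypothetical illustration, not the actual CiteWorth pipeline; the function name and patterns are ours:

```python
import re

# Accept only parenthetical author/year citations, e.g. "(Devlin et al. 2019)",
# or bracketed numerical citations, e.g. "[1]", at the end of a sentence;
# strip the marker to obtain a clean cite-worthy example.
AUTHOR_YEAR = r"\((?:[A-Z][\w'-]+(?:\s+et\s+al\.)?,?\s+\d{4}[a-z]?(?:;\s*)?)+\)"
BRACKETED = r"\[\d+(?:\s*,\s*\d+)*\]"
TRAILING_CITATION = re.compile(
    r"\s*(?:" + AUTHOR_YEAR + "|" + BRACKETED + r")\s*([.?!])?\s*$"
)

def extract_citance(sentence: str):
    """Return (is_citeworthy, sentence with the trailing citation removed)."""
    match = TRAILING_CITATION.search(sentence)
    if match is None:
        return False, sentence
    cleaned = sentence[: match.start()].rstrip() + (match.group(1) or ".")
    return True, cleaned
```

A sentence with a mid-sentence citation would simply be discarded by this sketch, which is one way the end-of-sentence restriction keeps the cleaned text grammatical.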
CiteWorth Final Dataset
• 1,181,793 sentences
• 10 different fields, 20,000+ paragraphs per field
• Much cleaner than a naive baseline which only
removes citation text based on gold spans
RQ1: How can a dataset for cite-worthiness detection be automatically curated with low noise?
Method | Sentences Clean (%) | Citation Markers Removed (%)
Naive Baseline | 92.07 | 92.78
CiteWorth (Ours) | 98.90 | 98.10
Predicting on Individual Sentences
Can context improve performance?
Method | P | R | F1
Logistic Regression | 46.65 | 64.88 | 54.28
CRNN | 50.87 | 62.21 | 55.97
Transformer | 47.92 | 71.59 | 57.39
BERT | 55.04 | 69.02 | 61.23
SciBERT | 57.03 | 68.08 | 62.06
[Figure: inputXGrad* explanations for hard (low-confidence) and easy (high-confidence) examples]
RQ2: What methods are most effective for automatically detecting cite-worthy sentences?
* Pieter-Jan Kindermans, Kristof Schütt, Klaus-Robert Müller, and Sven Dähne. 2016. Investigating the Influence of Noise and
Distractors on the Interpretation of Neural Networks. arXiv preprint arXiv:1611.07270.
Predicting Multiple Sentences at Once
Are there variations across fields?
RQ2: What methods are most effective for automatically detecting cite-worthy sentences?
Longformer*
[Diagram: the whole paragraph is encoded as one sequence, [CLS] s1 [SEP] s2 [SEP] …; each sentence's token representations are pooled and classified separately]
Method | P | R | F1
SciBERT | 57.03 | 68.08 | 62.06
Longformer-Solo | 57.21 | 68.00 | 62.14
Longformer-Ctx | 59.92 | 77.15 | 67.45 (Δ 5 pts)
* Iz Beltagy, Matthew E. Peters, and Arman Cohan. 2020. Longformer: The Long-Document Transformer. CoRR, abs/2004.05150.
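The input packing used by Longformer-Ctx can be sketched as a plain string operation (real models operate on token ids and pool hidden states; this shows only the input layout, and the helper name is ours):

```python
def pack_paragraph(sentences):
    """Serialise a paragraph as '[CLS] s1 [SEP] s2 [SEP] ... [SEP]' so that
    each sentence can be classified with access to its full paragraph
    context, rather than in isolation."""
    return "[CLS] " + " [SEP] ".join(sentences) + " [SEP]"
```

Each [SEP]-delimited span is then pooled and classified independently, which is what gives Longformer-Ctx its context advantage over sentence-at-a-time models.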
Domain Effects
Train \ Test | Ch | E | CS | P | B
Ch | 67.58 | 58.41 | 56.86 | 62.35 | 68.23
E | 66.62 | 60.25 | 60.11 | 64.02 | 68.07
CS | 65.05 | 59.36 | 61.99 | 63.85 | 66.72
P | 65.49 | 58.03 | 56.69 | 65.10 | 68.27
B | 66.59 | 58.80 | 58.22 | 64.54 | 69.12
RQ3: How does domain affect learning cite-worthiness detection?
Average-pooled BERT representations of sentences, clustered using GMM (Roee Aharoni and Yoav Goldberg. 2020. Unsupervised Domain Clusters in Pretrained Language Models. In ACL.)
Longformer-Ctx performance in cross-domain evaluation
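The clustering analysis above can be illustrated in a few lines. This is a toy sketch, not the paper's code: random vectors stand in for real BERT token embeddings, and scikit-learn's GaussianMixture plays the role of the GMM:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

def mean_pool(token_embeddings: np.ndarray) -> np.ndarray:
    # (num_tokens, hidden_dim) -> (hidden_dim,): average over tokens,
    # as in the average-pooled sentence representations above.
    return token_embeddings.mean(axis=0)

# Two synthetic "domains": pooled sentence vectors drawn around
# different centres stand in for two scientific fields.
domain_a = [mean_pool(rng.normal(0.0, 0.1, size=(12, 8))) for _ in range(20)]
domain_b = [mean_pool(rng.normal(5.0, 0.1, size=(12, 8))) for _ in range(20)]
X = np.stack(domain_a + domain_b)

# Fit a two-component GMM; on well-separated domains the components
# recover the domain split without any labels.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = gmm.predict(X)
```

On real data the clusters are less clean, which is exactly what the cross-domain table above quantifies.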
Generating Scientific Claims for Zero-
Shot Scientific Fact Checking
Dustin Wright, Dave Wadden, Kyle Lo, Bailey Kuehl, Isabelle Augenstein, Lucy Lu Wang
ACL 2022
Claims Should be Factually Correct
Current treatment options for ALS are based on symptom
management and respiratory support with the only approved
medications in widespread use, Riluzole and Edaravone,
providing only modest benefits and only in some patients.
ALS cannot be treated by Riluzole.
Scientific claim generation for zero-shot
scientific fact checking
Citances in Scientific Fact Checking
● Scientific fact checking data is difficult to collect --
where to collect claims?
● Previous work: SciFact
○ Crowdsource claims from citances
○ Requires manually rewriting complex claims into atomic
claims
Australian statistics show that 1 in 7 young people have an
anxiety disorder and 1 in 16 have depression
Australian statistics show that 1 in 7 young people
have an anxiety disorder
Australian statistics show that 1 in 16 young
people have depression
Individual claims are “atomic verifiable statements expressing a finding about one aspect of a scientific entity or process, which can be verified from a single source”
ZSSFC (ACL 2022)
Research Questions
1. How can we generate claims from citances that are useful for
scientific fact checking?
➢ Citances -- sentences which have references to external sources
➢ Useful for fact checking because they contain a link to evidence
which can be used to verify their content
2. What methods can generate claims which are fluent, faithful,
atomic, and de-contextualized?
3. In what situations do generated claims help improve
performance on zero-shot scientific fact checking?
Methods: ClaimGen-Entity
[Pipeline: the citance passes through Named Entity Recognition (scispacy) to extract entities a1..a3; a Question Generation model Mq (BART) produces a question per entity; a Claim Generation model Mc (BART) turns each question/answer pair into a declarative claim C1..C3]
Methods: ClaimGen-BART
Input format: “{CONTEXT} || {CITANCE} || {CLAIM}”
Sample multiple claims at test time
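The serialisation format above can be sketched as a small helper (the function name is ours, not from the paper's code):

```python
from typing import Optional

def build_claimgen_example(context: str, citance: str,
                           claim: Optional[str] = None) -> str:
    """Serialise one ClaimGen-BART example as '{CONTEXT} || {CITANCE} || {CLAIM}'.

    At training time the target claim follows the second '||'; at test
    time the claim is omitted and several candidate claims are sampled
    from the fine-tuned BART model as continuations."""
    parts = [context, citance] + ([claim] if claim is not None else [])
    return " || ".join(parts)
```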
Generating Negations: KBIN
Exergames improve function and reduce the risk of falls.
[Diagram: the claim's UMLS concept (C0184511) is mapped via cui2vec1 to a set of related UMLS concepts, from which a replacement concept (C1457868) is chosen]
1. Andrew L. Beam, Benjamin Kompa, Allen Schmaltz, Inbar Fried, Griffin M. Weber, Nathan P. Palmer, Xu Shi, Tianxi Cai, Isaac S. Kohane: Clinical Concept Embeddings Learned from Massive Sources of Multimodal Medical Data. PSB 2020: 295-306
Generating Negations: KBIN
[Diagram: the replacement concept (UMLS C1457868) yields candidate negations such as “Exergames worse function and reduce the risk of falls.”, “Exergames deteriorating function and reduce the risk of falls.”, “Exergames deteriorate function and reduce the risk of falls.”, and “Exergames worsened function and reduce the risk of falls.”; GPT-2 ranks them by perplexity and the most fluent candidate, “Exergames deteriorate function and reduce the risk of falls.”, is selected]
Additionally: sample the N top concepts, run the best candidate claims through an NLI model together with the original claim, and select the claim with the highest contradiction score.
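The perplexity-ranking step can be sketched as follows. In KBIN the scorer is GPT-2; here a table of made-up perplexities stands in for the language model so the sketch stays self-contained:

```python
def rank_by_perplexity(candidates, perplexity):
    """Return candidates sorted from most to least fluent (low perplexity first)."""
    return sorted(candidates, key=perplexity)

toy_scores = {  # hypothetical perplexities, for illustration only
    "Exergames worse function and reduce the risk of falls.": 41.2,
    "Exergames deteriorating function and reduce the risk of falls.": 35.7,
    "Exergames deteriorate function and reduce the risk of falls.": 18.3,
    "Exergames worsened function and reduce the risk of falls.": 22.9,
}
best = rank_by_perplexity(list(toy_scores), toy_scores.__getitem__)[0]
```

With a real scorer, `perplexity` would compute the language model's perplexity on each candidate string; everything else stays the same.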
Evaluation: Zero-Shot Fact Checking
• Task: generate claims using CG-Entity/CG-BART +
KBIN, use those claims to train LongChecker
• LongChecker: Longformer based scientific FC model which
uses an entire abstract as evidence
• Comparison:
• Baseline: training on general domain claims (FEVER dataset)
• Upper bound: training on original SciFact claims
Evaluation: Zero-Shot Fact Checking
Method | P | R | F1
FEVER | 69.51 | 66.51 | 67.80
SciFact (Upper Bound) | 77.88 | 77.51 | 77.70
CG-Entity | 72.86 | 69.38 | 71.08
CG-BART | 64.09 | 79.43 | 70.94
• Both methods achieve within 90% of the performance of the upper bound
• Significant improvement over out of domain pre-training
• Not much difference in performance between the two methods
Conclusions
• We introduce CiteWorth – a large, rigorously cleaned
dataset for citation-related tasks
• We show that paragraph level context is crucial to
perform cite-worthiness detection
• We show that the data is diverse with a significant
domain effect
• We show that citances can be used to generate high
quality, atomic scientific claims usable to train models
for scientific fact checking
Overview of Today’s Talk
• Introduction
• The Life Cycle of Scientific Research
• Part 1: Claim detection and generation
• Cite-worthiness detection
• Scientific claim generation for zero-shot scientific fact checking
• Part 2: Exaggeration Detection
• Exaggeration detection of health science press releases
• Conclusion
• Future research challenges
Exaggeration Detection of Science
Press Releases
Dustin Wright, Isabelle Augenstein
EMNLP 2021
Claims Should be Reported on Faithfully
Current treatment options for ALS are based on symptom
management and respiratory support with the only approved
medications in widespread use, Riluzole and Edaravone,
providing only modest benefits and only in some patients.
Riluzole and Edaravone can cure ALS in most patients.
Exaggerated
Exaggeration detection of health science press releases
Exaggeration in Science Journalism
Sumner et al. 2014¹ and Bratton et al. 2019²: InSciOut
1. Sumner, P., Vivian-Griffiths, S., Boivin, J., Williams, A., Venetis, C. A., Davies, A., ... & Chambers, C. D. (2014). The association between exaggeration in health related science news and academic press releases: retrospective observational study. BMJ, 349.
2. Bratton, L., Adams, R. C., Challenger, A., Boivin, J., Bott, L., Chambers, C. D., & Sumner, P. (2019). The association
between exaggeration in health-related science news and academic press releases: a replication study. Wellcome
open research, 4.
Objective: To identify the source (press releases or news) of
distortions, exaggerations, or changes to the main conclusions
drawn from research that could potentially influence a reader’s
health related behaviour.
Conclusions:
• 33% of press releases contain exaggerations of conclusions of
scientific papers
• Exaggeration in news is strongly associated with exaggeration in
press releases
Our Work on Exaggeration Detection in Science
• Task: predicting when a press release exaggerates a scientific
paper
• Input: primary finding of the paper as written in the abstract
and the press release
• Focus of previous work: causal claim strength prediction of
these primary findings
Task Formulations
• T1
• Entailment-like task to predict
exaggeration label
• Paired (press release, abstract) data
ℒ_T1: 0 = Downplays, 1 = Same, 2 = Exaggerates
• T2
• Text classification task to predict causal
claim strength
• Unpaired press releases and abstracts
• Final prediction compares strength of
paired press release and abstract
ℒ_T2: 0 = No Relation, 1 = Correlational, 2 = Conditional Causal, 3 = Direct Causal
Label | Type | Language Cue
0 | No Relation | –
1 | Correlational | association, associated with, predictor, at high risk of
2 | Conditional causal | increase, decrease, lead to, effect on, contribute to, result in (cues indicating doubt: may, might, appear to, probably)
3 | Direct causal | increase, decrease, lead to, effective on, contribute to, reduce, can
(Li et al. 2017)
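A naive rule-of-thumb labeller built from these cues might look like the following. This is an illustration only: the systems in this talk are learned models, and plain substring matching over-matches (e.g. "may" inside "dismay"):

```python
# Cue lists abridged from the table above (Li et al. 2017).
CORRELATIONAL = ("association", "associated with", "predictor", "at high risk of")
CAUSAL = ("increase", "decrease", "lead to", "effect on", "contribute to",
          "result in", "reduce")
DOUBT = ("may", "might", "appear to", "probably")

def claim_strength(sentence: str) -> int:
    """Map a sentence to the 0-3 claim strength scale via cue matching."""
    s = sentence.lower()
    if any(cue in s for cue in CAUSAL):
        # A doubt marker downgrades direct causal (3) to conditional causal (2).
        return 2 if any(h in s for h in DOUBT) else 3
    if any(cue in s for cue in CORRELATIONAL):
        return 1  # correlational
    return 0  # no relation
```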
Evaluation Dataset Creation
Start with the 823 labeled pairs from
Sumner et al. 2014 and Bratton et al. 2019
(InSciOut)
Collect original abstract text from Semantic
Scholar
Match original conclusion sentences to
paraphrased annotations via ROUGE
score
Manually inspect and discard missing or
incorrect abstracts
Final label: compare the annotated claim strength of the press release (s_p) with that of the abstract (s_a):
Downplays if s_p < s_a
Same if s_p = s_a
Exaggerates if s_p > s_a
Total data: 663 pairs (100 training, 553 test)
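The comparison rule above, written out as a minimal sketch (the function name and the s_p/s_a variable names are ours, not from the paper's code):

```python
def exaggeration_label(s_p: int, s_a: int) -> str:
    """s_p: press-release claim strength; s_a: abstract claim strength (0-3 scale)."""
    if s_p < s_a:
        return "Downplays"
    if s_p > s_a:
        return "Exaggerates"
    return "Same"
```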
MT-PET
[Diagram: MT-PET. One cloze pattern for claim strength: “Eating chocolate causes happiness. The claim strength is [MASK]” (verbaliser candidates shown: medium, estimated, cautious, distorted); one pattern for exaggeration: “Scientists claim eating chocolate sometimes causes happiness. Reporters claim eating chocolate causes happiness. The reporters claims are [MASK]” (verbaliser candidates shown: preliminary, identical, naive). The pattern-trained models produce soft labels on unlabelled data, which train a final classifier with a KL-divergence loss]
T1 (Exaggeration Detection) with MT-PET
Method | P | R | F1
Supervised | 28.06 | 33.10 | 29.05
PET | 41.90 | 39.87 | 39.12
MT-PET | 47.80 | 47.99 | 47.35
Substantial improvements when using PET (10 points)
Further improvements with MT-PET (8 points)
Demonstrates transfer of knowledge from claim strength prediction to
exaggeration prediction
Error Analysis
• All models:
• disproportionately get pairs involving direct causal claims
incorrect
• do best for correlational claims from abstracts and claims
from press releases which are correlational or stronger
• MT-PET:
• helps the most for the most difficult category -- causal claims
Summary
• We formalize the problem of scientific exaggeration
detection, providing two task formulations for the
problem
• We curate a set of benchmark data to evaluate
automatic methods for performing the task
• We propose MT-PET, a few-shot learning method
based on PET, which we demonstrate outperforms
strong baselines
Overview of Today’s Talk
• Introduction
• The Life Cycle of Scientific Research
• Part 1: Claim detection and generation
• Cite-worthiness detection
• Scientific claim generation for zero-shot scientific fact checking
• Part 2: Exaggeration Detection
• Exaggeration detection of health science press releases
• Conclusion
• Future research challenges
Wrap-Up
Supporting the Life Cycle of Research
[Diagram: the research life cycle and its supporting tasks, as shown in the introduction]
Supporting the Life Cycle of Research
[Diagram: the same life cycle with one new supporting task added: Credibility Detection (NEW)]
Overall Take-Aways
• Why scholarly document processing?
• Supporting the life cycle of research, from information discovery to
research impact tracking
• Why credibility detection for scholarly communication?
• Detect claims which should be backed up by evidence
(cite-worthiness detection)
• Detect inconsistencies between primary and secondary sources of
information (exaggeration detection, fact checking)
Overall Take-Aways
• Overarching challenges
• Difficult NLP tasks (require understanding of pragmatics)
• Domain effects, importance of context pose further challenges
• Not well-studied yet
• Scarcity of available benchmarks
• Many opportunities for future work
• Explore more different settings
• Gather more datasets
• Methods for domain adaptation & few-shot learning
• Tools for journalists & authors
Thank you!
isabelleaugenstein.github.io
augenstein@di.ku.dk
@IAugenstein
github.com/isabelleaugenstein
Acknowledgements
CopeNLU
https://copenlu.github.io/
This project has received funding from the European Union’s Horizon 2020 research and
innovation programme under the Marie Skłodowska-Curie grant agreement No 801199.
Dustin Wright, Dave Wadden, Kyle Lo, Bailey Kuehl, Lucy Lu Wang
Presented Papers
Isabelle Augenstein. Determining the Credibility of Science
Communication. SDP workshop, 2021.
Dustin Wright, Isabelle Augenstein. CiteWorth: Cite-Worthiness
Detection for Improved Scientific Document Understanding. ACL 2021.
Dustin Wright, Dave Wadden, Kyle Lo, Bailey Kuehl, Isabelle
Augenstein, Lucy Lu Wang. Generating Scientific Claims for Zero-Shot
Scientific Fact Checking. ACL 2022.
Dustin Wright, Isabelle Augenstein. Exaggeration Detection of Science
Press Releases. EMNLP 2021.
Other Recent Related Papers
Andreas Nugaard Holm, Barbara Plank, Dustin Wright, Isabelle
Augenstein. Longitudinal Citation Prediction using Temporal Graph
Neural Networks. AAAI 2022 Workshop on Scientific Document
Understanding (SDU 2022), February 2022.
Malte Ostendorff, Nils Rethmeier, Isabelle Augenstein, Bela Gipp,
Georg Rehm. Neighborhood Contrastive Learning for Scientific
Document Representations with Citation Embeddings. CoRR,
abs/2202.06671, February 2022.
Shailza Jolly, Pepa Atanasova, Isabelle Augenstein. Generating Fluent
Fact Checking Explanations with Unsupervised Post-Editing. CoRR,
abs/2112.06924, December 2021.

More Related Content

PDF
Accountable and Robust Automatic Fact Checking
PDF
Tests de Diagnostic Rapide (TDR) du paludisme sous les tropiques
PDF
Towards Explainable Fact Checking (DIKU Business Club presentation)
PPTX
conduite-à-tenir-devant-une-hyperéosinophilie.pptx
PDF
Intelligent data analysis for medicinal diagnosis
PDF
Paris Data Ladies #14
PDF
How to establish and evaluate clinical prediction models - Statswork
PDF
Architectural approaches for implementing Clinical Decision Support Systems i...
Accountable and Robust Automatic Fact Checking
Tests de Diagnostic Rapide (TDR) du paludisme sous les tropiques
Towards Explainable Fact Checking (DIKU Business Club presentation)
conduite-à-tenir-devant-une-hyperéosinophilie.pptx
Intelligent data analysis for medicinal diagnosis
Paris Data Ladies #14
How to establish and evaluate clinical prediction models - Statswork
Architectural approaches for implementing Clinical Decision Support Systems i...

Similar to Automatically Detecting Scientific Misinformation (20)

PDF
Architectural approaches for implementing Clinical Decision Support Systems i...
PDF
VitreoRetinal Surgery Progress III .....
PDF
DEEP FACIAL DIAGNOSIS: DEEP TRANSFER LEARNING FROM FACE RECOGNITION TO FACIAL...
PDF
Energy-based Model for Out-of-Distribution Detection in Deep Medical Image Se...
PDF
AI at AZ Festival of Genomics 2025 final.pdf
PDF
AI at AZ Festival of Genomics 2025 final.pdf
PPTX
How to establish and evaluate clinical prediction models - Statswork
PDF
Deep Learning-based Diagnosis of Pneumonia using X-Ray Scans
PDF
Heart Disease Prediction Using Data Mining
PDF
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
PDF
A SURVEY ON BLOOD DISEASE DETECTION USING MACHINE LEARNING
PPTX
KDD22_tutorial_slides_final_sharing.pptx
PDF
A Review on Covid Detection using Cross Dataset Analysis
PPTX
Enabling Patient-Driven Medicine Using Graph Database
PPT
Cloud Computing and Innovations for Optimizing Life Sciences Research
PDF
Detecting outliers and anomalies in data streams
PPTX
Predictive Analytics and AI: Unlocking Clinical Trial Insights
PDF
PERSPECTIVES GENERATION VIA MULTI-HEAD ATTENTION MECHANISM AND COMMON-SENSE K...
PDF
IRJET- Prediction of Heart Disease using RNN Algorithm
DOCX
Citation Kristoffersson, A.; Lindén,
Architectural approaches for implementing Clinical Decision Support Systems i...
VitreoRetinal Surgery Progress III .....
DEEP FACIAL DIAGNOSIS: DEEP TRANSFER LEARNING FROM FACE RECOGNITION TO FACIAL...
Energy-based Model for Out-of-Distribution Detection in Deep Medical Image Se...
AI at AZ Festival of Genomics 2025 final.pdf
AI at AZ Festival of Genomics 2025 final.pdf
How to establish and evaluate clinical prediction models - Statswork
Deep Learning-based Diagnosis of Pneumonia using X-Ray Scans
Heart Disease Prediction Using Data Mining
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
A SURVEY ON BLOOD DISEASE DETECTION USING MACHINE LEARNING
KDD22_tutorial_slides_final_sharing.pptx
A Review on Covid Detection using Cross Dataset Analysis
Enabling Patient-Driven Medicine Using Graph Database
Cloud Computing and Innovations for Optimizing Life Sciences Research
Detecting outliers and anomalies in data streams
Predictive Analytics and AI: Unlocking Clinical Trial Insights
PERSPECTIVES GENERATION VIA MULTI-HEAD ATTENTION MECHANISM AND COMMON-SENSE K...
IRJET- Prediction of Heart Disease using RNN Algorithm
Citation Kristoffersson, A.; Lindén,
Ad

More from Isabelle Augenstein (20)

PPTX
Beyond Fact Checking — Modelling Information Change in Scientific Communication
PDF
Determining the Credibility of Science Communication
PDF
Explainability for NLP
PDF
Towards Explainable Fact Checking
PDF
Tracking False Information Online
PDF
What can typological knowledge bases and language representations tell us abo...
PDF
Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate ...
PDF
Learning with limited labelled data in NLP: multi-task learning and beyond
PDF
Learning to read for automated fact checking
PDF
SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Sc...
PDF
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
PDF
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
PDF
Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...
PPTX
Weakly Supervised Machine Reading
PDF
USFD at SemEval-2016 - Stance Detection on Twitter with Autoencoders
PDF
Distant Supervision with Imitation Learning
PDF
Extracting Relations between Non-Standard Entities using Distant Supervision ...
PDF
Information Extraction with Linked Data
PDF
Lodifier: Generating Linked Data from Unstructured Text
PDF
Relation Extraction from the Web using Distant Supervision
Beyond Fact Checking — Modelling Information Change in Scientific Communication
Determining the Credibility of Science Communication
Explainability for NLP
Towards Explainable Fact Checking
Tracking False Information Online
What can typological knowledge bases and language representations tell us abo...
Multi-task Learning of Pairwise Sequence Classification Tasks Over Disparate ...
Learning with limited labelled data in NLP: multi-task learning and beyond
Learning to read for automated fact checking
SemEval 2017 Task 10: ScienceIE – Extracting Keyphrases and Relations from Sc...
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
1st Workshop for Women and Underrepresented Minorities (WiNLP) at ACL 2017 - ...
Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...
Weakly Supervised Machine Reading
USFD at SemEval-2016 - Stance Detection on Twitter with Autoencoders
Distant Supervision with Imitation Learning
Extracting Relations between Non-Standard Entities using Distant Supervision ...
Information Extraction with Linked Data
Lodifier: Generating Linked Data from Unstructured Text
Relation Extraction from the Web using Distant Supervision
Ad

Recently uploaded (20)

PDF
Automation-in-Manufacturing-Chapter-Introduction.pdf
PDF
composite construction of structures.pdf
PPT
introduction to datamining and warehousing
DOCX
573137875-Attendance-Management-System-original
PDF
Model Code of Practice - Construction Work - 21102022 .pdf
PPTX
Construction Project Organization Group 2.pptx
PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PPTX
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
PPT
Mechanical Engineering MATERIALS Selection
PPTX
CH1 Production IntroductoryConcepts.pptx
PPT
Project quality management in manufacturing
PPTX
UNIT 4 Total Quality Management .pptx
PDF
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
PDF
Digital Logic Computer Design lecture notes
PPTX
UNIT-1 - COAL BASED THERMAL POWER PLANTS
DOCX
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
PPTX
OOP with Java - Java Introduction (Basics)
PDF
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
PDF
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
PPTX
additive manufacturing of ss316l using mig welding
Automation-in-Manufacturing-Chapter-Introduction.pdf
composite construction of structures.pdf
introduction to datamining and warehousing
573137875-Attendance-Management-System-original
Model Code of Practice - Construction Work - 21102022 .pdf
Construction Project Organization Group 2.pptx
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
Mechanical Engineering MATERIALS Selection
CH1 Production IntroductoryConcepts.pptx
Project quality management in manufacturing
UNIT 4 Total Quality Management .pptx
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
Digital Logic Computer Design lecture notes
UNIT-1 - COAL BASED THERMAL POWER PLANTS
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
OOP with Java - Java Introduction (Basics)
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
additive manufacturing of ss316l using mig welding

Automatically Detecting Scientific Misinformation

  • 3. Supporting the Life Cycle of Research 26/05/2022 3 Reviewing Support Citation Analysis Writing Assistance Information Discovery Conducting Experiments Paper Writing Peer Review Research Impact Tracking Information Extraction Summarisa tion Citation Prediction Reviewer Matching Review Score Prediction Citation Prediction Citation Trend Analysis
  • 4. Fact Checking 26/05/2022 4 Focus on veracity What about more subtle forms of misinformation?
  • 5. 26/05/2022 5 Press Release BBC DailyMail The Express Etc... Credibility and Veracity of Science Communication
  • 6. Knowledge-Intensive Tasks 26/05/2022 6 CLAIMS EVIDENCE Factuality, faithfulness, credit assignment, etc. Goal: automatically ensure the credibility and veracity of scientific information
  • 7. Fundamental Unit of Information: Claims • Focus: tasks involving scientific claims, i.e. assertions about the world which are factual in nature 26/05/2022 7 Current treatment options for ALS are based on symptom management and respiratory support with the only approved medications in widespread use, Riluzole and Edaravone, providing only modest benefits and only in some patients. • Challenges • Supervised learning is hard: annotation is expensive, requiring domain experts • Language used is diverse across fields • Different modalities • Scientific claims are complex • Meta-data also important
  • 8. Overview of Today’s Talk • Introduction • The Life Cycle of Scientific Research • Part 1: Claim detection and generation • Cite-worthiness detection • Scientific claim generation for zero-shot scientific fact checking • Part 2: Exaggeration Detection • Exaggeration detection of health science press releases • Conclusion • Future research challenges
  • 9. CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding Dustin Wright, Isabelle Augenstein ACL 2021 (Findings) 9
  • 10. Claims Should be Properly Credited 26/05/2022 10 Current treatment options for ALS are based on symptom management and respiratory support with the only approved medications in widespread use, Riluzole and Edaravone, providing only modest benefits and only in some patients. Cite-worthiness detection
  • 11. Citances in Machine Learning 26/05/2022 11 We use the model from the original BERT paper (Devlin et al. 2019). Cite-worthiness: Is this a citance? Yes Recommendation: What paper should be cited? Devlin et al. (2019) Influence: Was this an influential paper? Yes Intent: What is the purpose of the citation? Method
  • 12. Cite-Worthiness Datasets • Tend to be small and limited to only a few domains (e.g. Computer Science) • No attention paid to how clean the data is 26/05/2022 12 We use the model from Devlin et al. (2019) as a baseline. e.g. ungrammatical phrases
  • 13. Our Research Questions • How can a dataset for cite-worthiness detection be automatically curated with low noise? • What methods are most effective for automatically detecting cite-worthy sentences? • How does domain affect learning cite-worthiness detection? • Can large scale cite-worthiness data be used to perform transfer learning to downstream scientific text tasks? 26/05/2022 13
  • 14. CiteWorth: Dataset Curation 26/05/2022 14 1. https://github.com/allenai/s2orc We use the model from the original BERT paper (Devlin et al. 2019). We use the model from the original BERT paper [1]. Parenthetical author/year and bracketed numerical citations only Citations must be at the end of a sentence • We limit citances as follows • Source data: S2ORC1 – millions of extracted scientific documents from Semantic Scholar RQ1: How can a dataset for cite-worthiness detection be automatically curated with low noise?
  • 15. CiteWorth Final Dataset • 1,181,793 sentences • 10 different fields, 20,000+ paragraphs per field • Much cleaner than a naive baseline which only removes citation text based on gold spans 26/05/2022 15 RQ1: How can a dataset for cite-worthiness detection be automatically curated with low noise? Method Sentences Clean (%) Citation Markers Removed (%) Naive Baseline 92.07 92.78 CiteWorth (Ours) 98.90 98.10
  • 16. Predicting on Individual Sentences 26/05/2022 16 Can context improve performance? Method P R F1 Logistic Regression 46.65 64.88 54.28 CRNN 50.87 62.21 55.97 Transformer 47.92 71.59 57.39 BERT 55.04 69.02 61.23 SciBERT 57.03 68.08 62.06 Hard Examples – Low Confidence Easy Examples – High Confidence Explanations using inputXGrad*: RQ2: What methods are most effective for automatically detecting cite-worthy sentences? * Pieter-Jan Kindermans, Kristof Schütt, Klaus-Robert Müller, and Sven Dähne. 2016. Investigating the Influence of Noise and Distractors on the Interpretation of Neural Networks. arXiv preprint arXiv:1611.07270.
  • 17. Predicting Multiple Sentences at Once 26/05/2022 17 Are there variations across field? RQ2: What methods are most effective for automatically detecting cite-worthy sentences? Longformer* [CLS] !! ! !! " [SEP] !" ! !" " !" # [SEP] Pooling Classify Pooling Classify … … Method P R F1 SciBERT 57.03 68.08 62.06 Longformer-Solo 57.21 68.00 62.14 Longformer-Ctx 59.92 77.15 67.45 Δ 5 pts * Iz Beltagy, Matthew E. Peters, and Arman Cohan. 2020. Longformer: The Long-Document Transformer. CoRR, abs/2004.05150.
• 18. Domain Effects 26/05/2022 18
RQ3: How does domain affect learning cite-worthiness detection?
• Average-pooled BERT representations of sentences, clustered using a GMM (Roee Aharoni and Y. Goldberg. 2020. Unsupervised Domain Clusters in Pretrained Language Models. In ACL.)
• Longformer-Ctx performance (F1) in cross-domain evaluation:

Train \ Test | Ch    | E     | CS    | P     | B
Ch           | 67.58 | 58.41 | 56.86 | 62.35 | 68.23
E            | 66.62 | 60.25 | 60.11 | 64.02 | 68.07
CS           | 65.05 | 59.36 | 61.99 | 63.85 | 66.72
P            | 65.49 | 58.03 | 56.69 | 65.10 | 68.27
B            | 66.59 | 58.80 | 58.22 | 64.54 | 69.12
• 19. Generating Scientific Claims for Zero- Shot Scientific Fact Checking Dustin Wright, Dave Wadden, Kyle Lo, Bailey Kuehl, Isabelle Augenstein, Lucy Lu Wang ACL 2022 19
  • 20. Claims Should be Factually Correct 26/05/2022 20 Current treatment options for ALS are based on symptom management and respiratory support with the only approved medications in widespread use, Riluzole and Edaravone, providing only modest benefits and only in some patients. ALS cannot be treated by Riluzole. Scientific claim generation for zero-shot scientific fact checking
• 21. Citances in Scientific Fact Checking 26/05/2022 21
• Scientific fact checking data is difficult to collect – where to collect claims from?
• Previous work: SciFact
  • Crowdsource claims from citances
  • Requires manually rewriting complex claims into atomic claims
• Example: “Australian statistics show that 1 in 7 young people have an anxiety disorder and 1 in 16 have depression” → “Australian statistics show that 1 in 7 young people have an anxiety disorder” + “Australian statistics show that 1 in 16 young people have depression”
• Individual claims are “atomic verifiable statements expressing a finding about one aspect of a scientific entity or process, which can be verified from a single source”
ZSSFC (ACL 2022)
  • 22. Research Questions 1. How can we generate claims from citances that are useful for scientific fact checking? ➢ Citances -- sentences which have references to external source ➢ Useful for fact checking because they contain a link to evidence which can be used to verify their content 2. What methods can generate claims which are fluent, faithful, atomic, and de-contextualized? 3. In what situations do generated claims help improve performance on zero-shot scientific fact checking? 26/05/2022 22 ZSSFC (ACL 2022)
• 23. Methods: ClaimGen-Entity 26/05/2022 23
Pipeline: Citance → Named Entity Recognition (scispacy) → Question Generation (BART) → Claim Generation (BART)
For each extracted entity aᵢ, a question Qᵢ is generated from the citance, then converted into a declarative claim Cᵢ.
ZSSFC (ACL 2022)
• 24. Methods: ClaimGen-BART 26/05/2022 24
BART fine-tuned to generate claims directly from citances.
Input format: “{CONTEXT} || {CITANCE} || {CLAIM}”
Sample multiple claims at test time.
ZSSFC (ACL 2022)
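The formatting step above can be sketched as plain string templating — at training time the claim is appended as the target continuation, at test time the sequence is truncated after the citance and the model samples several claim completions. The helper names are illustrative.

```python
# Hypothetical helpers implementing the "{CONTEXT} || {CITANCE} || {CLAIM}"
# format from the slide. At test time the claim slot is left empty and
# multiple claims are sampled from the fine-tuned BART model.

def format_train(context: str, citance: str, claim: str) -> str:
    """Full training sequence: context, citance, and gold claim."""
    return f"{context} || {citance} || {claim}"

def format_test(context: str, citance: str) -> str:
    """Generation prompt: the model fills in the claim after the last '||'."""
    return f"{context} || {citance} ||"

prompt = format_test(
    "ALS is a progressive neurodegenerative disease.",
    "Riluzole provides only modest benefits in some patients [1].",
)
```

The `||` separators let a single sequence-to-sequence model distinguish the surrounding context from the citance it should turn into a claim.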
• 25. Generating Negations: KBIN 26/05/2022 25
Example claim: “Exergames improve function and reduce the risk of falls.”
• Link claim concepts to UMLS (e.g. C0184511) and retrieve related candidate concepts (e.g. C1457868) using cui2vec¹ embeddings
1. Andrew L. Beam, Benjamin Kompa, Allen Schmaltz, Inbar Fried, Griffin M. Weber, Nathan P. Palmer, Xu Shi, Tianxi Cai, Isaac S. Kohane: Clinical Concept Embeddings Learned from Massive Sources of Multimodal Medical Data. PSB 2020: 295-306
ZSSFC (ACL 2022)
• 26. Generating Negations: KBIN 26/05/2022 26
• Substitute the retrieved UMLS concept (C1457868) into the claim to produce candidate negations:
  • Exergames worse function and reduce the risk of falls.
  • Exergames deteriorating function and reduce the risk of falls.
  • Exergames deteriorate function and reduce the risk of falls.
  • Exergames worsened function and reduce the risk of falls.
• Rank candidates by GPT-2 perplexity; best: “Exergames deteriorate function and reduce the risk of falls.”
• Additionally: sample N top concepts, run the best claims through an NLI model with the original claim, and select the claim with the highest contradiction score
ZSSFC (ACL 2022)
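The ranking step can be sketched as "score each candidate with a language model, keep the most fluent one." The unigram "LM" below is a stand-in for GPT-2, with made-up counts chosen for illustration; only the selection logic reflects the slide.

```python
import math

# Toy word frequencies standing in for a real language model (GPT-2 in the
# paper). Counts are invented; "deteriorate" is given the highest count so
# the toy model prefers the same surface form a fluency model would.
UNIGRAM_COUNTS = {"exergames": 2, "deteriorate": 5, "worse": 1,
                  "deteriorating": 1, "worsened": 1, "function": 8,
                  "and": 20, "reduce": 6, "the": 30, "risk": 7,
                  "of": 25, "falls": 4}
TOTAL = sum(UNIGRAM_COUNTS.values())

def pseudo_perplexity(sentence: str) -> float:
    """Unigram perplexity; lower means the toy LM finds it more fluent."""
    words = sentence.lower().rstrip(".").split()
    log_prob = sum(math.log(UNIGRAM_COUNTS.get(w, 0.5) / TOTAL) for w in words)
    return math.exp(-log_prob / len(words))

def best_negation(candidates):
    return min(candidates, key=pseudo_perplexity)

candidates = [
    "Exergames worse function and reduce the risk of falls.",
    "Exergames deteriorating function and reduce the risk of falls.",
    "Exergames deteriorate function and reduce the risk of falls.",
    "Exergames worsened function and reduce the risk of falls.",
]
chosen = best_negation(candidates)
```

In KBIN this fluency ranking is followed by the NLI contradiction check, which guards against substitutions that are fluent but do not actually negate the original claim.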
• 27. Evaluation: Zero-Shot Fact Checking 26/05/2022 27
• Task: generate claims using CG-Entity/CG-BART + KBIN, use those claims to train LongChecker
• LongChecker: Longformer-based scientific fact checking model which uses an entire abstract as evidence
• Comparison:
  • Baseline: training on general-domain claims (FEVER dataset)
  • Upper bound: training on original SciFact claims
ZSSFC (ACL 2022)
• 28. Evaluation: Zero-Shot Fact Checking 26/05/2022 28

Method                | P     | R     | F1
FEVER                 | 69.51 | 66.51 | 67.80
SciFact (Upper Bound) | 77.88 | 77.51 | 77.70
CG-Entity             | 72.86 | 69.38 | 71.08
CG-BART               | 64.09 | 79.43 | 70.94

• Both methods achieve within 90% of the performance of the upper bound
• Significant improvement over out-of-domain pre-training
• Not much difference in performance between the two methods
ZSSFC (ACL 2022)
  • 29. Conclusions • We introduce CiteWorth – a large, rigorously cleaned dataset for citation-related tasks • We show that paragraph level context is crucial to perform cite-worthiness detection • We show that the data is diverse with a significant domain effect • We show that citances can be used to generate high quality, atomic scientific claims usable to train models for scientific fact checking 26/05/2022 29
  • 30. Overview of Today’s Talk • Introduction • The Life Cycle of Scientific Research • Part 1: Claim detection and generation • Cite-worthiness detection • Scientific claim generation for zero-shot scientific fact checking • Part 2: Exaggeration Detection • Exaggeration detection of health science press releases • Conclusion • Future research challenges
  • 31. Exaggeration Detection of Science Press Releases Dustin Wright, Isabelle Augenstein EMNLP 2021 31
  • 32. Claims Should be Reported on Faithfully 26/05/2022 32 Current treatment options for ALS are based on symptom management and respiratory support with the only approved medications in widespread use, Riluzole and Edaravone, providing only modest benefits and only in some patients. Riluzole and Edaravone can cure ALS in most patients. Exaggerated Exaggeration detection of health science press releases
• 33. Exaggeration in Science Journalism Sumner et al. 2014¹ and Bratton et al. 2019²: InSciOut 26/05/2022 33
Objective: To identify the source (press releases or news) of distortions, exaggerations, or changes to the main conclusions drawn from research that could potentially influence a reader’s health-related behaviour.
Conclusions:
• 33% of press releases contain exaggerations of conclusions of scientific papers
• Exaggeration in news is strongly associated with exaggeration in press releases
1. Sumner, P., Vivian-Griffiths, S., Boivin, J., Williams, A., Venetis, C. A., Davies, A., ... & Chambers, C. D. (2014). The association between exaggeration in health related science news and academic press releases: retrospective observational study. BMJ, 349.
2. Bratton, L., Adams, R. C., Challenger, A., Boivin, J., Bott, L., Chambers, C. D., & Sumner, P. (2019). The association between exaggeration in health-related science news and academic press releases: a replication study. Wellcome Open Research, 4.
  • 34. Our Work on Exaggeration Detection in Science • Task: predicting when a press release exaggerates a scientific paper • Input: primary finding of the paper as written in the abstract and the press release • Focus of previous work: causal claim strength prediction of these primary findings 26/05/2022 34
• 35. Task Formulations 26/05/2022 35
• T1: Entailment-like task to predict the exaggeration label from paired (press release, abstract) data
  ℒ_T1 = {0: Downplays, 1: Same, 2: Exaggerates}
• T2: Text classification task to predict causal claim strength from unpaired press releases and abstracts; the final prediction compares the strength of a paired press release and abstract
  ℒ_T2 = {0: No Relation, 1: Correlational, 2: Conditional Causal, 3: Direct Causal}

Label | Type               | Language Cue (Li et al. 2017)
0     | No Relation        | —
1     | Correlational      | association, associated with, predictor, at high risk of
2     | Conditional causal | increase, decrease, lead to, effect on, contribute to, result in (cues indicating doubt: may, might, appear to, probably)
3     | Direct causal      | increase, decrease, lead to, effective on, contribute to, reduce, can
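The T2 formulation reduces exaggeration detection to comparing two claim-strength labels, which can be sketched in a few lines (label encodings follow the slide; the function name is illustrative):

```python
# T2: given predicted claim strengths (0-3) for the press release and the
# paper abstract, derive the T1 exaggeration label by comparison.
DOWNPLAYS, SAME, EXAGGERATES = 0, 1, 2

def exaggeration_label(strength_press: int, strength_abstract: int) -> int:
    """Compare claim strength of the press release vs. the paper abstract."""
    if strength_press < strength_abstract:
        return DOWNPLAYS
    if strength_press > strength_abstract:
        return EXAGGERATES
    return SAME

# e.g. the press release makes a direct causal claim (3) where the abstract
# only reports a correlation (1) -> exaggerates.
label = exaggeration_label(3, 1)
```

This indirection is what lets unpaired press releases and abstracts contribute training signal: only the strength classifier needs labels, not the pairs.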
• 36. Evaluation Dataset Creation 26/05/2022 36
1. Start with the 823 labeled pairs from Sumner et al. 2014 and Bratton et al. 2019 (InSciOut)
2. Collect original abstract text from Semantic Scholar
3. Match original conclusion sentences to paraphrased annotations via ROUGE score
4. Manually inspect and discard missing or incorrect abstracts
Final label: compare annotated claim strength (s_p for press release, s_a for abstract): Downplays if s_p < s_a, Same if s_p = s_a, Exaggerates if s_p > s_a
Total data: 663 pairs (100 training, 553 test)
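Step 3 above — aligning each annotated (paraphrased) conclusion with its closest sentence in the original abstract — can be sketched with a minimal unigram-overlap ROUGE-1 F1; a real pipeline would likely use a ROUGE library, and the example sentences are invented.

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def best_match(annotated_sentence, abstract_sentences):
    """Abstract sentence with the highest ROUGE-1 F1 to the annotation."""
    return max(abstract_sentences, key=lambda s: rouge1_f1(annotated_sentence, s))

abstract = ["We recruited 120 participants.",
            "Chocolate intake was associated with higher reported happiness."]
match = best_match("Chocolate intake is associated with happiness.", abstract)
```

Low-scoring matches are exactly the cases flagged for the manual inspection in step 4.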
• 37. MT-PET 26/05/2022 37
• Cloze-style (PET) patterns with verbalizers for both tasks:
  • Claim strength: “Eating chocolate causes happiness. The claim strength is [MASK]” → verbalizer scores over e.g. medium, estimated, cautious, distorted
  • Exaggeration: “Scientists claim eating chocolate sometimes causes happiness. Reporters claim eating chocolate causes happiness. The reporters’ claims are [MASK]” → verbalizer scores over e.g. preliminary, identical, naive
• The model for one task produces soft labels on unlabelled data, used to train the model for the other task with a KL-divergence loss
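The cross-task supervision step — verbalizer probabilities from one task's model acting as soft labels for the other on unlabelled data — boils down to a KL-divergence loss between two distributions. A self-contained sketch; the probabilities are illustrative, not from the paper.

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) between two discrete distributions (eps avoids log(0))."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Teacher: verbalizer distribution from the auxiliary-task PET model on an
# unlabelled example. Student: the main-task model's prediction on the same
# example. Training minimises the KL term (plus the usual PET losses).
soft_labels = [0.05, 0.15, 0.80]
student_probs = [0.10, 0.30, 0.60]
loss = kl_divergence(soft_labels, student_probs)
```

Minimising this term pulls the student's distribution toward the teacher's, which is how knowledge transfers from claim-strength prediction to exaggeration detection without paired labels.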
• 38. T1 (Exaggeration Detection) with MT-PET 26/05/2022 38

Method     | P     | R     | F1
Supervised | 28.06 | 33.10 | 29.05
PET        | 41.90 | 39.87 | 39.12
MT-PET     | 47.80 | 47.99 | 47.35

• Substantial improvements when using PET (10 points F1)
• Further improvements with MT-PET (8 points F1)
• Demonstrates transfer of knowledge from claim strength prediction to exaggeration prediction
  • 39. Error Analysis • All models: • disproportionately get pairs involving direct causal claims incorrect • do best for correlational claims from abstracts and claims from press releases which are correlational or stronger • MT-PET: • helps the most for the most difficult category -- causal claims 26/05/2022 39
  • 40. Summary • We formalize the problem of scientific exaggeration detection, providing two task formulations for the problem • We curate a set of benchmark data to evaluate automatic methods for performing the task • We propose MT-PET, a few-shot learning method based on PET, which we demonstrate outperforms strong baselines 26/05/2022 40
  • 41. Overview of Today’s Talk • Introduction • The Life Cycle of Scientific Research • Part 1: Claim detection and generation • Cite-worthiness detection • Scientific claim generation for zero-shot scientific fact checking • Part 2: Exaggeration Detection • Exaggeration detection of health science press releases • Conclusion • Future research challenges
• 43. Supporting the Life Cycle of Research 26/05/2022 43 Reviewing Support Citation Analysis Writing Assistance Information Discovery Conducting Experiments Paper Writing Peer Review Research Impact Tracking Information Extraction Summarisation Citation Prediction Reviewer Matching Review Score Prediction Citation Prediction Citation Trend Analysis
• 44. Supporting the Life Cycle of Research 26/05/2022 44 Reviewing Support Citation Analysis Writing Assistance Information Discovery Conducting Experiments Paper Writing Peer Review Research Impact Tracking Information Extraction Summarisation Citation Prediction Credibility Detection Reviewer Matching Review Score Prediction Citation Prediction Citation Trend Analysis NEW
  • 45. Overall Take-Aways • Why scholarly document processing? • Supporting the life cycle of research, from information discovery to research impact tracking • Why credibility detection for scholarly communication? • Detect claims which should be backed up by evidence (cite-worthiness detection) • Detect inconsistencies between primary and secondary sources of information (exaggeration detection, fact checking)
• 46. Overall Take-Aways • Overarching challenges • Difficult NLP tasks (require understanding of pragmatics) • Domain effects and the importance of context pose further challenges • Not well-studied yet • Scarcity of available benchmarks • Many opportunities for future work • Explore more diverse settings • Gather more datasets • Methods for domain adaptation & few-shot learning • Tools for journalists & authors
• 48. Acknowledgements 48 CopeNLU https://copenlu.github.io/ This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 801199. Dustin Wright Dave Wadden Kyle Lo Bailey Kuehl Lucy Lu Wang
  • 49. Presented Papers Isabelle Augenstein. Determining the Credibility of Science Communication. SDP workshop, 2021. Dustin Wright, Isabelle Augenstein. CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding. ACL 2021. Dustin Wright, Dave Wadden, Kyle Lo, Bailey Kuehl, Isabelle Augenstein, Lucy Lu Wang. Generating Scientific Claims for Zero-Shot Scientific Fact Checking. ACL 2022. Dustin Wright, Isabelle Augenstein. Exaggeration Detection of Science Press Releases. EMNLP 2021.
  • 50. Other Recent Related Papers Andreas Nugaard Holm, Barbara Plank, Dustin Wright, Isabelle Augenstein. Longitudinal Citation Prediction using Temporal Graph Neural Networks. AAAI 2022 Workshop on Scientific Document Understanding (SDU 2022), February 2022. Malte Ostendorff, Nils Rethmeier, Isabelle Augenstein, Bela Gipp, Georg Rehm. Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings. CoRR, abs/2202.06671, February 2022. Shailza Jolly, Pepa Atanasova, Isabelle Augenstein. Generating Fluent Fact Checking Explanations with Unsupervised Post-Editing. CoRR, abs/2112.06924, December 2021.