Corroborating Information from Affirmative Statements
Minji Wu, Rutgers University
Amélie Marian, Rutgers University
Background
• Information is often untrustworthy
• Erroneous (e.g., news sites during breaking events)
• Misleading (e.g., malicious sources)
• Biased (e.g., political domains)
• Outdated (e.g., knowledge bases that are not updated frequently)
• This phenomenon is amplified by widespread information dependency (copy-paste between sources)
• It is difficult for users to discern the correctness of information and the trustworthiness of the sources
Conflicting Information
Data Corroboration
• Early Corroboration
• Frequency-based approach
• Recent work on Corroboration techniques
• Trustworthiness of sources
A measure s(s) that quantifies the precision of a source s
• Probability of information (facts)
A measure s(f) that quantifies the probability that a fact f is true
• Starting with a default s(s), iteratively compute the probabilities of the facts and the trustworthiness of the sources (a minimal sketch of this iteration follows this slide)
• Machine-Learning approaches
• Some corroboration problems can be seen as an ML classification problem
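To make the iterative scheme concrete, here is a minimal sketch in Python, assuming a simple model in which a fact's probability is the trust-weighted share of sources asserting it and a source's trust is the average probability of the facts it asserts; the function and variable names are illustrative and do not come from any of the cited papers.

```python
# Minimal sketch of iterative corroboration (illustrative only): alternate
# between estimating fact probabilities from source trust scores and
# re-estimating source trust from the fact probabilities.

def corroborate(claims, sources, default_trust=0.8, iterations=20):
    """claims maps each fact to the set of sources asserting it is true."""
    trust = {s: default_trust for s in sources}    # s(s), one score per source
    prob = {}                                      # s(f), one probability per fact
    for _ in range(iterations):
        total = sum(trust.values())
        for fact, supporters in claims.items():
            # Fact probability: trust-weighted share of supporting sources.
            prob[fact] = sum(trust[s] for s in supporters) / total if total else 0.0
        for s in sources:
            # Source trust: average probability of the facts the source asserts.
            asserted = [prob[f] for f, sup in claims.items() if s in sup]
            trust[s] = sum(asserted) / len(asserted) if asserted else default_trust
    return prob, trust

# Example: two sources assert fact "a", one asserts fact "b".
probs, trusts = corroborate({"a": {"s1", "s2"}, "b": {"s3"}}, {"s1", "s2", "s3"})
```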
What if there are no conflicts?
• Does the presence of information without contradictions mean it is correct?
Our Problem: Corroborating Information
with only Affirmative Statements
• We focus on scenarios in which sources have little or no dissension
• Frequent real-world problem (rumors, hard-to-rebut claims)
• Difficult to identify incorrect information since all reported information is consistent
• Existing corroboration approaches do not work well
• Rely on conflicting information to differentiate the trustworthiness of the sources
Contributions
• Novel corroboration approach:
• Assigns multiple trust scores to each source
• Considers the trustworthiness of the source for a group of facts
• Corroboration algorithm incrementally evaluates facts
• Groups unknown facts based on the sources reporting them
• Makes decisions based on information entropy
• Extensive real-world and synthetic experiments that demonstrate the benefits of our method
Evaluation Setting
• Corroboration task: identify legitimate restaurant listings in NYC given the listing information from a set of sources
• Sources for restaurant addresses: Citysearch, Foursquare, Menupages, Opentable, Yellowpages, Yelp
• Golden set
• Selected restaurants in 3 zip codes: 601 listings
• Verified their legitimacy in person (Apr 2012)
• 340 true and 261 false
Motivating Example
Values reported by the five sources (Opentable, Yelp, Menupages, Citysearch, Yellowpages) for a sample of restaurant listings, together with the correct value of each listing:

Restaurant        Reported values   Correct value
M Bar             T, T              true
Sam's             T, T, T, T        true
27 Sunshine       T, T, T           true
Crepe Creations   T, T              false
El Portal         T, T              false
Holy Basil        F, T              false
Papatzul          T, T              true
Wine Spot         T, T              true
Vbar              T, T              true
Wai Cafe          T, T              false
Tomoe Sushi       T, T, T           true
Khushie 139       F, F, T           false
State-of-the-art Corroboration Strategies
Approaches
• TwoEstimate [Galland WSDM’10]
• Iteratively estimates the trust score of the sources
and the probability of the facts
• BayesEstimate [Zhao VLDB’12]
• Uses a Bayesian graphical model
• Considers two-sided errors (false positives and false negatives), illustrated in the sketch below

                Precision   Recall   Accuracy   Computed trust scores (used to evaluate each fact)
TwoEstimate       .64         1        .67      (1, 1, 0.8, 0.9, 1)
BayesEstimate     .58         1        .58      (1, 0.8, 0.6, 1, 1)
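To give intuition for the two-sided error model, the sketch below (not the actual BayesEstimate algorithm; the rates and names are assumptions made for illustration) updates the posterior probability that a fact is true from per-source false-positive and false-negative rates via Bayes' rule.

```python
# Sketch of a two-sided error model (illustration, not BayesEstimate itself):
# each source has a false-positive rate (asserting a false fact) and a
# false-negative rate (failing to assert a true fact).

def posterior_true(prior, reports, fp_rate, fn_rate):
    """reports maps each source to True/False: did it assert the fact?
    fp_rate and fn_rate map each source to error rates strictly in (0, 1)."""
    odds = prior / (1.0 - prior)
    for src, asserted in reports.items():
        if asserted:
            odds *= (1.0 - fn_rate[src]) / fp_rate[src]    # P(assert|true) / P(assert|false)
        else:
            odds *= fn_rate[src] / (1.0 - fp_rate[src])    # P(silent|true) / P(silent|false)
    return odds / (1.0 + odds)

# Example: two sources assert the fact, one stays silent.
p = posterior_true(0.5,
                   {"s1": True, "s2": True, "s3": False},
                   fp_rate={"s1": 0.1, "s2": 0.2, "s3": 0.1},
                   fn_rate={"s1": 0.05, "s2": 0.1, "s3": 0.3})
```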
Key Observation
• Using the same trust score to judge the correctness of all information is too coarse
• Each source may exhibit different accuracy towards different groups of facts
• The corroboration result could be greatly improved if we could derive finer-grained trust scores for each source
Multi-value trust scores for sources
Trust Scores
• Single-value trust scores (s(s))
• A single measure for each source
• Each fact is evaluated using the same value from each source
• Multi-value trust scores
• A group of values assigned to each source: s(s) = < s1(s), s2(s), …>
• Each fact (or group of facts) is evaluated using one of the trust values from each source (a possible representation is sketched below)
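One possible representation (names and values are illustrative, not taken from the paper): keep, for every source, a list of trust values, one per fact group or processing step.

```python
# Single-value vs. multi-value trust scores (illustrative representation).
single_trust = {"Opentable": 0.9, "Yelp": 0.9, "Menupages": 0.9}   # one s(s) each

# Multi-value: s(s) = <s1(s), s2(s), ...>, one value per processing step.
multi_trust = {
    "Opentable": [0.9, 1.0, 1.0],
    "Yelp":      [0.9, 1.0, 1.0],
    "Menupages": [0.9, 1.0, 1.0],
}

def trust_for_step(source, step):
    """Trust value of `source` used to evaluate the facts processed at `step`."""
    return multi_trust[source][step]
```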
Multi-Value Trust Scores
• Two major challenges
• How to calculate the trust values for each source
• How to decide which sources' trust values to consider for each fact
• Solution: an incremental evaluation mechanism (sketched after this slide)
• Select a subset of facts to process
• Update the trust values based on the already processed facts
• Facts are assigned a truth value when they are processed
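A sketch of how such an incremental mechanism could be organized; `select_facts`, `label`, and `update_trust` are hypothetical placeholders for the selection heuristic, the truth-value decision, and the trust re-estimation described in the following slides.

```python
# Sketch of an incremental evaluation loop (illustrative): process unknown
# facts in batches, assign them truth values, and append a new trust value
# per source after every step.

def incremental_corroboration(facts, sources, select_facts, label, update_trust,
                              default_trust=0.9):
    trust = {s: [default_trust] for s in sources}    # multi-value trust scores
    decided = {}                                     # fact -> assigned truth value
    unknown = set(facts)
    step = 0
    while unknown:
        batch = select_facts(unknown, trust, step)   # e.g. the entropy heuristic
        if not batch:
            break
        for fact in batch:
            decided[fact] = label(fact, trust, step)
            unknown.discard(fact)
        for s in sources:                              # new trust value per source,
            trust[s].append(update_trust(s, decided))  # based on the decided facts
        step += 1
    return decided, trust
```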
How to Select Facts?
• Model each fact f as a random variable
• Objective: compute the probability s(f) that f is true
• Information Entropy approach:
• Consider the entropy H(f) of each fact f (see the sketch below)
• The entropy of a random variable measures its uncertainty
• Our solution: select facts such that the entropy of the unknown facts is maximized
• Existing corroboration techniques normalize their results to attain a probability of 1 (or 0) for each fact, i.e., an entropy of 0
• Reducing uncertainty leads to (too) early consensus
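Since each unknown fact is modeled as a random variable that is true with probability s(f), its entropy H(f) is the standard binary entropy; a small sketch under that assumption:

```python
import math

def binary_entropy(p):
    """Entropy H(f) of a fact f that is true with probability p = s(f)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0                     # a fully decided fact carries no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

# A fact believed true with probability 0.5 is maximally uncertain (H = 1 bit).
assert abs(binary_entropy(0.5) - 1.0) < 1e-9
```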
Heuristics for Selecting Facts
• Group facts based on the votes from the sources
• At each step i:
• Calculate the entropy of each fact group using si(s)
• Calculate ΔH(FG) for each fact group FG (the change of entropy if FG is selected)
• Select the positive and negative fact groups with the highest ΔH(FG), as in the sketch below
• Assign positive and negative values to the same number of facts
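A simplified sketch of the selection step (an illustration under stated assumptions, not the exact heuristic of the paper): ΔH(FG) is approximated here by the entropy the group's facts lose when they receive definite truth values, ignoring knock-on effects on the other unknown facts; `fact_probability` is a hypothetical placeholder for the probability model, and `binary_entropy` is reused from the previous sketch.

```python
# Approximate ΔH(FG) by the binary entropy the facts in FG would lose once
# they are assigned definite truth values (simplifying assumption), then
# select the positive and negative groups with the highest score.

def delta_entropy(group, trust, step, fact_probability):
    return sum(binary_entropy(fact_probability(f, trust, step)) for f in group)

def select_groups(positive_groups, negative_groups, trust, step, fact_probability):
    def score(group):
        return delta_entropy(group, trust, step, fact_probability)
    best_positive = max(positive_groups, key=score) if positive_groups else None
    best_negative = max(negative_groups, key=score) if negative_groups else None
    return best_positive, best_negative
```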
Revisiting the Running Example

The incremental algorithm starts with s(S) = {0.9, 0.9, 0.9, 0.9, 0.9} and groups the facts by the sources reporting them (positive groups {r7}, {r2}, {r3}, {r5, r8}, {r11}, {r9}, {r4, r10}, {r6}, {r1}; negative group {r12}). It first selects F1 = {r7, r12}, then F2 = {r3, r4}, F3 = {r9, r10}, and F4 = {r5, r6}, assigning truth values to the selected facts at each step and updating the trust scores, first to s(S) = {1, 1, 1, 0, 0.9} and later to s(S) = {1, 1, 1, 0, 0.5}. The results on the example, compared with the existing approaches, are:
                Precision   Recall   Accuracy   Computed trust scores
TwoEstimate       .64         1        .67      (1, 1, 0.8, 0.9, 1)
BayesEstimate     .58         1        .58      (1, 0.8, 0.6, 1, 1)
IncEstHeu         .86         1        .92      (0.9, 0.9, 0.9, 0.9, 0.9), (1, 1, 1, 0, 0.9), (1, 1, 1, 0, 0.5)
Experimental Setting
• Algorithms
• We implemented two strategies (IncEstPS, IncEstHeu) in Java
• Frequency-based: Voting and Counting
• Existing corroboration techniques: TwoEstimate, BayesEstimate
• Machine-learning based: ML-SVM, ML-Logistic
• 36,916 listings from 6 sources
• Metrics (computed as sketched below)
• Precision, Recall, Accuracy
• Mean Squared Error (MSE) of the trust scores
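For reference, the evaluation metrics can be computed as in the straightforward sketch below (assumed conventions: `y_true`/`y_pred` are parallel boolean lists over the gold set, and the trust-score MSE compares each method's estimated trust against each source's measured accuracy).

```python
def precision_recall_accuracy(y_true, y_pred):
    """y_true, y_pred: parallel lists of booleans (listing is legitimate)."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    tn = sum(not t and not p for t, p in zip(y_true, y_pred))
    fp = sum(not t and p for t, p in zip(y_true, y_pred))
    fn = sum(t and not p for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(y_true)
    return precision, recall, accuracy

def trust_score_mse(estimated, measured_accuracy):
    """Mean squared error between estimated trust scores and source accuracy."""
    return sum((estimated[s] - measured_accuracy[s]) ** 2
               for s in measured_accuracy) / len(measured_accuracy)
```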
Corroboration Results
Precision Recall Accuracy F-1
Voting 0.65 1.00 0.66 0.79
Counting 0.94 0.65 0.76 0.77
BayesEstimate 0.63 1.00 0.67 0.77
TwoEstimate 0.65 1.00 0.66 0.79
ML-SVM 0.98 0.74 0.77 0.84
ML-Logistic 0.86 0.85 0.82 0.82
IncEstPS 0.66 1.00 0.68 0.79
IncEstHeu 0.86 0.86 0.83 0.86
MSE on the sources

                Yellowpages   Foursquare   Menupages   Opentable   Citysearch   Yelp    MSE
Accuracy           0.59          0.78         0.93        0.96        0.62      0.84     -
TwoEstimate        1.00          1.00         0.98        1.00        1.00      0.98    0.063
BayesEstimate      1.00          1.00         1.00        1.00        1.00      1.00    0.066
ML-Logistic        0.62          0.85         0.98        0.92        0.65      0.95    0.004
IncEstHeu          0.51          0.70         0.90        0.93        0.51      0.89    0.005
Multi-value Trust Score
• Simple Fact Selection vs. Entropy-based Fact Selection: two plots of trust score over time points 0 to 100 show how each source's trust value (Yellowpages, Foursquare, Menupages, Opentable, Citysearch, Yelp) evolves under the two fact-selection strategies.
Conclusion
• Proposed techniques for corroborating facts with mostly affirmative statements
• Designed a novel algorithm that adopts a multi-value trust score for the sources
• Incrementally selects facts by leveraging the information entropy of unknown facts
• Uses different sets of sources' trust scores to evaluate each set of facts
• Performed experiments using both real-world and synthetic (see paper) data