Corroborating Information from Affirmative Statements
Minji Wu, Rutgers University
Amélie Marian, Rutgers University
Background
• Information is often untrustworthy
• Erroneous (e.g., news sites during breaking events)
• Misleading (e.g., malicious sources)
• Biased (e.g., political domains)
• Outdated (e.g., knowledge bases that are not updated frequently)
• This phenomenon is amplified by widespread information dependency (copy-paste between sources)
• It is difficult for users to discern the correctness of information and the trustworthiness of the sources
Conflicting Information
Data Corroboration
• Early Corroboration
• Frequency-based approach
• Recent work on Corroboration techniques
• Trustworthiness of sources
A measure s(s) that quantifies the precision of a source s
• Probability of information (facts)
A measure s(f) that quantifies the probability that a fact f is true
• Starting with a default s(s), iteratively compute the probabilities of the facts and the trustworthiness of the sources (a minimal sketch of this iteration follows this slide)
• Machine-Learning approaches
• Some corroboration problems can be seen as an ML classification problem
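To make the iterative scheme concrete, here is a minimal sketch in Python, assuming a simple model in which a fact's probability is the trust-weighted share of sources asserting it and a source's trust is the average probability of the facts it asserts; the function and variable names are illustrative and do not come from any of the cited papers.

```python
# Minimal sketch of iterative corroboration (illustrative only): alternate
# between estimating fact probabilities from source trust scores and
# re-estimating source trust from the fact probabilities.

def corroborate(claims, sources, default_trust=0.8, iterations=20):
    """claims maps each fact to the set of sources asserting it is true."""
    trust = {s: default_trust for s in sources}    # s(s), one score per source
    prob = {}                                      # s(f), one probability per fact
    for _ in range(iterations):
        total = sum(trust.values())
        for fact, supporters in claims.items():
            # Fact probability: trust-weighted share of supporting sources.
            prob[fact] = sum(trust[s] for s in supporters) / total if total else 0.0
        for s in sources:
            # Source trust: average probability of the facts the source asserts.
            asserted = [prob[f] for f, sup in claims.items() if s in sup]
            trust[s] = sum(asserted) / len(asserted) if asserted else default_trust
    return prob, trust

# Example: two sources assert fact "a", one asserts fact "b".
probs, trusts = corroborate({"a": {"s1", "s2"}, "b": {"s3"}}, {"s1", "s2", "s3"})
```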
What if there are no conflicts?
• Does the presence of information without contradictions mean it is correct?
Our Problem: Corroborating Information
with only Affirmative Statements
• We focus on scenarios in which sources have little or no dissension
• Frequent real-world problem (rumors, hard-to-rebut claims)
• Difficult to identify incorrect information since all reported information is consistent
• Existing corroboration approaches do not work well
• Rely on conflicting information to differentiate the trustworthiness of the sources
Contributions
• Novel corroboration approach:
• Assigns multiple trust scores to each source
• Considers the trustworthiness of the source for a group of facts
• Corroboration algorithm incrementally evaluates facts
• Groups unknown facts based on the sources reporting them
• Makes decisions based on information entropy
• Extensive real-world and synthetic experiments that demonstrate the benefits of our method
Evaluation Setting
• Corroboration task: identify legitimate restaurant listings in NYC given the listing information from a set of sources
• Sources for restaurant addresses: Citysearch, Foursquare, Menupages, Opentable, Yellowpages, Yelp
• Golden set
• Selected restaurants in 3 zip codes: 601 listings
• Verified their legitimacy in person (Apr 2012)
• 340 true and 261 false
Motivating Example
Values reported by the five sources (Opentable, Yelp, Menupages, Citysearch, Yellowpages) for a sample of restaurant listings, together with the correct value of each listing:

Restaurant        Reported values   Correct value
M Bar             T, T              true
Sam's             T, T, T, T        true
27 Sunshine       T, T, T           true
Crepe Creations   T, T              false
El Portal         T, T              false
Holy Basil        F, T              false
Papatzul          T, T              true
Wine Spot         T, T              true
Vbar              T, T              true
Wai Cafe          T, T              false
Tomoe Sushi       T, T, T           true
Khushie 139       F, F, T           false
State-of-the-art Corroboration Strategies
Approaches
• TwoEstimate [Galland WSDM’10]
• Iteratively estimates the trust score of the sources
and the probability of the facts
• BayesEstimate [Zhao VLDB’12]
• Uses a Bayesian graphical model
• Considers two-sided errors (false positives and false negatives), illustrated in the sketch below

                Precision   Recall   Accuracy   Computed trust scores (used to evaluate each fact)
TwoEstimate       .64         1        .67      (1, 1, 0.8, 0.9, 1)
BayesEstimate     .58         1        .58      (1, 0.8, 0.6, 1, 1)
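To give intuition for the two-sided error model, the sketch below (not the actual BayesEstimate algorithm; the rates and names are assumptions made for illustration) updates the posterior probability that a fact is true from per-source false-positive and false-negative rates via Bayes' rule.

```python
# Sketch of a two-sided error model (illustration, not BayesEstimate itself):
# each source has a false-positive rate (asserting a false fact) and a
# false-negative rate (failing to assert a true fact).

def posterior_true(prior, reports, fp_rate, fn_rate):
    """reports maps each source to True/False: did it assert the fact?
    fp_rate and fn_rate map each source to error rates strictly in (0, 1)."""
    odds = prior / (1.0 - prior)
    for src, asserted in reports.items():
        if asserted:
            odds *= (1.0 - fn_rate[src]) / fp_rate[src]    # P(assert|true) / P(assert|false)
        else:
            odds *= fn_rate[src] / (1.0 - fp_rate[src])    # P(silent|true) / P(silent|false)
    return odds / (1.0 + odds)

# Example: two sources assert the fact, one stays silent.
p = posterior_true(0.5,
                   {"s1": True, "s2": True, "s3": False},
                   fp_rate={"s1": 0.1, "s2": 0.2, "s3": 0.1},
                   fn_rate={"s1": 0.05, "s2": 0.1, "s3": 0.3})
```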
Key Observation
• Using the same trust score to judge the correctness of all information is too coarse
• Each source may exhibit different accuracy towards different groups of facts
• The corroboration result could be greatly improved if we could derive finer-grained trust scores for each source
Multi-value trust scores for sources
Trust Scores
• Single-value trust scores (s(s))
• A single measure for each source
• Each fact is evaluated using the same value from each source
• Multi-value trust scores
• A group of values assigned to each source: s(s) = < s1(s), s2(s), …>
• Each fact (or group of facts) is evaluated using one of the trust values from each source (a possible representation is sketched below)
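One possible representation (names and values are illustrative, not taken from the paper): keep, for every source, a list of trust values, one per fact group or processing step.

```python
# Single-value vs. multi-value trust scores (illustrative representation).
single_trust = {"Opentable": 0.9, "Yelp": 0.9, "Menupages": 0.9}   # one s(s) each

# Multi-value: s(s) = <s1(s), s2(s), ...>, one value per processing step.
multi_trust = {
    "Opentable": [0.9, 1.0, 1.0],
    "Yelp":      [0.9, 1.0, 1.0],
    "Menupages": [0.9, 1.0, 1.0],
}

def trust_for_step(source, step):
    """Trust value of `source` used to evaluate the facts processed at `step`."""
    return multi_trust[source][step]
```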
Multi-Value Trust Scores
• Two major challenges
• How to calculate the trust values for each source
• How to decide which sources' trust values to consider for each fact
• Solution: an incremental evaluation mechanism (sketched after this slide)
• Select a subset of facts to process
• Update the trust values based on the already processed facts
• Facts are assigned a truth value when they are processed
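A sketch of how such an incremental mechanism could be organized; `select_facts`, `label`, and `update_trust` are hypothetical placeholders for the selection heuristic, the truth-value decision, and the trust re-estimation described in the following slides.

```python
# Sketch of an incremental evaluation loop (illustrative): process unknown
# facts in batches, assign them truth values, and append a new trust value
# per source after every step.

def incremental_corroboration(facts, sources, select_facts, label, update_trust,
                              default_trust=0.9):
    trust = {s: [default_trust] for s in sources}    # multi-value trust scores
    decided = {}                                     # fact -> assigned truth value
    unknown = set(facts)
    step = 0
    while unknown:
        batch = select_facts(unknown, trust, step)   # e.g. the entropy heuristic
        if not batch:
            break
        for fact in batch:
            decided[fact] = label(fact, trust, step)
            unknown.discard(fact)
        for s in sources:                              # new trust value per source,
            trust[s].append(update_trust(s, decided))  # based on the decided facts
        step += 1
    return decided, trust
```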
How to Select Facts?
• Model each fact f as a random variable
• Objective: compute the probability s(f) that f is true
• Information Entropy approach:
• Consider the entropy H(f) of each fact f (see the sketch below)
• The entropy of a random variable measures its uncertainty
• Our solution: select facts such that the entropy of the unknown facts is maximized
• Existing corroboration techniques normalize their results to attain a probability of 1 (or 0) for each fact, i.e., an entropy of 0
• Reducing uncertainty leads to (too) early consensus
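Since each unknown fact is modeled as a random variable that is true with probability s(f), its entropy H(f) is the standard binary entropy; a small sketch under that assumption:

```python
import math

def binary_entropy(p):
    """Entropy H(f) of a fact f that is true with probability p = s(f)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0                     # a fully decided fact carries no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

# A fact believed true with probability 0.5 is maximally uncertain (H = 1 bit).
assert abs(binary_entropy(0.5) - 1.0) < 1e-9
```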
Heuristics for Selecting Facts
• Group facts based on the votes from the sources
• At each step i:
• Calculate the entropy of each fact group using si(s)
• Calculate ΔH(FG) for each fact group FG (the change of entropy if FG is selected)
• Select the positive and negative fact groups with the highest ΔH(FG), as in the sketch below
• Assign positive and negative values to the same number of facts
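A simplified sketch of the selection step (an illustration under stated assumptions, not the exact heuristic of the paper): ΔH(FG) is approximated here by the entropy the group's facts lose when they receive definite truth values, ignoring knock-on effects on the other unknown facts; `fact_probability` is a hypothetical placeholder for the probability model, and `binary_entropy` is reused from the previous sketch.

```python
# Approximate ΔH(FG) by the binary entropy the facts in FG would lose once
# they are assigned definite truth values (simplifying assumption), then
# select the positive and negative groups with the highest score.

def delta_entropy(group, trust, step, fact_probability):
    return sum(binary_entropy(fact_probability(f, trust, step)) for f in group)

def select_groups(positive_groups, negative_groups, trust, step, fact_probability):
    def score(group):
        return delta_entropy(group, trust, step, fact_probability)
    best_positive = max(positive_groups, key=score) if positive_groups else None
    best_negative = max(negative_groups, key=score) if negative_groups else None
    return best_positive, best_negative
```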
Revisiting the Running Example

The incremental algorithm starts with s(S) = {0.9, 0.9, 0.9, 0.9, 0.9} and groups the facts by the sources reporting them (positive groups {r7}, {r2}, {r3}, {r5, r8}, {r11}, {r9}, {r4, r10}, {r6}, {r1}; negative group {r12}). It first selects F1 = {r7, r12}, then F2 = {r3, r4}, F3 = {r9, r10}, and F4 = {r5, r6}, assigning truth values to the selected facts at each step and updating the trust scores, first to s(S) = {1, 1, 1, 0, 0.9} and later to s(S) = {1, 1, 1, 0, 0.5}. The results on the example, compared with the existing approaches, are:
                Precision   Recall   Accuracy   Computed trust scores
TwoEstimate       .64         1        .67      (1, 1, 0.8, 0.9, 1)
BayesEstimate     .58         1        .58      (1, 0.8, 0.6, 1, 1)
IncEstHeu         .86         1        .92      (0.9, 0.9, 0.9, 0.9, 0.9), (1, 1, 1, 0, 0.9), (1, 1, 1, 0, 0.5)
Experimental Setting
• Algorithms
• We implemented two strategies (IncEstPS, IncEstHeu) in Java
• Frequency-based: Voting and Counting
• Existing corroboration techniques: TwoEstimate, BayesEstimate
• Machine-learning based: ML-SVM, ML-Logistic
• 36,916 listings from 6 sources
• Metrics (computed as sketched below)
• Precision, Recall, Accuracy
• Mean Squared Error (MSE) of the trust scores
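For reference, the evaluation metrics can be computed as in the straightforward sketch below (assumed conventions: `y_true`/`y_pred` are parallel boolean lists over the gold set, and the trust-score MSE compares each method's estimated trust against each source's measured accuracy).

```python
def precision_recall_accuracy(y_true, y_pred):
    """y_true, y_pred: parallel lists of booleans (listing is legitimate)."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    tn = sum(not t and not p for t, p in zip(y_true, y_pred))
    fp = sum(not t and p for t, p in zip(y_true, y_pred))
    fn = sum(t and not p for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(y_true)
    return precision, recall, accuracy

def trust_score_mse(estimated, measured_accuracy):
    """Mean squared error between estimated trust scores and source accuracy."""
    return sum((estimated[s] - measured_accuracy[s]) ** 2
               for s in measured_accuracy) / len(measured_accuracy)
```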
Corroboration Results
Precision Recall Accuracy F-1
Voting 0.65 1.00 0.66 0.79
Counting 0.94 0.65 0.76 0.77
BayesEstimate 0.63 1.00 0.67 0.77
TwoEstimate 0.65 1.00 0.66 0.79
ML-SVM 0.98 0.74 0.77 0.84
ML-Logistic 0.86 0.85 0.82 0.82
IncEstPS 0.66 1.00 0.68 0.79
IncEstHeu 0.86 0.86 0.83 0.86
MSE on the sources

                Yellowpages   Foursquare   Menupages   Opentable   Citysearch   Yelp    MSE
Accuracy           0.59          0.78         0.93        0.96        0.62      0.84     -
TwoEstimate        1.00          1.00         0.98        1.00        1.00      0.98    0.063
BayesEstimate      1.00          1.00         1.00        1.00        1.00      1.00    0.066
ML-Logistic        0.62          0.85         0.98        0.92        0.65      0.95    0.004
IncEstHeu          0.51          0.70         0.90        0.93        0.51      0.89    0.005
Multi-value Trust Score
• Simple Fact Selection vs. Entropy-based Fact Selection: two plots of trust score over time points 0 to 100 show how each source's trust value (Yellowpages, Foursquare, Menupages, Opentable, Citysearch, Yelp) evolves under the two fact-selection strategies.
Conclusion
• Proposed techniques for corroborating facts with mostly affirmative statements
• Designed a novel algorithm that adopts a multi-value trust score for the sources
• Incrementally selects facts by leveraging the information entropy of unknown facts
• Uses different sets of sources' trust scores to evaluate each set of facts
• Performed experiments using both real-world and synthetic (see paper) data