Toward Automated Fact-Checking: Detecting Check-worthy Factual Claims by ClaimBuster

Naeemul Hassan1, Fatma Arslan2, Chengkai Li2, Mark Tremayne3
1Department of Computer and Information Science, University of Mississippi
2Department of Computer Science and Engineering, University of Texas at Arlington
3Department of Communication, University of Texas at Arlington
Fake-news floods social media (“filter bubbles” and “echo chambers”)
The Quest to Automate Fact-checking
Politicians make false and misleading claims
§ Facebook trending topic algorithms promoted fake-news.
§ A sample of 140,000 Twitter users in the battleground state of Michigan shared as many junk news
items as professional news during the final ten days of the 2016 election. http://politicalbots.org/?p=1064
National security threats
§ The Russian government interfered with the 2016 election. Fake-news websites and bots were used.
§ Pizzagate: conspiracy theory led to shooting
§ 100+ active fact-checking sites in 2017 (PolitiFact.com, FullFact.org, CNN,
Washington Post, …)
§ Google and Bing include fact-checks in search results.
§ Facebook lets users report false items and flags items disputed by fact-checkers.
Claim Spotting: Check-worthy Factual Claims Detection
Pipeline diagram: presidential debate transcripts (1960-2012, 20788 sentences), human annotation, ground truth, feature extraction, feature vectors, learning algorithm; applied to the 2016 presidential debates to output important factual claims.
Classification and ranking by check-worthiness
§ Non-Factual Sentence (NFS) (Opinions, beliefs,
declarations): “But I think it’s time to talk about the future.”
§ Unimportant Factual Sentence (UFS): “Two days ago we
ate lunch at a restaurant.”
§ Check-worthy Factual Sentence (CFS): “He voted against
the first Gulf War.”
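To make the classification and ranking idea concrete, here is a minimal sketch in Python. It assumes some trained classifier has already produced one probability per class for each sentence; the `Scored` type, the function names, and the probability values are hypothetical and are not ClaimBuster output. The predicted class is the most probable one, and sentences are ranked by their CFS probability.

```python
from typing import NamedTuple


class Scored(NamedTuple):
    sentence: str
    # Hypothetical class probabilities; the numbers below are invented
    # for illustration and do not come from ClaimBuster.
    p_nfs: float
    p_ufs: float
    p_cfs: float


def predicted_class(s: Scored) -> str:
    """Return the class with the highest probability."""
    probs = {"NFS": s.p_nfs, "UFS": s.p_ufs, "CFS": s.p_cfs}
    return max(probs, key=probs.get)


def rank_by_checkworthiness(items):
    """Sort sentences so the highest CFS probability comes first."""
    return sorted(items, key=lambda s: s.p_cfs, reverse=True)


examples = [
    Scored("But I think it's time to talk about the future.", 0.90, 0.05, 0.05),
    Scored("Two days ago we ate lunch at a restaurant.", 0.20, 0.65, 0.15),
    Scored("He voted against the first Gulf War.", 0.10, 0.10, 0.80),
]

ranked = rank_by_checkworthiness(examples)
for s in ranked:
    print(f"{s.p_cfs:.2f}  {predicted_class(s)}  {s.sentence}")
```

Ranking on the CFS probability rather than on the hard class label lets a fact-checker work down the list from the most check-worthy sentence.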
Feature extraction and selection
Example sentence: “I was in a state where my legislature was 87 percent Democrat.”
§ Entity Type: Quantity
§ Part-of-Speech: Noun
§ Concept: United States
§ Sentiment: 0.032
§ Words: state, legislature, 87, percent, democrat
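The word features for the example sentence can be reproduced with a very small sketch. It assumes lowercasing, alphanumeric tokenization, and a tiny hand-picked stopword list; all three are assumptions made for illustration, not the feature extractor ClaimBuster uses. The other feature types (entity type, part-of-speech, concept, sentiment) need external NLP tools and are not covered here.

```python
import re

# Hypothetical stopword list, chosen only to cover the example sentence.
STOPWORDS = {"i", "was", "in", "a", "where", "my"}


def word_features(sentence: str) -> list[str]:
    """Lowercase, split into alphanumeric tokens, drop stopwords and repeats."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    seen = set()
    words = []
    for tok in tokens:
        if tok not in STOPWORDS and tok not in seen:
            seen.add(tok)
            words.append(tok)
    return words


sentence = "I was in a state where my legislature was 87 percent Democrat."
words = word_features(sentence)
print(words)  # ['state', 'legislature', '87', 'percent', 'democrat']
```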
Case Study: 2016 U.S. Presidential Election Debates
Data Labeling and Ground-Truth Collection
20788 sentences
374 coders
76552 labels
86 top-quality coders
52333 labels
Majority voting
20617 admitted sentences
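The majority-voting step can be sketched as follows. The exact admission rule is not spelled out here, so this sketch assumes a strict majority: a sentence is admitted only if one label accounts for more than half of the labels it received from top-quality coders. The function names and the toy labels are hypothetical.

```python
from collections import Counter


def majority_label(labels):
    """Return the label held by a strict majority of coders, or None."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


def admit(labels_by_sentence):
    """Keep only the sentences whose labels reach a strict majority."""
    admitted = {}
    for sentence_id, labels in labels_by_sentence.items():
        winner = majority_label(labels)
        if winner is not None:
            admitted[sentence_id] = winner
    return admitted


# Toy labels, invented for illustration.
toy = {
    "s1": ["CFS", "CFS", "NFS"],  # CFS has 2 of 3 labels: admitted
    "s2": ["NFS", "UFS"],         # tie: not admitted
    "s3": ["UFS"],                # single label: admitted under this rule
}
admitted = admit(toy)
print(admitted)
```

Under this assumed rule, sentences with tied labels are left out of the ground truth.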
Combating falsehoods
Comparison of topic distributions of sentences fact-checked by CNN and PolitiFact
and sentences scored high (>= 0.5) by ClaimBuster
Fact-checks on major party presidential nominees by PolitiFact
Lack of automated tools that assist fact-checkers
Coding website: bit.ly/claimbusters
o 20788 sentences
o 20 months, 374 coders, ~$4,000 paid
o 30 training sentences
o 1032 screening sentences (731 NFS,
63 UFS, 238 CFS) to detect spammers
& low-quality coders
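One simple way to use screening sentences is to compare each coder's answers with the known labels and compute an agreement rate. The sketch below does only that. The quality measure ClaimBuster actually used is not given here, so the function name, the data layout, and the plain accuracy score are all assumptions.

```python
def screening_accuracy(coder_answers, gold):
    """Fraction of screening sentences the coder labeled the same as gold.

    Answers on sentences that are not screening sentences are ignored.
    Returns None if the coder saw no screening sentences.
    """
    graded = [sid for sid in coder_answers if sid in gold]
    if not graded:
        return None
    correct = sum(coder_answers[sid] == gold[sid] for sid in graded)
    return correct / len(graded)


# Toy data, invented for illustration.
gold = {"q1": "NFS", "q2": "CFS", "q3": "UFS", "q4": "NFS"}
coder = {"q1": "NFS", "q2": "CFS", "q3": "NFS", "x9": "CFS"}

accuracy = screening_accuracy(coder, gold)
print(accuracy)  # 2 of the 3 graded answers match
```

A coder whose score stays low across many screening sentences could then be flagged as a likely spammer or low-quality coder; where to put that cutoff is a separate design choice.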
Coder quality
Quality assurance
Feature importance
§ “The Holy Grail”: fully automated fact-checking
End-to-End Fact-Checking System: idir.uta.edu/claimbuster
Classification and Ranking Accuracy
