From unstructured data to
structured journalism
Giuseppe Futia
Nexa Center for Internet and Society, Politecnico di Torino
(DAUIN)
April 12, 2016
Master in Giornalismo "Giorgio Bocca" di Torino
Nexa Center for Internet &
Society at Politecnico di Torino
Website:
http://nexa.polito.it/
Communication Manager
Website, social media,
mailing-list
Research Fellow
GitHub account:
https://github.com/giuseppefutia
Start with Why
Presentation of
Jonathan Stray
(Journalist, data scientist)
YouTube Video:
https://www.youtube.com/watch?v=z4wHiv4bs-Y
Who said What?
Best tool for multi-lingual
journalists
#newsHack 2016
organized by
BBC Connected Studio
Team
• 1 Product manager
• 1 Software engineer
• 2 Researchers
• And journalists…?
New York Times, BBC,
Washington Post
Source: Poynter.org
Using "machine learning," technologists
at news outlets around the world are
helping newsrooms eliminate extra
time-consuming tasks and giving
humans more time to do what they do
best: reporting the news (Poynter.org)
Editor NYTimes Labs
Juicer BBC News Labs
Linked Data Cloud
Source:
https://en.wikipedia.org/wiki/Linked_data
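Linked data describes things as subject-predicate-object triples whose identifiers are URIs. The sketch below is only an illustration of that idea, not the BBC or Wikipedia implementation: the http://example.org/ namespace, the sample triples, and the match helper are all assumptions made up for this example.

```python
from typing import NamedTuple, Optional, List


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Placeholder namespace (assumption); real datasets use their own URIs.
EX = "http://example.org/"

# Invented sample statements linking articles, a person and an organisation.
TRIPLES = [
    Triple(EX + "article/1", EX + "mentions", EX + "person/alice"),
    Triple(EX + "article/2", EX + "mentions", EX + "person/alice"),
    Triple(EX + "person/alice", EX + "worksFor", EX + "org/acme"),
]


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    ]


# Which articles mention the person identified by .../person/alice?
articles = [
    t.subject
    for t in match(TRIPLES, p=EX + "mentions", o=EX + "person/alice")
]
print(articles)
```

Because every item is a URI, statements published by different newsrooms about the same URI can be merged into a single list and queried together.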
Knowledge Map Washington Post
Panama papers leak
Source: Wired.com
Panama papers leak
• 11.5 million documents
– 4.8 million emails
– 3 million database entries
– 2 million PDFs
– 1 million images
– 320,000 text documents
• 100 news organisations and 400 journalists
Panama papers processing
• Sort and organise the files
• Index the files
• Extract all of the metadata
• Investigate the data from a big-data and analytics perspective
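The first three steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline used in the investigation: the paths and texts in FILES are invented, and file_metadata, sort_by_type, build_index and search are hypothetical helper names.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# Invented sample files as (path, extracted text); not real leak data.
FILES = [
    ("mail/0001.eml", "Invoice sent to Acme Holdings"),
    ("docs/0002.txt", "Acme Holdings registered in Panama"),
    ("docs/0003.txt", "Meeting notes about shipping"),
]


def file_metadata(path):
    """Derive minimal metadata from the path alone."""
    p = PurePosixPath(path)
    return {
        "name": p.name,
        "type": p.suffix.lstrip(".").lower(),
        "folder": str(p.parent),
    }


def sort_by_type(files):
    """Group file paths by their extension."""
    groups = defaultdict(list)
    for path, _text in files:
        groups[file_metadata(path)["type"]].append(path)
    return dict(groups)


def build_index(files):
    """Build an inverted index: lowercase token -> set of paths."""
    index = defaultdict(set)
    for path, text in files:
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(path)
    return index


def search(index, query):
    """Return the sorted paths that contain every term of the query."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


groups = sort_by_type(FILES)
index = build_index(FILES)
print(groups)
print(search(index, "acme holdings"))
```

At the scale of millions of files, a dedicated search engine would replace the in-memory dictionary, but the idea of mapping each term to the files that contain it stays the same.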
Panama papers result
• The final database: 30 per cent of the original
data size
• Extract entities: first names and surnames
• Analytics to find how these names relate to the documents
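Extracting names and linking them back to documents can be sketched as follows. This is a deliberately naive stand-in, assuming that a person name is two consecutive capitalised words; a real investigation would rely on proper named-entity recognition. The documents, the names in them, and the helper names extract_names and link_names_to_docs are invented for illustration.

```python
import re
from collections import defaultdict

# Invented documents; the people named here are fictional.
DOCS = {
    "doc1": "Payment approved by Mario Rossi and Anna Bianchi.",
    "doc2": "Anna Bianchi signed the contract.",
    "doc3": "No people are mentioned here.",
}

# Naive assumption: a name is two consecutive capitalised words.
NAME = re.compile(r"\b([A-Z][a-z]+) ([A-Z][a-z]+)\b")


def extract_names(text):
    """Return the set of 'First Last' strings found in the text."""
    return {f"{first} {last}" for first, last in NAME.findall(text)}


def link_names_to_docs(docs):
    """Map each extracted name to the sorted ids of documents citing it."""
    links = defaultdict(set)
    for doc_id, text in docs.items():
        for name in extract_names(text):
            links[name].add(doc_id)
    return {name: sorted(ids) for name, ids in links.items()}


links = link_names_to_docs(DOCS)
for name in sorted(links):
    print(name, "->", links[name])
```

Names that appear in many documents, or that share documents with other names, are natural starting points for a reporter deciding where to look first.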
TellMeFirst http://tellmefirst.polito.it
Public Contracts http://public-contracts.nexacenter.org/
Data journalism as a framework
BBC News Labs Project
“To help news organisations
curate stories that scale, adapt
and connect across platforms
and use cases”
Thanks!
Mail
giuseppe.futia@polito.it
GitHub Repository
https://github.com/giuseppefutia/
