Using entity extraction extension with OpenRefine and Dandelion API

  • 1. Using the entity extraction extension with OpenRefine and Dandelion API: food for thought
  • 2. What we are talking about: OpenRefine (www.openrefine.org) and the NER extension (http://freeyourmetadata.org/named-entity-extraction/) integrated with Dandelion API (dandelion.eu)
  • 3. What industries are using OpenRefine? https://groups.google.com/d/msg/openrefine/vA75Ac_XODo/AfG8IRlEfSAJ
  • 4. Data journalists, metadata curators, museums, libraries, research labs, SEO folks, data scientists, enterprises, universities, patent attorneys, Open Data hackers, social media specialists, civil servants
  • 5. What does OpenRefine offer that other data-parsing tools don't? http://opendata.stackexchange.com/questions/515/what-does-openrefine-offer-that-other-data-parsing-tools-dont
  • 6. Reconciliation of text data against reference data services containing strong identifiers (Freebase, OpenCorporates, any SPARQL or RDF, etc.); simple linking of reconciled entities to other info sources like Wikipedia, MusicBrainz, IMDB, etc. […]
  • 7. How are we using it at SpazioDati?
  • 8. OpenRefine is inside our data curation controller
  • 9. Normalize, clean and extract data from different sources; reconcile against internal reconciliation services (administrative regions, names and telephone numbers…); apply rules and transformations to data, aligning it with our internal ontologies
  • 10. A look at OpenRefine & reconciliation
  • 11. Why is reconciliation useful? Instruments: bla bla bla, bla bla bla bla … What kind of instruments?
  • 12. Reconciliation identifies keywords in flowing text and gives them a URL: from strings to things
  • 13. “Instruments” data column: musical instruments (URL), measuring instruments (URL), aeronautical instruments (URL)
  • 14. Reconciliation works great for those fields in your dataset that contain single terms: names of people, countries, works of art […]
  • 15. And what if we have a column with unstructured text, like this one?
  • 16. We need a new step in the data curation workflow: extract named entities from a data column containing text, using the NER extension + Dandelion API, into a new data column labelled “dataTXT”
  • 17. In this column there are named concepts linked to Wikipedia, as label + URI: “Collective action” + http://en.wikipedia.org/wiki/Collective_action
  • 18. Make a text filter looking for a concept; classify and categorize the content; … things, not strings
  • 19. Some scenarios
  • 20. Open Data community real issues. Using OpenRefine + NER extension with Dandelion API: extract meaningful information, like names, organizations, skills, …, from CVs (http://opendata.stackexchange.com/search?page=3&tab=relevance&q=extraction); normalize organization names cited in texts
  • 21. Data journalists. Using OpenRefine + NER extension with Dandelion API: extract news relevant to a specific topic (a person, a brand or a company); write a summary of a politician's speech, starting from the main concepts extracted from the text; mine specific information in judicial decisions (judge's name, court, area of law and neutral citation number)
  • 22. Social media specialists. Using OpenRefine + NER extension with Dandelion API: text mining on tweets, to extract brands, places and concepts easily from a Twitter stream related to an event; text mining on website content, to extract concepts and places easily from a webpage and improve website SEO ranking
  • 23. Analytics on personal data (“Quantified Self” movement). Using OpenRefine + NER extension with Dandelion API: understand your own bank account statements by extracting useful information, like brands and places, to categorize and classify your own expenses
  • 24. Do you know other use cases? Tell us on Twitter! @dandelionapi #refine #ner @spaziodati dandelion.eu
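
The "label + URI" column and the concept filter described in the slides can be sketched offline in a few lines of Python. This is only an illustration, not the extension's actual code: the response shape (an "annotations" list whose items carry "label", "uri" and "confidence"), the confidence values, the second sample entity and the helper names extract_entities and rows_mentioning are all assumptions made for the example, and no network call is made.

```python
from typing import Dict, List, Tuple

Entity = Tuple[str, str]  # (label, URI)

# Hypothetical response for one cell of text. The field names and the
# confidence scores are assumptions made for this sketch.
SAMPLE_RESPONSE = {
    "annotations": [
        {"label": "Collective action",
         "uri": "http://en.wikipedia.org/wiki/Collective_action",
         "confidence": 0.81},
        {"label": "Trade union",
         "uri": "http://en.wikipedia.org/wiki/Trade_union",
         "confidence": 0.42},
    ]
}


def extract_entities(response: dict, min_confidence: float = 0.6) -> List[Entity]:
    """Turn an extraction response into (label, URI) pairs: strings to things."""
    entities = []
    for annotation in response.get("annotations", []):
        # Keep only annotations the service is reasonably sure about.
        if annotation.get("confidence", 0.0) >= min_confidence:
            entities.append((annotation["label"], annotation["uri"]))
    return entities


def rows_mentioning(column: Dict[int, List[Entity]], concept_uri: str) -> List[int]:
    """Return the row indexes whose extracted entities include the given URI."""
    return [row for row, entities in column.items()
            if any(uri == concept_uri for _, uri in entities)]


if __name__ == "__main__":
    # Row 0 holds the entities extracted from the sample; row 1 has none.
    column = {0: extract_entities(SAMPLE_RESPONSE), 1: []}
    print(rows_mentioning(column, "http://en.wikipedia.org/wiki/Collective_action"))
```

Filtering on the URI rather than on the label is the point of "things, not strings": two cells that spell a concept differently still match once both are linked to the same Wikipedia page.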