Let your data shine… with OpenRefine
Open Belgium 2016
OpenRefine workshop
Brosens - Desmet
What people say: tweets
@bartox: "Damn! Wish I had this 5 years ago! RT @swiertz nice tools ! Format & clean your data with Google Refine http://goo.gl/UniR6 #cleanup #tools"
@Musebrarian: "YIPEEEE! Google Refine works with OAI-PMH XML out of the box. This is going to make my life much easier."
@kb: "It’s kind of ridiculous how exciting I find this: https://code.google.com/p/google-refine/"
@litcritter: "I rarely feel the desire to kiss a corporation on the mouth, but Google Refine is making me come close http://goo.gl/8pvKB #datageek"
@LearonDalby: "I'm sold on #Google #Refine used it most of the day with "messy" data and managed to clean nearly all of it."
@roolio: "Today google #refine saved my afternoon. Every #data #hacker should try it"
@Salesient: "Google refine is awesome. Never before have I been home this early."
@Mayin: "Not only will it clean your data, Google Refine will slice, dice and put bows on your hairdo! http://bit.ly/cPGn1E Rocks data exploration."
@marklabedz: "Google Refine: Making interns unneccesary since 2010."
@naterkane: "i'm completely in love with Google Refine. fo' reals."
@LearonDalby: "Using #Google #Refine makes me happy. Even for the easy stuff."
@loranstefani: "Google Refine: love at first click"
@tracystan: "Google Refine is gonna change my life"
What people say: blogs
"Google Refine isn’t going to solve the problem of poor data availability, but for those who manage to gain access to existing records, it can be a powerful tool for transparency." Rebekah Heacock, co-director of the Technology for Transparency Network and a Project Coordinator at Harvard’s Berkman Center for Internet and Society - Sunlight Foundation, Tools for transparency: Google Refine.
"Google Refine is an immensely powerful tool for dealing with "messy" data, and it sports a myriad of advanced features for massaging and analyzing complex data sets" Dmitri Popov (Linux Magazine) - Use Google Refine to Massage Your Data
"For anyone who’s ever had to sort through messy data to try to turn up a meaningful treatment, and who hasn’t, this tool is a godsend." Michael Lines, SLAW - Google Refine 2.0
"Google Refine 2.0 will serve an excellent back-end for data visualization services. It has been well received by the Chicago Tribune and open-government data communities. Along with Google Squared, Refine 2.0 can create a powerful research tool." Chinmoy Kanjilal, Techie Buzz - Google Refine 2.0: Power Tools for Working With Data
A free, open source, powerful tool for working with messy data
● Formerly known as Google Refine, now OpenRefine
● Site: http://openrefine.org
● GitHub: https://github.com/OpenRefine
● Used for
○ Cleaning data (detecting and correcting anomalies)
○ Transforming data (changing format or data type)
○ Enriching and linking data (harvesting and connecting data from online databases)
● More powerful than a worksheet
● More visual than scripting
● Supported by a large community (lots of tutorials and plugins)
● Works quite well up to about 100,000 rows of data
● Supports several file formats
● The original file is left untouched
● Runs locally with a modern browser as its interface and needs no internet connection, except when you connect to online services
OpenRefine vs other tools
| | Other tools | OpenRefine |
| Worksheet | focus on cells; focus on importing data and calculations | focus on rows and columns; focus on exploring and transforming existing data |
| Scripting | data → script → output; focus on transformation of data | all steps are visualized |
| Databases | focus on queries; you should know the data | looks like a worksheet; data is always visible, facets show you choices |
Tools working with OpenRefine
| Distribution | Description | Authors |
| LODRefine | OpenRefine with integrated extensions that make the transition from tabular data to Linked Data a bit easier. Integrated extensions: RDF extension, DBpedia extension, Crowdsourcing extension, Stats extension | Sparkica |
| OpenDataRise | Tool to cleanse and semantify datasets from CKAN repositories. Based on OpenRefine. | Open Data in Trentino |
| p3-batchrefine | BatchRefine adds batch processing capabilities to OpenRefine and supports multiple back ends, including Spark | SpazioDati |
| SparkonRefine | RefineOnSpark is a driver program to run OpenRefine jobs on a Spark cluster | SpazioDati |
| Reconciliation-and-Matching-Framework | A framework to allow the matching of string entities using customised sets of transformations and matchers, plus a tool to produce the necessary configurations and another to expose them as OpenRefine reconciliation services. | RBGKew |
Hands on: install OpenRefine
● Download OpenRefine from http://openrefine.org/download.html
● Launch OpenRefine
● Create a project
● Choose the file you want to clean (example dataset: Onderwijsaanbod in Vlaanderen, http://opendata.vlaanderen.be/dataset/onderwijsaanbod)
Hands on: importing data
● Check the preview and define how the file is parsed
○ Set the character encoding (UTF-8)
○ Choose the delimiter (tab, semicolon, comma, …)
○ Parse data as (CSV)
○ Parse first line as column header, ignore first … line(s)
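The parsing choices on this slide (encoding, delimiter, header line, lines to ignore) are the same ones any delimited-text reader has to make. A minimal Python sketch of those choices, not OpenRefine's own importer; the `parse_delimited` helper, the sample text, and its column names are made up for illustration:

```python
import csv


def parse_delimited(raw_bytes, encoding="utf-8", delimiter=";", skip_lines=0):
    """Decode the bytes, skip leading lines, and use the next line as the header."""
    text = raw_bytes.decode(encoding)
    lines = text.splitlines()[skip_lines:]
    reader = csv.DictReader(lines, delimiter=delimiter)
    return list(reader)


# Hypothetical sample: one line to ignore, then a header and two records.
sample = "exported 2016\nschool;gemeente\nSint Jozef;Gent\nDe Wijzer;Brugge\n".encode("utf-8")
rows = parse_delimited(sample, delimiter=";", skip_lines=1)
print(rows[0]["gemeente"])
```

Picking the wrong delimiter here would leave every record in a single column, which is the kind of problem the import preview is meant to expose before the project is created.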
Hands on: faceting
● Faceting gives access to information organized according to a faceted classification system
○ Creates an overview of the data
○ Allows targeted editing of your data
○ Allows specific filtering
○ Facet choices are available as tab-separated values (like pivot tables in Excel)
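Conceptually, a text facet counts the distinct values in a column and lets you narrow the rows to one choice. A rough Python sketch of that idea, with hypothetical rows and helper names (`text_facet`, `select`); it only illustrates the concept and is not how OpenRefine implements facets:

```python
from collections import Counter

# Hypothetical rows; "gemeente" stands in for any column you facet on.
rows = [
    {"school": "A", "gemeente": "Gent"},
    {"school": "B", "gemeente": "Brugge"},
    {"school": "C", "gemeente": "Gent"},
    {"school": "D", "gemeente": "gent"},
]


def text_facet(rows, column):
    """Count how often each distinct value occurs in the column."""
    return Counter(row[column] for row in rows)


def select(rows, column, choice):
    """Keep only the rows whose value equals the chosen facet value."""
    return [row for row in rows if row[column] == choice]


facet = text_facet(rows, "gemeente")
for value, count in facet.most_common():
    print(value, count, sep="\t")  # one tab-separated line per facet choice

subset = select(rows, "gemeente", "gent")
```

The counts make the stray lowercase "gent" visible at a glance, and selecting it isolates exactly the rows that need editing.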
Hands on: clustering
● Clustering automatically groups values that are different but similar, so you can edit them together
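One common way to find "different but similar" values is key collision: reduce every value to a normalized key and group the values that end up with the same key. The sketch below is a simplified Python version of such a fingerprint key, written for illustration; OpenRefine's own clustering methods apply more normalization steps and offer other algorithms as well, and the sample values are invented:

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Simplified key: trim, lowercase, strip accents and punctuation, sort unique tokens."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values that share a key; only groups with more than one variant are clusters."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


values = ["Sint Jozef Gent", "sint jozef gent ", "Gent, Sint Jozef", "Brugge"]
print(cluster(values))
```

Each cluster is a candidate for merging into one spelling; a value with no near-duplicates, such as "Brugge" here, forms no cluster.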
Hands on: edit cells
● Common transforms:
○ trim leading and trailing whitespace
○ to title case
○ to number
○ to date
● Split & join multi-valued cells
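What these cell edits do can be sketched in a few lines of Python. The helper names, the date format, and the sample values below are assumptions for illustration; OpenRefine's own transforms may treat edge cases differently:

```python
from datetime import datetime


def to_number(value):
    """Return an int or float when the trimmed text parses as one, else the original."""
    text = value.strip()
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            continue
    return value


def to_date(value, fmt="%Y-%m-%d"):
    """Parse text as a date; the format string is an assumption about the data."""
    return datetime.strptime(value.strip(), fmt).date()


def split_multi_valued(value, separator):
    """Turn one multi-valued cell into a list of trimmed values."""
    return [part.strip() for part in value.split(separator)]


def join_multi_valued(values, separator):
    """Join a list of values back into one cell."""
    return separator.join(values)


print("  gent ".strip().title())
print(to_number(" 42 "), to_number("3.5"), to_number("n/a"))
print(to_date("2016-02-29"))
parts = split_multi_valued("ASO; TSO ;BSO", ";")
print(join_multi_valued(parts, " | "))
```

Leaving unparseable text unchanged, as `to_number` does for "n/a", keeps the problem values visible so they can be handled separately.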
Hands on: edit columns
● Split columns (by separator or field length)
● Add columns (by fetching URLs or based on another column, using GREL)
● Move columns
● Remove columns
● Rename columns
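The two ways of splitting a column, by separator or by field length, differ as follows. This is a small Python sketch with hypothetical helper names and sample values, meant only to show the idea behind the menu options:

```python
def split_by_separator(value, separator, max_columns=None):
    """Split one cell into several column values at each separator."""
    if max_columns is None:
        return value.split(separator)
    return value.split(separator, max_columns - 1)


def split_by_field_lengths(value, lengths):
    """Cut one cell into fixed-width pieces, e.g. [4, 2, 2] for 'YYYYMMDD'."""
    pieces, start = [], 0
    for length in lengths:
        pieces.append(value[start:start + length])
        start += length
    return pieces


def rename_column(row, old, new):
    """Return a copy of the row with one column renamed, keeping column order."""
    return {(new if name == old else name): cell for name, cell in row.items()}


print(split_by_separator("9000 Gent", " "))
print(split_by_separator("a,b,c", ",", max_columns=2))
print(split_by_field_lengths("20160229", [4, 2, 2]))
print(rename_column({"straat": "Kerkstraat", "nr": "12"}, "nr", "huisnummer"))
```

Splitting by field length suits fixed-width codes and dates, while splitting by separator suits free text where the pieces vary in length.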
Hands on: scripting using GREL
● GREL (General Refine Expression Language, formerly Google Refine Expression Language)
○ Add columns based on another column
■ basic string modification
■ find and replace
■ string parsing and splitting
■ calling web services
○ Results are always visible in the preview
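Adding a column based on another column amounts to evaluating one expression per row and storing the result under a new name. The Python sketch below imitates that with lambdas standing in for GREL expressions; the `add_column` helper, the column names, and the data are hypothetical:

```python
def add_column(rows, new_name, expression):
    """Add a column whose value is computed per row; failures become None."""
    result = []
    for row in rows:
        try:
            new_value = expression(row)
        except (KeyError, IndexError, AttributeError):
            new_value = None
        result.append({**row, new_name: new_value})
    return result


# Hypothetical data; the lambdas play the role of GREL expressions.
rows = [
    {"naam": "Sint Jozef (Gent)"},
    {"naam": "De Wijzer (Brugge)"},
]
rows = add_column(rows, "korte_naam", lambda r: r["naam"].split(" (")[0])
rows = add_column(rows, "gemeente", lambda r: r["naam"].split("(")[1].replace(")", ""))

for row in rows[:2]:  # a "preview" of the first rows
    print(row)
```

Printing the first rows plays the role of the preview: you see the result of the expression on real data before committing to it.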
Hands on: georeferencing
● Add columns by fetching URLs
■ find and replace
■ string parsing & splitting
■ add a column based on column "straat" (value + "%20" + cells["huisnummer"].value)
■ Call the Google API (or OpenStreetMap, or …) ("https://maps.googleapis.com/maps/api/geocode/json?address=" + value + cells["huisnummer"].value + "&key=YOUR_API_KEY")
■ Parse JSON (value.parseJson()["results"][0]["geometry"]["location"]["lng"])
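The two halves of this step, building the request URL and pulling coordinates out of the JSON answer, can be tried without calling any service. In the Python sketch below the helper names, the address, and the sample response are invented; the response only mirrors the nesting that the parseJson() expression on the slide expects, and `YOUR_API_KEY` is a placeholder:

```python
import json
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(street, house_number, api_key):
    """Build the request URL; urlencode takes care of escaping spaces."""
    query = urlencode({"address": f"{street} {house_number}", "key": api_key})
    return f"{GEOCODE_ENDPOINT}?{query}"


def extract_location(response_text):
    """Return (lat, lng) of the first result, or None when there are no results."""
    results = json.loads(response_text).get("results", [])
    if not results:
        return None
    location = results[0]["geometry"]["location"]
    return location["lat"], location["lng"]


# Made-up response with the same nesting the slide's parseJson() expression expects.
sample_response = '{"results": [{"geometry": {"location": {"lat": 51.05, "lng": 3.72}}}]}'

print(geocode_url("Kerkstraat", "12", "YOUR_API_KEY"))
print(extract_location(sample_response))
```

Checking for an empty result list matters in practice: addresses the service cannot resolve would otherwise raise an error instead of leaving the cell blank.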
Hands on: reconciling
● Grouping concepts with an external service, e.g. taxonomic reconciliation
○ Example from the natural environment (biodiversity data)
■ add a reconciliation service (Reconcile, Start reconciling)
■ let’s use Encyclopedia of Life (EOL)
■ select matches (Facet, Quick actions…)
■ add an EOL ID column with GREL: cell.recon.match.id
■ create a URL based on the EOL ID, e.g. http://eol.org/pages/3465521
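After reconciling, each matched cell carries the identifier of its match, and a URL is just a prefix plus that identifier. The Python sketch below mimics this with plain dictionaries; the cell structure and helper names are assumptions made for illustration and do not reproduce OpenRefine's internal data model:

```python
EOL_PAGE_PREFIX = "http://eol.org/pages/"


def matched_id(cell):
    """Mimic cell.recon.match.id: the id of the chosen match, or None if unmatched."""
    recon = cell.get("recon") or {}
    match = recon.get("match") or {}
    return match.get("id")


def eol_url(cell):
    """Build the page URL for a matched cell; unmatched cells give None."""
    identifier = matched_id(cell)
    return None if identifier is None else EOL_PAGE_PREFIX + str(identifier)


# Hypothetical cells: one matched, one still unmatched.
matched = {"value": "Some species", "recon": {"match": {"id": "3465521"}}}
unmatched = {"value": "Unknown name", "recon": {"match": None}}

print(eol_url(matched))
print(eol_url(unmatched))
```

Cells without a match yield no URL, which is why selecting the matches first is a useful step before adding the ID column.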
Hands on: cross referencing
● Merge data from two projects: create a new column in one project by using the values of an existing column to look up matching rows in a similar column of the other project
○ cell.cross("datasetname.csv","scientificName").cells["order"].value[0]
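The cross expression behaves like a keyed lookup: index the other project by the shared column, then fetch one field from the first matching row. A Python sketch of that lookup, with made-up projects and hypothetical helper names:

```python
from collections import defaultdict


def build_index(rows, key_column):
    """Map each key value to the list of rows in the other project that carry it."""
    index = defaultdict(list)
    for row in rows:
        index[row[key_column]].append(row)
    return index


def cross_lookup(rows, index, key_column, wanted_column, new_column):
    """Add new_column holding wanted_column from the first matching row, if any."""
    merged = []
    for row in rows:
        matches = index.get(row[key_column], [])
        value = matches[0][wanted_column] if matches else None
        merged.append({**row, new_column: value})
    return merged


# Hypothetical projects that share a "scientificName" column.
taxonomy = [
    {"scientificName": "Parus major", "order": "Passeriformes"},
    {"scientificName": "Anas platyrhynchos", "order": "Anseriformes"},
]
observations = [
    {"scientificName": "Parus major", "count": 3},
    {"scientificName": "Unknown sp.", "count": 1},
]

index = build_index(taxonomy, "scientificName")
merged = cross_lookup(observations, index, "scientificName", "order", "order")
print(merged)
```

Taking the first match corresponds to the `[0]` at the end of the GREL expression; keys that occur more than once in the other project would need a deliberate choice about which row to use.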
Hands on: Extract operation history
● Extract and save parts of your operation history as JSON that you can apply to this or other projects in the future.
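Because the extracted history is plain JSON, a subset of steps can be picked out with any JSON tool before it is applied elsewhere. The Python sketch below uses a made-up history written in the style of an OpenRefine export; the column names are invented and the field names should be checked against a history exported from your own project:

```python
import json

# Made-up history in the style of an OpenRefine export; check field names against your own.
history_json = """
[
  {"op": "core/text-transform", "columnName": "gemeente",
   "expression": "value.trim()", "description": "Trim whitespace in column gemeente"},
  {"op": "core/column-rename", "oldColumnName": "nr",
   "newColumnName": "huisnummer", "description": "Rename column nr to huisnummer"},
  {"op": "core/column-removal", "columnName": "temp",
   "description": "Remove column temp"}
]
"""


def select_operations(history_text, wanted_ops):
    """Keep only the operations whose "op" is wanted, preserving their order."""
    operations = json.loads(history_text)
    return [op for op in operations if op["op"] in wanted_ops]


subset = select_operations(history_json, {"core/text-transform", "core/column-rename"})
print(json.dumps(subset, indent=2))  # JSON text that could be saved for reuse
```

Keeping the original order matters because later steps may depend on earlier ones, for example a transform on a column that an earlier step renamed.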
Hands on: further reading
● https://github.com/OpenRefine/OpenRefine/wiki
● https://github.com/OpenRefine/OpenRefine/wiki/Recipes
● http://enipedia.tudelft.nl/wiki/OpenRefine_Tutorial
● ...
