1-5 stars: Metadata on the Openness
Level of Open Data Sets in Europe
Sébastien Martin, Muriel Foulonneau, Slim Turki
Context & Objectives
• Level of reuse of open data is still disappointing.
• Development of open data requires a better reusability of data.
• Degree of openness is a key success factor.
• Catalogs listing data have a crucial role.

Analyse the PublicData.eu catalogue to:
(i) assess the quality of a sample of metadata properties that are critical to enabling data reuse;
(ii) study the stated level of data openness.

21/11/2013

PublicData.eu
• Many local and national portals provide access to public sector open datasets: 114 EU catalogues on datacatalogs.org
• Gather datasets across geographic and institutional boundaries

PublicData.eu
• Pan-European catalogue launched under the FP7 LOD2 project.
• Aggregates data from CKAN open data catalogues all over Europe.
• Collects data from 26 sources.
• 1st to be published in Europe in 2011.
• Includes data from beyond the European Union, e.g., Serbian datasets.
• Not exhaustive, but it represents a unique aggregation of European datasets.

• 17,027 datasets
• UK: largest provider

Methodology
Descriptions of datasets collected in May 2013
236 distinct dataset properties identified, partly due to:
• linguistic diversity: some providers adapt property names to their language
• inconsistent naming (upper / lower case, spaces / underscores for a single field)
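The naming inconsistencies described above (case, spaces versus underscores) can be folded together before counting properties. The following is a minimal sketch, not the procedure used in the study: the function names are hypothetical, and the variants "license url" and "LICENSE_URL" are invented for illustration.

```python
import re
from collections import defaultdict


def normalise_property_name(name: str) -> str:
    """Fold case and unify spaces, underscores and hyphens into one underscore."""
    collapsed = re.sub(r"[\s_\-]+", "_", name.strip())
    return collapsed.lower()


def group_variants(names):
    """Map each normalised name to the set of raw spellings that produced it."""
    groups = defaultdict(set)
    for name in names:
        groups[normalise_property_name(name)].add(name)
    return dict(groups)


# Illustrative input: two of these spellings are hypothetical variants.
observed = ["License_url", "license url", "LICENSE_URL", "licence_url"]
groups = group_variants(observed)

for canonical, variants in sorted(groups.items()):
    print(canonical, sorted(variants))
```

This only addresses case and separator variants. Spelling or language variants (for example "licence" versus "license") stay separate and would need a manually curated mapping.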

Understanding the content of the PublicData.eu catalogue is a major challenge.
Data were collected and analysed to identify the information made available on data openness and reusability, in particular the licensing conditions and the data formats.

Tim Berners-Lee’s evaluation scale

★ Available on the web (whatever format) but with an open license, to be Open Data
★★ Available as machine-readable structured data
★★★ 2 + non-proprietary format
★★★★ 3 + Use open standards from W3C (RDF and SPARQL) to identify things
★★★★★ 4 + Link your data to other people’s data to provide context
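The scale is cumulative: each level requires all the previous ones. A minimal sketch of that rule follows, assuming the five criteria have already been decided as booleans for a dataset; the function and parameter names are hypothetical.

```python
def star_level(open_license: bool, machine_readable: bool,
               non_proprietary: bool, w3c_standard: bool, linked: bool) -> int:
    """Return 0 to 5 stars; counting stops at the first criterion that fails."""
    criteria = [open_license, machine_readable, non_proprietary,
                w3c_standard, linked]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# A machine-readable dataset in a proprietary format stays at two stars,
# whatever its status on the later criteria.
print(star_level(True, True, False, True, True))
```

Without an open license the result is zero stars regardless of format, which is why licensing metadata matters for every level of the scale.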

★ Data Licences
13,535 of 17,027 datasets have at least 1 license indication
12,470 datasets can be considered having some form of open license → 73.24%
769 datasets have a Creative Commons license
Significant number of datasets have a national license:
• apie v2 to publish information created by French public authorities
• UK-crown which “covers material created by civil servants, ministers and government departments and agencies” in the UK
• UK Open Government License

128 datasets with an explicitly closed license
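The share quoted for open licenses follows directly from the counts on this slide. A short worked check, using only the figures reported here (the variable and function names are illustrative):

```python
TOTAL_DATASETS = 17_027  # catalogue size reported on the slides

# Counts reported on this slide.
license_counts = {
    "at least one license indication": 13_535,
    "some form of open license": 12_470,
    "Creative Commons license": 769,
    "explicitly closed license": 128,
}


def share(count: int, total: int = TOTAL_DATASETS) -> float:
    """Percentage of the catalogue, rounded to two decimals."""
    return round(100 * count / total, 2)


shares = {label: share(count) for label, count in license_counts.items()}
no_indication = TOTAL_DATASETS - license_counts["at least one license indication"]

for label, pct in shares.items():
    print(f"{label}: {pct}%")
print(f"no license indication: {no_indication} datasets")
```

The open-license share comes out at 73.24%, matching the slide. The datasets with no license indication at all are the ones for which even the first star cannot be confirmed.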

★★ Machine readable format
• Facilitates data reusability
• 4,051 of 17,027 datasets with content_TYPE
• 11,285 with at least one indication about format
• 56 datasets in RDF
• Dominant proportion of spreadsheet-type formats

[Figure: Distribution of formats]

40% of datasets are not in a machine readable format
34% of datasets are available in a machine readable format
→ Machine readability is a condition for openness levels of 2★ and above
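Sorting free-text format values into "machine readable" or not can be sketched as a token lookup. The two format sets below are illustrative assumptions, not the classification used in the study, and the function name is hypothetical.

```python
import re
from collections import Counter

# Assumed, illustrative sets; the study's own classification is not given here.
MACHINE_READABLE = {"csv", "xls", "xlsx", "xml", "json", "rdf", "kml"}
NOT_MACHINE_READABLE = {"pdf", "doc", "docx", "jpg", "png"}


def classify_format(value: str) -> str:
    """Classify a free-text format value by the tokens it contains."""
    tokens = set(re.split(r"[^a-z0-9]+", value.lower())) - {""}
    if tokens & MACHINE_READABLE:
        return "machine-readable"
    if tokens & NOT_MACHINE_READABLE:
        return "not machine-readable"
    return "unknown"


# Hypothetical sample of format values as they might appear in a catalogue.
sample = ["CSV", "application/pdf", "linked data api, rdf json", "various", ""]
distribution = Counter(classify_format(value) for value in sample)
print(distribution)
```

Values that match neither set fall into "unknown", which reflects the slide's point that the property value is often too vague to decide.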

★★★ Use of non-proprietary formats
Creates ambiguities, as the openness of formats can be debated in some cases:
• Certain formats are proprietary but their specifications are open.
• Some formats were open at a certain point in time, but additions and further evolutions remain proprietary.

In many cases, the value of the property was too vague to determine whether or not the format was proprietary.
It was possible to identify:
• For 49% of the datasets, a non-proprietary format
• For 21%, a proprietary format

Use of proprietary formats is a critical issue for improving the
level of openness of datasets.

★★★★ Use of open standards from W3C
Including HTML, XML, and RDF in particular.
• XML-based formats may be entirely independent from W3C (e.g. KML)

Availability in W3C standards: 9.5% of datasets
Availability in XML-based formats: 10%

Information remains unknown in most cases

★★★★★ Linked data
Linked data are only mentioned in the description of a single
dataset (Brandweer Amsterdam-Amstelland Uitrukberichten)
for which the format is described as “linked data api, rdf json”.
58 datasets mention RDF (or RDFa) as a format or content type, i.e., 0.34%.

Level of openness (1/2)
6,891 of 17,027 datasets provide at least one piece of information about their degree of openness.
All come from Data.gov.uk (8,689 datasets).
For a majority of datasets, the level of openness is unknown.
• Consistent with the lack of licensing information, without which it is impossible to conclude on even the ★ openness level.

[Figure: Distribution of openness levels in UK datasets]

Level of openness (2/2)
Approximate level of openness derived from licensing and format properties:
• 73.24% of the datasets should have ★ or above.
• Any reference to 5★ should take linkages into consideration, which cannot be inferred from dataset metadata.

[Figure: Level of openness according to format- and license-related properties]

Data openness is mainly related to the first level of compliance: the licensing issue.
• Data providers have clearly not focused on publishing data in reusable formats.

Conclusion
• Limited openness of datasets advertised as open data
• Heterogeneity of associated metadata
→ Difficulty for reusers to (i) discover datasets, despite the creation of large catalogues of datasets, and to (ii) effectively reuse machine readable and contextualized data.
★ may be sufficient to ensure transparency of government action, but facilitating the reuse of data through services is not served below 2★.
Confirmed risks regarding the major challenges that data providers have to face: (i) the language barrier and (ii) the lack of consistency of metadata.
Harmonization of practices, training and tools are necessary to ensure that datasets are available in relevant formats.

Contact:

muriel.foulonneau@tudor.lu

Editor's Notes

  • #6: The study uses Tim Berners-Lee’s five-star evaluation scale.
  • #7: The one-star openness level depends on data licenses. Licensing information can be found in 10 distinct metadata properties: licence, License, licence_url, License_details, License_ID, License_summary, License_title, License_uri, License_url, and mandate.
  • #8: The two-star level depends on the format in which the data is made available.
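The ten licensing properties listed in note #7 can be gathered from a dataset description in one pass. This is a minimal sketch under stated assumptions: the dataset is represented as a plain dictionary, property names are matched case-insensitively, and the function name and sample record are hypothetical.

```python
# The ten property names are taken from note #7.
LICENSE_PROPERTIES = [
    "licence", "License", "licence_url", "License_details", "License_ID",
    "License_summary", "License_title", "License_uri", "License_url",
    "mandate",
]


def license_indications(dataset: dict) -> dict:
    """Return the non-empty licensing properties found in a dataset record."""
    wanted = {name.lower() for name in LICENSE_PROPERTIES}
    return {
        key: value
        for key, value in dataset.items()
        if key.lower() in wanted and value not in (None, "", [])
    }


# Hypothetical record: only License_ID carries a usable value.
record = {"title": "Example dataset", "License_ID": "uk-ogl",
          "licence_url": "", "mandate": None}
found = license_indications(record)
print(found)
print("has license indication:", bool(found))
```

A dataset counts as having "at least one license indication" when the returned dictionary is non-empty. Deciding whether that indication denotes an open license is a separate step.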