Linked Statistical Data 101
ESS Workshop on dissemination of official
statistics as open data
18-19 January 2017, Malta
Oscar Corcho
Escuela Técnica Superior de Ingenieros Informáticos
Universidad Politécnica de Madrid
Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid
http://www.oeg-upm.net/
ocorcho@fi.upm.es
Contents
• Foundations of (Linked) Open Data
- For public administrations, in general
- For statistical offices, in particular
• Linked Statistical Data by example
- A use case from IAEST (Aragón Statistical Office)
• A bit of technical background
- W3C RDF DataCube
• Preparing the discussion on benefits for different
types of stakeholders
What is Open Data?
• Open data is data that can be freely used, re-used
and redistributed by anyone - subject only, at most, to
the requirement to attribute and share alike
• Key aspects:
- Availability and access: the data must be available as a
whole and at no more than a reasonable reproduction cost,
preferably by downloading over the Internet. The data must
also be available in a convenient and modifiable form.
- Re-use and redistribution: the data must be provided
under terms that permit re-use and redistribution including
the intermixing with other datasets.
- Universal participation: everyone must be able to use, re-use
and redistribute; there should be no discrimination
against fields of endeavour or against persons or groups
[source: Open Data Handbook, http://opendatahandbook.org/en/what-is-open-data/ ]
Relevant Legislation. Europe and Spain
• Open Access Initiative (2001). Scientific information; > 510 orgs
• Aarhus Convention (1998). Right to participate and access; 41
countries and the EU
• PSI Directives. PSI reuse (2003/98/EC and 2013/37/EU)
• Convention on access to official documents (2009)
- 12 countries
• Law 37/2007. PSI reuse (transposition of Directive 2003/98/EC)
- Modified by Law 18/2015 (BOE 10/07/2015, Directive 2013/37/EU)
• Law 11/2007. Citizen access to public services, and rights to good
quality services
• RD 4/2010 Esquema Nacional de Interoperabilidad (National Interoperability Framework)
- Open standards, technology neutral, open source
• RD 1495/2011. Implements Law 37/2007 for national agencies
• Norma Técnica de Interoperabilidad (19/02/2013, BOE 4/3/2013)
[source: based on a presentation from Antonio Rodríguez Pascual (CNIG)]
An Explosion of Open Data Portals
Open Data and how to publish it
1) On a notice board
- For those with a lot of free time available
- Or those who happen to be there at the right time
Adapted from: Antonio Rodríguez Pascual (IGN)
Open Data and how to publish it
2) On a Web page or mobile app
- For people, but not downloadable
Open Data and how to publish it
3) In files
- These can be downloaded by people and used in
information systems (XML, HTML, CSV, GTFS, etc.)
- Luckily, it is not a scanned PDF
Open Data and how to publish it
4) Via Web Services
- They can be used by systems (and sometimes by people)
- They allow generating added value
- Easy to integrate into the application logic
All together… shaken, not stirred…
What is Linked Data?
1. Use URIs to identify resources.
2. Use HTTP URIs, so that they can be looked up.
3. Use dereferenceable URIs, that is, when a URI is looked up, provide useful data (RDF, JSON, SPARQL).
4. Include links to other URIs.
• http://www.w3.org/DesignIssues/LinkedData.html
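Principle 3 is usually realised with HTTP content negotiation: the client sends an Accept header listing RDF media types, and the server returns a machine-readable description of the resource. The sketch below uses only the Python standard library; the preference order in RDF_ACCEPT is an assumption, and whether a given server honours it (and which format it returns) depends entirely on that server.

```python
import urllib.request

# Media types we are willing to accept, in order of preference (an assumption;
# servers may support only some of them, or ignore content negotiation).
RDF_ACCEPT = "text/turtle, application/ld+json;q=0.9, application/rdf+xml;q=0.8"


def build_dereference_request(uri: str) -> urllib.request.Request:
    """Build (but do not send) a GET request asking for an RDF description of `uri`."""
    return urllib.request.Request(uri, headers={"Accept": RDF_ACCEPT}, method="GET")


def dereference(uri: str, timeout: float = 10.0) -> tuple[str, str]:
    """Look up `uri` and return (content type, body). This performs network I/O."""
    req = build_dereference_request(uri)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        content_type = resp.headers.get("Content-Type", "")
        charset = resp.headers.get_content_charset() or "utf-8"
        return content_type, resp.read().decode(charset)
```

For example, dereference("http://opendata.aragon.es/recurso/territorio/Municipio/Ilche") would ask the Aragón server for RDF about Ilche; inspect the returned content type before parsing, since an HTML page is also a legitimate answer.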
Open Data and how to publish it
5) Via linked, semantically enhanced APIs
- To be used by systems (and sometimes by people)
- They allow generating added-value services
- Standardised formats (JSON, JSON-LD, RDF)
- Standardised models (vocabularies, ontologies)
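To make "standardised formats" and "standardised models" concrete, here is a minimal sketch of an API response in JSON-LD that reuses the RDF Data Cube vocabulary. The context prefixes are the published Data Cube and SDMX dimension namespaces; the function name and the choice of fields are illustrative assumptions, not the format of any particular API.

```python
import json

# JSON-LD context mapping short prefixes to standard vocabulary namespaces.
CONTEXT = {
    "qb": "http://purl.org/linked-data/cube#",
    "sdmx-dimension": "http://purl.org/linked-data/sdmx/2009/dimension#",
}


def observation_jsonld(obs_uri: str, dataset_uri: str, area_uri: str) -> str:
    """Serialise a minimal qb:Observation as JSON-LD (a sketch, not a complete record)."""
    doc = {
        "@context": CONTEXT,
        "@id": obs_uri,
        "@type": "qb:Observation",
        # Objects wrapped in {"@id": ...} are links to other resources, not strings.
        "qb:dataSet": {"@id": dataset_uri},
        "sdmx-dimension:refArea": {"@id": area_uri},
    }
    return json.dumps(doc, indent=2, ensure_ascii=False)
```

The same document is plain JSON for developers who ignore the @context, and RDF for clients that interpret it, which is the main appeal of JSON-LD for APIs.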
Data representation formats
✗ Difficult to reuse
√ Reusable
✗ Not open
√ Reusable, open
✗ Difficult to link together
√ Reusable, open, complete, easier to link
And many more: JSON, JSON-LD, Shapefiles, KMZ, KML, PC-Axis, etc.
Recap: the 5-star categorisation from TBL (Tim Berners-Lee)
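The 5-star scheme is cumulative: one star for data on the Web under an open licence, two for machine-readable structured data, three for a non-proprietary format, four for using URIs (RDF) to identify things, and five for linking to other data. The sketch below encodes that rule; the Dataset field names are hypothetical and only mirror the five criteria.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Publication properties checked by the 5-star scheme (field names are hypothetical)."""
    on_web_with_open_licence: bool
    structured: bool              # machine-readable, e.g. a spreadsheet, not a scanned table
    non_proprietary_format: bool  # e.g. CSV rather than a vendor-specific format
    uses_uris: bool               # things are identified with URIs (RDF)
    linked_to_other_data: bool    # links to other people's data


def stars(d: Dataset) -> int:
    """Return 0-5 stars; each star is only awarded if all previous ones are."""
    criteria = [
        d.on_web_with_open_licence,
        d.structured,
        d.non_proprietary_format,
        d.uses_uris,
        d.linked_to_other_data,
    ]
    count = 0
    for met in criteria:
        if not met:
            break  # a missing lower star caps the rating
        count += 1
    return count
```

For instance, an openly licensed CSV file without URIs gets stars(Dataset(True, True, True, False, False)), that is, 3 stars.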
Which type of data and which (re)users?
[Figure: data types (infrastructure, e.g. cartography, streets, directories, codes…; microdata; macrodata; metadata) and their (re)users (analysts, journalists, citizens, researchers)]
[source: Alberto González Yanes (ISTAC)]
Our use case: Aragón
• IAEST
- Instituto Aragonés de Estadística (Aragón Statistical Office)
• Good open data ecosystem
- Aragón Open Data
• http://opendata.aragon.es/
- Zaragoza
• http://datos.zaragoza.es/
Statistics about municipalities
[Figure: the current Web application for local statistics, built on reports and templates from Oracle BI]
• At the IAEST Web site
- http://www.aragon.es/DepartamentosOrganismosPublicos/Institutos/InstitutoAragonesEstadistica/AreasGenericas/ci.EstadisticaLocal.detalleDepartamento
• At Aragón Open Data
- http://opendata.aragon.es/catalogo/edificios-superficie-y-vivienda-comarcas
What have we done?
General architecture
[Figure: a transformation process produces Linked Data; a publication process exposes it through a SPARQL endpoint and an API (Elda)]
This is not the purpose of my talk
https://github.com/aragonopendata/local-data-aragopedia
URIs for datasets
• Let’s look for the dataset on “Number of households by type of owner per municipality”
- Número de hogares por tipo de propietario por municipio
• The dataset has a URI
- http://opendata.aragon.es/recurso/iaest/dataset/01-010013TM
What is behind that URI?
This is not the purpose of my talk
URIs for each observation
• And now we can point to specific observations in this
dataset
- In 2001, the number of buildings owned by one person in the
municipality of Ilche
• http://opendata.aragon.es/recurso/iaest/observacion/01-010013TM/00794aab-964f-35c7-8e7c-156c9bc60133
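The UUID at the end of the observation URI has version digit 3 (the first digit of its third group, 35c7), which is consistent with a name-based, MD5-derived UUID; the slides do not say how IAEST builds the name. The sketch below shows one way to mint stable observation URIs with uuid.uuid3. The name format (dataset code plus sorted dimension=value pairs) and the function name are assumptions for illustration, so it will not reproduce the URIs actually published.

```python
import uuid

# Base taken from the example URI: .../observacion/<dataset code>/<UUID>
OBS_BASE = "http://opendata.aragon.es/recurso/iaest/observacion/"


def observation_uri(dataset_code: str, dimension_values: dict[str, str]) -> str:
    """Mint a deterministic observation URI (hypothetical naming scheme).

    Sorting the dimension pairs makes the URI independent of dict ordering, so
    re-running the transformation yields the same URI for the same observation.
    """
    pairs = "|".join(f"{k}={v}" for k, v in sorted(dimension_values.items()))
    name = f"{dataset_code}|{pairs}"
    obs_id = uuid.uuid3(uuid.NAMESPACE_URL, name)  # version-3 (MD5) name-based UUID
    return f"{OBS_BASE}{dataset_code}/{obs_id}"
```

The design point is stability: a URI derived from the observation's dimension values stays the same across reloads, so links pointing at a specific fact keep working.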
And links to other URIs in Aragón
• The municipality of Ilche
- http://opendata.aragon.es/recurso/territorio/Municipio/Ilche
- This information is owned by another department of the
Government of Aragón
And links to codelists
• Types of owners
- http://opendata.aragon.es/kos/iaest/clase-de-propietario
• The community
• A person
• A company
• A public organisation
SPARQL endpoint
The female population of Zaragoza aged 0-15 grew until 2013 and then decreased
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#>
PREFIX iaest-dim: <http://opendata.aragon.es/def/iaest/dimension#>
PREFIX iaest-measure: <http://opendata.aragon.es/def/iaest/medida#>
SELECT DISTINCT ?year ?personas
WHERE {
  ?x a qb:Observation ;
     qb:dataSet <http://opendata.aragon.es/recurso/iaest/dataset/03-030005TM> ;
     sdmx-dimension:refPeriod ?year ;
     sdmx-dimension:refArea <http://opendata.aragon.es/recurso/territorio/Municipio/Zaragoza> ;
     iaest-dim:edad-grandes-grupos <http://opendata.aragon.es/kos/iaest/edad-grandes-grupos/0-a-15> ;
     iaest-dim:sexo <http://opendata.aragon.es/kos/iaest/sexo/mujeres> ;
     iaest-measure:personas ?personas .
}
ORDER BY ?year
Examples at
https://github.com/aragonopendata/local-data-aragopedia/blob/master/consultas.md
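A query like the one above can be sent to a SPARQL endpoint over plain HTTP, following the SPARQL 1.1 Protocol (a GET request with a `query` parameter), and the answer requested in the standard SPARQL JSON results format. In this sketch the endpoint URL is a placeholder (https://example.org/sparql), not the real Aragón endpoint, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://example.org/sparql"  # placeholder: substitute the real endpoint URL


def build_query_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as a GET URL using the protocol's `query` parameter."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_query(endpoint: str, query: str, timeout: float = 30.0) -> dict:
    """Send the query and return the decoded SPARQL JSON results (network I/O)."""
    req = urllib.request.Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def rows(results: dict) -> list[dict[str, str]]:
    """Flatten SPARQL JSON bindings into {variable: lexical value} rows.

    Unbound variables are simply absent from a row, as in the results format.
    """
    variables = results["head"]["vars"]
    return [
        {var: b[var]["value"] for var in variables if var in b}
        for b in results["results"]["bindings"]
    ]
```

All values come back as strings, so convert ?personas with int() before checking how the population evolves over the years.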
W3C Data Cube
http://www.w3.org/TR/vocab-data-cube/
DataSets and Observations
Observations in a dataset
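[Figure: an observation of the dataset iaest:01-010003M, as an RDF graph]
iaest-data:01-010003M/22001/030-045 rdf:type qb:Observation
iaest-data:01-010003M/22001/030-045 qb:dataSet iaest:01-010003M
iaest-data:01-010003M/22001/030-045 sdmx:refArea aod:Abiego
iaest-data:01-010003M/22001/030-045 iaest:superficieUtil iaest-codelist:superficie030-045
iaest-data:01-010003M/22001/030-045 iaest:numeroHogares "1"^^xsd:int
iaest:01-010003M rdf:type qb:DataSet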
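The observation shown above links to its dataset, to a municipality, to a code and to a typed integer measure. As a sketch of what the transformation step emits, the function below writes those statements as N-Triples using full IRIs (N-Triples has no prefixes). The qb, rdf and xsd namespaces are the standard ones; every Aragón IRI in the usage block is a hypothetical stand-in for the prefixed names on the slide.

```python
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INT = "http://www.w3.org/2001/XMLSchema#int"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def observation_ntriples(obs: str, dataset: str, dimensions: dict[str, str],
                         measures: dict[str, int]) -> list[str]:
    """Return N-Triples lines describing one qb:Observation.

    dimensions: dimension-property IRI -> value IRI (a municipality, a code, ...)
    measures:   measure-property IRI -> integer value, typed as xsd:int
    """
    lines = [
        f"{iri(obs)} {iri(RDF_TYPE)} {iri(QB + 'Observation')} .",
        f"{iri(obs)} {iri(QB + 'dataSet')} {iri(dataset)} .",
    ]
    for prop, value in dimensions.items():
        lines.append(f"{iri(obs)} {iri(prop)} {iri(value)} .")
    for prop, number in measures.items():
        lines.append(f'{iri(obs)} {iri(prop)} "{number}"^^{iri(XSD_INT)} .')
    return lines


if __name__ == "__main__":
    AOD = "http://opendata.aragon.es/"  # hypothetical IRIs below, for illustration only
    print("\n".join(observation_ntriples(
        obs=AOD + "recurso/iaest/observacion/01-010003M/22001/030-045",
        dataset=AOD + "recurso/iaest/dataset/01-010003M",
        dimensions={
            "http://purl.org/linked-data/sdmx/2009/dimension#refArea":
                AOD + "recurso/territorio/Municipio/Abiego",
            AOD + "def/iaest/dimension#superficie-util": AOD + "kos/iaest/superficie-util/030-045",
        },
        measures={AOD + "def/iaest/medida#numero-hogares": 1},
    )))
```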
Data Cube Structure Definition
Describing the dataset
[Figure: the dataset iaest:01-010003M (a qb:DataSet) points via qb:structure to iaest-dsd:01-010003M (a qb:DataStructureDefinition), whose qb:component entries (qb:ComponentSpecification) declare the dimensions sdmx:refArea and iaest:superficieUtil (qb:dimension) and the measure iaest:numeroHogares (qb:measure)]
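The structure definition above declares two dimensions (sdmx:refArea, iaest:superficieUtil) and one measure (iaest:numeroHogares). The Data Cube specification defines well-formedness checks along these lines, including that every observation has a value for each declared dimension. The sketch below is a simplified, in-memory version of that idea: it treats properties as plain strings rather than RDF terms, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataStructureDefinition:
    """Simplified DSD: the component properties every observation must carry."""
    dimensions: set[str] = field(default_factory=set)
    measures: set[str] = field(default_factory=set)


def missing_components(dsd: DataStructureDefinition, observation: dict[str, object]) -> set[str]:
    """Return declared dimension/measure properties that the observation lacks.

    `observation` maps property names to values. An empty result means every
    declared component has a value; a measure value of 0 still counts as present.
    """
    required = dsd.dimensions | dsd.measures
    return {prop for prop in required if observation.get(prop) is None}
```

With dsd = DataStructureDefinition({"sdmx:refArea", "iaest:superficieUtil"}, {"iaest:numeroHogares"}), an observation that only carries sdmx:refArea is reported as missing the other two components.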
Dimensions
[Figure: qb:DimensionProperty and qb:MeasureProperty are subclasses (rdfs:subClassOf) of qb:ComponentProperty; sdmx:refArea and iaest:superficieUtil are dimension properties with rdfs:range esadm:Municipio and iaest:SuperficieUtil respectively, and iaest:numeroHogares is a measure property with rdfs:range xsd:int. The figure also shows qb:concept and the qb:structure / qb:component / qb:componentProperty path from qb:DataSet to these properties]
SKOS Codelists
[Figure: the code list iaest-codelist:SuperficieUtil, typed as sdmx:CodeList and skos:ConceptScheme, has top concepts (skos:hasTopConcept) iaest-codelist:superficie030-045, iaest-codelist:superficie046-060, iaest-codelist:superficie180-mas, …; the figure also relates iaest:SuperficieUtil to skos:Concept (rdfs:subClassOf) and to the code list (qb:codeList)]
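A SKOS code list such as the one above is a skos:ConceptScheme whose codes are skos:Concept instances, linked with skos:hasTopConcept (and its inverse skos:topConceptOf). The Data Cube specification also includes a well-formedness check that values of a coded dimension come from its declared code list. The sketch below emits the SKOS triples and performs that membership check; the function names are hypothetical, and any code IRIs you pass in are up to you (the slides use prefixed names whose full IRIs are not shown).

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def code_list_ntriples(scheme: str, top_concepts: list[str]) -> list[str]:
    """N-Triples for a SKOS concept scheme and its top concepts."""
    lines = [f"<{scheme}> <{RDF_TYPE}> <{SKOS}ConceptScheme> ."]
    for concept in top_concepts:
        lines += [
            f"<{concept}> <{RDF_TYPE}> <{SKOS}Concept> .",
            f"<{scheme}> <{SKOS}hasTopConcept> <{concept}> .",
            f"<{concept}> <{SKOS}topConceptOf> <{scheme}> .",
        ]
    return lines


def is_valid_code(value: str, top_concepts: list[str]) -> bool:
    """True if a dimension value is one of the codes of the code list."""
    return value in set(top_concepts)
```

Publishing code lists this way is also what makes them reusable and visible internally: other datasets can point at the same code IRIs instead of copying labels.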
Why Linked Statistical Data? (I)
• Facilitate data (re)use by developers outside our
organisation
• Data access APIs (according to standards)
• Do they prefer CSV, PC-Axis, SDMX or RDF?
• Fine-grained data (refer to specific facts)
• Integration with other data sources from other public
or private organisations
- E.g., Government of Aragón for municipalities
• Allow for queries across datasets
- E.g., tell me how many municipalities may benefit from this
funding that I am making available with these restrictions:
number of registered companies lower than 5 and
unemployed population higher than 15%
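The funding example only needs a join on municipality: once both indicators are published with the same municipality URIs, combining them is a key lookup rather than a name-matching exercise. The sketch below assumes each indicator has already been loaded into a dict keyed by municipality URI (a hypothetical shape), with unemployment expressed as a percentage; the thresholds are the slide's, applied as strict inequalities.

```python
def eligible_municipalities(companies: dict[str, int],
                            unemployment_rate: dict[str, float],
                            max_companies: int = 5,
                            min_unemployment: float = 15.0) -> list[str]:
    """Municipality URIs with fewer than `max_companies` registered companies
    and an unemployment rate above `min_unemployment` percent.

    Only municipalities present in both datasets are considered; the shared
    URI is the join key.
    """
    shared = companies.keys() & unemployment_rate.keys()
    return sorted(
        uri for uri in shared
        if companies[uri] < max_companies and unemployment_rate[uri] > min_unemployment
    )
```

In a Linked Data setting the same join would typically be written as a single SPARQL query over both datasets, matching observations on their shared sdmx-dimension:refArea value.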
Why Linked Statistical Data? (II)
• Internal benefits as well
- Codelists are made available and more visible internally
- Methodology and metadata explicitly described as part of
the RDF DataCube data (e.g., reference years in datasets)
Linked Statistical Data 101
ESS Workshop on dissemination of official
statistics as open data
18-19 January 2017, Malta
Oscar Corcho
Escuela Técnica Superior de Ingenieros Informáticos
Universidad Politécnica de Madrid
Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid
http://www.oeg-upm.net/
ocorcho@fi.upm.es

More Related Content

PPTX
Ojo Al Data 100 - Call for sharing session at IODC 2016
PPTX
Ontology Engineering at Scale for Open City Data Sharing
PPTX
Organisational Interoperability in Practice at Universidad Politécnica de Madrid
PPTX
A Linked Data Dataset for Madrid Transport Authority's Datasets
PDF
2014 - Why #DataKind should open its next chapter in Brussels !
PDF
EDF2014: Piek Vossen, Professor Computational Lexicology, VU University Amste...
PDF
Ontotext Cultural Heritage and Digital Humanities Projects
PPSX
Plataforma ciencia cidada invasoras.pt
Ojo Al Data 100 - Call for sharing session at IODC 2016
Ontology Engineering at Scale for Open City Data Sharing
Organisational Interoperability in Practice at Universidad Politécnica de Madrid
A Linked Data Dataset for Madrid Transport Authority's Datasets
2014 - Why #DataKind should open its next chapter in Brussels !
EDF2014: Piek Vossen, Professor Computational Lexicology, VU University Amste...
Ontotext Cultural Heritage and Digital Humanities Projects
Plataforma ciencia cidada invasoras.pt

What's hot (20)

PPTX
PoolParty Semantic Suite - Solutions for Sustainable Development
PPTX
Introduction to: Big Data Europe Project
PDF
KM4city, Il Valore degli #OpenData: Esperienze a confronto
PPTX
Linked Open Data (LOD) Pilot Austria
PDF
Alex Corbi - Visualizing open data with carto_db
PDF
SC7 Workshop 2: Space-based applications and Big Data
PDF
SC7 Workshop 2: Industry view of Big Data Challenges for Secure Societies
PDF
SC7 Workshop 2: Big Data and Secure Societies
PDF
Towards a Community-driven Data Science Body of Knowledge – Data Management S...
PDF
Apps for Italy - a4i
PDF
How to become the best datascientist in Europe
PPTX
Semantic MediaWiki - Knowledge Management and Open Data Use Cases
PDF
Brussels Capital of Data Science
PDF
Big Data Europe: Workshop 3 SC6 Social Science: THE IMPORTANCE OF METADATA & ...
PPT
Open Science at the European Commission
PDF
Collections 2.0
PDF
Mikko Järvenpää - infogr.am - Latvia - Stanford Engineering - Feb 23 2015
PPT
Neven Vrček - Role of Governments, Academy & Science Parks - University of Za...
PDF
iSOCO - Research Lab Brief Introduction
PDF
creating a trading zone around twitter srchives. case study: paris attacks
PoolParty Semantic Suite - Solutions for Sustainable Development
Introduction to: Big Data Europe Project
KM4city, Il Valore degli #OpenData: Esperienze a confronto
Linked Open Data (LOD) Pilot Austria
Alex Corbi - Visualizing open data with carto_db
SC7 Workshop 2: Space-based applications and Big Data
SC7 Workshop 2: Industry view of Big Data Challenges for Secure Societies
SC7 Workshop 2: Big Data and Secure Societies
Towards a Community-driven Data Science Body of Knowledge – Data Management S...
Apps for Italy - a4i
How to become the best datascientist in Europe
Semantic MediaWiki - Knowledge Management and Open Data Use Cases
Brussels Capital of Data Science
Big Data Europe: Workshop 3 SC6 Social Science: THE IMPORTANCE OF METADATA & ...
Open Science at the European Commission
Collections 2.0
Mikko Järvenpää - infogr.am - Latvia - Stanford Engineering - Feb 23 2015
Neven Vrček - Role of Governments, Academy & Science Parks - University of Za...
iSOCO - Research Lab Brief Introduction
creating a trading zone around twitter srchives. case study: paris attacks
Ad

Viewers also liked (9)

PPTX
Aplicando los principios de Linked Data en AEMET
PPTX
STARS4ALL general presentation at ALAN2016
PPTX
Linked Statistical Data: does it actually pay off?
PPTX
Presentación de la red de excelencia de Open Data y Smart Cities
PDF
Detrás de un gran dataset siempre hay un gran vocabulario
PPTX
Why do they call it Linked Data when they want to say...?
PDF
ARIADNE: Initial Dissemination Plan
PPTX
Educando sobre datos abiertos: desde el colegio a la universidad
PPTX
Generación de datos estadísticos enlazados del Instituto Aragonés de Estadística
Aplicando los principios de Linked Data en AEMET
STARS4ALL general presentation at ALAN2016
Linked Statistical Data: does it actually pay off?
Presentación de la red de excelencia de Open Data y Smart Cities
Detrás de un gran dataset siempre hay un gran vocabulario
Why do they call it Linked Data when they want to say...?
ARIADNE: Initial Dissemination Plan
Educando sobre datos abiertos: desde el colegio a la universidad
Generación de datos estadísticos enlazados del Instituto Aragonés de Estadística
Ad

Similar to Linked Statistical Data 101 (20)

PDF
How open data are turned into services?
PDF
ENGAGE Workshop at OpenDataWeek2013
PDF
Methodological Guidelines for Publishing Linked Data
PPTX
Paul Davidson – Opening up public data to improve transparancy and efficiency
PDF
Ontology Building vs Data Harvesting and Cleaning for Smart-city Services
PPT
Vassilios Peristeras: From Open to Linked Government Data: (European Commissi...
PDF
New trends in ontological engineering, practices and tools
PDF
A Data-driven Approach for Internet of Things Applications: Methods and Case ...
PDF
Presentations from ICT 2015 in Lisbon
PPTX
Local Open Data: A perspective from local government in England by Gesche Schmid
PPTX
Local Open Data: a perspective from local government in England 2014
PDF
Webinar for the INSPIRE 2018 Hackathon (July session, 20/07/2018))
PPTX
Eurocities wg innovation utrecht
PPTX
SoBigData. European Research Infrastructure for Big Data and Social Mining
PPTX
Semantically Mapping Science (SMS)
PDF
OpenTransportNet: Stimulating Innovation with Open Geographic Information
PPTX
INTERSTAT_FIWARE_Summit_v3.0.pptx
PPTX
Open Source & Open Data Session report from imaGIne 2014 Conference
PPT
Von Open Data zu Linked Open Data, M. Kaltenböck, SWC
PDF
14a Conferenza Nazionale di Statistica
How open data are turned into services?
ENGAGE Workshop at OpenDataWeek2013
Methodological Guidelines for Publishing Linked Data
Paul Davidson – Opening up public data to improve transparancy and efficiency
Ontology Building vs Data Harvesting and Cleaning for Smart-city Services
Vassilios Peristeras: From Open to Linked Government Data: (European Commissi...
New trends in ontological engineering, practices and tools
A Data-driven Approach for Internet of Things Applications: Methods and Case ...
Presentations from ICT 2015 in Lisbon
Local Open Data: A perspective from local government in England by Gesche Schmid
Local Open Data: a perspective from local government in England 2014
Webinar for the INSPIRE 2018 Hackathon (July session, 20/07/2018))
Eurocities wg innovation utrecht
SoBigData. European Research Infrastructure for Big Data and Social Mining
Semantically Mapping Science (SMS)
OpenTransportNet: Stimulating Innovation with Open Geographic Information
INTERSTAT_FIWARE_Summit_v3.0.pptx
Open Source & Open Data Session report from imaGIne 2014 Conference
Von Open Data zu Linked Open Data, M. Kaltenböck, SWC
14a Conferenza Nazionale di Statistica

More from Oscar Corcho (19)

PPTX
Introducción a los Datos Abiertos - Open Data Day 2020
PPTX
Open Data (and Software, and other Research Artefacts) - A proper management
PDF
Adiós a los ficheros, hola a los grafos de conocimientos estadísticos
PPTX
Situación de las iniciativas de Open Data internacionales (y algunas recomen...
PPTX
STARS4ALL - Contaminación Lumínica
PPTX
Towards Reproducible Science: a few building blocks from my personal experience
PPTX
Publishing Linked Statistical Data: Aragón, a case study
PPTX
An initial analysis of topic-based similarity among scientific documents base...
PPTX
Slow-cooked data and APIs in the world of Big Data: the view from a city per...
PPTX
Research Objects for improved sharing and reproducibility
PPTX
(Big) Data (Science) Skills
PPTX
Big Data - El Futuro a través de los Datos
PPTX
EKAW2014 Keynote: Ontology Engineering for and by the Masses: are we already ...
PPTX
Aspectos técnicos de la ontología PPROC
PPTX
AragoDBpedia
PPTX
The role of annotation in reproducibility (Empirical 2014)
PPTX
Best practices for Archival Processing of Research Objects (a librarian view)
PPTX
Linked Data: Oportunidades para el Transporte
PPT
OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...
Introducción a los Datos Abiertos - Open Data Day 2020
Open Data (and Software, and other Research Artefacts) - A proper management
Adiós a los ficheros, hola a los grafos de conocimientos estadísticos
Situación de las iniciativas de Open Data internacionales (y algunas recomen...
STARS4ALL - Contaminación Lumínica
Towards Reproducible Science: a few building blocks from my personal experience
Publishing Linked Statistical Data: Aragón, a case study
An initial analysis of topic-based similarity among scientific documents base...
Slow-cooked data and APIs in the world of Big Data: the view from a city per...
Research Objects for improved sharing and reproducibility
(Big) Data (Science) Skills
Big Data - El Futuro a través de los Datos
EKAW2014 Keynote: Ontology Engineering for and by the Masses: are we already ...
Aspectos técnicos de la ontología PPROC
AragoDBpedia
The role of annotation in reproducibility (Empirical 2014)
Best practices for Archival Processing of Research Objects (a librarian view)
Linked Data: Oportunidades para el Transporte
OrdRing 2013 keynote - On the need for a W3C community group on RDF Stream Pr...

Recently uploaded (20)

PDF
The Detrimental Impacts of Hydraulic Fracturing for Oil and Gas_ A Researched...
PPTX
Omnibus rules on leave administration.pptx
PPTX
sepsis.pptxMNGHGBDHSB KJHDGBSHVCJB KJDCGHBYUHFB SDJKFHDUJ
PDF
Items # 6&7 - 900 Cambridge Oval Right-of-Way
PPTX
AMO Pune Complete information and work profile
PDF
ISO-9001-2015-internal-audit-checklist2-sample.pdf
PDF
Storytelling youth indigenous from Bolivia 2025.pdf
PDF
Abhay Bhutada and Other Visionary Leaders Reinventing Governance in India
DOC
LU毕业证学历认证,赫尔大学毕业证硕士的学历和学位
PDF
Courtesy Meeting NIPA and MBS Australia.
PDF
Population Estimates 2025 Regional Snapshot 08.11.25
PPTX
Introduction_to_the_Study_of_Globalization.pptx
PPTX
OUR GOVERNMENT-Grade 5 -World around us.
PPTX
Vocational Education for educational purposes
PPTX
PCCR-ROTC-UNIT-ORGANIZATIONAL-STRUCTURE-pptx-Copy (1).pptx
PDF
Item # 2 - 934 Patterson Specific Use Permit (SUP)
PDF
Item # 3 - 934 Patterson Final Review.pdf
PDF
2025 Shadow report on Ukraine's progression regarding Chapter 29 of the acquis
PPTX
SOMANJAN PRAMANIK_3500032 2042.pptx
PDF
How FPOs Are Reshaping Agriculture in Maharashtra?
The Detrimental Impacts of Hydraulic Fracturing for Oil and Gas_ A Researched...
Omnibus rules on leave administration.pptx
sepsis.pptxMNGHGBDHSB KJHDGBSHVCJB KJDCGHBYUHFB SDJKFHDUJ
Items # 6&7 - 900 Cambridge Oval Right-of-Way
AMO Pune Complete information and work profile
ISO-9001-2015-internal-audit-checklist2-sample.pdf
Storytelling youth indigenous from Bolivia 2025.pdf
Abhay Bhutada and Other Visionary Leaders Reinventing Governance in India
LU毕业证学历认证,赫尔大学毕业证硕士的学历和学位
Courtesy Meeting NIPA and MBS Australia.
Population Estimates 2025 Regional Snapshot 08.11.25
Introduction_to_the_Study_of_Globalization.pptx
OUR GOVERNMENT-Grade 5 -World around us.
Vocational Education for educational purposes
PCCR-ROTC-UNIT-ORGANIZATIONAL-STRUCTURE-pptx-Copy (1).pptx
Item # 2 - 934 Patterson Specific Use Permit (SUP)
Item # 3 - 934 Patterson Final Review.pdf
2025 Shadow report on Ukraine's progression regarding Chapter 29 of the acquis
SOMANJAN PRAMANIK_3500032 2042.pptx
How FPOs Are Reshaping Agriculture in Maharashtra?

Linked Statistical Data 101

  • 1. Linked Statistical Data 101 ESS Workshop on dissemination of official statistics as open data 18-19 January 2017, Malta Oscar Corcho Escuela Técnica Superior de Ingenieros Informáticos Universidad Politécnica de Madrid Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid http://www.oeg-upm.net/ ocorcho@fi.upm.es
  • 2. Contents • Foundations of (Linked) Open Data - For public administrations, in general - For statistical offices, in particular • Linked Statistical Data by example - A use case from IAEST (Aragón Statistical Office) • A bit of technical background - W3C RDF DataCube • Preparing the discussion on benefits for different types of stakeholders 2
  • 3. Contents • Foundations of (Linked) Open Data - For public administrations, in general - For statistical offices, in particular • Linked Statistical Data by example - A use case from IAEST (Aragón Statistical Office) • A bit of technical background - W3C RDF DataCube • Preparing the discussion on benefits for different types of stakeholders 3
  • 4. What is Open Data? • Open data is data that can be freely used, re-used and redistributed by anyone - subject only, at most, to the requirement to attribute and share alike • Key aspects: - Availability and access: the data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the Internet. The data must also be available in a convenient and modifiable form. - Re-use and redistribution: the data must be provided under terms that permit re-use and redistribution including the intermixing with other datasets. - Universal participation: everyone must be able to use, re- use and redistribute - there should be no discrimination against fields of endeavour or against persons or groups [source: Open Data Handbook, http://opendatahandbook.org/en/what-is-open-data/ ]
  • 5. Relevant Legislation. Europe and Spain • Open Access Initiative (2001). Scientific information; > 510 orgs • Aarhus Convention (1998). Right to participate and access; 41 countries and the EU • PSI Directives. PSI reuse (2003/98/EC and 2013/37/UE) • Convention about access to official documentation (2009) - 12 countries • Law 37/2007. PSI reuse (transposition of directive 2003/98/EC) - Modified in law 18/2015 (BOE 10/07/2015, directive 2013/37/UE ) • Law 11/2007. Citizen access to public services, and rights to good quality services • RD 4/2010 Esquema Nacional de Interoperabilidad - Open standards, technology neutral, open source • RD 1495/2011 It develops Law 37/2007 for national agencies • Norma Técnica de Interoperabilidad (19/02/2013, BOE 4/3/2013) [source: based on a presentation from Antonio Rodríguez Pascual (CNIG)]
  • 6. An Explosion of Open Data Portals
  • 7. Open Data and how to publish it 1) In a posterboard - For those with a lot of free time available - Or those who happen to be there at the right time Adapted from: Antonio Rodríguez Pascual (IGN)
  • 8. Open Data and how to publish it 2) On a Web page or mobile app - For people, but not downloadable Adapted from: Antonio Rodríguez Pascual (IGN)
  • 9. Open Data and how to publish it 3) In files - These can be downloaded and use by humans in information systems (XML, HTML, CSV, GTFS, etc.) - Luckily, it is not a scanned PDF Adapted from: Antonio Rodríguez Pascual (IGN)
  • 10. Open Data and how to publish it 4) Via Web Services - They can be used by systems (sometimes persons) - They allow generating added value - Ease of integration in the application logic Adapted from: Antonio Rodríguez Pascual (IGN)
  • 11. All together…, Shaken, not stirred…
  • 12. What is Linked Data? 1. Use URIs to identify rsources 2. Use HTTP URIs, so that they can be found 3. Use de-referenceable URIs, that is, provide useful data (RDF, JSON, SPARQL) 4. Include links to other URIs. • http://www.w3.org/DesignIssues/ LinkedData.html
  • 13. Open Data and how to publish it 5) Via APIs (semantically enhanced) and linked - To be used by systems (and sometimes persons) - It allows generating added-value services - Standardised formats (JSON, JSON-LD, RDF) - Standardised models (vocabularies, ontologies)
  • 14.  Difficult to reuse √ Reusable.  Not open √ Reusable, open  Difficult to link together √ Reusable, open, complete, easier to link Data representation formats And many more: JSON, JSON-LD, Shapefiles, KMZ, KML, PC-Axis, etc.
  • 15. Recap: The 5-star categorisation from TBL
  • 16. Contents • Foundations of (Linked) Open Data - For public administrations, in general - For statistical offices, in particular • Linked Statistical Data by example - A use case from IAEST (Aragón Statistical Office) • A bit of technical background - W3C RDF DataCube • Preparing the discussion on benefits for different types of stakeholders 16
  • 18. Our use case: Aragón • IAEST - Instituto Aragonés de Estadística • Good open data ecosystem - Aragón Open Data • http://opendata.aragon.es/ - Zaragoza • http://datos.zaragoza.es/ 18
  • 19. Reports and templates from Oracle BI Current Web application for local statistics Statistics about municipalities
  • 20. Statistics about municipalities • At IAEst Web - http://www.aragon.es/DepartamentosOrganismosPublicos/In stitutos/InstitutoAragonesEstadistica/AreasGenericas/ci.Esta disticaLocal.detalleDepartamento • At OpenDataAragón - http://opendata.aragon.es/catalogo/edificios-superficie-y- vivienda-comarcas
  • 21. Reports and templates from Oracle BI Current Web application for local statistics What have we done?
  • 22. SPARQL Elda Linked Data Transformation process API Publication process General architecture This is not the purpose of my talk https://github.com/aragonopendata/local-data-aragopedia
  • 23. URIs for datasets • Let’s look for the dataset on “Number of homes per owner per municipality” - Número de hogares por tipo de propietario por municipio • The dataset has a URI - http://opendata.aragon.es/recurso/iaest/dataset/01- 010013TM
  • 24. What is behind that URI? 24 This is not the purpose of my talk
  • 25. URIs for each observation • And now we can point to specific observations in this dataset - In 2001, the number of buildings owned by one person in the municipality of Ilche • http://opendata.aragon.es/recurso/iaest/observacion/01- 010013TM/00794aab-964f-35c7-8e7c-156c9bc60133 25
  • 26. URIs for each observation 26
  • 27. And links to other URIs in Aragón • The municipality of Ilche - http://opendata.aragon.es/recurso/territorio/Municipio/Ilche - This information is owned by another department of the Government of Aragón 27
  • 28. And links to codelists • Types of owners - http://opendata.aragon.es/kos/iaest/clase-de-propietario • The community • A person • A society • A public organisation 28
  • 29. SPARQL endpoint The women population in Zaragoza in the age range of 0-15 years growed until 2013 and then reduced select distinct ?year ?personas where { ?x a qb:Observation . ?x qb:dataSet <http://opendata.aragon.es/recurso/iaest/dataset/03-030005TM> . ?x <http://purl.org/linked-data/sdmx/2009/dimension#refPeriod> ?year . ?x <http://purl.org/linked-data/sdmx/2009/dimension#refArea> <http://opendata.aragon.es/recurso/territorio/Municipio/Zaragoza>. ?x <http://opendata.aragon.es/def/iaest/dimension#edad-grandes-grupos> <http://opendata.aragon.es/kos/iaest/edad-grandes-grupos/0-a-15> . ?x <http://opendata.aragon.es/def/iaest/dimension#sexo> <http://opendata.aragon.es/kos/iaest/sexo/mujeres>. ?x <http://opendata.aragon.es/def/iaest/medida#personas> ?personas . } ORDER BY ?year Examples at https://github.com/aragonopendata/local-data-aragopedia/blob/master/consultas.md
  • 30. Contents • Foundations of (Linked) Open Data - For public administrations, in general - For statistical offices, in particular • Linked Statistical Data by example - A use case from IAEST (Aragón Statistical Office) • A bit of technical background - W3C RDF DataCube • Preparing the discussion on benefits for different types of stakeholders 30
  • 34. Observations in a dataset 34 qb:DataSetqb:Observation qb:dataSet rdf:type iaest-data:01-010003M/22001/030-045 aod:Abiego sdmx:refArea Iaest-codelist:superficie030-045 iaest:superficieUtil “1”^^xsd:int Iaest:numeroHogares iaest:01-010003M qb:dataSet rdf:type
  • 36. Describing the dataset 36 qb:DataSet qb:DataStructureDefinition qb:ComponentSpecification qb:ComponentProperty sdmx:refArea iaest:superficieUtil qb:structure qb:component qb:componentProperty rdf:type rdf:type iaest:01-010003M iaest--dsd:01-010003M qb:structure qb:component qb:measure iaest:numeroHogares qb:dimension qb:dimension rdf:typerdf:type
  • 39. Contents • Foundations of (Linked) Open Data - For public administrations, in general - For statistical offices, in particular • Linked Statistical Data by example - A use case from IAEST (Aragón Statistical Office) • A bit of technical background - W3C RDF DataCube • Preparing the discussion on benefits for different types of stakeholders 39
  • 40. Why Linked Statistical Data? (I)
    • Facilitate data (re)use by developers outside our organisation
    • Data access APIs (according to standards)
    • Do they prefer CSV, PC-Axis, SDMX or RDF?
    • Fine-grained data granularity (refer to specific facts)
    • Integration with data sources from other public or private organisations
    - E.g., the Government of Aragón, for municipalities
    • Allow for queries across datasets
    - E.g., tell me how many municipalities may benefit from the funding I am making available under these restrictions: fewer than 5 registered companies and an unemployed population above 15%
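The cross-dataset example on this slide boils down to a join on a shared identifier. The sketch below (plain Python) assumes both statistics have already been fetched into dictionaries keyed by the same municipality URI; the function name, parameter names and default thresholds are illustrative. In a Linked Data setting the same join would usually be a single SPARQL query over both datasets, which works precisely because they reuse the same territory URIs.

```python
def eligible_municipalities(companies: dict[str, int],
                            unemployment_pct: dict[str, float],
                            max_companies: int = 5,
                            min_unemployment: float = 15.0) -> list[str]:
    """Municipalities with fewer than max_companies registered companies
    and unemployment above min_unemployment percent.

    Only municipalities present in both datasets are considered, which is
    where shared identifiers (the same URI in both) do the work.
    """
    shared = companies.keys() & unemployment_pct.keys()
    return sorted(
        m for m in shared
        if companies[m] < max_companies and unemployment_pct[m] > min_unemployment
    )
```

Taking `len()` of the result answers the "how many" question; keeping the list also shows which municipalities qualify.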
  • 41. Why Linked Statistical Data? (II)
    • Internal benefits as well
    - Codelists are made available and become more visible internally
    - Methodology and metadata are explicitly described as part of the RDF DataCube data (e.g., reference years in datasets)

Editor's Notes

  • #3: Linked Statistical Data 101. In this introductory talk we will discuss the main foundations for applying Linked Data principles to official statistics. We will briefly introduce what Linked Data is, as well as the main principles, languages and technologies behind it (URIs, RDF, SPARQL). We will also discuss the different formats in which data can be made available on the Web (e.g., RDF Turtle, JSON-LD, CSV on the Web). We will then give a detailed, step-by-step presentation of the W3C recommendation RDF DataCube, which is the basis for disseminating statistical data as Linked Data, using examples from existing Linked Statistical Data sources. Finally, we will provide some examples of applications and the opportunities that this approach offers for developing the proofs of concept selected by Eurostat, to be discussed during the meeting.