Publishing Linked Statistical Data: Aragón, a case study
Oscar Corcho (1), Idafen Santana-Pérez (1), Hugo Lafuente (2), David Portolés (3), César Cano (4), Alfredo Peris (4) and José María Subero (4)
(1) Ontology Engineering Group, Universidad Politécnica de Madrid
(2) Localidata
(3) Idearium Consultores
(4) Gobierno de Aragón
ocorcho@fi.upm.es
@ocorcho
22/10/2017
SemStats 2017 @ ISWC
Publishing Linked Statistical Data: Aragón, a case study. – SemStats 2017
Context
 IAEst: Instituto Aragonés de Estadística
o http://www.aragon.es/iaest
o The statistical office of Aragón
o Offers open data through
• The Open Data portal of Aragón (http://opendata.aragon.es/)
• Its own portal (our interest is in the “estadística local” database of local statistics)
Context: Existing IAEst data infrastructure
 Existing data infrastructure
o Data warehouse infrastructure based on Oracle BI
o Exports into different formats, including CSV
o http://www.aragon.es/DepartamentosOrganismosPublicos/Institutos/InstitutoAragonesEstadistica/AreasGenericas/ci.EstadisticaLocal.detalleDepartamento
 Data retrieval and browsing
o Taxonomy-based
o Fixed filters coded in the app
o The user
• Selects an administrative division
• Selects a specific municipality
• Browses the folder structure
o Data retrieved as HTML, PDF or CSV
Context: Existing IAEst web app
[Screenshots: predesigned reports offered from Oracle BI; web app for Estadística Local]
Context: Existing IAEst data sharing
 On the IAEst website
o http://www.aragon.es/DepartamentosOrganismosPublicos/Institutos/InstitutoAragonesEstadistica/AreasGenericas/ci.EstadisticaLocal.detalleDepartamento
 On OpenDataAragón
o http://opendata.aragon.es/catalogo/edificios-superficie-y-vivienda-comarcas
Goals
Extract those statistical reports, transform them into
RDF according to W3C standards, curate them, link
them to the existing Linked Data from Aragón (mostly
URIs from municipalities and regions) and provide an
API and a new user interface to make use of them
Results
 An easier-to-maintain data transformation process
o Enriching existing Linked Data APIs from Aragón
o Using GitHub for
• Version control and archival
• Continuous updates: detecting new data and data structures on a daily basis
• https://github.com/aragonopendata/local-data-aragopedia/
 Developer-friendly API
 Additional user interface
o Improving data retrieval and browsing capabilities
 Side effect: data curation
o Many errors and possible improvements were detected in the pre-existing CSV exports and have been corrected throughout the process
Transformation and publication process
Initial characterisation
• Identify sources
• Identify dimensions and measurements
Transformation
• Daily data download
• Processing (UTF-8)
• Upload into GitHub
• Annotation of new dimensions/measures
• RDF transformation
Publication and use
• Linked Data APIs
https://github.com/aragonopendata/local-data-aragopedia/
Initial characterisation
 Identify and download the data sources to be published (~1000)
o https://github.com/aragonopendata/local-data-aragopedia/tree/master/data/resource/DatosDescarga-UTF8
 Pre-process the data (UTF-8 encoding, download error verification and retries)
 Identify potential dimensions and measurements
o Analysis of column header names (e.g., municipio, comarca) and of data content (number of distinct values per column)
• https://github.com/aragonopendata/local-data-aragopedia/blob/master/data/resource/heads.txt
o From 700+ dimensions to ~500
• Curated by IAEst experts, e.g., unifying variants such as Male, M, Males, Men and Female, F, Females, Women
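The header and content analysis described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the distinct-value threshold and the sample rows are all hypothetical, and a numeric column such as a year would be misread as a measure, which is one reason expert curation is still needed.

```python
import csv
import io

# Hypothetical threshold: text columns with few distinct values look like dimensions.
MAX_DISTINCT_FOR_DIMENSION = 50


def is_number(text):
    """Return True if text parses as a number (a decimal comma is accepted)."""
    try:
        float(text.replace(",", "."))
        return True
    except ValueError:
        return False


def characterise_columns(csv_text, known_dimensions=("municipio", "comarca")):
    """Guess, for each column, whether it is a dimension or a measure."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    result = {}
    for header in reader.fieldnames or []:
        values = [row[header] for row in rows if row[header] != ""]
        distinct = set(values)
        if header.lower() in known_dimensions:
            kind = "dimension"   # recognised from the header name
        elif values and all(is_number(v) for v in values):
            kind = "measure"     # purely numeric content
        elif len(distinct) <= MAX_DISTINCT_FOR_DIMENSION:
            kind = "dimension"   # small set of repeated labels
        else:
            kind = "unknown"     # left for manual review
        result[header] = (kind, len(distinct))
    return result


# Illustrative data only, not taken from the IAEst reports.
sample = (
    "municipio,sexo,poblacion\n"
    "Teruel,Hombres,100\n"
    "Teruel,Mujeres,110\n"
    "Huesca,Hombres,90\n"
)
print(characterise_columns(sample))
```

Columns flagged as "unknown", and any numeric column that is really a code, would still have to be reviewed by hand.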
 SKOS concept schemes for each dimension
o https://github.com/aragonopendata/local-data-aragopedia/tree/master/data/dump/DatosTTL/codelists
o Mapping files available in GitHub (e.g., https://github.com/aragonopendata/local-data-aragopedia/blob/master/data/metadata/mapping-tipo-edificio-detalle.xlsx)
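As a rough illustration of how a curated mapping can become a SKOS concept scheme, the sketch below writes Turtle by hand. The base URI, the function names and the choice to record raw label variants as skos:altLabel are assumptions made for this example; the published code lists may be modelled differently. Labels containing quotes are not escaped here.

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."
BASE = "http://example.org/codelist/"  # hypothetical base URI


def slug(label):
    """Turn a label into a simple URI path segment."""
    return label.strip().lower().replace(" ", "-")


def build_concept_scheme(dimension, mapping):
    """mapping: raw label found in the CSVs -> curated preferred label."""
    scheme_uri = f"<{BASE}{slug(dimension)}>"
    lines = [
        SKOS_PREFIX,
        "",
        f"{scheme_uri} a skos:ConceptScheme ;",
        f'    skos:prefLabel "{dimension}" .',
    ]
    # Group the raw labels under the curated concept they map to.
    concepts = {}
    for raw, curated in mapping.items():
        concepts.setdefault(curated, []).append(raw)
    for curated, raws in sorted(concepts.items()):
        uri = f"<{BASE}{slug(dimension)}/{slug(curated)}>"
        lines.append("")
        lines.append(f"{uri} a skos:Concept ;")
        lines.append(f"    skos:inScheme {scheme_uri} ;")
        for raw in sorted(r for r in raws if r != curated):
            lines.append(f'    skos:altLabel "{raw}" ;')
        lines.append(f'    skos:prefLabel "{curated}" .')
    return "\n".join(lines)


mapping = {"M": "Male", "Males": "Male", "Male": "Male", "F": "Female", "Women": "Female"}
print(build_concept_scheme("sexo", mapping))
```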
 Measurement properties
o https://github.com/aragonopendata/local-data-aragopedia/blob/master/data/dump/DatosTTL/codelists/properties.ttl
 Data Structure Definitions (DSDs)
o https://github.com/aragonopendata/local-data-aragopedia/tree/master/data/dump/DatosTTL/dataStructures
 Errors were identified during this phase
o Same concept, different names (e.g., sexo and género)
o Typos in header names
o Columns with no values
o Data assigned to the wrong municipalities and districts
o Error report: https://github.com/aragonopendata/local-data-aragopedia/blob/master/data/dump/errorReport.txt
Continuous Transformation
 Continuous production cycle
o Update the RDF as reports are generated, modified or removed
 Executed every night
o Retrieves all the reports from the list (generated before)
o Checks whether the reports have already been transformed or whether they contain new data
o Hash signatures for each generated Data Cube
• https://github.com/aragonopendata/local-data-aragopedia/blob/master/data/resource/hashcode.csv
• Used to compare data versions
• If the hashes do not match, the Data Cube is marked to be regenerated
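The hash comparison can be sketched as below. The slides do not say which hash function is used or exactly what is hashed, so SHA-256 over the downloaded report bytes is an assumption, as are the function names and the sample identifiers.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hex digest of the report content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def cubes_to_regenerate(stored_hashes, downloaded_reports):
    """Return the ids of reports whose content differs from the previous run.

    stored_hashes: report id -> hash recorded by the previous run
    downloaded_reports: report id -> bytes downloaded tonight
    """
    changed = []
    for report_id, data in downloaded_reports.items():
        # A report with no stored hash is new and also counts as changed.
        if stored_hashes.get(report_id) != content_hash(data):
            changed.append(report_id)
    return sorted(changed)


# Hypothetical report ids and contents.
previous = {"report-a": content_hash(b"a,b\n1,2\n"), "report-b": content_hash(b"x\n1\n")}
tonight = {"report-a": b"a,b\n1,2\n", "report-b": b"x\n2\n", "report-c": b"y\n3\n"}
print(cubes_to_regenerate(previous, tonight))
```

Only the changed or new reports then need to go through the RDF transformation again.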
 Each iteration generates a GitHub issue listing the cubes that must be created, modified, etc.
o https://github.com/aragonopendata/local-data-aragopedia/issues
• https://github.com/aragonopendata/local-data-aragopedia/issues/93 (new data)
• https://github.com/aragonopendata/local-data-aragopedia/issues/457 (data cube to delete, new configurations needed)
o When user interaction is needed, this is stated in the issue text, and the person responsible at IAEst needs to update it
 RDF transformation is done according to the configuration file
o https://github.com/aragonopendata/local-data-aragopedia/blob/master/data/metadata/Informe-01-010001-A-TC-TM-TP.xlsx
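To make the configuration-driven transformation concrete, here is a minimal sketch that turns one CSV row into an RDF Data Cube observation in Turtle. Only the qb: vocabulary terms (qb:Observation, qb:dataSet) come from the W3C standard; the in-memory configuration format, the base URI, the property URIs and the function name are hypothetical, and the real configuration lives in the spreadsheet referenced above.

```python
from urllib.parse import quote

QB = "http://purl.org/linked-data/cube#"
BASE = "http://example.org/"  # hypothetical base URI


def observation_turtle(dataset_id, row_index, row, config):
    """Serialise one CSV row as a qb:Observation.

    config: column name -> ("dimension" | "measure", property URI)
    """
    subject = f"<{BASE}observation/{dataset_id}/{row_index}>"
    parts = [
        f"{subject} a <{QB}Observation>",
        f"<{QB}dataSet> <{BASE}dataset/{dataset_id}>",
    ]
    for column, (kind, prop) in config.items():
        value = row[column]
        if kind == "dimension":
            # Dimension values point at a code list concept.
            obj = f"<{BASE}codelist/{column}/{quote(value.lower())}>"
        else:
            # Measures are assumed to be plain numbers, written as-is.
            obj = value
        parts.append(f"<{prop}> {obj}")
    return " ;\n    ".join(parts) + " ."


config = {
    "municipio": ("dimension", "http://example.org/prop/municipio"),
    "poblacion": ("measure", "http://example.org/prop/poblacion"),
}
row = {"municipio": "Teruel", "poblacion": "100"}
print(observation_turtle("demo", 0, row, config))
```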
 RDF data is stored in GitHub (new version)
o https://github.com/aragonopendata/local-data-aragopedia/tree/master/data/dump/DatosTTL/informes
 RDF data is stored in the Open Data Aragón SPARQL endpoint
o http://opendata.aragon.es/sparql
o Reusing the 3cixty KB deployment utilities
o Each cube is stored in its own graph
o Graphs are also updated for the Data Structure Definitions (DSDs), properties and SKOS information
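Because each cube sits in its own graph, a client can restrict a query to a single cube with a GRAPH clause. The sketch below only builds the query text and a GET request URL following the standard SPARQL protocol `query` parameter; it does not contact the endpoint. The graph URI is hypothetical, since the slides do not give the graph naming scheme.

```python
from urllib.parse import urlencode

# Address of the Open Data Aragón SPARQL endpoint.
ENDPOINT = "http://opendata.aragon.es/sparql"


def observations_query(graph_uri, limit=10):
    """SPARQL query listing observations stored in one named graph."""
    return (
        "PREFIX qb: <http://purl.org/linked-data/cube#>\n"
        "SELECT ?obs ?p ?o\n"
        f"WHERE {{ GRAPH <{graph_uri}> {{ ?obs a qb:Observation ; ?p ?o . }} }}\n"
        f"LIMIT {limit}"
    )


def request_url(graph_uri, limit=10):
    """URL for a GET request using the SPARQL protocol 'query' parameter."""
    return ENDPOINT + "?" + urlencode({"query": observations_query(graph_uri, limit)})


graph = "http://example.org/graph/demo"  # hypothetical graph URI
print(observations_query(graph))
print(request_url(graph))
```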
Data transformation. In summary…
[Flowchart] bi.aragon.es and Google Drive → dataset and configuration download → GitHub → SPARQL
For each dataset:
• New dataset? Yes: generate a new configuration and create an issue
• No: new structure? Yes: create an issue
• No: new data? Yes: regenerate the data and create an issue
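One way to read the per-dataset decisions in this flowchart is sketched below. This is an interpretation of the diagram under stated assumptions, not the project's code: the Snapshot type, the action names and the use of column headers to stand for "structure" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """What is remembered about a dataset between nightly runs (assumed shape)."""
    columns: tuple   # header names, standing in for the dataset structure
    data_hash: str   # hash of the dataset content


def decide(previous, current):
    """Return the actions for one dataset; previous is None for a new dataset."""
    if previous is None:
        # New dataset: a configuration has to be produced and reviewed.
        return ["generate_configuration", "create_issue"]
    if previous.columns != current.columns:
        # Same dataset, new structure: someone must look at it.
        return ["create_issue"]
    if previous.data_hash != current.data_hash:
        # Same structure, new data: regenerate automatically and report it.
        return ["regenerate_data", "create_issue"]
    return []  # nothing changed


old = Snapshot(columns=("municipio", "poblacion"), data_hash="abc")
print(decide(None, old))
print(decide(old, Snapshot(columns=("municipio", "sexo", "poblacion"), data_hash="def")))
print(decide(old, Snapshot(columns=("municipio", "poblacion"), data_hash="def")))
print(decide(old, old))
```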
Data publication and use
 Data can be accessed through
o API (using Elda)
• http://opendata.aragon.es/herramientas/apis?#aragodbpedia
o GitHub (CSVs, RDF)
o SPARQL endpoint
[Diagram labels: SPARQL, Elda, Linked Data]
Data API
http://opendata.aragon.es/herramientas/apis?#aragodbpedia
Data publication and use
 Aragopedia
o http://opendata.aragon.es/apps/aragopedia/datos
o Where, when and what (dónde, cuándo y qué)
o Data can be downloaded in
• CSV
• JSON
Aragopedia
 Aragopedia
o JSON result of a query for
• the Maestrazgo region (where)
• population (what)
• in 1999 (when)
Conclusions (Results)
 An easier-to-maintain data transformation process
o Enriching existing Linked Data APIs from Aragón
o Using GitHub for
• Version control and archival
• Continuous updates: detecting new data and data structures on a daily basis
• https://github.com/aragonopendata/local-data-aragopedia/
 Developer-friendly API
 Additional user interface
o Improving data retrieval and browsing capabilities
 Side effect: data curation
o Many errors and possible improvements were detected in the pre-existing CSV exports and have been corrected throughout the process