© Information Systems Lab - 2013
http://islab.uom.gr
Linked Open Government Data Analytics
Evangelos Kalampokis, Efthimios Tambouris,
Konstantinos Tarabanis
© Information Systems Lab, University of Macedonia
Aim of the paper
 Introduce the concept of Data Analytics on top of
distributed statistical linked OGD
 Describe the technical prerequisites
 Demonstrate the end-user value
Open Government Data
 More than 180 Open Government Data portals around the globe
provide data that “can be freely used, reused and redistributed by
anyone”
OGD impact
 The majority of existing applications exploit a single dataset
and visualize data on a map.
 The expected potential of OGD has not yet been realized
Importance of Data in modern societies
 Business Intelligence
 Evidence-based policy-making
 Academia
Open Statistical Data
 A large portion of Open Government Data concerns statistics such as
population figures and economic and social indicators
 For example, the majority (5867 out of 6098 datasets) of the data
published on the EU Open Data Portal are of a statistical nature
 But although OGD gives everyone free access, data is often
isolated (e.g. due to the available formats)
Data Silos
http://www.flickr.com/photos/rachelrusinski/526260022
Vision: Linked Open Government Data Analytics
 Combining statistical OGD that was previously locked in disparate
sources
 Performing data analytics on top of combined data
 Gaining unexpected and unexplored insights into different domains
and problem areas.
Combining Statistical Data
 Requires effort to:
– Discover data (e.g. datasets that share common joint points and thus
allow further analysis)
– Collect data
– Clean data (timely, accurate, relevant data)
– Transform data (common formats)
– Integrate data (interoperability, levels of granularity etc.)
– Visualize and statistically analyze (semi-automatic according to the type
of variables and measures)
 We need to shift this effort from end users to data providers
http://www.flickr.com/photos/tetsumo/3586864217
Connecting Data Silos
 We need an infrastructure that connects data silos over the Web
and thus reduces the effort required to reuse statistical data
 This is where Linked Data comes in…
http://www.flickr.com/photos/sethwoodworth/2303531107
Linked Data
 Items in a dataset are identified using URIs
 URIs are dereferenceable using HTTP
 RDF links to other URIs in other datasets are included
Technical Prerequisites
 Metadata for data discovery
 Vocabularies
 Code lists, concept schemes and classifications
 Typed links (e.g. owl:sameAs) between
– Dimension definitions
– Values of dimensions
– Categories of measures
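The role of typed links between dimension values can be sketched in a few lines. This is only an illustration, not the authors' implementation: the two namespaces, the area identifiers and the toy values are hypothetical, and owl:sameAs statements are modelled as plain pairs of URIs rather than parsed from RDF.

```python
# Hypothetical namespaces; neither is taken from the slides.
ELECTION = "http://example.org/elections/constituency/"
LABOUR_MARKET = "http://example.org/unemployment/area/"


def canonical_map(links):
    """Map every URI to one representative of its owl:sameAs equivalence class."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Attach the lexicographically larger root to the smaller one
            # so the chosen representative is deterministic.
            parent[max(root_a, root_b)] = min(root_a, root_b)
    return {uri: find(uri) for uri in list(parent)}


def join_observations(left, right, links):
    """Join two {dimension value URI: measure} mappings via owl:sameAs links."""
    canon = canonical_map(links)

    def key(uri):
        return canon.get(uri, uri)

    right_by_key = {key(uri): value for uri, value in right.items()}
    return {
        uri: (value, right_by_key[key(uri)])
        for uri, value in left.items()
        if key(uri) in right_by_key
    }


# Toy data (hypothetical values): winners per constituency and unemployment rates.
winners = {
    ELECTION + "a": "Labour",
    ELECTION + "b": "Conservative",
    ELECTION + "c": "Labour",
}
unemployment = {LABOUR_MARKET + "001": 9.1, LABOUR_MARKET + "002": 4.2}
same_as = [
    (ELECTION + "a", LABOUR_MARKET + "001"),
    (ELECTION + "b", LABOUR_MARKET + "002"),
]

joined = join_observations(winners, unemployment, same_as)
for uri, pair in sorted(joined.items()):
    print(uri, pair)
```

Constituency "c" has no link and is dropped from the result, which illustrates why publishing the links is a prerequisite for combining datasets.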
RDF data cube vocabulary
The UK Elections Case
 Objective:
– To gain insights regarding UK elections through OGD
 Starting point:
– Results of the 2005 and 2010 UK general elections, at both national and
constituency level (open data from the Guardian)
 OGD:
– We need to discover data that could be analyzed together with the
election results data (i.e. that share common joint points)
OGD
 Source:
– Data from data.gov.uk
 Datasets:
– Unemployment and poverty between 2005 and 2010 in the UK parliamentary
constituencies
– In this paper we concentrate on unemployment due to space limitations
Data Conditioning: Linked Data Creation
Linked Data Analytics
 Enables semi-automatic visualization and statistical analysis
based on:
– Joint points (i.e. variables that are described at the parliamentary
constituency level)
– Type of variables (e.g. regression for continuous variables and
classification for categorical ones)
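The type-of-variable rule above could be automated roughly as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, a variable counts as continuous when all of its values are numeric, and the sample values are made up for illustration.

```python
def is_continuous(values):
    """Treat a variable as continuous when every value is a number (booleans excluded)."""
    return all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )


def choose_analysis(dependent_values):
    """Pick an analysis family from the type of the dependent variable."""
    if is_continuous(dependent_values):
        return "regression"
    return "classification"


# Hypothetical sample values for two possible dependent variables.
winning_party = ["Labour", "Conservative", "Labour"]
vote_share = [41.2, 35.0, 48.7]

print(choose_analysis(winning_party))
print(choose_analysis(vote_share))
```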
Logistic Regression: Classification Analysis
 Measures the relationship between a categorical dependent variable
and one or more continuous independent variables by converting the
dependent variable to probability scores through the logistic function
 Goal: identify the relationship between the unemployment rate of a
parliamentary constituency and the probability P(A) that a particular
political party wins the election in that constituency
 $P(A) = \frac{1}{1 + e^{-y}}$
 $y = c_0 + c_1 x_1 + \dots + c_n x_n$
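Assuming the standard logistic form $P(A) = 1/(1 + e^{-y})$ with linear predictor $y = c_0 + c_1 x_1 + \dots + c_n x_n$, the model can be rewritten in terms of odds, which makes the coefficients easier to read. This rearrangement is added here for illustration and does not appear on the slides:

```latex
\begin{align*}
1 - P(A) &= \frac{e^{-y}}{1 + e^{-y}} \\
\frac{P(A)}{1 - P(A)} &= e^{y} \\
\ln \frac{P(A)}{1 - P(A)} &= y = c_0 + c_1 x_1 + \dots + c_n x_n
\end{align*}
```

Under this form, a one-unit increase in $x_i$ multiplies the odds of winning by $e^{c_i}$, and $P(A) = 0.5$ exactly when $y = 0$.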
Visualization: Unemployment & Labour Results (2005)
 The probability that the Labour Party wins a constituency
increases as the constituency's unemployment rate increases
 In constituencies with an unemployment rate > 5%, the Labour
Party has a high probability of winning
 In 2005 the average unemployment rate was 3.35%
Visualization: Unemployment & Labour Results (2010)
 The pattern is the same but is shifted to the right.
 The average unemployment rate was 3.35% in 2005 and 7.5% in 2010
Visualization: Unemployment & Conservative Results (2005)
 In 2005 the average unemployment rate was 3.35%
 If the unemployment rate is > 5%, the Conservatives have a
very small probability of winning
Visualization: Unemployment & Conservative Results (2010)
 In 2010 the Conservatives do not win in constituencies with an
unemployment rate > 13%
 However, the average unemployment rate increased from 3.35% to 7.5%
 The logistic regression pattern is the same
Statistical model creation
 A logistic function gives the probability P(A) that a party wins in
a specific parliamentary constituency
 For example, consider the Labour Party in the 2010 elections
 x is the unemployment rate of the constituency (in percent).
 In a constituency with a 12% unemployment rate, the probability that
the Labour Party wins is P(A) ≈ 0.8
 $P(A) = \frac{1}{1 + e^{-y}}$
 $y = -3.823 + 0.437\,x$
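The worked example can be checked numerically. The sketch below assumes the signs lost in extraction give y = -3.823 + 0.437 x inside P(A) = 1 / (1 + e^(-y)), which is the reading that reproduces the stated P(A) of about 0.8 at 12% unemployment. The function names are hypothetical, and the inverse function is an addition for illustration.

```python
import math

# Coefficients as read from the slide (Labour Party, 2010 elections).
# The signs are an assumption: they are the ones consistent with P(A) ~ 0.8 at x = 12.
C0 = -3.823
C1 = 0.437


def win_probability(unemployment_rate):
    """P(A) for a constituency with the given unemployment rate in percent."""
    y = C0 + C1 * unemployment_rate
    return 1.0 / (1.0 + math.exp(-y))


def rate_for_probability(p):
    """Invert the model: the unemployment rate at which P(A) equals p (0 < p < 1)."""
    return (math.log(p / (1.0 - p)) - C0) / C1


p12 = win_probability(12.0)             # about 0.81, which the slide rounds to 0.8
break_even = rate_for_probability(0.5)  # about 8.7 percent

print(f"P(A) at 12% unemployment: {p12:.3f}")
print(f"Unemployment rate where P(A) = 0.5: {break_even:.2f}%")
```

Under the same assumption, the model puts the point where a Labour win becomes more likely than not at roughly 8.7% unemployment.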
Conclusion and Future Work
 Significant effort has gone into developing tools and applications that
facilitate Open Government Data (OGD) publishing and reuse
 OGD has not yet realized its full potential.
 Today, data analytics relies on data locked in isolated systems
 We claim that the real value of OGD will emerge from performing Data
Analytics on top of combined statistical datasets
 Linked Open Government Data Analytics shows the road ahead
 Future work includes the development of a platform enabling
semi-automatic identification of important relations between variables
described in distributed datasets
Acknowledgments
 The work presented in the paper is partly funded by