BIG DATA EUROPE
PILOT SC6: CITIZEN BUDGET ON MUNICIPAL LEVEL
EUROPE IN A CHANGING WORLD - INCLUSIVE, INNOVATIVE AND REFLECTIVE SOCIETIES
WORKSHOP: THE CHALLENGES OF BIG DATA FOR SOCIETIES
IN A CHANGING WORLD, 05 DECEMBER 2016
MARTIN KALTENBÖCK (CFO, SEMANTIC WEB COMPANY)
Integrating Big Data, Software & Communities for Addressing
Europe’s Societal Challenges
BDE SC6 Workshop
Big Data Europe (CSA: 2015-17)
• Show societal value of Big Data: 7 Domains
• Lower barrier for using big data technologies
  o Required effort and resources
  o Limited data science skills
• Help establish cross-lingual, cross-organizational and cross-domain Data Value Chains
16 Dec 2016
Big Data Europe
CSA: COORDINATION
  Measures: Stakeholder Engagement (Requirements Elicitation)
  Results: Create and Manage Societal Big Data Interest Groups
CSA: SUPPORT
  Measures: Design, Realise, Evaluate Big Data Aggregator Platform
  Results: Cloud-deployment ready Big Data Aggregator Platform
THE BDE PLATFORM
ARCHITECTURE & COMPONENTS
The three Big Data “Vs”: Variety is often neglected
Current State of Platform Architecture
Adding a Semantic Layer to Data Lakes
Manufacturing, Marketing, Sales, Support, Accounting
Semantic Data Lake
• Central place for model, schema and data historization
• Combination of scale-out (cost reduction) and semantics (increased control & flexibility)
• Grows incrementally (pay-as-you-go)
Inbound Data Sources
Outbound and Consumption
Inbound Raw Data Store
Data Lake (order of magnitude cheaper scalable data store)
Knowledge Graph for Relationship Definition and Meta Data
Frontend to Access Relationship and KPI Definition / Documentation
Frontend to Access (ad hoc) Reports
Outbound Data Delivery to Target Systems
JSON-LD, CSVW, R2RML, XML2RDF
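As a rough illustration of what a semantic layer adds to a raw record, the sketch below wraps a flat record in a JSON-LD context so that its keys resolve to IRIs. The namespace `http://example.org/budget#`, the `BudgetLine` type and the sample record are assumptions made up for this example; they do not come from the BDE platform.

```python
import json

# Hypothetical vocabulary namespace, chosen for illustration only.
EX = "http://example.org/budget#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def to_jsonld(record: dict) -> dict:
    """Wrap a flat record in a JSON-LD context so its keys map to IRIs."""
    return {
        "@context": {
            "department": EX + "department",
            # Typed term: the string value is interpreted as xsd:decimal.
            "amount": {"@id": EX + "amount", "@type": XSD + "decimal"},
        },
        "@id": EX + "line/" + record["id"],
        "@type": EX + "BudgetLine",
        "department": record["department"],
        "amount": record["amount"],
    }


# Invented sample record standing in for a row from an inbound raw data store.
raw = {"id": "2016-001", "department": "Accounting", "amount": "1250.00"}
doc = to_jsonld(raw)
print(json.dumps(doc, indent=2))
```

The raw record itself is unchanged; only the context is added, which is one way to read the "pay-as-you-go" idea of adding semantics incrementally.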
Why use BDE Technology?

| | Hortonworks | Cloudera | MapR | Bigtop | BDE |
|---|---|---|---|---|---|
| File system | HDFS | HDFS | NFS | HDFS | HDFS |
| Installation | Native | Native | Native | Native | Lightweight virtualization |
| Plug & play components (no rigid schema) | No | No | No | No | Yes |
| High availability | Single failure recovery (YARN) | Single failure recovery (YARN) | Self-healing, multiple failure recovery | Single failure recovery (YARN) | Multiple failure recovery |
| Cost | Commercial | Commercial | Commercial | Free | Free |
| Scaling | Freemium | Freemium | Freemium | Free | Free |
| Addition of custom components | Not easy | No | No | No | Yes |
| Integration testing | Yes | Yes | Yes | Yes | -- |
| Operating systems | Linux | Linux | Linux | Linux | All |
| Management tool | Ambari | Cloudera Manager | MapR Control System | - | Docker Swarm UI + custom |
SC6 PILOT
CITIZENS BUDGET ON MUNICIPAL LEVEL
ARCHITECTURE & COMPONENTS
SC6 in Big Data Europe – what is included
• Europe in a changing world - inclusive, innovative and reflective societies
• Social Sciences
• Smart Statistics
• (Digital) Humanities
• Digital (Research) Archives
www.big-data-europe.eu
SC6: Social Sciences
Pilot focus area: Citizens' budget spending at municipal level
Big Data focus area: Statistical and research data linking & integration
Selected key data assets: Detailed budget execution data at city level, statistical data from public data portals and statistical offices, federated social sciences data
SC6 Pilot: Idea & Objectives
State of the Art:
o Budget: the most important document of public policy
o Budget execution affects everyday lives
o Citizens are more involved in city level activities
Objective:
Can we make budgets more useful for citizens,
researchers and decision makers?
SC6 Pilot: Idea & Objectives
• Create an online Dashboard on Economic Data
  o Harvest data from several sources in different formats
  o Normalise the data (RDF)
  o Link & map the data (attributes, structure, languages)
  o Analyse the data – financial ratios (comparisons, predictions etc.)
  o Visualise the analysis on an online dashboard, including help & info to understand data & analysis
  o Provide raw data (for further use as open data)
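The harvest, normalise and analyse steps can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the pilot's implementation: the namespace `http://example.org/budget/`, the column names and the sample figures are all invented, and the "execution ratio" (spent divided by budgeted) is just one plausible example of a financial ratio.

```python
import csv
import io
from decimal import Decimal

EX = "http://example.org/budget/"  # hypothetical namespace
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

# Invented sample data standing in for a harvested CSV download.
SAMPLE_CSV = """code,label,budgeted,spent
0111,Salaries,1000.00,250.00
0211,Street lighting,400.00,100.00
"""


def row_to_ntriples(municipality: str, row: dict) -> list[str]:
    """Normalise one CSV row into N-Triples statements."""
    subject = f"<{EX}{municipality}/{row['code']}>"
    # Escape backslashes and quotes so the literal stays valid N-Triples.
    label = row["label"].replace("\\", "\\\\").replace('"', '\\"')
    return [
        f'{subject} <{EX}label> "{label}" .',
        f'{subject} <{EX}budgeted> "{row["budgeted"]}"^^<{XSD_DECIMAL}> .',
        f'{subject} <{EX}spent> "{row["spent"]}"^^<{XSD_DECIMAL}> .',
    ]


def execution_ratio(row: dict) -> Decimal:
    """Share of the budgeted amount that has been spent so far."""
    return Decimal(row["spent"]) / Decimal(row["budgeted"])


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
triples = [t for row in rows for t in row_to_ntriples("athens", row)]
ratios = {row["code"]: execution_ratio(row) for row in rows}

for t in triples:
    print(t)
```

Each row yields three statements here; a real mapping would also link codes to shared taxonomies so that figures from different municipalities become comparable.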
SC6 Pilot Partners
2 H2020 projects working together on the SC6 Pilot
• Big Data Europe
• Your Data Stories
SC6 Pilot core team: Ivana Versic (CESSDA), Michalis Vafopoulos (NCSR-D), Martin Kaltenböck (SWC), Jürgen Jakobitsch (SWC), Hossein Abroshan (CESSDA)
Data used / produced in Pilot
Budget Data and Budget Execution Data
• Municipality of Athens, Greece
  o Description: budget execution data in detail
  o Frequency: daily
  o Ownership: open
  o Format: API
• Municipality of Thessaloniki, Greece
  o Description: budget execution data in detail
  o Frequency: daily
  o Ownership: open
  o Format: csv, xls (files for download provided)
• Municipality of Kalamaria, Greece
  o Description: budget execution data in detail
  o Frequency: weekly
  o Ownership: open
  o Format: csv, xls (files for download provided)
• Additional Open Data
  o Description: economic taxonomies etc.
  o Ownership: open
  o Format: RDF (skos, owl), other
  o E.g. COFOG (UN Classification)
• Size of Data
  o ~30 million triples (statements) for 1 year
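To get a feel for these figures, the sketch below records the three municipal sources as a small catalogue and derives an average load from the stated yearly volume. It assumes, purely for illustration, that the roughly 30 million triples are spread evenly over the year; the `Source` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """Hypothetical catalogue entry for one harvested data source."""
    name: str
    frequency: str
    formats: tuple


# Values taken from the slide; the structure itself is an assumption.
SOURCES = [
    Source("Athens", "daily", ("API",)),
    Source("Thessaloniki", "daily", ("csv", "xls")),
    Source("Kalamaria", "weekly", ("csv", "xls")),
]

TRIPLES_PER_YEAR = 30_000_000  # "~30 million triples for 1 year"


def average_load(triples_per_year: int) -> dict:
    """Average number of triples per day and per week, assuming an even spread."""
    return {
        "per_day": round(triples_per_year / 365),
        "per_week": round(triples_per_year / 52),
    }


daily_sources = [s.name for s in SOURCES if s.frequency == "daily"]
load = average_load(TRIPLES_PER_YEAR)
print(daily_sources, load)
```

Under that even-spread assumption the pilot would ingest on the order of 80,000 triples per day, which is modest for a triple store but grows with every source added.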
4 Vs of Big Data in SC6 Pilot
• Variety: budget data and budget execution data are harvested from several sources and are available in different structures and formats.
• Volume: the amount of open budget data and budget execution data available keeps growing.
• Velocity: budget execution data is provided on a continuous basis by the publisher (daily, weekly, monthly).
• Veracity: refers to biases, noise and abnormality in data. Even within the same country, published data differ because they often come from different systems, or because public accounting standards are not enforced uniformly (e.g. across different municipal departments).
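Variety and veracity can be made concrete with a small harmonisation step. The sketch below is an illustration only: the source names, column names and number formats are invented and are not taken from the real municipal portals. It renames source-specific columns to a common schema, normalises amounts written with different decimal separators, and flags two simple plausibility problems.

```python
from decimal import Decimal

# Hypothetical per-source settings, invented for illustration.
SOURCES = {
    "source_a": {"columns": {"Kodikos": "code", "Poso": "amount"}, "decimal_sep": ","},
    "source_b": {"columns": {"code": "code", "amount_eur": "amount"}, "decimal_sep": "."},
}


def parse_amount(text: str, decimal_sep: str) -> Decimal:
    """Parse an amount written with the given decimal separator."""
    if decimal_sep == ",":
        # "1.250,50" -> "1250.50"
        text = text.replace(".", "").replace(",", ".")
    else:
        # "1,250.50" -> "1250.50"
        text = text.replace(",", "")
    return Decimal(text)


def harmonise(source: str, record: dict) -> dict:
    """Rename source-specific columns and normalise the amount."""
    cfg = SOURCES[source]
    out = {cfg["columns"][k]: v for k, v in record.items() if k in cfg["columns"]}
    out["amount"] = parse_amount(out["amount"], cfg["decimal_sep"])
    return out


def veracity_issues(record: dict) -> list[str]:
    """Return simple plausibility problems found in a harmonised record."""
    issues = []
    if not record.get("code"):
        issues.append("missing code")
    if record["amount"] < 0:
        issues.append("negative amount")
    return issues


a = harmonise("source_a", {"Kodikos": "0111", "Poso": "1.250,50"})
b = harmonise("source_b", {"code": "0111", "amount_eur": "1,250.50"})
bad = veracity_issues(harmonise("source_b", {"code": "", "amount_eur": "-5"}))
print(a == b, bad)
```

Whether a negative amount is really an error depends on the accounting rules of the publisher (corrections may be booked that way), so checks like these would flag records for review rather than reject them.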
SC6: Social Sciences
Pilot Architecture & Components
SC6 Pilot - Architecture
SC6 Pilot: Technical Components
• Apache Flume, https://flume.apache.org/ (data ingestion)
• Apache Kafka, http://kafka.apache.org (messaging service)
• Apache Spark, http://spark.apache.org (distributed analysis, transformation)
• Apache HDFS, http://hadoop.apache.org (raw data storage)
• SWC's PoolParty Semantic Suite, http://poolparty.biz (data consolidation, curation, mapping)
• OpenLink's Virtuoso, http://virtuoso.openlinksw.com (triple store – data storage)
• Apache HTTP Server, http://httpd.apache.org (linked data serving)
• Apache Avro, http://avro.apache.org/docs/current/ (intermediate data schema)
• D3.js library, https://d3js.org/ (visualisation of RDF data using SPARQL queries)
• SWC's PoolParty GraphSearch (SPARQL-based interface component for filter & faceted search)
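The visualisation layer reads from the triple store through SPARQL queries. As a hedged sketch of what such a query might look like, the Python below builds an aggregation query and the GET request URL defined by the SPARQL protocol (a `query` parameter on the endpoint). The endpoint `http://example.org/sparql` and the `ex:` properties are hypothetical; the pilot's actual vocabulary and endpoint are not given in the slides, and no request is sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint
EX = "http://example.org/budget/"       # hypothetical vocabulary


def spending_by_category_query(year: int) -> str:
    """Build a SPARQL query that sums spending per category for one year."""
    return f"""
PREFIX ex: <{EX}>
SELECT ?category (SUM(?spent) AS ?total)
WHERE {{
  ?line ex:category ?category ;
        ex:year {year} ;
        ex:spent ?spent .
}}
GROUP BY ?category
ORDER BY DESC(?total)
""".strip()


def request_url(query: str) -> str:
    """Encode the query as a SPARQL protocol GET request URL."""
    return ENDPOINT + "?" + urlencode({"query": query})


query = spending_by_category_query(2016)
url = request_url(query)
print(url)
```

A dashboard component would fetch this URL, ask for JSON results via the `Accept` header, and bind the returned rows to a chart.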
SC6 Pilot: 1st version implemented
https://bde.poolparty.bizGraphSearchSC6
SC6 Pilot: Pilot Evaluation
Evaluation Approach SC6 Pilot (starts 01/2017):
• Invite municipalities to evaluate and use the system
• Invite community (open data, data community, BDE community, W3C)
• Evaluate within the participating projects (BDE, Your Data Stories; invite: OpenBudget)
• BDE SC6 workshop in Cologne, 5 December 2016
Additional evaluation – tests over time with
• a growing amount of data
• a growing number of different sources & formats docked onto the system
• additional analytics in place
How to benefit best from BDE
• BDE Workshops & Webinars
• Use & expand the BDE Platform (BDE GitHub)
• Visit Website: news, events, community, …
• Big Data Europe W3C Community Group
• 7+1x Mailing Lists – stay tuned!
• BDE Platform website coming soon!
• Related EC Call on Big Data, open until 02 Feb 2017:
  Policy-development in the age of big data: data-driven policy-making, policy-modelling
Contacts:
• CESSDA, http://cessda.net/
  Ivana Ilijasic Versic, ivana.versic@cessda.net
  Hossein Abroshan, hossein.abroshan@cessda.net
• NCSR-D, http://www.demokritos.gr/?lang=en
  Michalis Vafopoulos, vafopoulos@gmail.com
• Semantic Web Company (SWC), http://www.semantic-web.at
  Martin Kaltenböck, m.kaltenboeck@semantic-web.at
  Jürgen Jakobitsch, j.jakobitsch@semantic-web.at
Questions & Contacts
www.big-data-europe.eu
#BigDataEurope
Martin Kaltenböck
CFO, Semantic Web Company
m.kaltenboeck@semantic-web.at
http://www.linkedin.com/in/martinkaltenboeck
https://twitter.com/kalte2707
http://de.slideshare.net/MartinKaltenboeck
http://blog.semantic-web.at

BDE SC6-ws-05/12/2016 technology part - SWC


Editor's Notes

  • #3, #4: Project objectives: Addressing each of the seven Societal Challenge domains, we have a domain representative for each and a pilot instantiation of the BDE platform for each in progress. One of the challenges to Big Data opportunities is the lack of data science skills; our aim is to provide out-of-the-box technology that requires little training to use and apply. BDE technology can be applied in multiple domains and in different phases within Data Value Chains, working with different data providers and addressing multiple objectives (as opposed to current solutions, which tend to be very specific to one data source or domain and address one objective).
  • #6: http://www.gi.de/nc/service/informatiklexikon/detailansicht/article/big-data.html
  • #8, #9: A Data Lake is a storage repository for big-data-scale raw data in its original formats. It takes a late-binding approach to schema: “Let us decide when we need it.” It uses a scale-out architecture on commodity infrastructure, mostly with HDFS/Hadoop/Spark, which gives a large cost advantage: about a factor of 10 compared to data warehouses. Semantic Data Lake = Data Lake + Knowledge Graph. Management of structure (vocabularies/schemas, KPI trees, metadata, …) on top of the Data Lake is performed in a knowledge graph, a complex data fabric representing all kinds of things and how they relate to each other. A knowledge graph is unique regarding flexibility, multiple views and metadata capabilities. It is based on the Resource Description Framework (RDF) standard and Linked Data principles.
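The late-binding idea in the note above can be illustrated with a small schema-on-read sketch. Everything here is hypothetical: the raw lines, the field names and the `REPORT_SCHEMA` mapping are invented to show the principle that raw records are stored untouched and a schema is applied only when a report needs one.

```python
import json

# Raw records are kept exactly as delivered; note the differing keys.
RAW_LINES = [
    '{"id": "1", "amt": "10.5", "dept": "Sales"}',
    '{"id": "2", "amount": 7, "note": "kept but unused"}',
]

# Hypothetical read-time schema: target field -> (candidate source keys, converter).
REPORT_SCHEMA = {
    "id": (("id",), str),
    "amount": (("amount", "amt"), float),
}


def read_with_schema(lines, schema):
    """Apply a schema while reading; the stored raw lines are never modified."""
    for line in lines:
        raw = json.loads(line)
        row = {}
        for field, (candidates, convert) in schema.items():
            # Take the first candidate key that is present in the raw record.
            value = next((raw[k] for k in candidates if k in raw), None)
            row[field] = convert(value) if value is not None else None
        yield row


rows = list(read_with_schema(RAW_LINES, REPORT_SCHEMA))
print(rows)
```

A different report can define a different schema over the same raw lines, which is the flexibility the note attributes to deciding on structure only when it is needed.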