policycloud.eu 1
02/07/2020
Pavlos Kranas (LeanXcale S.L)
PolicyCLOUD
Technical Overview
PolicyCloud has received funding from the European Union’s Horizon 2020 research and innovation
programme under grant agreement No 870675.
policycloud.eu 2
Objective
PolicyCLOUD: Analytics-as-a-Service facilitating
efficient data-driven public policy management
02/07/2020 Policy Cloud Data Driven Policies against Radicalisation
policycloud.eu 3
Background
Facts
Increasing use of devices and networks leading to the generation of vast quantities of data
Data linking is becoming the norm (e.g. linking new data sources with established data sources)
Current approaches in policy making are not evidence-based
Mature approaches to analyse and understand the “environment”
Goal
Creation of efficient and effective policies through data-driven policy management
Decision support to authorities for policy modelling, implementation and simulation through
identified populations, as well as for policy enforcement and adaptation
policycloud.eu 4
Main challenges (1/5)
A data-driven approach for effective policy management
Across the complete data path, including data modelling, representation and interoperability, cleaning, linking of heterogeneous datasets, and analytics for knowledge extraction
Exploit the collective knowledge from policy “collections” combined with data from several sources (e.g. sensor readings, online platforms, etc.)
policycloud.eu 5
Main challenges (2/5)
Compilation, assessment and optimization of multi-domain policies
Holistic policy modelling, making and implementation in different sectors (e.g. environment, migration, goods and services, etc.), through the analysis and linking of KPIs of different policies that may be interdependent and inter-correlated (e.g. environment)
Analysis of (unexpected) patterns and policy relationships
Identification of effective KPIs to be re-used, and of non-effective ones (including the reasons they are not effective) so that they can be improved
02/07/2020 Kick-off meeting Madrid
policycloud.eu 6
Main challenges (3/5)
Data management techniques across the complete data path
Meta-interpretation layer for the semantic and syntactic capturing of data properties and their representation
Data cleaning to ensure data quality and coherence, including the adaptive selection of information sources based on evolving volatility levels (i.e. changing availability or engagement level of information sources)
policycloud.eu 7
Main challenges (4/5)
Analytics as a service, reusable on top of different datasets
Machine and deep learning techniques (e.g. classification, regression, clustering and frequent pattern mining) to infer new data and knowledge
Opinion mining, sentiment analysis, social dynamics and behavioral data analytics
Technologies that allow analytics tasks to be decoupled from specific datasets and thus be triggered as services and applied to various cases and datasets
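The decoupling of analytics tasks from datasets described on this slide can be illustrated with a minimal sketch. Everything here is hypothetical: the registry, the task names and the sample records are assumptions made for illustration and are not part of PolicyCLOUD.

```python
from statistics import mean
from typing import Callable, Dict, Iterable

# Hypothetical registry: each task is registered by name and receives
# plain records plus a field name, so it knows nothing about the dataset.
AnalyticsTask = Callable[[Iterable[dict], str], float]
REGISTRY: Dict[str, AnalyticsTask] = {}


def register(name: str):
    """Decorator that publishes a function as a named analytics task."""
    def wrap(fn: AnalyticsTask) -> AnalyticsTask:
        REGISTRY[name] = fn
        return fn
    return wrap


@register("mean")
def mean_of(records: Iterable[dict], field: str) -> float:
    return mean(r[field] for r in records)


@register("max")
def max_of(records: Iterable[dict], field: str) -> float:
    return max(r[field] for r in records)


def run_task(name: str, records: Iterable[dict], field: str) -> float:
    """Trigger a registered task 'as a service' on any dataset."""
    return REGISTRY[name](records, field)


# Two unrelated, made-up datasets served by the same tasks.
air_quality = [{"pm10": 20.0}, {"pm10": 30.0}]
sentiment = [{"score": 0.5}, {"score": -0.5}, {"score": 0.75}]
```

The same `run_task` call works on both datasets because the task only depends on the record shape it is handed, not on where the records come from.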
policycloud.eu 8
Main challenges (5/5)
Unique endpoint to exploit analytics in different cases
Execution of different models / analytical tools on data (e.g. to identify trends, to mine opinion artefacts, to explore situational and context awareness information, to identify sensitives, etc.)
Modelled policies (through their KPIs) realized / implemented and monitored against these KPIs
Adaptive and incremental visualization enabling the policy lifecycle to be visualized in different ways, while the visualization can be modified on the fly and can enable the specification of the assets to be visualized (e.g. data sources or meta-processed information)
policycloud.eu 9
Conceptual architecture
policycloud.eu 10
Seamless Analytical Framework
Baseline technology first introduced and implemented as a proof of concept in H2020 BigDataStack
Collaborative work between IBM and LeanXcale
First prototype already delivered! Its functionality is planned to be extended in PolicyCLOUD
policycloud.eu 11
Seamless Analytical Framework
Modern enterprises use
Operational databases for OLTP workloads
Key-value stores for IoT data
Data warehouses for data analytics
Data lakes
etc.
Hence the need for polyglot capabilities
policycloud.eu 13
Seamless Analytical Framework
Nowadays: Data Federation using Spark
BUT:
Can be very resource-intensive
Cannot exploit the specific capabilities of each individual datastore
policycloud.eu 14
Seamless Analytical Framework – User Story
Data ingestion in operational datastore (LeanXcale)
Old data becomes historical, with no modifications
Data Warehouse to perform analytics on big data volumes
Distribution of datasets is problematic
Data to be retrieved from both stores
To be merged at the application level
Data consistency considerations when moving datasets
policycloud.eu 15
Seamless Analytical Framework – Solution
Federates data coming from two different datastores:
HTAP Relational LXS Datastore
IBM Object store
sharing the SAME dataset
Single (black box) component that
consists of two datastores
exploits the unique characteristics of each one
works transparently to the user
does not compromise some requirements for the benefit of others
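The idea of a single component hiding two datastores can be sketched as a toy facade. This is an illustration under stated assumptions, not the actual LeanXcale/IBM implementation: the class name `SeamlessStore`, the timestamp-based split point and the in-memory lists standing in for the operational and object stores are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Row = Tuple[int, str]  # (timestamp, payload); illustrative schema only


@dataclass
class SeamlessStore:
    """Toy facade over a 'hot' operational store and a 'cold' object store."""
    split_ts: int = 0  # assumption: rows with ts < split_ts live in cold
    hot: List[Row] = field(default_factory=list)
    cold: List[Row] = field(default_factory=list)

    def insert(self, row: Row) -> None:
        # Historical data is treated as immutable, so late writes are rejected.
        if row[0] < self.split_ts:
            raise ValueError("cannot modify historical data")
        self.hot.append(row)

    def move_older_than(self, new_split: int) -> None:
        """Move rows older than new_split to cold and advance the split point."""
        self.cold.extend(r for r in self.hot if r[0] < new_split)
        self.hot = [r for r in self.hot if r[0] >= new_split]
        self.split_ts = new_split

    def scan(self, lo: int, hi: int) -> List[Row]:
        """Return rows with lo <= ts < hi, touching only the stores needed."""
        out: List[Row] = []
        if lo < self.split_ts:
            out += [r for r in self.cold if lo <= r[0] < hi]
        if hi > self.split_ts:
            out += [r for r in self.hot if lo <= r[0] < hi]
        return sorted(out)


store = SeamlessStore()
for row in [(1, "a"), (2, "b"), (3, "c")]:
    store.insert(row)
store.move_older_than(3)  # rows 1 and 2 become historical
```

The caller only ever uses `insert` and `scan`; which store answers a given range is decided inside the facade, which is the transparency property the slide asks for.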
policycloud.eu 16
Query Federation
policycloud.eu 17
Data Movement High Level
policycloud.eu 18
Supported Operations
Currently supports
Full Scan
Ordered Scan
LIMIT
Aggregations
Group By Aggregations
Ordered Group By Aggregations
Does not yet support
JOIN on fragmented data tables
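To give a feel for why ordered scans, LIMIT and group-by aggregations can be federated across two fragments of the same table, here is a minimal sketch. It assumes each store can return sorted rows and per-group partial aggregates; the function names and sample data are hypothetical and the real component may work differently.

```python
import heapq
from collections import defaultdict
from itertools import islice
from typing import Dict, Iterable, List, Tuple


def ordered_scan_limit(sorted_a: Iterable, sorted_b: Iterable, limit: int) -> List:
    """Ordered scan with LIMIT: lazily merge two sorted streams, stop early."""
    return list(islice(heapq.merge(sorted_a, sorted_b), limit))


def partial_group_avg(rows: Iterable[Tuple[str, float]]) -> Dict[str, Tuple[float, int]]:
    """Computed inside each store: group key -> (sum, count)."""
    acc = defaultdict(lambda: [0.0, 0])
    for key, value in rows:
        acc[key][0] += value
        acc[key][1] += 1
    return {k: (s, c) for k, (s, c) in acc.items()}


def merge_group_avg(*partials: Dict[str, Tuple[float, int]]) -> Dict[str, float]:
    """Combine partial (sum, count) pairs into a final ordered GROUP BY average."""
    total = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (s, c) in partial.items():
            total[key][0] += s
            total[key][1] += c
    return {k: s / c for k, (s, c) in sorted(total.items())}


# Made-up fragments of one table living in two stores.
hot_rows = [("A", 10.0), ("B", 4.0)]
cold_rows = [("A", 20.0), ("A", 30.0)]
averages = merge_group_avg(partial_group_avg(hot_rows), partial_group_avg(cold_rows))
```

Averages are merged from (sum, count) pairs rather than from per-store averages, because averaging two averages gives the wrong result when the fragments differ in size.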
policycloud.eu 19
[Figure: deployment diagram with Data Skipping. Labels: CEP, Gateway, Seamless component, LXS DB, Data Mover, Remote IBM COS, Data Skipping, Data Quality Assessment, accessing application, Machine Learning app., Danaos (BigDataStack) Data Center]
policycloud.eu 20
16.07.2019 Periodic Review Meeting
Data Skipping
Relevant for SQL queries
Implemented for Apache Spark SQL
Up to latest Apache Spark version 3.0
Standalone technology but also nicely complements the seamless component
Determine which objects are NOT relevant to a SQL query using a data skipping index
Stores and indexes tiny summary metadata for each object
Skipping over irrelevant objects reduces the bytes scanned
Query example: retrieve data of violent storms
SELECT vessel_code, datetime, longitude, latitude, wind_speed
FROM cos://us-south/…/danaos stored as parquet
WHERE wind_speed > 30
[Figure: data skipping flow. Labels: Dataset addressed by an SQL query, Data Set Objects, Index All Objects, Data Skipping Indexing, WHERE Clause, Candidate Objects]
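The data skipping idea can be sketched with a min/max summary per object. This is an assumption for illustration: the slides only say that "tiny summary metadata" is stored per object, and min/max is one common choice. The object names and values below are made up, and this is not the IBM implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ObjectSummary:
    """Tiny per-object metadata: here, min and max of wind_speed (assumed)."""
    name: str
    min_wind: float
    max_wind: float


def build_index(objects: Dict[str, List[float]]) -> List[ObjectSummary]:
    """Index all objects once by summarizing the wind_speed values of each."""
    return [ObjectSummary(name, min(vals), max(vals)) for name, vals in objects.items()]


def candidates_greater_than(index: List[ObjectSummary], threshold: float) -> List[str]:
    """Objects that may contain rows with wind_speed > threshold.

    An object whose maximum is <= threshold cannot match, so it is skipped
    without being read.
    """
    return [s.name for s in index if s.max_wind > threshold]


# Hypothetical objects holding wind_speed values.
objects = {
    "part-0": [5.0, 12.0],
    "part-1": [28.0, 41.5],
    "part-2": [30.0, 30.0],
}
index = build_index(objects)
to_scan = candidates_greater_than(index, 30)
```

For the predicate `wind_speed > 30`, only `part-1` has to be read; the other two objects are ruled out from their summaries alone, which is where the reduction in bytes scanned comes from.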
policycloud.eu 21
Data Skipping
Skipping over irrelevant objects reduces the bytes scanned
Saves time and $
policycloud.eu 23
When could we try Data Skipping …
Joint IBM/Danaos demo at the high-visibility IBM THINK ’19 conference
Data Skipping technology integrated as open beta into IBM Cloud SQL Query
policycloud.eu 24
GET IN TOUCH
@PolicyCloudEU
PolicyCloud EU
www.policycloud.eu
