Multi-zone Data Virtualization for Data Lakes
How to share data with other government agencies
preserving privacy and security guidelines
Paul Grooten
October 26th, 2017
Statistics Netherlands (CBS) Key Characteristics
Autonomous Public Body with a Legal Entity (“ZBO”)
Official Statistics
Economic - Social - Census
National and Regional
180 million EUR budget; 2000+ staff
The Hague, Heerlen, Bonaire
Founded in 1899 (5 fte, 2 rooms), now 3 offices
Ambition: to become the Data Hub of the Dutch Government
Statistical process: Data Collection → Data Processing → Publishing
Which problems do we want to solve
• Current methods and technologies are no longer sufficient
to share data easily on a larger scale
• We want to share more statistical data
(also with external parties)
• We want to become faster and need a shorter time
to market
• We need to reduce costs (storage, infrastructure)
• We need to work on secure, privacy-preserving
data sharing
• Data sets should be easy to find
The layered Data Architecture
[Diagram: layered data architecture running from supply to demand (CIO office, Versie 1.81)]
• Supply: (legacy) data sources such as CSV, XLS, SQL DB, web services, applications and ETL tooling
• Data Source Layer (DSL)
• Data Virtualization (Denodo): Data Transformation Layer (DTL) and Data Provisioning Layer (DPL), with Building Blocks 1 to 4 exposed through Web Services A, B and C (OData)
• Consumer Layer (CL): web page, S2S and tooling (P = Data Prep, V = Data Visualization, A = Data Analytics)
• Demand (Vraag): CBDS
• Cross-cutting: Security & Authorisation, Data Governance (user queries), Metadata Management (technical metadata, imported conceptual metadata, connection strings)
• Legend: Existing / New
…towards a multi zone DaaS Architecture
[Diagram: the layered architecture (DSL, DTL, DPL, CL with Security, Data Governance and Metadata Management) divided into zones]
• Zone CBS: Building Blocks 1 and 2, Web Service A, CBDS, DSC
• Zone DDC1: Building Blocks 7 and 8, Web Service D, connected via secured VPN
• Zone UDC1: Building Blocks 3 and 4, Web Service B, connected via secured VPN
• Zone UDC2: Building Blocks 5 and 6, Web Service C, connected via secured VPN
• Each zone has its own P/V/A tooling (P = Data Prep, V = Data Visualization, A = Data Analytics)
• Other labels in the diagram: EHB, Security & Authorisation, Existing / New
DDC = Departmental Data Center | UDC = Urban Data Center | CL = Consumer Layer
DPL = Data Provisioning Layer | DTL = Data Transformation Layer | DSL = Data Source Layer
So what is a Zone?
[Diagram: the multi-zone architecture from the previous slide, here with Zone EZ in place of Zone DDC1]
A Zone:
• Is a virtual container in which a specified set of Data Governance rules applies
• Has a specific user group
• Contains virtual datasets
• Has its own authorization (which can and will differ from other zones)
• Has an owner
• Has its own Change Advisory Board
• Can have its own cache database on its own hardware
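The zone properties listed above can be sketched as a small data structure. This is an illustrative model only, not the CBS or Denodo implementation: the class, field names, users and dataset names are all hypothetical, and authorization is reduced to a per-dataset set of allowed users.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Zone:
    """Hypothetical model of a zone: a virtual container with its own rules."""

    name: str
    owner: str
    change_advisory_board: str
    governance_rules: frozenset[str]
    user_group: frozenset[str]
    # Authorization is defined per zone: virtual dataset name -> users allowed to read it.
    authorization: dict[str, frozenset[str]] = field(default_factory=dict)
    # Optional cache database, which may run on the zone's own hardware.
    cache_database: Optional[str] = None

    def can_read(self, user: str, dataset: str) -> bool:
        """A user must be in the zone's user group and authorized for the dataset."""
        allowed = self.authorization.get(dataset, frozenset())
        return user in self.user_group and user in allowed


# Two zones with different user groups and different authorization (made-up values).
cbs = Zone(
    name="CBS",
    owner="owner-cbs",
    change_advisory_board="CAB CBS",
    governance_rules=frozenset({"internal-use"}),
    user_group=frozenset({"alice", "bob"}),
    authorization={"building_block_1": frozenset({"alice"})},
)
udc1 = Zone(
    name="UDC1",
    owner="owner-udc1",
    change_advisory_board="CAB UDC1",
    governance_rules=frozenset({"municipal-use"}),
    user_group=frozenset({"carol"}),
    authorization={"building_block_3": frozenset({"carol"})},
    cache_database="udc1-cache",
)
```

In this sketch a user who is authorized in one zone gains nothing in another: `cbs.can_read("alice", "building_block_1")` holds, while `udc1.can_read("alice", "building_block_3")` does not, because each zone checks its own user group and its own authorization table.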
What do we want to achieve with the Data Lake
[Graphic: goals grouped under "reduce" and "stimulate"]
• Reduce: cost, statistical risk, time to market
• Stimulate: data access, growth, re-use
What are our next steps
• Finish Proof-of-Concept by end 2017
• Develop product (MVP)
• Get approval from Board of Management
• Implement Minimum Viable Product in 2018/H2
• Enhance MVP with new functionalities,
such as disclosure control (on-the-fly
confidentiality protection)
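The slides do not say how disclosure control will be implemented. As a rough illustration of what protecting confidentiality "on the fly" could mean, the sketch below applies one common technique, small-cell suppression, to a table of counts at query time. The function name, the threshold of 10 and the sample figures are assumptions made for this example.

```python
from typing import Optional


def suppress_small_cells(
    counts: dict[str, int], threshold: int = 10
) -> dict[str, Optional[int]]:
    """Return the counts with every cell below the threshold replaced by None.

    This is primary suppression only; it does not protect against
    recovering a suppressed cell from published totals.
    """
    return {
        key: (value if value >= threshold else None)
        for key, value in counts.items()
    }


# Hypothetical query result: number of records per municipality.
raw = {"Heerlen": 3, "The Hague": 120}
published = suppress_small_cells(raw, threshold=10)
```

Applying the rule when the result is delivered, rather than storing a pre-protected copy, fits the virtualized setup the deck describes, where datasets are served through views instead of being duplicated.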
Recommendations
• Check whether your strategy is in line
with your plans (and vice versa)
• Start experimenting with Data
Virtualization at an early stage
(start with the express version)
• Build a culture that embraces change and communicate your
plans as often as possible
Denodo DataFest 2017: Multi-zone Data Virtualization for Data Lakes
