DATA VIRTUALIZATION
Packed Lunch Webinar Series
Sessions Covering Key Data Integration
Challenges Solved with Data Virtualization
Shaping the Role of a Data Lake in a
Modern Data Fabric Architecture
Pablo Alvarez
Global Director of
Product Management
Denodo
Alberto Pan
CTO
Denodo
The Rise and Fall of the Data Lake
• Data Lakes were often the flagship initiatives of the
Hadoop era
• However, few data lakes managed to fulfill initial
expectations, and many failed to deliver results
• Those “data swamps” were often criticized for lack of
process, governance and security
• However, many of the technological advances of those
data lakes lived on in newer technologies
The Advent of the Object Storage
• Object Storage is a form of storage for unstructured data (objects) that eliminates
scaling limitations of traditional storage options
• In other words, it is limitless in terms of capacity
• It is rooted in the Big Data initiatives of the early 2010s, especially HDFS
• But it came to popularity with its adoption by cloud providers
• Nowadays, Amazon’s S3 (Simple Storage Service) and Azure’s ADLS (Azure Data Lake
Storage) are the most popular
• There are also alternatives from other vendors (Google, Oracle, IBM, etc.) and open
source options (like MinIO)
Object Storage is the Foundation of Cloud Data Systems
• Modern cloud data systems, like cloud EDW, data lakes and
the “lakehouse”, have evolved based on the premise of
separation of processing and storage
• Unlike in a traditional EDW, adding processing power does not
require adding disk space
• Object storage technologies provided the limitless storage they
needed, in a more cost-efficient way, and adapted to the cloud
• Open formats, like Parquet and Avro, specifically designed for
interoperability on analytics, helped them grow and gain
adoption
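As a rough illustration of why a columnar format such as Parquet suits analytics, the sketch below pivots a few records into a per-column layout and aggregates a single column. The sample records and the `to_columns` helper are hypothetical and are not part of any Parquet library; real Parquet files add encoding, compression and metadata on top of this idea.

```python
# Row-oriented layout: all the fields of each record are stored together.
rows = [
    {"order_id": 1, "region": "EMEA", "amount": 120.0},
    {"order_id": 2, "region": "APAC", "amount": 80.0},
    {"order_id": 3, "region": "EMEA", "amount": 50.0},
]


def to_columns(records):
    """Pivot a list of records into one list per column (columnar layout)."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# An analytic query such as SUM(amount) only needs to read one column,
# instead of scanning every field of every record.
total_amount = sum(columns["amount"])
print(total_amount)
```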
However, its versatility has made object storage useful beyond “just” storage for those systems
Let’s look at some examples
Common Usage Patterns for Modern Data Lakes
• Cheap storage for backup, old or rarely
used data
• Ingest 3rd party data
• Move non-critical workloads to
cheaper systems
• Data science playground
• New life for legacy Hadoop efforts
• And many others
Can you work with an object storage alone?
• Object storage platforms provide limitless, cost-efficient
storage space
• However, they are still filesystems
• Although some client applications can connect and use
those files directly as if they were in a local filesystem,
processing data that way is not efficient
• In addition, object storage platforms offer limited
granularity in terms of security, and few options for
governance
• Incorporating an object storage in your data strategy
will need additional pieces
What else do we need?
1. In order to process data in the object storage efficiently, we will need a modern MPP engine that
can work in parallel to process large data volumes
• Most new generation cloud data systems, like Snowflake, Databricks, Presto, Redshift, etc. follow that design
2. But an MPP engine alone is not enough, as seen by the failures of previous incarnations of Data
Lake projects!
3. We need to bring additional options for data management:
• Fine-grained security and access control
• Documentation, classification and search capabilities to bring cataloguing and governance into the process
• Data integration capabilities to ingest, massage, curate and expose information in the right format
4. Additionally, we need to keep in mind that data in the object storage is just a portion of the data in
the organization. All data should be managed with consistency, regardless of location
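To make "fine-grained security and access control" concrete, here is a minimal sketch of row filtering and column masking applied per role, which goes beyond the bucket-level or object-level permissions an object store typically offers. The role names, the `POLICIES` table and the `apply_policy` function are all hypothetical illustrations, not Denodo's API.

```python
# Hypothetical per-role policies: which rows are visible, which columns are masked.
POLICIES = {
    "analyst_emea": {
        "row_filter": lambda row: row["region"] == "EMEA",
        "masked": {"email"},
    },
    "auditor": {
        "row_filter": lambda row: True,
        "masked": set(),
    },
}


def apply_policy(rows, role):
    """Return only the rows the role may see, masking restricted columns."""
    policy = POLICIES[role]
    visible = []
    for row in rows:
        if not policy["row_filter"](row):
            continue
        visible.append(
            {k: ("***" if k in policy["masked"] else v) for k, v in row.items()}
        )
    return visible


customers = [
    {"id": 1, "region": "EMEA", "email": "ana@example.com"},
    {"id": 2, "region": "APAC", "email": "li@example.com"},
]

analyst_view = apply_policy(customers, "analyst_emea")
auditor_view = apply_policy(customers, "auditor")
```

The same file in the object store yields different results for each role, which is the kind of control that has to live in a layer above the storage itself.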
Adding an MPP engine to the Denodo Platform
[Diagram labels: Logical Layer; Traditional DB & DW; Cloud; Excel; Lake filesystem (S3/ADLS); Lake Engine; MPP Engine]
How does it work?
• Easy, efficient MPP access
to content in the object
storage
• No need for an
additional external
engine
• Integrated security and
management
• Out-of-the-box MPP
options for caching and
query acceleration
[Diagram labels: Logical Layer; MPP Coordinator; four MPP workers; Object Storage]
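The coordinator and worker arrangement can be sketched as a scatter-gather aggregation: each worker scans one object and returns a partial result, and the coordinator merges the partials. This is only an illustration of the general MPP pattern, not Denodo's implementation; threads stand in for worker nodes, and the `OBJECT_STORE` dictionary is a hypothetical stand-in for files in a bucket.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for objects in a bucket: key -> values in that file.
OBJECT_STORE = {
    "sales/part-0": [10, 20, 30],
    "sales/part-1": [5, 15],
    "sales/part-2": [40],
    "sales/part-3": [1, 2, 3, 4],
}


def worker_scan(key):
    """Worker: scan one object and return a partial aggregate (sum, count)."""
    values = OBJECT_STORE[key]
    return sum(values), len(values)


def coordinator_average(keys, workers=4):
    """Coordinator: scatter scans across workers, then merge the partials."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(worker_scan, keys))
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total / count


average = coordinator_average(sorted(OBJECT_STORE))
print(average)
```

Returning (sum, count) pairs rather than per-object averages keeps the merge step correct when objects hold different numbers of rows.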
How does it work?
[Screenshots: Object Storage configuration; Object Storage browsing]
• Automated
deployment using
Kubernetes and Helm
charts
• Integrated
configuration
• Graphical browsing and
introspection of object
storage
Putting in Context
[Diagram labels: Denodo Virtualization Server; Denodo Data Catalog; Denodo Web Services; On-prem data; Other Apps; IdP; Denodo MPP; Warehouse A; Warehouse B; AWS S3 bucket; AWS Aurora]
Move Non-Critical Workloads to Cheaper Systems
• Separation of compute and storage means that the
same data and queries can be computed with other
engines with minimal changes
• Denodo includes the tools to move and keep data
updated when needed
• A logical layer means that the change is transparent
for consumers
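The idea that a logical layer keeps such a move transparent can be sketched as a view that consumers query by name while its backing source is swapped underneath. The `LogicalView` class and both source functions are hypothetical; they only illustrate the indirection and are not how Denodo views are defined.

```python
class LogicalView:
    """Hypothetical logical view: consumers query it without knowing the source."""

    def __init__(self, name, source):
        self.name = name
        self._source = source  # callable that returns rows

    def rebind(self, new_source):
        """Point the view at a different physical source."""
        self._source = new_source

    def query(self, predicate=lambda row: True):
        return [row for row in self._source() if predicate(row)]


def warehouse_source():
    # Rows as served by the original (more expensive) system.
    return [{"id": 1, "status": "open"}, {"id": 2, "status": "closed"}]


def lake_source():
    # The same rows after being copied to cheaper storage.
    return [{"id": 1, "status": "open"}, {"id": 2, "status": "closed"}]


tickets = LogicalView("tickets", warehouse_source)
before = tickets.query(lambda row: row["status"] == "open")

tickets.rebind(lake_source)  # the workload moves; the consumer code does not
after = tickets.query(lambda row: row["status"] == "open")
```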
Cheap storage for backup, old or rarely used data
• Object storage is a great option for data that
is rarely used but that needs to be stored for
backup or compliance reasons
• These data can be exported into Parquet and
moved to the object storage
• Denodo can automatically map these data
and make them accessible at no additional cost
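One way to plan such an export is to select the rows older than a cutoff date and group them under a partitioned key layout. The sketch below only computes the plan: the `plan_archive` function, the `archive/orders` prefix and the `year=` key convention are assumptions for illustration, and the actual Parquet writing and upload steps are left out.

```python
from collections import defaultdict
from datetime import date


def plan_archive(rows, cutoff, prefix="archive/orders"):
    """Group rows older than `cutoff` by year under a partitioned key layout."""
    plan = defaultdict(list)
    for row in rows:
        if row["order_date"] < cutoff:
            key = f"{prefix}/year={row['order_date'].year}/part-0.parquet"
            plan[key].append(row)
    return dict(plan)


orders = [
    {"id": 1, "order_date": date(2018, 3, 1)},
    {"id": 2, "order_date": date(2018, 7, 9)},
    {"id": 3, "order_date": date(2019, 1, 5)},
    {"id": 4, "order_date": date(2024, 2, 2)},
]

# Rows dated before 2020 are scheduled for the archive; newer rows stay put.
plan = plan_archive(orders, cutoff=date(2020, 1, 1))
```

Partitioning by year means a later query restricted to one year only has to read the objects under that prefix.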
Ingest 3rd party data
• An object storage that our partners can access is a
great way to bring third-party data into
the organization
• Data can be in Parquet, but also in JSON, CSV or even
Excel
• Denodo can automatically map it
• Denodo also provides the tools to massage the data and load it into
corporate systems on a periodic basis
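Automatic mapping of delivered files can be approximated by inferring a schema from their contents. The sketch below does this for CSV and JSON using only the standard library; `infer_type`, `infer_schema` and `load_records` are hypothetical helpers, and a real introspection step would handle many more types and formats.

```python
import csv
import io
import json


def infer_type(values):
    """Pick the narrowest of int, float, str that fits every value."""
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for value in values:
                caster(str(value))  # compare via text so CSV and JSON agree
            return name
        except (TypeError, ValueError):
            continue
    return "str"


def infer_schema(records):
    """Collect values per column and infer one type for each column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return {name: infer_type(values) for name, values in columns.items()}


def load_records(text, fmt):
    """Parse a delivered file into a list of dict records."""
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(text)))
    if fmt == "json":
        return json.loads(text)
    raise ValueError(f"unsupported format: {fmt}")


csv_text = "sku,price,qty\nA1,9.50,3\nB2,12,1\n"
json_text = '[{"sku": "A1", "price": 9.5, "qty": 3}]'

csv_schema = infer_schema(load_records(csv_text, "csv"))
json_schema = infer_schema(load_records(json_text, "json"))
```

Both deliveries resolve to the same column types, so they can be exposed through one consistent structure regardless of the format a partner chose.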
Data Science Playground
• Denodo provides access in SQL to any company data asset
• This data can be easily moved into the object storage,
where the MPP engine can efficiently process it for
deeper analysis
• Denodo offers native Python drivers and is compatible
with popular data science toolkits (e.g. pandas) and tools
(R, Dataiku, etc.)
• Additionally, a data scientist may prefer to export content
to a Parquet file and connect directly to that file from a
different platform, like Databricks
Conclusions
1. Object Storage technologies, especially in the cloud (S3,
ADLS, etc.), offer a very attractive and flexible way
to store very large data volumes at low cost
2. New-gen MPP engines provide efficient processing
capabilities for data stored in an object storage,
especially when formats like Parquet are used
3. A logical layer, like Denodo, provides the additional
security, governance and data integration capabilities
required to safely introduce an object-storage-based data lake into
your data strategy
Fireside Chat with Pablo Alvarez and Alberto Pan (Denodo)
Q&A
Next Steps
Access Denodo Platform in the Cloud.
Start your Free Trial today!
GET STARTED TODAY
www.denodo.com/free-trials
Logical Data Fabric
A Technical Whitepaper
DOWNLOAD WHITEPAPER
Thanks!
www.denodo.com info@denodo.com
© Copyright Denodo Technologies. All rights reserved
Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm,
without the prior written authorization from Denodo Technologies.
