Unraveling the Data Lake: MPP integration within a Logical Data Fabric
Antonio Tortosa
Technical Consultant | Denodo
AGENDA
1. The challenge of cloud object storage
2. Incorporating Massively Parallel Processing (MPP) engines into a logical data fabric
3. Denodo Platform and Presto
The challenge of cloud object storage
The simplified version of object storage
Source: Amazon S3. How it works
■ Cheap storage for backup, old or rarely used data
■ Ingest 3rd party data
■ Move non-critical workloads to cheaper systems
■ Data science playground
The reality of enterprise data strategy
[Diagram components: Data Lake / Object Storage; Enterprise Data Warehouse; Business Intelligence; Reporting; Data Discovery; Other Apps; On-prem data; CDC; ETL]
The missing pieces
■ Processing - An engine capable of effectively processing the stored data
○ However, an MPP engine alone is not enough, as seen in the failures of previous incarnations of Data Lake projects
■ Integration - A logical model serving a common canonical view of the data ecosystem
○ Data in the object storage is just a portion of the organization's data. All data should be managed consistently, regardless of location
■ Data Management - Fine-grained Security & Data Governance
○ Ease of data discovery: documentation, classification and search capabilities
○ Fine-grained security and access control
Incorporating MPP into a logical data fabric
Parallel Processing of object storage data
[Diagram components: Logical Layer; MPP Coordinator; four MPP Workers; Object Storage. Legend: Data query; Data flow; Other calls]
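The coordinator and worker arrangement on this slide can be pictured as a scatter-gather computation. The Python sketch below is a toy model only: the in-memory `OBJECT_STORAGE` dictionary, the file names, and both function names are hypothetical stand-ins, and a real MPP engine distributes work across separate nodes rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for object storage: file name -> rows of (store_id, amount).
OBJECT_STORAGE = {
    "sales/part-0.parquet": [(1, 10.0), (2, 5.0)],
    "sales/part-1.parquet": [(1, 2.5), (3, 7.0)],
    "sales/part-2.parquet": [(2, 1.0), (3, 3.0)],
    "sales/part-3.parquet": [(1, 4.0)],
}


def worker_scan(file_name):
    """Worker task: scan one file and return a partial SUM(amount) per store."""
    partial = {}
    for store_id, amount in OBJECT_STORAGE[file_name]:
        partial[store_id] = partial.get(store_id, 0.0) + amount
    return partial


def coordinator_sum_by_store(file_names, num_workers=4):
    """Coordinator: fan the files out to workers, then merge the partial results."""
    totals = {}
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for partial in pool.map(worker_scan, file_names):
            for store_id, amount in partial.items():
                totals[store_id] = totals.get(store_id, 0.0) + amount
    return totals


totals = coordinator_sum_by_store(sorted(OBJECT_STORAGE))
print(totals)
```

Each worker only returns a small partial aggregate, so the coordinator merges a handful of values instead of reading every row itself.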
Integration of object storage with the data ecosystem
[Diagram components: Logical Layer; MPP Coordinator; Other Sources; four MPP Workers; Object Storage. Legend: Data query; Data flow; Other calls]
Denodo Platform and Presto
The process at a glance
Introspection
Denodo can connect to the object storage, for example S3 buckets, and graphically browse the folders and files. Parquet files, folders with content, and partitions are automatically detected, and the developers choose the ones that will become Denodo base views.
Mapping
Denodo connects to Presto and creates the necessary structures to map the object storage files in the target schema. Denodo automatically detects field data types and partitions. Denodo then creates base views from these tables.
Execution
At execution time, Denodo sends the query to Presto, which now has the object storage files natively mapped. In addition, if other Denodo data sources have to be used, Presto uses its Denodo connector to pull that data into the worker nodes' memory in real time.
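The partition detection mentioned in the introspection step can be pictured with a small sketch. It assumes Hive-style `key=value` folder names, a common layout for partitioned Parquet data. The object keys and the `detect_partitions` helper are hypothetical, and this is not a description of how Denodo implements detection.

```python
from collections import defaultdict


def detect_partitions(object_keys, table_prefix):
    """Infer partition columns and their values from Hive-style key=value paths."""
    columns = []                      # partition columns in path order
    values = defaultdict(set)         # column -> distinct values seen
    for key in object_keys:
        if not key.startswith(table_prefix) or not key.endswith(".parquet"):
            continue
        # Drop the prefix and the file name; what is left are folder segments.
        folders = key[len(table_prefix):].strip("/").split("/")[:-1]
        for segment in folders:
            if "=" not in segment:
                continue
            column, value = segment.split("=", 1)
            if column not in columns:
                columns.append(column)
            values[column].add(value)
    return columns, {c: sorted(v) for c, v in values.items()}


# Hypothetical listing of object keys under one table folder.
keys = [
    "lake/store_sales/year=2022/month=12/part-0.parquet",
    "lake/store_sales/year=2023/month=01/part-0.parquet",
    "lake/store_sales/year=2023/month=01/part-1.parquet",
    "lake/store_sales/_SUCCESS",
]
columns, values = detect_partitions(keys, "lake/store_sales/")
print(columns)
print(values)
```

Non-Parquet marker files such as `_SUCCESS` are skipped, so only data files contribute partition columns.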
Introspection of Object Storage
[Diagram components: three MPP Workers; Object Storage]
■ The MPP Workers need to have the object storage files mapped internally as tables.
○ This is typically done manually by data engineers and requires different tools to navigate the object storage and to create the tables in the MPP engine.
■ Denodo simplifies this process by providing a unified point of access.
○ The same tool that allows introspection into the object storage also manages the mapping of the files to tables.
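To make the manual step concrete, here is a hedged sketch of the kind of statement a data engineer might write by hand to expose Parquet files as a table through Presto's Hive connector. The `format`, `external_location`, and `partitioned_by` table properties belong to that connector; the catalog, schema, bucket, and column names are invented for illustration, and the DDL Denodo actually generates may differ.

```python
def build_external_table_ddl(catalog, schema, table, columns, location, partitioned_by=()):
    """Build a Presto Hive-connector CREATE TABLE statement for external Parquet files.

    columns is a list of (name, presto_type). The Hive connector expects
    partition columns to come last, so they are moved to the end here.
    """
    types = dict(columns)
    regular = [(n, t) for n, t in columns if n not in partitioned_by]
    partition = [(n, types[n]) for n in partitioned_by]
    column_sql = ",\n".join(f"  {n} {t}" for n, t in regular + partition)
    properties = ["format = 'PARQUET'", f"external_location = '{location}'"]
    if partitioned_by:
        quoted = ", ".join(f"'{p}'" for p in partitioned_by)
        properties.append(f"partitioned_by = ARRAY[{quoted}]")
    property_sql = ",\n".join(f"  {p}" for p in properties)
    return (
        f"CREATE TABLE {catalog}.{schema}.{table} (\n{column_sql}\n)\n"
        f"WITH (\n{property_sql}\n)"
    )


ddl = build_external_table_ddl(
    catalog="hive",
    schema="lake",                                       # hypothetical schema
    table="store_sales",
    columns=[("year", "varchar"), ("item_id", "bigint"), ("amount", "double")],
    location="s3a://example-bucket/lake/store_sales/",   # hypothetical bucket
    partitioned_by=("year",),
)
print(ddl)
```

Writing one such statement per folder, with types and partitions worked out by hand, is the effort the slide describes as being absorbed by a single tool.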
Introspection of Object Storage
Mapping of Object Storage files into views
Execution - Presto with other sources
[Diagram components: Logical Layer; MPP Coordinator; Other Sources; four MPP Workers; Object Storage. Legend: SQL query; Data flow; Other calls]
Execution - Presto with other sources
[Figure annotations: 90,859 rows; 2,880,404 rows]
Execution - Presto with other sources
Fully delegated query to Presto
■ customer_crm is brought into the worker nodes' memory in real time, so it is locally referenced as tmp_231_885_0_2549
■ store_sales was previously mapped by Denodo Platform and is referenced in this query as vdp_table_167509671357
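The idea of bringing the smaller source into worker memory and joining it next to the lake data can be sketched as a broadcast hash join. This is a toy model under stated assumptions: the column names (`customer_id`, `segment`, `amount`), the sample rows, and the function names are hypothetical, and the slide does not show which join strategy Presto chose.

```python
def build_hash_table(small_rows, key):
    """Index the smaller table (e.g. customer_crm) by join key, in memory."""
    table = {}
    for row in small_rows:
        table.setdefault(row[key], []).append(row)
    return table


def probe_split(split_rows, hash_table, key):
    """Worker-side step: join one split of the large table against the hash table."""
    joined = []
    for row in split_rows:
        for match in hash_table.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical sample data standing in for the two views named on the slide.
customer_crm = [
    {"customer_id": 1, "segment": "retail"},
    {"customer_id": 2, "segment": "wholesale"},
]
store_sales_splits = [
    [{"customer_id": 1, "amount": 20.0}, {"customer_id": 3, "amount": 5.0}],
    [{"customer_id": 2, "amount": 7.5}, {"customer_id": 1, "amount": 1.0}],
]

# The hash table plays the role of the copy held in every worker's memory.
hash_table = build_hash_table(customer_crm, "customer_id")
result = [
    joined_row
    for split in store_sales_splits
    for joined_row in probe_split(split, hash_table, "customer_id")
]
print(result)
```

Because every worker holds the whole small table, each split of the large table can be joined locally without moving the large table's rows between workers.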
Enterprise Data Architecture
Final Solution
[Diagram components: Data Lake / Object Storage; Enterprise Data Warehouse; Business Intelligence; Reporting; Other Apps; On-prem data; CDC; ETL; Denodo Virtual DataPort; Denodo Web Services; Denodo MPP; Denodo Data Catalog]
CLOSING REMARKS
▪ There is renewed interest in Data Lakes, thanks to cloud object storage solving some of their original drawbacks
▪ However, on their own they are not a complete solution for a realistic enterprise data strategy. We must also consider
○ Cost-effective processing
○ Integration with other sources within the enterprise data ecosystem
○ Data governance and data security
▪ Denodo Platform has always provided these. With its 2023Q1 update, the state-of-the-art integration with Presto provides an even better solution for integrating Data Lakes into a logical data fabric.
▪ Now, Denodo customers can, from a unified access layer,
○ Introspect object storage files
○ Integrate object storage data with other corporate sources to increase data adoption
▪ With this, Denodo Platform will seamlessly colocate the data in the Presto MPP cluster to accelerate query execution.
Q&A
Thanks!
www.denodo.com info@denodo.com
© Copyright Denodo Technologies. All rights reserved
Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization of Denodo Technologies.
