DATA VIRTUALIZATION
APAC WEBINAR SERIES
Sessions Covering Key Data
Integration Challenges Solved
with Data Virtualization
Logical data lakes: From single purpose to
multipurpose data lakes
Chris Day
Director, APAC Sales Engineering, Denodo
Sushant Kumar
Product Marketing Manager, Denodo
Agenda
Logical data lakes: From single purpose to multipurpose data lakes
Product Demo
Q&A
Next Steps
Logical data lakes: From single purpose
to multipurpose data lakes
Product Marketing Manager, Denodo
Sushant Kumar
A data lake is a storage repository that holds a
vast amount of raw data in its native format. The data
structure and requirements are not defined until the
data is needed
The current needs for sophisticated data-driven
intelligence and data science favored this
concept for its simplicity and power
Hadoop and its ecosystem provided the
foundation that data lakes required: vast
storage and processing muscle
It also favored the concept of ELT
vs ETL: load data first, (maybe) transform later
Data Lakes
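The "structure is not defined until the data is needed" idea can be illustrated with a minimal sketch. It assumes raw records are landed as JSON lines; the field names (`customer_id`, `amount`, `channel`) and the helper functions are hypothetical and chosen only for illustration.

```python
import json

# Raw events are landed as-is (the "L" in ELT); no schema is enforced on write.
RAW_EVENTS = [
    '{"customer_id": 1, "amount": "19.90", "channel": "web"}',
    '{"customer_id": 2, "amount": "5.00"}',
    '{"customer_id": 1, "amount": "7.10", "coupon": "SPRING"}',
]


def read_sales(raw_lines):
    """Apply a schema at read time: pick fields and cast types when needed."""
    for line in raw_lines:
        record = json.loads(line)
        yield {
            "customer_id": int(record["customer_id"]),
            "amount": float(record["amount"]),
            # Fields missing from a raw record get a default at read time.
            "channel": record.get("channel", "unknown"),
        }


def total_by_customer(raw_lines):
    """The "T" in ELT: transform only when a question is asked."""
    totals = {}
    for row in read_sales(raw_lines):
        totals[row["customer_id"]] = totals.get(row["customer_id"], 0.0) + row["amount"]
    return totals
```

Calling `total_by_customer(RAW_EVENTS)` aggregates the raw lines without any table having been defined up front. A different consumer could read the same lines with a different schema, for example one that keeps the `coupon` field.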
The early data scientists saw Hadoop as their
personal supercomputer.
Hadoop-based Data Lakes helped
democratize access to state-of-the-art
supercomputing with off-the-shelf hardware (and
later cloud)
The industry push for BI made Hadoop-based
solutions the standard to bring modern
analytics to any corporation
Data Lakes – A Data Scientist’s Playground
Data Lakes – Not a Perfect World
Physical Nature
• Based on Replication. Data Lakes require data to be copied to its physical storage
• Replication extends development cycles and costs
• Not all data is suitable for replication
• Real time needs: Cloud and SaaS APIs
• Large volumes: existing EDW
• Laws and restrictions
Single Purpose
• Usage of the data lake is often monopolized by data scientists
• New data silo. No clear path to share insights with business users
• Lacks the governance, security and quality that business users are used to (e.g. in the EDW)
The Rise of Logical Architectures
The Evolution of Analytical Architectures
Source: Adopt the Logical Data Warehouse Architecture to Meet Your Modern Analytical Needs, Gartner, April 2018
Multi‐purpose data lakes are data delivery environments developed to support a broad range of
users, from traditional self‐service BI users (e.g. finance, marketing, human resource, transport) to
sophisticated data scientists.
Multi‐purpose data lakes allow a broader and deeper use of the data lake investment without
minimizing the potential value for data science and without making it an inflexible environment.
Rick Van der Lans, R20 Consultancy
The Multipurpose Data Lake with Data Virtualization
Logical Nature
• Replication is an option, not a necessity
• Broaden data access, shorten development times, better insights
• Tight integration with big data systems. Fast execution with
large data volumes
Multi-purpose
• Curated access for non-technical users
• Better governance and access control
• Better ROI for the investment of the lake
The Multipurpose Data Lake with Data Virtualization
“A multi-purpose data lake can become an organization’s universal data delivery system”
Architecting the Multi-Purpose Data Lake with Data Virtualization, Rick Van der Lans, April 2018
Single access to all data assets, internal and
external:
▪ Physical Data Lake (usually based on SQL-on-Hadoop
systems)
▪ Other databases (EDW, ODS, applications, etc.)
▪ SaaS APIs (Salesforce, Google, social media, etc.)
▪ Files (local, S3, Azure, etc.)
The Virtual Data Lake – Access to all Data Sources
The physical Data Lake can also be used as Denodo’s cache
This allows any data accessible by Denodo to be loaded
quickly into the Hadoop cluster
Caching becomes an alternative to ingestion ELT processes
that preserves lineage and governance
Load process based on direct load to HDFS:
1. Creation of the target table in Cache system
2. Generation of Parquet files (in chunks) with Snappy
compression on the local machine
3. Upload in parallel of Parquet files to HDFS
The Virtual Data Lake – Ingesting and Caching
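The three-step load above can be sketched as follows. This is not Denodo's implementation: to stay within the Python standard library, JSON plus zlib stands in for Parquet plus Snappy, and a local directory stands in for the HDFS table location. All function names are hypothetical.

```python
import json
import shutil
import tempfile
import zlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def chunked(rows, chunk_size):
    """Yield successive lists of at most chunk_size rows."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


def write_chunk_files(rows, local_dir, chunk_size):
    """Step 2: write one compressed file per chunk on the local machine.

    Stand-in: JSON + zlib replaces Parquet + Snappy so the sketch needs
    only the standard library.
    """
    paths = []
    for index, chunk in enumerate(chunked(rows, chunk_size)):
        path = Path(local_dir) / f"part-{index:05d}.json.z"
        path.write_bytes(zlib.compress(json.dumps(chunk).encode("utf-8")))
        paths.append(path)
    return paths


def upload_parallel(paths, target_dir, workers=4):
    """Step 3: copy the chunk files to the target in parallel.

    Stand-in: a local directory plays the role of the HDFS table location.
    """
    target = Path(target_dir)
    # Step 1 stand-in: creating the directory plays the role of creating the table.
    target.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every copy and re-raises any worker error.
        return list(pool.map(lambda p: shutil.copy(p, target / p.name), paths))


def load(rows, target_dir, chunk_size=1000):
    """Run steps 1-3 for an in-memory list of rows."""
    with tempfile.TemporaryDirectory() as local_dir:
        paths = write_chunk_files(rows, local_dir, chunk_size)
        return upload_parallel(paths, target_dir)
```

Writing fixed-size chunks keeps local memory and disk use bounded, and independent chunk files are what make the parallel upload possible. In a real setup the codec and the target file system would be swapped for Parquet with Snappy and an HDFS client.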
Denodo’s optimizer provides native integration with MPP
systems to provide one extra key capability: Query
Acceleration
Denodo can move processing to the MPP on demand
during execution of a query
• Parallel power for calculations in the virtual
layer
• Avoids slow on-disk processing when
processing buffers don’t fit into Denodo’s
memory (swapped data)
The Virtual Data Lake – Using the Lake Processing Engine
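The decision to move processing to the MPP can be pictured as a comparison of two rough cost estimates. The sketch below is a toy model, not Denodo's optimizer: the `Estimate` type, the cost weights, and the swap penalty are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    rows: int       # estimated rows the virtual layer must process
    row_bytes: int  # estimated average row width in bytes


def choose_engine(est, memory_budget_bytes, mpp_transfer_cost_per_row=2.0,
                  local_cost_per_row=1.0, swap_penalty=10.0):
    """Return "mpp" or "local" by comparing two rough cost estimates.

    All constants are hypothetical weights, not values from a real optimizer.
    """
    local_cost = est.rows * local_cost_per_row
    if est.rows * est.row_bytes > memory_budget_bytes:
        # Buffers would spill to disk, so local processing gets much slower.
        local_cost *= swap_penalty
    # Moving work to the MPP pays a per-row transfer cost instead.
    mpp_cost = est.rows * mpp_transfer_cost_per_row
    return "mpp" if mpp_cost < local_cost else "local"
```

With these weights, a small intermediate result stays in the virtual layer because shipping it costs more than processing it locally, while a result too large for the memory budget is sent to the MPP to avoid swapping.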
The Virtual Data Lake – Putting the Pieces Together
[Diagram: query plan over CurrentSales (68 M rows), Hist.Sales (220 M rows) and Customer (2 M rows, cached). The sales data is grouped by Customer ID (2 M rows, sales by customer), then joined with Customer and grouped by ZIP.]
1. Partial aggregation push-down
Maximizes source processing, dramatically reduces network traffic
2. Integrated with Cost Based Optimizer
Based on data volume estimation and the cost of these particular operations,
the CBO can decide to move all or part of the execution tree to the MPP
3. On-demand data transfer
Denodo automatically generates and uploads Parquet files
4. Integration with local data
The engine detects when data is cached or comes from a
local table already in the MPP
5. Fast parallel execution
Support for Spark, Presto and Impala for fast analytical processing in
inexpensive Hadoop-based solutions
System     Execution Time   Optimization Techniques
Others     ~10 min          Simple federation
No MPP     43 sec           Aggregation push-down
With MPP   11 sec           Aggregation push-down + MPP integration (Impala, 8 nodes)
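Partial aggregation push-down can be illustrated with a small worked example. The tables and values below are invented stand-ins for the sales and customer tables in the slide, and the two functions are hypothetical. The sketch compares simple federation (ship every sales row, then join and aggregate) with pushing a partial GROUP BY on customer ID to the source.

```python
from collections import defaultdict

# Tiny stand-ins for the sales and customer tables in the slide.
SALES = [  # (customer_id, amount)
    (1, 10.0), (1, 5.0), (2, 7.0), (3, 1.0), (3, 2.0), (3, 3.0),
]
CUSTOMER_ZIP = {1: "94301", 2: "94301", 3: "10001"}


def naive_total_by_zip(sales, customer_zip):
    """Simple federation: ship every sales row, then join and aggregate."""
    totals = defaultdict(float)
    for customer_id, amount in sales:
        totals[customer_zip[customer_id]] += amount
    return dict(totals), len(sales)  # (result, rows transferred)


def pushdown_total_by_zip(sales, customer_zip):
    """Push a partial GROUP BY customer_id to the source, then finish locally."""
    # Runs "at the source": one row per customer instead of one per sale.
    partial = defaultdict(float)
    for customer_id, amount in sales:
        partial[customer_id] += amount
    # Runs in the virtual layer: join with customer, final GROUP BY zip.
    totals = defaultdict(float)
    for customer_id, subtotal in partial.items():
        totals[customer_zip[customer_id]] += subtotal
    return dict(totals), len(partial)  # (result, rows transferred)
```

Both functions return the same totals per ZIP code, but the push-down variant transfers one row per customer rather than one per sale. This works because SUM can be computed in stages: summing per customer first and then per ZIP gives the same answer as summing per ZIP directly.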
▪ A Virtual Data Lake improves decision making and shortens development
cycles
• Surfaces all company data from multiple repositories without the need to
replicate all data into the lake
• Eliminates data silos: allows for on-demand combination of data from multiple
sources
▪ A Virtual Data Lake broadens adoption of the lake and
improves its ROI
• Improves governance and metadata management to avoid “data swamps”
• Allows controlled access to the lake to non-technical users
▪ A Virtual Data Lake offers performance for the Big Data World
• Leverages the processing power of the existing cluster controlled by Denodo’s
optimizer
The Virtual Data Lake - Conclusions
Challenges
• Competition from a low-cost vendor
• Lower the price, affecting margins?
• Or, maintain high price, but differentiate in other ways?
Customer story – Large Heavy Equipment Manufacturer
Benefits
Large Heavy Equipment Manufacturer
Self-service / Predictive Analytics – IoT Integration
Improved asset performance and
proactive maintenance
Increased revenue from sale of
services and parts
Reduced warranty costs of parts
failure
Gartner, Adopt the Logical Data Warehouse Architecture to Meet Your Modern Analytical Needs, May 2018
“When designed properly, DV can speed data integration, lower data
latency, offer flexibility and reuse, and reduce data sprawl across
dispersed data sources.
Due to its many benefits, DV is often the first step for organizations
evolving a traditional, repository-style data warehouse into a Logical
Architecture”
Product Demonstration
Director, APAC Sales Engineering, Denodo
Chris Day
Key Takeaways
FIRST
Takeaway
Hadoop-based Data Lakes are the standard approach to modern
analytics within most organizations
SECOND
Takeaway
Physical Data Lakes introduce many complexities (replication,
synchronization, governance, etc.) that restrict their use
THIRD
Takeaway
Logical Data Lakes allow users to access data from all sources –
internal and external – to grow the value of the Data Lake approach
FOURTH
Takeaway
Data Virtualization creates ‘multipurpose’ Data Lakes for all kinds
of users – data scientists and business users
FIFTH
Takeaway
Data Virtualization introduces governance and access controls to
the Data Lake without impeding the ‘power users’
Q&A
Next Steps
Access Denodo Platform in the Cloud!
Take a Test Drive today!
https://bit.ly/2AouQLQ
GET STARTED TODAY
Next session
Data Virtualization enabled Data Fabric:
Operationalize the data lake
Sushant Kumar
Product Marketing Manager, Denodo
Chris Day
Director, APAC Sales Engineering, Denodo
Thanks!
www.denodo.com info@denodo.com
© Copyright Denodo Technologies. All rights reserved
Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical,
including photocopying and microfilm, without the prior written authorization from Denodo Technologies.
