How Data Virtualization
impacts AI/ML projects
Chris Day
Director, APAC Sales Engineering
cday@denodo.com
14 April 2021
Advanced Analytics & Machine Learning Projects Need Data
Improving patient
outcomes
• Data includes patient
demographics, family history,
patient vitals, lab test results,
claims data etc.
Predictive maintenance
• Maintenance data logs, data
coming in from sensors –
including temperature, running
time, power level duration etc.
Predicting late payment
• Data includes company or
individual demographics,
payment history, customer
support logs etc.
Preventing fraud
• Data includes the location
where the claim originated,
time of the day, claimant
history and any recent adverse
events.
Reducing customer churn
• Data includes customer
demographics, products
purchased, products used, past
transactions, company size,
history, revenue etc.
87% of data science projects never
make it into production
Source: VentureBeat AI, July 2019
The Scale of the Problem
What is Data Virtualization?
1. Connect to disparate data sources
2. Combine related data into views
3. Consume in business applications
Data Virtualization: Unified Data Integration and Delivery
• Data Abstraction:
decoupling applications
from data sources
• Data Integration without
replication or relocation
of physical data
• Easy Access to Any Data,
high performance and
real-time/right-time
• Unified metadata, security
& governance across all
data assets
• Dynamic Data Catalog for
self-service data services
and easy discovery
• Data Delivery in any
format with intelligent
query optimization
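The connect, combine and consume steps, together with the "integration without replication" point, can be illustrated with a small sketch. It is not Denodo code: it assumes Python's built-in `sqlite3` module as a stand-in for a virtual layer, and the two attached in-memory databases, the table names and the `customer_spend` view are hypothetical.

```python
import sqlite3

# ASSUMPTION: two independent in-memory SQLite databases stand in for
# disparate sources (a hypothetical billing system and a hypothetical CRM).
conn = sqlite3.connect(":memory:")                  # "main" plays the billing system
conn.execute("ATTACH DATABASE ':memory:' AS crm")   # second, separate source

# 1. Connect: each source keeps its own tables and data.
conn.execute("CREATE TABLE main.payments (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE crm.customers (customer_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO main.payments VALUES (?, ?)",
    [(1, 120.0), (1, 30.0), (2, 75.5)],
)
conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)

# 2. Combine: a view joins both sources; no rows are copied anywhere.
#    (SQLite only allows cross-database references in TEMP views.)
conn.execute("""
    CREATE TEMP VIEW customer_spend AS
    SELECT c.name AS name, SUM(p.amount) AS total_spend
    FROM crm.customers AS c
    JOIN main.payments AS p ON p.customer_id = c.customer_id
    GROUP BY c.name
""")

# 3. Consume: the application queries the view, not the sources.
rows = conn.execute(
    "SELECT name, total_spend FROM customer_spend ORDER BY name"
).fetchall()
print(rows)
```

The consuming query never mentions `payments` or `customers`, which is the decoupling the first bullet describes: the sources could change as long as the view keeps its shape.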
Tackling the Data Problem
Typical Data Science Workflow
A typical workflow for a data scientist is:
1. Gather the requirements for the business problem
2. Identify useful data
▪ Ingest data
3. Cleanse data into a useful format
4. Analyze data
5. Prepare input for your algorithms
6. Execute data science algorithms (ML, AI, etc.)
▪ Iterate steps 2 to 6 until valuable insights are
produced
7. Visualize and share
Source:http://sudeep.co/data-science/Understanding-the-Data-Science-Lifecycle/
Where Does Your Time Go?
• 80% of time – Finding and
preparing the data
• 10% of time – Analysis
• 10% of time – Visualizing data
Source:http://sudeep.co/data-science/Understanding-the-Data-Science-Lifecycle/
A large amount of time and effort goes into tasks not intrinsically related to data science:
• Finding where the right data may be
• Getting access to the data
▪ Bureaucracy
▪ Understanding access methods and technology (NoSQL, REST APIs, etc.)
• Transforming data into a format easy to work with
• Combining data originally available in different sources and formats
• Profiling and cleansing data to eliminate incomplete or inconsistent data points
Data Scientist Workflow
1. Identify useful data
2. Modify data into a useful format
3. Analyze data
4. Prepare for ML algorithm
5. Execute data science algorithms (ML, AI, etc.)
Identify Useful Data
If the company has a virtual layer with a good coverage of
data sources, this task is greatly simplified.
▪ A data virtualization tool like Denodo can offer
unified access to all data available in the company.
▪ It abstracts the technologies underneath, offering a
standard SQL interface to query and manipulate.
To further simplify the challenge, Denodo offers a Data
Catalog to search, find and explore your data assets.
Ingestion And Data Manipulation Tasks
Data Virtualization offers the unique opportunity of using standard SQL (joins, aggregations, transformations, etc.) to access, manipulate and analyze any data.
Cleansing and transformation steps can be easily accomplished in SQL.
Its modeling capabilities enable the definition of views that embed this logic to foster reusability.
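As a rough illustration of embedding cleansing logic in a reusable view, the sketch below uses Python's built-in `sqlite3` module in place of a virtualization layer. The `raw_readings` table, its sample rows and the `clean_readings` view are hypothetical, and the SQL is plain SQLite rather than any Denodo-specific dialect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# ASSUMPTION: a hypothetical table of raw sensor readings with typical
# quality problems (stray whitespace, inconsistent case, missing values).
conn.execute("CREATE TABLE raw_readings (sensor TEXT, temperature REAL)")
conn.executemany(
    "INSERT INTO raw_readings VALUES (?, ?)",
    [("  pump-a ", 71.5), ("PUMP-A", None), ("pump-b", 64.0), (None, 70.0)],
)

# The cleansing rules live in the view, so every consumer reuses them:
# normalize the sensor name and drop incomplete rows.
conn.execute("""
    CREATE VIEW clean_readings AS
    SELECT UPPER(TRIM(sensor)) AS sensor, temperature
    FROM raw_readings
    WHERE sensor IS NOT NULL AND temperature IS NOT NULL
""")

# Analysis runs on the cleansed view with ordinary SQL aggregation.
avg_by_sensor = conn.execute("""
    SELECT sensor, AVG(temperature)
    FROM clean_readings
    GROUP BY sensor
    ORDER BY sensor
""").fetchall()
print(avg_by_sensor)
```

Any later query or notebook can select from `clean_readings` without repeating the trimming and filtering, which is the reuse the slide refers to.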
McCormick Uses Denodo to Provide Data to Its AI Project
Background
▪ McCormick’s AI and machine learning based project required data
that was stored in internal systems spread across 4 different
continents and in spreadsheets.
▪ Portions of the data in the internal systems and spreadsheets
needed to be masked when shared with McCormick's research partner
firms, yet remain unmasked when shared internally.
▪ McCormick wanted to create a data service that could simplify the
process of data access and data sharing across the organisation
and be used by the analytics teams for their machine learning
projects.
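The masking requirement in the background above could be approached by exposing two views over the same data, one per audience. The sketch below is only an illustration under assumptions: it uses Python's built-in `sqlite3` module, and the `suppliers` table, its columns, the sample rows and the masking rule are all hypothetical, not McCormick's or Denodo's actual implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# ASSUMPTION: a hypothetical table holding one sensitive column.
conn.execute(
    "CREATE TABLE suppliers (supplier TEXT, contact_email TEXT, volume INTEGER)"
)
conn.executemany(
    "INSERT INTO suppliers VALUES (?, ?, ?)",
    [("Supplier A", "ana@example.com", 400),
     ("Supplier B", "raj@example.com", 250)],
)

# Internal consumers see the data unmasked.
conn.execute("""
    CREATE VIEW suppliers_internal AS
    SELECT supplier, contact_email, volume FROM suppliers
""")

# Partner consumers see the same rows with the email local part masked:
# first character, then '***', then everything from the '@' onward.
conn.execute("""
    CREATE VIEW suppliers_partner AS
    SELECT supplier,
           SUBSTR(contact_email, 1, 1) || '***' ||
           SUBSTR(contact_email, INSTR(contact_email, '@')) AS contact_email,
           volume
    FROM suppliers
""")

internal_rows = conn.execute(
    "SELECT * FROM suppliers_internal ORDER BY supplier"
).fetchall()
partner_rows = conn.execute(
    "SELECT * FROM suppliers_partner ORDER BY supplier"
).fetchall()
print(partner_rows)
```

Both views read the same underlying table, so there is a single copy of the data; in a real deployment, access to each view would be governed by the platform's security model rather than by convention.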
• Data Quality
• Multiple Brands
• Which Data to Use?
McCormick – Multi-purpose Platform
Solution Highlights
▪ Agile Data Delivery
▪ High Level of Reuse
▪ Single Discovery & Consumption
Platform
Data Virtualization Benefits for McCormick
▪ Machine learning and applications were able to
access refreshed, validated and indexed data in
real time, without replication, from the Denodo
enterprise data service.
▪ The Denodo enterprise data service gave
business users the capability to compare data in
multiple systems.
▪ Spreadsheets are now the exception.
▪ Ensures the quality of proposed data and services.
Data Virtualization Benefits for AI and Machine Learning Projects
✓ Denodo can play a key role in the data science ecosystem to reduce data
exploration and analysis timeframes.
✓ Extends and integrates with the capabilities of notebooks, Python, R, etc.
to improve the toolset of the data scientist.
✓ Provides a modern “SQL-on-Anything” engine.
✓ Can leverage Big Data technologies like Spark (as a data source, an
ingestion tool and for external processing) to efficiently work with large
data volumes.
✓ New and expanded tools for data scientists and citizen analysts: “Apache
Zeppelin for Denodo” Notebook.
https://bit.ly/2Qb5tYB
https://bit.ly/3dVe1Ll
Thanks!
www.denodo.com info@denodo.com
© Copyright Denodo Technologies. All rights reserved
Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm,
without the prior written authorization from Denodo Technologies.

Quicker Insights and Sustainable Business Agility Powered By Data Virtualization (A/NZ)
