Extend the Reach of Data Science with Data Virtualization
John O'Brien
Principal Advisor and CEO, Radiant Advisors
Copyright 2019. Radiant Advisors All Rights Reserved.
DATA SCIENCE PROJECT CHARACTERISTICS
The bulk of the work in data science projects involves integrating many disparate
data sets to create extremely wide data
Data science requires as many data sets as possible to be integrated in
such a way that the business context aligns with the goals of the project
Data-savvy business analysts are knowledgeable about business systems’
data and SQL but are not programmers
BUSINESS SEMANTIC LAYER FOR DATA SCIENCE
Dedicate a virtual database to business analysts and data scientists for building
integrations that can be assembled into broader business contexts
Make more heterogeneous data available to data science projects without data
acquisition and storage
Data virtualization architecture needs to account for how to assemble wider
integrated data sets that are built upon subsets for data science usage
ADVANTAGE OF A SQL QUERY ENGINE
Fast data retrieval times come from the SQL engine in data virtualization that
generates optimized query execution plans for the underlying databases
SQL-based joins and unions can break down complex integrated views into
subsets of views that are easily created, verified, and reused
Caching data allows faster data science when iteratively working with the same
data set
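
The caching point above can be illustrated with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for the virtualization engine; the CachedQueryRunner class, the orders table, and the query are hypothetical names introduced only for this example, not part of any product API. It shows that repeating the same query during iterative work can be served from a cache instead of reaching the underlying database again.

```python
import sqlite3


class CachedQueryRunner:
    """Hypothetical helper: memoizes result sets by SQL text and parameters."""

    def __init__(self, conn):
        self.conn = conn
        self._cache = {}
        self.source_hits = 0  # number of times the underlying database was queried

    def fetch(self, sql, params=()):
        key = (sql, tuple(params))
        if key not in self._cache:
            # Cache miss: run the query against the source and remember the rows
            self.source_hits += 1
            self._cache[key] = self.conn.execute(sql, params).fetchall()
        return self._cache[key]


# In-memory database standing in for an underlying source system (assumption)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (1, 5.0), (2, 7.5)]
)

runner = CachedQueryRunner(conn)
query = (
    "SELECT customer_id, SUM(amount) FROM orders "
    "GROUP BY customer_id ORDER BY customer_id"
)
first = runner.fetch(query)   # reaches the source
second = runner.fetch(query)  # served from the cache
```

A real cache would also need an invalidation or refresh policy, since cached rows go stale when the source changes; that is left out of this sketch.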
FASTER DATA SCIENCE FROM REUSABILITY
Data science projects are faster when they move from complex one-time
integrations to modularity for reuse
Data prep can be organized as a series of views upon views that build a data set
for data science input
Views that are created as basic modules of data integration can be assembled
into complex, integrated, cross-subject-area data sets for data science
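
The "views upon views" idea above can be sketched as follows, again assuming sqlite3 as a stand-in for a virtual database. All table, view, and column names (customers, orders, tickets, v_order_totals, v_ticket_counts, v_customer_features) are hypothetical. Two small subject-area views act as basic modules, and a third view joins them into a wider, cross-subject-area data set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
CREATE TABLE tickets (ticket_id INTEGER PRIMARY KEY, customer_id INTEGER);

INSERT INTO customers VALUES (1, 'east'), (2, 'west');
INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 30.0), (12, 2, 5.0);
INSERT INTO tickets VALUES (100, 2), (101, 2), (102, 2);

-- Basic modules: one small, easily verified view per subject area
CREATE VIEW v_order_totals AS
  SELECT customer_id, COUNT(*) AS order_count, SUM(amount) AS total_amount
  FROM orders GROUP BY customer_id;

CREATE VIEW v_ticket_counts AS
  SELECT customer_id, COUNT(*) AS ticket_count
  FROM tickets GROUP BY customer_id;

-- View upon views: assemble the modules into one wide row per customer
CREATE VIEW v_customer_features AS
  SELECT c.customer_id, c.region,
         COALESCE(o.order_count, 0)    AS order_count,
         COALESCE(o.total_amount, 0.0) AS total_amount,
         COALESCE(t.ticket_count, 0)   AS ticket_count
  FROM customers c
  LEFT JOIN v_order_totals  o ON o.customer_id = c.customer_id
  LEFT JOIN v_ticket_counts t ON t.customer_id = c.customer_id;
""")

features = conn.execute(
    "SELECT * FROM v_customer_features ORDER BY customer_id"
).fetchall()
```

Because each module is its own view, it can be checked in isolation and reused by another project that needs, for example, only the order totals.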
FASTER DATA SCIENCE FROM DATA REFRESHES
Machine learning model training with supervised, reinforcement, and
unsupervised techniques
▪ Materialize training data from a virtual table that stores its results in another
database for machine learning supervised training
▪ Access real-time data from a virtual table for the latest data to be used in machine
learning reinforcement training
▪ Cache data sets to alleviate performance bottlenecks
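
The difference between a materialized training set and live access through a virtual table can be sketched as below. This assumes sqlite3, with an attached in-memory database playing the role of the "other database"; the names events, v_training, training_store, and training_snapshot are hypothetical. The snapshot stays fixed for repeatable supervised training, while the view keeps reflecting the latest source data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A second database that will hold the materialized training data (assumption)
conn.execute("ATTACH DATABASE ':memory:' AS training_store")

conn.executescript("""
CREATE TABLE events (user_id INTEGER, clicks INTEGER, converted INTEGER);
INSERT INTO events VALUES (1, 3, 0), (2, 9, 1);

-- Virtual table: always reflects the current contents of events
CREATE VIEW v_training AS
  SELECT user_id, clicks, converted AS label FROM events;

-- Materialize the view's current result into the other database
CREATE TABLE training_store.training_snapshot AS
  SELECT * FROM v_training;
""")

# New source data arrives after the snapshot was taken
conn.execute("INSERT INTO events VALUES (3, 1, 0)")
conn.commit()

snapshot_rows = conn.execute(
    "SELECT COUNT(*) FROM training_store.training_snapshot"
).fetchone()[0]
live_rows = conn.execute("SELECT COUNT(*) FROM v_training").fetchone()[0]
```

Here the snapshot still holds the two rows it was built from, while the view already returns three, which is the trade-off between stable training input and up-to-date data.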
SUMMARY
Data science projects that leverage data virtualization benefit from
access to all of the heterogeneous data available in the enterprise
A data science-oriented virtual data architecture will accelerate development
and increase quality by reducing the complexity of data science integrations
Business analysts can work in a data virtualization environment to contribute to
data science projects
© Copyright Denodo Technologies. All rights reserved.
Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written
authorization from Denodo Technologies.
