What is Data Virtualization and Why It Matters to You. Alberto Pan, CTO; Justo Hidalgo, VP Product Management & Consulting. Denodo Technologies
 
Contents: Why Data Virtualization?; Productivity; Distributed Query Optimization; Layer Independence; Governance; Data Quality; Architecture
Our Goal:  Serving the Information Barista
GREAT, BUT  WHAT’S THE PROBLEM?
Disjoint Views of Entities: the Elements. Customer data is spread over different, heterogeneous data sources, and locating and obtaining it takes too much effort. Data must be not only extracted but also combined across different applications, interfaces and formats. Sources: Log files (.txt/.log files); CRM (MySQL); Billing System (Web Service, REST); Incidences System (Web Application); Inventory System (MS SQL Server); Product Catalog (Web Service, SOAP); Knowledge Base (Internet); Product Data (CSV).
It Would be So Nice If…
Happy Ending: Single View of Element through Virtual Integration. Homogeneous access to all data. Access methods: JDBC, ODBC, WS, CSV, XML, Web, flat files. Sources: CRM (MySQL); Billing System (Web Service, REST); Incidences System (Web Application); Inventory System (MS SQL Server); Product Catalog (Web Service, SOAP); Knowledge Base; Product Data (CSV); Log files (.txt/.log files).
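The single-view idea can be illustrated with a minimal Python sketch. It assumes two hypothetical sources: a CRM table (SQLite stands in for MySQL) and billing data held as CSV text. The names `customer_view` and `BILLING_CSV` are illustrative only and are not part of any Denodo API.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CRM database (SQLite stands in for MySQL here).
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ana"), (2, "Luis")])

# Hypothetical source 2: billing data exported as CSV text.
BILLING_CSV = "customer_id,amount\n1,30.0\n1,12.5\n2,99.0\n"


def customer_view():
    """Return one combined record per customer, hiding where each field lives."""
    totals = {}
    for row in csv.DictReader(io.StringIO(BILLING_CSV)):
        cid = int(row["customer_id"])
        totals[cid] = totals.get(cid, 0.0) + float(row["amount"])
    rows = crm.execute("SELECT id, name FROM customer ORDER BY id").fetchall()
    return [
        {"id": cid, "name": name, "billed": totals.get(cid, 0.0)}
        for cid, name in rows
    ]
```

A consumer calls `customer_view()` and never needs to know that one field came from a database and another from a file.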
BUT, WHY A DATA VIRTUALIZATION LAYER?
DIDN’T WE HAVE ENOUGH WITH ETL, ESB, EAI, WS, …?
 
So, We Went and Asked our Experts
Why a Data Virtualization Layer? Productivity; Distributed Query Optimization; Physical and Logical Independence; Governance; Data Quality
PRODUCTIVITY (because time is money)
Productivity… Built-in connectors for data sources. Complex data combination operations do not need to be programmed. [Diagram: source views with columns NAME, DESCRIPTION, PRICE and NAME, MANUFACTURER, SCORE are combined (union and join symbols) into a single view with columns NAME, DESCRIPTION, PRICE, MANUFACTURER, SCORE, consumed by applications and 3rd-party tools: enterprise applications, BI, portals, dashboards, web applications…]
… Productivity… Applications do not need to deal with complex data-related issues, e.g. swapping of large result sets or caching of costly result sets. Changes in the sources are managed in the DV layer, leaving the business layer unaffected. Collaboration and prototyping: virtualization allows rapid prototyping and testing.
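As a rough illustration of caching costly result sets inside the layer rather than in each application, here is a minimal Python sketch. The `ResultCache` class and its time-to-live policy are assumptions made for the example; the slides do not describe how any product implements its cache.

```python
import time


class ResultCache:
    """Cache query results for ttl seconds (a hypothetical, simplified policy)."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so the behaviour can be tested
        self.entries = {}   # query text -> (time stored, result)

    def get(self, query, run_query):
        now = self.clock()
        hit = self.entries.get(query)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]          # still fresh: skip the source
        result = run_query(query)  # stale or missing: go to the source
        self.entries[query] = (now, result)
        return result
```

Because the cache sits behind the access layer, applications keep issuing the same query and do not need to know whether the answer came from the source or from the cache.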
… Productivity. Uniform access: developers use a single model and API instead of learning a mixture of different APIs, so the learning curve and execution effort are lower for every additional project built on top of the DV layer. Multi-access: a Data Virtualization layer can offer the most appropriate access type for each application (JDBC, Web Service, SharePoint widget…).
DISTRIBUTED QUERY OPTIMIZATION (because customers are waiting)
Distributed Query Optimization… Multiple execution strategies are available. The performance of a distributed join query can vary enormously depending on the method used (e.g. hash join, merge join, nested loop join…). Even when the join is over the same data views, the optimal method may differ from query to query.
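To make the difference between strategies concrete, the following Python sketch implements two of the join methods named above over in-memory rows. It is a textbook illustration under simplifying assumptions (rows are dicts, equality join on one key), not the optimizer of any particular product.

```python
from collections import defaultdict


def nested_loop_join(left, right, key):
    """Compare every left row with every right row: O(len(left) * len(right))."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def hash_join(left, right, key):
    """Build a hash table on one input, then probe it with the other."""
    table = defaultdict(list)
    for r in right:
        table[r[key]].append(r)
    return [{**l, **r} for l in left for r in table.get(l[key], [])]


# Hypothetical sample data, loosely following the book/review example.
books = [{"isbn": "a", "title": "T1"}, {"isbn": "b", "title": "T2"}]
reviews = [
    {"isbn": "a", "score": 5},
    {"isbn": "a", "score": 3},
    {"isbn": "c", "score": 1},
]
```

Both functions return the same rows. The nested loop needs no extra memory and suits a small inner input, while the hash join trades memory for far fewer comparisons on large inputs, which is why the best choice can change from query to query.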
The final executable plan depends on characteristics such as strategies, sources and order. [Diagram labels: Logical Plan; Candidate Physical Plans; BOOK REVIEW, BOOK REVIEW 1, BOOK REVIEW 2; BOOKSTORE A, BOOKSTORE B; Nested Loop (NL) Join; Hash Join]
… Distributed Query Optimization. Source query limitations. Push processing to the data sources, e.g. delegate a join to the data source. Materialization: pre-load frequently used data and exploit temporal locality. [Diagram labels: join pushed into data source; delegate join into data source]
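The effect of pushing a join into the data source can be sketched in Python with an in-memory SQLite database standing in for a remote source. The table names and the two functions are hypothetical; the point is only that both approaches return the same rows while moving different amounts of data.

```python
import sqlite3

# Hypothetical data source holding both tables.
source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE book (isbn TEXT, title TEXT);
    CREATE TABLE review (isbn TEXT, score INTEGER);
    INSERT INTO book VALUES ('a', 'Book A'), ('b', 'Book B');
    INSERT INTO review VALUES ('a', 5), ('a', 3), ('b', 4);
""")


def join_in_layer():
    """Fetch both tables in full and join them in the virtualization layer."""
    books = source.execute("SELECT isbn, title FROM book").fetchall()
    reviews = source.execute("SELECT isbn, score FROM review").fetchall()
    return sorted((t, s) for i, t in books for j, s in reviews if i == j)


def join_pushed_down():
    """Delegate the join to the source, so only matching rows are transferred."""
    sql = (
        "SELECT b.title, r.score FROM book b "
        "JOIN review r ON r.isbn = b.isbn ORDER BY b.title, r.score"
    )
    return source.execute(sql).fetchall()
```

Pushdown is only possible when the source can execute the operation, which is where source query limitations come in: a CSV file or a web service with fixed parameters cannot run a join, so the layer has to do it.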
 
Physical and Logical Independence… Applications are independent of changes in data source location, implementation (e.g. from a legacy system to a new one) and schema. Examples: a mainframe is replaced by a new system; customer data now comes from two systems instead of one because of a merger or acquisition; two applications are reengineered into a single one; the data schema of a data source changes.
… Physical and Logical Independence… Let each tool do its business! An ESB is good at orchestrating business services. Data Virtualization is good at accessing information repositories, homogenizing them and turning them into services. [Diagram labels: ESB; DATA VIRTUALIZATION]
… Physical and Logical Independence. Changes need to be made in a single place. E.g. the way to determine whether a customer is a ‘VIP’ changes: many applications use this data field, and in some applications (e.g. BRMS systems) the field can be used many times.
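The VIP example can be sketched in Python as a derived field defined once in the layer. The spending threshold and the function names are hypothetical; the slides do not say how VIP status is actually determined.

```python
VIP_MIN_YEARLY_SPEND = 10_000  # hypothetical threshold, not from the slides


def is_vip(customer):
    """Single definition of 'VIP'; change it here and every consumer follows."""
    return customer["yearly_spend"] >= VIP_MIN_YEARLY_SPEND


def customer_with_flags(customer):
    """The view the layer publishes: source fields plus the derived 'vip' field."""
    return {**customer, "vip": is_vip(customer)}
```

Applications read the published `vip` field instead of re-implementing the rule, so a change to the rule touches one function rather than every consumer.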
GOVERNANCE (because 24x7 matters)
Governance… Single entry point for data auditing: track data and metadata changes, e.g. which user last modified a certain view? Single point to introspect and query metadata, e.g. what is the schema provided by a given data source?
… Governance… Change impact management. Single point to answer questions like: What are the consequences of a change in a data source? Where does the data used by applications come from? What transformations are applied to source data before it is consumed by applications?
… Governance. Single entry point for data monitoring: track usage of data sources and data services. E.g. how does the number of concurrent connections to a data source evolve throughout the day? Send me an e-mail alert if at least 10% of the last 100 queries to a data source failed. Security: provide authentication and authorization mechanisms for data access, and provide data encryption functionality. Protect data sources: limit concurrent queries to a certain data source, cache all or part of the data, and limit data replication needs at the data source level.
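The alert rule quoted above (at least 10% of the last 100 queries failed) maps naturally onto a sliding window. The Python sketch below is one possible reading of that rule; the `FailureMonitor` class is hypothetical, and waiting for a full window before alerting is an assumption of this example.

```python
from collections import deque


class FailureMonitor:
    """Track the outcome of the most recent queries to one data source."""

    def __init__(self, window=100, threshold=0.10):
        self.window = window
        self.threshold = threshold
        self.outcomes = deque(maxlen=window)  # True means the query failed

    def record(self, failed):
        """Store one outcome; the oldest one is dropped once the window is full."""
        self.outcomes.append(failed)

    def should_alert(self):
        """Alert once the window is full and failures reach the threshold."""
        if len(self.outcomes) < self.window:
            return False
        return sum(self.outcomes) / self.window >= self.threshold
```

Because every query passes through the same layer, a single monitor per data source is enough; sending the e-mail itself is left out of the sketch.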
DATA QUALITY (because reliability matters)
Data Quality. Many data quality actions can be applied at this layer, which avoids duplicating them in every data source or application.
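As a small illustration, the Python sketch below applies the same cleaning rules to records from any number of sources as they pass through the layer. The rules (whitespace and case normalization, de-duplication by e-mail) and the function names are assumptions chosen for the example, not actions listed in the slides.

```python
def clean_customer(record):
    """Apply the same hypothetical quality rules to every record passing through."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "email": record["email"].strip().lower(),
    }


def clean_view(*sources):
    """Yield cleaned records from all sources, dropping duplicate e-mails."""
    seen = set()
    for source in sources:
        for record in source:
            cleaned = clean_customer(record)
            if cleaned["email"] not in seen:
                seen.add(cleaned["email"])
                yield cleaned
```

For instance, `list(clean_view(crm_rows, billing_rows))` would return one normalized record per e-mail address, without either source or any consuming application having to implement the rules itself.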
…  AND WHAT CAN WE DO WITH THESE PIECES?
Data Virtualization Detailed Architecture…
WRAPPING UP
Denodo Platform 4.6: Virtualized Data Services in Less Time. Improved connectivity with the enterprise ecosystem at the source connectivity level, the middleware and DQ tools level, and the publish level. Improved productivity and ease of use for application developers (connectivity, web integration, etc.) and data management professionals (metadata, governance, etc.). Benefits to business: rapid access to real-time data from disparate sources for agile reporting and operational BI/dashboards, and for customer service operations and customer portals. Web integration becomes “mainstream”.
You might want to start small …
…  but you can get very far with Data Virtualization!
www.denodo.com | info@denodo.com


Why Data Virtualization? An Introduction by Denodo


Editor's Notes

  • #11: http://www.flickr.com/photos/maxbraun/98688824/
  • #12: http://dutchamericantranslations.wordpress.com/2010/01/04/matters-of-taste-acronym-or-initialism/
  • #13: http://www.flickr.com/photos/glenirah/4376553184/
  • #15: http://www.flickr.com/photos/adikos/4443291195/
  • #17: Collaboration: self-documenting model, but also actionable. Rapid prototyping platform.
  • #18: Collaboration: self-documenting model, but also actionable. Rapid prototyping platform.
  • #19: http://www.flickr.com/photos/laserstars/908946494/
  • #23: http://www.flickr.com/photos/tudor/458287668/
  • #27: http://www.flickr.com/photos/totalaldo/508664515/
  • #31: http://www.flickr.com/photos/heist_mine/4256417595/
  • #33: http://www.flickr.com/photos/oskay/2157682522/
  • #35: http://www.flickr.com/photos/stevendepolo/3703145222/
  • #37: http://www.flickr.com/photos/m-nicolson/2414298534/
  • #39: http://www.flickr.com/photos/psd/2086641/