Optimize Data for the Logical Data Warehouse
www.attunity.com
Get more value from your data!
Enterprise Data Challenges Today
Business lines: Marketing | Operations | Sales
Data sources: ERP | CRM | POS | Legacy | Logs | Sensor | Files
Data platforms: Data Warehouse | Database
Challenges: Exploding Data | Multiple Platforms | Several Business Lines | Escalating Costs | Lack of Visibility | Increasing Complexity
Enterprise Data Requirements
Data sources: ERP | CRM | POS | Legacy | Logs | Sensor | Files
Data platforms: RDBMS | DW | Hadoop | Cloud | Files
Right Data | Right Platform | Right Time
Business lines: Marketing | Operations | Sales
Requirements: Real Time | Quick Value | The Right Platform
The Need for Data Warehouse Optimization
• A significant amount of data in the data warehouse is unused/dormant
• ETL/ELT processes for unused data unnecessarily consume CPU capacity
• Dormant data consumes unnecessary storage capacity
Chart: data usage spans hot, warm, and cold tiers; the waste is transformations (ELT) of unused data and storage capacity for dormant data. System resources split roughly 65% for transformations/data loads vs. 35% for analytical queries.
Attunity Visibility: Enterprise Data Usage Analytics
• Single console across platforms: Teradata | Exadata | DB2 | Netezza | Hadoop
• Modernize the data warehouse with Hadoop by identifying intensive workloads and unused data
• Optimize storage by identifying frequently and infrequently used data
• Improve performance by diagnosing bottlenecks based on data usage
• Charge-back / show-back activity and usage by business lines and departments
• Track user activity on sensitive data for audit and compliance
Customer Success

Fortune 50 Bank
• Data warehouse at 600+ TB, with data growing 50% every year
• Cost prohibitive, poor performance
With Attunity Visibility:
• Offloaded to a 300-node Hadoop cluster
• Saved over $15 million in data warehouse costs over 3 years

Online Travel Site
• Data warehouse at 300+ TB, system at maximum capacity
• No visibility into business use of data
With Attunity Visibility:
• Offloaded to a 500-node Hadoop cluster
• Saved over $6 million in data warehouse costs in less than 2 years
Data Usage
• Unused data (e.g. tables with no SELECT statements): 70 terabytes found in unused databases
• Dashboard: unused databases with the largest number of tables, by size; drill down to identify specific tables
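The "tables with no SELECT statements" heuristic lends itself to a simple query against a parsed query log. The sketch below is illustrative only: the `catalog` and `query_log` tables and their columns are invented, not Attunity Visibility's actual repository schema, and SQLite stands in for the warehouse.

```python
import sqlite3

# In-memory stand-in for a warehouse catalog and a parsed query log.
# Table and column names here are hypothetical, for illustration only.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE catalog (table_name TEXT, size_gb REAL);
    CREATE TABLE query_log (table_name TEXT, statement_type TEXT);
    INSERT INTO catalog VALUES ('orders', 120.0), ('clickstream_2009', 900.0),
                               ('customers', 40.0);
    INSERT INTO query_log VALUES ('orders', 'SELECT'), ('customers', 'SELECT'),
                                 ('clickstream_2009', 'INSERT');
""")

# "Unused" data: tables that are stored (and possibly still loaded) but never
# read, i.e. no SELECT statement in the query log references them.
unused = con.execute("""
    SELECT c.table_name, c.size_gb
    FROM catalog c
    WHERE c.table_name NOT IN (
        SELECT table_name FROM query_log WHERE statement_type = 'SELECT')
    ORDER BY c.size_gb DESC
""").fetchall()

print(unused)  # clickstream_2009: loaded daily, never queried
```

In a real deployment the log side would come from the platform's query logging facility (e.g. Teradata DBQL), and the size-ordered result is what drives the "largest unused tables first" drill-down shown on the dashboard.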
Data Usage
• History of data used in a large fact table: queries go back only 2 years, but the table maintains 8 years of data
Workload Performance
• Intensive ETL workloads: almost 60% of CPU goes to loading and ingesting data
Workload Resource Consumption
• Specific workloads and their impact on resources: the ETL process is only 1.6% of the workload but 54% of CPU consumption
• The top 100 repetitive SQL statements, out of 101,000 ETL SQL statements, account for 30+% of CPU consumption by ETL
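Finding that "top 100 of 101,000" set is, at its core, a CPU-by-statement aggregation. A minimal sketch, with a hypothetical log format (statement text, CPU seconds); a real analyzer would first normalize literal values so that repeated runs of the same ETL statement group together:

```python
from collections import Counter

# Hypothetical parsed query-log records: (sql_text, cpu_seconds).
# These texts are assumed to be already normalized (literals parameterized),
# so repeated executions of the same ETL statement share one key.
log = [
    ("INSERT INTO sales SELECT * FROM staging_sales", 40.0),
    ("INSERT INTO sales SELECT * FROM staging_sales", 42.0),
    ("UPDATE dim_customer SET active = 0 WHERE id = ?", 5.0),
    ("SELECT region, SUM(amount) FROM sales GROUP BY region", 3.0),
]

# Aggregate CPU by normalized statement text.
cpu_by_sql = Counter()
for sql, cpu in log:
    cpu_by_sql[sql] += cpu

total_cpu = sum(cpu_by_sql.values())

# Top-N repetitive statements and their share of total CPU -- the same shape
# of result as "top 100 statements = 30+% of ETL CPU consumption".
top = cpu_by_sql.most_common(1)
share = top[0][1] / total_cpu
print(f"top statement consumes {share:.0%} of CPU")
```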
Business-User Activity
• Activity and resource consumption by department
Product Architecture

Attunity Visibility Architecture
Visibility processes run against the EDW database platforms and feed a central repository: the Collector gathers query-log data, the Analyzer and Cataloger process it, the Populator loads it for reporting, and the Purger ages it out.

Key Components
• Repository – centrally stores analyzed queries & performance metrics
• Cataloger – takes a snapshot of DW metadata/schema
• Collector – collects information from query logs
• Analyzer – analyzes and parses the collected data; builds & stores a full parse tree
• Populator – aggregates & moves parsed data from the Target Schema into the Reporting Schema
• Purger – removes old data from the Repository
• Web application with dashboards & analytics: User Activity | Data Usage | Workload Performance
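The Collector → Analyzer → Populator flow can be sketched as a minimal pipeline. Only the component roles come from the slide; the function bodies and record formats below are invented for illustration, since Attunity Visibility's internal schemas are not public:

```python
# Minimal sketch of the Collector -> Analyzer -> Populator flow.
# Function names mirror the component names; everything else is hypothetical.

def collect(raw_log_lines):
    """Collector: pull raw entries from the platform's query log."""
    return [line.strip() for line in raw_log_lines if line.strip()]

def analyze(entries):
    """Analyzer: parse each entry (a real analyzer builds a full parse tree;
    here we only extract the statement type)."""
    parsed = []
    for entry in entries:
        verb = entry.split(" ", 1)[0]
        parsed.append({"type": verb.upper(), "text": entry})
    return parsed

def populate(parsed):
    """Populator: aggregate parsed queries into a reporting-friendly shape."""
    report = {}
    for p in parsed:
        report[p["type"]] = report.get(p["type"], 0) + 1
    return report

raw = ["SELECT * FROM orders", "insert into sales values (1)", "SELECT 1"]
print(populate(analyze(collect(raw))))  # counts by statement type
```

A Purger step would then simply delete repository rows older than a retention window, which is why it sits apart from the ingest path.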
Attunity and Cisco: Solutions for the Logical Data Warehouse
• Create the ROI for the Logical Data Warehouse with Attunity Visibility
• Ingest data to fill and expand the Logical Data Warehouse with Attunity Replicate
Start getting more value from your Big Data today!
Thank you!
For more information, send an e-mail to sales@attunity.com or go to www.attunity.com.
Editor's Notes
  • #3: Today, IT faces enormous challenges in delivering data to the enterprise. Data is growing exponentially, but IT budgets are staying flat; you cannot continue to invest in infrastructure at the same rate as data growth. In addition, data is increasingly delivered through multiple platforms, making it very complex to efficiently manage and optimize the environment. It is also very difficult for IT to prioritize and justify investments without the ability to charge back or show back utilization (of data and system resources) by business line.
  • #4: The business, on the other hand, expects data in real time, at the right place at the right time, and expects to extract value from that data as quickly as possible.
  • #5: ELT processes are driving up data warehousing costs. Our experience analyzing data usage at large organizations shows that a significant amount of data is not being used, yet is continuously loaded on a daily basis. Dormant data not only takes up storage capacity; the bigger impact is the processing capacity, in terms of CPU and I/O, wasted running ELT on the data warehouse to load data the business does not actively use. Admittedly, in many situations organizations are required, for regulatory reasons, to maintain a history of data even if it is not being used. So the best approach to significantly cutting data warehousing costs is to: (1) eliminate batch loads for data that is not used and not needed; and (2) more importantly, offload the ELT processes for unused data that must be maintained, running them on Hadoop and actively archiving that unused data there. This way you can recover all the wasted capacity from your expensive data warehouse systems.