The Transformation of your Data in modern IT
Jeff Wiggins, Technical Manager, Emerging Technology Division
© Copyright 2016 Dell Inc.
ALL ORGANISATIONS ARE ON A JOURNEY TO…
• 1000X MORE DATA
• REAL TIME OPERATION
• ANALYTIC INSIGHTS
• PERSONALISATION & ENHANCED SERVICES
THE JOURNEY TO DIGITAL BREAKS TRADITIONAL IT INFRASTRUCTURE
[Chart labels: Gartner IT Budget Growth; TRADITIONAL DATA; NEW DATA SOURCES]
NEW DATA SOURCES
• Clickstream
• Geolocation
• Web Data
• Internet of Things
• Docs, emails
• Server logs
Challenges with Enterprise Data Warehouses
1. Expensive storage
– 70% of data in a typical EDW is unused
2. Expensive processing
– On average, 55% of EDW CPU utilisation is low-value ETL
3. Expensive licensing…
4. New data sources
– Traditional systems are unable to capture and use new data sources, such as unstructured or semi-structured data
COST DRIVERS
ENTERPRISE DATA WAREHOUSE: workload
• OPERATIONS: 50%
• ANALYTICS: 20%
• ETL/ELT: 30%
ENTERPRISE DATA WAREHOUSE: data
• COLD DATA: 70%
• HOT DATA: 30%
Cost Comparison
• ENTERPRISE DATA WAREHOUSE: > $16 K per TB
• HADOOP WITH ENTERPRISE GRADE STORAGE SOLUTION (ETL/ELT OFFLOAD, ACTIVE ARCHIVE): < $1 K per TB
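As a rough illustration of the cost comparison above, the sketch below prices a warehouse two ways: everything kept in the EDW, or the 70% cold share archived to Hadoop. The 100 TB warehouse size is a hypothetical figure chosen for illustration. The slide gives only bounds (more than $16 K and less than $1 K per TB), so the code uses those bounds as stand-in prices, and the saving it reports is a floor rather than a quote.

```python
# Stand-in prices taken from the slide's bounds (assumptions, not quotes).
EDW_COST_PER_TB = 16_000     # slide: "> $16 K per TB"
HADOOP_COST_PER_TB = 1_000   # slide: "< $1 K per TB"
COLD_PERCENT = 70            # slide: cold data share of the EDW


def tiered_storage_cost(total_tb: int) -> dict:
    """Compare keeping all data in the EDW with archiving cold data to Hadoop."""
    cold_tb = total_tb * COLD_PERCENT // 100
    hot_tb = total_tb - cold_tb
    all_edw = total_tb * EDW_COST_PER_TB
    tiered = hot_tb * EDW_COST_PER_TB + cold_tb * HADOOP_COST_PER_TB
    return {"all_edw": all_edw, "tiered": tiered, "saving": all_edw - tiered}


# 100 TB is a hypothetical warehouse size, not a figure from the deck.
print(tiered_storage_cost(100))
```

Because the real EDW price is above the stand-in and the real Hadoop price is below it, the actual saving under these assumptions would be larger than the number printed.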
CHALLENGES WITH EXISTING EDW INFRASTRUCTURE
1. Throw data away
2. Waste capacity on low value workloads
3. Unable to leverage new data sources
DATA ARCHITECTURE OPTIMISATION WITH HADOOP
1. Don’t throw data away
2. Reclaim Enterprise Data Warehouse for high value BI
3. Leverage new data sources
Enterprise Data Hub
Enterprise-Grade Hadoop: Must-Haves
1. Open Architecture
• Open source platform
• APIs & engines for multiple workloads
• Extensible for 3rd parties
2. Secure & Compliant
• Robust access controls
• Data encryption options
• Shared security policies
3. Enterprise Data Governance
• Metadata management
• Data lineage/tethering
• Audit histories
4. Unified & manageable
• Common storage & resource management
• On-prem, cloud & managed service
• Highly available (including DR)
[Diagram labels]
• Engines: Online NoSQL DBMS, Analytic MPP DBMS, Search Engine, Batch Processing, Stream Processing, Machine Learning
• SQL, Streaming, File System
• Resource Management
• System Management
• Data Management: Metadata, Security, Audit, Lineage
ENTERPRISE DATA HUB: A PROGRESSION
Data sources: ERP, CRM, RDBMS, Machines; Files, Images, Video, Logs, Clickstreams; External Data Sources
Existing systems: EDWs, Marts, Storage, Search, Servers, Documents, Archives
1. Active archive
• Full fidelity original data
• Indefinite time, any source
• Lowest cost storage
2. Data management, transformations
• One source of data for all analytics
• Persisted state of transformed data
• Significantly faster & cheaper
3. Self-service exploratory BI
• Simple search + BI tools
• “Schema on read” agility
• Reduce BI user backlog requests
4. Multi-workload analytic platform
• Bring applications to data
• Combine different workloads on common data (e.g. SQL + Search)
• True BI agility
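The “schema on read” idea named under self-service exploratory BI can be sketched in a few lines: raw records are stored untouched, and each reader applies its own schema when it queries them. This is a toy illustration, not Hive's implementation. The sample log, the column names, and the helper `read_with_schema` are all hypothetical. Turning unparseable values into `None` is a design choice made here, similar in spirit to a query engine returning NULL for a field that does not fit the declared type.

```python
import csv
import io

# Raw data is stored exactly as it arrived; no schema is enforced at write time.
RAW_LOG = "2016-03-01,alice,42\n2016-03-01,bob,notanumber\n2016-03-02,carol,7\n"

# A schema is a list of (column name, parser) pairs chosen by the reader.
TYPED_SCHEMA = [("day", str), ("user", str), ("clicks", int)]
TEXT_SCHEMA = [("day", str), ("user", str), ("clicks", str)]


def read_with_schema(raw: str, schema):
    """Apply a schema while reading; values that fail to parse become None."""
    rows = []
    for record in csv.reader(io.StringIO(raw)):
        row = {}
        for (name, parse), value in zip(schema, record):
            try:
                row[name] = parse(value)
            except ValueError:
                row[name] = None
        rows.append(row)
    return rows


# The same stored bytes can be read under two different schemas.
typed_rows = read_with_schema(RAW_LOG, TYPED_SCHEMA)
text_rows = read_with_schema(RAW_LOG, TEXT_SCHEMA)
print(typed_rows[1], text_rows[1])
```

Nothing is rejected at load time, which is what lets new questions be asked of old data without reloading it.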
SAMPLE PROBLEM SCENARIO
ALBERT wants to:
• Optimise the existing data infrastructure spend
• Enable analytics on all data, structured and unstructured
• Lay a solid foundation for Self-Service BI
• Albert has an existing large Enterprise Data Warehouse infrastructure. With rapid growth in data volume, he needs to add 500 TB of capacity to it.
[Chart: EDW Cost, 2013 to 2016; labelled value 6.5M]
• At an average cost of $13,000 per TB of EDW storage, adding 500 TB of capacity is estimated to cost $6.5 million.
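The arithmetic behind the $6.5 million estimate is shown below, with one comparison line that is an assumption: it prices the same 500 TB at $1,000 per TB, the upper bound the deck's cost comparison gives for Hadoop with enterprise-grade storage. The function name `expansion_cost` is illustrative.

```python
def expansion_cost(capacity_tb: int, cost_per_tb: int) -> int:
    """Cost in dollars of adding capacity_tb at a flat price per TB."""
    return capacity_tb * cost_per_tb


# Figures from the scenario: 500 TB at an average of $13,000 per TB.
edw_expansion = expansion_cost(500, 13_000)

# Assumption: $1,000 per TB, the "< $1 K per TB" bound used as a ceiling.
hadoop_ceiling = expansion_cost(500, 1_000)

print(f"EDW expansion:  ${edw_expansion:,}")
print(f"Hadoop ceiling: ${hadoop_ceiling:,}")
```

This compares storage price only. It ignores migration effort, licensing, and whether all 500 TB is suitable for offload.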
DATA SOLUTIONS FOR EDW MODERNISATION
EXISTING SOURCES: ERP, CRM
NEW SOURCES: Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs
HADOOP CORE: DATA SERVICES, OPERATIONAL SERVICES, Data Management
[Diagram labels: Advanced Application ETL; Business Analytics; Visualization & Dashboards; IT Applications; OFFLOAD]
Numbered callouts in the diagram:
• ETL/ELT OFFLOAD
• ACTIVE ARCHIVE
• ENRICH WITH NEW DATA TYPES
• MULTI-PROTOCOL ACCESS (NFS, SMB, HTTP, Swift)
• ENTERPRISE-GRADE DATA MANAGEMENT
Legend: New Data Flow, Current Data Flow
ENTERPRISE EVOLUTION PROCESS
Typical Evolution Process (Every customer journey is different)
COST DRIVERS
• Enterprise Data Warehouse is Processing Limited: ETL/ELT OFFLOAD
• Enterprise Data Warehouse is Capacity Limited: ACTIVE ARCHIVE
REVENUE DRIVERS
• Need to add new data source types: ENRICH WITH NEW DATA TYPES
HADOOP WITH ENTERPRISE GRADE STORAGE SOLUTION
DATA SILO CONSOLIDATION
• Home Directories & File Shares
• Surveillance
• Next-Gen Application
• Hadoop & Analytics
• Transaction Logs
• BLOBS
• EDW
• Content Shares
• Marketing
• M&E
• Social & Next-Gen
• Archive & Backup Target
• Data Monetization
• Design, Test & Manufacture
• Application Test
DATA SILO CONSOLIDATION: DATA LAKE
[Diagram: the same silos as on the previous slide, now shown with a single DATA LAKE]
DATA LAKE
• SCALE-OUT SINGLE REPOSITORY
• IN-PLACE ANALYTICS
• MULTI-PROTOCOL / WORKLOAD TIERS
• ENTERPRISE FEATURES
• MANAGE PBs
LOADING DATA WITH SQOOP…
sqoop import --verbose \
  --connect 'jdbc:mysql://localhost/people' \
  --table persons \
  --username root \
  --hcatalog-table persons \
  --hcatalog-storage-stanza "stored as orc" \
  -m 1 \
  --create-hcatalog-table \
  --driver com.mysql.jdbc.Driver
[Diagram labels: MySQL, Sqoop (Batch), HDFS, Hive]
Sqoop can do bidirectional transfers between JDBC-compliant stores and Isilon HDFS.
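The slide shows only the import direction. As a hedged sketch of the reverse transfer, the snippet below assembles a `sqoop export` command that would push the HCatalog table back into MySQL. It only builds and prints the argument list and does not run Sqoop. The target table name `persons_export` and the helper `sqoop_export_command` are hypothetical, credentials handling is left out, and Sqoop expects the target table to exist in the database already.

```python
import shlex


def sqoop_export_command(jdbc_url, table, hcatalog_table, username, mappers=1):
    """Build (but do not run) a Sqoop export command as an argument list."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "--table", table,                    # target table in the JDBC store
        "--hcatalog-table", hcatalog_table,  # source table known to HCatalog
        "-m", str(mappers),                  # number of parallel map tasks
    ]


# persons_export is a hypothetical, pre-created MySQL table.
cmd = sqoop_export_command(
    "jdbc:mysql://localhost/people", "persons_export", "persons", "root"
)
print(shlex.join(cmd))
```

Building the command as a list rather than a string keeps each argument separate, so the same list could later be passed to `subprocess.run` without shell quoting problems.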
HIVE – ONE TOOL FOR MANY SQL USE CASES…
Data sources
• OLTP, ERP, CRM Systems
• Unstructured documents, emails
• Clickstream
• Server logs
• Social Media/Web Data
• Sensor, Machine Data
• Geolocation
Hive - SQL
• Interactive Analytics
• Batch Reports / Deep Analytics
• ETL / ELT
[Diagram labels: Processed; HiveQL; Interactive; Hive Server]
Compute & Isilon HDFS storage scale independently as needed
DELL EMC AT SCALE HIVE ARCHITECTURE
• Client: beeline, Hive View, Zeppelin, BI of Choice
• Interpreter: Hive Server 2 (compile, optimize, execute). Hive parses and plans the query.
• Hive MetaStore: schema definitions (database, Table 1, Partition 1, Table 2, Partition 2)
• Distribution Engine: TEZ / MR. The query is converted to MR/TEZ, and MR or TEZ is run by Hadoop.
• Data Storage: Isilon HDFS. Data in Isilon HDFS can be structured, unstructured or semi-structured.
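The database, table and partition hierarchy held in the MetaStore can be pictured with a toy model. This is a sketch of the concept, not Hive's API: the `Table` class, its methods, the `/warehouse/people.db/persons` location, and the `country` partition column are all hypothetical. The one convention borrowed from Hive is that each partition lives in a directory named `column=value` under the table's location, which is what lets a query read only the directories its filter needs.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Toy stand-in for a MetaStore entry: schema metadata only, no data."""
    name: str
    location: str            # table directory in the file system
    partition_column: str
    partitions: list = field(default_factory=list)

    def partition_path(self, value: str) -> str:
        # Partition directories follow the column=value naming convention.
        return f"{self.location}/{self.partition_column}={value}"

    def paths_for(self, wanted_values) -> list:
        """Partition pruning: return only directories matching the filter."""
        return [self.partition_path(v) for v in self.partitions if v in wanted_values]


# Hypothetical table with three partitions.
persons = Table("persons", "/warehouse/people.db/persons", "country", ["AU", "NZ", "UK"])

# A query filtered on country = 'AU' would need to scan one directory, not three.
print(persons.paths_for({"AU"}))
```

The metadata lookup happens before any data is touched, which matches the split on the slide between schema definitions in the MetaStore and data in HDFS.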
1. Active Archive
– Optimise EDW storage by archiving cold data but still analyse as needed
2. ETL Offload
– Improve EDW performance by offloading ETL processing to Hadoop
3. Semi/Unstructured Data Analytics
– Increase confidence in business decisions with new data sources
4. Multi-protocol Access
– Enable seamless in-place access using NFS, SMB, HTTP, Swift, FTP, …
5. Scale storage & compute independently – virtualise Hadoop
6. Data Management
– Enterprise-grade data management at Hadoop economics
Dell EMC SOLUTION ACCELERATORS
PROVIDING DELIVERY CERTAINTY AND IMPROVING TIME TO VALUE
INGEST, STORE, ANALYZE, SURFACE, ACT
CAPTURE AND STORE
Source Systems, Data Lake Storage
• Documented procedures to use Open Source tools
MODEL AND REFINE
Develop & Refine Analytical Models
• Library of analytical models and algorithms
• Industry focused
• Use case focused
VISUALIZE
COTS and Custom App Integration
• Rapid implementation of applications
• Knowledge exchange of custom integration projects
• Documented best practices
UNDECIDED? BIG DATA VISION WORKSHOP
IDENTIFY YOUR OPPORTUNITY
• Align Business & IT Around Big Data
• Identify Opportunities for Big Data Analytics
• Demonstrate Data Science Possibilities
• Prioritize Use Cases by Feasibility and Value
• Recommendation & Roadmap

The Transformation of your Data in modern IT (Presented by DellEMC)
