1
Becoming Information-Driven
Introduction to the Enterprise Data Hub

Mike Olson
Cloudera, Inc.
Co-Founder & Chief Strategy Officer
  
2
Expanding Data Requires A New Approach
©2014 Cloudera, Inc. All rights reserved.

1980s: Bring Data to Compute
Now: Bring Compute to Data

Information-centric businesses use all data:
Multi-structured, internal & external data of all types

Process-centric businesses use:
• Structured data mainly
• Internal data only
• “Important” data only

[Diagram: relative size & complexity of Data and Compute, 1980s vs. now]
  
3
The Old Way: Bringing Data to Compute

Complex Architecture
• Many special-purpose systems
• Moving data around
• No complete views

Visibility
• Leaving data behind
• Risk and compliance
• High cost of storage

Time to Data
• Up-front modeling
• Transforms slow
• Transforms lose data

Cost of Analytics
• Existing systems strained
• No agility
• BI backlog

[Diagram labels: SERVERS, MARTS, EDWS, DOCUMENTS, STORAGE, SEARCH, ARCHIVE; ERP, CRM, RDBMS, MACHINES; FILES, IMAGES, VIDEOS, LOGS, CLICKSTREAMS; EXTERNAL DATA SOURCES; callouts 4, 1, 2, 3]
  
4
The New Way: Bringing Compute to Data

1. Active archive
• Full fidelity original data
• Indefinite time, any source
• Lowest cost storage

2. Data management, transforms
• One source of data for all analytics
• Persist state of transformed data
• Significantly faster & cheaper

3. Self-service exploratory BI
• Simple search + BI tools
• “Schema on read” agility
• Reduce BI user backlog requests

4. Multi-workload analytic platform
• Bring applications to data
• Combine different workloads on common data (e.g. SQL + Search)
• True BI agility

[Diagram labels: SERVERS, MARTS, EDWS, DOCUMENTS, STORAGE, SEARCH, ARCHIVE; ERP, CRM, RDBMS, MACHINES; FILES, IMAGES, VIDEOS, LOGS, CLICKSTREAMS; EXTERNAL DATA SOURCES; callouts 1, 2, 3, 4]
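
The “schema on read” bullet can be illustrated with a minimal sketch. It assumes raw events are kept as JSON lines exactly as they arrived and that each reader supplies its own schema at query time. The field names, the sample records, and the `read_with_schema` helper are hypothetical and are not taken from the deck.

```python
import json

# Raw events are stored exactly as they arrived; no schema is enforced on write.
# (Hypothetical sample data.)
RAW_EVENTS = [
    '{"user": "a1", "amount": "19.99", "channel": "web"}',
    '{"user": "b2", "amount": "5.00"}',
    '{"user": "c3", "amount": "12.50", "channel": "store", "coupon": "X1"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema (field name -> (converter, default)) at read time."""
    for line in raw_lines:
        record = json.loads(line)
        yield {
            field: convert(record[field]) if field in record else default
            for field, (convert, default) in schema.items()
        }


# Two readers project the same raw data through different schemas.
SALES_SCHEMA = {"user": (str, None), "amount": (float, 0.0)}
CHANNEL_SCHEMA = {"user": (str, None), "channel": (str, "unknown")}

sales = list(read_with_schema(RAW_EVENTS, SALES_SCHEMA))
channels = list(read_with_schema(RAW_EVENTS, CHANNEL_SCHEMA))
total_sales = round(sum(row["amount"] for row in sales), 2)
```

Because nothing is discarded on write, a later question (for example, coupon usage) only needs a new schema, not a reload of the data.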
  
5
Better, faster, cheaper and multi-framework

Process Data
BATCH PROCESSING: MR / Pig / Hive / Cascading
IN-MEMORY: SPARK
• Highly mature
• Wide range of clients
• Significant advances in speed & usability

Explore & Analyze Data
SQL: IMPALA
• High speed
• High concurrency
• Workload mgt
• Broad BI support
SEARCH: SOLR
• For unstructured & semi-structured data
• For business users

Train & Test Models
MACHINE LEARNING: SAS, R, H2O, MLlib
• Integration with the SAS & Revolution product portfolio
• Python / 0xdata / MLlib for advanced users

Respond to Events in RT
STREAM PROCESSING: SPARK STREAMING
• Low (1 second) latency
• Windows (collections) of events
NOSQL: HBASE
• Very low (~10ms) latency
• High volumes of single events
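
The phrase “windows (collections) of events” can be sketched in plain Python. This is not the Spark Streaming API; it is a minimal illustration of fixed, non-overlapping (tumbling) windows, and the event data and the `tumbling_window_counts` helper are assumptions made for the example.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Group (timestamp, key) events into fixed, non-overlapping windows.

    Returns {window_start: {key: count}}, ordered by window start.
    """
    windows = defaultdict(lambda: defaultdict(int))
    for timestamp, key in events:
        # Every timestamp maps to the start of the window that contains it.
        window_start = (timestamp // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


# Hypothetical clickstream: (epoch seconds, page).
EVENTS = [(0, "home"), (0, "cart"), (1, "home"), (1, "home"), (2, "cart"), (3, "home")]
counts = tumbling_window_counts(EVENTS, window_seconds=2)
```

A store that answers single-event lookups in milliseconds and a stream processor that aggregates windows every second or so address different needs, which is why the slide lists them separately.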
  
6
Rationalizing existing infrastructure
Migrating data sets, workloads or entire systems from more expensive or less flexible systems

Operational Data Store
• Consolidate, cleanse & stage data
• Promote to other operational systems or EDWs

Data Warehouse
• ELT
• Archive
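
The ELT pattern named on this slide (load raw data first, then transform it inside the store) can be sketched as follows. SQLite from the Python standard library stands in for the hub's SQL engine purely for illustration; the table names, columns, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Load: land the raw rows unchanged (everything as text, nothing dropped).
conn.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [("1", "10.50", "uk"), ("2", " 4.25", "UK"), ("3", "n/a", "de")],
)

# Transform: cleanse inside the store and persist the result as a new table.
conn.execute(
    """
    CREATE TABLE clean_orders AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           CAST(TRIM(amount) AS REAL) AS amount,
           UPPER(country) AS country
    FROM raw_orders
    WHERE TRIM(amount) GLOB '[0-9]*'
    """
)

totals = conn.execute(
    "SELECT country, SUM(amount) FROM clean_orders GROUP BY country ORDER BY country"
).fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
```

The raw table still holds all three rows after the transform, so a different cleansing rule can be applied later without going back to the source system.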
  
7
What do we mean by data discovery?
Providing a flexible analytic sandbox where users can apply multiple tools & techniques to derive insights from new & traditional data

Combine & explore new data sets
• Scripting
• Data blending
• Traditional ETL

Support ad-hoc marts and self-serve BI users
• Tableau, Qlik et al

Enable data scientists to train & test models
• ML libraries
• SAS, Revolution
  
8
What do we mean by pervasive analytics?
Using predictive analytics to improve business processes or augment professional judgment in an automated way across the organization

Analyze patterns over deep histories
• Recommendations
• Outliers

Automate responses to new data / observations
• Classifying or scoring new data

User exploration / judgment application
• Reviewing outliers
• Overriding suggestions
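
The three steps on this slide (learn from history, score new data automatically, let a person review and override) can be sketched in a few lines. A z-score is used here only as a stand-in for a trained model; the function names, the threshold, and the sample amounts are assumptions made for the example.

```python
from statistics import mean, stdev


def fit_baseline(history):
    """Learn a simple baseline (mean and sample standard deviation) from past values."""
    return mean(history), stdev(history)


def score(value, baseline):
    """Return the z-score: how many standard deviations the value is from the mean."""
    mu, sigma = baseline
    return abs(value - mu) / sigma


def decide(value, baseline, threshold=3.0, override=None):
    """Flag outliers automatically, but let a reviewer's override win."""
    if override is not None:
        return override
    return "review" if score(value, baseline) > threshold else "approve"


# Hypothetical purchase amounts for one card.
HISTORY = [20.0, 22.0, 18.0, 21.0, 19.0]
baseline = fit_baseline(HISTORY)
```

For example, `decide(500.0, baseline)` returns "review", while `decide(500.0, baseline, override="approve")` records the reviewer's judgment instead of the automated suggestion.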
  
9
Big Data in Credit Card Processing

Fraud Detection | Regulatory Compliance | Product & Service Innovation | Operational Efficiency
CFO & CRO | CIO & CRO | R&D, CMO | CIO

“Modern credit card fraud rings operate globally over long time scales – how can we collect, store & analyze the petabytes of data it takes to detect them?”

“Customer privacy is paramount, but we need to keep vast amounts of information online to run our business. Can we achieve both goals?”

“We obviously have vast and detailed information about customer purchases. Can we combine it with GPS & mobile data and browsing behavior to offer new products?”

“How can we deliver what the business team wants, and faster, without spending tens of millions of dollars to expand our data warehouse?”
  
10
Big Data in Retail

360° Customer View | Fraud Prevention | Logistics & Supply Chain | Operational Efficiency
CMO | CMO & Customer Service | CEO, VP Operations | CIO

“We want to know what our customers do online and in our stores. How can we combine data from separate analytics silos to understand & serve them better?”

“Theft, or ‘shrinkage’, in our stores is on the increase – can we combine POS data with video surveillance to reduce it without impacting customer service negatively?”

“How can we reduce stock-outs & ensure products are in the right stores at the right time? Can we combine data from our carriers with in-store historical data from thousands of stores?”

“Our EDW infrastructure is being overwhelmed with data and workloads; we are running into capacity limits, and the annual costs of expansion are in the tens of millions. What can we do?”
  
11
Big Data in Health Care

360° Patient View | Regulatory Compliance | Maximize Medical Efficacy | Operational Efficiency
VP Operations, Chief of Compliance | VP Operations | Chief Medical Officer | CFO | Chief Medical Officer | CIO

“Patient data ends up scattered across many different systems – is there a way to get a complete picture by combining it while ensuring HIPAA compliance?”

“The move to EMR combined with the strict regulations means we need to keep at least 7 years of data online – how can we afford to do that and make it searchable and available for analysis?”

“We invest hundreds of millions in new equipment every year. How can we judge the long-term efficacy for patient outcomes, and make smarter investment decisions?”

“Our EDW infrastructure is being overwhelmed with data and workloads; we are running into capacity limits, and the annual costs of expansion are in the tens of millions. What can we do?”
  
12
13
14
Mike Olson
@mikeolson
mike.olson@cloudera.com
  

Ask bigger questions
