1
Limitless Data, Rapid Discovery,
Powerful Insight
How to Connect Cloudera to SAP Lumira with Simba
David Tishgart // Cloudera // @dtish
Angela Harvey // SAP // @AngelaHarveySAP
Kyle Porter // Simba Technologies
2© Cloudera, Inc. All rights reserved.
Data can be a
powerful strategic asset
…only if...
data helps achieve
your business vision.
3
Data Changes How We Work
Instrumentation: Everything that can be measured will be measured.
Consumerization: Employees and customers expect more personal interactions, but not at the cost of their privacy.
Experimentation: The most innovative companies embrace experimentation and agility.
4
Objectives of Data Discovery
Report: Marketing analysis, Log analysis, Churn analysis
Model: Product recommendation, Predictive support, Trade recommendation
Rule: Ad targeting, Transaction classification, Lead scoring
5-7
The Iterative Process of Data Discovery
[Diagram: Data Discovery Flow, built up over three slides]
Ingest and Transformation (80% of Time): Diverse Ingest, Search and lineage, Agile Transforms
Analysis Technique (20% of Time): SQL, Statistical, Machine Learning
Output: Report, Model, or Rules
Implement: Point Solution, Custom App
Remaining labels as extracted: Access, Data, Generation
8
Traditional Data Discovery Architecture
[Diagram: Traditional Architecture. Labels as extracted: Data Sources; Structured; Unstructured; Ingest; ETL; ELT; Storage #1, 2, N; Store & Process; Enterprise Data Warehouse; EDW; Archive; Access Data; Experiment Fast; Analyze Data; Search; Statistical; Machine Learning; SQL; Serve; Optimize; Implement; Custom Application; Point Solution.]
9-11
Challenges with Traditional Architectures
1) Limited Data
2) Long Time to Value
3) Compliance & Privacy Concerns
[Diagram: the Traditional Architecture diagram repeated on three slides, with numbered callouts 1-3 added one challenge at a time.]
12-14
A New Way Forward
1) Unlimited Data Access
2) Reduce Time to Value
3) Secure and Compliant
[Diagram: Modern Architecture, repeated on three slides with numbered callouts 1-3 added one at a time. Labels as extracted: Data Sources; Structured; Unstructured; Active Ingest; Ingest; ETL; ELT; Load; EDH; Archive; Store & Process; Enterprise Data Warehouse; EDW; Access Data; Analyze Data; Cloudera; Search; Statistical; Machine Learning; SQL; Serve; Optimize; Implement; Custom Application; Point Solution.]
15
State of Indiana uses
Cloudera and SAP to
deliver analytical
insights that will
reduce infant mortality
below the national
average by 2020
16
Cloudera and SAP: Driving Data Analytics
Business Users
SAP HANA Enterprise Data Hub
Process and store any volume of disparate
data in its original fidelity at scale.
Discover and analyze large amounts of
diverse data.
Automate the analytics process and enable
decision point analytics.
Data Sources
SAP Business Objects, Predictive Analytics, Lumira
Cloudera Confidential
17
SAP Analytics & Big Data
Agile
Visualization
Advanced
Analytics
Enterprise
Business Intelligence
• SAP Analytics tools view Hadoop as just another data source
• Complement your existing data infrastructure with Cloudera and derive value with familiar SAP tools
• Use SAP Analytics directly against Big Data sources, or with HANA for real-time analytical capabilities
Data Sources
SAP BI Suite
• Connect universes directly to
Cloudera then report using any
client tool (Web Intelligence,
Crystal Reports, Dashboards)
SAP Lumira
• Connect to Cloudera thru Hive
or Impala drivers
• Leverage our Big Data
visualizations or build your own
SAP Predictive Analysis
• Go beyond knowing what
happened and understand why,
or model what could happen
• Tease more information out of
Big Data sources, creating
more attributes for better
modeling
• Fast—pushing the predictive
calculations to Hadoop removes
the need to bring data to the
desktop
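The slide above says Lumira reaches Cloudera through Hive or Impala drivers, but the demo itself is not captured in this transcript. As a rough illustration only, the sketch below assembles the kind of ODBC-style connection string that a client of an Impala driver typically needs. Everything specific here is an assumption and not taken from the deck: the driver name, the host, the credentials, and the key names (`Host`, `Port`, `AuthMech`, `UID`, `PWD`) should be checked against the installed driver's documentation. Port 21050 is the usual Impala port for this kind of client connection.

```python
from typing import Dict


def build_odbc_connection_string(options: Dict[str, str]) -> str:
    """Join key/value pairs into an ODBC-style 'Key=Value;Key=Value' string."""
    parts = []
    for key, value in options.items():
        text = str(value)
        # ODBC convention: values containing ';' or braces are wrapped in
        # { }, and any closing brace inside the value is doubled.
        if ";" in text or "{" in text or "}" in text:
            text = "{" + text.replace("}", "}}") + "}"
        parts.append(f"{key}={text}")
    return ";".join(parts)


# Hypothetical settings; none of these values come from the presentation.
impala_options = {
    "Driver": "Cloudera ODBC Driver for Impala",  # assumed local driver name
    "Host": "impala.example.com",                 # placeholder host
    "Port": "21050",                              # usual Impala client port
    "AuthMech": "3",                              # assumed: user name + password
    "UID": "analyst",
    "PWD": "change;me",                           # ';' forces brace quoting
}

conn_str = build_odbc_connection_string(impala_options)
print(conn_str)
```

In Lumira these values would normally be entered in the data-source dialog instead of code. From Python, and assuming the third-party `pyodbc` package and the driver are installed, the same string could be passed to `pyodbc.connect(conn_str, autocommit=True)`.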
18
SAP Lumira
Trusted Data Discovery as the next generation of SAP Business Intelligence
Lumira Server or Lumira, Edge
Lumira Cloud
Lumira Desktop
Wrangle and transform data
Personal data, Big Data—in-box Impala
driver, corporate data
Visualize & discover insights
Trusted data discovery
Share beautiful stories
Infographics, predictive
19
Kyle Porter // Simba Technologies
DEMO:
Connecting
Cloudera to SAP
Lumira with Simba
20
21
Thank You
David Tishgart // david.tishgart@cloudera.com
Angela Harvey // angela.harvey@sap.com
Kyle Porter // kylep@simba.com



Editor's Notes

  • #3: At Cloudera, our mission is to help organizations gain value from all their data. Increasingly, leading organizations view data as among their most important strategic assets, but only if they’re able to leverage that data to meet their business objectives.
  • #4: We see a few trends driving the increased importance of having a strategy for data. The Internet has changed everything, and we are more connected than ever before. We all expect to be on the web these days; we rely on it for work, shopping, entertainment, and social interaction. With the simultaneous proliferation of mobile devices and sensors, we now have the ability to measure almost everything. As a result, we’re generating data, and moving it, at a rate that’s entirely new. In this new online world, customers and employees expect more personalization, but not at the cost of privacy. Security matters. Ultimately, data enables us to better understand our customers, patients, employees, or students. Innovative organizations embrace experimentation and agile methods.

    Representative Customer Stories

    Vivint: Everything that can be measured will be measured.
    Challenge: Vivint needed a central repository to gather and analyze data generated from each of the 20-30 sensors -- e.g. thermostats, smart appliances, video cameras, window and door sensors, and smoke and carbon monoxide sensors -- in every one of its 800,000 customers' homes.
    Solution: Vivint has deployed an enterprise data hub on Cloudera that allows it to look across many data streams simultaneously for behaviors, geo-location, and actionable events.
    Benefit: With its enterprise data hub that combines sensor data across multiple data streams, Vivint can glean new insights that help the company understand and enrich customers' lives. For example, knowing when a home is occupied or vacant is important to security, but when tied into the heating, ventilation and cooling (HVAC) system, it can also add a layer of energy cost savings by cooling or heating a home based on occupancy.

    Western Union: Employees and customers expect more personal interactions.
    Challenge: With customers spanning every corner of the globe and all walks of life, Western Union saw an opportunity to personalize the experience for each customer by combining the volumes of information about their transactions -- of which Western Union processed 29 per second in 2013 -- with user behavior data, clickstream data, and mobile usage patterns.
    Solution: Western Union implemented an enterprise data hub on Cloudera to centralize its data -- both structured and unstructured -- in order to provide a 360-degree customer view, while also supporting use cases for risk management and AML compliance.
    Benefit: Deeper customer understanding is driving product improvements and enhancements that improve Western Union customers' experience. For example, Western Union learned through its EDH that many customers in key sectors process the same transactions repeatedly, prompting the company to add a "Send Again" button to its mobile app to streamline the processing of repeat transactions. By deploying that capability, the company immediately saw a conversion uptake in those key sectors.

    Marketing Associates: The most innovative companies embrace experimentation and agility.
    Challenge: Marketing Associates' Magnify Analytic Solutions division has built expertise executing B2C online marketing contests and product giveaways for large clients such as Chrysler, DuPont, Ford, and Jaguar, which requires intensive data processing, elastic flexibility and scalability, and agility and performance. Ford recently offered Magnify the opportunity to manage its entire CRM system, which Magnify jumped at, knowing it would need a new big data infrastructure to support it.
    Solution: Without any prior in-house experience with Hadoop, Magnify built an enterprise data hub on Cloudera, leaning heavily on Cloudera Manager, Search, Impala, and integrations with SAS and Tableau to streamline the new platform's adoption.
    Benefit: The EDH has been a tremendous success, enabling Magnify to deliver a self-service, 360-degree view of consumers to its clients (vs. sending them Excel spreadsheets every 1-2 days, as was the case before). Better yet, the all-inclusive price of Cloudera Enterprise, Data Hub Edition and all resources needed to build its development, production, and QA environments came in well below the ongoing costs of the traditional environment.
  • #5: Key takeaway: Analysts are trying to accomplish one of three things when they embark on data discovery projects: building reports, models, or rules. Definitions: A report is a visual representation of static data. Once you have the report, you can operationalize it by incorporating it into a dashboard that is refreshed regularly for your team to see. A model is a function of variables, with weights given to certain attributes, that produces an output. The output can be served to end users so that they get the relevant information they need. A rule is an attribute that can be input into a point solution to change its output (ad platform, fraud detection, etc.). Example of the difference between a model and a rule: when marketers target on ad platforms, they input rules into the engine (gender, age, location, etc.). They are not in charge of the model that actually determines the optimal individuals to target. That model is built into the point solution.
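The distinction between a model and a rule in the note above can be made concrete with a toy sketch. This is only an illustration: the attribute names, weights, and the plain weighted sum are hypothetical choices, not anything taken from the presentation or from a real ad platform.

```python
from typing import Dict


def recommendation_score(features: Dict[str, float],
                         weights: Dict[str, float]) -> float:
    """Model: a function of variables, each attribute multiplied by a weight."""
    # Attributes missing from `features` contribute nothing to the score.
    return sum(weight * features.get(name, 0.0)
               for name, weight in weights.items())


def matches_targeting_rule(person: Dict[str, object],
                           rule: Dict[str, object]) -> bool:
    """Rule: fixed attribute constraints that a user enters into a point solution."""
    return (rule["min_age"] <= person["age"] <= rule["max_age"]
            and person["location"] == rule["location"])


# Hypothetical weights, as an analyst might arrive at during discovery.
weights = {"views": 0.5, "in_cart": 1.0}
score = recommendation_score({"views": 2.0, "in_cart": 1.0}, weights)

# Hypothetical rule, as a marketer might enter into an ad platform.
rule = {"min_age": 25, "max_age": 40, "location": "Indiana"}
targeted = matches_targeting_rule({"age": 30, "location": "Indiana"}, rule)

print(score, targeted)
```

The model's weights are what the analyst tunes; the rule is just an input, and whatever model sits behind the point solution stays out of the marketer's hands.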
  • #6-#8: Key takeaway: It takes a lot of time and processing frameworks to arrive at value. The process of discovering value from data is a cyclical process that takes multiple processing frameworks, a wide variety of data, and countless iterations throughout. The analyst must discover the data sets they want to include in the analysis and then transform and cleanse this data in preparation for analysis. Depending on what the analyst is looking for (a report, model, or rule), they use a variety of techniques to arrive at the outcome they think will be most effective. Once the report, model, or rule has been developed, the Data Discovery process is over. They now must implement this information in a solution in order for this value to reach the masses. A single Data Discovery is extremely important, but it shouldn’t be the end goal. Once a single analyst discovers this information, they must make sure an entire team, department, organization, or customer base gets this information in a timely manner. If this output doesn’t influence optimal behavior, and the KPIs don’t move, then the analyst must go back to the discovery process and optimize the output. Optimizing means that they will include different data or change the transformation or analysis technique they used. Let’s take a look at how traditional architectures are set up to handle this process.
  • #9: Key takeaway: It is not just a BI challenge; it is the way that data is managed. Keeping in mind the three main high-level objectives of an architecture built for Data Discovery (accessing data, analyzing data, and experimenting and iterating fast), we can examine a traditional architecture and see where organizations might run into issues. Question for the customer: Does this look like your architecture?
  • #10-#12: Key takeaway: Experimentation and iteration take time with traditional architectures, making it difficult to fail fast or succeed.
  • #13-#15: Key takeaway: An EDH provides the foundation to change the way you collect and manage data in order to provide your analysts what they need in less time. ETL on the fly: Talk to schema-on-write vs schema-on-read (http://www.slideshare.net/awadallah/schemaonread-vs-schemaonwrite).
  • #16: Link to account record in SFDC (valid for Cloudera employees only): https://na6.salesforce.com/00180000019dZ6D The State of Indiana builds an enterprise data management platform to reduce costs and improve the lives of its citizens. Background: The state of Indiana has a population of more than 6.5 million people (known as “Hoosiers”) and 36,500 square miles of land area. It ranks 16th in the country based on population. Challenge: One of the state’s goals is “transparency,” providing citizens with comprehensive insight into state operations to confirm that taxpayer dollars are delivering the most efficient and effective services possible. State officials also see great opportunity in using data and analytics to help improve the lives of Indiana citizens. However, as with most state governments, officials found it difficult to integrate data stored in silos across 71 departments quickly or efficiently. Its existing data platforms couldn’t scale (except at great cost) to manage the huge amount of data needed. Additionally, ETL processes to move data into a common platform were extremely time-consuming. In one case, staff found that integrating expense reports from different agencies so they could be analyzed took more than 8 hours, which was unacceptable to users. Solution: By implementing a Hadoop-based operational data store with Cloudera Enterprise, Data Hub Edition, the organization is tackling these challenges, reducing the time and cost to mine its data and gaining new insight. Cloudera will ingest, process, and analyze data from SAP HANA and more than 50 other data sources, including virtual SQL tables, across the organization. Staff will be able to analyze statewide data via Impala + R. SAP Lumira will be used for data visualization. Cloudera Navigator will support data auditing, lineage, and discovery. And enterprise architects will use Cloudera Manager to monitor and quickly diagnose cluster issues.
Security was a significant concern for state officials given that the state manages sensitive information, including financial and health data. Cloudera was selected over Hortonworks due to the integrated security via Sentry. The state will use Sentry to restrict access to sensitive data columns and Kerberos to authenticate users and services (Sentry handles authorization and Kerberos handles authentication; neither performs encryption itself). Bringing together so much data in a single view can be challenging, and vendor support can make a significant difference between success and failure. According to state enterprise data architects, Cloudera is “much easier to work with” than other vendors, enabling staff to focus on their big picture goals. Benefit: What will be the benefit of this state’s enterprise data platform and work with Cloudera? From an operational perspective, current tests show significant time savings from offloading ETL work to Hadoop, with queries once taking more than eight hours reduced to just four seconds. Additionally, the platform will help reduce costs as IT staff can optimize how and when they use SAP HANA, offloading less critical or even hot workloads to Hadoop. However, what’s most exciting is the new insight that will be gained to help improve the lives of Indiana’s citizens. Take, for example, the state’s goal to reduce the infant mortality rate. Indiana currently has one of the highest infant mortality rates in the U.S. One baby dies every 13 hours in Indiana. At the 2nd annual Indiana Infant Mortality Summit, held in 2014, presenters reported that if the state could reduce its infant mortality rate to the national average, 60 babies would survive each year. But the question for officials is: Which programs are best delivered to which mothers and when? Many factors contribute to infant mortality, including smoking, obesity, prenatal care, unsafe sleep, and early deliveries.
By being able to integrate and analyze data across a family’s interaction with state agencies – from social and family services, to health services, to financial aid and food programs – state officials are confident they’ll uncover important insights that help them prevent unnecessary deaths. For example, officials want to understand the relationship between infant mortality and nutrition programs: do moms who receive WIC funds (the Special Supplemental Nutrition Program for Women, Infants, and Children) have healthier babies, and if so, would increasing WIC funding in specific areas of the state help save newborns?
  • #17: 1. Data Services certification for CDH in progress. 2. HANA SP08 + CDH connector validated; deeper certification (PE) planned post HANA SP09 release. 3. Lumira ODBC driver for Impala certification: Q4. InfiniteInsights certification under investigation.
  • #19: Download and install on your desktop in less than 5 minutes. Insight from many data sources: combine, manipulate, and enrich data to apply it to your business scenarios. Self-service visualizations and analytics to tell your story. Optimized for SAP HANA for real time on detailed data. Connectivity to Hadoop: extract value from your Hadoop data by performing analysis on the data. Simple connections and an easy-to-use interface mean business users can extract value from Big Data sources. Mash together data from big data and traditional sources for better insights. Big Data visualizations, heat maps, scatter charts; create your own charts through the CVOM SDK. Extensible: if your needs go beyond desktop analysis, you have the SAP stack behind you. Interoperability with Predictive Analysis means you can go beyond what’s already happened and make predictions on future behavior. Use results from Predictive to create visualizations in Lumira. With HANA, you can leverage Smart Data Access to access data directly in Hadoop, centralize data management, and do rapid calculations when needed. BI Suite?? Share stories beyond the data analyst with Lumira Server or Lumira Cloud. Same interface for web, desktop, and cloud. Extensibility of the data source: create your own data drivers with the open API if you have a customized data source.