Big Data Use Cases
InSemble Inc.
http://www.insemble.com
Agenda
1. What is Big Data?
2. Relevance to your Enterprise
3. Hadoop Ecosystem & Business Use Cases
4. Technical Use Cases and Demo
5. Q and A with Cloudera
Big Data Definitions
• Wikipedia defines it as "data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process data within a tolerable elapsed time."
• Gartner defines it as data with the following characteristics:
– High Velocity
– High Variety
– High Volume
• Another definition: "Big Data is a large volume of unstructured data that cannot be handled by traditional database management systems."
Why a Game Changer
• Schema on read
– Data is interpreted at processing time, not when it is stored
– Keys and values are not intrinsic properties of the data; they are chosen by the person analyzing it
• Move code to data
– With traditional systems, we bring the data to the code, and I/O becomes a bottleneck
– With conventional distributed systems, we have to implement our own checkpointing and recovery
• More data beats better algorithms
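To make schema on read concrete, here is a minimal Python sketch. The pipe-delimited record layout and the field names are hypothetical. The raw lines are stored untouched; each analysis applies its own parsing at read time and picks its own key.

```python
from collections import Counter

# Raw records are kept exactly as they arrived; no schema is enforced on write.
# The layout (timestamp|user|channel|amount) is a hypothetical example.
RAW_LINES = [
    "2015-03-01T10:00|alice|web|20.00",
    "2015-03-01T10:05|bob|mobile|15.50",
    "2015-03-01T11:00|alice|mobile|5.25",
]

def read_with_schema(lines, fields, sep="|"):
    """Apply a schema at read time: split each raw line and name its fields."""
    for line in lines:
        yield dict(zip(fields, line.split(sep)))

def count_by(lines, fields, key):
    """Aggregate on whichever field the analyst chooses as the key."""
    return Counter(rec[key] for rec in read_with_schema(lines, fields))

FIELDS = ["ts", "user", "channel", "amount"]
by_user = count_by(RAW_LINES, FIELDS, key="user")        # one analyst's key
by_channel = count_by(RAW_LINES, FIELDS, key="channel")  # same data, another key
```

Nothing about the stored data changed between the two analyses; only the interpretation applied when reading it did.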
Enterprise Relevance
• Missed Opportunities
– Channels
– Data that is analyzed
• The constraint was high cost
– Storage
– Processing
• Future-proof your business
– Schema on Read
– Access patterns are less relevant up front
– Not just future-proofing your architecture
Hadoop Ecosystem
Source: Apache Hadoop Documentation
Hadoop 2 with YARN
Source: Hadoop in Practice by Alex Holmes
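The "move code to data" idea behind Hadoop's processing model can be simulated in plain Python: each node runs the map step only over the blocks it already stores, and only small partial results travel to the reducer. This is a single-process sketch of the MapReduce pattern, not Hadoop's API; the node names and block contents are invented for illustration.

```python
from collections import defaultdict

# Hypothetical data blocks, already resident on the nodes that store them.
BLOCKS_BY_NODE = {
    "node1": ["theft battery theft", "assault theft"],
    "node2": ["battery battery", "homicide theft"],
}

def map_local(lines):
    """Runs where the block lives: emit word counts and pre-aggregate them
    locally (a combiner), so only compact counts leave the node."""
    local = defaultdict(int)
    for line in lines:
        for word in line.split():
            local[word] += 1
    return dict(local)

def shuffle_and_reduce(partials):
    """Merge the per-node partial counts into final totals."""
    totals = defaultdict(int)
    for partial in partials:
        for word, count in partial.items():
            totals[word] += count
    return dict(totals)

# The code is shipped to each node's data; only partial results are moved.
partials = [map_local(lines) for lines in BLOCKS_BY_NODE.values()]
totals = shuffle_and_reduce(partials)
```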
Big Data Journey
Stages over time, with the focus of each stage:
• Aware of Benefits: scout for opportunities
• Execute: pilot project
• Expand: multiple use cases
• Managed: governance model
• Optimized: core competency
Milestones along the journey, from early to advanced:
• Evaluate business benefits
• Understand the ecosystem
• Identify a platform
• Security standards defined
• Governance standards defined
• Integrated with the enterprise
• Consolidated analytics
• ETL
• Time constraints
• Ad hoc data exploration
• Batch, interactive, and real-time use cases
• Predictive analytics, machine learning
• Real-time insight from all channels
• IT is a key differentiator for your business
• Perfect alignment of business and IT
(Figure: business value plotted against the journey over time; effects range from GOOD to GREAT.)
Insurance Domain – Case Study

Source: Cloudera (Three-Customer-Case-Studies_Industry-Brief.pdf)

Solution
• Cloudera Enterprise
• Apache Hive/Impala
• Apache Sqoop
• Coexists with enterprise data warehouses and the mainframe
Requirements
• Customized plans based on multiple data points
– Lifestyle, health patterns, habits, preferences
• Find correlations in massive amounts of digitized data
– Traffic patterns, demographics, weather
• Run analytics on multiple states simultaneously
Benefits
• Run descriptive models across historical data from all states
• Customized products catered to individual behaviors and risks
• Differentiated marketing offers
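Running descriptive models across historical data from all states amounts, at its simplest, to one grouped aggregation over the combined history. A minimal Python sketch, assuming hypothetical policy records with `state`, `claims`, and `exposure_years` fields; on the cluster this would be a Hive or Impala GROUP BY rather than in-memory Python.

```python
from collections import defaultdict

# Hypothetical historical policy records consolidated from all states.
records = [
    {"state": "IL", "claims": 2, "exposure_years": 4.0},
    {"state": "IL", "claims": 0, "exposure_years": 1.0},
    {"state": "OH", "claims": 1, "exposure_years": 2.0},
    {"state": "TX", "claims": 3, "exposure_years": 3.0},
]

def claim_frequency_by_state(rows):
    """Descriptive statistic per state: total claims divided by total exposure."""
    claims = defaultdict(int)
    exposure = defaultdict(float)
    for row in rows:
        claims[row["state"]] += row["claims"]
        exposure[row["state"]] += row["exposure_years"]
    return {state: claims[state] / exposure[state] for state in claims}

freq = claim_frequency_by_state(records)
```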
Common Use Cases
1. Detail records, time constraints
2. Consolidated view, 360-degree view
3. Recommendation engines, insurance underwriting
4. Sentiment analysis, fraud detection
5. Personalized marketing, products
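A consolidated or 360-degree view usually means grouping each customer's records from several channels under a shared identifier. A minimal Python sketch, assuming hypothetical channel feeds that all carry a `customer_id` field:

```python
from collections import defaultdict

# Hypothetical feeds from separate channels; each record carries a customer_id.
web_visits = [{"customer_id": 1, "page": "quote"}, {"customer_id": 2, "page": "home"}]
call_center = [{"customer_id": 1, "reason": "billing"}]
policies = [{"customer_id": 1, "policy": "auto"}, {"customer_id": 2, "policy": "home"}]

def consolidate(**channels):
    """Group every record under its customer_id, keyed by channel name."""
    view = defaultdict(lambda: defaultdict(list))
    for channel, rows in channels.items():
        for row in rows:
            view[row["customer_id"]][channel].append(row)
    # Convert to plain dicts so lookups never silently create empty entries.
    return {cid: dict(by_channel) for cid, by_channel in view.items()}

view = consolidate(web=web_visits, calls=call_center, policies=policies)
```

Each customer ID then maps to everything known about that customer across channels, which is the starting point for the recommendation and personalization use cases above.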
Securing Hadoop Data
Source: http://www.voltage.com
General Thoughts
• The technology is in a hyper-growth phase
• It is complex
• Tools and productivity/monitoring products are still evolving
• Start with a pilot project
• Treat adoption as an incremental journey
Technical Use Case: Managing a Hadoop Cluster
• Ambari vs. Cloudera Manager
• Both provision, manage, and monitor Hadoop clusters
• Ambari
– Open source
– Built on existing open source projects such as Puppet, Ganglia, and Nagios
• Cloudera Manager
– Proprietary, but more mature
– As a management tool, does it really need to be open source?
– Rolling upgrades and management of multiple clusters
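Both tools also expose REST APIs, which is how they plug into scripts and external monitoring. A minimal Python sketch that lists clusters from Ambari, assuming Ambari's v1 endpoint `/api/v1/clusters` on its default port 8080 with HTTP basic auth; the host, credentials, and sample response values are placeholders, and only the JSON parsing is exercised offline.

```python
import base64
import json
import urllib.request

def fetch_clusters_json(host, user, password, port=8080):
    """GET the cluster list from Ambari's REST API (placeholder host/credentials)."""
    url = f"http://{host}:{port}/api/v1/clusters"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

def cluster_names(payload):
    """Extract cluster names from an Ambari /api/v1/clusters response body."""
    return [item["Clusters"]["cluster_name"] for item in payload.get("items", [])]

# Offline example with a response shaped like Ambari's (values are made up).
sample = {"items": [{"href": "http://ambari:8080/api/v1/clusters/demo",
                     "Clusters": {"cluster_name": "demo", "version": "HDP-2.2"}}]}
names = cluster_names(sample)
```

Cloudera Manager offers a comparable REST API under its own versioned path; check its API reference for the exact endpoint and response shape before adapting the parser.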
Technical Use Case: Choosing a SQL Engine on Hadoop
Performance Benchmark
Source: http://blog.cloudera.com
Benchmark for Multiple Users
Source: http://blog.cloudera.com
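To reproduce a multi-user benchmark like the one cited, one approach is to issue the same query from several concurrent clients and record per-query latency. A minimal Python sketch; `run_query` is a hypothetical stand-in that simply sleeps, and you would replace it with a real Hive or Impala client call.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def run_query(sql):
    """Hypothetical placeholder: swap in a real Hive/Impala client call here."""
    time.sleep(0.01)  # simulate query execution time
    return sql

def timed(sql):
    """Run one query and return its wall-clock latency in seconds."""
    start = time.perf_counter()
    run_query(sql)
    return time.perf_counter() - start

def benchmark(sql, users, queries_per_user):
    """Issue users * queries_per_user queries through `users` concurrent workers."""
    total = users * queries_per_user
    with ThreadPoolExecutor(max_workers=users) as pool:
        latencies = list(pool.map(timed, [sql] * total))
    return {
        "queries": total,
        "mean_s": statistics.mean(latencies),
        "max_s": max(latencies),
    }

result = benchmark("SELECT COUNT(*) FROM crimes", users=4, queries_per_user=3)
```

Comparing `mean_s` and `max_s` as `users` grows shows how each engine degrades under concurrency, which a single-user benchmark hides.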
Other Considerations
• Insert, update, and delete with full ACID support
– Available since Hive 0.14 (https://issues.apache.org/jira/browse/HIVE-5317)
• Support for nested data structures
• Fault tolerance
• Support for specific file formats and compression codecs (e.g., Avro, LZO)
• Integration of SQL on Hadoop with other big data use cases
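As an illustration of the ACID point: Hive 0.14's transaction support requires the target table to be bucketed, stored as ORC, and marked transactional, with session settings that enable the DbTxnManager. The small Python helper below only renders that HiveQL as text; the table and column names are hypothetical, and the property names should be checked against the Hive Transactions documentation for your version.

```python
# Client-side settings Hive 0.14 expects for ACID operations.
# (The metastore also needs the compactor enabled, e.g.
#  hive.compactor.initiator.on=true and hive.compactor.worker.threads > 0.)
ACID_SESSION_SETTINGS = {
    "hive.support.concurrency": "true",
    "hive.enforce.bucketing": "true",
    "hive.exec.dynamic.partition.mode": "nonstrict",
    "hive.txn.manager": "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager",
}

def acid_table_script(table, columns, bucket_col, buckets):
    """Render SET statements plus a CREATE TABLE for a transactional Hive table."""
    if bucket_col not in columns:
        raise ValueError(f"bucket column {bucket_col!r} is not a table column")
    sets = [f"SET {key}={value};" for key, value in ACID_SESSION_SETTINGS.items()]
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns.items())
    ddl = (
        f"CREATE TABLE {table} (\n  {cols}\n)\n"
        f"CLUSTERED BY ({bucket_col}) INTO {buckets} BUCKETS\n"
        "STORED AS ORC\n"
        "TBLPROPERTIES ('transactional'='true');"
    )
    return "\n".join(sets + [ddl])

script = acid_table_script(
    "crimes_acid",  # hypothetical table name
    {"id": "BIGINT", "primary_type": "STRING", "crime_year": "INT"},
    bucket_col="id",
    buckets=4,
)
```

With such a table, Hive 0.14 accepts INSERT ... VALUES, UPDATE, and DELETE statements; the bucketing column itself cannot be updated.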
Demo: Hadoop Cluster in AWS
• 6 EC2 machines in total, instance type t2.medium
• RHEL 6.5, 3.75 GB memory, 10 GB disk
• 5-node Hadoop cluster
• Public data set downloaded from https://data.cityofchicago.org
Demo
• Chicago crime data from 2009 to present
• More than 2 million records
• Dangerous communities in Chicago, compared across Hive, Hive on Tez, and Impala
• Tableau connected to the Hadoop cluster to show:
– Crime counts by crime type
– Homicide count by year
– Dangerous communities
– Homicide map
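The demo's aggregations (crime counts by type, homicides by year, most dangerous communities) boil down to simple group-by counts. A minimal Python sketch over a CSV export, assuming the column names `Primary Type`, `Year`, and `Community Area` used by the City of Chicago crimes dataset; in the demo these ran as Hive, Hive on Tez, and Impala queries, and ranking communities by raw crime count is only one possible definition of "dangerous".

```python
import csv
import io
from collections import Counter

def summarize(csv_file, top_n=3):
    """Count crimes by type, homicides by year, and crimes per community area."""
    by_type, homicides_by_year, by_community = Counter(), Counter(), Counter()
    for row in csv.DictReader(csv_file):
        by_type[row["Primary Type"]] += 1
        if row["Primary Type"] == "HOMICIDE":
            homicides_by_year[row["Year"]] += 1
        if row["Community Area"]:  # some records have no community area
            by_community[row["Community Area"]] += 1
    return by_type, homicides_by_year, by_community.most_common(top_n)

# Tiny made-up sample in the same layout; real use would open the downloaded CSV.
SAMPLE = """Primary Type,Year,Community Area
THEFT,2014,25
HOMICIDE,2014,25
BATTERY,2013,8
HOMICIDE,2013,25
THEFT,2013,8
NARCOTICS,2014,
"""
by_type, homicides, dangerous = summarize(io.StringIO(SAMPLE), top_n=2)
```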
Questions?
Vijay Mandava: vijay@insemble.com
Lan Jiang: lan@insemble.com / @Lan_Jiang


