1. Essential Tools For Your Big Data Arsenal
Matt Asay (@mjasay)
VP, Business Development & Strategy, MongoDB
2. The Big Data Unknown
3. Top Big Data Challenges?
Translation? Most struggle to know what Big Data is, how to manage it and who can manage it.
Source: Gartner
4. Understanding Big Data: It’s Not Very “Big”
• 64% ingest diverse, new data in real time
• 15% have more than 100TB of data
• 20% have less than 100TB (average of all? <20TB)
Source: Big Data Executive Summary, a survey of 50+ top executives from government and Fortune 500 firms
5. Innovation As Iteration
6. “I have not failed. I've just found 10,000 ways that won't work.”
― Thomas A. Edison
7. Back in 1970… Cars Were Great!
8. So Were Computers!
9. Lots of Great Innovations Since 1970
10. Including the Relational Database
11. RDBMS Makes Development Hard
[Diagram layers: Application, Code, XML Config, Object Relational Mapping, DB Schema, Relational Database]
12. And Even Harder To Iterate
[Diagram: the schema keeps growing (New Table, New Column, New Table, New Column) as fields Name, Pet, Phone and Email are added; caption: "3 months later…"]
13. From Complexity to Simplicity
RDBMS vs. MongoDB:
{
  _id: ObjectId("4c4ba5e5e8aabf3"),
  employee_name: "Dunham, Justin",
  department: "Marketing",
  title: "Product Manager, Web",
  report_up: "Neray, Graham",
  pay_band: "C",
  benefits: [
    { type: "Health", plan: "PPO Plus" },
    { type: "Dental", plan: "Standard" }
  ]
}
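To make the document model concrete, here is a minimal plain-Python sketch, assuming Python dicts stand in for MongoDB documents (no database or driver is involved). It queries the embedded benefits array directly and then appends a second, hypothetical employee record that carries an email field the first one lacks, with no schema migration step. The _id field is omitted because ObjectId is not a Python built-in.

```python
# Sketch only: plain dicts stand in for MongoDB documents; no driver is used.
employees = [
    {
        "employee_name": "Dunham, Justin",
        "department": "Marketing",
        "title": "Product Manager, Web",
        "report_up": "Neray, Graham",
        "pay_band": "C",
        "benefits": [
            {"type": "Health", "plan": "PPO Plus"},
            {"type": "Dental", "plan": "Standard"},
        ],
    },
]


def plans_of_type(docs, benefit_type):
    """Return (employee_name, plan) pairs for embedded benefits of one type."""
    return [
        (doc["employee_name"], benefit["plan"])
        for doc in docs
        for benefit in doc.get("benefits", [])
        if benefit["type"] == benefit_type
    ]


# Iterating on the model: a hypothetical new record adds an "email" field.
# Nothing else has to change; there is no ALTER TABLE or migration step.
employees.append(
    {
        "employee_name": "Doe, Jane",  # hypothetical employee
        "department": "Engineering",
        "email": "jane.doe@example.com",
        "benefits": [{"type": "Health", "plan": "HMO"}],
    }
)

print(plans_of_type(employees, "Health"))  # [('Dunham, Justin', 'PPO Plus'), ('Doe, Jane', 'HMO')]
print(plans_of_type(employees, "Dental"))  # [('Dunham, Justin', 'Standard')]
```

Against a real MongoDB collection, the same lookup would filter on the dotted path benefits.type (for example, find({"benefits.type": "Health"}) in PyMongo); treat that as a pointer to verify against the driver documentation rather than a tested call.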
14. So… Use Open Source
15. Big Data != Big Upfront Payment
16. RDBMS Is Expensive To Scale
“Clients can also opt to run zEC12 without a raised datacenter floor -- a first for high-end IBM mainframes.”
(IBM press release, 28 Aug 2012)
17. Spoiled for choice
DB-Engines.com Database Ranking

Rank  DBMS                  Database model        Score     Change
1     Oracle                Relational DBMS       1583.84   54.23
2     MySQL                 Relational DBMS       1331.34   25.58
3     Microsoft SQL Server  Relational DBMS       1207      -106.78
4     PostgreSQL            Relational DBMS       177.01    -5.22
5     DB2                   Relational DBMS       175.83    3.58
6     MongoDB               NoSQL Document Store  149.48    -2.71
7     Microsoft Access      Relational DBMS       142.49    -4.21
8     SQLite                Relational DBMS       77.88     -4.9
9     Sybase                Relational DBMS       73.66     -1.68
10    Teradata              Relational DBMS       54.41     3.32
18. Remember the Long Tail?
19. It Didn’t Work Out So Well
20. Use Popular, Well-Known Technologies
Source: Silicon Angle, 2012
21. Ask the Right Questions…
“Organizations already have people who know their own data better than mystical data scientists…. Learning Hadoop [or MongoDB] is easier than learning the company’s business.” (Gartner, 2012)
22. Leverage Existing Skills
23. Search as a Sign?
24. When To Use Hadoop, NoSQL
25. Enterprise Big Data Stack
• Applications: CRM, ERP, Collaboration, Mobile, BI
• Data Management: Online Data (RDBMS, RDBMS); Offline Data (Hadoop, EDW)
• Infrastructure: OS & Virtualization, Compute, Storage, Network
• Security & Auditing
• Management & Monitoring
26. Consideration: Online vs. Offline
Online
• Real-time
• Low-latency
• High availability
vs.
Offline
• Long-running
• High-latency
• Availability is lower priority
27. Consideration: Online vs. Offline
Online vs. Offline
28. Hadoop Is Good for…
• Risk Modeling
• Recommendation Engine
• Ad Targeting
• Transaction Analysis
• Trade Surveillance
• Network Failure Prediction
• Churn Analysis
• Search Quality
• Data Lake
29. MongoDB/NoSQL Is Good for…
• 360° View of the Customer
• Fraud Detection
• User Data Management
• Content Management & Delivery
• Reference Data
• Product Catalogs
• Mobile & Social Apps
• Machine to Machine Apps
• Data Hub
30. How To Use The Two Together?
31. Finding Waldo
32. Customer example: Online Travel
Travel
• Flights, hotels and cars
• Real-time offers
• User profiles, reviews
• User metadata (previous purchases, clicks, views)
MongoDB Connector for Hadoop
Algorithms
• User segmentation
• Offer recommendation engine
• Ad serving engine
• Bundling engine
33. Predictive Analytics
Government (MongoDB + Hadoop)
• Predictive analytics system for crime and health issues
• Diverse, unstructured (incl. geospatial) data from 30+ agencies
• Correlate data in real time
Algorithms
• Long-form trend analysis
• MongoDB data dumped into Hadoop, analyzed, and re-inserted into MongoDB for better real-time response
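The round trip on this slide (MongoDB data dumped into Hadoop, analyzed, re-inserted into MongoDB) can be sketched in plain Python, assuming an in-memory dict plays the operational store and a Counter plays the long-running batch job. The record fields (agency, category, month) and the trends collection are hypothetical; in a real deployment the export and write-back would go through tooling such as the MongoDB Connector for Hadoop rather than these functions.

```python
from collections import Counter

# Hypothetical operational store: raw incident records plus a results area.
online_store = {
    "incidents": [
        {"agency": "police", "category": "burglary", "month": "2012-11"},
        {"agency": "police", "category": "burglary", "month": "2012-12"},
        {"agency": "health", "category": "flu", "month": "2012-12"},
        {"agency": "police", "category": "burglary", "month": "2012-12"},
    ],
    "trends": {},
}


def export_snapshot(store):
    """Step 1: dump a copy of the operational data for batch processing."""
    return [dict(doc) for doc in store["incidents"]]


def analyze(snapshot):
    """Step 2: the batch job; here it just counts incidents per (category, month)."""
    return Counter((doc["category"], doc["month"]) for doc in snapshot)


def reinsert(store, counts):
    """Step 3: write results back, keyed by category, for fast online reads."""
    for (category, month), n in counts.items():
        store["trends"].setdefault(category, {})[month] = n


reinsert(online_store, analyze(export_snapshot(online_store)))

# Online read path: a single key lookup on precomputed results, not a scan.
print(online_store["trends"]["burglary"])  # {'2012-11': 1, '2012-12': 2}
```

The design point is the split of work: the expensive scan runs offline, where latency matters less, while the online read path becomes a key lookup on results the batch job already computed.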
34. Data Hub: Churn Analysis
Insurance
• Insurance policies
• Demographic data
• Customer web data
• Call center data
• Real-time churn detection
MongoDB Connector for Hadoop
• Customer action analysis
• Churn prediction algorithms
35. Machine Learning: Ad-Serving
• Catalogs and products
• User profiles
• Clicks
• Views
• Transactions
MongoDB Connector for Hadoop
Algorithms
• User segmentation
• Recommendation engine
• Prediction engine
36. MongoDB + Hadoop Connector
• Makes MongoDB a Hadoop-enabled file system
• Read and write to live data, in place
• Copy data between Hadoop and MongoDB
• Full support for data processing:
  – Hive
  – MapReduce
  – Pig
  – Streaming
  – EMR
@mjasay
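The connector's MapReduce support follows the usual map, shuffle and reduce phases. The sketch below simulates those phases in plain Python over click events like the ones listed on the ad-serving slide; it does not use the connector's or Hadoop's API, and the event fields (user, event, product_id) are hypothetical.

```python
from collections import defaultdict


def map_phase(docs):
    """Map: emit (product_id, 1) for every click event."""
    for doc in docs:
        if doc.get("event") == "click":
            yield doc["product_id"], 1


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical events, shaped like documents read from MongoDB.
events = [
    {"user": "u1", "event": "click", "product_id": "p1"},
    {"user": "u2", "event": "view", "product_id": "p1"},
    {"user": "u2", "event": "click", "product_id": "p2"},
    {"user": "u3", "event": "click", "product_id": "p1"},
]

clicks_per_product = reduce_phase(shuffle(map_phase(events)))
print(clicks_per_product)  # {'p1': 2, 'p2': 1}
```

In a real cluster the three phases run on many machines and the framework handles the shuffle; keeping map and reduce as pure functions of their inputs is what lets it do that.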

Your Big Data Arsenal - Strata 2013


Editor's Notes

  • #7: Big Data is new, and you’re likely going to fail as you start. It’s also almost guaranteed that you won’t know which data to capture, or how to leverage it, without trial and error. So if you were to “design for failure,” what key things would you need? You’d need to reduce the cost of failure, in both time and money. And you’d need to build on data infrastructure that supports your iterations toward success and then rewards you by making it easy and cost-effective to scale.
  • #8: IBM designed IMS with Rockwell and Caterpillar starting in 1966 for the Apollo program. IMS's challenge was to inventory the very large bill of materials (BOM) for the Saturn V moon rocket and Apollo space vehicle.
  • #9: Loading a paper tape reader on the KDF9 computer.
  • #10: IBM designed IMS with Rockwell and Caterpillar starting in 1966 for the Apollo program. IMS's challenge was to inventory the very large bill of materials (BOM) for the Saturn V moon rocket and Apollo space vehicle.
  • #14: This is helpful because as much as 95% of enterprise information is unstructured, and doesn’t fit neatly into tidy rows and columns. NoSQL and Hadoop allow for dynamic schema.
  • #21: The industry is talking about Hadoop and MongoDB for Big Data. So should you.
  • #23: Why not HBase? MongoDB is dramatically more popular and much easier to use, and it works from small scale to large scale. It is far closer to the functionality available in an RDBMS, including geospatial queries, secondary indexes and text search. And 40+ languages mean you can work in your preferred programming language.
  • #24: The industry is not betting on RDBMS for Big Data. Neither should you.
  • #26: This is where MongoDB fits into the existing enterprise IT stack. MongoDB is an operational data store used for online data, in the same way that Oracle is an operational data store. It supports applications that ingest, store, manage and even analyze data in real time. (Compare Hadoop and data warehouses, which are used for offline, batch analytical workloads.)
  • #28: Or… not everyone would agree with the term “offline” Big Data.
  • #29: What each of these has in common is that they’re retrospective: they’re about looking at the past to help predict the future. The learnings from these Hadoop applications end up being applied by a different technology. This is where MongoDB comes in.
  • #32: Marketing has long broken people down into segments (Hadoop: the user base as a whole), while new marketing needs to focus on individuals (the user base as individual users). Criteo: if you're only optimizing on aggregate data, you're missing out on personalization, but if you only look at the individual, you're missing out on patterns across your entire user base. You want to do both, but the systems needed to do this on a single data set tend to be architecturally at odds with each other. One dataset, living in two different databases that each take care of different processing needs on the data. For Hadoop/column-level processing to be useful, you want lots of columns, so your real-time data store needs to be very rich or you'll lose information. You don't want to simplify the data from the start by putting it into an RDBMS or key-value store. As a specific example, people talk about using Hadoop for log file analysis. Log files coming out of your web server lack a lot of context: they carry an IP address, a time stamp and so on, but nothing about what the logged request actually means (for example, that this was the "watch movie" button). You can keep a much richer version of that interaction in a document database that describes what is actually happening in that log entry.
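The log-file point can be illustrated with a small Python sketch, assuming a web server that writes the Common Log Format. Parsing the line recovers only an IP address, a timestamp, the request and a status; what the request meant has to come from somewhere else. The document version at the end is hypothetical and only shows where that context (the "watch movie" button, the user, the title) would live.

```python
import re

# A made-up web-server access log line in Common Log Format.
LOG_LINE = '203.0.113.7 - - [10/Oct/2012:13:55:36 -0700] "POST /api/play HTTP/1.1" 200 512'

CLF = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+)'
)


def parse_log(line):
    """Return the log fields as a dict, or None if the line does not match."""
    match = CLF.match(line)
    return match.groupdict() if match else None


parsed = parse_log(LOG_LINE)
# The log line yields only transport-level facts.
print(sorted(parsed))  # ['ip', 'method', 'path', 'size', 'status', 'ts']

# The same interaction kept as a document can carry the missing context.
# All field names and values below are hypothetical.
event_doc = {
    "ip": parsed["ip"],
    "ts": parsed["ts"],
    "action": "watch_movie",
    "ui_element": "watch movie button",
    "user": {"id": "u42", "plan": "premium"},
    "movie": {"id": "m7", "genre": "documentary"},
}
print(event_doc["action"])  # watch_movie
```

Nothing in the parsed log says which button was pressed or for which title; those fields exist only because the application recorded them at the moment of the interaction.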