MongoDB & Hadoop:
Providing Business Insights
Thomas Boyd
Senior Solutions Architect, MongoDB
What is MongoDB?
The leading NoSQL database

• General purpose
• Document database
• Open source
MongoDB Document Model
[Figure: RDBMS vs. MongoDB]
{
  _id : ObjectId("4c4ba5e5e8aabf3"),
  employee_name : "Dunham, Justin",
  department : "Marketing",
  title : "Product Manager, Web",
  report_up : "Neray, Graham",
  pay_band : "C",
  benefits : [
    { type : "Health", plan : "PPO Plus" },
    { type : "Dental", plan : "Standard" }
  ]
}
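A minimal sketch of how an application might read the employee document from the Document Model slide, assuming it is held as a plain Python dictionary. The `_id` is kept as a string purely for illustration, and the helper name `plans_by_type` is hypothetical.

```python
# The employee document from the slide, as a plain Python dict.
# Assumption: _id is shown as a string; a real driver would use an ObjectId type.
employee = {
    "_id": "4c4ba5e5e8aabf3",
    "employee_name": "Dunham, Justin",
    "department": "Marketing",
    "title": "Product Manager, Web",
    "report_up": "Neray, Graham",
    "pay_band": "C",
    "benefits": [
        {"type": "Health", "plan": "PPO Plus"},
        {"type": "Dental", "plan": "Standard"},
    ],
}


def plans_by_type(doc):
    """Map each embedded benefit type to its plan (hypothetical helper)."""
    return {b["type"]: b["plan"] for b in doc.get("benefits", [])}


print(plans_by_type(employee))
```

Because the benefits are embedded in the document, the lookup needs no join against a separate table.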
What is Hadoop?
“The Apache Hadoop software library is a framework that
allows for the distributed processing of large data sets
across clusters of computers using simple programming
models.”*
• Large datasets
• Analytics
• Batch
• Map-Reduce
*source: hadoop.apache.org
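The Map-Reduce model named on the Hadoop slide can be sketched in a few lines of plain Python. This is an in-memory illustration only, not Hadoop code; the function names and the sample records are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(records):
    """Emit one (key, 1) pair per record, keyed by department."""
    for record in records:
        yield record["department"], 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical sample data
records = [
    {"employee_name": "A", "department": "Marketing"},
    {"employee_name": "B", "department": "Sales"},
    {"employee_name": "C", "department": "Marketing"},
]
counts = reduce_phase(shuffle(map_phase(records)))
print(counts)
```

In a real cluster the map and reduce steps run in parallel across machines, which is what makes the model suitable for large, batch-oriented datasets.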
Enterprise IT Stack
• Applications: CRM, ERP, Collaboration, Mobile, BI
• Data Management
  – Online Data: RDBMS
  – Offline Data: Hadoop, EDW
• Infrastructure: OS & Virtualization, Compute, Storage, Network
• Security & Auditing
• Management & Monitoring
Consideration: Online vs. Offline
Online
• Real-time
• Low-latency
• High availability
Offline
• Long-running
• High-latency
• Availability is lower priority
Hadoop is good for…

• Risk Modeling
• Recommendation Engine
• Ad Targeting
• Transaction Analysis
• Trade Surveillance
• Network Failure Prediction
• Churn Analysis
• Search Quality
• Data Lake
MongoDB is good for…

• 360 Degree View of the Customer
• Fraud Detection
• User Data Management
• Content Management & Delivery
• Reference Data
• Product Catalogs
• Mobile & Social Apps
• Machine to Machine Apps
• Data Hub
MongoDB and Hadoop: Complementary

MongoDB
• Real-time systems
• Light-weight analytical workloads
Hadoop
• “Data Lake”
• In-depth analytics
Use MongoDB+Hadoop Together

ECommerce
• Products & Inventory
• Real-time recommendations
• Customer profile
• Session management
• Customer clickstream
• Fraud detection
MongoDB Connector for Hadoop
Analysis
• Transaction history
• Clickstream history
• Recommendation model
• Fraud modeling
Example – Fraud Detection

[Diagram components]
• Payments: online payments processing
• MongoDB Connector for Hadoop
• Nightly Analysis: fraud modeling
• Fraud Detection (query only)
• Results Cache
• 3rd Party Data Sources
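One possible reading of the fraud detection diagram, sketched in plain Python: a nightly batch job builds risk scores from payment history and third-party data, and the online path only queries the precomputed results. The scoring rule, thresholds, field names, and function names are all assumptions for illustration; the slide does not specify them.

```python
def nightly_fraud_model(payments, third_party_flags):
    """Batch step: derive a risk score per customer (toy rule, illustrative only)."""
    totals = {}
    for payment in payments:
        customer = payment["customer"]
        totals[customer] = totals.get(customer, 0.0) + payment["amount"]

    scores = {}
    for customer, total in totals.items():
        score = 0.5 if total > 1000 else 0.1  # hypothetical volume rule
        if customer in third_party_flags:     # hypothetical external signal
            score += 0.4
        scores[customer] = score
    return scores


def is_suspicious(results_cache, customer, threshold=0.8):
    """Online step: a query-only lookup against the precomputed results."""
    return results_cache.get(customer, 0.0) >= threshold


# Hypothetical sample data
payments = [
    {"customer": "c1", "amount": 1200.0},
    {"customer": "c2", "amount": 40.0},
    {"customer": "c1", "amount": 300.0},
]
results_cache = nightly_fraud_model(payments, third_party_flags={"c1"})
print(is_suspicious(results_cache, "c1"), is_suspicious(results_cache, "c2"))
```

Keeping the online check to a cache lookup is what lets the payment path stay low-latency while the heavier modeling runs offline.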
Customer example – Global Travel Firm
Travel
• Flights, hotels and cars
• Real-time offers
• User profiles, reviews
• User metadata (previous purchases, clicks, views)
MongoDB Connector for Hadoop
Algorithms
• User segmentation
• Offer recommendation engine
• Ad serving engine
• Bundling engine
Customer example – MetLife

Insurance
• Insurance policies
• Demographic data
• Customer web data
• Call center data
• Real-time churn detection
MongoDB Connector for Hadoop
Churn Analysis
• Customer action analysis
• Churn prediction algorithms
Customer example – Criteo

Ad-Serving
• Catalogs and products
• User profiles
• Clicks
• Views
• Transactions
MongoDB Connector for Hadoop
Algorithms
• User segmentation
• Recommendation engine
• Prediction engine
What is the MongoDB-Hadoop Connector?
• Java Map-Reduce, Streaming Map-Reduce, Pig, & Hive access to MongoDB
  – MongoDB as input
    • mongo.job.input.format=com.mongodb.hadoop.MongoInputFormat
    • mongo.input.uri=mongodb://my-db:27017/db1.collection1
  – MongoDB as output
    • mongo.job.output.format=com.mongodb.hadoop.MongoOutputFormat
    • mongo.output.uri=mongodb://my-db:27017/db1.collection2
  – Using MongoDB backup files
    • mongo.job.output.format=com.mongodb.hadoop.BSONFileOutputFormat
    • mapred.output.dir=file:///results.bson
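The connector settings above can be assembled programmatically. The sketch below is a hypothetical helper, not part of the connector: only the property names and the `com.mongodb.hadoop.*` class names are the connector's own, while `connector_properties`, `as_cli_args`, and the rule that a job picks one output target are assumptions for the example.

```python
def connector_properties(input_uri=None, output_uri=None, bson_output_dir=None):
    """Build Hadoop job properties for the MongoDB Connector for Hadoop.

    Hypothetical helper: the keys and class names are the connector's,
    the function itself is illustrative.
    """
    if output_uri and bson_output_dir:
        raise ValueError("choose either a MongoDB output URI or a BSON output dir")

    props = {}
    if input_uri:
        props["mongo.job.input.format"] = "com.mongodb.hadoop.MongoInputFormat"
        props["mongo.input.uri"] = input_uri
    if output_uri:
        props["mongo.job.output.format"] = "com.mongodb.hadoop.MongoOutputFormat"
        props["mongo.output.uri"] = output_uri
    if bson_output_dir:
        props["mongo.job.output.format"] = "com.mongodb.hadoop.BSONFileOutputFormat"
        props["mapred.output.dir"] = bson_output_dir
    return props


def as_cli_args(props):
    """Render properties as '-D key=value' argument pairs for a Hadoop job."""
    args = []
    for key, value in sorted(props.items()):
        args.extend(["-D", f"{key}={value}"])
    return args


props = connector_properties(
    input_uri="mongodb://my-db:27017/db1.collection1",
    output_uri="mongodb://my-db:27017/db1.collection2",
)
print(" ".join(as_cli_args(props)))
```

Note that input and output use different URI keys (`mongo.input.uri` and `mongo.output.uri`), so a single job can read from one collection and write to another.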
Enhancing MongoDB-Hadoop Connector
• Version 1.1.0, July 2013
• Version 1.2.0, December 2013
  – Pig support
  – Apache Hadoop 2.2 support
  – Hive support
  – Multiple collections as M-R source
  – Streaming support
  – Read/Write MongoDB backups
  – Update writes
  – Custom splitting support
  – Multiple mongos support
  – Performance improvements
  – Much more…
MongoDB Native Analytics
• Rich query language
• Native secondary indexes
• Geospatial indexes & search
• Text indexes & search
• Aggregation framework
• JavaScript Map-Reduce
• Client-side analytics
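As an illustration of the aggregation framework item above, here is a pipeline written as plain data, followed by a pure-Python equivalent so the logic can be checked without a database. The pipeline assumes a collection of documents shaped like the employee document on the Document Model slide; the second sample document and the function name `count_by_plan` are hypothetical.

```python
from collections import Counter

# Aggregation pipeline as it could be passed to a driver's aggregate() call.
# Assumption: documents carry an embedded "benefits" array as on the earlier slide.
pipeline = [
    {"$unwind": "$benefits"},
    {"$group": {"_id": "$benefits.plan", "employees": {"$sum": 1}}},
]


def count_by_plan(docs):
    """Pure-Python equivalent of the pipeline above, for illustration."""
    counts = Counter()
    for doc in docs:
        for benefit in doc.get("benefits", []):  # corresponds to $unwind
            counts[benefit["plan"]] += 1         # corresponds to $group with $sum: 1
    return dict(counts)


docs = [
    {"employee_name": "Dunham, Justin",
     "benefits": [{"type": "Health", "plan": "PPO Plus"},
                  {"type": "Dental", "plan": "Standard"}]},
    {"employee_name": "Hypothetical, Employee",  # invented sample record
     "benefits": [{"type": "Dental", "plan": "Standard"}]},
]
print(count_by_plan(docs))
```

Light-weight rollups of this kind can run inside MongoDB, while heavier, long-running analysis is the case the deck assigns to Hadoop.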
Resources
• White paper: Big Data: Examples and Guidelines for the Enterprise Decision Maker
  http://www.mongodb.com/lp/whitepaper/big-data-nosql
• Recorded Webinar Series: Thrive with Big Data
  http://www.mongodb.com/lp/bigdata-series
• Recorded Webinar: What’s New with MongoDB Hadoop Integration
  http://www.mongodb.com/presentations/webinar-whats-newmongodb-hadoop-integration
• Documentation: MongoDB Connector for Hadoop
  http://docs.mongodb.org/ecosystem/tools/hadoop/
• Trouble Tickets
  http://jira.mongodb.org (project = Hadoop Integration)
• Subscriptions, support, consulting, training
  https://www.mongodb.com/products/how-to-buy
Webinar: MongoDB and Hadoop - Working Together to provide Business Insights

Editor's Notes

  • #6: This is where MongoDB fits into the existing enterprise IT stack. MongoDB is an operational data store used for online data, in the same way that Oracle is an operational data store. It supports applications that ingest, store, manage and even analyze data in real time. (Compare this with Hadoop and data warehouses, which are used for offline, batch analytical workloads.)