Using MongoDB for BigData
May, 2017
... In 20 minutes
Agenda
• What is MongoDB?
• Replica Sets
• Sharding and Sharded Cluster
• MapReduce
• Spark and MongoDB
What is MongoDB?
• Open-source
• NoSQL
• Community / Enterprise versions
• Developed by MongoDB Inc. (formerly 10gen) in C++, C and JavaScript
• Cross-platform: Windows, Linux, OS X, Solaris, FreeBSD
• Document-oriented: stores documents as BSON (Binary JSON), an extended binary form of JSON
• Stores large binary files such as videos and pictures in GridFS
• Database development in JavaScript (standard libraries and user-defined functions)
• Deploy, monitor, back up and scale MongoDB: Ops Manager
• Use MongoDB as a data source for your SQL-based BI tools: MongoDB Connector for BI, SlamData
• Cross-platform UI for development: Robomongo
• Hosted MongoDB as a service: MongoDB Atlas
• Hosted platform for managing MongoDB: MongoDB Cloud Manager
• Another cloud provider: mLab
Some Organizations Rely on MongoDB
DB-Engines.com Top 10: MongoDB is the most popular NoSQL database
Terminology
RDBMS            MongoDB
Database         Database
Table            Collection
Row              Document
Index            Index
Join             Embedding
Partition        Shard
Partition Key    Shard Key
Some Examples
use blogdb
db.blog.insert({
    "title" : "My Blog Post",
    "content" : "Here's my blog post.",
    "date" : ISODate("2016-08-24T21:12:09.982Z")
});
db.blog.find()
{ "_id" : ObjectId("591be0f4c79fa21e08c2e24e"), "title" : "My Blog Post", "content" : "Here's my blog post.", "date" : ISODate("2016-08-24T21:12:09.982Z") }
• db: the current database object (selected with use blogdb)
• blog: the collection object
• insert(), find(): methods of the collection
• _id: the primary key, generated automatically as a unique ObjectId
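The generated _id carries more than uniqueness: the first 4 bytes of an ObjectId are a big-endian timestamp in seconds since the Unix epoch. The sketch below is plain JavaScript for Node.js (not the mongo shell) and decodes only those 4 bytes; the function name objectIdCreatedAt is hypothetical.

```javascript
// Return the creation time encoded in an ObjectId given as a 24-char hex string.
function objectIdCreatedAt(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hexadecimal ObjectId");
  }
  // First 4 bytes (8 hex digits): big-endian seconds since the Unix epoch.
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

console.log(objectIdCreatedAt("591be0f4c79fa21e08c2e24e").toISOString());
// prints 2017-05-17T05:34:44.000Z
```

The result is when the ObjectId was generated, which is independent of the user-supplied date field (2016-08-24) stored in the same document.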
Some Examples
db.blog.update(
    {"_id" : ObjectId("591be0f4c79fa21e08c2e24e")},
    {$push: {"comments":
        {"user" : "usr1", "date" : ISODate("2016-08-24T22:12:09.982Z"), "text" : "first comment"}
    }}
)
db.blog.find().pretty()
{
    "_id" : ObjectId("591be0f4c79fa21e08c2e24e"),
    "title" : "My Blog Post",
    "content" : "Here's my blog post.",
    "date" : ISODate("2016-08-24T21:12:09.982Z"),
    "comments" : [
        {
            "user" : "usr1",
            "date" : ISODate("2016-08-24T22:12:09.982Z"),
            "text" : "first comment"
        }
    ]
}
• Embedding: the comments array lives inside the blog post document, so no join is needed to read a post together with its comments.
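The Terminology slide pairs Join with Embedding. As an illustration only, assuming hypothetical relational-style posts and comments tables linked by a post_id column, the plain JavaScript (Node.js) sketch below folds the comment rows into each post, producing the same shape that the $push example builds inside MongoDB.

```javascript
// Group comment rows by their (hypothetical) post_id foreign key and embed
// them as a "comments" array in each post document.
function embedComments(posts, comments) {
  const byPost = new Map();
  for (const { post_id, ...comment } of comments) {
    if (!byPost.has(post_id)) byPost.set(post_id, []);
    // The foreign key is dropped: inside the post document it is implied.
    byPost.get(post_id).push(comment);
  }
  // Posts without comments get an empty array instead of a missing field.
  return posts.map(post => ({ ...post, comments: byPost.get(post._id) || [] }));
}

// Hypothetical sample rows.
const posts = [
  { _id: 1, title: "My Blog Post", content: "Here's my blog post." },
  { _id: 2, title: "Second Post", content: "No comments yet." },
];
const comments = [{ post_id: 1, user: "usr1", text: "first comment" }];

console.log(JSON.stringify(embedComments(posts, comments), null, 2));
```

Embedding pays off when comments are read together with their post; an array that grows without bound eventually runs into the 16 MB BSON document size limit, at which point a separate collection is the usual alternative.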
MapReduce Example
orders collection:
{
    _id: ObjectId("50a8240b927d5d8b5891743c"),
    cust_id: "abc123",
    ord_date: new Date("Oct 04, 2012"),
    status: 'A',
    price: 25,
    items: [ { sku: "mmm", qty: 5, price: 2.5 },
             { sku: "nnn", qty: 5, price: 2.5 } ]
}
// map: emit one (cust_id, price) pair per order document
var mapFunction1 = function() {
    emit(this.cust_id, this.price);
};
// reduce: sum all prices emitted for the same customer
var reduceFunction1 = function(keyCustId, valuesPrices) {
    return Array.sum(valuesPrices);
};
// write one { _id: cust_id, value: total } document per customer to map_reduce_example
db.orders.mapReduce(
    mapFunction1,
    reduceFunction1,
    { out: "map_reduce_example" }
)
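To show what db.orders.mapReduce() computes, here is an in-memory sketch of its phases (map, group by key, reduce) in plain JavaScript for Node.js rather than the mongo shell. The function names and the three sample orders are hypothetical, and the sketch does not model MongoDB's output collection or its parallel execution.

```javascript
// In-memory sketch of map -> group by key -> reduce.
function runMapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> list of emitted values
  for (const doc of docs) {
    // The mongo shell binds `this` to the document and provides a global
    // emit(); this sketch passes the document and emit as arguments instead.
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  const out = [];
  for (const [key, values] of groups) {
    // Like MongoDB, skip reduce for keys that emitted a single value.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    out.push({ _id: key, value });
  }
  return out;
}

// Counterparts of mapFunction1 and reduceFunction1.
const mapPrice = (order, emit) => emit(order.cust_id, order.price);
const sumPrices = (custId, prices) => prices.reduce((a, b) => a + b, 0);

// Hypothetical sample orders (only the fields the map function reads).
const orders = [
  { cust_id: "abc123", price: 25 },
  { cust_id: "abc123", price: 40 },
  { cust_id: "xyz789", price: 10 },
];
console.log(runMapReduce(orders, mapPrice, sumPrices)); // abc123 -> 65, xyz789 -> 10
```

A real mapReduce job may call reduce again on partial results, so the reduce function must return the same type of value that map emits; summing prices satisfies that.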
Replica Sets: high availability
[Diagram: the primary replicates to the secondaries; reads from the secondaries are optional.]
Replica sets: automatic failover
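Automatic failover relies on an election, and a new primary can be elected only while a strict majority of the voting members can reach each other. The plain JavaScript sketch below (function names hypothetical) computes that majority and the resulting fault tolerance; it models only the vote arithmetic, not MongoDB's actual election protocol.

```javascript
// Smallest strict majority of the voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members may be unavailable while a primary can still be elected.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

function canElectPrimary(votingMembers, reachableVoters) {
  return reachableVoters >= majority(votingMembers);
}

for (const n of [3, 4, 5]) {
  console.log(`${n} members: majority ${majority(n)}, tolerates ${faultTolerance(n)} failure(s)`);
}
```

A 4-member set still tolerates only one failure, the same as a 3-member set, which is why an odd number of voting members is usually recommended.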
Sharded Cluster
Shards are replica sets
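Sharding
• Ranged: documents are split into contiguous ranges of shard key values
• Hashed: documents are distributed by ranges of a hash of the shard key value
• Zone: manually associate shard key ranges with zones (groups of shards)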
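To make the three strategies concrete, here is a toy router in plain JavaScript. Everything in it is an assumption for illustration: the chunk boundaries and the shard and zone names are made up, and the hashed case uses a 32-bit FNV-1a hash as a stand-in for the hash MongoDB actually computes. In a real cluster, mongos does the routing using chunk metadata stored on the config servers.

```javascript
// Return the shard owning the half-open range [min, max) that contains value.
function findChunk(chunks, value) {
  const chunk = chunks.find(c => value >= c.min && value < c.max);
  if (!chunk) throw new Error(`no chunk covers ${value}`);
  return chunk.shard;
}

// Ranged: contiguous ranges of a numeric shard key (hypothetical boundaries).
const rangedChunks = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 5000, shard: "shardB" },
  { min: 5000, max: Infinity, shard: "shardC" },
];
const routeRanged = key => findChunk(rangedChunks, key);

// Stand-in 32-bit FNV-1a hash; NOT the hash function MongoDB uses.
function fnv1a32(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // keep as unsigned 32-bit
  }
  return h;
}

// Hashed: ranges over the hash space, so adjacent keys scatter across shards.
const hashedChunks = [
  { min: 0, max: 2 ** 31, shard: "shardA" },
  { min: 2 ** 31, max: 2 ** 32, shard: "shardB" },
];
const routeHashed = key => findChunk(hashedChunks, fnv1a32(String(key)));

// Zone: shard key ranges pinned to zones, i.e. to groups of shards.
const zones = [
  { min: -Infinity, max: 5000, zone: "EU", shards: ["shardA", "shardB"] },
  { min: 5000, max: Infinity, zone: "US", shards: ["shardC"] },
];
function zoneFor(key) {
  const z = zones.find(r => key >= r.min && key < r.max);
  return z ? z.zone : null;
}

console.log(routeRanged(42), routeRanged(1000), routeRanged(7500)); // shardA shardB shardC
console.log(routeHashed(1000), routeHashed(1001));
console.log(zoneFor(1200)); // EU
```

Ranged routing keeps range queries on few shards but can concentrate monotonically increasing keys on one shard; hashing spreads such keys evenly at the cost of scattering range queries across shards.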
MongoDB Connector for Apache Spark
• The MongoDB side can be a sharded cluster too!
• Data can be filtered and aggregated at the MongoDB level before it is sent to Spark.
Summary
• Speedy
• Highly available
• Flexible data model
• Simple to use
• Virtually unlimited data size through horizontal scaling (sharding)
BUT
• A sharded cluster deployment requires planning!
Homework
• Install a MongoDB server, or sign up for a free hosted MongoDB service such as the mLab sandbox.
• Unzip postcodes.zip and load the data file with the mongoimport utility. If you use a hosted MongoDB service, you will need to install the MongoDB client tools on your machine first.
• Create B-tree indexes on the place.name, postal_code and place.country fields, and a compound index on place.name + place.country.
• Create a 2dsphere index on place.loc.
• Add the {"postal_code" : "38116", "place" : { "name" : "Graceland", "country" : "US", "state" : "Memphis", "loc" : [ 19.0419, 47.5328 ] } } document to the collection.
• Change the place.loc field of the same document to [-90.02604930000001, 35.0476912].
• Add the field owner: "Lisa Marie Presley" to the same document. Observe that the structure of this document now differs from the other documents in the collection.
Send me the queries that answer the following questions:
• What is the postal code of Graceland/Memphis? The result must be exactly {"postal_code" : "38116"}; any field other than postal_code is not acceptable!
• How many postal codes are in Budapest, Hungary?
• When was the ObjectId "59199cdff0269ea12235e9dc" created?
• What are the top 5 countries by number of documents, in descending order?
• Which places are within 20 km of longitude -90.02604930000001, latitude 35.0476912 (Graceland)? Sort the result alphabetically and list each place only once (distinct).
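For the 20 km question: $centerSphere takes its radius in radians, i.e. the distance divided by the Earth's radius, and MongoDB's documentation uses 6378.1 km (the equatorial radius). The plain JavaScript sketch below shows that conversion plus a haversine distance for checking results by hand; the function names are hypothetical and, like place.loc, every point is [longitude, latitude].

```javascript
// Earth's equatorial radius in km, as used in MongoDB's docs for km -> radians.
const EARTH_RADIUS_KM = 6378.1;

// $centerSphere expects its radius in radians.
function kmToRadians(km) {
  return km / EARTH_RADIUS_KM;
}

// Great-circle distance between two [longitude, latitude] points (haversine).
function haversineKm([lng1, lat1], [lng2, lat2]) {
  const toRad = deg => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  // Clamp to 1 to guard against floating-point rounding above the asin domain.
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1, Math.sqrt(a)));
}

const graceland = [-90.02604930000001, 35.0476912];
const originalLoc = [19.0419, 47.5328]; // loc value from the document as first inserted

console.log(kmToRadians(20)); // radius for a 20 km $centerSphere
console.log(haversineKm(graceland, originalLoc) > 20); // true
```

The first loc value, [ 19.0419, 47.5328 ], is longitude 19.04 E, latitude 47.53 N, a point in Budapest, which is why the document has to be corrected before a 20 km search around Graceland can find it.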
Questions?
Thank You!
