MongoDB tuning on AWS
Information
•Tweet with the hashtags #jawsdays #ijaws
•Please register for ijaws on Doorkeeper (next meetup in mid-April)
•There is a job board behind the wall
Self-Introduction
•Ryuji Tamagawa@facebook
•tamagawa_ryuji@twitter
•Software developer working in Osaka
•Translator (for O’Reilly)
•Loves performance tuning
Introducing MongoDB
•Hybrid of NoSQL and RDB
•Scales easily (up to a certain point)
•Stores JSON documents as ‘BSON’
•Has secondary indexes (on any part of a JSON document) and a query optimizer
•Replication and sharding ready
To make MongoDB run fast on AWS
•You have to understand:
•Its memory management architecture
•The workload pattern of your application
•The size of your ‘HOT’ data
What is ‘HOT’ data?
•‘Hot’ data is the data that is accessed frequently
•Ex: If you simply write data such as access logs and transfer it somewhere else, the ‘hot’ spot could be very small
•If the collection has indexes, one write can make many places hot
MongoDB does not manage memory
•Most DBMSs have built-in memory management, but MongoDB doesn’t.
•MongoDB accesses its database files through memory-mapped files and lets the OS manage the buffer
(Diagram) Traditional RDB: App, Memory Buffer, DB Files
(Diagram) MongoDB: App, OS, Memory Mapped DB Files
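The memory-mapped access pattern described above can be illustrated with Python's standard `mmap` module. This is only a sketch of the general technique, not MongoDB's own code: the file, the 12-byte record layout, and the function name `read_through_mmap` are all made up for illustration.

```python
import mmap
import os
import tempfile


def read_through_mmap(path, offset, length):
    """Read `length` bytes at `offset` through a memory map instead of read()."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Slicing touches only the pages that cover this byte range.
            # Which pages stay in RAM is decided by the OS, not by this program.
            return mapped[offset:offset + length]


# Build a small stand-in "database file" of 1,000 fixed-size records
# (hypothetical layout: 12 bytes per record).
RECORD_SIZE = 12
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"".join(b"record-%04d;" % i for i in range(1000)))
    db_path = tmp.name

chunk = read_through_mmap(db_path, 42 * RECORD_SIZE, RECORD_SIZE)
print(chunk)

os.remove(db_path)
```

The program never allocates its own buffer pool here. Frequently touched pages stay in the OS page cache, which is why the amount of RAM left to the OS matters so much.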
The Rules of Thumb about Memory
•Give the OS enough memory to hold the ‘HOT’ data
•Don’t forget about the indexes
•Use dedicated EC2 instances
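A rough way to apply these rules is to compare the hot data plus the indexes against the RAM on the instance. The sketch below is a back-of-the-envelope check under assumptions that are not from the slides: the function name, the 20% reserve for the OS and other processes, and the example sizes are all hypothetical. In practice the sizes could come from the `dataSize` and `indexSize` fields of the `dbstats` command.

```python
GIB = 1024 ** 3


def fits_in_memory(hot_data_bytes, index_bytes, ram_bytes, os_reserve_fraction=0.2):
    """Return True if hot data plus indexes fit in the RAM left for the page cache.

    os_reserve_fraction is an assumed share of RAM kept for the OS itself
    and for other processes; tune it for your own environment.
    """
    usable = ram_bytes * (1 - os_reserve_fraction)
    return hot_data_bytes + index_bytes <= usable


# Hypothetical workload: 3 GiB of hot documents and 2 GiB of indexes.
hot, indexes = 3 * GIB, 2 * GIB

on_m3_large = fits_in_memory(hot, indexes, ram_bytes=7.5 * GIB)    # 7.5 GB instance
on_m3_medium = fits_in_memory(hot, indexes, ram_bytes=3.75 * GIB)  # 3.75 GB instance
print(on_m3_large, on_m3_medium)
```

Counting the indexes separately keeps them from being forgotten, and the check only means something when MongoDB is the main consumer of memory on the instance.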
Keep your data safe with Replication
•Using a replica set, you can easily distribute your data to many places
•You have several options for keeping your data safe from crashes
•EBS or instance store: a trade-off between cost, safety, and performance
(Diagram) Primary, Secondary, Secondary
Try a MongoDB replica set with:
https://bitbucket.org/tamagawa_ryuji/mongodb_replicaset_playground_on_vagrant
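As a small illustration of how a client addresses a replica set, the sketch below builds a MongoDB connection string that lists every member and names the set. The host names, the set name `rs0`, and the helper `replica_set_uri` are hypothetical; `replicaSet` and `w` are standard connection string options.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, write_concern="majority"):
    """Build a mongodb:// URI that lists all members of a replica set."""
    # w=majority asks the server to acknowledge a write only after
    # a majority of the members have it.
    query = urlencode({"replicaSet": replica_set, "w": write_concern})
    return "mongodb://" + ",".join(hosts) + "/?" + query


# Hypothetical members: one primary and two secondaries.
members = [
    "db1.example.internal:27017",
    "db2.example.internal:27017",
    "db3.example.internal:27017",
]
uri = replica_set_uri(members, "rs0")
print(uri)
```

The resulting string can be passed to a driver, for example `pymongo.MongoClient(uri)`. The driver then discovers which member is currently the primary.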
Storage Performance Evaluated
• Converted Wikipedia-ja’s page data (about 1,700,000 documents) to JSON
• Wrote the documents to MongoDB on EC2 from another instance
• The data writer is a simple Python application using the pymongo driver, running 4 processes
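The original writer is not shown in the slides. The following is a minimal sketch of what a 4-process pymongo writer could look like. The database and collection names, the batch size, and the round-robin partitioning are all assumptions, and `insert_many` is the bulk-insert call in current pymongo releases, not necessarily what the benchmark used.

```python
from multiprocessing import Pool


def partition(items, n):
    """Split items into n round-robin slices of near-equal size."""
    return [items[i::n] for i in range(n)]


def write_slice(job):
    """Insert one slice of documents in batches; runs inside a worker process."""
    uri, docs, batch_size = job
    # Imported inside the worker so this module loads without pymongo
    # and each process creates its own client.
    from pymongo import MongoClient

    client = MongoClient(uri)
    collection = client["wikipedia"]["pages"]  # hypothetical names
    for start in range(0, len(docs), batch_size):
        collection.insert_many(docs[start:start + batch_size])
    client.close()
    return len(docs)


def load(uri, docs, processes=4, batch_size=1000):
    """Write docs using `processes` parallel workers; return the number written."""
    jobs = [(uri, part, batch_size) for part in partition(docs, processes)]
    with Pool(processes) as pool:
        return sum(pool.map(write_slice, jobs))


# The partitioning step can be checked without a database:
slices = partition(list(range(10)), 4)
print(slices)
```

Calling `load("mongodb://<host>:27017", documents)` from a second instance would reproduce the general shape of the test. The call is left out here because it needs a running MongoDB server.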
Storage Performance Evaluated
Instance Type                     Instance Cost (Spot)
m3.large                          $0.09
m3.xlarge (SSD instance store)    $0.16
hi1.4xlarge (Storage Optimized)   $0.50

Storage      Time to finish
ebs-normal   0:10:55
ephemeral0   0:07:36
PIOPS 1500   0:08:26
ephemeral0   0:10:22
PIOPS 1500   0:09:02
ephemeral0   0:05:19
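To compare the runs more directly, the elapsed times can be converted into an approximate write throughput. The sketch below uses the document count from the previous slide ("about 1,700,000"), so the rates are rough. It lists the storage rows in source order and does not assume which instance type each row belongs to.

```python
TOTAL_DOCS = 1_700_000  # "about 1,700,000 documents" per the slides


def to_seconds(hms):
    """Convert an 'h:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def docs_per_second(hms, total_docs=TOTAL_DOCS):
    """Approximate documents written per second for one run."""
    return total_docs / to_seconds(hms)


# (storage, time to finish), in the order the slide lists them.
results = [
    ("ebs-normal", "0:10:55"),
    ("ephemeral0", "0:07:36"),
    ("PIOPS 1500", "0:08:26"),
    ("ephemeral0", "0:10:22"),
    ("PIOPS 1500", "0:09:02"),
    ("ephemeral0", "0:05:19"),
]

for storage, elapsed in results:
    print(f"{storage:<12}{elapsed}  {docs_per_second(elapsed):7.0f} docs/s")
```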
Comparing Instance Types
Instance Type  CPU  ECU   Memory (GB)  Storage (GB)  Cost   Memory ($/GB)  CPU ($/ECU)  Storage ($/100GB)
m3.medium      1    3     3.75         1 x 4 SSD     $0.17  $0.05          $0.06        $4.28
m3.large       2    6.5   7.5          1 x 32 SSD    $0.34  $0.05          $0.05        $1.07
m3.xlarge      4    13    15           2 x 40 SSD    $0.68  $0.05          $0.05        $0.86
m3.2xlarge     8    26    30           2 x 80 SSD    $1.37  $0.05          $0.05        $0.86
m2.xlarge      2    6.5   17.1         1 x 420       $0.51  $0.03          $0.08        $0.12
m2.2xlarge     4    13    34.2         1 x 850       $1.01  $0.03          $0.08        $0.12
m2.4xlarge     8    26    68.4         2 x 840       $2.02  $0.03          $0.08        $0.12
cr1.8xlarge    32   88    244          2 x 120 SSD   $4.31  $0.02          $0.05        $1.80
i2.xlarge      4    14    30.5         1 x 800 SSD   $1.05  $0.03          $0.08        $0.13
i2.2xlarge     8    27    61           2 x 800 SSD   $2.10  $0.03          $0.08        $0.13
i2.4xlarge     16   53    122          4 x 800 SSD   $4.20  $0.03          $0.08        $0.13
i2.8xlarge     32   104   244          8 x 800 SSD   $8.40  $0.03          $0.08        $0.13
hs1.8xlarge    16   35    117          24 x 2048     $5.67  $0.05          $0.16        $0.01
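The three unit-cost columns appear to be the instance cost divided by each resource. The slides do not state the formula, so it is an inference from the figures. The sketch below recomputes them for one row. Because the table's prices are already rounded, recomputed values for some other rows can differ from the table in the last digit.

```python
def unit_costs(cost, memory_gb, ecu, disks, disk_gb):
    """Derive cost per GB of memory, per ECU, and per 100 GB of instance storage."""
    storage_gb = disks * disk_gb
    return {
        "per_gb_memory": cost / memory_gb,
        "per_ecu": cost / ecu,
        "per_100gb_storage": cost / storage_gb * 100,
    }


# m2.xlarge row from the table: cost $0.51, 17.1 GB memory, 6.5 ECU, 1 x 420 GB.
m2_xlarge = unit_costs(cost=0.51, memory_gb=17.1, ecu=6.5, disks=1, disk_gb=420)

for name, value in m2_xlarge.items():
    print(f"{name:<18}${value:.2f}")
```

Comparing instance families on these ratios instead of on the headline price shows which resource each family sells cheaply: memory on m2, storage on hs1.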
THANK YOU!
FEEL FREE TO CONTACT ME!
