#MongoDBDays 
Mythbusting: Understanding 
How We Measure the 
Performance of MongoDB 
Andrew Erlichson 
VP, Engineering, MongoDB
Before we start… 
• We are going to look a lot at 
– C++ kernel code 
– Java benchmarks 
– JavaScript tests 
• And lots of charts 
• And it's going to be awesome!
Goals of Benchmarking 
– You have an idea 
– It must be tested 
– Others must be able to 
reproduce your work. 
– Explanations that 
contribute to knowledge 
are key. 
– Should have practical 
applications 
Academia
Industry Benchmarketing 
• Prove you are faster 
than the competition 
• Emphasize your best 
attributes 
• Repeatable 
• Explanations
Industry Benchmarketing 
• Prove you are faster 
than the competition 
• Emphasize your best 
attributes 
• Repeatable* 
• Explanations* 
*OPTIONAL
Goals of Internal Benchmarking 
• Always be improving 
• Understand our 
bottlenecks 
• Main customer is 
engineering 
• Explanations are 
somewhat important.
Overview – Internal Benchmarking 
• Some common traps 
• Performance measurement & diagnosis 
• What's next
Part One 
Some Common Traps
#1 Time taken to Insert x 
Documents 
long startTime = System.currentTimeMillis(); 
for (int roundNum = 0; roundNum < numRounds; roundNum++) { 
for (int i = 0; i < documentsPerInsert; i++) { 
id++; 
BasicDBObject doc = new BasicDBObject(); 
doc.put("_id",id); 
doc.put("k",rand.nextInt(numMaxInserts)+1); 
String cVal = "…";
doc.put("c",cVal); 
String padVal = "…"; 
doc.put("pad",padVal); 
aDocs[i]=doc; 
} 
coll.insert(aDocs); 
numInserts += documentsPerInsert; 
globalInserts.addAndGet(documentsPerInsert); 
} 
long endTime = System.currentTimeMillis();
So that looks ok, right?
What else are you measuring? 
long startTime = System.currentTimeMillis(); 
for (int roundNum = 0; roundNum < numRounds; roundNum++) { 
for (int i = 0; i < documentsPerInsert; i++) { 
id++; 
BasicDBObject doc = new BasicDBObject(); 
doc.put("_id",id); 
doc.put("k",rand.nextInt(numMaxInserts)+1); 
String cVal = "…";
doc.put("c",cVal); 
String padVal = "…"; 
doc.put("pad",padVal); 
aDocs[i]=doc; 
} 
coll.insert(aDocs); 
numInserts += documentsPerInsert; 
globalInserts.addAndGet(documentsPerInsert); 
} 
long endTime = System.currentTimeMillis(); 
Object creation and GC 
management? 
Thread contention on 
nextInt()? 
Time to synthesize data? 
Thread contention on 
addAndGet()? 
Clock resolution?
Solution: Pre-Create the objects 
// Pre Create the Object outside the Loop 
BasicDBObject[] aDocs = new BasicDBObject[documentsPerInsert]; 
for (int i=0; i < documentsPerInsert; i++) { 
BasicDBObject doc = new BasicDBObject(); 
String cVal = "…"; 
doc.put("c",cVal); 
String padVal = "…"; 
doc.put("pad",padVal); 
aDocs[i] = doc; 
} 
Pre-create non-varying 
data outside the timing 
loop 
Alternative 
• Pre-create the data in a file; load from file
Solution: Remove contention 
// Use ThreadLocalRandom generator or an instance of java.util.Random per thread 
java.util.concurrent.ThreadLocalRandom rand; 
Remove contention on 
nextInt() by making it 
thread-local 
for (long roundNum = 0; roundNum < numRounds; roundNum++) { 
for (int i = 0; i < documentsPerInsert; i++) { 
id++; 
doc = aDocs[i]; 
doc.put("_id",id); 
doc.put("k", nextInt(rand, numMaxInserts)+1); 
} 
coll.insert(aDocs); 
numInserts += documentsPerInsert; 
} 
// Maintain count outside the loop 
globalInserts.addAndGet(documentsPerInsert * numRounds); 
Remove contention on 
addAndGet()
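The idea above can be shown in a minimal, self-contained sketch. Since Java 7, `java.util.concurrent.ThreadLocalRandom` gives each thread its own generator, so concurrent workers never contend on a shared seed the way they do with one global `java.util.Random`. The class and parameter names here are illustrative, not taken from the benchmark source:

```java
import java.util.concurrent.ThreadLocalRandom;

public class ThreadLocalRandomSketch {
    // Illustrative stand-in for the benchmark's per-document random key.
    static int nextKey(int numMaxInserts) {
        // current() returns the generator bound to the calling thread,
        // so no lock or CAS loop is shared across worker threads.
        return ThreadLocalRandom.current().nextInt(numMaxInserts) + 1;
    }

    public static void main(String[] args) {
        int k = nextKey(1000);
        // nextInt(bound) yields [0, bound), so the key lands in [1, bound]
        System.out.println(k >= 1 && k <= 1000);
    }
}
```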
Solution: Timer resolution 
long startTime = System.currentTimeMillis(); 
… 
long endTime = System.currentTimeMillis(); 
long startTime = System.nanoTime(); 
… 
long elapsedNanos = System.nanoTime() - startTime; 
"granularity of the value 
depends on the 
underlying operating 
system and may be 
larger" 
"resolution is at least as 
good as that of 
currentTimeMillis()" 
Source 
• http://docs.oracle.com/javase/7/docs/api/java/lang/System.html
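As the Javadoc quoted above notes, nanoTime() is a monotonic clock meant only for measuring elapsed time, while currentTimeMillis() tracks wall-clock time and can jump. A minimal sketch of the nanoTime pattern (the sleep is just a stand-in for the benchmarked work):

```java
public class ElapsedTimeSketch {
    public static void main(String[] args) throws InterruptedException {
        // Monotonic start point; meaningful only relative to another
        // nanoTime() reading, never as an absolute timestamp.
        long startTime = System.nanoTime();
        Thread.sleep(50); // stand-in for the work being measured
        // Integer division truncates toward zero
        long elapsedMs = (System.nanoTime() - startTime) / 1_000_000;
        // sleep() blocks for at least the requested duration
        System.out.println(elapsedMs >= 50);
    }
}
```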
General Principle #1 
Know what you are 
measuring
#2 Response time to return all 
results 
BasicDBObject doc = new BasicDBObject(); 
doc.put("v", str); // str is a 2k string 
for (int i=0; i < 1000; i++) { 
doc.put("_id",i); coll.insert(doc); 
} 
BasicDBObject predicate = new BasicDBObject(); 
long startTime = System.currentTimeMillis(); 
DBCursor cur = coll.find(predicate); 
DBObject foundObj; 
while (cur.hasNext()) { 
foundObj = cur.next(); 
} 
long endTime = System.currentTimeMillis();
So that looks ok, right?
What else are you measuring? 
BasicDBObject doc = new BasicDBObject(); 
doc.put("v", str); // str is a 2k string 
for (int i=0; i < 1000; i++) { 
doc.put("_id",i); coll.insert(doc); 
} 
BasicDBObject predicate = new BasicDBObject(); 
long startTime = System.currentTimeMillis(); 
DBCursor cur = coll.find(predicate); 
DBObject foundObj; 
while (cur.hasNext()) { 
foundObj = cur.next(); 
} 
long endTime = System.currentTimeMillis(); 
Each doc is 4080 bytes 
on disk with powerOf2Sizes 
Unrestricted predicate? 
Measuring 
• Time to parse & 
execute query 
• Time to retrieve all 
documents 
But also 
• Cost of shipping ~4MB 
data through network 
stack
Solution: Limit the projection 
BasicDBObject predicate = new BasicDBObject(); 
predicate.put("_id", new BasicDBObject("$gte", 10).append("$lte", 20)); 
BasicDBObject projection = new BasicDBObject(); 
projection.put("_id", 1); 
long startTime = System.currentTimeMillis(); 
DBCursor cur = coll.find(predicate, projection ); 
DBObject foundObj; 
while (cur.hasNext()) { 
foundObj = cur.next(); 
} 
long endTime = System.currentTimeMillis(); 
Return fixed range 
Only project _id 
Only 46k transferred 
through network stack
General Principle #2 
Measure only what you 
need to measure
Part Two 
Performance 
measurement & 
diagnosis
Broad categories 
• Micro Benchmarks 
• Workloads
Micro benchmarks: mongo-perf
mongo-perf: goals 
• Measure 
– commands 
• Configure 
– Single mongod, ReplSet size (1 -> n), Sharding 
– Single vs. Multiple DB 
– O/S 
• Characterize 
– Throughput (ops/second) by thread count 
• Compare
What do you get? 
[Chart: Ops/Sec by thread count (higher is better)]
What do you get? 
[Chart: measured improvement between rc0 and rc2 (higher is better)]
Benchmark source code 
tests.push( { name: "Commands.CountsIntIDRange", 
pre: function( collection ) { 
collection.drop(); 
for ( var i = 0; i < 1000; i++ ) { 
collection.insert( { _id : i } ); 
} 
collection.getDB().getLastError(); 
}, 
ops: [ 
{ op: "command", 
ns : "testdb", 
command : { count : "mycollection", 
query : { _id : { "$gt" : 10, "$lt" : 100 } } } } 
] } );
Code Change
Workloads 
• "public" workloads 
– YCSB 
– Sysbench 
• "real world" simulations 
– Inbox fan in/out 
– Message Stores 
– Content Management
Example: Bulk Load Performance 
[Chart: 16M documents; 55% degradation, 2.6.0-rc1 vs 2.4.10 (higher is better)]
Ouch… where's the tree in the 
woods? 
• 2.4.10 -> 2.6.0 
– 4495 git commits
git-bisect 
• Bisect between good/bad hashes 
• git-bisect nominates a new githash 
– Build against githash 
– Re-run test 
– Confirm if this githash is good/bad 
• Rinse and repeat
Code Change - Bad Githash
Code Change - Fix
Bulk Load Performance - Fix 
[Chart: 11% improvement, 2.6.1 vs 2.4.10 (higher is better)]
The problem with measurement 
• Observability 
– What can you observe on the system? 
• Effect 
– What effects does the observation cause?
mtools
mtools 
• MongoDB log file analysis 
– Filter logs for operations, events 
– Response time, lock durations 
– Plot 
• https://github.com/rueckstiess/mtools
Code Change – Yielding Policy
Code Change
Response Times 
Bulk Insert 2.6.0 vs 2.6.1 
Ceiling similar, lower floor 
resulting in 40% 
improvement in throughput
Secondary effects of Yield policy change 
Write lock time reduced 
Order of magnitude reduction 
of write lock duration
Unexpected side effects of 
measurement? 
> db.serverStatus() 
Yes – will cause a read lock to be acquired 
> db.serverStatus({recordStats:0}) 
No – lock is not acquired 
> mongostat 
Yes - until SERVER-14008 resolved, uses db.serverStatus()
CPU sampling 
• Get an impression of 
– Call Graphs 
– CPU time spent on node and called nodes
Setup & building with google-profiler 
> sudo apt-get install google-perftools 
> sudo apt-get install libunwind7-dev 
> scons --use-cpu-profiler mongod
Start the profiling 
> mongod --dbpath <…> 
Note: Do not use --fork 
> mongo 
> use admin 
> db.runCommand({_cpuProfilerStart: {profileFilename: 'foo.prof'}}) 
Execute some commands that you want to profile 
> db.runCommand({_cpuProfilerStop: 1})
Sample start vs. end of workload
Sample start vs. end of workload
Code change
Public Benchmarks – Not all forks are 
the same… 
• YCSB 
– https://github.com/achille/YCSB 
• sysbench-mongodb 
– https://github.com/mdcallag/sysbench-mongodb
#MongoDBDays 
Thank You 
Andrew Erlichson 
andrew@mongodb.com 
VP, Engineering

More Related Content

PPTX
Mythbusting: Understanding How We Measure the Performance of MongoDB
PPTX
Mythbusting: Understanding How We Measure the Performance of MongoDB
PPTX
Mythbusting: Understanding How We Measure the Performance of MongoDB
DOCX
WOTC_Import
PPTX
Using Arbor/ RGraph JS libaries for Data Visualisation
PDF
Green dao
PDF
Green dao
Mythbusting: Understanding How We Measure the Performance of MongoDB
Mythbusting: Understanding How We Measure the Performance of MongoDB
Mythbusting: Understanding How We Measure the Performance of MongoDB
WOTC_Import
Using Arbor/ RGraph JS libaries for Data Visualisation
Green dao
Green dao

What's hot (20)

PDF
ORMLite Android
PPTX
GreenDao Introduction
PDF
Look Ma, “update DB to HTML5 using C++”, no hands! 
PDF
greenDAO
PPT
Fast querying indexing for performance (4)
PPTX
Javascript Arrays
PDF
Indexing and Query Optimizer (Mongo Austin)
PDF
Mongo indexes
PPTX
Getting Started with MongoDB and NodeJS
PPTX
A Test of Strength
PPTX
MongoDB San Francisco 2013: Hash-based Sharding in MongoDB 2.4 presented by B...
PPTX
MongoDB + Java - Everything you need to know
PDF
MongoDB World 2016: Deciphering .explain() Output
PPS
CS101- Introduction to Computing- Lecture 26
PPT
POLITEKNIK MALAYSIA
PDF
Kenneth Truyers - Using Git as a NoSql database - Codemotion Milan 2018
PDF
The Ring programming language version 1.10 book - Part 79 of 212
PDF
MongoDB Performance Tuning
PPTX
Indexing and Query Optimization
PDF
The Ring programming language version 1.4.1 book - Part 13 of 31
ORMLite Android
GreenDao Introduction
Look Ma, “update DB to HTML5 using C++”, no hands! 
greenDAO
Fast querying indexing for performance (4)
Javascript Arrays
Indexing and Query Optimizer (Mongo Austin)
Mongo indexes
Getting Started with MongoDB and NodeJS
A Test of Strength
MongoDB San Francisco 2013: Hash-based Sharding in MongoDB 2.4 presented by B...
MongoDB + Java - Everything you need to know
MongoDB World 2016: Deciphering .explain() Output
CS101- Introduction to Computing- Lecture 26
POLITEKNIK MALAYSIA
Kenneth Truyers - Using Git as a NoSql database - Codemotion Milan 2018
The Ring programming language version 1.10 book - Part 79 of 212
MongoDB Performance Tuning
Indexing and Query Optimization
The Ring programming language version 1.4.1 book - Part 13 of 31
Ad

Similar to Mythbusting: Understanding How We Measure Performance at MongoDB (7)

PDF
20110514 mongo dbチューニング
PDF
10 Key MongoDB Performance Indicators
PPTX
PDF
MongoDB World 2019: RDBMS Versus MongoDB Aggregation Performance
PDF
Apache Cassandra - Data modelling
PPTX
Sizing MongoDB Clusters
PDF
Mongo db improve the performance of your application codemotion2016
20110514 mongo dbチューニング
10 Key MongoDB Performance Indicators
MongoDB World 2019: RDBMS Versus MongoDB Aggregation Performance
Apache Cassandra - Data modelling
Sizing MongoDB Clusters
Mongo db improve the performance of your application codemotion2016
Ad

More from MongoDB (20)

PDF
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
PDF
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
PDF
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
PDF
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
PDF
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
PDF
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
PDF
MongoDB SoCal 2020: MongoDB Atlas Jump Start
PDF
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
PDF
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
PDF
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
PDF
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
PDF
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
PDF
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
PDF
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
PDF
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
PDF
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
PDF
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
PDF
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
PDF
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
PDF
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...

Recently uploaded (20)

PDF
NewMind AI Weekly Chronicles - August'25 Week I
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PDF
Electronic commerce courselecture one. Pdf
PDF
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
PPTX
Big Data Technologies - Introduction.pptx
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PPTX
Spectroscopy.pptx food analysis technology
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PDF
Network Security Unit 5.pdf for BCA BBA.
PDF
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
NewMind AI Weekly Chronicles - August'25 Week I
Reach Out and Touch Someone: Haptics and Empathic Computing
Electronic commerce courselecture one. Pdf
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
Big Data Technologies - Introduction.pptx
Digital-Transformation-Roadmap-for-Companies.pptx
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
“AI and Expert System Decision Support & Business Intelligence Systems”
Spectroscopy.pptx food analysis technology
20250228 LYD VKU AI Blended-Learning.pptx
Mobile App Security Testing_ A Comprehensive Guide.pdf
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
Network Security Unit 5.pdf for BCA BBA.
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
Diabetes mellitus diagnosis method based random forest with bat algorithm
The Rise and Fall of 3GPP – Time for a Sabbatical?
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx

Mythbusting: Understanding How We Measure Performance at MongoDB

  • 1. #MongoDBDays Mythbusting: Understanding How We Measure the Performance of MongoDB Andrew Erlichson VP, Engineering, MongoDB
  • 2. Before we start… • We are going to look a lot at – C++ kernel code – Java benchmarks – JavaScript tests • And lots of charts • And its going to be awesome!
  • 3. Goals of Benchmarking – You have an idea – It must be tested – Others must be able to reproduce your work. – Explanations that contribute to knowledge are key. – Should have practical applications Academia
  • 4. Industry Benchmarketing • Prove you are faster than the competition • Emphasize your best attributes • Repeatable • Explanations
  • 5. Industry Benchmarketing • Prove you are faster than the competition • Emphasize your best attributes • Repeatable* • Explanations* *OPTIONAL
  • 6. Goals of Internal Benchmarking • Always be improving • Understand our bottlenecks • Main customer is engineering • Explanations are somewhat important.
  • 7. Overview – Internal Benchmarking • Some common traps • Performance measurement & diagnosis • What's next
  • 8. Part One Some Common Traps
  • 9. #1 Time taken to Insert x Documents long startTime = System.currentTimeMillis(); for (int roundNum = 0; roundNum < numRounds; roundNum++) { for (int i = 0; i < documentsPerInsert; i++) { id++; BasicDBObject doc = new BasicDBObject(); doc.put("_id",id); doc.put("k",rand.nextInt(numMaxInserts)+1); String cVal = "…" doc.put("c",cVal); String padVal = "…"; doc.put("pad",padVal); aDocs[i]=doc; } coll.insert(aDocs); numInserts += documentsPerInsert; globalInserts.addAndGet(documentsPerInsert); } long endTime = System.currentTimeMillis();
  • 10. #1 Time taken to Insert x Documents long startTime = System.currentTimeMillis(); for (int roundNum = 0; roundNum < numRounds; roundNum++) { for (int i = 0; i < documentsPerInsert; i++) { id++; BasicDBObject doc = new BasicDBObject(); doc.put("_id",id); doc.put("k",rand.nextInt(numMaxInserts)+1); String cVal = "…" doc.put("c",cVal); String padVal = "…"; doc.put("pad",padVal); aDocs[i]=doc; } coll.insert(aDocs); numInserts += documentsPerInsert; globalInserts.addAndGet(documentsPerInsert); } long endTime = System.currentTimeMillis();
  • 11. #1 Time taken to Insert x Documents long startTime = System.currentTimeMillis(); for (int roundNum = 0; roundNum < numRounds; roundNum++) { for (int i = 0; i < documentsPerInsert; i++) { id++; BasicDBObject doc = new BasicDBObject(); doc.put("_id",id); doc.put("k",rand.nextInt(numMaxInserts)+1); String cVal = "…" doc.put("c",cVal); String padVal = "…"; doc.put("pad",padVal); aDocs[i]=doc; } coll.insert(aDocs); numInserts += documentsPerInsert; globalInserts.addAndGet(documentsPerInsert); } long endTime = System.currentTimeMillis();
  • 12. #1 Time taken to Insert x Documents long startTime = System.currentTimeMillis(); for (int roundNum = 0; roundNum < numRounds; roundNum++) { for (int i = 0; i < documentsPerInsert; i++) { id++; BasicDBObject doc = new BasicDBObject(); doc.put("_id",id); doc.put("k",rand.nextInt(numMaxInserts)+1); String cVal = "…" doc.put("c",cVal); String padVal = "…"; doc.put("pad",padVal); aDocs[i]=doc; } coll.insert(aDocs); numInserts += documentsPerInsert; globalInserts.addAndGet(documentsPerInsert); } long endTime = System.currentTimeMillis();
  • 13. #1 Time taken to Insert x Documents long startTime = System.currentTimeMillis(); for (int roundNum = 0; roundNum < numRounds; roundNum++) { for (int i = 0; i < documentsPerInsert; i++) { id++; BasicDBObject doc = new BasicDBObject(); doc.put("_id",id); doc.put("k",rand.nextInt(numMaxInserts)+1); String cVal = "…" doc.put("c",cVal); String padVal = "…"; doc.put("pad",padVal); aDocs[i]=doc; } coll.insert(aDocs); numInserts += documentsPerInsert; globalInserts.addAndGet(documentsPerInsert); } long endTime = System.currentTimeMillis();
  • 14. So that looks ok, right? long startTime = System.currentTimeMillis(); for (int roundNum = 0; roundNum < numRounds; roundNum++) { for (int i = 0; i < documentsPerInsert; i++) { id++; BasicDBObject doc = new BasicDBObject(); doc.put("_id",id); doc.put("k",rand.nextInt(numMaxInserts)+1); String cVal = "…" doc.put("c",cVal); String padVal = "…"; doc.put("pad",padVal); aDocs[i]=doc; } coll.insert(aDocs); numInserts += documentsPerInsert; globalInserts.addAndGet(documentsPerInsert); } long endTime = System.currentTimeMillis();
  • 15. What else are you measuring? long startTime = System.currentTimeMillis(); for (int roundNum = 0; roundNum < numRounds; roundNum++) { for (int i = 0; i < documentsPerInsert; i++) { id++; BasicDBObject doc = new BasicDBObject(); doc.put("_id",id); doc.put("k",rand.nextInt(numMaxInserts)+1); String cVal = "…" doc.put("c",cVal); String padVal = "…"; doc.put("pad",padVal); aDocs[i]=doc; } coll.insert(aDocs); numInserts += documentsPerInsert; globalInserts.addAndGet(documentsPerInsert); } long endTime = System.currentTimeMillis(); Object creation and GC management?
  • 16. What else are you measuring? long startTime = System.currentTimeMillis(); for (int roundNum = 0; roundNum < numRounds; roundNum++) { for (int i = 0; i < documentsPerInsert; i++) { id++; BasicDBObject doc = new BasicDBObject(); doc.put("_id",id); doc.put("k",rand.nextInt(numMaxInserts)+1); String cVal = "…" doc.put("c",cVal); String padVal = "…"; doc.put("pad",padVal); aDocs[i]=doc; } coll.insert(aDocs); numInserts += documentsPerInsert; globalInserts.addAndGet(documentsPerInsert); } long endTime = System.currentTimeMillis(); Object creation and GC management? Thread contention on nextInt()?
  • 17. What else are you measuring? long startTime = System.currentTimeMillis(); for (int roundNum = 0; roundNum < numRounds; roundNum++) { for (int i = 0; i < documentsPerInsert; i++) { id++; BasicDBObject doc = new BasicDBObject(); doc.put("_id",id); doc.put("k",rand.nextInt(numMaxInserts)+1); String cVal = "…" doc.put("c",cVal); String padVal = "…"; doc.put("pad",padVal); aDocs[i]=doc; } coll.insert(aDocs); numInserts += documentsPerInsert; globalInserts.addAndGet(documentsPerInsert); } long endTime = System.currentTimeMillis(); Object creation and GC management? Thread contention on nextInt()? Time to synthesize data?
  • 18. What else are you measuring? long startTime = System.currentTimeMillis(); for (int roundNum = 0; roundNum < numRounds; roundNum++) { for (int i = 0; i < documentsPerInsert; i++) { id++; BasicDBObject doc = new BasicDBObject(); doc.put("_id",id); doc.put("k",rand.nextInt(numMaxInserts)+1); String cVal = "…" doc.put("c",cVal); String padVal = "…"; doc.put("pad",padVal); aDocs[i]=doc; } coll.insert(aDocs); numInserts += documentsPerInsert; globalInserts.addAndGet(documentsPerInsert); } long endTime = System.currentTimeMillis(); Object creation and GC management? Thread contention on nextInt()? Time to synthesize data? Thread contention on addAndGet()?
  • 19. What else are you measuring? long startTime = System.currentTimeMillis(); for (int roundNum = 0; roundNum < numRounds; roundNum++) { for (int i = 0; i < documentsPerInsert; i++) { id++; BasicDBObject doc = new BasicDBObject(); doc.put("_id",id); doc.put("k",rand.nextInt(numMaxInserts)+1); String cVal = "…" doc.put("c",cVal); String padVal = "…"; doc.put("pad",padVal); aDocs[i]=doc; } coll.insert(aDocs); numInserts += documentsPerInsert; globalInserts.addAndGet(documentsPerInsert); } long endTime = System.currentTimeMillis(); Object creation and GC management? Thread contention on nextInt()? Time to synthesize data? Thread contention on addAndGet()? Clock resolution?
  • 20. Solution: Pre-Create the objects // Pre Create the Object outside the Loop BasicDBObject[] aDocs = new BasicDBObject[documentsPerInsert]; for (int i=0; i < documentsPerInsert; i++) { BasicDBObject doc = new BasicDBObject(); String cVal = "…"; doc.put("c",cVal); String padVal = "…"; doc.put("pad",padVal); aDocs[i] = doc; } Pre-create non varying data outside the timing loop Alternative • Pre-create the data in a file; load from file
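The hoisting pattern on this slide can be sketched as a small self-contained snippet. The class and method names (`PreCreateDemo`, `preCreate`) are illustrative, and plain `HashMap`s stand in for the driver's `BasicDBObject` so the sketch runs without a MongoDB dependency:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: build the non-varying document fields once, outside the timed
// region, so the timer measures only the work under test.
public class PreCreateDemo {
    @SuppressWarnings("unchecked")
    static Map<String, Object>[] preCreate(int documentsPerInsert) {
        Map<String, Object>[] docs = new HashMap[documentsPerInsert];
        for (int i = 0; i < documentsPerInsert; i++) {
            Map<String, Object> doc = new HashMap<>();
            doc.put("c", "constant-payload"); // stands in for the elided "…" value
            doc.put("pad", "padding");        // non-varying, created once
            docs[i] = doc;
        }
        return docs;
    }

    public static void main(String[] args) {
        Map<String, Object>[] docs = preCreate(3);
        // Inside the timed loop, only the varying fields would be set:
        for (int i = 0; i < docs.length; i++) {
            docs[i].put("_id", i);
        }
        System.out.println(docs.length); // prints 3
    }
}
```

The point is the split: anything the benchmark is not trying to measure (object allocation, string synthesis) moves out of the timed loop.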
  • 21. Solution: Remove contention // Use ThreadLocalRandom generator or an instance of java.util.Random per thread java.util.concurrent.ThreadLocalRandom rand; for (long roundNum = 0; roundNum < numRounds; roundNum++) { for (int i = 0; i < documentsPerInsert; i++) { id++; doc = aDocs[i]; doc.put("_id",id); doc.put("k", nextInt(rand, numMaxInserts)+1); } coll.insert(aDocs); numInserts += documentsPerInsert; } // Maintain count outside the loop globalInserts.addAndGet(documentsPerInsert * numRounds); Remove contention on nextInt() by making it thread-local
  • 22. Solution: Remove contention // Use ThreadLocalRandom generator or an instance of java.util.Random per thread java.util.concurrent.ThreadLocalRandom rand; Remove contention on nextInt() by making it thread-local for (long roundNum = 0; roundNum < numRounds; roundNum++) { for (int i = 0; i < documentsPerInsert; i++) { id++; doc = aDocs[i]; doc.put("_id",id); doc.put("k", nextInt(rand, numMaxInserts)+1); } coll.insert(aDocs); numInserts += documentsPerInsert; } // Maintain count outside the loop globalInserts.addAndGet(documentsPerInsert * numRounds); Remove contention on addAndGet()
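The thread-local fix can be shown as a runnable snippet. `boundedNextInt` is an illustrative helper name mirroring the benchmark's `rand.nextInt(numMaxInserts) + 1`:

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch: ThreadLocalRandom.current() returns a per-thread generator,
// so concurrent worker threads never serialize on shared Random state.
public class ContentionFreeRandom {
    static int boundedNextInt(int bound) {
        // Values in [1, bound], as in the deck's rand.nextInt(...) + 1
        return ThreadLocalRandom.current().nextInt(bound) + 1;
    }

    public static void main(String[] args) {
        int k = boundedNextInt(1000);
        System.out.println(k >= 1 && k <= 1000); // prints true
    }
}
```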
  • 23. Solution: Timer resolution long startTime = System.currentTimeMillis(); … long endTime = System.currentTimeMillis(); long startTime = System.nanoTime(); … long endTime = System.nanoTime() - startTime; "granularity of the value depends on the underlying operating system and may be larger" "resolution is at least as good as that of currentTimeMillis()" Source • http://docs.oracle.com/javase/7/docs/api/java/lang/System.html
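A minimal sketch of the timer swap, with an assumed helper name (`elapsedNanos`) wrapping the pattern from the slide. `nanoTime()` is monotonic and intended for elapsed-time measurement; `currentTimeMillis()` tracks wall-clock time and its granularity can be tens of milliseconds on some operating systems:

```java
// Sketch: measure elapsed time with the monotonic high-resolution timer.
public class TimerDemo {
    static long elapsedNanos(Runnable work) {
        long start = System.nanoTime();
        work.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long[] acc = {0};
        long nanos = elapsedNanos(() -> {
            for (int i = 0; i < 1_000_000; i++) acc[0] += i; // stand-in workload
        });
        System.out.println(nanos >= 0);              // prints true (monotonic source)
        System.out.println(acc[0] == 499_999_500_000L); // prints true
    }
}
```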
  • 24. General Principle #1 Know what you are measuring
  • 25. #2 Response time to return all results BasicDBObject doc = new BasicDBObject(); doc.put("v", str); // str is a 2k string for (int i=0; i < 1000; i++) { doc.put("_id",i); coll.insert(doc); } BasicDBObject predicate = new BasicDBObject(); long startTime = System.currentTimeMillis(); DBCursor cur = coll.find(predicate); DBObject foundObj; while (cur.hasNext()) { foundObj = cur.next(); } long endTime = System.currentTimeMillis();
  • 29. So that looks ok, right? BasicDBObject doc = new BasicDBObject(); doc.put("v", str); // str is a 2k string for (int i=0; i < 1000; i++) { doc.put("_id",i); coll.insert(doc); } BasicDBObject predicate = new BasicDBObject(); long startTime = System.currentTimeMillis(); DBCursor cur = coll.find(predicate); DBObject foundObj; while (cur.hasNext()) { foundObj = cur.next(); } long endTime = System.currentTimeMillis();
  • 30. What else are you measuring? BasicDBObject doc = new BasicDBObject(); doc.put("v", str); // str is a 2k string for (int i=0; i < 1000; i++) { doc.put("_id",i); coll.insert(doc); } BasicDBObject predicate = new BasicDBObject(); long startTime = System.currentTimeMillis(); DBCursor cur = coll.find(predicate); DBObject foundObj; while (cur.hasNext()) { foundObj = cur.next(); } long endTime = System.currentTimeMillis(); Each doc is 4080 bytes on disk with powerOf2Sizes
  • 31. What else are you measuring? BasicDBObject doc = new BasicDBObject(); doc.put("v", str); // str is a 2k string for (int i=0; i < 1000; i++) { doc.put("_id",i); coll.insert(doc); } BasicDBObject predicate = new BasicDBObject(); long startTime = System.currentTimeMillis(); DBCursor cur = coll.find(predicate); DBObject foundObj; while (cur.hasNext()) { foundObj = cur.next(); } long endTime = System.currentTimeMillis(); Each doc is 4080 bytes on disk with powerOf2Sizes Unrestricted predicate?
  • 32. What else are you measuring? BasicDBObject doc = new BasicDBObject(); doc.put("v", str); // str is a 2k string for (int i=0; i < 1000; i++) { doc.put("_id",i); coll.insert(doc); } BasicDBObject predicate = new BasicDBObject(); long startTime = System.currentTimeMillis(); DBCursor cur = coll.find(predicate); DBObject foundObj; while (cur.hasNext()) { foundObj = cur.next(); } long endTime = System.currentTimeMillis(); Each doc is 4080 bytes on disk with powerOf2Sizes Unrestricted predicate? Measuring • Time to parse & execute query • Time to retrieve all documents But also • Cost of shipping ~4MB data through network stack
  • 33. Solution: Limit the projection BasicDBObject predicate = new BasicDBObject(); predicate.put("_id", new BasicDBObject("$gte", 10).append("$lte", 20)); BasicDBObject projection = new BasicDBObject(); projection.put("_id", 1); long startTime = System.currentTimeMillis(); DBCursor cur = coll.find(predicate, projection ); DBObject foundObj; while (cur.hasNext()) { foundObj = cur.next(); } long endTime = System.currentTimeMillis(); Return fixed range
  • 34. Solution: Limit the projection BasicDBObject predicate = new BasicDBObject(); predicate.put("_id", new BasicDBObject("$gte", 10).append("$lte", 20)); BasicDBObject projection = new BasicDBObject(); projection.put("_id", 1); long startTime = System.currentTimeMillis(); DBCursor cur = coll.find(predicate, projection ); DBObject foundObj; while (cur.hasNext()) { foundObj = cur.next(); } long endTime = System.currentTimeMillis(); Return fixed range Only project _id
  • 35. Solution: Limit the projection BasicDBObject predicate = new BasicDBObject(); predicate.put("_id", new BasicDBObject("$gte", 10).append("$lte", 20)); BasicDBObject projection = new BasicDBObject(); projection.put("_id", 1); long startTime = System.currentTimeMillis(); DBCursor cur = coll.find(predicate, projection ); DBObject foundObj; while (cur.hasNext()) { foundObj = cur.next(); } long endTime = System.currentTimeMillis(); Return fixed range Only project _id Only 46k transferred through network stack
  • 36. General Principle #2 Measure only what you need to measure
  • 37. Part Two Performance measurement & diagnosis
  • 38. Broad categories • Micro Benchmarks • Workloads
  • 40. mongo-perf: goals • Measure – commands • Configure – Single mongod, ReplSet size (1 -> n), Sharding – Single vs. Multiple DB – O/S • Characterize – Throughput (ops/second) by thread count • Compare
  • 41. What do you get? (Chart: throughput in ops/sec by thread count; higher is better)
  • 42. What do you get? (Chart: measured improvement between rc0 and rc2; higher is better)
  • 43. Benchmark source code tests.push( { name: "Commands.CountsIntIDRange", pre: function( collection ) { collection.drop(); for ( var i = 0; i < 1000; i++ ) { collection.insert( { _id : i } ); } collection.getDB().getLastError(); }, ops: [ { op: "command", ns : "testdb", command : { count : "mycollection", query : { _id : { "$gt" : 10, "$lt" : 100 } } } } ] } );
  • 48. Workloads • "public" workloads – YCSB – Sysbench • "real world" simulations – Inbox fan in/out – Message Stores – Content Management
  • 49. Example: Bulk Load Performance (Chart: 16M documents; 55% degradation, 2.6.0-rc1 vs 2.4.10; higher is better)
  • 50. Ouch… where's the tree in the woods? • 2.4.10 -> 2.6.0 – 4495 git commits
  • 51. git-bisect • Bisect between good/bad hashes • git-bisect nominates a new githash – Build against githash – Re-run test – Confirm if this githash is good/bad • Rinse and repeat
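The bisect loop above is binary search over the commit history. A minimal simulation of the idea, in Java to match the deck's benchmark code: the 4495 commits come from the slide, but the regression index (3000) is a made-up example, not the actual githash, and each `isBad[mid]` probe stands in for "build that githash and re-run the test":

```java
// Sketch: git-bisect as binary search for the first "bad" commit.
public class BisectDemo {
    // Find the first index where isBad flips from false to true.
    static int firstBad(boolean[] isBad) {
        int lo = 0, hi = isBad.length - 1;
        while (lo < hi) {               // each iteration = one build + re-run
            int mid = (lo + hi) / 2;
            if (isBad[mid]) hi = mid;   // regression at or before mid
            else lo = mid + 1;          // regression after mid
        }
        return lo;
    }

    public static void main(String[] args) {
        boolean[] history = new boolean[4495]; // 2.4.10 -> 2.6.0 commit range
        for (int i = 3000; i < history.length; i++) history[i] = true;
        System.out.println(firstBad(history)); // prints 3000
    }
}
```

This is why bisecting 4495 commits is tractable: it takes roughly log2(4495) ≈ 13 build-and-test rounds, not thousands.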
  • 52. Code Change - Bad Githash
  • 54. Bulk Load Performance - Fix (Chart: 11% improvement, 2.6.1 vs 2.4.10; higher is better)
  • 55. The problem with measurement • Observability – What can you observe on the system? • Effect – What effects does the observation cause?
  • 57. mtools • MongoDB log file analysis – Filter logs for operations, events – Response time, lock durations – Plot • https://github.com/rueckstiess/mtools
  • 58. Code Change – Yielding Policy
  • 60. Response Times Bulk Insert 2.6.0 vs 2.6.1 Ceiling similar, lower floor resulting in 40% improvement in throughput
  • 61. Secondary effects of yield policy change: write lock time reduced (Chart: order-of-magnitude reduction of write lock duration)
  • 62. Unexpected side effects of measurement? > db.serverStatus() Yes – will cause a read lock to be acquired > db.serverStatus({recordStats:0}) No – lock is not acquired > mongostat Yes - until SERVER-14008 resolved, uses db.serverStatus()
  • 63. CPU sampling • Get an impression of – Call Graphs – CPU time spent on node and called nodes
  • 64. Setup & building with google-profiler > sudo apt-get install google-perftools > sudo apt-get install libunwind7-dev > scons --use-cpu-profiler mongod
  • 65. Start the profiling > mongod --dbpath <…> Note: Do not use --fork > mongo > use admin > db.runCommand({_cpuProfilerStart: {profileFilename: 'foo.prof'}}) Execute some commands that you want to profile > db.runCommand({_cpuProfilerStop: 1})
  • 66. Sample start vs. end of workload
  • 67. Sample start vs. end of workload
  • 69. Public Benchmarks – Not all forks are the same… • YCSB – https://github.com/achille/YCSB • sysbench-mongodb – https://github.com/mdcallag/sysbench-mongodb
  • 70. #MongoDBDays Thank You Andrew Erlichson andrew@mongodb.com VP, Engineering

Editor's Notes

  • #17: Per Java7 documentation http://docs.oracle.com/javase/7/docs/api/java/util/Random.html "Instances of java.util.Random are threadsafe. However, the concurrent use of the same java.util.Random instance across threads may encounter contention and consequent poor performance. Consider instead using ThreadLocalRandom in multithreaded designs."
  • #19: Per Java7 documentation http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/atomic/package-summary.html "The specifications of these methods enable implementations to employ efficient machine-level atomic instructions that are available on contemporary processors. However on some platforms, support may entail some form of internal locking."
  • #20: Per Java7 documentation http://docs.oracle.com/javase/7/docs/api/java/lang/System.html#currentTimeMillis() "Returns the current time in milliseconds. Note that while the unit of time of the return value is a millisecond, the granularity of the value depends on the underlying operating system and may be larger. For example, many operating systems measure time in units of tens of milliseconds."
  • #22: Per Java7 documentation http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadLocalRandom.html "A random number generator isolated to the current thread…Use of ThreadLocalRandom is particularly appropriate when multiple tasks (for example, each a ForkJoinTask) use random numbers in parallel in thread pools."
  • #24: Per Java7 documentation http://docs.oracle.com/javase/7/docs/api/java/lang/System.html#nanoTime() "This method provides nanosecond precision, but not necessarily nanosecond resolution (that is, how frequently the value changes) - no guarantees are made except that the resolution is at least as good as that of currentTimeMillis()."
  • #41: Throughput by thread count.
  • #42: Threadcount on the x axis Throughput on the y axis, # ops/second.
  • #43: Client server on the same box
  • #48: Githash https://github.com/mongodb/mongo/commit/d1dc7cf2b213d77103658ccd2ea4816b33a27f6a#diff-7ba76fe024c203ca35087f3b93395acc
  • #51: Use gitbisect here.
  • #53: Githash https://github.com/mongodb/mongo/commit/00f7aeaa25f98de5e66f0759d5b102951a247526#diff-fa99d4a7f4e8efac0787f30c60814eaf
  • #54: Githash https://github.com/mongodb/mongo/commit/68d42de9a958688acbf659dfb651fb699e9d7394#diff-fa99d4a7f4e8efac0787f30c60814eaf
  • #59: Githash https://github.com/mongodb/mongo/commit/00f7aeaa25f98de5e66f0759d5b102951a247526#diff-fa99d4a7f4e8efac0787f30c60814eaf
  • #60: Githash https://github.com/mongodb/mongo/commit/68d42de9a958688acbf659dfb651fb699e9d7394#diff-fa99d4a7f4e8efac0787f30c60814eaf
  • #69: Githash https://github.com/mongodb/mongo/commit/8d43b5cb9949c16452cb8d949c89d94cab9c8bad#diff-264fb70c85a638c671570970f3752bf3