Getting The Most Out Of Your Flash/SSDs
Young Paik
Technical Marketing Director
young@aerospike.com

Aerospike (aer·o·spike) [air-oh-spahyk]
noun, 1. tip of a rocket that enhances speed and stability
Introduction
Flash/SSDs (used interchangeably) are still
relatively new.
Getting the most out of them requires a good
understanding of how they work and how
Aerospike uses them.

© 2014 Aerospike. All rights reserved. Confidential

Agenda
➤ SSDs vs. Rotational Drives
➤ What Aerospike Does To Make The Most of SSDs
➤ The Factors That Most Improve The Performance of SSDs
➤ Testing SSDs
➤ More on Testing SSDs
➤ Even more on Testing SSDs
➤ Final Preparations For Your Drives

SSDs vs. Rotational Drives
Differences Matter
Some will tell you that their databases will work
on SSDs and that no changes are necessary.
There are differences between SSDs and
rotational drives that are important. You must do
more than simply swap out your old drive and put
in an SSD to get the best performance.

Comparing Old and New
There are differences between rotational and SSD disks that are independent of the
database you are using.
Characteristic | Rotational | SSD | Notes
Random read | Poor | Excellent | This is where SSDs shine the most. With no moving parts, SSDs are clearly the choice for random reads.
Random write | Poor | Good | Similar to reads, but SSDs are not quite as fast with random writes as they are with reads.
Sequential write | Good | Excellent | Rotational drives narrow the gap here. While they are close in pure write performance, any reads during these writes will require the movement of the heads on rotational drives.
Rewritability (durability) | Excellent | Poor | This is where SSDs are the weakest. NAND (Flash) chips have limits to how many times you can write to the same area. Databases must take this into account to avoid “hotspots.” Databases that do not are relying on the operating systems (i.e. the TRIM command) to alleviate these issues. Aerospike manages this differently.

What Aerospike Does To Make The Most Of SSDs
Techniques
In order to make the best use of SSDs, Aerospike has
designed an architecture that does the following:
Uses raw disk | Aerospike does not use a file system, which would only slow down the database.
Writes in large blocks | Rather than trying to write many smaller items, it is much more efficient to write a few large ones. Aerospike uses block sizes that are integral multiples of 128 KB.
Reads in small blocks | Reads are done in 512-byte data segments.
Handles defragmentation on a regular basis | All databases must delete data. This creates fragmentation of the data on disk, which makes it harder to use efficiently. Aerospike reclaims this space through a continual process called defragmentation. This means you do not need the TRIM command used on most operating systems.
Works with vendors | Aerospike works closely with SSD manufacturers to test hardware and provide feedback for the best performance.

Accessing An Object In Aerospike
Writing A New Standard Data Type Record With SSDs

[Figure: Client, Master Node, DRAM (Index), and SSD (DATA); the write to the SSD is asynchronous, in blocks of 128 KB by default]

1) Client finds Master Node from partition map.
2) Client makes write request to Master Node.
3) Master Node makes an entry into the index (in DRAM) and queues the write in a temporary write buffer.
4) Master Node coordinates write with replica nodes (not shown).
5) Master Node returns success to client.
6) Master Node asynchronously writes data in blocks.
7) Index in DRAM points to location on SSD.
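
The write steps above can be sketched as a toy model. This is only an illustration under stated assumptions, not Aerospike's implementation: the class name WritePathSketch and all of its methods are hypothetical, a Python list stands in for the SSD, a dict stands in for the DRAM index, and replication (step 4) is left out.

```python
BLOCK_SIZE = 128 * 1024  # default block size quoted on the slide


class WritePathSketch:
    """Toy model: index kept in memory, records buffered and flushed as whole blocks."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.index = {}            # key -> (block_id, offset, length); stands in for the DRAM index
        self.buffer = bytearray()  # temporary write buffer (hypothetical structure)
        self.blocks = []           # flushed blocks; stands in for the SSD

    def write(self, key, value):
        if len(value) > self.block_size:
            raise ValueError("record larger than one block")
        # Start a new block if this record would not fit in the current buffer.
        if len(self.buffer) + len(value) > self.block_size:
            self.flush()
        offset = len(self.buffer)
        self.buffer += value
        # The buffer will become the block with this id once it is flushed.
        self.index[key] = (len(self.blocks), offset, len(value))
        return True  # success is reported before the block reaches the "SSD"

    def flush(self):
        """Write the buffered records out as one block (done asynchronously in the real system)."""
        if self.buffer:
            self.blocks.append(bytes(self.buffer))
            self.buffer = bytearray()

    def read(self, key):
        block_id, offset, length = self.index[key]
        # A block id equal to len(blocks) means the record is still in the write buffer.
        source = self.buffer if block_id == len(self.blocks) else self.blocks[block_id]
        return bytes(source[offset:offset + length])
```

Note that a later write for the same key simply repoints the index entry; the old copy stays in its block as dead space, which is what the defragmentation slides address.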

Defragmentation In Aerospike
How Space Is Freed Up
Aerospike writes the data in large data blocks.

[Figure: SSD (DATA) shown as blocks 1-8; block size is 128 KB by default]


As new data is added to the disk, new
blocks will be continually written to
the SSD.


Over time, some records will be
deleted or updated, resulting in
fragmented usage on the flash/SSD
disk. This unused space must be freed
up.


Some databases use a nightly process called “compaction,” which is an intensive process. Aerospike runs a regular process (every few minutes) that looks for blocks below some level of use (called the high watermark).

In this example, if the high watermark is 50%, blocks 1 and 3 are below 50% occupied. The defragmenter will take the data in these blocks and merge them into another block.


The defragmenter will write the new block (block 7) and clear up blocks 1 and 3 for new writes.

Because this runs constantly, there is no special time where the performance of the database is bad.

This algorithm operates best when the SSD is less than 50% occupied. As disk use grows above this, the performance of the defragmenter will decrease.
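
The defragmentation pass described on these slides can be sketched roughly as follows. This is a simplified model, not Aerospike's code: the function name defragment, the representation of a block as a list of live record sizes, and the list of free block ids are all assumptions made for illustration.

```python
def defragment(blocks, free_ids, capacity, threshold=0.5):
    """Merge the live records of sparsely used blocks into fresh blocks.

    blocks:    dict of block_id -> list of live record sizes in bytes
    free_ids:  block ids currently available for new writes
    capacity:  usable bytes per block
    threshold: blocks whose live data is below this fraction get defragmented
    Returns (new_blocks, new_free_ids).
    """
    sparse = sorted(b for b, recs in blocks.items() if sum(recs) < threshold * capacity)
    result = {b: list(recs) for b, recs in blocks.items() if b not in sparse}
    free = list(free_ids)
    current = []  # records collected for the block being assembled
    for block_id in sparse:
        for size in blocks[block_id]:
            # Close the block being assembled when the next record would overflow it.
            if current and sum(current) + size > capacity:
                result[free.pop(0)] = current
                current = []
            current.append(size)
    if current:
        result[free.pop(0)] = current
    # The source blocks become available again only after their data has been rewritten.
    return result, free + sparse
```

With six blocks of capacity 100, where block 1 holds 30 bytes and block 3 holds 40 bytes of live data, and blocks 7 and 8 free, this merges blocks 1 and 3 into block 7 and returns 1 and 3 to the free list, mirroring the slide's example. The sketch raises IndexError if no free block is left, which is one way to see why a fuller drive makes defragmentation harder.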

Aerospike Certification Tool (ACT) for SSDs
■ Industry Standard Flash (SSD / PCI-E) Benchmark
■ Open Source Tool used by Flash Vendors to certify drives
The Factors That Most Improve The Performance of SSDs
How To Prepare Your System
➤ Select the correct hardware
    SSD
    Disk Controller
➤ Configure the hardware
➤ Configure Aerospike

Most Important Factors for SSD Performance
Factor | Importance (rough) | Notes
Interface (SATA v. PCIe) | Very High | One of the most critical choices is the interface. Today, the difference in price and layout is huge, so the choice is quite easy for customers to make. If very low latency is absolutely required, use PCIe. Costs are 2x-5x what they would be on SATA.
Consumer v. Enterprise | Very High | A few years ago the difference between these types was small, but today very few consumer rated drives pass Aerospike certification.
Make/model | Very High | Differences in specific models from the same maker can be very large. In some cases, the manufacturer may have quietly made changes to the hardware and firmware, but not changed the model number.
Disk controller (RAID, HBA) | Very High | Aerospike prefers direct control of each SSD. RAID controllers will add latency, without much added benefit (Aerospike is already replicated).
Over-provisioning (OP) | Very High | Over-provisioning allocates space on the drive for use by the controller. The amount the manufacturer has set varies from one model to another. Typical amounts are 6% - 28%.
Used before | High | If the SSD has been in use for a long time for other purposes, the disk will be unevenly worn, causing poor performance.
NCQ | Medium | Native Command Queuing is a SATA extension that allows the disk to internally optimize how commands are executed. Rarely a problem on modern equipment.
Scheduler | Low | This is the I/O scheduler for the Linux kernel. Aerospike prefers the NOOP scheduler and automatically selects it.
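
To confirm which I/O scheduler a drive is actually using, Linux exposes it in sysfs, with the active scheduler shown in square brackets (for example "noop deadline [cfq]"). A small sketch for reading it follows; the helper names are hypothetical, and on newer kernels the equivalent of NOOP is typically listed as "none".

```python
def active_scheduler(sysfs_text):
    """Return the scheduler name shown in [brackets], or None if none is marked."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_scheduler(device):
    """Read the active scheduler for a block device name such as 'sdb' (Linux only)."""
    with open(f"/sys/block/{device}/queue/scheduler") as f:
        return active_scheduler(f.read())
```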

Selecting The Correct SSD Model
Given the most important factors, obviously it is important to choose the
correct model. Aerospike publishes a list that it updates with information on
models that have passed testing.

These SSDs can be found at:
https://support.aerospike.com/customer/portal/articles/1315402-recommended-ssds

Selecting The Correct Disk Controller
Warning: Be very careful with the disk controller. Aerospike uses them in a way that goes against conventional wisdom.

Best practices:
➤ Do not use RAID across the SSDs. Aerospike stores small objects and is much more sensitive to latency than bandwidth.
➤ When possible, use direct attach (SATA or PCIe)
➤ If you can’t use direct attach, try one of the following:
    Use HBAs without RAID
    Configure each SSD as a separate RAID 0 array
➤ Spread the SSDs among as many controllers as possible
➤ All servers will have a limit to the number of drives that will perform well. 4 is a common number.
➤ If your company has a standard configuration for Hadoop, these often have similar hardware needs to Aerospike
➤ Some controllers have special software to boost performance, e.g. the LSI 2208 chip has FastPath available for specific models. Check with your vendor.

Over-provisioning (OP)
OP can make the difference between bad performance
and great performance.
2 types of OP:
➤ Manufacturer’s OP
➤ User OP

Manufacturers typically set 6%-8% for consumer rated drives and 14%-28% for enterprise rated. This varies depending on the model and capacity.

Over-Provisioning: What You Can Do
Adding user over-provisioning can be done in one of 3 ways:
➤ Manufacturer’s software
➤ Host Protected Area (HPA): Linux has a command called hdparm that you can use to set the HPA
➤ Disk partitions: You can also leave some space on the disk unpartitioned. The unpartitioned space will be used by the controller.

No matter which method you use, it is good to reserve 21% for use by the controller.
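
The arithmetic behind the 21% reservation is the same whether you count sectors (HPA) or cylinders (partitioning): keep 79% visible and round to a whole unit. A minimal sketch, using integer math to avoid floating-point surprises; the function name visible_units is hypothetical.

```python
def visible_units(total_units, op_percent=21):
    """Units (sectors, cylinders, ...) left visible after reserving op_percent for OP.

    Rounds to the nearest whole unit using integer arithmetic.
    """
    if not 0 <= op_percent < 100:
        raise ValueError("op_percent must be in [0, 100)")
    return (total_units * (100 - op_percent) + 50) // 100


print(visible_units(500118192))  # sector count used in the HPA example
print(visible_units(19140))      # cylinder count used in the partition example
```

The two inputs are the drive sizes used in the HPA and partition examples in this deck; the results match the values used there (395093372 sectors and 15121 cylinders).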

Comparing OP Methods
Method | HPA (Host Protected Area) | Partitioning
Ease of use | Use hdparm 9.37+. Most versions of Linux come with earlier versions. | Use built-in fdisk command
Performance | Both methods have the same performance
Device ID | Must specify the basic device (e.g. /dev/sdb) | Must specify the specific partition (e.g. /dev/sdb1)
Notes | hdparm may not work through your RAID controller | All commands must specify the full partition. Not doing so may result in using disks not OPed.

OP Using Host Protected Area (HPA)
In order to use the HPA, it is easiest to use the
command hdparm (must have version 9.37+). You
can get a copy of this at:
http://sourceforge.net/projects/hdparm/

OP Using Host Protected Area (HPA) - Example
First find the number of sectors (must be root or use sudo):
> sudo /opt/hdparm-9.43/hdparm -N /dev/sdb
/dev/sdb:
 max sectors   = 500118192/500118192, HPA is disabled

Then multiply by the fraction of the drive to keep visible (79%, which reserves 21% for OP) and round to a whole sector:
500,118,192 x 0.79 = 395,093,372 sectors
> sudo /opt/hdparm-9.43/hdparm -Np395093372 --yes-i-know-what-i-am-doing /dev/sdb
/dev/sdb:
 setting max visible sectors to 395093372 (permanent)
 max sectors   = 395093372/500118192, HPA is enabled

Finally, reboot. This is necessary to make sure the new settings take hold.

OP Using Partitions - Example
In this example we will over-provision the disk /dev/sdb by creating a single
partition that is 79% of the overall capacity (15121 = 19140 x 0.79):
> sudo /sbin/fdisk /dev/sdb
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-19140, default 1): 1
Last cylinder, +cylinders or +size{K,M,G} (1-19140, default 19140): 15121
Command (m for help): p
Disk /dev/sdb: 157.4 GB, 157437394944 bytes
255 heads, 63 sectors/track, 19140 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xeff8f3ae
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1       15121   121459401   83  Linux
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.

We recommend rebooting the server once this has been done. Note that for this
disk you will need to use /dev/sdb1 as the device.
Testing SSDs
Did You Choose Well?
The only way to be sure how these all work in your environment is to test.
The best way is to use the Aerospike Certification Tool (ACT). This is a tool that Aerospike has open sourced for testing SSD configurations.

Aerospike ACT
The ACT accesses SSDs similarly to the way the
Aerospike database does: reads with concurrent large
block writes. By default the tests run for a period of
24 hours.
The tests are based on factors of “x”.
1x represents 2,000 reads/s and 1,000 writes/s per SSD
2x represents 4,000 reads/s and 2,000 writes/s per SSD
etc.

1x represents decent performance of an SSD in 2010. Today,
several models of SSDs perform well at 3x. These tests must be
run for 24 hours to ensure stability.
Test with greater and greater “x” levels until the SSD performs
poorly.
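
The "x" levels translate directly into request rates. A small sketch of that arithmetic follows; the function name act_load is hypothetical, and multiplying by the number of devices assumes the same load is applied to every SSD under test.

```python
BASE_READS_PER_SEC = 2000   # 1x, per SSD
BASE_WRITES_PER_SEC = 1000  # 1x, per SSD


def act_load(x, devices=1):
    """Total reads/s and writes/s generated by an Nx test across `devices` SSDs."""
    return {
        "reads_per_sec": x * BASE_READS_PER_SEC * devices,
        "writes_per_sec": x * BASE_WRITES_PER_SEC * devices,
    }


print(act_load(3))             # a single drive at 3x
print(act_load(2, devices=4))  # four drives, each at 2x
```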
Methodology For Single Disk
The basic methodology is:
➤ Test a single drive at 3x
➤ Retest with different configurations (OP, disk
controller, settings, etc)
➤ If the best of these pass standards, retest at a
higher x. If not, lower test standards to 2x.
➤ Repeat these tests until you have discovered the
limits of performance.
➤ Finally, test at twice the highest level passed to
make sure the disk can handle large bursts of
traffic. If a disk passes the test criteria at Nx and
completes the test at twice that speed, it is said to
pass at Nx.
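
The search described above can be sketched as a loop. This is one possible reading of the methodology, not a prescribed procedure: the function and parameter names are hypothetical, each callback stands for a full 24-hour ACT run, stepping in increments of 1x is an assumption, and so is stepping down when the double-speed run does not complete.

```python
def find_rating(passes_criteria, completes, start=3, max_x=64):
    """Return the highest N such that the drive passes at Nx and completes at 2Nx.

    passes_criteria(x) -> bool: the run at load x met the latency criteria.
    completes(x)       -> bool: the run at load x finished without falling behind.
    Returns None if no level qualifies.
    """
    x = start
    if passes_criteria(x):
        # Raise the load while the next level still passes.
        while x < max_x and passes_criteria(x + 1):
            x += 1
    else:
        # Lower the load until some level passes.
        x -= 1
        while x >= 1 and not passes_criteria(x):
            x -= 1
        if x < 1:
            return None
    # Burst check at twice the level; step down if the burst run cannot complete.
    while x >= 1 and not completes(2 * x):
        x -= 1
    return x if x >= 1 else None
```

For example, a drive that meets the criteria up to 4x and completes runs up to 8x would be rated 4x by this sketch.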
What Is Passing?
Aerospike defines passing with the following
criteria:
No more than 5% of all transactions exceed 1 ms
No more than 1% of all transactions exceed 8 ms
No more than 0.1% of all transactions exceed 64 ms
You may determine your own.
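
These criteria are easy to check mechanically. A minimal sketch follows; the names are hypothetical, the input is assumed to be the percentage of transactions exceeding each latency threshold, and the numbers in the usage lines are made up for illustration.

```python
# (latency threshold in ms, maximum % of transactions allowed to exceed it)
PASS_CRITERIA = [(1, 5.0), (8, 1.0), (64, 0.1)]


def failed_criteria(pct_over, criteria=PASS_CRITERIA):
    """Return a list of (ms, observed %, allowed %) for every criterion that is not met.

    pct_over maps a latency threshold in ms to the % of transactions exceeding it.
    An empty list means the run passes.
    """
    return [(ms, pct_over[ms], limit) for ms, limit in criteria if pct_over[ms] > limit]


print(failed_criteria({1: 4.2, 8: 0.5, 64: 0.0}))  # illustrative passing run
print(failed_criteria({1: 7.5, 8: 0.5, 64: 0.2}))  # illustrative failing run
```

Passing your own list as `criteria` is one way to apply thresholds that differ from Aerospike's.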

Analyzing The Results
When you run the ACT analysis tool, you will see output like this (time slices are
hourly):
        trans                                              device
        %>(ms)                                             %>(ms)
slice        1      2      4      8     16     32     64        1      2      4      8     16     32     64
-----   ------ ------ ------ ------ ------ ------ ------   ------ ------ ------ ------ ------ ------ ------
    1    21.01   1.59   0.04   0.00   0.00   0.00   0.00    20.88   1.57   0.04   0.00   0.00   0.00   0.00
    2    23.34   1.58   0.03   0.00   0.00   0.00   0.00    23.19   1.56   0.03   0.00   0.00   0.00   0.00
    3    23.89   1.66   0.04   0.00   0.00   0.00   0.00    23.75   1.64   0.04   0.00   0.00   0.00   0.00
    4    25.39   2.06   0.05   0.00   0.00   0.00   0.00    25.24   2.03   0.05   0.00   0.00   0.00   0.00
    5    26.72   2.41   0.07   0.00   0.00   0.00   0.00    26.57   2.38   0.07   0.00   0.00   0.00   0.00
    6    26.68   2.37   0.07   0.00   0.00   0.00   0.00    26.53   2.34   0.06   0.00   0.00   0.00   0.00
    7    24.93   1.82   0.04   0.00   0.00   0.00   0.00    24.78   1.79   0.04   0.00   0.00   0.00   0.00
    8    25.61   1.99   0.05   0.00   0.00   0.00   0.00    25.46   1.97   0.05   0.00   0.00   0.00   0.00
    9    25.68   1.96   0.05   0.00   0.00   0.00   0.00    25.53   1.94   0.05   0.00   0.00   0.00   0.00
   10    26.79   2.28   0.06   0.00   0.00   0.00   0.00    26.64   2.25   0.06   0.00   0.00   0.00   0.00
   11    24.69   1.63   0.03   0.00   0.00   0.00   0.00    24.54   1.61   0.03   0.00   0.00   0.00   0.00
   12    25.73   1.92   0.04   0.00   0.00   0.00   0.00    25.58   1.90   0.04   0.00   0.00   0.00   0.00
   13    26.86   2.26   0.06   0.00   0.00   0.00   0.00    26.70   2.23   0.06   0.00   0.00   0.00   0.00
   14    26.17   2.03   0.05   0.00   0.00   0.00   0.00    26.02   2.01   0.05   0.00   0.00   0.00   0.00
   15    26.40   2.10   0.05   0.00   0.00   0.00   0.00    26.24   2.07   0.05   0.00   0.00   0.00   0.00
   16    26.70   2.18   0.06   0.00   0.00   0.00   0.00    26.54   2.15   0.05   0.00   0.00   0.00   0.00
   17    26.57   2.13   0.05   0.00   0.00   0.00   0.00    26.41   2.11   0.05   0.00   0.00   0.00   0.00
   18    26.53   2.11   0.05   0.00   0.00   0.00   0.00    26.37   2.09   0.05   0.00   0.00   0.00   0.00
   19    26.53   2.11   0.05   0.00   0.00   0.00   0.00    26.37   2.08   0.05   0.00   0.00   0.00   0.00
   20    25.43   1.79   0.04   0.00   0.00   0.00   0.00    25.27   1.77   0.04   0.00   0.00   0.00   0.00
   21    27.56   2.40   0.06   0.00   0.00   0.00   0.00    27.40   2.37   0.06   0.00   0.00   0.00   0.00
   22    27.61   2.43   0.07   0.00   0.00   0.00   0.00    27.45   2.40   0.07   0.00   0.00   0.00   0.00
   23    25.21   1.71   0.04   0.00   0.00   0.00   0.00    25.05   1.68   0.04   0.00   0.00   0.00   0.00
   24    26.61   2.10   0.05   0.00   0.00   0.00   0.00    26.45   2.08   0.05   0.00   0.00   0.00   0.00
-----   ------ ------ ------ ------ ------ ------ ------   ------ ------ ------ ------ ------ ------ ------
  avg    25.78   2.03   0.05   0.00   0.00   0.00   0.00    25.62   2.00   0.05   0.00   0.00   0.00   0.00
  max    27.61   2.43   0.07   0.00   0.00   0.00   0.00    27.45   2.40   0.07   0.00   0.00   0.00   0.00

Methodology For Multiple Disks
In this case, you already know the performance of
a single drive. What you are actually testing for is
if this will scale linearly with the controller(s) you
have.
➤ Test 2 drives in parallel and increase the
number of drives until the performance is
obviously unacceptable or you have reached the
number of drives you wish to test.
As with the single disk, if a disk setup passes the
test criteria at Nx and completes the test at
twice that speed, it is said to pass at Nx.
Running ACT Tests
To run ACT tests (e.g. for drive /dev/sdb), follow these steps. This will require root or sudo.
1. Download and compile the ACT. Follow the included directions to compile.
   http://aerospike.github.io/act/
2. Prepare the drive(s) for use:
   <ACT_DIR>/actprep /dev/sdb
3. Create a config file for the ACT run:
   python <ACT_DIR>/act_config_helper.py
4. Execute the ACT on the config file (since these will run for a long time, it is useful to put it into the background):
   <ACT_DIR>/act [config_file] > [log_file] &
5. Test to make sure it is running and outputting data. The “-t 10” means to put the data into 10 second slices (default is 3600).
   <ACT_DIR>/latency_calc/act_latency.py -l [log_file] -t 10
6. Wait for test to complete (24 hours)
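
Step 4 puts the run in the background with "&", but a backgrounded job can still be stopped when you log out of the server. One way to avoid that, sketched here with Python's standard subprocess module, is to start the process in its own session with its output sent to the log file. The helper name launch_detached is hypothetical, and this is an alternative to nohup rather than something ACT itself requires.

```python
import subprocess


def launch_detached(command, log_path):
    """Start `command` in a new session, sending stdout and stderr to log_path.

    start_new_session=True detaches the child from the login session on POSIX
    systems, so closing the terminal does not deliver a hangup to it.
    """
    log = open(log_path, "wb")  # truncates, like the shell's '>' redirect
    try:
        return subprocess.Popen(
            command,
            stdout=log,
            stderr=subprocess.STDOUT,
            stdin=subprocess.DEVNULL,
            start_new_session=True,
        )
    finally:
        log.close()  # the child keeps its own copy of the file descriptor
```

Usage would look like launch_detached(["<ACT_DIR>/act", "[config_file]"], "[log_file]"), with the placeholders replaced by your own paths.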
Example: Creating Config Files
> python act_config_helper.py
Enter the number of devices you want to create config for: 1
Enter either raw device if over-provisioned using hdparm or
partition if over-provisioned using fdisk
Enter device name # 1(e.g. /dev/sdb or /dev/sdb1): /dev/sdb
Duration for the test (default :24 hours) [ENTER]
Configure test duration ? (N for using default) (y/N) :n
Use advanced mode for configuration ? (y/N) n
"1x" load is 2000 reads per sec and 1000 writes per sec
Enter the load you want to test the devices ( e.g. enter 1 for 1x
test):3
Do you want to Create the config (Save to a file) ? : (y/N) y
Config File actconfig_3x_1d.txt successfully created

The result will be the output file “actconfig_3x_1d.txt”. If you have multiple SSDs, the load will be applied to each device. Defaults for the ACT are for small objects (1.5 KB) and can be changed in the advanced options.

Analyzing The Results
Analyze the final output log:
<ACT_DIR>/latency_calc/act_latency.py -l [log_file]
The output is the hourly latency table shown under the earlier “Analyzing The Results” slide.

Final Preparation
Final Preparations
Once you have your hardware properly configured,
there are some final steps before you use the SSDs.
You must blank out the drives (similar to a format with a filesystem) by running the dd command on each of the drives. These can be run in parallel, but must be done by root or with sudo:
> sudo dd if=/dev/zero of=/dev/<DEVICE_ID> bs=128k &
If you used partitioning to OP the drives, make sure to use the partition
id (e.g. /dev/sdb1).
WARNING: Do not run this on the disk with your operating system
(usually /dev/sda)!
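
Because dd on the wrong device is unrecoverable, a quick guard before wiping can help. The sketch below checks a device path against text in the format of /proc/mounts; the function names are hypothetical, and the check is deliberately coarse: it refuses any device whose path is a prefix of a mounted source, and it cannot see filesystems mounted through device-mapper or other indirect names, so treat it as a sanity check and not a guarantee.

```python
def mounted_sources(mounts_text):
    """Return the set of source device names from text in /proc/mounts format."""
    return {line.split()[0] for line in mounts_text.splitlines() if line.split()}


def safe_to_wipe(device, mounts_text):
    """False if the device, or anything whose path starts with it, is mounted."""
    return not any(src.startswith(device) for src in mounted_sources(mounts_text))


# Illustrative mount table; on a real system read the text from /proc/mounts.
sample = "/dev/sda1 / ext4 rw 0 0\nproc /proc proc rw 0 0\n"
print(safe_to_wipe("/dev/sdb", sample))
print(safe_to_wipe("/dev/sda", sample))
```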

Troubleshooting Common Issues


➤ Tests show much greater than expected latency
    Make sure you have properly configured over-provisioning. This is a common issue.
    If you are doing a multi-disk test, the problem may lie in a single disk. Variances in manufacturing may lead to a single drive causing poor latencies that mask the results for all drives. Also make sure your drives are fresh. Old drives may have hotspots.
➤ Test won’t complete
    Your load may be overwhelming your controller or the drive. A log message will let you know if it is stopping because it cannot keep up.
    If there is no error message in the log, sometimes logging out of the server will stop the ACT process. You must use nohup or a similar mechanism to ensure the process will run for the full 24 hours.
➤ Operating system gives odd errors
    You may have inadvertently run actprep or dd on the OS drive. Even the best of us have done this.

Q&A
Thank You
Send all questions/comments/complaints to
YOUNG PAIK
YOUNG@AEROSPIKE.COM

More Related Content

PPT
Aerospike: Key Value Data Access
PDF
Art of the Possible_Tim Faulkes.pdf
PPTX
An Overview of Apache Cassandra
PPTX
Configuring Aerospike - Part 1
PPTX
Introduction to Aerospike
PDF
Aerospike Hybrid Memory Architecture
PPTX
Configuring Aerospike - Part 2
PPTX
Aerospike Architecture
Aerospike: Key Value Data Access
Art of the Possible_Tim Faulkes.pdf
An Overview of Apache Cassandra
Configuring Aerospike - Part 1
Introduction to Aerospike
Aerospike Hybrid Memory Architecture
Configuring Aerospike - Part 2
Aerospike Architecture

What's hot (20)

PDF
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
PDF
Ceph and RocksDB
PPTX
Aerospike Architecture
PDF
NVMe overview
PDF
ClickHouse Features for Advanced Users, by Aleksei Milovidov
PDF
Introduction to Greenplum
PPTX
Ceph Performance and Sizing Guide
PDF
Google Bigtable Paper Presentation
PDF
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
PPTX
Apache Spark Architecture
PDF
Introduction to Apache Cassandra
PDF
ETL With Cassandra Streaming Bulk Loading
PDF
Deep Dive into Cassandra
PDF
Tuning Apache/MySQL/PHP para desenvolvedores
PDF
Apache Spark in Depth: Core Concepts, Architecture & Internals
PDF
A Technical Introduction to WiredTiger
PPTX
Intro to Apache Spark
PPTX
RocksDB compaction
PDF
Power of the Log: LSM & Append Only Data Structures
PDF
ClickHouse Intro
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Ceph and RocksDB
Aerospike Architecture
NVMe overview
ClickHouse Features for Advanced Users, by Aleksei Milovidov
Introduction to Greenplum
Ceph Performance and Sizing Guide
Google Bigtable Paper Presentation
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Architecture
Introduction to Apache Cassandra
ETL With Cassandra Streaming Bulk Loading
Deep Dive into Cassandra
Tuning Apache/MySQL/PHP para desenvolvedores
Apache Spark in Depth: Core Concepts, Architecture & Internals
A Technical Introduction to WiredTiger
Intro to Apache Spark
RocksDB compaction
Power of the Log: LSM & Append Only Data Structures
ClickHouse Intro
Ad

Viewers also liked (10)

PPTX
Aerospike: Maximizing Performance
PPTX
Distributing Data The Aerospike Way
PPTX
Redis vs Aerospike
PPTX
Developing High Performance Application with Aerospike & Go
PPTX
Live Analytics with Go & Aerospike
PDF
Building ZingMe News Feed System
PDF
Zing Me Real Time Web Chat Architect
PDF
Design a scalable social network: Problems and solutions
PPTX
Inside Zalo: Developing a mobile messenger for the audience of millions - VN ...
PDF
Riak Search 2.0を使ったデータ集計
Aerospike: Maximizing Performance
Distributing Data The Aerospike Way
Redis vs Aerospike
Developing High Performance Application with Aerospike & Go
Live Analytics with Go & Aerospike
Building ZingMe News Feed System
Zing Me Real Time Web Chat Architect
Design a scalable social network: Problems and solutions
Inside Zalo: Developing a mobile messenger for the audience of millions - VN ...
Riak Search 2.0を使ったデータ集計
Ad

Similar to Getting The Most Out Of Your Flash/SSDs (20)

PPTX
Solid State Drives (SSDs) -What it Takes to Make Data Go Away
PPTX
Ceph Day San Jose - Red Hat Storage Acceleration Utlizing Flash Technology
PDF
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
PPT
fdocuments.in_aerospike-key-value-data-access.ppt
PDF
IBM System Storage DS8000 with SSDs An In-Depth Look at SSD Performance in th...
PDF
Disk configtips wp-cn
PPTX
Red Hat Storage Day Dallas - Red Hat Ceph Storage Acceleration Utilizing Flas...
PPTX
Introduction to Hard Disk Drive by Vishal Garg
PDF
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
PPT
Open Ware Ramsan Dram Ssd
PDF
Development to Production with Sharded MongoDB Clusters
PDF
How to deploy SQL Server on an Microsoft Azure virtual machines
DOCX
3 5 SSD
PDF
Generic SAN Acceleration White Paper DRAFT
PPTX
Architectural designs driving sql server performance and high availability
PPT
SQL 2005 Disk IO Performance
PDF
Firebird and RAID
PPTX
JetStor portfolio update final_2020-2021
PDF
Designing SSD-friendly Applications for Better Application Performance and Hi...
PPT
Apresentacao Solid Access Corp Presentation Openware 5 20 10
Solid State Drives (SSDs) -What it Takes to Make Data Go Away
Ceph Day San Jose - Red Hat Storage Acceleration Utlizing Flash Technology
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
fdocuments.in_aerospike-key-value-data-access.ppt
IBM System Storage DS8000 with SSDs An In-Depth Look at SSD Performance in th...
Disk configtips wp-cn
Red Hat Storage Day Dallas - Red Hat Ceph Storage Acceleration Utilizing Flas...
Introduction to Hard Disk Drive by Vishal Garg
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
Open Ware Ramsan Dram Ssd
Development to Production with Sharded MongoDB Clusters
How to deploy SQL Server on an Microsoft Azure virtual machines
3 5 SSD
Generic SAN Acceleration White Paper DRAFT
Architectural designs driving sql server performance and high availability
SQL 2005 Disk IO Performance
Firebird and RAID
JetStor portfolio update final_2020-2021
Designing SSD-friendly Applications for Better Application Performance and Hi...
Apresentacao Solid Access Corp Presentation Openware 5 20 10

More from Aerospike, Inc. (18)

PDF
2017 DB Trends for Powering Real-Time Systems of Engagement
PPTX
WEBINAR: Architectures for Digital Transformation and Next-Generation Systems...
PPTX
Leveraging Big Data with Hadoop, NoSQL and RDBMS
PDF
Using Databases and Containers From Development to Deployment
PDF
01282016 Aerospike-Docker webinar
PPTX
There are 250 Database products, are you running the right one?
PPTX
The role of NoSQL in the Next Generation of Financial Informatics
PPTX
Tectonic Shift: A New Foundation for Data Driven Business
PPTX
How to Get a Game Changing Performance Advantage with Intel SSDs and Aerospike
PDF
What the Spark!? Intro and Use Cases
PDF
Get Started with Data Science by Analyzing Traffic Data from California Highways
PPTX
Running a High Performance NoSQL Database on Amazon EC2 for Just $1.68/Hour
PPTX
ACID & CAP: Clearing CAP Confusion and Why C In CAP ≠ C in ACID
PPTX
Flash Economics and Lessons learned from operating low latency platforms at h...
PDF
Storm Persistence and Real-Time Analytics
PDF
You Snooze You Lose or How to Win in Ad Tech?
PPT
Big Data Learnings from a Vendor's Perspective
PPT
Predictable Big Data Performance in Real-time
2017 DB Trends for Powering Real-Time Systems of Engagement
WEBINAR: Architectures for Digital Transformation and Next-Generation Systems...
Leveraging Big Data with Hadoop, NoSQL and RDBMS
Using Databases and Containers From Development to Deployment
01282016 Aerospike-Docker webinar
There are 250 Database products, are you running the right one?
The role of NoSQL in the Next Generation of Financial Informatics
Tectonic Shift: A New Foundation for Data Driven Business
How to Get a Game Changing Performance Advantage with Intel SSDs and Aerospike
What the Spark!? Intro and Use Cases
Get Started with Data Science by Analyzing Traffic Data from California Highways
Running a High Performance NoSQL Database on Amazon EC2 for Just $1.68/Hour
ACID & CAP: Clearing CAP Confusion and Why C In CAP ≠ C in ACID
Flash Economics and Lessons learned from operating low latency platforms at h...
Storm Persistence and Real-Time Analytics
You Snooze You Lose or How to Win in Ad Tech?
Big Data Learnings from a Vendor's Perspective
Predictable Big Data Performance in Real-time

Getting The Most Out Of Your Flash/SSDs

  • 1. Getting The Most Out Of Your Flash/SSDs Young Paik Technical Marketing Director young@aerospike.com Aerospike aer . o . spike [air-oh- spahyk] noun, 1. tip of a rocket that enhances speed and stability
  • 2. Introduction Flash/SSDs (used interchangeably) are still relatively new. Getting the most out of them requires a good understanding of how they work and how Aerospike uses them.
  • 3. Agenda
      ➤ SSDs vs. Rotational Drives
      ➤ What Aerospike Does To Make The Most of SSDs
      ➤ The Factors That Most Improve The Performance of SSDs
      ➤ Testing SSDs
      ➤ More on Testing SSDs
      ➤ Even more on Testing SSDs
      ➤ Final Preparations For Your Drives
  • 5. Differences Matter Some will tell you that their databases will work on SSDs and that no changes are necessary. There are differences between SSDs and rotational drives that are important. You must do more than simply swap out your old drive and put in an SSD to get the best performance.
  • 6. Comparing Old and New
      There are differences between rotational and SSD disks that are independent of the database you are using.
      ➤ Random read (Rotational: Poor; SSD: Excellent). This is where SSDs shine the most. With no moving parts, SSDs are clearly the choice for random reads.
      ➤ Random write (Rotational: Poor; SSD: Good). Similar to reads, but SSDs are not quite as fast with random writes as they are with reads.
      ➤ Sequential write (Rotational: Good; SSD: Excellent). Rotational drives narrow the gap here. While they are close in pure write performance, any reads during these writes will require the movement of the heads on rotational drives.
      ➤ Rewritability / durability (Rotational: Excellent; SSD: Poor). This is where SSDs are the weakest. NAND (Flash) chips have limits to how many times you can write to the same area. Databases must take this into account to avoid “hotspots.” Databases that do not are relying on the operating system (i.e. the TRIM command) to alleviate these issues. Aerospike manages this differently.
  • 7. What Aerospike Does To Make The Most Of SSDs
  • 8. Techniques
      In order to make the best use of SSDs, Aerospike has designed an architecture that does the following:
      ➤ Uses raw disk: Aerospike does not use a file system, which would only slow down the database.
      ➤ Writes in large blocks: Rather than trying to write many smaller items, it is much more efficient to write a few large ones. Aerospike uses block sizes that are integral multiples of 128 KB.
      ➤ Reads in small blocks: Reads are done in 512 byte data segments.
      ➤ Handles defragmentation on a regular basis: All databases must delete data. This creates fragmentation of the data on disk, which makes it harder to use efficiently. Aerospike reclaims this space through a continual process called defragmentation. This means you do not need the TRIM command used on most operating systems.
      ➤ Works with vendors: Aerospike works closely with SSD manufacturers to test hardware and provide feedback for the best performance.
  • 9. Accessing An Object In Aerospike: Writing A New Standard Data Type Record With SSDs
      [Diagram: Client, Master Node with DRAM (Index) and SSD (DATA); asynchronous write; block size 128 KB by default]
      1) Client finds Master Node from partition map.
      2) Client makes write request to Master Node.
      3) Master Node makes an entry into the index (in DRAM) and queues the write in a temporary write buffer.
      4) Master Node coordinates write with replica nodes (not shown).
      5) Master Node returns success to client.
      6) Master Node asynchronously writes data in blocks.
      7) Index in DRAM points to location on SSD.
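
The write path in slide 9 can be pictured with a small in-memory model. This is only a sketch under stated assumptions, not Aerospike's implementation: the class name `WriteBufferSketch`, its methods, and the dictionaries standing in for the DRAM index and the SSD are all hypothetical, and replication (step 4) is left out. It shows an index entry being made at write time, records accumulating in a write buffer, and the buffer being flushed as one large block.

```python
BLOCK_SIZE = 128 * 1024  # bytes; the slides give 128 KB as the default block size


class WriteBufferSketch:
    """Hypothetical model: index in memory, records flushed to 'disk' in large blocks."""

    def __init__(self):
        self.index = {}            # key -> (block_id, offset, length); stands in for the DRAM index
        self.buffer = bytearray()  # temporary write buffer for the block being filled
        self.block_id = 0          # id of the block currently being filled
        self.device = {}           # block_id -> bytes; stands in for the SSD

    def write(self, key, value):
        if len(value) > BLOCK_SIZE:
            raise ValueError("record larger than one block")
        # Flush first if this record would not fit in the current block.
        if len(self.buffer) + len(value) > BLOCK_SIZE:
            self.flush()
        # Index entry is made immediately; the data reaches the device later.
        self.index[key] = (self.block_id, len(self.buffer), len(value))
        self.buffer += value

    def flush(self):
        """Write the buffered records as one block (done asynchronously in the slides)."""
        if self.buffer:
            self.device[self.block_id] = bytes(self.buffer)
            self.block_id += 1
            self.buffer = bytearray()

    def read(self, key):
        block_id, offset, length = self.index[key]
        # A record lives on the device once flushed, otherwise still in the buffer.
        source = self.device[block_id] if block_id in self.device else self.buffer
        return bytes(source[offset:offset + length])
```

In this model a second 100,000-byte record does not fit next to the first one in a 128 KB block, so writing it flushes block 0 and starts block 1.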
  • 10. Defragmentation In Aerospike: How Space Is Freed Up. Aerospike writes the data in large data blocks. [Diagram: SSD (DATA) divided into blocks 1-8; block size 128 KB by default]
  • 11. Defragmentation In Aerospike: How Space Is Freed Up. As new data is added to the disk, new blocks will be continually written to the SSD. [Diagram: SSD (DATA) divided into blocks 1-8; block size 128 KB by default]
  • 12. Defragmentation In Aerospike: How Space Is Freed Up. Over time, some records will be deleted or updated, resulting in fragmented usage on the flash/SSD disk. This unused space must be freed up. [Diagram: SSD (DATA) divided into blocks 1-8]
  • 13. Defragmentation In Aerospike: How Space Is Freed Up. Some databases use a nightly process called “compaction,” which is an intensive process. Aerospike runs a regular process (every few minutes) that looks for blocks below some level of use (called the high watermark). In this example, if the high watermark is 50%, blocks 1 and 3 in the diagram are below 50% occupied. The defragmenter will take the data in these blocks and merge them into another block. [Diagram: SSD (DATA) divided into blocks 1-8]
  • 14. Defragmentation In Aerospike: How Space Is Freed Up. The defragmenter will write the new block (block 7) and clear blocks 1 and 3 for new writes. Because this runs constantly, there is no special time where the performance of the database is bad. This algorithm operates best when the SSD is less than 50% occupied. As disk use grows above this, the performance of the defragmenter will decrease. [Diagram: SSD (DATA) divided into blocks 1-8]
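
The defragmentation pass described in slides 10 through 14 can be sketched as follows. This is a simplified illustration, not Aerospike's code: the function names, the dictionary of per-block live bytes, and the single-target-block merge are assumptions made for the example. It picks blocks whose live data is below the threshold, moves that data into one empty block, and marks the source blocks as free.

```python
BLOCK_CAPACITY = 128 * 1024  # bytes; the slides give 128 KB as the default block size


def blocks_to_defragment(used_bytes, threshold_pct=50.0):
    """Return ids of non-empty blocks whose live data is below threshold_pct of capacity."""
    return sorted(
        block_id
        for block_id, used in used_bytes.items()
        if used > 0 and used * 100 / BLOCK_CAPACITY < threshold_pct
    )


def defragment_once(used_bytes, target_block, threshold_pct=50.0):
    """Merge under-used blocks into one empty target block; return the new layout."""
    if used_bytes.get(target_block, 0) != 0:
        raise ValueError("target block must be empty")
    result = dict(used_bytes)
    moved = 0
    for block_id in blocks_to_defragment(used_bytes, threshold_pct):
        # Stop once the target block cannot hold the next candidate's live data.
        if moved + used_bytes[block_id] > BLOCK_CAPACITY:
            break
        moved += used_bytes[block_id]
        result[block_id] = 0  # source block is now free for new writes
    result[target_block] = moved
    return result
```

With blocks 1 and 3 holding 40 KB and 50 KB of live data, both fall under a 50% threshold and end up merged into block 7, which mirrors the example in the slides.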
  • 15. Aerospike Certification Tool (ACT) for SSDs ■ Industry Standard Flash (SSD / PCI-E) Benchmark ■ Open Source Tool used by Flash Vendors to certify drives
  • 16. The Factors That Most Improve The Performance of SSDs
  • 17. How To Prepare Your System
      ➤ Select the correct hardware: SSD, disk controller
      ➤ Configure the hardware
      ➤ Configure Aerospike
  • 18. Most Important Factors for SSD Performance (factor, rough importance, notes)
      ➤ Interface (SATA v. PCIe), Very High: One of the most critical choices is the interface. Today, the difference in price and layout is huge, so the choice is quite easy for customers to make. If very low latency is absolutely required, use PCIe. Costs are 2x-5x what they would be on SATA.
      ➤ Consumer v. Enterprise, Very High: A few years ago the difference between these types was small, but today very few consumer rated drives pass Aerospike certification.
      ➤ Make/model, Very High: Differences in specific models from the same maker can be very large. In some cases, the manufacturer may have quietly made changes to the hardware and firmware, but not changed the model number.
      ➤ Disk controller (RAID, HBA), Very High: Aerospike prefers direct control of each SSD. RAID controllers will add latency, without much added benefit (Aerospike is already replicated).
      ➤ Over-provisioning (OP), Very High: Over-provisioning allocates space on the drive for use by the controller. The amount the manufacturer has set varies from one model to another. Typical amounts are 6% - 28%.
      ➤ Used before, High: If the SSD has been in use for a long time for other purposes, the disk will be unevenly worn, causing poor performance.
      ➤ NCQ, Medium: Native Command Queuing is a SATA extension that allows the disk to internally optimize how commands are executed. Rarely a problem on modern equipment.
      ➤ Scheduler, Low: This is the I/O scheduler for the Linux kernel. Aerospike prefers the NOOP scheduler and automatically selects it.
  • 19. Selecting The Correct SSD Model Given the factors above, it is important to choose the correct model. Aerospike publishes and updates a list of models that have passed testing. These SSDs can be found at: https://support.aerospike.com/customer/portal/articles/1315402-recommended-ssds
  • 20. Selecting The Correct Disk Controller
      Warning: Be very careful with the disk controller. Aerospike uses them in a way that goes against conventional wisdom.
      Best practices:
      ➤ Do not use RAID across the SSDs. Aerospike stores small objects and is much more sensitive to latency than bandwidth.
      ➤ When possible, use direct attach (SATA or PCIe).
      ➤ If you can’t use direct attach, try one of the following:
          ➤ Use HBAs without RAID
          ➤ Configure each SSD as a separate RAID 0 array
      ➤ Spread the SSDs among as many controllers as possible.
      ➤ All servers will have a limit to the number of drives that will perform well. 4 is a common number.
      ➤ If your company has a standard configuration for Hadoop, it often has similar hardware needs to Aerospike.
      ➤ Some controllers have special software to boost performance, e.g. the LSI 2208 chip has FastPath available for specific models. Check with your vendor.
  • 21. Over-provisioning (OP)
      OP can make the difference between bad performance and great performance. There are 2 types of OP:
      ➤ Manufacturer’s OP
      ➤ User OP
      Manufacturers typically set 6%-8% for consumer rated drives and 14%-28% for enterprise rated drives. This varies depending on the model and capacity.
  • 22. Over-Provisioning: What You Can Do
      Adding user over-provisioning can be done in one of 3 ways:
      ➤ Manufacturer’s software
      ➤ Host Protected Area (HPA) – Linux has a command called hdparm that you can use to set the HPA.
      ➤ Disk partitions – You can also leave some space on the disk unpartitioned. The remainder of the space will be used by the controller.
      No matter which method you use, it is good to reserve 21% for use by the controller.
  • 23. Comparing OP Methods: HPA (Host Protected Area) vs. Partitioning
      ➤ Ease of use. HPA: Use hdparm 9.37+. Most versions of Linux come with earlier versions. Partitioning: Use the built-in fdisk command.
      ➤ Performance. Both methods have the same performance.
      ➤ Device ID. HPA: Must specify the basic device (e.g. /dev/sdb). Partitioning: Must specify the specific partition (e.g. /dev/sdb1).
      ➤ Notes. HPA: hdparm may not work through your RAID controller. Partitioning: All commands must specify the full partition. Not doing so may result in using disks that are not over-provisioned.
  • 24. OP Using Host Protected Area (HPA) In order to use the HPA, it is easiest to use the command hdparm (must have version 9.37+). You can get a copy of this at: http://sourceforge.net/projects/hdparm/
  • 25. OP Using Host Protected Area (HPA) - Example
      First find the number of sectors (must be root or use sudo):
        > sudo /opt/hdparm-9.43/hdparm -N /dev/sdb
        /dev/sdb:
         max sectors = 500118192/500118192, HPA is disabled
      Then multiply by the share of the drive to keep visible (79%, which leaves 21% for OP): 500,118,192 x 0.79 = 395,093,372 sectors (rounded)
        > sudo /opt/hdparm-9.43/hdparm -Np395093372 --yes-i-know-what-i-am-doing /dev/sdb
        /dev/sdb:
         setting max visible sectors to 395093372 (permanent)
         max sectors = 395093372/500118192, HPA is enabled
      Finally, reboot. This is necessary to make sure the new settings take hold.
  • 26. OP Using Partitions - Example
      In this example we will over-provision the disk /dev/sdb by creating a single partition that is 79% of the overall capacity (15121 = 19140 x 0.79, rounded):
        > sudo /sbin/fdisk /dev/sdb
        Command (m for help): n
        Command action
           e   extended
           p   primary partition (1-4)
        p
        Partition number (1-4): 1
        First cylinder (1-19140, default 1): 1
        Last cylinder, +cylinders or +size{K,M,G} (1-19140, default 19140): 15121
        Command (m for help): p
        Disk /dev/sdb: 157.4 GB, 157437394944 bytes
        255 heads, 63 sectors/track, 19140 cylinders
        Units = cylinders of 16065 * 512 = 8225280 bytes
        Sector size (logical/physical): 512 bytes / 512 bytes
        I/O size (minimum/optimal): 512 bytes / 512 bytes
        Disk identifier: 0xeff8f3ae
           Device Boot      Start         End      Blocks   Id  System
        /dev/sdb1               1       15121   121459401   83  Linux
        Command (m for help): w
        The partition table has been altered!
        Calling ioctl() to re-read partition table.
        Syncing disks.
      We recommend rebooting the server once this has been done. Note that for this disk you will need to use /dev/sdb1 as the device.
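
The 79% arithmetic in the HPA and partition examples can be checked with a few lines of Python. This is a minimal sketch: the helper name `visible_units` is made up for illustration, and rounding to the nearest whole unit is an assumption chosen because it reproduces both figures on the slides. Integer arithmetic is used to avoid floating-point surprises with large sector counts.

```python
def visible_units(total_units, op_percent=21):
    """Sectors or cylinders to leave visible after reserving op_percent for the controller.

    Rounds to the nearest whole unit (an assumption; the slides do not state a rounding rule).
    """
    if not 0 <= op_percent < 100:
        raise ValueError("op_percent must be in [0, 100)")
    keep_percent = 100 - op_percent
    return (total_units * keep_percent + 50) // 100


print(visible_units(500118192))  # sector count for the hdparm -Np example
print(visible_units(19140))      # last cylinder for the fdisk example
```

The first value is what would be passed to hdparm -Np, and the second is the last cylinder entered in fdisk.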
  • 28. Did You Choose Well? The only way to be sure how these all work in your environment is to test. The best way is to use the Aerospike Certification Tool (ACT). This is a tool that Aerospike has open sourced for testing SSD configurations.
  • 29. Aerospike ACT
      The ACT accesses SSDs similarly to the way the Aerospike database does: reads with concurrent large block writes. By default the tests run for a period of 24 hours. The tests are based on factors of “x”:
      ➤ 1x represents 2,000 reads/s and 1,000 writes/s per SSD
      ➤ 2x represents 4,000 reads/s and 2,000 writes/s per SSD
      ➤ etc.
      1x represents decent performance of an SSD in 2010. Today, several models of SSDs perform well at 3x. These tests must be run for 24 hours to ensure stability. Test with greater and greater “x” levels until the SSD performs poorly.
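
The “x” load factor is a simple multiplier, which the short sketch below makes explicit. The function name `act_load` is hypothetical and is not part of the ACT tool; it only restates the per-SSD rates given on the slide.

```python
READS_PER_X = 2000   # reads per second per SSD at 1x, as stated on the slide
WRITES_PER_X = 1000  # writes per second per SSD at 1x, as stated on the slide


def act_load(x):
    """Return (reads_per_sec, writes_per_sec) per SSD for an Nx test."""
    if x < 1:
        raise ValueError("x must be at least 1")
    return (READS_PER_X * x, WRITES_PER_X * x)


for x in (1, 2, 3):
    reads, writes = act_load(x)
    print(f"{x}x: {reads} reads/s, {writes} writes/s per SSD")
```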
  • 30. Methodology For Single Disk
      The basic methodology is:
      ➤ Test a single drive at 3x.
      ➤ Retest with different configurations (OP, disk controller, settings, etc).
      ➤ If the best of these passes, retest at a higher x. If not, lower the test load to 2x.
      ➤ Repeat these tests until you have discovered the limits of performance.
      ➤ Finally, test at twice the highest level passed to make sure the disk can handle large bursts of traffic.
      If a disk passes the test criteria at Nx and completes the test at twice that speed, it is said to pass at Nx.
  • 31. What Is Passing?
      Aerospike defines passing with the following criteria:
      ➤ No more than 5% of all transactions exceed 1 ms
      ➤ No more than 1% of all transactions exceed 8 ms
      ➤ No more than 0.1% of all transactions exceed 64 ms
      You may define your own criteria.
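
The passing criteria translate directly into a threshold check. The sketch below is a literal reading of the three rules; the names `THRESHOLDS` and `passes` are made up for illustration, and which figures to feed in (per-hour values, the average, or the maximum) is left to the caller because the slide does not say.

```python
# Latency bucket in ms -> maximum allowed percentage of transactions exceeding it.
THRESHOLDS = {1: 5.0, 8: 1.0, 64: 0.1}


def passes(pct_over):
    """pct_over maps a latency in ms to the percentage of transactions that exceeded it."""
    return all(pct_over[ms] <= limit for ms, limit in THRESHOLDS.items())


print(passes({1: 4.2, 8: 0.3, 64: 0.0}))     # hypothetical values
print(passes({1: 25.78, 8: 0.00, 64: 0.00}))  # avg row of the sample output on slide 32
```

Under this literal reading, the avg row of the sample output on slide 32 (25.78% of transactions over 1 ms) would be flagged as exceeding the 5% limit.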
  • 32. Analyzing The Results
      When you run the ACT analysis tool, you will see output like this (time slices are hourly):
              trans                                              device
              %>(ms)                                             %>(ms)
      slice        1      2      4      8     16     32     64        1      2      4      8     16     32     64
      -----   ------ ------ ------ ------ ------ ------ ------   ------ ------ ------ ------ ------ ------ ------
          1    21.01   1.59   0.04   0.00   0.00   0.00   0.00    20.88   1.57   0.04   0.00   0.00   0.00   0.00
          2    23.34   1.58   0.03   0.00   0.00   0.00   0.00    23.19   1.56   0.03   0.00   0.00   0.00   0.00
          3    23.89   1.66   0.04   0.00   0.00   0.00   0.00    23.75   1.64   0.04   0.00   0.00   0.00   0.00
          4    25.39   2.06   0.05   0.00   0.00   0.00   0.00    25.24   2.03   0.05   0.00   0.00   0.00   0.00
          5    26.72   2.41   0.07   0.00   0.00   0.00   0.00    26.57   2.38   0.07   0.00   0.00   0.00   0.00
          6    26.68   2.37   0.07   0.00   0.00   0.00   0.00    26.53   2.34   0.06   0.00   0.00   0.00   0.00
          7    24.93   1.82   0.04   0.00   0.00   0.00   0.00    24.78   1.79   0.04   0.00   0.00   0.00   0.00
          8    25.61   1.99   0.05   0.00   0.00   0.00   0.00    25.46   1.97   0.05   0.00   0.00   0.00   0.00
          9    25.68   1.96   0.05   0.00   0.00   0.00   0.00    25.53   1.94   0.05   0.00   0.00   0.00   0.00
         10    26.79   2.28   0.06   0.00   0.00   0.00   0.00    26.64   2.25   0.06   0.00   0.00   0.00   0.00
         11    24.69   1.63   0.03   0.00   0.00   0.00   0.00    24.54   1.61   0.03   0.00   0.00   0.00   0.00
         12    25.73   1.92   0.04   0.00   0.00   0.00   0.00    25.58   1.90   0.04   0.00   0.00   0.00   0.00
         13    26.86   2.26   0.06   0.00   0.00   0.00   0.00    26.70   2.23   0.06   0.00   0.00   0.00   0.00
         14    26.17   2.03   0.05   0.00   0.00   0.00   0.00    26.02   2.01   0.05   0.00   0.00   0.00   0.00
         15    26.40   2.10   0.05   0.00   0.00   0.00   0.00    26.24   2.07   0.05   0.00   0.00   0.00   0.00
         16    26.70   2.18   0.06   0.00   0.00   0.00   0.00    26.54   2.15   0.05   0.00   0.00   0.00   0.00
         17    26.57   2.13   0.05   0.00   0.00   0.00   0.00    26.41   2.11   0.05   0.00   0.00   0.00   0.00
         18    26.53   2.11   0.05   0.00   0.00   0.00   0.00    26.37   2.09   0.05   0.00   0.00   0.00   0.00
         19    26.53   2.11   0.05   0.00   0.00   0.00   0.00    26.37   2.08   0.05   0.00   0.00   0.00   0.00
         20    25.43   1.79   0.04   0.00   0.00   0.00   0.00    25.27   1.77   0.04   0.00   0.00   0.00   0.00
         21    27.56   2.40   0.06   0.00   0.00   0.00   0.00    27.40   2.37   0.06   0.00   0.00   0.00   0.00
         22    27.61   2.43   0.07   0.00   0.00   0.00   0.00    27.45   2.40   0.07   0.00   0.00   0.00   0.00
         23    25.21   1.71   0.04   0.00   0.00   0.00   0.00    25.05   1.68   0.04   0.00   0.00   0.00   0.00
         24    26.61   2.10   0.05   0.00   0.00   0.00   0.00    26.45   2.08   0.05   0.00   0.00   0.00   0.00
      -----   ------ ------ ------ ------ ------ ------ ------   ------ ------ ------ ------ ------ ------ ------
        avg    25.78   2.03   0.05   0.00   0.00   0.00   0.00    25.62   2.00   0.05   0.00   0.00   0.00   0.00
        max    27.61   2.43   0.07   0.00   0.00   0.00   0.00    27.45   2.40   0.07   0.00   0.00   0.00   0.00
  • 33. Methodology For Multiple Disks In this case, you already know the performance of a single drive. What you are actually testing is whether this will scale linearly with the controller(s) you have. ➤ Test 2 drives in parallel and increase the number of drives until the performance is obviously unacceptable or you have reached the number of drives you wish to test. As with the single disk, if a disk setup passes the test criteria at Nx and completes the test at twice that speed, it is said to pass at Nx.
  • 34. Running ACT Tests
      To run ACT tests (e.g. for drive /dev/sdb), follow the steps below. This will require root or sudo.
      1. Download and compile the ACT. Follow the included directions to compile. http://aerospike.github.io/act/
      2. Prepare the drive(s) for use:
           <ACT_DIR>/actprep /dev/sdb
      3. Create a config file for the ACT run:
           python <ACT_DIR>/act_config_helper.py
      4. Execute the ACT on the config file (since these will run for a long time, it is useful to put it into the background):
           <ACT_DIR>/act [config_file] > [log_file] &
      5. Test to make sure it is running and outputting data. The “-t 10” means to put the data into 10 second slices (default is 3600):
           <ACT_DIR>/latency_calc/act_latency.py -l [log_file] -t 10
      6. Wait for test to complete (24 hours).
  • 35. Example: Creating Config Files
        > python act_config_helper.py
        Enter the number of devices you want to create config for: 1
        Enter either raw device if over-provisioned using hdparm or partition if over-provisioned using fdisk
        Enter device name # 1(e.g. /dev/sdb or /dev/sdb1): /dev/sdb
        Duration for the test (default :24 hours) [ENTER]
        Configure test duration ? (N for using default) (y/N) :n
        Use advanced mode for configuration ? (y/N) n
        "1x" load is 2000 reads per sec and 1000 writes per sec
        Enter the load you want to test the devices ( e.g. enter 1 for 1x test):3
        Do you want to Create the config (Save to a file) ? : (y/N) y
        Config File actconfig_3x_1d.txt successfully created
      The result will be the output file “actconfig_3x_1d.txt”. If you have multiple SSDs, the load will be applied to each device. Defaults for the ACT are for small objects (1.5 KB) and can be changed in the advanced options.
  • 36. Analyzing The Results
      Analyze the final output log:
        <ACT_DIR>/latency_calc/act_latency.py -l [log_file]
      The sample output on this slide is the same 24-slice table (with avg and max rows) shown on slide 32.
  • 38. Final Preparations
      Once you have your hardware properly configured, there are some final steps before you use the SSDs. You must blank out the drives (similar to a format with a filesystem) by running the dd command on each of the drives. These can be run in parallel, but must be done by root or with sudo:
        > sudo dd if=/dev/zero of=/dev/<DEVICE_ID> bs=128k &
      If you used partitioning to OP the drives, make sure to use the partition id (e.g. /dev/sdb1).
      WARNING: Do not run this on the disk with your operating system (usually /dev/sda)!
  • 39. Troubleshooting Common Issues
      ➤ Tests show much greater than expected latency
          ➤ Make sure you have properly configured over-provisioning. This is a common issue.
          ➤ If you are doing a multi-disk test, the problem may lie in a single disk. Variances in manufacturing may lead to a single drive masking poor latencies for all drives.
          ➤ Also make sure your drives are fresh. Old drives may have hotspots.
      ➤ Test won’t complete
          ➤ Your load may be overwhelming your controller or the drive. A log message will let you know if it is stopping because it cannot keep up.
          ➤ If there is no error message in the log, sometimes logging out of the server will stop the ACT process. You must use nohup or a similar mechanism to ensure the process will run for the full 24 hours.
      ➤ Operating system gives odd errors
          ➤ You may have inadvertently run actprep or dd on the OS drive. Even the best of us have done this.
  • 40. Q&A
  • 41. Thank You Send all questions/comments/complaints to YOUNG PAIK YOUNG@AEROSPIKE.COM

Editor's Notes

  • #2: Fastest, best uptime, predictable performance, consistency