© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Ilho Kim, Solutions Architect
03-May-2016
Tips about performance on AWS
Amazed by AWS
EC2 performance dive deep
InstancesAPIs
Networking
EC2
EC2
Purchase Options
Amazon Elastic Compute Cloud is Big
Host Server
Hypervisor
Guest 1 Guest 2 Guest n
Amazon EC2 Instances
2006 2008 2010 2012 2014 2016
m1.small
m1.large
m1.xlarge
c1.medium
c1.xlarge
m2.xlarge
m2.4xlarge
m2.2xlarge
cc1.4xlarge
t1.micro
cg1.4xlarge
cc2.8xlarge
m1.medium
hi1.4xlarge
m3.xlarge
m3.2xlarge
hs1.8xlarge
cr1.8xlarge
c3.large
c3.xlarge
c3.2xlarge
c3.4xlarge
c3.8xlarge
g2.2xlarge
i2.xlarge
i2.2xlarge
i2.4xlarge
i2.8xlarge
m3.medium
m3.large
r3.large
r3.xlarge
r3.2xlarge
r3.4xlarge
r3.8xlarge
t2.micro
t2.small
t2.medium
c4.large
c4.xlarge
c4.2xlarge
c4.4xlarge
c4.8xlarge
d2.xlarge
d2.2xlarge
d2.4xlarge
d2.8xlarge
g2.8xlarge
t2.large
m4.large
m4.xlarge
m4.2xlarge
m4.4xlarge
m4.10xlarge
Amazon EC2 Instances History
What to Expect from the Session
• Defining system performance and how it is
characterized for different workloads
• How Amazon EC2 instances deliver performance
while providing flexibility and agility
• How to make the most of your EC2 instance experience
through the lens of several instance types
Defining Performance
• Servers are hired to do jobs
• Performance is measured differently depending on the job
Hiring a Server
?
• What performance means
depends on your perspective:
– Response time
– Throughput
– Consistency
Defining Performance: Perspective Matters
Application
System libraries
System calls
Kernel
Devices
Workload
Simple Performance Model for Single Thread
• Using CPU: executing (in user mode)
• Not using CPU: waiting for turn on CPU, waiting for disk or
network I/O, thread locks, memory paging, or for more work.
Performance Factors
Resource | Performance factors | Key indicators
CPU | Sockets, number of cores, clock frequency, bursting capability | CPU utilization, run queue length
Memory | Memory capacity | Free memory, anonymous paging, thread swapping
Network interface | Max bandwidth, packet rate | Receive throughput, transmit throughput over max bandwidth
Disks | Input/output operations per second, throughput | Wait queue length, device utilization, device errors
Resource Utilization
• For given performance, how efficiently are resources being used
• Something at 100% utilization can’t accept any more work
• Low utilization can indicate more resource is being purchased
than needed
Example: Web Application
• MediaWiki installed on Apache with 140 pages of content
• Load increased in intervals over time
Example: Web Application
• Memory stats
Example: Web Application
• Disk stats
Example: Web Application
• Network stats
Example: Web Application
• CPU stats
• Picking an instance is tantamount to resource performance tuning
• Give back instances as easily as you can acquire new ones
• Find an ideal instance type and workload combination
Instance Selection = Performance Tuning
Delivering Compute Performance with
Amazon EC2 Instances
CPU Instructions and Protection Levels
• CPU has at least two protection levels: ring 0 (kernel mode) and ring 3 (user mode)
• Privileged instructions can’t be executed in user mode to protect
system. Applications leverage system calls to the kernel.
Kernel
Application
Example: Web application system calls
X86 CPU Virtualization: Prior to Intel VT-x
• Binary translation for privileged instructions
• Para-virtualization (PV)
• PV requires going through the VMM, adding latency
• Applications that are system call bound are most affected
VMM
Application
Kernel
PV
X86 CPU Virtualization: After Intel VT-x
• Hardware assisted virtualization (HVM)
• PV-HVM uses PV drivers opportunistically for operations that are
slow when emulated:
• e.g. network and block I/O
Kernel
Application
VMM
PV-HVM
Tip: Use PV-HVM AMIs with EBS
Time Keeping Explained
• Time keeping in an instance is deceptively hard
• gettimeofday(), clock_gettime(), QueryPerformanceCounter()
• The TSC
• CPU counter, accessible from userspace
• Requires calibration, vDSO
• Invariant on Sandy Bridge+ processors
• Xen pvclock; does not support vDSO
• On current generation instances, use TSC as clocksource
Tip: Use TSC as clocksource
tsc
source=tsc
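A minimal sketch for checking which clocksource an instance is using. It assumes a Linux guest that exposes the standard sysfs clocksource directory; the function names are illustrative and not part of the original slides.

```python
from pathlib import Path

# Standard Linux sysfs directory describing the system clocksource.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(base_dir=CLOCKSOURCE_DIR):
    """Return (current, available) clocksources read from sysfs."""
    base = Path(base_dir)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


def uses_tsc(base_dir=CLOCKSOURCE_DIR):
    """True when the active clocksource is the TSC."""
    current, _ = read_clocksource(base_dir)
    return current == "tsc"


if __name__ == "__main__":
    if CLOCKSOURCE_DIR.exists():
        current, available = read_clocksource()
        print("current:", current, "| available:", ", ".join(available))
```

If the current source is not tsc but tsc is listed as available, writing "tsc" to current_clocksource as root switches it until the next reboot.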
CPU Performance and Scheduling
• Hypervisor ensures every guest receives CPU time
• Fixed allocation
• Uncapped vs. capped
• Variable allocation
• Different schedulers can be used depending on the goal
• Fairness
• Response time / deadline
• Shares
Review: C4 Instances
Custom Intel E5-2666 v3 at 2.9 GHz
P-state and C-state controls
Model vCPU Memory (GiB) EBS (Mbps)
c4.large 2 3.75 500
c4.xlarge 4 7.5 750
c4.2xlarge 8 15 1,000
c4.4xlarge 16 30 2,000
c4.8xlarge 36 60 4,000
What’s new in C4: P-state and C-state control
• By entering deeper idle states, non-idle cores can achieve
up to 300MHz higher clock frequencies
• But… deeper idle states require more time to exit, may not
be appropriate for latency sensitive workloads
Tip: P-state control for AVX2
• If an application makes heavy use of AVX2 on all cores, the
processor may attempt to draw more power than it should
• Processor will transparently reduce frequency
• Frequent changes of CPU frequency can slow an application
Review: T2 Instances
• Lowest cost EC2 Instance at $0.013 per hour
• Burstable performance
• Fixed allocation enforced with CPU Credits
Model vCPU CPU Credits / Hour Memory (GiB) Storage
t2.micro 1 6 1 EBS Only
t2.small 1 12 2 EBS Only
t2.medium 2 24 4 EBS Only
t2.large 2 36 8 EBS Only
How Credits Work
• A CPU Credit provides the
performance of a full CPU core for
one minute
• An instance earns CPU credits at
a steady rate
• An instance consumes credits
when active
• Credits expire (leak) after 24 hours
Baseline Rate
Credit
Balance
Burst
Rate
Tip: Monitor CPU credit balance
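The credit rules above can be turned into rough arithmetic. This is a sketch under the slide's stated rules only (one credit is one vCPU-minute at 100%, credits expire after 24 hours); the function names are hypothetical, and the rates come from the T2 table.

```python
def baseline_per_vcpu(credits_per_hour, vcpus):
    """Fraction of each vCPU that can be used indefinitely.

    One credit buys one vCPU-minute, so N credits per hour sustain
    N/60 of a vCPU, spread across all vCPUs of the instance.
    """
    return credits_per_hour / 60.0 / vcpus


def max_credit_balance(credits_per_hour, expiry_hours=24):
    """Largest balance that can accumulate before old credits expire."""
    return credits_per_hour * expiry_hours


def full_burst_minutes(balance, credits_per_hour, vcpus):
    """Minutes all vCPUs can run at 100% before the balance reaches zero.

    While bursting, the instance spends `vcpus` credits per minute and
    keeps earning credits_per_hour / 60 per minute.
    """
    net_drain = vcpus - credits_per_hour / 60.0
    return balance / net_drain


if __name__ == "__main__":
    # t2.micro: 1 vCPU, 6 credits per hour
    print("baseline:", baseline_per_vcpu(6, 1))
    print("max balance:", max_credit_balance(6))
    print("burst minutes:", full_burst_minutes(max_credit_balance(6), 6, 1))
```

Under these assumptions a t2.micro sustains about 10% of a core and a full balance lasts about 160 minutes at full load, which is why the credit balance is the number to watch.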
Monitoring CPU Performance in Guest
• Indicators that work is being done
• User time
• System time (kernel mode)
• Wait I/O, threads blocked on disk I/O
• Else, Idle
• What happens if OS is scheduled off the CPU?
Tip: How to interpret Steal Time
• Fixed CPU allocations can be offered through
CPU caps
• Steal time happens when CPU cap is enforced
• Leverage CloudWatch metrics
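Inside a Linux guest, steal time is reported in the aggregate `cpu` line of /proc/stat. The sketch below assumes the standard field order (user, nice, system, idle, iowait, irq, softirq, steal) and computes the steal percentage between two samples; the function names are illustrative.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of ticks."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def steal_percent(before, after):
    """Share of elapsed CPU time that was stolen between two samples."""
    delta = {k: after[k] - before[k] for k in FIELDS}
    total = sum(delta.values())
    return 100.0 * delta["steal"] / total if total else 0.0


if __name__ == "__main__":
    import time
    with open("/proc/stat") as f:
        first = parse_cpu_line(f.readline())
    time.sleep(1)
    with open("/proc/stat") as f:
        second = parse_cpu_line(f.readline())
    print("steal: %.1f%%" % steal_percent(first, second))
```

On a capped instance such as T2, a high steal value while the credit balance is empty is the cap being enforced rather than a noisy neighbor.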
Delivering I/O Performance with
Amazon EC2 Instances
I/O and Devices Virtualization
• Scheduling I/O requests between virtual devices and
shared physical hardware
• Split driver model
• Intel VT-d
• Direct pass through and IOMMU for dedicated devices
• Enhanced Networking
Split Driver Model
[Diagram: a driver domain (backend driver, device driver) and guest domains (frontend drivers) run on the VMM, which provides virtual CPU, virtual memory, and CPU scheduling on top of the hardware: physical CPU, physical memory, network device]
Split Driver Model
• Each virtual device has two main components
• Communication ring buffer
• An event channel signaling activity in the ring buffer
• Data is transferred through shared pages
• Shared pages require inter-domain permissions, or granting
Review: I2 Instances
16 vCPU: 3.2 TB SSD; 32 vCPU: 6.4 TB SSD
365K random read IOPS for 32 vCPU instance
Model vCPU Memory (GiB) Storage (GB) Read IOPS Write IOPS
i2.xlarge 4 30.5 1 x 800 SSD 35,000 35,000
i2.2xlarge 8 61 2 x 800 SSD 75,000 75,000
i2.4xlarge 16 122 4 x 800 SSD 175,000 155,000
i2.8xlarge 32 244 8 x 800 SSD 365,000 315,000
Granting in pre-3.8.0 Kernels
• Requires “grant mapping” prior to 3.8.0
• Grant mappings are expensive operations due to TLB flushes
read(fd, buffer,…)
Granting in 3.8.0+ Kernels, Persistent and Indirect
• Grant mappings are setup in a pool once
• Data is copied in and out of the grant pool
read(fd, buffer…)
Copy to
and from
grant pool
Tip: Use 3.8+ kernel
• Amazon Linux 2013.09 or later
• Ubuntu 14.04 or later
• RHEL7 or later
• Etc.
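A quick way to check the tip from inside an instance is to compare the running kernel release against 3.8. This is a sketch only: distribution kernels sometimes backport features, so the version number is a hint rather than proof, and the function names are hypothetical.

```python
import platform
import re


def kernel_version(release):
    """Extract (major, minor, patch) from a release string such as
    '2.6.32-431.29.2.el6.x86_64'. A missing patch level counts as 0."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        raise ValueError("unrecognized kernel release: %r" % release)
    return tuple(int(g) if g is not None else 0 for g in m.groups())


def is_3_8_or_later(release):
    """True when the release string is at least 3.8.0."""
    return kernel_version(release) >= (3, 8, 0)


if __name__ == "__main__":
    rel = platform.release()
    print(rel, "->", "3.8+" if is_3_8_or_later(rel) else "pre-3.8")
```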
Event Handling
• Guest vCPUs are interrupted to process events.
• Pre-2.6.36 kernels: notifications went to a single virtual
hardware interrupt
• Post-2.6.36 kernels: allow instance to tell hypervisor to deliver
notification to a specific vCPU for balancing
• Check "dmesg" for the following text: "Xen HVM callback vector for
event delivery is enabled"
• Also, check version of irqbalance is 1.0.7 or higher
Split Driver Model: Networking
[Diagram: the application's sockets in a guest domain send packets through the frontend driver to the backend driver and device driver in the driver domain, which reach the physical network device; the VMM provides virtual CPU, virtual memory, and CPU scheduling over the physical CPU and memory]
Device Pass Through: Enhanced Networking
• SR-IOV eliminates need for driver domain
• Physical network device exposes virtual function to
instance
• Requires a specialized driver, which means:
• Your instance OS needs to know about it
• EC2 needs to be told your instance can use it
After Enhanced Networking
[Diagram: the guest domain's NIC driver talks directly to the SR-IOV network device, bypassing the backend driver and device driver in the driver domain; other guest domains may still use frontend drivers]
Tip: Use Enhanced Networking
• Highest packets-per-second
• Lowest variance in latency
• Instance OS must support it
• Look for SR-IOV property of instance or image
Inter-instance latency
How to build enhanced network
driver on Linux
Let’s create an AMI with enhanced networking enabled:
CentOS 6.5 on c4.8xlarge
Let’s start with the officially provided CentOS AMI, which should
be clean.
CentOS is available in AWS Marketplace.
The only thing missing is c4.8xlarge :(
Anyway, let’s go to the AMI
search
Find and Select AMI
CentOS 6 x86_64 (2014_09_29) EBS HVM-74e73035-
3435-48d6-88e0-89cc02ad83ee-ami-a8a117c0.2
ami-c2a818aa (IAD)
This is the CentOS AMI in AWS Marketplace. Nice?!
Check the requirements of
Enabling Enhanced Networking on Linux
• C3, C4, D2, I2, M4 and R3
• HVM AMI with Linux kernel version 2.6.32 or later
• Launch the instance in VPC
• A network driver to support enhanced networking on
Linux.
Check kernel version and network driver
[root@ip-192-168-1-171 ~]# cat /etc/redhat-release
CentOS release 6.5 (Final)
[root@ip-192-168-1-171 ~]# uname -na
Linux ip-192-168-1-171 2.6.32-431.29.2.el6.x86_64 #1 SMP Tue Sep 9 21:36:05 UTC 2014 x86_64 x86_64
x86_64 GNU/Linux
[root@ip-192-168-1-171 ~]# modinfo ixgbevf
modinfo ixgbevf
filename: /lib/modules/2.6.32-431.29.2.el6.x86_64/kernel/drivers/net/ixgbevf/ixgbevf.ko
version: 2.7.12-k
license: GPL
description: Intel(R) 82599 Virtual Function Driver
author: Intel Corporation, <linux.nics@intel.com>
srcversion: E75203124BB105EC871944F
alias: pci:v00008086d00001515sv*sd*bc*sc*i*
alias: pci:v00008086d000010EDsv*sd*bc*sc*i*
depends:
vermagic: 2.6.32-431.29.2.el6.x86_64 SMP mod_unload modversions
parm: debug:Debug level (0=none,...,16=all) (int)
Launch c4.large instance and login
Let’s update everything, including the kernel
[root@ip-192-168-1-171 ~]# yum update -y
Loaded plugins: fastestmirror, presto
Loading mirror speeds from cached hostfile
* base: mirrors.mit.edu
* extras: linux.cc.lehigh.edu
* updates: mirrors.lga7.us.voxel.net
Setting up Update Process
Resolving Dependencies
--> Running transaction check
---> Package audit.x86_64 0:2.2-4.el6_5 will be updated
---> Package audit.x86_64 0:2.3.7-5.el6 will be an update
---> Package audit-libs.x86_64 0:2.2-4.el6_5 will be updated
---> Package audit-libs.x86_64 0:2.3.7-5.el6 will be an update
---> Package authconfig.x86_64 0:6.1.12-13.el6 will be updated
---> Package authconfig.x86_64 0:6.1.12-19.el6 will be an update
---> Package bash.x86_64 0:4.1.2-15.el6_5.2 will be updated
---> Package bash.x86_64 0:4.1.2-29.el6 will be an update
---> Package binutils.x86_64 0:2.20.51.0.2-5.36.el6 will be updated
---> Package binutils.x86_64 0:2.20.51.0.2-5.42.el6 will be an update
---> Package ca-certificates.noarch 0:2014.1.98-65.0.el6_5 will be updated
---> Package ca-certificates.noarch 0:2014.1.98-65.1.el6 will be an update
---> Package centos-release.x86_64 0:6-5.el6.centos.11.2 will be updated
---> Package centos-release.x86_64 0:6-6.el6.centos.12.2 will be an update
---> Package coreutils.x86_64 0:8.4-31.el6_5.2 will be updated
---> Package coreutils.x86_64 0:8.4-37.el6 will be an update
---> Package coreutils-libs.x86_64 0:8.4-31.el6_5.2 will be updated
………………………
Reboot and check the updated versions
[root@ip-192-168-1-171 ~]# cat /etc/redhat-release
CentOS release 6.6 (Final)
[root@ip-192-168-1-171 ~]# uname -na
Linux ip-192-168-1-171 2.6.32-504.12.2.el6.x86_64 #1 SMP Wed Mar 11 22:03:14 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
[root@ip-192-168-1-171 ~]# modinfo ixgbevf
filename: /lib/modules/2.6.32-504.12.2.el6.x86_64/kernel/drivers/net/ixgbevf/ixgbevf.ko
version: 2.12.1-k
license: GPL
description: Intel(R) 82599 Virtual Function Driver
author: Intel Corporation, <linux.nics@intel.com>
srcversion: 8797AC845BB302315230490
alias: pci:v00008086d00001515sv*sd*bc*sc*i*
alias: pci:v00008086d000010EDsv*sd*bc*sc*i*
depends:
vermagic: 2.6.32-504.12.2.el6.x86_64 SMP mod_unload modversions
parm: debug:Debug level (0=none,...,16=all) (int)
Upgraded network driver installed
version: 2.7.12-k -> 2.12.1-k
Enable SR-IOV
Install the AWS CLI or EC2 CLI tools
Not supported in the AWS Console yet :(
1. Stop the instance
2. Enable SR-IOV on the instance with the CLI
3. Check the status
4. Start the instance
a82066443ffe:~ ilho$ aws ec2 modify-instance-attribute --instance-id i-681280bf \
--sriov-net-support simple
a82066443ffe:~ ilho$ aws ec2 describe-instance-attribute --instance-id i-681280bf \
--attribute sriovNetSupport
{
"InstanceId": "i-681280bf",
"SriovNetSupport": {
"Value": "simple"
}
}
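The status check in step 3 can be scripted by parsing the JSON that `describe-instance-attribute` prints. A minimal sketch, assuming the output shape shown above; the function name is hypothetical.

```python
import json


def sriov_enabled(describe_output):
    """Return True when the CLI output reports sriovNetSupport = simple.

    An instance without the attribute set is assumed to report an
    empty SriovNetSupport object, which yields False here.
    """
    doc = json.loads(describe_output)
    value = doc.get("SriovNetSupport", {}).get("Value")
    return value == "simple"


if __name__ == "__main__":
    sample = """
    {
      "InstanceId": "i-681280bf",
      "SriovNetSupport": {"Value": "simple"}
    }
    """
    print("enhanced networking attribute set:", sriov_enabled(sample))
```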
A problem with more than 32 vCPUs on Linux
The CentOS 6.x kernel does not support more than 32 vCPUs.
It cannot boot when you launch a c4.8xlarge (36 vCPUs) :(
• d2.8xlarge and m4.10xlarge? -> :(
A solution is to add an option to the kernel boot parameters
Add the maxcpus option to the kernel boot parameters
$ vi /boot/grub/menu.lst
Add maxcpus=32
[root@ip-10-10-10-242 ~]# cat /boot/grub/menu.lst
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE: You do not have a /boot partition. This means that
# all kernel and initrd paths are relative to /, eg.
# root (hd0,0)
# kernel /boot/vmlinuz-version ro root=/dev/vda1
# initrd /boot/initrd-[generic-]version.img
#boot=/dev/vda
default=0
timeout=1
serial --unit=0 --speed=115200
terminal --timeout=1 serial console
title CentOS (2.6.32-431.29.2.el6.x86_64)
root (hd0,0)
kernel /boot/vmlinuz-2.6.32-431.29.2.el6.x86_64 maxcpus=32 ro root=UUID=dcb1645e-05a6-4311-8bce-a9c12bec5801 rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD console=ttyS0,115200 crashkernel=auto SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM
initrd /boot/initramfs-2.6.32-431.29.2.el6.x86_64.img
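When baking many images, the manual edit can be automated. The sketch below is a hypothetical helper, not part of the original procedure: it appends maxcpus=32 to every active `kernel` line of a legacy GRUB menu.lst, skips commented lines, and leaves lines that already carry the option alone. Appending at the end rather than right after the kernel path is a choice made here; the kernel does not care about parameter order.

```python
def add_maxcpus(menu_lst_text, limit=32):
    """Return menu.lst content with maxcpus=<limit> on each kernel line."""
    out = []
    for line in menu_lst_text.splitlines():
        stripped = line.lstrip()
        # Commented examples start with '#', so they never match here.
        if stripped.startswith("kernel ") and "maxcpus=" not in stripped:
            line = line.rstrip() + " maxcpus=%d" % limit
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        "# kernel /boot/vmlinuz-version ro root=/dev/vda1\n"
        "title CentOS\n"
        "root (hd0,0)\n"
        "kernel /boot/vmlinuz-2.6.32 ro root=/dev/vda1\n"
    )
    print(add_maxcpus(sample), end="")
```

Keep a backup of the original file before writing the result back, since a broken boot entry makes the instance unreachable.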
Now let’s launch c4.8xlarge instance
I think it’s ready to go.
Reboot
Launch a c4.8xlarge instance
It cannot launch.
Why?
An AWS Marketplace AMI cannot be launched on an
instance type that is not in its list of supported types.
Change the base AMI
Find CentOS 6.5 community version
AMI : CentOS-6.5-base-20150305
(ami-0e80db66)
Repeat three steps
1. $ sudo yum update -y
2. Enable SR-IOV
3. Add maxcpus=32
Network driver version should be checked
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html
To enable enhanced networking on your instance, you
must ensure that its kernel has the ixgbevf module installed
and that you set the sriovNetSupport attribute for the
instance. For the best performance, we recommend that
the ixgbevf module is version 2.14.2 or higher.
Build and Install the network driver #2
[ec2-user@ip-192-168-1-50 src]$ make;sudo make install
make -C /lib/modules/2.6.32-504.12.2.el6.x86_64/build SUBDIRS=/home/ec2-user/ixgbevf-2.16.1/src
modules
make[1]: Entering directory `/usr/src/kernels/2.6.32-504.12.2.el6.x86_64'
CC [M] /home/ec2-user/ixgbevf-2.16.1/src/ixgbevf_main.o
CC [M] /home/ec2-user/ixgbevf-2.16.1/src/ixgbevf_param.o
CC [M] /home/ec2-user/ixgbevf-2.16.1/src/ixgbevf_ethtool.o
CC [M] /home/ec2-user/ixgbevf-2.16.1/src/kcompat.o
CC [M] /home/ec2-user/ixgbevf-2.16.1/src/ixgbe_vf.o
CC [M] /home/ec2-user/ixgbevf-2.16.1/src/ixgbe_mbx.o
LD [M] /home/ec2-user/ixgbevf-2.16.1/src/ixgbevf.o
Building modules, stage 2.
MODPOST 1 modules
CC /home/ec2-user/ixgbevf-2.16.1/src/ixgbevf.mod.o
LD [M] /home/ec2-user/ixgbevf-2.16.1/src/ixgbevf.ko.unsigned
NO SIGN [M] /home/ec2-user/ixgbevf-2.16.1/src/ixgbevf.ko
make[1]: Leaving directory `/usr/src/kernels/2.6.32-504.12.2.el6.x86_64'
make -C /lib/modules/2.6.32-504.12.2.el6.x86_64/build SUBDIRS=/home/ec2-user/ixgbevf-2.16.1/src
modules
make[1]: Entering directory `/usr/src/kernels/2.6.32-504.12.2.el6.x86_64'
Building modules, stage 2.
MODPOST 1 modules
make[1]: Leaving directory `/usr/src/kernels/2.6.32-504.12.2.el6.x86_64'
gzip -c ../ixgbevf.7 > ixgbevf.7.gz
# remove all old versions of the driver
find /lib/modules/2.6.32-504.12.2.el6.x86_64 -name ixgbevf.ko -exec rm -f {} \; || true
find /lib/modules/2.6.32-504.12.2.el6.x86_64 -name ixgbevf.ko.gz -exec rm -f {} \; || true
install -D -m 644 ixgbevf.ko /lib/modules/2.6.32-504.12.2.el6.x86_64/kernel/drivers/net/ixgbevf/ixgbevf.ko
/sbin/depmod -a 2.6.32-504.12.2.el6.x86_64 || true
install -D -m 644 ixgbevf.7.gz /usr/share/man/man7/ixgbevf.7.gz
man -c -P'cat > /dev/null' ixgbevf || true
http://sourceforge.net/projects/e1000/files/ixgbevf%20stable/
Check the new driver installed
[ec2-user@ip-192-168-1-50 src]$ modinfo ixgbevf
filename: /lib/modules/2.6.32-504.12.2.el6.x86_64/kernel/drivers/net/ixgbevf/ixgbevf.ko
version: 2.16.1
license: GPL
description: Intel(R) 10 Gigabit Virtual Function Network Driver
author: Intel Corporation, <linux.nics@intel.com>
srcversion: 3B690FE23A02C25EF74012F
alias: pci:v00008086d00001515sv*sd*bc*sc*i*
alias: pci:v00008086d000010EDsv*sd*bc*sc*i*
depends:
vermagic: 2.6.32-504.12.2.el6.x86_64 SMP mod_unload modversions
parm: InterruptThrottleRate:Maximum interrupts per second, per vector, (956-488281,
0=off, 1=dynamic), default 1 (array of int)
reboot
[ec2-user@ip-192-168-1-50 ~]$ ethtool -i eth0
driver: ixgbevf
version: 2.16.1
firmware-version: N/A
bus-info: 0000:00:03.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no
Share AMI to the users
Share the customized AMI with your team ~
Let’s get some sleep… :)
The reply came back: 1/2 checks passed (OS unreachable)
20160503 Amazed by AWS | Tips about Performance on AWS
Root cause found
The /etc/udev/rules.d/70-persistent-net.rules file has an
entry with the MAC address of the original instance the AMI
was taken from. When the image is taken, the MAC from
the original instance persists, which tells the OS on the
newly deployed instance that the now-nonexistent MAC of
the original instance should be eth0. Since the OS cannot
find a device with the original MAC address, eth0 fails to
be identified and isn't brought up.
A solution to avoid this caveat
Before creating the AMI, you must remove the
/etc/udev/rules.d/70-persistent-net.rules file
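This cleanup is easy to forget, so it can be scripted as part of an image preparation step. A minimal sketch, assuming it runs as root on the instance right before the AMI is created; the function name is hypothetical.

```python
import os

RULES_FILE = "/etc/udev/rules.d/70-persistent-net.rules"


def remove_persistent_net_rules(path=RULES_FILE):
    """Delete the udev rules file that pins eth0 to the old MAC address.

    Returns True if a file was removed, False if it was already absent.
    """
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False


if __name__ == "__main__":
    if os.path.exists(RULES_FILE):
        print("removed" if remove_persistent_net_rules() else "nothing to do")
    else:
        print("no persistent net rules file found")
```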
Create AMI
Share AMI
Now the c4 instance launches successfully with the latest driver.
A small contribution
The note was added to the “enhanced networking” section of the
public documentation.
Lesson learned
Just use CentOS 7 or Amazon Linux. :)
EBS designing for performance
Amazon EBS overview
EBS =
What is EBS?
• Network block storage as a service
• EBS volumes attach to any Amazon EC2 instance in the
same Availability Zone
• Designed for five nines of availability
• 2 million volumes created every day
EBS volume types
Magnetic General purpose (SSD) Provisioned IOPS
(SSD)
EBS volume types
IOPS: Typically 100, best effort
Throughput: 40-90 MB/s
Latency: Read 10-40ms, Write 2-10ms
Best for infrequently accessed data
Magnetic
EBS volume types
IOPS Baseline: 100-10,000 (3 / GiB)
IOPS Burst: 30 minutes @ 3,000
Throughput: Up to 160 MB/s
Latency: Single-digit ms
Performance consistency: 99%
Most workloads
General purpose (SSD)
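The baseline rule on this slide (3 IOPS per GiB, with a floor of 100 and a ceiling of 10,000) is simple to compute. A sketch using only the figures above; the function names are illustrative.

```python
BURST_IOPS = 3000  # burst level stated on the slide


def gp2_baseline_iops(size_gib):
    """Baseline IOPS for a General Purpose (SSD) volume of the given size."""
    return max(100, min(10000, 3 * size_gib))


def benefits_from_burst(size_gib):
    """True when the baseline is below the 3,000 IOPS burst level."""
    return gp2_baseline_iops(size_gib) < BURST_IOPS


if __name__ == "__main__":
    for size in (8, 100, 500, 1000, 4000):
        print(size, "GiB ->", gp2_baseline_iops(size), "IOPS baseline")
```

By this rule, volumes of 1,000 GiB and larger already have a baseline at or above the burst level, so bursting only matters for smaller volumes.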
EBS volume types
IOPS: 100-20,000 (customer provisioned)
Throughput: Up to 320 MB/s
Latency: Single-digit ms
Performance consistency: 99.9%
Mission Critical workloads
Provisioned IOPS (SSD)
Performance
Queuing theory
Little’s law is the foundation for performance tuning theory
• Mathematically proven by John Little in 1961
$$W = \frac{L}{A}$$
W = Wait time = average wait time per request
L = Queue length = average number of requests waiting
A = Arrival rate = the rate of requests arriving
EBS performance is related to this law
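A worked example of the law with made-up numbers: rearranged as L = A × W, it estimates how many I/Os must be in flight to sustain a given request rate at a given latency. The 8,000 IOPS and 1 ms values below are illustrative assumptions, not measurements from the talk.

```python
def wait_time(queue_length, arrival_rate):
    """W = L / A: average time a request spends waiting, in seconds."""
    return queue_length / arrival_rate


def queue_length(arrival_rate, wait):
    """L = A * W: average number of requests in flight."""
    return arrival_rate * wait


if __name__ == "__main__":
    iops = 8000        # hypothetical arrival rate, requests per second
    latency_s = 0.001  # hypothetical 1 ms per request
    print("required queue depth:", queue_length(iops, latency_s))
    print("wait at depth 8:", wait_time(8, iops), "s")
```

Read this way, a workload that keeps too few I/Os outstanding cannot reach a volume's provisioned rate, no matter how fast each request completes.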
Performance optimization is measured by:
IOPS: Read/write I/O rate (IOPS)
Latency: Time between I/O submission
and completion (ms)
Throughput: Read/write transfer rate
(MB/s); throughput = IOPS X I/O size
Key components to performance
EC2 instance
I/O
EBS
Network link
A day in the life of an I/O
A day in the life: I/O
All I/O must pass through I/O domain
Requires “grant mapping” prior to 3.8.0
Grant mappings are expensive operations due to TLB flushes
[Diagram: the instance's read(fd, buffer, BLOCK_SIZE) passes through a grant mapping to the I/O domain and on to EBS]
A day in the life: I/O (continued)
Responses
Requests
Instance I/O domain
READ
8KB @ 1234
Request queue is a single memory page
Each I/O request has 11 grant references (4KiB/reference)
Maximum data in queue = 1408 KiB
3.8.0+ Kernels – Persistent grants
Grant mappings are setup in a pool once
Data is copied in and out of the grant pool
Copying is significantly faster than remapping
[Diagram: the instance's read(fd, buffer, BLOCK_SIZE) is copied through the grant pool to the I/O domain and on to EBS]
3.8.0+ Kernels – Indirect grants
Responses
Requests
Instance
READ
8KB @ 1234
I/O domain
Each I/O request has grant references that contain grant references
Maximum data in queue = 4096 KiB (default)
Instance I/O: Before 3.8.0
0 1 2 3 4 5 30 31
128KiB
44KiB 44KiB 40KiB
Instance I/O: Linux 3.8.0+
0 1 2 3 4 5 30 31
128KiB
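The queue limits and the 128 KiB example can be reproduced with a little arithmetic. This sketch assumes 32 ring slots (inferred from the slot labels 0 to 31) and 32 segments per indirect request (inferred from the 4096 KiB default); both are readings of the slides, not confirmed values, and the names are hypothetical.

```python
PAGE_KIB = 4      # each grant reference covers one 4 KiB page
RING_SLOTS = 32   # assumption: slots labelled 0..31 in the figures


def max_queue_kib(segments_per_request, slots=RING_SLOTS):
    """Maximum data in flight when every ring slot holds a full request."""
    return slots * segments_per_request * PAGE_KIB


def split_io(io_kib, segments_per_request):
    """Split one I/O into ring requests limited by the segment count."""
    limit = segments_per_request * PAGE_KIB
    chunks = []
    while io_kib > 0:
        chunk = min(io_kib, limit)
        chunks.append(chunk)
        io_kib -= chunk
    return chunks


if __name__ == "__main__":
    print("pre-3.8 queue limit:", max_queue_kib(11), "KiB")
    print("3.8+ queue limit:", max_queue_kib(32), "KiB")
    print("128 KiB I/O, pre-3.8:", split_io(128, 11))
    print("128 KiB I/O, 3.8+:", split_io(128, 32))
```

With 11 references per request a 128 KiB I/O occupies three slots (44, 44, and 40 KiB), while with indirect grants it fits in one.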
Tip: Use 3.8+ kernel
Amazon Linux 2013.09 or later
Ubuntu 14.04 or later
RHEL7 or later
Etc.
Queue depth
An I/O operation
EBS
After it’s gone, it’s gone
EC2
Queue depth is the pending I/O for a volume
Workload/software | Typical block size | Random/Seq? | Max EBS @ 500 Mb/s instances | Max EBS @ 1 Gb/s instances | Max EBS @ 10 Gb/s instances
Oracle DB | Configurable: 2 KB to 16 KB, default 8 KB | random | ~7,800 IOPS | ~15,600 IOPS | ~48,000 IOPS
Microsoft SQL Server | 8 KB w/ 64 KB extents | random | ~7,800 IOPS | ~15,600 IOPS | ~48,000 IOPS
MySQL | 16 KB | random | ~4,000 IOPS | ~7,800 IOPS | ~48,000 IOPS
PostgreSQL | 8 KB | random | ~7,800 IOPS | ~15,600 IOPS | ~48,000 IOPS
MongoDB | 4 KB | sequential | ~15,600 IOPS | ~31,000 IOPS | ~48,000 IOPS
Apache Cassandra | 4 KB | random | ~15,600 IOPS | ~31,000 IOPS | ~48,000 IOPS
GlusterFS | 128 KB | sequential | ~500 IOPS | ~1,000 IOPS | ~6,000 IOPS
Cheat sheet: Sample storage workloads on AWS
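The first two IOPS columns of the cheat sheet appear to follow from dividing the instance's EBS bandwidth by the block size. The sketch below assumes the bandwidth is in megabits per second and that 1 KB is 1,000 bytes; both assumptions are inferred from the numbers, not stated in the slides. The 10 Gb/s column is not modelled because it is bounded by a separate per-instance IOPS limit.

```python
def iops_ceiling(link_megabits_per_s, block_kb):
    """Upper bound on IOPS when the EBS link is the only bottleneck."""
    bytes_per_s = link_megabits_per_s * 1_000_000 / 8
    return bytes_per_s / (block_kb * 1000)


if __name__ == "__main__":
    for name, block_kb in (("PostgreSQL", 8), ("MySQL", 16),
                           ("MongoDB", 4), ("GlusterFS", 128)):
        print("%-10s %7.0f IOPS @ 500 Mb/s  %7.0f IOPS @ 1 Gb/s" % (
            name, iops_ceiling(500, block_kb), iops_ceiling(1000, block_kb)))
```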
Example workload
Transaction (OLTP)
Examples: eCommerce website, metadata storage
Benchmark: MySQL + sysbench
Tip: Workload
Where possible, use real
production workloads for
performance testing
Baseline configuration
Availability Zone: US West (Oregon)
Instance type: m2.4xlarge
vCPU: 8
Memory: 68.4GiB
EBS-optimized
Data volume: 500GiB EBS magnetic
OS: Amazon Linux 2015.03.1
Optimization: Increase parallelism
MySQL threads
Transactions(n)
Baseline
2 n
Tip: Parallelism
Increase parallelism of your
system
Key components to performance
EC2 instance
I/O
EBS
Network link
m2.4xlarge
CPU: Intel Xeon
vCPU: 8
Memory: 68.4 GiB
Price: $0.98/hour*
Instance selection
r3.2xlarge
CPU: Intel Xeon E5-2670 v2
vCPU: 8
Memory: 61 GiB
Enhanced networking
Price: $0.70/hour*
* All pricing from US West (Oregon)
EBS optimized instances
• Most instance families support the EBS-optimized flag
• EBS-optimized instances now support up to 4 Gb/s
• Drive 32,000 16K IOPS or 500 MB/s
• Available by default on newer instance types
• EC2 *.8xlarge instances support 10 Gb/s network
• Max IOPS per node supported is ~48,000 IOPS @ 16K I/O
Tip: Use EBS-optimized instances
Use EBS-optimized instances
for consistent EBS performance
Updated configuration: Instance type
Availability Zone: US West (Oregon)
Instance type: r3.2xlarge
vCPU: 8
Memory: 61 GiB
EBS-optimized
EBS volume: 500GiB magnetic
OS: Amazon Linux 2015.03.1
25%
Optimization: Current generation instances
MySQL threads
Transactions(n)
Baseline
r3.2xlarge
2 n
Tip: Instance selection
Use the right instance family for
your workload
Use current generation instances
Key components to performance
EC2 instance
I/O
EBS
Network link
Volume selection
EBS magnetic
Latency:
Read: 10-40ms
Write: 2-10ms
SSD backed
Latency:
Read/Write: Single-digit ms
File systems
Use a modern, journaled filesystem
ext4, xfs, etc.
Ensure partitions are aligned on 4KiB boundaries
Pre-warming
Volume initialization
Newly created volumes
• Just attach, mount, and go!
• Pre-warming is no longer recommended
Volumes restored from snapshots
• You can use your volume right away
• Accelerate data loading by reading
Updated configuration: EBS volumes
Availability Zone: US West (Oregon)
Instance type: r3.2xlarge
vCPU: 8
Memory: 61 GiB
EBS-optimized
Boot volume: 8 GiB – EBS general purpose
Data volume: 500 GiB – EBS general purpose
OS: Amazon Linux 2015.03.1
Optimization: Volume selection
Transactions(n)
19% 50%
MySQL threads
Baseline
r3.2xlarge
r3.2xlarge gp2
2 n
Tip: Volume selection
Use SSD backed volumes when
performance matters
EBS IOPS vs. Throughput
20,000 IOPS
PIOPS volume
20,000 IOPS
320 MB/s
throughput
You can achieve 20,000 IOPS when
driving smaller I/O operations
You can achieve up to 320 MB/s
when driving larger I/O operations
EBS IOPS vs. Throughput
8,000 IOPS
PIOPS volume
8,000 IOPS
320 MB/s
throughput
8,000 x 8 KB = 64 MB/s
8,000 x 16 KB = 128 MB/s
8,000 x 32 KB = 256 MB/s
16,000 x 8 KB = 128 MB/s
8,000 x 64 KB = 512 MB/s (exceeds the 320 MB/s volume limit)
5,000 x 64 KB = 320 MB/s
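The arithmetic on this slide can be expressed as a small function: throughput is IOPS times I/O size, and once the product exceeds the volume's throughput limit, the achievable IOPS drops instead. A sketch using the slide's 320 MB/s limit and 1 KB = 1,000 bytes; the function name is hypothetical.

```python
def achievable(provisioned_iops, io_kb, max_mb_s=320):
    """Return (iops, mb_per_s) after applying the volume throughput limit."""
    wanted_mb_s = provisioned_iops * io_kb / 1000.0
    if wanted_mb_s <= max_mb_s:
        return float(provisioned_iops), wanted_mb_s
    # Throughput-bound: IOPS falls to whatever fits under the limit.
    return max_mb_s * 1000.0 / io_kb, float(max_mb_s)


if __name__ == "__main__":
    for iops, size in ((8000, 8), (8000, 16), (8000, 32), (8000, 64)):
        got_iops, mb_s = achievable(iops, size)
        print("%d x %d KB -> %.0f IOPS, %.0f MB/s" % (iops, size, got_iops, mb_s))
```

For 64 KB I/O the result is 5,000 IOPS at 320 MB/s, matching the last line of the slide.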
Striping
Increases performance, or capacity, or both
Don’t mix volume types
Typically RAID 0 or LVM stripe
Avoid RAID for redundancy
EBS
EC2
Striping: Snapshots
Quiesce I/O
1. Database: FLUSH and LOCK tables
2. Filesystem: sync and fsfreeze
3. EBS: snapshot all volumes
When snapshot API returns,
it is safe to resume
EBS-optimized instance
Four key components: Balanced
EC2
A “boatload” of I/O
Right-sized EBS
Monitoring
Amazon CloudWatch
Important Amazon CloudWatch metrics:
• IOPS and bandwidth
• Latency
• Queue depth
All EBS metrics are prefixed with “Volume”
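The three headline numbers can be derived from the raw volume metrics. The sketch assumes the inputs are `Sum` statistics over one period for VolumeReadOps, VolumeWriteOps, VolumeReadBytes, VolumeWriteBytes, VolumeTotalReadTime, and VolumeTotalWriteTime; fetching them from CloudWatch is left out, and the function name is hypothetical.

```python
def ebs_summary(read_ops, write_ops, read_bytes, write_bytes,
                total_read_time_s, total_write_time_s, period_s):
    """Derive IOPS, throughput, and average latency for one period."""
    ops = read_ops + write_ops
    busy_s = total_read_time_s + total_write_time_s
    return {
        "iops": ops / period_s,
        "throughput_mb_s": (read_bytes + write_bytes) / period_s / 1e6,
        "avg_latency_ms": 1000.0 * busy_s / ops if ops else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical five-minute sample.
    print(ebs_summary(30000, 30000, 240e6, 240e6, 30.0, 30.0, 300))
```

Queue depth does not need deriving: it is reported directly as VolumeQueueLength.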
CloudWatch: Instance bandwidth
m4.2xlarge
Instance: 128MB/s
m4.4xlarge
Instance: 256MB/s
m4.10xlarge
Volume: 320MB/s
S3 Performance tips
Distributing Key Names
Don’t do this
<my_bucket>/2013_11_13-164533125.jpg
<my_bucket>/2013_11_13-051033564.jpg
<my_bucket>/2013_11_13-061133789.jpg
<my_bucket>/2013_11_13-051033458.jpg
<my_bucket>/2013_11_12-063433125.jpg
<my_bucket>/2013_11_12-021033564.jpg
<my_bucket>/2013_11_12-065533789.jpg
<my_bucket>/2013_11_12-011033458.jpg
<my_bucket>/2013_11_11-022333125.jpg
<my_bucket>/2013_11_11-153433564.jpg
<my_bucket>/2013_11_11-065233789.jpg
<my_bucket>/2013_11_11-065633458.jpg
Distributing Key Names
Add randomness to the beginning of the key name
<my_bucket>/521335461-2013_11_13.jpg
<my_bucket>/465330151-2013_11_13.jpg
<my_bucket>/987331160-2013_11_13.jpg
<my_bucket>/465765461-2013_11_13.jpg
<my_bucket>/125631151-2013_11_13.jpg
<my_bucket>/934563160-2013_11_13.jpg
<my_bucket>/532132341-2013_11_13.jpg
<my_bucket>/565437681-2013_11_13.jpg
<my_bucket>/234567460-2013_11_13.jpg
<my_bucket>/456767561-2013_11_13.jpg
<my_bucket>/345565651-2013_11_13.jpg
<my_bucket>/431345660-2013_11_13.jpg
Other Techniques for Distributing Key Names
Store objects as a hash of their name
• add the original name as metadata
• “deadmau5_mix.mp3” → 0aa316fb000eae52921aab1b4697424958a53ad9
– watch for duplicate names!
• prepend keyname with short hash
• 0aa3-deadmau5_mix.mp3
Epoch time (reverse)
• 5321354831-deadmau5_mix.mp3
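Both naming schemes are short to implement. In this sketch the hash function is assumed to be SHA-1 (the 40 hex digit example has the right length, but the slides do not name the algorithm), and the function names are hypothetical.

```python
import hashlib


def hashed_key(name, prefix_len=4):
    """Prepend a short hash of the name so keys spread across the index."""
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    return "%s-%s" % (digest[:prefix_len], name)


def reversed_epoch_key(name, epoch_seconds):
    """Prepend the epoch timestamp with its digits reversed, so the
    fastest-changing digit comes first."""
    return "%s-%s" % (str(int(epoch_seconds))[::-1], name)


if __name__ == "__main__":
    import time
    print(hashed_key("deadmau5_mix.mp3"))
    print(reversed_epoch_key("deadmau5_mix.mp3", time.time()))
```

Because the prefix is derived from the name, the original key can always be recomputed, which avoids the lookup problem of purely random prefixes.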
Randomness in a Key Name Can Be an Anti-Pattern
Lifecycle policies
LISTs with prefix filters
Maintaining thumbnails of images
• craig.jpg -> stored as orig-09329jed0fc
• thumb-09329jed0fc
When you need to recover a file with its original name
Solving for the Anti-Pattern
Add additional prefixes to help sorting
Amazon S3 maintains keys lexicographically in its internal
indices
<my_bucket>/images/521335461-2013_11_13.jpg
<my_bucket>/images/465330151-2013_11_13.jpg
<my_bucket>/movies/293924440-2013_11_13.jpg
<my_bucket>/movies/987331160-2013_11_13.jpg
<my_bucket>/thumbs-small/838434842-2013_11_13.jpg
<my_bucket>/thumbs-small/342532454-2013_11_13.jpg
<my_bucket>/thumbs-small/345233453-2013_11_13.jpg
<my_bucket>/thumbs-small/345453454-2013_11_13.jpg
Amazon CloudFront
CloudFront dynamic content
acceleration
Region
Edge Location
12 Regions
32 Availability Zones
54 Edge Locations
Need to update
We’re here :)
Configure multiple origins
Elastic Load
Balancing
Dynamic content
Amazon EC2
Static content
Amazon S3
* (default)
/error/*
/assets/*
Amazon CloudFront
example.com
CloudFront Behaviors
CloudFront
Customer Location
www.mysite.com
Path Pattern Matching
/*.jpg; /*.php etc.
GET http://mysite.com/images/1.jpg to ORIGIN A
GET http://mysite.com/index.php to ORIGIN B
GET http://mysite.com/web/home.css to ORIGIN C
GET http://mysite.com/* (DEFAULT) to ORIGIN D
Origin A: S3 bucket
Origin B:
www.mysite.com
Origin C: S3 Bucket
Origin D:
www.mysite.com
Path Pattern Matching
/*.php
/images/*.jpg
/web/*.css
/*.* (DEFAULT)
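The routing above can be approximated locally to sanity-check a set of patterns before configuring behaviors. This sketch uses Python's `fnmatchcase` as a stand-in for CloudFront's matcher, which is an approximation (fnmatch also understands `[...]` sets); patterns are tried in order and the first match wins, with the origin letters taken from the slide.

```python
from fnmatch import fnmatchcase

# (path pattern, origin) pairs in precedence order.
BEHAVIORS = [
    ("/images/*.jpg", "A"),
    ("/*.php", "B"),
    ("/web/*.css", "C"),
]
DEFAULT_ORIGIN = "D"


def route(path):
    """Return the origin for a request path: first matching pattern wins."""
    for pattern, origin in BEHAVIORS:
        if fnmatchcase(path, pattern):
            return origin
    return DEFAULT_ORIGIN


if __name__ == "__main__":
    for p in ("/images/1.jpg", "/index.php", "/web/home.css", "/about.html"):
        print(p, "-> origin", route(p))
```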
Region
Edge Location
12 Regions
32 Availability Zones
54 Edge Locations
Need to update AWS optimized network
Internet
Demo :)
S3 Transfer Acceleration
New~
Amazon S3 Transfer Acceleration
Embedded WAN acceleration
S3 Bucket
AWS Edge
Location
Uploader
Optimized
Throughput!
Move over long geographic distances
Up to 300% (6x) faster
No firewall mods, no client software
54 global edge locations
Change your endpoint, not your code
Accelerate Speed Comparison
• Test URL
• http://s3-accelerate-speedtest.s3-accelerate.amazonaws.com/en/accelerate-speed-comparsion.html
• bit.ly/news3ta
• Test Result (May-02-2016)
• Tested on May-02-2016, LGU+ Wifi at GSTower in Seoul
• http://bit.ly/newss3taresult
Testing S3 Transfer Accelerator by AWSCLI
$ sudo pip install --upgrade awscli
$ aws configure set default.s3.use_accelerate_endpoint true
Testing S3 Transfer Accelerator by AWSCLI
$ aws s3 cp 33MB.pptx s3://ilho-saopaulo-01/
$ aws s3 cp 33MB.pptx s3://ilho-saopaulo-01/ --endpoint-url http://ilho-saopaulo-01.s3-accelerate.amazonaws.com
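The accelerated endpoint is just the bucket name in front of `s3-accelerate.amazonaws.com`, so it can be built and validated in a few lines. A sketch only: the function name is hypothetical, the validation is a simplified DNS-style check, and it rejects bucket names containing periods because Transfer Acceleration does not support them.

```python
import re

# Simplified check: 3 to 63 characters, lowercase letters, digits, hyphens.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def accelerate_endpoint(bucket):
    """Return the Transfer Acceleration endpoint URL for a bucket."""
    if "." in bucket or not _BUCKET_RE.fullmatch(bucket):
        raise ValueError("bucket name not usable with acceleration: %r" % bucket)
    return "https://%s.s3-accelerate.amazonaws.com" % bucket


if __name__ == "__main__":
    print(accelerate_endpoint("ilho-saopaulo-01"))
```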
S3 Transfer Acceleration Pricing
Starting at $0.04/GB transferred (+ usual
bandwidth charges). Up to $0.08/GB in
some regions
Pay only for what you use
Accelerated performance or no charge
Compare to hardware, per-GB or licenses


20160503 Amazed by AWS | Tips about Performance on AWS

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. 김일호, Solutions Architect 03-May-2016 Tips about performance on AWS Amazed by AWS
  • 4. Host Server Hypervisor Guest 1 Guest 2 Guest n Amazon EC2 Instances
  • 5. Amazon EC2 Instances History 2006 2008 2010 2012 2014 2016 m1.small m1.large m1.xlarge c1.medium c1.xlarge m2.xlarge m2.4xlarge m2.2xlarge cc1.4xlarge t1.micro cg1.4xlarge cc2.8xlarge m1.medium hi1.4xlarge m3.xlarge m3.2xlarge hs1.8xlarge cr1.8xlarge c3.large c3.xlarge c3.2xlarge c3.4xlarge c3.8xlarge g2.2xlarge i2.xlarge i2.2xlarge i2.4xlarge i2.8xlarge m3.medium m3.large r3.large r3.xlarge r3.2xlarge r3.4xlarge r3.8xlarge t2.micro t2.small t2.medium c4.large c4.xlarge c4.2xlarge c4.4xlarge c4.8xlarge d2.xlarge d2.2xlarge d2.4xlarge d2.8xlarge g2.8xlarge t2.large m4.large m4.xlarge m4.2xlarge m4.4xlarge m4.10xlarge
  • 6. What to Expect from the Session • Defining system performance and how it is characterized for different workloads • How Amazon EC2 instances deliver performance while providing flexibility and agility • How to make the most of your EC2 instance experience through the lens of several instance types
  • 8. • Servers are hired to do jobs • Performance is measured differently depending on the job Hiring a Server ?
  • 9. Defining Performance: Perspective Matters • What performance means depends on your perspective: – Response time – Throughput – Consistency • Stack: Workload, Application, System libraries, System calls, Kernel, Devices
  • 10. Simple Performance Model for Single Thread • Using CPU: executing (in user mode) • Not using CPU: waiting for turn on CPU, waiting for disk or network I/O, thread locks, memory paging, or for more work.
  • 11. Performance Factors
      Resource | Performance factors | Key indicators
      CPU | Sockets, number of cores, clock frequency, bursting capability | CPU utilization, run queue length
      Memory | Memory capacity | Free memory, anonymous paging, thread swapping
      Network interface | Max bandwidth, packet rate | Receive throughput, transmit throughput over max bandwidth
      Disks | Input / output operations per second, throughput | Wait queue length, device utilization, device errors
  • 12. Resource Utilization • For given performance, how efficiently are resources being used • Something at 100% utilization can’t accept any more work • Low utilization can indicate more resource is being purchased than needed
  • 13. Example: Web Application • MediaWiki installed on Apache with 140 pages of content • Load increased in intervals over time
  • 18. • Picking an instance is tantamount to resource performance tuning • Give back instances as easily as you can acquire new ones • Find an ideal instance type and workload combination Instance Selection = Performance Tuning
  • 19. Delivering Compute Performance with Amazon EC2 Instances
  • 20. CPU Instructions and Protection Levels • CPU has at least two protection levels: on x86, ring 0 (kernel mode) and ring 3 (user mode) • Privileged instructions can’t be executed in user mode, to protect the system. Applications leverage system calls to the kernel. Kernel Application
  • 21. Example: Web application system calls
  • 22. X86 CPU Virtualization: Prior to Intel VT-x • Binary translation for privileged instructions • Para-virtualization (PV) • PV requires going through the VMM, adding latency • Applications that are system call bound are most affected VMM Application Kernel PV
  • 23. X86 CPU Virtualization: After Intel VT-x • Hardware assisted virtualization (HVM) • PV-HVM uses PV drivers opportunistically for operations that are slow when emulated, e.g. network and block I/O Kernel Application VMM PV-HVM
  • 24. Tip: Use PV-HVM AMIs with EBS
  • 25. Time Keeping Explained • Time keeping in an instance is deceptively hard • gettimeofday(), clock_gettime(), QueryPerformanceCounter() • The TSC • CPU counter, accessible from userspace • Requires calibration, vDSO • Invariant on Sandy Bridge+ processors • Xen pvclock; does not support vDSO • On current generation instances, use TSC as clocksource
  • 26. Tip: Use TSC as clocksource tsc source=tsc
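To see which clocksource a Linux guest is currently using, one option is to read the kernel's sysfs entries. The sketch below assumes the standard Linux sysfs layout under `/sys/devices/system/clocksource/clocksource0`; `read_clocksource` is a hypothetical helper and only reads, it does not switch the clocksource.

```python
from pathlib import Path

# Standard Linux sysfs location for clocksource information.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(base: Path = CLOCKSOURCE_DIR):
    """Return (current, available) clocksources, or (None, []) if not present."""
    current_file = base / "current_clocksource"
    available_file = base / "available_clocksource"
    if not current_file.exists():
        return None, []
    current = current_file.read_text().strip()
    available = available_file.read_text().split() if available_file.exists() else []
    return current, available


current, available = read_clocksource()
if current is not None and current != "tsc" and "tsc" in available:
    print(f"clocksource is {current}; tsc is available")
```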
  • 27. CPU Performance and Scheduling • Hypervisor ensures every guest receives CPU time • Fixed allocation • Uncapped vs. capped • Variable allocation • Different schedulers can be used depending on the goal • Fairness • Response time / deadline • Shares
  • 28. Review: C4 Instances • Custom Intel E5-2666 v3 at 2.9 GHz • P-state and C-state controls
      Model | vCPU | Memory (GiB) | EBS (Mbps)
      c4.large | 2 | 3.75 | 500
      c4.xlarge | 4 | 7.5 | 750
      c4.2xlarge | 8 | 15 | 1,000
      c4.4xlarge | 16 | 30 | 2,000
      c4.8xlarge | 36 | 60 | 4,000
  • 29. What’s new in C4: P-state and C-state control • By entering deeper idle states, non-idle cores can achieve up to 300MHz higher clock frequencies • But… deeper idle states require more time to exit, may not be appropriate for latency sensitive workloads
  • 30. Tip: P-state control for AVX2 • If an application makes heavy use of AVX2 on all cores, the processor may attempt to draw more power than it should • Processor will transparently reduce frequency • Frequent changes of CPU frequency can slow an application
  • 31. Review: T2 Instances • Lowest cost EC2 Instance at $0.013 per hour • Burstable performance • Fixed allocation enforced with CPU Credits
      Model | vCPU | CPU Credits / Hour | Memory (GiB) | Storage
      t2.micro | 1 | 6 | 1 | EBS Only
      t2.small | 1 | 12 | 2 | EBS Only
      t2.medium | 2 | 24 | 4 | EBS Only
      t2.large | 2 | 36 | 8 | EBS Only
  • 32. How Credits Work • A CPU Credit provides the performance of a full CPU core for one minute • An instance earns CPU credits at a steady rate • An instance consumes credits when active • Credits expire (leak) after 24 hours Baseline Rate Credit Balance Burst Rate
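The credit rules can be turned into a small model. The sketch below uses the credits-per-hour figures from the T2 table and the definition of one credit as one full core for one minute. It is a simplification: it ignores the 24-hour expiry and any maximum balance, and both function names are hypothetical.

```python
# (vCPUs, credits earned per hour) from the T2 table.
T2 = {
    "t2.micro": (1, 6),
    "t2.small": (1, 12),
    "t2.medium": (2, 24),
    "t2.large": (2, 36),
}


def baseline_utilization(instance_type: str) -> float:
    """Per-vCPU utilization (0..1) that spends credits exactly as fast as they accrue."""
    vcpus, credits_per_hour = T2[instance_type]
    return credits_per_hour / 60 / vcpus


def simulate_balance(instance_type: str, hourly_utilization, start_balance: float = 0.0):
    """Return the credit balance after each hour of per-vCPU utilization (0..1)."""
    vcpus, earned = T2[instance_type]
    balance = start_balance
    history = []
    for utilization in hourly_utilization:
        spent = utilization * vcpus * 60  # one credit = one core for one minute
        balance = max(0.0, balance + earned - spent)
        history.append(balance)
    return history


print(baseline_utilization("t2.micro"))
print(simulate_balance("t2.micro", [0.0, 0.0, 1.0]))
```

Two idle hours on a t2.micro bank 12 credits, which a single hour at full load more than uses up.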
  • 33. Tip: Monitor CPU credit balance
  • 34. Monitoring CPU Performance in Guest • Indicators that work is being done • User time • System time (kernel mode) • Wait I/O, threads blocked on disk I/O • Else, Idle • What happens if OS is scheduled off the CPU?
  • 35. Tip: How to interpret Steal Time • Fixed CPU allocations can be offered through CPU caps • Steal time happens when the CPU cap is enforced • Leverage CloudWatch metrics
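Inside a Linux guest, steal time shows up as the eighth counter on the aggregate `cpu` line of `/proc/stat`. The sketch below computes the steal percentage between two samples of that line. The sample strings are made-up values for illustration, and `steal_percent` is a hypothetical helper.

```python
def parse_cpu_line(line: str) -> list:
    """Parse the aggregate 'cpu' line of /proc/stat into integer counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(value) for value in fields[1:]]


def steal_percent(before: str, after: str) -> float:
    """Percentage of CPU time reported as steal between two samples."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    if len(deltas) < 8:
        return 0.0  # kernel too old to report steal time
    # Counters: user nice system idle iowait irq softirq steal (guest time is
    # already included in user, so it is not added again).
    total = sum(deltas[:8])
    return 100.0 * deltas[7] / total if total > 0 else 0.0


# Illustrative samples, not measurements from a real instance.
sample_1 = "cpu  100 0 50 800 10 0 0 40 0 0"
sample_2 = "cpu  200 0 100 1500 20 0 0 180 0 0"
print(steal_percent(sample_1, sample_2))
```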
  • 36. Delivering I/O Performance with Amazon EC2 Instances
  • 37. I/O and Devices Virtualization • Scheduling I/O requests between virtual devices and shared physical hardware • Split driver model • Intel VT-d • Direct pass through and IOMMU for dedicated devices • Enhanced Networking
  • 38. Hardware Split Driver Model Driver Domain Guest Domain Guest Domain VMM Frontend driver Frontend driver Backend driver Device Driver Physical CPU Physical Memory Network Device Virtual CPU Virtual Memory CPU Scheduling
  • 39. Split Driver Model • Each virtual device has two main components • Communication ring buffer • An event channel signaling activity in the ring buffer • Data is transferred through shared pages • Shared pages require inter-domain permissions, or granting
  • 40. Review: I2 Instances • 16 vCPU: 3.2 TB SSD; 32 vCPU: 6.4 TB SSD • 365K random read IOPS for 32 vCPU instance
      Model | vCPU | Memory (GiB) | Storage | Read IOPS | Write IOPS
      i2.xlarge | 4 | 30.5 | 1 x 800 GB SSD | 35,000 | 35,000
      i2.2xlarge | 8 | 61 | 2 x 800 GB SSD | 75,000 | 75,000
      i2.4xlarge | 16 | 122 | 4 x 800 GB SSD | 175,000 | 155,000
      i2.8xlarge | 32 | 244 | 8 x 800 GB SSD | 365,000 | 315,000
  • 41. Granting in pre-3.8.0 Kernels • Requires “grant mapping” prior to 3.8.0 • Grant mappings are expensive operations due to TLB flushes read(fd, buffer,…)
  • 42. Granting in 3.8.0+ Kernels, Persistent and Indirect • Grant mappings are set up in a pool once • Data is copied in and out of the grant pool read(fd, buffer…) Copy to and from grant pool
  • 43. Tip: Use 3.8+ kernel • Amazon Linux 2013.09 or later • Ubuntu 14.04 or later • RHEL7 or later • Etc.
  • 44. Event Handling • Guest vCPUs are interrupted to process events. • Pre-2.6.36 kernels: notifications went to a single virtual hardware interrupt • Post-2.6.36 kernels: allow instance to tell hypervisor to deliver notification to a specific vCPU for balancing • Check "dmesg" for the following text: "Xen HVM callback vector for event delivery is enabled" • Also, check version of irqbalance is 1.0.7 or higher
  • 45. Hardware Split Driver Model: Networking Driver Domain Guest Domain Guest Domain VMM Frontend driver Frontend driver Backend driver Device Driver Physical CPU Physical Memory Network Device Virtual CPU Virtual Memory CPU Scheduling Sockets Application
  • 50. Device Pass Through: Enhanced Networking • SR-IOV eliminates need for driver domain • Physical network device exposes virtual function to instance • Requires a specialized driver, which means: • Your instance OS needs to know about it • EC2 needs to be told your instance can use it
  • 51. Hardware After Enhanced Networking Driver Domain Guest Domain Guest Domain VMM Frontend driver NIC Driver Backend driver Device Driver Physical CPU Physical Memory SR-IOV Network Device Virtual CPU Virtual Memory CPU Scheduling Sockets Application
  • 54. Tip: Use Enhanced Networking • Highest packets-per-second • Lowest variance in latency • Instance OS must support it • Look for SR-IOV property of instance or image
  • 56. How to build enhanced network driver on Linux
  • 57. Let’s create an AMI with enhanced networking enabled
  • 58. CentOS 6.5 with c4.8xlarge • Let’s start with the officially provided CentOS AMI, which should be clean. • CentOS is available in AWS Marketplace.
  • 59. Only c4.8xlarge is missing :( Anyway, let’s go to the AMI search
  • 60. Find and Select AMI • CentOS 6 x86_64 (2014_09_29) EBS HVM-74e73035-3435-48d6-88e0-89cc02ad83ee-ami-a8a117c0.2 • ami-c2a818aa (IAD) • This is the CentOS AMI in AWS Marketplace. Nice ?!?!
  • 61. Check the requirements of Enabling Enhanced Networking on Linux • Instance types: C3, C4, D2, I2, M4 and R3 • HVM AMI with Linux kernel version 2.6.32 or later • Launch the instance in a VPC • A network driver that supports enhanced networking on Linux.
  • 62. Check kernel version and network driver. Launch a c4.large instance and log in:
      [root@ip-192-168-1-171 ~]# cat /etc/redhat-release
      CentOS release 6.5 (Final)
      [root@ip-192-168-1-171 ~]# uname -na
      Linux ip-192-168-1-171 2.6.32-431.29.2.el6.x86_64 #1 SMP Tue Sep 9 21:36:05 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
      [root@ip-192-168-1-171 ~]# modinfo ixgbevf
      filename: /lib/modules/2.6.32-431.29.2.el6.x86_64/kernel/drivers/net/ixgbevf/ixgbevf.ko
      version: 2.7.12-k
      license: GPL
      description: Intel(R) 82599 Virtual Function Driver
      author: Intel Corporation, <linux.nics@intel.com>
      srcversion: E75203124BB105EC871944F
      alias: pci:v00008086d00001515sv*sd*bc*sc*i*
      alias: pci:v00008086d000010EDsv*sd*bc*sc*i*
      depends:
      vermagic: 2.6.32-431.29.2.el6.x86_64 SMP mod_unload modversions
      parm: debug:Debug level (0=none,...,16=all) (int)
  • 63. Let’s update all including kernel
      [root@ip-192-168-1-171 ~]# yum update -y
      Loaded plugins: fastestmirror, presto
      Loading mirror speeds from cached hostfile
       * base: mirrors.mit.edu
       * extras: linux.cc.lehigh.edu
       * updates: mirrors.lga7.us.voxel.net
      Setting up Update Process
      Resolving Dependencies
      --> Running transaction check
      ---> Package audit.x86_64 0:2.2-4.el6_5 will be updated
      ---> Package audit.x86_64 0:2.3.7-5.el6 will be an update
      ---> Package audit-libs.x86_64 0:2.2-4.el6_5 will be updated
      ---> Package audit-libs.x86_64 0:2.3.7-5.el6 will be an update
      ---> Package authconfig.x86_64 0:6.1.12-13.el6 will be updated
      ---> Package authconfig.x86_64 0:6.1.12-19.el6 will be an update
      ---> Package bash.x86_64 0:4.1.2-15.el6_5.2 will be updated
      ---> Package bash.x86_64 0:4.1.2-29.el6 will be an update
      ---> Package binutils.x86_64 0:2.20.51.0.2-5.36.el6 will be updated
      ---> Package binutils.x86_64 0:2.20.51.0.2-5.42.el6 will be an update
      ---> Package ca-certificates.noarch 0:2014.1.98-65.0.el6_5 will be updated
      ---> Package ca-certificates.noarch 0:2014.1.98-65.1.el6 will be an update
      ---> Package centos-release.x86_64 0:6-5.el6.centos.11.2 will be updated
      ---> Package centos-release.x86_64 0:6-6.el6.centos.12.2 will be an update
      ---> Package coreutils.x86_64 0:8.4-31.el6_5.2 will be updated
      ---> Package coreutils.x86_64 0:8.4-37.el6 will be an update
      ---> Package coreutils-libs.x86_64 0:8.4-31.el6_5.2 will be updated
      ………………………
  • 64. Reboot and check the update
      [root@ip-192-168-1-171 ~]# cat /etc/redhat-release
      CentOS release 6.6 (Final)
      [root@ip-192-168-1-171 ~]# uname -na
      Linux ip-192-168-1-171 2.6.32-504.12.2.el6.x86_64 #1 SMP Wed Mar 11 22:03:14 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
      [root@ip-192-168-1-171 ~]# modinfo ixgbevf
      filename: /lib/modules/2.6.32-504.12.2.el6.x86_64/kernel/drivers/net/ixgbevf/ixgbevf.ko
      version: 2.12.1-k
      license: GPL
      description: Intel(R) 82599 Virtual Function Driver
      author: Intel Corporation, <linux.nics@intel.com>
      srcversion: 8797AC845BB302315230490
      alias: pci:v00008086d00001515sv*sd*bc*sc*i*
      alias: pci:v00008086d000010EDsv*sd*bc*sc*i*
      depends:
      vermagic: 2.6.32-504.12.2.el6.x86_64 SMP mod_unload modversions
      parm: debug:Debug level (0=none,...,16=all) (int)
      Upgraded network driver installed. Version: 2.7.12-k -> 2.12.1-k
  • 65. Enable SR-IOV
      Install AWS CLI or EC2 CLI tools. Not supported in the AWS Console yet :(
      1. Stop the instance
      2. Enable SR-IOV of the instance with CLI
      3. Check the status
      4. Start the instance
      a82066443ffe:~ ilho$ aws ec2 modify-instance-attribute --instance-id i-681280bf --sriov-net-support simple
      a82066443ffe:~ ilho$ aws ec2 describe-instance-attribute --instance-id i-681280bf --attribute sriovNetSupport
      {
          "InstanceId": "i-681280bf",
          "SriovNetSupport": {
              "Value": "simple"
          }
      }
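Step 3 ("Check the status") can be automated by parsing the JSON that `describe-instance-attribute` prints. The sketch below checks for the `simple` value shown above. `sriov_enabled` is a hypothetical helper, and the JSON text is assumed to have been captured from the CLI beforehand.

```python
import json


def sriov_enabled(describe_output: str) -> bool:
    """Return True if the CLI JSON reports SriovNetSupport set to 'simple'."""
    data = json.loads(describe_output)
    return data.get("SriovNetSupport", {}).get("Value") == "simple"


# JSON as shown on the slide.
captured = """
{
    "InstanceId": "i-681280bf",
    "SriovNetSupport": {
        "Value": "simple"
    }
}
"""
print(sriov_enabled(captured))
```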
  • 66. A problem with more than 32 vCPUs on Linux • The CentOS 6.x kernel does not support more than 32 vCPUs. It cannot boot when you launch c4.8xlarge (36 vCPUs) :( • d2.8xlarge and m4.10xlarge? -> :( • A solution is to add an option to the kernel boot parameters
  • 67. Add maxcpus option to kernel boot parameter
      $ vi /boot/grub/menu.lst
      Add maxcpus=32
      [root@ip-10-10-10-242 ~]# cat /boot/grub/menu.lst
      # grub.conf generated by anaconda
      #
      # Note that you do not have to rerun grub after making changes to this file
      # NOTICE: You do not have a /boot partition. This means that
      # all kernel and initrd paths are relative to /, eg.
      # root (hd0,0)
      # kernel /boot/vmlinuz-version ro root=/dev/vda1
      # initrd /boot/initrd-[generic-]version.img
      #boot=/dev/vda
      default=0
      timeout=1
      serial --unit=0 --speed=115200
      terminal --timeout=1 serial console
      title CentOS (2.6.32-431.29.2.el6.x86_64)
      root (hd0,0)
      kernel /boot/vmlinuz-2.6.32-431.29.2.el6.x86_64 maxcpus=32 ro root=UUID=dcb1645e-05a6-4311-8bce-a9c12bec5801 rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD console=ttyS0,115200 crashkernel=auto SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM
      initrd /boot/initramfs-2.6.32-431.29.2.el6.x86_64.img
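If the same edit has to be applied to several images, the manual `vi` step could be scripted. The sketch below is an assumption about how one might do it: it appends `maxcpus=32` to every uncommented `kernel` line of a GRUB legacy `menu.lst` text, whereas the slide places the option right after the kernel image path. `add_maxcpus` is a hypothetical helper that works on a string and does not touch the real file.

```python
def add_maxcpus(menu_lst: str, limit: int = 32) -> str:
    """Append maxcpus=<limit> to each active kernel line that lacks it."""
    result = []
    for line in menu_lst.splitlines():
        stripped = line.lstrip()
        # Commented lines start with '#', so they never match "kernel ".
        if stripped.startswith("kernel ") and "maxcpus=" not in stripped:
            line = f"{line.rstrip()} maxcpus={limit}"
        result.append(line)
    return "\n".join(result) + "\n"


example = (
    "# kernel /boot/vmlinuz-version ro root=/dev/vda1\n"
    "title CentOS\n"
    "\tkernel /boot/vmlinuz-2.6.32 ro root=/dev/vda1\n"
    "\tinitrd /boot/initramfs.img\n"
)
print(add_maxcpus(example))
```

Running the function twice leaves the text unchanged, so it is safe to reapply.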
  • 68. Now let’s launch a c4.8xlarge instance • I think it’s ready to go. • Reboot • Launch a c4.8xlarge instance • It cannot launch.
  • 69. Why? An AWS Marketplace AMI cannot be launched on an instance type that is not in its supported list.
  • 70. Change the base AMI • Find the CentOS 6.5 community version AMI: CentOS-6.5-base-20150305 (ami-0e80db66) • Repeat three steps: 1. $ sudo yum update -y 2. Enable SR-IOV 3. Add maxcpus=32
  • 71. Network driver version should be checked http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html To enable enhanced networking on your instance, you must ensure that its kernel has the ixgbevf module installed and that you set the sriovNetSupport attribute for the instance. For the best performance, we recommend that the ixgbevf module is version 2.14.2 or higher.
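The version recommendation can be checked programmatically against the `version:` value that `modinfo ixgbevf` prints. The sketch below compares version strings such as `2.12.1-k` with the recommended 2.14.2. Both function names are hypothetical, and the parser only looks at the leading three numeric components.

```python
import re

# Recommended minimum ixgbevf version quoted from the documentation.
RECOMMENDED = (2, 14, 2)


def parse_version(text: str) -> tuple:
    """Extract (major, minor, patch) from strings such as '2.12.1-k'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_recommendation(version: str) -> bool:
    """Return True if the driver version is at least the recommended one."""
    return parse_version(version) >= RECOMMENDED


for seen in ("2.7.12-k", "2.12.1-k", "2.16.1"):
    print(seen, meets_recommendation(seen))
```

Comparing tuples of integers avoids the trap of string comparison, where "2.7.12" would sort after "2.14.2".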
  • 72. Build and Install the network driver #2
      [ec2-user@ip-192-168-1-50 src]$ make;sudo make install
      make -C /lib/modules/2.6.32-504.12.2.el6.x86_64/build SUBDIRS=/home/ec2-user/ixgbevf-2.16.1/src modules
      make[1]: Entering directory `/usr/src/kernels/2.6.32-504.12.2.el6.x86_64'
        CC [M] /home/ec2-user/ixgbevf-2.16.1/src/ixgbevf_main.o
        CC [M] /home/ec2-user/ixgbevf-2.16.1/src/ixgbevf_param.o
        CC [M] /home/ec2-user/ixgbevf-2.16.1/src/ixgbevf_ethtool.o
        CC [M] /home/ec2-user/ixgbevf-2.16.1/src/kcompat.o
        CC [M] /home/ec2-user/ixgbevf-2.16.1/src/ixgbe_vf.o
        CC [M] /home/ec2-user/ixgbevf-2.16.1/src/ixgbe_mbx.o
        LD [M] /home/ec2-user/ixgbevf-2.16.1/src/ixgbevf.o
        Building modules, stage 2.
        MODPOST 1 modules
        CC /home/ec2-user/ixgbevf-2.16.1/src/ixgbevf.mod.o
        LD [M] /home/ec2-user/ixgbevf-2.16.1/src/ixgbevf.ko.unsigned
        NO SIGN [M] /home/ec2-user/ixgbevf-2.16.1/src/ixgbevf.ko
      make[1]: Leaving directory `/usr/src/kernels/2.6.32-504.12.2.el6.x86_64'
      make -C /lib/modules/2.6.32-504.12.2.el6.x86_64/build SUBDIRS=/home/ec2-user/ixgbevf-2.16.1/src modules
      make[1]: Entering directory `/usr/src/kernels/2.6.32-504.12.2.el6.x86_64'
        Building modules, stage 2.
        MODPOST 1 modules
      make[1]: Leaving directory `/usr/src/kernels/2.6.32-504.12.2.el6.x86_64'
      gzip -c ../ixgbevf.7 > ixgbevf.7.gz
      # remove all old versions of the driver
      find /lib/modules/2.6.32-504.12.2.el6.x86_64 -name ixgbevf.ko -exec rm -f {} \; || true
      find /lib/modules/2.6.32-504.12.2.el6.x86_64 -name ixgbevf.ko.gz -exec rm -f {} \; || true
      install -D -m 644 ixgbevf.ko /lib/modules/2.6.32-504.12.2.el6.x86_64/kernel/drivers/net/ixgbevf/ixgbevf.ko
      /sbin/depmod -a 2.6.32-504.12.2.el6.x86_64 || true
      install -D -m 644 ixgbevf.7.gz /usr/share/man/man7/ixgbevf.7.gz
      man -c -P'cat > /dev/null' ixgbevf || true
      Driver source: http://sourceforge.net/projects/e1000/files/ixgbevf%20stable/
  • 73. Check the new driver installed
      [ec2-user@ip-192-168-1-50 src]$ modinfo ixgbevf
      filename: /lib/modules/2.6.32-504.12.2.el6.x86_64/kernel/drivers/net/ixgbevf/ixgbevf.ko
      version: 2.16.1
      license: GPL
      description: Intel(R) 10 Gigabit Virtual Function Network Driver
      author: Intel Corporation, <linux.nics@intel.com>
      srcversion: 3B690FE23A02C25EF74012F
      alias: pci:v00008086d00001515sv*sd*bc*sc*i*
      alias: pci:v00008086d000010EDsv*sd*bc*sc*i*
      depends:
      vermagic: 2.6.32-504.12.2.el6.x86_64 SMP mod_unload modversions
      parm: InterruptThrottleRate:Maximum interrupts per second, per vector, (956-488281, 0=off, 1=dynamic), default 1 (array of int)
      reboot
      [ec2-user@ip-192-168-1-50 ~]$ ethtool -i eth0
      driver: ixgbevf
      version: 2.16.1
      firmware-version: N/A
      bus-info: 0000:00:03.0
      supports-statistics: yes
      supports-test: yes
      supports-eeprom-access: no
      supports-register-dump: yes
      supports-priv-flags: no
  • 74. Share AMI to the users • Sharing the customized AMI with your team ~ Let’s get some sleep… :)
  • 75. The reply came back: 1/2 status checks passed (OS unreachable)
  • 77. Root cause: the 70-persistent-net.rules file in /etc/udev/rules.d/ has an entry with the MAC address of the original instance the AMI was taken from. When the image is taken, the MAC from the original instance persists, which tells the OS on the newly deployed instance that the now-nonexistent MAC of the original instance should be eth0. Since the OS cannot find a device with the original MAC address, eth0 fails to be identified and isn't brought up.
  • 78. A solution to avoid this caveat • Before creating the AMI, you must remove the /etc/udev/rules.d/70-persistent-net.rules file • Create AMI • Share AMI • Now a c4 instance launches successfully with the latest driver.
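The cleanup step is easy to forget, so it may be worth scripting it as part of AMI preparation. The sketch below removes the rules file if it exists. The path is the one named above, `remove_persistent_net_rules` is a hypothetical helper, and on a real instance it would need root privileges.

```python
from pathlib import Path

RULES_FILE = Path("/etc/udev/rules.d/70-persistent-net.rules")


def remove_persistent_net_rules(path: Path = RULES_FILE) -> bool:
    """Delete the udev rules file; return True if a file was removed."""
    try:
        path.unlink()
        return True
    except FileNotFoundError:
        # Nothing to clean up, e.g. the image was already prepared.
        return False
```

A call such as `remove_persistent_net_rules()` would run just before the image is created, so the stale MAC entry never reaches the AMI.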
  • 79. A small contribution: the note was added to the “enhanced networking” section of the public documentation.
  • 80. Lesson learned: just use CentOS 7 or Amazon Linux. :)
  • 81. EBS designing for performance
  • 83. EBS =
  • 84. What is EBS? • Network block storage as a service • EBS volumes attach to any Amazon EC2 instance in the same Availability Zone • Designed for five nines of availability • 2 million volumes created every day
  • 85. EBS volume types Magnetic General purpose (SSD) Provisioned IOPS (SSD)
  • 86. EBS volume types: Magnetic • IOPS: Typically 100, best effort • Throughput: 40-90 MB/s • Latency: Read 10-40 ms, Write 2-10 ms • Best for infrequently accessed data
  • 87. EBS volume types: General purpose (SSD) • IOPS Baseline: 100-10,000 (3 / GiB) • IOPS Burst: 30 minutes @ 3,000 • Throughput: Up to 160 MB/s • Latency: Single-digit ms • Performance consistency: 99% • Most workloads
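The General purpose (SSD) numbers follow a simple rule that can be written down directly. The sketch below computes the baseline from the 3 IOPS per GiB rule with the 100 and 10,000 limits from the slide. The burst estimate additionally assumes a credit bucket of 5.4 million I/O credits, a figure taken from the AWS EBS documentation rather than from this deck, and both function names are hypothetical.

```python
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 10_000
GP2_IOPS_PER_GIB = 3
GP2_BURST_IOPS = 3_000
# Assumption: initial I/O credit balance per the AWS EBS documentation.
GP2_CREDIT_BUCKET = 5_400_000


def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS for a General purpose (SSD) volume of the given size."""
    return min(max(GP2_MIN_IOPS, GP2_IOPS_PER_GIB * size_gib), GP2_MAX_IOPS)


def gp2_burst_minutes(size_gib: int):
    """Minutes a full credit bucket sustains 3,000 IOPS; None if no burst applies."""
    baseline = gp2_baseline_iops(size_gib)
    if baseline >= GP2_BURST_IOPS:
        return None
    # Credits drain at (burst - baseline) per second because they keep refilling.
    return GP2_CREDIT_BUCKET / (GP2_BURST_IOPS - baseline) / 60


for size in (10, 500, 5000):
    print(size, gp2_baseline_iops(size), gp2_burst_minutes(size))
```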
  • 88. EBS volume types: Provisioned IOPS (SSD) • IOPS: 100-20,000 (customer provisioned) • Throughput: Up to 320 MB/s • Latency: Single-digit ms • Performance consistency: 99.9% • Mission critical workloads
  • 90. Queuing theory • Little’s law is the foundation for performance tuning theory • Mathematically proven by John Little in 1961 • W = L / A, where W = Wait time = average wait time per request, L = Queue length = average number of requests waiting, A = Arrival rate = the rate of requests arriving • EBS performance is related to this law
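As a worked example of Little's law in the form used on the slide (wait time = queue length / arrival rate), the sketch below plugs in illustrative numbers. The values are made up for the example and `wait_time` is a hypothetical helper.

```python
def wait_time(queue_length: float, arrival_rate: float) -> float:
    """Average wait per request (seconds) given queue length and requests/second."""
    if arrival_rate <= 0:
        raise ValueError("arrival_rate must be positive")
    return queue_length / arrival_rate


# Illustrative: 8 requests in flight on a volume serving 4,000 requests/second.
seconds = wait_time(queue_length=8, arrival_rate=4000)
print(f"{seconds * 1000:.1f} ms average wait per request")
```

Read the other way round, the same relation says that a volume delivering 4,000 IOPS at 2 ms latency needs about 8 requests in flight to stay busy.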
  • 91. Performance optimization is measured by: • IOPS: Read/write I/O rate (IOPS) • Latency: Time between I/O submission and completion (ms) • Throughput: Read/write transfer rate (MB/s); throughput = IOPS × I/O size
  • 92. Key components to performance EC2 instance I/O EBS Network link
  • 93. A day in the life of an I/O
  • 94. A day in the life: I/O All I/O must pass through the I/O domain Requires “grant mapping” prior to 3.8.0 Grant mappings are expensive operations due to TLB flushes (Diagram: Instance, read(fd, buffer, BLOCK_SIZE), grant mapping, I/O domain, EBS)
  • 95. A day in the life: I/O (continued) Request queue is a single memory page Each I/O request has 11 grant references (4 KiB/reference) Maximum data in queue = 1408 KiB (Diagram: requests and responses between Instance and I/O domain, e.g. READ 8KB @ 1234)
  • 96. 3.8.0+ Kernels – Persistent grants Grant mappings are set up in a pool once Data is copied in and out of the grant pool Copying is significantly faster than remapping (Diagram: Instance, read(fd, buffer, BLOCK_SIZE), grant pool, I/O domain, EBS)
  • 97. 3.8.0+ Kernels – Indirect grants Each I/O request has grant references that contain grant references Maximum data in queue = 4096 KiB (default) (Diagram: requests and responses between Instance and I/O domain, e.g. READ 8KB @ 1234)
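  • 98. Instance I/O: Before 3.8.0 (Figure: request queue slots 0-31; a 128 KiB I/O is split into requests of 44 KiB, 44 KiB and 40 KiB)
  • 99. Instance I/O: Linux 3.8.0+ (Figure: request queue slots 0-31; a 128 KiB I/O fits in a single request)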
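The numbers on the preceding slides are mutually consistent, which a short sketch can confirm. It assumes the request queue holds 32 slots (inferred from the 0-31 labels in the figures, not stated outright) and uses the 11 grant references of 4 KiB each quoted for pre-3.8.0 kernels. All names are illustrative.

```python
GRANT_SIZE_KIB = 4        # from the slide: 4 KiB per grant reference
GRANTS_PER_REQUEST = 11   # from the slide: 11 grant references per request
QUEUE_SLOTS = 32          # assumption: slots 0-31 as labelled in the figures


def split_io(size_kib, max_request_kib=GRANT_SIZE_KIB * GRANTS_PER_REQUEST):
    """Split one I/O into requests no larger than max_request_kib."""
    chunks = []
    remaining = size_kib
    while remaining > 0:
        chunk = min(remaining, max_request_kib)
        chunks.append(chunk)
        remaining -= chunk
    return chunks


def max_queue_data_kib(request_kib, slots=QUEUE_SLOTS):
    """Maximum data in flight when every slot holds a full-size request."""
    return request_kib * slots
```

`split_io(128)` reproduces the 44 KiB, 44 KiB, 40 KiB split shown for older kernels, and `max_queue_data_kib(44)` gives the 1408 KiB queue limit. With 128 KiB requests, as in the 3.8.0+ figure, `split_io(128, 128)` is a single request and the queue limit becomes 4096 KiB.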
  • 100. Tip: Use 3.8+ kernel Amazon Linux 2013.09 or later Ubuntu 14.04 or later RHEL7 or later Etc.
  • 101. Queue depth An I/O operation EBS After it’s gone, it’s gone EC2 Queue depth is the pending I/O for a volume
  • 102. Cheat sheet: Sample storage workloads on AWS
      Workload/software | Typical block size | Random/Seq? | Max EBS @ 500 Mb/s instances | Max EBS @ 1 Gb/s instances | Max EBS @ 10 Gb/s instances
      Oracle DB | Configurable: 2-16 KB, default 8 KB | random | ~7,800 IOPS | ~15,600 IOPS | ~48,000 IOPS
      Microsoft SQL Server | 8 KB w/ 64 KB extents | random | ~7,800 IOPS | ~15,600 IOPS | ~48,000 IOPS
      MySQL | 16 KB | random | ~4,000 IOPS | ~7,800 IOPS | ~48,000 IOPS
      PostgreSQL | 8 KB | random | ~7,800 IOPS | ~15,600 IOPS | ~48,000 IOPS
      MongoDB | 4 KB | sequential | ~15,600 IOPS | ~31,000 IOPS | ~48,000 IOPS
      Apache Cassandra | 4 KB | random | ~15,600 IOPS | ~31,000 IOPS | ~48,000 IOPS
      GlusterFS | 128 KB | sequential | ~500 IOPS | ~1,000 IOPS | ~6,000 IOPS
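The first two IOPS columns of the cheat sheet follow from dividing the EBS link bandwidth by the block size. The sketch below assumes the bandwidth figures are megabits per second and that 1 KB is 1,000 bytes; both are assumptions chosen because they reproduce the slide's numbers. The function name is hypothetical.

```python
def max_iops(link_megabits_per_s, block_size_kb):
    """Upper bound on IOPS when the EBS link is the only limit."""
    bytes_per_s = link_megabits_per_s * 1_000_000 / 8   # megabits -> bytes
    return bytes_per_s / (block_size_kb * 1000)         # KB -> bytes
```

`max_iops(500, 8)` is 7812.5, matching the ~7,800 IOPS listed for 8 KB workloads, and `max_iops(1000, 4)` is 31250, matching ~31,000. The 10 Gb/s column is not reproduced by this formula, since those figures are capped by other limits such as the ~48,000 IOPS per-instance maximum.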
  • 103. Example workload Transaction (OLTP) Examples: eCommerce website, metadata storage Benchmark: MySQL + sysbench
  • 104. Tip: Workload Where possible, use real production workloads for performance testing
  • 105. Baseline configuration Region: US West (Oregon) Instance type: m2.4xlarge vCPU: 8 Memory: 68.4 GiB EBS-optimized Data volume: 500 GiB EBS magnetic OS: Amazon Linux 2015.03.1
  • 106. Optimization: Increase parallelism (Chart: transactions (n) against MySQL threads, from 2 to n; series: Baseline)
  • 108. Key components to performance EC2 instance I/O EBS Network link
  • 109. Instance selection m2.4xlarge: CPU: Intel Xeon; vCPU: 8; Memory: 68.4 GiB; Price: $0.98/hour* r3.2xlarge: CPU: Intel Xeon E5-2670 v2; vCPU: 8; Memory: 61 GiB; Enhanced networking; Price: $0.70/hour* * All pricing from US West (Oregon)
  • 110. EBS optimized instances • Most instance families support the EBS-optimized flag • EBS-optimized instances now support up to 4 Gb/s • Drive 32,000 16K IOPS or 500 MB/s • Available by default on newer instance types • EC2 *.8xlarge instances support 10 Gb/s network • Max IOPS per node supported is ~48,000 IOPS @ 16K I/O
  • 111. Tip: Use EBS-optimized instances Use EBS-optimized instances for consistent EBS performance
  • 112. Updated configuration: Instance type Region: US West (Oregon) Instance type: r3.2xlarge vCPU: 8 Memory: 61 GiB EBS-optimized EBS volume: 500 GiB magnetic OS: Amazon Linux 2015.03.1
  • 113. Optimization: Current generation instances (Chart: transactions (n) against MySQL threads, from 2 to n; series: Baseline, r3.2xlarge; annotated gain: 25%)
  • 114. Tip: Instance selection Use the right instance family for your workload Use current generation instances
  • 115. Key components to performance EC2 instance I/O EBS Network link
  • 116. Volume selection EBS magnetic Latency: Read: 10-40ms Write: 2-10ms SSD backed Latency: Read/Write: Single-digit ms
  • 117. File systems Use a modern, journaled filesystem ext4, xfs, etc. Ensure partitions are aligned on 4KiB boundaries
  • 119. Volume initialization Newly created volumes • Just attach, mount, and go! • Pre-warming is no longer recommended Volumes restored from snapshots • You can use your volume right away • Accelerate data loading by reading
  • 120. Updated configuration: EBS volumes Region: US West (Oregon) Instance type: r3.2xlarge vCPU: 8 Memory: 61 GiB EBS-optimized Boot volume: 8 GiB – EBS general purpose Data volume: 500 GiB – EBS general purpose OS: Amazon Linux 2015.03.1
  • 121. Optimization: Volume selection (Chart: transactions (n) against MySQL threads, from 2 to n; series: Baseline, r3.2xlarge, r3.2xlarge gp2; annotated gains: 19%, 50%)
  • 122. Tip: Volume selection Use SSD backed volumes when performance matters
  • 123. EBS IOPS vs. Throughput 20,000 IOPS PIOPS volume 20,000 IOPS 320 MB/s throughput You can achieve 20,000 IOPS when driving smaller I/O operations You can achieve up to 320 MB/s when driving larger I/O operations
  • 124. EBS IOPS vs. Throughput PIOPS volume: 8,000 IOPS, 320 MB/s throughput 8,000 x 8 KB = 64 MB/s 8,000 x 16 KB = 128 MB/s 8,000 x 32 KB = 256 MB/s 16,000 x 8 KB = 128 MB/s (not achievable: exceeds the 8,000 IOPS limit) 8,000 x 64 KB = 512 MB/s (not achievable: exceeds the 320 MB/s limit) 5,000 x 64 KB = 320 MB/s
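The arithmetic on this slide amounts to taking whichever limit is reached first, provisioned IOPS or volume throughput. The sketch below uses the slide's figures (8,000 IOPS, 320 MB/s) as defaults and treats 1 MB as 1,000 KB, which is an assumption made to match the slide's numbers. The function name is hypothetical.

```python
def achievable(io_size_kb, iops_limit=8_000, throughput_limit_mb_s=320):
    """Return (IOPS, MB/s) a volume can sustain at a given I/O size.

    The volume is capped by its provisioned IOPS for small I/O and by its
    throughput limit for large I/O.
    """
    iops_by_throughput = throughput_limit_mb_s * 1000 / io_size_kb
    iops = min(iops_limit, iops_by_throughput)
    return iops, iops * io_size_kb / 1000
```

`achievable(8)` returns 8,000 IOPS at 64 MB/s (IOPS-bound), while `achievable(64)` returns 5,000 IOPS at 320 MB/s (throughput-bound), in line with the last row of the slide.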
  • 125. Striping Increases performance, or capacity, or both Don’t mix volume types Typically RAID 0 or LVM stripe Avoid RAID for redundancy EBS EC2
  • 126. Striping: Snapshots Quiesce I/O 1. Database: FLUSH and LOCK tables 2. Filesystem: sync and fsfreeze 3. EBS: snapshot all volumes When snapshot API returns, it is safe to resume
  • 127. EBS-optimized instance Four key components: Balanced EC2 A “boatload” of I/O Right-sized EBS
  • 129. Amazon CloudWatch Important Amazon CloudWatch metrics: • IOPS and bandwidth • Latency • Queue depth All EBS metrics are prefixed with “Volume”
  • 130. CloudWatch: Instance bandwidth m4.2xlarge Instance: 128 MB/s m4.4xlarge Instance: 256 MB/s m4.10xlarge Volume: 320 MB/s
  • 132. Distributing Key Names Don’t do this <my_bucket>/2013_11_13-164533125.jpg <my_bucket>/2013_11_13-051033564.jpg <my_bucket>/2013_11_13-061133789.jpg <my_bucket>/2013_11_13-051033458.jpg <my_bucket>/2013_11_12-063433125.jpg <my_bucket>/2013_11_12-021033564.jpg <my_bucket>/2013_11_12-065533789.jpg <my_bucket>/2013_11_12-011033458.jpg <my_bucket>/2013_11_11-022333125.jpg <my_bucket>/2013_11_11-153433564.jpg <my_bucket>/2013_11_11-065233789.jpg <my_bucket>/2013_11_11-065633458.jpg
  • 133. Distributing Key Names Add randomness to the beginning of the key name <my_bucket>/521335461-2013_11_13.jpg <my_bucket>/465330151-2013_11_13.jpg <my_bucket>/987331160-2013_11_13.jpg <my_bucket>/465765461-2013_11_13.jpg <my_bucket>/125631151-2013_11_13.jpg <my_bucket>/934563160-2013_11_13.jpg <my_bucket>/532132341-2013_11_13.jpg <my_bucket>/565437681-2013_11_13.jpg <my_bucket>/234567460-2013_11_13.jpg <my_bucket>/456767561-2013_11_13.jpg <my_bucket>/345565651-2013_11_13.jpg <my_bucket>/431345660-2013_11_13.jpg
  • 134. Other Techniques for Distributing Key Names Store objects as a hash of their name • add the original name as metadata • “deadmau5_mix.mp3” → 0aa316fb000eae52921aab1b4697424958a53ad9 – watch for duplicate names! • prepend keyname with short hash • 0aa3-deadmau5_mix.mp3 Epoch time (reverse) • 5321354831-deadmau5_mix.mp3
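The three key-naming techniques on this slide can be sketched as follows. SHA-1 is an assumption here: the slide's example digest is 40 hex characters long, which matches SHA-1, but the talk does not name the hash function and this sketch does not claim to reproduce that exact digest. All function names are illustrative.

```python
import hashlib


def hashed_key(name):
    """Full hex digest of the object name (SHA-1 assumed, see lead-in)."""
    return hashlib.sha1(name.encode("utf-8")).hexdigest()


def short_hash_prefixed_key(name, prefix_len=4):
    """Prepend the first few hex characters of the hash to the original name."""
    return f"{hashed_key(name)[:prefix_len]}-{name}"


def reversed_epoch_key(epoch_seconds, name):
    """Prepend the epoch timestamp with its digits reversed, so the
    fastest-changing digits come first in the key."""
    return f"{str(int(epoch_seconds))[::-1]}-{name}"
```

For instance, `reversed_epoch_key(1384531235, "deadmau5_mix.mp3")` yields `5321354831-deadmau5_mix.mp3`, the same form as the slide's last example. When using `hashed_key`, the original name would need to be stored as object metadata, as the slide notes.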
  • 135. Randomness in a Key Name Can Be an Anti-Pattern Lifecycle policies LISTs with prefix filters Maintaining thumbnails of images • craig.jpg -> stored as orig-09329jed0fc • thumb-09329jed0fc When you need to recover a file with its original name
  • 136. Solving for the Anti-Pattern Add additional prefixes to help sorting Amazon S3 maintains keys lexicographically in its internal indices <my_bucket>/images/521335461-2013_11_13.jpg <my_bucket>/images/465330151-2013_11_13.jpg <my_bucket>/movies/293924440-2013_11_13.jpg <my_bucket>/movies/987331160-2013_11_13.jpg <my_bucket>/thumbs-small/838434842-2013_11_13.jpg <my_bucket>/thumbs-small/342532454-2013_11_13.jpg <my_bucket>/thumbs-small/345233453-2013_11_13.jpg <my_bucket>/thumbs-small/345453454-2013_11_13.jpg
  • 139. Region Edge Location 12 Regions 32 Availability Zones 54 Edge Locations Need to update We’re here :)
  • 140. Configure multiple origins Elastic Load Balancing Dynamic content Amazon EC2 Static content Amazon S3 * (default) /error/* /assets/* Amazon CloudFront example.com
  • 141. CloudFront Behaviors CloudFront Customer Location www.mysite.com Path Pattern Matching /*.jpg; /*.php etc. GET http://mysite.com/images/1.jpg to ORIGIN A GET http://mysite.com/index.php to ORIGIN B GET http://mysite.com/web/home.css to ORIGIN C GET http://mysite.com/* (DEFAULT) to ORIGIN D Origin A: S3 bucket Origin B: www.mysite.com Origin C: S3 Bucket Origin D: www.mysite.com Path Pattern Matching /*.php /images/*.jpg /web/*.css /*.* (DEFAULT)
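The routing on this slide can be approximated with ordered wildcard matching where the first matching pattern wins and anything else goes to the default origin. This sketch uses Python's `fnmatch` as a stand-in; it is not CloudFront's own matcher, and the pattern order is taken from the list at the end of the slide.

```python
from fnmatch import fnmatchcase

# (path pattern, origin) pairs in the order listed on the slide.
BEHAVIORS = [
    ("/*.php", "ORIGIN B"),
    ("/images/*.jpg", "ORIGIN A"),
    ("/web/*.css", "ORIGIN C"),
]
DEFAULT_ORIGIN = "ORIGIN D"


def route(path):
    """Return the origin for a request path: first matching pattern wins."""
    for pattern, origin in BEHAVIORS:
        if fnmatchcase(path, pattern):
            return origin
    return DEFAULT_ORIGIN
```

With these behaviors, `route("/images/1.jpg")` goes to ORIGIN A, `route("/index.php")` to ORIGIN B, `route("/web/home.css")` to ORIGIN C, and any other path falls through to ORIGIN D.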
  • 142. Region Edge Location 12 Regions 32 Availability Zones 54 Edge Locations Need to update AWS optimized network Internet
  • 143. Demo :)
  • 145. Amazon S3 Transfer Acceleration Embedded WAN acceleration S3 Bucket AWS Edge Location Uploader Optimized Throughput! Move over long geographic distances Up to 300% (6x) faster No firewall mods, no client software 54 global edge locations Change your endpoint, not your code
  • 146. Accelerate Speed Comparison • Test URL • http://s3-accelerate-speedtest.s3-accelerate.amazonaws.com/en/accelerate-speed-comparsion.html • bit.ly/news3ta • Test Result (May-02-2016) • Tested on May-02-2016, LGU+ Wifi at GSTower in Seoul • http://bit.ly/newss3taresult
  • 147. Testing S3 Transfer Accelerator by AWSCLI $ sudo pip install --upgrade awscli $ aws configure set default.s3.use_accelerate_endpoint true
  • 148. Testing S3 Transfer Accelerator by AWSCLI $ aws s3 cp 33MB.pptx s3://ilho-saopaulo-01/ $ aws s3 cp 33MB.pptx s3://ilho-saopaulo-01/ --endpoint-url http://ilho-saopaulo-01.s3-accelerate.amazonaws.com
  • 150. S3 Transfer Acceleration Pricing Starting at $0.04/GB transferred (+ usual bandwidth charges). Up to $0.08/GB in some regions Pay only for what you use Accelerated performance or no charge Compare to hardware, per-GB or licenses