Performance Tuning a Cloud Application: A Real World Case Study 
Shane Gibson 
Sr. Principal Infrastructure Architect 
Cloud Platform Engineering
Agenda 
• About Symantec and Me 
• Key Value as a Service 
• The Pesky Problem 
• Resolving “The Pesky Problem” 
• Performance Tuning Recommendations 
• Summary 
• Q&A
About Symantec and Me
The Symantec Team 
• Cloud Platform Engineering 
– We are building a consolidated cloud platform that provides infrastructure and 
platform services for next generation Symantec products and services 
– Starting small, but scaling to tens of thousands of nodes across multiple DCs 
– Cool technologies in use: OpenStack, Hadoop, Storm, Cassandra, MagnetoDB 
– Strong commitment to giving back to Open Source communities 
• Shane Gibson 
– Served 4 years in the USMC as a computer geek (mainframes and Unix) 
– Unix/Linux SysAdmin, System Architect, Network Architect, Security Architect 
– Now Cloud Infrastructure Architect for CPE group at Symantec
Key Value as a Service 
(the “cloud” application)
Key Value as a Service: General Architecture 
• MagnetoDB is a key value store with OpenStack REST and AWS 
DynamoDB API compatibility 
• Uses a “pluggable” backend storage capability 
• Composite service made up of: 
– MagnetoDB front-end API and Streaming service 
– Cassandra as the back-end, key-value storage 
– OpenStack Keystone 
– AMQP Messaging Bus (e.g., RabbitMQ, Qpid, ZeroMQ) 
– Load Balancing capabilities (Hardware or LBaaS)
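
Since MagnetoDB speaks the AWS DynamoDB API, stock DynamoDB tooling can be pointed straight at it. A minimal sketch using the AWS CLI; the endpoint hostname and port below are illustrative assumptions, not values from this deployment:

# Point the standard AWS CLI at a MagnetoDB endpoint.
# Hostname and port are placeholders for your deployment.
aws dynamodb list-tables \
  --endpoint-url http://magnetodb.example.com:8480 \
  --region us-east-1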
Key Value as a Service: MagnetoDB 
– API Services Layer 
• Data API 
• Streaming API 
• Monitoring API 
• AWS DynamoDB API 
– Keystone and Notifications integrations 
– MagnetoDB Database Driver 
• Cassandra
Key Value as a Service: Cassandra 
– Database storage engine 
– Massively linearly scalable 
– Highly available w/ no SPoF 
– Other features: 
• tunable consistency 
• key-value data model 
• ring topology 
• predictable high performance and fault tolerance 
• Rack and Datacenter awareness
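
As a hedged illustration of the rack/datacenter awareness and tunable consistency listed above (the keyspace name, datacenter names, and replication counts are made-up examples):

# Illustrative cqlsh sketch: NetworkTopologyStrategy places replicas
# per datacenter; CONSISTENCY tunes the session's read/write level.
cqlsh cassandra-node1 -e "
  CREATE KEYSPACE IF NOT EXISTS kvaas
    WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3};
  CONSISTENCY LOCAL_QUORUM;
  SELECT cluster_name FROM system.local;"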
Key Value as a Service: Other Stuff 
– Need a load balancing layer of some sort 
• LBaaS or hardware 
– Keystone service 
– AMQP service 
• RabbitMQ
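
One concrete (and entirely hypothetical) shape for that load balancing layer, as an HAProxy fragment; hostnames, IPs, ports, and the health-check path are all placeholders:

# Hypothetical haproxy.cfg fragment balancing the MagnetoDB API tier.
cat >> /etc/haproxy/haproxy.cfg <<'EOF'
frontend magnetodb_api
    bind *:8480
    default_backend magnetodb_nodes

backend magnetodb_nodes
    balance roundrobin
    option httpchk GET /healthcheck
    server mdb1 10.0.0.11:8480 check
    server mdb2 10.0.0.12:8480 check
    server mdb3 10.0.0.13:8480 check
EOF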
Key Value as a Service: Putting it all Together
The Pesky Problem
The Pesky Problem: Deployed on Bare Metal 
• Initial deployment of the KVaaS service on bare metal nodes 
• Ran the MagnetoDB API service on the same nodes as Cassandra 
– MagnetoDB's CPU-heavy profile vs. Cassandra's disk-I/O-heavy profile 
• Cassandra directly managing the disks via JBOD (good!) 
• MagnetoDB likes lots of CPU, with direct access to 32 (HT) CPUs 
– Please don't start me on a HyperThread CPU count rant :) 
• KVaaS team performance expectations were set by this experience!
The Pesky Problem: Moved to OpenStack Nova 
• KVaaS service migrated to a “stock” OpenStack Nova cluster 
• Nova Compute nodes set up with RAID 10 ephemeral disks 
• OpenContrail used for SDN configuration 
• Performance for each VM guest was roughly 66% of bare metal (165/250 RPS) 
• KVaaS team was unhappy :( 
The Pesky Problem: Moved to OpenStack Nova, cont. 
Performance comparison of “list_tables”: 
• bare metal: 250 RPS / HT core* :) 
• virtualized: 165 RPS / HT core* :( 
* results averaged per core since the test beds were different
The Pesky Problem: The Goal 
• Deploy our KVaaS service … as a flexible and scalable solution 
• Ability to use OpenStack APIs to manage the service 
• Cloud Provider-run KVaaS service or tenant-managed service 
• Initial deployment planned for the OpenStack Nova platform 
– Not a containerization service … 
– Though … considering it … 
• Easier auto-scaling, better service packing, flexibility, etc. 
• Explore mixed MagnetoDB/Cassandra vs. separated services
Resolving “The Pesky Problem”
Resolving the “Pesky Problem”: Approach 
• Baseline the test environment 
– Bare metal deployment and test 
– Mimics the original deployment characteristics 
• Deploy OpenStack Nova – Install KVaaS services 
• Performance tune each component 
– Linux OS and Hardware configuration 
– KVM Hypervisor/Nova Compute performance tuning 
– MagnetoDB/Cassandra performance tuning
Resolving the “Pesky Problem”: Testing Tools 
• Linux OS and Hardware 
– perf, openssl speed, iostat, iozone, iperf, dd (yes, really!), dtrace 
• KVM Hypervisor/Nova Compute 
– kvm_stat, kvmtrace, perf stat -e 'kvm:*', SPECvirt 
• MagnetoDB/Cassandra 
– magnetodb-test-bench, jstat, cstar_perf, cassandra-stress 
• General Test Suite 
– Phoronix Test Suite
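
A few hedged invocation examples for the tools above; targets, sizes, and thread counts are placeholders to adapt to your own test bed:

# Raw disk throughput sanity check (yes, dd really is useful here).
dd if=/dev/zero of=/data/ddtest bs=1M count=4096 oflag=direct

# Block-device utilization and latency, sampled every 5 seconds.
iostat -x 5

# Network throughput between two nodes (run `iperf -s` on the server first).
iperf -c 10.0.0.20 -P 4 -t 60

# Count KVM events (exits, injected interrupts, etc.) system-wide for 30s.
perf stat -e 'kvm:*' -a sleep 30

# Cassandra write stress: 1M inserts at a fixed thread count.
cassandra-stress write n=1000000 -rate threads=200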
Resolving the “Pesky Problem”: Test Architecture 
Resolving the “Pesky Problem”: Test Bench 
Performance Tuning Recommendations
Performance Tuning Results: Linux OS and Hardware 
Recommendations: 
Host: 
• vhost_net, transparent_hugepages, high_res_timer, hpet, compaction, ksm, cgroups 
• task scheduling tweaks (CFS) 
• filesystem mount options (noatime, nodiratime, relatime) 
• tune wmem and rmem buffers !!! 
• elevator I/O scheduler = deadline 
Guest: 
• vhost_net or virtio_net, virtio_blk, virtio_balloon, virtio_pci 
• paravirtualization! 
• disable in-guest performance gathering – get that info from the host hypervisor tools 
• elevator scheduler set to “noop” 
• give guests as much memory as you can (FS cache!) 
Observed gains ranged from 10% less latency to 30%, 2x, 8x, and even 7-10x throughput improvements, depending on the change.
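
A hedged sketch of how several of these settings are applied; device names, mount points, and buffer sizes are illustrative starting points, not tuned values:

# Host: enable transparent hugepages and pick the deadline I/O scheduler
# for the data disk (device name is a placeholder).
echo always > /sys/kernel/mm/transparent_hugepage/enabled
echo deadline > /sys/block/sdb/queue/scheduler

# Host: raise socket buffer ceilings (values are illustrative).
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"

# Host: mount data filesystems without atime updates.
mount -o remount,noatime,nodiratime /data

# Guest: switch to the no-op elevator since the host already schedules I/O.
echo noop > /sys/block/vda/queue/scheduler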
Performance Tuning Results: KVM/Nova Compute 
Recommendations: 
Host: 
• tweak Transparent Huge Pages 
• bubble up raw devices if possible (warning: hurts migration/portability) 
• multi-queue virtio-net 
• SR-IOV if you can dedicate a NIC (warning: see the bubble-up warning!) 
Guest: 
• qcow2 or raw for guest file backing 
• disk partition alignment is still very important 
• preallocate metadata (qcow2) 
• fallocate the entire guest image if you can (qcow2; you lose the ability to oversubscribe) 
• set VM swappiness to zero 
• async I/O set to “native” 
Observed gains: 2-15% and ~10% from individual changes, an 8% gain in TPM, and 40+% with host and guest tuning combined.
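
A sketch of a few of these knobs, assuming a libvirt/KVM host; paths, sizes, and queue counts are placeholders:

# Create a qcow2 image with preallocated metadata; preallocation=falloc
# instead fallocates the full image (faster first writes, no oversubscription).
qemu-img create -f qcow2 -o preallocation=metadata /var/lib/nova/guest.qcow2 40G

# Inside the guest: discourage swapping entirely.
sysctl -w vm.swappiness=0

# Libvirt domain XML fragments (illustrative): native async I/O on the disk,
# and multi-queue vhost-net on the interface.
#   <disk type='file' device='disk'>
#     <driver name='qemu' type='qcow2' cache='none' io='native'/>
#   </disk>
#   <interface type='bridge'>
#     <driver name='vhost' queues='4'/>
#   </interface>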
Performance Tuning Results: MagnetoDB/Cassandra 
Recommendations: 
• disk: vm.dirty_ratio & vm.dirty_background_ratio – increasing the cache may help write workloads with ordered writes, or writes in bursty chunks 
• “CommitLogDirectory” and “DataFileDirectories” on separate devices for a write performance improvement 
• GC tuning of the Java heap/new generation – significant latency decreases 
• tune Bloom filters, data caches, and compaction 
• use compression for similar “column families” 
Observed gains: roughly 10x more pages cached, 25-35% better read performance, and 5-10% write gains.
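
A hedged sketch of these tunings; the ratios, paths, heap sizes, and table name are assumptions, and the lowercase yaml keys are the cassandra.yaml equivalents of the directory settings named above:

# Kernel dirty-page cache: let more writeback accumulate for bursty writers.
# (Percentages are illustrative starting points, not recommendations.)
sysctl -w vm.dirty_background_ratio=10
sysctl -w vm.dirty_ratio=40

# cassandra.yaml: commit log and data directories on separate devices
# (paths are placeholders).
#   commitlog_directory: /commitlog/cassandra
#   data_file_directories:
#       - /data/cassandra

# cassandra-env.sh: pin heap and new-gen sizes instead of auto-sizing
# (values depend entirely on node RAM and workload).
#   MAX_HEAP_SIZE="8G"
#   HEAP_NEWSIZE="800M"

# Enable compression on a table (hypothetical keyspace/table; the option
# name varies by Cassandra version).
cqlsh -e "ALTER TABLE kvaas.items
  WITH compression = {'sstable_compression': 'LZ4Compressor'};"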
Summary
Summary: Notes 
• “clouds” are best composed of small services that can be 
independently combined, tuned, and scaled 
• human expectations in the transition from bare metal to cloud 
need to be reset 
• an iterative step-by-step approach is best 
– Test … Tune … Test … Tune … ! 
• lots of complex pieces in a cloud application
Summary: Notes (continued) 
• Compose your services as individual building blocks 
• Tune each component/service independently 
• Then tune the whole system 
• Automation is critical to iterative test/tune strategies!! 
• Performance tuning is absolutely worth the investment 
• Knowing your workloads is still (maybe even more?) critical
Questions and 
(hopefully?) Answers 
Let’s talk…
Thank you! 
Copyright © 2014 Symantec Corporation. All rights reserved. Symantec and the Symantec Logo are trademarks or registered trademarks of Symantec Corporation or its 
affiliates in the U.S. and other countries. Other names may be trademarks of their respective owners. 
This document is provided for informational purposes only and is not intended as advertising. All warranties relating to the information in this document, either express or 
implied, are disclaimed to the maximum extent allowed by law. The information in this document is subject to change without notice. 
Shane Gibson 
shane_gibson@symantec.com
