Demo
Tuning Linux for MongoDB
By Soumya Bhattacharyya
Linux
• UNIX-like, mostly POSIX-compliant operating system
• First released on September 17th, 1991 by Linus Torvalds. At the time:
  • 50 MHz CPUs were considered fast
  • CPUs had 1 core
  • RAM was measured in megabytes
  • Ethernet speed was 1-10 Mbps
• General purpose
  • Runs on everything from a Raspberry Pi to mainframes
  • Geared towards many different users and use cases
• Linux 3.2+ is much more efficient
MongoDB
• Document-oriented database first released in 2009
• Thread per connection model
• Non-contiguous memory access pattern
• Storage Engines
  • MMAPv1
    • Calls ‘mmap()’ to map on-disk data to RAM
    • Keeps warm data in Linux filesystem cache
    • Highly random I/O pattern
    • Scales with RAM and Disk only**
    • Cache uses all the RAM it can get
MongoDB
• Storage Engines
  • WiredTiger and RocksDB
    • Built-in compression
    • Use a combination of in-heap cache and filesystem cache
      • In-heap cache: uncompressed pages
      • Filesystem cache: compressed pages
    • Relatively sequential write patterns, low write overhead
    • Scale with RAM, Disk and CPUs
Ulimit
• Allows per-Linux-user resource constraints
  • Number of user-level processes
  • Number of open files
  • CPU seconds
  • Scheduling priority
  • Others…
• MongoDB
  • Should probably have its own VM, container or server
  • Creates a thread for each connection (on Linux, threads count against the per-user process limit)
Ulimit
• MongoDB (continued)
  • Uses a file descriptor for each active data file on disk, and for each connection
  • 64,000 open files and 64,000 max processes is a good start
  • Read the current ulimits: “ulimit -a” (run as the mongo user)
  • Set the ulimits for the mongo user in ‘/etc/security/limits.d/’ or in ‘/etc/security/limits.conf’
  • Restart mongod/mongos after the ulimit change to apply it
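A minimal sketch of the limits.d change, assuming the service user is called ‘mongod’ (the user name and the file name 99-mongodb.conf are assumptions; use whatever your package created). It prints the soft and hard lines for open files (nofile) and processes/threads (nproc) at the 64,000 starting point. Note that these files are applied by pam_limits to login sessions; a mongod started by systemd takes its limits from LimitNOFILE= and LimitNPROC= in its unit file instead.

```shell
#!/bin/sh
# Print a limits.d fragment raising nofile and nproc for a MongoDB user.
# Assumption: the service user is "mongod" unless another name is given.
mongodb_limits() {
  user="${1:-mongod}"
  limit="${2:-64000}"
  for item in nofile nproc; do
    for kind in soft hard; do
      # limits.conf format: <domain> <type> <item> <value>
      printf '%s %s %s %s\n' "$user" "$kind" "$item" "$limit"
    done
  done
}

# Usage (as root), then restart mongod/mongos:
#   mongodb_limits mongod > /etc/security/limits.d/99-mongodb.conf
```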
Virtual Memory: Dirty Ratio
• Dirty Pages
  • Pages stored in the cache that still need to be written to storage
• VM Dirty Ratio
  • Max percentage of available memory that can be dirty
  • When this limit is reached, processes writing data stall while dirty pages are flushed
  • Start with ‘10’; the default (20 in the upstream kernel, higher in some distribution profiles) is too high
• VM Dirty Background Ratio
  • Separate threshold for background dirty page flushing
  • Flushes without pauses
  • Start with ‘3’; the default (10 in the upstream kernel) is too high
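A minimal sketch that writes the two dirty-page thresholds as a sysctl.d fragment, using the starting values from this slide (10 and 3). The file name is a hypothetical choice. The guard reflects how the kernel treats the pair: if the background threshold is not below dirty_ratio, the kernel falls back to half of dirty_ratio for background flushing.

```shell
#!/bin/sh
# Print a sysctl.d fragment with the dirty-page thresholds (percent).
vm_dirty_conf() {
  ratio="${1:-10}"
  background="${2:-3}"
  # The kernel replaces a background threshold >= dirty_ratio with half of
  # dirty_ratio, so reject that combination instead of writing it.
  if [ "$background" -ge "$ratio" ]; then
    echo "dirty_background_ratio ($background) must be below dirty_ratio ($ratio)" >&2
    return 1
  fi
  printf 'vm.dirty_ratio = %s\n' "$ratio"
  printf 'vm.dirty_background_ratio = %s\n' "$background"
}

# Usage (as root):
#   vm_dirty_conf > /etc/sysctl.d/90-mongodb-dirty.conf
#   sysctl -p /etc/sysctl.d/90-mongodb-dirty.conf
```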
Virtual Memory: Swappiness
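• A Linux kernel sysctl setting that controls how readily the kernel swaps process memory to disk instead of dropping filesystem cache
• Linux default: 60
• To avoid disk-based swap: 1 (not zero!)
• To allow some disk-based swap: 10
• ‘0’ can cause unpredictable behaviour (on newer kernels, OOM kills instead of swapping)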
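A small check against the swappiness advice on this slide, sketched as a shell function. The threshold of 10 comes from the slide; the path argument exists only so the function can be pointed at a test file. The exact effect of 0 varies between kernel versions, so treat the OOM wording as a caution rather than a guarantee.

```shell
#!/bin/sh
# Compare vm.swappiness with the advice above: 1 (or up to 10), never 0.
check_swappiness() {
  file="${1:-/proc/sys/vm/swappiness}"
  value=$(cat "$file") || return 2
  if [ "$value" -eq 0 ]; then
    echo "swappiness=0: avoid; newer kernels may OOM-kill instead of swapping"
    return 1
  elif [ "$value" -le 10 ]; then
    echo "swappiness=$value: ok"
  else
    echo "swappiness=$value: too high for a MongoDB host"
    return 1
  fi
}

# To set it (as root):
#   sysctl -w vm.swappiness=1
#   echo 'vm.swappiness = 1' > /etc/sysctl.d/90-mongodb-swap.conf
```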
Virtual Memory: Transparent HugePages
• Introduced in RHEL/CentOS 6, Linux 2.6.38+
• Merges 4 KB pages into 2 MB HugePages (512x) in the background (khugepaged process)
• Decreases overall performance when used with MongoDB!
• Disable it
  • Add “transparent_hugepage=never” to the kernel command line (GRUB)
  • Reboot
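To confirm the change after rebooting, read the active mode from sysfs: the kernel marks the current choice in square brackets (for example ‘always madvise [never]’). A minimal sketch; the grubby command in the comments is the RHEL/CentOS 7 way to edit the kernel command line, while other distributions edit GRUB_CMDLINE_LINUX in /etc/default/grub instead.

```shell
#!/bin/sh
# Print the active THP mode: the bracketed word, e.g. "always madvise [never]" -> "never".
thp_mode() {
  file="${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
  sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$file"
}

# Usage:
#   thp_mode                                          # expect "never" after the reboot
#   thp_mode /sys/kernel/mm/transparent_hugepage/defrag
# Adding the kernel argument on RHEL/CentOS 7 (as root):
#   grubby --update-kernel=ALL --args="transparent_hugepage=never"
# RHEL/CentOS 6 kernels use /sys/kernel/mm/redhat_transparent_hugepage/ instead.
```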
NUMA (Non-Uniform Memory Access)
• A memory architecture that takes into account the locality of memory, caches and CPUs for lower latency
• The MongoDB code base is not NUMA-“aware”, causing unbalanced allocations
• Disable NUMA, or interleave memory allocations across all nodes
  • Disable it in the server BIOS
  • Or run ‘numactl’ in the mongod init script, BEFORE the ‘mongod’ command:
numactl --interleave=all /usr/bin/mongod <other flags>
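On systemd-based systems the "init script" is a unit file, so one way to add numactl is a drop-in override. A sketch assuming the binaries live in /usr/bin, the config in /etc/mongod.conf and the unit is named mongod.service; the drop-in file name numa.conf is hypothetical. The empty ExecStart= line is required to clear the command from the packaged unit before setting a new one.

```shell
#!/bin/sh
# Print a systemd drop-in that starts mongod under numactl with memory
# interleaved across all NUMA nodes. Default paths are assumptions.
numa_dropin() {
  numactl="${1:-/usr/bin/numactl}"
  mongod="${2:-/usr/bin/mongod}"
  conf="${3:-/etc/mongod.conf}"
  printf '[Service]\n'
  # An empty ExecStart= clears the command inherited from the packaged unit.
  printf 'ExecStart=\n'
  printf 'ExecStart=%s --interleave=all %s --config %s\n' "$numactl" "$mongod" "$conf"
}

# Usage (as root):
#   mkdir -p /etc/systemd/system/mongod.service.d
#   numa_dropin > /etc/systemd/system/mongod.service.d/numa.conf
#   systemctl daemon-reload && systemctl restart mongod
```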
Block Devices: Type and Layout
• Isolation
  • Run the mongod dbPath on a separate volume
  • Optionally, run the mongod journal on a separate volume
• RAID Level
  • RAID 10 == performance/durability sweet spot
  • RAID 0 == fast and dangerous
• SSDs
  • Benefit MMAPv1 a lot
  • Benefit WiredTiger and RocksDB a bit less
  • Keep about 30% free for the SSD's internal garbage collection (GC)
• EBS
  • Network-attached storage can be risky
• JBOD + replica set as data redundancy (use at own risk)
  • Number of replica set members
  • Read and write concern
  • Proper geolocation/node redundancy
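A sketch of a headroom check for the "keep about 30% free" rule, assuming the data volume is mounted at /var/lib/mongo (a hypothetical default). It only sees filesystem-level free space; that space helps the SSD's garbage collection only if freed blocks are reported to the drive via TRIM (for example with fstrim), and leaving part of the drive unpartitioned is the usual alternative.

```shell
#!/bin/sh
# Print the percentage of free space on the filesystem holding a path,
# using POSIX df output (-P): column 5 is the capacity used, e.g. "42%".
free_pct() {
  df -P "$1" | awk 'NR == 2 { used = $5; sub(/%/, "", used); print 100 - used }'
}

# Warn if less than MIN (default 30) percent of the volume is free.
check_ssd_headroom() {
  path="${1:-/var/lib/mongo}"
  min="${2:-30}"
  pct=$(free_pct "$path")
  [ -n "$pct" ] || return 2
  if [ "$pct" -lt "$min" ]; then
    echo "$path: only ${pct}% free (target: ${min}%)"
    return 1
  fi
  echo "$path: ${pct}% free"
}
```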
Block Devices: IO Scheduler
• The algorithm the kernel uses to order and commit reads and writes to disk
• CFQ
  • Linux default
  • Perhaps too clever/inefficient for database workloads
• Deadline
  • Best general default IMHO
  • Predictable I/O request latencies
• Noop
  • Use with virtualisation or (sometimes) with BBU (battery-backed) RAID controllers
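A quick audit of which scheduler each block device is using, sketched in shell. The active scheduler is the bracketed entry in /sys/block/<dev>/queue/scheduler. On newer multi-queue (blk-mq) kernels the choices are named differently, for example mq-deadline and none instead of deadline and noop.

```shell
#!/bin/sh
# Print "<device> <active scheduler>" for each block device.
list_schedulers() {
  root="${1:-/sys/block}"
  for f in "$root"/*/queue/scheduler; do
    [ -r "$f" ] || continue
    dev=${f#"$root"/}
    dev=${dev%%/*}
    # The active scheduler is the bracketed entry, e.g. "noop [deadline] cfq".
    active=$(sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$f")
    echo "$dev ${active:-unknown}"
  done
}

# To switch one device at runtime (as root; not persistent across reboots):
#   echo deadline > /sys/block/sda/queue/scheduler
```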
Block Devices: Block Read-ahead
• A tuning that causes data ahead of the requested block on disk to be read and then cached
• Assumption: the read pattern is sequential and something will benefit from the extra cached blocks
• Risk: a value that is too high wastes cache space and increases eviction work
• MongoDB tends to have very random disk access patterns
• A good start for MongoDB volumes is a read-ahead of ‘32’ 512-byte sectors (16 KB)
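The ‘32’ on this slide is in 512-byte sectors, the unit used by blockdev, while sysfs and udev use kilobytes. A minimal conversion sketch (the function names are hypothetical):

```shell
#!/bin/sh
# Read-ahead unit conversions: blockdev counts 512-byte sectors,
# /sys/block/<dev>/queue/read_ahead_kb counts kilobytes (1024 bytes).
ra_sectors_to_kb() { echo $(( $1 * 512 / 1024 )); }
ra_kb_to_sectors() { echo $(( $1 * 1024 / 512 )); }

# Usage (as root; --setra is not persistent across reboots):
#   blockdev --getra /dev/sda       # current read-ahead in sectors
#   blockdev --setra 32 /dev/sda    # 32 sectors
#   ra_sectors_to_kb 32             # prints 16
```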
Block Devices: Udev rule
/etc/udev/rules.d/60-mongodb-disk.rules:
# set deadline scheduler and 32-sector (16 KB) read-ahead for /dev/sda
ACTION=="add|change", KERNEL=="sda", ATTR{queue/scheduler}="deadline", ATTR{bdi/read_ahead_kb}="16"
• Add the file to ‘/etc/udev/rules.d’
• Reboot (or use CLI tools to apply it)
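To cover more than one disk, the udev rule can be generated per device. A sketch; the function name and its defaults (deadline, 16 KB) are assumptions mirroring the slide. The udevadm commands in the comments are one way to apply the rules without a reboot.

```shell
#!/bin/sh
# Print a udev rule setting the I/O scheduler and read-ahead for one disk.
# Arguments: kernel device name, scheduler (default deadline), read-ahead in KB (default 16).
mongodb_disk_rule() {
  dev="$1"
  sched="${2:-deadline}"
  ra_kb="${3:-16}"
  printf 'ACTION=="add|change", KERNEL=="%s", ATTR{queue/scheduler}="%s", ATTR{bdi/read_ahead_kb}="%s"\n' \
    "$dev" "$sched" "$ra_kb"
}

# Usage (as root), then apply without a reboot:
#   for d in sda sdb; do mongodb_disk_rule "$d"; done > /etc/udev/rules.d/60-mongodb-disk.rules
#   udevadm control --reload-rules
#   udevadm trigger --action=change --subsystem-match=block
```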
Filesystems and Options
• Use XFS or EXT4, not EXT3
• With WiredTiger, use XFS only (MongoDB strongly recommends XFS over EXT4 for WiredTiger)
• Set ‘noatime’ on MongoDB data volumes in ‘/etc/fstab’
• Remount the filesystem after an options change, or reboot
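A sketch that adds noatime to the options field of one mount point in an fstab file and prints the result, so it can be reviewed before replacing /etc/fstab. The mount point in the usage comment is hypothetical. awk rewrites the edited line with single spaces between fields; other lines are printed unchanged.

```shell
#!/bin/sh
# Print an fstab file with "noatime" added to the options of one mount point.
add_noatime() {
  mnt="$1"
  fstab="${2:-/etc/fstab}"
  awk -v mnt="$mnt" '
    # Only touch non-comment entries for this mount point that lack noatime.
    $1 !~ /^#/ && $2 == mnt && $4 !~ /(^|,)noatime(,|$)/ { $4 = $4 ",noatime" }
    { print }
  ' "$fstab"
}

# Usage (as root): review the output, install it, then remount:
#   add_noatime /var/lib/mongo > /tmp/fstab.new && diff /etc/fstab /tmp/fstab.new
#   cp /tmp/fstab.new /etc/fstab
#   mount -o remount,noatime /var/lib/mongo
```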
Network Stack
• The defaults are not well suited to Ethernet faster than 100 Mbps
• Add network stack tunables to ‘/etc/sysctl.conf’ as a starting point
• Run “sysctl -p” as root to reload the network stack settings
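The slide's own suggested values were not preserved, so the fragment below is only an assumed starting point built from commonly used settings, not the author's list. MongoDB's production notes recommend a TCP keepalive time of 120 seconds; the other values are illustrative and should be checked against your kernel's defaults.

```shell
#!/bin/sh
# Print an assumed starting set of network sysctls (NOT the original slide's values).
mongodb_net_sysctl() {
  cat <<'EOF'
# Larger accept and SYN backlogs for many concurrent client connections
net.core.somaxconn = 4096
net.ipv4.tcp_max_syn_backlog = 4096
# Shorter FIN-WAIT-2 timeout for orphaned connections
net.ipv4.tcp_fin_timeout = 30
# Start keepalive probes after 120 s idle, then probe every 30 s
net.ipv4.tcp_keepalive_time = 120
net.ipv4.tcp_keepalive_intvl = 30
EOF
}

# Usage (as root):
#   mongodb_net_sysctl >> /etc/sysctl.conf
#   sysctl -p
```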
NTPd (Network Time Protocol)
• Replication and clustering need consistent clocks
• Run an NTP daemon on all MongoDB and monitoring hosts
  • Enable it to start at boot
• Use a consistent time source/server
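A sketch for checking that a host is actually synchronised, assuming ntpd and its ntpq tool: the peer currently selected for synchronisation is marked with ‘*’ in ‘ntpq -p’ output, and its offset (in milliseconds) is the ninth column. Newer Red Hat releases ship chronyd instead, where ‘chronyc tracking’ gives the equivalent information.

```shell
#!/bin/sh
# Read `ntpq -p` output on stdin and print the offset (ms) of the peer
# marked "*" (the one ntpd is synchronised to). Exits 1 if there is none.
ntp_selected_offset() {
  awk '$1 ~ /^\*/ { print $9; found = 1 } END { exit !found }'
}

# Usage:
#   systemctl enable --now ntpd     # start now and at every boot
#   ntpq -p | ntp_selected_offset
```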
SELinux (Security-Enhanced Linux)
• A kernel-level security access control module
• Modes of SELinux
  • Enforcing: block and log policy violations
  • Permissive: log policy violations only
  • Disabled: completely disabled
• Recommended: Enforcing
• Percona Server for MongoDB 3.2+ RPMs install an SELinux policy on Red Hat/CentOS!
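A sketch for checking which SELinux mode is configured to apply at boot, read from /etc/selinux/config; the runtime mode can differ and is shown by getenforce.

```shell
#!/bin/sh
# Print the SELinux mode configured for the next boot (enforcing, permissive or disabled).
selinux_configured_mode() {
  # Matches SELINUX=..., not SELINUXTYPE=...
  sed -n 's/^SELINUX=\([a-z]*\).*/\1/p' "${1:-/etc/selinux/config}"
}

# Runtime checks and changes (as root):
#   getenforce            # current mode
#   sestatus              # detailed status, including the loaded policy
#   setenforce 1          # switch to Enforcing until the next reboot
```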
Tuned
• A “framework” for applying tunings to Linux
  • Red Hat/CentOS 7
  • Debian has added it; its official status there is unclear
• Watch my/Percona-Lab GitHub for profiles in the future!
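A sketch of a custom tuned profile that combines several settings from the earlier slides (swappiness, dirty ratios, THP, CPU governor) on top of the stock throughput-performance profile. The profile name ‘mongodb’ is hypothetical, and the plugin options shown should be checked against the tuned version you run.

```shell
#!/bin/sh
# Print a tuned.conf for a hypothetical "mongodb" profile.
mongodb_tuned_profile() {
  cat <<'EOF'
[main]
summary=Hypothetical MongoDB profile
include=throughput-performance

[cpu]
governor=performance
energy_perf_bias=performance

[vm]
transparent_hugepages=never

[sysctl]
vm.swappiness=1
vm.dirty_ratio=10
vm.dirty_background_ratio=3
EOF
}

# Usage (as root):
#   mkdir -p /etc/tuned/mongodb
#   mongodb_tuned_profile > /etc/tuned/mongodb/tuned.conf
#   tuned-adm profile mongodb
#   tuned-adm active
```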
CPUs and Frequency Scaling
• Lots of cores > faster cores
• ‘cpufreq’: the kernel's dynamic scaling of the CPU frequency, driven by a governor
  • Terrible idea for databases
  • Disable it, or set the governor to 100% frequency at all times, i.e. ‘performance’ mode
• Disable any BIOS-level performance/efficiency tunables
• ENERGY_PERF_BIAS
  • An Intel CPU energy vs. performance hint, set by the OS on CentOS/Red Hat
  • RHEL 6 = ‘performance’
  • RHEL 7 = ‘normal’ (!)
  • Advice: use ‘tuned’ to set it to ‘performance’
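A sketch for spotting CPUs that are not on the ‘performance’ governor, reading the cpufreq sysfs files; the directory argument exists so the function can be tested. cpupower, shown in the comments, is one common tool for switching the governor.

```shell
#!/bin/sh
# List CPUs whose cpufreq governor is not "performance", as "cpuN governor".
non_performance_cpus() {
  root="${1:-/sys/devices/system/cpu}"
  for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
    # Skip the literal pattern when nothing matches (no cpufreq support).
    [ -r "$f" ] || continue
    gov=$(cat "$f")
    if [ "$gov" != "performance" ]; then
      cpu=${f#"$root"/}
      echo "${cpu%%/*} $gov"
    fi
  done
}

# To switch every CPU to the performance governor (as root, cpupower installed):
#   cpupower frequency-set -g performance
```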
