Linux IO internals
for database administrators
Ilya Kosmodemiansky
ik@postgresql-consulting.com
Why this talk
• Linux is the most common OS for databases
• DBAs often run into IO problems
• Most of the information on the topic is written by kernel
developers (for kernel developers) or is checklist-style
• Checklists are useful, but only up to a certain workload
The main IO problem for databases
• How to maximize page throughput between memory and
disks
• Things involved:
Disks
Memory
CPU
IO Schedulers
Filesystems
Database itself
• IO problems for databases are not always about disks alone
A typical database
(Diagram: in DRAM, the database's shared memory and WAL buffer
sit in user space and the Linux page cache in kernel space;
on disks, the WAL and the datafiles.)
Key things about such a workload
• Shared memory segment can be very large
• Keeping in-memory pages synchronized with disk generates
huge IO
• WAL must be written quickly and safely
• Each and every layer of the OS IO stack is involved
Memory allocation and mapping
(Diagram: the CPU with its L1, L2 and L3 caches uses virtual
addressing; the MMU, with the TLB and the page table, translates
to the physical addressing of memory.)
DBA takeaways:
• A database with a huge shared memory segment benefits from
huge pages
• But not from transparent huge pages (THP)
Databases operate on large contiguous shared memory segments
THP defragmentation can lead to severe performance
degradation in such cases
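A minimal sketch for checking the two settings this takeaway is about. It assumes the usual sysfs location of the THP switch (/sys/kernel/mm/transparent_hugepage/enabled); the helper names and the 8 GB segment with 2048 kB huge pages are illustrative assumptions, not values from the talk.

```shell
# Print the active value of a kernel "choice" string, where the active
# choice is shown in square brackets, e.g. "always madvise [never]".
active_choice() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Number of huge pages needed to back a segment of $1 kB with huge
# pages of $2 kB each (rounded up).
hugepages_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Read-only check of the current THP mode, if the file exists.
thp_file=/sys/kernel/mm/transparent_hugepage/enabled
if [ -r "$thp_file" ]; then
    echo "THP mode: $(active_choice "$(cat "$thp_file")")"
fi

# Hypothetical sizing: an 8 GB shared memory segment, 2048 kB huge pages.
echo "huge pages needed: $(hugepages_needed 8388608 2048)"
```

The result of hugepages_needed is what one would roughly put into vm.nr_hugepages, plus some headroom for whatever else the database allocates in the segment.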
Freeing memory
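(Diagram: alloc() and free() work against the free lists; when
free memory runs low (vm.min_free_kbytes), pages are reclaimed
by page-out from the page cache or by swapping (vm.swappiness);
the last resort is the OOM-killer (vm.panic_on_oom).)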
Page-out
• Page-out happens if:
someone calls fsync
the 30 sec timeout is exceeded (vm.dirty_expire_centisecs)
there are too many dirty pages (vm.dirty_background_ratio and
vm.dirty_ratio)
• It is reasonable to tune vm.dirty_* on rotating disks with a
RAID controller, but it helps only a little on server-class SSDs
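To get a feeling for what the ratio-based thresholds mean on a given box, they can be turned into megabytes. This is a rough sketch: it uses MemTotal as the base, while the kernel applies the ratios to available memory, so the real thresholds are somewhat lower. The helper names are illustrative.

```shell
# Approximate size in MB of a vm.dirty_*ratio threshold.
# $1 = memory in kB, $2 = ratio in percent.
dirty_threshold_mb() {
    echo $(( $1 * $2 / 100 / 1024 ))
}

# Print a vm.* setting from /proc/sys/vm, or nothing if it is unreadable.
vm_setting() {
    if [ -r "/proc/sys/vm/$1" ]; then cat "/proc/sys/vm/$1"; fi
}

bg=$(vm_setting dirty_background_ratio)
fg=$(vm_setting dirty_ratio)
expire=$(vm_setting dirty_expire_centisecs)

# Read-only report; skipped entirely if any value is missing.
if [ -r /proc/meminfo ] && [ -n "$bg" ] && [ -n "$fg" ] && [ -n "$expire" ]; then
    mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    echo "background flush starts near $(dirty_threshold_mb "$mem_kb" "$bg") MB dirty"
    echo "writers are throttled near $(dirty_threshold_mb "$mem_kb" "$fg") MB dirty"
    echo "dirty pages expire after $(( expire / 100 )) s"
fi
```

On a server with a lot of RAM even a ratio of 10 can mean several gigabytes of dirty pages flushed at once, which is why the tuning matters most when the storage behind it is slow.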
DBA takeaways:
• vm.overcommit_memory
0 - heuristic overcommit, reduces swap usage
1 - always overcommit
2 - do not overcommit (vm.overcommit_ratio = 50 by
default)
• vm.min_free_kbytes - reasonably high (can easily be 1000000
on a server with enough memory)
• vm.swappiness = 1
0 - swapping avoided almost completely
60 - default
100 - swap preferred over other reclaim mechanisms
• Your database would not like the OOM-killer
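A small sketch of what vm.overcommit_memory = 2 implies: the commit limit is swap plus the overcommit_ratio share of RAM. The helper name is an assumption, and the formula is simplified in that it ignores memory reserved for huge pages, which the kernel subtracts from RAM first.

```shell
# Commit limit in kB under vm.overcommit_memory = 2:
#   swap + RAM * overcommit_ratio / 100
# $1 = RAM in kB, $2 = swap in kB, $3 = vm.overcommit_ratio in percent.
commit_limit_kb() {
    echo $(( $2 + $1 * $3 / 100 ))
}

# Read-only report of the settings from this slide.
for name in overcommit_memory overcommit_ratio min_free_kbytes swappiness; do
    if [ -r "/proc/sys/vm/$name" ]; then
        echo "vm.$name = $(cat "/proc/sys/vm/$name")"
    fi
done

# Hypothetical server: 64 GB RAM, 8 GB swap, default ratio of 50.
echo "commit limit: $(commit_limit_kb 67108864 8388608 50) kB"
```

With the default ratio of 50, such a server refuses allocations at about 40 GB of committed memory, so mode 2 usually goes together with a higher vm.overcommit_ratio.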
OOM-killer: plague and cholera
• vm.panic_on_oom effectively disables the OOM-killer (the kernel
panics instead), but that is probably not the result you desire
• Or for a certain process: echo -17 > /proc/12465/oom_adj
but again, that is probably not the result you desire
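On current kernels oom_adj is a legacy interface; the same knob is exposed as /proc/<pid>/oom_score_adj with a range of -1000 to 1000, where -1000 exempts the process. The sketch below converts between the two scales using linear scaling, which is how the kernel is understood to map the legacy value; treat the helper and its exact rounding as an assumption.

```shell
# Convert a legacy oom_adj value (-17..15) to the oom_score_adj scale
# (-1000..1000) by linear scaling; -17 and 15 map to the two extremes.
oom_adj_to_score_adj() {
    if [ "$1" -eq 15 ]; then
        echo 1000
    else
        echo $(( $1 * 1000 / 17 ))
    fi
}

# Read-only: show the current adjustment of this shell.
if [ -r "/proc/$$/oom_score_adj" ]; then
    echo "oom_score_adj of this shell: $(cat "/proc/$$/oom_score_adj")"
fi

# The command from the slide, expressed on the newer scale
# (needs root; 12465 is the example PID from the slide):
#   echo "$(oom_adj_to_score_adj -17)" > /proc/12465/oom_score_adj
```

Exempting the database only moves the problem: the OOM-killer will pick another victim, or the box will stall if there is none.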
Filesystems: write barriers
(Diagram: a journaled filesystem; data and journal entries go
from the kernel buffer to the data area and to the journal.)
DBA takeaways:
• Use ext4 or xfs
• Disable write barriers (only if the SSD/controller cache is
protected by a capacitor/battery)
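Before changing anything it helps to see which filesystems already run without barriers. A minimal sketch, assuming the ext4 spellings nobarrier and barrier=0 (xfs accepted nobarrier on older kernels); the helper name is illustrative.

```shell
# Succeeds if a comma-separated mount option string disables write
# barriers ("nobarrier" or "barrier=0").
barriers_disabled() {
    case ",$1," in
        *,nobarrier,*|*,barrier=0,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Read-only: list ext4/xfs mounts and flag those with barriers disabled.
if [ -r /proc/mounts ]; then
    while read -r dev mnt fstype opts rest; do
        case "$fstype" in
            ext4|xfs)
                if barriers_disabled "$opts"; then
                    echo "$mnt ($fstype): barriers DISABLED"
                else
                    echo "$mnt ($fstype): barriers not disabled"
                fi ;;
        esac
    done < /proc/mounts
fi
```

A mount reported as DISABLED on hardware without a protected write cache is a data-loss risk on power failure, not a tuning win.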
IO stack
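(Diagram: the IO stack from the database's shared_buffers down
to the disks, through the page cache, VFS, the filesystem (EXT4),
the buffer cache, the block device interface and the
elevator/IO scheduler.)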
Elevators
• noop or none - when disk throughput is so high that it
cannot benefit from clever scheduling
PCIe SSDs
SAN disk arrays
• deadline - rotating disks
• CFQ - universal, the default one
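The active elevator per device can be read from sysfs, where it is shown in square brackets. The sketch below reports it next to a suggestion that follows the list above; the suggestion is a rough heuristic based only on the rotational flag (SAN volumes often report themselves as rotating), and the helper names are assumptions.

```shell
# Print the active scheduler from a line such as "noop deadline [cfq]".
# If nothing is bracketed (a single choice), print the line as is.
active_scheduler() {
    case "$1" in
        *\[*\]*) printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p' ;;
        *) printf '%s\n' "$1" ;;
    esac
}

# Suggest an elevator following the slide. $1 is the content of
# /sys/block/<dev>/queue/rotational (1 = rotating disk).
suggest_scheduler() {
    if [ "$1" = "1" ]; then echo deadline; else echo "noop or none"; fi
}

# Read-only report for every block device that exposes a scheduler.
for q in /sys/block/*/queue; do
    [ -r "$q/scheduler" ] || continue
    dev=${q#/sys/block/}
    dev=${dev%/queue}
    echo "$dev: active=$(active_scheduler "$(cat "$q/scheduler")")" \
         "suggested=$(suggest_scheduler "$(cat "$q/rotational")")"
done
```

Switching is done by writing the scheduler name into the same sysfs file as root; the change does not survive a reboot unless it is also made persistent.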
Thanks
to my colleagues Alexey Lesovsky and Max Boguk for a lot of
research on this topic
Questions?
ik@postgresql-consulting.com

Linux internals for Database administrators at Linux Piter 2016
