RabbitMQ Operations
About me
• RabbitMQ staff engineer at Pivotal
• @michaelklishin just about everywhere
About this talk
• Brain dump from years of answering questions
• Focusses on the most recent release (3.5.6)
Provisioning
• Be aware of mirrors: GitHub, Bintray, …
• Looking into community-hosted mirrors
• Use packages + Chef/Puppet/…
OS resources
• Modern Linux defaults are absolutely inadequate for servers
ulimit -n default: 1024
Set ulimit -n and fs.file-max to 500K and forget about it
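One way to confirm that a raised limit actually took effect is to read it back. The following is a minimal sketch using Python's standard `resource` module (Unix only). The 500K target comes from the slide; the helper name `nofile_limit_ok` is hypothetical. Note that it reports the limit of the Python process itself, so it should be run in the same user and service environment as the node (on Linux, `/proc/<pid>/limits` shows the limits of an already running process).

```python
import resource

TARGET_NOFILE = 500_000  # "500K" from the slide


def nofile_limit_ok(soft_limit, target=TARGET_NOFILE):
    """Return True if the soft open-files limit meets the target."""
    # RLIM_INFINITY means "no limit", which satisfies any target.
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= target


if __name__ == "__main__":
    # Limits of the current process: (soft, hard).
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("soft=%s hard=%s ok=%s" % (soft, hard, nofile_limit_ok(soft)))
```

With the stock `ulimit -n` of 1024 the check fails, which is the point the slide makes.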
TCP keepalive timeout: from 11 minutes to over 2 hours by default
net.ipv4.tcp_keepalive_time = 6
net.ipv4.tcp_keepalive_intvl = 3
net.ipv4.tcp_keepalive_probes = 3
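The effect of these three settings follows from simple arithmetic: the kernel waits `tcp_keepalive_time` seconds of idleness before the first probe, then sends up to `tcp_keepalive_probes` probes spaced `tcp_keepalive_intvl` seconds apart. A small sketch of that calculation, comparing stock Linux values (7200 / 75 / 9) with the values above; the function name is hypothetical.

```python
def keepalive_detection_seconds(time_s, intvl_s, probes):
    """Worst-case seconds before the kernel declares an idle peer dead."""
    # First probe after `time_s` of idleness, then one probe every
    # `intvl_s` seconds until `probes` probes have gone unanswered.
    return time_s + intvl_s * probes


# Stock Linux values: tcp_keepalive_time=7200, _intvl=75, _probes=9.
linux_default = keepalive_detection_seconds(7200, 75, 9)

# Values from the slide.
tuned = keepalive_detection_seconds(6, 3, 3)

print(linux_default, tuned)  # 7875 15
```

That is roughly 2 hours 11 minutes with the defaults versus 15 seconds with the tuned values.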
Enable client heartbeats, e.g. with an interval of 6-12 seconds
• Tuning for throughput vs. high number of concurrent connections
Throughput: larger TCP buffers
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
rabbit.hipe_compile = true (only on Erlang 17.x or 18.x)
Concurrent connections: smaller TCP buffers, low tcp_fin_timeout, tcp_tw_reuse = 1, …
rabbit.tcp_listen_options.sndbuf
rabbit.tcp_listen_options.recbuf
rabbit.tcp_listen_options.backlog
Reduce per-connection RAM use by 10x
rabbit.tcp_listen_options.sndbuf = 16384
rabbit.tcp_listen_options.recbuf = 16384
Throughput drops by a comparable amount
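Dotted names such as `rabbit.tcp_listen_options.sndbuf` are shorthand for nested keys in the `rabbit` section of `rabbitmq.config`, which in the 3.5.x series is a file of Erlang terms. The sketch below renders the two buffer sizes from the slide in that shape. The helper `to_erlang_term` is hypothetical and handles only dicts, booleans and integers; a real listener configuration would typically carry other options as well, which are omitted here.

```python
def to_erlang_term(value):
    """Render nested dicts, booleans and integers as Erlang config terms."""
    if isinstance(value, bool):  # checked first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, dict):
        pairs = ", ".join(
            "{%s, %s}" % (key, to_erlang_term(val)) for key, val in value.items()
        )
        return "[" + pairs + "]"
    return str(value)


settings = {"rabbit": {"tcp_listen_options": {"sndbuf": 16384, "recbuf": 16384}}}

# An Erlang config file is a single term terminated by a full stop.
print(to_erlang_term(settings) + ".")
```

The same nesting applies to the other `rabbit.*` keys mentioned in this talk, e.g. `rabbit.hipe_compile`.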
net.ipv4.tcp_fin_timeout = 5
net.ipv4.tcp_tw_reuse = 1
Careful with tcp_tw_reuse behind NAT*
* http://vincent.bernat.im/en/blog/2014-tcp-time-wait-state-linux.html
net.core.somaxconn = 4096
http://www.rabbitmq.com/networking.html
Disk space
• Pay attention to what partition /var/lib ends up on
• Transient messages can be paged to disk
• RabbitMQ’s disk monitor isn’t supported on all platforms
RAM usage
• rabbit.vm_memory_high_watermark
• rabbit.vm_memory_high_watermark_paging_ratio
rabbitmqctl status
rabbitmqctl report
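The two watermark settings interact multiplicatively: publishers are blocked once memory use crosses the high watermark (a fraction of total RAM), and paging of queue contents to disk starts at the paging ratio times that watermark. A rough sketch of the arithmetic, assuming the defaults as I understand them for 3.5.x (0.4 and 0.5); the function name and the 16 GiB host are illustrative only.

```python
def memory_thresholds(total_ram_bytes, high_watermark=0.4, paging_ratio=0.5):
    """Return (paging_starts_at, publishers_blocked_at) in bytes."""
    # Memory alarm threshold: a fraction of total RAM.
    blocked_at = total_ram_bytes * high_watermark
    # Paging to disk begins at a fraction of the alarm threshold.
    paging_at = blocked_at * paging_ratio
    return paging_at, blocked_at


GIB = 1024 ** 3
paging_at, blocked_at = memory_thresholds(16 * GIB)  # hypothetical 16 GiB host
print("paging from %.1f GiB, publishers blocked at %.1f GiB"
      % (paging_at / GIB, blocked_at / GIB))
```

The computed limit can be compared against the memory figures that `rabbitmqctl status` reports for the node.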
• Significant paging efficiency improvements in 3.5.5-3.5.6
• Disable rabbit.fhc_read_buffering (3.5.6+)
rabbitmqctl eval 'file_handle_cache:clear_read_cache().'
recon
The ability to set the VM RAM watermark as an absolute value is coming in 3.6
Stats collector falls behind
• Management DB stats collector can get overwhelmed
• Key symptom: disproportionately higher RAM use on the node that hosts management DB
rabbitmqctl eval 'P = whereis(rabbit_mgmt_db), erlang:process_info(P).'
[{registered_name,rabbit_mgmt_db},
{current_function,{erlang,hibernate,3}},
{initial_call,{proc_lib,init_p,5}},
{status,waiting},
{message_queue_len,0},
{messages,[]},
{links,[<5477.358.0>]},
{dictionary,[{'$ancestors',[<5477.358.0>,rabbit_mgmt_sup,rabbit_mgmt_sup_sup,
<5477.338.0>]},
{'$initial_call',{gen,init_it,7}}]},
{trap_exit,false},
{error_handler,error_handler},
{priority,high},
{group_leader,<5477.337.0>},
{total_heap_size,167},
{heap_size,167},
{stack_size,0},
{reductions,318},
{garbage_collection,[{min_bin_vheap_size,46422},
{min_heap_size,233},
{fullsweep_after,65535},
{minor_gcs,0}]},
{suspending,[]}]
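In output like the above, the field most directly tied to the collector falling behind is presumably `message_queue_len`, the number of messages waiting in the process mailbox (0 in this healthy sample). A minimal sketch that pulls the value out of captured `rabbitmqctl eval` output so it can be checked by a script; the function name is hypothetical, and what counts as "too many" is left to the operator.

```python
import re


def message_queue_len(process_info_output):
    """Extract message_queue_len from erlang:process_info/1 output, or None."""
    match = re.search(r"\{message_queue_len,\s*(\d+)\}", process_info_output)
    return int(match.group(1)) if match else None


# Shortened sample in the same shape as the output above.
sample = "[{registered_name,rabbit_mgmt_db},\n {status,waiting},\n {message_queue_len,0}]"
print(message_queue_len(sample))  # 0
```

A value that keeps growing between checks suggests the collector cannot keep up with incoming stats events.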
rabbit.collect_statistics_interval = 30000
rabbitmq_management.rates_mode = none
rabbitmqctl eval 'P = whereis(rabbit_mgmt_db), erlang:exit(P, please_crash).'
Parallel stats collector is coming in 3.7
Cluster formation
• Node restart order dependency
• github.com/rabbitmq/rabbitmq-clusterer
• github.com/aweber/rabbitmq-autocluster
Backups
How do I back up?
• cp $RABBITMQ_MNESIA_DIR + tar
• Replicate everything off-site with exchange federation + set message TTL via a policy
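The first option (copy the node's data directory and tar it) can be sketched with Python's standard `tarfile` module. This is an illustration only: the function name and the output file name are hypothetical, and it assumes `RABBITMQ_MNESIA_DIR` is set in the environment and that the directory is not being written to while it is archived.

```python
import os
import tarfile


def archive_directory(src_dir, dest_path):
    """Write a gzip-compressed tarball of src_dir to dest_path."""
    with tarfile.open(dest_path, "w:gz") as tar:
        # Store entries under the directory's own name, not its absolute path.
        tar.add(src_dir, arcname=os.path.basename(os.path.normpath(src_dir)))
    return dest_path


if __name__ == "__main__":
    mnesia_dir = os.environ.get("RABBITMQ_MNESIA_DIR")
    if mnesia_dir:
        print(archive_directory(mnesia_dir, "rabbitmq-mnesia-backup.tar.gz"))
```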
Hostname changes
rabbitmqctl rename_cluster_node [old name] [new name]
Network partition handling
• When in doubt, use “autoheal”
• “Merge” is coming but has very real downsides, too
Misc
• Don’t use default vhost and/or credentials
• Don’t use 32-bit Erlang
• Use reasonably up-to-date releases
• Participate in rabbitmq-users
• OCF resource template from Fuel (by Mirantis)
• Use TLS
Coming in 3.6
• In-process file buffering disabled by default
• Queue master to node distribution strategies
• SHA-256 (or 512) for password hashing
• More responsive management UI with pagination
• Streaming rabbitmqctl
Coming past 3.6
• Pluggable cluster formation (à la ElasticSearch)
• On-disk data recovery tools
• Better CLI tools
• Easier off-site replication
• “Merge” partition handling strategy (no earlier than 3.8)
Thank you
• @michaelklishin
• github.com/michaelklishin
• rabbitmq-users
• Our team is hiring!
