Troubleshooting Redis
@charsyam
KAKAO
About me
•Senior Software Engineer at KAKAO
•Redis/Twemproxy contributor
•Redis-doc project merger
•Apache Tajo committer
Kakaostory
DAU: 8M
MAU: 15M
API call count: 420M
Kakaostory Service Stack
•For storage
•MariaDB (master/slave for HA)
•HBase
•Cassandra
•For cache
•Redis
•Arcus (open-source Memcached variant that supports collections)
Redis
5.2TB, 274 Servers
(Arcus: 3.3TB, 137 Servers)
Why Redis?
•As a look-aside cache for service data
•Examples:
•User Profile Information
•Feeds
•Activities
•Friends
•Notifications
Agenda
•Single Threaded
•Memory Fragmentation
•Redis Troubleshooting cases
•Redis Monitoring
•Redis HA
Single Threaded
Redis Event Loop
Client #1, Client #2, …, Client #N → I/O multiplexing → process commands (command #1, command #2, …)
Only one command is processed at a time.
Long-running operations
•KEYS
•FLUSHALL/FLUSHDB
•Lua scripts
•MULTI/EXEC
•Deleting large collections
Why slow?
O(n)
KEYS – Iterating all Keys
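di = dictGetSafeIterator(c->db->dict);
/* a pattern of exactly "*" matches every key, so the glob match can be skipped */
allkeys = (pattern[0] == '*' && pattern[1] == '\0');
/* visits every entry in the keyspace: O(n) in the number of keys */
while((de = dictNext(di)) != NULL) {
    ……
    /* glob-match the key name against the pattern */
    stringmatchlen(pattern,plen,key,sdslen(key),0)
}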
FlushAll – Deleting all items
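/* walk every bucket until all entries are gone */
for (i = 0; i < ht->size && ht->used > 0; i++) {
    dictEntry *he, *nextHe;
    /* skip empty buckets */
    if ((he = ht->table[i]) == NULL) continue;
    /* free each entry in this bucket's chain one by one: O(n) overall */
    while(he) {
        nextHe = he->next;
        dictFreeKey(d, he);
        dictFreeVal(d, he);
        zfree(he);
        ht->used--;
        he = nextHe;
    }
}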
How slow?
Command      Item count   Time
FLUSHALL     1,000,000    1000 ms (1 second)
Deleting collections
Type         Item count   Time
List         1,000,000    1000 ms (1 second)
Set          1,000,000    1000 ms (1 second)
Sorted set   1,000,000    1000 ms (1 second)
Hash         1,000,000    1000 ms (1 second)
Since Redis 2.8, you can iterate incrementally with the SCAN family (SCAN, SSCAN, HSCAN, ZSCAN)
Using multiple instances on one physical server (to use more CPUs)
Forking for RDB creation and AOF rewrite costs:
•Up to 2x memory
•Disk I/O
•CPU load/usage
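Example: 4 CPU cores, 32 GB memory
•One instance with 24 GB, or
•Three instances with 8 GB each: more reliable, because forking one 8 GB instance needs at most 8 GB extra, while forking the 24 GB instance could need up to 24 GB extra on a 32 GB machine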
•Set CPU affinity using taskset
•Keep the CPUs that handle NIC interrupts separate from the CPUs running Redis processes
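Under the hood, taskset sets a process's CPU affinity mask. As a minimal sketch, assuming Linux with glibc (`sched_setaffinity`, `sched_getaffinity` and the `CPU_*` macros), the same pinning can be done from C. `pin_to_cpu` and `first_allowed_cpu` are hypothetical helper names, and deciding which CPUs to leave for NIC interrupts is a per-machine choice this code does not make.

```c
#define _GNU_SOURCE          /* must precede every #include to expose CPU_SET & co. */
#include <assert.h>
#include <sched.h>
#include <sys/types.h>

/* Pin process `pid` (0 = the calling process) to a single CPU, roughly what
 * `taskset -pc <cpu> <pid>` does. Returns 0 on success, -1 with errno set. */
static int pin_to_cpu(pid_t pid, int cpu) {
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(pid, sizeof(set), &set);
}

/* Lowest CPU index that `pid` is currently allowed to run on, or -1 on error. */
static int first_allowed_cpu(pid_t pid) {
    cpu_set_t set;
    if (sched_getaffinity(pid, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}
```

For example, `pin_to_cpu(redis_pid, 2)` would keep a (hypothetical) `redis_pid` on CPU 2, leaving the other CPUs free for interrupt handling.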
Memory Fragmentation
Memory Fragmentation #1 (chart: used_memory vs. RSS)
Memory Fragmentation #2 (chart: used_memory vs. RSS)
The second case is where we started using Arcus.
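Redis summarizes the gap between used_memory and RSS as `mem_fragmentation_ratio` in INFO, defined as used_memory_rss / used_memory; values well above 1 point to fragmentation. A minimal sketch, assuming INFO's plain-text `field:value\r\n` line format, that extracts both fields from a raw INFO reply and computes the ratio itself; `info_field_ull` and `fragmentation_ratio` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Find the line "<field>:<value>" in a raw INFO reply and parse the value.
 * Returns 0 on success, -1 if the field is missing. */
static int info_field_ull(const char *info, const char *field, unsigned long long *out) {
    assert(info != NULL && field != NULL && out != NULL);
    size_t flen = strlen(field);
    for (const char *p = strstr(info, field); p != NULL; p = strstr(p + flen, field)) {
        /* Require a whole field name: at the start of a line and followed by ':'
         * (so "used_memory" does not match "used_memory_rss"). */
        if ((p == info || p[-1] == '\n') && p[flen] == ':') {
            *out = strtoull(p + flen + 1, NULL, 10);
            return 0;
        }
    }
    return -1;
}

/* used_memory_rss / used_memory, or -1.0 if a field is missing or zero. */
static double fragmentation_ratio(const char *info) {
    unsigned long long used = 0, rss = 0;
    if (info_field_ull(info, "used_memory", &used) != 0 ||
        info_field_ull(info, "used_memory_rss", &rss) != 0 || used == 0)
        return -1.0;
    return (double)rss / (double)used;
}
```

The whole-name check matters because INFO also contains fields such as `used_memory_human` and `used_memory_rss_human` that share the same prefix.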
Redis Troubleshooting Cases
Problem #1
KEYS
Performance Spike
INFO all
# Commandstats
cmdstat_psetex:calls=2326667,usec=9322929,usec_per_call=4.01
……
cmdstat_pexpire:calls=3695333,usec=10068580,usec_per_call=2.72
cmdstat_keys:calls=249,usec=1000314022,usec_per_call=4017325.50
cmdstat_ping:calls=27005,usec=30027,usec_per_call=1.11
……
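Spotting the outlier by eye works, but the same check can be scripted. A minimal sketch, assuming the `cmdstat_<name>:calls=...,usec=...,usec_per_call=...` layout shown above (newer Redis versions append extra fields, which this parser ignores): parse each line and report the command with the highest usec_per_call. `struct cmdstat`, `parse_cmdstat` and `slowest_command` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct cmdstat {
    char name[64];
    unsigned long long calls;
    unsigned long long usec;
    double usec_per_call;
};

/* Parse one "cmdstat_<name>:calls=..,usec=..,usec_per_call=.." line.
 * Returns 0 on success, -1 if the line has a different shape. */
static int parse_cmdstat(const char *line, struct cmdstat *out) {
    int n = sscanf(line, "cmdstat_%63[^:]:calls=%llu,usec=%llu,usec_per_call=%lf",
                   out->name, &out->calls, &out->usec, &out->usec_per_call);
    return n == 4 ? 0 : -1;
}

/* Index of the line with the highest usec_per_call (copied into *best), or -1. */
static int slowest_command(const char *const *lines, int n, struct cmdstat *best) {
    assert(best != NULL);
    int best_i = -1;
    for (int i = 0; i < n; i++) {
        struct cmdstat cs;
        if (parse_cmdstat(lines[i], &cs) != 0)
            continue;                   /* skip "# Commandstats" and other lines */
        if (best_i < 0 || cs.usec_per_call > best->usec_per_call) {
            *best = cs;
            best_i = i;
        }
    }
    return best_i;
}
```

Given the four cmdstat lines above, it selects `keys`, whose average of about 4 seconds per call dwarfs every other command.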
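SLOWLOG GET 10
KEYS averaged about 4 seconds per call (usec_per_call=4017325.50), blocking every other client while it ran.
Rename the KEYS command (rename-command in redis.conf)
Use SCAN instead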
Redis dict structure and SCAN walkthrough (Scan #1–#3, diagrams)
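SCAN replaces KEYS's single O(n) pass with many short calls, each returning a cursor. Redis's `dictScan` (dict.c) advances that cursor in reverse-binary order, which is what lets a full iteration still return every element present from start to end even if the table is resized between calls. The sketch below reproduces only that cursor arithmetic for a table of `size` buckets (a power of two); it is not Redis's implementation, and `rev64`, `next_cursor` and `scan_order` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the bit order of a 64-bit word. */
static uint64_t rev64(uint64_t v) {
    uint64_t r = 0;
    for (int i = 0; i < 64; i++) {
        r = (r << 1) | (v & 1);
        v >>= 1;
    }
    return r;
}

/* Advance a cursor over a table of `size` buckets by incrementing it in
 * reverse-binary order (the highest index bit changes fastest). */
static uint64_t next_cursor(uint64_t cursor, uint64_t size) {
    uint64_t mask = size - 1;
    cursor |= ~mask;          /* set bits above the mask; once reversed, the carry runs through them */
    cursor = rev64(cursor);
    cursor++;                 /* plain increment of the reversed value */
    return rev64(cursor);
}

/* Record bucket indexes in the order one full scan visits them; returns the count. */
static size_t scan_order(uint64_t size, uint64_t *out, size_t cap) {
    assert(size > 0 && (size & (size - 1)) == 0);
    size_t n = 0;
    uint64_t cursor = 0;
    do {
        assert(n < cap);
        out[n++] = cursor & (size - 1);   /* bucket visited at this step */
        cursor = next_cursor(cursor, size);
    } while (cursor != 0);                /* back to 0: the scan is complete */
    return n;
}
```

For an 8-bucket table the visit order is 0, 4, 2, 6, 1, 5, 3, 7, after which the cursor returns to 0, which is how SCAN signals that the iteration is finished.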
Problem #2
All Write Commands Fail
“MISCONF Redis is configured to save RDB
snapshots, but is currently not able to persist on
disk. Commands that may modify the data set are
disabled. Please check Redis logs for details about
the error.”
Reason
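/* Writes (and PING) are refused on a master when persistence has failed */
if (((server.stop_writes_on_bgsave_err &&      /* stop-writes-on-bgsave-error yes */
      server.saveparamslen > 0 &&              /* at least one SAVE rule is set */
      server.lastbgsave_status == C_ERR) ||    /* and the last BGSAVE failed */
     server.aof_last_write_status == C_ERR) && /* or the last AOF write failed */
    server.masterhost == NULL &&               /* only on a master, not a slave */
    (c->cmd->flags & CMD_WRITE ||
     c->cmd->proc == pingCommand))
{
    …   /* the client receives the MISCONF error above */
}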
config set stop-writes-on-bgsave-error no
Problem #3
Using the default SAVE options for Redis as a cache
SAVE 900 1
SAVE 300 10
SAVE 60 10000
Heavy disk I/O and high CPU load while creating RDB snapshots
config set save ""
Problem #4
Redis using swap memory
Redis using 28 GB on a single 32 GB machine
Migrate or restart
Monitor the Redis server and keep memory usage within bounds
Problem #5
Simultaneous AOF Rewrite
A single 256 GB machine running eight Redis instances of 26 GB each (208 GB in total)
All eight instances fork for AOF rewrite at the same time
•Stop all AOF rewrites
•Turn off automatic AOF rewrite: config set auto-aof-rewrite-percentage 0
•Run AOF rewrites manually (BGREWRITEAOF), one instance at a time
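The arithmetic behind this problem: a forked child shares pages with its parent through copy-on-write, but in the worst case (every page modified during the rewrite) each forking instance needs up to its own size again, the "maximum 2x memory" from the fork discussion. A minimal sketch, assuming equally sized instances and that worst case; `worst_case_peak_gb` and `fits_in_ram` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Worst-case memory (GB) when `forking` of `instances` equally sized instances
 * fork at the same time and copy-on-write ends up duplicating every page. */
static unsigned worst_case_peak_gb(unsigned instances, unsigned instance_gb, unsigned forking) {
    assert(forking <= instances);
    return (instances + forking) * instance_gb;
}

/* True if that worst case still fits in the machine's RAM. */
static bool fits_in_ram(unsigned instances, unsigned instance_gb, unsigned forking,
                        unsigned ram_gb) {
    return worst_case_peak_gb(instances, instance_gb, forking) <= ram_gb;
}
```

With eight 26 GB instances, rewriting all at once can need up to 416 GB on a 256 GB machine, while one at a time needs at most 234 GB. The same function gives 48 GB for a single 24 GB instance versus 32 GB for three 8 GB instances on the 32 GB server from the fork example.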
Problem #6
Replication breaks on a network line failure
All Redis replication links are broken by the network line failure
What happens when the network recovers?
Replication between master and slave: replicationCron runs a periodic health check
All slaves automatically try to reconnect to the master.
SLAVEOF NO ONE
Problem #7
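Replication failure
•Permission
•Memory allocation failure: sysctl vm.overcommit_memory=1
Replication failure caused by the client output buffer size
•Hard limit: the connection is closed as soon as the output buffer reaches this size
•Soft limit: the connection is closed if the buffer stays above this size for the given number of seconds
config set client-output-buffer-limit "slave 1024mb 1024mb 60"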
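The hard/soft rule can be modeled as a small per-client state machine. This sketch follows our reading of Redis's `checkClientOutputBufferLimits`: the hard limit triggers immediately, and the soft limit only once the buffer has stayed at or above it for more than `soft_seconds`. As in redis.conf, 0 disables a limit. The struct and function names are hypothetical. For reference, the default replica class is `256mb 64mb 60`, which a large initial sync can exceed.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* One client-output-buffer-limit class, e.g. "slave 256mb 64mb 60". */
struct obuf_limit {
    unsigned long long hard_bytes;   /* 0 = no hard limit */
    unsigned long long soft_bytes;   /* 0 = no soft limit */
    time_t soft_seconds;
};

/* Per-client state: when the soft limit was first reached (0 = not reached). */
struct obuf_state {
    time_t soft_reached_at;
};

/* Returns true if the client should be disconnected at time `now`. */
static bool obuf_limit_reached(const struct obuf_limit *lim, struct obuf_state *st,
                               unsigned long long used, time_t now) {
    assert(lim != NULL && st != NULL);
    bool hard = lim->hard_bytes && used >= lim->hard_bytes;
    bool soft = lim->soft_bytes && used >= lim->soft_bytes;
    if (soft) {
        if (st->soft_reached_at == 0) {
            st->soft_reached_at = now;   /* first time over the soft limit: start the timer */
            soft = false;
        } else if (now - st->soft_reached_at <= lim->soft_seconds) {
            soft = false;                /* over the soft limit, but not for long enough */
        }
    } else {
        st->soft_reached_at = 0;         /* back under the soft limit: reset the timer */
    }
    return hard || soft;
}
```

With hard and soft set to the same value, as in the config set above, the soft timer never matters: the hard limit fires first.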
Problem #8
Hash Table Expansion
Redis Dict: hash table expansion #1–#3 (diagrams)
The table doubles in size on each expansion
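A simplified sketch of the growth rule, assuming the behaviour of Redis dict.c in versions of that period: table sizes are powers of two starting at 4 (`DICT_HT_INITIAL_SIZE`), and once the entry count reaches the bucket count the table is expanded to the next power of two at or above used × 2. Real Redis also postpones resizing while a child process exists and moves entries incrementally; neither is modeled here, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define INITIAL_SIZE 4ULL   /* same value as Redis's DICT_HT_INITIAL_SIZE */

/* Smallest power of two >= n, never below INITIAL_SIZE. */
static uint64_t next_power(uint64_t n) {
    assert(n <= (UINT64_C(1) << 62));  /* keep the doubling loop from overflowing */
    uint64_t size = INITIAL_SIZE;
    while (size < n)
        size *= 2;
    return size;
}

/* Bucket count once the table holds `used` entries: an empty table gets
 * INITIAL_SIZE buckets, and a full table (used >= size) grows to the next
 * power of two >= used * 2, i.e. it doubles. */
static uint64_t size_after_insert(uint64_t size, uint64_t used) {
    if (size == 0)
        return INITIAL_SIZE;
    if (used >= size)
        return next_power(used * 2);
    return size;
}
```

With 1,000,000,000 entries, `next_power` gives 2^30 = 1,073,741,824 buckets.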
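maxmemory and freeMemoryIfNeeded
1 billion items
1,000,000,000 × 4 = 4 GB (hash table)
maxmemory = 16 GB
used_memory = 12 GB
A hash table expansion is needed: 4 GB × 2 = 8 GB for the new table.
You need 20 GB (12 GB + 8 GB).
20 GB > 16 GB (maxmemory), so freeMemoryIfNeeded has to free memory, evicting keys according to maxmemory-policy.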
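The same check in code, as a sketch under the slide's own assumptions: the new bucket array is twice the current one, and it is allocated while the old table still exists (rehashing is incremental), so the old table stays counted in used_memory. The function names are hypothetical, and eviction itself is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GB (1ULL << 30)

/* Memory in use right after an expansion starts: everything already used
 * plus a new bucket array twice the size of the current one. */
static uint64_t memory_after_expansion(uint64_t used_memory, uint64_t table_bytes) {
    return used_memory + 2 * table_bytes;
}

/* True if starting the expansion would push memory over maxmemory. */
static bool expansion_exceeds_maxmemory(uint64_t used_memory, uint64_t table_bytes,
                                        uint64_t maxmemory) {
    assert(maxmemory > 0);   /* maxmemory 0 means "no limit" in Redis */
    return memory_after_expansion(used_memory, table_bytes) > maxmemory;
}
```

`expansion_exceeds_maxmemory(12 * GB, 4 * GB, 16 * GB)` is true (20 GB needed), which is exactly the situation on the slide.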
We need a feature to set the initial size of the hash table (not supported):
https://github.com/antirez/redis/pull/2812
Redis Monitoring
Monitoring is as important as management
Redis Monitoring Metrics
Factor                                    Source
CPU usage, load                           System
Network inbound/outbound                  System
Client connections, maxclients setting    Redis INFO
Key size, processed commands              Redis INFO
Memory usage, RSS (very important)        Redis INFO
Disk usage, I/O                           System
Expired keys, evicted keys                Redis INFO
Redis HA
Using DNS for failover
Private internal DNS server with TTL 0
DNS HA flow
1. Detect the failure of Redis A
2. Make B writable
3. Change the DNS record from A to B
4. Send CLIENT KILL to A
5. New clients connect to B
6. Run CONFIG REWRITE on B
JVM: add -Dsun.net.inetaddr.ttl=0
twemproxy: use 0.4.1
Using a coordinator: ZooKeeper
ZooKeeper with Redis information (diagram: application servers, ZooKeeper, Redis shard-1 to shard-3, Redis cluster monitor)
•Application servers get Redis shard information from ZooKeeper
•The Redis cluster monitor health-checks the shards and updates shard info in ZooKeeper
•Events: node add or remove, master change
Summary
•Redis is single-threaded
•Creating an RDB or rewriting the AOF is expensive
•Don't use the KEYS command
•Don't use the default Redis configuration
•Monitoring is very important
Thanks

Troubleshooting Redis- DaeMyung Kang, Kakao