Fail-Safe Starvation-Free Durable Priority Queues in Redis
Jesse H. Willett
jhw@prosperworks.com
https://github.com/jhwillett
ProsperWorks is a multi-tenant
CRM-as-a-Service.
ProsperWorks was built with three
basic principles in mind:
● Keep it simple.
● Show what matters.
● Make it actionable.
Who are We?
We help businesses sell more with a CRM teams actually love to use.
I am a server architect focused on storage, scaling, and asynchronous workloads.
I have worked at scale on public-facing live services built on many stacks:
● ProsperWorks: Postgres/Citus+Redis+Elasticsearch Ruby on Rails
● Lyft: MongoDB+Redis Doctrine/PHP
● Zynga: Memcache+Membase PHP
I have also worked on image processing grids, feature phone games, text search
engines, PC strategy games, and desktop publishing suites.
All of these systems had queues. Queues naturally manage the impedance
mismatch between systems with different time or cost signatures.
Who am I?
Presenting Ick, a Redis-based priority queue which we have used in our
Postgres-to-Elasticsearch pipeline since Q3 2015.
Ick extends Redis Sorted Sets with 175 LoC of Lua. The combination neatly
solves many problems in asynchronous job processing.
“Ick” was my gut reaction to the idea of closing a race condition by deploying Lua
to Redis. Once successful, we adopted the backronym “Ick == Indexing QUeue”.
Ick is available via Ruby bindings in the gem redis-ick under the MIT License.
So far only ProsperWorks uses redis-ick, and I am the only maintainer.
What is This?
● Redis Reliable Queue Pattern
○ Does not support deduplication or reordering.
○ Ick is RPOPLPUSH for Sorted Sets with a custom score update mode.
● Redis Streams
○ Does not support deduplication or reordering.
○ Still, we might have used Streams if they had been available in 2015.
● Apache Kafka
○ Log compaction could serve our deduplication needs, no reordering.
○ Too costly to own or rent for a small team in 2015, yet another storage service.
● Amazon Kinesis
○ Does not support deduplication or reordering.
○ Cost effective, yet another storage service.
Ick Comparables
● Our primary store is Postgres with a normalized entity-relationship model.
● Elasticsearch hosts search over a de-normalized form of our entities.
○ ES provides scale and advanced search features.
○ Mapping from PG to ES is coupled to our business logic, lives best in our code.
● Challenges keeping ES up-to-date with live changes in PG.
○ High-frequency fast PG updates from our web layer and from asynchronous jobs.
○ Low-frequency slow ES Bulk API calls.
○ A few seconds of latency in the PG ⇒ ES pipeline is acceptable.
○ UX degrades with minutes of latency. Hours of latency is unacceptable.
● A Natural Pattern:
○ When the app writes to PG, also put ids of dirty entities in a Redis queue.
○ In some cases, we also search out dirty entities in PG directly.
○ A background consumer process takes batches of dirty ids and updates ES in bulk.
Problem Space
# in producer
redis.rpush(key,msg)
# in consumer
batch = batch_size.times.map { redis.lpop(key) }.compact # messages no longer in Redis
process_batch_slowly(batch)
● Advantages:
○ Simple
○ Many implementations: Resque works like this w/ batch_size 1
○ Scaling to many workers is straightforward.
● Disadvantages:
○ Messages lost on failure.
○ Unconstrained backlog growth when ES falls behind.
Solution 1: Basic List Pattern
Sometimes we see hot data: entities which are dirtied several times per second.
Under heavy load our ES Bulk API calls can take 5s or more.
With too much hot data, our backlog can grow without bound.
To get leverage over this problem we need to deduplicate messages.
We prefer deduplication at the queue level. We considered and rejected:
● One lock per message at enqueue time - brittle and expensive.
● Version information in the message - large decrease in solution generality.
We Really Care about Deduplication!
In Redis, this means we prefer Sorted Sets:
Sorted sets are a data type which is similar to a mix between a Set and a Hash. Like sets, sorted
sets are composed of unique, non-repeating string elements, so in some sense a sorted set is a set
as well.
However while elements inside sets are not ordered, every element in a sorted set is associated with
a floating point value, called the score [...]
Moreover, elements in a sorted sets are taken in order [by score].
Sorted Set accesses cost O(log N) versus the O(1) of Lists, but deduplicate.
Sorted Sets support FIFO-like behavior if we use timestamps as scores.
Sorted Sets for Deduplication
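To make the write-folding concrete, here is a tiny pure-Ruby model in which a Hash from member to score stands in for a Sorted Set (an illustration of the semantics, not real Redis):

```ruby
# Minimal pure-Ruby model of a Sorted Set: member => score.
# Re-adding a member overwrites its score instead of duplicating it.
zset = {}

zadd   = ->(score, msg) { zset[msg] = score }
# Members ordered by ascending score, like ZRANGE ... WITHSCORES.
zrange = -> { zset.sort_by { |_msg, score| score } }

zadd.call(100.0, "entity:1")
zadd.call(101.0, "entity:2")
zadd.call(102.0, "entity:1")   # dirtied again: folded, score bumped

zrange.call  # => [["entity:2", 101.0], ["entity:1", 102.0]]
```

Note that the re-add moved "entity:1" behind "entity:2" in score order: with timestamp scores this is exactly the hot-data behavior discussed later.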
# in producer
redis.zadd(key,Time.now.to_f,msg) # Time.now for score ==> FIFO-like
# in consumer
batch = redis.zrange(key, 0, batch_size - 1, withscores: true) # critical section start
process_batch_slowly(batch)
redis.zrem(key,*batch.map(&:first)) # critical section end
● Advantages:
○ Messages preserved across failure.
○ De-duplication aka write-folding constrains backlog growth.
○ 1 + 2/batch_size Redis ops per message, down from 2 ops/message.
● Disadvantages:
○ Race condition between zadd and process_batch_slowly can lead to dropped messages.
○ Hot data can starve if continually re-added with a higher score.
Solution 2: Basic Sorted Set Pattern
# in producer
redis.zadd(key,Time.now.to_f,msg) # variadic ZADD is an option
# in consumer
batch = redis.zrange(key, 0, batch_size - 1, withscores: true)
process_batch_slowly(batch)
batch2 = redis.zrange(key, 0, batch_size - 1, withscores: true) # critical section start
unchanged = batch & batch2 # remove msgs whose scores have changed
redis.zrem(key,*unchanged.map(&:first)) # critical section end
● Advantages:
○ Critical section is smaller.
○ Critical section is not exposed to process_batch_slowly.
○ Messages only dropped from Redis after success (i.e. ZREM as ACK)
● Disadvantages:
○ Extra Redis op per cycle.
○ Hot data can starve if continually re-added with a higher score.
Solution 3: Improved Sorted Set Pattern
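Why the intersection works: with scores included, each batch entry is a `[message, score]` pair, so a message re-added with a new score produces a different pair and drops out of `batch & batch2`. A pure-Ruby sketch:

```ruby
# Each batch entry is a [message, score] pair, as fetched with scores.
batch  = [["a", 10.0], ["b", 12.0], ["c", 13.0]]

# While we processed the batch, "b" was re-added with a newer score.
batch2 = [["a", 10.0], ["b", 99.0], ["c", 13.0]]

# Array#& compares whole pairs, so "b" (score changed) is excluded:
unchanged = batch & batch2        # => [["a", 10.0], ["c", 13.0]]

# Only messages with unchanged scores are safe to ZREM as an ACK:
to_ack = unchanged.map(&:first)   # => ["a", "c"]
```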
The Sorted Set solutions have a critical section where dirty signals can be lost,
and also a more subtle problem with hot data.
Hot data is continually re-added with higher scores.
During periods of intermediate load, we might carry a steady-state backlog which
is larger than a single batch size for an extended period.
When these conditions coincide, hot data may dance away from the low-score end of
the Sorted Set for hours.
We call this the Hot Data Starvation Problem.
We Really Care about Hot Data!
An Ick is a pair of Redis Sorted Sets: a producer set and a consumer set.
● ICKADD adds messages to the producer set.
● ICKRESERVE moves lowest-score messages from the pset to the cset, then returns the cset.
● ICKCOMMIT removes messages from the cset.
● On duplicates, ICKADD and ICKRESERVE both select the minimum score.
ICKADD [score,msg]*   app ==> Redis pset
ICKRESERVE n          Redis pset ==> Redis cset up to size n ==> batch returned to app
ICKCOMMIT msgs*       Redis cset removed
Introducing Ick
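The three operations above can be sketched as a pure-Ruby model, with Hashes standing in for the two Sorted Sets (an illustrative model of the semantics, not the redis-ick implementation; here the batch is returned as `[msg, score]` pairs):

```ruby
# Pure-Ruby model of an Ick: two score-keyed sets, min-score on duplicates.
class IckModel
  def initialize
    @pset = {}   # producer set: msg => score
    @cset = {}   # consumer set: msg => score
  end

  # ICKADD: add [score, msg] pairs, keeping the minimum score on duplicates.
  def ickadd(*pairs)
    pairs.each_slice(2) do |score, msg|
      @pset[msg] = [@pset.fetch(msg, score), score].min
    end
  end

  # ICKRESERVE: move lowest-score messages from pset to cset until the cset
  # holds up to n messages, then return the cset sorted by score.
  def ickreserve(n)
    @pset.sort_by { |_m, s| s }.first([n - @cset.size, 0].max).each do |msg, score|
      @cset[msg] = [@cset.fetch(msg, score), score].min  # min-score merge
      @pset.delete(msg)
    end
    @cset.sort_by { |_m, s| s }
  end

  # ICKCOMMIT: acknowledge messages by removing them from the cset.
  def ickcommit(*msgs)
    msgs.each { |m| @cset.delete(m) }
  end
end

ick = IckModel.new
ick.ickadd(12, "a", 10, "b", 13, "c")
ick.ickreserve(2)        # => [["b", 10], ["a", 12]]
ick.ickcommit("a")
ick.ickreserve(2)        # => [["b", 10], ["c", 13]]
```

Because ICKADD only touches the pset, a message re-added after being reserved sits in both sets at once, which is exactly the behavior the Properties slide calls out.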
# push 'a' and 'b' into the Ick
Ick.new(redis).ickadd(key,123,'a',456,'b')  # pset [[123,'a'],[456,'b']]
# re-push 'b' with higher score, nothing changes
Ick.new(redis).ickadd(key,789,'b')          # pset [[123,'a'],[456,'b']] unchanged
# re-push 'b' with lower score, score changes
Ick.new(redis).ickadd(key,100,'b')          # pset [[100,'b'],[123,'a']] move b to 100
ICKADD adds to the producer set. Duplicates are assigned the minimum score.
Almost ZADD XX but more predictable.
Assuming new scores trend upward over time (e.g. timestamps), there is no
starvation: a re-add can never raise a message's score, so every message drifts
toward the low-score end, where it is consumed.
ICKADD
# push some messages into the Ick
Ick.new(redis).ickadd(key,12,'a',10,'b',13,'c')  # pset [[10,'b'],[12,'a'],[13,'c']]
# reserve a batch
batch = Ick.new(redis).ickreserve(key,2)  # pset [[13,'c']] removed b and a
                                          # cset [[10,'b'],[12,'a']] added b and a
                                          # batch ['b',10,'a',12] per ZRANGE w/ scores
# repeated ICKRESERVE just re-fetches the consumer set
batch = Ick.new(redis).ickreserve(key,2)  # pset [[13,'c']] unchanged
                                          # cset [[10,'b'],[12,'a']] unchanged
                                          # batch ['b',10,'a',12] unchanged
ICKRESERVE fills up the consumer set by moving the lowest-score messages from
the producer set, then returns the consumer set.
This merge respects the minimum score rule.
ICKRESERVE
# push some messages into the Ick
Ick.new(redis).ickadd(key,12,'a',10,'b',13,'c')  # pset [[10,'b'],[12,'a'],[13,'c']]
# reserve a batch
batch = Ick.new(redis).ickreserve(key,2)  # pset [[13,'c']] removed b and a
                                          # cset [[10,'b'],[12,'a']] added b and a
                                          # batch ['b',10,'a',12] per ZRANGE w/ scores
# commit 'a' to acknowledge success
Ick.new(redis).ickcommit(key,'a')  # pset [[13,'c']] unchanged
                                   # cset [[10,'b']] removed a
ICKCOMMIT removes acknowledged messages from the consumer set.
ICKCOMMIT
● All Ick ops are bulk operations and support multiple messages per Redis op.
● Duplicate messages are always resolved to the minimum score.
● We use current timestamps for scores.
○ The scores of new messages tend to increase.
● Even hot data does not lose its place in line.
● A message can be present in both the pset and the cset.
○ When it is re-added after being reserved.
○ Good: reifies the critical section where PG vs ES agreement is indeterminate.
Properties of Icks
# in producer
Ick.new(redis).ickadd(key,Time.now.to_f,msg)  # supports variadic bulk ICKADD
# in consumer
batch = Ick.new(redis).ickreserve(key,batch_size)
process_batch_slowly(batch)
Ick.new(redis).ickcommit(key,*batch.map(&:first))  # critical section only in Redis tx
● Advantages:
○ Critical section is bundled up in a Redis transaction.
○ Hot data starvation solved by constraining scores to only decrease, never increase.
○ Messages only dropped from Redis after success (i.e. ICKCOMMIT as ACK)
● Disadvantages:
○ Must deploy Lua to your Redis.
○ Not inherently scalable.
Solution 4: Ick Pattern
Support for multiple Ick consumers was considered but rejected:
● Consumer processes would need to identify themselves somehow.
● How are messages allocated to consumers?
● How do consumers come and go?
● Will this break deduplication or other serializability guarantees?
● How can the app customize?
We scale at the app level by hashing messages over many Ick+consumer pairs.
This suffers from head-of-line blocking but keeps these hard problems in
higher-level code which we can monitor and tie to business logic more easily.
Dealing with Scale
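The app-level sharding can be sketched as follows; the key format, shard count, and helper name are hypothetical, not part of redis-ick:

```ruby
require "zlib"

NUM_SHARDS = 8  # hypothetical shard count

# Stable message-to-shard mapping: the same entity id always hashes to the
# same Ick key, so deduplication still folds repeats of a hot entity, and
# each shard can have its own dedicated consumer process.
def ick_key_for(msg)
  shard = Zlib.crc32(msg) % NUM_SHARDS
  "ick:search-index:#{shard}"   # hypothetical key naming scheme
end

ick_key_for("entity:42")  # deterministic: always the same shard for this id
```

The stability of the hash is what preserves write-folding: if a hot entity could land in different Icks on different writes, its duplicates would no longer meet in one Sorted Set.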
We usually use the current time as the score in our Icks.
This is FIFO-like: backlog has priority over current demand, which has priority
over future demand.
Unfortunately, resources are finite. We alert when the scores of the current batch
get older than our service level objectives.
Unfortunately, demand is bursty. For bulk operations we offset the scores by 5
seconds plus 1 second per 100 messages.
That is, as bulk operations get bulkier they also get nicer.
Advanced Ick Patterns: Hilbert’s SLA
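The bulk-niceness rule can be written as a small score function (a sketch using the constants quoted above; the function name is hypothetical):

```ruby
# Score for a bulk enqueue of `count` messages at time `now`:
# a 5-second offset plus 1 second per 100 messages, so bulkier
# operations sort behind live traffic and "get nicer".
def bulk_score(now, count)
  now + 5.0 + count / 100.0
end

bulk_score(1000.0, 100)     # => 1006.0
bulk_score(1000.0, 10_000)  # => 1105.0
```

Because Ick keeps the minimum score on duplicates, a live write that dirties the same entity later will still pull its score back down to the present.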
I recently added a new Ick operation which combines ICKCOMMIT of the last
batch with ICKRESERVE for the next batch:
last_batch = []
while still_going() do
  next_batch = Ick.new(redis).ickexchange(key,batch_size,*last_batch.map(&:first))
  process_batch_slowly(next_batch)
  last_batch = next_batch
end
Ick.new(redis).ickexchange(key,0,*last_batch.map(&:first))
It is gratifying to have two-phase commit without doubling the Redis ops.
This pattern would be useful in any two-phase commit or pipeline system.
Advanced Ick Patterns: ICKEXCHANGE
I anticipate using Ick to schedule delayed jobs by using scores as “release date”.
To support this I added an option to ICKRESERVE:
# push messages and reserve initial batch
Ick.new(redis).ickadd(key,12,'a',10,'b',13,'c')  # pset [[10,'b'],[12,'a'],[13,'c']]
Ick.new(redis).ickreserve(key,2)  # pset [[13,'c']] moved b and a
                                  # cset [[10,'b'],[12,'a']] moved b and a
# no commits, but a younger message is added
Ick.new(redis).ickadd(key,7,'x')  # pset [[7,'x'],[13,'c']] 7 sorts first
                                  # cset [[10,'b'],[12,'a']] but cset is full
# plain reserve is wedged but backwash unblocks
Ick.new(redis).ickreserve(key,2)  # pset [[7,'x'],[13,'c']] no change
                                  # cset [[10,'b'],[12,'a']] full
Ick.new(redis).ickreserve(key,2,backwash: true)  # pset [[12,'a'],[13,'c']] backwashed a and b!
                                                 # cset [[7,'x'],[10,'b']] unblocked x!
Advanced Ick Patterns: Backwash
Thank You
Jesse H. Willett
jhw@prosperworks.com
https://github.com/jhwillett
