Redis TLV Meetup
v5 & Streams
@itamarhaber, October 2018
Redis v5
Redis v5
• Released on Wed Oct 17 13:28:26 CEST 2018
• 9.5 years after v1, 15 months of development
• Major feature: Streams
• Other stuff:
– Project Spartacus
– Active Defragmentation v2
– LOLWUT
– ZPOPMIN/MAX and their blocking variants
– Integrated help for subcommands
Georg Nees' Schotter vs LOLWUT (performance)
© Victoria and Albert Museum, London
ZPOP - youtube.com/watch?v=Xk4avdjdM-E
Sorted Sets now support List-like pop operations.
• ZPOPMIN - removes and returns the lowest-ranking member
• ZPOPMAX - same, but for highest-ranking
• BZPOPMIN, BZPOPMAX - blocking variants
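To make the pop semantics concrete, here is a toy in-memory model in Python. It is only a sketch: `ToySortedSet` and its method names are made up for illustration, and it says nothing about how Redis implements Sorted Sets (a real one does not re-sort on every pop).

```python
class ToySortedSet:
    """Toy stand-in for a Sorted Set: unique members, each with a score."""

    def __init__(self):
        self._scores = {}  # member -> score

    def zadd(self, member, score):
        self._scores[member] = score

    def _ordered(self):
        # Order by score, breaking ties by member name.
        return sorted(self._scores.items(), key=lambda kv: (kv[1], kv[0]))

    def zpopmin(self):
        """Remove and return the (member, score) pair with the lowest score."""
        if not self._scores:
            return None
        member, score = self._ordered()[0]
        del self._scores[member]
        return member, score

    def zpopmax(self):
        """Remove and return the (member, score) pair with the highest score."""
        if not self._scores:
            return None
        member, score = self._ordered()[-1]
        del self._scores[member]
        return member, score


zset = ToySortedSet()
for member, score in [("low", 1), ("mid", 5), ("high", 9)]:
    zset.zadd(member, score)

popped_min = zset.zpopmin()
popped_max = zset.zpopmax()
print(popped_min, popped_max)
```

The blocking variants would additionally wait when the set is empty instead of returning nothing; that part is left out of the sketch.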
Integrated help for subcommands, e.g.:
Streams
A data stream is a sequence of elements. Consider:
• Real time sensor readings, e.g. particle colliders
• IoT, e.g. the irrigation of avocado groves
• User activity in an application
• …
• Messages in distributed systems
In the context of data processing...
… one in which the failure of a computer you
didn't even know existed can render your own
computer unusable.
Leslie Lamport
A distributed system is
“
... a model in which components located on
networked computers communicate and
coordinate their actions by passing messages
Distributed Computing, Wikipedia
Includes: client-server, 3/n-tier, peer to peer, SOA,
micro- & nanoservices, FaaS & serverless…
A distributed system is
“
There are only two hard problems in
distributed systems:
2. Exactly-once delivery
1. Guaranteed order of messages
2. Exactly-once delivery
Mathias Verraes, on Twitter
An observation
“
Fact #1: you can choose one and only one:
• At-most-once delivery, i.e. "fire and forget"
• At-least-once delivery, i.e. explicit
acknowledgement sent by the receiver
Fact #2: exactly-once delivery doesn't exist (unless
you change the definition of MDS, i.e. message delivery semantics)
Observation: order is usually important (duh)
Refresher on message delivery semantics
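The two options can be sketched with a deterministic toy channel in Python. Everything here is an assumption made for illustration (the function names, and the way losses are modelled as sets of messages): a lost send is simply gone under at-most-once, while a lost acknowledgement causes a resend, and therefore a duplicate, under at-least-once.

```python
def deliver_at_most_once(messages, lost):
    """Fire and forget: every message is sent once; lost sends are never retried."""
    received = []
    for msg in messages:
        if msg not in lost:
            received.append(msg)
    return received


def deliver_at_least_once(messages, lost_acks):
    """Resend until acknowledged; a lost acknowledgement yields a duplicate."""
    received = []
    for msg in messages:
        acked = False
        attempt = 0
        while not acked:
            attempt += 1
            received.append(msg)  # the receiver got the message
            # Model: the ack for messages in lost_acks is lost on the first attempt only.
            acked = not (msg in lost_acks and attempt == 1)
    return received


print(deliver_at_most_once(["a", "b", "c"], lost={"b"}))        # "b" never arrives
print(deliver_at_least_once(["a", "b", "c"], lost_acks={"b"}))  # "b" arrives twice
```

Neither variant gives exactly-once: the receiver either misses a message or has to cope with seeing it again.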
Consider the non-exhaustive list at taskqueues.com
• 17 message brokers, including: Apache Kafka,
NATS, RabbitMQ and Redis
• 17 queue solutions, including: Celery, Kue,
Laravel, Sidekiq, Resque and RQ (<- all these use
Redis as their backend btw ;))
And that's without considering protocol-based
architectures, legacy busses etc...
This isn't exactly a new challenge
Streams: Anatomy
The Log is a storage abstraction that is:
• Append-only, can be truncated
• A sequence of records ordered by time
A Logical Log is:
• Based on a logical offset, e.g. time (vs. bytes)
• (Therefore time range queries)
• Can be made up of data structures (vs. lines)
A Stream is not unlike a (logical) log
A Stream is (also) a storage abstraction, that is
basically an ordered, logical log of messages
(records). These messages are:
• Made up of:
  ○ Data payload (semi-structured usually)
  ○ Metadata (e.g. identifier)
• Immutable, once created
• Always added to the end of the stream
So what is a stream?
A producer is a software component that adds
messages (always to the end of) a stream.
A consumer is a software component that reads
messages from a stream (and acts on them). It can
start reading the messages from any arbitrary
offset, or just wait for new ones.
The Stream Players: producers and consumers
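A minimal sketch of these two roles, assuming a plain Python list as the log and the list index as the logical offset (`ToyStream` and `ToyConsumer` are hypothetical names, not anything Redis provides):

```python
class ToyStream:
    """Append-only list of messages; the list index is the logical offset."""

    def __init__(self):
        self._messages = []

    def append(self, payload):
        """Producer side: always adds to the end and returns the new offset."""
        self._messages.append(payload)
        return len(self._messages) - 1

    def read_from(self, offset):
        """Consumer side: return the payloads stored at `offset` and later."""
        return self._messages[offset:]


class ToyConsumer:
    """Remembers its own position, independently of any other consumer."""

    def __init__(self, stream, start=0):
        self.stream = stream
        self.position = start  # next offset to read

    def poll(self):
        batch = self.stream.read_from(self.position)
        self.position += len(batch)
        return batch


stream = ToyStream()
for payload in ["m0", "m1", "m2"]:
    stream.append(payload)

early = ToyConsumer(stream, start=0)  # reads everything
late = ToyConsumer(stream, start=2)   # starts from an arbitrary offset
print(early.poll(), late.poll())
```

Because each consumer only keeps a position, adding a consumer costs the stream nothing, and reading never removes messages.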
A picture I made of a stream
0 1 2 3 4 5 6 7 8 9 10 11 12 13
Producer
Consumer 1
position
Consumer 2
position
"Next"
message
Message 0
1. A component can both be a producer and a
consumer. Either for the same, or between
different streams. It depends.
2. Multiple stream producers can exist. At least one
is usually needed though.
3. A stream can exist without any consumers.
That's kind of pointless though.
Some observations
Streams: Motivation
Consider the alternative, i.e. batch processing.
Besides fast response times (batch=1):
• Scalable (distributed) design
• Loose coupling of components and faults
• Enables building complex-er pipelines
Why (do architects) build stuff on streams?
… that architects like using, like:
• CQRS
• Event sourcing
• Unified (distributed commit) log
• Microservices
• ...
Also, they fit so well with other stuff...
An abstraction that is useful to work with (in
distributed systems).
Capable of trivially addressing message ordering.
Able to provide (depending on the implementation)
AMO and ALO MDS (at-most-once and at-least-once).
An enabler for Stream Processing (e.g. Spark
Streams, Kafka's Stream Processors).
The Stream (for architects) is
Streams: Redis'
Necessity is the mother of invention
There ain't no such thing as a free lunch
The existing options (i.e. lists, sorted sets, PubSub) aren't
"good enough" for things like:
• Log/time series-like data patterns
• At-least-once messaging with fan-out
Also Disque, listpacks, radix trees & reading Kafka :)
Why reinvent hot water (in Redis)?
“
“
• Sorted Sets? Memory hungry, no `BZPOPMIN` (at that time ;)), ordering depends on a mutable score, and elements must be unique
• Lists? Inefficient access (linear), index (changeable)-based, and only have queue-like blocking operations (single consumer)
• PubSub? Fan-out, sure, but only AMO MDS
How do you model "messages" in Redis < v5?
• A project by Salvatore Sanfilippo
• Like Redis, but is "a distributed, in-memory, message broker"
• Eventually consistent (AP in CAP terms)
• Last updated: Jan 2016
• Planned to: come back as a Redis module in v6
• Observation: A Stream "API" can also be built on top of a message broker (see Kafka)
Interjection: What is Disque?
• A 1st-class citizen, a data structure like any other
• The most complex, implementation-wise
• Stores entries
• Is conceptually made up of 3 APIs:
  a. Producer
  b. Consumer
  c. Consumers Group
What is the Redis Stream?
XADD key [MAXLEN [~] n] <ID | *>
<field> <string> [<field> <string> …]
> XADD mystream * foo bar baz qaz
1532556560197-42
Time complexity: O(log n)
See https://redis.io/commands/xadd
The Redis Stream Producer API
Every entry has a unique ID that is its logical offset.
The ID is in the following format:
<epoch-milliseconds>-<sequence>
Each ID part is a 64-bit unsigned integer
Sequence is for ordering at millisecond scope
When user-provided, it has to be greater than the latest ID.
When auto-generated (`*`), max(localtime, latest) is used.
The entry ID
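The ID rules above can be sketched in a few lines of Python. This is an illustration of the rules as stated on the slide, not Redis's source: IDs are modelled as `(milliseconds, sequence)` tuples, the function names are invented, and 64-bit overflow of the sequence is ignored.

```python
import time


def next_auto_id(last_id, now_ms=None):
    """Pick the ID for an auto-generated (`*`) entry, given the stream's latest ID.

    If the local clock has not moved past the latest ID (same millisecond, or a
    clock that jumped backwards), the latest time part is reused and the
    sequence is incremented, so IDs never go backwards.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    last_ms, last_seq = last_id
    if now_ms > last_ms:
        return (now_ms, 0)
    return (last_ms, last_seq + 1)


def check_user_id(last_id, user_id):
    """A user-provided ID must be strictly greater than the latest one."""
    if user_id <= last_id:
        raise ValueError("ID must be greater than the latest ID in the stream")
    return user_id


def format_id(entry_id):
    """Render an ID as <epoch-milliseconds>-<sequence>."""
    return "%d-%d" % entry_id


latest = (1532556560197, 0)
print(format_id(next_auto_id(latest, now_ms=1532556560197)))  # same millisecond
print(format_id(next_auto_id(latest, now_ms=1532556560200)))  # clock moved on
```

Tuple comparison in Python orders by the time part first and the sequence second, which matches the ordering the ID format implies.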
Redis' Stream entries are made up of field-value
pairs. Like a Hash.
Unlike a Hash, repeating field names in consecutive
entries are compressed.
Values are not compressed. Yet.
(Time series engines often compress values, with
values being after all just numbers)
The entry itself
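One way to picture the field-name compression is the following Python sketch. It is an assumption-laden simplification, not the actual on-disk or in-memory layout: the first entry's field names act as a shared "master" list, and any entry with exactly the same field names stores only its values.

```python
def encode_entries(entries):
    """entries: list of dicts (field -> value). Returns (master_fields, rows)."""
    if not entries:
        return [], []
    master = list(entries[0].keys())
    rows = []
    for entry in entries:
        if list(entry.keys()) == master:
            # Same field names as the master entry: keep the values only.
            rows.append(("same", list(entry.values())))
        else:
            # Different shape: fall back to storing names and values.
            rows.append(("full", list(entry.items())))
    return master, rows


def decode_entries(master, rows):
    """Inverse of encode_entries."""
    decoded = []
    for kind, data in rows:
        if kind == "same":
            decoded.append(dict(zip(master, data)))
        else:
            decoded.append(dict(data))
    return decoded


readings = [
    {"temp": "21", "hum": "40"},
    {"temp": "22", "hum": "41"},
    {"wind": "3"},
]
master, rows = encode_entries(readings)
print(master)
print(rows)
```

For time series-like data, where every entry has the same fields, this means the names are paid for once rather than once per entry. Values are stored as-is, in line with "Values are not compressed. Yet."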
The `MAXLEN` option of `XADD` caps the stream's length.
The `~` means "about"; it is less expensive to use.
The stream is capped by the number of entries,
not by time frame.
The future of time-based capping is "yet unclear" - ref:
https://stackoverflow.com/questions/51168273/redis-stream-managing-a-time-frame
Side note: capped streams
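Why would "about" be cheaper than exact? A plausible mental model, sketched below in Python, is that entries live in blocks and approximate trimming only ever drops whole blocks, never reaching into one. The block size of 4 and all names are hypothetical, chosen so the effect is visible in a tiny example.

```python
from collections import deque

BLOCK_SIZE = 4  # hypothetical block capacity, kept tiny for the demo


class ToyCappedStream:
    def __init__(self):
        self.blocks = deque()  # each block is a list of up to BLOCK_SIZE entries

    def __len__(self):
        return sum(len(block) for block in self.blocks)

    def entries(self):
        return [entry for block in self.blocks for entry in block]

    def add(self, entry, maxlen=None, approx=False):
        if not self.blocks or len(self.blocks[-1]) == BLOCK_SIZE:
            self.blocks.append([])
        self.blocks[-1].append(entry)
        if maxlen is not None:
            self.trim(maxlen, approx)

    def trim(self, maxlen, approx=False):
        if approx:
            # Drop whole blocks only, and only while at least maxlen entries remain.
            while self.blocks and len(self) - len(self.blocks[0]) >= maxlen:
                self.blocks.popleft()
        else:
            # Exact: evict the oldest entries one at a time.
            while len(self) > maxlen:
                self.blocks[0].pop(0)
                if not self.blocks[0]:
                    self.blocks.popleft()


exact = ToyCappedStream()
approx = ToyCappedStream()
for n in range(10):
    exact.add(n, maxlen=5)
    approx.add(n, maxlen=5, approx=True)
print(exact.entries(), approx.entries())
```

In this model the approximate stream never holds fewer than `maxlen` entries, but may hold up to one block more.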
XLEN key - returns the number of entries, not very interesting.
X[REV]RANGE key <start | -> <end | +>
[COUNT count] - much more interesting :)
(XREVRANGE takes the bounds in reverse order: end, then start)
Get a single entry (start = end = ID)
SCAN-like iteration on a stream (IDs are inclusive), but better
Range (time frame) queries on a stream
What's in the stream "API"
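The range semantics can be sketched over a sorted Python list with `bisect`. This is a toy, with assumed names and an assumed in-memory layout of `((ms, seq), fields)` pairs; it models inclusive bounds, the `-` and `+` shorthands, and bounds given as a bare millisecond timestamp.

```python
import bisect

MAX_PART = 2**64 - 1  # each ID part is a 64-bit unsigned integer


def parse_bound(text, is_start):
    """Turn '-', '+', '<ms>' or '<ms>-<seq>' into a (ms, seq) tuple."""
    if text == "-":
        return (0, 0)
    if text == "+":
        return (MAX_PART, MAX_PART)
    if "-" in text:
        ms, seq = text.split("-")
        return (int(ms), int(seq))
    # Only the time part was given: cover that whole millisecond.
    return (int(text), 0 if is_start else MAX_PART)


def toy_xrange(entries, start, end, count=None):
    """entries: list of ((ms, seq), fields) sorted by ID. Bounds are inclusive."""
    ids = [entry_id for entry_id, _ in entries]
    lo = bisect.bisect_left(ids, parse_bound(start, True))
    hi = bisect.bisect_right(ids, parse_bound(end, False))
    result = entries[lo:hi]
    return result if count is None else result[:count]


log = [
    ((1000, 0), {"foo": "bar"}),
    ((1000, 1), {"foo": "baz"}),
    ((1005, 0), {"foo": "qaz"}),
]
print(toy_xrange(log, "-", "+"))          # everything
print(toy_xrange(log, "1000", "1000"))    # all entries within millisecond 1000
print(toy_xrange(log, "1005-0", "1005-0"))  # a single entry
```

Iterating is then a matter of calling it repeatedly with `COUNT` and a start just past the last ID seen.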
> XRANGE mystream - +
1) 1) 1532556560197-0
   2) 1) "foo"
      2) "bar"
      3) "baz"
      4) "qaz"
A "real" picture of a stream
Yes. No. Maybe.
It can be used for consuming, but that requires the
client constantly polling the stream for new entries.
So generally, no. There's something better for
consumers.
Is X[REV]RANGE the Consumer API?
XREAD [COUNT count]
STREAMS key [key ...] ID [ID ...]
Somewhat like X[REV]RANGE, but:
• Supports multiple streams
• Easier to consume from an offset onwards (compared to fetching ranges)
• But it is still polling, so...
The Redis Stream Consumer API
XREAD [COUNT count] [BLOCK ms]
STREAMS key [key ...] ID [ID ...]
• Like `BRPOP` (or `BZPOPMIN` ;))
• Supports the special `$` ID, i.e. "new messages since blockage"
• What about message delivery semantics?
The Redis Stream Consumer Blocking API
Like PubSub, it appears to be "fire and forget", i.e.
at-most-once delivery with efficient fan-out.
In contrast, messages in a stream are stored.
The consumer manages its last read ID, and can
resume from any point.
(And unlike blocking list (and zset :)) operations,
multiple consumers can consume the same stream)
XREAD [BLOCK] message delivery semantics …
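A small Python sketch of that difference, under the assumption that a reader is nothing more than a remembered last ID (the names `toy_xread` and `ToyReader` are invented; blocking is not modelled):

```python
def toy_xread(entries, last_id, count=None):
    """Return entries whose ID is strictly greater than `last_id`."""
    newer = [(entry_id, fields) for entry_id, fields in entries if entry_id > last_id]
    return newer if count is None else newer[:count]


class ToyReader:
    """The consumer, not the server, remembers how far it has read."""

    def __init__(self, last_id=(0, 0)):
        self.last_id = last_id

    def poll(self, entries, count=None):
        batch = toy_xread(entries, self.last_id, count)
        if batch:
            self.last_id = batch[-1][0]
        return batch


log = [((1, 0), {"n": "1"}), ((2, 0), {"n": "2"})]

first = ToyReader()                   # reads from the beginning
second = ToyReader()                  # an independent reader sees the same entries
tail = ToyReader(last_id=log[-1][0])  # roughly what `$` means: only newer entries

print(first.poll(log), second.poll(log), tail.poll(log))

log.append(((3, 0), {"n": "3"}))
resumed = ToyReader(last_id=first.last_id)  # e.g. after a restart, using a saved ID
print(tail.poll(log), resumed.poll(log))
```

Nothing is lost if a reader goes away, as long as it saved its last ID somewhere; whether a message gets processed twice or not at all around a crash depends on when that ID is saved.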
A consumer of a stream gets all entries in order,
and will eventually become a bottleneck. Or fail.
Possible workarounds:
• Add a "type" field to each record - that's dumb
• Shard the stream to multiple keys - meh
• Have the consumer dispatch entries as jobs in
queues or messages in a … GOTO 10
The problem with scaling consumers
Consider the Stream.
There needs to be a way to construct a
high-level (pseudo) consumer that is made up
of multiple instances running in parallel, each
processing a mutually-exclusive subset of the
entries.
Another, high-level, perspective
… allow multiple consumers to cooperate in
processing messages arriving in a stream, so that
each consumer in a given group takes a subset
of the messages.
Shifts the complexity of recovering from consumer
failures and group management to the Redis server
Consumer Groups
“
A group picture (via @antirez)
1. Members are identified
2. New members get only undelivered messages
3. Each message is delivered to only one member
4. A member can only read its messages
5. A member must explicitly acknowledge the
receipt of messages
Observation: Big Brother (Redis) is observing you
Trivia: this is where most of the effort went
(Consumer) Group membership rules
XREADGROUP GROUP
<groupname> <consumername>
[COUNT count] [BLOCK ms]
STREAMS key [key ...] ID [ID ...]
• consumername is the member's ID
• groupname is the name of the group
• The special `>` ID means "new messages", any other ID returns the consumer's history
Consumers Group API, #1
XGROUP CREATE <key> <groupname> <id or $>   // Explicit creation! And key must exist
XGROUP SETID <key> <groupname> <id or $>
XGROUP DESTROY <key> <groupname>
XGROUP DELCONSUMER <key> <groupname> <consumername>
Consumers Group API, #2
One of the internal data structures used.
Tracks which member saw which messages.
• When a new message is delivered, a new entry in the list is created
• When an "old" message is delivered, the last delivered timestamp and number of deliveries counter (for it) are updated
The Pending Entries List (PEL) is
XACK <key> <group> <id> [<id> …]
Acknowledges the receipt of messages.
(that's at-least-once message delivery semantics)
Essentially removes them from the PEL.
Observation: consumername is not required, only
an ID, so anyone can `XACK` pending messages.
Consumers Group API, #3
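Putting the membership rules, the PEL and acknowledgement together, a toy consumer group might look like the Python sketch below. It is a model of the behaviour described on these slides, not of Redis internals: `ToyGroup` and its methods are hypothetical, and a logical counter stands in for wall-clock delivery times.

```python
import itertools


class ToyGroup:
    def __init__(self, entries, start_after=(0, 0)):
        self.entries = entries             # shared list of (id, fields), sorted by ID
        self.last_delivered = start_after  # the group's cursor, shared by all members
        self.pel = {}                      # id -> consumer, deliveries, delivered_at
        self._clock = itertools.count(1)   # logical clock instead of wall time

    def read_new(self, consumer, count=None):
        """Like reading with the `>` ID: only never-delivered entries."""
        batch = [e for e in self.entries if e[0] > self.last_delivered]
        if count is not None:
            batch = batch[:count]
        for entry_id, _ in batch:
            self.pel[entry_id] = {
                "consumer": consumer,
                "deliveries": 1,
                "delivered_at": next(self._clock),
            }
            self.last_delivered = entry_id
        return batch

    def read_history(self, consumer):
        """Like reading with any other ID: this consumer's pending entries only."""
        batch = []
        for entry_id, fields in self.entries:
            record = self.pel.get(entry_id)
            if record is not None and record["consumer"] == consumer:
                record["deliveries"] += 1
                record["delivered_at"] = next(self._clock)
                batch.append((entry_id, fields))
        return batch

    def ack(self, *ids):
        """Remove entries from the PEL; returns how many were actually pending."""
        removed = 0
        for entry_id in ids:
            if self.pel.pop(entry_id, None) is not None:
                removed += 1
        return removed


jobs = [((1, 0), {"job": "a"}), ((2, 0), {"job": "b"}), ((3, 0), {"job": "c"})]
group = ToyGroup(jobs)
print(group.read_new("alice", count=2))  # alice gets the first two
print(group.read_new("bob"))             # bob gets only what is left
print(group.ack((1, 0)), sorted(group.pel))
```

Note that `ack` takes no consumer name, mirroring the observation above: whoever knows the ID can acknowledge it.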
XPENDING <key> <group>
[<start> <stop> <count>]
[<consumer>]
XCLAIM <key> <group> <consumer>
<min-idle-time>
<id> [<id> …] [MOAR]
CG introspection & handling consumer failures.
Consumers Group API, #4
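Recovery from a failed consumer can be sketched on top of a PEL modelled as a plain dict. As before this is a hedged illustration in Python: the function names and the record layout (`consumer`, `deliveries`, `delivered_at`) are assumptions, and the extra options hinted at by `[MOAR]` are not modelled.

```python
from collections import Counter


def toy_xpending_summary(pel):
    """Count pending entries per consumer."""
    return dict(Counter(record["consumer"] for record in pel.values()))


def toy_xclaim(pel, new_consumer, min_idle_ms, ids, now_ms):
    """Transfer ownership of pending entries that have been idle long enough."""
    claimed = []
    for entry_id in ids:
        record = pel.get(entry_id)
        if record is None:
            continue  # not pending: already acknowledged or never delivered
        if now_ms - record["delivered_at"] < min_idle_ms:
            continue  # delivered or claimed too recently; leave it alone
        record["consumer"] = new_consumer
        record["delivered_at"] = now_ms  # the idle time starts over
        record["deliveries"] += 1
        claimed.append(entry_id)
    return claimed


pending = {
    (1, 0): {"consumer": "alice", "deliveries": 1, "delivered_at": 1000},
    (2, 0): {"consumer": "alice", "deliveries": 1, "delivered_at": 9000},
}
print(toy_xpending_summary(pending))
print(toy_xclaim(pending, "bob", min_idle_ms=5000, ids=[(1, 0), (2, 0)], now_ms=10000))
print(toy_xpending_summary(pending))
```

The minimum idle time is what keeps two rescuers from both taking the same entry: once one of them claims it, the idle time resets and the second claim is skipped.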
XINFO <STREAM | GROUPS | CONSUMERS> <key> [<groupname>]
XDEL <key> <id> [<id> …]
XTRIM <key> [MAXLEN [~] <n>]
Streams API, some loose ends
Definitive answer!
Mebbe.
K10xby!!one
Question?
• Introduction to Redis Streams https://redis.io/topics/streams-intro
• The Redis Manifesto https://github.com/antirez/redis/blob/unstable/MANIFESTO
• Salvatore's blog posts http://antirez.com/news/114 and http://antirez.com/news/116
• Salvatore's inaugural Streams demo https://www.youtube.com/watch?v=ELDzy9lCFHQ
• Salvatore's live demo at Redis Day Tel Aviv 2018 https://www.youtube.com/watch?v=qXEyuUxQXZM
• RCP 11 - The stream data type https://github.com/redis/redis-rcp/blob/master/RCP11.md
• Reddit discussion https://www.reddit.com/r/redis/comments/4mmrgr/stream_data_structure_for_redis_lets_design_it/
• Hacker News discussion https://news.ycombinator.com/item?id=15384396
• Consumer groups specification https://gist.github.com/antirez/68e67f3251d10f026861be2d0fe0d2f4
• Consumer groups API https://gist.github.com/antirez/4e7049ce4fce4aa61bf0cfbc3672e64d
(some) Redis References
