Redis Indices 
127.0.0.1:6379> CREATE INDEX _email ON user:*->email 
@itamarhaber / #RedisTLV / 22/9/2014
A Little About Myself 
A Redis Geek and Chief Developers Advocate 
at redislabs.com 
I write at http://redislabs.com/blog and edit the 
Redis Watch newsletter at 
http://redislabs.com/redis-watch-archive
Motivation 
● Redis is a Key-Value datastore -> fetching is 
always by (primary) key, and that is fast 
● Searching for keys is expensive - SCAN (or, 
god forbid, the "evil" KEYS command) 
● Searching for values in keys requires a full 
(hash) table scan & sending the data to the 
client for processing
https://twitter.com/antirez/status/507082534513963009
antirez is Right 
● Redis is a "database SDK" 
● Indices imply some kind of schema (and 
there's none in Redis) 
● Redis wasn't made for indexing 
● ... 
But despite the Creator's humble opinion, 
sometimes you still need a fast way to search :)
So What is an Index? 
"A database index is a data 
structure that improves the speed 
of data retrieval operations" 
Wikipedia, 2014 
Space-Time Tradeoff
What Can be Indexed? 
Data:  Key -> Value 
Index: Value -> Key 
• Values can be numbers or strings 
• Can be derived from "opaque" values: 
JSONs, data structures (e.g. Hash), 
functions, …
Index Operations Checklist 
1. Create index from existing data 
2. Update the index on 
a. Addition of new values 
b. Updates of existing values 
c. Deletion of keys (and also RENAME/MIGRATE…) 
3. Drop the index 
4. If needed, do index housekeeping 
5. Access keys using the index
A Simple Example: Reverse Lookup 
Assume the following database, where every 
user has a single unique email address: 
HMSET user:1 id "1" email "dfucbitz@terah.net" 
How would you go about efficiently fetching the 
user's ID given an email address?
Reverse Lookup (Pseudo) Recipe 
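import redis
r = redis.Redis(decode_responses=True)  # assumes a local Redis server

def idxEmailAdd(email, id):  # 2.a
    # SETNX only sets the key if it does not already exist
    if not r.setnx("_email:" + email, id):
        raise Exception("INDEX_EXISTS")

def idxEmailCreate():  # 1
    # SCAN iterates the keyspace incrementally, unlike KEYS
    for u in r.scan_iter(match="user:*"):
        id, email = r.hmget(u, "id", "email")
        idxEmailAdd(email, id)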
Reverse Lookup Recipe, more admin 
def idxEmailDel(email):  # 2.c
    r.delete("_email:" + email)

def idxEmailUpdate(old, new, id):  # 2.b
    # Remove the old entry, then index the same user under the new email
    idxEmailDel(old)
    idxEmailAdd(new, id)

def idxEmailDrop(): ...  # similar to Create
Reverse Lookup Recipe, integration 
def addUser(json):
    ...
    idxEmailAdd(email, id)
    ...

def updateUser(json): ...
Reverse Lookup Recipe, usage 
def getUser(id):
    return r.hgetall("user:" + id)

TA-DA!

def getUserByEmail(email):  # 5
    return getUser(r.get("_email:" + email))
Reverse Lookup Recipe, Analysis 
● Asymptotic computational complexity: 
o Creating the index: O(N), N is no. of values 
o Adding a new value to the index: O(1) 
o Deleting a value from the index: O(1) 
o Updating a value: O(1) + O(1) = O(1) 
o Deleting the index: O(N), N is no. of values 
● What about memory? Every key in Redis 
takes up some extra space...
Hash Index 
_email = { "dfucbitz@terah.net": 1, 
"foo@bar.baz": 2 ... } 
● Small lookups (e.g. countries) → single key 
● Big lookups → partitioned to "buckets" (e.g. 
by email address hash value) 
More info: http://redis.io/topics/memory-optimization
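The bucket partitioning mentioned above could be sketched roughly as follows. This is only an illustration, not the speaker's implementation: the helper name `bucket_for`, the `_email:<n>` key layout, the bucket count and the use of CRC32 are all assumptions made for the example.

```python
import zlib

NUM_BUCKETS = 1024  # assumption: tune so that each bucket Hash stays small


def bucket_for(email, num_buckets=NUM_BUCKETS):
    """Return the (hypothetical) Hash key that holds this email's index entry."""
    # CRC32 is stable across processes, unlike Python's built-in hash() for str.
    bucket = zlib.crc32(email.encode("utf-8")) % num_buckets
    return "_email:%d" % bucket


# With a redis-py client `r`, the index entry would then be written and read as:
#   r.hset(bucket_for(email), email, user_id)
#   r.hget(bucket_for(email), email)
key = bucket_for("dfucbitz@terah.net")
print(key)
```

The same email always lands in the same bucket, so a lookup touches exactly one Hash.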
Always Remember 
That You Are Absolutely 
Unique 
(Just Like Everyone Else)
Uniqueness 
The lookup recipe makes the assumption that 
every user has a single email address and that 
it's unique (i.e. 1:1 relationship). 
What happens if several keys (users) have the 
same indexed value (email)?
Non-Uniqueness with Lists 
Use lists instead of using Redis' strings/hashes. 
To add: 
r.lpush("_email:" + email, id) # 2.a 
Simple. What about accessing the list for writes 
or reads? Naturally, getting all of the list's 
members is O(N) but...
What?!? WTF do you mean O(N)?!? 
Because a Redis List is essentially a linked list, 
traversing it requires up to N operations 
(LINDEX, LRANGE…). That 
means that updates & deletes 
are O(N) 
Conclusion: suitable when N (i.e. number of 
duplicate index entries) is smallish (e.g. < 10)
OT: A Tip for Traversing Lists 
Lists don't have LSCAN, but with 
RPOPLPUSH you can easily do a 
circular list pattern and go over all 
the members in O(N) w/o copying 
the entire list. 
More at: http://redis.io/commands/rpoplpush
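A rough sketch of the circular list pattern follows. So that it runs without a server, it uses a tiny in-memory stand-in (`TinyListStore`, a hypothetical class written for this example) that mimics only the list calls needed; `visit_all` is likewise an illustrative name. With redis-py, `llen` and `rpoplpush` are the corresponding client methods.

```python
class TinyListStore:
    """Hypothetical in-memory stand-in for the list commands used below."""

    def __init__(self):
        self.lists = {}

    def rpush(self, key, *values):
        self.lists.setdefault(key, []).extend(values)

    def llen(self, key):
        return len(self.lists.get(key, []))

    def rpoplpush(self, src, dst):
        # Pop the tail of src and push it onto the head of dst, like RPOPLPUSH.
        if not self.lists.get(src):
            return None
        value = self.lists[src].pop()
        self.lists.setdefault(dst, []).insert(0, value)
        return value


def visit_all(r, key):
    """Visit every member once by rotating the list onto itself."""
    seen = []
    for _ in range(r.llen(key)):  # one rotation per member -> O(N) overall
        seen.append(r.rpoplpush(key, key))
    return seen


r = TinyListStore()
r.rpush("_email:foo@bar.baz", "2", "3", "5")
print(visit_all(r, "_email:foo@bar.baz"))
```

After a full pass the list is back in its original order. The length is read once up front, so if other clients modify the list during the pass, members may be skipped or visited twice.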
Back to Non-Uniqueness - Hashes 
Use Hashes to store multiple index values: 
r.hset("_email:" + email, id, "") # 2.a (the field's value is unused) 
Great - still O(1). How about deleting? 
r.hdel("_email:" + email, id) # 2.c 
Another O(1). 
Non-Uniqueness, Sets Variant 
r.sadd("_email:" + email, id) # 2.a 
Great - still O(1). How about deleting? 
r.srem("_email:" + email, id) # 2.c 
Another O(1).
List vs. Hash vs. Set for NUIVs* 
* Non-Unique Index Value 
● Memory: List ~= Set ~= Hash (N < 100) 
● Performance: List < Set, Hash 
● Unlike a List's elements, Set members and 
Hash fields are: 
o Unique - meaning you can't index the same key 
more than once (makes sense). 
o Unordered - a non-issue for this type of index. 
o SCANable (SSCAN, HSCAN) 
● Forget Lists, use Sets or Hashes.
Forget Hashes, Sets are Better 
Because of the Set operations: 
SUNION, SDIFF, SINTER 
Endless possibilities, including 
matchmaking: 
SINTER _interest:devops _hair:blond _gender:...
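To make the Set operations concrete, here is a small model that uses Python sets in place of Redis Sets. The index contents and the helper names `sinter`, `sunion` and `sdiff` are invented for the illustration; against a real server the commands of the same names would do this work server-side.

```python
# Hypothetical index contents: one Set of user IDs per indexed value.
index = {
    "_interest:devops": {"1", "2", "5"},
    "_interest:redis": {"1", "5"},
    "_hair:blond": {"2", "5", "7"},
}


def sinter(*keys):
    """Model of SINTER: members present in every named Set (missing key = empty)."""
    sets = [index.get(k, set()) for k in keys]
    return set.intersection(*sets) if sets else set()


def sunion(*keys):
    """Model of SUNION: members present in at least one named Set."""
    return set().union(*(index.get(k, set()) for k in keys))


def sdiff(first, *others):
    """Model of SDIFF: members of the first Set that are in none of the others."""
    return index.get(first, set()) - sunion(*others)


print(sorted(sinter("_interest:devops", "_hair:blond")))
```

Each additional indexed attribute is just one more key in the intersection, which is what makes Sets attractive for multi-criteria queries.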
[This Slide has No Title] 
NULL means no value and Redis is all about 
values. 
When needed, arbitrarily decide on a value for 
NULLs (e.g. "<null>") and handle it 
appropriately in code.
Index Cardinality (~= unique values) 
● High cardinality/no duplicates -> use a Hash 
● Some duplicates -> use Hash and "pointers" 
to Sets 
_email = { "dfucbitz@terah.net": 1, 
"foo@bar.baz": "*" ...} 
_email:foo@bar.baz = { 2, 3 } 
● Low cardinality is, however, another story...
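The Hash-plus-pointers layout could be read back along these lines. The sketch models the `_email` Hash and the pointed-to Set with plain Python containers; the function name `ids_for_email` and the `POINTER` constant are assumptions for the example, and the comments name the Redis commands each step stands for.

```python
# Hypothetical in-memory model of the two structures shown above.
email_hash = {"dfucbitz@terah.net": "1", "foo@bar.baz": "*"}  # the _email Hash
email_sets = {"_email:foo@bar.baz": {"2", "3"}}               # the pointed-to Sets

POINTER = "*"  # marker meaning "look in the Set named _email:<value>"


def ids_for_email(email):
    """Return the set of IDs indexed under an email (empty if not indexed)."""
    entry = email_hash.get(email)  # HGET _email <email>
    if entry is None:
        return set()
    if entry == POINTER:
        return set(email_sets.get("_email:" + email, set()))  # SMEMBERS
    return {entry}


print(ids_for_email("dfucbitz@terah.net"), sorted(ids_for_email("foo@bar.baz")))
```

Unique values cost a single lookup; only duplicated values pay for the second round trip to the Set.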
Low Cardinality 
When an indexed attribute has a small number 
of possible values (e.g. Boolean, gender...): 
● If distribution of values is 50:50, consider not 
indexing it at all 
● If distribution is heavily unbalanced (5:95), 
index only the smaller subsets and full-scan the rest 
● Use a bitmap index if possible
Bitmap Index 
Assumption: key names are ordered 
How: a Bitset where a bit's position maps to a 
key and the bit's value is the indexed value: 
_isLoggedIn = /100…/ 
first bit (1) -> dfucbitz is logged in 
second bit (0) -> foo isn't logged in
Bitmap Index, cont. 
More than 2 values? Use n Bitsets, where n is 
the number of possible indexed values, e.g.: 
_isFromTerah = /100.../ 
_isFromEarth = /010.../ 
Bonus: BITOP AND / OR / XOR / NOT 
BITOP NOT _ET _isFromEarth 
BITOP AND onlineET _isLoggedIn _ET
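The two BITOP calls can be followed step by step with a small model. Lists of 0/1 stand in for the Redis bitsets here, and the sample data and helper names (`bit_not`, `bit_and`, `members`) are made up for the illustration.

```python
# Hypothetical bitsets for three users; position i holds the bit for user i + 1.
is_logged_in = [1, 0, 1]
is_from_earth = [0, 1, 0]


def bit_not(bits):
    """Model of BITOP NOT: flip every bit."""
    return [1 - b for b in bits]


def bit_and(a, b):
    """Model of BITOP AND: 1 only where both inputs are 1."""
    return [x & y for x, y in zip(a, b)]


def members(bits):
    """Translate set bits back into user IDs (1-based, by position)."""
    return [i + 1 for i, b in enumerate(bits) if b]


et = bit_not(is_from_earth)            # BITOP NOT _ET _isFromEarth
online_et = bit_and(is_logged_in, et)  # BITOP AND onlineET _isLoggedIn _ET
print(members(online_et))
```

One difference from this model: Redis strings are byte-aligned, so BITOP NOT also flips the unused padding bits of the last byte. ANDing the result with a bitmap that only has bits set for real users, as the second command does here, masks those out.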
Interlude: Redis Indices Save Space 
Consider the following: in a relational database 
you need "x2" space: for the indexed data 
(stored in a table) and for the index itself. 
With most Redis indices, you don't have to 
store the indexed data -> space saved :)
Numerical Ranges with Sorted Sets 
Numerical values, including timestamps 
(epoch), are trivially indexed with a Sorted Set: 
ZADD _yearOfBirth 1972 "1" 1961 "2"... 
ZADD _lastLogin 1411245569 "1" 
Use ZRANGEBYSCORE and 
ZREVRANGEBYSCORE for range queries
Ordered "Composite" Numerical Indices 
Use Sorted Set scores that are constructed 
according to the sort (range) order. Store two values in one 
score using the integer and fractional parts: 
user:1 = { "id": "1", "weightKg": "82", 
"heightCm": "218", ... } 
score = weightKg + ( heightCm / 1000 )
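A short worked version of this score, under the assumption that heightCm is a non-negative integer below 1000 (otherwise it would spill into the integer part). The function names are illustrative only.

```python
def composite_score(weight_kg, height_cm):
    """Primary sort field in the integer part, secondary in the fraction."""
    if not 0 <= height_cm < 1000:
        raise ValueError("height_cm must be in [0, 1000) to stay in the fraction")
    return weight_kg + height_cm / 1000


def split_score(score):
    """Recover (weight_kg, height_cm); round() absorbs floating-point error."""
    weight_kg = int(score)
    height_cm = round((score - weight_kg) * 1000)
    return weight_kg, height_cm


score = composite_score(82, 218)
print(score, split_score(score))
```

Scores built this way sort by weight first and by height within equal weights, which is what a range query over the Sorted Set relies on.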
"Composite" Numerical Indices, cont. 
For more "complex" sorts (up to 53 bits of 
precision), you can construct the score like so: 
user:1 = { "id": "1", "weightKg": "82", 
"heightCm": "218", "IQ": "100", ... } 
score = weightKg * 1000000 + 
heightCm * 1000 + IQ 
Adapted from: 
http://www.dr-josiah.com/2013/10/multi-column-sql-like-sorting-in-redis.html
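The integer packing can be sketched like this. The function names and the range checks are additions for the example; the checks assume heightCm and IQ each fit in three decimal digits, as the multipliers in the formula imply, and that the packed score stays below 2**53 so a double can hold it exactly.

```python
MAX_EXACT = 2 ** 53  # doubles represent every integer up to this bound exactly


def pack(weight_kg, height_cm, iq):
    """Pack three fields into one score; the most significant field sorts first."""
    if not (0 <= height_cm < 1000 and 0 <= iq < 1000):
        raise ValueError("height_cm and iq must fit in three decimal digits")
    score = weight_kg * 1000000 + height_cm * 1000 + iq
    if score >= MAX_EXACT:
        raise ValueError("score would lose precision as a double")
    return score


def unpack(score):
    """Recover (weight_kg, height_cm, iq) from a packed score."""
    weight_kg, rest = divmod(int(score), 1000000)
    height_cm, iq = divmod(rest, 1000)
    return weight_kg, height_cm, iq


print(pack(82, 218, 100))  # 82218100
```

Because each field occupies its own block of digits, comparing two scores compares weight, then height, then IQ.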
Full Text Search (Almost) (v2.8.9+) 
ZRANGEBYLEX on Sorted Set members that 
have the same score is handy for prefix 
searches (trailing wildcard), e.g. dfuc*, a-la 
autocomplete: http://autocomplete.redis.io/ 
Tip: by storing the reversed string (gnirts) you 
can also do suffix searches (leading wildcard), 
e.g. *terah.net, just as easily.
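The lexicographic range trick can be modelled with a sorted Python list and `bisect`. This is a sketch under the assumption of ASCII members (Redis compares bytes, Python compares code points); the sample emails and function names are invented for the example.

```python
import bisect

emails = ["dfucbitz@terah.net", "dfuchs@earth.org", "foo@bar.baz"]  # made-up data

# A Sorted Set whose members share one score is ordered lexicographically,
# which a sorted Python list models well enough for this sketch.
forward = sorted(emails)
backward = sorted(e[::-1] for e in emails)  # the same members, reversed


def starting_with(members, prefix):
    """Model of a ZRANGEBYLEX prefix range on a sorted list."""
    lo = bisect.bisect_left(members, prefix)
    hi = bisect.bisect_right(members, prefix + "\xff")
    return members[lo:hi]


def ending_with(suffix):
    """Search the reversed members by reversed suffix, then reverse back."""
    return [m[::-1] for m in starting_with(backward, suffix[::-1])]


print(starting_with(forward, "dfuc"))
print(ending_with("terah.net"))
```

Keeping both the forward and the reversed members costs twice the memory, which is the usual space-time tradeoff of an extra index.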
Another Nice Thing With Sorted Sets 
By combining the use of two of these, it is 
possible to map ranges to keys (or just data). 
For example, which range does 5 fall into? 
ZADD min 1 "low" 4 "medium" 7 "high" 
ZADD max 3 "low" 6 "medium" 9 "high" 
ZREVRANGEBYSCORE min 5 -inf LIMIT 0 1 
ZRANGEBYSCORE max 5 +inf LIMIT 0 1 
Both queries return "medium", so 5 is medium.
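The two-Sorted-Set lookup can be traced with a small model. Sorted lists of (score, member) pairs stand in for the `min` and `max` Sorted Sets; the function name `classify` and the rule of returning nothing when the two answers disagree are assumptions of this sketch.

```python
import bisect

# Hypothetical model of the two Sorted Sets as (score, member) lists sorted by score.
range_min = [(1, "low"), (4, "medium"), (7, "high")]
range_max = [(3, "low"), (6, "medium"), (9, "high")]


def classify(value):
    """Return the range containing value, or None if it falls in no range."""
    # ZREVRANGEBYSCORE min <value> -inf LIMIT 0 1: greatest lower bound <= value
    i = bisect.bisect_right([s for s, _ in range_min], value) - 1
    # ZRANGEBYSCORE max <value> +inf LIMIT 0 1: smallest upper bound >= value
    j = bisect.bisect_left([s for s, _ in range_max], value)
    if i < 0 or j >= len(range_max):
        return None
    below, above = range_min[i][1], range_max[j][1]
    return below if below == above else None


print(classify(5))  # medium
```

Comparing the two answers matters for values that fall into a gap: 3.5 is above the lower bound of "low" but below the upper bound of "medium", so the members differ and no range matches.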
Binary Trees 
Everybody knows that 
binary trees are really useful 
for searching and other stuff. 
You can store a binary tree 
as an array in a Sorted Set: 
(Happy 80th Birthday!)
Why stop at binary trees? BTrees! 
@thinkingfish from Twitter explained that they 
took the BSD implementation of BTrees and 
welded it into Redis (open source rulez!). This 
allows them to do efficient (speed-wise, not 
memory) key and range lookups. 
http://highscalability.com/blog/2014/9/8/how-twitter-uses-redis-to-scale-105tb-ram-39mm-qps-10000-ins.html
Index Atomicity & Consistency 
In a relational database the index is (hopefully) 
always in sync with the data. 
You can strive for that in Redis, but: 
• Your code will be much more complex 
• Performance will suffer 
• There will be bugs/edge cases/extreme 
uses…
The Opposite of Atomicity & Consistency 
On the other extreme, you could consider 
implementing indexing with a: 
• Periodic process (lazy indexing) 
• Producer/Consumer pattern (i.e. queue) 
• Keyspace notifications 
You won't have any guarantees, but you'll be 
offloading the index creation from the app.
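The producer/consumer variant might look roughly like this. Everything here is an in-memory stand-in invented for the example: dicts replace the user Hashes and the `_email:*` keys, and a `deque` replaces the Redis List that would serve as the queue.

```python
from collections import deque

users = {}         # stand-in for the user:* Hashes
email_index = {}   # stand-in for the _email:* keys
changes = deque()  # stand-in for a Redis List used as a queue


def save_user(user_id, email):
    """Producer: the app writes the data and only records that it changed."""
    old = users.get(user_id)
    users[user_id] = email
    changes.append((user_id, old, email))


def run_indexer():
    """Consumer: applies queued changes to the index some time later."""
    while changes:
        user_id, old, new = changes.popleft()
        if old is not None and email_index.get(old) == user_id:
            del email_index[old]
        email_index[new] = user_id


save_user("1", "dfucbitz@terah.net")
stale = dict(email_index)  # the index has not caught up yet
run_indexer()
print(stale, email_index)
```

Between `save_user` and `run_indexer` a lookup by email misses the new user, which is exactly the lack of guarantees the slide warns about.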
Indices, Lua & Clustering 
Server-side scripting is an obvious 
consideration for implementing a lot (if 
not all) of the indexing logic. But ... 
… in a cluster setup, a script runs on 
a single shard and can only access the 
keys there -> no guarantee that a key 
and an index are on the same shard.
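One way Redis Cluster lets related keys share a shard is hash tags: when a key contains a non-empty `{...}` section, only that section is hashed to pick the slot. The sketch below reimplements the slot calculation (CRC16 modulo 16384) in plain Python to show the effect; the key names are hypothetical and the function names are chosen for the example.

```python
def crc16_xmodem(data):
    """CRC16 variant used by Redis Cluster (polynomial 0x1021, initial value 0)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key):
    """Hash slot of a key, hashing only the {hash tag} when one is present."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:  # ignore an empty tag such as "{}"
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


# Hypothetical key names: both share the tag "user:1", hence the same slot.
print(key_slot("{user:1}:data") == key_slot("{user:1}:_tags"))
```

This helps for per-entity indices. A global index such as `_email` can only be colocated with every user key if all of them share one tag, which would put the whole dataset on a single shard.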
Don't Think – Copy-Paste! 
For even more "inspiration" you can review the 
source code of popular ORM libraries for 
Redis, for example: 
• https://github.com/josiahcarlson/rom 
• https://github.com/yohanboniface/redis-limpyd