Cache Invalidation
- Jacky Lai
Two Hard Things
• There are only two hard things in Computer Science:
cache invalidation and naming things. -- Phil Karlton
• http://martinfowler.com/bliki/TwoHardThings.html
Sample Database
id name size start_time end_time
1 HAPPY SMALL 02:00 04:00
2 HAPPY SMALL 02:00 06:00
3 HAPPY LARGE 01:00 08:00
Table: product
How hard could it be?
• As long as we invalidate the cached data on every write operation, we are fine, correct?
Single Cache Value Handling
• Memcache is designed as a key-value datastore.
• If we use the PK as the cache key, retrieving and invalidating a single item is easy.
id name size start_time end_time
1 HAPPY SMALL 02:00 04:00
PK as cacheKey
Step 1: oldProduct = findOneByPK( 1L ); // read from cache
key value
1 {“id”: 1, “name”: “HAPPY”, “size”: “SMALL”, “start_time”: “02:00”, “end_time”: “04:00”}
Step 2: updateOneByPK( {“id” : 1, “name” : “SUNNY”, …} ) // replace cache value
key value
1 {“id”: 1, “name”: “SUNNY”, “size”: “SMALL”, “start_time”: “02:00”, “end_time”: “04:00”}
Step 3: newProduct = findOneByPK( 1L ); // read from cache, now returns “SUNNY”
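The three steps above can be sketched in Java as follows. This is only a minimal sketch, assuming plain in-memory maps stand in for both the database table and Memcache; the class names ProductStore and SingleItemDemo and the reduced field list are illustrative, while findOneByPK and updateOneByPK follow the names used on the slide.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical product row; only the fields needed for the example.
class Product {
    final long id;
    final String name;
    final String size;

    Product(long id, String name, String size) {
        this.id = id;
        this.name = name;
        this.size = size;
    }
}

// Assumption: plain maps stand in for the database and for Memcache.
class ProductStore {
    final Map<Long, Product> database = new HashMap<>();
    final Map<Long, Product> cache = new HashMap<>();

    // Read through the cache: on a miss, load from the database and cache it.
    Product findOneByPK(long id) {
        Product cached = cache.get(id);
        if (cached != null) {
            return cached;
        }
        Product loaded = database.get(id);
        if (loaded != null) {
            cache.put(id, loaded);
        }
        return loaded;
    }

    // Write to the database, then replace the cache value under the same PK.
    void updateOneByPK(Product updated) {
        database.put(updated.id, updated);
        cache.put(updated.id, updated);
    }
}

class SingleItemDemo {
    public static void main(String[] args) {
        ProductStore store = new ProductStore();
        store.database.put(1L, new Product(1L, "HAPPY", "SMALL"));

        Product oldProduct = store.findOneByPK(1L);              // Step 1
        store.updateOneByPK(new Product(1L, "SUNNY", "SMALL"));  // Step 2
        Product newProduct = store.findOneByPK(1L);              // Step 3

        System.out.println(oldProduct.name + " -> " + newProduct.name);
    }
}
```

Because the cache key is the PK itself, the write path always knows exactly which entry to replace.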
List Data Handling
• Memcache is not designed to handle collections of data.
• Storing a collection is easy.
• But invalidating it is challenging.
• Memcache is not a database: we cannot query it to find the right cache keys to invalidate.
List Data Handling – Cont.
• When we use a non-PK field as the cache key, we are caching a list.
id name size start_time end_time
1 HAPPY SMALL 02:00 04:00
2 HAPPY SMALL 02:00 06:00
3 HAPPY LARGE 01:00 08:00
key value
HAPPY_SMALL [
{“id”: 1, “name”: “HAPPY”, “size”: “SMALL”, “start_time”: “02:00”, “end_time”: “04:00”},
{“id”: 2, “name”: “HAPPY”, “size”: “SMALL”, “start_time”: “02:00”, “end_time”: “06:00”}
]
Name and size as cacheKey
Step 1: products = findByNameAndSize(“HAPPY”, Size.SMALL);
Issue 1: Changing Value (non-cache key)
Step 1: oldProducts = findByStartTime(“01:00”);
key value
01:00 [{“id”: 3, “name”: “HAPPY”, “size”: “LARGE”, “start_time”: “01:00”, “end_time”: “08:00”}]
id name size start_time end_time
1 HAPPY SMALL 02:00 04:00
2 HAPPY SMALL 02:00 06:00
3 HAPPY LARGE 01:00 08:00
Step 2: updateOneByPK( {“id” : 3, “name” : “SUNNY”, …} ) // “name” is not cache key (start_time)
id name size start_time end_time
1 HAPPY SMALL 02:00 04:00
2 HAPPY SMALL 02:00 06:00
3 SUNNY LARGE 01:00 08:00
Step 3: newProducts = findByStartTime(“01:00”); // served from cache, still returns the stale “HAPPY”
key value
01:00 [{“id”: 3, “name”: “HAPPY”, “size”: “LARGE”, “start_time”: “01:00”, “end_time”: “08:00”}]
Issue 1: Changing Value (non-cache key) – Cont.
• If we have the following methods:
• findBySize(Size.LARGE)
• findByStartTime(“02:00”)
• findByEndTime(“08:00”)
• After changing one product's name to “SUNNY”, which cache entries need to be invalidated?
• The number of elements in each cache entry doesn’t change. Only the cached values change.
key value
LARGE [{ … }, { … }]
02:00 [{ … }, { … }]
08:00 [{ … }, { … }]
Which one?
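Issue 1 can be reproduced with a short Java sketch. It assumes, for illustration only, that each finder caches the full product values under its query key and that in-memory maps stand in for the database and Memcache; ListCachingStore and StaleListDemo are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Immutable row, so a cached copy behaves like a serialized Memcache value.
class Product {
    final long id;
    final String name;
    final String startTime;

    Product(long id, String name, String startTime) {
        this.id = id;
        this.name = name;
        this.startTime = startTime;
    }
}

class ListCachingStore {
    final Map<Long, Product> database = new HashMap<>();
    // Assumption: one cache entry per query value, holding full product values.
    final Map<String, List<Product>> listCache = new HashMap<>();

    List<Product> findByStartTime(String startTime) {
        List<Product> cached = listCache.get(startTime);
        if (cached != null) {
            return cached;
        }
        List<Product> result = new ArrayList<>();
        for (Product p : database.values()) {
            if (p.startTime.equals(startTime)) {
                result.add(p);
            }
        }
        listCache.put(startTime, result);
        return result;
    }

    // Naive update: it cannot tell which list entries contain this product,
    // so it writes the database and leaves every cached list untouched.
    void updateOneByPK(Product updated) {
        database.put(updated.id, updated);
    }
}

class StaleListDemo {
    public static void main(String[] args) {
        ListCachingStore store = new ListCachingStore();
        store.database.put(3L, new Product(3L, "HAPPY", "01:00"));

        store.findByStartTime("01:00");                           // Step 1: fills the cache
        store.updateOneByPK(new Product(3L, "SUNNY", "01:00"));   // Step 2
        Product served = store.findByStartTime("01:00").get(0);   // Step 3

        System.out.println("database: " + store.database.get(3L).name
                + ", cache: " + served.name);
    }
}
```

The database holds the new name while the cached list keeps serving the old one, and nothing in the update call identifies which list entries to fix.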
Issue 2: Changing Value (cache key)
id name size start_time end_time
1 HAPPY SMALL 02:00 04:00
2 HAPPY SMALL 02:00 06:00
3 HAPPY LARGE 01:00 08:00
Step 1:
oldHappySmallProducts = findByNameSize(“HAPPY”, Size.SMALL);
oldHappyLargeProducts = findByNameSize(“HAPPY”, Size.LARGE);
key value
HAPPY_SMALL [{“id”: 1, “name”: “HAPPY”, “size”: “SMALL”, “start_time”: “02:00”, “end_time”: “04:00”},
{“id”: 2, “name”: “HAPPY”, “size”: “SMALL”, “start_time”: “02:00”, “end_time”: “06:00”}]
HAPPY_LARGE [{“id”: 3, “name”: “HAPPY”, “size”: “LARGE”, “start_time”: “01:00”, “end_time”: “08:00”}]
Step 2: updateOneByPK( {“id” : 2, “size” : “LARGE”, …} ) // “size” is cache key
id name size start_time end_time
1 HAPPY SMALL 02:00 04:00
2 HAPPY LARGE 02:00 06:00
3 HAPPY LARGE 01:00 08:00
Issue 2: Changing Value (cache key) – Cont.
key value
HAPPY_SMALL [{“id”: 1, “name”: “HAPPY”, “size”: “SMALL”, “start_time”: “02:00”, “end_time”: “04:00”},
{“id”: 2, “name”: “HAPPY”, “size”: “SMALL”, “start_time”: “02:00”, “end_time”: “06:00”}]
HAPPY_LARGE [{“id”: 3, “name”: “HAPPY”, “size”: “LARGE”, “start_time”: “01:00”, “end_time”: “08:00”}]
Step 3:
newHappySmallProducts = findByNameSize(“HAPPY”, Size.SMALL);
newHappyLargeProducts = findByNameSize(“HAPPY”, Size.LARGE);
HAPPY_SMALL should now contain 1 item,
HAPPY_LARGE should now contain 2 items.
Both the number of elements and the cached values have changed, but the cache still holds the old lists.
Proposed Solution:
Issue 1: Changing Value (non-cache key)
• We can do a two-phase fetch.
• http://lists.danga.com/pipermail/memcached/2007-July/004578.html
• First we fetch the list of PKs (ids), then we fetch the items by PK.
id name size start_time end_time
1 HAPPY SMALL 02:00 04:00
2 HAPPY SMALL 02:00 06:00
3 HAPPY LARGE 01:00 08:00
Step 1: primaryKeys = findPKsByStartTime(“02:00”);
key value
02:00 [1, 2]
Step 2: products = multiGet( primaryKeys );
key value
1 [{“id”: 1, “name”: “HAPPY”, “size”: “SMALL”, “start_time”: “02:00”, “end_time”: “04:00”}]
2 [{“id”: 2, “name”: “HAPPY”, “size”: “SMALL”, “start_time”: “02:00”, “end_time”: “06:00”}]
Proposed Solution:
Issue 1: Changing Value (non-cache key)
• When a non-cache-key value changes, none of the index cache entries need to change.
Step 3: updateOneByPK( {“id” : 1, “name” : “SUNNY”, …} ) // “name” is not cache key (start_time)
key value
02:00 [1, 2]
No change.
key value
1 [{“id”: 1, “name”: “SUNNY”, “size”: “SMALL”, “start_time”: “02:00”, “end_time”: “04:00”}]
2 [{“id”: 2, “name”: “HAPPY”, “size”: “SMALL”, “start_time”: “02:00”, “end_time”: “06:00”}]
Only the cache entry keyed by PK changes.
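The two-phase fetch and the update in Step 3 can be sketched in Java as follows. This is a minimal sketch under stated assumptions: in-memory maps stand in for the database and Memcache, the class name TwoPhaseStore is hypothetical, and findPKsByStartTime, multiGet and updateOneByPK follow the names used on the slides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class Product {
    final long id;
    final String name;
    final String startTime;

    Product(long id, String name, String startTime) {
        this.id = id;
        this.name = name;
        this.startTime = startTime;
    }
}

class TwoPhaseStore {
    // TreeMap keeps rows ordered by PK so the index lists are deterministic.
    final Map<Long, Product> database = new TreeMap<>();
    final Map<String, List<Long>> indexCache = new HashMap<>(); // start_time -> PKs
    final Map<Long, Product> itemCache = new HashMap<>();       // PK -> product

    // Phase 1: the list entry stores only primary keys.
    List<Long> findPKsByStartTime(String startTime) {
        List<Long> pks = indexCache.get(startTime);
        if (pks == null) {
            pks = new ArrayList<>();
            for (Product p : database.values()) {
                if (p.startTime.equals(startTime)) {
                    pks.add(p.id);
                }
            }
            indexCache.put(startTime, pks);
        }
        return pks;
    }

    // Phase 2: resolve each PK through the per-item cache.
    List<Product> multiGet(List<Long> pks) {
        List<Product> products = new ArrayList<>();
        for (Long pk : pks) {
            Product p = itemCache.get(pk);
            if (p == null) {
                p = database.get(pk);
                if (p != null) {
                    itemCache.put(pk, p);
                }
            }
            if (p != null) {
                products.add(p);
            }
        }
        return products;
    }

    // Sufficient only when the changed field is not part of any cache key:
    // the index entries are left alone and only the PK entry is replaced.
    void updateOneByPK(Product updated) {
        database.put(updated.id, updated);
        itemCache.put(updated.id, updated);
    }
}
```

A caller would run `store.multiGet(store.findPKsByStartTime("02:00"))` before and after the update; the index entry is identical both times, and only the item behind PK 1 differs.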
Proposed Solution:
Issue 1: Changing Value (non-cache key)
• Object-relational impedance mismatch.
• Relational: each child holds the parent id.
• Object: the parent stores references to its children.
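[Diagram: in the relational model, three child rows each reference parent id 1; in the object model, the parent holds children [1, 2, 3], which reference child objects 1, 2 and 3.]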
Proposed Solution:
Issue 1: Changing Value (non-cache key)
• “Two-phase fetch” follows the object model.
• In Java, a List is an object that holds references to its children.
• Benefits:
• Flexible. Change one item, and the change propagates to all queries that contain it.
• Space saving. Each item is cached once instead of once per list.
Proposed Solution:
Issue 2: Changing Value (cache key)
• Stale data happens because the cached value is out of sync with the database.
• We let them talk.
[Diagram: the database table (id, name, size) and the cache (key, value), connected by event notifications]
Product Updated Event Notification / Product Created Event Notification
{
“productId” : 2,
“oldProduct” : {},
“newProduct”: {},
“updatedFields” : [
“name”
]
}
Proposed Solution:
Issue 2: Changing Value (cache key) – Cont.
• For each query that uses an updated field as its cache key (or as part of its cache key), invalidate the backing cache entry.
• findByNameSize(“HAPPY”, Size.SMALL);
• findByNameSize(“SUNNY”, Size.LARGE);
• findByName(“HAPPY”);
Product Updated Event:
{
“productId” : 2,
“oldProduct” : {},
“newProduct”: {},
“updatedFields” : [
“name”
]
}
key value
HAPPY_SMALL [ { }, { }, … ]
SUNNY_LARGE [ { }, { }, … ]
HAPPY [ { }, { }, … ]
• findByName(“LUCKY”) // unaffected, no need to invalidate
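One way the event handling above might look in Java is sketched below. It is an illustration, not the deck's implementation: CachedQuery, CacheInvalidator and the key-building lambdas are hypothetical, the event class mirrors only the fields shown on the slide, and the cached lists are assumed to store PKs as in the two-phase fetch, so entries whose key is unaffected need no invalidation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

class Product {
    final long id;
    final String name;
    final String size;

    Product(long id, String name, String size) {
        this.id = id;
        this.name = name;
        this.size = size;
    }
}

// Mirrors the event on the slide; oldProduct is assumed null for a "created" event.
class ProductUpdatedEvent {
    final Product oldProduct;
    final Product newProduct;
    final Set<String> updatedFields;

    ProductUpdatedEvent(Product oldProduct, Product newProduct, String... updatedFields) {
        this.oldProduct = oldProduct;
        this.newProduct = newProduct;
        this.updatedFields = new HashSet<>(Arrays.asList(updatedFields));
    }
}

// Hypothetical description of one cached query: which fields form its key
// and how the key is built from a product.
class CachedQuery {
    final Function<Product, String> keyBuilder;
    final Set<String> keyFields;

    CachedQuery(Function<Product, String> keyBuilder, String... keyFields) {
        this.keyBuilder = keyBuilder;
        this.keyFields = new HashSet<>(Arrays.asList(keyFields));
    }
}

class CacheInvalidator {
    final Map<String, List<Long>> cache = new HashMap<>(); // query key -> PKs
    final List<CachedQuery> queries;

    CacheInvalidator(List<CachedQuery> queries) {
        this.queries = queries;
    }

    // Removes and returns the keys of every list entry affected by the event.
    Set<String> onProductUpdated(ProductUpdatedEvent event) {
        Set<String> stale = new LinkedHashSet<>();
        for (CachedQuery query : queries) {
            if (Collections.disjoint(query.keyFields, event.updatedFields)) {
                continue; // this query's cache key does not use an updated field
            }
            // The product leaves the entry built from its old values
            // and joins the entry built from its new values.
            if (event.oldProduct != null) {
                stale.add(query.keyBuilder.apply(event.oldProduct));
            }
            stale.add(query.keyBuilder.apply(event.newProduct));
        }
        cache.keySet().removeAll(stale);
        return stale;
    }
}
```

Registering `new CachedQuery(p -> p.name + "_" + p.size, "name", "size")` and `new CachedQuery(p -> p.name, "name")` and then sending the size change from Issue 2 (product 2, SMALL to LARGE) would drop HAPPY_SMALL and HAPPY_LARGE and leave the HAPPY entry in place.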
Summary
• We handle 3 scenarios differently:
• Invalidate a single item by primary key.
• Invalidate a collection when a non-cache-key value changes.
• Invalidate a collection when a cache-key value changes.
The End.
