Caching techniques in Python
                                Michael Domanski
                                europython 2010


Thursday, 22 July 2010
who I am

                     • Python developer, working professionally for a few
                          years now
                     • also experienced in C and Objective-C
                     • currently working for 10clouds.com


Interesting intro

                     • a bit of theory
                     • common patterns
                     • common problems
                     • common solutions

How I think about
                               cache

                     • imagine a giant dict storing all your data
                     • you have to manage all data manually
                     • or provide some automated behaviour


similar to....

                     • manual memory management in C
                     • cache is memory
                     • and you have to control it manually


profits


                     • improved performance
                     • ...?


problems


                     • managing any type of memory is hard
                     • automation often has to be custom-built
                          each time




common patterns



memoization



• very old pattern (circa 1968)
                     • we owe the name to Donald Michie



how it works


                     • we associate input with output, and store
                          it somewhere
                     • based on the assumption that for a given
                          input, the output is always the same
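
A minimal sketch of the idea above, assuming the function is pure and its arguments are hashable. The names `memoize`, `slow_square` and `calls` are illustrative and do not come from the talk.

```python
import functools

def memoize(func):
    """Associate each distinct input with its output and reuse it."""
    results = {}  # maps the call arguments to the stored output

    @functools.wraps(func)
    def wrapper(*args):
        if args not in results:
            results[args] = func(*args)  # first call: compute and store
        return results[args]             # later calls: read from the store

    wrapper.results = results  # exposed only so the store can be inspected
    return wrapper

calls = []  # records every real computation

@memoize
def slow_square(n):
    calls.append(n)
    return n * n

slow_square(4)
slow_square(4)  # served from the store; the function body does not run again
slow_square(5)
```

Here the arguments themselves are the cache key, which only holds up while the "same input, same output" assumption does.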




code example
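                 import functools

                 CACHE_DICT = {}  # process-wide store: key -> cached result

                 def cached(key):
                     """Cache the decorated function's result under a fixed key."""
                     def func_wrapper(func):
                         @functools.wraps(func)  # keep func's name and docstring
                         def arg_wrapper(*args, **kwargs):
                             # the key ignores the arguments: one entry per function
                             if key not in CACHE_DICT:
                                 CACHE_DICT[key] = func(*args, **kwargs)
                             return CACHE_DICT[key]
                         return arg_wrapper
                     return func_wrapper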




what if output can
                               change?

                     • our pattern is still useful
                     • we simply need to add something


cache invalidation



There are only two hard problems in Computer
                           Science: cache invalidation and naming things
                                                                  Phil Karlton


• basically, we update data in cache
                     • we need to know when and what to
                          change

                     • the more granular you want to be, the
                          harder it gets




code example
                 def invalidate(key):
                     try:
                         del CACHE_DICT[key]
                     except KeyError:
                         # the key was never cached, or was already invalidated
                         print("someone tried to invalidate a key that is not present: %s" % key)




common problems



invalidating too much/
                                not enough

                     • flushing all data any time something changes
                     • not flushing cache at all
                     • tragic effects


@cached('key1')
                 def simple_function1():
                     return db_get(id=1)

                 @cached('key2')
                 def simple_function2():
                     return db_get(id=2)

                 # SUPPOSE THIS IS IN ANOTHER MODULE

                 @cached('big_key1')
                 def some_bigger_function():
                     """
                     this function depends on big_key1, key1 and key2
                     """
                     def inner_workings():
                         db_set(1, 'something totally new')
                     #######
                     ##   imagine 100 lines of code here :)
                     ######
                     inner_workings()

                     return [simple_function1(), simple_function2()]

                 if __name__ == '__main__':
                     simple_function1()
                     simple_function2()
                     a,b = some_bigger_function()
                     assert a == db_get(id=1), "this fails because we didn't invalidate the cache properly"
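
The failure above can be reproduced in a few self-contained lines. This is a sketch under assumptions: `DB`, `db_get` and `db_set` are in-memory stand-ins for the database calls used on the slide, and `cached` and `invalidate` are compact copies of the dict-based helpers shown earlier.

```python
CACHE_DICT = {}
DB = {1: 'old value'}  # hypothetical in-memory stand-in for the database

def db_get(id):
    return DB[id]

def db_set(id, value):
    DB[id] = value

def cached(key):
    def func_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            if key not in CACHE_DICT:
                CACHE_DICT[key] = func(*args, **kwargs)
            return CACHE_DICT[key]
        return arg_wrapper
    return func_wrapper

def invalidate(key):
    CACHE_DICT.pop(key, None)  # dropping a missing key is a no-op

@cached('key1')
def simple_function1():
    return db_get(id=1)

simple_function1()                  # warms the cache with 'old value'
db_set(1, 'something totally new')  # write without invalidation
stale = simple_function1()          # still the old value: not enough invalidation

invalidate('key1')                  # drop exactly the key that depends on row 1
fresh = simple_function1()          # recomputed from the database
```

The write and the invalidation are two separate steps here, which is exactly what makes them easy to separate by a hundred lines of code.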




invalidating too soon/
                                  too late

                     • your cache has to be synchronised with your
                          db
                     • sometimes very hard to spot
                     • leads to tragic mistakes


@cached('key1')
                 def simple_function1():
                     return db_get(id=1)

                 @cached('key2')
                 def simple_function2():
                     return db_get(id=2)

                 # SUPPOSE THIS IS IN ANOTHER MODULE

                 def some_bigger_function():
                     db_set(1, 'something')
                     value = simple_function1()
                     db_set(2, 'something else')
                     #### now we know we used 2 cached functions so....
                     invalidate('key1')
                     invalidate('key2')
                     #### now we know we are safe, but for a price
                     return simple_function2()

                 if __name__ == '__main__':
                     some_bigger_function()




superposition of
                               dependencies
                     • a somewhat less obvious problem
                     • eventually you will start caching the results of
                          computations
                     • you have to know very precisely what
                          your data depends on



@cached('key1')
                 def simple_function1():
                     return db_get(id=1)

                 @cached('key2')
                 def simple_function2():
                     return db_get(id=2)

                 # SUPPOSE THIS IS IN ANOTHER MODULE

                 @cached('key')
                 def some_bigger_function():

                     return {
                         '1': simple_function1(),
                         '2': simple_function2(),
                         '3': db_get(id=3)
                     }

                 if __name__ == '__main__':
                     simple_function1()
                     # somewhere else
                     db_set(1, 'foobar')
                     # and again
                     db_set(3, 'bazbar')
                     invalidate('key')
                     # ooops, we forgot something
                     data = some_bigger_function()
                     assert data['1'] == db_get(id=1), "this fails because we didn't invalidate all the keys"
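
One possible way to avoid forgetting a key is to record which cached keys are computed from which others and to invalidate them together. The sketch below is only an illustration of that idea, not something shown in the talk: the `DEPENDENTS` table, the recursive `invalidate`, and the in-memory `DB` are all assumptions.

```python
CACHE_DICT = {}
DB = {1: 'a', 2: 'b', 3: 'c'}  # hypothetical in-memory stand-in for the database

# For each key, the keys whose cached value was computed from it.
DEPENDENTS = {
    'key1': ['key'],
    'key2': ['key'],
}

def cached(key):
    def func_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            if key not in CACHE_DICT:
                CACHE_DICT[key] = func(*args, **kwargs)
            return CACHE_DICT[key]
        return arg_wrapper
    return func_wrapper

def invalidate(key):
    """Drop key and, recursively, every key computed from it."""
    CACHE_DICT.pop(key, None)
    for dependent in DEPENDENTS.get(key, []):
        invalidate(dependent)

@cached('key1')
def simple_function1():
    return DB[1]

@cached('key2')
def simple_function2():
    return DB[2]

@cached('key')
def some_bigger_function():
    return {'1': simple_function1(), '2': simple_function2(), '3': DB[3]}

some_bigger_function()   # fills 'key', 'key1' and 'key2'
DB[1] = 'foobar'
invalidate('key1')       # also drops 'key', but leaves 'key2' cached
data = some_bigger_function()
```

The table still has to be kept accurate by hand, so this moves the bookkeeping to one place rather than removing it.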




summing up
                     • know your data....
                     • be aware of what you cache and when
                     • take care when using cached data in
                          computations




common solutions



process level cache



why?

                     • very fast access
                     • simple to implement
                     • very effective as long as you’re using a single
                          process
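
For a single process, the standard library also offers a ready-made in-memory cache, `functools.lru_cache` (added in Python 3.2, so after this talk was given). The sketch below shows it next to the hand-written dict approach; `FAKE_DB`, `db_reads` and `load_user` are hypothetical names used only for the demonstration.

```python
import functools

FAKE_DB = {1: 'alice', 2: 'bob'}  # hypothetical stand-in for a slow data source
db_reads = []                     # records every real read

@functools.lru_cache(maxsize=128)
def load_user(user_id):
    db_reads.append(user_id)
    return FAKE_DB[user_id]

load_user(1)
load_user(1)                    # cache hit, no read recorded
load_user(2)
info = load_user.cache_info()   # hit/miss counters kept by lru_cache

FAKE_DB[1] = 'carol'
load_user.cache_clear()         # coarse invalidation: drops every entry
after_clear = load_user(1)
```

`cache_clear()` flushes everything at once, so it sits at the "invalidating too much" end of the trade-off described earlier.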




clever tricks with dicts



code example
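                 import functools

                 CACHE_DICT = {}  # process-wide store: key -> cached result

                 def cached(key):
                     """Cache the decorated function's result under a fixed key."""
                     def func_wrapper(func):
                         @functools.wraps(func)  # keep func's name and docstring
                         def arg_wrapper(*args, **kwargs):
                             # the key ignores the arguments: one entry per function
                             if key not in CACHE_DICT:
                                 CACHE_DICT[key] = func(*args, **kwargs)
                             return CACHE_DICT[key]
                         return arg_wrapper
                     return func_wrapper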




invalidation



code example
                 def invalidate(key):
                     try:
                         del CACHE_DICT[key]
                     except KeyError:
                         # the key was never cached, or was already invalidated
                         print("someone tried to invalidate a key that is not present: %s" % key)




application level cache



memcache



• battle tested
                     • scales
                     • fast
                     • supports a few cool features
                     • behaves a lot like dict
                     • supports time-based expiration
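
Time-based expiration can be illustrated without a running memcached server. The class below is an in-process stand-in, not memcached's implementation: `TTLCache`, its injectable `clock` parameter, and the `exp_time` argument name are assumptions modelled loosely on a dict-like get/set interface.

```python
import time

class TTLCache:
    """Dict-like cache whose entries can expire after exp_time seconds."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}     # key -> (value, expires_at or None)

    def set(self, key, value, exp_time=0):
        # exp_time == 0 means the entry never expires on its own
        expires_at = self._clock() + exp_time if exp_time else None
        self._store[key] = (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._store[key]  # expired: behave like a miss
            return None
        return value

now = [100.0]                        # fake clock, advanced by hand
cache = TTLCache(clock=lambda: now[0])
cache.set('session', 'abc', exp_time=30)
before = cache.get('session')        # read inside the 30-second window
now[0] += 31
after = cache.get('session')         # read after the window has passed
```

Expiry of this kind bounds how long a stale value can survive, at the cost of recomputing values that never changed.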
libraries?

                     • python-memcache
                     • python-libmemcache
                     • python-cmemcache
                     • pylibmc

why no benchmarks

                     • not the point of this talk :)
                     • benchmarks are generic, caching is specific
                     • pick your flavour, think for yourself


code example
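                 import memcache  # module provided by the python-memcached package

                 cache = memcache.Client(['localhost:11211'])

                 def memcached(key):
                     def func_wrapper(func):
                         def arg_wrapper(*args, **kwargs):
                             value = cache.get(str(key))
                             # get() returns None on a miss; "not value" would also
                             # recompute falsy results such as 0, '' or []
                             if value is None:
                                 value = func(*args, **kwargs)
                                 cache.set(str(key), value)
                             return value
                         return arg_wrapper
                     return func_wrapper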




invalidation



code example
                 def mem_invalidate(key):
                     # remove the entry so that the next get() is a miss
                     cache.delete(str(key))




batch key management



• what if I don’t want to expire each key
                          manually?

                     • that’s a lot to remember
                     • and we have to be careful :(


groups?

                     • group keys into sets
                     • which are tied to one key per set
                     • expire one key, instead of twenty


how to get there?

                     • store some extra data
                     • you can store dicts in cache
                     • and cache behaves like dict
                     • so it’s a case of comparing keys and values

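                 # the slide showed a bare fragment; the wrapper name get_grouped is illustrative
                 def get_grouped(memcached_client, key, group):
                     # we start with a specified key and group,
                     # e.g. key='some_key', group='some_group'

                     # retrieve both entries at once; get_multi takes a list of keys
                     data = memcached_client.get_multi([key, group])
                     # now data is a dict that should look like
                     # {'some_key': {'group_key': '1234',
                     #               'value': 'some_value'},
                     #  'some_group': '1234'}
                     if data and (key in data) and (group in data):
                         # the entry is valid only while its stored group_key
                         # still matches the group's current value
                         if data[key]['group_key'] == data[group]:
                             return data[key]['value']
                     return None  # miss, or the group was invalidated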




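          import functools

          # cache is the memcache client created earlier; tools.make_key and
          # make_group_value are helpers that are not shown on the slides

          def cached(key, group_key='', exp_time=0):
              # we don't want to mix time based and event based expiration models
              if group_key:
                  assert exp_time == 0, "can't set expiration time for grouped keys"

              def f_wrapper(func):
                  @functools.wraps(func)  # replaces the manual __name__ copy
                  def arg_wrapper(*args, **kwargs):
                      value = None
                      data = {}
                      if group_key:
                          data = cache.get_multi([tools.make_key(group_key), tools.make_key(key)])
                          data_dict = data.get(tools.make_key(key))
                          if data_dict:
                              value = data_dict['value']
                              # a changed or missing group value invalidates the entry
                              if data_dict['group_value'] != data.get(tools.make_key(group_key)):
                                  value = None
                      else:
                          # read with the same key transformation that set() uses below
                          value = cache.get(tools.make_key(key))
                      if value is None:  # "not value" would also recompute 0, '' or []
                          value = func(*args, **kwargs)
                          if exp_time:
                              cache.set(tools.make_key(key), value, exp_time)
                          elif not group_key:
                              cache.set(tools.make_key(key), value)
                          else:  # exp_time not set and we have group_keys
                              # reuse the group's current value when it exists (assuming
                              # make_group_value returns a fresh value on each call), so
                              # caching one key does not expire the rest of the group
                              group_value = data.get(tools.make_key(group_key)) or make_group_value(group_key)
                              data_dict = {'value': value, 'group_value': group_value}
                              cache.set_multi({tools.make_key(key): data_dict,
                                               tools.make_key(group_key): group_value})
                      return value
                  return arg_wrapper
              return f_wrapper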
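
The grouped decorator above needs a memcached server and helpers that are not shown on the slides. The following simplified sketch runs against a plain dict so the group mechanism can be tried in isolation. `STORE`, `grouped_cached`, `invalidate_group`, `_group_value`, and the use of `uuid` for group values are all assumptions made for illustration, not the author's implementation.

```python
import uuid

STORE = {}  # plain dict standing in for memcached

def _group_value(group):
    """Return the group's current token, creating one if it is missing."""
    if group not in STORE:
        STORE[group] = uuid.uuid4().hex
    return STORE[group]

def grouped_cached(key, group):
    def f_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            current = _group_value(group)
            entry = STORE.get(key)
            # an entry counts as a hit only if it carries the current group token
            if entry is not None and entry['group_value'] == current:
                return entry['value']
            value = func(*args, **kwargs)
            STORE[key] = {'value': value, 'group_value': current}
            return value
        return arg_wrapper
    return f_wrapper

def invalidate_group(group):
    """Expire every key in the group by changing a single value."""
    STORE[group] = uuid.uuid4().hex

calls = []  # records every real computation

@grouped_cached('user:1:name', group='user:1')
def get_name():
    calls.append('name')
    return 'alice'

@grouped_cached('user:1:email', group='user:1')
def get_email():
    calls.append('email')
    return 'alice@example.com'

get_name(); get_email()      # both computed and stored
get_name(); get_email()      # both served from STORE
invalidate_group('user:1')   # one write expires both entries
get_name(); get_email()      # both recomputed
```

Stale entries are never deleted here; they are simply ignored once their token no longer matches, which is why this style pairs naturally with a store that evicts old data on its own.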



questions?



code samples @
                       http://github.com/mdomans/europython2010

follow me

                 twitter: mdomans
                 blog:    blog.mdomans.com


