memcache@facebook


          Marc Kwiatkowski
          memcache tech lead
          QCon




Monday, April 12, 2010
How big is facebook?




400 million active users


[Chart: monthly active user growth over time, reaching 400M active users]

Objects
            ▪   More than       million status updates posted each day
                ▪         /s
            ▪   More than       billion photos uploaded to the site each month
                ▪         /s
            ▪   More than       billion pieces of content (web links, news stories,
                blog posts, notes, photo albums, etc.) shared each week
                ▪        K/s
            ▪   Average user has 130 friends on the site
                ▪         billion friend graph edges
            ▪   Average user clicks the Like button on       pieces of content each
                month

Infrastructure
         ▪   Thousands of servers in several data centers in two regions
             ▪   Web servers
             ▪   DB servers
             ▪   Memcache Servers
             ▪   Other services




The scale of memcache @ facebook
         ▪   Memcache Ops/s
             ▪   over      M gets/sec
             ▪   over    M sets/sec
             ▪   over T cached items
             ▪   over      Tbytes
         ▪   Network IO
             ▪   peak rx      Mpkts/s   GB/s
             ▪   peak tx     Mpkts/s    GB/s




A typical memcache server’s P.O.V.
         ▪   Network I/O
             ▪   rx      Kpkts/s . MB/s
             ▪   tx      Kpkts/s      MB/s
         ▪   Memcache OPS
             ▪        K gets/s
             ▪    K sets/s
         ▪          M items




All rates are 1 day moving averages
Evolution of facebook’s
          architecture





[Slides: globe visualization of Facebook's worldwide growth and friend connections]

• When Mark Zuckerberg and his roommates started Facebook in a Harvard dorm in 2004, they put everyone
  on one server
• Then, as Facebook grew, they could scale like a traditional site by just adding servers
• Even as the site grew beyond Harvard to Stanford, Columbia, and thousands of other campuses, each was a
  separate network that could be served by an isolated set of servers
• But as people connected more between schools, the model changed--and the big change came
  when Facebook opened to everyone in Sept. 2006
• [For globe]: That led to people being connected everywhere around the world--not just on a single college
  campus.
• [For globe]: This visualization shows accepted friend requests animating from requesting friend to
  accepting friend
Scaling Facebook: Interconnected data

[Slide build: Bob, then Brian, then Felicia appear as connected friends]

•On Facebook, the data required to serve your home page or
 any other page is incredibly interconnected
•Your data can’t sit on one server or cluster of servers because
 almost every piece of content on Facebook requires
 information about your network of friends
•And the average user has 130 friends
•As we scale, we have to be able to quickly pull data across all
 of our servers, wherever it’s stored.
Memcache Rules of the Game
         ▪   GET object from memcache
             ▪   on miss, query database and SET object to memcache
         ▪   Update database row and DELETE object in memcache
         ▪   No derived objects in memcache
             ▪   Every memcache object maps to persisted data in database
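
A minimal sketch of these rules in PHP, using the standard pecl/memcached client rather than Facebook's in-house client extension; $db stands in for a PDO database handle, and the key and query names are illustrative:

    <?php
    // Look-aside caching, following the rules above: GET, on miss query the
    // database and SET; on update, change the row and DELETE the cached copy.
    $mc = new Memcached();
    $mc->addServer('127.0.0.1', 11211);   // illustrative server

    function cache_get(Memcached $mc, PDO $db, $key, $sql) {
        $obj = $mc->get($key);
        if ($obj === false && $mc->getResultCode() === Memcached::RES_NOTFOUND) {
            $obj = $db->query($sql)->fetch(PDO::FETCH_ASSOC);   // miss: query database
            $mc->set($key, $obj);                               // ...and SET the object
        }
        return $obj;
    }

    function cache_update(Memcached $mc, PDO $db, $key, $sql) {
        $db->exec($sql);     // update the database row
        $mc->delete($key);   // DELETE in memcache; the next reader repopulates it
    }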




Scaling memcache




Phatty Phatty Multiget

Phatty Phatty Multiget (notes)
         ▪   PHP runtime is single threaded and synchronous
         ▪   To get good performance for data-parallel operations like
             retrieving info for all friends, it’s necessary to dispatch memcache
             get requests in parallel
         ▪   Initially we just used polling I/O in PHP.
         ▪   Later we switched to true asynchronous I/O in a PHP C extension
         ▪   In both cases the result was reduced latency through parallelism.
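
A sketch of the batched fetch with the pecl/memcached client's getMulti; Facebook's own extension did the parallel I/O in C, and the profile: key scheme here is illustrative:

    <?php
    $mc = new Memcached();
    $mc->addServer('127.0.0.1', 11211);

    // One batched request for all friends' data instead of one
    // synchronous round trip per friend.
    $friend_ids = array(101, 102, 103);   // illustrative ids
    $keys = array();
    foreach ($friend_ids as $id) {
        $keys[] = "profile:$id";
    }

    $found   = $mc->getMulti($keys);                    // single parallel fetch
    $missing = array_diff($keys, array_keys($found));   // misses fall back to the DB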




Pools and Threads

[Diagram build: a PHP client connecting to memcache server pools, with servers labeled sp: and cs:]

Different objects have different sizes and access patterns. We began creating
memcache pools to segregate different kinds of objects for better cache efficiency and
memory utilization.

Pools and Threads (notes)
         ▪   Privacy objects are small but have poor hit rates
         ▪   User-profiles are large but have good hit rates
         ▪   We achieve better overall caching by segregating different classes
             of objects into different pools of memcache servers
         ▪   Memcache was originally a classic single-threaded unix daemon
             ▪   This meant we needed to run N instances per box, each with 1/N
                 of the server's RAM
             ▪   N× the number of connections to each box
             ▪   N× the meta-data overhead
             ▪   We needed a multi-threaded service
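
A sketch of pool segregation with one pecl/memcached client per pool; the pool names and server addresses are illustrative:

    <?php
    // Segregate object classes into separate memcache pools so that small,
    // low-hit-rate objects don't compete for memory with large, high-hit-rate ones.
    $pool_servers = array(
        'privacy' => array('10.0.1.1', '10.0.1.2'),   // small objects, poor hit rate
        'profile' => array('10.0.2.1', '10.0.2.2'),   // large objects, good hit rate
    );

    $pools = array();
    foreach ($pool_servers as $name => $servers) {
        $mc = new Memcached($name);               // one client (and hash ring) per pool
        if (count($mc->getServerList()) === 0) {  // persistent id: add servers only once
            foreach ($servers as $host) {
                $mc->addServer($host, 11211);
            }
        }
        $pools[$name] = $mc;
    }

    $privacy = $pools['privacy']->get('privacy:1234');
    $profile = $pools['profile']->get('profile:1234');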


Connections and Congestion
       ▪   [animation]




Connections and Congestion (notes)
         ▪   As we added web-servers the connections to each memcache box
             grew.
             ▪   Each webserver ran   -   PHP processes
             ▪   Each memcache box has       K+ TCP connections
             ▪   UDP could reduce the number of connections
         ▪   As we added users and features, the number of keys per-multiget
             increased
             ▪   Popular people and groups
             ▪   Platform and FBML
         ▪   We began to see incast congestion on our ToR switches.


Serialization and Compression
         ▪   We noticed our short profiles weren’t so short
             ▪       K PHP serialized object
             ▪   fb-serialization
                 ▪   based on thrift wire format
                 ▪    X faster
                 ▪       smaller
             ▪   gzcompress serialized strings
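
fb-serialization itself isn't shown in the deck; as a baseline illustration, this is the plain-PHP packing it improved on -- serialize() followed by gzcompress() before the value goes into memcache:

    <?php
    // Baseline packing: PHP's native serializer plus zlib compression.
    // fb-serialization (based on the Thrift wire format) replaced the
    // serialize()/unserialize() step with a faster, more compact encoding.
    function cache_pack($value) {
        return gzcompress(serialize($value), 6);   // level 6: speed/size tradeoff
    }

    function cache_unpack($blob) {
        return unserialize(gzuncompress($blob));
    }

    $profile = array('id' => 1234, 'name' => 'Bob', 'friends' => range(1, 130));
    printf("serialized=%d bytes, packed=%d bytes\n",
           strlen(serialize($profile)), strlen(cache_pack($profile)));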




Multiple Datacenters

                          SC Web




                            SC
                         Memcache




                                   SC MySQL




Multiple Datacenters

                          SC Web              SF Web




                            SC               SF
                         Memcache         Memcache




                                   SC MySQL




Multiple Datacenters

                           SC Web              SF Web

                         Memcache Proxy    Memcache Proxy




                            SC                SF
                         Memcache          Memcache




                                    SC MySQL




Multiple Datacenters (notes)

         ▪   In the early days we had two data-centers
             ▪   The one we were about to turn off
             ▪   The one we were about to turn on
         ▪   Eventually we outgrew a single data-center
             ▪   Still only one master database tier
             ▪   Rules of the game require that after an update we need to
                 broadcast deletes to all tiers
             ▪   The mcproxy era begins




Multiple Regions
                              West Coast           East Coast
                          SC Web                       VA Web




                            SC                          VA
                         Memcache                    Memcache




                                              Memcache Proxy


                                   SC MySQL        VA MySQL




Multiple Regions
                                 West Coast                      East Coast
                           SC Web              SF Web                VA Web

                         Memcache Proxy    Memcache Proxy




                            SC                SF                      VA
                         Memcache          Memcache                Memcache




                                                            Memcache Proxy


                                    SC MySQL                     VA MySQL




Multiple Regions
                                 West Coast                                      East Coast
                           SC Web              SF Web                                VA Web

                         Memcache Proxy    Memcache Proxy




                            SC                SF                                      VA
                         Memcache          Memcache                                Memcache




                                                                            Memcache Proxy


                                    SC MySQL            MySQL replication        VA MySQL




Multiple Regions (notes)

         ▪   Latency to east coast and European users was/is terrible.
         ▪   So we deployed a slave DB tier in Ashburn, VA
             ▪   Slave DB syncs with the master via the MySQL binlog
         ▪   This introduces a race condition
         ▪   mcproxy to the rescue again
             ▪   Add a memcache delete pragma to MySQL update and insert ops
             ▪   Added a thread to the slave mysqld to dispatch deletes on the
                 east coast via mcproxy
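
The deck doesn't show the pragma format; purely as a hypothetical illustration, the invalidation hint could ride along as a comment on the replicated statement, with the slave-side thread parsing it and issuing deletes through mcproxy. The /* MCD ... */ syntax and both helper functions are made up for this sketch:

    <?php
    // Hypothetical "memcache delete pragma": attach the keys to invalidate as a
    // comment on the SQL statement so the hint travels through binlog replication.
    function with_delete_pragma($sql, array $keys) {
        return $sql . ' /* MCD ' . implode(',', $keys) . ' */';
    }

    // Hypothetical slave-side consumer: extract the keys and dispatch deletes
    // against the local memcache tier (via mcproxy in the east-coast region).
    function keys_from_pragma($sql) {
        if (preg_match('#/\* MCD ([^*]+) \*/#', $sql, $m)) {
            return explode(',', trim($m[1]));
        }
        return array();
    }

    $sql = with_delete_pragma("UPDATE profile SET name = 'Bob' WHERE id = 1234",
                              array('profile:1234'));
    // keys_from_pragma($sql) => array('profile:1234'); issue a delete for each key.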




Replicated Keys

[Animation: three PHP clients all read one hot key from a single memcache server; the key is then
replicated as aliases (key#) across several memcache servers so that gets spread over the replicas]

Replicated Keys (notes)
         ▪   Viral groups and applications cause hot keys
         ▪   More gets than a single memcache server can process
             ▪   (Remember the rules of the game!)
             ▪   That means more queries than a single DB server can process
             ▪   That means that group or application is effectively down
         ▪   Creating key aliases allows us to add server capacity.
             ▪   Hot keys are published to all web-servers
             ▪   Each web-server picks an alias for gets
                 ▪   get key:xxx => get key:xxx#N
             ▪   Each web-server deletes all aliases
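
A sketch of the aliasing rule in PHP; the replica count and the random choice of alias are illustrative:

    <?php
    // Hot-key aliasing: each web server reads one alias at random, spreading a
    // hot key's gets over many servers; updates delete every alias.
    const HOT_KEY_REPLICAS = 10;   // illustrative replica count

    function hot_get(Memcached $mc, $key) {
        $alias = $key . '#' . mt_rand(1, HOT_KEY_REPLICAS);   // get key:xxx => get key:xxx#N
        return $mc->get($alias);
    }

    function hot_delete(Memcached $mc, $key) {
        for ($n = 1; $n <= HOT_KEY_REPLICAS; $n++) {
            $mc->delete($key . '#' . $n);                      // delete all aliases on update
        }
    }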


Memcache Rules of the Game
         ▪   New Rule
             ▪   If a key is hot, pick an alias and fetch that for reads
             ▪   Delete all aliases on updates




Mirrored Pools


                Specialized Replica                   Specialized Replica
                         Shard       Shard                 Shard        Shard




                                   General pool with wide fanout
                           Shard        Shard      Shard            Shard n

                                                              ...


Mirrored Pools (notes)
            ▪   As our memcache tier grows, the ratio of keys per packet decreases
                ▪        keys /     server  =    packet
                ▪        keys /     servers =    packets
                ▪   More network traffic
                ▪   More memcache server kernel interrupts per request
            ▪   Confirmed Info - critical account meta-data
                ▪   Have you confirmed your account?
                ▪   Are you a minor?
                ▪   Pulled from large user-profile objects
            ▪   Since we just need a few bytes of data for many users, we mirror
                this data into small specialized replica pools

Hot Misses
         ▪   [animation]




Hot Misses (notes)
         ▪   Remember the rules of the game
             ▪   update and delete
             ▪   miss, query, and set
         ▪   When the object is very, very popular, that query rate can kill a
             database server
         ▪   We need flow control!




Memcache Rules of the Game
            ▪   For hot keys, on miss grab a mutex before issuing db query
                ▪   memcache-add a per-object mutex
                    ▪    key:xxx => key:xxx#mutex
                    ▪    If add succeeds do the query
                    ▪    If add fails (because mutex already exists) back-off and try again
                    ▪    After set delete mutex
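
A sketch of the mutex rule with memcache-add; the TTL, back-off interval, and retry limit are illustrative:

    <?php
    // On a miss for a hot key, only the process that wins the add() on
    // key#mutex queries the database; everyone else backs off and retries.
    function hot_miss_get(Memcached $mc, PDO $db, $key, $sql) {
        for ($attempt = 0; $attempt < 10; $attempt++) {
            $obj = $mc->get($key);
            if ($obj !== false) {
                return $obj;                          // someone else repopulated it
            }
            if ($mc->add($key . '#mutex', 1, 3)) {    // add succeeded: we hold the mutex (3s TTL)
                $obj = $db->query($sql)->fetch(PDO::FETCH_ASSOC);
                $mc->set($key, $obj);
                $mc->delete($key . '#mutex');         // after set, delete the mutex
                return $obj;
            }
            usleep(50000);                            // add failed: back off and try again
        }
        return false;                                 // give up; caller degrades gracefully
    }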




Hot Deletes
            ▪   [hot groups graphics]




Hot Deletes (notes)
         ▪   We’re not out of the woods yet
         ▪   Cache mutex doesn’t work for frequently updated objects
             ▪   like membership lists and walls for viral groups and applications.
         ▪   Each process that acquires a mutex finds that the object has been
             deleted again
             ▪   ...and again
             ▪   ...and again




Rules of the Game: Caching Intent
         ▪   Each memcache server is in the perfect position to detect and
             mitigate contention
             ▪   Record misses
             ▪   Record deletes
             ▪   Serve stale data
             ▪   Serve lease-ids
             ▪   Don’t allow updates without a valid lease id
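
Stock memcached has no lease operation, so this is purely a hypothetical sketch of the client flow the slide implies; get_with_lease and set_with_lease stand in for server-side support that does not exist in the standard protocol:

    <?php
    // Hypothetical lease flow: on a miss the server hands back a lease-id; a set
    // is only accepted with a valid lease-id, and concurrent readers either wait
    // or are served slightly stale data instead of stampeding the database.
    function lease_get($mc_srv, PDO $db, $key, $sql) {
        while (true) {
            list($value, $lease_id) = $mc_srv->get_with_lease($key);  // hypothetical call
            if ($value !== null) {
                return $value;                   // hit (possibly stale by design)
            }
            if ($lease_id !== null) {
                $value = $db->query($sql)->fetch(PDO::FETCH_ASSOC);
                $mc_srv->set_with_lease($key, $value, $lease_id);     // refused if lease invalidated
                return $value;
            }
            usleep(20000);   // another client holds the lease: wait and retry
        }
    }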




Next Steps




Shaping Memcache Traffic
         ▪   mcproxy as router
             ▪   admission control
             ▪   tunneling inter-datacenter traffic




Cache Hierarchies
         ▪   Warming up Cold Clusters
         ▪   Proxies for Cacheless Clusters




Big Low Latency Clusters
         ▪   Bigger Clusters are Better
         ▪   Low Latency is Better
         ▪   L .
         ▪   UDP
         ▪   Proxy Facebook Architecture




Worse IS better
         ▪   Richard Gabriel’s famous essay contrasted
             ▪   ITS and Unix
             ▪   LISP and C
             ▪   MIT and New Jersey




http://www.jwz.org/doc/worse-is-better.html
Why Memcache Works
         ▪   Uniform, low latency with partial results is a better user
             experience
         ▪   memcache provides a few robust primitives
             ▪   key-to-server mapping
             ▪   parallel I/O
             ▪   flow-control
             ▪   traffic shaping
         ▪   that allow ad hoc solutions to a wide range of scaling issues




We started with simple, obvious improvements.
As we grew we deployed less obvious improvements...
But they’ve remained pretty simple
(c) Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc. All rights reserved.




