Facebook Architecture



Aditya Agarwal
Director of Engineering
11/22/2008
Agenda
 1   Architecture Overview

 2   PHP, MySQL, Memcache

 3   Thrift, Scribe, Tools

 4   News Feed Architecture
At a Glance
   The Social Graph
   120M+ active users
   50B+ PVs per month
   10B+ Photos
   1B+ connections
   50K+ Platform Apps
   400K+ App Developers
General Design Principles
▪   Use open source where possible
      ▪   Explore making optimizations where needed

▪   Unix Philosophy
      ▪   Keep individual components simple yet performant
      ▪   Combine as necessary
      ▪   Concentrate on clean interface points

▪   Build everything for scale
▪   Try to minimize failure points
▪   Simplicity, Simplicity, Simplicity!
Architecture Overview

▪   LAMP (php!)
      ▪   PHP
      ▪   Memcache
      ▪   MySQL

▪   Services (!php)
      ▪   AdServer, Search, Network Selector, News Feed
      ▪   Blogfeeds, CSSParser, Mobile, ShareScraper

▪   Common framework and toolset: Thrift, Scribe, ODS, Tools
PHP

▪   Good web programming language
     ▪   Extensive library support for web development
     ▪   Active developer community


▪   Good for rapid iteration
     ▪   Dynamically typed, interpreted scripting language
PHP: What we Learnt
▪   Tough to scale for large code bases
      ▪   Weak typing
      ▪   Limited opportunities for static analysis, code optimizations


▪   Not necessarily optimized for large website use case
      ▪   E.g. No dynamic reloading of files on web server


▪   Linearly increasing cost per included file


▪   Extension framework is difficult to use
PHP: Customizations
▪   Op-code optimization
▪   APC improvements
     ▪   Lazy loading
     ▪   Cache priming
     ▪   More efficient locking semantics for variable cache data

▪   Custom extensions
     ▪   Memcache client extension
     ▪   Serialization format
     ▪   Logging, Stats collection, Monitoring
     ▪   Asynchronous event-handling mechanism
MySQL
▪   Fast, reliable


▪   Used primarily as <key,value> store
      ▪   Data randomly distributed amongst large set of logical instances
      ▪   Most data access based on global id


▪   Large number of logical instances spread out across physical nodes
      ▪   Load balancing at physical node level


▪   No read replication
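
A minimal sketch of the routing this slide describes: a global id picks a logical instance, and a lookup table maps logical instances to physical nodes. The modulo placement rule, the sizes, and the host names are assumptions made for illustration; the talk does not say how ids are mapped.

```python
# Hypothetical sizes and host names, chosen only for illustration.
NUM_LOGICAL_DBS = 8
PHYSICAL_NODES = ["db-host-a", "db-host-b"]

# Logical db -> physical node. Rebalancing edits this table; ids never change.
logical_to_physical = {
    ldb: PHYSICAL_NODES[ldb % len(PHYSICAL_NODES)]
    for ldb in range(NUM_LOGICAL_DBS)
}


def logical_db_for(global_id):
    """Assumed placement rule: global id modulo the number of logical dbs."""
    return global_id % NUM_LOGICAL_DBS


def locate(global_id):
    """Return (logical db, physical node) holding the row for global_id."""
    ldb = logical_db_for(global_id)
    return ldb, logical_to_physical[ldb]


def move_logical_db(ldb, new_node):
    """Load balance by reassigning a whole logical db to another node."""
    logical_to_physical[ldb] = new_node
```

Because lookups always go through the logical db number, moving a logical db to a new node changes one table entry and leaves every global id valid.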
MySQL: What We Learnt (ing)
▪   Logical migration of data is very difficult


▪   Create a large number of logical dbs, load balance them over varying
    number of physical nodes


▪   No joins in production
      ▪   Logically difficult (because data is distributed randomly)


▪   Easier to scale CPU on web tier
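
With rows spread across logical dbs, a join can be replaced by key lookups composed in application code on the web tier. The sketch below uses two in-memory dicts as stand-ins for separate shards; the table names and fields are hypothetical.

```python
# Two in-memory dicts stand in for separate logical databases (an assumption
# for illustration; real shards would be MySQL instances).
users_shard = {1: {"name": "alice"}, 2: {"name": "bob"}}
groups_shard = {100: {"title": "Hiking", "member_ids": [1, 2, 3]}}


def group_member_names(group_id):
    """Application-side 'join': fetch the group, then fetch each member by id."""
    group = groups_shard.get(group_id)
    if group is None:
        return []
    names = []
    for user_id in group["member_ids"]:
        user = users_shard.get(user_id)  # one key lookup per member
        if user is not None:             # tolerate rows missing from the shard
            names.append(user["name"])
    return names
```

The combining work runs on the web tier, which matches the slide's point that CPU is easier to scale there than on the database.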
MySQL: What we Learnt (ing)
▪   Most data access is for recent data
      ▪   Optimize table layout for recency
      ▪   Archive older data


▪   Don’t ever store non-static data in a central db (CDB)
      ▪   A CDB makes it easier to perform certain aggregated queries
      ▪   But it will not scale


▪   Use services or memcache for global queries
      ▪   E.g.: What are the most popular groups in my network
MySQL: Customizations
▪   No extensive native MySQL modifications


▪   Custom partitioning scheme
     ▪   Global id assigned to all data


▪   Custom archiving scheme
     ▪   Based on frequency and recency of data on a per-user basis


▪   Extended Query Engine for cross-data center replication, cache
    consistency
MySQL: Customizations
▪   Graph based data-access libraries
     ▪   Loosely typed objects (nodes) with limited datatypes (int, varchar, text)
     ▪   Replicated connections (edges)
     ▪   Analogous to distributed foreign keys


▪   Some data collocated
     ▪   Example: User profile data and all of user’s connections


▪   Most data distributed randomly
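
A toy version of a graph data-access layer with nodes and edges, as a sketch only. It assumes that "replicated connections" means each edge is stored under both endpoints; the class and method names are hypothetical and not taken from the talk.

```python
from collections import defaultdict

ALLOWED_TYPES = (int, str)  # stand-ins for the int, varchar, text datatypes


class GraphStore:
    """Toy node/edge store; each edge is written under both endpoints."""

    def __init__(self):
        self.nodes = {}                # global id -> dict of fields
        self.edges = defaultdict(set)  # global id -> ids of connected nodes

    def put_node(self, node_id, **fields):
        for name, value in fields.items():
            if not isinstance(value, ALLOWED_TYPES):
                raise TypeError(f"{name}: unsupported type {type(value).__name__}")
        self.nodes[node_id] = dict(fields)

    def connect(self, a, b):
        # Replicated edge (assumed meaning): stored on both sides, so either
        # endpoint can list its connections without asking the other's shard.
        self.edges[a].add(b)
        self.edges[b].add(a)

    def connections(self, node_id):
        return sorted(self.edges.get(node_id, ()))
```

Storing the edge twice costs extra writes but keeps "list my connections" a single-key read, which fits the key-value access pattern described earlier.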
Memcache
▪   High-Performance, distributed in-memory hash table
▪   Used to alleviate database load
▪   Primary form of caching
▪   Over 25TB of in-memory cache
▪   Average latency < 200 micro-seconds
▪   Cache serialized PHP data structures
▪   Lots and lots of multi-gets to retrieve data spanning across graph edges
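
The multi-get pattern can be sketched as: group the requested keys by the server that owns them, then send one batched request per server. `FakeCacheServer` and the crc32-based key placement are stand-ins invented for this example, not the real memcache client.

```python
import zlib


class FakeCacheServer:
    """In-process stand-in for one memcache server (illustration only)."""

    def __init__(self):
        self.data = {}
        self.requests = 0  # counts batched round trips

    def get_multi(self, keys):
        self.requests += 1
        return {k: self.data[k] for k in keys if k in self.data}


def server_index(key, num_servers):
    # crc32 is stable across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(key.encode("utf-8")) % num_servers


def multi_get(servers, keys):
    """Group keys by owning server, then send one batched get per server."""
    by_server = {}
    for key in keys:
        by_server.setdefault(server_index(key, len(servers)), []).append(key)
    found = {}
    for idx, batch in by_server.items():
        found.update(servers[idx].get_multi(batch))
    return found  # keys absent from the result are cache misses
```

The number of round trips is bounded by the number of servers rather than the number of keys, which is what makes fetching data across many graph edges affordable.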
Memcache: Customizations
▪   Memcache over UDP
     ▪   Reduce memory overhead of thousands of TCP connection buffers
     ▪   Application-level flow control (optimization for multi-gets)


▪   On demand aggregation of per-thread stats
     ▪   Reduces global lock contention


▪   Multiple Kernel changes to optimize for Memcache usage
     ▪   Distributing network interrupt handling over multiple cores
     ▪   Opportunistic polling of network interface
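
On-demand aggregation of per-thread stats can be illustrated in a few lines: each thread updates only its own counters, and totals are summed when someone asks. This is a Python analogy under that assumption, not the memcache server's actual C implementation; the class name is hypothetical.

```python
import threading


class PerThreadStats:
    """Each thread increments its own counters; totals are summed on read."""

    def __init__(self):
        self._local = threading.local()
        self._all = []                          # every thread's counter dict
        self._register_lock = threading.Lock()  # taken once per thread only

    def _counters(self):
        counters = getattr(self._local, "counters", None)
        if counters is None:
            counters = {}
            self._local.counters = counters
            with self._register_lock:
                self._all.append(counters)
        return counters

    def incr(self, name, amount=1):
        # Hot path: touches only this thread's dict, no shared lock.
        counters = self._counters()
        counters[name] = counters.get(name, 0) + amount

    def snapshot(self):
        # Cold path: aggregate on demand. Totals may lag slightly while
        # writers are still running, which is acceptable for stats.
        with self._register_lock:
            per_thread = [dict(c) for c in self._all]
        totals = {}
        for counters in per_thread:
            for name, value in counters.items():
                totals[name] = totals.get(name, 0) + value
        return totals
```

The trade is a slower, slightly stale read in exchange for a write path that never contends on a global lock.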
Let’s put this into action
Under the Covers
▪   Get my profile data
      ▪   Fetch from cache, potentially go to my DB (based on user-id)

▪   Get friend connections
      ▪   Cache, if not DB (based on user-id)

▪   In parallel, fetch last 10 photo album ids for each of my friends
      ▪   Multi-get; each individual cache miss fetches data from the db (based on
          photo-album id)

▪   Fetch data for most recent photo albums in parallel
▪   Execute page-specific rendering logic in PHP
▪   Return data, make user happy
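
The steps above can be sketched end to end with dicts standing in for memcache and the database. The key names, the sample data, and the assumption that album id lists are ordered oldest to newest are all hypothetical; the parallel fetches are shown here as sequential batched calls for simplicity.

```python
cache = {}  # stand-in for memcache
db = {      # stand-in for the MySQL tier; keys and rows are hypothetical
    "profile:1": {"name": "alice"},
    "friends:1": [2, 3],
    "albums:2": [20, 21],
    "albums:3": [30],
    "album:20": {"title": "Beach"},
    "album:21": {"title": "Snow"},
    "album:30": {"title": "City"},
}
db_reads = []  # records which keys had to go to the database


def get_many(keys):
    """Multi-get from cache; only misses go to the db, then fill the cache."""
    result = {k: cache[k] for k in keys if k in cache}
    for key in keys:
        if key not in result and key in db:
            db_reads.append(key)
            result[key] = cache[key] = db[key]
    return result


def render_home(user_id):
    base = get_many([f"profile:{user_id}", f"friends:{user_id}"])
    friends = base[f"friends:{user_id}"]
    album_lists = get_many([f"albums:{f}" for f in friends])
    # Last 10 album ids per friend (lists assumed ordered oldest to newest).
    album_ids = [a for ids in album_lists.values() for a in ids[-10:]]
    albums = get_many([f"album:{a}" for a in album_ids])
    titles = sorted(album["title"] for album in albums.values())
    name = base[f"profile:{user_id}"]["name"]
    return name + ": " + ", ".join(titles)
```

On a cold cache every key is read from the db once; a second call for the same user is served entirely from cache.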
LAMP is not Perfect
▪   PHP+MySQL+Memcache works for a large class of problems but not for
    everything
     ▪   PHP is stateless
     ▪   PHP not the fastest executing language
     ▪   All data is remote

▪   Reasons why services are written
     ▪   Store code closer to data
     ▪   Compiled environment is more efficient
     ▪   Certain functionality only present in other languages
Services Philosophy
▪   Create a service iff required
      ▪   Real overhead for deployment, maintenance, separate code-base
      ▪   Another failure point

▪   Create a common framework and toolset that will allow for easier
    creation of services
      ▪   Thrift
      ▪   Scribe
      ▪   ODS, Alerting service, Monitoring service

▪   Use the right language, library and tool for the task
Thrift




High-Level Goal: Enable transparent interaction between these languages (pictured on the original slide) …and some others too.
Thrift
▪   Lightweight software framework for cross-language development
▪   Provide IDL, statically generate code
▪   Supported bindings: C++, PHP, Python, Java, Ruby, Erlang, Perl, Haskell
    etc.
▪   Transports: Simple Interface to I/O
     ▪   TSocket, TFileTransport, TMemoryBuffer

▪   Protocols: Serialization Format
     ▪   TBinaryProtocol, TJSONProtocol

▪   Servers
     ▪   Non-Blocking, Async, Single Threaded, Multi-threaded
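
The split between transports (moving bytes) and protocols (encoding values) can be shown with a self-contained toy. The classes below are hypothetical stand-ins written for this example and are not the Thrift library's API; the wire format is a simple length-prefixed encoding chosen for illustration.

```python
import struct


class MemoryTransport:
    """Byte buffer with write/read, in the role of a memory-buffer transport."""

    def __init__(self):
        self._buf = bytearray()
        self._pos = 0

    def write(self, data):
        self._buf += data

    def read(self, n):
        chunk = bytes(self._buf[self._pos:self._pos + n])
        if len(chunk) != n:
            raise EOFError("not enough bytes in transport")
        self._pos += n
        return chunk


class TinyBinaryProtocol:
    """Encodes values onto any object exposing write(data) and read(n)."""

    def __init__(self, transport):
        self.trans = transport

    def write_i32(self, value):
        self.trans.write(struct.pack("!i", value))  # 4 bytes, big-endian

    def read_i32(self):
        return struct.unpack("!i", self.trans.read(4))[0]

    def write_string(self, text):
        raw = text.encode("utf-8")
        self.write_i32(len(raw))  # length prefix, then the bytes
        self.trans.write(raw)

    def read_string(self):
        return self.trans.read(self.read_i32()).decode("utf-8")
```

Because the protocol only depends on `write` and `read`, the same encoding could sit on a socket or a file without change, which is the point of keeping the two layers separate.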
Hasn’t this been done before?                      (yes.)


▪   SOAP
       ▪   XML, XML, and more XML

▪   CORBA
       ▪   Bloated? Remote bindings?

▪   COM
       ▪   Face-Win32ClientSoftware.dll-Book

▪   Pillar
       ▪   Slick! But no versioning/abstraction.

▪   Protocol Buffers
Thrift: Why?
•   It’s quick. Really quick.

•   Less time wasted by individual developers
     •   No duplicated networking and protocol code
     •   Less time dealing with boilerplate stuff
     •   Write your client and server in about 5 minutes


•   Division of labor
     •   Work on high-performance servers separate from applications

•   Common toolkit
     •   Fosters code reuse and shared tools
Scribe
▪   Scalable distributed logging framework
▪   Useful for logging a wide array of data
      ▪   Search Redologs
      ▪   Powers news feed publishing
      ▪   A/B testing data

▪   Weak Reliability
      ▪   More reliable than traditional logging but not suitable for database
          transactions.

▪   Simple data model
▪   Built on top of Thrift
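
A sketch of what "weak reliability" can look like in a logging client: entries are buffered while the downstream is unreachable and forwarded later, but the buffer is bounded, so old entries can be lost. It assumes the simple data model is a (category, message) pair; `BufferingLogger` and its `send` callback are hypothetical names, not Scribe's API.

```python
from collections import deque


class BufferingLogger:
    """Store-and-forward logger: buffer locally while downstream is down."""

    def __init__(self, send, max_buffered=1000):
        self._send = send  # callable(category, message) -> bool (delivered?)
        # Bounded buffer: when full, the oldest entry is dropped silently.
        self._pending = deque(maxlen=max_buffered)

    def log(self, category, message):
        self._pending.append((category, message))
        self.flush()

    def flush(self):
        """Forward buffered entries in order; stop at the first failure."""
        while self._pending:
            category, message = self._pending[0]
            if not self._send(category, message):
                return False
            self._pending.popleft()
        return True
```

This is more robust than writing to a local file and hoping, yet it offers no delivery guarantee, which is why such a pipeline is unsuitable for database transactions.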
Other Tools
▪   SMC (Service Management Console)
     ▪   Centralized configuration
     ▪   Used to determine logical service -> physical node mapping
Other Tools
▪   ODS
     ▪   Used to log and view historical trends for any stats published by service
     ▪   Useful for service monitoring, alerting
Open Source
▪   Thrift
      ▪   http://developers.facebook.com/thrift/

▪   Scribe
      ▪   http://developers.facebook.com/scribe/

▪   PHPEmbed
      ▪   http://developers.facebook.com/phpembed/

▪   More good stuff
      ▪   http://developers.facebook.com/opensource.php
NewsFeed – The Goodz
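NewsFeed – The Work
[Diagram] Components and labels, as far as they can be recovered:
▪   user requests home.php from the web tier (PHP) and gets Html back
▪   Actions (Scribe) flow from the web tier to the Leaf Servers (friends’ actions)
▪   aggregators ask the Leaf Servers for friends’ actions, then do the aggregating and ranking
▪   view state is returned to the web tier; there is a separate view state storage
▪   Most arrows indicate Thrift calls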
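
The leaf-and-aggregator arrangement in the News Feed diagram can be sketched as a fan-out query followed by a merge. The class and function names are hypothetical, and ranking by recency is an assumption made for the example; the slide does not describe the real ranking.

```python
import heapq


class LeafServer:
    """Holds recent actions for some users, in memory."""

    def __init__(self):
        self.actions = {}  # user id -> list of (timestamp, text)

    def record(self, user_id, timestamp, text):
        self.actions.setdefault(user_id, []).append((timestamp, text))

    def actions_for(self, user_ids):
        return [(ts, uid, text)
                for uid in user_ids
                for ts, text in self.actions.get(uid, [])]


def aggregate_feed(leaves, friend_ids, limit=3):
    """Ask every leaf for the friends' actions, merge, and rank."""
    candidates = []
    for leaf in leaves:
        candidates.extend(leaf.actions_for(friend_ids))
    # Ranking assumption: newest first (largest timestamp wins).
    return heapq.nlargest(limit, candidates)
```

For example, with actions recorded on two leaves, `aggregate_feed(leaves, [2, 3], limit=2)` returns the two most recent actions by users 2 and 3 and ignores everyone else.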
Search – The Goodz
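Search – The Work
[Diagram] Components and labels, as far as they can be recovered:
▪   user, web tier (PHP)
▪   Thrift between the web tier and the search tier
▪   search tier: one master index and three slave indexes
▪   Scribe carrying live change logs
▪   Indexing service, fed by live change logs and db index files
▪   DB Tier, Updates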
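
One way to picture an indexing service that consumes live change logs is an inverted index updated one log entry at a time. This is a generic sketch, not Facebook's search implementation; the entry format (doc id plus new text, with `None` meaning deleted) is an assumption.

```python
from collections import defaultdict


class InvertedIndex:
    """Maps each term to the set of document ids containing it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.docs = {}  # doc id -> current text, needed to undo old terms

    def apply_change(self, doc_id, text):
        """Apply one change-log entry; text=None means the doc was deleted."""
        for term in self.docs.pop(doc_id, "").lower().split():
            self.postings[term].discard(doc_id)
        if text is not None:
            self.docs[doc_id] = text
            for term in text.lower().split():
                self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every query term."""
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)
```

Replaying the change log in order brings the index to the current state, so the same log could in principle be applied to more than one index copy.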
Questions?

More info at www.facebook.com/eblog


Aditya Agarwal
aditya@facebook.com
