6.1. Web Scale
34330 EEDC: Execution Environments for Distributed Computing
Master in Computer Architecture, Networks and Systems (CANS)
6.1.1. Anatomy of a service
6.1.2. Too many writes to database
6.1.3. Cheaper peaks
6.1.4. Facebook Platform
Anatomy of a Web Service
Problems may arise in… Various browsers, plugins, operating systems, performance, screen size, PEBKAC, etc.
Problems may arise in… Internet partitioning, performance bottlenecks, packet loss, jitter
Problems may arise in… DDoS targeting another customer, routing problems, capacity, power/cooling problems, «lazy» remote hands
Problems may arise in… Performance limits, bugs, configuration errors, faulty HW
Problems may arise in… Network limits, interrupt limits, OS limits, bugs, configuration errors, faulty HW, error recovery
Problems may arise in… Speed of clients, #threads, content not in sync, unresponsive apps, too many sources of content, user persistence, configuration errors, bugs
Problems may arise in… [Figure: requests/sec; sizes shown: 100 KB, 5 MB, 50 KB, 5 KB, 50 KB, 50 KB] The default configuration of Tomcat allows 200 threads/instance
Problems may arise in… Database concurrency, access to 3rd party data (APIs), CPU or memory bound problems, datacenter replication, logging user actions
Problems may arise in… Database concurrency, modifying schemas, massive tables -> indexes, disk performance, CPU/memory bound, datacenter replication
Problems may arise in… Availability and performance, more than 24h to analyze daily logs, not reaching the Inbox (spam folders), surpassing monitoring capacity
Too many writes to database
There's no machine that could do 44k/sec over 1 TB of data.
Scaling reads is easier:
- Big cache
- Replication
On write you have to:
- Update data
- Update transaction log
- Update indexes
- Invalidate cache
- Replicate
- Write to 2 or more disks (RAID x)
http://www.scribd.com/doc/2592098/DVPmysqlucFederation-at-Flickr-Doing-Billions-of-Queries-Per-Day
Case: Database federation
- Sharding per User-ID
- Global Ring: know where the data is
- PHP logic to connect to the shards and keep data consistent
What's a shard?
- Horizontal partitioning of a table, usually by primary key
Benefits
- You can scale as long as you have budget
Disadvantages
- You lose the possibility of doing any JOIN, COUNT or RANGE between shards
- Your application logic has to be aware of the shards
- If you want to rebalance shards, you will need some kind of globally unique ID; beware of auto-increments
- More services needing HA, BCP, change control, and so on
Case: Global Ring?
Storing key-value pairs of:
- User_ID -> Shard_ID
- Photo_ID -> User_ID
- Group_ID -> Shard_ID
Every access to data has to know where the data lives -> memcached with a TTL of 30 minutes
Global IDs?
- You don't want two objects with the same ID!
Strategies
- GUIDs: 128-bit IDs, so bigger indexes, and poorly supported by MySQL
- Central auto-increment: you have a table where, for every ID needed, you do an insert and let MySQL take care of everything. At 60 photos/sec it will be a BIG table
- REPLACE INTO: a MySQL-only solution, with small tables, that allows for redundancy (one server provides odd IDs and another even IDs)
Case: REPLACE INTO
The Tickets64 schema looks like:
    CREATE TABLE `Tickets64` (
      `id` bigint(20) unsigned NOT NULL auto_increment,
      `stub` char(1) NOT NULL default '',
      PRIMARY KEY (`id`),
      UNIQUE KEY `stub` (`stub`)
    ) ENGINE=MyISAM
SELECT * from Tickets64 returns a single row that looks something like:
    +-------------------+------+
    | id                | stub |
    +-------------------+------+
    | 72157623227190423 | a    |
    +-------------------+------+
When they need a new globally unique 64-bit ID they issue the following SQL:
    REPLACE INTO Tickets64 (stub) VALUES ('a');
    SELECT LAST_INSERT_ID();
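To illustrate the odd/even redundancy, here is a small simulation of two ticket servers. It assumes each server is configured with MySQL's `auto_increment_increment = 2` and a different `auto_increment_offset` (1 and 2); the slides only say that one server provides odd IDs and the other even IDs, so that configuration is an assumption. `TicketServer` and `TicketClient` are hypothetical names, and no real database is involved.

```python
import itertools


class TicketServer:
    """Simulates one Tickets64 table; each call mimics REPLACE INTO + LAST_INSERT_ID()."""

    def __init__(self, offset, increment):
        self._ids = itertools.count(offset, increment)
        self.available = True

    def next_id(self):
        if not self.available:
            raise ConnectionError("ticket server is down")
        return next(self._ids)


class TicketClient:
    """Round-robins between ticket servers and skips any that are down."""

    def __init__(self, servers):
        self._servers = servers
        self._turn = 0

    def next_id(self):
        for _ in range(len(self._servers)):
            server = self._servers[self._turn % len(self._servers)]
            self._turn += 1
            try:
                return server.next_id()
            except ConnectionError:
                continue  # try the other server
        raise RuntimeError("no ticket server available")


odd = TicketServer(offset=1, increment=2)   # hands out 1, 3, 5, ...
even = TicketServer(offset=2, increment=2)  # hands out 2, 4, 6, ...
client = TicketClient([odd, even])
```

Because the two sequences never overlap, IDs stay unique even when one server is unavailable; the trade-off is that IDs are no longer strictly ordered across servers.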
Case: PHP logic
- You lose any kind of inter-shard relational query (no JOINs)
- You lose any kind of referential integrity (no foreign keys)
- You have to control distributed transactions
When you select a Favorite, they need to update your shard and the other user's shard:
- Open 2 connections, one to each shard
- Begin a transaction on both shards
- Add the data
- If everything is OK -> commit; else roll back and report an error
So we improve scalability, but at the cost of code complexity and the performance of a single page view (hint: async database access)
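The four steps for adding a Favorite could look roughly like the sketch below. It uses two in-memory SQLite databases in place of MySQL shards, and the `favorites` table and `add_favorite` function are invented for illustration. Note that this is not a true two-phase commit: if the first commit succeeds and the second fails, the shards can still diverge.

```python
import sqlite3


def open_shard():
    """Each in-memory SQLite database stands in for one MySQL shard."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transactions
    conn.execute(
        "CREATE TABLE favorites ("
        "user_id INTEGER, photo_id INTEGER, UNIQUE (user_id, photo_id))"
    )
    return conn


def add_favorite(my_shard, owner_shard, user_id, photo_id):
    """Write the favorite to both shards, or to neither if any insert fails."""
    shards = (my_shard, owner_shard)
    for conn in shards:
        conn.execute("BEGIN")  # begin a transaction on both shards
    try:
        for conn in shards:
            conn.execute(
                "INSERT INTO favorites VALUES (?, ?)", (user_id, photo_id)
            )
    except sqlite3.Error:
        for conn in shards:
            conn.execute("ROLLBACK")  # something failed: undo on both shards
        return False
    for conn in shards:
        conn.execute("COMMIT")  # everything is OK: commit on both shards
    return True
```

Both connections are held open for the whole operation, which is part of why the slide hints at asynchronous database access to protect page-view latency.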
Case
- They get an arbitrarily scalable infrastructure
- They have marginally more complex code
They "only" have 20 engineers, so scalability also means:
- Roughly 2.5 million Flickr members per engineer.
- Roughly 200 million photos per engineer.
- 28 user-facing pages. 23 administrative pages.
- 20 API methods, though only 7.5 public API methods.
- 80 API calls per second.
- 250 CPUs.
- 850 annual deploys.
- 16 feature flags.
Cheaper peaks
- If your capacity planning comes from the aggregate of all your customers and you plan to have thousands of them, what could you do?
- And your performance impacts the brand of your customer (so you'll have problems)
- You are a start-up without loads of money
Case: What does a recommendation engine look like?
Case
- Have to store data for every page view their customer gets
- Do MAGIC over millions of rows to calculate related items for YOU
- Show recommendations to the user
- Only 2 snippets of JavaScript/HTML
- Less than 0.5 seconds per view
Case: Option A
- Every hit to the tracker becomes an INSERT into a MySQL database sharded by customer
- Every hit to the recommender recalculates the list of items to show, based on collective intelligence
Benefits
- Straightforward to code and manage
- Quick and easy for a proof of concept
Disadvantages
- One customer at their peak could exceed the capacity of the MySQL instance
- The same customer in their valley could be wasting money on an idle instance
- Our web server could be overloaded by the sum of all our customers
- The recommender is a CPU and memory hog, and we would need too many servers to cope with our estimated demand
Case: Option B
- Every hit to the tracker becomes an INSERT into a MySQL database sharded by customer
- We have a cron job that recalculates different sets of related items in advance
- Every hit to the recommender gets the corresponding set of items from the DB
Benefits
- Straightforward to code
- The compute-intensive task is out of the critical path; it's asynchronous
Disadvantages
- One customer at their peak could exceed the capacity of the MySQL instance
- The same customer in their valley could be wasting money on an idle instance
- Our web server could be overloaded by the sum of all our customers
- We have to keep track of what our cron jobs are doing, check them for errors and tune them so they don't bring down the database
Case: Option C
- Every hit to the tracker is only a request for a static image file with various parameters: /a.gif?b=1&c=2&…
- We have a cron job that takes the log files from the web servers plus the items stored in the database and recalculates different sets of related items in advance
- Every hit to the recommender gets the corresponding set of items from the DB (sharded by customer)
Benefits
- Straightforward to code; we only had to move and parse files
- A surge in page views doesn't bring down the database with writes
- The compute-intensive task is out of the critical path; it's asynchronous
Disadvantages
- One customer at their peak could exceed the capacity of the MySQL instance
- The same customer in their valley could be wasting money on an idle instance
- We have to keep track of what our cron jobs are doing, check them for errors and tune them so they don't bring down the database
- We could hit bandwidth limits
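As an illustration of the "move and parse files" step, the sketch below pulls the tracking parameters out of web server request lines and counts views. The log line format and the meaning of the `b` parameter are assumptions made for this example; the slides only show the URL shape `/a.gif?b=1&c=2&…`.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def parse_tracker_hit(request_line):
    """Extract parameters from a line like 'GET /a.gif?b=1&c=2 HTTP/1.1'.

    Returns None for anything that is not a tracker hit.
    """
    parts = request_line.split()
    if len(parts) < 2:
        return None
    url = urlsplit(parts[1])
    if url.path != "/a.gif":
        return None
    # parse_qs returns a list per key; keep the first value of each.
    return {key: values[0] for key, values in parse_qs(url.query).items()}


def count_views(lines, key="b"):
    """Count tracker hits grouped by one parameter (hypothetically, an item ID)."""
    counts = Counter()
    for line in lines:
        hit = parse_tracker_hit(line)
        if hit and key in hit:
            counts[hit[key]] += 1
    return counts
```

A cron job could feed each rotated log file through `count_views` and hand the aggregated counts to the recalculation step, so no page view ever turns into a database write on the critical path.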
Case: Option D
- Every hit to the tracker is only a request for a static image file with various parameters: /a.gif?b=1&c=2&…
- We have a cron job that takes the log files from the web servers plus the items stored in the database and recalculates different sets of related items in advance
- Every hit to the recommender gets the corresponding set of items from the DB
- Went the Hadoop/HBase way: no more sharding
Benefits
- Easy to add and remove data servers on demand, so no waste and no limits here
- A surge in page views only costs money; as we get paid per page view, that's OK
- The compute-intensive task is out of the critical path; it's asynchronous
Disadvantages
- Beta software, poor documentation/examples
- We have more complexity in our infrastructure
- We could hit bandwidth limits
Case: Map/Reduce
- Hadoop: it's "only" a framework for running Map/Reduce applications on large clusters. It provides replication and fault tolerance, since HW failure will be the norm, using a distributed file system, HDFS
- Map/Reduce: in a Map/Reduce application there are two kinds of tasks, Map and Reduce
  - Mappers read the HDFS blocks, do local processing and run in parallel. From a web server log file they emit <url, #hits>
  - Reducers get the output of many mappers and consolidate the data. If there was one mapper per day, a reducer could calculate how many monthly hits a URL gets
- HBase: the Hadoop/MR design favors throughput over latency, so it's used as an analytical platform, but HBase allows low-latency random access to very big tables (billions of rows by millions of columns)
  - Column-oriented DB: Table -> Row -> ColumnFamily -> Timestamp => Value
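The mapper and reducer roles described above can be mimicked in a few lines of plain Python. This sketch runs in a single process and does not use Hadoop or HDFS at all; the function names and the sample URLs are made up, and each "daily log" is simplified to a list of requested URLs.

```python
from collections import Counter, defaultdict


def map_daily_log(urls):
    """Mapper: turn one day's requested URLs into <url, #hits> pairs."""
    return list(Counter(urls).items())


def reduce_hits(mapper_outputs):
    """Reducer: consolidate the pairs emitted by many mappers into totals per URL."""
    totals = defaultdict(int)
    for pairs in mapper_outputs:
        for url, hits in pairs:
            totals[url] += hits
    return dict(totals)


# Two hypothetical daily logs; in Hadoop each would be processed by its own mapper.
day_1 = ["/a", "/b", "/a"]
day_2 = ["/a", "/c"]
monthly_hits = reduce_hits([map_daily_log(day_1), map_daily_log(day_2)])
```

The mappers never need to see each other's data, which is what lets a real cluster run them in parallel next to the blocks they read.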
Case: Option E
- Every hit to the tracker is only a request for a static image file with various parameters: /a.gif?b=1&c=2&…
- We have a cron job that takes the log files from the web servers plus the items stored in the database and recalculates different sets of related items in advance
- Every hit to the recommender gets the corresponding set of items from the DB
- Went the Hadoop/HBase way: no more sharding
- All static files served by a CDN
Benefits
- Easy to add and remove data servers on demand, so no waste and no limits here
- A surge in page views only costs money; as we get paid per page view, that's OK
- The compute-intensive task is out of the critical path; it's asynchronous
- Unlimited bandwidth
Disadvantages
- Beta software, poor documentation/examples
- We have more complexity in our infrastructure
Case: CDN
What's a Content Delivery Network?
- Your server or HTTP repository (Amazon S3, ...) is the origin of the content
- They give you a DNS name (bb.cdn.net) and you have to create a CNAME pointing to it (www.example.com -> bb.cdn.net.)
- When a user asks for www.example.com, the CDN chooses which of its nodes is nearest to the user and returns its IP address(es)
- The user asks the CDN node for some content (/a.gif). The node checks whether it has a fresh copy to send; on a MISS it checks with its upstream caches, all the way up to your origin
So we get unlimited bandwidth and better latency (we can't surpass the speed of light)
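The hit/miss behaviour of a CDN node can be sketched as a chain of caches that ends at the origin. This is a toy model under simplifying assumptions: a single fixed TTL decides freshness, there is no DNS or HTTP involved, and `Origin` and `CacheNode` are hypothetical names.

```python
class Origin:
    """Stands in for your server or HTTP repository."""

    def __init__(self, files):
        self.files = files
        self.requests = 0  # how many requests reached the origin

    def fetch(self, path):
        self.requests += 1
        return self.files[path]


class CacheNode:
    """A CDN node: serve a fresh copy if it has one, otherwise ask upstream."""

    def __init__(self, upstream, ttl_seconds, clock):
        self._upstream = upstream
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}

    def fetch(self, path):
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and now < entry[1]:
            return entry[0]  # HIT: the copy is still fresh
        body = self._upstream.fetch(path)  # MISS: go one level up
        self._store[path] = (body, now + self._ttl)
        return body
```

With two edge nodes sharing one parent cache, the first request for `/a.gif` travels all the way to the origin, while later requests from either edge are absorbed by the caches until the copies expire.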
Case
- They get a completely scalable infrastructure at AWS
- They can provision a new Cruncher, Datastore or Recommender in a matter of minutes and remove it as soon as it is no longer needed
- They don't have any upper limit on how many requests they can serve
- All the requests that can affect the user experience of their customers are served by a CDN
- As there are only 3 kinds of servers and they are managed as images, they don't need many engineers to take care of the infrastructure
Facebook Platform
If your primary data source is not under your control and it's too far away, what happens?
An API case
Case: Duplicated gifts
Case: Loving it. More «Pongos». Hitting the bullseye?
Case
- It's a social wish list application
- When you access it, it checks whether your friends have enabled the application and shows their wish lists
- You can share your wish lists on Facebook
- You can capture wishes (gifts) and be shown a feed of possible merchants
- Initial loading time is critical
- Expect virality, so we won't have much time to react
Case: Flow
Case: Nice but slow. 3 to 7 seconds to load
Case
- Define goals
- Define metrics
- Analyze metrics
- Improve one at a time
Case: Goals
- Time to load < 1 second
- Everything works
Case: Metrics
- Time to session setup
  - Validating to Facebook
  - Getting friends information
  - Lookups to local database (lists, items, captured items)
- Time to load «home» page
  - Get HTML
  - Get widgets
  - Get JavaScript
  - Get various graphic assets
Case: Analyzing metrics
- Time to session setup
  - Validating to Facebook (300 ms)
  - Getting friends information (3 sec)
  - Lookups to local database (lists, items, captured items) (30 ms)
- Time to load «home» page
  - Get HTML (400 ms)
  - Get widgets (300 ms)
  - Get JavaScript (300 ms)
  - Get various graphic assets (500 ms)
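Adding up the measurements above shows where the time goes. The short calculation below assumes the steps run one after another with no overlap, which the slides do not state; under that assumption the total comes to 4830 ms, inside the observed 3 to 7 second range.

```python
# Measurements quoted on the slide, in milliseconds.
session_setup_ms = {
    "Validating to Facebook": 300,
    "Getting friends information": 3000,
    "Lookups to local database": 30,
}
home_page_ms = {
    "Get HTML": 400,
    "Get widgets": 300,
    "Get JavaScript": 300,
    "Get various graphic assets": 500,
}


def total_ms(*phases):
    """Sum every step of every phase, assuming strictly sequential execution."""
    return sum(sum(phase.values()) for phase in phases)


all_steps = {**session_setup_ms, **home_page_ms}
total = total_ms(session_setup_ms, home_page_ms)
slowest_step = max(all_steps, key=all_steps.get)
share = all_steps[slowest_step] / total

print(f"Total: {total} ms")
print(f"Slowest step: {slowest_step} ({share:.0%} of the total)")
```

Getting friends information dominates the total, which is why it is the natural first target for "improve one at a time".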
Case: Facebook access
[Diagram: From / To]
From 3 seconds to 500 ms!
Case: Facebook access
- In ASP.NET we "only" have 12 threads/CPU -> only 12 concurrent requests. From 4 users/sec to 24/sec
- We could use asynchronous calls, but:
  - Low parallelism: if we don't know the result of GetAppUsers, we can't ask for GetUserInfo, so no speedup
- We could increase the default #threads to another number (.NET 4.0 defaults to 5000/CPU)
- We can gain resilience to failures by adjusting timeouts and increasing threads, connections, and so on
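The jump from 4 to 24 users/sec follows from dividing the number of threads by the time each request keeps a thread busy. The sketch below assumes a single CPU and that each request is blocked on Facebook for the whole 3 seconds (before) or 500 ms (after); the function name is made up.

```python
def max_users_per_sec(threads, seconds_blocked):
    """Upper bound on throughput when every request holds one thread while it waits."""
    if seconds_blocked <= 0:
        raise ValueError("seconds_blocked must be positive")
    return threads / seconds_blocked


THREADS_PER_CPU = 12  # ASP.NET default quoted on the slide

before = max_users_per_sec(THREADS_PER_CPU, 3.0)  # Facebook access takes 3 seconds
after = max_users_per_sec(THREADS_PER_CPU, 0.5)   # Facebook access takes 500 ms

print(f"Before: {before:.0f} users/sec, after: {after:.0f} users/sec")
```

The same formula shows why raising the thread limit also helps: with a fixed wait time, throughput grows linearly with the number of threads until some other resource runs out.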
Case: Leveraging "free" tools
- Set a far-future Expires header on static files
  - Users leverage their browser's cache, and the load on the server side is lighter
- Use a "free" CDN to get jQuery et al.
  - Microsoft and Google provide a public and free repository of JavaScript tools
- Use CSS sprites
  - Although graphic files are small, they need a TCP connection to retrieve. Combine most graphic assets into one big file and use CSS to select which one to show
    #nav li a {background-image:url('../img/image_nav.gif')}
    #nav li a.item1 {background-position:0px 0px}
    #nav li a:hover.item1 {background-position:0px -72px}
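The `background-position` values in a sprite sheet follow a simple pattern, so they can be generated rather than typed by hand. The sketch below assumes equally sized tiles stacked vertically, 72px tall as in the CSS above; the function name and the second item are hypothetical.

```python
def sprite_rules(selector_prefix, names, tile_height):
    """Build one CSS rule per tile, shifting the shared image up by a tile each time."""
    rules = []
    for index, name in enumerate(names):
        y = -index * tile_height  # tile 0 at 0px, tile 1 at -72px, and so on
        rules.append(
            f"{selector_prefix}.{name} {{background-position:0px {y}px}}"
        )
    return rules


for rule in sprite_rules("#nav li a", ["item1", "item2"], 72):
    print(rule)
```

Every rule points into the same downloaded image, so adding a new icon costs a few bytes of CSS instead of an extra request.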
Case: more on sprites
- Average size: 2 KB/file
- HTTP/1.1 (RFC 2616) suggests that browsers download no more than 2 components in parallel per hostname
- Small files don't use all the available bandwidth. TCP slow start…
- Latency also plays an important role
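A rough way to see why latency dominates for small files: with at most 2 downloads in parallel per hostname, the files arrive in waves, and each wave costs about one round trip. The model below ignores transfer time and connection setup, and the file count and round-trip time are made-up numbers.

```python
import math


def fetch_time_ms(n_files, rtt_ms, parallel=2):
    """Estimate download time as one round trip per wave of parallel requests."""
    return math.ceil(n_files / parallel) * rtt_ms


# Hypothetical page: 20 small icons, 100 ms round-trip time.
separate_files = fetch_time_ms(20, 100)  # 10 waves of 2 requests
one_sprite = fetch_time_ms(1, 100)       # a single request for the combined image

print(f"20 separate files: ~{separate_files} ms, one sprite: ~{one_sprite} ms")
```

Under these assumptions the sprite is ten times faster even though the total number of bytes is about the same.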
About this session
Sergi Morales, Founder & CTO of Expertos en TI
Phone: +34 6688-XPNTI
Email: sergi.morales+eedc@expertosenti.com
Blog: http://blog.expertosenti.com
Web: http://www.expertosenti.com
Expertos en TI: We help Internet-oriented projects to leverage all the research done by the big sites (Flickr, Facebook, Twitter, Salesforce, Google, and so on) so they can improve their bottom line and be prepared for growth
About the EEDC course
34330 Execution Environments for Distributed Computing (EEDC)
Master in Computer Architecture, Networks and Systems (CANS)
Computer Architecture Department (AC)
Universitat Politècnica de Catalunya – Barcelona Tech (UPC)
ECTS credits: 6
INSTRUCTOR
Professor Jordi Torres
Phone: +34 93 401 7223
Email: torres@ac.upc.edu
Office: Campus Nord, Modul C6. Room 217.
Web: http://www.JordiTorres.org

EEDC 2010. Scaling Web Applications

  • 1. 6.1. Web Scale34330EEDCExecutionEnvironments for Distributed Computing6.1.1. Anatomy of a service6.1.2. Too many Writes to Database6.1.3. Cheaper peaks6.1.4. Facebook PlatformMaster in Computer Architecture, Networks and Systems - CANS
  • 2. 6.1. Web Scale34330EEDCExecution Environments for Distributed Computing6.1.1. Anatomy of a service6.1.2. Too many Writes to Database6.1.3. Cheaper peaks6.1.4. Facebook PlatformMaster in Computer Architecture, Networks and Systems - CANS
  • 3. Anatomy of a Web Service
  • 4. Problems may arise in…Various browsers, plugins, operatingsystems, performance, screensize,PEBKAC, etc
  • 5. Problems may arise in…Internet partitioning, performance bottlenecks,packetloss, jitter
  • 6. Problems may arise in…DDoStargetinganothercustomer,routingproblems, capacity,Power/coolingproblems, «lazy» remotehands
  • 7. Problems may arise in…Performance limits, bugs,configurationerrors,faulty HW
  • 8. Problems may arise in…Networklimits, interruptlimitsOS limits, bugs,configurationerrors,faulty HW, error recovery,
  • 9. Problems may arise in…Speed of clients, #threads, contentnot in sync, unresponsive Apps,toomanysources of contents,userpersistence,configurationerrors, bugs
  • 10. Problems may arise in… Requests/sec100 KB 5 MB 50 KB 5 KB 50 KB 50 KBDefault configuration of Tomcatallows 200 threads/instance
  • 11. Problems may arise in…Speed of clients, #threads, contentnot in sync, unresponsive Apps,toomanysources of contents,userpersistenceconfigurationerrors, bugs
  • 12. Problems may arise in…Databaseconcurrency, accessto 3rd party data (APIs),CPU ormemoryboundproblems,datacenterreplication,logginguseractions
  • 13. Problems may arise in…Database concurrency, modifying schemas,Massive tables -> indexes,disk performance,CPU/memory bound,datacenter replication
  • 14. Problems may arise in…Availability and performance,More than 24h to analyze daily logsNot reaching Inbox (spam folders)Surpass monitoring capacity
  • 15. 6.1. Web Scale34330EEDCExecution Environments for Distributed Computing6.1.1. Anatomy of a service6.1.2. Too many Writes to Database6.1.3. Cheaper peaks6.1.4. Facebook PlatformMaster in Computer Architecture, Networks and Systems - CANS
  • 16. Too many writes to databaseThere’s no machine that could do 44k/sec over 1 TB of data.Scaling reads is easier: Big cacheReplicationOn write you have to:Update dataUpdate Transaction logUpdate indexesInvalidate cacheReplicateWrite to 2 or more disks (RAID x)http://www.scribd.com/doc/2592098/DVPmysqlucFederation-at-Flickr-Doing-Billions-of-Queries-Per-Day
  • 17. CaseDatabase FederationSharding per User-IDGlobal Ring, know where is the dataPHP Logic to connect shards and data consistentWhat’s a Shard?:Horizontal partitioning of a table, usually per Primary KeyBenefitsYou can scale as long as you have budgetDisadvantagesYou lost the possibility to do any JOIN, COUNT, RANGE, between ShardsYour application logic has to be awareIf you what to rebalance shards, you will need some kind of global unique, beware of auto-incrementsMore services needing HA, BCP, change control, and so on
  • 18. CaseGlobal Ring?Storing Key-Value of:User_ID -> Shard_IDPhoto_ID -> User_IDGroup_ID -> Shard_IDEvery access to data has to know where -> memcached with a TTL of 30 minutesGlobal IDs?:You don’t want two objects with the same ID!StrategiesGUIDs: 128 bits Ids, so bigger indexes, and poor supported by MySQLCentral autoincrement: You have a table where for every Id needed you do an insert and let MySQL take care of everything. At 60 photos/sec will be a BIG tableReplace Into: An only MySQL solution, small tables and allows for redundancy (one server provides odd and another even
  • 19. Case: Replace INTOThe Tickets64 schema looks like:CREATE TABLE `Tickets64` ( `id` bigint(20) unsigned NOT NULL auto_increment,`stub` char(1) NOT NULL default '', PRIMARY KEY (`id`), UNIQUE KEY `stub` (`stub`)) ENGINE=MyISAMSELECT * from Tickets64 returns a single row that looks something like:+-------------------+------+ | id | stub |+-------------------+------+| 72157623227190423 | a | +-------------------+------+ When they need a new globally unique 64-bit ID they issue the following SQL:REPLACE INTO Tickets64 (stub) VALUES ('a'); SELECT LAST_INSERT_ID();
  • 20. CasePHP LogicYou lost any kind of intershard relational query (No JOINs)You lost any kind of integrity reference (No ForeignKeys)You have to control distributed transactionsYou select a Favorite (so they need to update your Shard and the one of the other user)Open 2 connections to the two shardsBegin a transaction on both ShardsAdd the dataIf everything is ok -> commit, else roll back and errorSo we improve scalability but impact code complexity and performance off a single page view (hint: async database access)
  • 21. CaseThey get an arbitrary scalable infrastructureThey have a marginally more complex code
  • 23. CaseThey get an arbitrary scalable infrastructureThey have a marginally more complex codeThey “only” have 20 engineers, so scalability also means:Roughly 2.5 million Flickr members per engineer.Roughly 200 million photos per engineer.28 user facing pages. 23 administrative pages.20 API methods, though only 7.5 public API methods.80 API calls per second.250 CPUs.850 annual deploys.16 feature flags.
  • 24. 6.1. Web Scale34330EEDCExecution Environments for Distributed Computing6.1.1. Anatomy of a service6.1.2. Too many Writes to Database6.1.3. Cheaper peaks6.1.4. Facebook PlatformMaster in Computer Architecture, Networks and Systems - CANS
  • 25. CheaperpeaksIfyourcapacityplanning comes fromtheaggregate of allyourcustomers and you plan tohavethousands of them, whatcouldyou do?And your performance impacts in thebrand of yourcustomer (so you’llhaveproblems)You are a Start-up withoutloads of money
  • 26. CaseWhat a recommendation engine looks like?
  • 27. CaseHave to store data for every page view their customer getsDo MAGIC over millions of rows to calculate related items for YOUShow recommendations to userOnly 2 snippets of Javascript/HTMLLess than 0’5 seconds per view
  • 28. CaseOption AEvery hit to tracker becomes an Insert to a MySQL sharded by customerEvery hit to recommender recalculates the list of items to show based on collective intelligenceBenefitsStraightforward to code and manageQuick and easy for a proof of conceptDisadvantagesOne customer on their peak could surpass the capacity of the MySQL instanceThe same customer on their valley could be wasting money on an idle instanceOur webserver could be overloaded with the sum of all our customersThe recommender is a CPU and memory Hog and we need too many servers to cope with our estimated demand
  • 29. CaseOption BEvery hit to tracker becomes an Insert to a MySQL sharded by customerWe have a cron job that recalculates in advance different sets of related itemsEvery hit to recommender gets from the DB the corresponding set of itemsBenefitsStraightforward to codeThe compute intensive task is out of critical path, is asynchronousDisadvantagesOne customer on their peak could surpass the capacity of the MySQL instanceThe same customer on their valley could be wasting money on an idle instanceOur webserver could be overloaded with the sum of all our customersWe have to control what are doing our cron jobs and check for errors and tune them so they don’t bring down the database
  • 30. CaseOption CEvery hit to tracker is only a static image file with various parameters /a.gif?b=1&c=2&…We have a cron job that gets the log files from the webservers and database stored items and recalculates in advance different sets of related itemsEvery hit to recommender gets from the DB (sharded by customer) the corresponding set of itemsBenefitsStraightforward to code, only had to move and parse filesA surge on pageviews don’t bring down the database for writesThe compute intensive task is out of critical path, it’s asynchronousDisadvantagesOne customer on their peak could surpass the capacity of the MySQL instanceThe same customer on their valley could be wasting money on an idle instanceWe have to control what are doing our cron jobs and check for errors and tune them so they don’t bring down the databaseWe could hit bandwidth limits
  • 31. CaseOption DEvery hit to tracker is only a static image file with various parameters /a.gif?b=1&c=2&…We have a cron job that gets the log files from the webservers and database stored items and recalculates in advance different sets of related itemsEvery hit to recommender gets from the DB the corresponding set of itemsWent the Hadoop/Hbase way, no more shardingBenefitsEasy to add and remove Data servers on demand so no wasting/limits hereA surge on page views only costs money, as we get paid per page view, it’s ok The compute intensive task is out of critical path, it’s asynchronousDisadvantagesBeta software, poor documentation/examplesWe have more complexity at our infrastructureWe could hit bandwidth limits
  • 32. Case: Map/Reduce. Hadoop: it's “only” a framework for running Map/Reduce applications on large clusters. It provides replication and fault tolerance, since HW failure will be the norm, using a distributed file system, HDFS. Map/Reduce: in a map/reduce application there are two kinds of tasks, Map and Reduce. Mappers read the HDFS blocks, do local processing, and run in parallel; for example, from a web server log file they produce <url, #hits>. Reducers get the output of many mappers and consolidate the data; if there was one mapper per day, a reducer could calculate how many monthly hits a URL gets. HBase: the Hadoop/MR design favors throughput over latency, so it's used as an analytical platform, but HBase allows low-latency random access to very big tables (billions of rows by millions of columns). Column-oriented DB: Table -> Row -> ColumnFamily -> Timestamp => Value.
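The mapper-per-day, reducer-per-month example above can be sketched in plain Python. This only illustrates the data flow; it does not use the Hadoop API, and the function names `map_day` and `reduce_month`, as well as the request-line log format, are assumptions made for the example.

```python
from collections import defaultdict


def map_day(log_lines):
    """Mapper: turn one day's request lines into (url, hits) pairs."""
    hits = defaultdict(int)
    for line in log_lines:
        parts = line.split()
        if len(parts) >= 2:
            hits[parts[1]] += 1  # parts[1] is the requested URL
    return list(hits.items())


def reduce_month(mapper_outputs):
    """Reducer: consolidate the pairs from many mappers into totals per URL."""
    totals = defaultdict(int)
    for pairs in mapper_outputs:
        for url, hits in pairs:
            totals[url] += hits
    return dict(totals)


day1 = ["GET /a HTTP/1.1", "GET /b HTTP/1.1", "GET /a HTTP/1.1"]
day2 = ["GET /a HTTP/1.1"]
monthly_hits = reduce_month([map_day(day1), map_day(day2)])
```

Each `map_day` call touches only its own day's data, so the calls could run in parallel on the machines that hold those blocks; only the small (url, hits) pairs need to travel to the reducer.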
  • 34. Case: Option E. Every hit to the tracker is only a static image file with various parameters: /a.gif?b=1&c=2&… A cron job gets the log files from the web servers and the items stored in the database, and recalculates in advance different sets of related items. Every hit to the recommender gets the corresponding set of items from the DB. We went the Hadoop/HBase way: no more sharding. All static files are served by a CDN. Benefits: easy to add and remove data servers on demand, so no waste and no limits here; a surge in page views only costs money and, as we get paid per page view, that's OK; the compute-intensive task is off the critical path (it is asynchronous); unlimited bandwidth. Disadvantages: beta software with poor documentation and examples; more complexity in our infrastructure.
  • 35. Case: CDN. What's a Content Delivery Network? Your server or HTTP repository (Amazon S3, ...) is the Origin of the content. The CDN gives you a DNS name (bb.cdn.net) and you have to create a CNAME pointing to it (www.example.com -> bb.cdn.net.). When a user asks for www.example.com, the CDN chooses which of its nodes is nearest to the user and returns that node's IP addresses. The user then requests content (/a.gif) from that CDN node, which checks whether it has a fresh copy to send; if it's a MISS, the node checks with its upstream caches, all the way up to your Origin. So we get unlimited bandwidth and better latency (we can't surpass the speed of light).
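The HIT/MISS behaviour described above can be sketched as a chain of caches that ends at the Origin. This is a toy model under stated assumptions, not how any particular CDN is implemented: the class names, the fixed TTL, and the `now` argument (passed explicitly so the example is deterministic) are all hypothetical.

```python
import time


class Origin:
    """The authoritative source of the content (your server or repository)."""

    def __init__(self, files):
        self.files = files
        self.requests = 0  # how many requests actually reached the Origin

    def fetch(self, path, now=None):
        self.requests += 1
        return self.files[path]


class CacheNode:
    """A CDN node that serves fresh copies and asks its upstream on a MISS."""

    def __init__(self, upstream, ttl_seconds):
        self.upstream = upstream
        self.ttl = ttl_seconds
        self.store = {}  # path -> (content, expires_at)

    def fetch(self, path, now=None):
        now = time.monotonic() if now is None else now
        cached = self.store.get(path)
        if cached is not None and cached[1] > now:
            return cached[0]  # HIT: fresh copy, upstream is not contacted
        content = self.upstream.fetch(path, now)  # MISS: go one level up
        self.store[path] = (content, now + self.ttl)
        return content


origin = Origin({"/a.gif": b"GIF89a"})
parent = CacheNode(origin, ttl_seconds=60)
edge = CacheNode(parent, ttl_seconds=60)

edge.fetch("/a.gif", now=0)    # MISS at edge and parent, reaches the Origin
edge.fetch("/a.gif", now=10)   # HIT at the edge
edge.fetch("/a.gif", now=100)  # copies expired everywhere, reaches the Origin again
```

In this sketch the Origin sees two requests out of three; with many users behind each edge node, the fraction reaching the Origin becomes very small.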
  • 37. Case: They get a completely scalable infrastructure at AWS. They can provision a new Cruncher, Datastore or Recommender in a matter of minutes and remove it as soon as it is no longer needed. They don't have any upper limit on how many requests they can serve. All the requests that can affect the user experience of their customers are served by a CDN. As there are only 3 kinds of servers, and they are managed as images, they don't need as many engineers to take care of the infrastructure.
  • 39. Facebook Platform. If your primary data source is not under your control and it's too far away, what happens? An API case.
  • 40. Case: Duplicated gifts
  • 41. Case: Loving it. More «Pongos». Hitting the bullseye?
  • 42. Case: It's a social wish list application. When you access it, it checks whether your friends have enabled the application and shows their wish lists. You can share your wish lists on Facebook. You can capture wishes (gifts) and be shown a feed of possible merchants. Initial loading time is critical. We expect virality, so we won't have much time to respond.
  • 43. Case: Flow
  • 44. Case: Nice but slow. 3 to 7 seconds to load.
  • 45. Case: Define goals. Define metrics. Analyze metrics. Improve one at a time.
  • 46. Case: Goals. Time to load < 1 second. Everything works.
  • 47. Case: Metrics. Time to session setup: validating to Facebook; getting friends' information; lookups to the local database (lists, items, captured items). Time to load «home» page: get HTML; get widgets; get JavaScripts; get various graphic assets.
  • 48. Case: Analyzing Metrics. Time to session setup: validating to Facebook (300 ms); getting friends' information (3 sec); lookups to the local database (lists, items, captured items) (30 ms). Time to load «home» page: get HTML (400 ms); get widgets (300 ms); get JavaScripts (300 ms); get various graphic assets (500 ms).
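To decide which metric to improve first, the measurements above can be totalled and ranked. The sketch below uses the figures from the slide; adding them up assumes the steps run one after another, which the slides do not state, and the dictionary layout and function name are only illustrative.

```python
measurements_ms = {
    "session setup": {
        "validating to Facebook": 300,
        "getting friends information": 3000,
        "local database lookups": 30,
    },
    "home page load": {
        "HTML": 400,
        "widgets": 300,
        "JavaScripts": 300,
        "graphic assets": 500,
    },
}


def slowest_steps(measurements):
    """Return (ms, phase, step) tuples sorted from slowest to fastest."""
    steps = [
        (ms, phase, step)
        for phase, phase_steps in measurements.items()
        for step, ms in phase_steps.items()
    ]
    return sorted(steps, reverse=True)


# Total per phase, assuming the steps within a phase are sequential.
totals_ms = {phase: sum(steps.values()) for phase, steps in measurements_ms.items()}
worst_ms, worst_phase, worst_step = slowest_steps(measurements_ms)[0]
```

Under that assumption session setup takes 3330 ms and the home page 1500 ms, and the single call that fetches friends' information accounts for 3000 ms, so it is the obvious first target.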
  • 49. Case: Facebook access. From 3 seconds to 500 ms!
  • 50. Case: Facebook access. In ASP.NET we “only” have 12 threads/CPU, so only 12 concurrent requests. Going from 3 seconds to 500 ms per request takes us from 4 users/sec to 24 users/sec. We could use asynchronous calls, but parallelism is low: until we know the result of GetAppUsers we can't ask for GetUserInfo, so there is no speedup. We could raise the default number of threads (.NET 4.0 defaults to 5000/CPU). We can become more resilient to failures by adjusting timeouts and increasing threads, connections, and so on.
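The jump from 4 to 24 users/sec follows from dividing the number of requests that can be in flight by the time each one takes. A minimal sketch of that arithmetic, assuming one CPU with 12 threads and that every request holds a thread for its whole duration (the function name is only illustrative):

```python
def max_requests_per_second(concurrent_requests, seconds_per_request):
    """Upper bound on throughput when each request occupies one thread."""
    return concurrent_requests / seconds_per_request


before = max_requests_per_second(12, 3.0)  # Facebook access takes 3 seconds
after = max_requests_per_second(12, 0.5)   # Facebook access takes 500 ms
```

The same formula shows the other lever mentioned on the slide: raising the thread limit increases the numerator, while faster Facebook access shrinks the denominator.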
  • 51. Case: Leveraging “free” tools. Set a far-future Expires header on static files: users leverage their browser's cache, which lightens the load on the server side. Use a “free” CDN to get jQuery et al.: Microsoft and Google provide public, free repositories of JavaScript tools. Use CSS sprites: although graphic files are small, each one needs a TCP connection to retrieve, so combine most graphic assets into one big file and use CSS to select which one to show. #nav li a {background-image:url('../img/image_nav.gif')} #nav li a.item1 {background-position:0px 0px} #nav li a:hover.item1 {background-position:0px -72px}
  • 52. Case: more on Sprites. Average size is 2 KB/file. HTTP/1.1 (RFC 2616) suggests that browsers download no more than 2 components in parallel per hostname. Small files don't use all the available bandwidth (TCP Slow Start…). Latency also plays an important role.
  • 53. About this session. Sergi Morales, Founder & CTO of Expertos en TI. Phone: +34 6688-XPNTI. Email: sergi.morales+eedc@expertosenti.com. Blog: http://blog.expertosenti.com. Web: http://www.expertosenti.com. Expertos en TI: We help Internet-oriented projects leverage all the research done by the big sites (Flickr, Facebook, Twitter, Salesforce, Google, and so on) so they can improve their bottom line and be prepared for growth.
  • 54. About the EEDC course. 34330 Execution Environments for Distributed Computing (EEDC), Master in Computer Architecture, Networks and Systems (CANS), Computer Architecture Department (AC), Universitat Politècnica de Catalunya – Barcelona Tech (UPC). ECTS credits: 6. INSTRUCTOR: Professor Jordi Torres. Phone: +34 93 401 7223. Email: torres@ac.upc.edu. Office: Campus Nord, Modul C6, Room 217. Web: http://www.JordiTorres.org
  • 55. 34330 EEDC: Execution Environments for Distributed Computing. Sergi Morales, Founder & CTO. T: 668897684. E: sergi.morales@expertosenti.com. L: www.linkedin.com/in/sergimorales. Master in Computer Architecture, Networks and Systems - CANS
  • 56. Case: Asynchronous access to the Facebook API server. Expect to fail. Tables with so many rows: a key/value approach. Consistent hashing to load-balance data. Sticky servers?
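Consistent hashing, mentioned in the last slide, can be sketched as a ring of hash positions where each key belongs to the next node position clockwise. This is one common formulation, not necessarily the one the presenter used; the class name, the server names, and the choice of SHA-256 with 100 virtual positions per node are assumptions made for the example.

```python
import bisect
import hashlib


class HashRing:
    """A minimal consistent-hash ring with virtual positions per node."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas
        self._positions = []  # sorted hash positions on the ring
        self._owners = {}     # position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            position = self._hash(f"{node}#{i}")
            if position not in self._owners:
                self._owners[position] = node
                bisect.insort(self._positions, position)

    def remove(self, node):
        for i in range(self.replicas):
            position = self._hash(f"{node}#{i}")
            if self._owners.get(position) == node:
                del self._owners[position]
                self._positions.remove(position)

    def node_for(self, key):
        if not self._positions:
            raise LookupError("the ring has no nodes")
        # First position clockwise from the key's hash, wrapping around.
        index = bisect.bisect(self._positions, self._hash(key)) % len(self._positions)
        return self._owners[self._positions[index]]


ring = HashRing(["db1", "db2", "db3"])
keys = [f"user:{i}" for i in range(1000)]
before = {key: ring.node_for(key) for key in keys}
ring.add("db4")
after = {key: ring.node_for(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
```

The point of the technique is visible in `moved`: adding `db4` relocates only the keys that now belong to `db4`, while every other key stays on the server it was on, unlike a plain `hash(key) % number_of_servers` scheme where most keys would move.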