EEDC 2010. Scaling Web Applications

6.1. Web Scale
34330 EEDC: Execution Environments for Distributed Computing
Master in Computer Architecture, Networks and Systems - CANS
6.1.1. Anatomy of a service
6.1.2. Too many Writes to Database
6.1.3. Cheaper peaks
6.1.4. Facebook Platform

Problems may arise in… Various browsers, plugins, operating systems, performance, screen size, PEBKAC, etc.

Problems may arise in… Internet partitioning, performance bottlenecks, packet loss, jitter

Problems may arise in… DDoS targeting another customer, routing problems, capacity, power/cooling problems, «lazy» remote hands

Problems may arise in… Performance limits, bugs, configuration errors, faulty HW

Problems may arise in… Network limits, interrupt limits, OS limits, bugs, configuration errors, faulty HW, error recovery

Problems may arise in… Speed of clients, #threads, content not in sync, unresponsive apps, too many sources of content, user persistence, configuration errors, bugs

Problems may arise in… Requests/sec
[Figure: response sizes of 100 KB, 5 MB, 50 KB, 5 KB, 50 KB and 50 KB]
The default configuration of Tomcat allows 200 threads/instance

Problems may arise in… Database concurrency, access to 3rd-party data (APIs), CPU- or memory-bound problems, datacenter replication, logging user actions

Problems may arise in… Database concurrency, modifying schemas, massive tables -> indexes, disk performance, CPU/memory bound, datacenter replication

Problems may arise in… Availability and performance, more than 24 h to analyze daily logs, not reaching the Inbox (spam folders), surpassing monitoring capacity

Too many writes to database
There's no machine that could do 44k/sec over 1 TB of data.
Scaling reads is easier:
- Big cache
- Replication
On write you have to:
- Update data
- Update transaction log
- Update indexes
- Invalidate cache
- Replicate
- Write to 2 or more disks (RAID x)
http://www.scribd.com/doc/2592098/DVPmysqlucFederation-at-Flickr-Doing-Billions-of-Queries-Per-Day

Case
Database Federation
- Sharding per User-ID
- Global Ring, to know where the data is
- PHP logic to connect to the shards and keep data consistent
What's a Shard?
- Horizontal partitioning of a table, usually per Primary Key
Benefits
- You can scale as long as you have budget
Disadvantages
- You lose the ability to do any JOIN, COUNT or RANGE query across shards
- Your application logic has to be aware of the sharding
- If you want to rebalance shards, you will need some kind of globally unique ID; beware of auto-increments
- More services needing HA, BCP, change control, and so on

Case
Global Ring?
- Stores key-value pairs of:
  - User_ID -> Shard_ID
  - Photo_ID -> User_ID
  - Group_ID -> Shard_ID
- Every access to data has to know where the data lives -> memcached with a TTL of 30 minutes
Global IDs?
- You don't want two objects with the same ID!
Strategies
- GUIDs: 128-bit IDs, so bigger indexes, and poorly supported by MySQL
- Central auto-increment: you have a table where, for every ID needed, you do an insert and let MySQL take care of everything. At 60 photos/sec it will be a BIG table
- REPLACE INTO: a MySQL-only solution that keeps tables small and allows for redundancy (one server provides odd IDs and another even ones)
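The Global Ring lookup with a 30-minute TTL can be sketched as below. This is only an illustration: the class name `GlobalRing`, the in-memory dictionaries standing in for memcached and for the central lookup store, and the sample user and shard IDs are all assumptions, not Flickr's code.

```python
import time


class GlobalRing:
    """Maps a user ID to its shard ID, caching each answer for a limited time."""

    def __init__(self, lookup_table, ttl_seconds=30 * 60, clock=time.monotonic):
        self.lookup_table = lookup_table  # stand-in for the central lookup store
        self.ttl = ttl_seconds            # 30 minutes, as on the slide
        self.clock = clock
        self.cache = {}                   # stand-in for memcached
        self.backend_reads = 0

    def shard_for_user(self, user_id):
        now = self.clock()
        entry = self.cache.get(user_id)
        if entry is not None and entry[1] > now:
            return entry[0]               # cache hit, still fresh
        # Cache miss or expired entry: ask the central store and re-cache.
        self.backend_reads += 1
        shard_id = self.lookup_table[user_id]
        self.cache[user_id] = (shard_id, now + self.ttl)
        return shard_id


# A fake clock lets the example jump past the TTL without sleeping.
fake_now = [0.0]
ring = GlobalRing({42: "shard-5", 77: "shard-13"}, clock=lambda: fake_now[0])
first = ring.shard_for_user(42)    # miss: read from the central store
second = ring.shard_for_user(42)   # hit: served from the cache
fake_now[0] = 31 * 60              # 31 minutes later, the entry has expired
third = ring.shard_for_user(42)    # miss again
print(first, ring.backend_reads)
```

The TTL bounds how long a stale mapping can survive after a user is moved to another shard, at the cost of one extra central lookup per user every 30 minutes.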

Case: REPLACE INTO
The Tickets64 schema looks like:

    CREATE TABLE `Tickets64` (
      `id` bigint(20) unsigned NOT NULL auto_increment,
      `stub` char(1) NOT NULL default '',
      PRIMARY KEY (`id`),
      UNIQUE KEY `stub` (`stub`)
    ) ENGINE=MyISAM

SELECT * from Tickets64 returns a single row that looks something like:

    +-------------------+------+
    | id                | stub |
    +-------------------+------+
    | 72157623227190423 |    a |
    +-------------------+------+

When they need a new globally unique 64-bit ID they issue the following SQL:

    REPLACE INTO Tickets64 (stub) VALUES ('a');
    SELECT LAST_INSERT_ID();
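The ticket idea can be tried locally with the sketch below. It makes several assumptions: SQLite (from Python's standard library) stands in for MySQL, the `TicketServer` class is hypothetical, and the odd/even split is emulated by arithmetic in Python rather than by MySQL's own auto-increment settings.

```python
import itertools
import sqlite3


class TicketServer:
    """One ticket database. SQLite stands in for MySQL in this sketch."""

    def __init__(self, offset, step=2):
        self.offset = offset  # 1 -> odd IDs, 2 -> even IDs
        self.step = step
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE Tickets64 ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " stub TEXT NOT NULL DEFAULT '' UNIQUE)"
        )

    def next_id(self):
        # REPLACE removes the existing 'a' row and inserts a fresh one, so the
        # table stays at a single row while the counter keeps increasing.
        cur = self.db.execute("REPLACE INTO Tickets64 (stub) VALUES ('a')")
        self.db.commit()
        local = cur.lastrowid  # 1, 2, 3, ... on this server
        # Emulated odd/even split (an assumption of this sketch).
        return (local - 1) * self.step + self.offset

    def row_count(self):
        return self.db.execute("SELECT COUNT(*) FROM Tickets64").fetchone()[0]


odd_server = TicketServer(offset=1)
even_server = TicketServer(offset=2)
round_robin = itertools.cycle([odd_server, even_server])
ids = [next(round_robin).next_id() for _ in range(6)]
print(ids, odd_server.row_count(), even_server.row_count())
```

Either server can keep handing out IDs if the other is down, and the two ranges never collide because one only produces odd numbers and the other only even ones.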

Case
PHP Logic
- You lose any kind of inter-shard relational query (no JOINs)
- You lose any kind of referential integrity (no foreign keys)
- You have to control distributed transactions
  - You select a Favorite (so they need to update your shard and the other user's shard)
  - Open 2 connections, one to each shard
  - Begin a transaction on both shards
  - Add the data
  - If everything is OK -> commit; otherwise roll back and report an error
- So we improve scalability, but at a cost in code complexity and in the performance of a single page view (hint: async database access)
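The favorite-across-two-shards steps above can be sketched as follows. Python and two in-memory SQLite databases stand in for the PHP logic and the MySQL shards; the `favorites` table and the function names are hypothetical. Note that this is not a true two-phase commit: a crash between the two COMMIT statements would still leave the shards inconsistent.

```python
import sqlite3


def open_shard():
    # isolation_level=None puts sqlite3 in autocommit mode, so the explicit
    # BEGIN / COMMIT / ROLLBACK statements below control the transaction.
    db = sqlite3.connect(":memory:", isolation_level=None)
    db.execute(
        "CREATE TABLE favorites ("
        " user_id INTEGER NOT NULL,"
        " photo_id INTEGER NOT NULL,"
        " UNIQUE (user_id, photo_id))"
    )
    return db


def add_favorite(shard_a, shard_b, user_id, photo_id):
    """Write the same favorite to both shards, or to neither."""
    shard_a.execute("BEGIN")
    shard_b.execute("BEGIN")
    try:
        for shard in (shard_a, shard_b):
            shard.execute(
                "INSERT INTO favorites (user_id, photo_id) VALUES (?, ?)",
                (user_id, photo_id),
            )
    except sqlite3.Error:
        shard_a.execute("ROLLBACK")
        shard_b.execute("ROLLBACK")
        return False
    shard_a.execute("COMMIT")
    shard_b.execute("COMMIT")
    return True


def count(shard):
    return shard.execute("SELECT COUNT(*) FROM favorites").fetchone()[0]


my_shard, owner_shard = open_shard(), open_shard()
ok = add_favorite(my_shard, owner_shard, user_id=1, photo_id=10)

# Force a failure on the second shard only: the row already exists there.
owner_shard.execute("INSERT INTO favorites VALUES (2, 20)")
failed = add_favorite(my_shard, owner_shard, user_id=2, photo_id=20)
print(ok, failed, count(my_shard), count(owner_shard))
```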

Case
- They get an arbitrarily scalable infrastructure
- They have marginally more complex code
- They "only" have 20 engineers, so scalability also means:
  - Roughly 2.5 million Flickr members per engineer
  - Roughly 200 million photos per engineer
  - 28 user-facing pages
  - 23 administrative pages
  - 20 API methods, though only 7.5 public API methods
  - 80 API calls per second
  - 250 CPUs
  - 850 annual deploys
  - 16 feature flags

Cheaper peaks
If your capacity planning comes from the aggregate of all your customers, and you plan to have thousands of them, what could you do?
- And your performance impacts the brand of your customer (so you'll have problems)
- You are a start-up without loads of money

Case
What does a recommendation engine look like?

Case
- They have to store data for every page view their customers get
- Do MAGIC over millions of rows to calculate related items for YOU
- Show recommendations to the user
- Only 2 snippets of JavaScript/HTML
- Less than 0.5 seconds per view

Case
Option A
- Every hit to the tracker becomes an INSERT into a MySQL database sharded by customer
- Every hit to the recommender recalculates the list of items to show, based on collective intelligence
Benefits
- Straightforward to code and manage
- Quick and easy for a proof of concept
Disadvantages
- One customer at their peak could exceed the capacity of the MySQL instance
- The same customer in their valley could be wasting money on an idle instance
- Our webserver could be overloaded by the sum of all our customers
- The recommender is a CPU and memory hog, and we need too many servers to cope with our estimated demand

Case
Option B
- Every hit to the tracker becomes an INSERT into a MySQL database sharded by customer
- We have a cron job that recalculates different sets of related items in advance
- Every hit to the recommender gets the corresponding set of items from the DB
Benefits
- Straightforward to code
- The compute-intensive task is off the critical path; it's asynchronous
Disadvantages
- One customer at their peak could exceed the capacity of the MySQL instance
- The same customer in their valley could be wasting money on an idle instance
- Our webserver could be overloaded by the sum of all our customers
- We have to monitor what our cron jobs are doing, check them for errors, and tune them so they don't bring down the database

Case
Option C
- Every hit to the tracker is only a request for a static image file with various parameters: /a.gif?b=1&c=2&…
- We have a cron job that takes the log files from the webservers plus the items stored in the database, and recalculates different sets of related items in advance
- Every hit to the recommender gets the corresponding set of items from the DB (sharded by customer)
Benefits
- Straightforward to code; we only had to move and parse files
- A surge in page views doesn't bring down the database with writes
- The compute-intensive task is off the critical path; it's asynchronous
Disadvantages
- One customer at their peak could exceed the capacity of the MySQL instance
- The same customer in their valley could be wasting money on an idle instance
- We have to monitor what our cron jobs are doing, check them for errors, and tune them so they don't bring down the database
- We could hit bandwidth limits
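The "move and parse files" step of Option C could look roughly like this. It is a sketch under assumptions: the log lines follow the common access-log layout with the request in double quotes, the sample lines and IP addresses are invented, and the meaning of the `b` and `c` parameters is not given in the slides.

```python
from urllib.parse import parse_qs, urlsplit


def parse_tracker_hits(log_lines, pixel_path="/a.gif"):
    """Return the query parameters of every request for the tracking pixel."""
    hits = []
    for line in log_lines:
        try:
            # The request is the first quoted field: 'GET /a.gif?b=1 HTTP/1.1'
            request = line.split('"')[1]
            _method, target, _protocol = request.split()
        except (IndexError, ValueError):
            continue  # skip malformed lines instead of failing the whole job
        url = urlsplit(target)
        if url.path != pixel_path:
            continue  # not a tracker hit
        # Keep the first value of each parameter.
        hits.append({key: values[0] for key, values in parse_qs(url.query).items()})
    return hits


sample_log = [
    '203.0.113.7 - - [10/May/2010:10:00:01 +0200] "GET /a.gif?b=1&c=2 HTTP/1.1" 200 43',
    '203.0.113.8 - - [10/May/2010:10:00:02 +0200] "GET /a.gif?b=1&c=3 HTTP/1.1" 200 43',
    '203.0.113.9 - - [10/May/2010:10:00:03 +0200] "GET /index.html HTTP/1.1" 200 5120',
    'not a log line',
]
hits = parse_tracker_hits(sample_log)
print(hits)
```

Because the tracker only serves a static file, a traffic surge produces longer log files rather than more database writes; the cron job absorbs the load later.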

Case
Option D
- Every hit to the tracker is only a request for a static image file with various parameters: /a.gif?b=1&c=2&…
- We have a cron job that takes the log files from the webservers plus the items stored in the database, and recalculates different sets of related items in advance
- Every hit to the recommender gets the corresponding set of items from the DB
- Went the Hadoop/HBase way, no more sharding
Benefits
- Easy to add and remove data servers on demand, so no waste and no limits here
- A surge in page views only costs money; as we get paid per page view, it's OK
- The compute-intensive task is off the critical path; it's asynchronous
Disadvantages
- Beta software, poor documentation/examples
- We have more complexity in our infrastructure
- We could hit bandwidth limits

Case: Map/Reduce
- Hadoop: it's "only" a framework for running Map/Reduce applications on large clusters. It provides replication and fault tolerance, as HW failure will be the norm, using a distributed file system, HDFS
- Map/Reduce: in a map/reduce application there are two kinds of jobs, Map and Reduce
  - Mappers read the HDFS blocks, do local processing, and run in parallel. From a webserver log file: <url, #hits>
  - Reducers get the output of many mappers and consolidate the data. If there was one mapper per day, a reducer could calculate how many monthly hits a URL gets
- HBase: the Hadoop/MR design favors throughput over latency, so it's used as an analytical platform, but HBase allows low-latency random access to very big tables (billions of rows by millions of columns)
  - Column-oriented DB: Table -> Row -> ColumnFamily -> Timestamp => Value
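The mapper-per-day, reducer-per-month example above can be mimicked in plain Python. This is not the Hadoop API, only a single-process sketch of the data flow; the function names and the sample URLs are made up, and the mapper already sums hits locally (what Hadoop would call a combiner step).

```python
from collections import defaultdict


def map_daily_log(urls):
    """Mapper: turn one day's requested URLs into (url, hits) pairs."""
    counts = defaultdict(int)
    for url in urls:
        counts[url] += 1
    return list(counts.items())


def reduce_monthly(mapper_outputs):
    """Reducer: consolidate the pairs of many mappers into monthly totals."""
    totals = defaultdict(int)
    for pairs in mapper_outputs:
        for url, hits in pairs:
            totals[url] += hits
    return dict(totals)


day_1 = ["/home", "/photo/1", "/home"]
day_2 = ["/home", "/photo/2"]

# Each day is mapped independently, so the mappers could run in parallel.
monthly = reduce_monthly(map_daily_log(day) for day in (day_1, day_2))
print(monthly)
```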

Case
Option E
- Every hit to the tracker is only a request for a static image file with various parameters: /a.gif?b=1&c=2&…
- We have a cron job that takes the log files from the webservers plus the items stored in the database, and recalculates different sets of related items in advance
- Every hit to the recommender gets the corresponding set of items from the DB
- Went the Hadoop/HBase way, no more sharding
- All static files are served by a CDN
Benefits
- Easy to add and remove data servers on demand, so no waste and no limits here
- A surge in page views only costs money; as we get paid per page view, it's OK
- The compute-intensive task is off the critical path; it's asynchronous
- Unlimited bandwidth
Disadvantages
- Beta software, poor documentation/examples
- We have more complexity in our infrastructure

Case: CDN
What's a Content Delivery Network?
- Your server or HTTP repository (Amazon S3, ...) is the Origin of the content
- They give you a DNS name (bb.cdn.net) and you have to create a CNAME to this name (www.example.com -> bb.cdn.net.)
- When a user asks for www.example.com, the CDN will choose which of its nodes is nearest to the user and return its IP address(es)
- The user asks the CDN node for some content (/a.gif). The node checks whether it has a fresh copy to send; on a MISS it checks with its upstream caches, all the way up to your Origin
- So we get unlimited bandwidth and better latency (we can't surpass the speed of light)
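The fresh-copy-or-ask-upstream behaviour described above can be modelled in a few lines. This is a toy model, not any CDN's real implementation: the `CacheNode` and `Origin` classes, the one-hour freshness window and the two-level hierarchy are all assumptions chosen for illustration.

```python
import time


class Origin:
    """Your own server: the source of truth for the content."""

    def __init__(self, files):
        self.files = files
        self.requests = 0

    def get(self, path):
        self.requests += 1
        return self.files[path]


class CacheNode:
    """A CDN node that serves fresh copies and asks upstream on a MISS."""

    def __init__(self, upstream, ttl_seconds, clock=time.monotonic):
        self.upstream = upstream  # another CacheNode, or the Origin
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}           # path -> (body, expires_at)

    def get(self, path):
        now = self.clock()
        entry = self.store.get(path)
        if entry is not None and entry[1] > now:
            return entry[0]       # HIT: a fresh copy is available
        body = self.upstream.get(path)  # MISS: go one level up
        self.store[path] = (body, now + self.ttl)
        return body


origin = Origin({"/a.gif": b"GIF89a"})
parent = CacheNode(origin, ttl_seconds=3600)
edge = CacheNode(parent, ttl_seconds=3600)

for _ in range(1000):
    body = edge.get("/a.gif")
print(origin.requests)
```

In this run a thousand user requests reach the edge node, but the Origin is asked only once, which is why the tracker's static image scales so well behind a CDN.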

Case
- They get a completely scalable infrastructure at AWS
- They can provision a new Cruncher, Datastore or Recommender in a matter of minutes and remove it as soon as it is no longer needed
- They don't have any upper limit on how many requests they can serve
- All the requests that can impact the user experience of their customers are served by a CDN
- As there are only 3 kinds of servers, and they are managed as images, they don't need many engineers to take care of the infrastructure

Facebook Platform
If your primary data source is not under your control and it's too far away, what happens?
An API case

Case
Loving it. More «Pongos». Hitting the bullseye?

Case
- It's a social wish list application
- When you access it, it checks whether your friends have enabled the application and shows their wish lists
- You can share your wish lists on Facebook
- You can capture wishes (gifts) and be shown a feed of possible merchants
- Initial loading time is critical
- We expect virality, so we can't afford long response times

Case
Nice but slow. 3 to 7 seconds to load

Case
- Define goals
- Define metrics
- Analyze metrics
- Improve one at a time

Case: Goals
- Time to load < 1 second
- Everything works

Case: Metrics
Time to session setup
- Validating to Facebook
- Getting friends information
- Lookups to local database (lists, items, captured items)
Time to load «home» page
- Get HTML
- Get widgets
- Get JavaScripts
- Get various graphic assets

Case: Analyzing Metrics
Time to session setup
- Validating to Facebook (300 ms)
- Getting friends information (3 sec)
- Lookups to local database (lists, items, captured items) (30 ms)
Time to load «home» page
- Get HTML (400 ms)
- Get widgets (300 ms)
- Get JavaScripts (300 ms)
- Get various graphic assets (500 ms)
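Adding up the measurements above shows where the time goes. The small script below assumes, for simplicity, that the steps run one after another; the slides do not say whether any of them overlap.

```python
# Timings in milliseconds, copied from the slide.
session_setup_ms = {
    "Validating to Facebook": 300,
    "Getting friends information": 3000,
    "Lookups to local database": 30,
}
home_page_ms = {
    "Get HTML": 400,
    "Get widgets": 300,
    "Get JavaScripts": 300,
    "Get various graphic assets": 500,
}

all_steps = {**session_setup_ms, **home_page_ms}
total_ms = sum(all_steps.values())                 # sequential execution assumed
slowest_step = max(all_steps, key=all_steps.get)   # the step to improve first
share = all_steps[slowest_step] / total_ms

print(f"session setup: {sum(session_setup_ms.values())} ms")
print(f"home page:     {sum(home_page_ms.values())} ms")
print(f"total:         {total_ms} ms")
print(f"slowest step:  {slowest_step} ({share:.0%} of the total)")
```

Under that assumption the total is 4830 ms, inside the observed 3 to 7 seconds, and the friends lookup alone accounts for most of it, which makes it the natural first target for "improve one at a time".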

Case: Facebook access
[Figure: request flow, "From" and "To"]
From 3 seconds to 500 ms!

Case: Facebook access
- In ASP.NET we "only" have 12 threads/CPU -> only 12 concurrent requests. From 4 users/sec to 24/sec
- We could use asynchronous calls, but:
  - Low parallelism: until we have the result of GetAppUsers we can't ask for GetUserInfo, so no speedup
- We could increase the default #threads to another number (.NET 4.0 defaults to 5000/CPU)
- We can gain resilience to failures by adjusting timeouts and increasing threads, connections, and so on
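The "from 4 users/sec to 24/sec" figure follows from the thread limit. The sketch below assumes a single CPU, that every request holds one thread for its whole duration, and that the Facebook call (3 seconds before, 500 ms after) dominates the request time.

```python
def max_requests_per_second(threads, seconds_per_request):
    """Upper bound on throughput when each request occupies one thread."""
    return threads / seconds_per_request


THREADS = 12  # 12 threads per CPU, one CPU assumed

before = max_requests_per_second(THREADS, 3.0)  # Facebook access takes 3 s
after = max_requests_per_second(THREADS, 0.5)   # Facebook access takes 500 ms
print(before, after)  # 4.0 24.0
```

Shortening the remote call multiplies capacity without adding a single thread, which is why reducing the Facebook access time was worth more than raising the thread limit alone.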

Case: Leveraging "free" tools
- Set a future Expires on static files
  - Users leverage their browser's cache and put less load on the server
- Use a "free" CDN to get jQuery et al.
  - Microsoft and Google provide a public and free repository of JavaScript tools
- Use CSS sprites
  - Although graphic files are small, each one needs a TCP connection to be retrieved. Combine most graphic assets into one big file and use CSS to select which part to show

    #nav li a {background-image:url('../img/image_nav.gif')}
    #nav li a.item1 {background-position:0px 0px}
    #nav li a:hover.item1 {background-position:0px -72px}
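A far-future Expires value has to be written in the HTTP date format. A minimal sketch of how such a header could be produced is shown below; the helper name `far_future_expires` and the one-year horizon are assumptions, and in practice the header is usually set in the web server configuration rather than in application code.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def far_future_expires(days=365, now=None):
    """Build an Expires header that lies `days` in the future."""
    if now is None:
        now = datetime.now(timezone.utc)
    expires_at = now + timedelta(days=days)
    # usegmt=True produces the 'GMT' suffix that HTTP dates use.
    return {"Expires": format_datetime(expires_at, usegmt=True)}


# A fixed starting date keeps the example reproducible.
headers = far_future_expires(now=datetime(2010, 1, 1, tzinfo=timezone.utc))
print(headers)  # {'Expires': 'Sat, 01 Jan 2011 00:00:00 GMT'}
```

With such a header the browser can reuse its cached copy without contacting the server, so a changed file should be published under a new name or URL.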

Case: more on Sprites
- Average size: 2 KB/file
- HTTP/1.1 (RFC 2616) suggests that browsers download no more than 2 components in parallel per hostname
- Small files don't use all the available bandwidth: TCP Slow Start…
- Latency also plays an important role
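A back-of-the-envelope model shows why latency dominates for small files. The numbers of icons and the round-trip time below are hypothetical, and the model is deliberately crude: each small file is assumed to cost one round trip, with transfer time, connection setup and keep-alive ignored.

```python
import math

PARALLEL_PER_HOST = 2  # the RFC 2616 suggestion quoted on the slide
ROUND_TRIP_MS = 100    # hypothetical latency between browser and server
ICON_COUNT = 20        # hypothetical number of small graphic files


def fetch_time_ms(file_count, parallel=PARALLEL_PER_HOST, rtt_ms=ROUND_TRIP_MS):
    """Rough time to fetch `file_count` small files, one round trip each."""
    rounds = math.ceil(file_count / parallel)
    return rounds * rtt_ms


separate_ms = fetch_time_ms(ICON_COUNT)  # 20 individual icons
sprite_ms = fetch_time_ms(1)             # one combined sprite image
print(separate_ms, sprite_ms)  # 1000 100
```

Even though the sprite is a bigger file, under these assumptions it replaces ten rounds of waiting with one.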

About this session
Sergi Morales, Founder & CTO of Expertos en TI
Phone: +34 6688-XPNTI
Email: sergi.morales+eedc@expertosenti.com
Blog: http://blog.expertosenti.com
Web: http://www.expertosenti.com
Expertos en TI: We help Internet-oriented projects leverage all the research done by the big sites (Flickr, Facebook, Twitter, Salesforce, Google, and so on) so they can improve their bottom line and be prepared for growth

About the EEDC course
34330 Execution Environments for Distributed Computing (EEDC)
Master in Computer Architecture, Networks and Systems (CANS)
Computer Architecture Department (AC)
Universitat Politècnica de Catalunya – Barcelona Tech (UPC)
ECTS credits: 6
INSTRUCTOR
Professor Jordi Torres
Phone: +34 93 401 7223
Email: torres@ac.upc.edu
Office: Campus Nord, Modul C6. Room 217.
Web: http://www.JordiTorres.org
