UC Berkeley
Cloud Computing: Past, Present, and Future
Professor Anthony D. Joseph*, UC Berkeley
Reliable Adaptive Distributed Systems Lab
RWTH Aachen, 22 March 2010
http://abovetheclouds.cs.berkeley.edu/
*Director, Intel Research Berkeley
RAD Lab 5-year Mission
Enable 1 person to develop, deploy, and operate a next-generation Internet application
Key enabling technology: statistical machine learning (debugging, monitoring, power management, auto-configuration, performance prediction, ...)
Highly interdisciplinary faculty & students
PIs: Patterson/Fox/Katz (systems/networks), Jordan (machine learning), Stoica (networks & P2P), Joseph (security), Shenker (networks), Franklin (DB)
2 postdocs, ~30 PhD students, ~6 undergrads
Grad/undergrad teaching integrated with research
Course Timeline
Friday
10:00-12:00 History of Cloud Computing: time-sharing, virtual machines, datacenter architectures, utility computing
12:00-13:30 Lunch
13:30-15:00 Modern Cloud Computing: economics, elasticity, failures
15:00-15:30 Break
15:30-17:00 Cloud Computing Infrastructure: networking, storage, computation models
Monday
10:00-12:00 Cloud Computing research topics: scheduling, multiple datacenters, testbeds
Nexus: A common substrate for cluster computing
Joint work with Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Scott Shenker, and Ion Stoica
Recall: Hadoop on HDFS
[Diagram: a namenode runs the namenode daemon and a job submission node runs the jobtracker; each slave node runs a tasktracker and a datanode daemon on top of the local Linux file system]
Adapted from slides by Jimmy Lin, Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed Computing Seminar, 2007 (licensed under the Creative Commons Attribution 3.0 License)
Problem
Rapid innovation in cluster computing frameworks
No single framework is optimal for all applications
Energy efficiency means maximizing cluster utilization
Want to run multiple frameworks in a single cluster
What do we want to run in the cluster?
Pregel, Apache Hama, Dryad, Pig, ...
Why share the cluster between frameworks?
Better utilization and efficiency (e.g., take advantage of diurnal patterns)
Better data sharing across frameworks and applications
Solution
Nexus is an “operating system” for the cluster over which diverse frameworks can run
Nexus multiplexes resources between frameworks
Frameworks control job execution
Goals
Scalable
Robust (i.e., simple enough to harden)
Flexible enough for a variety of different cluster frameworks
Extensible enough to encourage innovative future frameworks
Question 1: Granularity of Sharing
Option: Coarse-grained sharing
Give a framework a (slice of a) machine for its entire duration
Data locality compromised if a machine is held for a long time
Hard to account for new frameworks and changing demands -> hurts utilization and interactivity
[Diagram: the cluster statically partitioned among Hadoop 1, Hadoop 2, and Hadoop 3]
Question 1: Granularity of Sharing
Nexus: Fine-grained sharing
Support frameworks that use smaller tasks (in time and space) by multiplexing them across all available resources
Frameworks can take turns accessing data on each node
Can resize framework shares to get utilization & interactivity
[Diagram: tasks from Hadoop 1, Hadoop 2, and Hadoop 3 interleaved across every node in the cluster]
Question 2: Resource Allocation
Option: Global scheduler
Frameworks express needs in a specification language; a global scheduler matches resources to frameworks
Requires encoding a framework's semantics using the language, which is complex and can lead to ambiguities
Restricts frameworks whose needs the specification language did not anticipate
Designing a general-purpose global scheduler is hard
Question 2: Resource Allocation
Nexus: Resource offers
Offer free resources to frameworks; let frameworks pick which resources best suit their needs
Keeps Nexus simple and allows us to support future jobs
Distributed decisions might not be optimal
Outline
Nexus Architecture
Resource Allocation
Multi-Resource Fairness
Implementation
Results
Nexus Architecture
Overview
[Diagram: the Nexus master coordinates per-framework schedulers (Hadoop v19, Hadoop v20, MPI) for their jobs; each Nexus slave runs the corresponding executors (Hadoop v19/v20 executors, MPI executors), which launch tasks]
Resource Offers
[Diagram: the Nexus master picks a framework to offer to and sends a resource offer to its scheduler (e.g., the MPI scheduler or the Hadoop scheduler); slaves run MPI executors with running tasks]
Resource Offers
offer = list of {machine, free_resources}
Example: [ {node 1, <2 CPUs, 4 GB>}, {node 2, <2 CPUs, 4 GB>} ]
[Diagram: the Nexus master picks a framework to offer to and sends the resource offer to that framework's scheduler]
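To make the offer mechanics concrete, here is a minimal Python sketch of an offer and a framework's response. The names (Offer, FrameworkScheduler, respond_to_offer) and the greedy packing of tasks into offers are illustrative assumptions, not the actual Nexus API.

```python
# Minimal sketch of the resource-offer handshake (illustrative names, not the Nexus API).
from dataclasses import dataclass

@dataclass
class Offer:
    node: str       # machine the resources live on
    cpus: int       # free CPUs in the offer
    mem_gb: int     # free memory in the offer

class FrameworkScheduler:
    """A framework picks only the resources that suit its pending tasks."""

    def __init__(self, task_cpus, task_mem_gb):
        self.task_cpus = task_cpus
        self.task_mem_gb = task_mem_gb

    def respond_to_offer(self, offers):
        """Return (node, number of tasks to launch) pairs; anything unused is declined."""
        launches = []
        for offer in offers:
            # Fit as many of this framework's tasks as the offered resources allow.
            n = min(offer.cpus // self.task_cpus, offer.mem_gb // self.task_mem_gb)
            if n > 0:
                launches.append((offer.node, n))
        return launches

# The offer from the slide: two nodes, each with <2 CPUs, 4 GB> free.
offers = [Offer("node 1", 2, 4), Offer("node 2", 2, 4)]
print(FrameworkScheduler(task_cpus=1, task_mem_gb=2).respond_to_offer(offers))
# -> [('node 1', 2), ('node 2', 2)]
```

Against the two-node offer from the slide, a framework whose tasks need <1 CPU, 2 GB> would launch two tasks on each node; a real framework would also install filters (described two slides below) so that uninteresting offers never reach it.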
Resource Offers
[Diagram: the framework's scheduler performs framework-specific scheduling against the offer and replies with tasks to launch; Nexus slaves launch and isolate the executors that run those tasks]
Resource Offer Details
Min and max task sizes to control fragmentation
Filters let a framework restrict the offers sent to it:
By machine list
By quantity of resources
Timeouts can be added to filters
Frameworks can signal when to destroy filters, or when they want more offers
Using Offers for Data Locality
We found that a simple policy called delay scheduling can give very high locality:
The framework waits for offers on nodes that have its data
If it has waited longer than a certain delay, it starts launching non-local tasks
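A minimal sketch of that waiting rule, assuming a single fixed delay threshold (the editor's notes at the end report roughly 90% locality with a 1-second delay and 95% with 5 seconds). The class and method names are illustrative, not the actual Nexus or Hadoop code.

```python
# Sketch of delay scheduling (illustrative; not the actual Nexus/Hadoop code).
import time

class DelayScheduler:
    def __init__(self, data_nodes, max_delay_s=5.0):
        self.data_nodes = set(data_nodes)   # nodes that hold this job's input data
        self.max_delay_s = max_delay_s      # how long to hold out for locality
        self.first_skip = None              # when we first declined a non-local offer

    def should_accept(self, offered_node):
        """Accept local offers immediately; accept non-local ones only after the delay."""
        if offered_node in self.data_nodes:
            self.first_skip = None          # got locality, reset the clock
            return True
        if self.first_skip is None:
            self.first_skip = time.monotonic()
        return time.monotonic() - self.first_skip >= self.max_delay_s

sched = DelayScheduler(data_nodes=["node 1", "node 3"])
print(sched.should_accept("node 2"))   # False: keep waiting for a node with the data
print(sched.should_accept("node 3"))   # True: local offer
```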
Framework Isolation
The isolation mechanism is pluggable due to the inherent performance/isolation tradeoff
Current implementation supports Solaris projects and Linux containers; both isolate CPU, memory, and network bandwidth
Linux developers are working on disk I/O isolation
Other options: VMs, Solaris zones, policing
Resource Allocation
Allocation Policies
Nexus picks the framework to offer resources to, and hence controls how many resources each framework can get (but not which)
Allocation policies are pluggable to suit organization needs, through allocation modules
Example: Hierarchical Fairshare Policy
[Diagram: a cluster share policy tree for Facebook.com splits the cluster between Ads (20%) and Spam (80%); the Ads share is split between User 1 (70%, i.e., 14% of the cluster) and User 2 (30%, i.e., 6%), and the Spam share is divided among Jobs 1-4, with per-level usage shown over time]
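The hierarchy above multiplies out to per-leaf cluster shares. The sketch below assumes the weights shown on the slide (Ads 20%, split 70/30 between its users; Spam 80%) and, as a further assumption, an equal split of the Spam share across its four jobs; it is an illustration of hierarchical fair sharing, not the Nexus allocation module.

```python
# Hierarchical fair share: a leaf's cluster share is the product of the
# normalized weights along its path in the policy tree.

def shares(children, parent_share=1.0, prefix=""):
    """children: {name: (weight, sub_children or None)} -> {leaf path: cluster fraction}."""
    total = sum(weight for weight, _ in children.values())
    out = {}
    for name, (weight, sub) in children.items():
        share = parent_share * weight / total
        if sub:
            out.update(shares(sub, share, prefix + name + "/"))
        else:
            out[prefix + name] = share
    return out

policy = {
    "Ads":  (20, {"User 1": (70, None), "User 2": (30, None)}),
    "Spam": (80, {"Job 1": (1, None), "Job 2": (1, None),
                  "Job 3": (1, None), "Job 4": (1, None)}),
}
print(shares(policy))
# ≈ {'Ads/User 1': 0.14, 'Ads/User 2': 0.06,
#    'Spam/Job 1': 0.2, 'Spam/Job 2': 0.2, 'Spam/Job 3': 0.2, 'Spam/Job 4': 0.2}
```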
Revocation
Killing tasks to make room for other users
Not the normal case, because fine-grained tasks enable quick reallocation of resources
Sometimes necessary:
Long-running tasks that never relinquish resources
A buggy job running forever
A greedy user who decides to make his tasks long
Revocation Mechanism
The allocation policy defines a safe share for each user
Users will get at least their safe share within a specified time
Revoke only if a user is below its safe share and is interested in offers
Revoke tasks from users farthest above their safe share
A framework is warned before its task is killed
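A sketch of the victim-selection step, assuming allocations and safe shares are expressed in the same units (e.g., number of slots); the function name and data layout are illustrative, not the Nexus allocation-module interface.

```python
# Revoke only when someone below their safe share wants offers; take resources
# from the users farthest above their safe shares first (illustrative sketch).

def pick_revocation_victims(users, needed):
    """users: list of {'name', 'allocation', 'safe_share'}; returns (name, amount) pairs."""
    over = sorted((u for u in users if u["allocation"] > u["safe_share"]),
                  key=lambda u: u["allocation"] - u["safe_share"],
                  reverse=True)                      # farthest above safe share first
    plan, reclaimed = [], 0
    for u in over:
        if reclaimed >= needed:
            break
        take = min(u["allocation"] - u["safe_share"], needed - reclaimed)
        plan.append((u["name"], take))               # framework is warned before tasks die
        reclaimed += take
    return plan

print(pick_revocation_victims(
    [{"name": "A", "allocation": 60, "safe_share": 40},
     {"name": "B", "allocation": 30, "safe_share": 40},
     {"name": "C", "allocation": 10, "safe_share": 20}],
    needed=15))
# -> [('A', 15)]
```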
How Do We Run MPI?
Users are always told their safe share
Avoid revocation by staying below it
Giving each user a small safe share may not be enough if jobs need many machines
Can run a traditional grid or HPC scheduler as a user with a larger safe share of the cluster, and have MPI jobs queue up on it
E.g., Torque gets 40% of the cluster
Example: Torque on Nexus
[Diagram: Torque runs as a Nexus user with a 40% safe share of the Facebook.com cluster, alongside the Ads and Spam groups (20% and 40%); MPI jobs queue up on Torque while the Ads and Spam users run their own jobs]
Multi-Resource Fairness
What is Fair?
Goal: define a fair allocation of resources in the cluster between multiple users
Example: suppose we have 30 CPUs and 30 GB RAM, and two users with equal shares
User 1 needs <1 CPU, 1 GB RAM> per task
User 2 needs <1 CPU, 3 GB RAM> per task
What is a fair allocation?
Definition 1: Asset Fairness
Idea: give weights to resources (e.g., 1 CPU = 1 GB) and equalize the value of the resources given to each user
Algorithm: when resources are free, offer them to whoever has the least value
Result:
U1: 12 tasks: 12 CPUs, 12 GB ($24)
U2:  6 tasks:  6 CPUs, 18 GB ($24)
Problem: User 1 has < 50% of both CPUs and RAM
[Bar chart: each user's share of CPU and RAM]
Lessons from Definition 1
“You shouldn't do worse than if you ran a smaller, private cluster equal in size to your share”
Thus, given N users, each user should get ≥ 1/N of his dominating resource (i.e., the resource that he consumes most of)
Def. 2: Dominant Resource Fairness
Idea: give every user an equal share of her dominant resource (i.e., the resource she consumes most of)
Algorithm: when resources are free, offer them to the user with the smallest dominant share (i.e., fractional share of her dominant resource)
Result:
U1: 15 tasks: 15 CPUs, 15 GB
U2:  5 tasks:  5 CPUs, 15 GB
[Bar chart: each user's share of CPU and RAM]
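The DRF result on the slide can be reproduced with a small greedy simulation: repeatedly give the next task to the user whose dominant share is smallest, stopping when no user's next task fits. This is an illustrative sketch of the policy, not the Nexus allocation module.

```python
# Greedy Dominant Resource Fairness simulation for the example above.

def drf(capacity, demands):
    """capacity: {resource: total}; demands: {user: {resource: per-task need}}."""
    free = dict(capacity)
    used = {u: {r: 0 for r in capacity} for u in demands}
    tasks = {u: 0 for u in demands}

    def dominant_share(u):
        # Fraction of the cluster this user holds of the resource they use most of.
        return max(used[u][r] / capacity[r] for r in capacity)

    while True:
        fitting = [u for u in demands
                   if all(free[r] >= demands[u][r] for r in capacity)]
        if not fitting:
            return tasks, used
        u = min(fitting, key=dominant_share)   # smallest dominant share goes next
        for r in capacity:
            free[r] -= demands[u][r]
            used[u][r] += demands[u][r]
        tasks[u] += 1

tasks, used = drf({"cpu": 30, "mem_gb": 30},
                  {"user1": {"cpu": 1, "mem_gb": 1},
                   "user2": {"cpu": 1, "mem_gb": 3}})
print(tasks)   # {'user1': 15, 'user2': 5}
print(used)    # user1: 15 CPUs, 15 GB; user2: 5 CPUs, 15 GB
```

Each user ends up with 50% of their dominant resource (CPU for user 1, RAM for user 2), matching the slide's allocation.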
Fairness Properties
Implementation
Implementation Stats
7000 lines of C++
APIs in C, C++, Java, Python, Ruby
Executor isolation using Linux containers and Solaris projects
Frameworks
Ported frameworks:
Hadoop (900 line patch)
MPI (160 line wrapper scripts)
New frameworks:
Spark, a Scala framework for iterative jobs (1300 lines)
Apache+haproxy, an elastic web server farm (200 lines)
Results
Overhead
Less than 4% seen in practice
Dynamic Resource Sharing
Multiple Hadoops Experiment
[Diagram: three Hadoop instances, each confined to its own static partition of the cluster]
Multiple Hadoops Experiment
[Diagram: with Nexus, tasks from Hadoop 1, Hadoop 2, and Hadoop 3 are interleaved across all nodes of the cluster]
Results with 16 Hadoops
Web Server Farm Framework
Web Framework Experiment
[Diagram: httperf drives HTTP requests at an elastic web farm; an haproxy-based scheduler uses load calculations to request resources, the Nexus master exchanges resource offers and status updates, and each Nexus slave runs web executors (Apache tasks) alongside load generator executors]
Web Framework Results
Future Work
Experiment with parallel programming models
Further explore low-latency services on Nexus (web applications, etc.)
Shared services (e.g., BigTable, GFS)
Deploy to users and open source
Cloud Computing Testbeds
Open Cirrus™: Seizing the Open Source Cloud Stack Opportunity
A joint initiative sponsored by HP, Intel, and Yahoo!
http://opencirrus.org/
Proprietary Cloud Computing stacks
[Table comparing the Google, Amazon, and Microsoft stacks layer by layer: applications (the publicly accessible layer); application frameworks (MapReduce, Sawzall, Google App Engine, Protocol Buffers / EMR-Hadoop / .NET Services); software infrastructure covering VM management, job scheduling, storage management, and monitoring (Borg, GFS, BigTable / EC2, S3, EBS / Fabric Controller, SQL Services, blobs, tables, queues); and hardware infrastructure]
Open Cloud Computing stacks: heavily fragmented today!
Application Frameworks: Pig, Hadoop, MPI, Sprout, Mahout
Monitoring: Ganglia, Nagios, Zenoss, MON, Moara
Storage Management: HDFS, KFS, Gluster, Lustre, PVFS, MooseFS, HBase, Hypertable
Job Scheduling: Maui/Torque
VM Management: Eucalyptus, Enomalism, Tashi, Reservoir, Nimbus, oVirt
Hardware Infrastructure: PRS, Emulab, Cobbler, xCat
Open Cirrus™ Cloud Computing Testbed
Shared: research, applications, infrastructure (12K cores), data sets
Global services: sign-on, monitoring, storage. Open source stack (PRS, Tashi, Hadoop)
Sponsored by HP, Intel, and Yahoo! (with additional support from NSF)
9 sites currently, target of around 20 in the next two years
Goals:
Foster new systems and services research around cloud computing
Catalyze an open-source stack and APIs for the cloud
How are we unique?
Support for systems research and applications research
Federation of heterogeneous datacenters
Open Cirrus Organization
Central Management Office oversees Open Cirrus; currently owned by HP
Governance model: research team, technical team, new site additions, support (legal (export, privacy), IT, etc.)
Each site:
Runs its own research and technical teams
Contributes individual technologies
Operates some of the global services
E.g., the HP site supports the portal and PRS, the Intel site is developing and supporting Tashi, and Yahoo! contributes to Hadoop
Intel BigData Open Cirrus Site
http://opencirrus.intel-research.net
[Diagram: rack and network layout; 1 Gb/s links aggregate into 24-48 Gb/s switches, a 45 Mb/s T3 uplink connects to the Internet, and PDUs provide per-port monitoring and control]
Node configurations include: a mobile rack of 8 1U nodes (2 quad-core Xeon E5440 [Harpertown/Core 2], 16 GB DRAM, 2x 1 TB disk); a 3U rack of 5 storage nodes (12x 1 TB disks); blade racks of 40 nodes (2 quad-core Xeon E5345 [Clovertown/Core], 8 GB DRAM, 2x 150 GB disk, or 2 quad-core Xeon E5420 [Harpertown/Core 2], 8 GB DRAM, 2x 1 TB disk); and 1U/2U racks of 15 nodes with a mix of 1 single-core Xeon [Irwindale/Pentium 4] (6 GB DRAM, 36+300 GB disk), 2 dual-core Xeon 5160 [Woodcrest/Core] (4 GB RAM, 2x 75 GB disks), 2 quad-core Xeon E5345 [Clovertown/Core] (8 GB DRAM, 2x 150 GB disk), 2 quad-core Xeon E5440 [Harpertown/Core 2] (8 GB DRAM, 6x 1 TB disk), and 2 quad-core Xeon E5520 [Nehalem-EP/Core i7] (16 GB DRAM, 6x 1 TB disk)
Open Cirrus Sites
[Table of per-site resources; the totals row reads 1,029 / 4 PB / 12,074 / 1,746 / 26.3 TB, with the 12,074 figure corresponding to the roughly 12K cores cited earlier]
Testbed Comparison
Open Cirrus Stack
Compute + network + storage resources
Management and control subsystem
Power + cooling
Physical Resource Set (Zoni) service
Credit: John Wilkes (HP)
Open Cirrus Stack (layer-by-layer build-up)
[Diagram sequence: the Zoni service carves the hardware into PRS clients, each with their own “physical data center”; virtual clusters (e.g., Tashi) run on top, alongside NFS and HDFS storage services and research use; an application (a BigData app on Hadoop) can run on a Tashi virtual cluster, on a PRS, or on real hardware; experiment save/restore, platform services, and user services complete the stack]
System Organization
Compute nodes are divided into dynamically-allocated, VLAN-isolated PRS subdomains
Apps switch back and forth between virtual and physical
Uses include: open service research; apps running in a VM management infrastructure (e.g., Tashi); Tashi development; production storage service; proprietary service research; open workload monitoring and trace collection
Open Cirrus stack - Zoni
Zoni service goals:
Provide mini-datacenters to researchers
Isolate experiments from each other
Stable base for other research
Zoni service approach:
Allocate sets of physically co-located nodes, isolated inside VLANs
Zoni code from HP is being merged into the Tashi Apache project and extended by Intel
Running on the HP site, being ported to the Intel site, and will eventually run on all sites
Open Cirrus Stack - Tashi
An open source Apache Software Foundation project sponsored by Intel (with CMU, Yahoo, HP)
Infrastructure for cloud computing on Big Data
http://incubator.apache.org/projects/tashi
Research focus: location-aware co-scheduling of VMs, storage, and power; seamless physical/virtual migration
Joint with Greg Ganger (CMU), Mor Harchol-Balter (CMU), Milan Milenkovic (CTG)
Tashi High-Level Design
[Diagram: a Cluster Manager (CM) maintains databases and routes messages across commodity cluster nodes; its decision logic is limited. Most decisions happen in the scheduler, which manages compute, storage, and power in concert. Services are instantiated through virtual machines; the storage service aggregates the capacity of the commodity nodes to house Big Data repositories, and data location and power information is exposed to the scheduler and services]
Location Matters (calculated)
Open Cirrus Stack - Hadoop
An open-source Apache Software Foundation project sponsored by Yahoo!
http://wiki.apache.org/hadoop/ProjectDescription
Provides a parallel programming model (MapReduce) and a distributed file system (HDFS)
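For readers new to the programming model, a toy word count in plain Python mirrors the MapReduce structure Hadoop provides: map emits key/value pairs, the framework groups by key, and reduce aggregates each group. This uses no Hadoop APIs and is purely illustrative.

```python
# Toy MapReduce word count (illustrative of the model only; no Hadoop dependency).
from collections import defaultdict

def map_fn(line):
    # Map phase: emit (word, 1) for every word in the input record.
    for word in line.split():
        yield word.lower(), 1

def reduce_fn(word, counts):
    # Reduce phase: combine all values emitted for one key.
    return word, sum(counts)

def mapreduce(lines):
    groups = defaultdict(list)
    for line in lines:                       # map
        for key, value in map_fn(line):
            groups[key].append(value)        # shuffle: group values by key
    return [reduce_fn(k, v) for k, v in sorted(groups.items())]  # reduce

print(mapreduce(["the quick brown fox", "the lazy dog"]))
# -> [('brown', 1), ('dog', 1), ('fox', 1), ('lazy', 1), ('quick', 1), ('the', 2)]
```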
What kinds of research projects are Open Cirrus sites looking for?
Open Cirrus is seeking research in the following areas (different centers will weight these differently):
Datacenter federation
Datacenter management
Web services
Data-intensive applications and systems
The following kinds of projects are generally not of interest:
Traditional HPC application development
Production applications that just need lots of cycles
Closed source system development
How do users get access to Open Cirrus sites?
Project PIs apply to each site separately. Contact names, email addresses, and web links for applications to each site will be available on the Open Cirrus Web site (which goes live Q2 09): http://opencirrus.org
Each Open Cirrus site decides which users and projects get access to its site
Developing a global sign-on for all sites (Q2 09): users will be able to log in to each Open Cirrus site for which they are authorized using the same login and password
Summary and Lessons
Intel is collaborating with HP and Yahoo! to provide a cloud computing testbed for the research community
Using the cloud as an accelerator for interactive streaming/big data apps is an important usage model
Primary goals are to:
Foster new systems research around cloud computing
Catalyze an open-source reference stack and APIs for the cloud: access model, local and global services, application frameworks
Explore location-aware and power-aware workload scheduling
Develop integrated physical/virtual allocations to combat cluster squatting
Design cloud storage models (GFS-style storage systems not mature, impact of SSDs unknown)
Investigate new application framework alternatives to MapReduce/Hadoop
Other Cloud Computing Research Topics: Isolation and DC Energy
Heterogeneity in Virtualized Environments
VM technology isolates CPU and memory, but disk and network are shared
Full bandwidth when there is no contention, equal shares when there is contention
2.5x performance difference observed across EC2 small instances
Isolation Research
Need predictable variance over raw performance
Some resources that people have run into problems with: power, disk space, disk I/O rate (drive, bus), memory space (user/kernel), memory bus, caches at all levels (TLB, etc.), hyperthreading, CPU rate, interrupts
Network: NIC (Rx/Tx), switch, cross-datacenter, cross-country
OS resources: file descriptors, ports, sockets
Datacenter Energy
EPA, 8/2007: 1.5% of total U.S. energy consumption, growing from 60 to 100 billion kWh in 5 years
48% of a typical IT budget spent on energy
75 MW of new DC deployments in PG&E's service area (that they know about; expect another 2x)
Microsoft: $500M new Chicago facility
Three substations with a capacity of 198 MW
200+ shipping containers with 2,000 servers each; overall growth of 20,000 servers/month
Power/Cooling Issues
First Milestone: DC Energy Conservation
DCs limited by power
For each dollar spent on servers, add $0.48 (2005) / $0.71 (2010) for power/cooling
$26B spent to power and cool servers in 2005 grows to $45B in 2010
Within DC racks, network equipment is often the “hottest” component in the hot spot
Thermal Image of Typical Cluster Rack
[Thermal image highlighting the rack and its switch as the hot spot]
M. K. Patterson, A. Pratt, P. Kumar, “From UPS to Silicon: an end-to-end evaluation of datacenter efficiency”, Intel Corporation
DC Networking and Power
Selectively power down ports/portions of network elements
Enhanced power-awareness in the network stack:
Power-aware routing and support for system virtualization; support for datacenter “slice” power down and restart
Application- and power-aware media access/control: dynamic selection of full/half duplex; directional asymmetry to save power, e.g., 10 Gb/s send, 100 Mb/s receive
Power-awareness in applications and protocols: hard state (proxying), soft state (caching), protocol/data “streamlining” for power as well as bandwidth reduction
Power implications for topology design: tradeoffs in redundancy/high-availability vs. power consumption; VLAN support for power-aware system virtualization
Summary
Many areas for research into Cloud Computing!
Datacenter design, languages, scheduling, isolation, energy efficiency (at all levels)
Opportunities to try out research at scale!
Amazon EC2, Open Cirrus, ...
Thank you!
adj@eecs.berkeley.edu
http://abovetheclouds.cs.berkeley.edu/
Editor's Notes

  • #7: Just mention briefly that there are things MR and Dryad can't do, and that there are competing implementations; perhaps also note the need to share resources with other data center services here. The excitement surrounding cluster computing frameworks like Hadoop continues to accelerate (e.g., EC2 Hadoop and Dryad in Azure). Startups, enterprises, and researchers are bursting with ideas to improve these already existing frameworks. But more importantly, as we encounter the limitations of MR, we're making a shopping list of what we want in next-generation frameworks: new abstractions, programming models, even new implementations of existing models (e.g., an Erlang MR called Disco). We believe that no single framework can best facilitate this innovation, but instead that people will want to run existing and new frameworks on the same physical clusters at the same time.
  • #10: Useful even if you only use one framework: run isolated framework instances (production vs. test), or run multiple versions of a framework together.
  • #14: The global scheduler needs to make guesses about a lot more (job running times, etc.). Talk about adaptive frameworks that may not know how many tasks they need in advance, and about irregular-parallelism jobs that don't even know their DAG in advance. **We are exploring resource offers but don't yet know the limits; they seem to work OK for jobs with data locality needs, though.**
  • #15: The global scheduler needs to make guesses about a lot more (job running times, etc.). Talk about adaptive frameworks that may not know how many tasks they need in advance, and about irregular-parallelism jobs that don't even know their DAG in advance. **We are exploring resource offers but don't yet know the limits; they seem to work OK for jobs with data locality needs, though.**
  • #18: …multiple frameworks to run concurrently! Here we see a new framework, Dryad being run side by side with Hadoop, and Nexus is multiplexing the slaves between both. Some are running Hadoop tasks, some Dryad, and some both.
  • #19: …multiple frameworks to run concurrently! Here we see a new framework, Dryad being run side by side with Hadoop, and Nexus is multiplexing the slaves between both. Some are running Hadoop tasks, some Dryad, and some both.
  • #20: …multiple frameworks to run concurrently! Here we see a new framework, Dryad being run side by side with Hadoop, and Nexus is multiplexing the slaves between both. Some are running Hadoop tasks, some Dryad, and some both.
  • #21: …multiple frameworks to run concurrently! Here we see a new framework, Dryad being run side by side with Hadoop, and Nexus is multiplexing the slaves between both. Some are running Hadoop tasks, some Dryad, and some both.
  • #23: Waiting 1s gives 90% locality, 5s gives 95%
  • #24: Linux containers can actually be both “application” containers where an app shares the filesystem with the host (similar to Solaris projects), or “system” containers where each container has its own filesystem (similar to Solaris zones); both types also prevent processes in a container from seeing those outside it
  • #27: Transition to next slide: when you have policy == SLAs
  • #35: What to do with the rest of the resources?
  • #44: Mentioned shared HDFS!
  • #45: Mentioned shared HDFS!
  • #46: 16 Hadoop instances doing a synthetic filter job; 100 nodes, 4 slots per node. Delay scheduling improves performance by 1.7x.