Usage Patterns to Provision for Scientific Experiments in Clouds

Usage Patterns to Provision for Scientific Experimentation in Clouds
Eran Chinthaka Withana and Beth Plale
School of Informatics and Computing, Indiana University
Bloomington, Indiana, USA
2nd International Conference on Cloud Computing Technology and Science, Indianapolis, IN, US

Summary
- Doing Science in Cloud
- Improving Scientific Job Executions in Cloud Resources
- Role of Successful Predictions to Reduce Startup Overheads
- System Architecture
- Use of Reasoning
- Evaluation
- Discussion and Future Work

Clouds as a Complementary Solution to Grids for Science
- Issues with existing systems
  - Batch-oriented HPC resources with long queue wait times, even under moderate loads
  - No access transparency
  - Quota system requires maximum resources to be known and approved in advance
- Advantages of using cloud resources
  - Availability of “unlimited” compute resources the instant they are needed
  - Pay-as-you-go model eliminates up-front commitments
  - Encourages scientists to budget for the resources they are willing to pay for
- Issues with clouds
  - Slow interconnects, virtualization overhead, and startup times
  - Consumption-based billing
- Emergence of new programming paradigms to exploit the advantages of cloud resources

Challenges with Cloud Computing Resources
- Scheduling algorithms
  - Focused on optimal utilization of relatively homogeneous grid or cluster resources
  - In clouds, resources can be provisioned to match user requirements
- Prediction algorithms
  - Different hardware configurations force execution-time predictions to factor in the non-uniformity of resources

Improving Scientific Job Executions in Cloud Resources
- Solution space
  - Meta-scheduler that uses historical information to anticipate future activity (AppLeS, GrADS)
  - Resource abstraction service (Nimrod/G)
  - Reducing the impact of startup overheads by learning from user behavioral patterns to predict future jobs
- Talk outline
  - Algorithm to predict future jobs by extracting user patterns from historical information
    - Reduces the impact of high startup overheads for time-critical applications
  - Use of knowledge-based techniques
    - Starts from zero knowledge or from pre-populated job information describing the connections between jobs
    - Similar cases retrieved are used to predict future jobs, reducing high startup overheads
  - Algorithm assessment
    - Two different workloads: individual scientific jobs executed at LANL, and a set of workflows executed by three users

Use Case
- Suite of workflows can differ from domain to domain
- WRF (Weather Research and Forecasting) as upstream node
  - Meteorologists run pre-processing jobs to generate visualizations of parameters
  - In agriculture, scientists use it for crop prediction
  - Wildfire propagation and prediction
  - Generating visualizations for mobile phones using NCL scripts
  - Atmospheric scientists use it for optimal placement of wind farms
- User patterns reveal the sequence of jobs, taking different users and domains into consideration
- Useful for a science gateway serving a wide range of mid-scale scientists
[Figure: WRF and its downstream uses: Weather Predictions, Crop Predictions, Wind Farm Location Evaluations, Wild Fire Propagation Simulation]

Role of Successful Predictions to Reduce Startup Overheads
- The largest gain is achieved when prediction accuracy is high and setup time (s) is large relative to execution time (t)
- r = probability of successful prediction (prediction accuracy)
- Percentage time reduction = [equation not preserved in this transcript]
- For simplicity, assuming equal job execution and startup times: percentage time reduction = [equation not preserved in this transcript]
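The equations on this slide did not survive extraction. As a rough illustration only, the sketch below assumes a simple model that fits the definitions of r, s, and t above but is not confirmed as the authors' formula: without prediction each job costs s + t, and a correct prediction (probability r) hides the startup time s entirely, so the expected saving is r·s / (s + t). The function name and example values are hypothetical.

```python
def percentage_time_reduction(r: float, s: float, t: float) -> float:
    """Expected percentage of the total (s + t) saved under the assumed model.

    r: probability that the next job is predicted correctly
    s: startup (setup) time of the resource
    t: execution time of the job
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be a probability in [0, 1]")
    if s < 0 or t < 0 or s + t == 0:
        raise ValueError("s and t must be non-negative and not both zero")
    # A correct prediction saves s; a wrong one saves nothing (assumption).
    return 100.0 * r * s / (s + t)


# Equal startup and execution times: at most half of the total can be saved.
equal_times = percentage_time_reduction(r=0.8, s=3.0, t=3.0)

# A long-running job: the same accuracy saves a much smaller share.
long_job = percentage_time_reduction(r=0.8, s=3.0, t=57.0)

print(equal_times, long_job)
```

Under this assumption the gain grows linearly with r and shrinks as t/s grows, which matches the slide's statement that the largest gain comes from high accuracy and a large setup time relative to execution time. The sketch ignores any cost of wrong predictions.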

Relationship of Predictions to Execution Time
- Observations
  - Percentage time reduction increases with the accuracy of predictions
  - Time reduction decreases exponentially as the work-to-overhead ratio (t/s) increases
- Need to find the critical point for a given situation
  - Fix the required percentage time reduction for a given t/s ratio and find the required accuracy of predictions
- Cost of wrong predictions
  - Depends on the compute resource
- Percentage time reduction = [equation not preserved in this transcript]
- Accuracy of predictions = total successful future job predictions / total predictions
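The accuracy definition and the "critical point" question on this slide can be sketched as follows. `prediction_accuracy` follows the slide's definition directly. `required_accuracy` additionally assumes a simple model, not confirmed as the authors' formula, in which a correct prediction hides the whole startup time, giving a fractional saving of r / (1 + t/s). All function names are hypothetical.

```python
from typing import Optional


def prediction_accuracy(successful: int, total: int) -> float:
    """Accuracy = successful future-job predictions / total predictions."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= successful <= total:
        raise ValueError("successful must lie between 0 and total")
    return successful / total


def required_accuracy(target_pct: float, t_over_s: float) -> Optional[float]:
    """Accuracy needed to reach target_pct time reduction at a given t/s.

    Uses the assumed model: reduction = r / (1 + t/s).
    Returns None when the target cannot be reached even with r = 1.
    """
    if not 0.0 <= target_pct <= 100.0:
        raise ValueError("target_pct must be in [0, 100]")
    if t_over_s < 0:
        raise ValueError("t_over_s must be non-negative")
    r = (target_pct / 100.0) * (1.0 + t_over_s)
    return r if r <= 1.0 else None


# With t = s, a 25% reduction needs half of the predictions to be right,
# while a 60% reduction is out of reach under this model.
print(required_accuracy(25.0, 1.0), required_accuracy(60.0, 1.0))
```

Fixing the target reduction and solving for r in this way gives one reading of the slide's "critical point": the accuracy below which prediction cannot deliver the desired saving for a given t/s.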

Prediction Engine: System Architecture
[Figure: system architecture diagram; surviving label: PredictionRetriever]

Use of Reasoning
- Store and retrieve cases
- Steps
  - Retrieval of similar cases
    - Similarity measurement
    - Use of thresholds
  - Reuse of old cases
  - Case adaptation
  - Storage
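The retrieve, reuse, and store steps above can be sketched as a small case base. This is a minimal illustration, not the authors' implementation: the class and attribute names are hypothetical, similarity is assumed to be the fraction of matching attributes, and the case adaptation step is omitted.

```python
from typing import Dict, List, Optional, Tuple

Attributes = Dict[str, str]


def overlap_similarity(a: Attributes, b: Attributes) -> float:
    """Fraction of attribute names (from either case) whose values match."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    return sum(1 for k in keys if a.get(k) == b.get(k)) / len(keys)


class CaseBase:
    """Stores (job attributes, next job) cases and predicts the next job."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold
        self.cases: List[Tuple[Attributes, str]] = []

    def predict(self, query: Attributes) -> Optional[str]:
        """Retrieve the most similar case and reuse its next job.

        Returns None when no stored case reaches the similarity threshold.
        """
        best_job, best_score = None, 0.0
        for attributes, next_job in self.cases:
            score = overlap_similarity(query, attributes)
            if score > best_score:
                best_job, best_score = next_job, score
        if best_job is None or best_score < self.threshold:
            return None
        return best_job

    def store(self, attributes: Attributes, next_job: str) -> None:
        """Storage step: remember which job actually followed."""
        self.cases.append((dict(attributes), next_job))


case_base = CaseBase(threshold=0.5)
case_base.store({"user": "alice", "app": "wrf"}, next_job="ncl_visualization")
print(case_base.predict({"user": "alice", "app": "wrf"}))
```

Starting with an empty `CaseBase` corresponds to the zero-knowledge mode mentioned earlier in the talk; calling `store` for known job sequences before use corresponds to pre-populating it.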

Case Similarity Calculation
- Each case is represented using a set of attributes
  - Attributes are selected by measuring their effect on the goal variable (the next job)
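The slide's similarity formula is not preserved in this transcript. One common way to compare attribute-based cases is a weighted match score, and the sketch below assumes that form. The weights, the attribute names, and the exact-match comparison are all illustrative assumptions rather than the authors' method.

```python
from typing import Mapping


def weighted_similarity(
    a: Mapping[str, object],
    b: Mapping[str, object],
    weights: Mapping[str, float],
) -> float:
    """Weighted share of attributes on which two cases agree, in [0, 1].

    weights maps each attribute name to its importance; an attribute that
    influences the goal variable (the next job) more gets a larger weight.
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    matched = sum(
        w
        for name, w in weights.items()
        if name in a and name in b and a[name] == b[name]
    )
    return matched / total


# Hypothetical attributes and weights for two job descriptions.
weights = {"application": 6.0, "user": 3.0, "queue": 1.0}
job_a = {"application": "wrf", "user": "alice", "queue": "short"}
job_b = {"application": "wrf", "user": "bob", "queue": "short"}
print(weighted_similarity(job_a, job_b, weights))
```

Attributes with little effect on the goal variable can be dropped by giving them zero weight, which is one way to read the slide's attribute selection step.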

Evaluation¹
- Use cases
  - Individual job workload
    - 140k jobs over two years from the 1024-node CM-5 at Los Alamos National Lab
  - Workflow use case
¹ Parallel Workload Archive: http://www.cs.huji.ac.il/labs/parallel/workload/

Evaluation: Average Accuracy of Predictions
[Figure: two panels, Individual Jobs Workload and Workflow Workload]

Evaluation: Time Saved
- Amount of time that can be saved if the resources are already provisioned when the job is ready to run
- Startup time
  - Assumed to be 3 minutes (average for commercial providers)
[Figure: two panels, Individual Jobs Workload and Workflow Workload]

Evaluation: Prediction Accuracies for Use Cases

Discussion and Future Work
- Accuracy
  - 78% for individual jobs
  - 96% for workflow workload
- Number of jobs required to make the system stable depends on the uniqueness and the distribution of unique applications
- Amount of time that can be saved using future job prediction is inversely proportional to the t/s ratio
- More accurate methods to prune features and identify weights
- Evaluation of machine learning techniques as an alternative to knowledge-based systems
- Combining future job predictions with job reliability predictions to further improve throughput of job executions

Related Work
[1] M. Armbrust et al., “Above the clouds: A Berkeley view of cloud computing,” EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2009-28, 2009.
[2] J. Dean and S. Ghemawat, “MapReduce: simplified data processing on large clusters,” Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008.
[3] C. Catlett, “The philosophy of TeraGrid: building an open, extensible, distributed TeraScale facility,” in ACM International Symposium on Cluster Computing and the Grid. Published by the IEEE Computer Society, 2002.
[4] J. S. Chase, D. E. Irwin, L. E. Grit, J. D. Moore, and S. Sprenkle, “Dynamic virtual clusters in a grid site manager,” in HPDC. IEEE Computer Society, 2003, pp. 90–103.
[5] R. J. Figueiredo, P. A. Dinda, and J. A. B. Fortes, “A case for grid computing on virtual machines,” in ICDCS ’03: Proceedings of the 23rd International Conference on Distributed Computing Systems. Washington, DC, USA: IEEE Computer Society, 2003, p. 550.
[6] I. Foster, T. Freeman, K. Keahey, D. Scheftner, B. Sotomayor, and X. Zhang, “Virtual clusters for grid communities,” in CCGRID ’06: Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid. Washington, DC, USA: IEEE Computer Society, 2006, pp. 513–520.
[7] K. Keahey, T. Freeman, J. Lauret, and D. Olson, “Virtual workspaces for scientific applications,” Journal of Physics: Conference Series, vol. 78, p. 012038 (5pp), 2007.
[8] B. Sotomayor, K. Keahey, and I. Foster, “Overhead matters: A model for virtual resource management,” in VTDC ’06: Proceedings of the 2nd International Workshop on Virtualization Technology in Distributed Computing. Washington, DC, USA: IEEE Computer Society, 2006, p. 5.
………………………………………………………….
[12] F. Berman et al., “Adaptive computing on the grid using AppLeS,” IEEE Transactions on Parallel and Distributed Systems, vol. 14, no. 4, pp. 369–382, 2003.
[13] F. Berman, A. Chien, K. Cooper, J. Dongarra, I. Foster, D. Gannon, L. Johnsson, K. Kennedy, C. Kesselman, J. Mellor-Crummey et al., “The GrADS project: Software support for high-level grid application development,” International Journal of High Performance Computing Applications, vol. 15, no. 4, p. 327, 2001.
[14] R. Buyya, D. Abramson, and J. Giddy, “Nimrod/G: An architecture for a resource management and scheduling system in a global computational grid,” in hpc. Published by the IEEE Computer Society, 2000, p. 283.
