Usage Patterns to Provision for Scientific Experimentation in Clouds
Eran Chinthaka Withana and Beth Plale
School of Informatics and Computing, Indiana University, Bloomington, Indiana, USA
2nd International Conference on Cloud Computing Technology and Science, Indianapolis, IN, US

Summary
  • Doing Science in Cloud
  • Improving Scientific Job Executions in Cloud Resources
  • Role of Successful Predictions to Reduce Startup Overheads
  • System Architecture
  • Use of Reasoning
  • Evaluation
  • Discussion and Future Work

Clouds as a Complementary Solution to Grids for Science
  • Issues with existing systems
      • Batch-oriented HPC resources with long queue wait times, even under moderate loads
      • No access transparency
      • Quota system requires maximum resources to be known and approved in advance
  • Advantages of using cloud resources
      • Availability of “unlimited” compute resources the instant they are needed
      • Pay-as-you-go model eliminates up-front commitments
      • Encourages scientists to budget for the resources they are willing to pay for
  • Issues with clouds
      • Slow interconnects, virtualization overhead, and startup times
      • Consumption-based billing
  • Emergence of new programming paradigms to exploit the advantages of cloud resources

Challenges with Cloud Computing Resources
  • Scheduling algorithms
      • Focused on optimal utilization of relatively homogeneous grid or cluster resources
      • In clouds, resources can be provisioned to match user requirements
  • Prediction algorithms
      • Different hardware configurations force execution-time predictions to account for non-uniformity of resources

Improving Scientific Job Executions in Cloud Resources
  • Solution space
      • Meta-scheduler that uses historical information to anticipate future activity (AppLeS, GrADS)
      • Resource abstraction service (Nimrod/G)
      • Reducing the impact of startup overheads by learning from user behavioral patterns to predict future jobs
  • Talk outline
      • Algorithm to predict future jobs by extracting user patterns from historical information
          • Reduces the impact of high startup overheads for time-critical applications
      • Use of knowledge-based techniques
          • Zero knowledge or pre-populated job information consisting of connections between jobs
          • Similar cases retrieved are used to predict future jobs, reducing high startup overheads
      • Algorithm assessment
          • Two different workloads: individual scientific jobs executed at LANL, and a set of workflows executed by three users

Use Case
  • Suite of workflows can differ from domain to domain
  • WRF (Weather Research and Forecasting) as upstream node
      • Meteorologists run pre-processing jobs to generate visualizations of parameters
      • In agriculture, scientists use it for crop prediction
      • Wild-fire propagation and prediction
      • Generate visualizations for mobile phones using NCL scripts
      • Atmospheric scientists use it for optimal placement of wind farms
  • User patterns reveal the sequence of jobs, taking different users/domains into consideration
  • Useful for a science gateway serving a wide range of mid-scale scientists
  [Figure: WRF linked to Weather Predictions, Crop Predictions, Wind Farm Location Evaluations, and Wild Fire Propagation Simulation]

Role of Successful Predictions to Reduce Startup Overheads
  • Largest gain is achieved when prediction accuracy is high and setup time (s) is large relative to execution time (t)
  • r = probability of successful prediction (prediction accuracy)
  • Percentage time reduction = [equation not preserved in this transcript]
  • For simplicity, assuming equal job execution and startup times (t = s): percentage time reduction = [equation not preserved in this transcript]
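The formula for percentage time reduction is missing from this transcript. The derivation below is one simple model that fits the definitions of r, s, and t given here; it is an assumption for illustration, not necessarily the authors' exact expression. It supposes that without prediction every job pays the full startup time s before running for time t, that a correct prediction (probability r) hides the startup time completely, and that a wrong prediction adds no extra delay.

```latex
\begin{align*}
T_{\text{without prediction}} &= s + t \\
T_{\text{with prediction}} &= (1 - r)\,s + t \\
\text{Percentage time reduction}
  &= \frac{T_{\text{without prediction}} - T_{\text{with prediction}}}{T_{\text{without prediction}}} \times 100
   = \frac{r\,s}{s + t} \times 100
   = \frac{r}{1 + t/s} \times 100 \\
\text{With } t = s:\quad \text{Percentage time reduction} &= \frac{r}{2} \times 100
\end{align*}
```

Under these assumptions the reduction grows with r and shrinks as t/s grows, which matches the qualitative statement that the largest gain occurs when accuracy is high and setup time is large relative to execution time.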

Relationship of Predictions to Execution Time
  • Observations
      • Percentage time reduction increases with accuracy of predictions
      • Time reduction is reduced exponentially with increased work-to-overhead ratio (t/s)
  • Need to find the critical point for a given situation
      • Fix the required percentage time reduction for a given t/s ratio and find the required accuracy of predictions
  • Cost of wrong predictions
      • Depends on compute resource
  • Accuracy of predictions = total successful future job predictions / total predictions
  [Figure: percentage time reduction plot, not preserved in this transcript]
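The accuracy metric and the "critical point" search can be sketched in a few lines. The accuracy function follows the definition stated above. The other two functions rely on an assumed model, reduction = r / (1 + t/s) × 100, in which a correct prediction hides the whole startup time and a wrong one costs nothing extra. That model and all function names are illustrative assumptions, not taken from the slides.

```python
def prediction_accuracy(successful: int, total: int) -> float:
    """Accuracy = total successful future job predictions / total predictions."""
    if total <= 0:
        raise ValueError("total predictions must be positive")
    return successful / total


def percentage_time_reduction(r: float, t_over_s: float) -> float:
    """Assumed model: reduction = r / (1 + t/s) * 100."""
    return 100.0 * r / (1.0 + t_over_s)


def required_accuracy(target_reduction_pct: float, t_over_s: float) -> float:
    """Invert the assumed model: accuracy needed for a target reduction at a given t/s."""
    r = target_reduction_pct / 100.0 * (1.0 + t_over_s)
    if r > 1.0:
        raise ValueError("target reduction is unreachable at this t/s ratio")
    return r
```

For example, under the assumed model a 25% reduction at t/s = 1 would need half of all predictions to be correct.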

Prediction Engine: System Architecture
  [Figure: system architecture diagram; surviving label: Prediction Retriever]

Use of Reasoning
  • Store and retrieve cases
  • Steps
      • Retrieval of similar cases
          • Similarity measurement
          • Use of thresholds
      • Reuse of old cases
      • Case adaptation
      • Storage
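The retrieve, reuse, and store steps can be illustrated with a toy case base. This is a minimal sketch under stated assumptions, not the authors' prediction engine: a case is assumed to be a (user, current job) pair linked to the job that followed it, similarity is plain exact matching, the threshold is applied to the share of past occurrences, and the case adaptation step is omitted. The class and method names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Optional


class CaseBase:
    """Toy case base: maps a (user, current job) pair to the jobs that followed it."""

    def __init__(self) -> None:
        # (user, current_job) -> Counter of observed next jobs
        self._cases = defaultdict(Counter)

    def store(self, user: str, current_job: str, next_job: str) -> None:
        """Storage step: record an observed job transition as a case."""
        self._cases[(user, current_job)][next_job] += 1

    def predict(self, user: str, current_job: str,
                threshold: float = 0.5) -> Optional[str]:
        """Retrieval and reuse: return the most frequent follow-up job,
        or None when no case exists or its share falls below the threshold."""
        followers = self._cases.get((user, current_job))
        if not followers:
            return None  # zero knowledge: nothing to reuse yet
        job, count = followers.most_common(1)[0]
        share = count / sum(followers.values())
        return job if share >= threshold else None
```

Starting from an empty case base corresponds to the zero-knowledge mode; calling `store` ahead of time with known job links corresponds to pre-populating it.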

Case Similarity Calculation
  • Each case is represented using a set of attributes
      • Selected by finding the effect on the goal variable (next job)
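A weighted attribute match is one common way to compare cases described by attribute sets. The sketch below assumes that approach; the slides do not state the actual similarity formula, and the attribute names, weights, and threshold value used here are hypothetical.

```python
def case_similarity(a: dict, b: dict, weights: dict) -> float:
    """Weighted share of attributes on which two cases agree (0.0 to 1.0)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    matched = sum(
        w for attr, w in weights.items()
        if attr in a and attr in b and a[attr] == b[attr]
    )
    return matched / total


def most_similar(query: dict, cases: list, weights: dict, threshold: float = 0.6):
    """Return the stored case most similar to the query, or None if below threshold."""
    best = max(cases, key=lambda c: case_similarity(query, c, weights), default=None)
    if best is None or case_similarity(query, best, weights) < threshold:
        return None
    return best
```

Weights let attributes with a stronger effect on the goal variable (the next job) count more toward the score than weaker ones.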

Evaluation¹
  • Use cases
      • Individual job workload
          • 140k jobs over two years from the 1024-node CM-5 at Los Alamos National Lab
      • Workflow use case
  ¹ Parallel Workload Archive: http://www.cs.huji.ac.il/labs/parallel/workload/

Evaluation: Average Accuracy of Predictions
  [Figure panels: Individual Jobs Workload; Workflow Workload]

Evaluation: Time Saved
  • Amount of time that can be saved if the resources are provisioned by the time the job is ready to run
  • Startup time
      • Assumed to be 3 minutes (average for commercial providers)
  [Figure panels: Individual Jobs Workload; Workflow Workload]
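The time-saved figure can be approximated by counting the startup time hidden by each correct prediction. The sketch below uses the 3-minute startup time stated above and assumes, for illustration only, that a wrong prediction saves nothing and adds no delay. The function name and input format are hypothetical.

```python
STARTUP_SECONDS = 3 * 60  # startup time assumed in the evaluation: 3 minutes


def time_saved_seconds(outcomes, startup_seconds: int = STARTUP_SECONDS) -> int:
    """Sum the startup time hidden by each correct prediction.

    `outcomes` is an iterable of booleans, one per job: True when the job
    was predicted correctly and its resources were ready in advance.
    """
    return sum(startup_seconds for correct in outcomes if correct)
```

Three correct predictions out of four jobs would hide three startup delays, or nine minutes in total.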

Evaluation: Prediction Accuracies for Use Cases

Discussion and Future Work
  • Accuracy
      • 78% for individual jobs
      • 96% for workflow workload
  • Number of jobs required to make the system stable depends on the uniqueness and the distribution of unique applications
  • Amount of time that can be saved using future job prediction is inversely proportional to the t/s ratio
  • More accurate methods to prune features and identify weights
  • Evaluation of machine learning techniques as an alternative to knowledge-based systems
  • Combining future job predictions with job reliability predictions to further improve throughput of job executions

Related Work
[1] M. Armbrust et al., “Above the clouds: A Berkeley view of cloud computing,” EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2009-28, 2009.
[2] J. Dean and S. Ghemawat, “MapReduce: simplified data processing on large clusters,” Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008.
[3] C. Catlett, “The philosophy of TeraGrid: building an open, extensible, distributed TeraScale facility,” in ACM International Symposium on Cluster Computing and the Grid. Published by the IEEE Computer Society, 2002.
[4] J. S. Chase, D. E. Irwin, L. E. Grit, J. D. Moore, and S. Sprenkle, “Dynamic virtual clusters in a grid site manager,” in HPDC. IEEE Computer Society, 2003, pp. 90–103.
[5] R. J. Figueiredo, P. A. Dinda, and J. A. B. Fortes, “A case for grid computing on virtual machines,” in ICDCS ’03: Proceedings of the 23rd International Conference on Distributed Computing Systems. Washington, DC, USA: IEEE Computer Society, 2003, p. 550.
[6] I. Foster, T. Freeman, K. Keahey, D. Scheftner, B. Sotomayor, and X. Zhang, “Virtual clusters for grid communities,” in CCGRID ’06: Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid. Washington, DC, USA: IEEE Computer Society, 2006, pp. 513–520.
[7] K. Keahey, T. Freeman, J. Lauret, and D. Olson, “Virtual workspaces for scientific applications,” Journal of Physics: Conference Series, vol. 78, p. 012038 (5pp), 2007.
[8] B. Sotomayor, K. Keahey, and I. Foster, “Overhead matters: A model for virtual resource management,” in VTDC ’06: Proceedings of the 2nd International Workshop on Virtualization Technology in Distributed Computing. Washington, DC, USA: IEEE Computer Society, 2006, p. 5.
…
[12] F. Berman et al., “Adaptive computing on the grid using AppLeS,” IEEE Transactions on Parallel and Distributed Systems, vol. 14, no. 4, pp. 369–382, 2003.
[13] F. Berman, A. Chien, K. Cooper, J. Dongarra, I. Foster, D. Gannon, L. Johnsson, K. Kennedy, C. Kesselman, J. Mellor-Crummey et al., “The GrADS project: Software support for high-level grid application development,” International Journal of High Performance Computing Applications, vol. 15, no. 4, p. 327, 2001.
[14] R. Buyya, D. Abramson, and J. Giddy, “Nimrod/G: An architecture for a resource management and scheduling system in a global computational grid,” in HPC. Published by the IEEE Computer Society, 2000, p. 283.

Thank You !!
