Programming the cloud with Skywriting
Derek Murray, with Malte Schwarzkopf, Chris Smowton, Anil Madhavapeddy and Steve Hand
Outline
- State of the art
- Skywriting by example
- Iterative algorithms
- Heterogeneous clusters
- Speculative execution
- Performance case studies
- Future directions
Task farming [diagram: a bag of independent tasks]
Task farming [diagram: a master doling out tasks to three workers]
Task farming [diagram: tasks A and B, where A runs before B]
MapReduce [diagram: Input → Map → Shuffle → Reduce → Output]
Dryad
Problem: iterative algorithms [diagram: a task loops while not converged, and exits once converged]
Problem: cluster heterogeneity [diagram: a master with three identical workers]
Problem: cluster heterogeneity [diagram: a master with an ad-hoc mix of machines]
Problem: cluster heterogeneity [diagram: a master with data spread across multiple cloud providers]
Problem: speculative execution
Solution: Skywriting
- Turing-complete coordination language: support for spawning tasks; interface to external code
- Distributed execution engine: executes tasks in parallel on a cluster; handles failure, locality, data motion, etc.
Spawning a Skywriting task

function f(arg1, arg2) { … }
result = spawn(f, [arg1, arg2]);
Building a task graph

function f(x, y) { … }
function g(x, y) { … }
function h(x, y) { … }

a = spawn(f, [7, 8]);
b = spawn(g, [a, 0]);
c = spawn(g, [a, 1]);
d = spawn(h, [b, c]);
return d;

[diagram: the resulting task graph — f feeds both g tasks, which feed h]
Iterative algorithm

current = …;
do {
    prev = current;
    a = spawn(f, [prev, 0]);
    b = spawn(f, [prev, 1]);
    c = spawn(f, [prev, 2]);
    current = spawn(g, [a, b, c]);
    done = spawn(h, [current]);
} while (!*done);
Iterative algorithm [diagram: the f/g/h task graph unrolled across iterations]
Aside: recursive algorithm

function f(x) {
    if (/* x is small enough */) {
        return /* do something with x */;
    } else {
        x_lo = /* bottom half of x */;
        x_hi = /* top half of x */;
        return [spawn(f, [x_lo]),
                spawn(f, [x_hi])];
    }
}
Executing external code

y = exec(executor_name,
         { "inputs" : [x1, x2, x3], … },
         num_outputs);

Runs Java, C, .NET and pipe-based code.
Heterogeneous cluster support
Workers advertise "execution facilities"
Tasks migrate to the necessary facilities

Speculative execution
Speculative execution

x = …;
a = spawn(f, [x]);
b = spawn(f, [x]);
c = spawn(f, [x]);
result = waituntil(any, [a, b, c]);
return result["available"];
Performance case studies
All experiments used Amazon EC2 m1.small instances, running Ubuntu 8.10.
- Microbenchmark
- Smith-Waterman
Job creation overhead
Smith-Waterman data flow
Parallel Smith-Waterman
Parallel Smith-Waterman
Future work
- Distributed data structures: coping when the lists etc. get big
- Better language integration: compile to JVM, CLR, LLVM etc.
- Decentralised master-worker: run on multiple clouds
- Self-scaling clusters: add and remove workers as needed

Editor's Notes

  • #2: Thanks for the introduction, Eva. Well, as Eva said, my name’s Derek Murray, I’m a third-year PhD student at Cambridge, and today I’m going to talk about Skywriting, which is a little bit of work I’ve been doing with these guys: Malte, Chris, Anil and my supervisor Steve Hand. Skywriting is a system for large-scale distributed computation – in this respect it’s similar to things like Google MapReduce and Microsoft’s Dryad – so that’s systems where your data or compute need is so big that you have to use a cluster in parallel to get the job done. It was the success of these systems – in particular Hadoop, the open-source MapReduce – that motivated us to start this work. What I found interesting was that people were using these things in entirely unexpected ways… taking MapReduce, which is excellent for log-processing, and running some big iterative machine learning algorithm on it. We reckoned that people were using MapReduce not because of its programming model, but despite it. So we set out to build something that combines all the advantages of previous systems with a very flexible programming model. The result was Skywriting, so let’s see what you think…
  • #4: All the systems we’ll discuss today use the simple notion of task parallelism. Many algorithms can be divided into tasks, which are just chunks of sequential code. The key observation is that two independent tasks can run in parallel. And when your whole job divides into a fully independent bag of tasks, it’s said to be “embarrassingly parallel”.
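The bag-of-tasks idea described in this note can be sketched in a few lines of Python (illustrative only — Skywriting is not Python, and `square` and `run_bag_of_tasks` are hypothetical names):

```python
from concurrent.futures import ThreadPoolExecutor

def square(n):
    # A "task": a chunk of sequential code, independent of every other task.
    return n * n

def run_bag_of_tasks(inputs):
    # Because no task depends on another, all of them can be dispatched
    # at once -- the "embarrassingly parallel" case.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(square, inputs))
```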
  • #5: And how do you run these embarrassingly parallel jobs? Well, you give your bag of tasks to a master, which doles them out on demand to a set of workers. This is a very simple architecture to program. And it has a lot of benefits. If one of the workers crashes, fine! The master will notice and give that worker’s current task to someone else. And if a worker is a bit slower than the others, that’s also fine! Each worker pulls a new task when it has completed the last one, so even a heterogeneous pool can do useful work.
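The pull-based master-worker loop this note describes can be sketched in Python, with a shared queue standing in for the master and a thread per worker (all names are illustrative, not part of any real system):

```python
import queue
import threading

def run_task_farm(tasks, num_workers=3):
    # The "master" is a shared queue; each worker pulls a new task only
    # when it has finished the last one, so a slow worker simply
    # completes fewer tasks instead of stalling the whole job.
    todo = queue.Queue()
    for task in tasks:
        todo.put(task)
    results, lock = [], threading.Lock()

    def worker():
        while True:
            try:
                task = todo.get_nowait()  # pull the next task on demand
            except queue.Empty:
                return                    # no tasks left: worker retires
            result = task()
            with lock:
                results.append(result)

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results
```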
  • #6: Embarrassing parallelism is not very interesting: it only lets you do boring things like search for aliens and brute-force people’s passwords.
  • #7: It gets much more interesting – i.e. commercially useful – when the tasks have dependencies between them. So here, we have two tasks A and B, and a relation that says A must run before B. The usual reason for this is that A writes some output, and B wants to read it. Think of this like makefile rules: you can build up graphs out of these dependencies, and resolve them in parallel. In fact, the original name for this project was “Cloud Make”. Fortunately it changed….
  • #8: Are you all familiar with MapReduce? Introduced by Google in 2004, MapReduce used the observation that the map() function from functional programming can run in parallel over large lists. So they broke down their huge data into chunks, and ran each through a “map task”, generating some key-value pairs that are then sorted by key in the shuffle phase, and then the values for each key are folded in parallel using a “reduce task”. This basically uses the same master-worker task farm that I showed on a previous slide, with the single constraint that all the map tasks must finish before the reduce tasks begin. Therefore it had the benefit of working at huge scale, and being very reliable.
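The map/shuffle/reduce pipeline in this note can be sketched as a sequential Python word count (a toy illustration only; real MapReduce runs the map and reduce tasks in parallel on the task farm):

```python
from collections import defaultdict

def map_task(chunk):
    # Map: emit (key, value) pairs from one chunk of the input.
    return [(word, 1) for word in chunk.split()]

def shuffle(pairs):
    # Shuffle: group values by key across all map outputs.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_task(key, values):
    # Reduce: fold the values for one key.
    return key, sum(values)

def mapreduce(chunks):
    # All map tasks must finish before any reduce task begins.
    mapped = [pair for chunk in chunks for pair in map_task(chunk)]
    return dict(reduce_task(k, vs) for k, vs in shuffle(mapped).items())
```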
  • #10: A couple of years later, Microsoft, which also has a search engine, released “Dryad”, which generalises MapReduce by allowing the user to specify a job as any directed acyclic graph. The graph has vertices – which are arbitrary sequential code in your favourite language – and channels, which could be files, in-memory FIFOs, TCP connections or whatever. Clearly you can implement MapReduce in Dryad, since it’s just a DAG. But Dryad makes things like joins much easier, because a task can have multiple inputs.
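A minimal sketch of DAG execution in the Dryad style, in Python: vertices are functions, edges deliver each predecessor's output to its successors. (Here vertices run sequentially in dependency order; Dryad schedules independent vertices in parallel, and the input graph is assumed acyclic.)

```python
def run_dag(vertices, edges):
    # vertices: name -> function; edges: (producer, consumer) pairs.
    # A vertex runs once all of its predecessors have produced their
    # outputs, which are passed to it as arguments (its "channels").
    preds = {v: [] for v in vertices}
    for src, dst in edges:
        preds[dst].append(src)
    outputs, done = {}, set()
    while len(done) < len(vertices):
        for v, fn in vertices.items():
            if v not in done and all(p in done for p in preds[v]):
                outputs[v] = fn(*[outputs[p] for p in preds[v]])
                done.add(v)
    return outputs
```

For example, the diamond-shaped graph from the earlier slide (one source feeding two middle vertices that feed a join) is just four vertices and four edges.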
  • #13: So far, we can run any finite directed acyclic graph using Dryad. As the name suggests, however, Dryad is not terribly good at cyclic data flows. These turn up all the time in fields like machine learning, scientific computing and information retrieval. Take PageRank, for example, which involves repeatedly premultiplying a vector by a large sparse matrix representing the web. You keep doing this until you reach a fixpoint, and the PageRank vector has converged. At present, all you can do is submit one job after another. This is bad for a number of reasons. First of all, it’s very slow: MapReduce and Dryad are designed for batch submission, and so starting an individual job takes on the order of 30 seconds. If your iteration is shorter than that, you’re losing out on parallel speedup. It also introduces a co-dependency between the client and the cluster. Now the client, which is just some simple program that submits jobs to the cluster, has to stay running for the duration of the job, but since it’s outside the cluster, it gets none of the advantages of fault-tolerance, of data locality, of fair scheduling. Since the client now contains critical job state, it’s necessary to add all these features manually.
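The submit-one-job-after-another pattern this note criticises is essentially the following driver loop, sketched in Python with hypothetical step and convergence functions:

```python
def iterate_to_fixpoint(step, initial, converged, max_iters=100):
    # The client resubmits the (internally parallel) step as a fresh job
    # each round -- paying the per-job startup cost every time -- and
    # must itself stay alive to hold the loop state.
    current = initial
    for _ in range(max_iters):
        current = step(current)
        if converged(current):
            break
    return current
```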
  • #14: Remember our master-worker architecture? Well, if you’ve ever tried to set up Hadoop or Dryad, you’ll know that you need to make sure all of the workers are the same: running the same operating system, on the same local network.
  • #15: But what if all you have is a little ad-hoc cluster, with a Windows desktop, a Linux server and a Mac laptop?
  • #16: Or, perhaps less contrived, what if your data are spread between different cloud providers. So you might have some data in Amazon S3, some in Google’s App Engine, and some in Windows Azure. Our mantra is “put the computation near the data”, and it’s not practical to shift all the data to one place.
  • #17: And what about this? Say you have a really important task to complete, but you don’t know how long it’ll take – maybe you’re using some kind of randomised algorithm. So you fire off three copies of the same task… and eventually one finishes. At this point, you can just kill the other two. Although MapReduce and Dryad have limited support for this, it’s not first-class: you can’t do it on demand, only in response to “straggler” nodes that take much longer to complete than others.
  • #18: I’ve spent quite a lot of slides being rather coy about what’s to come, but if you’ve read the abstract, you’ll know that Skywriting is
  • #19: …two things. First, instead of using DAGs to describe a job, we use the most powerful thing available to us: a Turing-complete coordination language. This sounds ominous and theoretical, but actually it’s just a programming language that looks a lot like JavaScript, with all the usual control-flow structures: loops, ifs, functions and so on. Since we want to run things efficiently in parallel, it has support for spawning tasks, and a way to call external code. The other main component is the distributed execution engine, which actually executes Skywriting programs in the cluster. The interesting thing about this is that a “task” is just a Skywriting function – a continuation, to be more precise – which means that tasks can spawn other tasks, and thereby grow the job dynamically.
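The "tasks spawn other tasks" idea can be sketched in Python with futures playing the role of Skywriting references (an illustration, not Skywriting's implementation — and note the caveat in the comments about blocking versus continuations):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(pool, xs):
    # A task that spawns two child tasks, growing the job dynamically --
    # the same shape as the recursive Skywriting example on the slide.
    if len(xs) <= 2:
        return sum(xs)
    mid = len(xs) // 2
    lo = pool.submit(parallel_sum, pool, xs[:mid])  # like spawn(f, [x_lo])
    hi = pool.submit(parallel_sum, pool, xs[mid:])  # like spawn(f, [x_hi])
    # Blocking on .result() ties up a worker thread; Skywriting instead
    # suspends the parent task as a continuation until its children finish.
    return lo.result() + hi.result()

def parallel_sum_top(xs):
    # Enough workers that the blocking parents above cannot deadlock.
    with ThreadPoolExecutor(max_workers=16) as pool:
        return parallel_sum(pool, xs)
```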
  • #28: 1.0 – 1.2 GHz Xeon or Opteron. 1.7GB RAM, 150GB disk.
  • #31: 50 x 50 on 50 workers. Input size is
  • #32: Best score is 15x15 = 225 tasks, at 83 s (2.6x speedup).