ᔕᕈᒪᐱᓭ
                            Distributed Systems
                               Made Simple
                         Pascal Felber, Raluca Halalai, Lorenzo Leonini,
                         Etienne Rivière, Valerio Schiavoni, José Valerio
                              Université de Neuchâtel, Switzerland
                                     www.splay-project.org




                                              ᔕᕈᒪᐱᓭ




                                       Motivations
                    • Developing, testing, and tuning distributed
                         applications is hard
                    • In Computer Science research, closing the simplicity
                         gap between a pseudocode description and its
                         implementation is hard
                    • Using worldwide testbeds is hard

                Testbeds

          • Set of machines for testing a distributed application/protocol
          • Several different testbeds!

          [Diagram: your machine, a cluster @UniNE, networks of idle workstations, ...]




                What is PlanetLab?








                What is PlanetLab?
               • Machines contributed by universities, companies, etc.
                     • 1098 nodes at 531 sites (02/09/2011)
                     • Shared resources, no privileged access
               • University-quality Internet links
               • High resource contention
               • Faults, churn, and packet loss are the norm
               • Challenging conditions



                                   Daily Job With
                                 Distributed Systems
                         1
                 code




                         1   • Write (testbed-specific) code
                               • Local tests, in-house cluster, PlanetLab...





                                   Daily Job With
                                 Distributed Systems
                                         2
                   1
                 code               debug




                         2   • Debug (in this context, a nightmare)





                                   Daily Job With
                                 Distributed Systems
                                                                3
                   1                  2
                 code               debug                  deploy




                         3   • Deploy, with testbed-specific scripts




                                   Daily Job With
                                 Distributed Systems
                                                                                       4
                   1                  2                      3
                 code               debug                  deploy                 get logs




                         4   • Get logs, with testbed-specific scripts



                                   Daily Job With
                                 Distributed Systems
                   1                  2                      3                       4                            5
                 code               debug                  deploy                 get logs                      plots




                         5   • Produce plots, hopefully




                         ᔕᕈᒪᐱᓭ at a Glance
               code          debug          deploy          get logs          plots (gnuplot)


             •       Supports the development, evaluation, testing, and
                     tuning of distributed applications on any testbed:
                   •   In-house cluster, shared testbeds, emulated
                       environments...
             •       Provides an easy-to-use pseudocode-like language
                     based on Lua
             •       Open-source: http://www.splay-project.org





                                The Big Picture

   [Diagram: splayd daemons run on the machines of the testbed; the Splay
   controller, backed by an SQL DB, deploys the user's application onto a set
   of splayd daemons and collects their results.]




                           SPLAY architecture
            • Portable daemons (ANSI C) on the testbed(s)
                  • Testbed-agnostic for the user
                  • Lightweight (memory & CPU)
            • Retrieve results by remote logging


                                    SPLAY language

             • Based on Lua
                  • High-level scripting language, simple interaction with C,
                    bytecode-based, garbage collection, ...
             • Close to pseudocode (focus on the algorithm)
             • Favors natural ways of expressing algorithms (e.g., RPCs)
             • SPLAY applications can be run locally, without the deployment
               infrastructure (for debugging & testing)
             • Comprehensive set of libraries (extensible)

             [Figure 6: overview of the main SPLAY libraries: luasocket, io (fs)
             and stdlib (each with a sandboxed counterpart), events, crypto,
             llenc, json, misc, log, rpc and splay::app. Third-party and Lua
             libraries are marked, together with the main dependencies.]




                                     Why Lua?
                     • Light & fast
                          • (Very) close to equivalent code in C
                     • Concise
                          • Allows developers to focus on ideas rather than
                            implementation details
                          • Key for researchers and students
                     • Sandboxing, thanks to the possibility of easily
                       redefining (even built-in) functions



                                       Concise

                     [Figure: the pseudocode as published in the original paper,
                     shown side by side with the executable code using the SPLAY
                     libraries]
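
                     To give the flavor of that correspondence (an invented
                     mini-example, not the listing shown on the slide), a
                     pseudocode step such as "invoke push(m) on a random peer"
                     maps almost one-to-one onto SPLAY code, assuming the remote
                     node defines a push() function as in the demo later in this
                     deck:

                       require"splay.base"                          -- base libraries (events, log, misc, ...)
                       rpc = require"splay.urpc"                    -- UDP RPC

                       local m = {id = 1, data = "hello"}           -- some message to disseminate
                       local n = misc.random_pick_one(job.nodes)    -- pick a random peer of the job
                       rpc.call(n, {"push", job.me, m}, 15)         -- "invoke push(m) on n", 15 s timeout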





                         http://www.splay-project.org




                         Sandbox: Motivations
      • Experiments should access only their own resources
      • Required for non-dedicated testbeds
        • Idle workstations in universities or companies
        • Must protect the regular users from ill-behaved code
          (e.g., full disk, full memory, etc.)
      • Memory allocation, filesystem, and network resources must be restricted




                         Sandboxing with Lua
              • In Lua, all functions can be re-defined transparently
              • Resource control is done in the re-defined functions, before
                calling the original function (see the sketch below)
              • No code modification required for the user

                                                      Sandboxing

                                                 SPLAY Application
                                                    SPLAY Lua libraries
                                           (includes all stdlib and system calls)

                                                 Operating system
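
                     A minimal sketch of that principle in plain Lua (not
                     SPLAY's actual sandbox code; the path prefix and error
                     message are made up):

                       local real_open = io.open                  -- keep a reference to the original function
                       local allowed_prefix = "/tmp/splay-job/"   -- hypothetical directory granted to this job

                       io.open = function(path, mode)
                         -- resource control happens here, before delegating to the original
                         if path:sub(1, #allowed_prefix) ~= allowed_prefix then
                           return nil, "sandbox: access denied to "..path
                         end
                         return real_open(path, mode)
                       end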







        for _, module in pairs(splay)

                           luasocket          events            io

                           crypto             llenc/benc        json

                           misc               log               rpc

                           Modules are sandboxed to prevent access to sensitive
                           resources; several of them present novelties
                           introduced for distributed systems.




                             splay.events
                 • Distributed protocols can use the message-passing
                   paradigm to communicate
                 • Nodes react to events
                      • Local, incoming, and outgoing messages
                 • The core of the Splay runtime
                      • Libraries splay.socket & splay.io provide a
                        non-blocking operation mode
                      • Based on Lua's coroutines
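
                 A minimal sketch of this event-driven style, using only calls
                 that appear in the demo code later in this deck (the 5-second
                 period and the log messages are made up):

                   require"splay.base"                  -- brings in events, log, misc, ...

                   function heartbeat()
                     while events.sleep(5) do           -- cooperative: yields to other threads while sleeping
                       log.print("still alive")
                     end
                   end

                   events.loop(function()
                     events.thread(heartbeat)           -- run the periodic task in its own (coroutine) thread
                     log.print("event loop started")
                   end)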





                                  splay.rpc

                     • Default mechanism to communicate between nodes
                     • Support for UDP/TCP
                     • Efficient BitTorrent-like encoding
                     • Experimental binary encoding
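
                     A hedged sketch of a complete RPC exchange between two
                     nodes of a job, again restricted to the calls shown in the
                     demo later in this deck (the echo function and the
                     15-second timeout are illustrative):

                       require"splay.base"
                       rpc = require"splay.urpc"                        -- UDP RPC

                       function echo(msg)                               -- callable remotely like any local function
                         log.print("echo: "..msg)
                         return msg
                       end

                       events.loop(function()
                         rpc.server(job.me)                             -- start serving incoming RPCs on this node
                         local n = misc.random_pick_one(job.nodes)      -- some other node of the job
                         local r = rpc.call(n, {"echo", "hello"}, 15)   -- r is nil if the call times out
                         if r then log.print("reply: "..r) end
                       end)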

                                 Life Before ᔕᕈᒪᐱᓭ

        [Backdrop: a long excerpt of verbose Java message-handling code from a
        home-grown DHT simulator, much of it commented out... and there is more :-(]

        • Time spent on developing testbed-specific protocols
        • Or fall back to simulations...
        • The focus should be on ideas
        • Researchers usually have no time to produce industrial-quality code
                                 Life With ᔕᕈᒪᐱᓭ

             Lines of code (LOC) of some protocols implemented with SPLAY:

             Chord         59 (base) + 17 (FT) + 26 (leafset) = 101
             Pastry        265
             Scribe        79   (on top of Pastry)
             SplitStream   58   (on top of Pastry and Scribe)
             BitTorrent    420
             Cyclon        93
             Epidemic      35   (epidemic diffusion on Erdős-Rényi random graphs)
             Trees         47   (n-ary and parallel distribution trees)

             • Lines of pseudocode ~== lines of executable code
                                 Churn management

          • Churn is key for testing large-scale applications
               • "Natural churn" on PlanetLab is not reproducible
               • Hard to compare algorithms under the same conditions
          • SPLAY allows churn to be finely controlled
               • Traces (e.g., from file sharing systems)
               • Synthetic descriptions: dedicated language

          Example of a synthetic churn description (Figure 5); the accompanying
          plot shows the binned number of joins and leaves per minute and the
          resulting population size over the 20-minute run:

          1   at 30s            join 10
          2   from 5m to 10m    inc 10
          3   from 10m to 15m   const churn 50%
          4   at 15m            leave 50%
          5   from 15m to 20m   inc 10 churn 150%
          6   at 20m            stop
                                 Synthetic churn

          • Routing performance of a Pastry DHT under massive churn
          • Massive failure: half of the nodes fail, then the overlay self-repairs

          [Figure 11: using churn management to reproduce massive churn conditions
          for the SPLAY Pastry implementation. The plots show routing delay
          percentiles (5th to 90th), the failure rate, and route failures (%)
          over 10 minutes; 50% of the network fails, followed by self-repair.]




                                 Trace-driven churn

          • Traces from the OverNet file sharing network [IPTPS03]
               • Speed-up of 2, 5, 10 (1 minute = 30, 12, 6 seconds)
          • Evolution of hit-ratio & delays of Pastry on PlanetLab

          [Figure: effect of churn on Pastry deployed on PlanetLab, for the trace
          sped up x2, x5 and x10. Each panel shows route failures (%), delays
          (seconds), leaves and joins per minute, and population size over a
          50-minute run. Churn is derived from the trace of the Overnet file
          sharing system and sped up for increasing volatility.]




                                     Live Demo




                              www.splay-project.org


                                 Live demo:
                         Gossip-based dissemination

       • Also called epidemic dissemination
         • A message spreads like the flu through a population
         • Robust way to spread a message in a random network
       • Two actions: push and pull
         • Push: a node chooses f other random nodes to infect
           • A newly infected node infects f others in turn
           • Done for a limited number of steps (TTL)
         • Pull: a node tries to fetch messages from a random node




                                SOURCE




                             Every node knows a random set
                                     of other nodes.

                            A node wants to send a message
                                     to all nodes.





                              An initial push phase: nodes that receive the
                              message for the first time forward it to f random
                              neighbors, up to TTL times.







                              Periodically, nodes pull for new
                               messages from one random
                                 neighbor. The message
                             eventually reaches all peers with
                                      high probability.

                         Algorithm 1: A simple push-pull dissemination protocol, on node P.

                          Constants
                            f: Fanout (push: P forwards a new message to f random peers).
                            TTL: Time To Live (push: a message is forwarded to f peers at most
                            TTL times).
                            ∆: Pull period (pull: P asks for new messages every ∆ seconds).

                          Variables
                            R: set of received message IDs
                            N: set of node identifiers

                          /* Invoked when a message is pushed to node P by node Q */
                          function Push(Message m, hops)
                             if m received for the first time then
                                 /* Store the message locally */
                                 R ← R ∪ {m}
                                 /* Propagate the message if required */
                                 if hops > 0 then
                                     invoke Push(m, hops-1) on f random peers from N

                          /* Periodic pull */
                          thread PeriodicPull()
                             every ∆ seconds do
                                invoke Pull(R, P) on a random node from N

                          /* Invoked when a node Q requests messages from node P */
                          function Pull(RQ, Q)
                             foreach m | m ∈ R ∧ m ∉ RQ do
                                 invoke Push(m, 0) on Q





 require"splay.base"
 rpc = require"splay.urpc"       -- load the base and UDP RPC libraries

 --[[ SETTINGS ]]--
 fanout = 3
 ttl = 3
 pull_period = 3
 rpc_timeout = 15                -- an RPC returns nil after 15 seconds

 --[[ VARS ]]--
 msgs = {}

 function push(q, m)
   if not msgs[m.id] then
     msgs[m.id] = m
     log.print("NEW: "..m.id.." (ttl:"..m.t..") from "..q.ip..":"..q.port)
     m.t = m.t - 1
     if m.t > 0 then
       for _, n in pairs(misc.random_pick(job.nodes, fanout)) do
         log.print("FORWARD: "..m.id.." (ttl:"..m.t..") to "..n.ip..":"..n.port)
         events.thread(function()
           rpc.call(n, {"push", job.me, m}, rpc_timeout)
         end)
       end
     end
   else
     log.print("DUP: "..m.id.." (ttl:"..m.t..") from "..q.ip..":"..q.port)
   end
 end

 Logging facilities: all logs are available at the controller, where they are
 timestamped and stored in the DB.

 The RPC call itself is embedded in an anonymous function running in a new
 thread: we do not care about its success (epidemics are naturally robust to
 message loss). Testing for success would be easy: the second return value is
 nil in case of failure.








 function periodic_pull()
   while events.sleep(pull_period) do
     local q = misc.random_pick_one(job.nodes)
     log.print("PULLING: "..q.ip..":"..q.port)
     local ids = {}
     for id, _ in pairs(msgs) do
       ids[id] = true
     end
     local r = rpc.call(q, {"pull", ids}, rpc_timeout)   -- r is nil if the RPC times out
     if r then
       for id, m in pairs(r) do
         if not msgs[id] then
           msgs[id] = m
           log.print("NEW REPLY: "..m.id.." from "..q.ip..":"..q.port)
         else
           log.print("DUP REPLY: "..m.id.." from "..q.ip..":"..q.port)
         end
       end
     end
   end
 end

 function pull(ids)
   local r = {}
   for id, m in pairs(msgs) do
     if not ids[id] then
       r[id] = m
     end
   end
   return r
 end

 SPLAY provides a library for cooperative multithreading, events, scheduling and
 synchronization, as well as various commodity libraries. (In Lua, arrays and
 maps are the same type.)

 There is no difference between a local function and one called by a distant
 node; even variables can be accessed by a distant node through an RPC. SPLAY
 does all the work of serialization, network calls, etc., while preserving a low
 footprint and high performance.








 function source()
   local m = {id = 1, t = ttl, data = "data"}
   for _, n in pairs(misc.random_pick(job.nodes, fanout)) do
     log.print("SOURCE: "..m.id.." to "..n.ip..":"..n.port)
     events.thread(function()
       rpc.call(n, {"push", job.me, m}, rpc_timeout)   -- calling a remote function
     end)
   end
 end

 events.loop(function()
   rpc.server(job.me)                     -- start the RPC server: it is up and running
   log.print("ME", job.me.ip, job.me.port, job.position)
   events.sleep(15)
   events.thread(periodic_pull)           -- embed a function in a separate thread
   if job.position == 1 then
     source()
   end
 end)







            Take-away slide

                         • Distributed systems raise a number of issues
                           for their evaluation
                         • Hard to implement, debug, deploy, tune
                         • ᔕᕈᒪᐱᓭ leverages Lua and a centralized
                           controller to produce an easy-to-use yet
                           powerful working environment



DASA ADMISSION 2024_FirstRound_FirstRank_LastRank.pdf
Agricultural_Statistics_at_a_Glance_2022_0.pdf
project resource management chapter-09.pdf
Building Integrated photovoltaic BIPV_UPV.pdf
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
WOOl fibre morphology and structure.pdf for textiles
1. Introduction to Computer Programming.pptx
Mushroom cultivation and it's methods.pdf
Accuracy of neural networks in brain wave diagnosis of schizophrenia
August Patch Tuesday
Assigned Numbers - 2025 - Bluetooth® Document
Programs and apps: productivity, graphics, security and other tools
Transform Your ITIL® 4 & ITSM Strategy with AI in 2025.pdf
Chapter 5: Probability Theory and Statistics
Encapsulation theory and applications.pdf
Web App vs Mobile App What Should You Build First.pdf
cloud_computing_Infrastucture_as_cloud_p
Microsoft Solutions Partner Drive Digital Transformation with D365.pdf

SPLAY: Distributed Systems Made Simple

  • 1. ᔕᕈᒪᐱᓭ Distributed Systems Made Simple Pascal Felber, Raluca Halalai, Lorenzo Leonini, Etienne Rivière,Valerio Schiavoni, José Valerio Université de Neuchâtel, Switzerland www.splay-project.org ᔕᕈᒪᐱᓭ Monday, January 23, 12
  • 2. 2 Motivations • Developing, testing, and tuning distributed applications is hard • In Computer Science research, fixing the gap of simplicity between pseudocode description and implementation is hard • Using worldwide testbeds is hard ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 3. 3 Networks of idle Testbeds workstations • Set of machines for testing distributed application/protocol • Several different testbeds! A cluster @UniNE Your machine ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 4. 4 What is PlanetLab? ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 5. 4 What is PlanetLab? • Machines contributed by universities, companies, etc. • 1098 nodes at 531 sites (02/09/2011) • Shared resources, no privileged access • University-quality Internet links • High resource contention • Faults, churn, packet loss are the norm • Challenging conditions ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 6. 5 Daily Job With Distributed Systems 1 code 1 • Write (testbed specific) code • Local tests, in-house cluster, PlanetLab... ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 7. 5 Daily Job With Distributed Systems 2 1 code debug 2 • Debug (in this context, a nightmare) ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 8. 5 Daily Job With Distributed Systems 3 1 2 code debug deploy 3 • Deploy, with testbed specific scripts ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 9. 5 Daily Job With Distributed Systems 4 1 2 3 code debug deploy get logs 4 • Get logs, with testbed specific scripts ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 10. 5 Daily Job With Distributed Systems 1 2 3 4 5 code debug deploy get logs plots 5 • Produce plots, hopefully ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 11. 6 ᔕᕈᒪᐱᓭ at a Glance code debug deploy get logs plots (gnuplot...) • Supports the development, evaluation, testing, and tuning of distributed applications on any testbed: • In-house cluster, shared testbeds, emulated environments... • Provides an easy-to-use pseudocode-like language based on Lua • Open-source: http://www.splay-project.org ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 12. 7 The Big Picture application splayd splayd splayd splayd Splay Controller SQL DB ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 13. 7 The Big Picture application splayd splayd splayd splayd Splay Controller SQL DB ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 14. 7 The Big Picture splayd splayd application splayd splayd Splay Controller SQL DB ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 15. 8 SPLAY architecture • Portable daemons (ANSI C) on the testbed(s) • Testbed agnostic for the user • Lightweight (memory & CPU) • Retrieve results by remote logging ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 16. 9 Lua SPLAY language • Based on Lua • High-level scripting language: simple interaction with C, bytecode-based, garbage collection, ... • Close to pseudo code (focus on the algorithm) • Favors natural ways of expressing algorithms (e.g., RPCs) • SPLAY applications can be run locally without the deployment infrastructure (for debugging & testing) • Comprehensive set of libraries (extensible): luasocket*, io (fs)*, stdlib*, sb_socket, socketevents, sb_fs, sb_stdlib, events/threads, crypto*, llenc, json*, misc, log, rpc, splay::app (*: third-party and Lua libraries) (background: excerpt from the SPLAY paper and Figure 6, “Overview of the main SPLAY libraries”) ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 17. 10 Why Lua? • Light & Fast • (Very) close to equivalent code in C • Concise • Allows developers to focus on ideas more than implementation details • Key for researchers and students • Sandboxing thanks to the possibility of easily redefining (even built-in) functions ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 18. 11 Concise • Pseudocode as published in the original paper • Executable code using SPLAY libraries ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 19. 12 http://www.splay-project.org Monday, January 23, 12
  • 20. 13 Sandbox: Motivations • Experiments should access only their own resources • Required for non-dedicated testbeds • Idle workstations in universities or companies • Must protect the regular users from ill-behaved code (e.g., full disk, full memory, etc.) • Memory allocation, filesystem, and network resources must be restricted ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 21. 14 Sandboxing with Lua • In Lua, all functions can be re-defined transparently • Resource control done in re-defined functions before calling the original function • No code modification required for the user Sandboxing SPLAY Application SPLAY Lua libraries (includes all stdlib and system calls) Operating system ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
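A minimal sketch of the redefinition idea described above (illustrative only, not SPLAY's actual sandbox code; the quota and the variable names are made up): the built-in io.open is wrapped so that a resource check runs before the original function is called, with no change to application code.

    local real_open = io.open            -- keep a reference to the original built-in
    local open_count, max_open = 0, 16   -- hypothetical per-application quota

    io.open = function(path, mode)       -- transparent redefinition: callers are unchanged
      if open_count >= max_open then
        return nil, "sandbox: file quota exceeded"
      end
      local f, err = real_open(path, mode)
      if f then open_count = open_count + 1 end
      return f, err
    end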
  • 22. 15 for _, module in pairs(splay): presenting the novelties introduced for distributed systems — luasocket, events, io, crypto, llenc/benc, json, misc, log, rpc • Modules are sandboxed to prevent access to sensitive resources ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 23. 17 splay.events • Distributed protocols can use the message-passing paradigm to communicate • Nodes react to events • Local, incoming, outgoing messages • The core of the Splay runtime • Libraries splay.socket & splay.io provide a non-blocking operation mode • Based on Lua’s coroutines ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
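As a rough illustration of this event model, the sketch below is modeled on the calls that appear in the demo code later in this talk (events.loop, events.thread, events.sleep, log.print); it only shows how cooperative threads are scheduled and is not taken verbatim from the SPLAY distribution.

    require"splay.base"   -- provides the events and log modules, as in the demo code

    events.loop(function()
      -- a background thread: coroutines yield on sleep and I/O instead of blocking
      events.thread(function()
        while events.sleep(5) do
          log.print("periodic task")
        end
      end)
      log.print("setup done; the event loop keeps scheduling threads")
    end)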
  • 24. 19 splay.rpc • Default mechanism to communicate between nodes • Support for UDP/TCP • Efficient BitTorrent-like encoding • Experimental binary encoding ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
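A hedged sketch of an RPC exchange, again modeled on the demo code shown later (rpc.server, and rpc.call taking a node, a {function-name, arguments} table, and a timeout); the ping function, the 15-second timeout, and the job.nodes[1] target are illustrative assumptions, and the demo itself uses the UDP variant splay.urpc.

    require"splay.base"
    rpc = require"splay.urpc"      -- UDP RPC, as in the demo; TCP is also supported

    function ping(from)            -- any global function can be invoked remotely
      log.print("ping from "..from.ip..":"..from.port)
      return "pong"
    end

    events.loop(function()
      rpc.server(job.me)           -- start serving incoming RPCs
      events.thread(function()
        -- {"ping", job.me} names the remote function and passes its argument;
        -- the result is nil if the call times out
        local r = rpc.call(job.nodes[1], {"ping", job.me}, 15)
        if r then log.print("got: "..r) end
      end)
    end)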
  • 25. 20 Life Before ᔕᕈᒪᐱᓭ • Time spent on developing testbed-specific protocols • Or fallback to simulations... • The focus should be on ideas • Researchers usually have no time to produce industrial-quality code • ...and there is more :-( (background: a wall of verbose Java simulator code, illustrating the point) ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 26. 21 Life With ᔕᕈᒪᐱᓭ (paper excerpt: protocols such as epidemic diffusion on Erdős-Rényi random graphs and various types of distribution trees (n-ary trees, parallel trees) are extremely concise in terms of lines of code) • Chord: 59 (base) + 17 (FT) + 26 (leafset) = 101 • Pastry: 265 • Scribe (on Pastry): 79 • SplitStream (on Pastry, Scribe): 58 • BitTorrent: 420 • Cyclon: 93 • Epidemic: 35 • Trees: 47 • Although the number of lines is just a rough indicator of expressiveness: lines of pseudocode ~== lines of executable code ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 27. 22 Churn management (paper excerpt: several characterizations of churn can be leveraged to reproduce realistic conditions for the protocol under test — synthetic descriptions issued from analytical studies, or publicly available traces of the dynamics of real networks such as file-sharing systems or high-performance computing clusters) • Churn is key for testing large-scale applications • “Natural churn” on PlanetLab is not reproducible • Hard to compare algorithms under the same conditions • SPLAY allows churn to be finely controlled • Traces (e.g., from file sharing systems) • Synthetic descriptions: dedicated language — example (Figure 5, a synthetic churn description): (1) at 30s join 10; (2) from 5m to 10m inc 10; (3) from 10m to 15m const churn 50%; (4) at 15m leave 50%; (5) from 15m to 20m inc 10 churn 150%; (6) at 20m stop (plot: binned number of joins and leaves per minute over 20 minutes) ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 28. 23 Synthetic churn (paper excerpt: using churn is as simple as launching a regular SPLAY application with a trace file as an extra argument; SPLAY provides tools to generate and process trace files — e.g., speed up a trace, increase the churn amplitude while keeping its statistical properties, or generate a trace from a synthetic description) • Using churn management to reproduce massive churn conditions for the SPLAY Pastry implementation • Figure 11: stabilized routing performance and self-repair after a massive failure in which 50% of the network fails (route failures and delay over time) ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 29. 24 Trace-driven churn • Traces from the OverNet file sharing network [IPTPS03] • Speed up of 2, 5, 10 (1 minute = 30, 12, 6 seconds) • Evolution of hit-ratio & delays of Pastry on PlanetLab (Figure 11: study of the effect of churn on Pastry deployed on PlanetLab; churn derived from the trace of the Overnet file sharing system and sped up for increasing volatility — churn x2, x5, x10; plots of route failures, delay, and nodes leaving/joining over time) http://www.splay-project.org Monday, January 23, 12
  • 30. 25 Live Demo www.splay-project.org ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 31. 26 Live demo: Gossip-based dissemination • Also called epidemic dissemination • A message spreads like the flu among a population • Robust way to spread a message in a random network • Two actions: push and pull • Push: a node chooses f other random nodes to infect • A newly infected node infects f others in turn • Done for a limited number of steps (TTL) • Pull: a node tries to fetch new messages from a random node ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 32. 27 SOURCE Every node knows a random set of other nodes. A node wants to send a message to all nodes. ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 33. 28 An initial push in which nodes that receive the message for the first time forward it to f random neighbors, up to TTL times. ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 34. 29 Periodically, nodes pull for new messages from one random neighbor. The message eventually reaches all peers with high probability. ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 35. 30 Algorithm 1: A simple push-pull dissemination protocol, on node P.
    Constants:
      f: fanout (push: P forwards a new message to f random peers)
      TTL: time to live (push: a message is forwarded to f peers at most TTL times)
      pull period (pull: P asks for new messages every pull-period seconds)
    Variables:
      R: set of received messages
      N: set of node identifiers
    /* Invoked when a message is pushed to node P by node Q */
    function Push(Message m, hops)
      if m received for the first time then
        R ← R ∪ {m}                  /* store the message locally */
        if hops > 0 then             /* propagate the message if required */
          invoke Push(m, hops-1) on f random peers from N
    /* Periodic pull */
    thread PeriodicPull()
      every pull-period seconds do
        invoke Pull(R, P) on a random node from N
    /* Invoked when a node Q requests messages from node P */
    function Pull(RQ, Q)
      foreach m such that m ∈ R and m ∉ RQ do
        invoke Push(m, 0) on Q
  ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 36. 31
    require"splay.base"
    rpc = require"splay.urpc"          -- loading the base and UDP RPC libraries

    --[[ SETTINGS ]]--
    fanout = 3
    ttl = 3
    pull_period = 3
    rpc_timeout = 15                   -- an RPC returns nil after 15 seconds

    --[[ VARS ]]--
    msgs = {}

    function push(q, m)
      if not msgs[m.id] then
        msgs[m.id] = m
        -- logging facilities: all logs are available at the controller, where
        -- they are timestamped and stored in the DB
        log.print("NEW: "..m.id.." (ttl:"..m.t..") from "..q.ip..":"..q.port)
        m.t = m.t - 1
        if m.t > 0 then
          for _, n in pairs(misc.random_pick(job.nodes, fanout)) do
            log.print("FORWARD: "..m.id.." (ttl:"..m.t..") to "..n.ip..":"..n.port)
            -- the RPC call is embedded in an inner anonymous function run in a
            -- new anonymous thread: we do not care about its success (epidemics
            -- are naturally robust to message loss); testing for success would
            -- be easy, as the second return value is nil in case of failure
            events.thread(function()
              rpc.call(n, {"push", job.me, m}, rpc_timeout)
            end)
          end
        end
      else
        log.print("DUP: "..m.id.." (ttl:"..m.t..") from "..q.ip..":"..q.port)
      end
    end
  ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 37. 32
    -- SPLAY provides a library for cooperative multithreading, events,
    -- scheduling and synchronization, plus various commodity libraries.
    function periodic_pull()
      while events.sleep(pull_period) do
        local q = misc.random_pick_one(job.nodes)
        log.print("PULLING: "..q.ip..":"..q.port)
        local ids = {}
        for id, _ in pairs(msgs) do
          ids[id] = true               -- in Lua, arrays and maps are the same type
        end
        local r = rpc.call(q, {"pull", ids}, rpc_timeout)
        if r then                      -- r is nil if the RPC times out
          for id, m in pairs(r) do
            if not msgs[id] then
              msgs[id] = m
              log.print("NEW REPLY: "..m.id.." from "..q.ip..":"..q.port)
            else
              log.print("DUP REPLY: "..m.id.." from "..q.ip..":"..q.port)
            end
          end
        end
      end
    end

    -- There is no difference between a local function and one called by a
    -- distant node; even variables can be accessed by a distant node via RPC.
    -- SPLAY does all the work of serialization, network calls, etc. while
    -- preserving a low footprint and high performance.
    function pull(ids)
      local r = {}
      for id, m in pairs(msgs) do
        if not ids[id] then
          r[id] = m
        end
      end
      return r
    end
  ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 38. 33
    function source()
      local m = {id = 1, t = ttl, data = "data"}
      for _, n in pairs(misc.random_pick(job.nodes, fanout)) do
        log.print("SOURCE: "..m.id.." to "..n.ip..":"..n.port)
        events.thread(function()
          rpc.call(n, {"push", job.me, m}, rpc_timeout)   -- calling a remote function
        end)
      end
    end

    events.loop(function()
      rpc.server(job.me)               -- start the RPC server: it is up and running
      log.print("ME", job.me.ip, job.me.port, job.position)
      events.sleep(15)
      events.thread(periodic_pull)     -- embed a function in a separate thread
      if job.position == 1 then
        source()
      end
    end)
  ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 39. 34 Take-away slide • Distributed systems raise a number of issues for their evaluation • Hard to implement, debug, deploy, tune • ᔕᕈᒪᐱᓭ leverages Lua and a centralized controller to produce an easy-to-use yet powerful working environment ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12