ᔕᕈᒪᐱᓭ
                            Distributed Systems
                               Made Simple
                         Pascal Felber, Raluca Halalai, Lorenzo Leonini,
                         Etienne Rivière, Valerio Schiavoni, José Valerio
                              Université de Neuchâtel, Switzerland
                                     www.splay-project.org




                                              ᔕᕈᒪᐱᓭ




                                       Motivations
                    • Developing, testing, and tuning distributed
                         applications is hard
                    • In Computer Science research, closing the simplicity
                         gap between a pseudocode description and its
                         implementation is hard
                    • Using worldwide testbeds is hard

                Testbeds

          • Set of machines for testing a distributed application/protocol
          • Several different testbeds!

          [Diagram: your machine, a cluster @UniNE, networks of idle workstations, ...]




                What is PlanetLab?








                What is PlanetLab?
               • Machines contributed by universities, companies, etc.
                     • 1098 nodes at 531 sites (02/09/2011)
                     • Shared resources, no privileged access
               • University-quality Internet links
               • High resource contention
               • Faults, churn, and packet loss are the norm
               • Challenging conditions



                                   Daily Job With
                                 Distributed Systems
                         1
                 code




                         1   • Write (testbed-specific) code
                               • Local tests, in-house cluster, PlanetLab...





                                   Daily Job With
                                 Distributed Systems
                                         2
                   1
                 code               debug




                         2   • Debug (in this context, a nightmare)





                                   Daily Job With
                                 Distributed Systems
                                                                3
                   1                  2
                 code               debug                  deploy




                         3   • Deploy, with testbed-specific scripts




                                   Daily Job With
                                 Distributed Systems
                                                                                       4
                   1                  2                      3
                 code               debug                  deploy                 get logs




                         4   • Get logs, with testbed-specific scripts



                                   Daily Job With
                                 Distributed Systems
                   1                  2                      3                       4                            5
                 code               debug                  deploy                 get logs                      plots




                         5   • Produce plots, hopefully




                         ᔕᕈᒪᐱᓭ at a Glance
               code          debug          deploy          get logs          plots (gnuplot)


             •       Supports the development, evaluation, testing, and
                     tuning of distributed applications on any testbed:
                   •   In-house cluster, shared testbeds, emulated
                       environments...
             •       Provides an easy-to-use pseudocode-like language
                     based on Lua
             •       Open-source: http://www.splay-project.org





                                The Big Picture

   [Diagram: splayd daemons run on the machines of the testbed; the Splay
   controller, backed by an SQL DB, deploys the user's application onto a set
   of splayd daemons and collects their results.]




                           SPLAY architecture
            • Portable daemons (ANSI C) on the testbed(s)
                  • Testbed-agnostic for the user
                  • Lightweight (memory & CPU)
            • Retrieve results by remote logging


                                    SPLAY language

             • Based on Lua
                  • High-level scripting language, simple interaction with C,
                    bytecode-based, garbage collection, ...
             • Close to pseudocode (focus on the algorithm)
             • Favors natural ways of expressing algorithms (e.g., RPCs)
             • SPLAY applications can be run locally, without the deployment
               infrastructure (for debugging & testing)
             • Comprehensive set of libraries (extensible)

             [Figure 6: overview of the main SPLAY libraries: luasocket, io (fs)
             and stdlib (each with a sandboxed counterpart), events, crypto,
             llenc, json, misc, log, rpc and splay::app. Third-party and Lua
             libraries are marked, together with the main dependencies.]




                                     Why Lua?
                     • Light & fast
                          • (Very) close to equivalent code in C
                     • Concise
                          • Allows developers to focus on ideas rather than
                            implementation details
                          • Key for researchers and students
                     • Sandboxing, thanks to the possibility of easily
                       redefining (even built-in) functions



                                       Concise

                     [Figure: the pseudocode as published in the original paper,
                     shown side by side with the executable code using the SPLAY
                     libraries]
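
                     To give the flavor of that correspondence (an invented
                     mini-example, not the listing shown on the slide), a
                     pseudocode step such as "invoke push(m) on a random peer"
                     maps almost one-to-one onto SPLAY code, assuming the remote
                     node defines a push() function as in the demo later in this
                     deck:

                       require"splay.base"                          -- base libraries (events, log, misc, ...)
                       rpc = require"splay.urpc"                    -- UDP RPC

                       local m = {id = 1, data = "hello"}           -- some message to disseminate
                       local n = misc.random_pick_one(job.nodes)    -- pick a random peer of the job
                       rpc.call(n, {"push", job.me, m}, 15)         -- "invoke push(m) on n", 15 s timeout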





                         http://www.splay-project.org




                         Sandbox: Motivations
      • Experiments should access only their own resources
      • Required for non-dedicated testbeds
        • Idle workstations in universities or companies
        • Must protect the regular users from ill-behaved code
          (e.g., full disk, full memory, etc.)
      • Memory allocation, filesystem, and network resources must be restricted




                         Sandboxing with Lua
              • In Lua, all functions can be re-defined transparently
              • Resource control is done in the re-defined functions, before
                calling the original function (see the sketch below)
              • No code modification required for the user

                                                      Sandboxing

                                                 SPLAY Application
                                                    SPLAY Lua libraries
                                           (includes all stdlib and system calls)

                                                 Operating system
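
                     A minimal sketch of that principle in plain Lua (not
                     SPLAY's actual sandbox code; the path prefix and error
                     message are made up):

                       local real_open = io.open                  -- keep a reference to the original function
                       local allowed_prefix = "/tmp/splay-job/"   -- hypothetical directory granted to this job

                       io.open = function(path, mode)
                         -- resource control happens here, before delegating to the original
                         if path:sub(1, #allowed_prefix) ~= allowed_prefix then
                           return nil, "sandbox: access denied to "..path
                         end
                         return real_open(path, mode)
                       end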







        for _, module in pairs(splay)

                           luasocket          events            io

                           crypto             llenc/benc        json

                           misc               log               rpc

                           Modules are sandboxed to prevent access to sensitive
                           resources; several of them present novelties
                           introduced for distributed systems.




                             splay.events
                 • Distributed protocols can use the message-passing
                   paradigm to communicate
                 • Nodes react to events
                      • Local, incoming, and outgoing messages
                 • The core of the Splay runtime
                      • Libraries splay.socket & splay.io provide a
                        non-blocking operation mode
                      • Based on Lua's coroutines
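
                 A minimal sketch of this event-driven style, using only calls
                 that appear in the demo code later in this deck (the 5-second
                 period and the log messages are made up):

                   require"splay.base"                  -- brings in events, log, misc, ...

                   function heartbeat()
                     while events.sleep(5) do           -- cooperative: yields to other threads while sleeping
                       log.print("still alive")
                     end
                   end

                   events.loop(function()
                     events.thread(heartbeat)           -- run the periodic task in its own (coroutine) thread
                     log.print("event loop started")
                   end)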





                                  splay.rpc

                     • Default mechanism to communicate between nodes
                     • Support for UDP/TCP
                     • Efficient BitTorrent-like encoding
                     • Experimental binary encoding
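
                     A hedged sketch of a complete RPC exchange between two
                     nodes of a job, again restricted to the calls shown in the
                     demo later in this deck (the echo function and the
                     15-second timeout are illustrative):

                       require"splay.base"
                       rpc = require"splay.urpc"                        -- UDP RPC

                       function echo(msg)                               -- callable remotely like any local function
                         log.print("echo: "..msg)
                         return msg
                       end

                       events.loop(function()
                         rpc.server(job.me)                             -- start serving incoming RPCs on this node
                         local n = misc.random_pick_one(job.nodes)      -- some other node of the job
                         local r = rpc.call(n, {"echo", "hello"}, 15)   -- r is nil if the call times out
                         if r then log.print("reply: "..r) end
                       end)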

                                 Life Before ᔕᕈᒪᐱᓭ

        [Backdrop: a long excerpt of verbose Java message-handling code from a
        home-grown DHT simulator, much of it commented out... and there is more :-(]

        • Time spent on developing testbed-specific protocols
        • Or fall back to simulations...
        • The focus should be on ideas
        • Researchers usually have no time to produce industrial-quality code
                                 Life With ᔕᕈᒪᐱᓭ

             Lines of code (LOC) of some protocols implemented with SPLAY:

             Chord         59 (base) + 17 (FT) + 26 (leafset) = 101
             Pastry        265
             Scribe        79   (on top of Pastry)
             SplitStream   58   (on top of Pastry and Scribe)
             BitTorrent    420
             Cyclon        93
             Epidemic      35   (epidemic diffusion on Erdős-Rényi random graphs)
             Trees         47   (n-ary and parallel distribution trees)

             • Lines of pseudocode ~== lines of executable code
                                 Churn management

          • Churn is key for testing large-scale applications
               • "Natural churn" on PlanetLab is not reproducible
               • Hard to compare algorithms under the same conditions
          • SPLAY allows churn to be finely controlled
               • Traces (e.g., from file sharing systems)
               • Synthetic descriptions: dedicated language

          Example of a synthetic churn description (Figure 5); the accompanying
          plot shows the binned number of joins and leaves per minute and the
          resulting population size over the 20-minute run:

          1   at 30s            join 10
          2   from 5m to 10m    inc 10
          3   from 10m to 15m   const churn 50%
          4   at 15m            leave 50%
          5   from 15m to 20m   inc 10 churn 150%
          6   at 20m            stop
                                 Synthetic churn

          • Routing performance of a Pastry DHT under massive churn
          • Massive failure: half of the nodes fail, then the overlay self-repairs

          [Figure 11: using churn management to reproduce massive churn conditions
          for the SPLAY Pastry implementation. The plots show routing delay
          percentiles (5th to 90th), the failure rate, and route failures (%)
          over 10 minutes; 50% of the network fails, followed by self-repair.]




                                 Trace-driven churn

          • Traces from the OverNet file sharing network [IPTPS03]
               • Speed-up of 2, 5, 10 (1 minute = 30, 12, 6 seconds)
          • Evolution of hit-ratio & delays of Pastry on PlanetLab

          [Figure: effect of churn on Pastry deployed on PlanetLab, for the trace
          sped up x2, x5 and x10. Each panel shows route failures (%), delays
          (seconds), leaves and joins per minute, and population size over a
          50-minute run. Churn is derived from the trace of the Overnet file
          sharing system and sped up for increasing volatility.]




                                     Live Demo




                              www.splay-project.org


                                 Live demo:
                         Gossip-based dissemination

       • Also called epidemic dissemination
         • A message spreads like the flu through a population
         • Robust way to spread a message in a random network
       • Two actions: push and pull
         • Push: a node chooses f other random nodes to infect
           • A newly infected node infects f others in turn
           • Done for a limited number of steps (TTL)
         • Pull: a node tries to fetch messages from a random node




                                SOURCE




                             Every node knows a random set
                                     of other nodes.

                            A node wants to send a message
                                     to all nodes.





                              An initial push phase: nodes that receive the
                              message for the first time forward it to f random
                              neighbors, up to TTL times.







                              Periodically, nodes pull for new
                               messages from one random
                                 neighbor. The message
                             eventually reaches all peers with
                                      high probability.

                         Algorithm 1: A simple push-pull dissemination protocol, on node P.

                          Constants
                            f: Fanout (push: P forwards a new message to f random peers).
                            TTL: Time To Live (push: a message is forwarded to f peers at most
                            TTL times).
                            ∆: Pull period (pull: P asks for new messages every ∆ seconds).

                          Variables
                            R: set of received message IDs
                            N: set of node identifiers

                          /* Invoked when a message is pushed to node P by node Q */
                          function Push(Message m, hops)
                             if m received for the first time then
                                 /* Store the message locally */
                                 R ← R ∪ {m}
                                 /* Propagate the message if required */
                                 if hops > 0 then
                                     invoke Push(m, hops-1) on f random peers from N

                          /* Periodic pull */
                          thread PeriodicPull()
                             every ∆ seconds do
                                invoke Pull(R, P) on a random node from N

                          /* Invoked when a node Q requests messages from node P */
                          function Pull(RQ, Q)
                             foreach m | m ∈ R ∧ m ∉ RQ do
                                 invoke Push(m, 0) on Q





 require"splay.base"
 rpc = require"splay.urpc"       -- load the base and UDP RPC libraries

 --[[ SETTINGS ]]--
 fanout = 3
 ttl = 3
 pull_period = 3
 rpc_timeout = 15                -- an RPC returns nil after 15 seconds

 --[[ VARS ]]--
 msgs = {}

 function push(q, m)
   if not msgs[m.id] then
     msgs[m.id] = m
     log.print("NEW: "..m.id.." (ttl:"..m.t..") from "..q.ip..":"..q.port)
     m.t = m.t - 1
     if m.t > 0 then
       for _, n in pairs(misc.random_pick(job.nodes, fanout)) do
         log.print("FORWARD: "..m.id.." (ttl:"..m.t..") to "..n.ip..":"..n.port)
         events.thread(function()
           rpc.call(n, {"push", job.me, m}, rpc_timeout)
         end)
       end
     end
   else
     log.print("DUP: "..m.id.." (ttl:"..m.t..") from "..q.ip..":"..q.port)
   end
 end

 Logging facilities: all logs are available at the controller, where they are
 timestamped and stored in the DB.

 The RPC call itself is embedded in an anonymous function running in a new
 thread: we do not care about its success (epidemics are naturally robust to
 message loss). Testing for success would be easy: the second return value is
 nil in case of failure.








 function periodic_pull()
   while events.sleep(pull_period) do
     local q = misc.random_pick_one(job.nodes)
     log.print("PULLING: "..q.ip..":"..q.port)
     local ids = {}
     for id, _ in pairs(msgs) do
       ids[id] = true
     end
     local r = rpc.call(q, {"pull", ids}, rpc_timeout)   -- r is nil if the RPC times out
     if r then
       for id, m in pairs(r) do
         if not msgs[id] then
           msgs[id] = m
           log.print("NEW REPLY: "..m.id.." from "..q.ip..":"..q.port)
         else
           log.print("DUP REPLY: "..m.id.." from "..q.ip..":"..q.port)
         end
       end
     end
   end
 end

 function pull(ids)
   local r = {}
   for id, m in pairs(msgs) do
     if not ids[id] then
       r[id] = m
     end
   end
   return r
 end

 SPLAY provides a library for cooperative multithreading, events, scheduling and
 synchronization, as well as various commodity libraries. (In Lua, arrays and
 maps are the same type.)

 There is no difference between a local function and one called by a distant
 node; even variables can be accessed by a distant node through an RPC. SPLAY
 does all the work of serialization, network calls, etc., while preserving a low
 footprint and high performance.








 function source()
   local m = {id = 1, t = ttl, data = "data"}
   for _, n in pairs(misc.random_pick(job.nodes, fanout)) do
     log.print("SOURCE: "..m.id.." to "..n.ip..":"..n.port)
     events.thread(function()
       rpc.call(n, {"push", job.me, m}, rpc_timeout)   -- calling a remote function
     end)
   end
 end

 events.loop(function()
   rpc.server(job.me)                     -- start the RPC server: it is up and running
   log.print("ME", job.me.ip, job.me.port, job.position)
   events.sleep(15)
   events.thread(periodic_pull)           -- embed a function in a separate thread
   if job.position == 1 then
     source()
   end
 end)







            Take-away slide

                         • Distributed systems raise a number of issues
                           for their evaluation
                         • Hard to implement, debug, deploy, tune
                         • ᔕᕈᒪᐱᓭ leverages Lua and a centralized
                           controller to produce an easy-to-use yet
                           powerful working environment



DASA ADMISSION 2024_FirstRound_FirstRank_LastRank.pdf
Agricultural_Statistics_at_a_Glance_2022_0.pdf
project resource management chapter-09.pdf
Building Integrated photovoltaic BIPV_UPV.pdf
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
WOOl fibre morphology and structure.pdf for textiles
1. Introduction to Computer Programming.pptx
Mushroom cultivation and it's methods.pdf
Accuracy of neural networks in brain wave diagnosis of schizophrenia
August Patch Tuesday
Assigned Numbers - 2025 - Bluetooth® Document
Programs and apps: productivity, graphics, security and other tools
Transform Your ITIL® 4 & ITSM Strategy with AI in 2025.pdf
Chapter 5: Probability Theory and Statistics
Encapsulation theory and applications.pdf
Web App vs Mobile App What Should You Build First.pdf
cloud_computing_Infrastucture_as_cloud_p
Microsoft Solutions Partner Drive Digital Transformation with D365.pdf

SPLAY: Distributed Systems Made Simple

  • 1. ᔕᕈᒪᐱᓭ Distributed Systems Made Simple Pascal Felber, Raluca Halalai, Lorenzo Leonini, Etienne Rivière,Valerio Schiavoni, José Valerio Université de Neuchâtel, Switzerland www.splay-project.org ᔕᕈᒪᐱᓭ Monday, January 23, 12
  • 2. 2 Motivations • Developing, testing, and tuning distributed applications is hard • In Computer Science research, fixing the gap of simplicity between pseudocode description and implementation is hard • Using worldwide testbeds is hard ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 3. 3 Networks of idle Testbeds workstations • Set of machines for testing distributed application/protocol • Several different testbeds! A cluster @UniNE Your machine ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 4. 4 What is PlanetLab? ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 5. 4 What is PlanetLab? • Machines contributed by universities, companies, etc. • 1098 nodes at 531 sites (02/09/2011) • Shared resources, no privileged access • University-quality Internet links • High resource contention • Faults, churn, packet loss are the norm • Challenging conditions ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 6. 5 Daily Job With Distributed Systems 1 code 1 • Write (testbed specific) code • Local tests, in-house cluster, PlanetLab... ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 7. 5 Daily Job With Distributed Systems 2 1 code debug 2 • Debug (in this context, a nightmare) ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 8. 5 Daily Job With Distributed Systems 3 1 2 code debug deploy 3 • Deploy, with testbed specific scripts ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 9. 5 Daily Job With Distributed Systems 4 1 2 3 code debug deploy get logs 4 • Get logs, with testbed specific scripts ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 10. 5 Daily Job With Distributed Systems 1 2 3 4 5 code debug deploy get logs plots 5 • Produce plots, hopefully ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 11. 6 ᔕᕈᒪᐱᓭ at a Glance code debug deploy get logs plots (gnuplot...) • Supports the development, evaluation, testing, and tuning of distributed applications on any testbed: • In-house cluster, shared testbeds, emulated environments... • Provides an easy-to-use pseudocode-like language based on Lua • Open-source: http://www.splay-project.org ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 12. 7 The Big Picture application splayd splayd splayd splayd Splay Controller SQL DB ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 13. 7 The Big Picture application splayd splayd splayd splayd Splay Controller SQL DB ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 14. 7 The Big Picture splayd splayd application splayd splayd Splay Controller SQL DB ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 15. 8 SPLAY architecture • Portable daemons (ANSI C) on the testbed(s) • Testbed agnostic for the user • Lightweight (memory & CPU) • Retrieve results by remote logging ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 16. 9 Lua SPLAY language • Based on Lua • High-level scripting language: simple interaction with C, bytecode-based, garbage collection, ... • Close to pseudo code (focus on the algorithm) • Favors natural ways of expressing algorithms (e.g., RPCs) • SPLAY applications can be run locally without the deployment infrastructure (for debugging & testing) • Comprehensive set of libraries (extensible): luasocket*, io (fs)*, stdlib*, sb_socket, socketevents, sb_fs, sb_stdlib, events/threads, crypto*, llenc, json*, misc, log, rpc, splay::app (*: third-party and Lua libraries) (background: excerpt from the SPLAY paper and Figure 6, “Overview of the main SPLAY libraries”) ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 17. 10 Why Lua? • Light & Fast • (Very) close to equivalent code in C • Concise • Allows developers to focus on ideas more than implementation details • Key for researchers and students • Sandboxing thanks to the possibility of easily redefining (even built-in) functions ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 18. 11 Concise • Pseudocode as published in the original paper • Executable code using SPLAY libraries ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 19. 12 http://www.splay-project.org Monday, January 23, 12
  • 20. 13 Sandbox: Motivations • Experiments should access only their own resources • Required for non-dedicated testbeds • Idle workstations in universities or companies • Must protect the regular users from ill-behaved code (e.g., full disk, full memory, etc.) • Memory allocation, filesystem, and network resources must be restricted ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 21. 14 Sandboxing with Lua • In Lua, all functions can be re-defined transparently • Resource control done in re-defined functions before calling the original function • No code modification required for the user Sandboxing SPLAY Application SPLAY Lua libraries (includes all stdlib and system calls) Operating system ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
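A minimal sketch of the redefinition idea described above (illustrative only, not SPLAY's actual sandbox code; the quota and the variable names are made up): the built-in io.open is wrapped so that a resource check runs before the original function is called, with no change to application code.

    local real_open = io.open            -- keep a reference to the original built-in
    local open_count, max_open = 0, 16   -- hypothetical per-application quota

    io.open = function(path, mode)       -- transparent redefinition: callers are unchanged
      if open_count >= max_open then
        return nil, "sandbox: file quota exceeded"
      end
      local f, err = real_open(path, mode)
      if f then open_count = open_count + 1 end
      return f, err
    end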
  • 22. 15 for _, module in pairs(splay): presenting the novelties introduced for distributed systems — luasocket, events, io, crypto, llenc/benc, json, misc, log, rpc • Modules are sandboxed to prevent access to sensitive resources ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 23. 17 splay.events • Distributed protocols can use the message-passing paradigm to communicate • Nodes react to events • Local, incoming, outgoing messages • The core of the Splay runtime • Libraries splay.socket & splay.io provide a non-blocking operation mode • Based on Lua’s coroutines ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
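As a rough illustration of this event model, the sketch below is modeled on the calls that appear in the demo code later in this talk (events.loop, events.thread, events.sleep, log.print); it only shows how cooperative threads are scheduled and is not taken verbatim from the SPLAY distribution.

    require"splay.base"   -- provides the events and log modules, as in the demo code

    events.loop(function()
      -- a background thread: coroutines yield on sleep and I/O instead of blocking
      events.thread(function()
        while events.sleep(5) do
          log.print("periodic task")
        end
      end)
      log.print("setup done; the event loop keeps scheduling threads")
    end)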
  • 24. 19 splay.rpc • Default mechanism to communicate between nodes • Support for UDP/TCP • Efficient BitTorrent-like encoding • Experimental binary encoding ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
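A hedged sketch of an RPC exchange, again modeled on the demo code shown later (rpc.server, and rpc.call taking a node, a {function-name, arguments} table, and a timeout); the ping function, the 15-second timeout, and the job.nodes[1] target are illustrative assumptions, and the demo itself uses the UDP variant splay.urpc.

    require"splay.base"
    rpc = require"splay.urpc"      -- UDP RPC, as in the demo; TCP is also supported

    function ping(from)            -- any global function can be invoked remotely
      log.print("ping from "..from.ip..":"..from.port)
      return "pong"
    end

    events.loop(function()
      rpc.server(job.me)           -- start serving incoming RPCs
      events.thread(function()
        -- {"ping", job.me} names the remote function and passes its argument;
        -- the result is nil if the call times out
        local r = rpc.call(job.nodes[1], {"ping", job.me}, 15)
        if r then log.print("got: "..r) end
      end)
    end)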
  • 25. 20 Life Before ᔕᕈᒪᐱᓭ • Time spent on developing testbed-specific protocols • Or fallback to simulations... • The focus should be on ideas • Researchers usually have no time to produce industrial-quality code • ...and there is more :-( (background: a wall of verbose Java simulator code, illustrating the point) ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 26. 21 Life With ᔕᕈᒪᐱᓭ (paper excerpt: protocols such as epidemic diffusion on Erdős-Rényi random graphs and various types of distribution trees (n-ary trees, parallel trees) are extremely concise in terms of lines of code) • Chord: 59 (base) + 17 (FT) + 26 (leafset) = 101 • Pastry: 265 • Scribe (on Pastry): 79 • SplitStream (on Pastry, Scribe): 58 • BitTorrent: 420 • Cyclon: 93 • Epidemic: 35 • Trees: 47 • Although the number of lines is just a rough indicator of expressiveness: lines of pseudocode ~== lines of executable code ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 27. 22 Churn management (paper excerpt: several characterizations of churn can be leveraged to reproduce realistic conditions for the protocol under test — synthetic descriptions issued from analytical studies, or publicly available traces of the dynamics of real networks such as file-sharing systems or high-performance computing clusters) • Churn is key for testing large-scale applications • “Natural churn” on PlanetLab is not reproducible • Hard to compare algorithms under the same conditions • SPLAY allows churn to be finely controlled • Traces (e.g., from file sharing systems) • Synthetic descriptions: dedicated language — example (Figure 5, a synthetic churn description): (1) at 30s join 10; (2) from 5m to 10m inc 10; (3) from 10m to 15m const churn 50%; (4) at 15m leave 50%; (5) from 15m to 20m inc 10 churn 150%; (6) at 20m stop (plot: binned number of joins and leaves per minute over 20 minutes) ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 28. 23 Synthetic churn (paper excerpt: using churn is as simple as launching a regular SPLAY application with a trace file as an extra argument; SPLAY provides tools to generate and process trace files — e.g., speed up a trace, increase the churn amplitude while keeping its statistical properties, or generate a trace from a synthetic description) • Using churn management to reproduce massive churn conditions for the SPLAY Pastry implementation • Figure 11: stabilized routing performance and self-repair after a massive failure in which 50% of the network fails (route failures and delay over time) ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 29. 24 Trace-driven churn • Traces from the OverNet file sharing network [IPTPS03] • Speed up of 2, 5, 10 (1 minute = 30, 12, 6 seconds) • Evolution of hit-ratio & delays of Pastry on PlanetLab (Figure 11: study of the effect of churn on Pastry deployed on PlanetLab; churn derived from the trace of the Overnet file sharing system and sped up for increasing volatility — churn x2, x5, x10; plots of route failures, delay, and nodes leaving/joining over time) http://www.splay-project.org Monday, January 23, 12
  • 30. 25 Live Demo www.splay-project.org ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 31. 26 Live demo: Gossip-based dissemination • Also called epidemic dissemination • A message spreads like the flu among a population • Robust way to spread a message in a random network • Two actions: push and pull • Push: a node chooses f other random nodes to infect • A newly infected node infects f others in turn • Done for a limited number of steps (TTL) • Pull: a node tries to fetch new messages from a random node ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 32. 27 SOURCE Every node knows a random set of other nodes. A node wants to send a message to all nodes. ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 33. 28 An initial push in which nodes that receive the message for the first time forward it to f random neighbors, up to TTL times. ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 34. 29 Periodically, nodes pull for new messages from one random neighbor. The message eventually reaches all peers with high probability. ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 35. 30 Algorithm 1: A simple push-pull dissemination protocol, on node P.
    Constants:
      f: fanout (push: P forwards a new message to f random peers)
      TTL: time to live (push: a message is forwarded to f peers at most TTL times)
      pull period (pull: P asks for new messages every pull-period seconds)
    Variables:
      R: set of received messages
      N: set of node identifiers
    /* Invoked when a message is pushed to node P by node Q */
    function Push(Message m, hops)
      if m received for the first time then
        R ← R ∪ {m}                  /* store the message locally */
        if hops > 0 then             /* propagate the message if required */
          invoke Push(m, hops-1) on f random peers from N
    /* Periodic pull */
    thread PeriodicPull()
      every pull-period seconds do
        invoke Pull(R, P) on a random node from N
    /* Invoked when a node Q requests messages from node P */
    function Pull(RQ, Q)
      foreach m such that m ∈ R and m ∉ RQ do
        invoke Push(m, 0) on Q
  ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 36. 31
    require"splay.base"
    rpc = require"splay.urpc"          -- loading the base and UDP RPC libraries

    --[[ SETTINGS ]]--
    fanout = 3
    ttl = 3
    pull_period = 3
    rpc_timeout = 15                   -- an RPC returns nil after 15 seconds

    --[[ VARS ]]--
    msgs = {}

    function push(q, m)
      if not msgs[m.id] then
        msgs[m.id] = m
        -- logging facilities: all logs are available at the controller, where
        -- they are timestamped and stored in the DB
        log.print("NEW: "..m.id.." (ttl:"..m.t..") from "..q.ip..":"..q.port)
        m.t = m.t - 1
        if m.t > 0 then
          for _, n in pairs(misc.random_pick(job.nodes, fanout)) do
            log.print("FORWARD: "..m.id.." (ttl:"..m.t..") to "..n.ip..":"..n.port)
            -- the RPC call is embedded in an inner anonymous function run in a
            -- new anonymous thread: we do not care about its success (epidemics
            -- are naturally robust to message loss); testing for success would
            -- be easy, as the second return value is nil in case of failure
            events.thread(function()
              rpc.call(n, {"push", job.me, m}, rpc_timeout)
            end)
          end
        end
      else
        log.print("DUP: "..m.id.." (ttl:"..m.t..") from "..q.ip..":"..q.port)
      end
    end
  ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 37. 32
    -- SPLAY provides a library for cooperative multithreading, events,
    -- scheduling and synchronization, plus various commodity libraries.
    function periodic_pull()
      while events.sleep(pull_period) do
        local q = misc.random_pick_one(job.nodes)
        log.print("PULLING: "..q.ip..":"..q.port)
        local ids = {}
        for id, _ in pairs(msgs) do
          ids[id] = true               -- in Lua, arrays and maps are the same type
        end
        local r = rpc.call(q, {"pull", ids}, rpc_timeout)
        if r then                      -- r is nil if the RPC times out
          for id, m in pairs(r) do
            if not msgs[id] then
              msgs[id] = m
              log.print("NEW REPLY: "..m.id.." from "..q.ip..":"..q.port)
            else
              log.print("DUP REPLY: "..m.id.." from "..q.ip..":"..q.port)
            end
          end
        end
      end
    end

    -- There is no difference between a local function and one called by a
    -- distant node; even variables can be accessed by a distant node via RPC.
    -- SPLAY does all the work of serialization, network calls, etc. while
    -- preserving a low footprint and high performance.
    function pull(ids)
      local r = {}
      for id, m in pairs(msgs) do
        if not ids[id] then
          r[id] = m
        end
      end
      return r
    end
  ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 38. 33
    function source()
      local m = {id = 1, t = ttl, data = "data"}
      for _, n in pairs(misc.random_pick(job.nodes, fanout)) do
        log.print("SOURCE: "..m.id.." to "..n.ip..":"..n.port)
        events.thread(function()
          rpc.call(n, {"push", job.me, m}, rpc_timeout)   -- calling a remote function
        end)
      end
    end

    events.loop(function()
      rpc.server(job.me)               -- start the RPC server: it is up and running
      log.print("ME", job.me.ip, job.me.port, job.position)
      events.sleep(15)
      events.thread(periodic_pull)     -- embed a function in a separate thread
      if job.position == 1 then
        source()
      end
    end)
  ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12
  • 39. 34 Take-away slide • Distributed systems raise a number of issues for their evaluation • Hard to implement, debug, deploy, tune • ᔕᕈᒪᐱᓭ leverages Lua and a centralized controller to produce an easy-to-use yet powerful working environment ᔕᕈᒪᐱᓭ Distributed Systems Made Simple - Pascal Felber - University of Neuchâtel Monday, January 23, 12