Building distributed systems using Helix

http://helix.incubator.apache.org
Apache Incubation, Oct 2012
@apachehelix

Kishore Gopalakrishna, @kishoreg1980
http://www.linkedin.com/in/kgopalak
Outline

• Introduction
• Architecture
• How to use Helix
• Tools
• Helix usage
Examples of distributed data systems
Lifecycle

Single node → Multi node → Cluster

• Single node: partitioning, discovery, co-location
• Multi node: replication, re-distribution, fault detection, recovery
• Cluster: expansion, fault tolerance, throttled data movement
Zookeeper provides low-level primitives. We need high-level primitives.

Zookeeper (consensus system) gives us:
• File system
• Lock
• Ephemeral nodes

Applications think in terms of:
• Node
• Partition
• Replica
• State
• Transition

A framework between the application and Zookeeper bridges this gap.
Outline

• Introduction
• Architecture
• How to use Helix
• Tools
• Helix usage
Terminologies

Node: a single machine
Cluster: a set of nodes
Resource: a logical entity, e.g. database, index, task
Partition: a subset of the resource
Replica: a copy of a partition
State: the status of a partition replica, e.g. Master, Slave
Transition: an action that lets replicas change state, e.g. Slave → Master
Core concept: state machine

• Set of legal states
  - S1, S2, S3
• Set of legal state transitions
  - S1 → S2
  - S2 → S1
  - S2 → S3
  - S3 → S2
Core concept: constraints

• Minimum and maximum number of replicas that should be in a given state
  - S3 → max=1, S2 → min=2
• Maximum concurrent transitions
  - Per node
  - Per resource
  - Across the cluster
Core concept: objectives

• Partition placement
  - Even distribution of replicas in states S1, S2 across the cluster
• Failure/expansion semantics
  - Create new replicas and assign state
  - Change state of existing replicas
  - Even distribution of replicas
Augmented finite state machine

State machine:
• States: S1, S2, S3
• Transitions: S1 → S2, S2 → S1, S2 → S3, S3 → S1

Constraints:
• States: S1 → max=1, S2 → min=2
• Transitions: concurrent(S1 → S2) across cluster < 5

Objectives:
• Partition placement
• Failure semantics
Message consumer group: problem

A partitioned queue consumed by a group of consumers needs assignment, scaling, and fault tolerance.

Partition management:
• One consumer per queue
• Even distribution

Elasticity:
• Re-distribute queues among consumers
• Minimize movement

Fault tolerance:
• Re-distribute
• Minimize movement
• Limit max queues per consumer
Message consumer group: solution

ONLINE-OFFLINE state model:
• Offline → Online: start consumption
• Online → Offline: stop consumption

Constraints:
• MAX 10 queues per consumer
• MAX=1 per partition
 	
  
Distributed data store

Partitions P.1–P.12 are spread across Node 1, Node 2, and Node 3; each partition has one MASTER replica and SLAVE replicas.

Partition management:
• Multiple replicas, 1 designated master
• Even distribution

Fault tolerance:
• Fault detection
• Promote slave to master
• Even distribution
• No SPOF

Elasticity:
• Minimize downtime
• Minimize data movement
• Throttle data movement
Distributed data store: solution

MASTER-SLAVE state model:
• States: OFFLINE (O), SLAVE (S), MASTER (M)
• Transitions: O → S, S → O, S → M, M → S
• Constraints: COUNT=2 for SLAVE, COUNT=1 for MASTER, t1 ≤ 5 (throttle concurrent O → S transitions)
• Objectives: minimize(max_{nj∈N} S(nj)) and minimize(max_{nj∈N} M(nj)), i.e. spread slaves and masters evenly across nodes
 	
  
Distributed search service

Index shards P.1–P.6 are replicated across Node 1, Node 2, and Node 3.

Partition management:
• Multiple replicas
• Even distribution
• Rack-aware placement

Fault tolerance:
• Fault detection
• Auto-create replicas
• Controlled creation of replicas

Elasticity:
• Re-distribute partitions
• Minimize movement
• Throttle data movement
Distributed search service: solution

ONLINE-OFFLINE state model with constraints:
• MAX per node = 5
• MAX = 3 (number of replicas)
Internals
IDEALSTATE

Configuration:
• 3 nodes
• 3 partitions
• 2 replicas
• StateMachine

Constraints:
• 1 Master
• 1 Slave
• Even distribution

The ideal state captures replica placement and replica state for each partition:

         P1     P2     P3
Master   N1:M   N2:M   N3:M
Slave    N2:S   N3:S   N1:S
CURRENT STATE

N1: P1:OFFLINE, P3:OFFLINE
N2: P2:MASTER, P1:MASTER
N3: P3:MASTER, P2:SLAVE
EXTERNAL VIEW

      P1     P2     P3
      N1:O   N2:M   N3:M
      N2:M   N3:S   N1:O
Helix Based System Roles

• Controller: compares the IDEAL STATE with the CURRENT STATE reported by participants and issues state-transition commands; participants respond as transitions complete.
• Participants: the nodes hosting the partitions (Node 1, Node 2, Node 3).
• Spectators: observers of the cluster state, e.g. partition routing logic.
Logical deployment
Outline

• Introduction
• Architecture
• How to use Helix
• Tools
• Helix usage
Helix based solution

1. Define
2. Configure
3. Run
Define: state model definition

• States
  - All possible states
  - Priority
• Transitions
  - Legal transitions
  - Priority
• Applicable to each partition of a resource

Example: MasterSlave, with states O (Offline), S (Slave), M (Master).
Define: state model

    StateModelDefinition.Builder builder =
        new StateModelDefinition.Builder("MASTERSLAVE");
    // Add states and their rank to indicate priority.
    builder.addState(MASTER, 1);
    builder.addState(SLAVE, 2);
    builder.addState(OFFLINE);

    // Set the initial state when the node starts.
    builder.initialState(OFFLINE);

    // Add transitions between the states.
    builder.addTransition(OFFLINE, SLAVE);
    builder.addTransition(SLAVE, OFFLINE);
    builder.addTransition(SLAVE, MASTER);
    builder.addTransition(MASTER, SLAVE);
Define: constraints

Constraints can be applied per scope, on states and on transitions:

Scope       State   Transition
Partition   Y       Y
Resource    -       Y
Node        Y       Y
Cluster     -       Y

Example (per partition, state constraint): M=1, S=2, i.e. COUNT=1 for MASTER and COUNT=2 for SLAVE in the O/S/M model.
Define: constraints

    // static constraint
    builder.upperBound(MASTER, 1);

    // dynamic constraint
    builder.dynamicUpperBound(SLAVE, "R");

    // unconstrained
    builder.upperBound(OFFLINE, -1);
Define: participant plug-in code
Step 2: configure

helix-admin --zkSvr <zkAddress>

CREATE CLUSTER
--addCluster <clusterName>

ADD NODE
--addNode <clusterName instanceId(host:port)>

CONFIGURE RESOURCE
--addResource <clusterName resourceName partitions statemodel>

REBALANCE → SET IDEALSTATE
--rebalance <clusterName resourceName replicas>
Zookeeper view: IDEALSTATE
Step 3: run

START CONTROLLER
run-helix-controller --zkSvr localhost:2181 --cluster MyCluster

START PARTICIPANT
Zookeeper view
Znode content: CURRENT STATE and EXTERNAL VIEW
Spectator plug-in code
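A spectator watches the external view and routes requests to the replica in the desired state. A stdlib-only sketch of that routing lookup (illustrative; in Helix this role is typically played by a routing-table provider kept in sync with Zookeeper):

```java
import java.util.Map;
import java.util.Optional;

// Sketch of spectator-side routing: given the external view
// (partition -> node -> state), find a node currently serving a
// partition in the requested state (e.g. route writes to the MASTER).
public class PartitionRouter {
    private final Map<String, Map<String, String>> externalView;

    public PartitionRouter(Map<String, Map<String, String>> externalView) {
        this.externalView = externalView;
    }

    public Optional<String> nodeFor(String partition, String state) {
        return externalView.getOrDefault(partition, Map.of()).entrySet().stream()
            .filter(e -> e.getValue().equals(state))
            .map(Map.Entry::getKey)
            .findFirst();
    }
}
```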
Helix execution modes
IDEALSTATE

Configuration:
• 3 nodes
• 3 partitions
• 2 replicas
• StateMachine

Constraints:
• 1 Master
• 1 Slave
• Even distribution

         P1     P2     P3
Master   N1:M   N2:M   N3:M
Slave    N2:S   N3:S   N1:S
Execution modes

• Who controls what:

                     AUTO REBALANCE   AUTO    CUSTOM
Replica placement    Helix            App     App
Replica state        Helix            Helix   App
Auto rebalance vs. Auto
In action

MasterSlave, p=3, r=2, N=3. Initial placement in both modes:

Node 1: P1:M, P2:S
Node 2: P2:M, P3:S
Node 3: P3:M, P1:S

Auto rebalance: on failure, auto-create replicas and assign state.

Node 1 (down): P1:O, P2:O
Node 2: P2:M, P3:S, P1:M
Node 3: P3:M, P1:S, P2:S

Auto: on failure, only change states of existing replicas to satisfy constraints.

Node 1 (down)
Node 2: P2:M, P3:S
Node 3: P3:M, P1:M
Custom mode: example
Custom	
  mode:	
  handling	
  failure	
  
™  Custom	
  code	
  invoker	
  
     ™      Code	
  that	
  lives	
  on	
  all	
  nodes,	
  but	
  acGve	
  in	
  one	
  place	
  
     ™      Invoked	
  when	
  node	
  joins/leaves	
  the	
  cluster	
  
     ™      Computes	
  new	
  idealstate	
  
     ™      Helix	
  controller	
  fires	
  the	
  transiGon	
  without	
  viola)ng	
  constraints	
  



  P1	
   P2	
   P3	
                   P1	
   P2	
   P3	
                 Transi)ons	
  
                                                                          1	
       N1	
   MàS	
  
                                                                          2	
       N2	
   Sà	
  M	
  
  N1:M	
     N2:M	
      N3:M	
         N1:S	
     N2:M	
     N3:M	
  
                                                                         1	
  &	
  2	
  in	
  parallel	
  violate	
  single	
  
                                                                         master	
  constraint	
  

  N2:S	
      N3:S	
     N1:S	
        N2:M	
      N3:S	
     N1:S	
  
                                                                          Helix	
  sends	
  2	
  aqer	
  1	
  is	
  finished	
  
                                                                                                                            44	
  
Controller deployment

Embedded:
• Controller embedded within each participant
• Only one controller active
• No extra process to manage
• Suitable for small clusters
• Upgrading the controller is costly
• Participant health impacts the controller

Separate:
• At least 2 separate controller processes to avoid SPOF
• Only one controller active
• Extra process to manage
• Recommended for large clusters
• Upgrading the controller is easy
Controller fault tolerance

Controller 1 (LEADER), Controller 2 (STANDBY), and Controller 3 (STANDBY) all connect to Zookeeper.

Zookeeper ephemeral-node based leader election decides the controller leader.
Controller fault tolerance

Controller 1 (OFFLINE), Controller 2 (LEADER), Controller 3 (STANDBY).

When the leader fails, another controller becomes the new leader.
Managing the controllers
Scaling the controller: Leader-Standby model

Controllers themselves use an OFFLINE / STANDBY / LEADER state model: each managed cluster is assigned to exactly one controller as LEADER, with the other controllers on STANDBY.
Scaling the controller: failure

When a controller fails, the clusters it was leading are reassigned: standby controllers become LEADER for the affected clusters.
Outline

• Introduction
• Architecture
• How to use Helix
• Tools
• Helix usage
Tools

• Chaos monkey
• Data driven testing and debugging
• Rolling upgrade
• On-demand task scheduling and intra-cluster messaging
• Health monitoring and alerts
Data driven testing

• Instrument: Zookeeper, controller, and participant logs
• Simulate: chaos monkey
• Analyze: check invariants such as
  - State transition constraints are respected
  - State count constraints are respected
  - And so on
• Debugging made easy
  - Reproduce the exact sequence of events
Structured log file: sample
 timestamp      partition     instanceName                   sessionId                  state

1323312236368   TestDB_123   express1-md_16918   ef172fe9-09ca-4d77b05e-15a414478ccc   OFFLINE

1323312236426   TestDB_123   express1-md_16918   ef172fe9-09ca-4d77b05e-15a414478ccc   OFFLINE

1323312236530   TestDB_123   express1-md_16918   ef172fe9-09ca-4d77b05e-15a414478ccc   OFFLINE

1323312236530   TestDB_91    express1-md_16918   ef172fe9-09ca-4d77b05e-15a414478ccc   OFFLINE

1323312236561   TestDB_123   express1-md_16918   ef172fe9-09ca-4d77b05e-15a414478ccc   SLAVE

1323312236561   TestDB_91    express1-md_16918   ef172fe9-09ca-4d77b05e-15a414478ccc   OFFLINE

1323312236685   TestDB_123   express1-md_16918   ef172fe9-09ca-4d77b05e-15a414478ccc   SLAVE

1323312236685   TestDB_91    express1-md_16918   ef172fe9-09ca-4d77b05e-15a414478ccc   OFFLINE

1323312236685   TestDB_60    express1-md_16918   ef172fe9-09ca-4d77b05e-15a414478ccc   OFFLINE

1323312236719   TestDB_123   express1-md_16918   ef172fe9-09ca-4d77b05e-15a414478ccc   SLAVE

1323312236719   TestDB_91    express1-md_16918   ef172fe9-09ca-4d77b05e-15a414478ccc   SLAVE

1323312236719   TestDB_60    express1-md_16918   ef172fe9-09ca-4d77b05e-15a414478ccc   OFFLINE

1323312236814   TestDB_123   express1-md_16918   ef172fe9-09ca-4d77b05e-15a414478ccc   SLAVE
Invariant 1: no more than R=2 slaves
Time     State    Number Slaves         Instance
42632   OFFLINE        0          10.117.58.247_12918
42796   SLAVE          1          10.117.58.247_12918
43124   OFFLINE        1          10.202.187.155_12918
43131   OFFLINE        1          10.220.225.153_12918
43275   SLAVE          2          10.220.225.153_12918
43323   SLAVE          3          10.202.187.155_12918
85795   MASTER         2          10.220.225.153_12918
How long was it out of whack?
Number of Slaves   Time        Percentage
0                  1082319     0.5
1                  35578388    16.46
2                  179417802   82.99
3                  118863      0.05


83% of the time, there were 2 slaves per partition.
93% of the time, there was 1 master per partition.

Number of Masters   Time        Percentage
0                   15490456    7.16
1                   200706916   92.84
Invariant 2: state transitions

FROM       TO         COUNT

MASTER           SLAVE               55
OFFLINE        DROPPED                0
OFFLINE          SLAVE              298
SLAVE           MASTER              155
SLAVE           OFFLINE               0
Outline

• Introduction
• Architecture
• How to use Helix
• Tools
• Helix usage
Helix usage at LinkedIn

• Espresso
In flight

• Apache S4
  - Partitioning, co-location
  - Dynamic cluster expansion
• Archiva
  - Partitioned replicated file store
  - Rsync based replication
• Others in evaluation
  - Bigtop
Auto scaling software deployment tool

• States: Offline → Download → Configure → Start → Active / Standby
• Constraint for each state:
  - Download < 100
  - Active 1000
  - Standby 100
Summary

• Helix: a generic framework for building distributed systems
• Modifying/enhancing system behavior is easy
  - Abstraction and modularity are key
• Simple programming model: declarative state machine
Roadmap

• Features
  - Span multiple data centers
  - Automatic load balancing
  - Distributed health monitoring
  - YARN generic application master for real-time apps
  - Standalone Helix agent

website: http://helix.incubator.apache.org
user:    user@helix.incubator.apache.org
dev:     dev@helix.incubator.apache.org
twitter: @apachehelix, @kishoreg1980


Apache Helix presentation at Vmware


Editor's Notes

  • #5: Moving from a single node to a scalable, fault-tolerant distributed mode is non-trivial and slow, even though the core functionality remains the same.
  • #6: 1. You must define the correct behavior of your system. How do you partition? What is the replication factor? Are replicas identical, or are there different roles such as master/slave replicas? How should the system behave when nodes fail, new nodes are added, etc.? This differs from system to system. 2. Once you have defined how the system must behave, you have to implement that behavior in code, maybe on top of ZooKeeper or otherwise. That implementation is non-trivial, hard to debug, and hard to test. Worse, if the behavior of the system changes even slightly in response to requirements (for example, moving from one shard per node to multiple shards per node), the entire process has to repeat. Instead, wouldn't it be nice if all you had to do was step 1, i.e. simply define the correct behavior of your distributed system, and step 2 was somehow taken care of?
  • #8: Core Helix concepts; what makes it generic.
  • #10: With each replica associated with a state
  • #11: With each replica associated with a state
  • #12: With each replica associated with a state
  • #13: In this slide, we look at the problem from a different perspective and possibly re-define the cluster management problem. To recap: to build a distributed data store we need to define the number of partitions and replicas, and the replicas need different roles such as master/slave. One well-proven way to express such behavior is a state machine.
  • #15: Dynamically change the number of replicas; add new resources and nodes; change behavior easily; change what runs where; elasticity.
  • #18: Limit the number of partitions on a single node.
  • #19: Dynamically change the number of replicas; add new resources and nodes; change behavior easily; change what runs where; elasticity.
  • #25: Design choices: communication, ZooKeeper optimizations, multiple deployment options, no SPOF.
  • #33: Mention configuration, etc.
  • #43: Auto rebalance: applicable to the message consumer group and search. Auto: the distributed data store.
  • #53: Allows one to come up with common tools; think of Maven plugins.
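The data-driven testing idea from the Tools section (replay a structured transition log and verify invariants such as "no more than one master per partition") can be sketched as follows. This is illustrative code, not Helix's actual checker; the event fields merely mirror the sample log slide, and the class and method names are hypothetical.

```java
import java.util.*;

// Hypothetical sketch of "data driven testing": replay a structured
// transition log and check a state-count invariant (e.g. at most 1
// MASTER per partition at any instant). Not Helix code; the event
// fields mirror the sample log slide (timestamp, partition, instance,
// state), and all names here are illustrative.
public class InvariantChecker {
    record Event(long timestamp, String partition, String instance, String state) {}

    // Returns true if, replaying events in order, no partition ever has
    // more than `max` replicas in `state` at the same time.
    static boolean respectsUpperBound(List<Event> log, String state, int max) {
        // partition -> (instance -> current state)
        Map<String, Map<String, String>> current = new HashMap<>();
        for (Event e : log) {
            current.computeIfAbsent(e.partition(), k -> new HashMap<>())
                   .put(e.instance(), e.state());
            long count = current.get(e.partition()).values().stream()
                                .filter(state::equals).count();
            if (count > max) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Event> log = List.of(
            new Event(42632, "TestDB_123", "node1", "SLAVE"),
            new Event(42796, "TestDB_123", "node2", "MASTER"),
            new Event(43124, "TestDB_123", "node1", "MASTER")); // second master!
        System.out.println(respectsUpperBound(log, "MASTER", 1)); // false
        System.out.println(respectsUpperBound(log, "SLAVE", 2));  // true
    }
}
```

Because the check runs over logs rather than live clusters, the same tool works for any system managed this way, and a violation can be reproduced by replaying the exact sequence of events.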
  • #60: Used in production to manage the core infrastructure components in the company. Operation is easy, and it is easy for dev-ops to operate multiple systems.
  • #61: S4: apps and processing tasks each have a different state model. Multi-tenancy: multiple resources.
  • #62: Define the states, how many replicas you want in each state, and the state model. Helix provides MasterSlave, OnlineOffline, and LeaderStandby (a two-master system). Automatic replica creation.
  • #63: Provides the right combination of abstraction and flexibility. The code is stable and deployed in production. Enables integration between multiple co-located systems. It also helps you think more about your design, putting in the right level of abstraction and modularity.
  • #65: contribution