Distributed
Ruby and Rails
      @ihower
   http://ihower.tw
        2010/1
About Me
•           a.k.a. ihower
    • http://ihower.tw
    • http://twitter.com/ihower
    • http://github.com/ihower
• Ruby on Rails Developer since 2006
• Ruby Taiwan Community
 • http://ruby.tw
Agenda
• Distributed Ruby
• Distributed Message Queues
• Background-processing in Rails
• Message Queues for Rails
• SOA for Rails
• Distributed Filesystem
• Distributed database
1. Distributed Ruby

• DRb
• Rinda
• Starfish
• MapReduce
• MagLev VM
DRb

• Ruby's RMI (remote method invocation) system


• an object in one Ruby process can invoke
  methods on an object in another Ruby
  process on the same or a different machine
DRb (cont.)

• no defined interface, so development is faster
• tightly couples applications: there is no
  defined API, only methods on objects
• unreliable under heavy load in large-scale
  production environments
server example 1
require 'drb'

class HelloWorldServer

      def say_hello
          'Hello, world!'
      end

end

DRb.start_service("druby://127.0.0.1:61676",
HelloWorldServer.new)
DRb.thread.join
client example 1
require 'drb'

server = DRbObject.new_with_uri("druby://127.0.0.1:61676")

puts server.say_hello
puts server.inspect

# Hello, world!
# <DRb::DRbObject:0x1003c04c8 @ref=nil, @uri="druby://127.0.0.1:61676">
example 2
# user.rb
class User

  attr_accessor :username

end
server example 2
require 'drb'
require_relative 'user' # user.rb sits next to this file

class UserServer

  attr_accessor :users

  def find(id)
    self.users[id-1]
  end

end

user_server = UserServer.new
user_server.users = []
5.times do |i|
  user = User.new
  user.username = i + 1
  user_server.users << user
end

DRb.start_service("druby://127.0.0.1:61676", user_server)
DRb.thread.join
client example 2
require 'drb'

user_server = DRbObject.new_with_uri("druby://127.0.0.1:61676")

user = user_server.find(2)

puts user.inspect
puts "Username: #{user.username}"
user.username = "ihower"
puts "Username: #{user.username}"
Err...

# <DRb::DRbUnknown:0x1003b8318 @name="User", @buf="\004\bo:\tUser\006:\016@usernamei\a">
# client2.rb:8: undefined method `username' for #<DRb::DRbUnknown:0x1003b8318> (NoMethodError)
Why? DRbUndumped
•   Default DRb operation

    • Pass by value
    • Must share code
• With DRbUndumped
 • Pass by reference
 • No need to share code
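The two modes can be imitated inside one process. This is only a sketch: it assumes that "pass by value" means a Marshal round trip (which is how DRb copies objects) and it uses two made-up classes, CopiedUser and ReferencedUser. No server is started.

```ruby
require 'drb'

# Hypothetical classes for illustration; they are not part of the slides.
class CopiedUser
  attr_accessor :username
end

class ReferencedUser
  include DRb::DRbUndumped # marks instances as "do not marshal"
  attr_accessor :username
end

# Pass by value: the receiver works on a marshalled copy.
def round_trip(object)
  Marshal.load(Marshal.dump(object))
end

# Returns true when Marshal is able to serialise the object.
def marshallable?(object)
  Marshal.dump(object)
  true
rescue TypeError
  false
end

original = CopiedUser.new
original.username = 'server-side'
copy = round_trip(original)
copy.username = 'client-side'

puts original.username                 # the original is untouched by the copy
puts marshallable?(ReferencedUser.new) # DRbUndumped objects refuse to be copied
```

Because a DRbUndumped object cannot be copied, DRb hands the client a DRbObject proxy instead, and every call on it travels back to the server.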
Example 2 Fixed
# user.rb
class User

  include DRbUndumped

  attr_accessor :username

end

# <DRb::DRbObject:0x1003b84f8 @ref=2149433940,
@uri="druby://127.0.0.1:61676">
# Username: 2
# Username: ihower
Why use DRbUndumped?

 • Big objects
 • Singleton objects
 • Lightweight clients
 • Rapidly changing software
ID conversion
• Converts reference into DRb object on server
 • DRbIdConv (Default)
 • TimerIdConv
 • NamedIdConv
 • GWIdConv
Beware of garbage collection
•   referenced objects may be garbage-collected on the
    server (usually doesn't matter)
•   Build your own ID converter if you want
    to control persistent state.
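A custom ID converter only needs to_id and to_obj. The sketch below is an assumption about what such a converter could look like, not code from the talk: PinningIdConv is a made-up name, and it keeps a strong reference to every object it hands out so the garbage collector cannot reclaim it.

```ruby
# PinningIdConv is a hypothetical name, not a class shipped with DRb.
class PinningIdConv
  def initialize
    @objects = {}
    @lock = Mutex.new
  end

  # Called when an object is handed out by reference: remember it.
  def to_id(obj)
    @lock.synchronize { @objects[obj.__id__] = obj }
    obj.__id__
  end

  # Called when a remote call arrives for a reference id.
  def to_obj(ref)
    @lock.synchronize do
      @objects.fetch(ref) { raise RangeError, "#{ref} is not a known reference" }
    end
  end

  # Release an object explicitly once no client needs it any more.
  def release(obj)
    @lock.synchronize { @objects.delete(obj.__id__) }
  end
end

# Assumed wiring on the server side:
# DRb.start_service("druby://127.0.0.1:61676", front, :idconv => PinningIdConv.new)
```

The trade-off is that pinned objects live until release is called, so a long-running server needs its own policy for letting them go.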
DRb security
require 'drb'

ro = DRbObject.new_with_uri("druby://127.0.0.1:61676")
class << ro
    undef :instance_eval
end

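# !!!!!!!! WARNING !!!!!!!!! DO NOT RUN
# With instance_eval undefined on the proxy, the call is forwarded
# to the server, which evaluates the string there.
ro.instance_eval("`rm -rf *`")

# Server-side protection: set $SAFE = 1 before starting the service
# ($SAFE was removed in Ruby 3.0). The malicious call then fails with:
# instance_eval': Insecure operation - instance_eval (SecurityError)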
DRb security (cont.)

• Access Control Lists (ACLs)
 • via an array of IP addresses
 • attackers can still mount a denial-of-service attack
• DRb over SSL
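An ACL is built from an array of permission/address pairs. A minimal sketch, assuming example addresses and a made-up helper (allowed?) that feeds the list the same kind of address array a socket would report:

```ruby
require 'drb'
require 'drb/acl'

# Deny everything, then allow only the listed addresses (example values).
ACCESS_LIST = ACL.new(%w[deny all allow 127.0.0.1 allow 192.168.1.0/24])

# Hypothetical helper that mimics the address array reported by a socket.
def allowed?(ip)
  ACCESS_LIST.allow_addr?(['AF_INET', 0, ip, ip])
end

puts allowed?('127.0.0.1')
puts allowed?('10.0.0.5')

# On a real server the list is installed before the service starts:
# DRb.install_acl(ACCESS_LIST)
# DRb.start_service("druby://127.0.0.1:61676", HelloWorldServer.new)
```

The list only filters who may connect; it does not authenticate callers or encrypt traffic, which is where DRb over SSL comes in.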
Rinda

• Rinda is a Ruby port of the Linda distributed
    computing paradigm.
•   Linda is a model of coordination and communication among several parallel processes
    operating upon objects stored in and retrieved from shared, virtual, associative memory. This
    model is implemented as a "coordination language" in which several primitives operating on
    ordered sequence of typed data objects, "tuples," are added to a sequential language, such
    as C, and a logically global associative memory, called a tuplespace, in which processes
    store and retrieve tuples. (Wikipedia)
Rinda (cont.)

• Rinda consists of:
 • a TupleSpace implementation
 • a RingServer that allows DRb services to
    automatically discover each other.
RingServer

• Hardcoding IP addresses in a DRb
  program couples applications tightly
  and makes fault tolerance difficult.
• RingServer can detect and interact with
  other services on the network without
  knowing IP addresses.
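RingServer discovery (diagram)

1. Client @ 192.168.1.100 asks the RingServer, via the UDP broadcast address: "Where is Service X?"
2. RingServer answers: "Service X: 192.168.1.12"
3. Client to Service X @ 192.168.1.12: "Hi, Service X @ 192.168.1.12"
4. Service X to client: "Hi there, 192.168.1.100"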
ring server example
require 'rinda/ring'
require 'rinda/tuplespace'

DRb.start_service
Rinda::RingServer.new(Rinda::TupleSpace.new)
DRb.thread.join
service example
require 'rinda/ring'

class HelloWorldServer
    include DRbUndumped # Needed for RingServer

      def say_hello
          'Hello, world!'
      end

end

DRb.start_service
ring_server = Rinda::RingFinger.primary
ring_server.write([:hello_world_service, :HelloWorldServer, HelloWorldServer.new,
                   'I like to say hi!'], Rinda::SimpleRenewer.new)

DRb.thread.join
client example
require 'rinda/ring'

DRb.start_service
ring_server = Rinda::RingFinger.primary

service = ring_server.read([:hello_world_service, nil,nil,nil])
server = service[2]

puts server.say_hello
puts service.inspect

# Hello, world!
# [:hello_world_service, :HelloWorldServer,
#  #<DRb::DRbObject:0x10039b650 @uri="druby://fe80::21b:63ff:fec9:335f%en1:57416", @ref=2149388540>,
#  "I like to say hi!"]
TupleSpaces

• Shared object space
• Atomic access
• Just like a bulletin board
• Tuple template is
  [:name, :Class, object, 'description']
5 Basic Operations

• write
• read
• take (Atomic Read+Delete)
• read_all
• notify (Callback for write/take/delete)
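Four of these operations can be tried on a local TupleSpace without a RingServer or any network. A small sketch; the :job tuples are made-up sample data:

```ruby
require 'rinda/tuplespace'

SPACE = Rinda::TupleSpace.new

# write: post tuples on the "bulletin board"
SPACE.write([:job, 1, 'resize photo'])
SPACE.write([:job, 2, 'send mail'])

# read: copy a tuple matching the template; nil acts as a wildcard
some_job = SPACE.read([:job, nil, nil])

# read_all: copy every matching tuple without removing any
pending = SPACE.read_all([:job, nil, nil])

# take: remove a matching tuple atomically, so only one worker gets it
taken = SPACE.take([:job, 2, nil])

puts some_job.inspect
puts pending.size
puts taken.inspect
puts SPACE.read_all([:job, nil, nil]).size
```

read and take block until a matching tuple exists, which is what lets workers wait on the space instead of polling it.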
Starfish

• Starfish is a utility to make distributed
  programming ridiculously easy
• It runs both the server and the client in
  infinite loops
• MapReduce with ActiveRecord or Files
starfish foo.rb
# foo.rb

class Foo
  attr_reader :i

  def initialize
    @i = 0
  end

  def inc
    logger.info "YAY it incremented by 1 up to #{@i}"
    @i += 1
  end
end

server :log => "foo.log" do |object|
  object = Foo.new
end

client do |object|
  object.inc
end
starfish server example
   ARGV.unshift('server.rb')

   require 'rubygems'
   require 'starfish'

   class HelloWorld
     def say_hi
       'Hi There'
     end
   end

   Starfish.server = lambda do |object|
       object = HelloWorld.new
   end

   Starfish.new('hello_world').server
starfish client example
   ARGV.unshift('client.rb')

   require 'rubygems'
   require 'starfish'

   Starfish.client = lambda do |object|
     puts object.say_hi
     exit(0) # exit program immediately
   end

   Starfish.new('hello_world').client
starfish client example (another way)

   ARGV.unshift('server.rb')

   require 'rubygems'
   require 'starfish'

   catch(:halt) do
     Starfish.client = lambda do |object|
       puts object.say_hi
       throw :halt
     end

     Starfish.new('hello_world').client
   end

   puts "bye bye"
MapReduce

• introduced by Google to support
  distributed computing on large data sets on
  clusters of computers.
• inspired by map and reduce functions
  commonly used in functional programming.
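The map and reduce steps can be seen in miniature in plain Ruby. This is a single-process sketch with made-up log lines; a real MapReduce system would spread the map calls over many machines.

```ruby
# Sample input (made up for illustration): simplified access-log lines.
LOG_LINES = [
  'GET /index.html 200',
  'GET /missing 404',
  'POST /login 200',
  'GET /index.html 200'
].freeze

# Map step: turn one line into a [key, value] pair.
def map_line(line)
  status = line.split.last
  [status, 1]
end

# Reduce step: combine all values emitted for one key.
def reduce_counts(values)
  values.reduce(0) { |sum, n| sum + n }
end

def count_by_status(lines)
  pairs = lines.map { |line| map_line(line) }              # map
  grouped = pairs.group_by { |status, _| status }          # group by key
  grouped.each_with_object({}) do |(status, group), result|
    result[status] = reduce_counts(group.map { |_, n| n }) # reduce
  end
end

p count_by_status(LOG_LINES)
```

Because map_line looks at one line only and reduce_counts at one key only, both steps can be handed to independent workers.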
starfish server example
ARGV.unshift('server.rb')

require 'rubygems'
require 'starfish'

Starfish.server = lambda{ |map_reduce|
  map_reduce.type = File
  map_reduce.input = "/var/log/apache2/access.log"
  map_reduce.queue_size = 10
  map_reduce.lines_per_client = 5
  map_reduce.rescan_when_complete = false
}

Starfish.new('log_server').server
starfish client example
   ARGV.unshift('client.rb')

   require 'rubygems'
   require 'starfish'

   Starfish.client = lambda { |logs|
     logs.each do |log|
       puts "Processing #{log}"
       sleep(1)
     end
   }

   Starfish.new("log_server").client
Other implementations
• Skynet
 • Uses TupleSpace or MySQL as message queue
 • Includes an extension for ActiveRecord
 • http://skynet.rubyforge.org/
• MRToolkit based on Hadoop
 • http://code.google.com/p/mrtoolkit/
MagLev VM

• a fast, stable, Ruby implementation with
  integrated object persistence and
  distributed shared cache.
• http://maglev.gemstone.com/
• currently in public alpha
2. Distributed Message Queues

• Starling
• AMQP/RabbitMQ
• Stomp/ActiveMQ
• beanstalkd
What’s a message queue?
(diagram)
• Client puts Message X on the Queue
• Processor checks the Queue and processes the message
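The same shape can be tried inside one Ruby process with the built-in Queue class. This is only an in-process sketch: a thread stands in for the processor, and :done is our own stop marker, not part of any queue server's protocol.

```ruby
require 'thread'

# Runs a producer ("client") and a consumer ("processor") around one queue
# and returns the processed results in order.
def run_pipeline(messages)
  queue = Queue.new
  results = []

  processor = Thread.new do
    # pop blocks until a message arrives; :done tells the processor to stop
    while (message = queue.pop) != :done
      results << "processed #{message}"
    end
  end

  messages.each { |message| queue << message } # the client enqueues and moves on
  queue << :done
  processor.join
  results
end

puts run_pipeline(['Message X', 'Message Y'])
```

A distributed message queue keeps this contract but puts the queue in its own server process, so clients and processors can live on different machines.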
Why not DRb?

• DRb has security risks and poorly designed APIs
• a distributed message queue is a great way to do
  distributed programming: reliable and scalable.
Starling
• a light-weight persistent queue server that
  speaks the Memcache protocol (mimics its
  API)
• Fast, effective, quick setup and ease of use
• Powered by EventMachine
  http://eventmachine.rubyforge.org/EventMachine.html



• Twitter’s open source project; they used it
  before 2009 (they have since switched to Kestrel, a port of Starling from Ruby
  to Scala).
Starling command

• sudo gem install starling-starling
 • http://github.com/starling/starling
• sudo starling -h 192.168.1.100
• sudo starling_top -h 192.168.1.100
Starling set example
require 'rubygems'
require 'starling'

starling = Starling.new('192.168.2.4:22122')

100.times do |i|
  starling.set('my_queue', i)
end

                     set appends to the queue; it does not
                     overwrite as in Memcached
Starling get example
require 'rubygems'
require 'starling'

starling = Starling.new('192.168.2.4:22122')

loop do
  puts starling.get("my_queue")
end
get method
• FIFO
• After get, the object is no longer in the
  queue. You will lose the message if a processing
  error happens.
• The get method blocks until something is
  returned, so the loop runs forever.
Handle processing errors
 require 'rubygems'
 require 'starling'

 starling = Starling.new('192.168.2.4:22122')
 results = starling.get("my_queue")

 begin
     puts results.flatten
 rescue NoMethodError => e
     puts e.message
     # put the message back, wrapped in an array so flatten works next time
     starling.set("my_queue", [results])
 rescue Exception => e
     # put the message back before re-raising so it is not lost
     starling.set("my_queue", results)
     raise e
 end
Starling cons

• Clients must poll the queue constantly
• With RabbitMQ you can instead subscribe to a queue and be
  notified when a message is available for
  processing.
AMQP/RabbitMQ
• a complete and highly reliable enterprise
  messaging system based on the emerging
  AMQP standard.
  • Erlang
• http://github.com/tmm1/amqp
 • Powered by EventMachine
Stomp/ActiveMQ

• Apache ActiveMQ is the most popular and
  powerful open source messaging and
  Integration Patterns provider.
• sudo gem install stomp
• ActiveMessaging plugin for Rails
beanstalkd
• Beanstalk is a simple, fast workqueue
  service. Its interface is generic, but was
  originally designed for reducing the latency
  of page views in high-volume web
  applications by running time-consuming tasks
  asynchronously.
• http://kr.github.com/beanstalkd/
• http://beanstalk.rubyforge.org/
• Open source; originally built for the Causes application on Facebook
Why do we need asynchronous/background processing in Rails?

• cron-like processing (computing daily statistics, creating reports, updating the full-text search index, etc.)
• long-running tasks (sending mail, resizing photos, encoding videos,
  generating PDFs, uploading images to S3, posting something to Twitter, etc.)
 • Server traffic jam: expensive requests block
     server resources (i.e. your Rails app)
 • Bad user experience: users may try to reload
     again and again! (responsiveness matters)
3. Background-processing for Rails
• script/runner
• rake
• cron
• daemon
• run_later plugin
• spawn plugin
script/runner


• In Your Rails App root:
• script/runner "Worker.process"
rake

• In RAILS_ROOT/lib/tasks/dev.rake
• rake dev:process
  namespace :dev do
    task :process do
          #...
    end
  end
cron

• Cron is a time-based job scheduler in Unix-
  like computer operating systems.
• crontab -e
Whenever
          http://github.com/javan/whenever

•   A Ruby DSL for Defining Cron Jobs

• http://asciicasts.com/episodes/164-cron-in-ruby
• or http://cronedit.rubyforge.org/
          every 3.hours do
            runner "MyModel.some_process"
            rake "my:rake:task"
            command "/usr/bin/my_great_command"
          end
Daemon

• http://daemons.rubyforge.org/
• http://github.com/dougal/daemon_generator/
rufus-scheduler
   http://github.com/jmettraux/rufus-scheduler


• schedules pieces of code (jobs)
• Not a replacement for cron/at, since it runs
  inside of Ruby.
           require 'rubygems'
           require 'rufus/scheduler'

           scheduler = Rufus::Scheduler.start_new

           scheduler.every '5s' do
               puts 'check blood pressure'
           end

           scheduler.join
Daemon Kit
   http://github.com/kennethkalmer/daemon-kit



• Simplifies creating Ruby daemons by providing a
  sound application skeleton (through a
  generator), task-specific generators (jabber
  bot, etc.) and robust environment
  management code.
Monitor your daemon

• http://mmonit.com/monit/
• http://github.com/arya/bluepill
• http://god.rubyforge.org/
daemon_controller
http://github.com/FooBarWidget/daemon_controller




• A library for robust daemon management
• Make daemon-dependent applications Just
  Work without having to start the daemons
  manually.
Off-load tasks via a system command
# mailings_controller.rb
def deliver
  call_rake :send_mailing, :mailing_id => params[:id].to_i
  flash[:notice] = "Delivering mailing"
  redirect_to mailings_url
end

# controllers/application.rb
def call_rake(task, options = {})
  options[:rails_env] ||= Rails.env
  args = options.map { |n, v| "#{n.to_s.upcase}='#{v}'" }
  # redirect stdout first, then stderr, so both end up in the log file
  system "/usr/bin/rake #{task} #{args.join(' ')} --trace >> #{Rails.root}/log/rake.log 2>&1 &"
end

# lib/tasks/mailer.rake
desc "Send mailing"
task :send_mailing => :environment do
  mailing = Mailing.find(ENV["MAILING_ID"])
  mailing.deliver
end

# models/mailing.rb
def deliver
  sleep 10 # placeholder for sending email
  update_attribute(:delivered_at, Time.now)
end
Simple Thread

after_filter do
    Thread.new do
        AccountMailer.deliver_signup(@user)
    end
end
run_later plugin
      http://github.com/mattmatt/run_later


• Borrowed from Merb
• Uses worker thread and a queue
• Simple solution for simple tasks
  run_later do
      AccountMailer.deliver_signup(@user)
  end
spawn plugin
http://github.com/tra/spawn


  spawn do
    logger.info("I feel sleepy...")
    sleep 11
    logger.info("Time to wake up!")
  end
spawn (cont.)
• By default, spawn uses fork to create
  child processes. You can configure it to use
  threads instead.
• Works by creating new database
  connections in ActiveRecord::Base for the
  spawned block.
• Forking has to copy the Rails process every time
threading vs. forking
•   Forking advantages:
    •   more reliable? - the ActiveRecord code is not thread-safe.
    •   keep running - a subprocess can live longer than its parent.
    •   easier - just works with Rails default settings. Threading
        requires you to set allow_concurrency=true. Also,
        beware of automatic reloading of classes in development
        mode (config.cache_classes = false).
•   Threading advantages:
    •   less filling - threads take fewer resources... how much less?
        it depends.
    •   debugging - you can set breakpoints in your threads
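The core difference, shared memory versus a private copy, shows up in a few lines of plain Ruby. A sketch outside Rails with made-up method names; Process.fork is not available on Windows or JRuby.

```ruby
# Runs the block in a thread: it shares memory with the caller.
def counter_after_thread
  counter = 0
  Thread.new { counter += 1 }.join
  counter
end

# Runs the block in a forked child: the child changes only its own copy.
def counter_after_fork
  counter = 0
  pid = Process.fork do
    counter += 1
    exit!(0) # leave the child without running inherited at_exit handlers
  end
  Process.wait(pid)
  counter
end

puts counter_after_thread # the thread's increment is visible here
puts counter_after_fork   # the child's increment is not
```

The isolation that makes fork safer for code that is not thread-safe is also why each forked child needs its own database connection.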
Okay, we need a
    reliable messaging system:
•   Persistent
•   Scheduling: not necessarily all at the same time
•   Scalability: just throw in more instances of your
    program to speed up processing
•   Loosely coupled components that merely ‘talk’
    to each other
•   Ability to easily replace Ruby with something
    else for specific tasks
•   Easy to debug and monitor
4. Message Queues (for Rails only)
• ar_mailer
• BackgrounDRb
• workling
• delayed_job
• resque
Rails only?

• Easy to use/write code
• Jobs are Ruby classes or objects
• But need to load Rails environment
ar_mailer
       http://seattlerb.rubyforge.org/ar_mailer/



• a two-phase delivery agent for ActionMailer.
 • Stores messages in the database
 • A separate process, ar_sendmail, delivers them
    later.
BackgrounDRb
            http://backgroundrb.rubyforge.org/

• BackgrounDRb is a Ruby job server and
  scheduler.
• Has scalability problems (~20 servers for Mark Bates)
• Hard to know whether processing failed
• Uses the database to persist tasks
• Uses memcached to report processing results
workling
     http://github.com/purzelrakete/workling




• Gives your Rails app a simple API that you
  can use to make code run in the
  background, outside of your request.
• Supports Starling (default), BackgroundJob,
  Spawn and AMQP/RabbitMQ runners.
Workling/Starling setup
• script/plugin install git://github.com/purzelrakete/workling.git
• sudo starling -p 15151
• RAILS_ENV=production script/workling_client start
Workling example
 class EmailWorker < Workling::Base
   def deliver(options)
     user = User.find(options[:id])
     user.deliver_activation_email
   end
 end


 # in your controller
 def create
     EmailWorker.asynch_deliver( :id => 1)
 end
delayed_job
• Database-backed asynchronous priority
  queue
• Extracted from Shopify
• you can place any Ruby object on its queue
  as arguments
• Loads the Rails environment only once
delayed_job setup (use fork version)

• script/plugin install git://github.com/collectiveidea/delayed_job.git
• script/generate delayed_job
• rake db:migrate
delayed_job example: send_later
def deliver
  mailing = Mailing.find(params[:id])
  mailing.send_later(:deliver)
  flash[:notice] = "Mailing is being delivered."
  redirect_to mailings_url
end
delayed_job example: custom workers
class MailingJob < Struct.new(:mailing_id)

  def perform
    mailing = Mailing.find(mailing_id)
    mailing.deliver
  end

end

# in your controller
def deliver
  Delayed::Job.enqueue(MailingJob.new(params[:id]))
  flash[:notice] = "Mailing is being delivered."
  redirect_to mailings_url
end
delayed_job example: always asynchronously


   class Device
     def deliver
       # long running method
     end
     handle_asynchronously :deliver
   end

   device = Device.new
   device.deliver
Running jobs

• rake jobs:work
  (Don’t use in production; it will exit if the database has any network connectivity
  problems.)


• RAILS_ENV=production script/delayed_job start
• RAILS_ENV=production script/delayed_job stop
Priority
                  just an Integer, default is 0

• you can run multiple workers to handle different
  priority jobs
• RAILS_ENV=production script/delayed_job --min-priority 3 start

  Delayed::Job.enqueue(MailingJob.new(params[:id]), 3)

  Delayed::Job.enqueue(MailingJob.new(params[:id]), -3)
Scheduled
        no guarantee of a precise time; the job just runs some time after run_at



Delayed::Job.enqueue(MailingJob.new(params[:id]), 3, 3.days.from_now)

Delayed::Job.enqueue(MailingJob.new(params[:id]),
                                    3, 1.month.from_now.beginning_of_month)
Configuring Delayed Job
# config/initializers/delayed_job_config.rb
Delayed::Worker.destroy_failed_jobs = false
Delayed::Worker.sleep_delay = 5 # sleep if empty queue
Delayed::Worker.max_attempts = 25
Delayed::Worker.max_run_time = 4.hours # set to the longest time a task will take
Automatic retry on failure
 • If a method throws an exception, it is
   caught and the method is rerun later.
 • The method is retried up to 25
   (default) times at increasingly longer
   intervals until it passes.
   • the longest interval is about 108 hours (25 ** 4 + 5 seconds)
     Job.db_time_now + (job.attempts ** 4) + 5
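The back-off formula is easy to tabulate. A short sketch that only evaluates the expression from the slide; retry_delay and MAX_ATTEMPTS are names made up here, not delayed_job API:

```ruby
# Seconds to wait before the next retry, following the slide's formula.
def retry_delay(attempts)
  (attempts**4) + 5
end

MAX_ATTEMPTS = 25 # the default quoted on the slide

(1..MAX_ATTEMPTS).each do |n|
  hours = retry_delay(n) / 3600.0
  puts format('attempt %2d: wait %6d s (%.1f h)', n, retry_delay(n), hours)
end
```

The first retries come within seconds or minutes, while the later ones are spaced days apart, so a job that keeps failing stops consuming worker time.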
Capistrano Recipes
• Remember to restart delayed_job after
  deployment
• Check out lib/delayed_job/recipes.rb
   after "deploy:stop",    "delayed_job:stop"
   after "deploy:start",   "delayed_job:start"
   after "deploy:restart", "delayed_job:restart"
Resque
             http://github.com/defunkt/resque

•   a Redis-backed library for creating background jobs,
    placing those jobs on multiple queues, and processing
    them later.
•   GitHub’s open source project
•   you can only place JSON-serializable Ruby objects on a queue
•   includes a Sinatra app for monitoring what's going on
•   supports multiple queues
•   a good fit when you expect a lot of failure/chaos
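A Resque job is a class with a queue name and a class-level perform method. The sketch below runs without the resque gem or Redis: MailingJob and its arguments are hypothetical, and through_json is a made-up helper that imitates what a JSON-backed queue does to job arguments.

```ruby
require 'json'

# A Resque-style job: a queue name plus a class-level perform method.
class MailingJob
  @queue = :mailings

  def self.perform(mailing_id, options)
    "delivering mailing #{mailing_id} to #{options['list']}"
  end
end

# Imitates the queue: arguments are serialised to JSON and parsed again,
# so symbol keys come back as strings and arbitrary objects cannot be sent.
def through_json(*args)
  JSON.parse(JSON.generate(args))
end

args = through_json(42, { list: 'newsletter' })
puts MailingJob.perform(*args)

# With the resque gem and a Redis server the job would be enqueued as:
# Resque.enqueue(MailingJob, 42, 'list' => 'newsletter')
```

This is why a job usually receives a record id rather than the record itself: the worker looks the record up again when perform runs.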
My recommendations:

• General purpose: delayed_job
  (GitHub highly recommends DelayedJob to anyone whose site is not 50% background work.)



• Time-scheduled: cron + rake
5. SOA for Rails

• What’s SOA
• Why SOA
• Considerations
• The tool set
What’s SOA
           Service oriented architectures



• “monolithic” approach is not enough
• SOA is a way to design complex applications
  by splitting out major components into
  individual services and communicating via
  APIs.
• a service is a vertical slice of functionality:
  database, application code and caching layer
a monolithic web app example
(diagram) request -> Load Balancer -> WebApps -> Database
a SOA example
(diagram)
• request -> WebApp for Administration
• request -> Load Balancer -> WebApps for User
• both web apps call Services A and Services B
• Services A and Services B each have their own Database
Why SOA? Isolation
• Shared Resources
• Encapsulation
• Scalability
• Interoperability
• Reuse
• Testability
• Reduce Local Complexity
Shared Resources
• Different front-end websites use the same
  resources.
• SOA helps you avoid duplicating databases
  and code.
• Why not just share the database (WebApp for Administration and WebApps for User on one Database)?
 • code is not DRY
 • caching will be problematic
Encapsulation

• you can change the underlying implementation of a
  service without affecting other parts of the system
 • upgrade library
 • upgrade to Ruby 1.9
• you can provide API versioning
Scalability 1: Partitioned Data Provides
•   The database is the first bottleneck; a single DB
    server cannot scale. SOA helps you reduce
    database load.
•   Anti-pattern: only splitting the database (WebApps talking directly to Database A and Database B)
    •   model relationships are broken
    •   referential integrity is lost
•   Myth: database replication. It cannot help you with both
    speed and consistency.
Scalability 2: Caching

• SOA makes it easier to design a caching system
 • Cache data at the right times and expire
    it at the right times
 • Cache the logical model, not the physical one
 • You do not need to cache views everywhere
Scalability 3: Efficient
• Different components carry different
  loads; SOA lets you scale service by service.
• (diagram) WebApps call two load balancers: one in front of two instances of Services A, the other in front of four instances of Services B
Security

• Different services can sit behind different
  firewalls
  • You can expose only the public web apps and
    services; the others stay inside the firewall.
Interoperability
• HTTP is the common interface; SOA helps
  you integrate:
 • Multiple languages
 • Internal system e.g. Full-text searching engine
 • Legacy database, system
 • External vendors
Reuse

• Reuse across multiple applications
• Reuse for public APIs
• Example: Amazon Web Services (AWS)
Testability

• Isolate problem
• Mocking API calls
 • Reduce the time to run test suite
Reduce Local Complexity
• Team modularity along the same module
  splits as your software
• Understandability: The amount of code is
  minimized to a quantity understandable by
  a small team
• Source code control
Considerations

• Partition into Separate Services
• API Design
• Which Protocol
How to partition into Separate Services
• Partitioning on Logical Function
• Partitioning on Read/Write Frequencies
• Partitioning by Minimizing Joins
• Partitioning by Iteration Speed
API Design

• Send Everything you need
• Parallel HTTP requests
• Send as Little as Possible
• Use Logical Models
Physical Models & Logical Models
• Physical models are mapped to database
  tables through ORM. (It’s 3NF)
• Logical models are mapped to your
  business problem. (External API use it)
• Logical models are mapped to physical
  models by you.
Logical Models
• Not relational or normalized
• Maintainability
  • can change with no change to data store
  • can stay the same while the data store
    changes
• Better fit for REST interfaces
• Better caching
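The mapping from physical to logical models can be sketched without a database. Everything here is illustrative: the three Struct "tables" and customer_summary are made-up names standing in for ORM models and a service response.

```ruby
require 'json'

# Physical models: one struct per normalised table.
UserRow    = Struct.new(:id, :username)
AddressRow = Struct.new(:user_id, :city)
OrderRow   = Struct.new(:user_id, :total_cents)

# Logical model: the denormalised shape a service client actually needs.
def customer_summary(user, addresses, orders)
  mine = orders.select { |order| order.user_id == user.id }
  {
    'username'    => user.username,
    'cities'      => addresses.select { |a| a.user_id == user.id }.map(&:city),
    'order_count' => mine.size,
    'total_spent' => mine.sum(&:total_cents) / 100.0
  }
end

user = UserRow.new(1, 'ihower')
addresses = [AddressRow.new(1, 'Taipei')]
orders = [OrderRow.new(1, 1250), OrderRow.new(1, 750), OrderRow.new(2, 999)]

puts JSON.generate(customer_summary(user, addresses, orders))
```

Clients depend only on the hash returned by customer_summary, so the tables behind it can be split or renamed without changing the API, and the hash is a natural unit to cache.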
Which Protocol?

• SOAP
• XML-RPC
• REST
RESTful Web services

• Rails way
• REST is about resources
 • URL
 • Verbs: GET/PUT/POST/DELETE
The tool set

• Web framework
• XML Parser
• JSON Parser
• HTTP Client
Web framework

• We do not need much of the controller and view layers
• Rails is a bit more than we need; how about Sinatra?
• Rails metal
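Sinatra and Rails metal both sit on Rack, where an application is any object that responds to call. A sketch of how small a JSON service can be at that level; USERS and USER_SERVICE are made-up names, and no Rack server is started here.

```ruby
require 'json'

# Example data; a real service would read from its own database.
USERS = { '1' => { 'username' => 'ihower' } }.freeze

# A Rack-style application: takes the request environment hash and returns
# [status, headers, body]. No controllers, views or framework are loaded.
USER_SERVICE = lambda do |env|
  match = env['PATH_INFO'].match(%r{\A/users/(\d+)\z})
  user = match && USERS[match[1]]

  if env['REQUEST_METHOD'] == 'GET' && user
    [200, { 'Content-Type' => 'application/json' }, [JSON.generate(user)]]
  else
    [404, { 'Content-Type' => 'application/json' }, ['{"error":"not found"}']]
  end
end

request = { 'REQUEST_METHOD' => 'GET', 'PATH_INFO' => '/users/1' }
status, _headers, body = USER_SERVICE.call(request)
puts status
puts body.join
```

Calling the lambda directly, as above, is also a convenient way to test a service without booting a web server.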
ActiveResource

• Mapping RESTful resources as models in a
  Rails application.
• But not useful in practice, why?
XML parser

• http://nokogiri.org/
• Nokogiri (鋸) is an HTML, XML, SAX, and
  Reader parser. Among Nokogiri’s many
  features is the ability to search documents
  via XPath or CSS3 selectors.
JSON Parser

• http://github.com/brianmario/yajl-ruby/
• An extremely efficient streaming JSON
  parsing and encoding library. Ruby C
  bindings to Yajl
HTTP Client


• http://github.com/pauldix/typhoeus/
• Typhoeus runs HTTP requests in parallel
  while cleanly encapsulating handling logic
Tips

• Define your logical model (i.e. your service
  request result) first.

• model.to_json and model.to_xml are easy to
  use, but not useful in practice.
6. Distributed File System
 •   NFS does not scale
     •   we can use rsync to duplicate
 •   MogileFS
     •   http://www.danga.com/mogilefs/
     •   http://seattlerb.rubyforge.org/mogilefs-client/
 •   Amazon S3
 •   HDFS (Hadoop Distributed File System)
 •   GlusterFS
7. Distributed Database

• NoSQL
• CAP theorem
 • Eventually consistent
• HBase/Cassandra/Voldemort
The End
References
•   Books&Articles:
    •    Distributed Programming with Ruby, Mark Bates (Addison Wesley)
    •    Enterprise Rails, Dan Chak (O’Reilly)
    •    Service-Oriented Design with Ruby and Rails, Paul Dix (Addison Wesley)
    •    RESTful Web Services, Richardson&Ruby (O’Reilly)
    •    RESTful Web Services Cookbook, Allamaraju&Amundsen (O’Reilly)
    •    Enterprise Recipes with Ruby on Rails, Maik Schmidt (The Pragmatic Programmers)
    •    Ruby in Practice, McAnally&Arkin (Manning)

    •    Building Scalable Web Sites, Cal Henderson (O’Reilly)
    •    Background Processing in Rails, Erik Andrejko (Rails Magazine)
    •    Background Processing with Delayed_Job, James Harrison (Rails Magazine)
    •                 Web   点          (                 )
•   Slides:
    •    Background Processing (Rob Mack) Austin on Rails - April 2009
    •    The Current State of Asynchronous Processing in Ruby (Mathias Meyer, Peritor GmbH)
    •    Asynchronous Processing (Jonathan Dahl)
    •    Long-Running Tasks In Rails Without Much Effort (Andy Stewart) - April 2008
    •    Starling + Workling: simple distributed background jobs with Twitter’s queuing system, Rany Keddo 2008
    •    Physical Models & Logical Models in Rails, dan chak
References
•   Links:
    •   http://segment7.net/projects/ruby/drb/
    •   http://www.slideshare.net/luccastera/concurrent-programming-with-ruby-and-tuple-spaces
    •   http://github.com/blog/542-introducing-resque
    •   http://www.engineyard.com/blog/2009/5-tips-for-deploying-background-jobs/
    •   http://www.opensourcery.co.za/2008/07/07/messaging-and-ruby-part-1-the-big-picture/
    •   http://leemoonsoo.blogspot.com/2009/04/simple-comparison-open-source.html
    •   http://blog.gslin.org/archives/2009/07/25/2065/
    •   http://www.javaeye.com/topic/524977
    •   http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
Todo (maybe next time)
•   AMQP/RabbitMQ example code
    •   How about Nanite?
•   XMPP
•   MagLev VM
•   More MapReduce example code
    •   How about Amazon Elastic MapReduce?
•   Resque example code
•   More SOA example and code
•   MogileFS example code

PDF
D Rb Silicon Valley Ruby Conference
PDF
Get your teeth into Plack
KEY
Rails web api 开发
KEY
TorqueBox - Ruby Hoedown 2011
KEY
A language for the Internet: Why JavaScript and Node.js is right for Internet...
PPTX
Ruby on Rails All Hands Meeting
PDF
Aws Lambda in Swift - NSLondon - 3rd December 2020
KEY
Writing robust Node.js applications
PDF
locize tech talk
PDF
Docker tlv
DRb at the Ruby Drink-up of Sophia, December 2011
Everything ruby
Dragoncraft Architectural Overview
Apache thrift-RPC service cross languages
Cloud Foundry Open Tour China (english)
Cloud Foundry Open Tour China
How to create multiprocess server on windows with ruby - rubykaigi2016 Ritta ...
Middleware as Code with mruby
FreeSWITCH as a Microservice
DRb and Rinda
D Rb Silicon Valley Ruby Conference
Get your teeth into Plack
Rails web api 开发
TorqueBox - Ruby Hoedown 2011
A language for the Internet: Why JavaScript and Node.js is right for Internet...
Ruby on Rails All Hands Meeting
Aws Lambda in Swift - NSLondon - 3rd December 2020
Writing robust Node.js applications
locize tech talk
Docker tlv

More from Wen-Tien Chang (20)

PDF
評估驅動開發 Eval-Driven Development (EDD): 生成式 AI 軟體不確定性的解決方法
PDF
⼤語⾔模型 LLM 應⽤開發入⾨
PDF
Ruby Rails 老司機帶飛
PDF
A brief introduction to Machine Learning
PDF
淺談 Startup 公司的軟體開發流程 v2
PDF
RSpec on Rails Tutorial
PDF
RSpec & TDD Tutorial
PDF
ALPHAhackathon: How to collaborate
PDF
Git 版本控制系統 -- 從微觀到宏觀
PDF
Exception Handling: Designing Robust Software in Ruby (with presentation note)
PDF
Exception Handling: Designing Robust Software in Ruby
PDF
從 Classes 到 Objects: 那些 OOP 教我的事
PDF
Yet another introduction to Git - from the bottom up
PDF
A brief introduction to Vagrant – 原來 VirtualBox 可以這樣玩
PDF
Ruby 程式語言綜覽簡介
PDF
A brief introduction to SPDY - 邁向 HTTP/2.0
PDF
RubyConf Taiwan 2012 Opening & Closing
PDF
從 Scrum 到 Kanban: 為什麼 Scrum 不適合 Lean Startup
PDF
Git Tutorial 教學
PDF
那些 Functional Programming 教我的事
評估驅動開發 Eval-Driven Development (EDD): 生成式 AI 軟體不確定性的解決方法
⼤語⾔模型 LLM 應⽤開發入⾨
Ruby Rails 老司機帶飛
A brief introduction to Machine Learning
淺談 Startup 公司的軟體開發流程 v2
RSpec on Rails Tutorial
RSpec & TDD Tutorial
ALPHAhackathon: How to collaborate
Git 版本控制系統 -- 從微觀到宏觀
Exception Handling: Designing Robust Software in Ruby (with presentation note)
Exception Handling: Designing Robust Software in Ruby
從 Classes 到 Objects: 那些 OOP 教我的事
Yet another introduction to Git - from the bottom up
A brief introduction to Vagrant – 原來 VirtualBox 可以這樣玩
Ruby 程式語言綜覽簡介
A brief introduction to SPDY - 邁向 HTTP/2.0
RubyConf Taiwan 2012 Opening & Closing
從 Scrum 到 Kanban: 為什麼 Scrum 不適合 Lean Startup
Git Tutorial 教學
那些 Functional Programming 教我的事

Recently uploaded (20)

PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PDF
Modernizing your data center with Dell and AMD
PPTX
Big Data Technologies - Introduction.pptx
PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PDF
Building Integrated photovoltaic BIPV_UPV.pdf
PPTX
Understanding_Digital_Forensics_Presentation.pptx
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PDF
Approach and Philosophy of On baking technology
PDF
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PDF
CIFDAQ's Market Insight: SEC Turns Pro Crypto
PDF
Encapsulation theory and applications.pdf
PDF
Shreyas Phanse Resume: Experienced Backend Engineer | Java • Spring Boot • Ka...
PPTX
MYSQL Presentation for SQL database connectivity
PDF
Review of recent advances in non-invasive hemoglobin estimation
DOCX
The AUB Centre for AI in Media Proposal.docx
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
Diabetes mellitus diagnosis method based random forest with bat algorithm
Reach Out and Touch Someone: Haptics and Empathic Computing
Modernizing your data center with Dell and AMD
Big Data Technologies - Introduction.pptx
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
The Rise and Fall of 3GPP – Time for a Sabbatical?
Building Integrated photovoltaic BIPV_UPV.pdf
Understanding_Digital_Forensics_Presentation.pptx
20250228 LYD VKU AI Blended-Learning.pptx
Approach and Philosophy of On baking technology
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
Advanced methodologies resolving dimensionality complications for autism neur...
CIFDAQ's Market Insight: SEC Turns Pro Crypto
Encapsulation theory and applications.pdf
Shreyas Phanse Resume: Experienced Backend Engineer | Java • Spring Boot • Ka...
MYSQL Presentation for SQL database connectivity
Review of recent advances in non-invasive hemoglobin estimation
The AUB Centre for AI in Media Proposal.docx
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...

Distributed Ruby and Rails

  • 1. Distributed Ruby and Rails @ihower http://ihower.tw 2010/1
  • 2. About Me • a.k.a. ihower • http://ihower.tw • http://twitter.com/ihower • http://github.com/ihower • Ruby on Rails Developer since 2006 • Ruby Taiwan Community • http://ruby.tw
  • 3. Agenda • Distributed Ruby • Distributed Message Queues • Background-processing in Rails • Message Queues for Rails • SOA for Rails • Distributed Filesystem • Distributed database
  • 4. 1.Distributed Ruby • DRb • Rinda • Starfish • MapReduce • MagLev VM
  • 5. DRb • Ruby's RMI system (remote method invocation) • an object in one Ruby process can invoke methods on an object in another Ruby process on the same or a different machine
  • 6. DRb (cont.) • no defined interface, so development is faster • applications become tightly coupled, because there is no defined API, only methods on objects • unreliable in large-scale production environments under heavy load
  • 7. server example 1 require 'drb' class HelloWorldServer def say_hello 'Hello, world!' end end DRb.start_service("druby://127.0.0.1:61676", HelloWorldServer.new) DRb.thread.join
  • 8. client example 1 require 'drb' server = DRbObject.new_with_uri("druby://127.0.0.1:61676") puts server.say_hello puts server.inspect # Hello, world! # <DRb::DRbObject:0x1003c04c8 @ref=nil, @uri="druby://127.0.0.1:61676">
  • 9. example 2 # user.rb class User attr_accessor :username end
  • 10. server example 2 require 'drb' require 'user' class UserServer attr_accessor :users def find(id) self.users[id-1] end end user_server = UserServer.new user_server.users = [] 5.times do |i| user = User.new user.username = i + 1 user_server.users << user end DRb.start_service("druby://127.0.0.1:61676", user_server) DRb.thread.join
  • 11. client example 2 require 'drb' user_server = DRbObject.new_with_uri("druby://127.0.0.1:61676") user = user_server.find(2) puts user.inspect puts "Username: #{user.username}" user.username = "ihower" puts "Username: #{user.username}"
  • 12. Err... # <DRb::DRbUnknown:0x1003b8318 @name="User", @buf="004bo: tUser006:016@usernameia"> # client2.rb:8: undefined method `username' for #<DRb::DRbUnknown:0x1003b8318> (NoMethodError)
  • 13. Why? DRbUndumped • Default DRb operation • Pass by value • Must share code • With DRbUndumped • Pass by reference • No need to share code
  • 14. Example 2 Fixed # user.rb class User include DRbUndumped attr_accessor :username end # <DRb::DRbObject:0x1003b84f8 @ref=2149433940, @uri="druby://127.0.0.1:61676"> # Username: 2 # Username: ihower
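The pass-by-value versus pass-by-reference distinction can be seen without a network. DRb serializes arguments and return values with Marshal; including DRbUndumped makes that serialization fail, and DRb then sends a remote reference instead of a copy. The following is a minimal sketch using Marshal directly; the class names `ValueUser` and `ReferenceUser` and the helper `dumpable?` are made up for illustration.

```ruby
require 'drb'

# A plain class: instances can be marshaled, so DRb would send a copy.
class ValueUser
  attr_accessor :username
end

# Including DRbUndumped makes Marshal.dump raise TypeError, which is
# DRb's cue to hand out a remote reference (a DRbObject) instead.
class ReferenceUser
  include DRb::DRbUndumped
  attr_accessor :username
end

# Hypothetical helper: true when Marshal can serialize the object.
def dumpable?(object)
  Marshal.dump(object)
  true
rescue TypeError
  false
end

original = ValueUser.new
original.username = "ihower"

# Round-tripping through Marshal yields an independent copy (pass by value).
copy = Marshal.load(Marshal.dump(original))
copy.username = "changed"

puts original.username            # the original is not affected by the copy
puts dumpable?(ValueUser.new)     # can be copied across the wire
puts dumpable?(ReferenceUser.new) # cannot be copied, so DRb passes a reference
```

This also explains the "Must share code" bullet: a marshaled copy can only be rebuilt on the client if the client has the class definition loaded.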
  • 15. Why use DRbUndumped? • Big objects • Singleton objects • Lightweight clients • Rapidly changing software
  • 16. ID conversion • Converts reference into DRb object on server • DRbIdConv (Default) • TimerIdConv • NamedIdConv • GWIdConv
  • 17. Beware of garbage collection • referenced objects may be collected on server (usually doesn't matter) • Building Your own ID Converter if you want to control persistent state.
  • 18. DRb security require 'drb' ro = DRbObject.new_with_uri("druby://127.0.0.1:61676") class << ro undef :instance_eval end # !!!!!!!! WARNING !!!!!!!!! DO NOT RUN ro.instance_eval("`rm -rf *`")
  • 19. $SAFE=1 instance_eval': Insecure operation - instance_eval (SecurityError)
  • 20. DRb security (cont.) • Access Control Lists (ACLs) • via IP address array • still can run denial-of-service attack • DRb over SSL
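An access control list for DRb can be sketched with the `ACL` class from `drb/acl`. The addresses below are illustrative assumptions, and `allowed?` is a hypothetical helper that feeds the ACL a hand-built address tuple so the rules can be checked without opening a socket.

```ruby
require 'drb'
require 'drb/acl'

# Deny everyone, then allow loopback and one (assumed) private subnet.
ACL_RULES = %w[
  deny all
  allow 127.0.0.1
  allow 192.168.1.0/24
].freeze

def build_acl
  ACL.new(ACL_RULES, ACL::DENY_ALLOW)
end

# Hypothetical helper: ACL#allow_addr? expects an address array in the
# same shape as Socket#peeraddr: [family, port, hostname, ip].
def allowed?(acl, ip)
  acl.allow_addr?(["AF_INET", 0, ip, ip])
end

acl = build_acl
DRb.install_acl(acl) # becomes the default ACL for services started afterwards

%w[127.0.0.1 192.168.1.12 10.0.0.5].each do |ip|
  puts "#{ip}: #{allowed?(acl, ip) ? 'allowed' : 'denied'}"
end
```

As the slide notes, an ACL only filters by address; an allowed client can still flood the server with requests.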
  • 21. Rinda • Rinda is a Ruby port of the Linda distributed computing paradigm. • Linda is a model of coordination and communication among several parallel processes operating upon objects stored in and retrieved from shared, virtual, associative memory. This model is implemented as a "coordination language" in which several primitives operating on ordered sequences of typed data objects, "tuples," are added to a sequential language, such as C, and a logically global associative memory, called a tuplespace, in which processes store and retrieve tuples. (Wikipedia)
  • 22. Rinda (cont.) • Rinda consists of: • a TupleSpace implementation • a RingServer that allows DRb services to automatically discover each other.
  • 23. RingServer • Hardcoding IP addresses in a DRb program couples applications tightly and makes fault tolerance difficult. • With RingServer, services can detect and interact with other services on the network without knowing their IP addresses.
  • 24. RingServer discovery (diagram) 1. Client @ 192.168.1.100 asks the RingServer via the UDP broadcast address: "Where is Service X?" 2. RingServer answers: "Service X: 192.168.1.12" 3. Client to Service X @ 192.168.1.12: "Hi, Service X" 4. Service X to client @ 192.168.1.100: "Hi there"
  • 25. ring server example require 'rinda/ring' require 'rinda/tuplespace' DRb.start_service Rinda::RingServer.new(Rinda::TupleSpace.new) DRb.thread.join
  • 26. service example require 'rinda/ring' class HelloWorldServer include DRbUndumped # Need for RingServer def say_hello 'Hello, world!' end end DRb.start_service ring_server = Rinda::RingFinger.primary ring_server.write([:hello_world_service, :HelloWorldServer, HelloWorldServer.new, 'I like to say hi!'], Rinda::SimpleRenewer.new) DRb.thread.join
  • 27. client example require 'rinda/ring' DRb.start_service ring_server = Rinda::RingFinger.primary service = ring_server.read([:hello_world_service, nil,nil,nil]) server = service[2] puts server.say_hello puts service.inspect # Hello, world! # [:hello_world_service, :HelloWorldServer, #<DRb::DRbObject:0x10039b650 @uri="druby://fe80::21b:63ff:fec9:335f%en1:57416", @ref=2149388540>, "I like to say hi!"]
  • 28. TupleSpaces • Shared object space • Atomic access • Just like bulletin board • Tuple template is [:name, :Class, object, ‘description’ ]
  • 29. 5 Basic Operations • write • read • take (Atomic Read+Delete) • read_all • notify (Callback for write/take/delete)
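The basic operations can be tried against a local `Rinda::TupleSpace` without a RingServer. This is a small sketch; the `:task` tuples and the method name `tuplespace_demo` are invented for the example.

```ruby
require 'rinda/tuplespace'

def tuplespace_demo
  space = Rinda::TupleSpace.new

  # write: post two tuples on the "bulletin board"
  space.write([:task, 1, "resize photo"])
  space.write([:task, 2, "send mail"])

  # read: fetch a matching tuple without removing it (nil is a wildcard)
  peeked = space.read([:task, 1, nil])

  # take: atomically fetch and remove a matching tuple
  taken = space.take([:task, 1, nil])

  # read_all: list every tuple that still matches the template
  remaining = space.read_all([:task, nil, nil])

  { peeked: peeked, taken: taken, remaining: remaining }
end

p tuplespace_demo
```

Because `take` removes the tuple atomically, two workers taking from the same space never receive the same task.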
  • 30. Starfish • Starfish is a utility to make distributed programming ridiculously easy • It runs both the server and the client in infinite loops • MapReduce with ActiveRecord or Files
  • 31. starfish foo.rb # foo.rb class Foo attr_reader :i def initialize @i = 0 end def inc logger.info "YAY it incremented by 1 up to #{@i}" @i += 1 end end server :log => "foo.log" do |object| object = Foo.new end client do |object| object.inc end
  • 32. starfish server example ARGV.unshift('server.rb') require 'rubygems' require 'starfish' class HelloWorld def say_hi 'Hi There' end end Starfish.server = lambda do |object| object = HelloWorld.new end Starfish.new('hello_world').server
  • 33. starfish client example ARGV.unshift('client.rb') require 'rubygems' require 'starfish' Starfish.client = lambda do |object| puts object.say_hi exit(0) # exit program immediately end Starfish.new('hello_world').client
  • 34. starfish client example (another way) ARGV.unshift('server.rb') require 'rubygems' require 'starfish' catch(:halt) do Starfish.client = lambda do |object| puts object.say_hi throw :halt end Starfish.new ('hello_world').client end puts "bye bye"
  • 35. MapReduce • introduced by Google to support distributed computing on large data sets on clusters of computers. • inspired by map and reduce functions commonly used in functional programming.
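The map and reduce steps can be illustrated in plain Ruby on a single machine. This is only a sketch of the programming model, not of Starfish or any distributed runtime; `map_phase`, `reduce_phase` and the sample lines are made up.

```ruby
# Map: turn each input line into (word, 1) pairs.
def map_phase(lines)
  lines.flat_map do |line|
    line.downcase.scan(/\w+/).map { |word| [word, 1] }
  end
end

# Reduce: group the pairs by word and sum the counts for each word.
def reduce_phase(pairs)
  pairs.group_by(&:first).transform_values do |group|
    group.sum { |_word, count| count }
  end
end

lines = ["Ruby and Rails", "Distributed Ruby"]
p reduce_phase(map_phase(lines))
```

In a distributed setting the map calls run on different machines over different slices of the input, and the grouped pairs are shipped to reducers; the two functions themselves stay the same.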
  • 36. starfish server example ARGV.unshift('server.rb') require 'rubygems' require 'starfish' Starfish.server = lambda{ |map_reduce| map_reduce.type = File map_reduce.input = "/var/log/apache2/access.log" map_reduce.queue_size = 10 map_reduce.lines_per_client = 5 map_reduce.rescan_when_complete = false } Starfish.new('log_server').server
  • 37. starfish client example ARGV.unshift('client.rb') require 'rubygems' require 'starfish' Starfish.client = lambda { |logs| logs.each do |log| puts "Processing #{log}" sleep(1) end } Starfish.new("log_server").client
  • 38. Other implementations • Skynet • Use TupleSpace or MySQL as message queue • Include an extension for ActiveRecord • http://skynet.rubyforge.org/ • MRToolkit based on Hadoop • http://code.google.com/p/mrtoolkit/
  • 39. MagLev VM • a fast, stable, Ruby implementation with integrated object persistence and distributed shared cache. • http://maglev.gemstone.com/ • public Alpha currently
  • 40. 2.Distributed Message Queues • Starling • AMQP/RabbitMQ • Stomp/ActiveMQ • beanstalkd
  • 41. what’s message queue? Message X Client Queue Check and processing Processor
  • 42. Why not DRb? • DRb has security risks and poorly designed APIs • a distributed message queue is a great way to do distributed programming: reliable and scalable.
  • 43. Starling • a light-weight persistent queue server that speaks the Memcache protocol (mimics its API) • Fast, effective, quick setup and ease of use • Powered by EventMachine http://eventmachine.rubyforge.org/EventMachine.html • Twitter’s open source project; they used it before 2009 (they have since switched to Kestrel, a port of Starling from Ruby to Scala)
  • 44. Starling command • sudo gem install starling-starling • http://github.com/starling/starling • sudo starling -h 192.168.1.100 • sudo starling_top -h 192.168.1.100
  • 45. Starling set example require 'rubygems' require 'starling' starling = Starling.new('192.168.1.4:22122') 100.times do |i| starling.set('my_queue', i) end append to the queue, not overwrite in Memcached
  • 46. Starling get example require 'rubygems' require 'starling' starling = Starling.new('192.168.2.4:22122') loop do puts starling.get("my_queue") end
  • 47. get method • FIFO • After get, the object is no longer in the queue. You will lose the message if a processing error happens. • The get method blocks until something is returned. It’s an infinite loop.
  • 48. Handle processing error exception require 'rubygems' require 'starling' starling = Starling.new('192.168.2.4:22122') results = starling.get("my_queue") begin puts results.flatten rescue NoMethodError => e puts e.message starling.set("my_queue", [results]) rescue Exception => e starling.set("my_queue", results) raise e end
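The "put it back on failure" idea can be sketched with Ruby's built-in `Queue` as an in-process stand-in for a queue server. Nothing here talks to Starling; `drain`, `requeue_demo`, the message hash layout and the `max_attempts` limit are assumptions made for the example.

```ruby
# Pops messages in FIFO order; a message whose handler raises is put
# back at the end of the queue until max_attempts is reached.
def drain(queue, max_attempts: 3)
  processed = []
  until queue.empty?
    message = queue.pop
    begin
      yield message[:body]
      processed << message[:body]
    rescue StandardError
      message[:attempts] += 1
      queue << message if message[:attempts] < max_attempts
    end
  end
  processed
end

# Message 2 fails on its first attempt and succeeds on the retry.
def requeue_demo
  queue = Queue.new
  [1, 2, 3].each { |n| queue << { body: n, attempts: 0 } }

  failed_once = false
  drain(queue) do |body|
    if body == 2 && !failed_once
      failed_once = true
      raise "temporary failure"
    end
  end
end

p requeue_demo # => [1, 3, 2]
```

Without the re-enqueue in the `rescue` branch, message 2 would simply be lost, which is the risk the previous slide describes.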
  • 49. Starling cons • Poll queue constantly • RabbitMQ can subscribe to a queue that notify you when a message is available for processing.
  • 50. AMQP/RabbitMQ • a complete and highly reliable enterprise messaging system based on the emerging AMQP standard. • Erlang • http://github.com/tmm1/amqp • Powered by EventMachine
  • 51. Stomp/ActiveMQ • Apache ActiveMQ is the most popular and powerful open source messaging and Integration Patterns provider. • sudo gem install stomp • ActiveMessaging plugin for Rails
  • 52. beanstalkd • Beanstalk is a simple, fast workqueue service. Its interface is generic, but was originally designed for reducing the latency of page views in high-volume web applications by running time-consuming tasks asynchronously. • http://kr.github.com/beanstalkd/ • http://beanstalk.rubyforge.org/ • Facebook’s open source project
  • 53. Why do we need asynchronous/background processing in Rails? • cron-like processing (compute daily statistics data, create reports, full-text search index update, etc.) • long-running tasks (sending mail, resizing photos, encoding videos, generating PDFs, uploading images to S3, posting something to Twitter, etc.) • Server traffic jam: expensive requests block server resources (i.e. your Rails app) • Bad user experience: users may try to reload again and again! (responsiveness matters)
  • 54. 3.Background- processing for Rails • script/runner • rake • cron • daemon • run_later plugin • spawn plugin
  • 55. script/runner • In Your Rails App root: • script/runner “Worker.process”
  • 56. rake • In RAILS_ROOT/lib/tasks/dev.rake • rake dev:process namespace :dev do task :process do #... end end
  • 57. cron • Cron is a time-based job scheduler in Unix- like computer operating systems. • crontab -e
  • 58. Whenever http://github.com/javan/whenever • A Ruby DSL for Defining Cron Jobs • http://asciicasts.com/episodes/164-cron-in-ruby • or http://cronedit.rubyforge.org/ every 3.hours do runner "MyModel.some_process" rake "my:rake:task" command "/usr/bin/my_great_command" end
  • 60. rufus-scheduler http://github.com/jmettraux/rufus-scheduler • scheduling pieces of code (jobs) • Not replacement for cron/at since it runs inside of Ruby. require 'rubygems' require 'rufus/scheduler' scheduler = Rufus::Scheduler.start_new scheduler.every '5s' do puts 'check blood pressure' end scheduler.join
  • 61. Daemon Kit http://github.com/kennethkalmer/daemon-kit • Creating Ruby daemons by providing a sound application skeleton (through a generator), task specific generators (jabber bot, etc) and robust environment management code.
  • 62. Monitor your daemon • http://mmonit.com/monit/ • http://github.com/arya/bluepill • http://god.rubyforge.org/
  • 63. daemon_controller http://github.com/FooBarWidget/daemon_controller • A library for robust daemon management • Make daemon-dependent applications Just Work without having to start the daemons manually.
  • 64. off-load task via system command # mailings_controller.rb def deliver call_rake :send_mailing, :mailing_id => params[:id].to_i flash[:notice] = "Delivering mailing" redirect_to mailings_url end # controllers/application.rb def call_rake(task, options = {}) options[:rails_env] ||= Rails.env args = options.map { |n, v| "#{n.to_s.upcase}='#{v}'" } system "/usr/bin/rake #{task} #{args.join(' ')} --trace 2>&1 >> #{Rails.root}/log/rake.log &" end # lib/tasks/mailer.rake desc "Send mailing" task :send_mailing => :environment do mailing = Mailing.find(ENV["MAILING_ID"]) mailing.deliver end # models/mailing.rb def deliver sleep 10 # placeholder for sending email update_attribute(:delivered_at, Time.now) end
  • 65. Simple Thread after_filter do Thread.new do AccountMailer.deliver_signup(@user) end end
  • 66. run_later plugin http://github.com/mattmatt/run_later • Borrowed from Merb • Uses worker thread and a queue • Simple solution for simple tasks run_later do AccountMailer.deliver_signup(@user) end
  • 67. spawn plugin http://github.com/tra/spawn spawn do logger.info("I feel sleepy...") sleep 11 logger.info("Time to wake up!") end
  • 68. spawn (cont.) • By default, spawn uses fork to spawn child processes. You can configure it to use threads instead. • Works by creating new database connections in ActiveRecord::Base for the spawned block. • Fork needs to copy the Rails process every time
  • 69. threading vs. forking • Forking advantages: • more reliable? - the ActiveRecord code is not thread-safe. • keep running - subprocess can live longer than its parent. • easier - just works with Rails default settings. Threading requires you to set allow_concurrency=true. Also, beware of automatic reloading of classes in development mode (config.cache_classes = false). • Threading advantages: • less filling - threads take less resources... how much less? it depends. • debugging - you can set breakpoints in your threads
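One practical difference between the two models is memory sharing: a thread sees and mutates the parent's variables, while a forked child works on its own copy. The following sketch assumes a Unix-like platform where `fork` is available; `thread_vs_fork` is a made-up name.

```ruby
def thread_vs_fork
  counter = 0

  # A thread shares memory with its parent, so the increment is visible.
  Thread.new { counter += 1 }.join
  after_thread = counter

  # A forked child gets a copy of the process; its change stays in the child.
  pid = fork do
    counter += 100
    exit!(0) # leave the child immediately, skipping at_exit handlers
  end
  Process.wait(pid)

  { after_thread: after_thread, after_fork: counter }
end

p thread_vs_fork
```

The isolation is what makes forking more forgiving with code that is not thread-safe, and the copy is also why it costs more resources per job.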
  • 70. Okay, we need reliable messaging system: • Persistent • Scheduling: not necessarily all at the same time • Scalability: just throw in more instances of your program to speed up processing • Loosely coupled components that merely ‘talk’ to each other • Ability to easily replace Ruby with something else for specific tasks • Easy to debug and monitor
  • 71. 4.Message Queues (for Rails only) • ar_mailer • BackgroundDRb • workling • delayed_job • resque
  • 72. Rails only? • Easy to use/write code • Jobs are Ruby classes or objects • But need to load Rails environment
  • 73. ar_mailer http://seattlerb.rubyforge.org/ar_mailer/ • a two-phase delivery agent for ActionMailer. • Store messages into the database • Delivery by a separate process, ar_sendmail later.
  • 74. BackgroundDRb http://backgroundrb.rubyforge.org/ • BackgrounDRb is a Ruby job server and scheduler. • Has scalability problems (~20 servers for Mark Bates) • Hard to know whether a processing error occurred • Uses the database to persist tasks • Uses memcached to report processing results
  • 75. workling http://github.com/purzelrakete/workling • Gives your Rails App a simple API that you can use to make code run in the background, outside of your request. • Supports Starling (default), BackgroundJob, Spawn and AMQP/RabbitMQ Runners.
  • 76. Workling/Starling setup • script/plugin install git://github.com/purzelrakete/workling.git • sudo starling -p 15151 • RAILS_ENV=production script/workling_client start
  • 77. Workling example class EmailWorker < Workling::Base def deliver(options) user = User.find(options[:id]) user.deliver_activation_email end end # in your controller def create EmailWorker.asynch_deliver( :id => 1) end
  • 78. delayed_job • Database backed asynchronous priority queue • Extracted from Shopify • you can place any Ruby object on its queue as arguments • Loads the Rails environment only once
  • 79. delayed_job setup (use fork version) • script/plugin install git://github.com/collectiveidea/delayed_job.git • script/generate delayed_job • rake db:migrate
  • 80. delayed_job example send_later def deliver mailing = Mailing.find(params[:id]) mailing.send_later(:deliver) flash[:notice] = "Mailing is being delivered." redirect_to mailings_url end
  • 81. delayed_job example custom workers class MailingJob < Struct.new(:mailing_id) def perform mailing = Mailing.find(mailing_id) mailing.deliver end end # in your controller def deliver Delayed::Job.enqueue(MailingJob.new(params[:id])) flash[:notice] = "Mailing is being delivered." redirect_to mailings_url end
  • 82. delayed_job example always asynchronously class Device def deliver # long running method end handle_asynchronously :deliver end device = Device.new device.deliver
  • 83. Running jobs • rake jobs:work (Don’t use in production, it will exit if the database has any network connectivity problems.) • RAILS_ENV=production script/delayed_job start • RAILS_ENV=production script/delayed_job stop
  • 84. Priority just Integer, default is 0 • you can run multiple workers to handle different priority jobs • RAILS_ENV=production script/delayed_job --min-priority 3 start Delayed::Job.enqueue(MailingJob.new(params[:id]), 3) Delayed::Job.enqueue(MailingJob.new(params[:id]), -3)
  • 85. Scheduled no guarantees at precise time, just run_after_at Delayed::Job.enqueue(MailingJob.new(params[:id]), 3, 3.days.from_now) Delayed::Job.enqueue(MailingJob.new(params[:id]), 3, 1.month.from_now.beginning_of_month)
  • 86. Configuring Delayed Job # config/initializers/delayed_job_config.rb Delayed::Worker.destroy_failed_jobs = false Delayed::Worker.sleep_delay = 5 # sleep if empty queue Delayed::Worker.max_attempts = 25 Delayed::Worker.max_run_time = 4.hours # set to the amount of time the longest task will take
  • 87. Automatic retry on failure • If a method throws an exception, it is caught and the method is rerun later. • The method is retried up to 25 times (the default) at increasingly longer intervals until it passes. • The longest interval is about 108 hours: Job.db_time_now + (job.attempts ** 4) + 5
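The backoff formula on this slide is easy to tabulate. The sketch below only evaluates `(attempts ** 4) + 5` as seconds; `retry_delay` is a made-up helper name, not part of delayed_job.

```ruby
# Seconds to wait before the next run, per the formula on the slide.
def retry_delay(attempts)
  (attempts ** 4) + 5
end

[1, 5, 10, 25].each do |attempts|
  seconds = retry_delay(attempts)
  puts format("attempt %2d: %6d s (%.1f h)", attempts, seconds, seconds / 3600.0)
end
```

For 25 attempts the delay is 390,630 seconds, which is where the figure of roughly 108 hours comes from.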
  • 88. Capistrano Recipes • Remember to restart delayed_job after deployment • Check out lib/delayed_job/recipes.rb after "deploy:stop", "delayed_job:stop" after "deploy:start", "delayed_job:start" after "deploy:restart", "delayed_job:restart"
  • 89. Resque http://github.com/defunkt/resque • a Redis-backed library for creating background jobs, placing those jobs on multiple queues, and processing them later. • Github’s open source project • you can only place JSONable Ruby objects • includes a Sinatra app for monitoring what's going on • support multiple queues • you expect a lot of failure/chaos
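The "JSONable objects only" restriction has a visible consequence: whatever is enqueued comes back as plain JSON types. The sketch below round-trips a hypothetical job payload through the standard `json` library; the payload layout and `json_round_trip` are illustrative assumptions, and no Redis or Resque call is involved.

```ruby
require 'json'

# Simulates what a JSON-backed queue does to a payload:
# serialize on enqueue, parse again in the worker.
def json_round_trip(payload)
  JSON.parse(JSON.generate(payload))
end

payload = { class: "MailingJob", args: [42, { priority: :high }] }
p json_round_trip(payload)
# => {"class"=>"MailingJob", "args"=>[42, {"priority"=>"high"}]}
```

Symbols become strings and arbitrary Ruby objects cannot be passed at all, so jobs usually receive record ids and look the records up again inside the worker.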
  • 90. My recommendations: • General purpose: delayed_job (Github highly recommend DelayedJob to anyone whose site is not 50% background work.) • Time-scheduled: cron + rake
  • 91. 5. SOA for Rails • What’s SOA • Why SOA • Considerations • The tool set
  • 92. What’s SOA Service oriented architectures • “monolithic” approach is not enough • SOA is a way to design complex applications by splitting out major components into individual services and communicating via APIs. • a service is a vertical slice of functionality: database, application code and caching layer
  • 93. a monolithic web app example request Load Balancer WebApps Database
  • 94. a SOA example request Load request Balancer WebApp WebApps for Administration for User Services A Services B Database Database
  • 95. Why SOA? Isolation • Shared Resources • Encapsulation • Scalability • Interoperability • Reuse • Testability • Reduce Local Complexity
  • 96. Shared Resources • Different front-end websites use the same resources. • SOA helps you avoid duplicating databases and code. • Why not just share the database? • code is not DRY • caching will be problematic (diagram: WebApp for Administration and WebApps for User sharing one Database)
  • 97. Encapsulation • you can change the underlying implementation of a service without affecting other parts of the system • upgrade library • upgrade to Ruby 1.9 • you can provide API versioning
  • 98. Scalability 1: Partitioned Data Provides • The database is the first bottleneck; a single DB server cannot scale. SOA helps you reduce database load • Anti-pattern: only splitting the database (diagram: WebApps on top of Database A and Database B) • model relationships are broken • referential integrity • Myth: database replication can not help you speed and consistency
  • 99. Scalability 2: Caching • SOA makes it easier to design a caching system • Cache data at the right times and expire it at the right times • Cache the logical model, not the physical one • You do not need to cache views everywhere
  • 100. Scalability 3: Efficient • Different components carry different loads; SOA lets you scale each service separately. (diagram: WebApps in front of two Load Balancers, one with two instances of Services A, the other with four instances of Services B)
  • 101. Security • Different services can be inside different firewall • You can only open public web and services, others are inside firewall.
  • 102. Interoperability • HTTP is the common interface; SOA helps you integrate: • Multiple languages • Internal systems, e.g. a full-text search engine • Legacy databases and systems • External vendors
  • 103. Reuse • Reuse across multiple applications • Reuse for public APIs • Example: Amazon Web Services (AWS)
  • 104. Testability • Isolate problem • Mocking API calls • Reduce the time to run test suite
  • 105. Reduce Local Complexity • Team modularity along the same module splits as your software • Understandability: The amount of code is minimized to a quantity understandable by a small team • Source code control
  • 106. Considerations • Partition into Separate Services • API Design • Which Protocol
  • 107. How to partition into Separate Services • Partitioning on Logical Function • Partitioning on Read/Write Frequencies • Partitioning by Minimizing Joins • Partitioning by Iteration Speed
  • 108. API Design • Send Everything you need • Parallel HTTP requests • Send as Little as Possible • Use Logical Models
  • 109. Physical Models & Logical Models • Physical models are mapped to database tables through ORM. (It’s 3NF) • Logical models are mapped to your business problem. (External API use it) • Logical models are mapped to physical models by you.
  • 110. Logical Models • Not relational or normalized • Maintainability • can change with no change to data store • can stay the same while the data store changes • Better fit for REST interfaces • Better caching
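The mapping from physical models to a logical model can be sketched in plain Ruby. Everything here is hypothetical: `UserRow` and `AddressRow` stand in for ORM-backed tables, and `logical_user` is the hand-written mapping that the API would expose.

```ruby
require 'json'

# Physical models: one struct per (normalized) database table.
UserRow    = Struct.new(:id, :first_name, :last_name)
AddressRow = Struct.new(:user_id, :city)

# Logical model: the denormalized shape that API clients actually want.
def logical_user(user, addresses)
  {
    id: user.id,
    name: "#{user.first_name} #{user.last_name}",
    cities: addresses.select { |a| a.user_id == user.id }.map(&:city)
  }
end

user = UserRow.new(1, "Wen-Tien", "Chang")
addresses = [AddressRow.new(1, "Taipei"), AddressRow.new(2, "Tokyo")]

puts JSON.generate(logical_user(user, addresses))
```

Because clients only see the hash returned by `logical_user`, the tables behind it can be split or renamed without changing the API, and the whole hash is a natural unit to cache.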
  • 111. Which Protocol? • SOAP • XML-RPC • REST
  • 112. RESTful Web services • Rails way • REST is about resources • URL • Verbs: GET/PUT/POST/DELETE
  • 113. The tool set • Web framework • XML Parser • JSON Parser • HTTP Client
  • 114. Web framework • We do not need much of the controller and view layers • Rails is a bit more than we need; how about Sinatra? • Rails metal
  • 115. ActiveResource • Mapping RESTful resources as models in a Rails application. • But not useful in practice, why?
  • 116. XML parser • http://nokogiri.org/ • Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser. Among Nokogiri’s many features is the ability to search documents via XPath or CSS3 selectors.
  • 117. JSON Parser • http://github.com/brianmario/yajl-ruby/ • An extremely efficient streaming JSON parsing and encoding library. Ruby C bindings to Yajl
  • 118. HTTP Client • http://github.com/pauldix/typhoeus/ • Typhoeus runs HTTP requests in parallel while cleanly encapsulating handling logic
  • 119. Tips • Define your logical model (i.e. your service request result) first. • model.to_json and model.to_xml are easy to use, but not useful in practice.
  • 120. 6.Distributed File System • NFS does not scale • we can use rsync to duplicate • MogileFS • http://www.danga.com/mogilefs/ • http://seattlerb.rubyforge.org/mogilefs-client/ • Amazon S3 • HDFS (Hadoop Distributed File System) • GlusterFS
  • 121. 7.Distributed Database • NoSQL • CAP theorem • Eventually consistent • HBase/Cassandra/Voldemort
  • 123. References • Books & Articles: • Distributed Programming with Ruby, Mark Bates (Addison Wesley) • Enterprise Rails, Dan Chak (O’Reilly) • Service-Oriented Design with Ruby and Rails, Paul Dix (Addison Wesley) • RESTful Web Services, Richardson & Ruby (O’Reilly) • RESTful Web Services Cookbook, Allamaraju & Amundsen (O’Reilly) • Enterprise Recipes with Ruby on Rails, Maik Schmidt (The Pragmatic Programmers) • Ruby in Practice, McAnally & Arkin (Manning) • Building Scalable Web Sites, Cal Henderson (O’Reilly) • Background Processing in Rails, Erik Andrejko (Rails Magazine) • Background Processing with Delayed_Job, James Harrison (Rails Magazine) • Web 点 ( ) • Slides: • Background Processing (Rob Mack) Austin on Rails - April 2009 • The Current State of Asynchronous Processing in Ruby (Mathias Meyer, Peritor GmbH) • Asynchronous Processing (Jonathan Dahl) • Long-Running Tasks In Rails Without Much Effort (Andy Stewart) - April 2008 • Starling + Workling: simple distributed background jobs with Twitter’s queuing system, Rany Keddo 2008 • Physical Models & Logical Models in Rails, Dan Chak
  • 124. References • Links: • http://segment7.net/projects/ruby/drb/ • http://www.slideshare.net/luccastera/concurrent-programming-with-ruby-and-tuple-spaces • http://github.com/blog/542-introducing-resque • http://www.engineyard.com/blog/2009/5-tips-for-deploying-background-jobs/ • http://www.opensourcery.co.za/2008/07/07/messaging-and-ruby-part-1-the-big-picture/ • http://leemoonsoo.blogspot.com/2009/04/simple-comparison-open-source.html • http://blog.gslin.org/archives/2009/07/25/2065/ • http://www.javaeye.com/topic/524977 • http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
  • 125. Todo (maybe next time) • AMQP/RabbitMQ example code • How about Nanite? • XMPP • MagLev VM • More MapReduce example code • How about Amazon Elastic MapReduce? • Resque example code • More SOA example and code • MogileFS example code