Job Queues Overview

Queue
● Allows for asynchronous computation of jobs (or tasks)
● Uses consumers (or workers) to complete the job in the background
● Results are available when the job is complete

Queue
● First In First Out data structure (FIFO)

Queue Operations
● enqueue ➜ adds an item to the end of the queue
● dequeue ➜ pulls the oldest item off the queue
● isEmpty ➜ boolean
● length ➜ integer (number of items in queue)

Queue Data Structure
For an unbounded queue, we choose a singly linked list
with head and tail pointers as the data structure.
● enqueue - sets the current tail's next pointer to the new item, then moves the tail pointer to it
● dequeue - returns the current head and moves the head pointer to the head's next pointer
● isEmpty - head (and tail) is null
All O(1) operations! (length is O(1) as well if a counter is kept)
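A minimal Python sketch of the linked-list queue described above. The class and method names are illustrative, and the length counter is an addition so that length stays O(1) alongside the other operations.

```python
class _Node:
    __slots__ = ("value", "next")

    def __init__(self, value):
        self.value = value
        self.next = None


class LinkedQueue:
    """Unbounded FIFO queue: singly linked list with head and tail pointers."""

    def __init__(self):
        self.head = None
        self.tail = None
        self._length = 0  # counter so length() is O(1)

    def enqueue(self, item):
        node = _Node(item)
        if self.tail is None:      # empty queue: new node is both head and tail
            self.head = node
        else:
            self.tail.next = node  # link the current tail to the new node
        self.tail = node           # then move the tail pointer
        self._length += 1

    def dequeue(self):
        if self.head is None:
            raise IndexError("dequeue from empty queue")
        node = self.head
        self.head = node.next      # head pointer moves to head.next
        if self.head is None:      # queue became empty: clear the tail too
            self.tail = None
        self._length -= 1
        return node.value

    def is_empty(self):
        return self.head is None

    def length(self):
        return self._length
```

Every method touches a fixed number of pointers, which is why each operation is O(1) regardless of queue size.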

Producers
Producers push jobs onto the job queue
Examples:
● Web servers - a typical HTTP response must be returned within a short timeframe (200-2000 ms), so slow work is pushed onto a queue instead
● Humans phoning into tech support

Consumers
Consumers pop jobs off the queue and complete them
Example use cases (any long-running process):
● Map / reduce calls on large datasets
● Media conversion, manipulation and rendering
● Image resize
● Downloading remote resources
● CPU-intensive tasks (calculations)
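A minimal sketch of producers and consumers using Python's standard `queue.Queue` (thread-safe, FIFO) and worker threads. Squaring a number is only a stand-in for real long-running work, and the `STOP` sentinel used to shut workers down is an illustrative choice, not something the slides prescribe.

```python
import queue
import threading

STOP = object()  # sentinel telling a consumer to exit


def producer(jobs, items):
    for item in items:
        jobs.put(item)  # push jobs onto the queue


def consumer(jobs, results, lock):
    while True:
        item = jobs.get()  # blocks until a job is available
        if item is STOP:
            break
        result = item * item  # stand-in for slow work
        with lock:
            results.append(result)


def run(items, num_consumers=4):
    jobs = queue.Queue()
    results = []
    lock = threading.Lock()
    workers = [
        threading.Thread(target=consumer, args=(jobs, results, lock))
        for _ in range(num_consumers)
    ]
    for w in workers:
        w.start()
    producer(jobs, items)
    for _ in workers:
        jobs.put(STOP)  # one sentinel per worker, queued after all real jobs
    for w in workers:
        w.join()
    return sorted(results)  # consumers finish in any order
```

Because the sentinels are enqueued after every real job, FIFO ordering guarantees all jobs are consumed before the workers exit.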

Producers and Consumers
Producers and Consumers can be part of the same process!
Example: a web crawler (breadth-first search)
1. Push a base URL to the queue (e.g. http://yahoo.com/)
2. Pop a URL from the queue, then fetch and parse the page
3. For each link on the page not yet seen, push it onto the queue
4. Go to step 2 (until the queue is empty)
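The crawler steps can be sketched in Python with `collections.deque` as the FIFO queue. To stay self-contained, this version assumes a hypothetical in-memory link graph (`LINKS`) and a hypothetical `fetch_links` helper in place of real HTTP fetching and HTML parsing.

```python
from collections import deque

# Hypothetical in-memory "web": page URL -> links found on that page.
LINKS = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/c", "http://example.com/"],
    "http://example.com/b": ["http://example.com/c"],
    "http://example.com/c": [],
}


def fetch_links(url):
    """Stand-in for downloading and parsing a page (no network I/O)."""
    return LINKS.get(url, [])


def crawl(base_url):
    frontier = deque([base_url])  # step 1: push the base URL
    seen = {base_url}
    visited = []
    while frontier:                    # step 4: repeat until the queue is empty
        url = frontier.popleft()       # step 2: pop a URL and parse it (consumer)
        visited.append(url)
        for link in fetch_links(url):  # step 3: push each unseen link (producer)
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited
```

The `seen` set keeps pages that link back to each other from being queued forever; the same loop plays both the producer and the consumer role.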

Job States
Each job exists in one of the following states:
● Queued
● Processing (in progress)
● Completed
● Failed
Jobs may also output:
● Logs
● Progress (% complete)
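One way to model these states and outputs, sketched in Python with an `Enum` and a dataclass. The `Job` fields and the `run_job` helper are hypothetical names for illustration; real job queue libraries each define their own job model.

```python
from dataclasses import dataclass, field
from enum import Enum


class JobState(Enum):
    QUEUED = "queued"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class Job:
    type: str
    data: dict
    state: JobState = JobState.QUEUED
    progress: float = 0.0  # percent complete, clamped to 0-100
    logs: list = field(default_factory=list)

    def log(self, message):
        self.logs.append(message)

    def set_progress(self, percent):
        self.progress = max(0.0, min(100.0, percent))


def run_job(job, work):
    """Move a job from Processing to Completed, or to Failed if work raises."""
    job.state = JobState.PROCESSING
    try:
        work(job)  # work may call job.log() and job.set_progress()
    except Exception as exc:
        job.state = JobState.FAILED
        job.log(f"failed: {exc}")
    else:
        job.state = JobState.COMPLETED
        job.set_progress(100.0)
    return job
```

Recording the failure reason in the job's logs lets a producer or monitoring tool inspect why a job ended up in the Failed state.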

Job Data
Consumers are functional: the only input they receive comes from the job, which is created by the producer.
Job data should include:
● Type
● Any information needed to complete the job
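A sketch of a consumer that dispatches on the job's type and uses nothing but the job's data. It assumes jobs are plain dicts with `type` and `data` keys (convenient because they serialize easily to JSON for storage in a queue backend); the two handlers and the `HANDLERS` registry are hypothetical examples.

```python
# Hypothetical handlers, one per job type; each uses only the job's data.
def resize_image(data):
    scale = data["scale"]
    return (round(data["width"] * scale), round(data["height"] * scale))


def sum_numbers(data):
    return sum(data["numbers"])


HANDLERS = {
    "resize_image": resize_image,
    "sum_numbers": sum_numbers,
}


def consume(job):
    """Dispatch on the job's type; the job carries everything the handler needs."""
    handler = HANDLERS.get(job["type"])
    if handler is None:
        raise ValueError(f"unknown job type: {job['type']!r}")
    return handler(job["data"])
```

Since handlers receive no other state, any consumer process with the same registry can pick up any job of a known type.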

Amdahl’s law...
...states that the speedup a concurrent algorithm can achieve is limited by its serial portion.
Locks and other serial sections therefore cap the maximum performance of a concurrent system.
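In its standard form, Amdahl's law gives the speedup on N workers as S(N) = 1 / ((1 - p) + p / N), where p is the fraction of the work that can run in parallel. A small Python sketch of that formula; the 95% parallel fraction in the usage loop is an arbitrary illustrative value.

```python
def amdahl_speedup(parallel_fraction, workers):
    """S(N) = 1 / ((1 - p) + p / N), with p = parallel_fraction and N = workers."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial = 1.0 - parallel_fraction  # the part no amount of workers can speed up
    return 1.0 / (serial + parallel_fraction / workers)


if __name__ == "__main__":
    for n in (1, 4, 16, 1024):
        print(n, round(amdahl_speedup(0.95, n), 2))
```

As N grows, p / N shrinks toward zero and the speedup approaches 1 / (1 - p): with 5% serial work, no number of consumers gives more than a 20x speedup.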

Priority Queue
● Priority-ordered queue data structure
● Highest priority jobs are dequeued first
● Within the same priority level, the oldest jobs are dequeued first

Priority Queue Operations
● enqueue ➜ adds a job with a given priority to the end of the queue
● dequeue ➜ pulls the oldest job with the highest priority off the queue
● isEmpty ➜ boolean
● length ➜ integer (number of items in queue)

Priority Queue Data Structure
● Data structure: max heap
● Binary tree with the max heap property (each parent node is greater than or equal to its children)
● For a priority queue, each item in the tree is a pointer to a regular (FIFO) queue for that priority
Enqueue and dequeue are O(log n) operations!
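A Python sketch of this design: a heap of priority levels where each level points to a FIFO queue of jobs. Python's `heapq` is a min-heap, so priorities are stored negated to get max-heap behaviour; the dict from priority to `deque` is an assumption added so an existing level can be found without searching the heap. The class name is hypothetical.

```python
import heapq
from collections import deque


class PriorityJobQueue:
    """Max-heap of priority levels; each level points to a FIFO queue of jobs."""

    def __init__(self):
        self._heap = []    # negated priorities (heapq is a min-heap)
        self._levels = {}  # priority -> deque of jobs at that priority
        self._length = 0

    def enqueue(self, job, priority):
        level = self._levels.get(priority)
        if level is None:  # new priority level: add it to the heap, O(log k)
            level = self._levels[priority] = deque()
            heapq.heappush(self._heap, -priority)
        level.append(job)  # oldest jobs stay at the left end
        self._length += 1

    def dequeue(self):
        if self._length == 0:
            raise IndexError("dequeue from empty priority queue")
        priority = -self._heap[0]  # highest priority level
        level = self._levels[priority]
        job = level.popleft()      # oldest job at that level
        if not level:              # level exhausted: remove it from the heap
            heapq.heappop(self._heap)
            del self._levels[priority]
        self._length -= 1
        return job

    def is_empty(self):
        return self._length == 0

    def __len__(self):
        return self._length
```

Here the heap holds one entry per distinct priority level (k), so heap operations cost O(log k), and jobs within a level are handled in O(1) by the deque.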

Priority Queue Metrics
● Average wait time per job type
● Number of queued jobs
● Jobs processed / time
● Jobs pushed / time
Jobs processed / time ≥ Jobs pushed / time
Otherwise a backlog forms!
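A tiny simulation of that rule, assuming constant push and processing rates per time step (real traffic is burstier). The function name and parameters are hypothetical.

```python
def simulate_backlog(pushed_per_tick, processed_per_tick, ticks):
    """Return the number of queued jobs at the end of each tick."""
    backlog = 0
    history = []
    for _ in range(ticks):
        backlog += pushed_per_tick                   # producers push new jobs
        backlog -= min(backlog, processed_per_tick)  # consumers drain what they can
        history.append(backlog)
    return history
```

With 5 jobs pushed and 4 processed per tick the queue grows by one job every tick without bound; with 4 pushed and 5 processed it stays empty.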

Job Scheduler
In sophisticated job systems, a job scheduler exists to:
● Maximize use of computing power
● Minimize wait time
● Provide an interface to job tasks
They can use a combination of priority, estimated (historical) job time and available computing power to determine how jobs are run. Sophisticated job scheduling algorithms exist.

Case Study: Grocery Lines
4 consumers, 4 queues, 12 jobs of varying durations
Average wait time = (10 + 13 + 4 + 6 + 1 + 9 + 6 + 13) / 12 = 5.1666...

4 consumers, 1 queue, 12 jobs of varying durations
Order (job number, wait time in parentheses when nonzero): 6, 1, 4, 10, 7 (1), 8 (4), 2 (6), 3 (6), 11 (8), 5 (8), 12 (9), 9 (10)

4 consumers, 1 queue, 12 jobs of varying durations
intelligently ordered to minimize wait time:
Order (job number, wait time in parentheses when nonzero): 1, 2, 3, 4, 5 (1), 6 (2), 7 (3), 8 (4), 9 (5), 10 (6), 11 (8), 12 (9)
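The single-queue scenarios can be sketched as a small simulation: all jobs arrive at time 0, and the next job in line goes to whichever consumer frees up first. The slides do not give the individual job durations, so any durations you feed in are hypothetical; running the jobs shortest-first is the "intelligent ordering" assumed here.

```python
import heapq


def single_queue_waits(durations, consumers):
    """Wait time of each job when all jobs arrive at time 0 and share one queue."""
    free_at = [0] * consumers  # min-heap of times at which each consumer is free
    waits = []
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available consumer takes the job
        waits.append(start)             # job waited from time 0 until it started
        heapq.heappush(free_at, start + duration)
    return waits


def average(values):
    return sum(values) / len(values)
```

For example, `average(single_queue_waits(sorted(durations), 4))` compares a shortest-job-first ordering against the arrival order. Applying `average` to the wait times listed on the slides (reading each parenthesized number as a wait and treating unlisted waits as 0) gives 52 / 12 ≈ 4.33 for the unordered single queue and 38 / 12 ≈ 3.17 for the ordered one.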

Job Queue Software
● Beanstalkd (C) http://kr.github.io/beanstalkd/
● Celery (Python + many backends) http://www.celeryproject.org/
● Delayed::Job (Ruby + DB) https://github.com/collectiveidea/delayed_job
● Gearman (C++) http://gearman.org/
● Kue (Node + Redis) https://github.com/learnboost/kue
● Resque (Ruby + Redis) http://resquework.org/
● RQ (Python + Redis) http://python-rq.org/
● Sidekiq (Ruby) http://sidekiq.org/
● SQS by Amazon (managed) http://aws.amazon.com/sqs/
More links and information at http://queues.io/
