Job Queues
Queue 
● Allows for asynchronous computation of jobs (or tasks) 
● Uses consumers (or workers) to complete the job in the 
background 
● Results are available when the job is complete
Queue 
● First In First Out data structure (FIFO)
Queue Operations 
● enqueue ➜ adds an item to the end of the queue
● dequeue ➜ pulls the oldest item off the queue 
● isEmpty ➜ boolean 
● length ➜ integer (number of items in queue)
Queue Data Structure 
For an unbounded queue, we choose a singly linked list 
with head and tail pointers as the data structure. 
● enqueue - points the current tail's next pointer at the new item, then moves the tail pointer to the new item
● dequeue - returns the current head and moves the head pointer to the head's next pointer
● isEmpty - true when head (and tail) is null
All O(1) operations!
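The linked-list queue above could be sketched in Python roughly as follows. This is a minimal illustration, not a production implementation; the names `Node` and `LinkedQueue` are made up for this example, and the empty-queue handling (raising `IndexError`) is an assumption the slide does not specify.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Node:
    value: Any
    next: Optional["Node"] = None


class LinkedQueue:
    """Unbounded FIFO queue backed by a singly linked list."""

    def __init__(self) -> None:
        self.head: Optional[Node] = None  # oldest item, dequeued next
        self.tail: Optional[Node] = None  # newest item
        self.length = 0

    def is_empty(self) -> bool:
        return self.head is None

    def enqueue(self, value: Any) -> None:
        node = Node(value)
        if self.tail is None:
            # Empty queue: the new node becomes the head as well.
            self.head = node
        else:
            self.tail.next = node
        self.tail = node
        self.length += 1

    def dequeue(self) -> Any:
        if self.head is None:
            raise IndexError("dequeue from empty queue")
        node = self.head
        self.head = node.next
        if self.head is None:
            # The last item was removed, so the tail must be cleared too.
            self.tail = None
        self.length -= 1
        return node.value
```

No operation walks the list, which is why each one runs in constant time.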
Producers 
Producers push jobs onto the job queue 
Examples: 
● Web servers - A typical HTTP response must return
within a short timeframe (200ms - 2000ms), so slower work is pushed onto the queue instead of being done during the request
● Humans phoning into tech support
Consumers 
Consumers pop jobs off of the queue and complete them 
Example use cases (any long running process): 
● Map / reduce calls on large datasets 
● Media conversion, manipulation and rendering 
● Image resize 
● Downloading remote resources 
● CPU intensive tasks (calculations)
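As a rough sketch of a producer handing work to a background consumer, the example below uses Python's standard `queue.Queue` and `threading.Thread`. The squaring step stands in for a long-running task, and the `None` sentinel used to stop the worker is a convention chosen for this example, not something the slides prescribe.

```python
import queue
import threading


def worker(jobs: queue.Queue, results: list) -> None:
    """Consumer: pop jobs until the None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        results.append(job * job)  # stand-in for a long-running task
        jobs.task_done()


def run_demo() -> list:
    jobs = queue.Queue()
    results = []
    consumer = threading.Thread(target=worker, args=(jobs, results))
    consumer.start()
    for n in range(5):
        jobs.put(n)      # the producer does not wait for the job to run
    jobs.put(None)       # sentinel: no more jobs
    jobs.join()          # block until every queued item has been handled
    consumer.join()
    return results
```

With a single consumer the results come back in FIFO order; with several consumers the completion order would depend on scheduling.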
Producers and Consumers 
Producers and Consumers can be part of the same 
process! 
Example: a web crawler (breadth first search) 
1. Push a base URL to the queue (e.g. http://yahoo.com/)
2. Pop a URL from the queue and parse it 
3. For each link on the page, push it onto the queue
4. Go to step 2
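The crawler loop can be sketched as below. To keep the example self-contained, a hypothetical in-memory link graph (`LINKS`) stands in for fetching and parsing real pages. The `seen` set is an addition not mentioned on the slide; without it, pages that link to each other would be queued forever.

```python
from collections import deque

# Hypothetical link graph standing in for real HTTP fetches and HTML parsing.
LINKS = {
    "/": ["/news", "/mail"],
    "/news": ["/", "/sports"],
    "/mail": ["/"],
    "/sports": ["/news"],
}


def crawl(base_url: str, links: dict) -> list:
    """Return pages in the order a breadth-first crawl visits them."""
    frontier = deque([base_url])  # step 1: push the base URL
    seen = {base_url}             # added assumption: never queue a URL twice
    visited = []
    while frontier:               # step 4: repeat until the queue is empty
        url = frontier.popleft()  # step 2: pop a URL and "parse" it
        visited.append(url)
        for link in links.get(url, []):  # step 3: push each link found
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited
```

Here the same loop is both consumer (it pops URLs) and producer (it pushes newly found links).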
Job States 
Each job exists in one of the following states: 
● Queued 
● Processing (in progress) 
● Completed 
● Failed 
Jobs may also output: 
● Logs 
● Progress (% complete)
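One way to model these states is an enum plus a table of allowed transitions. The transition table below is an assumption made for illustration (a job is picked up once, then completes or fails); the slides do not say whether, for example, failed jobs may be re-queued.

```python
from enum import Enum


class JobState(Enum):
    QUEUED = "queued"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumed transitions: retries (failed -> queued) are deliberately not modeled.
ALLOWED = {
    JobState.QUEUED: {JobState.PROCESSING},
    JobState.PROCESSING: {JobState.COMPLETED, JobState.FAILED},
    JobState.COMPLETED: set(),
    JobState.FAILED: set(),
}


def advance(current: JobState, new: JobState) -> JobState:
    """Return the new state, or raise if the transition is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new
```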
Job Data 
Consumers are functional: the only input they receive
comes from the job, which in turn comes from the producer.
Job data should include: 
● Type 
● Any information needed to complete the job
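A job record along these lines might look like the sketch below: a type used to pick a handler, plus a payload carrying everything the handler needs. The job types, payload keys and handler functions are hypothetical, and the handlers only return a description instead of doing real work.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Job:
    type: str                                    # selects the handler
    payload: dict = field(default_factory=dict)  # everything the handler needs


def resize_image(payload: dict) -> str:
    return f"resized {payload['path']} to {payload['width']}px"


def download(payload: dict) -> str:
    return f"downloaded {payload['url']}"


# Hypothetical job types mapped to their handlers.
HANDLERS = {"image_resize": resize_image, "download": download}


def consume(job: Job) -> str:
    """Run the handler for this job using only the data the job carries."""
    return HANDLERS[job.type](job.payload)
```

Because `consume` reads nothing except the job itself, any worker on any machine can process it.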
Amdahl’s law... 
...states that the speedup a concurrent algorithm can 
achieve is limited by the serial path. 
Locks and serial parts limit the maximum performance of a 
concurrent system.
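In its usual form, Amdahl's law gives the speedup on n workers as S(n) = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel. The small function below evaluates it; the function and parameter names are chosen for this example.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's law: 1 / ((1 - p) + p / n)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)
```

For instance, with p = 0.9 the speedup can never exceed 1 / (1 - 0.9) = 10, no matter how many workers are added.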
Job Queues Overview
Priority Queue 
● Priority ordered Queue data structure 
● Highest priority jobs are dequeued first 
● On the same priority level, oldest jobs are dequeued 
first
Priority Queue Operations 
● enqueue ➜ adds a job to the queue with a priority
● dequeue ➜ pulls the highest priority, oldest job off the
queue
● isEmpty ➜ boolean 
● length ➜ integer (number of items in queue)
Priority Queue Data Structure 
● Data structure: max heap
● Binary tree with the max heap property (each parent
node's priority is at least as large as its children's)
● For a priority queue, each item in the tree would be a
pointer to a regular queue for that priority
Enqueue and dequeue are O(log n) operations!
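A sketch of this layout, a heap of priority levels where each level owns a regular FIFO queue, is shown below. It uses Python's `heapq`, which is a min-heap, so priorities are negated to get max-heap behavior. The class name and the dictionary used to find a level's queue are choices made for this example.

```python
import heapq
from collections import deque


class PriorityJobQueue:
    def __init__(self) -> None:
        self._heap = []    # negated priorities: heapq's min-heap acts as a max-heap
        self._queues = {}  # priority -> FIFO deque of jobs at that priority
        self._length = 0

    def is_empty(self) -> bool:
        return self._length == 0

    def __len__(self) -> int:
        return self._length

    def enqueue(self, job, priority: int) -> None:
        if priority not in self._queues:
            # First job at this level: create its queue and add it to the heap.
            self._queues[priority] = deque()
            heapq.heappush(self._heap, -priority)
        self._queues[priority].append(job)
        self._length += 1

    def dequeue(self):
        if self._length == 0:
            raise IndexError("dequeue from empty priority queue")
        priority = -self._heap[0]      # highest priority level present
        fifo = self._queues[priority]
        job = fifo.popleft()           # oldest job at that level
        if not fifo:
            # Level is drained: remove it from the heap and the lookup table.
            heapq.heappop(self._heap)
            del self._queues[priority]
        self._length -= 1
        return job
```

Jobs at the same priority leave in arrival order because each level is an ordinary FIFO queue.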
Priority Queue Metrics 
● Average wait time per job type 
● Number of queued jobs 
● Jobs processed / time 
● Jobs pushed / time 
Jobs processed / time ≥ Jobs pushed / time
Otherwise a backlog forms!
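The backlog condition is simple arithmetic: if jobs arrive faster than they are processed, the queue grows by the difference every second. The helper below is a simplified sketch that assumes both rates are constant; real arrival and processing rates fluctuate.

```python
def backlog_after(seconds: float, pushed_per_sec: float,
                  processed_per_sec: float, initial: float = 0.0) -> float:
    """Queued jobs after `seconds`, assuming constant push and process rates."""
    growth = (pushed_per_sec - processed_per_sec) * seconds
    return max(0.0, initial + growth)  # a queue cannot hold fewer than 0 jobs
```

For example, pushing 12 jobs per second while processing 10 leaves 120 jobs queued after one minute.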
Job Scheduler 
In sophisticated job systems, a job scheduler exists to: 
● Maximize use of computing power 
● Minimize wait time 
● Provide an interface to job tasks 
Schedulers can use a combination of priority, estimated
(historical) job time and available computing power to
determine how jobs are run. Sophisticated job scheduling
algorithms exist.
Case Study: Grocery Lines
Case Study: Grocery Lines 
4 consumers, 4 queues, 12 jobs of varying durations
Average wait time = (10 + 13 + 4 + 6 + 1 + 9 + 6 + 13) / 12 = 5.1666...
(4 of the 12 jobs are served immediately and wait 0, so only 8 terms appear in the sum)
Case Study: Grocery Lines 
4 consumers, 1 queue, 12 jobs of varying durations
Order (wait time in parentheses; the first 4 jobs start immediately): 6, 1, 4, 10, 7 (1), 8 (4), 2 (6), 3 (6), 11 (8), 5 (8), 12 (9), 9 (10)
Average wait time = (1 + 4 + 6 + 6 + 8 + 8 + 9 + 10) / 12 = 4.3333...
Case Study: Grocery Lines 
4 consumers, 1 queue, 12 jobs of varying durations,
intelligently ordered to minimize wait time:
Order (wait time in parentheses; the first 4 jobs start immediately): 1, 2, 3, 4, 5 (1), 6 (2), 7 (3), 8 (4), 9 (5), 10 (6), 11 (8), 12 (9)
Average wait time = (1 + 2 + 3 + 4 + 5 + 6 + 8 + 9) / 12 = 3.1667...
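The three grocery-line setups can be compared with a small simulation. The job durations in the slides' figures are not recoverable from the text, so the `DURATIONS` list below is made up for illustration and will not reproduce the averages above. All jobs are assumed to arrive at time 0, and in the four-queue case jobs are dealt to the lines round-robin.

```python
import heapq

# Hypothetical durations for 12 jobs (not the values from the slides).
DURATIONS = [5, 2, 8, 1, 4, 7, 3, 6, 2, 9, 1, 4]


def single_queue_waits(durations, consumers):
    """Wait time of each job when one shared queue feeds all consumers."""
    free_at = [0] * consumers  # time at which each consumer becomes free
    heapq.heapify(free_at)
    waits = []
    for d in durations:
        start = heapq.heappop(free_at)  # the next consumer to become free
        waits.append(start)             # every job arrived at time 0
        heapq.heappush(free_at, start + d)
    return waits


def separate_queue_waits(lines):
    """Wait time of each job when every consumer has its own line."""
    waits = []
    for line in lines:
        elapsed = 0
        for d in line:
            waits.append(elapsed)
            elapsed += d
    return waits


def mean(values):
    return sum(values) / len(values)


four_lines = separate_queue_waits([DURATIONS[i::4] for i in range(4)])
one_line = single_queue_waits(DURATIONS, 4)
shortest_first = single_queue_waits(sorted(DURATIONS), 4)
```

For these made-up durations the ranking matches the case study: `mean(shortest_first)` is lower than `mean(one_line)`, which is lower than `mean(four_lines)`.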
Job Queue Software 
● Beanstalkd (C) http://kr.github.io/beanstalkd/
● Celery (Python + many backends) http://www.celeryproject.org/
● Delayed::Job (Ruby + DB) https://github.com/collectiveidea/delayed_job
● Gearman (C++) http://gearman.org/
● Kue (Node + Redis) https://github.com/learnboost/kue
● Resque (Ruby + Redis) http://resquework.org/
● RQ (Python + Redis) http://python-rq.org/
● Sidekiq (Ruby) http://sidekiq.org/
● SQS by Amazon (managed) http://aws.amazon.com/sqs/
More links and information at http://queues.io/


Editor's Notes

  • #8: .wav to .mp3
  • #10: .wav to .mp3
  • #11: .wav to .mp3
  • #16: Visualization from http://www.comp.nus.edu.sg/~stevenha/visualization/heap.html Specialized data structures exist at http://en.wikipedia.org/wiki/Priority_queue#Implementation