HaLoop: Efficient Iterative Data
Processing On Large Scale Clusters
Yingyi Bu, UC Irvine
Bill Howe, UW
Magda Balazinska, UW
Michael Ernst, UW
http://clue.cs.washington.edu/
Award IIS 0844572
Cluster Exploratory (CluE)
http://escience.washington.edu/
VLDB 2010, Singapore
Thesis in one slide
 Observation: MapReduce has proven successful as a
common runtime for non-recursive declarative languages
 HIVE (SQL)
 Pig (RA with nested types)
 Observation: Many people roll their own loops
 Graphs, clustering, mining, recursive queries
 iteration managed by external script
 Thesis: With minimal extensions, we can provide an efficient
common runtime for recursive languages
 Map, Reduce, Fixpoint
Related Work: Twister [Ekanayake HPDC 2010]
 Redesigned evaluation engine using pub/sub
 Termination condition evaluated by main()
while (!complete) {
  // run one MapReduce iteration, broadcasting the current centroids
  monitor = driver.runMapReduceBCast(cData);
  monitor.monitorTillCompletion();
  DoubleVectorData newCData =
      ((KMeansCombiner) driver.getCurrentCombiner()).getResults();
  // termination test runs in main(), on the driver
  totalError = getError(cData, newCData);
  cData = newCData;
  if (totalError < THRESHOLD) {
    complete = true;
    break;
  }
}
Only O(k) data (the k centroids) crosses to the driver for this termination check.
In Detail: PageRank (Twister)
while (!complete) {
  // run MR: start the PageRank MapReduce pass
  monitor = driver.runMapReduceBCast(
      new BytesValue(tmpCompressedDvd.getBytes()));
  monitor.monitorTillCompletion();
  // get the result of the pass
  newCompressedDvd =
      ((PageRankCombiner) driver.getCurrentCombiner()).getResults();
  // decompress the compressed pagerank values
  newDvd = decompress(newCompressedDvd);
  tmpDvd = decompress(tmpCompressedDvd);
  // termination condition: difference between new and old pagerank values
  totalError = getError(tmpDvd, newDvd);
  if (totalError < tolerance) {
    complete = true;
  }
  tmpCompressedDvd = newCompressedDvd;
}
Here the termination check is O(N) in the size of the graph: every iteration runs MapReduce, then ships the full (compressed) rank vector back to the driver and decompresses it just to decide whether to loop again.
Related Work: Spark [Zaharia HotCloud 2010]
 Reduction output collected at driver program
 “…does not currently support a grouped reduce operation as in MapReduce”
val spark = new SparkContext(<Mesos master>)
var count = spark.accumulator(0)
for (i <- spark.parallelize(1 to 10000, 10)) {
val x = Math.random * 2 - 1
val y = Math.random * 2 - 1
if (x*x + y*y < 1) count += 1
}
println("Pi is roughly " + 4 * count.value / 10000.0)
All output is sent to the driver.
Related Work: Pregel [Malewicz PODC 2009]
 Graphs only
 clustering: k-means, canopy, DBScan
 Assumes each vertex has access to its outgoing edges
 So an edge representation Edge(from, to) requires offline preprocessing, perhaps using MapReduce
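A tiny sketch of that preprocessing step (an in-memory stand-in with illustrative names; a real MapReduce job would map each edge keyed by its source and reduce to adjacency lists):

import java.util.*;

// Group an Edge(from, to) list into per-vertex adjacency lists -- the shape
// of work a map (key by 'from') + reduce (collect the 'to's) job performs.
public class EdgesToAdjacency {
  public static void main(String[] args) {
    String[][] edges = {{"a", "b"}, {"a", "c"}, {"c", "a"}, {"e", "c"}};
    Map<String, List<String>> adjacency = new HashMap<>();
    for (String[] e : edges)
      adjacency.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
    System.out.println(adjacency);  // e.g. {a=[b, c], c=[a], e=[c]}
  }
}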
Related Work: Piccolo [Power OSDI 2010]
 Partitioned table data model, with user-defined partitioning
 Programming model:
 message-passing with global synchronization barriers
 User can give locality hints
 Worth exploring a direct comparison
Locality-hint example: GroupTables(curr, next, graph)
Related Work: BOOM [c.f. Alvaro EuroSys 10]
 Distributed computing based on Overlog
(Datalog + temporal logic + more)
 Recursion supported naturally
 app: API-compliant implementation of MR
 Worth exploring a direct comparison
Details
 Architecture
 Programming Model
 Caching (and Indexing)
 Scheduling
Example 1: PageRank

Rank Table R0:
url        rank
www.a.com  1.0
www.b.com  1.0
www.c.com  1.0
www.d.com  1.0
www.e.com  1.0

Linkage Table L:
url_src    url_dest
www.a.com  www.b.com
www.a.com  www.c.com
www.c.com  www.a.com
www.e.com  www.c.com
www.d.com  www.b.com
www.c.com  www.e.com
www.e.com  www.c.com
www.a.com  www.d.com

Rank Table R3:
url        rank
www.a.com  2.13
www.b.com  3.89
www.c.com  2.60
www.d.com  2.60
www.e.com  2.13

Loop body (R_i → R_{i+1}):
R_{i+1} = π_{url_dest, γ_{url_dest} SUM(rank)}( R_i ⋈_{R_i.url = L.url_src} L ),
where each joined tuple first carries rank R_i.rank / γ_{url} COUNT(url_dest), i.e. the page’s current rank divided by its outdegree.
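To make the loop body concrete, here is a minimal, self-contained Java sketch of one iteration (in-memory rather than actual map/reduce tasks; the names are illustrative and details such as dangling pages are omitted, so this is not HaLoop’s implementation):

import java.util.*;

// One PageRank iteration in the shape of the relational loop body above:
// join R_i with L on url = url_src, scale each page's rank by 1/outdegree
// (the gamma_url COUNT(url_dest) step), then group by url_dest and SUM.
public class PageRankStep {
  static Map<String, Double> step(Map<String, Double> rank,
                                  Map<String, List<String>> links) {
    Map<String, Double> next = new HashMap<>();
    for (Map.Entry<String, List<String>> e : links.entrySet()) {
      double share = rank.getOrDefault(e.getKey(), 0.0) / e.getValue().size();
      for (String dest : e.getValue())
        next.merge(dest, share, Double::sum);   // SUM(rank) per url_dest
    }
    return next;
  }

  public static void main(String[] args) {
    Map<String, List<String>> links = new HashMap<>();
    links.put("www.a.com", Arrays.asList("www.b.com", "www.c.com", "www.d.com"));
    links.put("www.c.com", Arrays.asList("www.a.com", "www.e.com"));
    links.put("www.d.com", Arrays.asList("www.b.com"));
    links.put("www.e.com", Arrays.asList("www.c.com"));
    Map<String, Double> rank = new HashMap<>();
    for (String url : links.keySet()) rank.put(url, 1.0);
    rank.put("www.b.com", 1.0);                           // R_0: all ranks 1.0
    for (int i = 0; i < 3; i++) rank = step(rank, links); // compute R_3
    System.out.println(rank);
  }
}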
A MapReduce Implementation
[Diagram: R_i and the linkage-table splits L-split0/L-split1 feed a map/reduce pair that joins and computes rank; a second map/reduce pair performs the aggregate fixpoint evaluation; the client tests “Converged?” and either sets i = i+1 or reports done.]
What’s the problem?
1. L is loaded on each iteration
2. L is shuffled on each iteration
3. Fixpoint evaluated as a separate MapReduce job per iteration
L is loop invariant, but it is reloaded and reshuffled anyway, plus the fixpoint check costs an extra MapReduce job every iteration.
[Diagram: the same PageRank dataflow, annotated with problems 1-3.]
Example 2: Transitive Closure
Find all transitive friends of Eric over the Friend relation (semi-naïve evaluation):
R0 = {Eric, Eric}
R1 = {Eric, Elisa}
R2 = {Eric, Tom; Eric, Harry}
R3 = {} (empty delta: fixpoint reached)
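The same evaluation as a small, self-contained sketch (an in-memory stand-in for the Join and Dupe-elim jobs on the next slide; names are illustrative): each round joins only the newest tuples (the delta) against Friend, removes anything already seen, and stops when the delta is empty.

import java.util.*;

// Semi-naive evaluation of "all transitive friends of Eric".
public class TransitiveClosure {
  public static void main(String[] args) {
    Map<String, List<String>> friend = new HashMap<>();
    friend.put("Eric",  Arrays.asList("Elisa"));
    friend.put("Elisa", Arrays.asList("Tom", "Harry"));

    Set<String> seen  = new HashSet<>(Collections.singleton("Eric")); // R0
    Set<String> delta = new HashSet<>(seen);
    while (!delta.isEmpty()) {                    // fixpoint: empty delta
      Set<String> next = new HashSet<>();
      for (String p : delta)                      // Join: delta x Friend
        next.addAll(friend.getOrDefault(p, Collections.emptyList()));
      next.removeAll(seen);                       // Dupe-elim vs. all R_j
      seen.addAll(next);
      delta = next;
    }
    System.out.println(seen);  // [Eric, Elisa, Tom, Harry] (in some order)
  }
}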
Example 2 in MapReduce
[Diagram: S_i and the Friend splits Friend0/Friend1 feed a Join map/reduce pair (compute the next generation of friends), then a Dupe-elim map/reduce pair (remove the ones we’ve already seen); the client asks “Anything new?” and either sets i = i+1 or reports done.]
What’s the problem?
1. Friend is loaded on each iteration
2. Friend is shuffled on each iteration
Friend is loop invariant, but it is reloaded and reshuffled anyway.
[Diagram: the same Join/Dupe-elim dataflow, annotated with problems 1-2.]
Example 3: k-means
[Diagram: point partitions P0, P1, P2 are mapped against k_i, the k centroids at iteration i; reducers emit the new centroids k_{i+1}; the client tests k_i - k_{i+1} < threshold and either sets i = i+1 or reports done.]
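A compact, self-contained sketch of one such iteration (1-D points for brevity; illustrative only, not HaLoop code):

import java.util.*;

// One k-means iteration: "map" assigns each point to its nearest centroid,
// "reduce" recomputes each centroid as the mean of its assigned points.
public class KMeansStep {
  static double[] step(double[] points, double[] centroids) {
    double[] sum = new double[centroids.length];
    int[] count = new int[centroids.length];
    for (double p : points) {
      int best = 0;
      for (int c = 1; c < centroids.length; c++)
        if (Math.abs(p - centroids[c]) < Math.abs(p - centroids[best])) best = c;
      sum[best] += p;
      count[best]++;
    }
    double[] next = centroids.clone();
    for (int c = 0; c < next.length; c++)
      if (count[c] > 0) next[c] = sum[c] / count[c];
    return next;                                    // k_{i+1}
  }

  public static void main(String[] args) {
    double[] points = {1, 2, 3, 10, 11, 12};        // P0 u P1 u P2
    double[] k = {0, 5};                            // k_0
    for (int i = 0; i < 10; i++) k = step(points, k);
    System.out.println(Arrays.toString(k));         // [2.0, 11.0]
  }
}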
What’s the problem?
1. P is loaded on each iteration
P is loop invariant, but it is reloaded anyway.
[Diagram: the same k-means dataflow, annotated with problem 1.]
Approach: Inter-iteration caching
Mapper input cache (MI)
Mapper output cache (MO)
Reducer input cache (RI)
Reducer output cache (RO)
[Diagram: a multi-step loop body of mapper and reducer stages, with the four cache points marked.]
RI: Reducer Input Cache
 Provides:
 Access to loop invariant data without
map/shuffle
 Used By:
 Reducer function
 Assumes:
1. Mapper output for a given table constant
across iterations
2. Static partitioning (implies: no new nodes)
 PageRank
 Avoid shuffling the network at every step
 Transitive Closure
 Avoid shuffling the graph at every step
 K-means
 No help
Reducer Input Cache Benefit
[Chart: overall run time. Transitive closure; Billion Triples dataset (120GB); 90 small instances on EC2.]
Reducer Input Cache Benefit
[Chart: join step only. Transitive closure; Billion Triples dataset (120GB); 90 small instances on EC2; Livejournal, 12GB.]
Reducer Input Cache Benefit
[Chart: reduce and shuffle of the join step. Transitive closure; Billion Triples dataset (120GB); 90 small instances on EC2; Livejournal, 12GB.]
[Diagram: the PageRank dataflow revisited (the “Join & compute rank” map/reduce pair plus the “Aggregate fixpoint evaluation” pair), annotated with the total cost.]
RO: Reducer Output Cache
 Provides:
 Distributed access to output of previous
iterations
 Used By:
 Fixpoint evaluation
 Assumes:
1. Partitioning constant across iterations
2. Reducer output key functionally
determines Reducer input key
 PageRank
 Allows distributed fixpoint evaluation
 Obviates extra MapReduce job
 Transitive Closure
 No help
 K-means
 No help
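A sketch of what that buys (hypothetical names, not HaLoop’s API): because the previous iteration’s output is cached at each reducer, and the output key determines the input key, a reducer can diff its new output against its cached output locally and report a per-partition verdict, so no extra MapReduce job is needed:

import java.util.*;

// Local convergence test at one reducer; the client only combines the
// per-reducer booleans instead of running a dedicated fixpoint job.
public class LocalFixpointCheck {
  static boolean converged(Map<String, Double> cached,
                           Map<String, Double> fresh, double threshold) {
    double delta = 0.0;
    for (Map.Entry<String, Double> e : fresh.entrySet())
      delta += Math.abs(e.getValue() - cached.getOrDefault(e.getKey(), 0.0));
    return delta < threshold;
  }

  public static void main(String[] args) {
    Map<String, Double> prev = Map.of("www.a.com", 2.10, "www.b.com", 3.80);
    Map<String, Double> next = Map.of("www.a.com", 2.13, "www.b.com", 3.89);
    System.out.println(converged(prev, next, 0.2));  // true
  }
}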
Reducer Output Cache Benefit
[Charts: fixpoint evaluation time (s) vs. iteration #. Livejournal dataset, 50 EC2 small instances; Freebase dataset, 90 EC2 small instances.]
MI: Mapper Input Cache
 Provides:
 Access to non-local mapper input on later
iterations
 Used:
 During scheduling of map tasks
 Assumes:
1. Mapper input does not change
 PageRank
 Subsumed by use of Reducer Input Cache
 Transitive Closure
 Subsumed by use of Reducer Input Cache
 K-means
 Avoids non-local data reads on iterations > 0
Mapper Input Cache Benefit
5% non-local data reads;
~5% improvement
Conclusions (last slide)
 Relatively simple changes to MapReduce/Hadoop can
support arbitrary recursive programs
 TaskTracker (Cache management)
 Scheduler (Cache awareness)
 Programming model (multi-step loop bodies, cache control)
 Optimizations
 Caching loop invariant data realizes largest gain
 Good to eliminate extra MapReduce step for termination checks
 Mapper input cache benefit inconclusive; need a busier cluster
 Future Work
 Analyze expressiveness of Map-Reduce-Fixpoint
 Consider a model of Map-(Reduce+)-Fixpoint
Data-Intensive
Scalable Science
http://clue.cs.washington.edu
http://escience.washington.edu
Award IIS 0844572
Cluster Exploratory (CluE)
Motivation in One Slide
 MapReduce can’t express recursion/iteration
 Lots of interesting programs need loops
 graph algorithms
 clustering
 machine learning
 recursive queries (CTEs, datalog, WITH clause)
 Dominant solution: Use a driver program outside of MapReduce
 Hypothesis: making MapReduce loop-aware
affords optimization
 …and lays a foundation for scalable implementations of
recursive languages
Experiments
 Amazon EC2
 20, 50, 90 default small instances
 Datasets
 Billion Triples (120GB) [1.5B nodes, 1.6B edges]
 Freebase (12GB) [7M nodes, 154M edges]
 Livejournal social network (18GB) [4.8M nodes, 67M edges]
 Queries
 Transitive Closure
 PageRank
 k-means
[VLDB 2010]
HaLoop Architecture
Scheduling Algorithm
Input: Node node
Global variables: HashMap<Node, List<Partition>> last, HashMap<Node, List<Partition>> current

if (iteration == 0) {
  // first iteration: the same as MapReduce
  Partition part = StandardMapReduceSchedule(node);
  current.add(node, part);
} else {
  if (node.hasFullLoad()) {
    // node is fully loaded: find a substitution nearby
    Node substitution = findNearbyNode(node);
    last.get(substitution).addAll(last.remove(node));
    return;
  }
  if (last.get(node).size() > 0) {
    // iteration-local schedule: re-assign a partition this node held last iteration
    Partition part = last.get(node).get(0);
    schedule(part, node);
    current.get(node).add(part);
    last.get(node).remove(part);
  }
}
Programming Interface
Job job = new Job();
job.AddMap(Map Rank, 1);                 // define loop body: step 1
job.AddReduce(Reduce Rank, 1);
job.AddMap(Map Aggregate, 2);            // define loop body: step 2
job.AddReduce(Reduce Aggregate, 2);
job.AddInvariantTable(#1);               // declare an input as invariant
job.SetInput(IterationInput);            // loop body input, parameterized by iteration #
job.SetFixedPointThreshold(0.1);         // termination condition:
job.SetDistanceMeasure(ResultDistance);  //   stop when results are this close...
job.SetMaxNumOfIterations(10);           //   ...or after 10 iterations
job.SetReducerInputCache(true);          // turn on caches
job.SetReducerOutputCache(true);
job.Submit();
Cache Infrastructure Details
 Programmer control
 Architecture for cache management
 Scheduling for inter-iteration locality
 Indexing the values in the cache
Other Extensions and Experiments
 Distributed databases and Pig/Hadoop for Astronomy [IASDS 09]
 Efficient “Friends of Friends” in Dryad [SSDBM 2010]
 SkewReduce: Automated skew handling [SOCC 2010]
 Image Stacking and Mosaicing with Hadoop [Hadoop Summit 2010]
 HaLoop: Efficient iterative processing with Hadoop [VLDB 2010]
MapReduce Broadly Applicable
 Biology
 [Schatz 08, 09]
 Astronomy
 [IASDS 09, SSDBM 10, SOCC 10, PASP 10]
 Oceanography
 [UltraVis 09]
 Visualization
 [UltraVis 09, EuroVis 10]
Key idea
 When the loop output is large…
 transitive closure
 connected components
 PageRank (with a convergence test as the
termination condition)
 …need a distributed fixpoint operator
 typically implemented as yet another
MapReduce job -- on every iteration
Background
 Why is MapReduce popular?
 Because it’s fast?
 Because it scales to 1000s of commodity
nodes?
 Because it’s fault tolerant?
 Witness
 MapReduce on GPUs
 MapReduce on MPI
 MapReduce in main memory
 MapReduce on <10 nodes
So why is MapReduce popular?
 The programming model
 Two serial functions, parallelism for free
 Easy and expressive
 Compare this with MPI
 70+ operations
 But it can’t express recursion
 graph algorithms
 clustering
 machine learning
 recursive queries (CTEs, datalog, WITH clause)
Fixpoint
 A fixpoint of a function f is a value x such that
f(x) = x
 The fixpoint queries FIX can be expressed with
the relational algebra plus a fixpoint operator
 Map - Reduce - Fixpoint
 hypothesis: sufficient model for all recursive queries
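As a concrete instance (my formulation, not from the slides): with a Friend relation F(x, y), the transitive closure of Example 2 is the least fixpoint of a single relational-algebra step,
TC = lfp( R ↦ F ∪ π_{R.x, F.y}( R ⋈_{R.y = F.x} F ) ),
and semi-naïve evaluation reaches it by joining only the newest tuples each round, which is exactly the Join/Dupe-elim loop of Example 2.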
Editor's Notes

  • #4: Programming Model
  • #6: Programming Model
  • #41: User writes two serial functions, and MapReduce creates a parallel program with fairly clear semantics and good scalability.