ON TRAFFIC-AWARE PARTITION AND AGGREGATION IN
MAPREDUCE FOR BIG DATA APPLICATIONS
ABSTRACT
The MapReduce programming model simplifies large-scale data processing on
commodity clusters by exploiting parallel map tasks and reduce tasks. Although many
efforts have been made to improve the performance of MapReduce jobs, they ignore
the network traffic generated in the shuffle phase, which is critical to improving
performance. Traditionally, a hash function is used to partition intermediate data
among reduce tasks. This, however, is not traffic-efficient, because it takes neither
the network topology nor the data size associated with each key into consideration.
In this paper, we study how to reduce the network traffic cost of a MapReduce job by
designing a novel intermediate data partition scheme. Furthermore, we jointly
consider the aggregator placement problem, where each aggregator can reduce the
merged traffic from multiple map tasks. A decomposition-based distributed algorithm is
proposed to deal with the large-scale optimization problem for big data applications, and
an online algorithm is also designed to adjust data partition and aggregation
dynamically. Finally, extensive simulation results demonstrate that our proposals
can significantly reduce network traffic cost in both offline and online cases.
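To make the claim about hash partitioning concrete, the following Python sketch compares a hash-style partition with a simple traffic-aware one on a hypothetical two-host cluster. The cost model (data size multiplied by hop distance), the function names `hash_partition`, `traffic_aware_partition`, `key_cost`, and `traffic_cost`, and the toy inputs are all assumptions for illustration. The greedy rule is a simplified stand-in, not the paper's decomposition-based or online algorithm, and it omits aggregator placement and reducer load balancing.

```python
from typing import Dict, List

# Hypothetical types for this sketch (not from the paper):
#   sizes[m][k] = bytes of intermediate data with key k produced by map task m
#   hops[m][r]  = network distance (e.g. switch hops) from map task m's host
#                 to reduce task r's host
Sizes = Dict[int, Dict[str, int]]
Hops = List[List[int]]


def traffic_cost(assign: Dict[str, int], sizes: Sizes, hops: Hops) -> int:
    """Shuffle traffic cost: sum over (map task, key) of data size * hop distance."""
    return sum(
        size * hops[m][assign[k]]
        for m, per_key in sizes.items()
        for k, size in per_key.items()
    )


def hash_partition(keys: List[str], num_reducers: int) -> Dict[str, int]:
    """Topology- and size-oblivious baseline: reducer = hash(key) mod R.

    A deterministic character-sum hash stands in for Java's hashCode(),
    because Python's built-in str hash is randomized per process.
    """
    return {k: sum(map(ord, k)) % num_reducers for k in keys}


def key_cost(k: str, r: int, sizes: Sizes, hops: Hops) -> int:
    """Traffic generated if every map task ships its share of key k to reducer r."""
    return sum(per_key.get(k, 0) * hops[m][r] for m, per_key in sizes.items())


def traffic_aware_partition(keys: List[str], sizes: Sizes, hops: Hops,
                            num_reducers: int) -> Dict[str, int]:
    """Greedy sketch: place each key on the reducer that minimizes its own traffic.

    Ignores reducer load balance and aggregator placement, so it is NOT the
    paper's decomposition-based algorithm, only an illustration of the idea.
    """
    return {
        k: min(range(num_reducers), key=lambda r: key_cost(k, r, sizes, hops))
        for k in keys
    }


# Hypothetical toy cluster: map task 0 shares a host with reducer 0,
# map task 1 shares a host with reducer 1, and the two hosts are 2 hops apart.
EXAMPLE_HOPS: Hops = [[0, 2], [2, 0]]
EXAMPLE_SIZES: Sizes = {0: {"a": 10, "b": 1}, 1: {"a": 1, "b": 10}}

if __name__ == "__main__":
    keys = sorted({k for per_key in EXAMPLE_SIZES.values() for k in per_key})
    hashed = hash_partition(keys, 2)
    aware = traffic_aware_partition(keys, EXAMPLE_SIZES, EXAMPLE_HOPS, 2)
    print("hash:", hashed, traffic_cost(hashed, EXAMPLE_SIZES, EXAMPLE_HOPS))
    print("traffic-aware:", aware, traffic_cost(aware, EXAMPLE_SIZES, EXAMPLE_HOPS))
```

On this toy input the hash baseline sends each key's large share across the network (cost 40), while the greedy placement keeps each key next to the map task that produces most of it (cost 4). A real scheme would also have to respect reducer capacity and, as the abstract notes, decide where aggregators should merge traffic before it crosses the network.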