Cross-cloud MapReduce for Big Data
ABSTRACT:
MapReduce is a leading framework for big data analytics. In this paper, we
consider a geo-distributed cloud architecture that provides MapReduce services
over big data collected from end users all over the world. Existing work handles
MapReduce jobs with a traditional computation-centric approach, in which all
input data distributed across multiple clouds are aggregated into a virtual
cluster that resides in a single cloud. The poor efficiency and high cost of this
approach for big data motivate us to propose a novel data-centric architecture
built on three key techniques: a cross-cloud virtual cluster, data-centric job
placement, and network-coding-based traffic routing. Our design leads to an
optimization framework whose objective is to minimize both the computation and
transmission costs of running a set of MapReduce jobs in geo-distributed clouds.
We further design a parallel algorithm by decomposing the original large-scale
problem into several sub-problems that can be solved in a distributed manner and
are coordinated by a high-level master problem. Finally, we conduct real-world
experiments and extensive simulations to show that our proposal significantly
outperforms existing work.
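The contrast between the computation-centric and data-centric approaches can be illustrated with a toy cost model. The sketch below is not the optimization framework of the paper. It is a minimal illustration under stated assumptions: the class name PlacementSketch, both method names, the per-GB prices, and the "shrink" ratio (intermediate data size divided by input size) are all hypothetical. Reduce-side computation, network coding, and the decomposition into sub-problems are left out.

```java
/**
 * Toy cost comparison, NOT the paper's formulation.
 * All names, prices, and the shrink ratio are illustrative assumptions.
 */
public class PlacementSketch {

    /**
     * Computation-centric: move every input block to one cloud s and run
     * the whole job there. Returns the cost of the cheapest choice of s.
     */
    public static double computationCentricCost(double[] dataGb,
                                                double[] pricePerGb,
                                                double[][] transferPerGb) {
        double totalGb = 0.0;
        for (double d : dataGb) {
            totalGb += d;
        }
        double best = Double.POSITIVE_INFINITY;
        for (int s = 0; s < dataGb.length; s++) {
            double cost = pricePerGb[s] * totalGb;        // compute at cloud s
            for (int k = 0; k < dataGb.length; k++) {
                cost += dataGb[k] * transferPerGb[k][s];  // ship raw input to s
            }
            best = Math.min(best, cost);
        }
        return best;
    }

    /**
     * Data-centric (simplified): run map tasks where the data already is,
     * then ship only the smaller intermediate data to one reducer cloud r.
     * Returns the cost of the cheapest choice of r.
     */
    public static double dataCentricCost(double[] dataGb,
                                         double[] pricePerGb,
                                         double[][] transferPerGb,
                                         double shrink) {
        double mapCost = 0.0;
        for (int k = 0; k < dataGb.length; k++) {
            mapCost += dataGb[k] * pricePerGb[k];         // local map compute
        }
        double bestShuffle = Double.POSITIVE_INFINITY;
        for (int r = 0; r < dataGb.length; r++) {
            double shuffle = 0.0;
            for (int k = 0; k < dataGb.length; k++) {
                // only intermediate data crosses cloud boundaries
                shuffle += shrink * dataGb[k] * transferPerGb[k][r];
            }
            bestShuffle = Math.min(bestShuffle, shuffle);
        }
        return mapCost + bestShuffle;
    }

    public static void main(String[] args) {
        double[] dataGb = {100, 50, 10};            // input held by each cloud
        double[] pricePerGb = {0.10, 0.12, 0.08};   // hypothetical compute prices
        double[][] transferPerGb = {                // hypothetical inter-cloud prices
            {0.00, 0.05, 0.09},
            {0.05, 0.00, 0.07},
            {0.09, 0.07, 0.00}
        };
        System.out.println("computation-centric: "
            + computationCentricCost(dataGb, pricePerGb, transferPerGb));
        System.out.println("data-centric:        "
            + dataCentricCost(dataGb, pricePerGb, transferPerGb, 0.2));
    }
}
```

With these made-up numbers the data-centric placement comes out cheaper, because only the intermediate data, assumed here to be 20% of the input, crosses cloud boundaries. With a shrink ratio near or above 1, or with very uneven compute prices, the comparison can go the other way, which is why the placement is treated as an optimization problem rather than a fixed rule.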
SYSTEM REQUIREMENTS:
HARDWARE REQUIREMENTS:
• System : i3 Processor
• Hard Disk : 500 GB
• Monitor : 15" LED
• Input Devices : Keyboard, Mouse
• RAM : 4 GB
SOFTWARE REQUIREMENTS:
• Operating System : Windows 7 / Ubuntu
• Coding Language : Java 1.7, Hadoop 0.8.1
• IDE : Eclipse
• Database : MySQL
REFERENCE:
Peng Li, Member, IEEE, Song Guo, Senior Member, IEEE, Shui Yu, Member,
IEEE, and Weihua Zhuang, Fellow, IEEE, “Cross-cloud MapReduce for Big
Data”, IEEE Transactions on Cloud Computing, 2019.
