Cloud Computing Course
Final Project Assessment
Guided by Dr. Dinkar Sitaram
Problem Statement
The specifics of the problem include:
• Interoperability between Hadoop and OpenStack.
• Hadoop assumes that it has direct control over resources.
  But when installed on OpenStack, the compute and storage
  resources of a Hadoop node may be distributed remotely over
  the network. This introduces latency between the storage and
  the compute components.
• Minimizing the data transfer over iSCSI.
Literature Survey
• Moving to the Cloud (Dr. Dinkar Sitaram et al.)
• http://www.hastexo.com/resources/docs/installing-openstack-essex-20121-ubuntu-1204-precise-pangolin
• http://devstack.org/guides/multinode-lab.html
• https://github.com/mseknibilel/OpenStack-Folsom-Install-guide
• OpenStack Compute Administration Manual (docs.openstack.org)
• StackGeek OpenStack Guide (http://www.stackgeek.com/blog/kordless/guides/gettingstarted.html)
• Hadoop Installation Guide (http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/)
Proposed Solution Description
• The solution consists of the following stages:
  - Using the MRLU / Simple (Max Resource, Least Usage) scheduling
    algorithm for allocating VMs.
  - Disabling the option for live migration.
  - Using the OpenStack root disk for creating HDFS.
  - Using the Swift service to store user input data and results.
  - Writing bootstrap scripts to set up the IP addresses and other
    initialization tasks.
Solution Description
• MRLU
  The VMs spawned by Nova should be placed on the machine with
  the maximum resources and the least usage.
• Live Migration
  In order to minimize the traffic over iSCSI, the solution requires
  that we disable live migration of VMs on OpenStack.
• Root Disk
  Instead of allocating Cinder storage for HDFS, we plan to use the
  root disk located at /var/lib/nova/instances/ on the local
  machine. This ensures that HDFS is not connected
  over iSCSI.
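The MRLU placement rule above can be illustrated with a small sketch. This is not Nova's scheduler code and not necessarily how the project implemented it. It assumes that "maximum resources" means the most free RAM and that "least usage" means the fewest running VMs, used as a tie-breaker. The Host class, the pick_host function, and the sample host names and figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical view of a compute host as seen by the scheduler."""
    name: str
    total_ram_mb: int
    used_ram_mb: int
    running_vms: int

    @property
    def free_ram_mb(self) -> int:
        return self.total_ram_mb - self.used_ram_mb


def pick_host(hosts, required_ram_mb):
    """Return the host with the most free RAM that can fit the VM.

    Ties on free RAM are broken in favour of the host running fewer VMs.
    """
    candidates = [h for h in hosts if h.free_ram_mb >= required_ram_mb]
    if not candidates:
        raise ValueError("no host has enough free RAM for this VM")
    return max(candidates, key=lambda h: (h.free_ram_mb, -h.running_vms))


# Illustrative numbers only; they are not measurements from the project.
hosts = [
    Host("compute-1", total_ram_mb=16384, used_ram_mb=12288, running_vms=3),
    Host("compute-2", total_ram_mb=16384, used_ram_mb=4096, running_vms=1),
    Host("controller", total_ram_mb=8192, used_ram_mb=6144, running_vms=2),
]

print(pick_host(hosts, required_ram_mb=2048).name)  # compute-2
```

With these sample values, compute-2 has the most free RAM (12288 MB), so the next VM would be placed there.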
Solution Description
• Swift
  To provide flexibility and abstraction for users interacting
  with the service, we use Swift to store the user input. Hadoop
  reads this data, runs the computation, and stores the results back on Swift.
• Bootstrapping
  We define a set of tasks that need to be performed
  before or after spawning the VMs, such as
  assigning IP addresses to the Hadoop nodes. These can be handled
  by simple bootstrap scripts.
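As a rough sketch of what such a bootstrap step might generate, the snippet below builds /etc/hosts entries and the contents of a Hadoop slaves file from a list of nodes. The hostnames (hadoop-master, hadoop-slave-N) are hypothetical, and pairing them with the 10.10.10.x addresses from the deployment slide is an assumption. The code only prints the text; writing it to files on each VM is left out.

```python
# Hypothetical hostnames; addresses are the 10.10.10.x values listed
# on the "Hadoop deployment on OpenStack" slide.
NODES = [
    ("hadoop-master", "10.10.10.34"),
    ("hadoop-slave-1", "10.10.10.35"),
    ("hadoop-slave-2", "10.10.10.36"),
    ("hadoop-slave-3", "10.10.10.37"),
    ("hadoop-slave-4", "10.10.10.38"),
]


def hosts_entries(nodes):
    """Render one '<ip><TAB><hostname>' line per node, /etc/hosts style."""
    return "".join(f"{ip}\t{name}\n" for name, ip in nodes)


def slaves_file(nodes, master="hadoop-master"):
    """Render the list of worker hostnames, one per line, excluding the master."""
    return "".join(f"{name}\n" for name, _ in nodes if name != master)


if __name__ == "__main__":
    print(hosts_entries(NODES), end="")
    print("---")
    print(slaves_file(NODES), end="")
```

Generating both files from one node list keeps name resolution and the worker list in step when a node is added or removed.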
Overview of the Solution
[Figure: four VMs, each labeled 32 GB and each holding HDFS: one Master and three Slaves. The diagram also shows the Nova Controller, Horizon, Swift, and the network 10.10.10.32/27.]
Network Configuration of the setup
[Figure: network diagram. Elements shown: Nova Controller, Nova Compute 1, Nova Compute 2, Public Switch, Private Switch, Router, College Network.]
Address pairs labeled in the diagram:
192.168.0.66 / 10.10.10.5
192.168.0.67 / 10.10.10.9
192.168.0.65 / 10.10.10.6
Hadoop deployment on OpenStack
[Figure: Hadoop VMs deployed across Nova Controller, Nova Compute 1, and Nova Compute 2]
Node             192.168.0.x address   10.10.10.x address
Hadoop Master    192.168.0.33          10.10.10.34
Hadoop Slave 1   192.168.0.34          10.10.10.35
Hadoop Slave 2   192.168.0.36          10.10.10.36
Hadoop Slave 3   192.168.0.35          10.10.10.37
Hadoop Slave 4   192.168.0.38          10.10.10.38
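A quick way to sanity-check the addressing is to confirm that the 10.10.10.x addresses of the Hadoop VMs fall inside the 10.10.10.32/27 network shown on the overview slide. The sketch below uses Python's standard ipaddress module. Treating that /27 as the VM network is an assumption drawn from the slides, and the variable names are illustrative.

```python
import ipaddress

# Network shown on the "Overview of the Solution" slide.
subnet = ipaddress.ip_network("10.10.10.32/27")

# 10.10.10.x addresses of the Hadoop VMs from the deployment slide.
vm_addresses = [
    "10.10.10.34",  # Hadoop Master
    "10.10.10.35",  # Hadoop Slave 1
    "10.10.10.36",  # Hadoop Slave 2
    "10.10.10.37",  # Hadoop Slave 3
    "10.10.10.38",  # Hadoop Slave 4
]

usable = list(subnet.hosts())  # excludes network and broadcast addresses
outside = [a for a in vm_addresses if ipaddress.ip_address(a) not in subnet]

print(subnet.num_addresses)   # 32
print(usable[0], usable[-1])  # 10.10.10.33 10.10.10.62
print(outside)                # []
```

A /27 holds 32 addresses, 30 of them usable, so all five VM addresses fit with room for more nodes.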
Future Enhancements
• Explore Swift as the backend storage for HDFS.
• Write bootstrap scripts to auto-configure the Hadoop cluster
  using snapshots of the images.
Team Members
• Akshay MS (1PI09IS010)
• Sandeep Raju P (1PI09CS081)
• Suhas Mohan (1PI09IS104)
• Vijesh M (1PI09CS119)
• Vivek P (1PI09IS119)

Hadoop on OpenStack
