International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 12 | Dec-2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 1054
Cloud Computing Ambiance Using Secluded Access Control Method
Ms. A. Sivasankari1, Ms. P. Bhuvana2, Ms. Arunkumari G3
1Head of the Department (CS), Dept of Computer Science and Applications, D.K.M. College for Women
(Autonomous), Vellore, Tamilnadu, India.
2Dept of Computer Science and Applications, D.K.M. College for Women (Autonomous), Vellore, Tamilnadu, India.
3Assistant Professor, Dept of Computer Science and Applications, D.K.M. College for Women (Autonomous),
Vellore, Tamilnadu, India.
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract – Cloud computing has considerably reduced the
computational and storage costs of outsourced data.
Existing access control techniques offer users access
provisions based on common user attributes such as role,
which limits fine-grained access control. We propose the
Storage Correctness and Fine-grained Access Provision
(SCFAP) scheme, which provides the user exclusive access
through the use of a hierarchical structure that is a
combination of users' unique and common attributes. Also,
we deploy the concept of a token granting system that
allows users to verify the correctness of
outsourced data without retrieving the respective files.
The tokens are derived from metadata containing file
positions, which aids the storage-correctness verification
process and improves storage efficiency. The
experimental results show SCFAP has superior storage
efficiency and error recovery measures compared to existing techniques.
Keywords: Access control, access structure, barrier
limits, storage efficiency, token granting system.
1. INTRODUCTION
Data centers today cater to a wide spectrum of applications,
with workloads varying from data science batch and
streaming applications to decoding genome sequences. Each
application can have different syntax and semantics,
with varying I/O needs from storage. With highly
sophisticated and optimized data processing frameworks,
such as Hadoop and Spark, applications are capable of
processing large amounts of data at the same time.
Dedicating physical resources for every application is not
economically feasible. In cloud environments, with the aid of
server and storage virtualization, multiple processes
contend for the same physical resources (namely, compute,
network and storage), which causes contentions. In order to meet
their service level agreements (SLAs), cloud providers need
to ensure performance isolation guarantees for every
application.
With multi-core computing capabilities, CPUs have scaled to
accommodate the needs of "Big Data", but storage still
remains a bottleneck. The physical media characteristics and
interface technology are mostly blamed for storage being
slow, but this is only partially true. The full potential of storage
devices cannot be harnessed until all the layers of the I/O
hierarchy function efficiently.
Performance of storage devices depends on the order in
which the data is stored and accessed. Therefore, in large
scale distributed systems (“cloud”), data management plays
a vital role in processing and storing petabytes of data among
hundreds of thousands of storage devices. The problems
associated with inefficiencies in data management
get amplified in multi-tasking, shared Big Data
environments.
Despite advanced optimizations applied across various
layers along the odyssey of data access, the I/O stack still
remains volatile. The Linux OS (host) block layer is the most
critical part of the I/O hierarchy as it orchestrates the I/O
requests from different applications to the underlying
storage. The key to the performance of the block layer is the
block I/O scheduler, which is responsible for dividing the
I/O bandwidth amongst the contending processes as well as
determining the order of requests sent to the storage device.
Unfortunately, despite its significance, the block layer,
essentially the block I/O scheduler, has not evolved to meet
the volume and contention-resolution needs of data centers
experiencing Big Data workloads. We have designed and
developed two Contention Avoidance Storage solutions in
the Linux block layer, collectively known as “BID: Bulk I/O
Dispatch”, specifically to suit multi-tenant, multitasking Big
Data shared resource environments. Big Data applications
use data processing frameworks such as Hadoop Map
Reduce, which access storage in large data chunks (64 MB
HDFS blocks), therefore exhibiting evident sequentiality. Due to
contentions amongst concurrent I/O-submitting processes
and the working of the current I/O schedulers, the inherent
sequentiality of Big Data processes is lost. The processes may
be instances of the same application or belong to other
applications. These contentions result in unwanted effects such as
multiplexing and interleaving, thereby breaking up large
data accesses and increasing storage latency. In the first
solution, we propose a dynamically adaptable Block I/O
scheduling scheme BID-HDD, for disk based storage.
BID-HDD tries to recreate the sequentiality in I/O access in order to
provide performance isolation to each I/O-submitting
process. Through trace-driven, simulation-based experiments
with cloud-emulating Map Reduce benchmarks, we show that
BID-HDD results in a 28–52% I/O time performance
gain for all I/O requests over the best-performing Linux disk
schedulers.
With recent developments in NVMe (non-volatile memory)
devices such as solid state drives (SSDs), commonly known
as storage class memories (SCMs), with supporting
infrastructure and virtualization techniques, a hybrid
approach of using heterogeneous tiers of storage together,
such as HDDs and SSDs coupled with workload-
aware tiering to balance cost, performance and capacity, has
become increasingly popular. In the second part, we propose
a novel hybrid scheme BID-Hybrid to exploit SCM’s (SSDs)
superior random performance to further avoid contentions
at disk based storage. The main goal of BID-Hybrid is to
further enhance the performance of BID-HDD scheduling
scheme by diverting interruption-causing, non-bulky I/Os to SSD,
thereby making the "HDD request queue" available for bulky
and sequential I/Os.
Contrary to the existing tiering literature, where data is
tiered based on the deviation of adjacent disk block locations in
the device "request queue", BID-Hybrid profiles process I/O
characteristics (bulkiness) to decide on the correct
candidates for tiering. Current approaches might cause
unnecessary migrations to SSDs due to I/Os from an
application which might be sequential but appear random
due to contention from other applications submitting I/O
to the "request queue". BID-Hybrid, in contrast, uses staging
capabilities and anticipation time to make judicious and verified
decisions. BID-Hybrid serves I/Os from bulky processes on
HDD and tiers I/Os from non-bulky (lighter), interruption-
causing processes to SSD.
BID-Hybrid successfully achieves its objective of
further reducing contention at the disk-based storage device.
BID-Hybrid results in performance gains of 6–23% for Map
Reduce workloads over BID-HDD and 33–54% over the best-
performing Linux scheduling schemes.
2. BACKGROUND
The "Working and workload characteristics"
and "Requirements from a block I/O scheduler in Big Data
deployments" sections discuss the I/O workload
characteristics of Hadoop deployments and the
requirements from an I/O scheduler in such environments,
respectively. The "Issues with current I/O schedulers" section
describes the working of the current state-of-the-art Linux
disk schedulers deployed in shared Big Data infrastructure.
3. WORKING AND WORKLOAD
CHARACTERISTICS
Hadoop Map Reduce is the de facto large-scale data processing
framework for Big Data. Hadoop is a multi-tasking system
which can process multiple data sets for multi-jobs in a
multi-user environment at the same time. Hadoop uses a
block-structured file system, known as Hadoop Distributed
File System (HDFS). HDFS splits the stored files into fixed
size (generally 64 MB/128 MB) file system blocks, known as
chunks, which are usually tri-replicated across the storage
nodes for fault tolerance and performance. Hadoop is
designed in such a way that the processes access the data in
chunks. When a process opens a file, it reads/writes in
multiples of these chunks. Enterprise Hadoop workloads
have highly skewed characteristics, making profiling
difficult, with the "hot" data being very large. Thus, the effect of
file system caching is negligible in HDFS. Most of the data
access is done from the underlying disk (or solid state) based
storage devices. Therefore, a single chunk causes multiple
page faults, which eventually would result in creation and
submission of thousands of I/O requests to the block layer
for further processing before dispatching them to the
physical storage.
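As a rough back-of-the-envelope illustration of this fan-out (a sketch only; the 4 KB page size and 512 KB maximum request size are common defaults assumed here, not values stated in this paper), consider the following Python estimate:

# Estimate how one HDFS chunk fans out into page faults and
# block-layer requests. Page/request sizes are assumed defaults.
CHUNK_MB = 64        # HDFS chunk size, per the text
PAGE_KB = 4          # typical OS page size (assumption)
MAX_REQ_KB = 512     # typical max merged request size (assumption)

page_faults = CHUNK_MB * 1024 // PAGE_KB       # 16384 pages per chunk
min_requests = CHUNK_MB * 1024 // MAX_REQ_KB   # 128 requests if fully merged
print(page_faults, min_requests)

Even in the best case of full merging, a single chunk still generates over a hundred block-layer requests; at page granularity, it generates thousands, consistent with the text.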
Each Map Reduce application consists of multiple processes
submitting I/Os concurrently, possibly in different
interleaving stages, i.e. Map and Reduce, each having skewed I/O
requirements. Moreover, these applications run on multi-
tenant infrastructure which is shared by a wide variety of such
applications, each having different syntax and semantics.
For Big Data multi-processing environments, although the
requests from each concurrent process result in a large
number of sequential disk accesses, they face contention at
the storage interface from other applications. These
contentions are resolved by the OS Block Layer, more
essentially the I/O scheduler. The inherently sequential
operations of applications become non-sequential due to
the working of the current disk I/O schedulers, which
thereby results in unwanted effects like multiplexing and
interleaving of requests; this also results in higher CPU
wait/idle time, as the CPU has to wait for the data. In order to pro-
vide performance isolation to each process as well as
improve system performance, it is imperative to remove or
avoid contentions. The "Issues with current I/O schedulers"
section describes the working of the current state-of-the-art
Linux disk schedulers deployed in shared Big Data
infrastructure. In the next section, we discuss the
requirements of a block I/O scheduler most suited for
Hadoop deployments.
4. REQUIREMENTS FROM A BLOCK I/O
SCHEDULER IN BIG DATA DEPLOYMENTS
The key requirements from a block I/O scheduler in
multiprocess shared Big Data environments,suchasHadoop
Map Reduce are as follows:
Capitalize on large I/O access: Data is accessed in large
data chunks (64/128 MB in HDFS), which have a high degree
of sequentiality on the storage media. The I/O scheduler should be
able to capitalize on large I/O access and should not break
these large sequential requests.
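A minimal sketch of the coalescing behavior this requirement implies, where logically contiguous requests are back-merged rather than dispatched separately (the function and sector layout are illustrative, not taken from any named scheduler):

# Coalesce (start_sector, n_sectors) requests that are contiguous, so a
# large sequential access is dispatched as few requests as possible.
def coalesce(requests):
    merged = []
    for start, length in sorted(requests):
        if merged and merged[-1][0] + merged[-1][1] == start:
            # Back-merge: extend the previous request instead of adding one.
            merged[-1] = (merged[-1][0], merged[-1][1] + length)
        else:
            merged.append((start, length))
    return merged

# A 64 MB chunk issued as 4 KB (8-sector) pieces collapses into one request:
reqs = [(i * 8, 8) for i in range(16384)]
assert coalesce(reqs) == [(0, 131072)]

When requests from several processes interleave, the contiguity test fails and the large access fragments, which is exactly the breakage the BID schemes try to avoid.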
Adaptive: Multiple CPUs (or applications) try to access the
same storage media in a shared infrastructure, which causes
skewed workload patterns. Additionally, each Map Reduce
task itself has varying and interleaving I/O characteristics in
its Map and Reduce phases. Therefore, it is imperative for an
I/O scheduler to dynamically adapt to such skewed and
changing I/O patterns.
Performance isolation: In-order to meet the SLAs, it is
highly imperative to provide I/O performance isolation for
each application. For example, a single Map Reduce application
consists of multiple tasks, each consisting of multiple
processes, each having different I/O requirements.
Therefore, an I/O scheduler through process-level
segregation should ensure I/O resource isolation to every
I/O contending process.
Regular I/O scheduler features: reducing CPU wait/idle
time by serving blocking I/Os (reads) quickly; avoiding
starvation of any request; and reducing disk arm
movements.
Issues with current I/O schedulers
Since version 2.6.33, Linux employs three disk I/O
schedulers, namely Noop, Deadline and Completely Fair
Queuing (CFQ). As observed in the "Linux I/O stack" section, the
main functionalities of the block I/O scheduler are as
follows: lifecycle management of the block I/O "requests"
(which may consist of multiple BIO structures) in the
"request queue", and moving requests from the "request queue"
to the "dispatch queue". The dispatch queue is the sequence of
requests ready to be sent to the block device driver.
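A toy model of these two responsibilities, with Noop-like FIFO ordering in the dispatch step (class and method names are invented for illustration; real schedulers reorder and merge inside dispatch):

from collections import deque

class ToyBlockScheduler:
    # Requests enter the request queue and are later moved, in
    # scheduling order, to the dispatch queue feeding the driver.
    def __init__(self):
        self.request_queue = deque()
        self.dispatch_queue = deque()

    def submit(self, pid, sector, nsectors):
        # Lifecycle step 1: a request (possibly built from several
        # BIOs) is queued in the request queue.
        self.request_queue.append((pid, sector, nsectors))

    def dispatch(self):
        # Lifecycle step 2: Noop keeps arrival (FIFO) order;
        # Deadline/CFQ would reorder requests here.
        while self.request_queue:
            self.dispatch_queue.append(self.request_queue.popleft())
        return list(self.dispatch_queue)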
HDDs form the backbone of data center storage. The effect
of caching is negligible in an enterprise Big Data
environment. Therefore large numbers of page faults occur,
which in turn result in most of the data accesses from the
underlying storage. Hence, it is imperative to tune the data
management software stack to harness the complete
potential of the physical media in highly skewed and
multiplexing Big Data deployments. The block layer is the
most performance critical component to resolve disk I/O
contentions along the odyssey of I/O path.
Unfortunately, despite its significance in orchestrating the
I/O requests, the block layer essentially the I/O Scheduler
has not evolved much to meet the needs of Big Data.
We have designed and developed two ContentionAvoidance
Storage solutions, collectively known as “BID: Bulk I/O
Dispatch” in the Linux block layer specifically to suit multi-
tenant, multitasking shared Big Data environments. In the
first part of this section, we propose a Block I/O scheduling
scheme BID-HDD for disk based storage. BID-HDD tries to
recreate the sequentiality in I/O access in order to provide
performance isolation to each I/O submitting process.
In the second part, we propose a hybrid scheme BID-Hybrid
to exploit SCM’s (SSDs) superior random performance to
further avoid contentions at disk-based storage. In the hybrid
approach, dynamic process-level profiling in the block layer
is done to decide the candidates for tiering to SSD. Therefore,
I/O blocks belonging to interruption-causing processes are tiered
to SSD, while bulky I/Os are served by the HDD. The BID-HDD
scheduling scheme is used for disk request processing, and a
multi-q FIFO architecture is used for SSD I/O request processing.
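The routing idea can be sketched as follows (a simplification under assumed names and a made-up bulkiness threshold; the real BID-Hybrid profiling is richer, using staging and anticipation time):

BULKY_SECTORS = 1024            # bulkiness threshold (assumption)
hdd_queue, ssd_queue = [], []   # HDD served by BID-HDD, SSD by multi-q FIFO
profile = {}                    # pid -> total sectors submitted so far

def route(pid, sector, nsectors):
    # Profile each process's I/O volume in the block layer, then tier
    # light, interruption-causing processes to SSD and keep bulky,
    # sequential streams on the HDD request queue.
    profile[pid] = profile.get(pid, 0) + nsectors
    if profile[pid] >= BULKY_SECTORS:
        hdd_queue.append((pid, sector, nsectors))
    else:
        ssd_queue.append((pid, sector, nsectors))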
BID schemes are designed taking into consideration the
requirements laid out earlier in “Requirements from a block
I/O scheduler in Big Data deployments” section. BID as a
whole is aimed at avoiding contentions for storage I/Os
while following system constraints without compromising the
SLAs.
BID-HDD aims to avoid multiplexing of I/O requests from
different processes running concurrently. To achieve this, we
segregate the I/O requests from each process into
containers. The idea is to introduce a dynamically adaptable,
need-based anticipation time for each process, i.e. a "time
to wait for the adjoining I/O request". This allows coalescing of
bulky data accesses and avoids starvation of any request.
Each process container has a wait timer, based on inter
arrival time of requests and deadline associated with it.
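A compact sketch of such a per-process container, with an anticipation window adapted from observed inter-arrival times and a deadline bound (the timer constants and exponential-average weighting are illustrative assumptions, not the paper's tuned values):

import time

class ProcessContainer:
    # One container per I/O-submitting process. The container batches
    # the process's requests, waits an adaptive anticipation time for
    # the adjoining request, and a deadline bounds the wait so no
    # request starves.
    def __init__(self, pid, deadline=0.5):
        self.pid, self.requests = pid, []
        self.deadline = time.monotonic() + deadline
        self.last_arrival = time.monotonic()
        self.inter_arrival = 0.01   # initial guess (assumption)

    def add(self, request):
        now = time.monotonic()
        # Adapt the anticipation window to observed inter-arrival times.
        self.inter_arrival = 0.5 * self.inter_arrival + 0.5 * (now - self.last_arrival)
        self.last_arrival = now
        self.requests.append(request)

    def ready_to_dispatch(self):
        now = time.monotonic()
        # Dispatch the whole batch once the deadline expires or no
        # adjoining request arrived within the anticipation window.
        return now >= self.deadline or (now - self.last_arrival) > self.inter_arrival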
Due to the physical limitations of HDDs, there have been recent
efforts to incorporate flash-based, high-speed, non-volatile secondary
memory devices, known as SCMs, in data centers.
Despite superior random performance of SCMs (or SSDs)
over HDDs, completely replacing disks with SCMs in data
center deployments does not seem feasible, economically
as well as due to other associated issues.
With recent developments in NVMe devices, with supporting
infrastructure and virtualization techniques, a hybrid
approach of using heterogeneous tiers of storage together,
such as HDDs and SSDs coupled with workload-
aware tiering to balance cost, performance and capacity, has
become increasingly popular. Data centers consist of many
tiers of storage devices. All storage devices of the same type
form a tier. For example: all HDDs across the data-center
form the HDD tier and all SSDs form the SSD tier, and similarly for
other SCMs. Based on profiling of workloads and the balanced utility
value of data usage, data is managed between the tiers of
storage for improved performance.
Workload-aware storage tiering, or simply tiering, is the automatic
classification of how data is managed between
heterogeneous tiers of storage in enterprise data center
environments. It is vital to develop automated and dynamic
tiering solutions that utilize all the tiers of storage. BID-Hybrid
aims to deliver the capability of dynamic and judicious
automated tiering in the block layer as a software-defined storage (SDS) solution.
5. RELATED WORKS
The domain of storage technologies has been an active field
of research. More recently, there has been a research
inclination towards developing both the software and the
physical architecture of NVMe devices, referred to as SCMs, to meet
the SLAs of Big Data. We broadly classify the literature in our
focus into:
(a) Block layer developments, mostly I/O Scheduling, and
(b) Multi-tier storage environment. Table 4 mentions state-
of-the-art solutions in both these classifications.
6. BLOCK LAYER DEVELOPMENTS, MOSTLY I/O
SCHEDULING
In this section, we discuss the developments in the block
layer, concentrating mostly on I/O Scheduling. I/O
Scheduling has been around since the beginning of disk
drives, though we will limit our discussion to those
approaches which are relevant to recent developments.
Despite advanced optimizations applied across various
layers along the odyssey of data access, the Linux I/O stack
still remains volatile. The block layer has not evolved to cater to the
requirements of Big Data.
One of the major findings was in establishing relationships
between performance and the block I/O scheduler. Our work on
BID-HDD is an effort in this domain, especially for rotation-based
recording drives. BID is essentially a contention avoidance
technique which can be modeled to cater to different objective
functions (storage media type, performance characteristics,
etc.). Prior work provides a brief overview of the Linux block layer,
basic I/O units, request queue processing, etc. AD proposes a
framework which studies the VM interference in Hadoop
virtualized environments with the execution of a single Map
Reduce job with several disk pair schedulers. It divides the
Map Reduce job into phases and executes a series of
experiments using a heuristic to choose a disk pair scheduler
for the next phase in a VM Environment. BORG is a self-
optimizing HDD based solution which reorganizes blocks in
the block layer by forming sequences via calculating
correlation amongst LBA ranges with connectivity based on
frequency distribution and temporal locality. It builds
weighted graphs, and relocation of blocks happens to the most-
needed vertex first. The goal is to service most requests from
dedicated zones of an HDD.
Multi-q is an important piece of work which extends the
capabilities of the block layer for utilizing internal
parallelism of SSDs to enable fast computation for multicore
systems.
It proposes changes to the existing OS block layer with
support for multiple software and hardware queues for a
single storage device. Multi-q involves a software queue per
CPU core.
A similar lock contention scheme can be used for BID, as it also
involves multiple queues. CFFQ is an SSD extension of the CFQ
scheduler in which each process has a FIFO request queue
and the I/O bandwidth is fairly distributed in round robin
fashion. SLASSD and Kim et al. propose to ensure diverse
SLAs, including reservations, limitations, and proportional
sharing, through their I/O scheduling schemes in shared VM
environments for SSDs. While SLASSD uses an opportunistic,
goal-oriented block I/O scheduling algorithm, Kim et al.
propose host-level SSD I/O schedulers, which are
extensions of the state-of-the-art I/O scheduling scheme CFQ.
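For concreteness, the CFFQ-style fair sharing described above can be modeled as per-process FIFO queues served in round-robin order (a sketch; names and values are illustrative):

from collections import deque
from itertools import cycle

# Two processes with per-process FIFO queues; bandwidth is shared by
# dispatching one request per process per round-robin turn.
queues = {1: deque(["a1", "a2"]), 2: deque(["b1"])}
turn = cycle(list(queues))
dispatched = []
while any(queues.values()):
    pid = next(turn)
    if queues[pid]:
        dispatched.append(queues[pid].popleft())
print(dispatched)   # ['a1', 'b1', 'a2']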
In Big Data cloud deployments, due to the highly skewed, non-
uniform and multiplexing workloads, prediction of the utility
value of blocks for tiering based on the heat of data might not be a
viable option.
7. CONCLUSION AND FUTURE WORKS
We have designed and developed two novel Contention
Avoidance storage solutions, collectively known as "BID: Bulk
I/O Dispatch", in the Linux block layer, specifically to suit multi-
tenant, multi-tasking and skewed shared Big Data deployments.
Through trace-driven experiments using in-house developed
system simulators and cloud-emulating Big Data benchmarks,
we show the effectiveness of both our schemes. BID-
HDD, which is essentially a block I/O scheduling scheme for disk-
based storage, results in 28–52% less time for all I/O requests
than the best-performing Linux disk schedulers. BID-Hybrid
tries to exploit SSDs' superior random performance to further
reduce contentions at disk-based storage. BID-Hybrid is
experimentally shown to be successful in achieving 6–23%
performance gains over BID-HDD and 33–54% over the best-
performing Linux scheduling schemes.
In the future, it would be interesting to design a system
with BID schemes for block-level contention management
coupled with the self-optimizing block re-organization of BORG,
the adaptive data migration policies of ADLAM, and the replication
management of schemes such as Triple-H. This could solve the issue of
workload- and cost-aware tiering for large-scale data centers
experiencing Big Data workloads.
The broader impact of this research would aid Data
Centers in achieving their SLAs as well as keeping the TCO low.
Apart from performance improvements of storage systems,
the over-all deployment of BID schemes in data centers
would also lead to energy footprint reduction and an increase in
the lifespan expectancy of disk-based storage devices.
8. REFERENCES
1. Krish K, Wadhwa B, Iqbal MS, Rafique MM, Butt AR. On efficient hierarchical storage for big data processing. In: 2016 16th IEEE/ACM international symposium on cluster, cloud and grid computing (CCGrid). New York: IEEE; 2016. p. 403–8.
2. Nanavati M, Schwarzkopf M, Wires J, Warfield A. Non-volatile storage. Queue. 2015;13(9):20–33.
3. Mittal S, Vetter JS. A survey of software techniques for using non-volatile memories for storage and main memory systems. IEEE Trans Parallel Distrib Syst. 2016;27(5):1537–50.
4. Love R. Linux kernel development. 2010. p. 1–300. https://rlove.org/. Accessed 31 Mar 2017.
5. Avanzini A. BFQ I/O scheduler. http://ari-ava.blogspot.com/2014/06/opw-linux-block-io-layer-part-1-base.html. Accessed 15 Apr 2016.
6. Vangoor BKR, Tarasov V, Zadok E. To fuse or not to fuse: performance of user-space file systems. In: Proceedings of FAST'17: 15th USENIX conference on file and storage technologies. 2017. p. 59.
7. Aghayev A, Ts'o T, Gibson G, Desnoyers P. Evolving ext4 for shingled disks. 2017.
8. Arpaci-Dusseau RH, Arpaci-Dusseau AC. Operating systems: three easy pieces, vol. 151. 2014.
9. Moon S, Lee J, Sun X, Kee Y-S. Optimizing the Hadoop MapReduce framework with high-performance storage devices. J Supercomput. 2015;71(9):3525–48.
10. Eshghi K, Micheloni R. SSD architecture and PCI Express interface. In: Inside solid state drives (SSDs). 2013.
11. Yang Y, Zhu J. Write skew and Zipf distribution: evidence and implications. Trans Storage. 2016;12(4):21.
12. Roussos K. Storage virtualization gets smart. Queue. 2007;5(6):38–44.
13. Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters. Commun ACM. 2008;51:107–13.
14. White T. Hadoop: the definitive guide. 2012. http://hadoopbook.com/. Accessed 31 Mar 2017.
