International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 12 | Dec-2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 1054
Cloud Computing Ambiance Using Secluded Access Control Method
Ms. A. Sivasankari1, Ms. P. Bhuvana2, Ms. Arunkumari G3
1Head of the Department (CS), Dept of Computer Science and Applications, D.K.M. College for Women
(Autonomous), Vellore, Tamilnadu, India.
2Dept of Computer Science and Applications, D.K.M. College for Women (Autonomous), Vellore, Tamilnadu, India.
3Assistant Professor, Dept of Computer Science and Applications, D.K.M. College for Women (Autonomous),
Vellore, Tamilnadu, India.
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract – Cloud computing has considerably reduced the
computational and storage costs of outsourced data.
Existing access control techniques offer users access
provisions based on common user attributes such as role,
which limits fine-grained access control. We propose the
Storage Correctness and Fine-grained Access Provision
(SCFAP) scheme, which provides the user exclusive access
through the use of a hierarchical structure that is a
combination of users' unique and common attributes. Also,
we deploy the concept of a token granting system that
allows users to verify the correctness of
outsourced data without retrieving the respective files.
The tokens are derived from metadata containing file
positions, which aids the storage-correctness verification
process and improves storage efficiency. The
experimental results show SCFAP has superior storage
efficiency and error recovery measures compared to existing techniques.
Keywords: Access control, access structure, barrier
limits, storage efficiency, token granting system.
1. INTRODUCTION
Data centers today cater to a wide spectrum of applications,
with workloads varying from data science batch and
streaming applications to decoding genome sequences. Each
application can have different syntax and semantics,
with varying I/O needs from storage. With highly
sophisticated and optimized data processing frameworks,
such as Hadoop and Spark, applications are capable of
processing large amounts of data at the same time.
Dedicating physical resources for every application is not
economically feasible. In cloud environments, with the aid of
server and storage virtualization, multiple processes
contend for the same physical resources (namely, compute,
network and storage), which causes contentions. In order to meet
their service level agreements (SLAs), cloud providers need
to ensure performance isolation guarantees for every
application.
With multi-core computing capabilities, CPUs have scaled to
accommodate the needs of "Big Data", but storage still
remains a bottleneck. The physical media characteristics and
interface technology are mostly blamed for storage being
slow, but this is only partially true. The full potential of storage
devices cannot be harnessed until all the layers of the I/O
hierarchy function efficiently.
Performance of storage devices depends on the order in
which the data is stored and accessed. Therefore, in large
scale distributed systems (“cloud”), data management plays
a vital role in processing and storing petabytes of data among
hundreds of thousands of storage devices. The problems
associated with inefficiencies in data management
get amplified in multi-tasking, shared Big Data
environments.
Despite advanced optimizations applied across various
layers along the odyssey of data access, the I/O stack still
remains volatile. The Linux OS (host) block layer is the most
critical part of the I/O hierarchy as it orchestrates the I/O
requests from different applications to the underlying
storage. The key to the performance of the block layer is the
block I/O scheduler, which is responsible for dividing the
I/O bandwidth amongst the contending processes as well as
determining the order of requests sent to the storage device.
Unfortunately, despite its significance, the block layer,
essentially the block I/O scheduler, has not evolved to meet
the volume and contention-resolution needs of data centers
experiencing Big Data workloads. We have designed and
developed two Contention Avoidance Storage solutions in
the Linux block layer, collectively known as “BID: Bulk I/O
Dispatch”, specifically to suit multi-tenant, multitasking Big
Data shared resource environments. Big Data applications
use data processing frameworks such as Hadoop Map
Reduce, which access storage in large data chunks (64 MB
HDFS blocks), therefore exhibiting evident sequentiality. Due to
contentions amongst concurrent I/O-submitting processes
and the working of the current I/O schedulers, the inherent
sequentiality of Big Data processes is lost. The processes may
be instances of the same application or belong to other
applications. These contentions result in unwanted effects such as
multiplexing and interleaving, thereby breaking up large
data accesses and increasing storage latency. In the first
solution, we propose a dynamically adaptable Block I/O
scheduling scheme BID-HDD, for disk based storage.
BID-HDD tries to recreate the sequentiality in I/O access in order to
provide performance isolation to each I/O-submitting
process. Through trace-driven, simulation-based experiments
with cloud-emulating Map Reduce benchmarks, we show that
BID-HDD results in a 28–52% I/O time performance
gain for all I/O requests over the best-performing Linux disk
schedulers.
With recent developments in NVMe (non-volatile memory)
devices such as solid state drives (SSDs), commonly known
as storage class memories (SCMs), with supporting
infrastructure and virtualization techniques, a hybrid
approach of using heterogeneous tiers of storage together,
such as HDDs and SSDs coupled with workload-
aware tiering to balance cost, performance and capacity, has
become increasingly popular. In the second part, we propose
a novel hybrid scheme BID-Hybrid to exploit SCM’s (SSDs)
superior random performance to further avoid contentions
at disk based storage. The main goal of BID-Hybrid is to
further enhance the performance of BID-HDD scheduling
scheme by diverting interruption-causing, non-bulky I/Os to SSD,
thereby making the "HDD request queue" available for bulky
and sequential I/Os.
Contrary to the existing tiering literature, where data is
tiered based on the deviation of adjacent disk block locations in
the device "request queue", BID-Hybrid profiles process I/O
characteristics (bulkiness) to decide on the correct
candidates for tiering. Current approaches might cause
unnecessary migrations to SSDs due to I/Os from an
application which might be sequential but appear random
due to contention from other applications submitting I/O
to the "request queue". BID-Hybrid, in contrast, uses staging
capabilities and anticipation time to make judicious and verified
decisions. BID-Hybrid serves I/Os from bulky processes on
HDD and tiers I/Os from non-bulky (lighter), interruption-
causing processes to SSD.
BID-Hybrid successfully achieves its objective of
further reducing contention at the disk-based storage device.
BID-Hybrid results in performance gains of 6–23% for Map
Reduce workloads over BID-HDD and 33–54% over the best-
performing Linux scheduling schemes.
2. BACKGROUND
The "Working and workload characteristics"
and "Requirements from a block I/O scheduler in Big Data
deployments" sections discuss the I/O workload
characteristics of Hadoop deployments and the
requirements from an I/O scheduler in such environments,
respectively. The "Issues with current I/O schedulers" section
describes the working of the current state-of-the-art Linux
disk schedulers deployed in shared Big Data infrastructure.
3. WORKING AND WORKLOAD
CHARACTERISTICS
Hadoop Map Reduce is the de facto large-scale data processing
framework for Big Data. Hadoop is a multi-tasking system
which can process multiple data sets for multi-jobs in a
multi-user environment at the same time. Hadoop uses a
block-structured file system, known as Hadoop Distributed
File System (HDFS). HDFS splits the stored files into fixed
size (generally 64 MB/128 MB) file system blocks, known as
chunks, which are usually tri-replicated across the storage
nodes for fault tolerance and performance. Hadoop is
designed in such a way that the processes access the data in
chunks. When a process opens a file, it reads/writes in
multiples of these chunks. Enterprise Hadoop workloads
have highly skewed characteristics, making profiling
difficult, with the "hot" data being very large. Thus, the effect of
file system caching is negligible in HDFS. Most of the data
access is done from the underlying disk (or solid state) based
storage devices. Therefore, a single chunk causes multiple
page faults, which eventually would result in creation and
submission of thousands of I/O requests to the block layer
for further processing before dispatching them to the
physical storage.
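As a rough back-of-the-envelope illustration of this fan-out (a sketch only; the 4 KB page size and 512 KB maximum request size are common defaults assumed here, not values stated in this paper), consider the following Python estimate:

# Estimate how one HDFS chunk fans out into page faults and
# block-layer requests. Page/request sizes are assumed defaults.
CHUNK_MB = 64        # HDFS chunk size, per the text
PAGE_KB = 4          # typical OS page size (assumption)
MAX_REQ_KB = 512     # typical max merged request size (assumption)

page_faults = CHUNK_MB * 1024 // PAGE_KB       # 16384 pages per chunk
min_requests = CHUNK_MB * 1024 // MAX_REQ_KB   # 128 requests if fully merged
print(page_faults, min_requests)

Even in the best case of full merging, a single chunk still generates over a hundred block-layer requests; at page granularity, it generates thousands, consistent with the text.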
Each Map Reduce application consists of multiple processes
submitting I/Os concurrently, possibly in different
interleaving stages, i.e. Map and Reduce, each having skewed I/O
requirements. Moreover, these applications run on multi-
tenant infrastructure which is shared by a wide variety of such
applications, each having different syntax and semantics.
For Big Data multi-processing environments, although the
requests from each concurrent process result in a large
number of sequential disk accesses, they face contention at
the storage interface from other applications. These
contentions are resolved by the OS Block Layer, more
essentially the I/O scheduler. The inherently sequential
operations of applications become non-sequential due to
the working of the current disk I/O schedulers, which
thereby results in unwanted effects like multiplexing and
interleaving of requests; this also results in higher CPU
wait/idle time, as the CPU has to wait for the data. In order to pro-
vide performance isolation to each process as well as
improve system performance, it is imperative to remove or
avoid contentions. The "Issues with current I/O schedulers"
section describes the working of the current state-of-the-art
Linux disk schedulers deployed in shared Big Data
infrastructure. In the next section, we discuss the
requirements of a block I/O scheduler most suited for
Hadoop deployments.
4. REQUIREMENTS FROM A BLOCK I/O
SCHEDULER IN BIG DATA DEPLOYMENTS
The key requirements from a block I/O scheduler in
multiprocess shared Big Data environments,suchasHadoop
Map Reduce are as follows:
Capitalize on large I/O access: Data is accessed in large
data chunks (64/128 MB in HDFS), which have a high degree
of sequentiality on the storage media. The I/O scheduler should be
able to capitalize on large I/O access and should not break
these large sequential requests.
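A minimal sketch of the coalescing behavior this requirement implies, where logically contiguous requests are back-merged rather than dispatched separately (the function and sector layout are illustrative, not taken from any named scheduler):

# Coalesce (start_sector, n_sectors) requests that are contiguous, so a
# large sequential access is dispatched as few requests as possible.
def coalesce(requests):
    merged = []
    for start, length in sorted(requests):
        if merged and merged[-1][0] + merged[-1][1] == start:
            # Back-merge: extend the previous request instead of adding one.
            merged[-1] = (merged[-1][0], merged[-1][1] + length)
        else:
            merged.append((start, length))
    return merged

# A 64 MB chunk issued as 4 KB (8-sector) pieces collapses into one request:
reqs = [(i * 8, 8) for i in range(16384)]
assert coalesce(reqs) == [(0, 131072)]

When requests from several processes interleave, the contiguity test fails and the large access fragments, which is exactly the breakage the BID schemes try to avoid.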
Adaptive: Multiple CPUs (or applications) try to access the
same storage media in a shared infrastructure, which causes
skewed workload patterns. Additionally, each Map Reduce
task itself has varying and interleaving I/O characteristics in
its Map and Reduce phases. Therefore, it is imperative for an
I/O scheduler to dynamically adapt to such skewed and
changing I/O patterns.
Performance isolation: In-order to meet the SLAs, it is
highly imperative to provide I/O performance isolation for
each application. For example, a single Map Reduce application
consists of multiple tasks, each consisting of multiple
processes, each having different I/O requirements.
Therefore, an I/O scheduler through process-level
segregation should ensure I/O resource isolation to every
I/O contending process.
Regular I/O scheduler features: reducing CPU wait/idle
time by serving blocking I/Os (reads) quickly; avoiding
starvation of any request; and reducing disk arm
movements.
Issues with current I/O schedulers
Since version 2.6.33, Linux employs three disk I/O
schedulers, namely Noop, Deadline and Completely Fair
Queuing (CFQ). As observed in the "Linux I/O stack" section, the
main functionalities of the block I/O scheduler are as
follows: lifecycle management of the block I/O "requests"
(which may consist of multiple BIO structures) in the
"request queue", and moving requests from the "request queue"
to the "dispatch queue". The dispatch queue is the sequence of
requests ready to be sent to the block device driver.
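A toy model of these two responsibilities, with Noop-like FIFO ordering in the dispatch step (class and method names are invented for illustration; real schedulers reorder and merge inside dispatch):

from collections import deque

class ToyBlockScheduler:
    # Requests enter the request queue and are later moved, in
    # scheduling order, to the dispatch queue feeding the driver.
    def __init__(self):
        self.request_queue = deque()
        self.dispatch_queue = deque()

    def submit(self, pid, sector, nsectors):
        # Lifecycle step 1: a request (possibly built from several
        # BIOs) is queued in the request queue.
        self.request_queue.append((pid, sector, nsectors))

    def dispatch(self):
        # Lifecycle step 2: Noop keeps arrival (FIFO) order;
        # Deadline/CFQ would reorder requests here.
        while self.request_queue:
            self.dispatch_queue.append(self.request_queue.popleft())
        return list(self.dispatch_queue)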
HDDs form the backbone of data center storage. The effect
of caching is negligible in an enterprise Big Data
environment. Therefore large numbers of page faults occur,
which in turn result in most of the data accesses from the
underlying storage. Hence, it is imperative to tune the data
management software stack to harness the complete
potential of the physical media in highly skewed and
multiplexing Big Data deployments. The block layer is the
most performance critical component to resolve disk I/O
contentions along the odyssey of I/O path.
Unfortunately, despite its significance in orchestrating the
I/O requests, the block layer essentially the I/O Scheduler
has not evolved much to meet the needs of Big Data.
We have designed and developed two ContentionAvoidance
Storage solutions, collectively known as “BID: Bulk I/O
Dispatch” in the Linux block layer specifically to suit multi-
tenant, multitasking shared Big Data environments. In the
first part of this section, we propose a Block I/O scheduling
scheme BID-HDD for disk based storage. BID-HDD tries to
recreate the sequentiality in I/O access in order to provide
performance isolation to each I/O submitting process.
In the second part, we propose a hybrid scheme BID-Hybrid
to exploit SCM’s (SSDs) superior random performance to
further avoid contentions at disk-based storage. In the hybrid
approach, dynamic process-level profiling in the block layer
is done to decide the candidates for tiering to SSD. Therefore,
I/O blocks belonging to interruption-causing processes are tiered
to SSD, while bulky I/Os are served by the HDD. The BID-HDD
scheduling scheme is used for disk request processing, and a
multi-q FIFO architecture is used for SSD I/O request processing.
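The routing idea can be sketched as follows (a simplification under assumed names and a made-up bulkiness threshold; the real BID-Hybrid profiling is richer, using staging and anticipation time):

BULKY_SECTORS = 1024            # bulkiness threshold (assumption)
hdd_queue, ssd_queue = [], []   # HDD served by BID-HDD, SSD by multi-q FIFO
profile = {}                    # pid -> total sectors submitted so far

def route(pid, sector, nsectors):
    # Profile each process's I/O volume in the block layer, then tier
    # light, interruption-causing processes to SSD and keep bulky,
    # sequential streams on the HDD request queue.
    profile[pid] = profile.get(pid, 0) + nsectors
    if profile[pid] >= BULKY_SECTORS:
        hdd_queue.append((pid, sector, nsectors))
    else:
        ssd_queue.append((pid, sector, nsectors))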
BID schemes are designed taking into consideration the
requirements laid out earlier in “Requirements from a block
I/O scheduler in Big Data deployments” section. BID as a
whole is aimed at avoiding contentions for storage I/Os
while following system constraints without compromising the
SLAs.
BID-HDD aims to avoid multiplexing of I/O requests from
different processes running concurrently. To achieve this, we
segregate the I/O requests from each process into
containers. The idea is to introduce a dynamically adaptable,
need-based anticipation time for each process, i.e. a "time
to wait for the adjoining I/O request". This allows coalescing of
bulky data accesses and avoids starvation of any request.
Each process container has a wait timer, based on inter
arrival time of requests and deadline associated with it.
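A compact sketch of such a per-process container, with an anticipation window adapted from observed inter-arrival times and a deadline bound (the timer constants and exponential-average weighting are illustrative assumptions, not the paper's tuned values):

import time

class ProcessContainer:
    # One container per I/O-submitting process. The container batches
    # the process's requests, waits an adaptive anticipation time for
    # the adjoining request, and a deadline bounds the wait so no
    # request starves.
    def __init__(self, pid, deadline=0.5):
        self.pid, self.requests = pid, []
        self.deadline = time.monotonic() + deadline
        self.last_arrival = time.monotonic()
        self.inter_arrival = 0.01   # initial guess (assumption)

    def add(self, request):
        now = time.monotonic()
        # Adapt the anticipation window to observed inter-arrival times.
        self.inter_arrival = 0.5 * self.inter_arrival + 0.5 * (now - self.last_arrival)
        self.last_arrival = now
        self.requests.append(request)

    def ready_to_dispatch(self):
        now = time.monotonic()
        # Dispatch the whole batch once the deadline expires or no
        # adjoining request arrived within the anticipation window.
        return now >= self.deadline or (now - self.last_arrival) > self.inter_arrival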
Due to the physical limitations of HDDs, there have been recent
efforts to incorporate flash-based, high-speed, non-volatile secondary
memory devices, known as SCMs, in data centers.
Despite superior random performance of SCMs (or SSDs)
over HDDs, completely replacing disks with SCMs in data
center deployments does not seem feasible, economically
as well as due to other associated issues.
With recent developments in NVMe devices, with supporting
infrastructure and virtualization techniques, a hybrid
approach of using heterogeneous tiers of storage together,
such as HDDs and SSDs coupled with workload-
aware tiering to balance cost, performance and capacity, has
become increasingly popular. Data centers consist of many
tiers of storage devices. All storage devices of the same type
form a tier. For example: all HDDs across the data-center
form the HDD tier and all SSDs form the SSD tier, and similarly for
other SCMs. Based on profiling of workloads and the balanced utility
value of data usage, data is managed between the tiers of
storage for improved performance.
Workload-aware storage tiering, or simply tiering, is the automatic
classification of how data is managed between
heterogeneous tiers of storage in enterprise data center
environments. It is vital to develop automated and dynamic
tiering solutions that utilize all the tiers of storage. BID-Hybrid
aims to deliver the capability of dynamic and judicious
automated tiering in the block layer as a software-defined storage (SDS) solution.
5. RELATED WORKS
The domain of storage technologies has been an active field
of research. More recently, there has been a research
inclination towards developing both the software and the
physical architecture of NVMe devices, referred to as SCMs, to meet
the SLAs of Big Data. We broadly classify the literature in our
focus into:
(a) Block layer developments, mostly I/O Scheduling, and
(b) Multi-tier storage environment. Table 4 mentions state-
of-the-art solutions in both these classifications.
6. BLOCK LAYER DEVELOPMENTS, MOSTLY I/O
SCHEDULING
In this section, we discuss the developments in the block
layer, concentrating mostly on I/O Scheduling. I/O
Scheduling has been around since the beginning of disk
drives, though we will limit our discussion to those
approaches which are relevant to recent developments.
Despite advanced optimizations applied across various
layers along the odyssey of data access, the Linux I/O stack
still remains volatile. The block layer has not evolved to cater to the
requirements of Big Data.
One of the major findings was in establishing relationships
between performance and the block I/O scheduler. Our work on
BID-HDD is an effort in this domain, especially for rotation-based
recording drives. BID is essentially a contention avoidance
technique which can be modeled to cater to different objective
functions (storage media type, performance characteristics,
etc.). Prior work provides a brief overview of the Linux block layer,
basic I/O units, request queue processing, etc. AD proposes a
framework which studies the VM interference in Hadoop
virtualized environments with the execution of a single Map
Reduce job with several disk pair schedulers. It divides the
Map Reduce job into phases and executes a series of
experiments using a heuristic to choose a disk pair scheduler
for the next phase in a VM Environment. BORG is a self-
optimizing HDD based solution which reorganizes blocks in
the block layer by forming sequences via calculating
correlation amongst LBA ranges with connectivity based on
frequency distribution and temporal locality. It builds
weighted graphs, and relocation of blocks happens to the most-
needed vertex first. The goal is to service most requests from
dedicated zones of an HDD.
Multi-q is an important piece of work which extends the
capabilities of the block layer for utilizing internal
parallelism of SSDs to enable fast computation for multicore
systems.
It proposes changes to the existing OS block layer with
support for multiple software and hardware queues for a
single storage device. Multi-q involves a software queue per
CPU core.
A similar lock contention scheme can be used for BID, as it also
involves multiple queues. CFFQ is an SSD extension of the CFQ
scheduler in which each process has a FIFO request queue
and the I/O bandwidth is fairly distributed in round robin
fashion. SLASSD and Kim et al. propose to ensure diverse
SLAs, including reservations, limitations, and proportional
sharing, through their I/O scheduling schemes in shared VM
environments for SSDs. While SLASSD uses an opportunistic,
goal-oriented block I/O scheduling algorithm, Kim et al.
propose host-level SSD I/O schedulers, which are
extensions of the state-of-the-art I/O scheduling scheme CFQ.
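For concreteness, the CFFQ-style fair sharing described above can be modeled as per-process FIFO queues served in round-robin order (a sketch; names and values are illustrative):

from collections import deque
from itertools import cycle

# Two processes with per-process FIFO queues; bandwidth is shared by
# dispatching one request per process per round-robin turn.
queues = {1: deque(["a1", "a2"]), 2: deque(["b1"])}
turn = cycle(list(queues))
dispatched = []
while any(queues.values()):
    pid = next(turn)
    if queues[pid]:
        dispatched.append(queues[pid].popleft())
print(dispatched)   # ['a1', 'b1', 'a2']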
In Big Data cloud deployments, due to the highly skewed, non-
uniform and multiplexing workloads, prediction of the utility
value of blocks for tiering based on the heat of data might not be a
viable option.
7. CONCLUSION AND FUTURE WORKS
We have designed and developed two novel Contention
Avoidance storage solutions, collectively known as "BID: Bulk
I/O Dispatch", in the Linux block layer, specifically to suit multi-
tenant, multi-tasking and skewed shared Big Data deployments.
Through trace-driven experiments using in-house developed
system simulators and cloud-emulating Big Data benchmarks,
we show the effectiveness of both our schemes. BID-
HDD, which is essentially a block I/O scheduling scheme for disk-
based storage, results in 28–52% less time for all I/O requests
than the best-performing Linux disk schedulers. BID-Hybrid
tries to exploit SSDs' superior random performance to further
reduce contentions at disk-based storage. BID-Hybrid is
experimentally shown to be successful in achieving 6–23%
performance gains over BID-HDD and 33–54% over the best-
performing Linux scheduling schemes.
In the future, it would be interesting to design a system
with BID schemes for block-level contention management
coupled with the self-optimizing block re-organization of BORG,
the adaptive data migration policies of ADLAM, and the replication
management of schemes such as Triple-H. This could solve the issue of
workload- and cost-aware tiering for large-scale data centers
experiencing Big Data workloads.
The broader impact of this research would aid Data
Centers in achieving their SLAs as well as keeping the TCO low.
Apart from performance improvements of storage systems,
the over-all deployment of BID schemes in data centers
would also lead to energy footprint reduction and an increase in
the lifespan expectancy of disk-based storage devices.
8. REFERENCES
1. Krish K, Wadhwa B, Iqbal MS, Rafique MM, Butt AR. On efficient hierarchical storage for big data processing. In: 2016 16th IEEE/ACM international symposium on cluster, cloud and grid computing (CCGrid). New York: IEEE; 2016. p. 403–8.
2. Nanavati M, Schwarzkopf M, Wires J, Warfield A. Non-volatile storage. Queue. 2015;13(9):20–33.
3. Mittal S, Vetter JS. A survey of software techniques for using non-volatile memories for storage and main memory systems. IEEE Trans Parallel Distrib Syst. 2016;27(5):1537–50.
4. Love R. Linux kernel development. 2010. p. 1–300. https://rlove.org/. Accessed 31 Mar 2017.
5. Avanzini A. BFQ I/O scheduler. http://ari-ava.blogspot.com/2014/06/opw-linux-block-io-layer-part-1-base.html. Accessed 15 Apr 2016.
6. Vangoor BKR, Tarasov V, Zadok E. To fuse or not to fuse: performance of user-space file systems. In: Proceedings of FAST'17: 15th USENIX conference on file and storage technologies. 2017. p. 59.
7. Aghayev A, Ts'o T, Gibson G, Desnoyers P. Evolving ext4 for shingled disks. 2017.
8. Arpaci-Dusseau RH, Arpaci-Dusseau AC. Operating systems: three easy pieces, vol. 151. 2014.
9. Moon S, Lee J, Sun X, Kee Y-S. Optimizing the Hadoop MapReduce framework with high-performance storage devices. J Supercomput. 2015;71(9):3525–48.
10. Eshghi K, Micheloni R. SSD architecture and PCI Express interface. In: Inside solid state drives (SSDs). 2013.
11. Yang Y, Zhu J. Write skew and Zipf distribution: evidence and implications. Trans Storage. 2016;12(4):21.
12. Roussos K. Storage virtualization gets smart. Queue. 2007;5(6):38–44.
13. Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters. Commun ACM. 2008;51:107–13.
14. White T. Hadoop: the definitive guide. 2012. http://hadoopbook.com/. Accessed 31 Mar 2017.
