International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 09 | Sep -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 84
IMPLEMENTATION OF DE-DUPLICATION ALGORITHM
Nirmala Bhadrappa1, Dr. G S Mamatha2
1Department of ISE, R.V College of Engineering, Bengaluru, India
2Associate Professor, Dept. of ISE, R.V College of Engineering, Bengaluru, India
---------------------------------------------------------------------***---------------------------------------------------------------------
ABSTRACT: Duplicate data can be avoided using data de-duplication techniques, which reduce the storage space and bandwidth required for data kept in cloud storage. Keeping de-duplicated data secure, however, remains a challenge. To prevent mishandling of data in the cloud, a convergent encryption technique is used. Two problems must be addressed. First, the system must efficiently handle a very large number of convergent keys. Second, as data resources grow, so do security and privacy concerns. A third-party cloud service is proposed for data confidentiality, with reliability checking performed through both internal and external access control mechanisms. Although de-duplication improves storage utilization, bandwidth, and efficiency, it conflicts with conventional encryption, so the convergent encryption technique derives the encryption key for each piece of data from the data itself. Copies of the same data are checked for feasibility of de-duplication: convergent encryption encrypts and decrypts the data with a key that guarantees identical data maps onto itself. The key generation and data encryption steps let the client hold the key and send only the ciphertext to the cloud service provider. Because identical copies produce the same key and the same ciphertext, the provider can detect duplicates while the data remains stored securely, and only authorized users can access the information from the cloud service provider.
KEYWORDS: data de-duplication, convergent
encryption, de-duplication efficiency, SHA algorithm
1. INTRODUCTION
As the growth of enterprise data accelerates, the task of protecting and de-duplicating it becomes more challenging. Personal computing devices such as desktops, laptops, tablets, and smartphones have become significant platforms for many users, increasing the importance of the data held on them. Data may be lost through system failure, accidental deletion, or the loss or theft of a device, so people make increasing use of data protection and recovery tools on their personal computing devices.
Storage services such as Amazon S3 and Google Cloud Storage make it economical for users to store data in the cloud. Figure 1 illustrates data backup for personal storage: backups are outsourced so that clients can manage their information easily without maintaining the backup infrastructure themselves. Clouds are centralized so that they can be managed efficiently and resiliently, and they offer offsite storage for data backup, with a geographic separation between the client and the service provider. In de-duplication, redundancy is removed by generating a hash key for each file; the file is then divided into smaller parts based on the number of lines it contains. These smaller parts are called blocks, and a hash key is also computed for each block for the de-duplication check.
Fig-1: Cloud backup platform
The cloud concepts can be understood in more detail from
the below:
1.1 DE-DUPLICATION
Data de-duplication is a technique for finding duplicate data in storage. It improves bandwidth and storage utilization, and can also be applied to data transfers over a network to decrease the number of bytes sent. Data de-duplication identifies and removes data that is not unique: whenever a matching copy is found, it is replaced with a small reference. De-duplication can be performed based on either file content or file name.
Data de-duplication consists of the following steps:
Step 1: Divide the input file into blocks based on the number of lines in the file.
Step 2: Generate a hash key value for each block of the file.
Step 3: Match the generated hash value against the stored hash values; if it matches, the duplicate is removed.
Step 4: Replace the duplicate data with a reference to the object already stored in the database.
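These four steps can be sketched as a short Python routine. The two-line block size, the SHA-1 block hashing, and the in-memory `store` dictionary are illustrative assumptions, not the exact implementation used in this work.

```python
import hashlib

def split_into_blocks(text, lines_per_block=2):
    """Step 1: divide the input file into blocks based on line count."""
    lines = text.splitlines(keepends=True)
    return ["".join(lines[i:i + lines_per_block])
            for i in range(0, len(lines), lines_per_block)]

def deduplicate(text, store):
    """Steps 2-4: hash each block, store only unseen blocks, and
    represent the file as a list of references (hash keys)."""
    refs = []
    for block in split_into_blocks(text):
        key = hashlib.sha1(block.encode()).hexdigest()  # Step 2
        if key not in store:                            # Step 3
            store[key] = block
        refs.append(key)                                # Step 4
    return refs

store = {}
refs_a = deduplicate("a\nb\nc\nd\n", store)
refs_b = deduplicate("a\nb\nx\ny\n", store)  # first block duplicates file A
```

Only three unique blocks end up in the store: the shared first block of the second file is represented by a reference to the copy already held.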
1.2 CONVERGENT ENCRYPTION
Convergent encryption is also known as content hash keying. It is a cryptographic technique that produces identical ciphertext from identical plaintext. It has a distinctive application in cloud computing: removing duplicate files from storage without the provider knowing, or having access to, the users' keys. Convergent encryption does, however, enable a confirmation-of-file attack, in which an attacker can confirm whether a target possesses a specific file. This is a problem for users storing information that is publicly available or already held by the attacker; banned books or files that violate copyright are typical examples. It could be argued that such an attack is easily defeated by simply adding a unique piece of data or a few arbitrary characters to the plaintext before encryption, which would make the uploaded file, and therefore its ciphertext, unique. However, several convergent encryption schemes break the plaintext into smaller blocks, based on the content of the file, and apply convergent encryption to each block separately; this can inadvertently defeat any attempt to make the file unique by adding bytes at some position.
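A minimal convergent-encryption sketch illustrates the core property: the key is derived from the plaintext itself, so identical files yield identical keys and identical ciphertexts. SHA-256 in counter mode stands in here for a real block cipher such as AES; this is an assumption made for illustration, not a production scheme.

```python
import hashlib

def keystream(key, n):
    """Derive n pseudo-random bytes from the key (SHA-256 over a
    counter; a toy stand-in for a real block cipher)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def convergent_encrypt(plaintext: bytes):
    key = hashlib.sha256(plaintext).digest()  # key derived from the content
    ct = bytes(p ^ k
               for p, k in zip(plaintext, keystream(key, len(plaintext))))
    return key, ct

def convergent_decrypt(key, ct):
    # XOR with the same keystream inverts the encryption
    return bytes(c ^ k for c, k in zip(ct, keystream(key, len(ct))))

k1, c1 = convergent_encrypt(b"same file")
k2, c2 = convergent_encrypt(b"same file")
# identical plaintexts -> identical keys and identical ciphertexts
```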
1.3 KEY MANAGEMENT
Key management is the process of handling cryptographic keys in a cryptosystem. A cryptosystem may use different types of keys, which can be symmetric or asymmetric. In a symmetric key algorithm, the same key is used for both encryption and decryption of a message; the chosen keys must be distributed and stored securely to support duplicate detection. In an asymmetric key algorithm, two different keys are used, one for encryption and one for decryption, and each key is associated with the files it protects; once the key is found, the files are easy to manage. Key management covers the exchange, storage, and use of data together with the required keys.
2. FEATURES OF CRYPTOGRAPHY
 Confidentiality
Only authorized parties should be able to access information transmitted over the network; it must not be readable by third parties.
 Authentication
The receiver should verify the identity of the sender before accepting the information, to confirm that it comes from an authorized person and not from an attacker.
 Integrity
Only authorized parties should be permitted to modify the transmitted data; third parties must not be able to alter it.
 Non-Repudiation
It ensures that neither the sender nor the receiver of the information can later deny the transmission.
 Access Control
Only approved persons are able to access the data.
2.1 CRYPTOGRAPHIC ALGORITHM
The following are well-known cryptographic algorithms:
 DES: The "Data Encryption Standard" is a symmetric-key block cipher. It operates on 64-bit blocks of data using a 56-bit key; the algorithm and the cryptographic key are applied to a whole block of data at a time, rather than one bit at a time, to encrypt the plaintext.
 RSA: RSA was designed by Rivest, Shamir, and Adleman. It is a public-key system, i.e. an asymmetric cryptographic algorithm, used for encryption and decryption of messages.
 HASH: A hash algorithm, also called a "message digest" or "fingerprint", maps data of arbitrary size to a fixed-size value. The hash function returns the hash value, which is stored in a hash table.
 AES: The Advanced Encryption Standard, approved by NIST, uses the Rijndael block cipher.
 SHA-1: SHA-1 is a hashing algorithm that produces a digest of 160 bits (20 bytes). It is extremely unlikely that two different messages yield the same SHA-1 digest, which is why SHA-1 was long recommended over MD5 (although SHA-1 itself is no longer considered collision-resistant).
 HMAC: HMAC combines a secret key with a hash algorithm such as MD5 or SHA-1; the resulting keyed constructions are referred to as HMAC-MD5 and HMAC-SHA1.
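As an example of the last item, Python's standard library provides HMAC directly; the key and message below are hypothetical.

```python
import hashlib
import hmac

secret = b"shared-secret-key"            # hypothetical shared key
message = b"block of data to protect"

# HMAC-SHA1: a keyed digest that only a holder of the key can recompute
tag = hmac.new(secret, message, hashlib.sha1).hexdigest()

# the receiver recomputes the tag and compares in constant time
ok = hmac.compare_digest(
    tag, hmac.new(secret, message, hashlib.sha1).hexdigest())
```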
3. LITERATURE SURVEY
Data de-duplication work can be divided into two parts: de-duplication over unencrypted data and de-duplication over encrypted data. The former focuses on performing the proof-of-ownership procedure in an effective and robust way. In the latter, data privacy is the essential security requirement: data must be protected against third parties as well as against the cloud server itself. Accordingly, most proposed schemes provide data encryption while still benefiting from de-duplication, by enabling data owners to share the keys among themselves.
DATA DE-DUPLICATION OVER UNENCRYPTED DATA
Harnik et al. [1] showed how data de-duplication can be used as a side channel that reveals information to malicious clients about the contents of other clients' files. Halevi et al. [2] presented a similar attack scenario on cloud storage that uses de-duplication across users. This is because even a small piece of information about the data, namely its hash value, serves not only as an index for locating the data among countless files, but also as proof that anyone who knows the hash value owns the corresponding data. Thus, any client who obtains the short hash value for particular data can access all of that data in cloud storage. Harnik et al. [3] proposed randomized thresholds to avoid such attacks on cloud storage services that use server-side de-duplication, by selectively suppressing de-duplication. However, their technique did not use client-side proofs of data possession to prevent hash-manipulation attacks. To overcome these attacks, Halevi et al. [4] introduced and defined the notion of proof of ownership (PoW), in which the client proves to the server that it actually holds the file; a challenge-response protocol between the server and the client then verifies ownership. PoW is closely related to proofs of retrievability [5] and proofs of data possession [6]. However, proofs of retrievability and data possession typically require a pre-processing step that cannot be used in a de-duplication scheme.
DATA DE-DUPLICATION OVER ENCRYPTED DATA
In order to protect data security against the cloud server and against external adversaries, clients may want their data encrypted. However, conventional encryption under different clients' keys makes cross-client de-duplication impossible, since the cloud server will always see distinct ciphertexts even when the underlying data is the same. Convergent encryption, introduced by Douceur et al. [7], is a promising solution to this problem. In convergent encryption, a data owner derives an encryption key R <- F(N), where N is the data or document to be encrypted and F is a cryptographic hash function. The owner then computes the ciphertext T <- P(R, N) using a block cipher P, deletes N, and keeps only R after uploading T to cloud storage. If another client encrypts the same data, the same ciphertext T is produced, since the scheme is deterministic. Therefore, when T is received from other clients after the initial upload, the server does not store the file again but instead updates its metadata to record the additional owner. Any genuine owner who requests and downloads T afterwards can decrypt it with R. However, convergent encryption suffers from the security weaknesses discussed earlier, such as the confirmation-of-file attack.
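The server-side bookkeeping in the Douceur et al. scheme can be sketched as follows; the dictionary-based store and the SHA-256 ciphertext tag are illustrative assumptions, not details of the original paper.

```python
import hashlib

# toy server store: ciphertext tag -> (ciphertext, set of owners)
server = {}

def upload(owner, ciphertext):
    """Store the ciphertext once; on a repeat upload of the same
    ciphertext T, only record the additional owner in the metadata."""
    tag = hashlib.sha256(ciphertext).hexdigest()
    if tag in server:
        server[tag][1].add(owner)        # duplicate: update metadata only
    else:
        server[tag] = (ciphertext, {owner})
    return tag

t1 = upload("alice", b"ciphertext-T")
t2 = upload("bob", b"ciphertext-T")      # identical T: no second copy stored
```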
Xu et al. [8] also proposed a leakage-resilient de-duplication scheme to address the data integrity issue. The scheme enables the data owner to encrypt data with a randomly chosen key. The data encryption key is then itself encrypted under a key encryption key (KEK) derived from the data, and is delivered to the other data owners after the proof-of-ownership procedure. When a genuine owner receives a ciphertext, the integrity of the data can be verified by decrypting the data encryption key with the same KEK.
4. WORKING
The original data block is chosen to be outsourced to the CSP (cloud service provider). The file, or a block of it, may already be present in cloud storage, so before a file is uploaded the system first checks whether the file or any of its blocks already exists at the CSP.
HASH KEY GENERATION
A hash key is generated based on the file content. The tag and hash key generated are unique for each file. The key generation algorithm maps a data copy N to a convergent key R, parameterized by the security parameter. The purpose of generating hash keys is to encrypt each block of data with its own unique key.
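A sketch of the two derivations; the SHA-256/SHA-1 choices and the truncation of the key to the security parameter are assumptions made purely for illustration.

```python
import hashlib

def key_gen(data: bytes, security_bits: int = 128) -> bytes:
    """Map a data copy N to a convergent key R = H(N), truncated to the
    security parameter (an illustrative choice of hash and length)."""
    return hashlib.sha256(data).digest()[: security_bits // 8]

def tag_gen(data: bytes) -> str:
    """Per-file tag used for the duplicate check; identical files always
    receive identical tags because the tag depends only on content."""
    return hashlib.sha1(data).hexdigest()
```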
ENCRYPTION OF FILE
Convergent encryption provides data confidentiality in de-duplication. Each file is encrypted with a key derived from the file itself, so identical files produce identical ciphertexts. Convergent encryption allows everyone to use large amounts of cloud storage at very low cost while offering privacy at its core. There is, however, a privacy concern with cloud storage services that de-duplicate data via convergent encryption: de-duplication can be used to discover which users are storing a given file, or to determine whether another party holds a copy of the same file the attacker has. For example, an oppressive government could find out which users are storing copies of banned books, and the same approach could discover users storing copyrighted material, assuming the outside party has direct access to the servers. Users can bypass de-duplication with ordinary private-key encryption, forcing the cloud storage service to store a unique copy of their files. Note that convergent encryption offers no "password reset" option: if the key is forgotten, the data is lost forever. The client computes a tag for each data copy, which is used to recognize duplicates; two identical data copies yield the same tag. To check for duplication, the client first sends the tag to the server to ask whether a copy already exists. Encryption is performed using the Advanced Encryption Standard (AES) algorithm, and the encrypted file is stored at the cloud service provider together with its hash key and tag.
PROCESS OF FILE UPLOADING
When a user wants to upload a file A, file-level de-duplication is performed first. The user computes the file tag for A and sends it to the auditor, which checks whether a file with the same tag already exists at the cloud service provider. The auditor replies either that there is file duplication or that there is none. If the user is told there is no file duplication, a block-level duplication check is performed. If there is file duplication, the user runs a proof-of-ownership protocol to demonstrate possession of the same file A that is stored at the cloud service provider.
Fig-2: Architecture diagram of de-duplication
PROCESS OF DE-DUPLICATION
Once the file upload request is received, the auditor checks whether a matching tag and hash key for the file already exist at the cloud service provider. The auditor then replies to the user with either "file duplication" or "no file duplication". If the user receives "no file duplication", the process moves to the next level, block-level duplication checking. If the reply is "file duplication", the user runs proof of ownership, i.e. the cloud service provider checks whether the user is an actual owner of the file. If the proof of ownership passes, the cloud service provider simply returns a file pointer to the user, and no further data is uploaded. If the proof of ownership fails, the upload operation at the cloud service provider is terminated.
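The decision flow above can be condensed into a small sketch. The auditor's tag set and the proof-of-ownership callback are placeholders standing in for the real protocol components.

```python
import hashlib

def file_tag(data: bytes) -> str:
    """Content-derived tag used for the file-level duplicate check."""
    return hashlib.sha1(data).hexdigest()

def check_upload(data: bytes, csp_tags: set, owner_has_file) -> str:
    """Ask whether the tag exists; if not, proceed to block-level
    handling, otherwise run proof of ownership (a placeholder here)."""
    tag = file_tag(data)
    if tag not in csp_tags:
        csp_tags.add(tag)                # no duplicate: upload the file
        return "uploaded (block-level dedup next)"
    if owner_has_file(data):             # PoW passed: return a pointer
        return "pointer returned, nothing uploaded"
    return "upload terminated"           # PoW failed

csp = set()
r1 = check_upload(b"file A", csp, lambda d: True)
r2 = check_upload(b"file A", csp, lambda d: True)
r3 = check_upload(b"file A", csp, lambda d: False)
```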
PROCESS OF FILE DOWNLOADING
When a user needs to download a file, a request including the filename is first sent to the cloud service provider. On receiving the request, the cloud service provider authenticates the user using a secret key. If the user is authenticated, the file is returned in encrypted form; it can then be decrypted using the convergent key together with the corresponding tag and hash key of the file.
DECRYPTION
The user downloads the file from the cloud service provider using the hash key and tag, and decrypts it with the AES algorithm using the convergent key.
EXPERIMENTAL ANALYSIS
When a user uploads data, the encryption step was evaluated with two algorithms, DES and AES. The comparison depends on block size: DES has a 64-bit block size and AES a 128-bit block size, so the number of blocks that must be sent over the network with DES is greater than with AES. The analysis therefore shows AES to be more efficient than DES.
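The block-count comparison is simple arithmetic; for a hypothetical 1 KiB file:

```python
import math

def blocks_needed(file_bytes: int, block_bits: int) -> int:
    """Number of cipher blocks transmitted for a file of the given size."""
    return math.ceil(file_bytes * 8 / block_bits)

size = 1024                              # hypothetical 1 KiB file
des_blocks = blocks_needed(size, 64)     # DES: 64-bit block size
aes_blocks = blocks_needed(size, 128)    # AES: 128-bit block size
# DES needs twice as many blocks as AES for the same file
```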
CONCLUSION
Maintaining encrypted data with secure de-duplication is an important and significant aspect of practice at a CSP. The proposed work maintains encrypted data in the cloud with data de-duplication based on proof of ownership. It supports flexible updating of data and sharing of data under de-duplication even when users are offline. Only authorized users can access the encrypted data and obtain the symmetric keys used for decryption, so the data is well secured. The analysis of the performance tests revealed that the technique is secure, efficient, and suitable for data de-duplication, and the results of our computer simulations showed the practicability of the proposed work.
REFERENCES
[1] M. Bellare, S. Keelveedhi, and T. Ristenpart, "DupLESS: Server-Aided Encryption for Deduplicated Storage," Proceedings of the 22nd USENIX Conference on Security, 2013, pp. 179-194.
[2] Dropbox, "A File-Storage and Sharing Service," http://www.dropbox.com/.
[3] Google Drive, http://drive.google.com.
[4] Mozy, "Mozy: A File-Storage and Sharing Service," http://mozy.com/.
[5] J.R. Douceur, A. Adya, W.J. Bolosky, D. Simon, and M. Theimer, "Reclaiming Space from Duplicate Files in a Serverless Distributed File System," Proceedings of the IEEE International Conference on Distributed Computing Systems, 2002, pp. 617-624.
[6] G. Wallace, F. Douglis, H. Qian, P. Shilane, S. Smaldone, M. Chamness, and W. Hsu, "Characteristics of Backup Workloads in Production Systems," Proceedings of the USENIX Conference on File and Storage Technologies, 2012, pp. 1-16.
[7] Z.O. Wilcox, "Convergent Encryption Reconsidered," 2011, http://www.mailarchive.com/cryptography@metzdowd.com/msg08949.html.
[8] G. Ateniese, K. Fu, M. Green, and S. Hohenberger, "Improved Proxy Re-Encryption Schemes with Applications to Secure Distributed Storage," ACM Transactions on Information and System Security, 9(1), 2006, pp. 1-30.
[9] Opendedup, http://opendedup.org/.
[10] D.T. Meyer and W.J. Bolosky, "A Study of Practical De-duplication," ACM Transactions on Storage, 2012, pp. 1-20.

More Related Content

PDF
E031102034039
PDF
IRJET- Privacy Preserving Cloud Storage based on a Three Layer Security M...
PDF
A Privacy Preserving Three-Layer Cloud Storage Scheme Based On Computational ...
PDF
IRJET - Multi Authority based Integrity Auditing and Proof of Storage wit...
PDF
Paper id 27201448
PDF
PDF
Ijaems apr-2016-7 An Enhanced Multi-layered Cryptosystem Based Secure and Aut...
PDF
International Journal of Engineering and Science Invention (IJESI)
E031102034039
IRJET- Privacy Preserving Cloud Storage based on a Three Layer Security M...
A Privacy Preserving Three-Layer Cloud Storage Scheme Based On Computational ...
IRJET - Multi Authority based Integrity Auditing and Proof of Storage wit...
Paper id 27201448
Ijaems apr-2016-7 An Enhanced Multi-layered Cryptosystem Based Secure and Aut...
International Journal of Engineering and Science Invention (IJESI)

What's hot (20)

PDF
Secret keys and the packets transportation for privacy data forwarding method...
PDF
Secret keys and the packets transportation for privacy data forwarding method...
PDF
Research trends review on RSA scheme of asymmetric cryptography techniques
PDF
Revocation based De-duplication Systems for Improving Reliability in Cloud St...
PPTX
SECRY - Secure file storage on cloud using hybrid cryptography
PDF
IRJET- Secure Sharing of Personal Data on Cloud using Key Aggregation and...
PDF
Review on Key Based Encryption Scheme for Secure Data Sharing on Cloud
PDF
Secure Medical Data Computation using Virtual_ID Authentication and File Swap...
PDF
Secure Privacy Preserving Using Multilevel Trust For Cloud Storage
PDF
A Review on Key-Aggregate Cryptosystem for Climbable Knowledge Sharing in Clo...
PDF
Searchable Encryption Systems
PDF
IRJET- Comparative Analysis of Encryption Techniques
PDF
IRJET- Privacy Preserving Encrypted Keyword Search Schemes
PDF
Ieeepro techno solutions 2014 ieee java project -key-aggregate cryptosystem...
DOC
126689454 jv6
PDF
Secure Redundant Data Avoidance over Multi-Cloud Architecture.
PDF
A research paper_on_lossless_data_compre
PDF
IRJET- Adaptable Wildcard Searchable Encryption System
PDF
IRJET- Secure File Storage on Cloud using Cryptography
PDF
IRJET- Message Encryption using Hybrid Cryptography
Secret keys and the packets transportation for privacy data forwarding method...
Secret keys and the packets transportation for privacy data forwarding method...
Research trends review on RSA scheme of asymmetric cryptography techniques
Revocation based De-duplication Systems for Improving Reliability in Cloud St...
SECRY - Secure file storage on cloud using hybrid cryptography
IRJET- Secure Sharing of Personal Data on Cloud using Key Aggregation and...
Review on Key Based Encryption Scheme for Secure Data Sharing on Cloud
Secure Medical Data Computation using Virtual_ID Authentication and File Swap...
Secure Privacy Preserving Using Multilevel Trust For Cloud Storage
A Review on Key-Aggregate Cryptosystem for Climbable Knowledge Sharing in Clo...
Searchable Encryption Systems
IRJET- Comparative Analysis of Encryption Techniques
IRJET- Privacy Preserving Encrypted Keyword Search Schemes
Ieeepro techno solutions 2014 ieee java project -key-aggregate cryptosystem...
126689454 jv6
Secure Redundant Data Avoidance over Multi-Cloud Architecture.
A research paper_on_lossless_data_compre
IRJET- Adaptable Wildcard Searchable Encryption System
IRJET- Secure File Storage on Cloud using Cryptography
IRJET- Message Encryption using Hybrid Cryptography
Ad

Similar to Implementation of De-Duplication Algorithm (20)

PDF
IRJET- Secure Data Deduplication and Auditing for Cloud Data Storage
PDF
Multi-part Dynamic Key Generation For Secure Data Encryption
PDF
Methodology for Optimizing Storage on Cloud Using Authorized De-Duplication –...
PDF
An Approach towards Shuffling of Data to Avoid Tampering in Cloud
PDF
Secured Authorized Deduplication Based Hybrid Cloud
PDF
IRJET - A Novel Approach Implementing Deduplication using Message Locked Encr...
PDF
Implementation and Review Paper of Secure and Dynamic Multi Keyword Search in...
PDF
Improving Efficiency of Security in Multi-Cloud
PDF
IRJET - Confidential Image De-Duplication in Cloud Storage
PDF
IRJET - Efficient Public Key Cryptosystem for Scalable Data Sharing in Cloud ...
PDF
IRJET- Storage Security in Cloud Computing
PDF
IRJET - A Secure Access Policies based on Data Deduplication System
PDF
PDF
PDF
IRJET- Key Exchange Privacy Preserving Technique in Cloud Computing
PDF
EXPLORING WOMEN SECURITY BY DEDUPLICATION OF DATA
PDF
IRJET- Key-Aggregate Cryptosystem for Scalable Data Sharing in Cloud Storage
DOCX
Key aggregate searchable encryption (kase) for group data sharing via cloud s...
DOCX
Secure distributed deduplication systems with improved reliability
PDF
Data Sharing: Ensure Accountability Distribution in the Cloud
IRJET- Secure Data Deduplication and Auditing for Cloud Data Storage
Multi-part Dynamic Key Generation For Secure Data Encryption
Methodology for Optimizing Storage on Cloud Using Authorized De-Duplication –...
An Approach towards Shuffling of Data to Avoid Tampering in Cloud
Secured Authorized Deduplication Based Hybrid Cloud
IRJET - A Novel Approach Implementing Deduplication using Message Locked Encr...
Implementation and Review Paper of Secure and Dynamic Multi Keyword Search in...
Improving Efficiency of Security in Multi-Cloud
IRJET - Confidential Image De-Duplication in Cloud Storage
IRJET - Efficient Public Key Cryptosystem for Scalable Data Sharing in Cloud ...
IRJET- Storage Security in Cloud Computing
IRJET - A Secure Access Policies based on Data Deduplication System
IRJET- Key Exchange Privacy Preserving Technique in Cloud Computing
EXPLORING WOMEN SECURITY BY DEDUPLICATION OF DATA
IRJET- Key-Aggregate Cryptosystem for Scalable Data Sharing in Cloud Storage
Key aggregate searchable encryption (kase) for group data sharing via cloud s...
Secure distributed deduplication systems with improved reliability
Data Sharing: Ensure Accountability Distribution in the Cloud
Ad

More from IRJET Journal (20)

PDF
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
PDF
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
PDF
Kiona – A Smart Society Automation Project
PDF
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
PDF
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
PDF
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
PDF
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
PDF
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
PDF
BRAIN TUMOUR DETECTION AND CLASSIFICATION
PDF
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
PDF
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
PDF
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
PDF
Breast Cancer Detection using Computer Vision
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
Kiona – A Smart Society Automation Project
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
BRAIN TUMOUR DETECTION AND CLASSIFICATION
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
Breast Cancer Detection using Computer Vision
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...

Recently uploaded (20)

PDF
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
PDF
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
PPTX
additive manufacturing of ss316l using mig welding
PPTX
web development for engineering and engineering
PDF

Implementation of De-Duplication Algorithm

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056, p-ISSN: 2395-0072
Volume: 04 Issue: 09 | Sep 2017 | www.irjet.net
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal

IMPLEMENTATION OF DE-DUPLICATION ALGORITHM
Nirmala Bhadrappa1, Dr. G S Mamatha2
1Department of ISE, R.V College of Engineering, Bengaluru, India
2Associate Professor, Dept. of ISE, R.V College of Engineering, Bengaluru, India

ABSTRACT: Data de-duplication is a technique for avoiding the storage of duplicate data, thereby reducing storage space and network bandwidth, and it can be applied to data stored in the cloud. Keeping de-duplicated data secure remains a challenge; to prevent mishandling of data in the cloud, convergent encryption is used. Duplicated data raises two concerns. First, the system must efficiently manage a very large number of convergent keys. Second, as the volume of data grows, so do the security and privacy risks. A third-party cloud service is proposed to preserve data confidentiality, with reliability checking performed through both internal and external access-control mechanisms. De-duplication improves storage utilization and bandwidth efficiency but conflicts with conventional encryption, so convergent encryption is used instead: the encryption key is derived from the data itself, and copies of the same data can be checked for equality. Convergent encryption encrypts and decrypts the data with a key that guarantees identical data maps onto itself. The key-generation and encryption steps let the client retain the key and send only the ciphertext to the cloud service provider.
Thus the encryption technique identifies identical copies by producing the same key and the same ciphertext, so data is stored securely and only authorized users can access the information from the cloud service provider.

KEYWORDS: data de-duplication, convergent encryption, de-duplication efficiency, SHA algorithm

1. INTRODUCTION

As the volume of enterprise data accelerates, the task of protecting and de-duplicating it becomes more challenging. Personal computing devices such as desktops, laptops, tablets, and smartphones have become important platforms for many users, increasing the significance of the data stored on them. Data may be lost through system failure, accidental deletion, or the loss or theft of a device, so users increasingly rely on data-protection and recovery tools on their personal computing devices. Storage services such as Amazon S3 and Google storage make it economical for users to keep data in cloud storage. Figure 1 depicts cloud data backup for personal storage: backup is outsourced so that clients can manage their information easily without worrying about maintaining the backup themselves. Clouds are centralized, which makes them easier to manage efficiently and to protect against disasters, and they offer offsite storage for data backup. Cloud backup of personal data implies a geographic separation between the client and the service provider. In de-duplication, redundancy is removed by generating a hash key for each file; the file is then divided into smaller parts, called blocks, based on the number of lines in the file, and a hash key is also computed for each block for the duplicate check.
Fig-1: Cloud backup platform

The cloud concepts can be understood in more detail below.

1.1 DE-DUPLICATION

Data de-duplication is a technique for finding duplicate data in storage. It improves bandwidth and storage utilization, and can also be used to reduce the number of bytes and the file size transferred over the network. De-duplication identifies and removes data that is not unique: whenever a match with existing data occurs, the duplicate is replaced with a small reference. De-duplication can be performed based on file content or file name, and consists of the following steps:

Step 1: Divide the input file into blocks based on the number of lines in the file.
Step 2: Generate a hash key value for each block of the file.
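Steps 1 and 2 above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the block size of 100 lines and the choice of SHA-1 (the hash family named in the keywords) are assumptions.

```python
import hashlib

def split_into_blocks(path, lines_per_block=100):
    """Step 1: divide the input file into blocks of a fixed number of lines.
    The block size (100 lines) is an illustrative assumption."""
    with open(path, "r") as f:
        lines = f.readlines()
    return ["".join(lines[i:i + lines_per_block])
            for i in range(0, len(lines), lines_per_block)]

def block_hashes(blocks):
    """Step 2: generate a hash key (here, a SHA-1 digest) for each block."""
    return [hashlib.sha1(b.encode("utf-8")).hexdigest() for b in blocks]
```

Because the hash is computed from block content alone, two identical blocks always yield the same hash key, which is what the duplicate check in the following steps relies on.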
Step 3: Match each generated hash value against the stored hash values; if a match is found, the duplicate is removed.
Step 4: Replace the duplicate data with a reference to the object already stored in the database.

1.2 CONVERGENT ENCRYPTION

Convergent encryption, also known as content hashing, is a cryptographic scheme that produces identical ciphertext from identical plaintext. It has a natural application in cloud computing for removing duplicate files from storage even when the provider is unaware of, or has no access to, the users' keys. However, convergent encryption enables a confirmation-of-a-file attack, in which an attacker can verify whether a target possesses a specific file. This attack is a problem for users storing information that is publicly available or already held by the attacker; banned books, or files that cause copyright violations, are good examples. It could be argued that a confirmation-of-a-file attack is easily defeated by simply adding a unique piece of data or a few arbitrary characters to the plaintext before encryption, which would make the uploaded file, and therefore its encrypted form, unique. However, some convergent encryption schemes break the plaintext into small blocks based on the content of the file and apply convergent encryption to each block, which can inadvertently defeat any attempt to make the file unique by adding bytes at some location.

1.3 KEY MANAGEMENT

Key management is the method of managing cryptographic keys in a cryptosystem. A cryptosystem uses different types of keys, which may be symmetric or asymmetric.
In a symmetric-key algorithm the same key is used for both encryption and decryption of a message; the selected keys must be distributed and stored securely so that duplicates can be found. In an asymmetric-key algorithm, two different keys are used for encryption and decryption. Once the keys are established, the files are easy to manage. Key management involves the exchange, storage, and use of data with the required keys.

2. FEATURES OF CRYPTOGRAPHY

- Confidentiality: Only authorized parties should be able to access the information transmitted over the network, not any third party.
- Authentication: The receiver should verify the identity of the sender before accessing the information, i.e. whether it was sent by an authorized person or by an attacker.
- Integrity: Only authorized parties should be permitted to modify the transmitted data; a third party must not be able to modify it.
- Non-repudiation: Neither the sender nor the receiver of the information should be able to deny the transmission.
- Access control: Only approved persons are able to access the data.

2.1 CRYPTOGRAPHIC ALGORITHMS

The following are well-known cryptographic algorithms:

- DES: The Data Encryption Standard, a symmetric-key block cipher that operates on 64-bit blocks of data using a 56-bit key. The algorithm applies the cryptographic key to a whole block of data at a time, rather than one bit at a time, to encrypt the plaintext.
- RSA: Designed by Rivest, Shamir, and Adleman, RSA is a public-key (asymmetric) cryptographic algorithm used for encryption and decryption of messages.
- Hash: A hash algorithm, also called a message digest or fingerprint, maps data of arbitrary size to a fixed-size value. The hash function returns a hash value, which is stored in a hash table.
- AES: The Advanced Encryption Standard, approved by NIST, uses the Rijndael block cipher.
- SHA-1: A hashing algorithm that produces a digest of 160 bits (20 bytes). It is computationally infeasible to find two different messages with the same SHA-1 digest, so SHA-1 is recommended over MD5.
- HMAC: HMAC combines a key with a hashing algorithm such as MD5 or SHA-1, and is accordingly referred to as HMAC-MD5 or HMAC-SHA1.

3. LITERATURE SURVEY

Data de-duplication work can be divided into two parts: de-duplication over unencrypted data and de-duplication over encrypted data. The former performs the proof-of-ownership procedure in an effective and robust way. In the latter, data privacy is the essential security requirement: the data must be protected against third parties as well as against the cloud server itself. Accordingly, most of the proposed schemes provide data encryption while still benefiting from de-duplication, by enabling data owners to share the keys.
DATA DE-DUPLICATION OVER UNENCRYPTED DATA

Harnik et al. [1] showed how data de-duplication can be used as a side channel that reveals information to malicious users about the contents of other users' files. Halevi et al. [2] presented a similar attack scenario on cloud storage that uses de-duplication across multiple users. This is because a small piece of information about the data, namely its hash value, serves not only as an index for locating the data among a vast number of files, but also as proof that any individual who knows the hash value owns the corresponding data. Consequently, any user who obtains the short hash value for particular data can access all of that data in the cloud storage. Harnik et al. [3] proposed randomized thresholds to avoid attacks on cloud storage services that use server-side de-duplication, by selectively suppressing de-duplication. However, their technique does not use client-side proofs of data ownership to prevent hash-manipulation attacks. To overcome these attacks, Halevi et al. [4] introduced and formalized the notion of proof of ownership (PoW), in which the client proves to the server that it actually possesses the file; a challenge-response protocol between the server and the client then verifies ownership. PoW is closely related to proofs of retrievability [5] and proofs of data possession [6].
However, proofs of retrievability and of data possession typically use a pre-processing step that cannot be applied in a de-duplication setting.

DATA DE-DUPLICATION OVER ENCRYPTED DATA

To preserve data privacy against the cloud server and outside adversaries, users may want their data encrypted. However, conventional encryption under different users' keys makes cross-user de-duplication impossible, since the cloud server would always see distinct ciphertexts even when the underlying data are the same. Convergent encryption, introduced by Douceur et al. [7], is a promising solution to this problem. In convergent encryption, a data owner derives an encryption key R <- F(N), where N is the data or document to be encrypted and F is a cryptographic hash function. He then computes the ciphertext T <- P(R, N) with a block cipher P, deletes N, and keeps only R after uploading T to the cloud storage. If another user encrypts the same data, the same ciphertext T is produced, since the scheme is deterministic. Therefore, when T is uploaded by other users after the initial transfer, the server does not store the record again but instead updates its metadata to show that the data has an additional owner. If any genuine owner later requests and downloads T, they can decrypt it with R. However, convergent encryption suffers from the security flaws discussed above. Xu et al. [8] also proposed a leakage-resilient de-duplication scheme to address the data-integrity issue. This scheme enables the data owner to encrypt data with a randomly chosen key; the data-encryption key is then encrypted under a key-encryption key (KEK) derived from the data and distributed to the other data owners after the proof-of-ownership procedure.
If a genuine owner obtains a ciphertext, the integrity of the data can be verified by decrypting the data-encryption key with the same KEK.

4. WORKING

An original data block is chosen to be outsourced to the CSP (cloud service provider). The file, or a block of the file, may already be present in cloud storage, so before a file is uploaded the system first checks whether the file or block exists at the CSP.

HASH KEY GENERATION

A hash key is generated from the file content; the tag and hash key generated are unique for each file. The key-generation algorithm maps a data copy N to a convergent key R, based on the security parameter. The purpose of generating the hash key is to encrypt each block of data with a unique key.

ENCRYPTION OF FILE

Convergent encryption provides data confidentiality in de-duplication. The data is encrypted with a key derived from the data itself, and hence identical files produce identical ciphertexts. Convergent encryption allows each person to use large amounts of cloud storage at very low cost while offering privacy at its core. This privacy matters for cloud storage services that de-duplicate data via convergent encryption, since de-duplication can be used to discover which users are storing a given file, or to let an attacker who holds a copy of a file confirm that others store it as well. For example, an oppressive government could identify users storing copies of banned books, and the same mechanism could be used to discover users storing copyrighted material, assuming direct access to the servers is available to the outside party. Private-key encryption can be used to bypass de-duplication, forcing the cloud storage service to store a unique copy of the file. Note that convergent encryption provides no "password reset" option: if the key is forgotten, the data is lost forever.
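The convergent-encryption relation R <- F(N), T <- P(R, N) described above can be sketched as follows. This is an illustrative sketch, not the paper's implementation: SHA-256 stands in for F, and a simple hash-derived XOR keystream stands in for a real block cipher P such as AES.

```python
import hashlib

def convergent_key(data: bytes) -> bytes:
    """R <- F(N): derive the key from the data itself (here, SHA-256)."""
    return hashlib.sha256(data).digest()

def _keystream(key: bytes, length: int) -> bytes:
    # Illustrative stand-in for a real block cipher such as AES:
    # expand the key into a deterministic keystream with SHA-256.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def encrypt(data: bytes) -> tuple[bytes, bytes]:
    """T <- P(R, N): the same plaintext always yields the same ciphertext."""
    key = convergent_key(data)
    ct = bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))
    return key, ct

def decrypt(key: bytes, ct: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(ct, _keystream(key, len(ct))))
```

Determinism is the whole point: two users who independently encrypt the same file produce byte-identical ciphertexts, so the CSP can de-duplicate without ever seeing the plaintext. It is also the source of the confirmation-of-a-file attack discussed above.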
The client computes a tag for each copy of data, which is used to recognize duplicates; the tags are the same if the two data copies are identical. To check whether there is any duplication, the client first
sends the tag to the server to check whether a copy already exists. Encryption is done using the Advanced Encryption Standard algorithm, and the encrypted file is stored at the cloud service provider together with its hash key and tag.

PROCESS OF FILE UPLOADING

When a user wants to upload a file "A", file-level de-duplication is performed first. The user computes the file tag for the input file "A". Upon receiving it, the auditor checks whether a file with the same tag exists at the cloud service provider, and replies whether there is file duplication or not. If the user receives the reply that there is no file duplication, the duplicate check is repeated at block level. If there is file duplication, the user runs proof of ownership to show that it actually possesses the same file "A" stored at the cloud service provider.

Fig 2: Architecture diagram of de-duplication

PROCESS OF DE-DUPLICATION

Once the file is uploaded, the auditor checks whether a matching tag and hash key for the corresponding file exist at the cloud service provider, and replies to the user whether there is file duplication or not. If the user receives the response "no file duplication", the check moves to the next level, block-level duplication. If the reply is "file duplication", the user runs proof of ownership, i.e. the cloud service provider checks for the actual owner of the file. If proof of ownership passes, the cloud service provider simply returns a file pointer to the user, and no further data is uploaded. If proof of ownership fails, the upload operation at the cloud service provider is terminated.
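The upload flow above, from tag computation through the duplicate check, can be sketched as follows. The `Auditor` class, its in-memory tag store, and the file-level-only check are illustrative assumptions; the paper's system also performs block-level checks and a full proof-of-ownership protocol.

```python
import hashlib

def file_tag(data: bytes) -> str:
    """Tag computed from file content; identical files yield identical tags."""
    return hashlib.sha1(data).hexdigest()

class Auditor:
    """Minimal sketch of the duplicate check against the CSP's stored tags."""

    def __init__(self):
        self.store = {}  # tag -> stored ciphertext (stands in for CSP storage)

    def has_duplicate(self, tag: str) -> bool:
        return tag in self.store

    def upload(self, tag: str, ciphertext: bytes) -> str:
        if self.has_duplicate(tag):
            # Proof of ownership would run here; on success the CSP
            # just returns a pointer to the existing file, uploading nothing.
            return "file pointer (no new upload)"
        self.store[tag] = ciphertext
        return "uploaded"
```

A second user uploading the same file triggers the duplicate branch, so only one copy is ever stored, which is the storage saving de-duplication provides.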
PROCESS OF FILE DOWNLOADING

When a user needs to download a file, a request with the filename is first sent to the cloud service provider. Upon receiving the request, the cloud service provider authenticates the user using the secret key. If the user is authenticated, the file is delivered in encrypted form and can be decrypted using the convergent key together with the corresponding tag and hash key of the file.

DECRYPTION

The user downloads the file from the cloud service provider using the hash key and tag, and decrypts it with the corresponding decryption algorithm.

EXPERIMENTAL ANALYSIS

When a user uploads data with an encryption algorithm, the two encryption algorithms DES and AES are compared on the basis of block size. DES has a 64-bit block size and AES a 128-bit block size, so the number of blocks that must be sent over the network with DES is greater than with AES. The analysis shows AES to be more efficient than DES.

CONCLUSION

Maintaining encrypted data with secure de-duplication is an important and significant aspect of practice at a CSP. The proposed work maintains encrypted data in the cloud with data de-duplication based on proof of ownership. It supports flexible updating of data and data sharing with de-duplication even when users are offline. Only an authorized user can access the encrypted data and obtain the symmetric keys used for decryption, so the scheme is highly secure. The analysis of the performance and the tests conducted revealed that the technique is secure, efficient,
and suitable for data de-duplication. The results of our computer simulations showed the practicability of the proposed work.

REFERENCES

[1] M. Bellare, S. Keelveedhi, and T. Ristenpart, "DupLESS: Server-Aided Encryption for Deduplicated Storage," Proceedings of the 22nd USENIX Conference on Security, 2013, pp. 179-194.
[2] Dropbox, "A File-Storage and Sharing Service," http://www.dropbox.com/.
[3] Google Drive, http://drive.google.com.
[4] Mozy, "Mozy: A File-Storage and Sharing Service," http://mozy.com/.
[5] J.R. Douceur, A. Adya, W.J. Bolosky, D. Simon, and M. Theimer, "Reclaiming Space from Duplicate Files in a Serverless Distributed File System," Proceedings of the IEEE International Conference on Distributed Computing Systems, 2002, pp. 617-624.
[6] G. Wallace, F. Douglis, H. Qian, P. Shilane, S. Smaldone, M. Chamness, and W. Hsu, "Characteristics of Backup Workloads in Production Systems," Proceedings of the USENIX Conference on File and Storage Technologies, 2012, pp. 1-16.
[7] Z.O. Wilcox, "Convergent Encryption Reconsidered," 2011, http://www.mailarchive.com/cryptography@metzdowd.com/msg08949.html.
[8] G. Ateniese, K. Fu, M. Green, and S. Hohenberger, "Improved Proxy Re-Encryption Schemes with Applications to Secure Distributed Storage," ACM Transactions on Information and System Security, 9(1), 2006, pp. 1-30.
[9] Opendedup, http://opendedup.org/.
[10] D.T. Meyer and W.J. Bolosky, "A Study of Practical De-duplication," ACM Transactions on Storage, 2012, pp. 1-20.