CONTACT: PRAVEEN KUMAR. L (+91 9791938249)
MAIL ID: sunsid1989@gmail.com, praveen@nexgenproject.com
Web: www.nexgenproject.com, www.finalyear-ieeeprojects.com
Fuzzy-folded Bloom Filter-as-a-Service for Big Data Storage in the Cloud
Abstract:
With the ongoing trend of smart and Internet-connected objects being deployed across a
broad range of applications, there is a corresponding increase in the amount of data
moving across different geographical regions. This, in turn, poses a number of challenges
for big data storage across multiple locations, including cloud computing platforms. For
example, the underlying distributed file system has a large number of directories and files
in the form of gigantic trees, which are difficult to parse in polynomial time. Moreover, with
the exponential growth of (big) data streams (i.e., unbounded sets of continuous data flows),
the challenges associated with indexing and membership queries are compounded. The ability to
process such large amounts of data with high accuracy can significantly affect decision-making
and the formulation of business and risk-related strategies, particularly in the current
Industrial Internet of Things (IIoT) environment. However, existing storage solutions are
deterministic in nature; in other words, they tend to consume considerable memory and CPU
time to yield accurate results. This necessitates the design of efficient quality of service
(QoS)-aware IIoT applications that can deal with the challenges of data storage and retrieval
in the cloud computing environment. In this paper, we present a space-efficient strategy for
massive data storage using the Bloom filter (BF). Specifically, in the proposed scheme, the
standard BF is extended to incorporate a fuzzy-enabled folding approach, hereafter referred to
as the Fuzzy Folded BF (FFBF). In FFBF, fuzzy operations are used to accommodate the hashed
data of one BF in another to reduce storage requirements. Evaluations on the UCI ML AReM and
Facebook datasets demonstrate the efficacy of FFBF: it handles approximately 1.9 times more
data than the standard BF, without affecting the false positive rate or query time.
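For readers new to the structure the abstract builds on, a standard BF can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and not the FFBF implementation. The class name `BloomFilter`, the derivation of the k hash functions by salting SHA-256 with the function index, and the one-byte-per-bit array are all choices made only for this example.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Minimal standard BF: an m-cell bit array probed by k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        if m <= 0 or k <= 0:
            raise ValueError("m and k must be positive")
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, for readability over compactness

    def _indexes(self, item: str) -> Iterator[int]:
        # Illustrative choice: the i-th hash function is SHA-256 of "i:item", reduced mod m.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for idx in self._indexes(item):
            self.bits[idx] = 1

    def might_contain(self, item: str) -> bool:
        # k probes per query; no false negatives, but false positives are possible.
        return all(self.bits[idx] for idx in self._indexes(item))


bf = BloomFilter(m=1024, k=5)
bf.add("sensor-42")
print(bf.might_contain("sensor-42"))  # True: inserted items are always found
```

An inserted item is always reported as present. An item that was never inserted can still be reported as present if other items happen to have set all k of its cells. That case is the false positive the abstract refers to.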
Existing System:
A limitation of dynamic BFs is that query complexity increases as the filter grows. The initial
filter size is an important factor in dynamic BFs: a small initial array may lead to
computational overhead from slice addition and to higher query complexity, while a larger
initial size may waste memory. Further, streaming applications such as approximate caching,
duplicate detection, and membership queries require one-pass processing of data, and their
results are required within a stipulated time bound. To serve this purpose, the BF should be
small and of constant size so that it fits in the cache. To accommodate new data, some data
must then be deleted from the BF; thus, staling of data is required to manage the trade-off
between false positives and false negatives [21].
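The textbook approximation of a BF's false positive rate makes this trade-off concrete. For an m-bit filter with k hash functions holding n items, the rate is p ≈ (1 − e^(−kn/m))^k. The sketch below evaluates that formula with m and k held fixed, as a cache-resident streaming filter requires. The function names and the values of m and k are illustrative assumptions, not figures from the paper.

```python
import math


def false_positive_rate(m: int, n: int, k: int) -> float:
    """Textbook approximation (1 - e^(-k*n/m))^k for an m-bit BF holding n items."""
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_k(m: int, n: int) -> int:
    """k that minimizes the approximation above, (m/n) * ln 2, rounded to at least 1."""
    return max(1, round((m / n) * math.log(2)))


# Fixed filter size and hash count, as a cache-resident streaming filter requires.
M_BITS, K_HASHES = 10_000, 7  # illustrative values, not taken from the paper
for n_items in (500, 1_000, 2_000, 4_000):
    p = false_positive_rate(M_BITS, n_items, K_HASHES)
    print(f"n = {n_items:>5}  ->  p ~ {p:.4f}")
```

With these values, p climbs from well under 1% to well over 50% as n grows from 500 to 4,000. This is why a fixed-size filter must eventually stale older data to keep false positives bounded.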
Proposed System:
We propose a novel technique that compresses two BFs into one filter without losing any data.
The proposed approach uses fuzzy logic to store data optimally and make efficient use of the
storage capacity. Its main features are:
- Compression of two BFs into one BF using a fuzzy fold operation, so that a large number of
  elements can be accommodated in a single BF of size m.
- Slow decay of data, which allows streaming data to reside in memory for a substantial
  amount of time.
- Efficient and optimal utilization of storage space without any loss of accuracy.
- A significant reduction in computational cost by leveraging double hashing to compute the
  k hash functions.
- A false positive rate that is not affected by the compression operation.
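The list above credits part of the cost reduction to double hashing. In its widely used form, due to Kirsch and Mitzenmacher, all k indexes are derived from two base hashes as g_i(x) = (h1(x) + i·h2(x)) mod m. The sketch below assumes that form. The source does not say how FFBF obtains h1 and h2. Taking them from the two halves of one SHA-256 digest, and mapping h2 into a step in [1, m−1], are assumptions of this example only.

```python
import hashlib
from typing import List, Tuple


def base_hashes(item: str) -> Tuple[int, int]:
    """h1, h2 from the two 16-byte halves of one SHA-256 digest (an assumption)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:16], "big"), int.from_bytes(digest[16:], "big")


def double_hash_indexes(item: str, k: int, m: int) -> List[int]:
    """Double hashing: g_i(x) = (h1(x) + i * step) mod m for i = 0..k-1,
    where step is h2(x) mapped into [1, m-1]."""
    if m < 2 or k < 1:
        raise ValueError("need m >= 2 and k >= 1")
    h1, h2 = base_hashes(item)
    step = h2 % (m - 1) + 1  # keep the step in [1, m-1] so g_1 never equals g_0
    return [(h1 + i * step) % m for i in range(k)]


# Usage: k = 7 cell indexes for one item in a 1024-bit filter, from a single digest.
print(double_hash_indexes("sensor-42", k=7, m=1024))
```

Only one digest is computed per item, regardless of k. Each further index costs one addition and one modulo. This is the kind of saving the text attributes to double hashing.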
CONCLUSION:
IIoT is likely to become increasingly common in our society, particularly in critical
infrastructure sectors such as the Chemical Sector, the Commercial Facilities Sector, the
Communications Sector, the Critical Manufacturing Sector, the Dams Sector, the Defense
Industrial Base Sector, the Emergency Services Sector, the Energy Sector, the Food and
Agriculture Sector, the Government Facilities Sector, and so on. IIoT also has applications in
conflict and adversarial environments, such as the Industrial Internet of Military Things.
Hence, there is a pressing need to address existing challenges, including the one this paper
seeks to address. Specifically, our proposed filter uses a novel fuzzy-based technique to
resolve the space requirement problem of BFs. We demonstrated that the proposed approach can
accommodate a higher number of elements in the same space as the standard BF (SBF). The cost
of folding and its associated operations is almost negligible, because the proposed filter
uses only simple fuzzy operations on binary sets. The false positive rate of the compressed
representation remains the same as that of the standard BF. The computational time for
hashing is also significantly reduced by the double hashing technique, which uses only two
hash functions to generate the k hash functions. The query complexity of FFBF depends on the
number of blocks into which the BF is divided. Searching for an element in an m-sized BF and
in a compressed representation of the same size has the same complexity (i.e., O(k)).
Findings from our evaluations on both the UCI ML AReM and Facebook datasets also demonstrated
the efficiency of FFBF.
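The conclusion attributes the negligible folding cost to simple fuzzy operations on binary sets. The exact fuzzy operators FFBF uses are not given here. The sketch below assumes the standard Zadeh operators: union as max, intersection as min, and complement as 1 − x. It illustrates only why such operators are cheap on BF bit arrays, since on crisp 0/1 membership values they reduce to bitwise OR, AND and NOT. It is not the FFBF fold itself, and the example arrays are hypothetical.

```python
from typing import List, Sequence


def _require_same_length(a: Sequence[float], b: Sequence[float]) -> None:
    # zip() would silently truncate mismatched filters, so check explicitly.
    if len(a) != len(b):
        raise ValueError("membership vectors must have the same length m")


def fuzzy_union(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Zadeh fuzzy union: elementwise maximum of membership degrees."""
    _require_same_length(a, b)
    return [max(x, y) for x, y in zip(a, b)]


def fuzzy_intersection(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Zadeh fuzzy intersection: elementwise minimum of membership degrees."""
    _require_same_length(a, b)
    return [min(x, y) for x, y in zip(a, b)]


def fuzzy_complement(a: Sequence[float]) -> List[float]:
    """Zadeh fuzzy complement: 1 minus each membership degree."""
    return [1 - x for x in a]


# Two hypothetical 8-bit BF arrays (crisp sets: every degree is 0 or 1).
bf_a = [1, 0, 1, 0, 1, 0, 0, 1]
bf_b = [0, 0, 1, 1, 0, 0, 1, 1]

print(fuzzy_union(bf_a, bf_b))         # [1, 0, 1, 1, 1, 0, 1, 1], same as bitwise OR
print(fuzzy_intersection(bf_a, bf_b))  # [0, 0, 1, 0, 0, 0, 0, 1], same as bitwise AND
print(fuzzy_complement(bf_a))          # [0, 1, 0, 1, 0, 1, 1, 0], same as bitwise NOT
```

Each operator does one constant-time step per cell, so combining two m-cell filters costs O(m). A membership query still probes only k cells.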
REFERENCES
[1] A. Rajaraman and J. D. Ullman, Mining of Massive Datasets. New York, NY, USA:
Cambridge University Press, 2011.
[2] S. Al-Rubaye, E. Kadhum, Q. Ni, and A. Anpalagan, “Industrial Internet of Things
Driven by SDN Platform for Smart Grid Resiliency,” IEEE Internet of Things Journal, 2017.
[3] S. Mumtaz, A. Alsohaily, Z. Pang, A. Rayes, K. F. Tsang, and J. Rodriguez, “Massive
Internet of Things for Industrial Applications: Addressing Wireless IIoT Connectivity
Challenges and Ecosystem Fragmentation,” IEEE Industrial Electronics Magazine, vol. 11,
no. 1, pp. 28–33, 2017.
[4] L. Jiang, L. D. Xu, H. Cai, Z. Jiang, F. Bu, and B. Xu, “An IoT-Oriented Data Storage
Framework in Cloud Computing Platform,” IEEE Transactions on Industrial Informatics, vol.
10, no. 2, pp. 1443–1451, May 2014.
[5] F. Tao, J. Cheng, and Q. Qi, “IIHub: an Industrial Internet-of-Things Hub Towards Smart
Manufacturing Based on Cyber-Physical System,” IEEE Transactions on Industrial
Informatics, 2017.
[6] A. R. Sfar, E. Natalizio, Y. Challal, and Z. Chtourou, “A roadmap for security challenges
in the internet of things,” Digital Communications and Networks, 2017.
[7] “Gartner says a thirty-fold increase in internet-connected physical devices by 2020 will
significantly alter how the supply chain operates,” Gartner, Mar. 2014, [Accessed on: Oct
2017]. [Online]. Available: http://www.gartner.com/newsroom/id/2688717
[8] A. Velosa, “Internet of things — architecture remains a core opportunity and challenge: A
gartner trend insight report,” Gartner, vol. G00317007, 2017.
[9] “Big data and cloud computing-challenges and opportunities,” Big Data Made Simple,
Jun. 2017, [Accessed on: Mar. 2018]. [Online]. Available:
http://bigdata-madesimple.com/big-data-and-cloud-computing-challenges-and-opportunities/
[10] X. Liu, R. Deng, K.-K. R. Choo, Y. Yang, and H. Pang, “Privacy-preserving outsourced
calculation toolkit in the cloud,” IEEE Transactions on Dependable and Secure Computing,
2018.
[11] S. Kaisler, F. Armour, J. A. Espinosa, and W. Money, “Big data: issues and challenges
moving forward,” in System Sciences (HICSS), 2013 46th Hawaii International Conference
on. IEEE, 2013, pp. 995–1004.
[12] A. Broder and M. Mitzenmacher, “Network applications of bloom filters: A survey,”
Internet Mathematics, vol. 1, no. 4, pp. 485–509, 2004.
[13] S. Tarkoma, C. E. Rothenberg, and E. Lagerspetz, “Theory and Practice of Bloom Filters
for Distributed Systems,” IEEE Communications Surveys & Tutorials, vol. 14, no. 1,
pp. 131–155, First Quarter 2012.
[14] “What are the best applications of Bloom filters?” [Online]. Available:
https://www.quora.com/What-are-the-best-applications-of-Bloom-filters
