See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/342946040
Big Data Analytics for IoT
Article in INTERNATIONAL JOURNAL OF ADVANCED RESEARCH IN ENGINEERING & TECHNOLOGY · July 2020
DOI: 10.34218/IJARET.11.6.2020.054
1 author:
Preeti Gulia
Maharshi Dayanand University
All content following this page was uploaded by Preeti Gulia on 15 July 2020.
http://www.iaeme.com/IJARET/index.asp 593 editor@iaeme.com
International Journal of Advanced Research in Engineering and Technology (IJARET)
Volume 11, Issue 6, June 2020, pp. 593-603, Article ID: IJARET_11_06_054
Available online at http://www.iaeme.com/IJARET/issues.asp?JType=IJARET&VType=11&IType=6
ISSN Print: 0976-6480 and ISSN Online: 0976-6499
DOI: 10.34218/IJARET.11.6.2020.054
© IAEME Publication Scopus Indexed
BIG DATA ANALYTICS FOR IOT
Preeti Gulia
Assistant Professor, Department of Computer Science and Application,
MDU, Rohtak, Haryana, India
Ayushi Chahal
Research Scholar, Department of Computer Science and Application,
MDU, Rohtak, Haryana, India
ABSTRACT
The Internet has helped technology and communication grow very fast, which has in turn
increased the connections between machines and sensor-based devices. This connection of
machines or devices through the Internet gives rise to the concept of the Internet of Things
(IoT). Wearable devices such as smart-watches, cars, and home appliances such as washing
machines, doors, door locks, and lights are now connected over the Internet of Things. These
sensor devices produce Big data in bulk every day. This data can be analyzed to solve
different day-to-day problems. This paper discusses different Big data tools and techniques
that can be used for IoT frameworks. It also presents a way in which Big Data can be used to
analyze IoT data sets intelligently. Different platforms for Big Data Analytics are explained
in detail, and light is shed on which of them is best suited to IoT data.
Keywords: Big data, Frameworks, Internet of Things (IoT), Architecture, Big Data
Analytics (BDA)
Cite this Article: Preeti Gulia and Ayushi Chahal, Big Data Analytics for IoT,
International Journal of Advanced Research in Engineering and Technology (IJARET),
11(6), 2020, pp. 593-603.
http://www.iaeme.com/IJARET/issues.asp?JType=IJARET&VType=11&IType=6
1. INTRODUCTION
Big Data is developing briskly, and so is IoT [31]. This recent advancement is affecting all
areas of business and technology. Data produced by IoT devices plays an essential role in the
conversion of raw data to knowledge. This conversion is done by applying the correct methods
of big data analytics to the raw data. Gartner has characterized Big Data by three qualities [1],
i.e., volume, variety, and velocity, which are discussed in detail in section 1.2 of the paper.
IoT collects data in different forms and from different sources, which is why its data is called
heterogeneous [2]. IoT can collect data from healthcare industries, smart homes, smart traffic
management, airplane systems, railway systems, weather forecasting systems, agricultural
sensors, and many more sources, as shown in figure 1. IoT data is generally unstructured, with
no fixed pattern. By applying the right big data analytical techniques, one can find hidden
patterns, new information, and hidden correlations, reveal trends, etc. [3], [36], [50] in this
unstructured data.
1.1. IoT
When anyone and anything can be connected at any time, in any place, through any service and
any network, the result is the Internet of Things (IoT). Researchers give different
definitions and architectures for IoT.
IoT is a system of interrelated things or machines (computing, mechanical, or digital
devices) that can connect to one another without human intervention. It is a
Machine to Machine (M2M) communication process.
Figure 1 Data sources for IoT
Different researchers have proposed different forms of architecture. The most basic IoT
architecture, shown in fig 2, consists of three layers [5]: the Application Layer, the Network
Layer, and the Perception Layer.
Figure 2 Three-layer architecture of IoT [29]
• Perception Layer: The bottom-most layer is called the Perception Layer. It is used for data
collection. [32]
• Network Layer: This middle layer sets up the connection between the perception and
application layers. [26]
• Application Layer: This layer provides services and analyzes the information
received from the other two layers.
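As a rough illustration of how the three layers hand data to one another, the following Python sketch models each layer as a function. The sensor names, readings, and function names are hypothetical and are not taken from [5] or [29]; in a real deployment the network layer would be an actual transport rather than a list copy.

```python
from statistics import mean


def perception_layer():
    """Perception layer: collect raw readings from (simulated) sensors."""
    return [
        {"sensor": "temp-1", "value": 21.5},
        {"sensor": "temp-2", "value": 22.5},
        {"sensor": "temp-1", "value": 23.5},
    ]


def network_layer(readings):
    """Network layer: carry readings from the perception layer upward.

    The 'transport' here is only a copy of the list (an assumption made
    to keep the sketch self-contained).
    """
    return list(readings)


def application_layer(readings):
    """Application layer: analyze the received data (average per sensor)."""
    by_sensor = {}
    for reading in readings:
        by_sensor.setdefault(reading["sensor"], []).append(reading["value"])
    return {name: mean(values) for name, values in by_sensor.items()}


report = application_layer(network_layer(perception_layer()))
print(report)
```

Each function only sees the output of the layer below it, which mirrors the bottom-up flow of the layered architecture.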
Fig 3 shows a different kind of IoT architecture, inspired by [5]. In this five-layer
architecture, gateway and middleware layers are added to the previous architecture.
The layers are, from top to bottom: the Application Layer, the Middleware Layer, the Network
Layer, the Access Gateway Layer, and the Perception Layer.
Figure 3 Five-layered IoT architecture [5]
• Perception Layer: This layer is also called the edge layer. [29]
• Access Gateway Layer: This layer manages the conveying of messages or data
between IoT devices. [5]
• Network Layer: This layer works in the same way as the access gateway layer above. It also
helps to convey messages between senders and receivers in IoT systems.
• Middleware Layer: This layer provides a connection between the hardware and
different software. It helps in setting up a flexible association between hardware and its
applications. [5]
• Application Layer: This layer provides the same services as the application layer of the
three-layer architecture. It sits on top of all other layers and is used to analyze all the
information provided by the layers below it.
1.2. Big Data
Nowadays, the big data held by companies that provide internet services is proliferating.
For example, Google handles hundreds of Petabytes (PB) of data, Facebook logs around
10 PB of data per month, Baidu analyzes and processes 10 PB of data, and there are many
more such cases. [27]
In the IoT model, sensors all around the world collect and transmit data. These
sensors generate ever-growing data, which tends to form a vast heterogeneous dataset.
This data needs to be stored and processed in such a way that its quality is not compromised.
To maintain the quantity and mutual relations of such extensive data, existing IT enterprises
have to improve their architectures and infrastructures. [7]
New mining, analyzing, modeling, visualizing, and forecasting technologies are needed
to reveal the intrinsic properties of this heterogeneous data and improve
decision making [33-35]. For a fuller discussion and definition of the term big data, let us
look at the V’s model. Doug Laney, an analyst at META (presently Gartner), presented the
3V’s model in early 2001; it described the challenges and opportunities created by the large
amount of data generated by sensors [27]. Later, in 2011, as the big data area advanced, IDC
came up with a four V’s model. With further advancements, researchers have arrived at the
10 V’s of Big Data. The 10 V’s model consists of the following [28]:
Figure 4 10 V’s model of Big Data [28]
• Volume: It is the most crucial V in the V’s model. It describes the sheer amount of data.
With the rise in data generation devices, vast and diverse data is being generated. [5]
Traditional data processors and techniques cannot handle such a large amount of
heterogeneous data, so enhanced techniques are required to process it.
• Velocity: Velocity represents the rate at which big data arrives from various devices, that
is, the speed at which machines generate data over the network. It is an essential factor
of big data. One of the most common examples of data generation speed is social media,
which also creates a variety of data: people constantly post their latest updates (tweets,
Instagram posts, WhatsApp status updates, etc.).
• Variety: By definition, Big data is a large amount of heterogeneous data,
so variety is an essential property of big data. Data generation devices now produce
different kinds of data (structured, semi-structured, or unstructured). Sometimes the
collected data arrives in a format other than the one expected, and this can cause
trouble during data processing. To avoid this, an organization should have a data storage
system that can examine and process any form of data irrespective of its structure. [5]
• Value: Continuous data generation tends to create Big Data, but this data is of
no use unless it has some value. Thus the value of data is an
essential factor of big data. Big data analytics, which has become an integral
part of society, is based on the valuable data that different devices provide to the
analyst or data scientist. Big data does not always have value.
• Veracity: Veracity does not refer to the quantity of data. It concerns the
understandability of the data that Big data provides to its users. Any organization working
with a large amount of data should remove “dirty data” before it accumulates in its
systems.
• Validity: Data must be precise and accurate if it is to be used in the future. An
organization that wants to make correct decisions based on the data collected by its devices
should validate that data. Validity is therefore considered an essential factor of big data.
• Variability: Variability includes data consistency and value of data.
• Viscosity: Viscosity is considered a part of velocity. It describes the delay
or lag time that occurs between the sender and receiver during data transmission. [5]
• Virality: It describes data speed. This property tracks the speed with
which senders and receivers access data from different devices.
• Visualization: This property concerns the symbolic (graphical) representation of big data.
Visualization helps to reveal hidden patterns, which in turn support decision making for any
query on big data.
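Some of the V’s above translate directly into simple checks on incoming data. The sketch below, in Python, illustrates veracity and validity as a filter that drops “dirty” readings, and viscosity as the lag between sending and receiving. The field names, the sample readings, and the plausible range of -40 to 85 are assumptions chosen for illustration only.

```python
def is_valid(reading, low=-40.0, high=85.0):
    """Validity: the value must be a number inside a plausible range.

    The range limits are hypothetical defaults, not values from the paper.
    """
    value = reading.get("value")
    return isinstance(value, (int, float)) and low <= value <= high


def clean(readings):
    """Veracity: drop 'dirty' readings before they accumulate downstream."""
    return [r for r in readings if is_valid(r)]


def lag_seconds(reading):
    """Viscosity: delay between sending and receiving a reading."""
    return reading["received_at"] - reading["sent_at"]


readings = [
    {"value": 21.0, "sent_at": 100.0, "received_at": 100.5},
    {"value": None, "sent_at": 101.0, "received_at": 101.25},   # missing value
    {"value": 999.0, "sent_at": 102.0, "received_at": 102.5},   # out of range
    {"value": 22.0, "sent_at": 103.0, "received_at": 104.0},
]

good = clean(readings)
lags = [lag_seconds(r) for r in good]
print(len(good), lags)
```

Filtering before storage keeps invalid records out of later analysis, which is the point made under Veracity.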
Figure 5 Interrelation between big data and IoT
Handling such a massive amount of data requires reliable software systems.
Software testing plays a crucial role in ensuring the quality of the software [37-49].
2. INTEGRATION OF BIG DATA AND IOT
In the current lifestyle, everything is merged with technology. IoT has been emerging rapidly
in many industries. IoT consists of devices that collect data and, with the help of this data,
connect with the real world. This data is useful because it can help to solve many
research problems in one way or another. Various big data analytical tools
and techniques can be used to analyze it. IoT and Big Data are considered two sides of the
same coin. Figure 5 shows the interrelation between IoT and Big data analytics.
2.1. IoT and Big Data Analytics relationship
IoT data differs considerably from standard data because various sensors and objects are
involved in its collection. IoT data is heterogeneous: it involves noise and variety
and grows rapidly [8]. It is estimated that by 2020 IoT devices will have generated 4.4
trillion data items around us. These devices also collect, monitor, transmit, analyze, and
share real-time data, which changes every millisecond. [25]
Here Big Data Analytics plays a vital role in handling such redundant, heterogeneous, and
fluctuating data [4]. Big data technology is used to store this vast amount of data with
different storage techniques and then to analyze it for particular outcomes.
Various studies generalize that big IoT data has three features that make it
fit the big data paradigm:
i. It comes from an abundant number of terminals, which generate massive raw data.
ii. Raw data generated by IoT devices can be in any form, but generally it is
unstructured. [30]
iii. Raw data generated by IoT devices is useless if not examined.
2.2. Steps for IoT Big Data Processing
To manage IoT Big data, the process is broadly classified into five steps, described below [24]:
i. The first step is to manage the different data sources of IoT, i.e., IoT sensor devices, where
sensors in a device interact with each other with the help of different applications and
generate unstructured, semi-structured, or structured data.
ii. In the second step, the data generated by the different IoT devices, called Big IoT data, is
collected and stored by the Big data storage system. This data is based on the 3V model
given by Gartner.
iii. In the Big data storage system, this IoT data is converted into shared and distributed Big
data files.
iv. After that, analytical tools such as Hadoop, Map-Reduce, or Spark, which are discussed in
the next section, are applied to analyze the data.
v. In the last step, a report corresponding to the input data is generated and presented
to the user.
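The five steps can be sketched as a small Python pipeline with one function per step. This is only an illustration: the device names and values are invented, a plain list stands in for the Big data storage system, and a per-device maximum stands in for the analysis that a tool such as Hadoop or Spark would perform.

```python
def collect(sources):
    """Step i: gather raw records from every IoT data source."""
    return [record for source in sources for record in source]


def store(records):
    """Step ii: keep the collected Big IoT data (a list stands in for storage)."""
    return list(records)


def partition(stored, n_parts):
    """Step iii: split the stored data into n_parts distributed chunks."""
    parts = [[] for _ in range(n_parts)]
    for i, record in enumerate(stored):
        parts[i % n_parts].append(record)
    return parts


def analyze(parts):
    """Step iv: run an analysis over all chunks (maximum value per device)."""
    peak = {}
    for part in parts:
        for device, value in part:
            peak[device] = max(value, peak.get(device, value))
    return peak


def report(result):
    """Step v: present the result to the user as text."""
    return "\n".join(f"{device}: {value}" for device, value in sorted(result.items()))


sources = [
    [("thermostat", 20), ("thermostat", 24)],
    [("door-lock", 1), ("thermostat", 22)],
]
result = analyze(partition(store(collect(sources)), n_parts=2))
summary = report(result)
print(summary)
```

Keeping each step as a separate function makes it possible to swap any one of them, for example replacing `analyze` with a call into a real analytics platform, without touching the others.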
3. DIFFERENT PLATFORMS OF BIG DATA ANALYTICS FOR IOT
Big Data Analytics needs tools and techniques to transform structured, semi-structured,
and unstructured IoT data into metadata or a comprehensible form for further analysis.
These tools use algorithms that discover patterns, correlations, and trends in various
forms of data. [9]
After analyzing the data, these tools are also used to visualize the outcomes in the form
of graphs, tables, pie charts, bar charts, etc. This section discusses various platforms that
can analyze IoT data. The big data analytics platforms are described below [12]:
3.1. Apache Hadoop
Apache Hadoop is an open-source platform. It is used to store large volumes of raw
data and to perform Big data Analytics on it. The standard framework consists of Apache Hive,
the Hadoop kernel, Map-Reduce, and HDFS (Hadoop Distributed File System).
Hadoop contains libraries that use a simple programming model. HDFS stores the data,
while Map-Reduce processes this data in a distributed manner. The combination of HDFS and
the Map-Reduce framework allows data to be replicated and distributed across N different
nodes. [10]
Hadoop is based on two kinds of node: master and slave. The master node divides
the problem into sub-problems, which are then distributed to different slave
nodes. The master node then collects the output of all the sub-problems from the slaves.
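The divide, distribute, and collect pattern described above can be sketched in plain Python. The code below does not use the Hadoop API and runs everything in one process; the record layout and function names are assumptions, and on a real cluster each `map_task` call would run on a separate slave node.

```python
from collections import Counter
from functools import reduce


def split(records, n_workers):
    """Master: divide the problem into one sub-problem per worker."""
    return [records[i::n_workers] for i in range(n_workers)]


def map_task(chunk):
    """Worker: count readings per sensor within its own chunk."""
    return Counter(sensor for sensor, _ in chunk)


def reduce_task(partials):
    """Master: merge the partial counts returned by the workers."""
    return reduce(lambda a, b: a + b, partials, Counter())


records = [("s1", 20), ("s2", 31), ("s1", 22), ("s3", 5), ("s1", 19), ("s2", 30)]
chunks = split(records, n_workers=3)
totals = reduce_task(map_task(chunk) for chunk in chunks)
print(dict(totals))
```

Because each chunk is counted independently, the map step needs no coordination between workers; only the final merge happens on the master.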
Figure 6 Architecture of Hadoop [11]
3.2. Apache Spark
Like Apache Hadoop, it is open source, but it is designed to overcome the limitations of Map-
Reduce while retaining fault tolerance and linear scalability. It provides high speed, ease of
use, and sophisticated analytics. Figure 7 shows the architectural diagram of Apache Spark.
Spark’s libraries unify graph analysis and ETL. It also provides real-time analysis.
3.3. Dryad
It expresses computation as a data flow graph for parallel as well as distributed data sets. A
user can use multiple machines at a time without knowing concurrent programming. It handles
faults in the cluster, generates job graphs, schedules available free machines for allotment,
and visualizes jobs. [21]
Figure 7 Architecture of Spark [13]
3.4. Apache Drill
It is used in distributed systems for Big IoT data analytics. It supports many query
languages and can scale to thousands of servers. It uses HDFS for storage and Map-
Reduce for analysis. [22]
3.5. Storm
It is used for processing large volumes of real-time data in a distributed and fault-tolerant
way. A Storm cluster is similar to a Hadoop cluster and likewise consists of a master node
and worker nodes.
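To give a feel for real-time processing, the sketch below uses Python generators to emit a moving average as each reading arrives. The names `spout` and `bolt` borrow Storm's terms for a stream source and a processing step, but the code does not use the Storm API, and the window size and readings are assumptions.

```python
from collections import deque


def spout(values):
    """Stream source: emits one reading at a time."""
    for value in values:
        yield value


def moving_average_bolt(stream, window=3):
    """Processing step: after each new reading, emit the average of the
    most recent `window` readings seen so far."""
    recent = deque(maxlen=window)
    for value in stream:
        recent.append(value)
        yield sum(recent) / len(recent)


averages = list(moving_average_bolt(spout([10, 20, 30, 40]), window=3))
print(averages)
```

Unlike a batch job, a result is produced for every incoming reading, without waiting for the whole data set to be available.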
3.6. Splunk
It is a combination of Big data and cloud technology. It uses a web interface to allow the user
to analyze, search, and monitor the data. It helps to index structured and unstructured data
generated by machines. Hence, it is useful for IoT Big data-sets. It is an intelligent support
system for real-time and business-oriented data exploration. [23]
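Indexing machine-generated data so that it can be searched can be illustrated with a small inverted index in Python. This is a generic sketch of the idea, not Splunk's implementation or API; the log lines and the whitespace tokenization are assumptions.

```python
from collections import defaultdict


def build_index(lines):
    """Map each lowercase token to the set of line numbers containing it."""
    index = defaultdict(set)
    for number, line in enumerate(lines):
        for token in line.lower().split():
            index[token].add(number)
    return index


def search(index, *terms):
    """Return the sorted line numbers that contain every search term."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*sets)) if sets else []


logs = [
    "sensor=door-1 status=open",
    "sensor=lamp-2 status=on",
    "sensor=door-1 status=closed",
]
index = build_index(logs)
hits = search(index, "sensor=door-1")
open_hits = search(index, "sensor=door-1", "status=open")
print(hits, open_hits)
```

Once the index is built, a search only touches the entries for the requested terms instead of rescanning every log line.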
3.7. Jaspersoft
It is an open-source tool used for real-time data analysis. It visualizes data held in various
storage platforms such as MongoDB, Cassandra, and Redis. It can create powerful HTML reports.
3.8. Apache Mahout
It is open-source data analytics software, so it requires no paid license. It is used for
machine learning and implements different machine learning methods. Large companies such as
Google, Yahoo, Amazon, IBM, Twitter, and Facebook use it to implement scalable machine learning
algorithms.
3.9. 1010data
It is built on a columnar database. It deals with semi-structured data and supports
enormous-scale infrastructure. It is not considered adequate for extracting, transforming,
and loading data (ETL). It also provides advanced analytical services, including statistical
analysis and optimization. [14]
3.10. Cloudera Data Hub
It works as a data hub for enterprises. It is used for data analytics and data processing,
specifically for IoT-based data. It uses Hadoop as the base for analytics and can serve
as a central point for extensive IoT-based data analysis. It provides reliability, data access
control, high performance, and security. It does not have its own hardware, so it depends on
third parties for processing. [15]
3.11. SAP-Hana
It uses in-memory processing to handle transactions for big IoT data analytics. It offers
solutions for various kinds of big unstructured IoT data. SAP-Hana contains libraries for
spatial processing and text analysis, and it supports the R language. [16]
3.12. HP-HAVEn
HP introduced Hadoop Autonomy Vertica Enterprise (HAVEn). A large number of HP systems
use this platform for Big IoT data analytics. It is meant for massive data, which is analyzed in
a columnar database. It provides parallel processing. [17]
3.13. Hortonworks
Hortonworks is a Hadoop-based platform used for Big IoT data analytics. It is open-source
software and an improved version of Hive. It cannot minimize the number of nodes in a group.
[18]
3.14. Pivotal Big Data Suite
It is installed, tested, and implemented on a public cloud. It is given as a single license. Pivotal
helps in massive parallel processing. It can perform predictive analytics on IoT data, but this
data should be kept in HDFS. [19]
3.15. Infobright
It is suitable for the analysis of machine-generated data such as IoT data. It can analyze up to
50 TB at a time. It works with large-scale data systems such as Hadoop. It is a columnar
tool with data skipping and automatic indexing. [20]
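Data skipping on a column can be illustrated with a short Python sketch: each block of values keeps its minimum and maximum, and a query consults those summaries to avoid reading blocks that cannot contain a match. This is a generic illustration of the idea and not Infobright's actual engine; the block size, temperatures, and function names are assumptions.

```python
def build_blocks(values, block_size):
    """Split a column into blocks and record each block's min and max."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append({"min": min(chunk), "max": max(chunk), "rows": chunk})
    return blocks


def count_greater_than(blocks, threshold):
    """Count values above threshold; return (matches, blocks_visited)."""
    matches, visited = 0, 0
    for block in blocks:
        if block["max"] <= threshold:
            continue  # no row in this block can match, so skip it entirely
        visited += 1
        if block["min"] > threshold:
            matches += len(block["rows"])  # every row matches, no row scan
        else:
            matches += sum(1 for v in block["rows"] if v > threshold)
    return matches, visited


temperatures = [18, 19, 20, 21, 30, 31, 32, 33, 22, 23, 35, 24]
blocks = build_blocks(temperatures, block_size=4)
matches, visited = count_greater_than(blocks, threshold=29)
print(matches, visited)
```

In this example the first block is skipped outright and the second is answered from its summary alone, so only the third block needs a row-by-row scan.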
4. CONCLUSION
IoT has now become a significant source of Big Data, which is useless if not analyzed properly.
This paper focuses on Big Data in the context of the Internet of Things. It describes the basic
concepts of IoT and its architecture. It elaborates Gartner’s 3 V’s model of
Big Data into the 10 V’s model. The paper helps the reader understand
the relation between IoT, Big data, and analytics. It familiarizes the reader with different Big
Data Analytics platforms that can handle various IoT datasets. After reading this paper, a reader
will be aware of the different platforms and will be able to select one for a particular problem.
REFERENCES
[1] M. Beyer, ``Gartner says solving `Big Data' challenge involves more than just managing
volumes of data,'' Tech. Rep., AaltoDoc, Aalto Univ., 2011.
[2] R. Mital, J. Coughlin, and M. Canaday, ``Using big data technologies and analytics to predict
sensor anomalies,'' in Proc. Adv. Maui Opt. Space Surveill. Technol. Conf., Sep. 2014, p. 84.
[3] N. Golchha, ``Big data-the information revolution,'' Int. J. Adv. Res., vol. 1, no. 12, pp. 791-794,
2015.
[4] Y. Wang, L. Kung, W. Y. C. Wang, and C. G. Cegielski, “An integrated big data analytics-
enabled transformation model: Application to health care,” Inf. Manage., vol. 55, no. 1, pp. 64–
79, Jan. 2018.
[5] R. Khan, S. Khan, R. Zaheer & S. Khan, “Future Internet: The internet of things architecture,
possible applications, and key challenges,” In Proceedings of international conference on
frontiers of information technology, pp. 257-260, 2012.
[6] A. Ilapakurti, J. S. Vuppalapati, S. Kedari, S. Kedari, C. Chauhan, and C. Vuppalapati,
“iDispenser - Big Data Enabled Intelligent Dispenser,” in 2017 IEEE Third International
Conference on Big Data Computing Service and Applications (BigDataService), pp. 124–130,
2017.
[7] Y. Wang, L. Kung, and T. A. Byrd, “Big data analytics: Understanding its capabilities and
potential benefits for healthcare organizations,” Technol. Forecast. Soc. Change, vol. 126, pp.
3–13, Jan. 2018.
[8] M. Marjani, F. Nasaruddin, A. Gani, A. Karim, I.A.T. Hashem, A. Siddiqa, I. Yaqoob “Big IoT
Data Analytics: Architecture, Opportunities, and Open Research Challenges,” IEEE Access, vol.
5, pp. 5247–5261, 2017.
[9] E. Ahmed, Ibrar Yaqoob, Ibrahim Abaker Targio Hashem, Imran Khan, Abdelmuttlib Ibrahim
Abdalla Ahmed, Muhammad Imran, Athanasios V. Vasilakos, “The role of big data analytics
in Internet of Things,” Computer Networks, vol. 129, pp. 459–471, Dec. 2017.
[10] T. O. Center: Introducción a Hadoop y su ecosistema.
http://www.ticout.com/blog/2013/04/02/introduccion-a-Hadoop-y-su-ecosistema/
[11] Acharjya, D.P., Ahmed, K., “A survey on Big Data analytics: challenges, open research issues,
and tools.” in Int. J. Adv. Comput. Sci. Appl. Vol.7, issue 2, pp. No.- 511–518, 2016.
[12] F. Constante Nicolalde, F. Silva, B. Herrera, and A. Pereira, “Big Data Analytics in IoT:
Challenges, Open Research Issues and Tools,” in Trends and Advances in Information Systems
and Technologies, Cham, 2018, pp. 775–788.
[13] A. S. Foundation: Spark 0.8.0: This document gives a short overview of how Spark runs on
clusters, to make it easier to understand the components involved, 2014,
https://spark.apache.org/docs/0.8.0/cluster-overview.html
[14] V. Morabito, “Managing change for big data driven innovation,” in Big Data and Analytics.
Springer, 2015, pp. 125–153.
[15] A. Bhardwaj, S. Bhattacherjee, A. Chavan, A. Deshpande, A. J. Elmore, S. Madden, and A. G.
Parameswaran, “Datahub: Collaborative data science & dataset version management at scale,”
arXiv preprint arXiv:1409.0798, 2014.
[16] F. Farber, S. K. Cha, J. Primsch, C. Bornhövd, S. Sigg, and W. Lehner, “Sap hana database:
data management for modern business applications,” ACM Sigmod Record, vol. 40, no. 4, pp.
45–51, 2012.
[17] S. Burke, “Hp haven big data platform is gaining partner momentum,” CRN [online]
http://www.crn.com/news/applications-os/240161649, 2013.
[18] (2019, Accessed on 3rd December) Hortonworks. [Online]. Available:
https://hortonworks.com/
[19] Y. Zhuang, Y.Wang, J. Shao, L. Chen, W. Lu, J. Sun, B.Wei, and J. Wu, “D-ocean: an
unstructured data management system for data ocean environment,” Frontiers of Computer
Science, vol. 10, no. 2, pp. 353–369, 2016. [Online]. Available: http://dx.doi.org/10.1007/s11704-
015-5045-6
[20] D. Slezak, P. Synak, J. Wróblewski, and G. Toppin, “Infobright analytic database engine using
rough sets and granular computing,” in Granular Computing (GrC), 2010 IEEE International
Conference on. IEEE, 2010, pp. 432–437.
[21] Isard, M., Budiu, M., Yu, Y., Birrell, A., Fetterly, D. Dryad, “distributed data-parallel programs
from sequential building blocks” in ACM SIGOPS Oper. Syst. Rev. 41, pp. No.- 59–72, 2007.
[22] Kelly, J.: Apache Drill Brings SQL-Like, Ad Hoc Query Capabilities to Big Data (2013).
http://wikibon.org/wiki/v/Apache_Drill_Brings_SQL-Like,_Ad_Hoc_Query_Capabilities_to_Big_Data
[23] C.L.P., Chen, C.Y. Zhang, “Data-intensive applications, challenges, techniques, and
technologies: a survey on Big Data.” In Inf. Sci. 275, pp. no. -314–347, 2014.
[24] G. Ingersoll, “Introducing apache mahout: Scalable, commercial-friendly machine learning for
building intelligent applications,” White Paper, IBM Developer Works, pp. no. - 1- 8, 2009.
[25] A. Verma, “Internet of Things and Big Data - Better Together,” Whizlabs Blog, 01-Aug-2018.
[Online]. Available: https://www.whizlabs.com/blog/iot-and-big-data/. [Accessed: 11-Mar-2020].
[26] “Integrating IoT with Big Data, a Revolutionary Step,” Experfy Insights. [Online]. Available:
https://www.experfy.com/blog/integrating-iot-with-big-data-a-revolutionary-step. [Accessed: 11-Mar-
2020].
[27] C.-W. Tsai, C.-F. Lai and A. V. Vasilakos, “Future Internet of Things: open issues and
challenges,” Wireless Netw, vol. 20, no. 8, pp. 2201–2217, Nov. 2014, DOI: 10.1007/s11276-014-
0731-0.
[28] M. Chen, S. Mao, and Y. Liu, “Big Data: A Survey,” Mobile Netw Appl, vol. 19, no. 2, pp. 171–
209, Apr. 2014, DOI: 10.1007/s11036-013-0489-0.
[29] G. Manogaran, D. Lopez, C. Thota, K. M. Abbas, S. Pyne, and R. Sundarasekar, “Big Data
Analytics in Healthcare Internet of Things,” in Innovative Healthcare Systems for the 21st
Century, H. Qudrat-Ullah and P. Tsasis, Eds. Cham: Springer International Publishing, 2017,
pp. 263–284.
[30] F. Alshohoumi, M. Sarrab, A. AlHamadani, and D. Al-Abri, “Systematic Review of Existing
IoT Architectures Security and Privacy Issues and Concerns,” International Journal of
Advanced Computer Science and Applications (IJACSA), vol. 10, no. 7, 57/31 2019, DOI:
10.14569/IJACSA.2019.0100733.
[31] “LNCS Titles published in 2015,” springer.com.
http://www.springer.com/computer/lncs?SGWID=4-164-66-653429-0 (accessed May 03, 2020).
[32] M. Mittal, V. E. Balas, L. M. Goyal, and R. Kumar, Eds., Big Data Processing Using Spark in
Cloud. Springer Singapore, 2019.
[33] S. Tanwar, S. Tyagi, and N. Kumar, Eds., Multimedia Big Data Computing for IoT Applications:
Concepts, Paradigms, and Solutions. Springer Singapore, 2020.
[34] A. Dhankhar, K. Solanki, A. Rathee and Ashish, “Predicting Student’s Performance by using
Classification Methods,” International Journal of advanced trends in computer science and
engineering, Volume 8 No. 4, 2019.
[35] A. Dhankhar and K. Solanki, State of the Art of Learning Analytics in Higher Education,
International journal of emerging trends in engineering research, Vol. 8 No. 3, pp. 868-877,
2020.
[36] M. Hooda and C. Rana, Learning Analytics Lens: Improving Quality of Higher Education,
International journal of emerging trends in engineering research, Vol. 8 No. 5, pp. 1626-1646,
2020.
[37] A. Dhankhar and K. Solanki, A Comprehensive Review of Tools & Techniques for Big Data
Analytics, International journal of emerging trends in engineering research, Vol. 7 No. 11, pp.
556-562, 2019.
[38] O. Dahiya and K. Solanki, S. Dalal, A. Dhankhar, Regression Testing: Analysis of its
Techniques for Test Effectiveness, International Journal of advanced trends in computer science
and engineering, Vol. 9, No. 1, pp. 737-744, 2020.
[39] O. Dahiya and K. Solanki, Comprehensive cognizance of Regression Test Case Prioritization
Techniques, International journal of emerging trends in engineering research, Vol. 7 No. 11, pp.
638-646, 2019.
[40] O. Dahiya and K. Solanki, S. Dalal, A. Dhankhar, An Exploratory Retrospective Assessment on
the Usage of Bio-Inspired Computing Algorithms for Optimization, International journal of
emerging trends in engineering research, Vol. 8 No. 2, pp. 414-434, 2020.
[41] O. Dahiya and K. Solanki, and A. Dhankhar, Risk-Based Testing: Identifying, Assessing,
Mitigating & Managing Risks Efficiently In Software Testing, International Journal of advanced
research in engineering and technology (IJARET), Vol. 11, Issue 3, pp. 192-203, 2020.
[42] O. Dahiya, and K. Solanki, A systematic literature study of regression test case prioritization
approaches, International Journal of Engineering & Technology, 7(4), pp.2184-2191, 2018.
[43] O. Dahiya, K. Solanki and S. dalal, Comparative Analysis of Regression Test Case Prioritization
Techniques, International Journal of advanced trends in computer science and engineering, Vol.
8 No. 4, pp. 1521-1531, 2019.
[44] K. Solanki, Y. Singh, and S. Dalal, “Experimental analysis of m-ACO technique for regression
testing,” Indian Journal of Science and Technology, 9(30), pp.1-7.
[45] K. Solanki, and S. Kumari, “Comparative study of software clone detection techniques.”
In 2016 Management and Innovation Technology International Conference (MITicon), pp.
MIT-152, IEEE, 2016.
[46] Shivani Yadav and Bal Kishan, “Reliability of Component-Based Systems – A Review”,
International Journal of Advanced Trends in Computer Science and Engineering, vol. 8, no. 2,
pp. 293-299, 2019. doi: doi.org/10.30534/ijatcse/2019/31822019
[47] Shivani Yadav and Bal Kishan, “Assessment of software quality models to measure the
effectiveness of software quality parameters for Component Based Software (CBS)”, Journal of
Applied Science and Computations, vol. 6, no. 4, pp. 2751-2756, 2019.
[48] S. Yadav and B. Kishan, “Analysis and Assessment of Existing Software Quality Models to
Predict the Reliability of Component-Based Software”, International journal of emerging trends
in engineering research, vol. 8, no. 6, 2020. [In Press]
[49] P. Gulia and Palak, “Nature-inspired soft computing-based software testing techniques for
reusable software components” Journal of Theoretical & Applied Information Technology,
95(24), 2017.
[50] P. Gulia, and Palak, “Hybrid swarm and GA based approach for software test case selection.”
International Journal of Electrical & Computer Engineering, pp. 2088-8708, Issue-9, 2019.
[51] R. Ratra, and P. Gulia, “Big Data Tools and Techniques: A Roadmap for Predictive Analytics.”,
International Journal of Engineering and Advanced Technology (IJEAT), Vol. 9, Issue-2, pp.
4986-4992, 2019.
[52] K. Vikram, Ch.Aparna, Harshitha.B and Ishpreet Kaur, A Secure and Certifiable Access
Mechanism System Designed For Big Data Storage In Clouds. International Journal of
Computer Engineering & Technology, 9(2), 2018, pp. 86–90.
[53] Azhagammal Alagarsamy and Dr. K. Ruba Soundar, A Survey Paper on Deep Belief Network
for Big Data. International Journal of Computer Engineering and Technology, 9(5), 2018, pp.
161-166.
[54] Dr. Nirmal Kumar Gupta, Addressing Big Data Security Issues and Challenges. International
Journal of Computer Engineering & Technology, 9(4), 2018, pp. 229-237.
[55] Kodimalar Palanivel and Chellammal Surianarayanan, An Approach for Prediction of Crop
Yield Using Machine Learning and Big Data Techniques, International Journal of Computer
Engineering and Technology 10(3), 2019, pp. 110-118.
  • 1. See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/342946040 Big Data Analytics for IoT Article in INTERNATIONAL JOURNAL OF ADVANCED RESEARCH IN ENGINEERING & TECHNOLOGY · July 2020 DOI: 10.34218/IJARET.11.6.2020.054 CITATIONS 9 READS 8,441 1 author: Preeti Gulia Maharshi Dayanand University 107 PUBLICATIONS 517 CITATIONS SEE PROFILE All content following this page was uploaded by Preeti Gulia on 15 July 2020. The user has requested enhancement of the downloaded file.
  • 2. http://www.iaeme.com/IJARET/index.asp 593 editor@iaeme.com International Journal of Advanced Research in Engineering and Technology (IJARET) Volume 11, Issue 6, June 2020, pp. 593-603, Article ID: IJARET_11_06_054 Available online athttp://www.iaeme.com/IJARET/issues.asp?JType=IJARET&VType=11&IType=6 ISSN Print: 0976-6480 and ISSN Online: 0976-6499 DOI: 10.34218/IJARET.11.6.2020.054 © IAEME Publication Scopus Indexed BIG DATA ANALYTICS FOR IOT Preeti Gulia Assistant Professor, Department of Computer Science and Application, MDU, Rohtak, Haryana, India Ayushi Chahal Research Scholar, Department of Computer Science and Application, MDU, Rohtak, Haryana, India ABSTRACT The Internet has helped technology and communication to grow very fast, which further increased the connection between different machines and sensor-based devices. This connection of machines or devices through the internet gives rise to the concept of IoT (Internet of Things). Various wearable devices like smart-watch, cars, home appliances like washing machines, doors, door locks, lights, etc. are now connected over the Internet of things. These sensor devices produce Big data in bulk per day. This data can be used for analysis to solve out different day-today problems. This paper discusses different Big data tools and techniques that can be used for IoT frameworks. It also presented a way how Big Data can be used to analyze IoT data sets intelligently. Different platforms of Big-data Analytics are explained in detail, and light is given on which of them is best for IoT data. Keywords: Big data, Frameworks, Internet of Things (IoT), Architecture, Big Data Analytics (BDA) Cite this Article: Preeti Gulia and Ayushi Chahal, Big Data Analytics for IoT, International Journal of Advanced Research in Engineering and Technology (IJARET), 11(6), 2020, pp. 593-603. http://www.iaeme.com/IJARET/issues.asp?JType=IJARET&VType=11&IType=6 1. 
INTRODUCTION Big-Data is developing briskly, and so is IoT. [31] This recent advancement is affecting all areas of business and technology. Data produced by IoT devices play an essential role in the conversion of raw data to knowledge. This can be done by applying the correct methods of big data analytics over raw data. Gartner has characterized Big Data in three qualities [1], i.e., volume, variety, and velocity, which are discussed in detail in section 2 of the paper. IoT collects data in different forms and from different sources; that is why it is called a heterogeneous data.[2] IoT can collect data from healthcare industries, smart homes, smart traffic management, airplane system, railways system, weather forecasting system, agricultural sensors, and many more shown in figure 1. IoT data is unstructured, having no pattern. By
  • 3. Preeti Gulia and Ayushi Chahal http://www.iaeme.com/IJARET/index.asp 594 editor@iaeme.com applying the right big data analytical techniques, one can find out the hidden pattern, new information, hidden correlation, revel trends, etc. [3], [36], [50] from this unstructured data. 1.1. IoT When a set of anyone, anything, anytime, anyplace, any service, and any network gets connected, it creates a situation of Internet of Things (IoT). Researchers give different definitions and architectures for IoT. IoT is a system of interrelated things or machines (computing, mechanical or digital devices) which can connect these machines or things without any interruption of human. It is a Machine to Machine (M2M) communication process. Figure 1 Data sources for IoT Different researchers have given different forms of architecture. The most basic architecture of IoT shown in fig 2 below, which consists of 3 Layers, named as follows [5]: Application Layer Network Layer Perception Layer Figure 2 Three-layer architecture of IoT [29] • Perception Layer: The bottom-most layer is called Perception Layer. It is used for data collection.[32] • Network Layer: This layer is a middle layer, which is used to set up a connection between perception and application layer. [26] • Application Layer: This layer provides services and is used for analyzing information received from the other two layers. Fig 3 also shows a different kind of IoT architecture inspired by [5]. In this architecture, gateway and middleware are added to the previous architecture. It is a five-layer architecture. The layers are as follows: Application Layer Middleware Layer Network Layer Access Gateway Layer Perception Layer Figure 3 Five layered IoT architecture [5]
  • 4. Big Data Analytics for IoT http://www.iaeme.com/IJARET/index.asp 595 editor@iaeme.com • Perception Layer: This layer is also called the edge layer.[29] • Access Gateway Layer: This layer is used to manage the conveying messages or data between IoT devices.[5] • Network Layer: This layer works the same as that of the above layer. It also helps to convey messages among sender and receivers in IoT systems. • Middleware Layer: This layer provides a connection between the hardware and different software. It helps in setting up a pliable alliance of hardware and its applications.[5] • Application Layer: This layer provides the same services as the third layer of the three- layer architecture. It exists over the top of all other layers. It is used to analyze all the information that is provided by the layers below it. 1.2. Big Data Nowadays, big data related to the companies which are using internet services are proliferating. For example, over hundreds of Petabyte (PB) data is handled by Google, Facebook logs around 10 PB data per month, 10 PB data is analyzed and processed by Baidu, and many more. [27] In the model of IoT, sensors are used to collect and transmit data all around the world. These sensors generate increasingly growing data, which tends to form a vast heterogeneous dataset. This data needs to be stored and processed such that the quality of data does not compromise. To maintain the quantity and mutual relations of such extensive data, existing enterprises of IT have to improve their architectures and infrastructures. [7] There is a need for new mining, analyzing, modeling, visualizing, and forecasting technologies in order to reveal the intrinsic properties of this heterogeneous data and improve decision making [33-35]. For extensive discussion and definition of the term, big data let us have a look at V’s model. 
Doug Laney, who is working as an analyst of META (presently Gartner) who presented 3V’s model in early 2001, which described different challenges and opportunities created by a large amount of data generated by sensors [27]. After that, with the advancement in the big data area in 2011, IDC came up with four V’s model. Moreover, with further advancements, scientists have reached to 10V’s of Big Data. In 10V’s model, we have the following [28]: Figure 4 10 V’s model of Big Data [28] • Volume: It is the most crucial V in v’s model. It describes big data. With the rise in data generation devices, broad, diverse data is being generated. [5] Our traditional data
  • 5. Preeti Gulia and Ayushi Chahal http://www.iaeme.com/IJARET/index.asp 596 editor@iaeme.com processors and techniques cannot handle such a large amount of heterogeneous data. So, there is a strict requirement of enhanced techniques to process such data. • Velocity: Velocity represents the rate of big incoming data from various devices. This velocity is indeed an essential factor of big data. Velocity describes the speed of generating the data by various machines over the network. One of the most common examples of data generation speed is social media. It creates a variety of data. Now, every person is concerned to post most hot updates about themselves (a tweet, Instagram posts, WhatsApp status updates, etc.) • Variety: As the definition of Big data says, it is a large amount of heterogeneous data. So, variety is indeed the essential property of big data. These days collection of different kinds of data types (structured, semi-structured, or unstructured) exists over data generation devices. Sometimes, this collected data may be in a different format as expected. This unexpected format may cause trouble in the data processing. To remove these troubles, any organization should have that kind of data storage system which can examine and process any form of data irrespective of their structure.[5] • Value: Continuous amount of data generation tends to create Big Data. This data is of no use until or unless it seems to have some value. Thus the value of data indeed is an essential factor of big data. These days big data analytics, which has become an integral part of the society, is based on the valuable data that different devices provide to the analyst or data scientist. It is not always necessary that big data will have a value. • Veracity: Veracity does not refer to the quantity of data. It belongs to the understandability of data that Big data provides to its users. 
Any organization working on a large amount of data should remove “dirty data” before it accumulates in the systems. • Validity: For future use of data, it must be precise and accurate. Any organization should validate the data if it wants to make correct decisions for the future based on the data collected by the devices. So, Validity is considered an essential factor for big data. • Variability: Variability includes data consistency and value of data. • Viscosity: Viscosity is considered as a part of velocity. It is used to describe the delay or lag-time which occurs between the sender and receiver during data transmission.[5] • Virality: It describes the data speed. This property has checks on the data speed with which sender and receiver access data from different devices. • Visualization: This property represents big data symbolically. Visualization helps to find out the hidden patterns. These hidden patterns help in decision making for any query of big data. Visualization helps Big data to play an essential part in decision- making. Figure 5 Interrelation between big data and IoT
  • 6. Big Data Analytics for IoT http://www.iaeme.com/IJARET/index.asp 597 editor@iaeme.com For handling such a massive amount of data, reliable software systems are required. Software testing plays a crucial role in ensuring the quality of the software [37-49] 2. INTEGRATION OF BIG DATA AND IOT In the current lifestyle, everything is merged with technology. IoT has been emerging rapidly in many industries. IoT consists of devices that collect the data, and with the help of this data, these devices connect with the real world. This data is useful to us as it can help in solving may research problem in one way or another. To analyze this data, various big data analytical tools and techniques can be beneficial. IoT and Big Data are considered as two sides of the same coin. Figure 5 shows the interrelation between IoT and Big data analytics. 2.1. IoT and Big Data Analytics relationship IoT data vary much different from standard data because it includes various sensors and objects for during collection of data. IoT data is a heterogeneous data which involves noise, variety, and have rapid growth[8]. It is assumed that by 2020 there will be 4.4 trillion data around us by IoT devices. Also, these devices will collect, gaze, transmit, analyze, share the real-time data, which changes with every millisecond. [25] Here comes the vital role of Big Data Analytics to handle such a redundant, heterogeneous, fluctuating data. [4] Big data is used to store this vast amount of data with different storage techniques and then analyzing them for particular outcomes. From various research its id generalized that big IoT data has three features, that confirm it to get fit in the big data paradigm: i. It consists of an abundant amount of terminals, which generate massive raw-data. ii. Raw-data generated by devices used in IoT can be in any form, but generally, it is unstructured.[30] iii. IoT devices generated raw-data are useless if not examined. 2.2. 
Steps for IoT Big Data Processing To manage IoT Big data, the process is broadly classified into four steps, described below [24]: i. The first step is to manage different data sources of IoT, i.e., IoT sensor devices, where sensors in a device interact with each other with the help of different applications and generate highly unstructured, semi-structured, or structured data. ii. In the second step, data generated by different IoT devices called Big IoT data is collected and stored by the Big data storage system. This data is based on the 3V model given by Gartner. iii. In Big data storage system, this IoT data is converted into shared and distributed Big data files. iv. After that, it applies different analytical tools for analysis of data like Hadoop, Map- Reduce or Spark, and many more, which are further discussed in the next section. v. In the last step, the report corresponding to the injected data is generated and presented to the user. 3. DIFFERENT PLATFORMS OF BIG DATA ANALYTICS FOR IOT Big Data Analytics needs some tools and techniques to transform IoT structured, semi- structured, and unstructured data into metadata or comprehensive form for the further analysis process. These tools use algorithms that discover patterns, correlations, trends over various forms of data. [9]
  • 7. Preeti Gulia and Ayushi Chahal http://www.iaeme.com/IJARET/index.asp 598 editor@iaeme.com After analyzing the data, these tools are also used to visualize these outcomes in the form of a graph, tables, pie chart, bar chart, etc. Here in this section, various platforms that can analyze IoT data are discussed. Big data analytics platforms are described below [12]: 3.1. Apache Hadoop Apache Hadoop is an open-source platform. It is used as a storage of a large volume of raw data. It can perform Big data Analytics. This standard framework consists of Apache Hive, Hadoop kernel, Map-Reduce, HDFS (Hadoop Distributed File System). Hadoop contains libraries that use a simple programming model. HDFS stores the data while Map-Reduce processes this data in a distributed manner. The combination of HDFS and Map-Reduce framework allows data to get replicated and distributed in N different nodes.[10] Hadoop is based on two nodes: Master node and Slave node. Master node helps in dividing the problem into sub-problems. These sub-problems are then distributed into different slave node. After that, the output of all the sub-problems from slaves is collected by the master node. Figure 6 Architecture of Hadoop [11] 3.2. Apache Spark It is also an open-source as Apache Hadoop, but it is used to overcome the limitations of Map- Reduce like fault tolerance, linear scalability. It provides high speed, ease of use, and sophisticated analytics. Figure 7 shows the architectural diagram of Apache Sparks. Sparks libraries unify the analysis of graphs and ETL. It provides real-time analysis. 3.3. Dryad It works as a data flow graph for parallel as well as distributed data sets. A user can use multiple machines at a time without knowing concurrent programming. It is efficient in handling faults in the cluster, graph generation, scheduling available free machines for allotment, visualizing jobs to free machines, etc. [21] Figure 7 Architecture of Spark [13]
  • 8. Big Data Analytics for IoT http://www.iaeme.com/IJARET/index.asp 599 editor@iaeme.com 3.4. Apache Drill It is used in a distributed system for Big IoT data analytics. It can be used with many query languages. It can handle thousands of servers at a time. It uses HDFS for storage and Map- Reduce for analysis. [22] 3.5. Storm It is used for extensive data processing. It works on real-time data, which should be distributed and fault-tolerant. It forms a cluster of data that is similar to Hadoop clusters. It also works as a Master node and worker node. 3.6. Splunk It is a combination of Big data and cloud technology. It uses a web interface to allow the user to analyze, search, and monitor the data. It helps to index structured and unstructured data generated by machines. Hence, it is useful for IoT Big data-sets. It is an intelligent support system for real-time and business-oriented data exploration. [23] 3.7. Jaspersoft It is an open-source tool that is used for real-time data analysis. It visualizes data on various platforms like Mongo DB, Cassandra, Redis. It can create powerful HTML reports. 3.8. Apache Mahout It is a data analytics software that requires no license, i.e., open-source. It is used for automatic learning. It is used to implement different machine learning methods. Big companies use it like Google, Yahoo, Amazon, IBM, Twitter, Facebook, etc. to implement scalable machine learning algorithms. 3.9. 1010data It consists of columns in the database. It deals with semi-structured data. It supports enormous scale infrastructure. It is not considered adequate for extracting the data, transforming the data, and loading the data. It provides advanced analytical services, including statistical analysis and optimization also. [14] 3.10. Cloudera Data Hub It works as a Data Hub for different enterprises. It is used for data analytics and data processing specifically for IoT based data. It uses Hadoop as a base for analytical purposes. 
It can be used as a central point for IoT based extensive data analysis. It provides reliability, data access control, high performance, security. It does not have its hardware, so it depends on the third party for processing. [15] 3.11. SAP-Hana It is used for in-memory addressing transactions for big IoT data analytics. It gives solutions to various big unstructured IoT data. SAP-Hana contains libraries for spatial processing, text analysis, and support R tool language. [16] 3.12. HP-HAVEn HP introduced Hadoop Autonomy Vertica Enterprise (HAVEn). A large number of HP systems use this platform for Big IoT data analytics. It is for massive data, which is analyzed as the columnar database. It provides parallel processing. [17]
  • 9. Preeti Gulia and Ayushi Chahal http://www.iaeme.com/IJARET/index.asp 600 editor@iaeme.com 3.13. Hortonworks Hortonworks is a Hadoop based platform. It is used for Big IoT data analytics. It is open-source software and an improved version of Hive. It can-not minimizes the number of nodes group. [18] 3.14. Pivotal Big Data Suite It is installed, tested, and implemented on a public cloud. It is given as a single license. Pivotal helps in massive parallel processing. It can perform predictive analytics on IoT data, but this data should be kept in HDFS. [19] 3.15. Infobright It is suitable for the analysis of machine-generated data like IoT data. It can analyze up to 50 TB at a time. It works with large scale data-based systems such as Hadoop. It is a columnar designed tool which has data skipping and automatic indexing property. [20] 4. CONCLUSION IoT has now become a significant source of Big Data, which is useless if not analyzed properly. This paper focuses on Big Data context concerning the Internet of Things. It describes the basic concepts of IoT and its architecture. It gives an elaborated structure of Gartner’s 3 V’s model Big Data in the form of 10 V’s model. This paper enhances the understandability of the reader for the relation between IoT, Big data, and analytics. It familiarizes reader to different Big Data Analytics platforms which can handle various IoT datasets. After reading his paper, a reader will be aware of different platforms and will be able to select one for their particular problems. REFERENCES [1] M. Beyer, ``Gartner says solving `Big Data' challenge involves more than just managing volumes of data,'' Tech. Rep., AaltoDoc, Aalto Univ., 2011. [2] R. Mital, J. Coughlin, and M. Canaday, ``Using big data technologies and analytics to predict sensor anomalies,'' in Proc. Adv. Maui Opt. Space Surveill. Technol. Conf., Sep. 2014, p. 84. [3] N. Golchha, ``Big data-the information revolution,'' Int. J. Adv. Res., vol. 1, no. 
12, pp. 791_794, 2015. [4] Y. Wang, L. Kung, W. Y. C. Wang, and C. G. Cegielski, “An integrated big data analytics- enabled transformation model: Application to health care,” Inf. Manage., vol. 55, no. 1, pp. 64– 79, Jan. 2018. [5] R. Khan, S. Khan, R. Zaheer & S. Khan, “Future Internet: The internet of things architecture, possible applications, and key challenges,” In Proceedings of international conference on frontiers of information technology, pp. 275-260, 2012. [6] A. Ilapakurti, J. S. Vuppalapati, S. Kedari, S. Kedari, C. Chauhan, and C. Vuppalapati, “iDispenser #x2014; Big Data Enabled Intelligent Dispenser,” in 2017 IEEE Third International Conference on Big Data Computing Service and Applications (BigDataService), pp. 124–130, 2017. [7] Y. Wang, L. Kung, and T. A. Byrd, “Big data analytics: Understanding its capabilities and potential benefits for healthcare organizations,” Technol. Forecast. Soc. Change, vol. 126, pp. 3–13, Jan. 2018. [8] M. Marjani, F. Nasaruddin, A. Gani, A. Karim, I.A.T. Hashem, A. Siddiqa, I. Yaqoob “Big IoT Data Analytics: Architecture, Opportunities, and Open Research Challenges,” IEEE Access, vol. 5, pp. 5247–5261, 2017.
  • 10. Big Data Analytics for IoT http://www.iaeme.com/IJARET/index.asp 601 editor@iaeme.com [9] E. Ahmed, Ibrar Yaqoob, Ibrahim Abaker Targio Hashem, Imran Khan, Abdelmuttlib Ibrahim Abdalla Ahmed, Muhammad Imran, Athanasios V. Vasilakos, “The role of big data analytics in Internet of Things,” Computer Networks, vol. 129, pp. 459–471, Dec. 2017. [10] T. O. Center: Introducción a Hadoop y su ecosistema. http://www.ticout.com/blog/2013/04/02/introduccion-a-Hadoop-y-su-ecosistema/ [11] Acharjya, D.P., Ahmed, K., “A survey on Big Data analytics: challenges, open research issues, and tools.” in Int. J. Adv. Comput. Sci. Appl. Vol.7, issue 2, pp. No.- 511–518, 2016. [12] F. Constante Nicolalde, F. Silva, B. Herrera, and A. Pereira, “Big Data Analytics in IoT: Challenges, Open Research Issues and Tools,” in Trends and Advances in Information Systems and Technologies, Cham, 2018, pp. 775–788. [13] A. S. Foundation: Spark 0.8.0: This document gives a short overview of how Spark runs on clusters, to make it easier to understand the components involved, 2014, https://spark. apache.org/docs/0.8.0/cluster-overview.html [14] V. Morabito, “Managing change for big data driven innovation,” in Big Data and Analytics. Springer, 2015, pp. 125–153. [15] A. Bhardwaj, S. Bhattacherjee, A. Chavan, A. Deshpande, A. J. Elmore, S. Madden, and A. G. Parameswaran, “Datahub: Collaborative data science & dataset version management at scale,” arXiv preprint arXiv:1409.0798, 2014. [16] F. Farber, S. K. Cha, J. Primsch, C. Bornh¨ovd, S. Sigg, and W. Lehner, “Sap hana database: data management for modern business applications,” ACM Sigmod Record, vol. 40, no. 4, pp. 45–51, 2012. [17] S. Burke, “Hp haven big data platform is gaining partner momentum,” CRN [online] http://www. crn.com/news/applications-os/240161649, 2013. [18] (2019, Accessed on 3rd December) Hortonworks. [Online]. Available: https://hortonworks.com/ [19] Y. Zhuang, Y.Wang, J. Shao, L. 
Chen, W. Lu, J. Sun, B.Wei, and J. Wu, “D-ocean: an unstructured data management system for data ocean environment,” Frontiers of Computer Science, vol. 10, no. 2, pp. 353–369, 2016. [Online]. Available: http://dx.doi.org/10.1007/s11704- 015-5045-6 [20] D. Slezak, P. Synak, J. Wr ´oblewski, and G. Toppin, “Infobright analytic database engine using rough sets and granular computing,” in Granular Computing (GrC), 2010 IEEE International Conference on. IEEE, 2010, pp. 432–437. [21] Isard, M., Budiu, M., Yu, Y., Birrell, A., Fetterly, D. Dryad, “distributed data-parallel programs from sequential building blocks” in ACM SIGOPS Oper. Syst. Rev. 41, pp. No.- 59–72, 2007. [22] Kelly, J.: Apache Drill Brings SQL-Like, Ad Hoc Query Capabilities to Big Data (2013). http://wikibon.org/wiki/v/Apache_Drill_Brings_SQL-Like,_Ad_Hoc_Query_Capabilities_to_Big_Data [23] C.L.P., Chen, C.Y. Zhang, “Data-intensive applications, challenges, techniques, and technologies: a survey on Big Data.” In Inf. Sci. 275, pp. no. -314–347, 2014. [24] G. Ingersoll, “Introducing apache mahout: Scalable, commercial-friendly machine learning for building intelligent applications,” White Paper, IBM Developer Works, pp. no. - 1- 8, 2009. [25] A. Verma, “Internet of Things and Big Data - Better Together,” Whizlabs Blog, 01-Aug-2018. [Online]. Available: https://www.whizlabs.com/blog/iot-and-big-data/. [Accessed: 11-Mar-2020]. [26] “Integrating IoT with Big Data, a Revolutionary Step,” Experfy Insights. [Online]. Available: https://www.experfy.com/blog/integrating-iot-with-big-data-a-revolutionary-step. [Accessed: 11-Mar- 2020]. [27] C.-W. Tsai, C.-F. Lai and A. V. Vasilakos, “Future Internet of Things: open issues and challenges,” Wireless Netw, vol. 20, no. 8, pp. 2201–2217, Nov. 2014, DOI: 10.1007/s11276-014- 0731-0. [28] M. Chen, S. Mao, and Y. Liu, “Big Data: A Survey,” Mobile Netw Appl, vol. 19, no. 2, pp. 171– 209, Apr. 
2014, DOI: 10.1007/s11036-013-0489-0.
[29] G. Manogaran, D. Lopez, C. Thota, K. M. Abbas, S. Pyne, and R. Sundarasekar, “Big Data Analytics in Healthcare Internet of Things,” in Innovative Healthcare Systems for the 21st Century, H. Qudrat-Ullah and P. Tsasis, Eds. Cham: Springer International Publishing, 2017, pp. 263–284.
[30] F. Alshohoumi, M. Sarrab, A. AlHamadani, and D. Al-Abri, “Systematic Review of Existing IoT Architectures Security and Privacy Issues and Concerns,” International Journal of Advanced Computer Science and Applications (IJACSA), vol. 10, no. 7, 2019, DOI: 10.14569/IJACSA.2019.0100733.
[31] “LNCS Titles published in 2015,” springer.com. http://www.springer.com/computer/lncs?SGWID=4-164-66-653429-0 (accessed May 03, 2020).
[32] M. Mittal, V. E. Balas, L. M. Goyal, and R. Kumar, Eds., Big Data Processing Using Spark in Cloud. Springer Singapore, 2019.
[33] S. Tanwar, S. Tyagi, and N. Kumar, Eds., Multimedia Big Data Computing for IoT Applications: Concepts, Paradigms, and Solutions. Springer Singapore, 2020.
[34] A. Dhankhar, K. Solanki, A. Rathee, and Ashish, “Predicting Student’s Performance by using Classification Methods,” International Journal of Advanced Trends in Computer Science and Engineering, vol. 8, no. 4, 2019.
[35] A. Dhankhar and K. Solanki, “State of the Art of Learning Analytics in Higher Education,” International Journal of Emerging Trends in Engineering Research, vol. 8, no. 3, pp. 868-877, 2020.
[36] M. Hooda and C. Rana, “Learning Analytics Lens: Improving Quality of Higher Education,” International Journal of Emerging Trends in Engineering Research, vol. 8, no. 5, pp. 1626-1646, 2020.
[37] A. Dhankhar and K. Solanki, “A Comprehensive Review of Tools & Techniques for Big Data Analytics,” International Journal of Emerging Trends in Engineering Research, vol. 7, no. 11, pp. 556-562, 2019.
[38] O. Dahiya, K. Solanki, S. Dalal, and A. Dhankhar, “Regression Testing: Analysis of its Techniques for Test Effectiveness,” International Journal of Advanced Trends in Computer Science and Engineering, vol. 9, no. 1, pp. 737-744, 2020.
[39] O. Dahiya and K. Solanki, “Comprehensive cognizance of Regression Test Case Prioritization Techniques,” International Journal of Emerging Trends in Engineering Research, vol. 7, no. 11, pp. 638-646, 2019.
[40] O. Dahiya, K. Solanki, S. Dalal, and A. Dhankhar, “An Exploratory Retrospective Assessment on the Usage of Bio-Inspired Computing Algorithms for Optimization,” International Journal of Emerging Trends in Engineering Research, vol. 8, no. 2, pp. 414-434, 2020.
[41] O. Dahiya, K. Solanki, and A. Dhankhar, “Risk-Based Testing: Identifying, Assessing, Mitigating & Managing Risks Efficiently In Software Testing,” International Journal of Advanced Research in Engineering and Technology (IJARET), vol. 11, no. 3, pp. 192-203, 2020.
[42] O. Dahiya and K. Solanki, “A systematic literature study of regression test case prioritization approaches,” International Journal of Engineering & Technology, vol. 7, no. 4, pp. 2184-2191, 2018.
[43] O. Dahiya, K. Solanki, and S. Dalal, “Comparative Analysis of Regression Test Case Prioritization Techniques,” International Journal of Advanced Trends in Computer Science and Engineering, vol. 8, no. 4, pp. 1521-1531, 2019.
[44] K. Solanki, Y. Singh, and S. Dalal, “Experimental analysis of m-ACO technique for regression testing,” Indian Journal of Science and Technology, vol. 9, no. 30, pp. 1-7.
[45] K. Solanki and S. Kumari, “Comparative study of software clone detection techniques,” in 2016 Management and Innovation Technology International Conference (MITicon), pp. MIT-152, IEEE, 2016.
[46] S. Yadav and B. Kishan, “Reliability of Component-Based Systems – A Review,” International Journal of Advanced Trends in Computer Science and Engineering, vol. 8, no. 2, pp. 293-299, 2019, DOI: 10.30534/ijatcse/2019/31822019.
[47] S. Yadav and B. Kishan, “Assessment of software quality models to measure the effectiveness of software quality parameters for Component Based Software (CBS),” Journal of Applied Science and Computations, vol. 6, no. 4, pp. 2751-2756, 2019.
[48] S. Yadav and B. Kishan, “Analysis and Assessment of Existing Software Quality Models to Predict the Reliability of Component-Based Software,” International Journal of Emerging Trends in Engineering Research, vol. 8, no. 6, 2020. [In Press]
[49] P. Gulia and Palak, “Nature-inspired soft computing-based software testing techniques for reusable software components,” Journal of Theoretical & Applied Information Technology, vol. 95, no. 24, 2017.
[50] P. Gulia and Palak, “Hybrid swarm and GA based approach for software test case selection,” International Journal of Electrical & Computer Engineering, pp. 2088-8708, Issue-9, 2019.
[51] R. Ratra and P. Gulia, “Big Data Tools and Techniques: A Roadmap for Predictive Analytics,” International Journal of Engineering and Advanced Technology (IJEAT), vol. 9, no. 2, pp. 4986-4992, 2019.
[52] K. Vikram, Ch. Aparna, Harshitha B., and Ishpreet Kaur, “A Secure and Certifiable Access Mechanism System Designed For Big Data Storage In Clouds,” International Journal of Computer Engineering & Technology, vol. 9, no. 2, pp. 86–90, 2018.
[53] Azhagammal Alagarsamy and K. Ruba Soundar, “A Survey Paper on Deep Belief Network for Big Data,” International Journal of Computer Engineering and Technology, vol. 9, no. 5, pp. 161-166, 2018.
[54] Nirmal Kumar Gupta, “Addressing Big Data Security Issues and Challenges,” International Journal of Computer Engineering & Technology, vol. 9, no. 4, pp. 229-237, 2018.
[55] Kodimalar Palanivel and Chellammal Surianarayanan, “An Approach for Prediction of Crop Yield Using Machine Learning and Big Data Techniques,” International Journal of Computer Engineering and Technology, vol. 10, no. 3, pp. 110-118, 2019.