Big Data Analytics for Sensor Network Collected Intelligence A volume in Intelligent Data Centric Systems Hui-Huang Hsu


Big Data Analytics for
Sensor-Network Collected
Intelligence
Edited by
Hui-Huang Hsu
Tamkang University, Taiwan
Chuan-Yu Chang
National Yunlin University of Science and Technology, Taiwan
Ching-Hsien Hsu
Chung Hua University, Taiwan
Series Editor Fatos Xhafa
Universitat Politècnica de Catalunya, Spain

Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1800, San Diego, CA 92101-4495, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom
© 2017 Elsevier Inc. All rights reserved
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or
mechanical, including photocopying, recording, or any information storage and retrieval system, without
permission in writing from the publisher. Details on how to seek permission, further information about
the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance
Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher
(other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience
broaden our understanding, changes in research methods, professional practices, or medical treatment
may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and
using any information, methods, compounds, or experiments described herein. In using such information or
methods they should be mindful of their own safety and the safety of others, including parties for whom they
have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any
liability for any injury and/or damage to persons or property as a matter of products liability, negligence or
otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the
material herein.
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
ISBN: 978-0-12-809393-1
For information on all Academic Press publications
visit our website at https://www.elsevier.com/books-and-journals
Publisher: Joe Hayton
Acquisition Editor: Sonnini R. Yura
Editorial Project Manager: Ana Claudia A. Garcia
Production Project Manager: Punithavathy Govindaradjane
Cover Designer: Victoria Pearson
Typeset by SPi Global, India

List of Contributors
Ahmad Anbar
The George Washington University, Washington, DC, United States
Haytham Assem
IBM, Dublin, Ireland
Christophe Blanchet
CNRS IFB, Orsay, France
Teodora S. Buda
Jiannong Cao
The Hong Kong Polytechnic University, Kowloon, Hong Kong
Chuan-Yu Chang
National Yunlin University of Science and Technology, Douliu City, Yunlin County, Taiwan
Jinjun Chen
University of Technology Sydney, Broadway, NSW, Australia
Cen Chen
Hunan University, Changsha, China
Szu-Ta Chen
National Taiwan University Hospital Yun-Lin Branch, Douliu City, Yunlin County, Taiwan
Kang Chen
Southern Illinois University, Carbondale, IL, United States
Zixue Cheng
University of Aizu, Aizuwakamatsu, Japan
Cees de Laat
University of Amsterdam, Amsterdam, The Netherlands
Yuri Demchenko
Mingxing Duan
Tarek El-Ghazawi
Weiwei W. Fang
Beijing Key Lab of Transportation Data Analysis and Mining, Beijing Jiaotong
University, Beijing, China
Edmond J. Golden III
National Institute of Standards and Technology, Gaithersburg, MD, United States
Chu-Cheng Hsieh
Slice Technologies Inc., San Mateo, CA, United States
Ching-Hsien Hsu
Chung Hua University, Hsinchu, Taiwan
Hui-Huang Hsu
Tamkang University, Tamsui, Taiwan
Qian Huang
Tian-Hsiang Huang
National Sun Yat-sen University, Kaohsiung, Taiwan
Chih-Chieh Hung
Tamkang University, New Taipei City, Taiwan
Pravin Kakar
Institute for Infocomm Research, Agency for Science, Technology and
Research (A*STAR), Singapore
Shonali Krishnaswamy
Chung-Nan Lee
Kenli Li
Keqin Li
Hunan University, Changsha, China; State University of New York, New Paltz,
NY, United States
Xiao-Li Li
Qingyong Y. Li
Hai-Ning Liang
Xi’an Jiaotong-Liverpool University, Suzhou, China
Chen Lin
National Yunlin University of Science and Technology, Douliu City, Yunlin County, Taiwan
Xuefeng Liu
The Hong Kong Polytechnic University, Kowloon, Hong Kong
Ming Liu
Charles Loomis
SixSq Sàrl, Geneva, Switzerland
Chao Lu
Ka L. Man
Martial Michel
National Institute of Standards and Technology, Gaithersburg, MD, United States
Vijayakumar Nanjappan
Minh N. Nguyen
Declan O’Sullivan
Trinity College Dublin, Dublin, Ireland
Phyo P. San
Olivier Serres
Kathiravan Srinivasan
National Ilan University, Yilan City, Yilan County, Taiwan
Ming-Chun Tsai
Fatih Turkmen
Wei Wang
Junbo Wang
Yilang Wu
Chen-Ming Wu
Lei Xu
Chi Yang
University of Technology Sydney, Broadway, NSW, Australia
Jian-Bo Yang
Zhangdui D. Zhong

Preface
There are three sources of information we can collect about the environment and the people in the
environment: environmental sensors, wearable sensors, and social networks. Through intelligent analysis
of the huge amount of sensory data, we can develop various systems to automatically detect natural and
man-made events. Moreover, the systems can also try to understand people’s behavior and even
intention. Thus better services can be provided to people in an unobtrusive manner.
With the advances in sensor and networking technologies, we are now able to collect sensory data
easily. These sensory data can be stored and processed in the cloud. Nevertheless, how to properly
utilize such a huge amount of data is another essential issue. We certainly hope that advanced
information and communication technologies (ICT) can help us perform intelligent analysis on these data
and provide better services to people automatically. Exciting new systems and research results have been
developed. This book aims to introduce these ambient intelligence and Internet of Things (IoT) systems,
which are based on big data analytics of collected sensory data.
The theme of this book is closely related to two hot topics: the Internet of Things and big data
analytics. Systems and technologies introduced in the book can be used as supplementary materials for
courses involving these two topics. Researchers, professionals, and practitioners in related fields can
also find useful information and technologies for their work. This book has four parts: big data
architecture and platforms; big data processing and management; big data analytics and services; and
big data intelligence and IoT systems. Each part includes three or four chapters. Here we briefly
introduce each of the 14 chapters.
Part I: Big Data Architecture and Platforms
1. Big Data: A Classification of Acquisition and Generation Methods
Vijayakumar Nanjappan, Hai-Ning Liang, Wei Wang, Ka L. Man
This chapter points out that it is very difficult to store, process, and analyze huge amounts of data
using conventional computing methodologies and resources. The authors classify the data into
digital and analog, environmental and personal. Data types and formats as well as input mechanisms
are also highlighted. These will help us understand the active and passive methods of data collection
and production.
2. Cloud Computing Infrastructure for Data Intensive Applications
Yuri Demchenko, Fatih Turkmen, Cees de Laat, Ching-Hsien Hsu, Christophe Blanchet,
Charles Loomis
This chapter proposes a cloud-based big data infrastructure (BDI). The general architecture and
functional components of BDI are described in detail. BDI is supported by the definition of the big
data architecture framework (BDAF). Two case studies in bioinformatics are illustrated in the
chapter to provide examples of requirements analysis and implementation.
3. Open Source Private Cloud Platforms for Big Data
Martial Michel, Olivier Serres, Ahmad Anbar, Edmond J. Golden III, Tarek El-Ghazawi
This chapter tells us that it is beneficial to use private clouds, especially open source clouds, for
big data. Security, privacy, and customization are the major concerns. The chapter introduces the most
prominent open source clouds in view of big data processing. A case study using an On-Premise
Private Cloud is also presented to demonstrate the implementation of such an environment.
Part II: Big Data Processing and Management
4. Efficient Nonlinear Regression-Based Compression of Big Sensing Data on Cloud
Chi Yang, Jinjun Chen
This chapter proposes a compression method for big sensing data based on a nonlinear regression
model. It improves the effectiveness and efficiency for processing real-world big sensing data.
Regression design, least squares, and triangular transform are discussed in this chapter. It is
demonstrated that the model achieves significant storage and time performance gains over other
compression models.
5. Big Data Management on Wireless Sensor Networks
Chih-Chieh Hung, Chu-Cheng Hsieh
This chapter gives an overview of data management issues and solutions in wireless sensor
networks. There are two possible models: centralized and decentralized. Data management can
be centralized for the benefit of computation, or decentralized for energy saving. Three major
issues for data management in both models are introduced: storage, query processing, and data
collection. Some case studies are also discussed.
6. Extreme Learning Machine and Its Applications in Big Data Processing
Cen Chen, Kenli Li, Mingxing Duan, Keqin Li
This chapter first reviews the extreme learning machine (ELM) theory and its variants. Due to its
memory-residency and high space/time complexity, the traditional ELM cannot train big data
efficiently. Optimization strategies are necessary to solve this problem. Thus, parallel ELM
algorithms based on MapReduce and Spark are described. Finally, practical applications of the
ELM for big data are also presented in this chapter.
Part III: Big Data Analytics and Services
7. Spatial Big Data Analytics for Cellular Communication Systems
Junbo Wang, Yilang Wu, Hui-Huang Hsu, Zixue Cheng
This chapter surveys methodologies of spatial big data analytics and possible applications to
support the cellular communication (CC) system. The CC system provides the most popular way to
connect people. However, it still faces challenges, such as unbalanced crowd communication
behavior and video transmission congestion. Spatial big data analytics can help the CC system to
provide services with better quality of service (QoS). Challenging issues are highlighted in this
chapter.
8. Cognitive Applications and Their Supporting Architecture for Smart Cities
Haytham Assem, Lei Xu, Teodora S. Buda, Declan O’Sullivan
This chapter proposes a cognitive architecture to enable big data applications with sensory
data for smart cities. It deals with organization, configuration, security, and optimization. This
chapter also reviews related work on location-based social networks and presents a novel
approach to detect urban patterns, especially anomalies. This is essential for better understanding
of human activities and behaviors.
9. Deep Learning for Human Activity Recognition
Phyo P. San, Pravin Kakar, Xiao-Li Li, Shonali Krishnaswamy, Jian-Bo Yang, Minh N. Nguyen
This chapter presents a systematic feature learning method for the problem of human activity
recognition (HAR). It adopts a deep convolutional neural network (CNN) to automate feature
learning from raw inputs. It is not necessary to handcraft features in advance. Such a
unification of feature learning and classification results in mutual enhancements. This is
verified by comparing experimental results with several state-of-the-art techniques.
10. Neonatal Cry Analysis and Categorization System Via Directed Acyclic Graph Support
Vector Machine
Szu-Ta Chen, Kathiravan Srinivasan, Chen Lin, Chuan-Yu Chang
This chapter introduces a neonatal cry analysis and categorization system. From the cry of
the newborn, the system can identify different types of feelings such as pain, sleepiness, and
hunger. The sequential forward floating selection (SFFS) algorithm is used to choose the
discriminative features. The selected features are then used to classify the neonatal cries by
the directed acyclic graph support vector machine (DAG-SVM). The system is useful for
parents and nursing staff.
Part IV: Big Data Intelligence and IoT Systems
11. Smart Building Applications and Information System Hardware Co-Design
Qian Huang, Chao Lu, Kang Chen
This chapter emphasizes that a comprehensive understanding of information system hardware
is necessary when designing efficient smart building applications. The necessity and
importance of application and hardware co-design are discussed in this chapter. A case study
is also given to show that application and hardware co-design optimize the smart building
design from a system perspective.
12. Smart Sensor Networks for Building Safety
Xuefeng Liu, Jiannong Cao
This chapter presents the design and implementation of effective and energy-efficient structural
health monitoring (SHM) algorithms in resource-limited wireless sensor networks (WSNs).
Compared to traditional wired transmission, WSNs are low cost and easy to deploy for building
monitoring. Distributed versions of SHM algorithms can help overcome the bandwidth limitation.
A WSN-Cloud system architecture is also proposed for future SHM.
13. The Internet of Things and Its Applications
Chung-Nan Lee, Tian-Hsiang Huang, Chen-Ming Wu, Ming-Chun Tsai
This chapter first compares two lightweight protocols for the Internet of Things (IoT): MQ
telemetry transport (MQTT) and the constrained application protocol (CoAP). Both protocols
reduce the size of the packet and the over-loading of the bandwidth, thus saving battery power
and storage space. The major techniques for big data analytics are then introduced. Finally,
intelligent transportation systems and intelligent manufacturing systems are presented as
examples.
14. Smart Railway Based on the Internet of Things
Qingyong Y. Li, Zhangdui D. Zhong, Ming Liu, Weiwei W. Fang
This chapter discusses the framework and technologies for a smart railway based on Internet
of Things (IoT) and big data. The architecture of a smart railway, including the perception
and action layer, the transfer layer, the data engine layer, and the application layer, is
presented first. A case study on intelligent rail inspection is then introduced. This chapter
shows that a smart railway is promising in improving traditional railway systems.

ACKNOWLEDGMENTS
This book is a part of the book series “Intelligent Data-Centric Systems.” First of all, we would like to
thank the series editor, Prof. Fatos Xhafa, for his encouragement and guidance in developing this book.
We gratefully acknowledge all the contributing authors of the chapters. This book would not have been
possible without their great efforts. We are also indebted to Ms. Ana Claudia Garcia, the editorial
project manager, and the whole production team at Elsevier for their continuous help in producing this
book. Finally, we thank our families for their love and support.
Hui-Huang Hsu, Chuan-Yu Chang, Ching-Hsien Hsu
September 2016

CHAPTER 1
BIG DATA: A CLASSIFICATION
OF ACQUISITION AND
GENERATION METHODS
Vijayakumar Nanjappan, Hai-Ning Liang, Wei Wang, Ka L. Man
ACRONYMS
AUIs adaptive user interfaces
BAN body area network
BSN body sensor network
BSON binary JavaScript object notation records
BT business transactions
CLI command-line interfaces
CPU central processing unit
CSV comma-separated values
DA data analytics
DM data mining
DS document store
ECG electrocardiography
EEG electroencephalogram
Email electronic mail
EMG electromyography
GB gigabyte
GPS Global Positioning System
GS graph store
GUI graphical user interfaces
HIDs human interface devices
HTML hypertext markup language
IoT Internet of Things
IR infrared
IUI intelligent user interfaces
JPEG joint photographic experts group
JSON JavaScript object notation records
KD knowledge discovery
KV key-value stores
LED light-emitting diode
MB megabyte
MEMS Micro-Electro Mechanical Systems
NoSQL not only structured query language
Big Data Analytics for Sensor-Network Collected Intelligence. http://dx.doi.org/10.1016/B978-0-12-809393-1.00001-5
© 2017 Elsevier Inc. All rights reserved.
NUI natural user interfaces
ORC optimized row columnar
OS operating system
PC personal computer
PNG portable network graphics
PS proximity sensor
RC files Record Columnar files
RFID radio frequency identification
RPC Remote Procedure Call
SD scientific data
SF sequence file
SI satellite imagery
SMD social media data
SoC System on Chip
VUI voice user interfaces
WIMP windows, icons, menus, and pointer device
WSN wireless sensor network
WWW World Wide Web
XML extensible markup language
1 BIG DATA: A CLASSIFICATION
The term “big data” refers to datasets of exceptionally massive size with distinct and
intricate structures. They can be extremely difficult to analyze and visualize with personal
computing devices and conventional computational methods [1]. In fact, enormous datasets of complex
structure have been generated and used for a long time, for example, satellite imagery (SI), raster
data, and geographical, biological, and ecological data; data used for scientific research can also be
considered “big data.” Nowadays, many different kinds of big data exist in our lives, from
social media data (SMD), to organization and enterprise data, to the sensor data on the Internet of
Things (e.g., meteorological data about our environment and healthcare data).
1.1 CHARACTERISTICS OF BIG DATA
In 2001, Doug Laney characterized big data from three perspectives: volume, velocity, and variety (the
3Vs) [2]. Volume refers to the magnitude of data, which usually determines the potential value of the
data. Velocity refers to the speed at which data are generated and processed according to the
requirements of different applications. Variety refers to the nature and different types of data. Later,
the research community proposed two additional Vs: veracity and value. Veracity indicates the
trustworthiness and quality of the data. This is particularly important, as big data are usually
collected from a variety of sources, some of which may not provide high-quality, reliable data. The term
value is used to indicate the potential (or hope) that valuable information or insight can be extracted
or derived from the big data, provided that the data are appropriately processed and analyzed. These
characteristics bring new challenges into the data processing and analytics pipeline. As the size of the
data is constantly increasing and the velocity of data generation is higher than the processing speed,
scalable storage and efficient data management methods are needed to enable real-time or near real-time
data processing by the analytical tools. To ensure the credibility of the analytics, the quality of the
data must be taken into consideration, for example, to identify erroneous processes and uncertain,
unreliable, or missing data.
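As a minimal illustration of the veracity concern, the Python sketch below labels each reading in a small batch as missing, unreliable, or usable. The plausible range, the function names, and the sample values are assumptions made for this example only; they are not taken from the chapter or from any particular system.

```python
from typing import Dict, List, Optional

# Hypothetical plausible range for an outdoor temperature sensor (degrees Celsius).
PLAUSIBLE_RANGE = (-50.0, 60.0)


def classify_reading(value: Optional[float]) -> str:
    """Label one reading as 'missing', 'unreliable', or 'ok'."""
    if value is None:
        return "missing"
    low, high = PLAUSIBLE_RANGE
    if not (low <= value <= high):
        # Values outside the assumed range (including NaN) are treated as unreliable.
        return "unreliable"
    return "ok"


def quality_report(readings: List[Optional[float]]) -> Dict[str, int]:
    """Count how many readings fall into each quality class."""
    report = {"ok": 0, "missing": 0, "unreliable": 0}
    for value in readings:
        report[classify_reading(value)] += 1
    return report


if __name__ == "__main__":
    sample = [21.4, None, 22.0, 999.0, 21.8]  # illustrative values only
    print(quality_report(sample))
```

A real pipeline would likely use sensor-specific rules and statistical checks rather than one fixed range; the point here is only that quality screening can be made an explicit step before analysis.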
2 BIG DATA GENERATION METHODS
In today’s digital era, the term data unambiguously denotes digital data, which can be either
born-digital or born-analog but eventually converted into digital form. There are already large amounts
of conventional digital data such as Web documents, social media, and business transaction (BT) data. In
recent years, the “Internet of Things” (IoT) has generated vast volumes of data about our physical world
captured by sensing devices. Many everyday objects are embedded with a variety of sensors capable of
collecting analog data and converting it into digital form. Besides conventional data, sensor data are
becoming the next big data source.
2.1 DATA SOURCES
2.1.1 Born-digital data
Born-digital data are created and managed using computers or other digital devices. Almost all
documents in personal computers are stored in standardized file formats (e.g., Word or PDF
documents). Advances in Internet and World Wide Web (WWW) technologies have enabled computers
around the world to be connected so that billions of Web documents can be accessed anywhere. The
emergence of Web 2.0 technologies enriched data and media types from text-only to images, video,
and audio, as well as the associated metadata such as temporal and geographical information. Numerous
images and videos are now being uploaded to social media websites, annotated with location information
and tagging data related to their contents. Other traditional big data sources include electronic mail,
instant messages, medical records, and business transactions.
2.1.2 Sensor data
Recently, billions of physical objects, such as sensors, smartphones, tablets, wearable devices, and
radio frequency identification (RFID) tags, embedded with identification, sensing, computing,
communication, and actuation capabilities, are increasingly connected to the Internet, resulting in the
next technological revolution, known as the “Internet of Things” (IoT). Integration of multiple
semiconductor components on a single chip (System on Chip) is key to the success of the Internet of
Things, which has the potential to revolutionize a large array of intelligent applications and services
in many fields.
According to Gartner, the number of connected things will reach nearly 20.8 billion by 2020, with
around 5.5 million new devices being connected every day [3]. It is estimated that by the end of 2017,
worldwide sales of wearable electronic devices will have increased by 39% [4]. In contrast, there is a
9.6% decline in worldwide PC shipments, which indicates that smart devices are preferred in
the market [5]. It is reported that by 2018, new digital devices that can talk to each other in the
household will be common [6]. It is estimated that nearly 3 trillion gigabytes of data are produced in a
single day. The high volumes of heterogeneous data streams coming from this variety of devices bring
great challenges to traditional data management methods.
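To give a sense of scale, the daily figures quoted above can be converted into per-second rates. The short Python calculation below does only that arithmetic; the two input numbers are the ones quoted in the text, and the variable names are chosen for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

new_devices_per_day = 5.5e6  # "around 5.5 million new devices" per day, as quoted
data_gb_per_day = 3e12       # "nearly 3 trillion gigabytes" per day, as quoted

devices_per_second = new_devices_per_day / SECONDS_PER_DAY
gb_per_second = data_gb_per_day / SECONDS_PER_DAY

print(f"{devices_per_second:.1f} new devices per second")
print(f"{gb_per_second:,.0f} GB per second")
```

Under these quoted figures, this works out to roughly 64 newly connected devices and about 35 million gigabytes of data every second.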
Widespread examples of these portable devices are mobile phones and smart devices such as the Apple
Watch, which integrate a variety of sensors such as an accelerometer, gyroscope, compass, and Global
Positioning System (GPS) receiver, and more recently sensors that can capture biometric information such
as heart rate. Table 1 lists commonly used sensors on smartphones or tablets.
Sensors built on Micro-Electro Mechanical Systems (MEMS) are small in size and have only
limited processing and computing capabilities. A wireless sensor network (WSN) can be developed by
connecting spatially distributed sensors using wireless interfaces. Different kinds of sensors can be
integrated into a single WSN, such as mechanical, magnetic, thermal, biological, chemical, and
optical sensors. A sensor can be either immobile or mobile (including wearable). While immobile sensors
are installed on an object at a fixed location [7], mobile sensors are usually installed on a moving
object. A wearable sensor is a special kind of mobile sensor that is worn on the human body and can be
used to form a body sensor network (BSN) or body area network (BAN) [8].
Fixed sensors can be installed on the earth’s surface, such as terrain [9], submerged under water
[10], or buried underground [11]. In contrast, mobile sensors can move and interact with the surrounding
physical environment. Wearable sensors are worn by users and can capture physical or environmental
parameters of the wearer, such as blood pressure [12,13], heart rate [14,15], bodily motion [16], brain
activity [17], and skin temperature [18]. Table 2 summarizes some of the most commonly used sensors
in BSNs.
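The distinction between fixed, mobile, and wearable sensors can be expressed as a small data model. The Python sketch below is one possible way to do so; the class names, field names, and example sensors are hypothetical and serve only to illustrate the classification described above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Mobility(Enum):
    FIXED = "fixed"        # installed on an object at a fixed location
    MOBILE = "mobile"      # installed on a moving object
    WEARABLE = "wearable"  # a mobile sensor worn on the human body


@dataclass
class Sensor:
    name: str
    mobility: Mobility

    def is_mobile(self) -> bool:
        # Wearable sensors are treated as a special kind of mobile sensor.
        return self.mobility in (Mobility.MOBILE, Mobility.WEARABLE)


def body_sensor_network(sensors: List[Sensor]) -> List[Sensor]:
    """Select the wearable sensors, which together could form a BSN/BAN."""
    return [s for s in sensors if s.mobility is Mobility.WEARABLE]


if __name__ == "__main__":
    deployed = [
        Sensor("soil moisture probe", Mobility.FIXED),
        Sensor("vehicle GPS unit", Mobility.MOBILE),
        Sensor("heart-rate band", Mobility.WEARABLE),
    ]
    print([s.name for s in body_sensor_network(deployed)])
```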
2.2 DATA TYPES
Interactions among physical objects, sensors, and people generate massive amounts of data, which can
be either structured or unstructured. Table 3 gives examples of the different types of data.
Table 1 Common Sensors Integrated in Smartphones and Tablets
Sensors on Smartphones    Function
Microphone                Converts real-world sound and vibration to digital audio
Camera                    Senses visible light or other electromagnetic radiation and converts it to a digital image or video
Gyroscope                 Provides orientation information
Accelerometer             Measures linear acceleration
Compass or magnetometer   Works as a traditional compass; provides orientation in relation to the magnetic field of Earth
Proximity sensor          Detects the proximity of the phone to the user
Ambient light sensor      Senses ambient light to optimize the display brightness
GPS                       Global Positioning System; tracks the target location or “navigates” by map with the help of GPS satellites
Barometer                 Measures atmospheric pressure
Fingerprint sensor        Captures a digital image of the fingerprint pattern

2.2.1 Structured data
Structured data are usually defined with fixed attributes, type, and format; for example, records in a
relational database are generated according to a predefined schema. Compared to unstructured or
semistructured data, processing of structured data is relatively simple and straightforward. This type
of data can be generated by people, machines, and sensors.
(1) Human-generated structured data: the data are created under explicit human involvement using
some interaction mechanisms, e.g., data generated through human-machine interface devices like
mouse input data and click-streams.
(2) Machine-generated structured data: the data are created automatically by a computing device
without explicit human interaction, e.g., Web log data.
Table 2 Commonly Used Sensors in Body Area Networks or Body Sensor Networks
Sensor                     Function
Blood-pressure sensor      Measures human blood pressure
Camera pill                Captures images of the gastrointestinal tract
Carbon dioxide sensor      Measures carbon dioxide gas
ECG/EEG/EMG sensor         Measures the electrical activity of the heart (ECG), brain (EEG), and muscles (EMG)
Humidity sensor            Measures humidity changes
Blood oxygen saturation    Measures blood oxygen saturation
Pressure sensor            Measures pressure value
Respiration sensor         Measures human respiration values
Temperature sensor         Measures human body temperature
Table 3 Data Types and Data Sources
Source              Structured Data          Unstructured Data
Human-generated     Input data               Text documents
                    Click-streams            Social media data
                                             Mobile data
                                             Web page content
Machine-generated   Web logs/server logs     Satellite imagery
                                             Scientific data
                                             Image and video
                                             Radar data
Sensor-generated    Fixed sensor data
                    Mobile sensor data

(3) Sensor-generated structured data: the data are generated by the embedded fixed or moveable
sensors, e.g., sensor data from smartphones and smart meters.
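What “a predefined schema” means in practice can be sketched with a small example. The Python code below parses comma-separated smart-meter readings into typed records and rejects input whose columns do not match the expected schema. The schema (`meter_id`, `timestamp`, `kwh`), the class name, and the sample row are assumptions made for illustration, not a format described in the chapter.

```python
import csv
import io
from dataclasses import dataclass
from typing import List

# Hypothetical fixed schema: every record has exactly these columns, in this order.
EXPECTED_COLUMNS = ["meter_id", "timestamp", "kwh"]


@dataclass
class MeterReading:
    meter_id: str
    timestamp: str
    kwh: float


def parse_readings(csv_text: str) -> List[MeterReading]:
    """Parse CSV text into MeterReading records, enforcing the schema."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    return [
        MeterReading(row["meter_id"], row["timestamp"], float(row["kwh"]))
        for row in reader
    ]


if __name__ == "__main__":
    text = "meter_id,timestamp,kwh\nm-1,2016-09-01T00:00:00,0.42\n"
    print(parse_readings(text))
```

Because every record shares the same attributes and types, downstream processing can rely on them without inspecting each record individually, which is what makes structured data comparatively easy to handle.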
2.2.2 Unstructured data
Unstructured data are the opposite of structured data: they have no predefined data model. Some common
examples include text, images, audio, video, and streaming sensor data. Unstructured data are one
primary source of big data and are much more challenging to process than structured data.
Human-generated unstructured data include a large number of data types of different natures, such as
textual data (Web documents, licensed publications, e-journals, eBooks, organizational records, e-mails,
logs), and media data of different types contributed by ordinary users on social media platforms.
Examples of machine-generated unstructured data include scientific data (e.g., astronomical, geographic,
ecological, biological, chemical, and geospatial data), satellite images of weather, surveillance data,
and radar data (e.g., meteorological and oceanographic seismic data).
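Because unstructured data carry no predefined model, any structure has to be imposed at processing time. As a small, hedged illustration, the Python sketch below pulls a few structured fields out of a free-text social media post using regular expressions. The choice of features (hashtags, mentions, word count) and the sample post are assumptions for this example and not a method proposed in the chapter.

```python
import re
from typing import Dict, List, Union

HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")


def extract_features(post: str) -> Dict[str, Union[List[str], int]]:
    """Derive simple structured fields from an unstructured text post."""
    return {
        "hashtags": HASHTAG.findall(post),
        "mentions": MENTION.findall(post),
        "word_count": len(post.split()),
    }


if __name__ == "__main__":
    post = "Heavy rain in #Suzhou today, stay safe @alice #weather"
    print(extract_features(post))
```

Real text, image, or video analytics need far richer processing; the sketch only shows why unstructured data demand an extra extraction step that structured data do not.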
3 BIG DATA: DATA ACQUISITION METHODS
Human interaction with computers and devices creates vast amounts of data. In the PC era, human
interface devices (HIDs), like keyboards and mice, support users in interacting with digital
data. Most digital user-generated text data have been created with conventional, widely used
input devices like keyboards and mice (or touchpads in portable computers) under explicit human
involvement. Digitized analog data or sensor data generated using audio and camera devices are
known as multimedia data. The introduction of tactile-feedback technology has added an extra
dimension to the manner in which people interact with computers. The stylus, a pen-shaped instrument
used with tactile-feedback devices and graphics tablets, lets users write on the surface of the screen,
making interaction more direct. The stylus and similar haptic-based devices allow users to interact
directly with the displayed content, with multitouch gestures as an input, in lieu of the physical
keyboard and pointing devices. The rise of smart touch-based devices, embedded with sensors, has added
diversity to existing interaction methods, enabling richer gesture-based interaction.
3.1 INTERFACE METHODS
Communication between the user and a computer system is done through various interface mechanisms,
especially using input/output devices. In this section, we review some of the most important ones and
their evolution, and describe how they contribute to data generation (see Fig. 1 for a summary).
3.1.1 Command-line interfaces
The command-line interface (CLI) or character user interface (CUI) is one of the first types of interface
methods that allows users to send text-based commands to the system. Text commands are converted to
appropriate operating system functions. Although the CLI is the oldest form of interface, it offers
powerful and concise control over programs. The earliest forms of digital text data were therefore created
using CLIs. The amounts of data generated were not significant, which was an important property because
memory in earlier systems was limited and expensive.

3.1.2 Graphical user interfaces
The graphical user interface (GUI), popularized by Microsoft Windows, is an interactive visual inter-
face rather than a command or text-only interface. The interactive interface tools are visually repre-
sented as windows, icons, menus, and a pointer device, which collectively are known as WIMP.
The GUI also includes a text interface, called the graphical character-based interface. Presently,
the GUI is the most common and well-known user interface for computers and some earlier mobile
devices like mobile phones and laptops. Gracoli, a hybrid interface, combines the strengths of the GUI
and CLI to provide application-specific interfaces [19].
3.1.3 Context-sensitive user interfaces
Context-sensitive user interfaces are used almost pervasively within GUIs, and allow users to choose
from multiple options that are offered automatically based on the current or previous state of the application
process. Context menus in GUIs are the principal example of context-sensitive user interfaces. The
primary use of the context-sensitive user interface is to simplify the interface by reducing the number
of commands, clicks, or keystrokes required to perform a given action. This type of interface plays
a crucial role where interface devices have a limited number of buttons, as in video games controlled by a
mouse, joystick, or gamepad. With the emergence of mobile devices, whose main input entry is via
a touch-based screen, context-sensitive interfaces have found more uses. A variety of contextual
options are provided via distinct taps and gestures on the screen.
3.1.4 Web-based user interfaces
A Web user interface or Web app allows the user to interact with content or software running on a
remote server through a Web browser. The content or Web page is downloaded from the Web server
and the user can interact with this content in a Web browser, which acts as a client. The distributed
nature allows the content to be stored on a remote server, while the ubiquitous nature of the Web
browser permits convenient access to the content. The most common Web applications are Webmail,
online shopping, online document sharing, social media, and instant messaging. A vast amount of data
exists now, generated by these types of interfaces.
3.1.5 Adaptive user interfaces or intelligent user interfaces
Adaptive user interfaces (AUIs), also referred to as intelligent user interfaces (IUIs), support users’
customization of the interface by changing the layout and other elements according to the user or context
requirements. AUIs are either user-initiated (adaptable) or system-initiated (self-adaptive). Their aim is to
offer efficient, intuitive, and secure interfaces to users based on their unique preferences, traits, and
environmental circumstances.
AUIs are capable of passively recognizing a user’s presence, and offer services based on their immediate
requirements.
FIG. 1
Evolution of user interfaces, user input methods or data generation. Panels: command-line interface,
graphical user interface, multi-touch gestural interface, natural user interface.
3.1.6 Natural user interfaces
The natural user interface (NUI) is a simple, intuitive interface that allows users to interact
naturally with systems through body movements, gestures, and voice, without any physical
encumbrances.
3.1.7 Voice interfaces
Voice user interfaces (VUIs) are based on speech recognition technology and enable users to interact or
send commands to computers or smart devices using voice or speech. This is the most natural way of
allowing users to interact with computers or smart devices, similar to how one would communicate
with other people. The most commonly used voice interaction roles are command- and agent-based
interactions. The command-based interaction allows the user to give speech input to the system, most
commonly in a simple but specific predefined order. The agent-based interactions recognize natural
language as input and provide an appropriate response through text or audio on the system. Apple’s
Siri and Google’s Voice Search are typical examples of voice interfaces.
3.1.8 Gesture-based interfaces
Gesture-based interfaces first attempt to recognize gestures as commands. They segment the continuous
physical movements of users’ hands, fingers, face, head, and body into a discrete sequence
of commands. Once the receiving system has successfully interpreted the meaningful gestures, users can
interact with it in a more natural way. The evolution of sensors has also
enabled novel gesture-based interaction with smart connected devices in the
IoT. The natural, continuous, meaningful movements of users, involving their hands, fingers, head,
face, and body, can all be part of this process of user-system interaction. This way of interacting will become
more important with, for instance, the growing emphasis on BSNs, in which sensors are placed on
or attached to the human body to passively capture physiological data and body movements [20].
3.1.9 Multitouch gesture interface
A multitouch interface is a gesture-based interface that supports two or more continuous gestures to
interact with touch surfaces. On smart devices, for example, multitouch interfaces allow more direct
interaction with applications and are considered natural and intuitive. This enables
a variety of actions like taps, swipes, rotations, pinches, and other natural gestures. Touchpads and
touchscreens on portable smart devices are powered by multitouch interfaces. Owing to their ever-increasing
dominance, these devices have replaced traditional input devices like keyboards and mice as the main
data-generating input devices.
3.1.10 Touchless gesture interfaces
Touchless gesture interfaces completely eliminate physical contact with a device, whether directly by touching
it or indirectly via a secondary device like a mouse. This is thought to make interaction even
more natural and intuitive by letting users be free of any physical attachments and involving only their

body movements. Second to voice, human movements without the need for any physical controls are
closer to how people interact with one another. Touchless gesture interfaces aim to replicate this type of
communication, which is achieved through a selection of intelligent sensing devices located around the
users. A touchless gestural interface represents an intelligent and a natural user interface method for
users to interact with systems using intuitive and unencumbered physical movements and gestures [20].
We next review some input devices and tools that enable the creation of the interfaces we have
presented above.
3.2 INTERFACE DEVICES
Input devices enable the user to input data directly into the computer. The best-known HIDs are the text
entry device keyboard and pointing devices like the mouse, trackball, light pen, and stylus, and other
devices like the joystick and touchscreen.
3.2.1 Keyboard
The keyboard is a typewriter-style device with a series of electronic switches or keys which allow users
to send text and alphanumeric data directly into computers. The switches each represent one character.
The most common English-language keyboard layout is a typewriter-style QWERTY layout. The stan-
dard computer keyboard contains alphabet keys, number keys, punctuation symbol keys, arrow keys,
and functional and control keys. The keyboard is the primary peripheral device for data entry. The vir-
tual, touchscreen-based keypad is used in mobile devices to simulate the physical keyboard.
3.2.2 Mice
The mouse allows a user to manipulate objects indirectly using a pointer-like representation by detect-
ing two-dimensional motion in a GUI. The mouse is a hand-controlled device, and typically has one or
two buttons. The mouse click is generated by pressing any of the buttons once, holding it, or releasing
it immediately. There are different variations of mouse clicks to select objects, move the pointer to the
desired location on the display, and input commands into the system. The keyboard and mouse are
the most integrated computer peripherals which allow the user to interact with the system. In contrast
to the keyboard, the mouse is supported only in GUIs. The trackball is another pointing device very
similar to the mouse.
3.2.3 Joystick
The joystick is a control column input device with a lever which controls the movement of a pointer in
all directions on the display. Similar to the mouse, joysticks include buttons known as triggers for ad-
ditional functionality. The joystick is typically used in games and sometimes as a replacement for the
mouse in certain situations. Miniature versions of finger-operated joysticks are now adopted in mobile
devices.
3.2.4 Stylus
The stylus, a pen-shaped input device, allows the user to input commands to the computer, mobile,
and other smart devices via their display. The stylus is used on the touchscreen devices to make se-
lections by tapping, or writing or drawing on the screen, just like using a pen on a notebook. The
stylus is more commonly used in portable handheld devices, like laptops and tablets, than on desktop
computers.
3.2.5 Touchpad
The touchpad is a pointing device or cursor-controlling device for portable computers. Touchpads
function in a very similar way to mice and contain a tactile sensor to identify the position and motion
of the user’s fingers in contact with the pad. Touchpads introduced multitouch gesture-based interface
mechanisms. In addition to the tap and swipe features of the touchpad, the gesture-based interface allows
additional application-specific gesture-based input methods.
3.2.6 Touchscreens
The touchscreen is a combination of both display and input device. A transparent touch-sensitive panel
is embedded on the rigid planar surface that recognizes the touch or press of users’ fingers as input. The
touchscreen replaced the mouse or stylus with users’ fingers as an input device, giving the feeling of more
directness to users when they manipulate content on the display. Touchscreens have brought in wide-
spread use of multitouch gesture-based input interactions with modern devices.
Besides the above more recognizable input devices, there are a number of emergent ones which are
becoming more widely used and support new ways of interacting with computing systems. The integration
of these emergent devices is only possible with powerful machines that can capture large amounts
of data and process them in real-time. In addition, the development of BANs has made the use of some
of these devices feasible. We describe some of the emergent input devices next.
3.2.7 Kinect
The Kinect is a device that captures the body motion of users positioned at a certain distance from a
display. Its motion sensors translate a user’s physical body position and movements into commands.
Initially developed for the Xbox game consoles, it is now used for other applications and devices.
3.2.8 Leap motion
Leap is an in-air gestural user interface device. Leap uses two monochromatic infrared (IR) cameras and
three IR LEDs to cover a hemispherical area up to a distance of 1 m. It is similar to the Kinect but is designed
for interaction at closer range and can be used with any display.
3.2.9 Myo
Myo is a muscle-controlled, arm-worn gestural device. Myo recognizes forearm muscle movements and
transmits them wirelessly as valid gestural commands to interact with PCs or other systems.
3.2.10 Wearable devices
Wearable devices or gadgets are electronic devices worn by consumers ubiquitously and continually
to capture or track biometric information related to health or fitness. Wearable devices are new
manifestations of accessories that people wear, such as Apple’s Watch or Samsung’s Gear Watch
or more dedicated tools like the Fitbit One wireless activity and sleep tracker and monitor. Wearable
devices with biometric tracking capabilities represent one of the most important sources of data gen-
eration. They will continuously and uninterruptedly record data of different types and from a variety of
environments.

As data increase in variety and volume in parallel to the need to support greater velocity in their
generation and processing, it is important to have a way to organize them. Organization and manage-
ment of data will therefore be explored next.
4 BIG DATA: DATA MANAGEMENT
4.1 DATA REPRESENTATION AND ORGANIZATION
Current systems represent data using a binary digital system. All data types are converted into binary
digits, 1s or 0s, called bits. A byte, a sequence of 8 bits, is the fundamental
unit of storage. Different standards are used to encode data objects by assigning bit patterns to them. In
order to utilize the storage space efficiently, data are compressed using various compression techniques.
In contrast with the traditional methods of performing computations on stored data, big data must be
processed as they are generated, in or near real-time. Thus low latency is a key requirement in big data
analytics (DA).
4.1.1 File formats
The file format is the description of how the collection of data is internally represented on a storage
medium in a file. Data processing and query performance depend heavily on the file format. In order
to reduce the total number of bytes moved from storage disk to memory, data are often compressed.
Data compression saves time when transferring data, with the tradeoff that the data have to be
decompressed before use. The choice of file format therefore has significant performance consequences.
Compression support reduces the size of data on disk, trading lower input/output cost for the central
processing unit (CPU) resources needed to decompress and de-serialize the data. Query performance
depends mainly on the amounts of input/output and CPU resources required to transfer and decompress
the data. File formats can be structured or unstructured.
We next describe some important file formats.
JavaScript object notation records (JSON)
JSON is an open, lightweight standard that is highly readable by both humans and machines. It is based
on a subset of the JavaScript programming language and is used for data interchange. JSON is a platform-
and language-independent text format. It uses conventions familiar from many languages, including C, C++, C#,
Java, JavaScript, Perl, Python, and others. JSON supports arrays and the
standard data types, such as strings, numbers, and Boolean values. Computers can easily parse and
generate JSON records that can describe complex data structures.
JSON is built on two universal structures: an object, which is a collection of name/value pairs, and an
array, which is an ordered list of values.
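The two structures can be illustrated with a minimal Python sketch that uses the standard json module. The record below is a hypothetical smart-meter reading; its field names are assumptions made only for illustration.

```python
import json

# Hypothetical sensor record: an object (name/value pairs) that holds an array.
reading = {
    "device_id": "meter-001",      # string
    "active": True,                # Boolean
    "voltage": 229.5,              # number
    "samples": [1.2, 1.4, 1.1],    # array: an ordered list of values
}

text = json.dumps(reading)         # serialize to platform-independent text
restored = json.loads(text)        # parse the text back into Python objects

print(text)
print(restored["samples"][1])
```

Because the interchange form is plain text, the same string could be parsed by any other language that has a JSON library.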
Binary JavaScript object notation records (BSON)
BSON is a binary-encoded serialization of JSON-like documents. The key-value pairs are stored as a single
entity called a document. BSON is also lightweight, traversable, and efficient. The extensions in BSON
allow representation of data types in addition to the standard JSON types. BSON supports embedding
documents and arrays within other documents and arrays. In comparison with other binary interchange
formats, BSON is more “schema-less.”
Comma-separated values (CSV)
Comma-separated values (CSV) is a standard file format for spreadsheet data used to exchange data
between distinct applications. The data is represented in a text file; each record is represented as one
line, and commas are used to separate data fields in each row. CSV is used to exchange data between
Hadoop and external systems.
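As a small illustration of the one-record-per-line layout, the following Python sketch writes and re-reads CSV text with the standard csv module. The column names and values are hypothetical. The second record contains a comma inside a field, which the writer handles by quoting that field.

```python
import csv
import io

# Hypothetical records; the column names are illustrative only.
rows = [
    {"sensor": "s1", "timestamp": "2016-01-01T00:00:00", "value": "21.5"},
    {"sensor": "s2", "timestamp": "2016-01-01T00:00:00", "value": "19.0, approx"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["sensor", "timestamp", "value"])
writer.writeheader()               # first line: the field names
writer.writerows(rows)             # one line per record

csv_text = buffer.getvalue()
parsed = list(csv.DictReader(io.StringIO(csv_text)))

print(csv_text)
```

Note that every value comes back as a string; CSV itself carries no type information, so the receiving system has to interpret the fields.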
Sequence file
A sequence file (SF) is a flat, compact binary storage format for serialized key-value pairs. In addition
to the uncompressed format, these files support two levels of compression: record compression and block
compression. The file metadata is held in a “secondary” Text/Text key-value list. The files can be
easily split and processed in parallel. An SF consists of a header followed by one or more records.
Record columnar files
Record Columnar files (RC files) are intended for efficient and high-performing processing of data.
They are flat files and support columnar formats that consist of binary key/value pairs. RC files store
columns of a table in a record columnar way by horizontally partitioning the rows into row splits and
vertically partitioning them in a columnar way. The metadata of a row split is stored in keys, while
the data of a row split are stored as values. Since being introduced in 2011,¹ RC files have been adopted in
major real-world systems for big DA, including Facebook’s Hadoop cluster.²
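The idea of partitioning horizontally first and vertically second can be sketched in a few lines of Python. This is only an in-memory illustration of the layout, not the RC file binary format; the function name `to_row_groups`, the group size, and the sample table are all assumptions.

```python
from typing import Dict, List

def to_row_groups(rows: List[dict], group_size: int) -> List[Dict[str, list]]:
    """Split rows into row groups, then store each group column by column."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]           # horizontal partition
        columns = {name: [row[name] for row in chunk]    # vertical partition
                   for name in chunk[0]}
        groups.append(columns)
    return groups

# Hypothetical table with three columns.
table = [
    {"id": 1, "city": "Leeds", "temp": 11.0},
    {"id": 2, "city": "Osaka", "temp": 18.5},
    {"id": 3, "city": "Lagos", "temp": 29.0},
    {"id": 4, "city": "Quito", "temp": 14.2},
]
row_groups = to_row_groups(table, group_size=2)

# A query that needs only "temp" touches one column list per row group.
temps = [value for group in row_groups for value in group["temp"]]
print(temps)
```

Keeping the values of one column together is also what makes column-wise compression effective, since neighbouring values tend to be similar.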
Optimized row columnar files (ORC files)
Optimized row columnar (ORC) files are further optimized and intended to replace RC files. In an ORC
file, the collection of the row data is in the columnar format, optimized for compression; these collec-
tions of rows are stored in one separate file. This format supports parallel processing of row collections
across multiple clusters. The lightweight indexing enables the feature of skipping a complete block that
is not required for the requested query. ORC files come with basic statistics on their columns.
Parquet files
Apache Parquet is a columnar file format that stores binary data in a column-oriented way. The
values of each column are organized adjacent to each other, enabling efficient, flexible compression
options and encoding schemes. The Parquet file format is designed to be usable from any data processing
framework or data model in the Hadoop ecosystem. A single Parquet file can reach gigabytes in size, and
the format is optimized to process large volumes of data, typically suited to data warehouse-style operations.
¹ He Y, Lee R, Huai Y, Shao Z, Jain N, Zhang X, et al. RCFile: a fast and space-efficient data placement structure
in MapReduce-based warehouse systems. In: Proceedings of the IEEE international conference on data engineering
(ICDE); 2011.
² http://www.slideshare.net/ydn/2-hive-integrationhadoopsummit2010.

Avro files
Avro is a binary data storage format that provides data serialization and data exchange services. The
data are serialized efficiently into files or messages. The data and the data definition (schema) are
combined in a single file or message, which allows Avro to perform rapid serialization.
The data are stored in a binary format, making them compact and efficient.
The data definition is stored in JSON format, making it easy to read and interpret. Markers in Avro
files split large datasets into subsets. Avro files support both primitive and complex data types,
and Avro handles data schema changes. The data stored in Avro files are easily portable between different
programming languages.
Avro supports a Remote Procedure Call (RPC) interface in data exchange services to
allow different programs to communicate data and information effectively. Avro RPC interfaces and schemas are
defined in JSON. Avro relies heavily on its schemas; both the data and their schema are stored in a file.
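To show what a schema defined in JSON looks like, the sketch below declares a hypothetical Avro-style record schema and checks records against it using only the Python standard library. It does not use an Avro library and does not produce Avro's binary encoding; the `Reading` record, its fields, and the `conforms` helper are assumptions for illustration.

```python
import json

# Hypothetical Avro-style record schema, written as JSON.
schema_json = """
{
  "type": "record",
  "name": "Reading",
  "fields": [
    {"name": "sensor_id", "type": "string"},
    {"name": "value", "type": "double"},
    {"name": "ok", "type": "boolean"}
  ]
}
"""
schema = json.loads(schema_json)

# Illustrative mapping from a few Avro primitive type names to Python types.
PRIMITIVES = {"string": str, "double": float, "boolean": bool}

def conforms(record: dict, schema: dict) -> bool:
    """Check that a record has exactly the schema's fields with matching types."""
    fields = schema["fields"]
    if set(record) != {field["name"] for field in fields}:
        return False
    return all(isinstance(record[field["name"]], PRIMITIVES[field["type"]])
               for field in fields)

print(conforms({"sensor_id": "s1", "value": 2.5, "ok": True}, schema))
print(conforms({"sensor_id": "s1", "value": "high", "ok": True}, schema))
```

Because the schema travels with the data, a reader written in another language can interpret the binary payload without any out-of-band agreement.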
When describing these different file formats, we have also made some references to data compres-
sion. This is an important aspect of the management of big data. We shall discuss some salient aspects
of data compression next.
4.1.2 Data compression
In big data, petabytes of data are captured, stored, and analyzed. The high volumes of data generally
increase the input/output operations and transferring these large datasets over the network will take
considerable time. Real-time DA needs efficient management of disk input/output and network
bandwidth resources. Data compression mitigates these problems by not just saving storage space
but also increasing the data transfer speed across the network. It is therefore crucial in big data
environments to use data compression to speed up network transfer and improve the performance of DA
activities. Compression of massive datasets does, however, increase the utilization of the CPU, as the data must
be decompressed before they can be processed at a later stage.
Hadoop supports multiple compression formats, each implemented by a codec (short
for coder-decoder). There exists a set of compiled Java libraries that can be used in Hadoop to
perform data compression and decompression. Each codec implements one algorithm for
compression and decompression. Hadoop supports both splittable and nonsplittable compression
algorithms. A splittable algorithm enhances performance, as large data blocks are distributed across
multiple data nodes and multiple MapReduce tasks decompress data blocks in parallel. With nonsplittable
algorithms, on the other hand, the data blocks must be combined and decompressed by a single MapReduce
task.
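The difference between the two kinds of algorithm can be illustrated with a minimal Python sketch. It is not Hadoop's implementation: blocks are compressed independently with zlib so that each one can be decompressed by a separate worker, whereas a single compressed stream has to be read from its beginning. The block size, worker count, and function names are assumptions.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_blocks(data: bytes, block_size: int) -> list:
    """Compress each fixed-size block independently (the 'splittable' idea)."""
    return [zlib.compress(data[i:i + block_size])
            for i in range(0, len(data), block_size)]

def decompress_parallel(blocks: list, workers: int = 4) -> bytes:
    """Hand each block to a worker; map() returns results in block order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, blocks))

payload = b"sensor=42;status=ok\n" * 50_000           # 1,000,000 bytes
blocks = compress_blocks(payload, block_size=256 * 1024)
whole = zlib.compress(payload)                        # one stream, one reader

print(len(blocks), "independent blocks")
print(decompress_parallel(blocks) == payload)
```

The price of independence is a slightly worse compression ratio, because redundancy that spans a block boundary cannot be exploited.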
There are a number of Hadoop codecs, which we briefly describe next.
4.1.3 Hadoop codecs
Deflate combines LZ77 compression, which works by finding redundant data and replacing it with
references to earlier occurrences, with Huffman coding, a form of prefix coding.
LZ4 is a speed-focused lossless compression algorithm that belongs to the LZ77 family of byte-oriented
compression schemes. The maximum compression speed is 400 MB/s per core, with decompression speeds
of multiple GB/s per core, scalable across multiple cores.
Gzip is a file format, based on the Deflate algorithm, used for file compression and decompression.
Bzip2 is an open file compression format based on the Burrows-Wheeler algorithm, used to
compress single files. Bzip2 uses multiple layers of stacked compression techniques.
Snappy codecs, previously known as Zippy, provide very high speed and reasonable
compression. The maximum compression speed is 250 MB/s or more and decompression speed
about 500 MB/s or more. Snappy is optimized for 64-bit x86-compatible processors. Snappy
assumes little-endian throughout and requires byte-swapping of data in several places on
big-endian platforms. Snappy is a robust and stable system, and has successfully compressed
and decompressed petabytes of data in Google’s production environment.
Typical compression ratio for plain text data is 1.5–1.7, for HTML about 2–4, and for image data
like JPEGs and PNGs and other compressed formats, about 1.0.
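The notion of a compression ratio can be made concrete with the codecs that ship with the Python standard library (zlib for Deflate, gzip, and bz2); LZ4 and Snappy need third-party packages and are left out. The inputs are synthetic assumptions: highly repetitive text, which compresses far better than the typical ratios quoted above, and random bytes, which stand in for already-compressed data such as JPEGs.

```python
import bz2
import gzip
import os
import zlib

def ratio(raw: bytes, compressed: bytes) -> float:
    """Compression ratio = original size / compressed size."""
    return len(raw) / len(compressed)

# Synthetic inputs (assumptions): repetitive text and incompressible noise.
text = b"timestamp,sensor,value\n" + b"2016-01-01,s1,21.5\n" * 20_000
noise = os.urandom(200_000)

for name, compress in [("zlib (Deflate)", zlib.compress),
                       ("gzip", gzip.compress),
                       ("bzip2", bz2.compress)]:
    print(f"{name:15s} text={ratio(text, compress(text)):8.1f} "
          f"noise={ratio(noise, compress(noise)):.2f}")
```

The noise column stays close to 1.0 for every codec, which is why compressing data that is already compressed costs CPU time without saving space.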
Files, whether compressed or not, need to be organized properly. The organization of files is usually
handled by databases.
4.2 DATABASES
In contrast to the traditional relational databases, a NoSQL (not only SQL) database is a geographically
distributed nonrelational database system. A NoSQL database system runs on multiple cluster nodes,
with individual instances of operating systems and built-in storage on each node. This design
is aimed largely at organizing and analyzing large amounts of heterogeneous data types, regardless of
OS. The nodes replicate data across numerous nodes to ensure that there is no data loss during node
failure. The cluster services restore the data from the failed node through a single system image and
redistribute the data across the cluster.
4.2.1 Dynamic schema
In contrast to traditional relational databases, which require that database schemas should be defined
before data insertion, NoSQL permits data insertion without a predefined database schema. This allows
applications to iterate on their schemas rapidly in real-time. Developers add application-side code
to enforce quality controls, such as requiring specific fields and data types. This validation method imposes
control over the data without compromising the benefits of a dynamic schema.
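A minimal sketch of this pattern follows, with a plain Python list standing in for a schema-less collection. The required fields, their types, and the function names are hypothetical; a real NoSQL system would expose its own validation mechanism.

```python
collection = []   # stands in for a schema-less collection

# Hypothetical validation rules kept in application code, not in the database.
REQUIRED = {"sensor_id": str, "value": float}

def validate(doc: dict) -> None:
    """Enforce only the required fields and types; extra fields are allowed."""
    for field, expected in REQUIRED.items():
        if field not in doc:
            raise ValueError(f"missing field: {field}")
        if not isinstance(doc[field], expected):
            raise TypeError(f"{field} must be {expected.__name__}")

def insert(doc: dict) -> None:
    validate(doc)
    collection.append(doc)

insert({"sensor_id": "s1", "value": 21.5})
insert({"sensor_id": "s2", "value": 19.0, "unit": "C"})   # new field, no migration

print(len(collection), "documents stored")
```

The second insert adds a field that the first document lacks, yet no schema change is needed; only the checks the developers chose to write are enforced.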
4.2.2 Sharding, replication and auto-caching
Sharding is a method of storing data records across many server instances. This is done through storage
area networks to make hardware perform like a single server. The NoSQL framework is natively
designed to support automatic distribution of the data across multiple servers including the query load.
Both data and query replicas are automatically distributed across multiple servers located in
different geographic regions, and this facilitates rapid, automatic, and transparent replacement of
failed data or query instances without any disruption. Cloud computing and platform-as-a-service
frameworks make this feature considerably easier. The most frequently used data are kept in the integrated
in-memory database instead of being placed in a separate caching layer, to maintain the lowest latency and
also provide the highest throughput.
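One common way to distribute records automatically is to hash each key and use the result to pick a server. The sketch below assumes that strategy; it is an illustration rather than the mechanism of any particular NoSQL product, and the `ShardedStore` class and shard count are hypothetical. Each shard is a Python dictionary standing in for a server instance.

```python
import hashlib

class ShardedStore:
    """Toy store that routes each key to one of several shards by hash."""

    def __init__(self, shard_count: int):
        self.shards = [dict() for _ in range(shard_count)]

    def _shard_for(self, key: str) -> int:
        # A stable hash, so the same key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key: str, value) -> None:
        self.shards[self._shard_for(key)][key] = value

    def get(self, key: str):
        return self.shards[self._shard_for(key)].get(key)

store = ShardedStore(shard_count=4)
for i in range(1000):
    store.put(f"sensor-{i}", {"value": i})

sizes = [len(shard) for shard in store.shards]
print(sizes)
```

Reads follow the same routing as writes, so the query load is spread across the shards along with the data.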
4.2.3 NoSQL types
Key-value stores
Key-value (KV) stores, or key-value databases, are the simplest NoSQL databases. KV stores use an
associative array data model, also known as a hash or dictionary. In this model, every single record in the
database is stored as an attribute name or key, together with its value in a schema-less way. This

relationship is known as a key-value pair. In each key-value pair, the key is represented by a string and
the value is the data for the key. In particular, the key-value stores do not require a query language, but
provide a way to store, retrieve, and update data.
Notable key-value databases are Riak, Redis, Memcached, Berkeley DB, Upscaledb, Amazon
DynamoDB, Couchbase, and Project Voldemort.
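The store, retrieve, and update operations of this model map directly onto an associative array. The following toy in-memory class is a sketch of the data model only, not of any of the products listed; the class and method names are assumptions.

```python
class KeyValueStore:
    """Toy in-memory key-value store: no query language, only key-based access."""

    def __init__(self):
        self._data = {}

    def put(self, key: str, value) -> None:
        """Store a new value or update the existing one for this key."""
        self._data[key] = value

    def get(self, key: str, default=None):
        """Retrieve the value for a key; the store never inspects the value."""
        return self._data.get(key, default)

    def delete(self, key: str) -> bool:
        """Remove a key and report whether it was present."""
        if key in self._data:
            del self._data[key]
            return True
        return False

kv = KeyValueStore()
kv.put("user:42", b'{"name": "Ann"}')      # the value is opaque to the store
kv.put("user:42", b'{"name": "Ann B."}')   # update by writing the same key again
print(kv.get("user:42"))
```

Since the value is opaque, finding all users with a given name would require reading every key, which is the limitation that document stores address.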
Table 4 shows the comparison of different NoSQL data-models.
Document stores
Document stores (DS) record data as key-value pairs in a structured format which the database can
understand. Each document contains data, and a unique key is assigned to retrieve the document. New
fields of data can be added by including additional key-value pairs in documents.
This transparent way of storing data removes the limitation of querying by key only. It allows content-oriented
retrieval of full-page, often semistructured data with a single query and is suited for content-oriented
applications. The documents are in XML, JSON, or BSON file formats.
The most notable and popular document databases are MongoDB, CouchDB, Terrastore,
OrientDB, RavenDB, and Lotus Notes.
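Content-oriented retrieval can be sketched by letting the store look inside the documents it holds. The code below is a toy illustration in plain Python, not the query interface of any of the databases named above; the `insert` and `find` helpers and the sample documents are hypothetical.

```python
import copy

documents = {}   # unique key -> document

def insert(key: str, doc: dict) -> None:
    """Store a private copy of the document under its unique key."""
    documents[key] = copy.deepcopy(doc)

def find(**criteria) -> list:
    """Return the keys of documents whose fields match all the given values."""
    return sorted(key for key, doc in documents.items()
                  if all(doc.get(field) == value
                         for field, value in criteria.items()))

insert("p1", {"type": "post", "author": "ann", "tags": ["iot", "sensors"]})
insert("p2", {"type": "post", "author": "raj", "likes": 3})
insert("c1", {"type": "comment", "author": "ann", "post": "p1"})

print(find(author="ann"))
print(find(type="post", author="raj"))
```

The documents do not share a fixed set of fields, yet a single query can still select on any field that happens to be present.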
Column-oriented stores
Column databases, as the name suggests, are designed to record data tables as rows of columns of data.
The columns group related data, and each row is associated with a unique row key. This
layout, the inverse of that used in relational database systems, provides optimized queries over very big
datasets and offers a very scalable architecture with extremely high performance. The columnar database is highly
compressed to save storage space and is also capable of self-indexing. The most popular column-oriented
databases are Cassandra, HBase, Hypertable, and Amazon DynamoDB.
Graph stores
These data stores are designed to represent data entities and the undetermined interconnected relationships
between these entities as a graph. The entities are represented as nodes with properties. The edges
represent relationships with their own properties, including directional significance. The nodes and
their relationships are organized as a graph. The relationships are actually preserved, and the data are
interpreted in different ways based on their relationships in the graph. This supports rapid traversal
of joins or relationships. The nodes can have multiple types of relationships, each with start and end nodes
and its own properties. The properties of the relationships are used to add intelligence to the
relationships and are also employed to query the graph.
Notable graph databases are Neo4j, InfiniteGraph, OrientDB, and FlockDB.
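A property graph of this kind can be sketched with two small Python structures: a dictionary of nodes and a list of directed edges, each carrying its own properties. The node names, relationship types, and helper functions are hypothetical, and a real graph database would use indexed storage rather than a list scan.

```python
from collections import deque

# Hypothetical property graph: nodes and edges both carry properties.
nodes = {
    "alice": {"kind": "person"},
    "bob": {"kind": "person"},
    "watch-1": {"kind": "wearable"},
}
edges = [
    # (start node, relationship type, end node, relationship properties)
    ("alice", "KNOWS", "bob", {"since": 2014}),
    ("bob", "OWNS", "watch-1", {"since": 2016}),
]

def neighbours(node: str, rel_type=None) -> list:
    """Follow outgoing edges from a node, optionally filtered by type."""
    return [end for start, rel, end, _props in edges
            if start == node and (rel_type is None or rel == rel_type)]

def reachable(start: str) -> set:
    """Breadth-first traversal along edge direction, without any join tables."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in neighbours(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}

print(neighbours("alice", "KNOWS"))
print(sorted(reachable("alice")))
```

Because relationships are stored explicitly, a multi-hop question such as which devices are owned by people Alice knows becomes a traversal rather than a sequence of joins.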
Table 4 NoSQL Data-Model Comparison
Data-Model      | Performance | Scalability     | Flexibility | Complexity | Functionality
Key-value store | High        | High            | High        | None       | Variable (none)
Column store    | High        | High            | Moderate    | Low        | Minimal
Document store  | High        | Variable (high) | High        | Low        | Variable (low)
Graph store     | Variable    | Variable        | High        | High       | Graph theory
4.3 DATA FUSION AND DATA INTEGRATION
Data are generated from varieties of different sources and each data source carries significant infor-
mation that is sufficient to analyze and process the data. The data obtained directly from different
sources can have some redundant information and can also have heterogeneous representations. Re-
trieval of meaningful information from heterogeneous datasets has limitations. In order to manage data
and retrieve valuable information from data efficiently, it is essential to merge heterogeneous datasets
into one homogeneous data representation. Data fusion provides this by combining information from
multiple sources to form a unified representation.
Data fusion can be defined as [21]: “A multi-level process dealing with the association, correlation,
combination of data and information from single and multiple sources to achieve refined position, iden-
tify estimates and complete and timely assessments of situations, threats and their significance.” An
alternative definition is from Hall and Llinas [22]: “data fusion techniques combine data from multiple
sensors and related information from associated databases to achieve improved accuracy and more spe-
cific inferences than could be achieved by the use of a single sensor alone.”
Data fusion systems are used in a wide range of domains such as sensor networks, text processing,
and video and image processing, to name a few. In big data, the high velocity of heterogeneous
data types makes data fusion important. Advances in the Internet
of Things connect networks of sensors. These networks encompass sensor nodes and at least one
base station. Every sensor node integrates sensors, data processing tools, a radio communication
system, and a battery. In these networks, raw data may present redundant information
and provide sufficient information about its relevance. In multisensor networks, transmitting raw
data can cause data collisions, and there is a higher chance of receiving inaccurate or unreliable
information from some abnormal nodes. In order to aggregate valid data to yield effective information,
it is essential to process the data. Data fusion facilitates better usage of network bandwidth,
a longer network lifetime, and efficient use of energy resources, and above all offers efficient and
highly accurate information retrieval. As such, data fusion represents one of the bigger challenges
in big data.
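As a deliberately simple illustration of aggregating redundant readings while discarding those from abnormal nodes, the sketch below rejects values that lie far from the median and averages the rest. This rule is our own assumption chosen for brevity; it is not a method taken from the cited definitions, and the node names, readings, and tolerance are hypothetical.

```python
from statistics import mean, median

def fuse(readings: dict, tolerance: float) -> float:
    """Average the readings that lie within `tolerance` of the median."""
    centre = median(readings.values())
    trusted = [value for value in readings.values()
               if abs(value - centre) <= tolerance]
    if not trusted:
        raise ValueError("no readings within tolerance of the median")
    return mean(trusted)

# Hypothetical temperature readings; node n4 behaves abnormally.
readings = {"n1": 21.4, "n2": 21.6, "n3": 21.5, "n4": 85.0}
fused = fuse(readings, tolerance=1.0)

print(round(fused, 2))
```

Sending one fused value to the base station instead of four raw readings is also what saves bandwidth and energy in the network.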
5 SUMMARY
The objective of this chapter is to give a broad overview of acquisition and generation methods of
big data. In the digital century, the term “big data” has expanded its boundary from scientific
data (e.g., satellite imagery data and geographical data) to the sensor data on the Internet of Things
(e.g., meteorological data and healthcare data). The new boundary adds more characteristics, known as
volume, velocity, variety, veracity, and value—the Vs of big data. In the same way, the expansion also
brings new challenges into the big data processing and analytics pipeline. In fact, the coinage
“big data” unambiguously denotes digital data, either born-digital or converted into digital data from
born-analog. The computers or other digital devices are the main sources of born-digital data,
whereas born-analog or sensor data are captured by various sensing devices. These data are not only
predefined with a data model, known as structured data, but also without any predefined model,
branded as unstructured data. Moreover, these massive amounts of data are generated with or without
explicit human involvement. The tactile-feedback technology has added an extra dimension to the

manner in which human interact with computers or devices. Equally, the rise of smart devices, with
embedded sensors, has incorporated more diversity to existing interaction methods.
Different types of big data are created by devices ranging from the well-known text-only keyboard to
rapidly growing wearable devices, and are converted into binary digits, or bits. A selection of file
formats, notably JSON, BSON, CSV, and RC files, is used to store the collected data. The data are
compressed to reduce their size on storage disks and to make better use of input/output and CPU resources.
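The effect of format and compression on stored size can be seen in a small sketch. It uses only the Python standard library; the sensor records, field names, and helper functions below are invented for illustration, and gzip stands in for whichever codec a real system would use.

```python
import csv
import gzip
import io
import json


def to_json_bytes(records):
    """Serialise a list of dicts as one JSON array."""
    return json.dumps(records).encode("utf-8")


def to_csv_bytes(records):
    """Serialise a list of dicts as CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue().encode("utf-8")


# Hypothetical, highly repetitive sensor readings.
records = [{"node": f"n{i % 4}", "t": i, "temp_c": 20.0 + (i % 5) * 0.1}
           for i in range(500)]

json_raw = to_json_bytes(records)
csv_raw = to_csv_bytes(records)
json_packed = gzip.compress(json_raw)
csv_packed = gzip.compress(csv_raw)

# Compression is lossless: the original records come back unchanged.
restored = json.loads(gzip.decompress(json_packed))

sizes = {
    "json": len(json_raw), "json.gz": len(json_packed),
    "csv": len(csv_raw), "csv.gz": len(csv_packed),
}
```

Printing `sizes` shows how much smaller the compressed forms are for repetitive data such as this. Note that the saving in disk space and I/O is paid for with some CPU time spent compressing and decompressing.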
NoSQL, a geographically distributed nonrelational database system, is used to handle unstructured
data. NoSQL encompasses the following types of database technologies: key-value stores, document
stores, column-oriented stores, and graph stores. Additionally, NoSQL permits dynamic data insertion
without a predefined database schema, in contrast to relational databases, which require one. Data
fusion merges heterogeneous datasets into one homogeneous data representation by combining
information from multiple sources.
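The difference between a predefined schema and dynamic insertion can be sketched as follows. The relational side uses Python's built-in `sqlite3` module; the document side is only a plain list of dictionaries standing in for a document store, not the API of any particular NoSQL product. Table, field, and function names are hypothetical.

```python
import sqlite3

# Relational side: the columns are fixed when the table is created.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (node TEXT, temp_c REAL)")
conn.execute("INSERT INTO readings (node, temp_c) VALUES (?, ?)", ("n1", 20.1))


def insert_with_new_field(connection):
    """Try to insert a row carrying a field the schema does not define."""
    try:
        connection.execute(
            "INSERT INTO readings (node, temp_c, humidity) VALUES (?, ?, ?)",
            ("n2", 19.8, 0.41),
        )
        return True
    except sqlite3.OperationalError:
        # SQLite rejects the statement: the table has no "humidity" column.
        return False


relational_accepted = insert_with_new_field(conn)

# Document side (stand-in): each record carries its own set of fields.
documents = []
documents.append({"node": "n1", "temp_c": 20.1})
documents.append({"node": "n2", "temp_c": 19.8, "humidity": 0.41})
fields_seen = sorted({key for doc in documents for key in doc})
```

In the relational case the new field would require a schema change before the insert succeeds, whereas the document collection simply accepts the second record with its extra field.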
REFERENCES
[1] Sagiroglu S, Sinanc D. Big data: a review. In: International conference on Collaboration Technologies and
Systems (CTS); 2013. p. 42–7.
[2] Laney D. 3D data management: Controlling data volume, velocity and variety. META Group Research
Note 6, 2001. p. 70.
[3] Gartner Says 6.4 Billion Connected [Internet]. Available from:
http://www.gartner.com/newsroom/id/3165317 [cited 20.04.16].
[4] Gartner Says Worldwide Wearable Devices Sales to Grow 18.4 Percent in 2016 [Internet]. Available from:
http://www.gartner.com/newsroom/id/3198018 [cited 20.04.16].
[5] Gartner Says Worldwide PC Shipments Declined 9.6 Percent in First Quarter of 2016 [Internet]. Available
from: http://www.gartner.com/newsroom/id/3280626 [cited 20.04.16].
[6] When to Expect Devices and Connected [Internet]. Available from:
http://www.gartner.com/newsroom/id/3220117 [cited 20.04.16].
[7] Yick J, Mukherjee B, Ghosal D. Wireless sensor network survey. Comput Netw 2008;52(12):2292–330.
[8] Lai X, Liu Q, Wei X, Wang W, Zhou G, Han G. A survey of body sensor networks. Sensors 2013;13
(5):5406–47.
[9] Akyildiz IF, Su W, Sankarasubramaniam Y, Cayirci E. A survey on sensor networks. IEEE Commun Mag
2002;40(8):102–14.
[10] Akyildiz IF, Pompili D, Melodia T. Challenges for efficient communication in underwater acoustic sensor
networks. SIGBED Rev 2004;1(2):3–8.
[11] Li M, Liu Y. Underground structure monitoring with wireless sensor networks. In: Proceedings of the 6th
international conference on information processing in sensor networks (IPSN ’07) [Internet]. New York:
ACM; 2007. p. 69–78. Available from http://doi.acm.org/10.1145/1236360.1236370 [cited 22.04.16].
[12] Espina J, Falck T, Muehlsteff J, Aubert X. Wireless body sensor network for continuous cuff-less blood pres-
sure monitoring. In: 3rd IEEE/EMBS international summer school on medical devices and biosensors; 2006.
p. 11–5.
[13] Teng XF, Zhang YT, Poon CCY, Bonato P. Wearable medical systems for p-health. IEEE Rev Biomed Eng
2008;1:62–74.
[14] Paradiso R, Loriga G, Taccini N. A wearable health care system based on knitted integrated sensors. IEEE
Trans Inf Technol Biomed 2005;9(3):337–44.
[15] Rienzo MD, Rizzo F, Parati G, Brambilla G, Ferratini M, Castiglioni P. MagIC system: a new textile-based
wearable device for biological signal monitoring. Applicability in daily life and clinical setting. In: IEEE
engineering in medicine and biology 27th annual conference; 2005. p. 7167–9.
[16] Mattmann C, Clemens F, Tröster G. Sensor for measuring strain in textile. Sensors 2008;8(6):3719–32.
[17] Devot S, Bianchi AM, Naujoka E, Mendez MO, Braurs A, Cerutti S. Sleep monitoring through a textile re-
cording system. In: 29th annual international conference of the IEEE Engineering in Medicine and Biology
Society; 2007. p. 2560–3.
[18] Jung S, Ji T, Varadan VK. Point-of-care temperature and respiration monitoring sensors for smart fabric ap-
plications. Smart Mater Struct 2006;15(6):1872.
[19] Verma P. Gracoli: a graphical command line user interface. In: CHI’13 extended abstracts on human factors
in computing systems (CHI EA’13) [Internet]. New York: ACM; 2013. p. 3143–6. Available from
http://doi.acm.org/10.1145/2468356.2479631 [cited 30.03.16].
[20] Garzotto F, Valoriani M. Touchless gestural interaction with small displays: a case study. In: New York:
ACM Press; 2013. p. 1–10. Available from http://dl.acm.org/citation.cfm?doid=2499149.2499154 [cited
02.07.15].
[21] White FE. Data Fusion Lexicon, Joint Directors of Laboratories, Technical Panel for C3, Data Fusion Sub-
Panel. San Diego, CA: Naval Ocean Systems Center; 1991.
[22] Hall DL, Llinas J. An introduction to multisensor data fusion. Proc IEEE 1997;85(1):6–23.
GLOSSARY
Data analytics It is the science of exploring large amounts of data to discover hidden patterns and correlations,
and draw conclusions based on the findings.
Data mining and Knowledge discovery It is an interdisciplinary computational process to analyze data for dis-
covering useful knowledge from data.
Raster data It is a data structure that is represented as a regular grid (rectangular or square) of cells.
Satellite imagery It is the collection of images of Earth and other planets taken by satellites.
Scientific research It is the systematic investigation of scientific theories and hypotheses.

CHAPTER 2
CLOUD COMPUTING INFRASTRUCTURE FOR DATA INTENSIVE APPLICATIONS
Yuri Demchenko*, Fatih Turkmen*, Cees de Laat*, Ching-Hsien Hsu†, Christophe Blanchet‡, Charles Loomis§
University of Amsterdam, Amsterdam, The Netherlands*
Chung Hua University, Hsinchu, Taiwan†
CNRS IFB, Orsay, France‡
SixSq Sàrl, Geneva, Switzerland§
ACRONYMS
API application programming interface
ASP application service provider
AWS Amazon Web Services
BDAF Big Data Architecture Framework
BDE Big Data Ecosystem
BDI Big Data Infrastructure
BDLM Big Data Lifecycle Management
BDRA NIST Big Data Reference Architecture
CCRA NIST Cloud Computing Reference Architecture (NIST SP 500-292)
CEOS Committee on Earth Observation Satellites
CLI command line interface
CPR Capability (framework) provider requirements
CSDI cloud services delivery infrastructure
CSP cloud service provider
DACI dynamic access control infrastructure
DSR data sources requirements
EC2 Elastic Compute Cloud, IaaS cloud service provided by AWS/Amazon
ECL Enterprise Control Language by LexisNexis (currently open source)
EDW enterprise data warehouse
EMR Elastic MapReduce
ETL extract-transform-load
FADI Federated Access and Delivery Infrastructure
GCE Google Compute Engine cloud
HDFS Hadoop Distributed File System
HPC high performance computing
IaaS Infrastructure as a Service
ICAF Intercloud Architecture Framework
ICFF Intercloud Federation Framework, part of ICAF
ICT information communication technologies
IDE integrated development environment
Big Data Analytics for Sensor-Network Collected Intelligence. http://dx.doi.org/10.1016/B978-0-12-809393-1.00002-7
# 2017 Elsevier Inc. All rights reserved.

purpose, and given to the relations, who place the winnows on the roof of the house till
the following day, when the food is eaten.
By some Koravas, a ceremony in honour of the departed ancestors is performed at the
time of the November new moon. A well-polished brass vessel, with red and white
marks on it, is placed in the corner of a room, which has previously been swept, and
purified with cow-dung. In front of the pot is placed a leaf plate, on which cooked rice
and other edibles are set. Incense is burned, and the eldest son of the house partakes of
the food in the hope that he, in due course, will be honoured by his offspring.
The Koramas of Mysore are said to experience considerable difficulty in finding men to
undertake the work of carrying the corpse to the grave. Should the dead Korama be a
man who has left a young widow, it is customary for some one to propose to marry her
the same day, and, by so doing, to engage to carry out the principal part of the work
connected with the burial. A shallow grave, barely two feet deep, is dug, and the corpse
laid therein. When the soil has been loosely piled in, a pot of fire, carried by the chief
mourner in a split bamboo, is broken, and a pot of water placed on the raised mound.
Should the spot be visited during the night by a pack of jackals, and the water drunk by
them to slake their thirst after feasting on the dead Korama, the omen is accepted as
proof that the liberated spirit has fled away to the realms of the dead, and will never
trouble man, woman, child, or cattle. On the sixth day, the chief mourner must kill a
fowl, and mix its blood with rice. This he places, with some betel leaves and nuts, near
the grave. If it is carried off by crows, everything is considered to have been settled
satisfactorily.
As regards the dress of the Koravas, Mr. Mullaly writes as follows. “The women wear
necklaces of shells and cowries interspersed with beads of all colours in several rows,
hanging low down on the bosom; brass bangles from the wrist to the elbow; brass, lead,
and silver rings, very roughly made, on all their fingers except the middle one. The cloth
peculiar to Koravar women is a coarse black one; but they are, as a rule, not particular as
to this, and wear stolen cloths after removing the borders and all marks of identification.
They also wear the chola, which is fastened across the bosom, and not, like the
Lambādis, at the back. The men are dirty, unkempt-looking objects, wear their hair long,
and usually tied in a knot on the top of the head, and indulge in little finery. A joochi
(gochi), or cloth round the loins, and a bag called vadi sanchi, made of striped cloth,
complete their toilet.”
In 1884, Mr. Stevenson, who was then the District Superintendent of Police, North
Arcot, devised a scheme for the regeneration of the Koravas of that district. He obtained
for the tribe a tract of Government land near Gudiyattam, free of assessment for ten
years, and also a grant of Rs. 200 for sinking wells. Licenses were also issued to the
settlers to cut firewood at specially favourable rates. He also prevailed upon the
Zemindar of Karvetnegar to grant twenty-five cawnies of land in Tiruttani for ten years
for another settlement, as well as some building materials. Unfortunately the
impecunious condition of the Zemindar precluded the Tiruttani settlement from deriving
any further privileges which were necessary to keep the colony going, and its existence
was, therefore, cut short. The Gudiyattam colony, on the other hand, exhibited some
vitality for two or three years, but, in 1887, it, too, went the way of the Tiruttani
colony.”226 I gather, from the Police Administration Report, 1906, that a scheme is being
worked out, the object of which is to give a well-known wandering criminal gang some
cultivable land, and so enable the members of it to settle down to an honest livelihood.
At the census, 1891, Korava was returned as a sub-division of Paraiyans, and the name is
also applied to Jōgis employed as scavengers.227
The following note on the Koravas of the west coast is interesting as showing that
Malabar is one of the homes of the now popular game of Diavolo, which has become
epidemic in some European countries. “In Malabar, there is a class of people called
Koravas, who have, from time immemorial, played this game almost in the same manner
as its Western devotees do at the present time. These people are met with mostly in the
southern parts of Malabar, Cochin and Travancore, and they speak the Malayālam
language with a sing-song accent, which easily distinguishes them from other people.
They are of wandering habits. The men are clever acrobats and rope-dancers, but those
of more settled habits are engaged in agriculture and other industries. The beautiful grass
mats, known as Palghat mats, are woven by these people. Their women are fortune-
tellers and ballad singers. Their services are also in demand for boring the ears of girls.
The ropedancers perform many wonderful feats while balancing themselves on the rope,
among them being the playing of diabolo while walking to and fro on a tight rope. The
Korava acrobat spins the wooden spool on a string, attached to the ends of two bamboo
sticks, and throws it up to the height of a cocoanut tree, and, when it comes down, he
receives it on the string, to be again thrown up. There are experts among them who can
receive the spool on the string without even looking at it. There is no noteworthy
difference in the structure and shape of the spool used by the Koravas, and those of
Europe, except that the Malabar apparatus is a solid wooden thing a little larger and
heavier than the Western toy. It has not yet emerged from the crude stage of the village
carpenter’s skill, and cannot boast of rubber tyres and other embellishments which adorn
the imported article; but it is heavy enough to cause a nasty injury should it hit the
performer while falling. The Koravas are a very primitive people, but as acrobats and
ropedancers they have continued their profession for generations past, and there is no
doubt that they have been expert diabolo players for many years.”228 It may be noted
that Lieutenant Cameron, when journeying from Zanzibar to Benguela, was detained
near Lake Tanganyika by a native chief. He relates as follows. “Sometimes a slave of
Djonmah would amuse us by his dexterity. With two sticks about a foot long connected
by a string of a certain length, he spun a piece of wood cut in the shape of an hour-glass,
throwing it before and behind him, pitching it up into the air like a cricket-ball, and
catching it again, while it continued to spin.”

1 Gazetteer of the Bellary district.
2 Madras Diocesan Magazine, June, 1906.
3 John S. Chandler, a Madura Missionary, Boston.
4 Madras Mail, November, 1905.
5 J. Hornell. Report on the Indian Pearl Fisheries of the Gulf of Manaar, 1905.
6 Madras Diocesan Mag., 1906.
7 Notes from a Diary, 1881–86.
8 Lecture delivered at Trivandrum, MS.
9 Nineteenth Century, 1898.
10 Malay Archipelago.
11 Monograph. Ethnog: Survey of Cochin, No. 9, 1906.
12 Malabar Manual.
13 Manual of the Coimbatore district.
14 Madras Journ. Lit. Science, I. 1833.
15 W. W. Skeat and C. O. Blagden. Pagan Races of the Malay Peninsula, 1906.
16 Gazetteer of the Malabar district.
17 Madras Census Report, 1891.
18 Manual of Malabar.
19 Manual of the North Arcot district.

20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
A reddish formation found all over Southern India.
Op. cit.
Journey through Mysore, Canara, and Malabar.
Rev. H. Jensen. Classified Collection of Tamil Proverbs, 1897.
Gazetteer of the Trichinopoly district.
For this note I am indebted to Mr. N. Subramani Aiyar.
Mokhalingam is in Ganjam, not Vizagapatam.
Place of meeting, which is a large tamarind tree, under which councils are held.
Gazetteer of the Madura district.
Sētupati, or lord of the bridge. The title of the Rājas of Rāmnād.
Manual of the Madura district.
G. Oppert. Madras Journ. Lit. Science, 1888–9.
Notes on Criminal Classes of the Madras Presidency.
Madras Review, 1899.
Op. cit.
Illustrated Criminal Investigation and Law Digest, I, 3, 1908, Vellore.

46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
Madras Journ. Lit. Science, XXV.
I am informed that only Mēl-nādu, Sīrukudi, Mella-kōttai, and Puramalai are
endogamous.
Hindu Feasts, Fasts, and Ceremonies, 1903.
The Tamils eighteen hundred years ago, 1904.
Gazetteer of the Tanjore district.
Madras Mail, 1908.
Ind. Ant., III., 1874.
A lakh = a hundred thousand.
Compare the theft of Laban’s teraphim by Rachel. Genesis, XXXI, 19.
Gazetteer of the Tanjore district.
Ind. Ant., VIII, 1879.
Hutchinson. Marriage Customs in many lands, 1897.
Gazetteer of the Anantapur district.
Mediæval Sinhalese Art.
Maduraikanchi, Line 521.
E. Hultzsch. South Indian Inscriptions, II, i, 44, 46, 1891.

72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
Ibid. III, i, 47, 1899.
New Asiatic Review, Jan. 1907.
Madras Mail, 1907.
Classified Collection of Tamil Proverbs, 1897, from which some of the proverbs
quoted are taken.
See the legendary story narrated in the article on Tiyans.
Malabar and its Folk, 1900.
Letters from Malabar.
Gazetteer of the Vizagapatam district.
Yule and Burnell, Hobson-Jobson.
Monograph, Eth. Survey of Cochin, No. 4, 1905.
Unhusked rice.
Manual of the South Canara district.
Money-lender.
Malabar Quarterly Review, 1905.
Indian Review, III, 1902.
Monograph, Ethnog. Survey, Cochin.
According to another version of the legend, it was the hut of a Tiyan.
Malabar Manual.
C. Karunakara Menon. Madras Mus. Bull., V, 2, 1906.
Madras Mus. Bull., II, 3, 1901.

98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
This account is mainly from an article by Mr. N. Subramani Aiyar.
Ind. Ant., IX, 1880.
Historical Sketches, Mysore.
Dynasties of the Kanarese Districts of the Bombay Presidency.
Loc. cit., and Manual of the North Arcot district.
Section III, Inhabitants, Madras Government Press, 1907.
J. F. Kearns. Kalyāna shatanku.
Madras Series, IV, 1882; VI, 1883.
Illatakaru, a bride’s father having no son, and adopting his son-in-law.
See further C. Ramachendrier. Collection of Decisions of High Courts and the Privy
Council applicable to dancing-girls, illatom affiliation, etc., Madras, 1892.
Madras Mail, Nov. 1905.
Madras Mail, 1905.
Tamil and English Dictionary, 1862.
The word, in this sense, is said to occur in a Tamil work named Pingala Nikandu.
Karuku is Tamil for the serrated margin of the leaf—petiole of the palmyra palm.
Yule and Burnell. Hobson-Jobson.

123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
Manual of the Salem district.
Manual of the Tanjore district.
Madras Christ. Coll. Mag., 1894.
Malabar Law and Custom.
Mysore and Coorg Gazetteer.
Journ. Anthrop. Inst., II, 1873.
Indian Review, VII, 1906.
See Ravi Varma, the Indian Artist. Indian Press, Allahabad.
Madras Museum Bull., V. 3, 1907.
Epigraphia Indica, VI, 1900–1901.
Rev. J. Cain, Ind. Ant., VIII, 1879.
Trans. Ethnolog. Soc., London, 1869; Ind. Ant., VIII, 1879.
Original Inhabitants of Bhārathavarsha.
The panas have reference to the division of South Indian castes into the right- and left-
hand factions.
The mofussil indicates up-country stations and districts, as contra-distinguished from
the “Presidency” (Madras City).
Marriage Customs in Many Lands, 1897.
Moore. Indian Appeal Cases, Vol. III, 359–82.

148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
Journey through Mysore, Canara and Malabar.
See Talboys Wheeler, Madras in the Olden Time, II, 49–89.
See Tales of Kōmati Wit and Wisdom. C. Hayavadana Rao, Madras, 1907.
Classified Collection of Tamil Proverbs, 1897. See also C. Hayavadana Rao, op. cit.,
and Ind. Ant., XX, 78, 1891.
Gazetteer of the Godāvari district.
Linguistic Survey of India, IV, 1906.
Man. March 1902.
G.O., No. 1020, Public, 8th October 1901.
G.O., No. 3005, Revenue, 3rd November 1908.
Occasional Essays on Native South Indian Life, 1901.
Agricul: Ledger Series, Calcutta. No. 7, 1904.
Madras Mail, 1894.
A very interesting note on Totemism among the Khonds by Mr. J. E. Friend-Pereira
has been published in the Journal of Asiatic Society of Bengal, LXXIII, 1905.
The Golden Bough, 1900.
Selections from the Records, Government of India, No. V, Human Sacrifice and
Infanticide, 1854.
Personal Narrative of Service among the Wild Tribes of Khondistan.
Manual of the Vizagapatam district.
Journ. Asiat. Soc., Bengal, 1898.
Madras Mail, 1894.
Selections from the Records of the Government of India (Home Department), V.,
1845.
J. A. R. Stevenson. Madras Journ: Lit. Science, VI, 1837.

172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
J. E. Friend-Pereira. Journ: Asiat: Soc. Bengal, LXXI, 1902.
Madras Journ: Lit. Science, VI, 1837.
Loc. cit.
Journ. Anthrop. Soc., Bombay, II, 249.
Madras Mail, 1896.
Macpherson. Memorials of Service in India.
Journ., Anth. Soc., Bombay, II, 1890.
Ibid.
Madras Police Report, 1904.
Madras Mail, 1894.
Madras Mail, 1908.
See G.O., Judicial, 14th August 1882, No. 952, Khond Rising.
Letters from Malabar. Translation. Madras, 1862.
Fine cakes made of gram flour and a fine species of alkali, which gives them an
agreeable taste, and serves the purpose of making them rise and become very crisp when
fried.
Journ. Anthrop. Inst., IV., 1875.
Madras Christ. Coll. Mag. III, 1885–6.
Ind. Ant. X, 1881.
Journ. Anthrop. Inst. IV, 1875.
M. Paupa Rao Naidu. History of Railway Thieves.
Madras Journ. Lit: and Science, 1888–89.

197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
Tirumurukairuppadai.
Indian Antiquity, IX, 1880.
Cyclopædia of India.
Loc. cit.
Note on Koravas, 1908.
Notes on Criminal Classes of the Madras Presidency.
Forest Inspection Report, 1896.
F. S. Mullaly. Op. cit.
Madras Journ. Lit. Science, XVII, 1853.
History of Railway Thieves. Madras, 1904.
Gazetteer of the Trichinopoly district.
This story is based on well-known episode of Nalacharitra in the Āranya Parva of the
Mahabharatha.
M. Paupa Rao Naidu. Op. cit.
Ibid.
Police Report, 1902.
Op. cit.
A varāha or pagoda was worth Rs. 3–8–0.
A seer is an Indian measure of weight, varying in different parts of the country.
Trans. Eth. Sec. N.S., VII.
J. F. Kearns, Kalyāna Shatanku, 1868.
Ind. Ant., III., 1874.

223
224
225
226
227
228
India. Trübner. Oriental Series.
Ind. Ant., III, 1874.
Madras Mail, 1907.
For this account of the Koravas, I am largely indebted to a report by Mr. N. E. Q.
Mainwaring, Superintendent of Police.
Madras Mail, 1908.

Colophon
Availability
This eBook is for the use of anyone anywhere at no cost and with
almost no restrictions whatsoever. You may copy it, give it away or
re-use it under the terms of the Project Gutenberg License included
with this eBook or online at www.gutenberg.org .
This eBook is produced by the Online Distributed Proofreading Team
at www.pgdp.net .
Volume Contents First Article
I A and B Abhishēka
II C to J Canji
III K Kabbēra
IV K to M Kōri
V M to P Marakkāyar
VI P to S Palli
VII T to Z Tābēlu
Scans of this book are available from the Internet Archive (copy 1 ,
2 ).
Project Gutenberg catalog page: 42993 .
Related Library of Congress catalog page: 10014128 .
Related Open Library catalog page (for source): OL7024564M .
Related Open Library catalog page (for work): OL1106958W .

Related WorldCat catalog page: 1967849 .
Encoding
Revision History
2011-08-08 Started.
External References
This Project Gutenberg eBook contains external references. These
links may not work for you.
Corrections
The following corrections have been applied to the text:
Page Source Correction
9 [Not in source] ;
96 Gāmpa Gampa
102 annointing anointing
103 Gangimakkulu Gangimakkalu
155 negociations negotiations
160 orginally originally
161 feed fed
181 ” [Deleted]
226 [Not in source] ’
300 Kolāyans Kōlayans
316 negociate negotiate
317 Bhāskarācharya Bhāskarāchārya
394 tumeric turmeric
495 ’? ?’

*** END OF THE PROJECT GUTENBERG EBOOK CASTES AND
TRIBES OF SOUTHERN INDIA. VOL. 3 OF 7 ***
Updated editions will replace the previous one—the old editions
will be renamed.
Creating the works from print editions not protected by U.S.
copyright law means that no one owns a United States
copyright in these works, so the Foundation (and you!) can copy
and distribute it in the United States without permission and
without paying copyright royalties. Special rules, set forth in the
General Terms of Use part of this license, apply to copying and
distributing Project Gutenberg™ electronic works to protect the
PROJECT GUTENBERG™ concept and trademark. Project
Gutenberg is a registered trademark, and may not be used if
you charge for an eBook, except by following the terms of the
trademark license, including paying royalties for use of the
Project Gutenberg trademark. If you do not charge anything for
copies of this eBook, complying with the trademark license is
very easy. You may use this eBook for nearly any purpose such
as creation of derivative works, reports, performances and
research. Project Gutenberg eBooks may be modified and
printed and given away—you may do practically ANYTHING in
the United States with eBooks not protected by U.S. copyright
law. Redistribution is subject to the trademark license, especially
commercial redistribution.
START: FULL LICENSE

THE FULL PROJECT GUTENBERG LICENSE

PLEASE READ THIS BEFORE YOU DISTRIBUTE OR USE THIS WORK
To protect the Project Gutenberg™ mission of promoting the
free distribution of electronic works, by using or distributing this
work (or any other work associated in any way with the phrase
“Project Gutenberg”), you agree to comply with all the terms of
the Full Project Gutenberg™ License available with this file or
online at www.gutenberg.org/license.
Section 1. General Terms of Use and
Redistributing Project Gutenberg™
electronic works
1.A. By reading or using any part of this Project Gutenberg™
electronic work, you indicate that you have read, understand,
agree to and accept all the terms of this license and intellectual
property (trademark/copyright) agreement. If you do not agree
to abide by all the terms of this agreement, you must cease
using and return or destroy all copies of Project Gutenberg™
electronic works in your possession. If you paid a fee for
obtaining a copy of or access to a Project Gutenberg™
electronic work and you do not agree to be bound by the terms
of this agreement, you may obtain a refund from the person or
entity to whom you paid the fee as set forth in paragraph 1.E.8.
1.B. “Project Gutenberg” is a registered trademark. It may only
be used on or associated in any way with an electronic work by
people who agree to be bound by the terms of this agreement.
There are a few things that you can do with most Project
Gutenberg™ electronic works even without complying with the
full terms of this agreement. See paragraph 1.C below. There
are a lot of things you can do with Project Gutenberg™
electronic works if you follow the terms of this agreement and
help preserve free future access to Project Gutenberg™
electronic works. See paragraph 1.E below.

1.C. The Project Gutenberg Literary Archive Foundation (“the
Foundation” or PGLAF), owns a compilation copyright in the
collection of Project Gutenberg™ electronic works. Nearly all the
individual works in the collection are in the public domain in the
United States. If an individual work is unprotected by copyright
law in the United States and you are located in the United
States, we do not claim a right to prevent you from copying,
distributing, performing, displaying or creating derivative works
based on the work as long as all references to Project
Gutenberg are removed. Of course, we hope that you will
support the Project Gutenberg™ mission of promoting free
access to electronic works by freely sharing Project Gutenberg™
works in compliance with the terms of this agreement for
keeping the Project Gutenberg™ name associated with the
work. You can easily comply with the terms of this agreement
by keeping this work in the same format with its attached full
Project Gutenberg™ License when you share it without charge
with others.
1.D. The copyright laws of the place where you are located also
govern what you can do with this work. Copyright laws in most
countries are in a constant state of change. If you are outside
the United States, check the laws of your country in addition to
the terms of this agreement before downloading, copying,
displaying, performing, distributing or creating derivative works
based on this work or any other Project Gutenberg™ work. The
Foundation makes no representations concerning the copyright
status of any work in any country other than the United States.
1.E. Unless you have removed all references to Project
Gutenberg:
1.E.1. The following sentence, with active links to, or other
immediate access to, the full Project Gutenberg™ License must
appear prominently whenever any copy of a Project
Gutenberg™ work (any work on which the phrase “Project
Gutenberg” appears, or with which the phrase “Project
Gutenberg” is associated) is accessed, displayed, performed,
viewed, copied or distributed:
This eBook is for the use of anyone anywhere in the United
States and most other parts of the world at no cost and
with almost no restrictions whatsoever. You may copy it,
give it away or re-use it under the terms of the Project
Gutenberg License included with this eBook or online at
www.gutenberg.org. If you are not located in the United
States, you will have to check the laws of the country
where you are located before using this eBook.
1.E.2. If an individual Project Gutenberg™ electronic work is
derived from texts not protected by U.S. copyright law (does not
contain a notice indicating that it is posted with permission of
the copyright holder), the work can be copied and distributed to
anyone in the United States without paying any fees or charges.
If you are redistributing or providing access to a work with the
phrase “Project Gutenberg” associated with or appearing on the
work, you must comply either with the requirements of
paragraphs 1.E.1 through 1.E.7 or obtain permission for the use
of the work and the Project Gutenberg™ trademark as set forth
in paragraphs 1.E.8 or 1.E.9.
1.E.3. If an individual Project Gutenberg™ electronic work is
posted with the permission of the copyright holder, your use and
distribution must comply with both paragraphs 1.E.1 through
1.E.7 and any additional terms imposed by the copyright holder.
Additional terms will be linked to the Project Gutenberg™
License for all works posted with the permission of the copyright
holder found at the beginning of this work.
1.E.4. Do not unlink or detach or remove the full Project
Gutenberg™ License terms from this work, or any files
containing a part of this work or any other work associated with
Project Gutenberg™.
1.E.5. Do not copy, display, perform, distribute or redistribute
this electronic work, or any part of this electronic work, without
prominently displaying the sentence set forth in paragraph 1.E.1
with active links or immediate access to the full terms of the
Project Gutenberg™ License.
1.E.6. You may convert to and distribute this work in any binary,
compressed, marked up, nonproprietary or proprietary form,
including any word processing or hypertext form. However, if
you provide access to or distribute copies of a Project
Gutenberg™ work in a format other than “Plain Vanilla ASCII” or
other format used in the official version posted on the official
Project Gutenberg™ website (www.gutenberg.org), you must,
at no additional cost, fee or expense to the user, provide a copy,
a means of exporting a copy, or a means of obtaining a copy
upon request, of the work in its original “Plain Vanilla ASCII” or
other form. Any alternate format must include the full Project
Gutenberg™ License as specified in paragraph 1.E.1.
1.E.7. Do not charge a fee for access to, viewing, displaying,
performing, copying or distributing any Project Gutenberg™
works unless you comply with paragraph 1.E.8 or 1.E.9.
1.E.8. You may charge a reasonable fee for copies of or
providing access to or distributing Project Gutenberg™
electronic works provided that:
• You pay a royalty fee of 20% of the gross profits you derive
from the use of Project Gutenberg™ works calculated using the
method you already use to calculate your applicable taxes. The
fee is owed to the owner of the Project Gutenberg™ trademark,
but he has agreed to donate royalties under this paragraph to
the Project Gutenberg Literary Archive Foundation. Royalty
payments must be paid within 60 days following each date on
which you prepare (or are legally required to prepare) your
periodic tax returns. Royalty payments should be clearly marked
as such and sent to the Project Gutenberg Literary Archive
Foundation at the address specified in Section 4, “Information
about donations to the Project Gutenberg Literary Archive
Foundation.”
• You provide a full refund of any money paid by a user who
notifies you in writing (or by e-mail) within 30 days of receipt
that s/he does not agree to the terms of the full Project
Gutenberg™ License. You must require such a user to return or
destroy all copies of the works possessed in a physical medium
and discontinue all use of and all access to other copies of
Project Gutenberg™ works.
• You provide, in accordance with paragraph 1.F.3, a full refund of
any money paid for a work or a replacement copy, if a defect in
the electronic work is discovered and reported to you within 90
days of receipt of the work.
• You comply with all other terms of this agreement for free
distribution of Project Gutenberg™ works.
1.E.9. If you wish to charge a fee or distribute a Project
Gutenberg™ electronic work or group of works on different
terms than are set forth in this agreement, you must obtain
permission in writing from the Project Gutenberg Literary
Archive Foundation, the manager of the Project Gutenberg™
trademark. Contact the Foundation as set forth in Section 3
below.
1.F.
1.F.1. Project Gutenberg volunteers and employees expend
considerable effort to identify, do copyright research on,
transcribe and proofread works not protected by U.S. copyright
law in creating the Project Gutenberg™ collection. Despite these
efforts, Project Gutenberg™ electronic works, and the medium
on which they may be stored, may contain “Defects,” such as,
but not limited to, incomplete, inaccurate or corrupt data,
transcription errors, a copyright or other intellectual property
infringement, a defective or damaged disk or other medium, a
computer virus, or computer codes that damage or cannot be
read by your equipment.
1.F.2. LIMITED WARRANTY, DISCLAIMER OF DAMAGES - Except
for the “Right of Replacement or Refund” described in
paragraph 1.F.3, the Project Gutenberg Literary Archive
Foundation, the owner of the Project Gutenberg™ trademark,
and any other party distributing a Project Gutenberg™ electronic
work under this agreement, disclaim all liability to you for
damages, costs and expenses, including legal fees. YOU AGREE
THAT YOU HAVE NO REMEDIES FOR NEGLIGENCE, STRICT
LIABILITY, BREACH OF WARRANTY OR BREACH OF CONTRACT
EXCEPT THOSE PROVIDED IN PARAGRAPH 1.F.3. YOU AGREE
THAT THE FOUNDATION, THE TRADEMARK OWNER, AND ANY
DISTRIBUTOR UNDER THIS AGREEMENT WILL NOT BE LIABLE
TO YOU FOR ACTUAL, DIRECT, INDIRECT, CONSEQUENTIAL,
PUNITIVE OR INCIDENTAL DAMAGES EVEN IF YOU GIVE
NOTICE OF THE POSSIBILITY OF SUCH DAMAGE.
1.F.3. LIMITED RIGHT OF REPLACEMENT OR REFUND - If you
discover a defect in this electronic work within 90 days of
receiving it, you can receive a refund of the money (if any) you
paid for it by sending a written explanation to the person you
received the work from. If you received the work on a physical
medium, you must return the medium with your written
explanation. The person or entity that provided you with the
defective work may elect to provide a replacement copy in lieu
of a refund. If you received the work electronically, the person
or entity providing it to you may choose to give you a second
opportunity to receive the work electronically in lieu of a refund.
If the second copy is also defective, you may demand a refund
in writing without further opportunities to fix the problem.
1.F.4. Except for the limited right of replacement or refund set
forth in paragraph 1.F.3, this work is provided to you ‘AS-IS’,
WITH NO OTHER WARRANTIES OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR ANY PURPOSE.
1.F.5. Some states do not allow disclaimers of certain implied
warranties or the exclusion or limitation of certain types of
damages. If any disclaimer or limitation set forth in this
agreement violates the law of the state applicable to this
agreement, the agreement shall be interpreted to make the
maximum disclaimer or limitation permitted by the applicable
state law. The invalidity or unenforceability of any provision of
this agreement shall not void the remaining provisions.
1.F.6. INDEMNITY - You agree to indemnify and hold the
Foundation, the trademark owner, any agent or employee of the
Foundation, anyone providing copies of Project Gutenberg™
electronic works in accordance with this agreement, and any
volunteers associated with the production, promotion and
distribution of Project Gutenberg™ electronic works, harmless
from all liability, costs and expenses, including legal fees, that
arise directly or indirectly from any of the following which you
do or cause to occur: (a) distribution of this or any Project
Gutenberg™ work, (b) alteration, modification, or additions or
deletions to any Project Gutenberg™ work, and (c) any Defect
you cause.
Section 2. Information about the Mission
of Project Gutenberg™

Project Gutenberg™ is synonymous with the free distribution of
electronic works in formats readable by the widest variety of
computers including obsolete, old, middle-aged and new
computers. It exists because of the efforts of hundreds of
volunteers and donations from people in all walks of life.
Volunteers and financial support to provide volunteers with the
assistance they need are critical to reaching Project
Gutenberg™’s goals and ensuring that the Project Gutenberg™
collection will remain freely available for generations to come. In
2001, the Project Gutenberg Literary Archive Foundation was
created to provide a secure and permanent future for Project
Gutenberg™ and future generations. To learn more about the
Project Gutenberg Literary Archive Foundation and how your
efforts and donations can help, see Sections 3 and 4 and the
Foundation information page at www.gutenberg.org.
Section 3. Information about the Project
Gutenberg Literary Archive Foundation
The Project Gutenberg Literary Archive Foundation is a non-
profit 501(c)(3) educational corporation organized under the
laws of the state of Mississippi and granted tax exempt status
by the Internal Revenue Service. The Foundation’s EIN or
federal tax identification number is 64-6221541. Contributions
to the Project Gutenberg Literary Archive Foundation are tax
deductible to the full extent permitted by U.S. federal laws and
your state’s laws.
The Foundation’s business office is located at 809 North 1500
West, Salt Lake City, UT 84116, (801) 596-1887. Email contact
links and up to date contact information can be found at the
Foundation’s website and official page at
www.gutenberg.org/contact

Section 4. Information about Donations to
the Project Gutenberg Literary Archive
Foundation
Project Gutenberg™ depends upon and cannot survive without
widespread public support and donations to carry out its mission
of increasing the number of public domain and licensed works
that can be freely distributed in machine-readable form
accessible by the widest array of equipment including outdated
equipment. Many small donations ($1 to $5,000) are particularly
important to maintaining tax exempt status with the IRS.
The Foundation is committed to complying with the laws
regulating charities and charitable donations in all 50 states of
the United States. Compliance requirements are not uniform
and it takes a considerable effort, much paperwork and many
fees to meet and keep up with these requirements. We do not
solicit donations in locations where we have not received written
confirmation of compliance. To SEND DONATIONS or determine
the status of compliance for any particular state visit
www.gutenberg.org/donate.
While we cannot and do not solicit contributions from states
where we have not met the solicitation requirements, we know
of no prohibition against accepting unsolicited donations from
donors in such states who approach us with offers to donate.
International donations are gratefully accepted, but we cannot
make any statements concerning tax treatment of donations
received from outside the United States. U.S. laws alone swamp
our small staff.
Please check the Project Gutenberg web pages for current
donation methods and addresses. Donations are accepted in a
number of other ways including checks, online payments and
credit card donations. To donate, please visit:
www.gutenberg.org/donate.
Section 5. General Information About
Project Gutenberg™ electronic works
Professor Michael S. Hart was the originator of the Project
Gutenberg™ concept of a library of electronic works that could
be freely shared with anyone. For forty years, he produced and
distributed Project Gutenberg™ eBooks with only a loose
network of volunteer support.
Project Gutenberg™ eBooks are often created from several
printed editions, all of which are confirmed as not protected by
copyright in the U.S. unless a copyright notice is included. Thus,
we do not necessarily keep eBooks in compliance with any
particular paper edition.
Most people start at our website which has the main PG search
facility: www.gutenberg.org.
This website includes information about Project Gutenberg™,
including how to make donations to the Project Gutenberg
Literary Archive Foundation, how to help produce our new
eBooks, and how to subscribe to our email newsletter to hear
about new eBooks.

