Big Data Analytics for
Sensor-Network Collected
Intelligence
Edited by
Hui-Huang Hsu
Tamkang University, Taiwan
Chuan-Yu Chang
National Yunlin University of Science and Technology, Taiwan
Ching-Hsien Hsu
Chung Hua University, Taiwan
Series Editor Fatos Xhafa
Universitat Politècnica de Catalunya, Spain
Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1800, San Diego, CA 92101-4495, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom
© 2017 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or
mechanical, including photocopying, recording, or any information storage and retrieval system, without
permission in writing from the publisher. Details on how to seek permission, further information about
the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance
Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher
(other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience
broaden our understanding, changes in research methods, professional practices, or medical treatment
may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and
using any information, methods, compounds, or experiments described herein. In using such information or
methods they should be mindful of their own safety and the safety of others, including parties for whom they
have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any
liability for any injury and/or damage to persons or property as a matter of products liability, negligence or
otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the
material herein.
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
ISBN: 978-0-12-809393-1
For information on all Academic Press publications
visit our website at https://www.elsevier.com/books-and-journals
Publisher: Joe Hayton
Acquisition Editor: Sonnini R. Yura
Editorial Project Manager: Ana Claudia A. Garcia
Production Project Manager: Punithavathy Govindaradjane
Cover Designer: Victoria Pearson
Typeset by SPi Global, India
List of Contributors
Ahmad Anbar
The George Washington University, Washington, DC, United States
Haytham Assem
IBM, Dublin, Ireland
Christophe Blanchet
CNRS IFB, Orsay, France
Teodora S. Buda
IBM, Dublin, Ireland
Jiannong Cao
The Hong Kong Polytechnic University, Kowloon, Hong Kong
Chuan-Yu Chang
National Yunlin University of Science and Technology, Douliu City, Yunlin County, Taiwan
Jinjun Chen
University of Technology Sydney, Broadway, NSW, Australia
Cen Chen
Hunan University, Changsha, China
Szu-Ta Chen
National Taiwan University Hospital Yun-Lin Branch, Douliu City, Yunlin County, Taiwan
Kang Chen
Southern Illinois University, Carbondale, IL, United States
Zixue Cheng
University of Aizu, Aizuwakamatsu, Japan
Cees de Laat
University of Amsterdam, Amsterdam, The Netherlands
Yuri Demchenko
University of Amsterdam, Amsterdam, The Netherlands
Mingxing Duan
Hunan University, Changsha, China
Tarek El-Ghazawi
The George Washington University, Washington, DC, United States
Weiwei W. Fang
Beijing Key Lab of Transportation Data Analysis and Mining, Beijing Jiaotong
University, Beijing, China
Edmond J. Golden III
National Institute of Standards and Technology, Gaithersburg, MD, United States
Chu-Cheng Hsieh
Slice Technologies Inc., San Mateo, CA, United States
Ching-Hsien Hsu
Chung Hua University, Hsinchu, Taiwan
Hui-Huang Hsu
Tamkang University, Tamsui, Taiwan
Qian Huang
Southern Illinois University, Carbondale, IL, United States
Tian-Hsiang Huang
National Sun Yat-sen University, Kaohsiung, Taiwan
Chih-Chieh Hung
Tamkang University, New Taipei City, Taiwan
Pravin Kakar
Institute for Infocomm Research, Agency for Science, Technology and
Research (A*STAR), Singapore
Shonali Krishnaswamy
Institute for Infocomm Research, Agency for Science, Technology and
Research (A*STAR), Singapore
Chung-Nan Lee
National Sun Yat-sen University, Kaohsiung, Taiwan
Kenli Li
Hunan University, Changsha, China
Keqin Li
Hunan University, Changsha, China; State University of New York, New Paltz,
NY, United States
Xiao-Li Li
Institute for Infocomm Research, Agency for Science, Technology and
Research (A*STAR), Singapore
Qingyong Y. Li
Beijing Key Lab of Transportation Data Analysis and Mining, Beijing Jiaotong
University, Beijing, China
Hai-Ning Liang
Xi’an Jiaotong-Liverpool University, Suzhou, China
Chen Lin
National Yunlin University of Science and Technology, Douliu City, Yunlin County, Taiwan
Xuefeng Liu
The Hong Kong Polytechnic University, Kowloon, Hong Kong
Ming Liu
Beijing Key Lab of Transportation Data Analysis and Mining, Beijing Jiaotong
University, Beijing, China
Charles Loomis
SixSq Sàrl, Geneva, Switzerland
Chao Lu
Southern Illinois University, Carbondale, IL, United States
Ka L. Man
Xi’an Jiaotong-Liverpool University, Suzhou, China
Martial Michel
National Institute of Standards and Technology, Gaithersburg, MD, United States
Vijayakumar Nanjappan
Xi’an Jiaotong-Liverpool University, Suzhou, China
Minh N. Nguyen
Institute for Infocomm Research, Agency for Science, Technology and
Research (A*STAR), Singapore
Declan O’Sullivan
Trinity College Dublin, Dublin, Ireland
Phyo P. San
Institute for Infocomm Research, Agency for Science, Technology and
Research (A*STAR), Singapore
Olivier Serres
The George Washington University, Washington, DC, United States
Kathiravan Srinivasan
National Ilan University, Yilan City, Yilan County, Taiwan
Ming-Chun Tsai
National Sun Yat-sen University, Kaohsiung, Taiwan
Fatih Turkmen
University of Amsterdam, Amsterdam, The Netherlands
Wei Wang
Xi’an Jiaotong-Liverpool University, Suzhou, China
Junbo Wang
University of Aizu, Aizuwakamatsu, Japan
Yilang Wu
University of Aizu, Aizuwakamatsu, Japan
Chen-Ming Wu
National Sun Yat-sen University, Kaohsiung, Taiwan
Lei Xu
IBM, Dublin, Ireland
Chi Yang
University of Technology Sydney, Broadway, NSW, Australia
Jian-Bo Yang
Institute for Infocomm Research, Agency for Science, Technology and
Research (A*STAR), Singapore
Zhangdui D. Zhong
Beijing Key Lab of Transportation Data Analysis and Mining, Beijing Jiaotong
University, Beijing, China
Preface
There are three sources of information we can collect about the environment and the people in the
environment: environmental sensors, wearable sensors, and social networks. Through intelligent analysis
of the huge amount of sensory data, we can develop various systems to automatically detect natural and
man-made events. Moreover, the systems can also try to understand people’s behavior and even intention.
Thus better services can be provided to people in an unobtrusive manner.
With the advances in sensor and networking technologies, we are now able to collect sensory data
easily. These sensory data can be stored and processed in the cloud. Nevertheless, how to properly
utilize such a huge amount of data is another essential issue. We certainly hope that advanced ICT
technologies can help us perform intelligent analysis on these data and provide better services to people
automatically. Exciting new systems and research results have been developed. This book aims to
introduce these ambient intelligence and Internet of Things (IoT) systems, which are based on big data
analytics of collected sensory data.
The theme of this book is closely related to two hot topics: the Internet of Things and big data
analytics. Systems and technologies introduced in the book can be used as supplementary materials for
courses involving these two topics. Researchers, professionals, and practitioners in related fields can
also find useful information and technologies for their work. There are four parts of this book: big data
architecture and platforms; big data processing and management; big data analytics and services; and
big data intelligence and IoT systems. Each part includes three or four chapters. Here we briefly
introduce each of the 14 chapters.
Part I: Big Data Architecture and Platforms
1. Big Data: A Classification of Acquisition and Generation Methods
Vijayakumar Nanjappan, Hai-Ning Liang, Wei Wang, Ka L. Man
This chapter points out that it is very difficult to store, process, and analyze huge amounts of data
using conventional computing methodologies and resources. The authors classify the data into
digital and analog, environmental and personal. Data types and formats as well as input mechanisms
are also highlighted. These will help us understand the active and passive methods of data collection
and production.
2. Cloud Computing Infrastructure for Data Intensive Applications
Yuri Demchenko, Fatih Turkmen, Cees de Laat, Ching-Hsien Hsu, Christophe Blanchet,
Charles Loomis
This chapter proposes a cloud-based big data infrastructure (BDI). The general architecture and
functional components of BDI are described in detail. BDI is supported by the definition of the big
data architecture framework (BDAF). Two case studies in bioinformatics are illustrated in the
chapter to provide examples of requirements analysis and implementation.
3. Open Source Private Cloud Platforms for Big Data
Martial Michel, Olivier Serres, Ahmad Anbar, Edmond J. Golden III, Tarek El-Ghazawi
This chapter tells us that it is beneficial to use private clouds, especially open source clouds, for
big data. Security, privacy, and customization are the major concerns. The chapter introduces the most
prominent open source clouds in view of big data processing. A case study using an On-Premise
Private Cloud is also presented to demonstrate the implementation of such an environment.
Part II: Big Data Processing and Management
4. Efficient Nonlinear Regression-Based Compression of Big Sensing Data on Cloud
Chi Yang, Jinjun Chen
This chapter proposes a compression method for big sensing data based on a nonlinear regression
model. It improves the effectiveness and efficiency for processing real-world big sensing data.
Regression design, least squares, and triangular transform are discussed in this chapter. It is
demonstrated that the model achieves significant storage and time performance gains over other
compression models.
5. Big Data Management on Wireless Sensor Networks
Chih-Chieh Hung, Chu-Cheng Hsieh
This chapter gives an overview of data management issues and solutions in wireless sensor
networks. There are two possible models: centralized and decentralized. Data management can
be centralized for the benefit of computation, or decentralized for energy saving. Three major
issues for data management in both models are introduced: storage, query processing, and data
collection. Some case studies are also discussed.
6. Extreme Learning Machine and Its Applications in Big Data Processing
Cen Chen, Kenli Li, Mingxing Duan, Keqin Li
This chapter first reviews the extreme learning machine (ELM) theory and its variants. Because it is
memory-resident and has high space/time complexity, the traditional ELM cannot be trained
efficiently on big data. Optimization strategies are necessary to solve this problem. Thus, parallel ELM
algorithms based on MapReduce and Spark are described. Finally, practical applications of the
ELM for big data are also presented in this chapter.
Part III: Big Data Analytics and Services
7. Spatial Big Data Analytics for Cellular Communication Systems
Junbo Wang, Yilang Wu, Hui-Huang Hsu, Zixue Cheng
This chapter surveys methodologies of spatial big data analytics and possible applications to
support the cellular communication (CC) system. The CC system provides the most popular way to
connect people. However, it still faces challenges, such as unbalanced crowd communication
behavior and video transmission congestion. Spatial big data analytics can help the CC system to
provide services with better quality of service (QoS). Challenging issues are highlighted in this
chapter.
8. Cognitive Applications and Their Supporting Architecture for Smart Cities
Haytham Assem, Lei Xu, Teodora S. Buda, Declan O’Sullivan
This chapter proposes a cognitive architecture to enable big data applications with sensory
data for smart cities. It deals with organization, configuration, security, and optimization. This
chapter also reviews related work on location-based social networks and presents a novel
approach to detect urban patterns, especially anomalies. This is essential for better understanding
of human activities and behaviors.
9. Deep Learning for Human Activity Recognition
Phyo P. San, Pravin Kakar, Xiao-Li Li, Shonali Krishnaswamy, Jian-Bo Yang, Minh N. Nguyen
This chapter presents a systematic feature learning method for the problem of human activity
recognition (HAR). It adopts a deep convolutional neural network (CNN) to automate feature
learning from raw inputs. It is not necessary to handcraft features in advance. Such a
unification of feature learning and classification results in mutual enhancements. This is
verified by comparing experimental results with several state-of-the-art techniques.
10. Neonatal Cry Analysis and Categorization System Via Directed Acyclic Graph Support
Vector Machine
Szu-Ta Chen, Kathiravan Srinivasan, Chen Lin, Chuan-Yu Chang
This chapter introduces a neonatal cry analysis and categorization system. From the cry of
the newborn, the system can identify different types of feelings such as pain, sleepiness, and
hunger. The sequential forward floating selection (SFFS) algorithm is used to choose the
discriminative features. The selected features are then used to classify the neonatal cries by
the directed acyclic graph support vector machine (DAG-SVM). The system is useful for
parents and nursing staff.
Part IV: Big Data Intelligence and IoT Systems
11. Smart Building Applications and Information System Hardware Co-Design
Qian Huang, Chao Lu, Kang Chen
This chapter emphasizes that a comprehensive understanding of information system hardware
is necessary when designing efficient smart building applications. The necessity and
importance of application and hardware co-design are discussed in this chapter. A case study
is also given to show that application and hardware co-design optimizes the smart building
design from a system perspective.
12. Smart Sensor Networks for Building Safety
Xuefeng Liu, Jiannong Cao
This chapter presents the design and implementation of effective and energy-efficient structural
health monitoring (SHM) algorithms in resource-limited wireless sensor networks (WSNs).
Compared to traditional wired transmission, WSNs are low cost and easy to deploy for building
monitoring. Distributed versions of SHM algorithms can help overcome the bandwidth limitation.
A WSN-Cloud system architecture is also proposed for future SHM.
13. The Internet of Things and Its Applications
Chung-Nan Lee, Tian-Hsiang Huang, Chen-Ming Wu, Ming-Chun Tsai
This chapter first compares two lightweight protocols for the Internet of Things (IoT): MQ
telemetry transport (MQTT) and the constrained application protocol (CoAP). Both protocols
reduce the size of the packet and the over-loading of the bandwidth, thus saving battery power
and storage space. The major techniques for big data analytics are then introduced. Finally,
intelligent transportation systems and intelligent manufacturing systems are presented as
examples.
14. Smart Railway Based on the Internet of Things
Qingyong Y. Li, Zhangdui D. Zhong, Ming Liu, Weiwei W. Fang
This chapter discusses the framework and technologies for a smart railway based on Internet
of Things (IoT) and big data. The architecture of a smart railway, including the perception
and action layer, the transfer layer, the data engine layer, and the application layer, is
presented first. A case study on intelligent rail inspection is then introduced. This chapter
shows that a smart railway is promising in improving traditional railway systems.
ACKNOWLEDGMENTS
This book is a part of the book series “Intelligent Data-Centric Systems.” First of all, we would like to
thank the series editor, Prof. Fatos Xhafa, for his encouragement and guidance in developing this book.
We gratefully acknowledge all the contributing authors of the chapters. This book would not have been
possible without their great efforts. We are also indebted to Ms. Ana Claudia Garcia, the editorial
project manager, and the whole production team at Elsevier for their continuous help in producing this
book. Finally, we thank our families for their love and support.
Hui-Huang Hsu, Chuan-Yu Chang, Ching-Hsien Hsu
September 2016
CHAPTER 1
BIG DATA: A CLASSIFICATION OF ACQUISITION AND GENERATION METHODS
Vijayakumar Nanjappan, Hai-Ning Liang, Wei Wang, Ka L. Man
Xi’an Jiaotong-Liverpool University, Suzhou, China
ACRONYMS
AUIs adaptive user interfaces
BAN body area network
BSN body sensor network
BSON binary JavaScript object notation records
BT business transactions
CLI command-line interfaces
CPU central processing unit
CSV comma-separated values
DA data analytics
DM data mining
DS document store
ECG electrocardiography
EEG electroencephalogram
Email electronic mail
EMG electromyography
GB gigabyte
GPS Global Positioning System
GS graph store
GUI graphical user interfaces
HIDs human interface devices
HTML hypertext markup language
IoT Internet of Things
IR infrared
IUI intelligent user interfaces
JPEG joint photographic experts group
JSON JavaScript object notation records
KD knowledge discovery
KV key-value stores
LED light-emitting diode
MB megabyte
MEMS Micro-Electro Mechanical Systems
NoSQL not only structured query language
Big Data Analytics for Sensor-Network Collected Intelligence. http://dx.doi.org/10.1016/B978-0-12-809393-1.00001-5
© 2017 Elsevier Inc. All rights reserved.
NUI natural user interfaces
ORC optimized row columnar
OS operating system
PC personal computer
PNG portable network graphics
PS proximity sensor
RC files Record Columnar files
RFID radio frequency identification
RPC Remote Procedure Call
SD scientific data
SF sequence file
SI satellite imagery
SMD social media data
SoC System on Chip
VUI voice user interfaces
WIMP windows icons menus and pointer device
WSN wireless sensor network
WWW World Wide Web
XML extensible markup language
1 BIG DATA: A CLASSIFICATION
The term “big data” refers to datasets of exceptionally large size with distinct and intricate
structures. They can be extremely difficult to analyze and visualize with personal computing
devices and conventional computational methods [1]. In fact, enormous datasets with complex
structures have been generated and used for a long time, for example, in satellite imagery (SI) and in
raster, geographical, biological, and ecological data; data used for scientific research can also be
considered “big data.” Nowadays, many different kinds of big data exist in our lives, from
social media data (SMD), to organization and enterprise data, to sensor data on the Internet of
Things (e.g., meteorological data about our environment and healthcare data).
1.1 CHARACTERISTICS OF BIG DATA
In 2001, Doug Laney characterized big data from three perspectives: volume, velocity, and variety (the
3Vs) [2]. Volume refers to the magnitude of data, which usually determines the potential value of the
data. Velocity refers to the speed at which data are generated and processed according to the
requirements of different applications. Variety refers to the nature and different types of data. Later,
the research community proposed two additional Vs: veracity and value. Veracity indicates the
trustworthiness and quality of the data. This is particularly important, as big data are usually collected
from a variety of sources, some of which may not provide high-quality, reliable data. The term value
indicates the potential (or hope) that valuable information or insight can be extracted or derived from
big data, provided that the data are appropriately processed and analyzed. These characteristics bring
new challenges to the data processing and analytics pipeline. As the size of the data is constantly
increasing and the velocity of data generation is higher than the processing speed, scalable storage and
efficient data management methods are needed to enable real-time or near real-time data processing by
the analytical tools. To ensure the credibility of the analytics, the quality of the data must be taken into
consideration, for example, to identify erroneous processes and uncertain, unreliable, or missing data.
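As an illustration of the veracity concern, the following minimal sketch screens a batch of sensor readings for missing and implausible values before any analysis is run. The reading format, the `screen_readings` helper, and the plausible temperature range are assumptions made for this example; they do not come from any system described in the chapter.

```python
from typing import Optional

# Hypothetical plausible range for an outdoor temperature sensor (degrees Celsius).
PLAUSIBLE_RANGE = (-60.0, 60.0)


def screen_readings(readings: list[Optional[float]]) -> dict[str, list[int]]:
    """Return the indices of missing and out-of-range readings."""
    low, high = PLAUSIBLE_RANGE
    issues: dict[str, list[int]] = {"missing": [], "out_of_range": []}
    for index, value in enumerate(readings):
        if value is None:
            # The sensor delivered nothing for this time slot.
            issues["missing"].append(index)
        elif not low <= value <= high:
            # The value is physically implausible for this kind of sensor.
            issues["out_of_range"].append(index)
    return issues


batch = [21.4, None, 22.0, 480.0, 21.9]
report = screen_readings(batch)
print(report)
```

A real pipeline would typically go further, for example by cross-checking neighbouring sensors, but even a simple screen like this makes explicit which readings should not be trusted.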
2 BIG DATA GENERATION METHODS
In today’s digital era, the term data unambiguously denotes digital data, which can be either
born-digital or born-analog but eventually converted into digital form. Large amounts of conventional
digital data already exist, such as Web documents, social media, and business transaction (BT) data. In
recent years, the “Internet of Things” (IoT) has generated vast volumes of data about our physical world
captured by sensing devices. Many everyday objects are embedded with a variety of sensors capable of
collecting analog data and converting it into digital form. Besides conventional data, sensor data are
becoming the next big data source.
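The conversion of born-analog data into digital form can be sketched as sampling followed by quantization. The example below is a simplified illustration only: the `digitize` function, the 5 Hz test tone, the 100 Hz sampling rate, and the 8-bit resolution are all assumptions chosen for the example, and real sensor hardware performs this step in an analog-to-digital converter.

```python
import math
from typing import Callable


def digitize(signal: Callable[[float], float], duration_s: float,
             sample_rate_hz: int, bits: int) -> list[int]:
    """Sample a continuous signal in [-1, 1] and quantize each sample to `bits` bits."""
    levels = 2 ** bits
    n_samples = int(duration_s * sample_rate_hz)
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz                      # sampling instant in seconds
        x = max(-1.0, min(1.0, signal(t)))          # clip to the converter's input range
        # Map [-1, 1] onto the integer codes 0 .. levels - 1.
        code = min(levels - 1, int((x + 1.0) / 2.0 * levels))
        codes.append(code)
    return codes


def tone(t: float) -> float:
    """A hypothetical analog input: a 5 Hz sine wave."""
    return math.sin(2 * math.pi * 5.0 * t)


samples = digitize(tone, duration_s=1.0, sample_rate_hz=100, bits=8)
print(len(samples), samples[:5])
```

Raising the sampling rate or the bit depth gives a more faithful digital copy at the cost of more data per second, which is one reason sensor streams grow so quickly.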
2.1 DATA SOURCES
2.1.1 Born-digital data
Born-digital data are created and managed using computers or other digital devices. Almost all
documents on personal computers are stored in standardized file formats (e.g., Word or PDF
documents). Advances in Internet and World Wide Web (WWW) technologies have enabled computers
around the world to be connected so that billions of Web documents can be accessed anywhere. The
emergence of Web 2.0 technologies enriched data and media types from text-only to images, video,
and audio, as well as associated metadata such as temporal and geographical information. Numerous
images and videos are now uploaded to social media websites, annotated with location information
and tags related to their contents. Other traditional big data sources include electronic mail, instant
messages, medical records, and business transactions.
2.1.2 Sensor data
Recently, billions of physical objects, such as sensors, smartphones, tablets, wearable devices, and
radio frequency identification (RFID) tags, embedded with identification, sensing, computing,
communication, and actuation capabilities, have increasingly been connected to the Internet, resulting
in the next technological revolution, known as the “Internet of Things” (IoT). Integration of multiple
semiconductor components on a single chip (System on Chip) is key to the success of the Internet of
Things, which has the potential to revolutionize a large array of intelligent applications and services in
many fields. According to Gartner, the number of connected things will reach nearly 20.8 billion by
2020, with around 5.5 million new devices being connected every day [3]. It is estimated that by the
end of 2017, worldwide sales of wearable electronic devices will have increased by 39% [4]. In contrast,
there is a 9.6% decline in worldwide PC shipments, which indicates that smart devices are preferred in
the market [5]. It is reported that by 2018, new digital devices that can talk to each other in the
household will be common [6]. It is estimated that nearly 3 trillion gigabytes of data are produced in a
single day. The high volumes of heterogeneous data streams coming from this variety of devices pose
great challenges to traditional data management methods.
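To put the daily figures quoted above into perspective, they can be converted into per-second rates. The short calculation below is only back-of-the-envelope arithmetic: the two constants are taken from the text, and the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

new_devices_per_day = 5.5e6   # "around 5.5 million new devices" per day, as quoted from [3]
data_gb_per_day = 3e12        # "nearly 3 trillion gigabytes" per day, as quoted in the text

devices_per_second = new_devices_per_day / SECONDS_PER_DAY
data_gb_per_second = data_gb_per_day / SECONDS_PER_DAY

print(f"{devices_per_second:.1f} new devices per second")
print(f"{data_gb_per_second:.2e} GB per second")
```

On these figures, roughly 64 devices join the network every second, while data are produced at a rate on the order of tens of millions of gigabytes per second.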
Widespread examples of these portable devices are mobile phones and smart devices such as the Apple
Watch, which integrate a variety of sensors such as an accelerometer, gyroscope, compass, and Global
Positioning System (GPS) receiver, and, more recently, sensors that can capture biometric information
such as heart rate. Table 1 lists commonly used sensors on smartphones or tablets.
Sensors built on Micro-Electro Mechanical Systems (MEMS) technology are small in size and have only
limited processing and computing capabilities. A wireless sensor network (WSN) can be developed by
connecting spatially distributed sensors using wireless interfaces. Different kinds of sensors can be
integrated into a single WSN, such as mechanical, magnetic, thermal, biological, chemical, and
optical sensors. A sensor can be either immobile or mobile (including wearable). While immobile
sensors are installed on an object at a fixed location [7], mobile sensors are usually installed on a
moving object. A wearable sensor is a special kind of mobile sensor that is worn on the human body and
can be used to form a body sensor network (BSN) or body area network (BAN) [8].
Fixed sensors can be installed on the earth’s surface, such as terrain [9], submerged under water
[10], or buried underground [11]. In contrast, mobile sensors can move and interact with the
surrounding physical environment. Wearable sensors are worn by users and can convert physical or
environmental parameters of the wearer, such as blood pressure [12,13], heart rate [14,15], bodily
motion [16], brain activity [17], and skin temperature [18], into digital data. Table 2 summarizes some
of the most commonly used sensors in BSNs.
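The classification of sensors into fixed, mobile, and wearable can be captured in a small data model. The sketch below is one possible representation, not a structure defined in the chapter: the `Mobility` enum, the `Sensor` class, the `body_sensor_network` helper, and the example device names are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Mobility(Enum):
    FIXED = "fixed"
    MOBILE = "mobile"
    WEARABLE = "wearable"   # treated as a special kind of mobile sensor


@dataclass(frozen=True)
class Sensor:
    name: str
    measures: str
    mobility: Mobility

    @property
    def is_mobile(self) -> bool:
        """Wearable sensors count as mobile; fixed sensors do not."""
        return self.mobility in (Mobility.MOBILE, Mobility.WEARABLE)


def body_sensor_network(sensors: list[Sensor]) -> list[Sensor]:
    """Select the wearable sensors, which together can form a BSN or BAN."""
    return [s for s in sensors if s.mobility is Mobility.WEARABLE]


deployed = [
    Sensor("seismic-01", "ground vibration", Mobility.FIXED),
    Sensor("vehicle-gps", "position", Mobility.MOBILE),
    Sensor("wrist-hr", "heart rate", Mobility.WEARABLE),
    Sensor("skin-temp", "skin temperature", Mobility.WEARABLE),
]
bsn = body_sensor_network(deployed)
print([s.name for s in bsn])
```

Modelling wearable as its own category, while still reporting it as mobile, mirrors the text's description of a wearable sensor as a special kind of mobile sensor.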
2.2 DATA TYPES
Interactions among physical objects, sensors, and people generate massive amounts of data, which can
be either structured or unstructured. Table 3 gives examples of the different types of data.
Table 1 Common Sensors Integrated in Smartphones and Tablets

Sensors on Smartphones     Function
Microphone                 The real-world sound and vibration are converted to digital audio
Camera                     Senses visible light or electromagnetic radiation and converts them
                           to digital image or video
Gyroscope                  Provides orientation information
Accelerometer              Measures the linear acceleration
Compass or magnetometer    Works as a traditional compass. Provides orientation in relation to
                           the magnetic field of Earth
Proximity sensor           Finds proximity of the phone from the user
Ambient light sensor       Optimizes the display brightness
GPS                        Global Positioning System, tracks the target location or “navigates”
                           the things by map with the help of GPS satellites
Barometer                  Measures atmospheric pressure
Fingerprint sensor         Captures the digital image of fingerprint pattern
2.2.1 Structured data
Structured data are usually defined with fixed attributes, types, and formats; for example, records in a
relational database are generated according to a predefined schema. Compared to unstructured or
semi-structured data, processing of structured data is relatively simple and straightforward. This type
of data can be generated by people, machines, and sensors.
(1) Human-generated structured data: the data are created under explicit human involvement using
some interaction mechanisms, e.g., data generated through human-machine interface devices like
mouse input data and click-streams.
(2) Machine-generated structured data: the data are created automatically by a computing device
without explicit human interaction, e.g., Web log data.
Table 2 Commonly Used Sensors in Body Area Networks or Body Sensor Networks

Sensor                     Function
Blood-pressure sensor      Measures human blood pressure
Camera pill                Captures images of the gastrointestinal tract
Carbon dioxide sensor      Measures carbon dioxide gas
ECG/EEG/EMG sensor         Measures the electrical activity of the heart (ECG), brain (EEG),
                           and muscles (EMG)
Humidity sensor            Measures humidity changes
Blood oxygen saturation    Measures blood oxygen saturation
Pressure sensor            Measures pressure value
Respiration sensor         Measures human respiration values
Temperature sensor         Measures human body temperature
Table 3 Data Types and Data Sources

Source               Structured Data          Unstructured Data
Human-generated      Input data               Text documents
                     Click-streams            Social media data
                                              Mobile data
                                              Web page content
Machine-generated    Web logs/server logs     Satellite imagery
                                              Scientific data
                                              Image and video
                                              Radar data
Sensor-generated     Fixed sensor data
                     Mobile sensor data
(3) Sensor-generated structured data: the data are generated by the embedded fixed or moveable
sensors, e.g., sensor data from smartphones and smart meters.
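What makes such data "structured" is that every record conforms to a schema fixed in advance. The sketch below checks smart-meter readings against a hypothetical schema; the field names, the `SCHEMA` mapping, and the `conforms` helper are assumptions made for illustration rather than a format taken from the chapter.

```python
# Hypothetical predefined schema for a smart-meter reading: field name -> expected type.
SCHEMA = {"device_id": str, "timestamp": int, "power_w": float}


def conforms(record: dict) -> bool:
    """Return True if the record has exactly the schema's fields with the expected types."""
    return record.keys() == SCHEMA.keys() and all(
        isinstance(record[field], expected) for field, expected in SCHEMA.items()
    )


good = {"device_id": "meter-7", "timestamp": 1480000000, "power_w": 412.5}
bad = {"device_id": "meter-7", "power_w": "high"}   # missing field and wrong type

print(conforms(good), conforms(bad))
```

Because the shape of each record is known beforehand, checks like this are cheap, which is part of why structured data are comparatively easy to process.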
2.2.2 Unstructured data
Unstructured data are the opposite of structured data: they have no predefined data model. Some common
examples include text, images, audio, video, and streaming sensor data. Unstructured data are one
primary source of big data and are much more challenging to process than structured data.
Human-generated unstructured data include a large number of data types of different natures, such as
textual data (Web documents, licensed publications, e-journals, eBooks, organizational records, e-mails,
logs) and media data of different types contributed by ordinary users on social media platforms.
Examples of machine-generated unstructured data include scientific data (e.g., astronomical, geographic,
ecological, biological, chemical, and geospatial data), satellite images of weather, surveillance data, and
radar data (e.g., meteorological and oceanographic seismic data).
3 BIG DATA: DATA ACQUISITION METHODS
Human interaction with computers and devices creates vast amounts of data. In the PC era, human
interface devices (HIDs), like keyboards and mice, let users create and interact with digital
data. Most user-generated digital text data have been created, with explicit human involvement, using
conventional and widely used input devices like keyboards and mice (or touchpads in portable computers).
Multimedia data are digitized analog data or sensor data generated using audio and camera devices.
The introduction of tactile-feedback technology has added an extra dimension to the manner in which
people interact with computers. The stylus, a pen-shaped instrument used with tactile-feedback devices
and graphics tablets, lets users write on the surface of the screen, making interaction more direct.
The stylus and similar haptic-based devices allow users to interact directly with the displayed content,
with multitouch gestures as input, in lieu of the physical keyboard and pointing devices. The rise of
smart touch-based devices with embedded sensors has added diversity to existing interaction methods,
enabling richer gesture-based interaction.
3.1 INTERFACE METHODS
Communication between the user and a computer system is done through various interface mechanisms,
especially using input/output devices. In this section, we review some of the most important ones and
their evolution, and describe how they contribute to data generation (see Fig. 1 for a summary).
3.1.1 Command-line interfaces
The command-line interface (CLI) or character user interface (CUI) is one of the first types of interface
methods, and allows users to send text-based commands to the system. Text commands are converted to
appropriate operating system functions. The CLI is the oldest form of interface, yet it offers
powerful and concise control over programs. As such, the earliest forms of digital text data were created
using CLIs. The amounts of data generated are not significant. This is an important feature, as in earlier
systems, memory was limited and expensive.
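The conversion of a text command into a system function can be pictured as a lookup from a command word to a handler. The following is a minimal sketch only: the command names `echo` and `add`, and the function `run_command`, are hypothetical and stand in for real operating system functions.

```python
import shlex

# Hypothetical handlers standing in for operating system functions.
def cmd_echo(args):
    return " ".join(args)

def cmd_add(args):
    return str(sum(int(a) for a in args))

COMMANDS = {"echo": cmd_echo, "add": cmd_add}

def run_command(line):
    """Split a text command into words and dispatch it to its handler."""
    parts = shlex.split(line)
    if not parts:
        return ""
    name, args = parts[0], parts[1:]
    handler = COMMANDS.get(name)
    if handler is None:
        return f"unknown command: {name}"
    return handler(args)

print(run_command("echo hello world"))
print(run_command("add 2 3"))
```

Each command is a short line of text, which is consistent with the small data volumes noted above.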
3.1.2 Graphical user interfaces
The graphical user interface (GUI), popularized by Microsoft Windows, is an interactive visual inter-
face rather than a command or text-only interface. The interactive interface tools are visually repre-
sented as windows, icons, menus, and a pointer device, which collectively are known as WIMP.
The GUI also includes a text interface, called the graphical character-based interface. Presently,
the GUI is the most common and well-known user interface for computers and some earlier mobile
devices like mobile phones and laptops. Gracoli, a hybrid interface, combines the strengths of the GUI
and CLI to provide application-specific interfaces [19].
3.1.3 Context-sensitive user interfaces
Context-sensitive user interfaces are used almost pervasively within GUIs. They automatically offer
users a choice among the available options that fit the current or previous state of the application
process. Context menus in GUIs are the principal example of context-sensitive user interfaces. The
primary use of the context-sensitive user interface is to simplify the interface by reducing the number
of commands, clicks, or keystrokes required to perform a given action. This type of interface plays
a crucial role where interface devices have a limited number of buttons, as in video games controlled by a
mouse, joystick, or gamepad. With the emergence of mobile devices, whose main input is via
a touch-based screen, context-sensitive interfaces have found more uses. A variety of contextual
options are provided via distinct taps and gestures on the screen.
3.1.4 Web-based user interfaces
A Web user interface or Web app allows the user to interact with content or software running on a
remote server through a Web browser. The content or Web page is downloaded from the Web server
and the user can interact with this content in a Web browser, which acts as a client. The distributed
nature allows the content to be stored on a remote server, while the ubiquitous nature of the Web
browser permits convenient access to the content. The most common Web applications are Webmail,
online shopping, online document sharing, social media, and instant messaging. These types of
interfaces have by now generated a vast amount of data.
3.1.5 Adaptive user interfaces or intelligent user interfaces
Adaptive user interfaces (AUIs), also referred to as intelligent user interfaces (IUIs), support users’
customization of the interface by changing the layout and other elements according to the user or context
requirements. AUIs are either user-initiated (adaptable) or system-initiated (self-adaptive). Their aim is
to offer efficient, intuitive, and secure interfaces to users based on their unique preferences, traits, and
environmental circumstances.
AUIs are capable of passively recognizing a user’s presence, and offer services based on their immediate
requirements.
FIG. 1
Evolution of user interfaces, user input methods or data generation: command-line interface, graphical
user interface, multi-touch gestural interface, natural user interface.
3.1.6 Natural user interfaces
The natural user interface (NUI) is a simple and intuitive gesture-based interface. It allows users to
interact naturally with systems through body movements, gestures, and voice, without any physical
encumbrances.
3.1.7 Voice interfaces
Voice user interfaces (VUI) are based on speech recognition technology and enable users to interact or
send commands to computers or smart devices using voice or speech. This is the most natural way of
allowing users to interact with computers or smart devices, similar to how one would communicate
with other people. The most commonly used voice interaction roles are command- and agent-based
interactions. The command-based interaction allows the user to give speech input to the system, most
commonly in a simple but specific predefined order. The agent-based interactions recognize natural
language as input and provide an appropriate response through text or audio on the system. Apple’s
Siri and Google’s Voice Search are typical examples of voice interfaces.
3.1.8 Gesture-based interfaces
Gesture-based interfaces attempt first to recognize gestures as commands. They track the continuous
physical movements of the user’s hands, fingers, face, head, and body and organize them into a discrete
sequence of commands. The receiving system recognizes and interprets the meaningful gestures, which lets
users interact with the system in a more natural way. Sensor evolution has also
led to novel human gesture-based interaction with smart connected devices in the
IoT. The natural, continuous, meaningful movements of users, involving their hands, fingers, head,
face, and body, can all be part of this process of user-system interaction. This will be an important
mode of interaction given, for instance, the growing emphasis on BSNs, where sensors are placed on
or attached to the human body to passively capture physiological data and body movements [20].
3.1.9 Multitouch gesture interface
A multitouch interface is a gesture-based interface that supports two or more continuous gestures to
interact with touch surfaces. On smart devices, for example, multitouch interfaces allow more direct
interaction with applications and are considered to be natural and intuitive. They enable a
variety of actions like taps, swipes, rotations, pinches, and other natural gestures. Touchpads and
touchscreens on portable smart devices are powered by multitouch interfaces. With their ever-increasing
dominance, these devices have displaced traditional input devices like keyboards and mice as the
input devices that generate the most data.
3.1.10 Touchless gesture interfaces
Touchless gesture interfaces completely eliminate physical contact with a device, whether direct, as by
touching, or indirect, via a secondary device like a mouse. This is thought to make interaction even
more natural and intuitive by freeing users of any physical attachments and involving only their
body movements. Second to voice, human movements without the need for any physical controls are
closer to how people interact with one another. Touchless gesture interfaces aim to replicate this type of
communication, which is achieved through a selection of intelligent sensing devices located around the
users. A touchless gestural interface represents an intelligent and natural user interface method for
users to interact with systems using intuitive and unencumbered physical movements and gestures [20].
We next review some input devices and tools that enable the creation of the interfaces we have
presented above.
3.2 INTERFACE DEVICES
Input devices enable the user to input data directly into the computer. The best-known HIDs are the
keyboard, a text entry device; pointing devices like the mouse, trackball, light pen, and stylus; and
other devices like the joystick and touchscreen.
3.2.1 Keyboard
The keyboard is a typewriter-style device with a series of electronic switches or keys which allow users
to send text and alphanumeric data directly into computers. The switches each represent one character.
The most common English-language keyboard layout is a typewriter-style QWERTY layout. The stan-
dard computer keyboard contains alphabet keys, number keys, punctuation symbol keys, arrow keys,
and functional and control keys. The keyboard is the primary peripheral device for data entry. The vir-
tual, touchscreen-based keypad is used in mobile devices to simulate the physical keyboard.
3.2.2 Mice
The mouse allows a user to manipulate objects indirectly using a pointer-like representation by detect-
ing two-dimensional motion in a GUI. The mouse is a hand-controlled device, and typically has one or
two buttons. The mouse click is generated by pressing any of the buttons once, holding it, or releasing
it immediately. There are different variations of mouse clicks to select objects, move the pointer to the
desired location on the display, and input commands into the system. The keyboard and mouse are
the most integrated computer peripherals which allow the user to interact with the system. In contrast
to the keyboard, the mouse is supported only in GUIs. The trackball is another pointing device very
similar to the mouse.
3.2.3 Joystick
The joystick is a control column input device with a lever which controls the movement of a pointer in
all directions on the display. Similar to the mouse, joysticks include buttons known as triggers for ad-
ditional functionality. The joystick is typically used in games and sometimes as a replacement for the
mouse in certain situations. Miniature versions of finger-operated joysticks are now adopted in mobile
devices.
3.2.4 Stylus
The stylus, a pen-shaped input device, allows the user to input commands to the computer, mobile,
and other smart devices via their display. The stylus is used on the touchscreen devices to make se-
lections by tapping, or writing or drawing on the screen, just like using a pen on a notebook. The
stylus is more commonly used in portable handheld devices, like laptops and tablets, than on desktop
computers.
3.2.5 Touchpad
The touchpad is a pointing device or cursor-controlling device for portable computers. Touchpads
function in a very similar way to mice and contain a tactile sensor to identify the position and motion
of the user’s fingers in contact with the pad. Touchpads introduced multitouch gesture-based interface
mechanisms. In addition to the taps and swipes of the touchpad, gesture-based interfaces allow
additional application-specific input methods.
3.2.6 Touchscreens
The touchscreen is a combination of both display and input device. A transparent touch-sensitive panel
embedded on the rigid planar surface recognizes the touch or press of users’ fingers as input. The
touchscreen replaces the mouse or stylus with users’ fingers as an input device, giving users a feeling of
greater directness when they manipulate content on the display. Touchscreens have brought
multitouch gesture-based input interactions into widespread use on modern devices.
Besides the above more recognizable input devices, there are a number of emergent ones which are
becoming more widely used and support new ways of interacting with computing systems. The
integration of these emergent devices is made possible only by powerful machines that can capture large
amounts of data and process them in real time. In addition, the development of BANs has made the use of
some of these devices feasible. We describe some of the emergent input devices next.
3.2.7 Kinect
The Kinect is a device that captures the body motion of users positioned at a certain distance from a
display. Its motion sensors translate a user’s physical body position and movements into commands.
Initially developed for the Xbox game consoles, it is now used for other applications and devices.
3.2.8 Leap motion
Leap is an in-air gestural user interface device. Leap uses two monochromatic infrared (IR) cameras and
three IR LEDs to cover a hemispherical area out to a distance of 1 m. It is similar to the Kinect but is
designed for closer-range interaction with any display.
3.2.9 Myo
Myo is a muscle-controlled, arm-worn gestural device. Myo recognizes forearm muscle movements and
transmits them wirelessly as valid gestural commands to interact with PCs or other systems.
3.2.10 Wearable devices
Wearable devices or gadgets are electronic devices worn by consumers ubiquitously and continually
to capture or track biometric information related to health or fitness. Wearable devices are new
manifestations of accessories that people wear, such as Apple’s Watch or Samsung’s Gear Watch
or more dedicated tools like the Fitbit One wireless activity and sleep tracker and monitor. Wearable
devices with biometric tracking capabilities represent one of the most important sources of data gen-
eration. They will continuously and uninterruptedly record data of different types and from a variety of
environments.
As data increase in variety and volume in parallel to the need to support greater velocity in their
generation and processing, it is important to have a way to organize them. Organization and manage-
ment of data will therefore be explored next.
4 BIG DATA: DATA MANAGEMENT
4.1 DATA REPRESENTATION AND ORGANIZATION
Current systems represent data using a binary digital system. All data types are converted into binary
digits, 1s or 0s, called bits. A byte, equivalent to a sequence of 8 bits, is the fundamental
unit of storage. Different standards are used to encode data objects by assigning bit patterns to them. In
order to utilize the storage space efficiently, data are compressed using various compression
techniques. One of the major requirements in big data is low processing latency. In contrast
with the traditional methods of performing computations on stored data, the data must be processed as
they are generated, in or near real time. Thus low latency is a key requirement in big data
analytics (DA).
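As a small illustration of the binary representation described above, the sketch below encodes a short string with one encoding standard (UTF-8, chosen here only as an example) and prints the resulting bytes and their bit patterns.

```python
text = "Hi"

# An encoding standard assigns a bit pattern to each character.
encoded = text.encode("utf-8")

# Each byte is a sequence of 8 bits.
bits = " ".join(f"{byte:08b}" for byte in encoded)

print(list(encoded))            # the byte values
print(bits)                     # the same bytes written as bits
print(encoded.decode("utf-8"))  # decoding restores the original text
```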
4.1.1 File formats
The file format is the description of how a collection of data is internally represented in a file on a
storage medium. Data processing and query performance depend heavily on the file format. In order
to reduce the total number of bytes moved from storage disk to temporary memory, data are often
compressed. Data compression shortens the time needed to transfer data, with the tradeoff that the data
have to be decompressed before use. The choice of file format therefore has significant performance
consequences. Compression support reduces the size of data on disk, which lowers input/output at the
cost of the central processing unit (CPU) resources needed to decompress and de-serialize the data.
Query performance is mainly determined by the amounts of input/output and CPU resources required to
transfer and decompress the data. File formats can be structured or unstructured.
We next describe some important file formats.
Javascript object notation records (JSON)
JSON is an open, lightweight, human- and machine-readable standard for data interchange, based on a
subset of the Javascript programming language. JSON is a platform- and language-independent text
format. It uses conventions familiar from different language families, including C, C++, C#,
Java, Javascript, Perl, Python, and others. JSON supports arrays and the standard data types, such as
strings, numbers, and Boolean values. Computers can easily parse and
generate JSON records that describe complex data structures.
JSON is built on two universal structures: the object, which is a collection of name/value pairs, and the
array, which is an ordered list of values.
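The two structures can be seen in a short round trip through Python's standard `json` module. The record below is a made-up sensor reading used only for illustration.

```python
import json

# A JSON object (name/value pairs) containing a JSON array (ordered values).
record = {
    "sensor": "temperature",
    "unit": "C",
    "active": True,
    "readings": [36.6, 36.8, 37.1],
}

text = json.dumps(record)    # generate JSON text from the data structure
restored = json.loads(text)  # parse the text back into a data structure

print(text)
print(restored["readings"][1])
```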
Binary Javascript object notation records (BSON)
BSON is a binary-encoded serialization of JSON-like documents. The name/value pairs are stored as a single
entity called a document. BSON is also lightweight, traversable, and efficient. The extensions in BSON
allow representation of data types in addition to the standard JSON types. BSON supports embedding
documents and arrays within other documents and arrays. In comparison with other binary interchange
formats, BSON is more “schema-less.”
Comma-separated values (CSV)
Comma-separated values (CSV) is a standard file format for spreadsheet data used to exchange data
between distinct applications. The data is represented in a text file; each record is represented as one
line, and commas are used to separate data fields in each row. CSV is used to exchange data between
Hadoop and external systems.
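The record-per-line layout can be sketched with Python's standard `csv` module. The field names and values are hypothetical; the second record shows that a field containing a comma is quoted so that it still parses as one field.

```python
import csv
import io

rows = [
    {"sensor_id": "s1", "timestamp": "2016-01-01T00:00:00", "value": "36.6"},
    {"sensor_id": "s2", "timestamp": "2016-01-01T00:00:00", "value": "48, approx"},
]

# Write a header line followed by one line per record.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["sensor_id", "timestamp", "value"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Read the text back into records.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
```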
Sequence file
A sequence file (SF) is a flat, compact binary storage format for serialized key-value pairs. In addition
to the uncompressed format, these files support two levels of compression: record compression and
block compression. The file metadata is supported with a “secondary”
Text/Text pair key-value list. The files can be easily split and processed in parallel. An SF consists of a
header followed by one or more records.
Record columnar files
Record Columnar files (RC files) are intended for efficient and high-performing processing of data.
They are flat files that consist of binary key/value pairs and support columnar formats. RC files store
the columns of a table in a record columnar way: the rows are first partitioned horizontally into row
splits, and each row split is then partitioned vertically by column. The metadata of a row split is stored
in the keys, while the data of the row split are stored as values. Since being introduced in 2011,¹
RC files have been adopted in major real-world systems for big DA, including in Facebook’s Hadoop
cluster.²
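The two partitioning steps can be sketched in a few lines. This is an in-memory illustration of the layout idea only, not the RCFile on-disk format; the helper names `to_row_splits` and `to_columnar` and the sample table are assumptions.

```python
def to_row_splits(rows, split_size):
    """Horizontally partition rows into fixed-size row splits."""
    return [rows[i:i + split_size] for i in range(0, len(rows), split_size)]

def to_columnar(split):
    """Vertically partition one row split: one list of values per column."""
    return [list(column) for column in zip(*split)]

table = [
    ("s1", 36.6, "ok"),
    ("s2", 37.2, "ok"),
    ("s3", 39.0, "alert"),
    ("s4", 36.9, "ok"),
]

layout = [to_columnar(split) for split in to_row_splits(table, 2)]
for split in layout:
    print(split)
```

Within each split, the values of one column sit next to each other, so a query that needs only one column can ignore the others.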
Optimized row columnar files (ORC files)
Optimized row columnar (ORC) files are further optimized and intended to replace RC files. In an ORC
file, collections of rows are stored in a columnar format that is optimized for compression, and these
collections are kept together in a single file. This format supports parallel processing of row collections
across a cluster. Lightweight indexing makes it possible to skip a complete block of rows that
is not required for the requested query. ORC files also carry basic statistics on their columns.
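One way that column statistics allow a block to be skipped is sketched below, assuming that each block records the minimum and maximum of its values. The block structure and function names are hypothetical and do not reflect the ORC file layout itself.

```python
def build_blocks(values, block_size):
    """Group values into blocks and record min/max statistics for each."""
    blocks = []
    for i in range(0, len(values), block_size):
        chunk = values[i:i + block_size]
        blocks.append({"min": min(chunk), "max": max(chunk), "values": chunk})
    return blocks

def scan_greater_than(blocks, threshold):
    """Return matching values and the number of blocks actually read."""
    matches, blocks_read = [], 0
    for block in blocks:
        if block["max"] <= threshold:
            continue  # the statistics show no value can match: skip the block
        blocks_read += 1
        matches.extend(v for v in block["values"] if v > threshold)
    return matches, blocks_read

temperatures = [36.5, 36.7, 36.6, 36.8, 38.9, 39.2, 36.4, 36.6]
blocks = build_blocks(temperatures, 2)
hits, read = scan_greater_than(blocks, 38.0)
print(hits, "found after reading", read, "of", len(blocks), "blocks")
```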
Parquet files
Apache Parquet is a columnar file format that stores binary data in a column-oriented way. The
values of each column are organized adjacent to each other, enabling efficient, flexible compression
options and encoding schemes. The Parquet file format is designed to work across data processing
frameworks and data models. A single Parquet file can reach gigabytes in size, and the format is
optimized to process large volumes of data, making it typically suited to data warehouse-style operations.
¹ He Y, Lee R, Huai Y, Shao Z, Jain N, Zhang X, et al. RCFile: a fast and space-efficient data placement
structure in MapReduce-based warehouse systems. In: Proceedings of the IEEE international conference on
data engineering (ICDE); 2011.
² http://www.slideshare.net/ydn/2-hive-integrationhadoopsummit2010.
Avro files
Avro is a binary data storage format, providing data serialization and data exchange services. Avro
supports a rich set of primitive data types. Data are serialized efficiently into files or messages. The
data and the data definition are combined in a single file or message, which lets Avro
serialize rapidly. The data are stored in a binary format, making them compact and efficient.
The data definition is stored in JSON format, making it easy to read and interpret. The markers in Avro
files split large datasets into subsets. Avro files support both primitive and complex data types.
Avro handles data schema changes. The data stored in Avro files are easily portable between
different programming languages.
Avro supports the Remote Procedure Call (RPC) interface in data exchange services to effectively
allow different programs to communicate data and information. Avro RPC interfaces and schemas are
defined in JSON. Avro relies heavily on its schemas; both the data and their schema are stored in a file.
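The idea of shipping a JSON data definition together with compact binary data can be sketched with the standard library alone. The schema below follows the shape of an Avro record schema, but the byte layout produced by `write_container` is a simplification invented for this example. It is not the Avro binary encoding or container format, and the function names are hypothetical.

```python
import json
import struct

schema = {
    "type": "record",
    "name": "Reading",
    "fields": [
        {"name": "sensor_id", "type": "int"},
        {"name": "value", "type": "double"},
    ],
}

# Simplified mapping from schema types to fixed-width struct codes (assumption).
STRUCT_CODES = {"int": "i", "double": "d"}

def record_format(schema):
    return "<" + "".join(STRUCT_CODES[f["type"]] for f in schema["fields"])

def write_container(schema, records):
    """Store the JSON schema first, then the records as packed binary."""
    header = json.dumps(schema).encode("utf-8")
    names = [f["name"] for f in schema["fields"]]
    fmt = record_format(schema)
    body = b"".join(struct.pack(fmt, *(r[n] for n in names)) for r in records)
    return struct.pack("<I", len(header)) + header + body

def read_container(blob):
    """Recover the schema from the blob, then use it to decode the records."""
    (header_len,) = struct.unpack_from("<I", blob, 0)
    stored_schema = json.loads(blob[4:4 + header_len].decode("utf-8"))
    names = [f["name"] for f in stored_schema["fields"]]
    body = blob[4 + header_len:]
    fmt = record_format(stored_schema)
    return [dict(zip(names, values)) for values in struct.iter_unpack(fmt, body)]

records = [{"sensor_id": 1, "value": 36.6}, {"sensor_id": 2, "value": 37.2}]
blob = write_container(schema, records)
print(read_container(blob))
```

Because the reader takes the schema from the data itself, it needs no prior knowledge of the fields, which is the property the text attributes to Avro files.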
When describing these different file formats, we have also made some references to data compres-
sion. This is an important aspect of the management of big data. We shall discuss some salient aspects
of data compression next.
4.1.2 Data compression
In big data, petabytes of data are captured, stored, and analyzed. Such high volumes of data generally
increase input/output operations, and transferring these large datasets over the network takes
considerable time. Real-time DA needs efficient management of disk input/output and network
bandwidth resources. Data compression mitigates these problems by not just saving storage space
but also increasing the effective data transfer speed across the network. In big data environments it is
therefore crucial to use data compression to speed up network transfer and improve the performance of
DA activities. Compression of massive datasets does, however, increase the utilization of the CPU, as
the data must be decompressed to be processed at a later stage.
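The tradeoff can be made concrete with Python's standard `zlib` module. The log line below is invented and highly repetitive, so the ratio it yields is illustrative only and will be far higher than for typical real data.

```python
import zlib

log_line = b"2016-01-01 00:00:00 sensor=s1 status=ok value=36.6\n"
raw = log_line * 1000

compressed = zlib.compress(raw)        # costs CPU time when writing
restored = zlib.decompress(compressed) # costs CPU time again when reading

ratio = len(raw) / len(compressed)
print(f"raw={len(raw)} bytes, compressed={len(compressed)} bytes, ratio={ratio:.1f}")
```

Fewer bytes have to cross the disk and the network, and the price is the extra compression and decompression work on the CPU.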
Hadoop supports multiple compression formats, each commonly referred to as a codec, short
for coder-decoder. There exists a set of compiled Java libraries that can be used in Hadoop to
perform data compression and decompression. Each codec implements one algorithm for
compression and decompression. Hadoop supports both splittable and nonsplittable compression
algorithms. A splittable algorithm enhances performance, as large data blocks are distributed across
multiple data nodes and multiple MapReduce tasks decompress data blocks in parallel. Nonsplittable
algorithms, on the other hand, combine data blocks together and use one MapReduce task for
decompression.
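The benefit of splittability can be sketched by compressing blocks independently so that each can be decompressed on its own. This is a simplification under stated assumptions: threads stand in for MapReduce tasks, each block is a separate bzip2 stream, and the helper names are hypothetical. It does not reproduce how Hadoop splits files.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

def compress_in_blocks(data, block_size):
    """Compress each block independently so it can be decoded alone."""
    return [bz2.compress(data[i:i + block_size])
            for i in range(0, len(data), block_size)]

def decompress_parallel(blocks, workers=4):
    """Decompress blocks concurrently and reassemble them in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(bz2.decompress, blocks))

data = b"sensor reading 36.6\n" * 5000
blocks = compress_in_blocks(data, 20000)

# Splittable: any block can be handed to a separate worker.
print(len(blocks), "independent blocks")
print(decompress_parallel(blocks) == data)

# Nonsplittable: one stream that a single worker must decode from the start.
single_stream = bz2.compress(data)
print(bz2.decompress(single_stream) == data)
```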
There are a number of Hadoop codecs, which we describe next briefly.
4.1.3 Hadoop codecs
Deflate combines LZ77 compression, which works by finding redundant data and replacing it with
references to earlier occurrences, and Huffman coding, a form of prefix coding.
LZ4 is a speed-focused lossless compression algorithm belonging to the LZ77 family of byte-oriented
compression schemes. Compression speed is about 400 MB/s per core and decompression speed reaches
multiple GB/s per core; both scale across multiple cores.
Gzip is a file format, based on the Deflate algorithm, used for file compression and decompression.
Bzip2 is an open file compression format based on the Burrows-Wheeler algorithm, used to
compress single files. Bzip2 uses multiple layers of stacked-up compression techniques.
Snappy codecs, previously known as Zippy, provide very high speed and reasonable
compression. Compression speed is about 250 MB/s or more and decompression speed
about 500 MB/s or more. Snappy is optimized for 64-bit x86-compatible processors. Snappy
assumes little-endian throughout and requires byte-swapping of data in several places on
big-endian platforms. Snappy is a robust and stable system, and has successfully compressed
and decompressed petabytes of data in Google’s production environment.
Snappy’s typical compression ratio is 1.5–1.7 for plain text data, about 2–4 for HTML, and about 1.0
for image data like JPEGs and PNGs and other already-compressed formats.
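A compression ratio is the uncompressed size divided by the compressed size. The sketch below computes it for the three codecs that have counterparts in the Python standard library. LZ4 and Snappy are left out because they need third-party packages. The sample is an invented, highly repetitive HTML fragment, so the ratios it produces will be far higher than the typical figures quoted above.

```python
import bz2
import gzip
import zlib

sample = ("<tr><td>sensor</td><td>36.6</td></tr>\n" * 2000).encode("utf-8")

codecs = {
    "deflate (zlib)": (zlib.compress, zlib.decompress),
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
}

ratios = {}
for name, (compress, decompress) in codecs.items():
    packed = compress(sample)
    assert decompress(packed) == sample  # every codec here is lossless
    ratios[name] = len(sample) / len(packed)
    print(f"{name}: {len(packed)} bytes, ratio {ratios[name]:.1f}")
```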
Files, whether compressed or not, need to be organized properly. The organization of files is usually
handled by databases.
4.2 DATABASES
In contrast to traditional relational databases, a NoSQL (not only SQL) database is a geographically
distributed nonrelational database system. A NoSQL database system runs on multiple cluster nodes,
with individual instances of operating systems and built-in storage on each node. This design
is aimed largely at organizing and analyzing large amounts of heterogeneous data types, regardless of
OS. The nodes replicate data across numerous nodes to ensure that there is no data loss during node
failure. The cluster services restore the data from the failed node through a single system image to
redistribute the data across the cluster.
4.2.1 Dynamic schema
In contrast to traditional relational databases, which require that database schemas be defined
before data insertion, NoSQL permits data insertion without a predefined database schema. This allows
applications to iterate on the schema rapidly, in real time. Developers add application-side code
to enforce quality controls, such as requiring specific fields and data types. This validation method
imposes authority on data without compromising the benefits of dynamic schema.
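Application-side validation of this kind can be sketched as a check that runs before each insertion. The rule table `REQUIRED_FIELDS`, the in-memory `collection`, and the function names are hypothetical and are not tied to any particular NoSQL product.

```python
# Hypothetical quality rules: fields that must be present, with their types.
REQUIRED_FIELDS = {"sensor_id": str, "value": float}

def validate(document):
    """Return a list of problems; an empty list means the document passes."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in document:
            problems.append(f"missing field: {field}")
        elif not isinstance(document[field], expected):
            problems.append(f"wrong type for {field}")
    return problems

collection = []  # stands in for a schema-less collection

def insert(document):
    problems = validate(document)
    if problems:
        raise ValueError("; ".join(problems))
    collection.append(document)

insert({"sensor_id": "s1", "value": 36.6})
# A new field needs no schema change; only the required fields are checked.
insert({"sensor_id": "s2", "value": 37.0, "location": "ward 3"})
print(len(collection), "documents stored")
```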
4.2.2 Sharding, replication and auto-caching
Sharding is a method of storing data records across many server instances. This is done through storage
area networks to make hardware perform like a single server. The NoSQL framework is natively
designed to support automatic distribution of the data, including the query load, across multiple servers.
Both data and queries are automatically distributed across multiple servers located in
different geographic regions, and this facilitates rapid, automatic, and transparent replacement of
data or query instances without any disruption. Cloud computing and platform-as-a-service
frameworks make this feature considerably easier. The most frequently used data are kept in the
integrated in-memory database instead of being placed in a separate caching layer, to maintain the lowest
latency and also provide the highest throughput.
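A minimal sketch of how records might be routed to shards is given below. It assumes simple hash-modulo placement and uses dictionaries in place of servers. Real systems commonly use other schemes, such as range partitioning or consistent hashing, and the function names are hypothetical.

```python
import hashlib

def shard_for(key, shard_count):
    """Map a record key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

shards = [dict() for _ in range(3)]  # each dict stands in for one server

def put(key, value):
    shards[shard_for(key, len(shards))][key] = value

def get(key):
    # The same hash routes the query to the shard that holds the record.
    return shards[shard_for(key, len(shards))].get(key)

for i in range(100):
    put(f"sensor-{i}", i)

print([len(s) for s in shards])
print(get("sensor-42"))
```

Both writes and reads are spread across the shards, which is how the query load comes to be distributed along with the data.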
4.2.3 NoSQL types
Key-value stores
Key-value (KV) stores, or key-value databases, are the simplest NoSQL databases. KV stores use an
associative array data model, also known as a hash or dictionary. In this model, every single record in
the database is stored as an attribute name, or key, together with its value in a schema-less way. This
relationship is known as a key-value pair. In each key-value pair, the key is represented by a string and
the value is the data for the key. In particular, key-value stores do not require a query language, but
provide a way to store, retrieve, and update data.
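The store, retrieve, and update operations can be sketched with a toy in-memory class. `KeyValueStore` and the keys used are hypothetical and do not correspond to the API of any of the products named in this section.

```python
class KeyValueStore:
    """A toy in-memory key-value store backed by a dictionary."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Storing and updating are the same operation: the key is overwritten.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

store = KeyValueStore()
store.put("user:1001", b'{"name": "Ada"}')
store.put("user:1001", b'{"name": "Ada", "plan": "pro"}')  # update in place
print(store.get("user:1001"))
```

The value is an opaque blob to the store, so records can only be reached through their keys, which is why no query language is needed.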
Notable key-value databases are Riak, Redis, Memcached, Berkeley DB, Upscaledb, Amazon
DynamoDB, Couchbase, and Project Voldemort.
Table 4 shows the comparison of different NoSQL data-models.
Document stores
Document stores (DS) record data as key-value pairs in a structured format which the database can
understand. Each document contains data, and a unique key is assigned to retrieve the document.
New fields of data can be added by including additional key-value pairs in documents.
Because the stored data are transparent to the database, queries are not limited to lookups by key.
This allows content-oriented retrieval of full-page, often semistructured data with a single query and is
suited for content-oriented applications. The documents are in XML, JSON, and BSON file formats.
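Content-oriented retrieval can be sketched by filtering documents on their fields rather than on their keys. The `find` helper and the sample documents are hypothetical.

```python
documents = {
    "doc1": {"type": "reading", "sensor": "s1", "value": 36.6},
    "doc2": {"type": "reading", "sensor": "s2", "value": 39.1, "alert": True},
    "doc3": {"type": "note", "text": "sensor s2 recalibrated"},
}

def find(collection, **criteria):
    """Return keys of documents whose fields match all given criteria."""
    return [key for key, doc in collection.items()
            if all(doc.get(field) == wanted for field, wanted in criteria.items())]

print(find(documents, type="reading"))
print(find(documents, alert=True))
```

The documents do not share one set of fields, yet the same query mechanism works across all of them.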
The most notable and popular document databases are MongoDB, CouchDB, Terrastore,
OrientDB, RavenDB, and Lotus Notes.
Column-oriented stores
Column databases, as the name suggests, are designed to store data tables by column: each row is a
collection of columns of related data and is associated with a unique row key. This
inversion of the layout used by relational database systems provides optimized queries over very big
datasets and offers a very scalable architecture with extremely high performance. The columnar database is
highly compressed to save storage space and is also capable of self-indexing. The most popular
column-oriented databases are Cassandra, HBase, Hypertable, and Amazon DynamoDB.
Graph stores
These data stores are designed to represent data entities and the undetermined interconnected
relationships between these entities as a graph. The entities are represented as nodes with properties.
The edges represent relationships with their own properties, including directional significance. The nodes
and their relationships are organized as a graph. The relationships are actually preserved, and the data are
interpreted in different ways based on their relationships in the graph. This supports rapid traversal
of joins or relationships. The nodes can have multiple types of relationships with start and end nodes
along with their own properties. The properties of the relationships are used to add intelligence to the
relationship and are also employed to query the graph.
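A property graph of this kind can be sketched with plain Python structures. The node names, relationship types, and helper functions are hypothetical, and a real graph database would index the edges rather than scan a list.

```python
from collections import deque

nodes = {
    "alice": {"kind": "patient"},
    "bob": {"kind": "doctor"},
    "ecg1": {"kind": "sensor"},
}

# Directed edges: (start node, relationship type, end node, edge properties).
edges = [
    ("alice", "TREATED_BY", "bob", {"since": 2015}),
    ("ecg1", "ATTACHED_TO", "alice", {"position": "chest"}),
]

def neighbors(node, relationship=None):
    """Follow outgoing edges, optionally restricted to one relationship type."""
    return [end for start, rel, end, _ in edges
            if start == node and (relationship is None or rel == relationship)]

def reachable(start):
    """Breadth-first traversal that follows edge direction."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in neighbors(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(neighbors("alice", "TREATED_BY"))
print(sorted(reachable("ecg1")))
```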
The notable graph databases are Neo4j, InfiniteGraph, OrientDB, and FlockDB.
Table 4 NoSQL Data-Model Comparison
Data-Model         Performance    Scalability        Flexibility    Complexity    Functionality
Key-value store    High           High               High           None          Variable (none)
Column store       High           High               Moderate       Low           Minimal
Document store     High           Variable (high)    High           Low           Variable (low)
Graph store        Variable       Variable           High           High          Graph theory
4.3 DATA FUSION AND DATA INTEGRATION
Data are generated from a variety of sources, and each data source carries significant
information that is sufficient to analyze and process the data. The data obtained directly from different
sources can contain redundant information and can also have heterogeneous representations.
Retrieval of meaningful information from heterogeneous datasets has limitations. In order to manage data
and retrieve valuable information from data efficiently, it is essential to merge heterogeneous datasets
into one homogeneous data representation. Data fusion provides this by combining information from
multiple sources to form a unified representation.
Data fusion can be defined as [21]: “A multi-level process dealing with the association, correlation,
combination of data and information from single and multiple sources to achieve refined position,
identity estimates and complete and timely assessments of situations, threats and their significance.” An
alternative definition is from Hall and Llinas [22]: “data fusion techniques combine data from multiple
sensors and related information from associated databases to achieve improved accuracy and more
specific inferences than could be achieved by the use of a single sensor alone.”
Data fusion systems are used in a wide range of domains such as sensor networks, text processing,
and video and image processing, to name a few. In big data, the high velocity of heterogeneous
data types makes data fusion important. Advances in the Internet of Things connect networks of
sensors. These networks encompass sensor nodes and at least one base station. Every sensor node
integrates sensors, data processing tools, a radio communication system, and a battery. In these
networks, raw data may present redundant information and provide sufficient information about its
relevance. In multisensor networks, transmitting raw data can cause data collisions, and there is a
higher chance of receiving inaccurate or unreliable information from some abnormal nodes. In order
to aggregate valid data to yield effective information, it is essential to process the data. Data fusion
facilitates better usage of network bandwidth, a longer network lifetime, and efficient use of energy
resources, and above all offers efficient and highly accurate information retrieval. As such, data
fusion represents one of the bigger challenges in big data.
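One way to picture the aggregation step is a base station that fuses redundant readings of the same quantity while discarding values from abnormal nodes. The Python sketch below uses a median and median-absolute-deviation rule for the rejection step; that rule, the function name fuse_readings, the threshold of 3.0, and the sample readings are all assumptions chosen for illustration, not a method prescribed by the chapter.

```python
from statistics import median


def fuse_readings(readings, max_deviation=3.0):
    """Fuse redundant readings of one quantity into a single value.

    readings: dict mapping node id -> measured value.
    A reading is rejected when it lies more than `max_deviation` median
    absolute deviations (MAD) away from the median of all readings.
    Returns (fused_value, sorted list of rejected node ids).
    """
    if not readings:
        raise ValueError("at least one reading is required")
    values = list(readings.values())
    center = median(values)
    mad = median([abs(v - center) for v in values])
    # If most readings agree exactly, MAD is zero; a tiny scale then keeps
    # only the readings that equal the median.
    scale = mad if mad > 0 else 1e-9
    kept = {
        node: value
        for node, value in readings.items()
        if abs(value - center) <= max_deviation * scale
    }
    rejected = sorted(set(readings) - set(kept))
    fused = sum(kept.values()) / len(kept)
    return fused, rejected


# Five nodes observe the same temperature; node n4 behaves abnormally.
raw = {"n1": 21.0, "n2": 21.4, "n3": 20.8, "n4": 35.0, "n5": 21.2}
value, dropped = fuse_readings(raw)
print(round(value, 2), dropped)
```

Sending one fused value instead of five raw readings is what saves bandwidth and energy in this picture; the rejection step is what guards against the unreliable nodes mentioned above.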
5 SUMMARY
The objective of this chapter is to give a broad overview of the acquisition and generation methods of
big data. In the digital century, the term “big data” has expanded its boundary from scientific
data (e.g., satellite imagery data and geographical data) to sensor data on the Internet of Things
(e.g., meteorological data and healthcare data). The new boundary adds more characteristics, known as
volume, velocity, variety, veracity, and value, the Vs of big data. In the same way, the expansion also
brings new challenges into the big data processing and analytics pipeline. In fact, the coinage
“big data” unambiguously denotes digital data, either born-digital or converted into digital data from
born-analog. Computers and other digital devices are the main sources of born-digital data,
whereas born-analog or sensor data are captured by various sensing devices. These data may be
predefined with a data model, known as structured data, or may lack any predefined model,
known as unstructured data. Moreover, these massive amounts of data are generated with or without
explicit human involvement. Tactile-feedback technology has added an extra dimension to the
manner in which humans interact with computers or devices. Equally, the rise of smart devices with
embedded sensors has brought more diversity to existing interaction methods.
Different types of big data are created by devices ranging from the well-known text-only keyboard to
rapidly growing wearable devices, and are converted into binary digits, or bits. A selection of file
formats, notably JSON, BSON, CSV, and RC files, is used to store the collected data. The data are
compressed to reduce their size on storage disks and to make better use of input/output and CPU resources.
NoSQL, a geographically distributed nonrelational database system, is used to handle unstructured
data. NoSQL encompasses the following types of database technologies: key-value stores,
document stores, column-oriented stores, and graph stores. Additionally, NoSQL permits dynamic data
insertion without a predefined database schema, in contrast to relational databases, which require one.
Data fusion merges heterogeneous datasets into one homogeneous data representation by combining
information from multiple sources to form a unified representation.
REFERENCES
[1] Sagiroglu S, Sinanc D. Big data: a review. In: International conference on Collaboration Technologies and
Systems (CTS); 2013. p. 42–7.
[2] Laney D. 3D data management: Controlling data volume, velocity and variety. META Group Research
Note 6, 2001. p. 70.
[3] Gartner Says 6.4 Billion Connected [Internet]. Available from: http://www.gartner.com/newsroom/id/
3165317 [cited 20.04.16].
[4] Gartner Says Worldwide Wearable Devices Sales to Grow 18.4 Percent in 2016 [Internet]. Available from:
http://www.gartner.com/newsroom/id/3198018 [cited 20.04.16].
[5] Gartner Says Worldwide PC Shipments Declined 9.6 Percent in First Quarter of 2016 [Internet]. Available
from: http://www.gartner.com/newsroom/id/3280626 [cited 20.04.16].
[6] When to Expect Devices and Connected [Internet]. Available from: http://www.gartner.com/newsroom/id/
3220117 [cited 20.04.16].
[7] Yick J, Mukherjee B, Ghosal D. Wireless sensor network survey. Comput Netw 2008;52(12):2292–330.
[8] Lai X, Liu Q, Wei X, Wang W, Zhou G, Han G. A survey of body sensor networks. Sensors 2013;13
(5):5406–47.
[9] Akyildiz IF, Su W, Sankarasubramaniam Y, Cayirci E. A survey on sensor networks. IEEE Commun Mag
2002;40(8):102–14.
[10] Akyildiz IF, Pompili D, Melodia T. Challenges for efficient communication in underwater acoustic sensor
networks. SIGBED Rev 2004;1(2):3–8.
[11] Li M, Liu Y. Underground structure monitoring with wireless sensor networks. In: Proceedings of the 6th
international conference on information processing in sensor networks (IPSN ’07) [Internet]. New York:
ACM; 2007. p. 69–78. Available from http://doi.acm.org/10.1145/1236360.1236370 [cited 22.04.16].
[12] Espina J, Falck T, Muehlsteff J, Aubert X. Wireless body sensor network for continuous cuff-less blood
pressure monitoring. In: 3rd IEEE/EMBS international summer school on medical devices and biosensors; 2006.
p. 11–5.
[13] Teng XF, Zhang YT, Poon CCY, Bonato P. Wearable medical systems for p-health. IEEE Rev Biomed Eng
2008;1:62–74.
[14] Paradiso R, Loriga G, Taccini N. A wearable health care system based on knitted integrated sensors. IEEE
Trans Inf Technol Biomed 2005;9(3):337–44.
[15] Rienzo MD, Rizzo F, Parati G, Brambilla G, Ferratini M, Castiglioni P. MagIC system: a new textile-based
wearable device for biological signal monitoring. Applicability in daily life and clinical setting. In: IEEE
engineering in medicine and biology 27th annual conference; 2005. p. 7167–9.
[16] Mattmann C, Clemens F, Tröster G. Sensor for measuring strain in textile. Sensors 2008;8(6):3719–32.
[17] Devot S, Bianchi AM, Naujoka E, Mendez MO, Braurs A, Cerutti S. Sleep monitoring through a textile
recording system. In: 29th annual international conference of the IEEE Engineering in Medicine and Biology
Society; 2007. p. 2560–3.
[18] Jung S, Ji T, Varadan VK. Point-of-care temperature and respiration monitoring sensors for smart fabric
applications. Smart Mater Struct 2006;15(6):1872.
[19] Verma P. Gracoli: a graphical command line user interface. In: CHI’13 extended abstracts on human factors
in computing systems (CHI EA’13) [Internet]. New York: ACM; 2013. p. 3143–6. Available from http://doi.
acm.org/10.1145/2468356.2479631 [cited 30.03.16].
[20] Garzotto F, Valoriani M. Touchless gestural interaction with small displays: a case study. In: New York:
ACM Press; 2013. p. 1–10. Available from http://dl.acm.org/citation.cfm?doid=2499149.2499154 [cited
02.07.15].
[21] White FE. Data Fusion Lexicon, Joint Directors of Laboratories, Technical Panel for C3, Data Fusion Sub-
Panel. San Diego, CA: Naval Ocean Systems Center; 1991.
[22] Hall DL, Llinas J. An introduction to multisensor data fusion. Proc IEEE 1997;85(1):6–23.
GLOSSARY
Data analytics It is the science of exploring large amounts of data to discover hidden patterns and correlations,
and draw conclusions based on the findings.
Data mining and Knowledge discovery It is an interdisciplinary computational process to analyze data for
discovering useful knowledge from data.
Raster data It is a data structure that is represented as a regular grid (rectangular or square) of cells.
Satellite imagery It is the collection of images of Earth and other planets collected by satellites.
Scientific research It is the systematic investigation of scientific theories and hypotheses.
CHAPTER 2
CLOUD COMPUTING INFRASTRUCTURE FOR DATA INTENSIVE APPLICATIONS
Yuri Demchenko*, Fatih Turkmen*, Cees de Laat*, Ching-Hsien Hsu†, Christophe Blanchet‡, Charles Loomis§
University of Amsterdam, Amsterdam, The Netherlands*
Chung Hua University, Hsinchu, Taiwan†
CNRS IFB, Orsay, France‡
SixSq Sàrl, Geneva, Switzerland§
ACRONYMS
API application programming interface
ASP application service provider
AWS Amazon Web Services
BDAF Big Data Architecture Framework
BDE Big Data Ecosystem
BDI Big Data Infrastructure
BDLM Big Data Lifecycle Management
BDRA NIST Big Data Reference Architecture
CCRA NIST Cloud Computing Reference Architecture (NIST SP 500-292)
CEOS Committee on Earth Observation Satellites
CLI command line interface
CPR Capability (framework) provider requirements
CSDI cloud services delivery infrastructure
CSP cloud service provider
DACI dynamic access control infrastructure
DSR data sources requirements
EC2 Elastic Compute Cloud, IaaS cloud service provided by AWS/Amazon
ECL Enterprise Control Language by LexisNexis (currently open source)
EDW enterprise data warehouse
EMR Elastic MapReduce
ETL extract-transform-load
FADI Federated Access and Delivery Infrastructure
GCE Google Compute Engine cloud
HDFS Hadoop Distributed File System
HPC high performance computing
IaaS Infrastructure as a Service
ICAF Intercloud Architecture Framework
ICFF Intercloud Federation Framework, part of ICAF
ICT information communication technologies
IDE integrated development environment
Big Data Analytics for Sensor-Network Collected Intelligence. http://dx.doi.org/10.1016/B978-0-12-809393-1.00002-7
© 2017 Elsevier Inc. All rights reserved.
purpose, and given to the relations, who place the winnows on the roof of the house till
the following day, when the food is eaten.
By some Koravas, a ceremony in honour of the departed ancestors is performed at the
time of the November new moon. A well-polished brass vessel, with red and white
marks on it, is placed in the corner of a room, which has previously been swept, and
purified with cow-dung. In front of the pot is placed a leaf plate, on which cooked rice
and other edibles are set. Incense is burned, and the eldest son of the house partakes of
the food in the hope that he, in due course, will be honoured by his offspring.
The Koramas of Mysore are said to experience considerable difficulty in finding men to
undertake the work of carrying the corpse to the grave. Should the dead Korama be a
man who has left a young widow, it is customary for some one to propose to marry her
the same day, and, by so doing, to engage to carry out the principal part of the work
connected with the burial. A shallow grave, barely two feet deep, is dug, and the corpse
laid therein. When the soil has been loosely piled in, a pot of fire, carried by the chief
mourner in a split bamboo, is broken, and a pot of water placed on the raised mound.
Should the spot be visited during the night by a pack of jackals, and the water drunk by
them to slake their thirst after feasting on the dead Korama, the omen is accepted as
proof that the liberated spirit has fled away to the realms of the dead, and will never
trouble man, woman, child, or cattle. On the sixth day, the chief mourner must kill a
fowl, and mix its blood with rice. This he places, with some betel leaves and nuts, near
the grave. If it is carried off by crows, everything is considered to have been settled
satisfactorily.
As regards the dress of the Koravas, Mr. Mullaly writes as follows. “The women wear
necklaces of shells and cowries interspersed with beads of all colours in several rows,
hanging low down on the bosom; brass bangles from the wrist to the elbow; brass, lead,
and silver rings, very roughly made, on all their fingers except the middle one. The cloth
peculiar to Koravar women is a coarse black one; but they are, as a rule, not particular as
to this, and wear stolen cloths after removing the borders and all marks of identification.
They also wear the chola, which is fastened across the bosom, and not, like the
Lambādis, at the back. The men are dirty, unkempt-looking objects, wear their hair long,
and usually tied in a knot on the top of the head, and indulge in little finery. A joochi
(gochi), or cloth round the loins, and a bag called vadi sanchi, made of striped cloth,
complete their toilet.”
In 1884, Mr. Stevenson, who was then the District Superintendent of Police, North
Arcot, devised a scheme for the regeneration of the Koravas of that district. He obtained
for the tribe a tract of Government land near Gudiyattam, free of assessment for ten
years, and also a grant of Rs. 200 for sinking wells. Licenses were also issued to the
settlers to cut firewood at specially favourable rates. He also prevailed upon the
Zemindar of Karvetnegar to grant twenty-five cawnies of land in Tiruttani for ten years
for another settlement, as well as some building materials. Unfortunately the
impecunious condition of the Zemindar precluded the Tiruttani settlement from deriving
any further privileges which were necessary to keep the colony going, and its existence
was, therefore, cut short. The Gudiyattam colony, on the other hand, exhibited some
vitality for two or three years, but, in 1887, it, too, went the way of the Tiruttani
colony.”226 I gather, from the Police Administration Report, 1906, that a scheme is being
worked out, the object of which is to give a well-known wandering criminal gang some
cultivable land, and so enable the members of it to settle down to an honest livelihood.
At the census, 1891, Korava was returned as a sub-division of Paraiyans, and the name is
also applied to Jōgis employed as scavengers.227
The following note on the Koravas of the west coast is interesting as showing that
Malabar is one of the homes of the now popular game of Diavolo, which has become
epidemic in some European countries. “In Malabar, there is a class of people called
Koravas, who have, from time immemorial, played this game almost in the same manner
as its Western devotees do at the present time. These people are met with mostly in the
southern parts of Malabar, Cochin and Travancore, and they speak the Malayālam
language with a sing-song accent, which easily distinguishes them from other people.
They are of wandering habits. The men are clever acrobats and rope-dancers, but those
of more settled habits are engaged in agriculture and other industries. The beautiful grass
mats, known as Palghat mats, are woven by these people. Their women are fortune-
tellers and ballad singers. Their services are also in demand for boring the ears of girls.
The ropedancers perform many wonderful feats while balancing themselves on the rope,
among them being the playing of diabolo while walking to and fro on a tight rope. The
Korava acrobat spins the wooden spool on a string, attached to the ends of two bamboo
sticks, and throws it up to the height of a cocoanut tree, and, when it comes down, he
receives it on the string, to be again thrown up. There are experts among them who can
receive the spool on the string without even looking at it. There is no noteworthy
difference in the structure and shape of the spool used by the Koravas, and those of
Europe, except that the Malabar apparatus is a solid wooden thing a little larger and
heavier than the Western toy. It has not yet emerged from the crude stage of the village
carpenter’s skill, and cannot boast of rubber tyres and other embellishments which adorn
the imported article; but it is heavy enough to cause a nasty injury should it hit the
performer while falling. The Koravas are a very primitive people, but as acrobats and
ropedancers they have continued their profession for generations past, and there is no
doubt that they have been expert diabolo players for many years.”228 It may be noted
that Lieutenant Cameron, when journeying from Zanzibar to Benguela, was detained
near Lake Tanganyika by a native chief. He relates as follows. “Sometimes a slave of
Djonmah would amuse us by his dexterity. With two sticks about a foot long connected
by a string of a certain length, he spun a piece of wood cut in the shape of an hour-glass,
throwing it before and behind him, pitching it up into the air like a cricket-ball, and
catching it again, while it continued to spin.”
1 Gazetteer of the Bellary district.
2 Madras Diocesan Magazine, June, 1906.
3 John S. Chandler, a Madura Missionary, Boston.
4 Madras Mail, November, 1905.
5 J. Hornell. Report on the Indian Pearl Fisheries of the Gulf of Manaar, 1905.
6 Madras Diocesan Mag., 1906.
7 Notes from a Diary, 1881–86.
8 Lecture delivered at Trivandrum, MS.
9 Nineteenth Century, 1898.
10 Malay Archipelago.
11 Monograph. Ethnog: Survey of Cochin, No. 9, 1906.
12 Malabar Manual.
13 Manual of the Coimbatore district.
14 Madras Journ. Lit. Science, I. 1833.
15 W. W. Skeat and C. O. Blagden. Pagan Races of the Malay Peninsula, 1906.
16 Gazetteer of the Malabar district.
17 Madras Census Report, 1891.
18 Manual of Malabar.
19 Manual of the North Arcot district.
20 Madras Census Report, 1891.
21 Manual of the North Arcot district.
22 A reddish formation found all over Southern India.
23 Op. cit.
24 Journey through Mysore, Canara, and Malabar.
25 Rev. H. Jensen. Classified Collection of Tamil Proverbs, 1897.
26 Manual of the North Arcot district.
27 Gazetteer of the Trichinopoly district.
28 For this note I am indebted to Mr. N. Subramani Aiyar.
29 Madras Census Report, 1891.
30 Mokhalingam is in Ganjam, not Vizagapatam.
31 Place of meeting, which is a large tamarind tree, under which councils are held.
32 Gazetteer of the Madura district.
33 Gazetteer of the Madura district.
34 Sētupati, or lord of the bridge. The title of the Rājas of Rāmnād.
35 Gazetteer of the Madura district.
36 Madras Census Report, 1891.
37 Manual of the Madura district.
38 G. Oppert. Madras Journ. Lit. Science, 1888–9.
39 Madras Census Report, 1901.
40 Notes on Criminal Classes of the Madras Presidency.
41 Madras Review, 1899.
42 Madras Census Report, 1891.
43 Gazetteer of the Madura district.
44 Op. cit.
45 Illustrated Criminal Investigation and Law Digest, I, 3, 1908, Vellore.
46 Madras Journ. Lit. Science, XXV.
47 I am informed that only Mēl-nādu, Sīrukudi, Mella-kōttai, and Puramalai are endogamous.
48 Madras Census Report, 1891.
49 Manual of the Madura district.
50 Hindu Feasts, Fasts, and Ceremonies, 1903.
51 The Tamils eighteen hundred years ago, 1904.
52 Madras Census Report, 1901.
53 Gazetteer of the Tanjore district.
54 Madras Census Report, 1891.
55 Madras Mail, 1908.
56 Ind. Ant., III., 1874.
57 A lakh = a hundred thousand.
58 Compare the theft of Laban’s teraphim by Rachel. Genesis, XXXI, 19.
59 Gazetteer of the Tanjore district.
60 Madras Census Report, 1891.
61 Madras Census Report, 1901.
62 Manual of the North Arcot district.
63 Ind. Ant., VIII, 1879.
64 Manual of the North Arcot district.
65 Madras Census Report, 1891.
66 Hutchinson. Marriage Customs in many lands, 1897.
67 Gazetteer of the Bellary district.
68 Gazetteer of the Anantapur district.
69 Mediæval Sinhalese Art.
70 Maduraikanchi, Line 521.
71 E. Hultzsch. South Indian Inscriptions, II, i, 44, 46, 1891.
72 Ibid. III, i, 47, 1899.
73 Madras Census Report, 1891.
74 Madras Census Report, 1871.
75 New Asiatic Review, Jan. 1907.
76 Madras Mail, 1907.
77 Classified Collection of Tamil Proverbs, 1897, from which some of the proverbs quoted are taken.
78 Madras Census Report, 1901.
79 See the legendary story narrated in the article on Tiyans.
80 Malabar and its Folk, 1900.
81 Letters from Malabar.
82 Madras Census Report, 1901.
83 Ind. Ant., VIII, 1879.
84 Gazetteer of the Vizagapatam district.
85 Yule and Burnell, Hobson-Jobson.
86 Monograph, Eth. Survey of Cochin, No. 4, 1905.
87 Unhusked rice.
88 Manual of the South Canara district.
89 Money-lender.
90 Malabar Quarterly Review, 1905.
91 Indian Review, III, 1902.
92 Monograph, Ethnog. Survey, Cochin.
93 According to another version of the legend, it was the hut of a Tiyan.
94 Malabar Manual.
95 Gazetteer of the Malabar district.
96 C. Karunakara Menon. Madras Mus. Bull., V, 2, 1906.
97 Madras Mus. Bull., II, 3, 1901.
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
This account is mainly from an article by Mr. N. Subramani Aiyar.
Ind. Ant., IX, 1880.
Manual of the North Arcot district.
Madras Census Report, 1901.
Historical Sketches, Mysore.
Gazetteer of the Madura district.
Gazetteer of the Madura district.
Madras Census Report, 1891.
Dynasties of the Kanarese Districts of the Bombay Presidency.
Loc. cit., and Manual of the North Arcot district.
Section III, Inhabitants, Madras Government Press, 1907.
Ind. Ant., VIII, 1879.
J. F. Kearns. Kalyāna shatanku.
Madras Series, IV, 1882; VI, 1883.
Illatakaru, a bride’s father having no son, and adopting his son-in-law.
See further C. Ramachendrier. Collection of Decisions of High Courts and the Privy
Council applicable to dancing-girls, illatom affiliation, etc., Madras, 1892.
Gazetteer of the Anantapur district.
Madras Mail, Nov. 1905.
Madras Mail, 1905.
Gazetteer of the Malabar district.
Manual of the North Arcot district.
Gazetteer of the Malabar district.
Tamil and English Dictionary, 1862.
The word, in this sense, is said to occur in a Tamil work named Pingala Nikandu.
Karuku is Tamil for the serrated margin of the leaf—petiole of the palmyra palm.
Yule and Burnell. Hobson-Jobson.
123 Madras Census Report, 1901.
124 Manual of the Salem district.
125 Madras Census Report, 1901.
126 Manual of the Madura district.
127 Manual of the Tanjore district.
128 Madras Census Report, 1891.
129 Madras Christ. Coll. Mag., 1894.
130 Malabar Law and Custom.
131 Ind. Ant., VIII, 1879.
132 Mysore and Coorg Gazetteer.
133 Madras Census Report, 1901.
134 Journ. Anthrop. Inst., II, 1873.
135 Gazetteer of the Anantapur district.
136 Indian Review, VII, 1906.
137 See Ravi Varma, the Indian Artist. Indian Press, Allahabad.
138 Madras Census Report, 1891.
139 Madras Museum Bull., V. 3, 1907.
140 Epigraphia Indica, VI, 1900–1901.
141 Rev. J. Cain, Ind. Ant., VIII, 1879.
142 Trans. Ethnolog. Soc., London, 1869; Ind. Ant., VIII, 1879.
143 Original Inhabitants of Bhārathavarsha.
144 The panas have reference to the division of South Indian castes into the right- and left-hand factions.
145 The mofussil indicates up-country stations and districts, as contra-distinguished from the “Presidency” (Madras City).
146 Marriage Customs in Many Lands, 1897.
147 Moore. Indian Appeal Cases, Vol. III, 359–82.
148 Journey through Mysore, Canara and Malabar.
149 See Talboys Wheeler, Madras in the Olden Time, II, 49–89.
150 See Tales of Kōmati Wit and Wisdom. C. Hayavadana Rao, Madras, 1907.
151 Classified Collection of Tamil Proverbs, 1897. See also C. Hayavadana Rao, op. cit., and Ind. Ant., XX, 78, 1891.
152 Madras Census Report, 1871.
153 Gazetteer of the Godāvari district.
154 Linguistic Survey of India, IV, 1906.
155 Man. March 1902.
156 G.O., No. 1020, Public, 8th October 1901.
157 G.O., No. 3005, Revenue, 3rd November 1908.
158 Occasional Essays on Native South Indian Life, 1901.
159 Madras Census Report, 1881.
160 Agricul: Ledger Series, Calcutta. No. 7, 1904.
161 Madras Mail, 1894.
162 Gazetteer of the Vizagapatam district.
163 A very interesting note on Totemism among the Khonds by Mr. J. E. Friend-Pereira has been published in the Journal of Asiatic Society of Bengal, LXXIII, 1905.
164 The Golden Bough, 1900.
165 Selections from the Records, Government of India, No. V, Human Sacrifice and Infanticide, 1854.
166 Personal Narrative of Service among the Wild Tribes of Khondistan.
167 Manual of the Vizagapatam district.
168 Journ. Asiat. Soc., Bengal, 1898.
169 Madras Mail, 1894.
170 Selections from the Records of the Government of India (Home Department), V., 1845.
171 J. A. R. Stevenson. Madras Journ: Lit. Science, VI, 1837.
172 J. E. Friend-Pereira. Journ: Asiat: Soc. Bengal, LXXI, 1902.
173 Madras Journ: Lit. Science, VI, 1837.
174 Loc. cit.
175 Journ. Anthrop. Soc., Bombay, II, 249.
176 Madras Mail, 1896.
177 Macpherson. Memorials of Service in India.
178 Journ., Anth. Soc., Bombay, II, 1890.
179 Ibid.
180 Madras Police Report, 1904.
181 Madras Mail, 1894.
182 Madras Mail, 1908.
183 See G.O., Judicial, 14th August 1882, No. 952, Khond Rising.
184 Manual of the South Canara district.
185 Letters from Malabar. Translation. Madras, 1862.
186 Fine cakes made of gram flour and a fine species of alkali, which gives them an agreeable taste, and serves the purpose of making them rise and become very crisp when fried.
187 Journ. Anthrop. Inst., IV., 1875.
188 Madras Christ. Coll. Mag. III, 1885–6.
189 Ind. Ant. X, 1881.
190 Manual of the South Canara district.
191 Journ. Anthrop. Inst. IV, 1875.
192 Manual of the South Canara district.
193 Manual of the South Canara district.
194 M. Paupa Rao Naidu. History of Railway Thieves.
195 Madras Census Report, 1891.
196 Madras Journ. Lit: and Science, 1888–89.
197 Tirumurukairuppadai.
198 Madras Census Report, 1901.
199 Indian Antiquity, IX, 1880.
200 Cyclopædia of India.
201 Loc. cit.
202 Note on Koravas, 1908.
203 Notes on Criminal Classes of the Madras Presidency.
204 Forest Inspection Report, 1896.
205 Gazetteer of the Bellary district.
206 Gazetteer of the Vizagapatam district.
207 F. S. Mullaly. Op. cit.
208 Madras Journ. Lit. Science, XVII, 1853.
209 History of Railway Thieves. Madras, 1904.
210 Madras Census Report, 1901.
211 Gazetteer of the Trichinopoly district.
212 This story is based on well-known episode of Nalacharitra in the Āranya Parva of the Mahabharatha.
213 M. Paupa Rao Naidu. Op. cit.
214 Ibid.
215 Police Report, 1902.
216 Op. cit.
217 A varāha or pagoda was worth Rs. 3–8–0.
218 A seer is an Indian measure of weight, varying in different parts of the country.
219 Trans. Eth. Sec. N.S., VII.
220 J. F. Kearns, Kalyāna Shatanku, 1868.
221 Ind. Ant., III., 1874.
222 Madras Census Report, 1871.
223 Madras Census Report, 1901.
224 India. Trübner. Oriental Series.
225 Ind. Ant., III, 1874.
226 Madras Mail, 1907.
227 For this account of the Koravas, I am largely indebted to a report by Mr. N. E. Q. Mainwaring, Superintendent of Police.
228 Madras Mail, 1908.
Colophon
Availability
This eBook is for the use of anyone anywhere at no cost and with
almost no restrictions whatsoever. You may copy it, give it away or
re-use it under the terms of the Project Gutenberg License included
with this eBook or online at www.gutenberg.org .
This eBook is produced by the Online Distributed Proofreading Team
at www.pgdp.net .
Volume Contents First Article
I A and B Abhishēka
II C to J Canji
III K Kabbēra
IV K to M Kōri
V M to P Marakkāyar
VI P to S Palli
VII T to Z Tābēlu
Scans of this book are available from the Internet Archive (copy 1 ,
2 ).
Project Gutenberg catalog page: 42993 .
Related Library of Congress catalog page: 10014128 .
Related Open Library catalog page (for source): OL7024564M .
Related Open Library catalog page (for work): OL1106958W .
Related WorldCat catalog page: 1967849 .
Encoding
Revision History
2011-08-08 Started.
External References
This Project Gutenberg eBook contains external references. These
links may not work for you.
Corrections
The following corrections have been applied to the text:
Page Source Correction
9 [Not in source] ;
96 Gāmpa Gampa
102 annointing anointing
103 Gangimakkulu Gangimakkalu
155 negociations negotiations
160 orginally originally
161 feed fed
181 ” [Deleted]
226 [Not in source] ’
300 Kolāyans Kōlayans
316 negociate negotiate
317 Bhāskarācharya Bhāskarāchārya
394 tumeric turmeric
495 ’? ?’
*** END OF THE PROJECT GUTENBERG EBOOK CASTES AND
TRIBES OF SOUTHERN INDIA. VOL. 3 OF 7 ***
Updated editions will replace the previous one—the old editions
will be renamed.
Creating the works from print editions not protected by U.S.
copyright law means that no one owns a United States
copyright in these works, so the Foundation (and you!) can copy
and distribute it in the United States without permission and
without paying copyright royalties. Special rules, set forth in the
General Terms of Use part of this license, apply to copying and
distributing Project Gutenberg™ electronic works to protect the
PROJECT GUTENBERG™ concept and trademark. Project
Gutenberg is a registered trademark, and may not be used if
you charge for an eBook, except by following the terms of the
trademark license, including paying royalties for use of the
Project Gutenberg trademark. If you do not charge anything for
copies of this eBook, complying with the trademark license is
very easy. You may use this eBook for nearly any purpose such
as creation of derivative works, reports, performances and
research. Project Gutenberg eBooks may be modified and
printed and given away—you may do practically ANYTHING in
the United States with eBooks not protected by U.S. copyright
law. Redistribution is subject to the trademark license, especially
commercial redistribution.
START: FULL LICENSE
THE FULL PROJECT GUTENBERG LICENSE
PLEASE READ THIS BEFORE YOU DISTRIBUTE OR USE THIS WORK
To protect the Project Gutenberg™ mission of promoting the
free distribution of electronic works, by using or distributing this
work (or any other work associated in any way with the phrase
“Project Gutenberg”), you agree to comply with all the terms of
the Full Project Gutenberg™ License available with this file or
online at www.gutenberg.org/license.
Section 1. General Terms of Use and
Redistributing Project Gutenberg™
electronic works
1.A. By reading or using any part of this Project Gutenberg™
electronic work, you indicate that you have read, understand,
Big Data Analytics for Sensor Network Collected Intelligence A volume in Intelligent Data Centric Systems Hui-Huang Hsu

  • 4. Big Data Analytics for Sensor-Network Collected Intelligence
  • 5. Big Data Analytics for Sensor-Network Collected Intelligence Edited by Hui-Huang Hsu Tamkang University, Taiwan Chuan-Yu Chang National Yunlin University of Science and Technology, Taiwan Ching-Hsien Hsu Chung Hua University, Taiwan Series Editor Fatos Xhafa Universitat Politècnica de Catalunya, Spain
  • 6. Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1800, San Diego, CA 92101-4495, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

© 2017 Elsevier Inc. All rights reserved

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

ISBN: 978-0-12-809393-1

For information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals

Publisher: Joe Hayton
Acquisition Editor: Sonnini R. Yura
Editorial Project Manager: Ana Claudia A. Garcia
Production Project Manager: Punithavathy Govindaradjane
Cover Designer: Victoria Pearson
Typeset by SPi Global, India
  • 7. List of Contributors
Ahmad Anbar: The George Washington University, Washington, DC, United States
Haytham Assem: IBM, Dublin, Ireland
Christophe Blanchet: CNRS IFB, Orsay, France
Teodora S. Buda: IBM, Dublin, Ireland
Jiannong Cao: The Hong Kong Polytechnic University, Kowloon, Hong Kong
Chuan-Yu Chang: National Yunlin University of Science and Technology, Douliu City, Yunlin County, Taiwan
Jinjun Chen: University of Technology Sydney, Broadway, NSW, Australia
Cen Chen: Hunan University, Changsha, China
Szu-Ta Chen: National Taiwan University Hospital Yun-Lin Branch, Douliu City, Yunlin County, Taiwan
Kang Chen: Southern Illinois University, Carbondale, IL, United States
Zixue Cheng: University of Aizu, Aizuwakamatsu, Japan
Cees de Laat: University of Amsterdam, Amsterdam, The Netherlands
Yuri Demchenko: University of Amsterdam, Amsterdam, The Netherlands
Mingxing Duan: Hunan University, Changsha, China
Tarek El-Ghazawi: The George Washington University, Washington, DC, United States
Weiwei W. Fang: Beijing Key Lab of Transportation Data Analysis and Mining, Beijing Jiaotong University, Beijing, China
Edmond J. Golden III: National Institute of Standards and Technology, Gaithersburg, MD, United States
Chu-Cheng Hsieh: Slice Technologies Inc., San Mateo, CA, United States
  • 8. Ching-Hsien Hsu: Chung Hua University, Hsinchu, Taiwan
Hui-Huang Hsu: Tamkang University, Tamsui, Taiwan
Qian Huang: Southern Illinois University, Carbondale, IL, United States
Tian-Hsiang Huang: National Sun Yat-sen University, Kaohsiung, Taiwan
Chih-Chieh Hung: Tamkang University, New Taipei City, Taiwan
Pravin Kakar: Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Singapore
Shonali Krishnaswamy: Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Singapore
Chung-Nan Lee: National Sun Yat-sen University, Kaohsiung, Taiwan
Kenli Li: Hunan University, Changsha, China
Keqin Li: Hunan University, Changsha, China; State University of New York, New Paltz, NY, United States
Xiao-Li Li: Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Singapore
Qingyong Y. Li: Beijing Key Lab of Transportation Data Analysis and Mining, Beijing Jiaotong University, Beijing, China
Hai-Ning Liang: Xi’an Jiaotong-Liverpool University, Suzhou, China
Chen Lin: National Yunlin University of Science and Technology, Douliu City, Yunlin County, Taiwan
Xuefeng Liu: The Hong Kong Polytechnic University, Kowloon, Hong Kong
Ming Liu: Beijing Key Lab of Transportation Data Analysis and Mining, Beijing Jiaotong University, Beijing, China
Charles Loomis: SixSq Sàrl, Geneva, Switzerland
Chao Lu: Southern Illinois University, Carbondale, IL, United States
  • 9. Ka L. Man: Xi’an Jiaotong-Liverpool University, Suzhou, China
Martial Michel: National Institute of Standards and Technology, Gaithersburg, MD, United States
Vijayakumar Nanjappan: Xi’an Jiaotong-Liverpool University, Suzhou, China
Minh N. Nguyen: Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Singapore
Declan O’Sullivan: Trinity College Dublin, Dublin, Ireland
Phyo P. San: Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Singapore
Olivier Serres: The George Washington University, Washington, DC, United States
Kathiravan Srinivasan: National Ilan University, Yilan City, Yilan County, Taiwan
Ming-Chun Tsai: National Sun Yat-sen University, Kaohsiung, Taiwan
Fatih Turkmen: University of Amsterdam, Amsterdam, The Netherlands
Wei Wang: Xi’an Jiaotong-Liverpool University, Suzhou, China
Junbo Wang: University of Aizu, Aizuwakamatsu, Japan
Yilang Wu: University of Aizu, Aizuwakamatsu, Japan
Chen-Ming Wu: National Sun Yat-sen University, Kaohsiung, Taiwan
Lei Xu: IBM, Dublin, Ireland
Chi Yang: University of Technology Sydney, Broadway, NSW, Australia
Jian-Bo Yang: Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Singapore
Zhangdui D. Zhong: Beijing Key Lab of Transportation Data Analysis and Mining, Beijing Jiaotong University, Beijing, China
  • 10. Preface

There are three sources of information we can collect about the environment and the people in the environment: environmental sensors, wearable sensors, and social networks. Through intelligent analysis of the huge amount of sensory data, we can develop various systems to automatically detect natural and man-made events. Moreover, the systems can also try to understand people’s behavior and even intention. Thus better services can be provided to people in an unobtrusive manner.

With the advances in sensor and networking technologies, we are now able to collect sensory data easily. These sensory data can be stored and processed in the cloud. Nevertheless, how to properly utilize such a huge amount of data is another essential issue. We certainly hope that advanced ICT technologies can help us perform intelligent analysis on these data and provide better services to people automatically. Exciting new systems and research results have been developed. This book aims to introduce these ambient intelligence and Internet of Things (IoT) systems, which are based on big data analytics of collected sensory data.

The theme of this book is closely related to two hot topics: the Internet of Things and big data analytics. Systems and technologies introduced in the book can be used as supplementary materials for courses involving these two topics. Researchers, professionals, and practitioners in related fields can also find useful information and technologies for their work.

There are four parts of this book: big data architecture and platforms; big data processing and management; big data analytics and services; and big data intelligence and IoT systems. Each part includes three or four chapters. Here we briefly introduce each of the 14 chapters.

Part I: Big Data Architecture and Platforms

1. Big Data: A Classification of Acquisition and Generation Methods
Vijayakumar Nanjappan, Hai-Ning Liang, Wei Wang, Ka L. Man
This chapter points out that it is very difficult to store, process, and analyze huge amounts of data using conventional computing methodologies and resources. The authors classify the data into digital and analog, environmental and personal. Data types and formats as well as input mechanisms are also highlighted. These will help us understand the active and passive methods of data collection and production.

2. Cloud Computing Infrastructure for Data Intensive Applications
Yuri Demchenko, Fatih Turkmen, Cees de Laat, Ching-Hsien Hsu, Christophe Blanchet, Charles Loomis
This chapter proposes a cloud-based big data infrastructure (BDI). The general architecture and functional components of BDI are described in detail. BDI is supported by the definition of the big data architecture framework (BDAF). Two case studies in bioinformatics are illustrated in the chapter to provide examples of requirements analysis and implementation.

3. Open Source Private Cloud Platforms for Big Data
Martial Michel, Olivier Serres, Ahmad Anbar, Edmond J. Golden III, Tarek El-Ghazawi
This chapter tells us that it is beneficial to use private clouds, especially open source clouds, for big data. Security, privacy, and customization are the major concerns. The chapter introduces the most prominent open source clouds in view of big data processing. A case study using an On-Premise Private Cloud is also presented to demonstrate the implementation of such an environment.
  • 11. Part II: Big Data Processing and Management 4. Efficient Nonlinear Regression-Based Compression of Big Sensing Data on Cloud Chi Yang, Jinjun Chen This chapter proposes a compression method for big sensing data based on a nonlinear regression model. It improves the effectiveness and efficiency for processing real-world big sensing data. Regression design, least squares, and triangular transform are discussed in this chapter. It is demonstrated that the model achieves significant storage and time performance gains over other compression models. 5. Big Data Management on Wireless Sensor Networks Chih-Chieh Hung, Chu-Cheng Hsieh This chapter gives an overview of data management issues and solutions in wireless sensor networks. There are two possible models: centralized and decentralized. Data management can be centralized for the benefit of computation, or decentralized for energy saving. Three major issues for data management in both models are introduced: storage, query processing, and data collection. Some case studies are also discussed. 6. Extreme Learning Machine and Its Applications in Big Data Processing Cen Chen, Kenli Li, Mingxing Duan, Keqin Li This chapter first reviews the extreme learning machine (ELM) theory and its variants. Due to its memory-residency and high space/time complexity, the traditional ELM cannot train big data efficiently. Optimization strategies are necessary to solve this problem. Thus, parallel ELM algorithms based on MapReduce and Spark are described. Finally, practical applications of the ELM for big data are also presented in this chapter. Part III: Big Data Analytics and Services 7. Spatial Big Data Analytics for Cellular Communication Systems Junbo Wang, Yilang Wu, Hui-Huang Hsu, Zixue Cheng This chapter surveys methodologies of spatial big data analytics and possible applications to support the cellular communication (CC) system. The CC system provides the most popular way to connect people. 
However, it still faces challenges, such as unbalanced crowd communication behavior and video transmission congestion. Spatial big data analytics can help the CC system to provide services with better quality of service (QoS). Challenging issues are highlighted in this chapter. 8. Cognitive Applications and Their Supporting Architecture for Smart Cities Haytham Assem, Lei Xu, Teodora S. Buda, Declan O’Sullivan This chapter proposes a cognitive architecture to enable big data applications with sensory data for smart cities. It deals with organization, configuration, security, and optimization. This chapter also reviews related work on location-based social networks and presents a novel approach to detect urban patterns, especially anomalies. This is essential for better understanding of human activities and behaviors. 9. Deep Learning for Human Activity Recognition Phyo P. San, Pravin Kakar, Xiao-Li Li, Shonali Krishnaswamy, Jian-Bo Yang, Minh N. Nguyen This chapter presents a systematic feature learning method for the problem of human activity recognition (HAR). It adopts a deep convolutional neural network (CNN) to automate feature learning from raw inputs. It is not necessary to handcraft features in advance. Such a xviii Preface
  • 12. unification of feature learning and classification results in mutual enhancements. This is verified by comparing experimental results with several state-of-the-art techniques. 10. Neonatal Cry Analysis and Categorization System Via Directed Acyclic Graph Support Vector Machine Szu-Ta Chen, Kathiravan Srinivasan, Chen Lin, Chuan-Yu Chang This chapter introduces a neonatal cry analysis and categorization system. From the cry of the newborn, the system can identify different types of feelings such as pain, sleepiness, and hunger. The sequential forward floating selection (SFFS) algorithm is used to choose the discriminative features. The selected features are then used to classify the neonatal cries by the directed acyclic graph support vector machine (DAG-SVM). The system is useful for parents and nursing staff. Part IV: Big Data Intelligence and IoT Systems 11. Smart Building Applications and Information System Hardware Co-Design Qian Huang, Chao Lu, Kang Chen This chapter emphasizes that a comprehensive understanding of information system hardware is necessary when designing efficient smart building applications. The necessity and importance of application and hardware co-design are discussed in this chapter. A case study is also given to show that application and hardware co-design optimize the smart building design from a system perspective. 12. Smart Sensor Networks for Building Safety Xuefeng Liu, Jiannong Cao This chapter presents the design and implementation of effective and energy-efficient structural health monitoring (SHM) algorithms in resource-limited wireless sensor networks (WSNs). Compared to traditional wired transmission, WSNs are low cost and easy to deploy for building monitoring. Distributed versions of SHM algorithms can help overcome the bandwidth limitation. A WSN-Cloud system architecture is also proposed for future SHM. 13. 
The Internet of Things and Its Applications Chung-Nan Lee, Tian-Hsiang Huang, Chen-Ming Wu, Ming-Chun Tsai This chapter first compares two lightweight protocols for the Internet of Things (IoT): MQ telemetry transport (MQTT) and the constrained application protocol (CoAP). Both protocols reduce the size of the packet and the over-loading of the bandwidth, thus saving battery power and storage space. The major techniques for big data analytics are then introduced. Finally, intelligent transportation systems and intelligent manufacturing systems are presented as examples. 14. Smart Railway Based on the Internet of Things Qingyong Y. Li, Zhangdui D. Zhong, Ming Liu, Weiwei W. Fang This chapter discusses the framework and technologies for a smart railway based on Internet of Things (IoT) and big data. The architecture of a smart railway, including the perception and action layer, the transfer layer, the data engine layer, and the application layer, is presented first. A case study on intelligent rail inspection is then introduced. This chapter shows that a smart railway is promising in improving traditional railway systems. xix Preface
  • 13. ACKNOWLEDGMENTS
This book is a part of the book series “Intelligent Data-Centric Systems.” First of all, we would like to thank the series editor, Prof. Fatos Xhafa, for his encouragement and guidance in developing this book. We gratefully acknowledge all the contributing authors of the chapters. This book would not have been possible without their great efforts. We are also indebted to Ms. Ana Claudia Garcia, the editorial project manager, and the whole production team at Elsevier for their continuous help in producing this book. Finally, we thank our families for their love and support.
Hui-Huang Hsu, Chuan-Yu Chang, Ching-Hsien Hsu
September 2016
  • 14. CHAPTER 1
BIG DATA: A CLASSIFICATION OF ACQUISITION AND GENERATION METHODS
Vijayakumar Nanjappan, Hai-Ning Liang, Wei Wang, Ka L. Man
Xi’an Jiaotong-Liverpool University, Suzhou, China
Big Data Analytics for Sensor-Network Collected Intelligence. http://dx.doi.org/10.1016/B978-0-12-809393-1.00001-5
© 2017 Elsevier Inc. All rights reserved.
ACRONYMS
AUIs adaptive user interfaces
BAN body area network
BSN body sensor network
BSON binary JavaScript object notation records
BT business transactions
CLI command-line interfaces
CPU central processing unit
CSV comma-separated values
DA data analytics
DM data mining
DS document store
ECG electrocardiography
EEG electroencephalogram
Email electronic mail
EMG electromyography
GB gigabyte
GPS Global Positioning System
GS graph store
GUI graphical user interfaces
HIDs human interface devices
HTML hypertext markup language
IoT Internet of Things
IR infrared
IUI intelligent user interfaces
JPEG joint photographic experts group
JSON JavaScript object notation records
KD knowledge discovery
KV key-value stores
LED light-emitting diode
MB megabyte
MEMS Micro-Electro Mechanical Systems
NoSQL not only structured query language
  • 15. NUI natural user interfaces
ORC optimized row columnar
OS operating system
PC personal computer
PNG portable network graphics
PS proximity sensor
RC files Record Columnar files
RFID radio frequency identification
RPC Remote Procedure Call
SD scientific data
SF sequence file
SI satellite imagery
SMD social media data
SoC System on Chip
VUI voice user interfaces
WIMP windows icons menus and pointer device
WSN wireless sensor network
WWW World Wide Web
XML extensible markup language
1 BIG DATA: A CLASSIFICATION
The term “big data” refers to datasets of exceptionally massive size with distinct and intricate structures. They can be extremely difficult to analyze and visualize with personal computing devices and conventional computational methods [1]. In fact, enormous datasets of complex structures have been generated and used for a long time, for example, in satellite imagery (SI), raster data, geographical, biological, and ecological data; data used for scientific research can also be considered as “big data.” Nowadays, many different kinds of big data exist in our lives, from social media data (SMD), to organization and enterprise data, to the sensor data on the Internet of Things (e.g., meteorological data about our environment and healthcare data).
1.1 CHARACTERISTICS OF BIG DATA
In 2001, Doug Laney characterized big data from three perspectives: volume, velocity, and variety (the 3Vs) [2]. Volume refers to the magnitude of data, which usually determines the potential value of the data. Velocity refers to the speed at which data is generated and processed according to the requirements of different applications. Variety refers to the nature and different types of data. Later, the research community proposed two additional Vs: veracity and value. Veracity indicates the trustworthiness and quality of the data.
This is particularly important, as big data are usually collected from a variety of sources, some of which may not provide high-quality, reliable data. The term value is used to indicate the potential (or hope) that valuable information or insight can be extracted or derived from the big data provided that the data is appropriately processed and analyzed. These characteristics bring new challenges into the data processing and analytics pipeline. As the size of the data is constantly increasing and the
  • 16. velocity of the data generation is higher than the processing speed, scalable storage and efficient data management methods are needed to enable real-time or near real-time data processing by the analytical tools. To ensure the credibility of the analytics, the quality of the data must be taken into consideration, for example, to identify erroneous processes and uncertain, unreliable, or missing data.
2 BIG DATA GENERATION METHODS
In today’s digital era, the term data unambiguously denotes digital data, which can be either born-digital or born-analog but eventually converted into digital form. There have already been large amounts of conventional digital data such as Web documents, social media, and business transaction (BT) data. In recent years, the “Internet of Things” (IoT) has generated vast volumes of data about our physical world captured by sensing devices. Many everyday objects are embedded with a variety of sensors capable of collecting analog data and converting it into digital form. Besides conventional data, sensor data are becoming the next big data source.
2.1 DATA SOURCES
2.1.1 Born-digital data
Born-digital data are created and managed using computers or other digital devices. Almost all documents in personal computers are stored in some standardized file format (e.g., Word or PDF documents). Advances in Internet and World Wide Web (WWW) technologies have enabled computers around the world to be connected so that billions of Web documents can be accessed anywhere. The emergence of Web 2.0 technologies enriched data and media types from text-only to images, video, and audio, as well as the associated metadata such as temporal and geographical information. Numerous images and videos uploaded to social media websites are now annotated with location information and tagging data related to their contents.
Some of the other traditional big data sources include electronic mails, instant messages, medical records, and business transactions.
2.1.2 Sensor data
Recently, billions of physical objects, such as sensors, smartphones, tablets, wearable devices, and radio frequency identification (RFID) tags, embedded with identification, sensing, computing, communication, and actuation capabilities, are increasingly connected to the Internet, resulting in the next technological revolution, known as the “Internet of Things” (IoT). Integration of multiple semiconductor components on a single chip (System on Chip) is key to the success of the Internet of Things, which has the potential to revolutionize a large array of intelligent applications and services in many fields. According to Gartner, the network of connected things will reach nearly 20.8 billion by 2020, with around 5.5 million new devices being connected every day [3]. It is estimated that by the end of 2017, worldwide sales of wearable electronic devices will increase by 39% [4]. In contrast, there is a 9.6% decline in worldwide PC shipments, which indicates that smart devices are preferred in the market [5]. It is reported that by 2018, new digital devices that can talk to each other in the household will be common [6]. It is estimated that nearly 3 trillion gigabytes of data are produced in a single
  • 17. day. The high volumes of heterogeneous data streams coming from this variety of devices bring great challenges to traditional data management methods.
Widespread examples of these portable devices are mobile phones and other smart devices, such as the Apple Watch, which integrate a variety of sensors such as an accelerometer, gyroscope, compass, and Global Positioning System (GPS) receiver, and more recently sensors that can capture biometric information such as heart rate. Table 1 lists commonly used sensors on smartphones or tablets.
Sensors built on Micro-Electro Mechanical Systems (MEMS) technology are small in size and have only limited processing and computing capabilities. A wireless sensor network (WSN) can be developed by connecting spatially distributed sensors using wireless interfaces. Different kinds of sensors can be integrated into a single WSN, such as mechanical, magnetic, thermal, biological, chemical, and optical sensors.
A sensor can be either immobile or mobile (including wearable). While immobile sensors are installed on an object at a fixed location [7], mobile sensors are usually installed on a moving object. A wearable sensor is a special kind of mobile sensor that is worn on the human body and can be used to form a body sensor network (BSN) or body area network (BAN) [8]. Fixed sensors can be installed on earth surfaces like terrain [9], submerged under water [10], or buried under the land [11]. In contrast, mobile sensors can move and interact with surrounding physical environments. Wearable sensors are worn by users and can capture physical or environmental parameters of wearers such as blood pressure [12,13], heart rate [14,15], bodily motion [16], brain activity [17], and skin temperature [18]. Table 2 summarizes some of the most commonly used sensors in BSNs.
2.2 DATA TYPES
Interactions among physical objects, sensors, and people generate massive amounts of data, which can be either structured or unstructured.
Table 3 gives examples of the different types of data.
Table 1 Common Sensors Integrated in Smartphones and Tablets
Sensors on Smartphones: Function
Microphone: Converts real-world sound and vibration to digital audio
Camera: Senses visible light or electromagnetic radiation and converts it to a digital image or video
Gyroscope: Provides orientation information
Accelerometer: Measures linear acceleration
Compass or magnetometer: Works as a traditional compass; provides orientation in relation to the magnetic field of Earth
Proximity sensor: Detects the proximity of the phone to the user
Ambient light sensor: Optimizes the display brightness
GPS: Global Positioning System; tracks the target location or “navigates” things by map with the help of GPS satellites
Barometer: Measures atmospheric pressure
Fingerprint sensor: Captures a digital image of the fingerprint pattern
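The heterogeneous streams produced by sensors such as those in Table 1 can be pictured with a small sketch. The Python below is illustrative only: the `SensorReading` record, its field names, the `group_by_sensor` helper, and the sample values are assumptions made for this example and do not come from any device API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SensorReading:
    """One timestamped sample from a single sensor (hypothetical record layout)."""
    sensor: str                # e.g. "accelerometer", "barometer"
    timestamp_ms: int          # time of the sample in milliseconds
    values: Tuple[float, ...]  # one or more numeric channels
    unit: str                  # unit of measurement for the channels


def group_by_sensor(stream: List[SensorReading]) -> Dict[str, List[SensorReading]]:
    """Split a mixed stream into per-sensor lists, each ordered by timestamp."""
    grouped: Dict[str, List[SensorReading]] = defaultdict(list)
    for reading in stream:
        grouped[reading.sensor].append(reading)
    for readings in grouped.values():
        readings.sort(key=lambda r: r.timestamp_ms)
    return dict(grouped)


# Made-up sample values, interleaved as they might arrive from one device.
stream = [
    SensorReading("accelerometer", 20, (0.1, 9.7, 0.3), "m/s^2"),
    SensorReading("barometer", 15, (1013.2,), "hPa"),
    SensorReading("accelerometer", 10, (0.0, 9.8, 0.2), "m/s^2"),
]
grouped = group_by_sensor(stream)
```

Keeping the sensor name and unit on every record is one possible design choice; it lets samples with different numbers of channels share a single stream at the cost of some redundancy.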
  • 18. Table 2 Commonly Used Sensors in Body Area Networks or Body Sensor Networks
Sensor: Function
Blood-pressure sensor: Measures human blood pressure
Camera pill: Images the gastrointestinal tract
Carbon dioxide sensor: Measures carbon dioxide gas levels
ECG/EEG/EMG sensor: Measures the electrical activity of the heart (ECG), brain (EEG), and muscles (EMG)
Humidity sensor: Measures humidity changes
Blood oxygen saturation sensor: Measures blood oxygen saturation
Pressure sensor: Measures pressure value
Respiration sensor: Measures human respiration values
Temperature sensor: Measures human body temperature
Table 3 Data Types and Data Sources
Human-generated. Structured data: input data, click-streams. Unstructured data: text documents, social media data, mobile data, Web page content
Machine-generated. Structured data: Web logs/server logs. Unstructured data: satellite imagery, scientific data, image and video, radar data
Sensor-generated. Fixed sensor data, mobile sensor data
2.2.1 Structured data
Structured data are usually defined with fixed attributes, type, and format; for example, records in a relational database are generated according to a predefined schema. Compared to unstructured or semi-structured data, processing of structured data is simpler and more straightforward. This type of data can be generated by people, machines, and sensors.
(1) Human-generated structured data: the data are created with explicit human involvement using some interaction mechanism, e.g., data generated through human-machine interface devices like mouse input data and click-streams.
(2) Machine-generated structured data: the data are created automatically by a computing device without explicit human interaction, e.g., Web log data.
  • 19. (3) Sensor-generated structured data: the data are generated by embedded fixed or movable sensors, e.g., sensor data from smartphones and smart meters.
2.2.2 Unstructured data
Unstructured data, in contrast to structured data, have no predefined data model. Some common examples include text, images, audio, video, and streaming sensor data. Unstructured data are one primary source of big data and are much more challenging to process than structured data. Human-generated unstructured data include a large number of data types of different natures, such as textual data (Web documents, licensed publications, e-journals, eBooks, organizational records, e-mails, logs) and media data of different types contributed by ordinary users on social media platforms. Examples of machine-generated unstructured data include scientific data (e.g., astronomical data, geographic, ecological, biological, chemical, and geospatial data), satellite images of weather, surveillance data, and radar data (e.g., meteorological and oceanographic seismic data).
3 BIG DATA: DATA ACQUISITION METHODS
Human interaction with computers and devices creates vast amounts of data. In the PC era, human interface devices (HIDs), like keyboards and mice, support users in creating and interacting with digital data. Most digital user-generated text data have been created with explicit human involvement using conventional and widely used input devices like keyboards and mice (or touchpads in portable computers). Digitized analog data, or sensor data, generated using audio and camera devices are known as multimedia data. The introduction of tactile-feedback technology has added an extra dimension to the manner in which people interact with computers. The stylus, a pen-shaped instrument used with tactile-feedback devices and graphics tablets, lets users interact with and write on the surface of the screen, making interaction more direct.
The stylus and similar haptic-based devices allow users to interact directly with the displayed content using multitouch gestures as input, in lieu of the physical keyboard and pointing devices. The rise of smart touch-based devices, embedded with sensors, has added diversity to existing interaction methods, enabling richer gesture-based interaction.
3.1 INTERFACE METHODS
Communication between the user and a computer system is done through various interface mechanisms, especially using input/output devices. In this section, we review some of the most important ones and their evolution, and describe how they contribute to data generation (see Fig. 1 for a summary).
3.1.1 Command-line interfaces
The command-line interface (CLI) or character user interface (CUI) is one of the first types of interface methods that allow users to send text-based commands to the system. Text commands are converted to appropriate operating system functions. The CLI is the oldest form of interface, but it offers powerful and concise control over programs. As such, the earliest forms of digital text data were created using CLIs. The amounts of data generated are not significant. This was an important feature, as in earlier systems memory was limited and expensive.
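The conversion of text commands into system functions described in Section 3.1.1 can be sketched as a small dispatcher. This is a toy illustration, not how any particular shell is implemented; the command names (`echo`, `add`), the handler functions, and `dispatch` are assumptions chosen for the example.

```python
import shlex
from typing import Callable, Dict, List


def cmd_echo(args: List[str]) -> str:
    """Return the arguments joined by single spaces."""
    return " ".join(args)


def cmd_add(args: List[str]) -> str:
    """Add integer arguments and return the sum as text."""
    return str(sum(int(a) for a in args))


# Table mapping a command word to the function that carries it out.
COMMANDS: Dict[str, Callable[[List[str]], str]] = {
    "echo": cmd_echo,
    "add": cmd_add,
}


def dispatch(line: str) -> str:
    """Parse one line of text and invoke the matching function."""
    tokens = shlex.split(line)  # shell-like splitting that honours quotes
    if not tokens:
        return ""
    name, args = tokens[0], tokens[1:]
    handler = COMMANDS.get(name)
    if handler is None:
        return f"unknown command: {name}"
    return handler(args)
```

In a real CLI the handlers would call operating system services rather than return strings; the lookup table is only meant to show how a short line of text can select and parameterize a function.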
  • 20. 3.1.2 Graphical user interfaces
The graphical user interface (GUI), popularized by Microsoft Windows, is an interactive visual interface rather than a command or text-only interface. The interactive interface tools are visually represented as windows, icons, menus, and a pointer device, which collectively are known as WIMP. The GUI also includes a text interface, called the graphical character-based interface. Presently, the GUI is the most common and well-known user interface for computers and some earlier mobile devices like mobile phones and laptops. Gracoli, a hybrid interface, combines the strengths of the GUI and CLI to provide application-specific interfaces [19].
3.1.3 Context-sensitive user interfaces
Context-sensitive user interfaces are used almost pervasively within GUIs. They present users with options selected automatically based on the current or previous state of the application process. Context menus in GUIs are the principal example of context-sensitive user interfaces. The primary use of the context-sensitive user interface is to simplify the interface by reducing the number of commands, clicks, or keystrokes required to perform a given action. This type of interface plays a crucial role where interface devices have a limited number of buttons, as in video games controlled by a mouse, joystick, or gamepad. With the emergence of mobile devices, whose main input entry is via a touch-based screen, context-sensitive interfaces have found more uses. A variety of contextual options are provided via distinct taps and gestures on the screen.
3.1.4 Web-based user interfaces
A Web user interface or Web app allows the user to interact with content or software running on a remote server through a Web browser. The content or Web page is downloaded from the Web server and the user can interact with this content in a Web browser, which acts as a client.
The distributed nature allows the content to be stored on a remote server, while the ubiquitous nature of the Web browser permits convenient access to the content. The most common Web applications are Webmail, online shopping, online document sharing, social media, and instant messaging. A vast amount of data now exists that was generated by these types of interfaces.
FIG. 1 Evolution of user interfaces, user input methods or data generation. [Figure labels: command-line interface, graphical user interface, multi-touch gestural interface, natural user interface]
3.1.5 Adaptive user interfaces or intelligent user interfaces
Adaptive user interfaces (AUIs), also referred to as intelligent user interfaces (IUIs), support users’ customization of the interface by changing the layout and other elements according to the user or context requirements. AUIs are either user-initiated adaptable or system-initiated self-adaptive. Their aim is to
  • 21. offer efficient, intuitive, and secure interfaces to users based on their unique preferences, traits, and environmental circumstances. AUIs are capable of passively recognizing a user’s presence and offering services based on their immediate requirements.
3.1.6 Natural user interfaces
The natural user interface (NUI) is a simple, intuitive interface that lets users interact naturally with systems through body movements, gestures, and voice, without any physical encumbrances.
3.1.7 Voice interfaces
Voice user interfaces (VUI) are based on speech recognition technology and enable users to interact with or send commands to computers or smart devices using voice or speech. This is the most natural way of allowing users to interact with computers or smart devices, similar to how one would communicate with other people. The most commonly used voice interaction roles are command- and agent-based interactions. Command-based interaction allows the user to give speech input to the system, most commonly in a simple but specific predefined order. Agent-based interactions recognize natural language as input and provide an appropriate response through text or audio on the system. Apple’s Siri and Google’s Voice Search are typical examples of voice interfaces.
3.1.8 Gesture-based interfaces
Gesture-based interfaces attempt first to recognize gestures as commands. They distinguish continuous physical movements of the users’ hands, fingers, face, head, and body and organize them into a discrete sequence of commands. Gestures that are successfully interpreted as meaningful are recognized by the receiving system, which lets users interact with systems in a more natural way. Sensor evolution has also enhanced novel human gesture-based interaction with smart connected devices in the IoT.
The natural, continuous, meaningful movements of users, involving their hands, fingers, head, face, and body, can all be part of this process of user-system interaction. This will be an important mode of interaction given, for instance, the growing emphasis on BSNs, in which sensors are placed on or attached to the human body to passively capture physiological data and body movements [20].
3.1.9 Multitouch gesture interface
A multitouch interface is a gesture-based interface that supports two or more continuous gestures to interact with touch surfaces. On smart devices, for example, multitouch interfaces allow more direct interaction with applications, and their gestures are considered to be natural and intuitive. This enables a variety of actions like taps, swipes, rotations, pinches, and other natural gestures. Touchpads and touchscreens on portable smart devices are powered by multitouch interfaces. With their ever-increasing dominance, these devices have replaced traditional input devices like keyboards and mice as the main data-generating input devices.
3.1.10 Touchless gesture interfaces
Touchless gesture interfaces completely eliminate physical contact with a device, whether direct (by touching) or indirect (via a secondary device like a mouse). This is thought to make interaction even more natural and intuitive by freeing users of any physical attachments and involving only their
  • 22. body movements. Second to voice, human movements without the need for any physical controls are closer to how people interact with one another. Touchless gesture interfaces aim to replicate this type of communication, which is achieved through a selection of intelligent sensing devices located around the users. A touchless gestural interface represents an intelligent and natural user interface method for users to interact with systems using intuitive and unencumbered physical movements and gestures [20].
We next review some input devices and tools that enable the creation of the interfaces we have presented above.
3.2 INTERFACE DEVICES
Input devices enable the user to input data directly into the computer. The best-known HIDs are the keyboard, a text entry device; pointing devices like the mouse, trackball, light pen, and stylus; and other devices like the joystick and touchscreen.
3.2.1 Keyboard
The keyboard is a typewriter-style device with a series of electronic switches or keys which allow users to send text and alphanumeric data directly into computers. The switches each represent one character. The most common English-language keyboard layout is the typewriter-style QWERTY layout. The standard computer keyboard contains alphabet keys, number keys, punctuation symbol keys, arrow keys, and functional and control keys. The keyboard is the primary peripheral device for data entry. The virtual, touchscreen-based keypad is used in mobile devices to simulate the physical keyboard.
3.2.2 Mice
The mouse allows a user to manipulate objects indirectly using a pointer-like representation by detecting two-dimensional motion in a GUI. The mouse is a hand-controlled device, and typically has one or two buttons. The mouse click is generated by pressing any of the buttons once, holding it, or releasing it immediately.
There are different variations of mouse clicks to select objects, move the pointer to the desired location on the display, and input commands into the system. The keyboard and mouse are the most widely integrated computer peripherals that allow the user to interact with the system. In contrast to the keyboard, the mouse is supported only in GUIs. The trackball is another pointing device very similar to the mouse.
3.2.3 Joystick
The joystick is a control column input device with a lever which controls the movement of a pointer in all directions on the display. Similar to the mouse, joysticks include buttons known as triggers for additional functionality. The joystick is typically used in games and sometimes as a replacement for the mouse in certain situations. Miniature versions of finger-operated joysticks are now adopted in mobile devices.
3.2.4 Stylus
The stylus, a pen-shaped input device, allows the user to input commands to the computer, mobile, and other smart devices via their display. The stylus is used on touchscreen devices to make selections by tapping, or to write or draw on the screen, just like using a pen on a notebook. The
  • 23. stylus is more commonly used in portable handheld devices, like laptops and tablets, than on desktop computers.

3.2.5 Touchpad
The touchpad is a pointing, or cursor-controlling, device for portable computers. Touchpads function in a very similar way to mice and contain a tactile sensor that identifies the position and motion of the user’s fingers in contact with the pad. Touchpads introduced multitouch gesture-based interface mechanisms. In addition to the taps and swipes of the touchpad, a gesture-based interface allows additional application-specific input methods.

3.2.6 Touchscreens
The touchscreen combines a display and an input device. A transparent touch-sensitive panel embedded on the rigid planar surface recognizes the touch or press of users’ fingers as input. The touchscreen replaced the mouse or stylus with users’ fingers as the input device, giving users a feeling of greater directness when they manipulate content on the display. Touchscreens have brought multitouch gesture-based input into widespread use on modern devices.

Besides the more recognizable input devices above, a number of emergent ones are becoming more widely used and support new ways of interacting with computing systems. The integration of these emergent devices is only possible with powerful machines that can capture large amounts of data and process them in real time. In addition, the development of BANs has made the use of some of these devices feasible. We describe some of the emergent input devices next.

3.2.7 Kinect
The Kinect is a device that captures the body motion of users positioned at a certain distance from a display. Its motion sensors translate a user’s physical body position and movements into commands. Initially developed for the Xbox game consoles, it is now used for other applications and devices.

3.2.8 Leap motion
Leap is an in-air gestural user interface device.
Leap uses two monochromatic infrared (IR) cameras and three IR LEDs to cover a hemispherical area to a distance of 1 m. It is similar to the Kinect but is designed for closer interaction with any display.

3.2.9 Myo
Myo is a muscle-controlled, arm-worn gestural device. Myo recognizes forearm muscle movements and transmits them wirelessly as valid gestural commands to interact with PCs or other systems.

3.2.10 Wearable devices
Wearable devices, or gadgets, are electronic devices worn by consumers ubiquitously and continually to capture or track biometric information related to health or fitness. Wearable devices are new manifestations of accessories that people wear, such as Apple’s Watch or Samsung’s Gear Watch, or more dedicated tools like the Fitbit One wireless activity and sleep tracker. Wearable devices with biometric tracking capabilities represent one of the most important sources of data generation. They continuously and uninterruptedly record data of different types and from a variety of environments.
  • 24. As data increase in variety and volume, in parallel with the need to support greater velocity in their generation and processing, it is important to have a way to organize them. Organization and management of data will therefore be explored next.

4 BIG DATA: DATA MANAGEMENT
4.1 DATA REPRESENTATION AND ORGANIZATION
Current systems represent data using a binary digital system. Data of every type are converted into binary digits, represented as 1s and 0s and called bits. A byte, a sequence of 8 bits, is the fundamental unit of storage. Different standards are used to encode data objects by assigning bit patterns to them. To use storage space efficiently, data are compressed using various compression techniques. One of the major requirements in big data is low-latency processing. In contrast with the traditional approach of performing computations on stored data, the data must be processed as they are generated, in or near real time. Thus low latency is a key requirement in big data analytics (DA).

4.1.1 File formats
The file format describes how a collection of data is internally represented in a file on a storage medium. Data processing and query performance depend heavily on the file format. To reduce the total number of bytes moved from storage disk to temporary memory, data are often compressed. Data compression saves transfer time, with the tradeoff that the data have to be decompressed. The selection of a file format therefore has significant performance consequences. Compression support reduces the size of data on disk to make the best use of input/output and of the central processing unit (CPU) resources needed to deserialize the data. Query performance depends mainly on the amount of input/output and CPU resources required to transfer and decompress the data. File formats can be structured or unstructured. We next describe some important file formats.
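Before turning to individual formats, the compression tradeoff noted above (fewer bytes to move, at the price of CPU time to decompress) can be made concrete. The following is a minimal sketch, assuming Python's standard `zlib` module and a made-up, highly repetitive sensor log; the record layout and the variable names are illustrative assumptions, not taken from the chapter.

```python
import time
import zlib

# Hypothetical sensor log: repetitive text, which compresses well.
raw = b"sensor_id=17,temp=21.5,unit=C\n" * 10_000

# Compress once and time it (CPU cost paid by the writer).
start = time.perf_counter()
packed = zlib.compress(raw, 6)  # 6 is zlib's usual default level
compress_seconds = time.perf_counter() - start

# Decompress and time it (CPU cost paid by every reader).
start = time.perf_counter()
restored = zlib.decompress(packed)
decompress_seconds = time.perf_counter() - start

# Fewer bytes to store and transfer, in exchange for the CPU time above.
ratio = len(raw) / len(packed)
print(f"raw bytes:           {len(raw)}")
print(f"compressed bytes:    {len(packed)}")
print(f"compression ratio:   {ratio:.1f}")
print(f"compress time (s):   {compress_seconds:.4f}")
print(f"decompress time (s): {decompress_seconds:.4f}")
```

Real sensor data are rarely this repetitive, so the ratio printed here is an upper-end illustration rather than a typical figure.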
JavaScript Object Notation records (JSON)
JSON is an open, lightweight standard for data interchange that is highly readable by both humans and machines and is based on a subset of the JavaScript programming language. JSON is a platform- and language-independent text format. It uses conventions familiar from several language families, including C, C++, C#, Java, JavaScript, Perl, Python, and others. JSON supports arrays and the standard data types, such as strings, numbers, and Boolean values. Computers can easily parse and generate JSON records, which can describe complex data structures. JSON is built on two universal structures: an object, which is a collection of name-value pairs, and an array, which is an ordered list of values.

Binary JavaScript Object Notation records (BSON)
BSON is a binary-encoded serialization of JSON-like documents. The name-value pairs are stored as a single entity called a document. BSON is also lightweight, traversable, and efficient. The extensions in BSON allow the representation of data types beyond the standard JSON types. BSON supports embedding
  • 25. documents and arrays within other documents and arrays. In comparison with other binary interchange formats, BSON is more “schema-less.”

Comma-separated values (CSV)
CSV is a standard file format for spreadsheet data, used to exchange data between distinct applications. The data are represented in a text file; each record occupies one line, and commas separate the data fields in each row. CSV is used to exchange data between Hadoop and external systems.

Sequence file
A sequence file (SF) is a flat, compact binary storage format for serialized key-value pairs. In addition to the uncompressed format, these files support two levels of compression: record compression and block compression. File metadata is supported through a “secondary” Text/Text pair key-value list. The files can be easily split and processed in parallel. An SF consists of a header followed by one or more records.

Record columnar files
Record columnar files (RC files) are intended for efficient, high-performance processing of data. They are flat files in a columnar format that consist of binary key/value pairs. RC files store the columns of a table in a record columnar way: the rows are first partitioned horizontally into row splits, and each row split is then partitioned vertically by column. The metadata of a row split is stored in the key, while the data of the row split is stored as the value. Since being introduced in 2011 (see footnote 1), RC files have been adopted in major real-world systems for big DA, including Facebook’s Hadoop cluster (see footnote 2).

Optimized row columnar files (ORC files)
Optimized row columnar (ORC) files are further optimized and intended to replace RC files. In an ORC file, collections of rows are held in a columnar format optimized for compression, and these collections are stored in a single file. This format supports parallel processing of row collections across multiple clusters.
Lightweight indexing makes it possible to skip a complete block of rows that the requested query does not need. ORC files also carry basic statistics on their columns.

Parquet files
Apache Parquet is a columnar file format that stores binary data in a column-oriented way. The values of each column are stored adjacent to each other, enabling efficient, flexible compression options and encoding schemes. The Parquet format is designed to be usable regardless of the data processing framework or data model. A single Parquet file can reach gigabytes in size; the format is optimized for processing large volumes of data and is typically suited to data warehouse-style operations.

Footnotes
1 He Y, Lee R, Huai Y, Shao Z, Jain N, Zhang X, et al. RCFile: a fast and space-efficient data placement structure in MapReduce-based warehouse systems. In: Proceedings of the IEEE international conference on data engineering (ICDE); 2011.
2 http://www.slideshare.net/ydn/2-hive-integrationhadoopsummit2010.
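The column-oriented layout shared by RC, ORC, and Parquet files, and the block skipping that per-column statistics make possible, can be sketched in plain Python. This is an illustrative model only, under the assumption of a tiny in-memory table; the names `to_columns` and `can_skip` are hypothetical and do not correspond to any real ORC or Parquet API.

```python
# Hypothetical table of three records, stored row by row.
rows = [
    {"sensor": "a", "temp": 21.5, "ok": True},
    {"sensor": "b", "temp": 22.0, "ok": True},
    {"sensor": "c", "temp": 19.8, "ok": False},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# A query that needs one column reads only that column's values,
# which sit next to each other and share a type (good for compression).
mean_temp = sum(columns["temp"]) / len(columns["temp"])

# Basic per-column statistics for this block of rows.
temp_stats = {"min": min(columns["temp"]), "max": max(columns["temp"])}


def can_skip(stats, lower_bound):
    """True if no value in the block can satisfy `temp > lower_bound`."""
    return stats["max"] <= lower_bound


print(columns["temp"])             # the temp column on its own
print(can_skip(temp_stats, 25.0))  # block holds nothing above 25.0
print(can_skip(temp_stats, 20.0))  # block must be read
```

The design point is that the statistics are consulted before any column data are read, so a block that cannot match the predicate costs no input/output at all.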
  • 26. Avro files
Avro is a binary data storage format that provides data serialization and data exchange services. Avro supports a rich set of primitive data types. Data are serialized efficiently into files or messages. The data and the data definition are combined in a single file or message, which lets Avro perform rapid serialization. The data are stored in a binary format, making them compact and efficient. The data definition is stored in JSON format, making it easy to read and interpret. Markers in Avro files split large datasets into subsets. Avro files support both primitive and complex data types. Avro handles data schema changes. The data stored in Avro files are easily portable between different programming languages. Avro supports a Remote Procedure Call (RPC) interface in its data exchange services so that different programs can communicate data and information effectively. Avro RPC interfaces and schemas are defined in JSON. Avro relies heavily on its schemas; both the data and the schema are stored in the file.

When describing these different file formats, we have also made some references to data compression. This is an important aspect of the management of big data. We shall discuss some salient aspects of data compression next.

4.1.2 Data compression
In big data, petabytes of data are captured, stored, and analyzed. High volumes of data generally increase the number of input/output operations, and transferring these large datasets over the network takes considerable time. Real-time DA needs efficient management of these disk input/output and network bandwidth resources. Data compression mitigates these problems by not just saving storage space but also increasing the data transfer speed across the network. It is crucial in big data environments to combine data compression with faster network transfer to improve the performance of DA activities.
Compression of massive datasets certainly increases the utilization of the CPU, as the data must be decompressed before they can be processed at a later stage.

Hadoop supports multiple compression formats, most commonly referred to as codecs (short for coder and decoder). A set of compiled Java libraries can be used in Hadoop to perform data compression and decompression. Each codec has one algorithm implementation for compression and decompression. Hadoop supports both splittable and nonsplittable compression algorithms. A splittable algorithm enhances performance because large data blocks are distributed across multiple data nodes and multiple MapReduce tasks decompress the data blocks in parallel. Nonsplittable algorithms, on the other hand, combine data blocks together and use one MapReduce task for decompression. There are a number of Hadoop codecs, which we describe briefly next.

4.1.3 Hadoop codecs
Deflate combines Huffman coding, a form of prefix coding, with LZ77 compression, which works by finding redundant data and replacing it with metadata.

LZ4 is a speed-focused lossless compression algorithm that belongs to the LZ77 family of byte-oriented compression schemes. The maximum compression speed is 400 MB/s per core, with decompression speeds of multiple GB/s per core, and it scales across multiple cores.

Gzip is a file format, based on the Deflate algorithm, used for file compression and decompression.

Bzip2 is an open file compression format based on the Burrows-Wheeler algorithm, used to compress single files. Bzip2 uses multiple layers of stacked compression techniques.
  • 27. Snappy codecs, previously known as Zippy, provide very high speed and reasonable compression. Compression speed is about 250 MB/s or more, and decompression speed about 500 MB/s or more. Snappy is optimized for 64-bit x86-compatible processors. Snappy assumes little-endian byte order throughout and requires byte-swapping of data in several places on big-endian platforms. Snappy is robust and stable, and has successfully compressed and decompressed petabytes of data in Google’s production environment. The typical compression ratio is 1.5–1.7 for plain text, about 2–4 for HTML, and about 1.0 for image data like JPEGs and PNGs and other already-compressed formats.

Files, whether compressed or not, need to be organized properly. The organization of files is usually handled by databases.

4.2 DATABASES
In contrast to traditional relational databases, a NoSQL (not only SQL) database is a geographically distributed nonrelational database system. A NoSQL database system runs on multiple cluster nodes, with individual instances of operating systems and built-in storage on each node. This design is aimed largely at organizing and analyzing large amounts of heterogeneous data, regardless of OS. The nodes replicate data across numerous nodes to ensure that no data are lost when a node fails. The cluster services restore the data from the failed node through a single system image and redistribute the data across the cluster.

4.2.1 Dynamic schema
In contrast to traditional relational databases, which require the database schema to be defined before data are inserted, NoSQL permits data insertion without a predefined database schema. This lets applications iterate on the schema rapidly, in real time. Developers add application-side code to enforce quality controls, such as requiring specific fields and data types. This validation method imposes authority on the data without compromising the benefits of a dynamic schema.
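The combination described above, schema-free insertion plus application-side validation of a few required fields, can be sketched as follows. This is a minimal illustration that uses a Python list as a stand-in for a schema-less collection; the field names, the `REQUIRED_FIELDS` rule table, and the `validate` and `insert` helpers are all hypothetical and do not belong to any particular NoSQL product.

```python
# Hypothetical quality rules: field name -> required Python type.
REQUIRED_FIELDS = {"device_id": str, "reading": float}

# Stand-in for a schema-less collection: any dict may be stored.
collection = []


def validate(document):
    """Return a list of problems; an empty list means the document passes."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in document:
            problems.append(f"missing field: {field}")
        elif not isinstance(document[field], expected_type):
            problems.append(f"wrong type for {field}")
    return problems


def insert(document):
    """Insert a document only if it satisfies the application-side rules."""
    problems = validate(document)
    if problems:
        raise ValueError("; ".join(problems))
    collection.append(document)


# Both documents are accepted, although the second carries an extra field
# that no schema declared in advance.
insert({"device_id": "w-1", "reading": 36.6})
insert({"device_id": "w-2", "reading": 37.1, "battery": 0.82})

# A document that lacks a required field is reported, not stored.
print(validate({"device_id": "w-3"}))
```

Only the fields the application cares about are checked, so new fields can appear at any time without a migration step.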

4.2.2 Sharding, replication and auto-caching
Sharding is a method of storing data records across many server instances. This is done through storage area networks to make the hardware perform like a single server. The NoSQL framework is natively designed to support automatic distribution of the data, including the query load, across multiple servers. Both data and query replacements are automatically distributed across multiple servers located in different geographic regions, and this facilitates rapid, automatic, and transparent replacement of the data or query instances without any disruption. Cloud computing and the platform-as-a-service framework make this feature considerably easier. The most frequently used data are kept in an integrated in-memory database, instead of in a separate caching layer, to maintain the lowest latency and provide the highest throughput.

4.2.3 NoSQL types
Key-value stores
Key-value (KV) stores, or key-value databases, are the simplest NoSQL databases. KV stores use an associative array data model, also known as a hash or dictionary. In this model, every record in the database is stored as an attribute name, or key, together with its value in a schema-less way. This
  • 28. relationship is known as a key-value pair. In each key-value pair, the key is represented by a string and the value is the data for the key. In particular, key-value stores do not require a query language, but provide a way to store, retrieve, and update data. Notable key-value databases are Riak, Redis, Memcached, Berkeley DB, Upscaledb, Amazon DynamoDB, Couchbase, and Project Voldemort. Table 4 shows the comparison of different NoSQL data-models.

Document stores
Document stores (DS) record data as key-value pairs in a structured format that the database can understand. Each document contains data, and a unique key is assigned to retrieve the document. New fields of data can be added by including additional key-value pairs in a document. This transparent way of storing data removes the limitation of querying by key only. It allows content-oriented retrieval of full-page, often semistructured data with a single query and is suited to content-oriented applications. The documents are in XML, JSON, and BSON file formats. The most notable and popular document databases are MongoDB, CouchDB, Terrastore, OrientDB, RavenDB, and Lotus Notes.

Column-oriented stores
Column databases, as the name suggests, are designed to record data tables as rows of columns of data. The columns always group related data as rows and are associated with a unique row key. This inversion of the relational database layout provides optimized queries over very big datasets and offers a very scalable architecture with extremely high performance. A columnar database is highly compressed to save storage space and is also capable of self-indexing. The most popular column-oriented databases are Cassandra, HBase, Hypertable, and Amazon DynamoDB.

Graph stores
These data stores are designed to represent data entities and the undetermined interconnected relationships between these entities as a graph. The entities are similar to nodes with properties.
The edges represent relationships with their own properties, including directional significance. The nodes and their relationships are organized as a graph. The relationships are actually preserved, and the data are interpreted in different ways based on their relationships in the graph. This supports rapid traversal of joins or relationships. The nodes can have multiple types of relationships, with start and end nodes along with their own properties. The properties of the relationships are used to add intelligence to the relationship and also to query the graph. Notable graph databases are Neo4j, InfiniteGraph, OrientDB, and FlockDB.

Table 4 NoSQL Data-Model Comparison

| Data-Model      | Performance | Scalability     | Flexibility | Complexity | Functionality   |
|-----------------|-------------|-----------------|-------------|------------|-----------------|
| Key-value store | High        | High            | High        | None       | Variable (none) |
| Column store    | High        | High            | Moderate    | Low        | Minimal         |
| Document store  | High        | Variable (high) | High        | Low        | Variable (low)  |
| Graph store     | Variable    | Variable        | High        | High       | Graph theory    |
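The four data models compared in Table 4 differ mainly in what the store knows about the value behind a key. The sketch below mimics each model with ordinary Python structures; it is an analogy under that assumption, not how any of the named databases is implemented, and every key, field, and helper name in it is hypothetical.

```python
# Key-value store: a string key maps to a value the store does not interpret.
kv_store = {"user:42": b"\x01\x02\x03"}

# Document store: the value is a structured document the store understands,
# so new fields can be added per document and queried by content.
doc_store = {"user:42": {"name": "Ada", "devices": ["watch"]}}
doc_store["user:42"]["city"] = "Leeds"  # new field, no schema change

# Column-oriented store: a unique row key maps to named columns of data.
column_store = {"row-1": {"profile:name": "Ada", "metrics:steps": 9120}}

# Graph store: nodes with properties, plus directed edges that carry
# a relationship type and their own properties.
nodes = {"ada": {"kind": "person"}, "watch-7": {"kind": "device"}}
edges = [("ada", "OWNS", "watch-7", {"since": 2016})]


def neighbours(node_id, relation):
    """Follow outgoing edges of one relationship type from a node."""
    return [dst for src, rel, dst, _props in edges
            if src == node_id and rel == relation]


def find_documents(field, value):
    """Content-oriented lookup: match on a field instead of the key."""
    return [key for key, doc in doc_store.items() if doc.get(field) == value]


print(neighbours("ada", "OWNS"))
print(find_documents("city", "Leeds"))
```

The key-value entry can only be fetched by its key, whereas the document and graph variants can be queried by content or by relationship, which matches the flexibility and functionality columns of the table.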
  • 29. 4.3 DATA FUSION AND DATA INTEGRATION
Data are generated from a variety of sources, and each data source carries significant information that is sufficient to analyze and process the data. Data obtained directly from different sources can contain redundant information and can also have heterogeneous representations. Retrieval of meaningful information from heterogeneous datasets has limitations. To manage data and retrieve valuable information from them efficiently, it is essential to merge heterogeneous datasets into one homogeneous data representation. Data fusion provides this by combining information from multiple sources to form a unified representation.

Data fusion can be defined as [21]: “A multi-level process dealing with the association, correlation, combination of data and information from single and multiple sources to achieve refined position, identity estimates and complete and timely assessments of situations, threats and their significance.” An alternative definition is from Hall and Llinas [22]: “data fusion techniques combine data from multiple sensors and related information from associated databases to achieve improved accuracy and more specific inferences than could be achieved by the use of a single sensor alone.”

Data fusion systems are used in a wide range of domains such as sensor networks, text processing, and video and image processing, to name a few. In big data, the high velocity of heterogeneous data types makes data fusion important. Advances in the Internet of Things connect networks of sensors. These networks encompass sensor nodes and at least one base station. Every sensor node integrates sensors, data processing tools, a radio communication system, and a battery. In these networks, raw data may present redundant information and provide sufficient information about its relevance.
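As a concrete illustration of the Hall and Llinas definition quoted above, the sketch below fuses two readings of the same quantity by inverse-variance weighting. This is one standard fusion rule chosen for illustration, not a method prescribed by this chapter, and it assumes the sensor errors are independent and unbiased; the function name `fuse` and the sample values are assumptions.

```python
def fuse(readings):
    """Fuse (value, variance) pairs by inverse-variance weighting.

    Each reading is weighted by 1 / variance, so more precise sensors
    count for more. Returns the fused value and the fused variance.
    """
    weights = [1.0 / variance for _value, variance in readings]
    total = sum(weights)
    fused_value = sum(
        weight * value
        for weight, (value, _variance) in zip(weights, readings)
    ) / total
    fused_variance = 1.0 / total
    return fused_value, fused_variance


# Hypothetical temperature readings: (value in degrees C, error variance).
readings = [(20.0, 4.0), (22.0, 1.0)]
value, variance = fuse(readings)

print(round(value, 3))     # 21.6
print(round(variance, 3))  # 0.8
```

With these sample values the fused variance (0.8) is lower than that of the better sensor alone (1.0), which is the sense in which combining sensors can improve on any single one.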
In multisensor networks, transmitting raw data can cause data collisions, and there is a higher chance of receiving inaccurate or unreliable information from abnormal nodes. To aggregate valid data and yield effective information, it is essential to process the data. Data fusion facilitates better use of network bandwidth, a longer network lifetime, more efficient use of energy resources, and, above all, efficient and highly accurate information retrieval. As such, data fusion represents one of the bigger challenges in big data.

5 SUMMARY
The objective of this chapter is to give a broad overview of the acquisition and generation methods of big data. In the digital century, the term “big data” has expanded its boundary from scientific data (e.g., satellite imagery data and geographical data) to sensor data on the Internet of Things (e.g., meteorological data and healthcare data). The new boundary adds more characteristics, known as volume, velocity, variety, veracity, and value: the Vs of big data. In the same way, the expansion also brings new challenges into the big data processing and analytics pipeline. In fact, the coinage “big data” unambiguously denotes digital data, either born digital or converted into digital form from born-analog data. Computers and other digital devices are the main sources of born-digital data, whereas born-analog or sensor data are captured by various sensing devices. These data may follow a predefined data model, in which case they are known as structured data, or have no predefined model, in which case they are termed unstructured data. Moreover, these massive amounts of data are generated with or without explicit human involvement. Tactile-feedback technology has added an extra dimension to the
  • 30. manner in which humans interact with computers or devices. Equally, the rise of smart devices with embedded sensors has brought more diversity to existing interaction methods. Different types of big data are created by devices ranging from the well-known text-only keyboard to rapidly growing wearable devices, and are successfully converted into binary digits, or bits. A selection of file formats, notably JSON, BSON, CSV, and RC files, is used to store the collected data. The data are compressed to reduce their size on storage disks and make the best use of input/output and CPU resources. NoSQL, a geographically distributed nonrelational database system, is used to handle unstructured data. NoSQL encompasses the following types of database technologies: key-value stores, document stores, column-oriented stores, and graph stores. Additionally, NoSQL permits dynamic data insertion without a predefined database schema, in contrast to relational databases, which require one. Data fusion merges heterogeneous datasets into one homogeneous data representation by combining information from multiple sources to form a unified representation.

REFERENCES
[1] Sagiroglu S, Sinanc D. Big data: a review. In: International conference on Collaboration Technologies and Systems (CTS); 2013. p. 42–7.
[2] Laney D. 3D data management: Controlling data volume, velocity and variety. META Group Research Note 6, 2001. p. 70.
[3] Gartner Says 6.4 Billion Connected [Internet]. Available from: http://www.gartner.com/newsroom/id/3165317 [cited 20.04.16].
[4] Gartner Says Worldwide Wearable Devices Sales to Grow 18.4 Percent in 2016 [Internet]. Available from: http://www.gartner.com/newsroom/id/3198018 [cited 20.04.16].
[5] Gartner Says Worldwide PC Shipments Declined 9.6 Percent in First Quarter of 2016 [Internet]. Available from: http://www.gartner.com/newsroom/id/3280626 [cited 20.04.16].
[6] When to Expect Devices and Connected [Internet].
Available from: http://www.gartner.com/newsroom/id/3220117 [cited 20.04.16].
[7] Yick J, Mukherjee B, Ghosal D. Wireless sensor network survey. Comput Netw 2008;52(12):2292–330.
[8] Lai X, Liu Q, Wei X, Wang W, Zhou G, Han G. A survey of body sensor networks. Sensors 2013;13(5):5406–47.
[9] Akyildiz IF, Su W, Sankarasubramaniam Y, Cayirci E. A survey on sensor networks. IEEE Commun Mag 2002;40(8):102–14.
[10] Akyildiz IF, Pompili D, Melodia T. Challenges for efficient communication in underwater acoustic sensor networks. SIGBED Rev 2004;1(2):3–8.
[11] Li M, Liu Y. Underground structure monitoring with wireless sensor networks. In: Proceedings of the 6th international conference on information processing in sensor networks (IPSN ’07) [Internet]. New York: ACM; 2007. p. 69–78. Available from http://doi.acm.org/10.1145/1236360.1236370 [cited 22.04.16].
[12] Espina J, Falck T, Muehlsteff J, Aubert X. Wireless body sensor network for continuous cuff-less blood pressure monitoring. In: 3rd IEEE/EMBS international summer school on medical devices and biosensors; 2006. p. 11–5.
[13] Teng XF, Zhang YT, Poon CCY, Bonato P. Wearable medical systems for p-health. IEEE Rev Biomed Eng 2008;1:62–74.
[14] Paradiso R, Loriga G, Taccini N. A wearable health care system based on knitted integrated sensors. IEEE Trans Inf Technol Biomed 2005;9(3):337–44.
  • 31. [15] Rienzo MD, Rizzo F, Parati G, Brambilla G, Ferratini M, Castiglioni P. MagIC system: a new textile-based wearable device for biological signal monitoring. Applicability in daily life and clinical setting. In: IEEE engineering in medicine and biology 27th annual conference; 2005. p. 7167–9.
[16] Mattmann C, Clemens F, Tröster G. Sensor for measuring strain in textile. Sensors 2008;8(6):3719–32.
[17] Devot S, Bianchi AM, Naujoka E, Mendez MO, Braurs A, Cerutti S. Sleep monitoring through a textile recording system. In: 29th annual international conference of the IEEE Engineering in Medicine and Biology Society; 2007. p. 2560–3.
[18] Jung S, Ji T, Varadan VK. Point-of-care temperature and respiration monitoring sensors for smart fabric applications. Smart Mater Struct 2006;15(6):1872.
[19] Verma P. Gracoli: a graphical command line user interface. In: CHI’13 extended abstracts on human factors in computing systems (CHI EA’13) [Internet]. New York: ACM; 2013. p. 3143–6. Available from http://doi.acm.org/10.1145/2468356.2479631 [cited 30.03.16].
[20] Garzotto F, Valoriani M. Touchless gestural interaction with small displays: a case study. In: New York: ACM Press; 2013. p. 1–10. Available from http://dl.acm.org/citation.cfm?doid=2499149.2499154 [cited 02.07.15].
[21] White FE. Data Fusion Lexicon, Joint Directors of Laboratories, Technical Panel for C3, Data Fusion Sub-Panel. San Diego, CA: Naval Ocean Systems Center; 1991.
[22] Hall DL, Llinas J. An introduction to multisensor data fusion. Proc IEEE 1997;85(1):6–23.

GLOSSARY
Data analytics: It is the science of exploring large amounts of data to discover hidden patterns and correlations, and draw conclusions based on the findings.
Data mining and Knowledge discovery: It is an interdisciplinary computational process to analyze data for discovering useful knowledge from data.
Raster data: It is a data structure that is represented as a regular grid (rectangular or square) of cells.
Satellite imagery: It is the collection of images of Earth and other planets collected by satellites.
Scientific research: It is the systematic investigation of scientific theories and hypotheses.
  • 32. CHAPTER 2
CLOUD COMPUTING INFRASTRUCTURE FOR DATA INTENSIVE APPLICATIONS

Yuri Demchenko*, Fatih Turkmen*, Cees de Laat*, Ching-Hsien Hsu†, Christophe Blanchet‡, Charles Loomis§
* University of Amsterdam, Amsterdam, The Netherlands
† Chung Hua University, Hsinchu, Taiwan
‡ CNRS IFB, Orsay, France
§ SixSq Sàrl, Geneva, Switzerland

ACRONYMS
API: application programming interface
ASP: application service provider
AWS: Amazon Web Services
BDAF: Big Data Architecture Framework
BDE: Big Data Ecosystem
BDI: Big Data Infrastructure
BDLM: Big Data Lifecycle Management
BDRA: NIST Big Data Reference Architecture
CCRA: NIST Cloud Computing Reference Architecture (NIST SP 500-292)
CEOS: Committee on Earth Observation Satellites
CLI: command line interface
CPR: Capability (framework) provider requirements
CSDI: cloud services delivery infrastructure
CSP: cloud service provider
DACI: dynamic access control infrastructure
DSR: data sources requirements
EC2: Elastic Compute Cloud, IaaS cloud service provided by AWS/Amazon
ECL: Enterprise Control Language by LexisNexis (currently open source)
EDW: enterprise data warehouse
EMR: Elastic MapReduce
ETL: extract-transform-load
FADI: Federated Access and Delivery Infrastructure
GCE: Google Compute Engine cloud
HDFS: Hadoop Distributed File System
HPC: high performance computing
IaaS: Infrastructure as a Service
ICAF: Intercloud Architecture Framework
ICFF: Intercloud Federation Framework, part of ICAF
ICT: information communication technologies
IDE: integrated development environment

Big Data Analytics for Sensor-Network Collected Intelligence. http://dx.doi.org/10.1016/B978-0-12-809393-1.00002-7 © 2017 Elsevier Inc. All rights reserved.
  • 34. purpose, and given to the relations, who place the winnows on the roof of the house till the following day, when the food is eaten. By some Koravas, a ceremony in honour of the departed ancestors is performed at the time of the November new moon. A well-polished brass vessel, with red and white marks on it, is placed in the corner of a room, which has previously been swept, and purified with cow-dung. In front of the pot is placed a leaf plate, on which cooked rice and other edibles are set. Incense is burned, and the eldest son of the house partakes of the food in the hope that he, in due course, will be honoured by his offspring. The Koramas of Mysore are said to experience considerable difficulty in finding men to undertake the work of carrying the corpse to the grave. Should the dead Korama be a man who has left a young widow, it is customary for some one to propose to marry her the same day, and, by so doing, to engage to carry out the principal part of the work connected with the burial. A shallow grave, barely two feet deep, is dug, and the corpse laid therein. When the soil has been loosely piled in, a pot of fire, carried by the chief mourner in a split bamboo, is broken, and a pot of water placed on the raised mound. Should the spot be visited during the night by a pack of jackals, and the water drunk by them to slake their thirst after feasting on the dead Korama, the omen is accepted as proof that the liberated spirit has fled away to the realms of the dead, and will never trouble man, woman, child, or cattle. On the sixth day, the chief mourner must kill a fowl, and mix its blood with rice. This he places, with some betel leaves and nuts, near the grave. If it is carried off by crows, everything is considered to have been settled satisfactorily. As regards the dress of the Koravas, Mr. Mullaly writes as follows. 
“The women wear necklaces of shells and cowries interspersed with beads of all colours in several rows, hanging low down on the bosom; brass bangles from the wrist to the elbow; brass, lead, and silver rings, very roughly made, on all their fingers except the middle one. The cloth peculiar to Koravar women is a coarse black one; but they are, as a rule, not particular as to this, and wear stolen cloths after removing the borders and all marks of identification. They also wear the chola, which is fastened across the bosom, and not, like the Lambādis, at the back. The men are dirty, unkempt-looking objects, wear their hair long, and usually tied in a knot on the top of the head, and indulge in little finery. A joochi (gochi), or cloth round the loins, and a bag called vadi sanchi, made of striped cloth, complete their toilet.” In 1884, Mr. Stevenson, who was then the District Superintendent of Police, North Arcot, devised a scheme for the regeneration of the Koravas of that district. He obtained for the tribe a tract of Government land near Gudiyattam, free of assessment for ten years, and also a grant of Rs. 200 for sinking wells. Licenses were also issued to the settlers to cut firewood at specially favourable rates. He also prevailed upon the
Zemindar of Karvetnegar to grant twenty-five cawnies of land in Tiruttani for ten years for another settlement, as well as some building materials. Unfortunately the impecunious condition of the Zemindar precluded the Tiruttani settlement from deriving any further privileges which were necessary to keep the colony going, and its existence was, therefore, cut short. The Gudiyattam colony, on the other hand, exhibited some vitality for two or three years, but, in 1887, it, too, went the way of the Tiruttani colony.”226

I gather, from the Police Administration Report, 1906, that a scheme is being worked out, the object of which is to give a well-known wandering criminal gang some cultivable land, and so enable the members of it to settle down to an honest livelihood.

At the census, 1891, Korava was returned as a sub-division of Paraiyans, and the name is also applied to Jōgis employed as scavengers.227

The following note on the Koravas of the west coast is interesting as showing that Malabar is one of the homes of the now popular game of Diavolo, which has become epidemic in some European countries. “In Malabar, there is a class of people called Koravas, who have, from time immemorial, played this game almost in the same manner as its Western devotees do at the present time. These people are met with mostly in the southern parts of Malabar, Cochin and Travancore, and they speak the Malayālam language with a sing-song accent, which easily distinguishes them from other people. They are of wandering habits. The men are clever acrobats and rope-dancers, but those of more settled habits are engaged in agriculture and other industries. The beautiful grass mats, known as Palghat mats, are woven by these people. Their women are fortune-tellers and ballad singers. Their services are also in demand for boring the ears of girls.
The ropedancers perform many wonderful feats while balancing themselves on the rope, among them being the playing of diabolo while walking to and fro on a tight rope. The Korava acrobat spins the wooden spool on a string, attached to the ends of two bamboo sticks, and throws it up to the height of a cocoanut tree, and, when it comes down, he receives it on the string, to be again thrown up. There are experts among them who can receive the spool on the string without even looking at it. There is no noteworthy difference in the structure and shape of the spool used by the Koravas, and those of Europe, except that the Malabar apparatus is a solid wooden thing a little larger and heavier than the Western toy. It has not yet emerged from the crude stage of the village carpenter’s skill, and cannot boast of rubber tyres and other embellishments which adorn the imported article; but it is heavy enough to cause a nasty injury should it hit the performer while falling. The Koravas are a very primitive people, but as acrobats and ropedancers they have continued their profession for generations past, and there is no doubt that they have been expert diabolo players for many years.”228 It may be noted that Lieutenant Cameron, when journeying from Zanzibar to Benguela, was detained near Lake Tanganyika by a native chief. He relates as follows. “Sometimes a slave of Djonmah would amuse us by his dexterity. With two sticks about a foot long connected by a string of a certain length, he spun a piece of wood cut in the shape of an hour-glass,
throwing it before and behind him, pitching it up into the air like a cricket-ball, and catching it again, while it continued to spin.”
1 Gazetteer of the Bellary district.
2 Madras Diocesan Magazine, June, 1906.
3 John S. Chandler, a Madura Missionary, Boston.
4 Madras Mail, November, 1905.
5 J. Hornell. Report on the Indian Pearl Fisheries of the Gulf of Manaar, 1905.
6 Madras Diocesan Mag., 1906.
7 Notes from a Diary, 1881–86.
8 Lecture delivered at Trivandrum, MS.
9 Nineteenth Century, 1898.
10 Malay Archipelago.
11 Monograph. Ethnog: Survey of Cochin, No. 9, 1906.
12 Malabar Manual.
13 Manual of the Coimbatore district.
14 Madras Journ. Lit. Science, I. 1833.
15 W. W. Skeat and C. O. Blagden. Pagan Races of the Malay Peninsula, 1906.
16 Gazetteer of the Malabar district.
17 Madras Census Report, 1891.
18 Manual of Malabar.
19 Manual of the North Arcot district.
20 Madras Census Report, 1891.
21 Manual of the North Arcot district.
22 A reddish formation found all over Southern India.
23 Op. cit.
24 Journey through Mysore, Canara, and Malabar.
25 Rev. H. Jensen. Classified Collection of Tamil Proverbs, 1897.
26 Manual of the North Arcot district.
27 Gazetteer of the Trichinopoly district.
28 For this note I am indebted to Mr. N. Subramani Aiyar.
29 Madras Census Report, 1891.
30 Mokhalingam is in Ganjam, not Vizagapatam.
31 Place of meeting, which is a large tamarind tree, under which councils are held.
32 Gazetteer of the Madura district.
33 Gazetteer of the Madura district.
34 Sētupati, or lord of the bridge. The title of the Rājas of Rāmnād.
35 Gazetteer of the Madura district.
36 Madras Census Report, 1891.
37 Manual of the Madura district.
38 G. Oppert. Madras Journ. Lit. Science, 1888–9.
39 Madras Census Report, 1901.
40 Notes on Criminal Classes of the Madras Presidency.
41 Madras Review, 1899.
42 Madras Census Report, 1891.
43 Gazetteer of the Madura district.
44 Op. cit.
45 Illustrated Criminal Investigation and Law Digest, I, 3, 1908, Vellore.
46 Madras Journ. Lit. Science, XXV.
47 I am informed that only Mēl-nādu, Sīrukudi, Mella-kōttai, and Puramalai are endogamous.
48 Madras Census Report, 1891.
49 Manual of the Madura district.
50 Hindu Feasts, Fasts, and Ceremonies, 1903.
51 The Tamils eighteen hundred years ago, 1904.
52 Madras Census Report, 1901.
53 Gazetteer of the Tanjore district.
54 Madras Census Report, 1891.
55 Madras Mail, 1908.
56 Ind. Ant., III., 1874.
57 A lakh = a hundred thousand.
58 Compare the theft of Laban’s teraphim by Rachel. Genesis, XXXI, 19.
59 Gazetteer of the Tanjore district.
60 Madras Census Report, 1891.
61 Madras Census Report, 1901.
62 Manual of the North Arcot district.
63 Ind. Ant., VIII, 1879.
64 Manual of the North Arcot district.
65 Madras Census Report, 1891.
66 Hutchinson. Marriage Customs in many lands, 1897.
67 Gazetteer of the Bellary district.
68 Gazetteer of the Anantapur district.
69 Mediæval Sinhalese Art.
70 Maduraikanchi, Line 521.
71 E. Hultzsch. South Indian Inscriptions, II, i, 44, 46, 1891.
72 Ibid. III, i, 47, 1899.
73 Madras Census Report, 1891.
74 Madras Census Report, 1871.
75 New Asiatic Review, Jan. 1907.
76 Madras Mail, 1907.
77 Classified Collection of Tamil Proverbs, 1897, from which some of the proverbs quoted are taken.
78 Madras Census Report, 1901.
79 See the legendary story narrated in the article on Tiyans.
80 Malabar and its Folk, 1900.
81 Letters from Malabar.
82 Madras Census Report, 1901.
83 Ind. Ant., VIII, 1879.
84 Gazetteer of the Vizagapatam district.
85 Yule and Burnell, Hobson-Jobson.
86 Monograph, Eth. Survey of Cochin, No. 4, 1905.
87 Unhusked rice.
88 Manual of the South Canara district.
89 Money-lender.
90 Malabar Quarterly Review, 1905.
91 Indian Review, III, 1902.
92 Monograph, Ethnog. Survey, Cochin.
93 According to another version of the legend, it was the hut of a Tiyan.
94 Malabar Manual.
95 Gazetteer of the Malabar district.
96 C. Karunakara Menon. Madras Mus. Bull., V, 2, 1906.
97 Madras Mus. Bull., II, 3, 1901.
98 This account is mainly from an article by Mr. N. Subramani Aiyar.
99 Ind. Ant., IX, 1880.
100 Manual of the North Arcot district.
101 Madras Census Report, 1901.
102 Historical Sketches, Mysore.
103 Gazetteer of the Madura district.
104 Gazetteer of the Madura district.
105 Madras Census Report, 1891.
106 Dynasties of the Kanarese Districts of the Bombay Presidency.
107 Loc. cit., and Manual of the North Arcot district.
108 Section III, Inhabitants, Madras Government Press, 1907.
109 Ind. Ant., VIII, 1879.
110 J. F. Kearns. Kalyāna shatanku.
111 Madras Series, IV, 1882; VI, 1883.
112 Illatakaru, a bride’s father having no son, and adopting his son-in-law.
113 See further C. Ramachendrier. Collection of Decisions of High Courts and the Privy Council applicable to dancing-girls, illatom affiliation, etc., Madras, 1892.
114 Gazetteer of the Anantapur district.
115 Madras Mail, Nov. 1905.
116 Madras Mail, 1905.
117 Gazetteer of the Malabar district.
118 Manual of the North Arcot district.
119 Gazetteer of the Malabar district.
120 Tamil and English Dictionary, 1862. The word, in this sense, is said to occur in a Tamil work named Pingala Nikandu.
121 Karuku is Tamil for the serrated margin of the leaf—petiole of the palmyra palm.
122 Yule and Burnell. Hobson-Jobson.
123 Madras Census Report, 1901.
124 Manual of the Salem district.
125 Madras Census Report, 1901.
126 Manual of the Madura district.
127 Manual of the Tanjore district.
128 Madras Census Report, 1891.
129 Madras Christ. Coll. Mag., 1894.
130 Malabar Law and Custom.
131 Ind. Ant., VIII, 1879.
132 Mysore and Coorg Gazetteer.
133 Madras Census Report, 1901.
134 Journ. Anthrop. Inst., II, 1873.
135 Gazetteer of the Anantapur district.
136 Indian Review, VII, 1906.
137 See Ravi Varma, the Indian Artist. Indian Press, Allahabad.
138 Madras Census Report, 1891.
139 Madras Museum Bull., V. 3, 1907.
140 Epigraphia Indica, VI, 1900–1901.
141 Rev. J. Cain, Ind. Ant., VIII, 1879.
142 Trans. Ethnolog. Soc., London, 1869; Ind. Ant., VIII, 1879.
143 Original Inhabitants of Bhārathavarsha.
144 The panas have reference to the division of South Indian castes into the right- and left-hand factions.
145 The mofussil indicates up-country stations and districts, as contra-distinguished from the “Presidency” (Madras City).
146 Marriage Customs in Many Lands, 1897.
147 Moore. Indian Appeal Cases, Vol. III, 359–82.
148 Journey through Mysore, Canara and Malabar.
149 See Talboys Wheeler, Madras in the Olden Time, II, 49–89.
150 See Tales of Kōmati Wit and Wisdom. C. Hayavadana Rao, Madras, 1907.
151 Classified Collection of Tamil Proverbs, 1897. See also C. Hayavadana Rao, op. cit., and Ind. Ant., XX, 78, 1891.
152 Madras Census Report, 1871.
153 Gazetteer of the Godāvari district.
154 Linguistic Survey of India, IV, 1906.
155 Man. March 1902.
156 G.O., No. 1020, Public, 8th October 1901.
157 G.O., No. 3005, Revenue, 3rd November 1908.
158 Occasional Essays on Native South Indian Life, 1901.
159 Madras Census Report, 1881.
160 Agricul: Ledger Series, Calcutta. No. 7, 1904.
161 Madras Mail, 1894.
162 Gazetteer of the Vizagapatam district.
163 A very interesting note on Totemism among the Khonds by Mr. J. E. Friend-Pereira has been published in the Journal of Asiatic Society of Bengal, LXXIII, 1905.
164 The Golden Bough, 1900.
165 Selections from the Records, Government of India, No. V, Human Sacrifice and Infanticide, 1854.
166 Personal Narrative of Service among the Wild Tribes of Khondistan.
167 Manual of the Vizagapatam district.
168 Journ. Asiat. Soc., Bengal, 1898.
169 Madras Mail, 1894.
170 Selections from the Records of the Government of India (Home Department), V., 1845.
171 J. A. R. Stevenson. Madras Journ: Lit. Science, VI, 1837.
172 J. E. Friend-Pereira. Journ: Asiat: Soc. Bengal, LXXI, 1902.
173 Madras Journ: Lit. Science, VI, 1837.
174 Loc. cit.
175 Journ. Anthrop. Soc., Bombay, II, 249.
176 Madras Mail, 1896.
177 Macpherson. Memorials of Service in India.
178 Journ., Anth. Soc., Bombay, II, 1890.
179 Ibid.
180 Madras Police Report, 1904.
181 Madras Mail, 1894.
182 Madras Mail, 1908.
183 See G.O., Judicial, 14th August 1882, No. 952, Khond Rising.
184 Manual of the South Canara district.
185 Letters from Malabar. Translation. Madras, 1862.
186 Fine cakes made of gram flour and a fine species of alkali, which gives them an agreeable taste, and serves the purpose of making them rise and become very crisp when fried.
187 Journ. Anthrop. Inst., IV., 1875.
188 Madras Christ. Coll. Mag. III, 1885–6.
189 Ind. Ant. X, 1881.
190 Manual of the South Canara district.
191 Journ. Anthrop. Inst. IV, 1875.
192 Manual of the South Canara district.
193 Manual of the South Canara district.
194 M. Paupa Rao Naidu. History of Railway Thieves.
195 Madras Census Report, 1891.
196 Madras Journ. Lit: and Science, 1888–89.
197 Tirumurukairuppadai.
198 Madras Census Report, 1901.
199 Indian Antiquity, IX, 1880.
200 Cyclopædia of India.
201 Loc. cit.
202 Note on Koravas, 1908.
203 Notes on Criminal Classes of the Madras Presidency.
204 Forest Inspection Report, 1896.
205 Gazetteer of the Bellary district.
206 Gazetteer of the Vizagapatam district.
207 F. S. Mullaly. Op. cit.
208 Madras Journ. Lit. Science, XVII, 1853.
209 History of Railway Thieves. Madras, 1904.
210 Madras Census Report, 1901.
211 Gazetteer of the Trichinopoly district.
212 This story is based on well-known episode of Nalacharitra in the Āranya Parva of the Mahabharatha.
213 M. Paupa Rao Naidu. Op. cit.
214 Ibid.
215 Police Report, 1902.
216 Op. cit.
217 A varāha or pagoda was worth Rs. 3–8–0.
218 A seer is an Indian measure of weight, varying in different parts of the country.
219 Trans. Eth. Sec. N.S., VII.
220 J. F. Kearns, Kalyāna Shatanku, 1868.
221 Ind. Ant., III., 1874.
222 Madras Census Report, 1871.
223 Madras Census Report, 1901.
224 India. Trübner. Oriental Series.
225 Ind. Ant., III, 1874.
226 Madras Mail, 1907.
227 For this account of the Koravas, I am largely indebted to a report by Mr. N. E. Q. Mainwaring, Superintendent of Police.
228 Madras Mail, 1908.
Colophon

Availability

This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this eBook or online at www.gutenberg.org.

This eBook is produced by the Online Distributed Proofreading Team at www.pgdp.net.

Volume | Contents | First Article
I | A and B | Abhishēka
II | C to J | Canji
III | K | Kabbēra
IV | K to M | Kōri
V | M to P | Marakkāyar
VI | P to S | Palli
VII | T to Z | Tābēlu

Scans of this book are available from the Internet Archive (copy 1, 2).

Project Gutenberg catalog page: 42993.
Related Library of Congress catalog page: 10014128.
Related Open Library catalog page (for source): OL7024564M.
Related Open Library catalog page (for work): OL1106958W.
Related WorldCat catalog page: 1967849.

Encoding

Revision History

2011-08-08 Started.

External References

This Project Gutenberg eBook contains external references. These links may not work for you.

Corrections

The following corrections have been applied to the text:

Page | Source | Correction
9 | [Not in source] | ;
96 | Gāmpa | Gampa
102 | annointing | anointing
103 | Gangimakkulu | Gangimakkalu
155 | negociations | negotiations
160 | orginally | originally
161 | feed | fed
181 | ” | [Deleted]
226 | [Not in source] | ’
300 | Kolāyans | Kōlayans
316 | negociate | negotiate
317 | Bhāskarācharya | Bhāskarāchārya
394 | tumeric | turmeric
495 | ’? | ?’
*** END OF THE PROJECT GUTENBERG EBOOK CASTES AND TRIBES OF SOUTHERN INDIA. VOL. 3 OF 7 ***

Updated editions will replace the previous one—the old editions will be renamed.

Creating the works from print editions not protected by U.S. copyright law means that no one owns a United States copyright in these works, so the Foundation (and you!) can copy and distribute it in the United States without permission and without paying copyright royalties. Special rules, set forth in the General Terms of Use part of this license, apply to copying and distributing Project Gutenberg™ electronic works to protect the PROJECT GUTENBERG™ concept and trademark. Project Gutenberg is a registered trademark, and may not be used if you charge for an eBook, except by following the terms of the trademark license, including paying royalties for use of the Project Gutenberg trademark. If you do not charge anything for copies of this eBook, complying with the trademark license is very easy. You may use this eBook for nearly any purpose such as creation of derivative works, reports, performances and research. Project Gutenberg eBooks may be modified and printed and given away—you may do practically ANYTHING in the United States with eBooks not protected by U.S. copyright law. Redistribution is subject to the trademark license, especially commercial redistribution.

START: FULL LICENSE
THE FULL PROJECT GUTENBERG LICENSE
  • 51. PLEASE READ THIS BEFORE YOU DISTRIBUTE OR USE THIS WORK To protect the Project Gutenberg™ mission of promoting the free distribution of electronic works, by using or distributing this work (or any other work associated in any way with the phrase “Project Gutenberg”), you agree to comply with all the terms of the Full Project Gutenberg™ License available with this file or online at www.gutenberg.org/license. Section 1. General Terms of Use and Redistributing Project Gutenberg™ electronic works 1.A. By reading or using any part of this Project Gutenberg™ electronic work, you indicate that you have read, understand, agree to and accept all the terms of this license and intellectual property (trademark/copyright) agreement. If you do not agree to abide by all the terms of this agreement, you must cease using and return or destroy all copies of Project Gutenberg™ electronic works in your possession. If you paid a fee for obtaining a copy of or access to a Project Gutenberg™ electronic work and you do not agree to be bound by the terms of this agreement, you may obtain a refund from the person or entity to whom you paid the fee as set forth in paragraph 1.E.8. 1.B. “Project Gutenberg” is a registered trademark. It may only be used on or associated in any way with an electronic work by people who agree to be bound by the terms of this agreement. There are a few things that you can do with most Project Gutenberg™ electronic works even without complying with the full terms of this agreement. See paragraph 1.C below. There are a lot of things you can do with Project Gutenberg™ electronic works if you follow the terms of this agreement and help preserve free future access to Project Gutenberg™ electronic works. See paragraph 1.E below.
  • 52. 1.C. The Project Gutenberg Literary Archive Foundation (“the Foundation” or PGLAF), owns a compilation copyright in the collection of Project Gutenberg™ electronic works. Nearly all the individual works in the collection are in the public domain in the United States. If an individual work is unprotected by copyright law in the United States and you are located in the United States, we do not claim a right to prevent you from copying, distributing, performing, displaying or creating derivative works based on the work as long as all references to Project Gutenberg are removed. Of course, we hope that you will support the Project Gutenberg™ mission of promoting free access to electronic works by freely sharing Project Gutenberg™ works in compliance with the terms of this agreement for keeping the Project Gutenberg™ name associated with the work. You can easily comply with the terms of this agreement by keeping this work in the same format with its attached full Project Gutenberg™ License when you share it without charge with others. 1.D. The copyright laws of the place where you are located also govern what you can do with this work. Copyright laws in most countries are in a constant state of change. If you are outside the United States, check the laws of your country in addition to the terms of this agreement before downloading, copying, displaying, performing, distributing or creating derivative works based on this work or any other Project Gutenberg™ work. The Foundation makes no representations concerning the copyright status of any work in any country other than the United States. 1.E. Unless you have removed all references to Project Gutenberg: 1.E.1. The following sentence, with active links to, or other immediate access to, the full Project Gutenberg™ License must appear prominently whenever any copy of a Project Gutenberg™ work (any work on which the phrase “Project
  • 53. Gutenberg” appears, or with which the phrase “Project Gutenberg” is associated) is accessed, displayed, performed, viewed, copied or distributed: This eBook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this eBook or online at www.gutenberg.org. If you are not located in the United States, you will have to check the laws of the country where you are located before using this eBook. 1.E.2. If an individual Project Gutenberg™ electronic work is derived from texts not protected by U.S. copyright law (does not contain a notice indicating that it is posted with permission of the copyright holder), the work can be copied and distributed to anyone in the United States without paying any fees or charges. If you are redistributing or providing access to a work with the phrase “Project Gutenberg” associated with or appearing on the work, you must comply either with the requirements of paragraphs 1.E.1 through 1.E.7 or obtain permission for the use of the work and the Project Gutenberg™ trademark as set forth in paragraphs 1.E.8 or 1.E.9. 1.E.3. If an individual Project Gutenberg™ electronic work is posted with the permission of the copyright holder, your use and distribution must comply with both paragraphs 1.E.1 through 1.E.7 and any additional terms imposed by the copyright holder. Additional terms will be linked to the Project Gutenberg™ License for all works posted with the permission of the copyright holder found at the beginning of this work. 1.E.4. Do not unlink or detach or remove the full Project Gutenberg™ License terms from this work, or any files
  • 54. containing a part of this work or any other work associated with Project Gutenberg™. 1.E.5. Do not copy, display, perform, distribute or redistribute this electronic work, or any part of this electronic work, without prominently displaying the sentence set forth in paragraph 1.E.1 with active links or immediate access to the full terms of the Project Gutenberg™ License. 1.E.6. You may convert to and distribute this work in any binary, compressed, marked up, nonproprietary or proprietary form, including any word processing or hypertext form. However, if you provide access to or distribute copies of a Project Gutenberg™ work in a format other than “Plain Vanilla ASCII” or other format used in the official version posted on the official Project Gutenberg™ website (www.gutenberg.org), you must, at no additional cost, fee or expense to the user, provide a copy, a means of exporting a copy, or a means of obtaining a copy upon request, of the work in its original “Plain Vanilla ASCII” or other form. Any alternate format must include the full Project Gutenberg™ License as specified in paragraph 1.E.1. 1.E.7. Do not charge a fee for access to, viewing, displaying, performing, copying or distributing any Project Gutenberg™ works unless you comply with paragraph 1.E.8 or 1.E.9. 1.E.8. You may charge a reasonable fee for copies of or providing access to or distributing Project Gutenberg™ electronic works provided that: • You pay a royalty fee of 20% of the gross profits you derive from the use of Project Gutenberg™ works calculated using the method you already use to calculate your applicable taxes. The fee is owed to the owner of the Project Gutenberg™ trademark, but he has agreed to donate royalties under this paragraph to the Project Gutenberg Literary Archive Foundation. Royalty
  • 55. payments must be paid within 60 days following each date on which you prepare (or are legally required to prepare) your periodic tax returns. Royalty payments should be clearly marked as such and sent to the Project Gutenberg Literary Archive Foundation at the address specified in Section 4, “Information about donations to the Project Gutenberg Literary Archive Foundation.” • You provide a full refund of any money paid by a user who notifies you in writing (or by e-mail) within 30 days of receipt that s/he does not agree to the terms of the full Project Gutenberg™ License. You must require such a user to return or destroy all copies of the works possessed in a physical medium and discontinue all use of and all access to other copies of Project Gutenberg™ works. • You provide, in accordance with paragraph 1.F.3, a full refund of any money paid for a work or a replacement copy, if a defect in the electronic work is discovered and reported to you within 90 days of receipt of the work. • You comply with all other terms of this agreement for free distribution of Project Gutenberg™ works. 1.E.9. If you wish to charge a fee or distribute a Project Gutenberg™ electronic work or group of works on different terms than are set forth in this agreement, you must obtain permission in writing from the Project Gutenberg Literary Archive Foundation, the manager of the Project Gutenberg™ trademark. Contact the Foundation as set forth in Section 3 below. 1.F. 1.F.1. Project Gutenberg volunteers and employees expend considerable effort to identify, do copyright research on, transcribe and proofread works not protected by U.S. copyright
  • 56. law in creating the Project Gutenberg™ collection. Despite these efforts, Project Gutenberg™ electronic works, and the medium on which they may be stored, may contain “Defects,” such as, but not limited to, incomplete, inaccurate or corrupt data, transcription errors, a copyright or other intellectual property infringement, a defective or damaged disk or other medium, a computer virus, or computer codes that damage or cannot be read by your equipment. 1.F.2. LIMITED WARRANTY, DISCLAIMER OF DAMAGES - Except for the “Right of Replacement or Refund” described in paragraph 1.F.3, the Project Gutenberg Literary Archive Foundation, the owner of the Project Gutenberg™ trademark, and any other party distributing a Project Gutenberg™ electronic work under this agreement, disclaim all liability to you for damages, costs and expenses, including legal fees. YOU AGREE THAT YOU HAVE NO REMEDIES FOR NEGLIGENCE, STRICT LIABILITY, BREACH OF WARRANTY OR BREACH OF CONTRACT EXCEPT THOSE PROVIDED IN PARAGRAPH 1.F.3. YOU AGREE THAT THE FOUNDATION, THE TRADEMARK OWNER, AND ANY DISTRIBUTOR UNDER THIS AGREEMENT WILL NOT BE LIABLE TO YOU FOR ACTUAL, DIRECT, INDIRECT, CONSEQUENTIAL, PUNITIVE OR INCIDENTAL DAMAGES EVEN IF YOU GIVE NOTICE OF THE POSSIBILITY OF SUCH DAMAGE. 1.F.3. LIMITED RIGHT OF REPLACEMENT OR REFUND - If you discover a defect in this electronic work within 90 days of receiving it, you can receive a refund of the money (if any) you paid for it by sending a written explanation to the person you received the work from. If you received the work on a physical medium, you must return the medium with your written explanation. The person or entity that provided you with the defective work may elect to provide a replacement copy in lieu of a refund. If you received the work electronically, the person or entity providing it to you may choose to give you a second opportunity to receive the work electronically in lieu of a refund.
If the second copy is also defective, you may demand a refund in writing without further opportunities to fix the problem.

1.F.4. Except for the limited right of replacement or refund set forth in paragraph 1.F.3, this work is provided to you ‘AS-IS’, WITH NO OTHER WARRANTIES OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PURPOSE.

1.F.5. Some states do not allow disclaimers of certain implied warranties or the exclusion or limitation of certain types of damages. If any disclaimer or limitation set forth in this agreement violates the law of the state applicable to this agreement, the agreement shall be interpreted to make the maximum disclaimer or limitation permitted by the applicable state law. The invalidity or unenforceability of any provision of this agreement shall not void the remaining provisions.

1.F.6. INDEMNITY - You agree to indemnify and hold the Foundation, the trademark owner, any agent or employee of the Foundation, anyone providing copies of Project Gutenberg™ electronic works in accordance with this agreement, and any volunteers associated with the production, promotion and distribution of Project Gutenberg™ electronic works, harmless from all liability, costs and expenses, including legal fees, that arise directly or indirectly from any of the following which you do or cause to occur: (a) distribution of this or any Project Gutenberg™ work, (b) alteration, modification, or additions or deletions to any Project Gutenberg™ work, and (c) any Defect you cause.

Section 2. Information about the Mission of Project Gutenberg™
  • 58. Project Gutenberg™ is synonymous with the free distribution of electronic works in formats readable by the widest variety of computers including obsolete, old, middle-aged and new computers. It exists because of the efforts of hundreds of volunteers and donations from people in all walks of life. Volunteers and financial support to provide volunteers with the assistance they need are critical to reaching Project Gutenberg™’s goals and ensuring that the Project Gutenberg™ collection will remain freely available for generations to come. In 2001, the Project Gutenberg Literary Archive Foundation was created to provide a secure and permanent future for Project Gutenberg™ and future generations. To learn more about the Project Gutenberg Literary Archive Foundation and how your efforts and donations can help, see Sections 3 and 4 and the Foundation information page at www.gutenberg.org. Section 3. Information about the Project Gutenberg Literary Archive Foundation The Project Gutenberg Literary Archive Foundation is a non- profit 501(c)(3) educational corporation organized under the laws of the state of Mississippi and granted tax exempt status by the Internal Revenue Service. The Foundation’s EIN or federal tax identification number is 64-6221541. Contributions to the Project Gutenberg Literary Archive Foundation are tax deductible to the full extent permitted by U.S. federal laws and your state’s laws. The Foundation’s business office is located at 809 North 1500 West, Salt Lake City, UT 84116, (801) 596-1887. Email contact links and up to date contact information can be found at the Foundation’s website and official page at www.gutenberg.org/contact
  • 59. Section 4. Information about Donations to the Project Gutenberg Literary Archive Foundation Project Gutenberg™ depends upon and cannot survive without widespread public support and donations to carry out its mission of increasing the number of public domain and licensed works that can be freely distributed in machine-readable form accessible by the widest array of equipment including outdated equipment. Many small donations ($1 to $5,000) are particularly important to maintaining tax exempt status with the IRS. The Foundation is committed to complying with the laws regulating charities and charitable donations in all 50 states of the United States. Compliance requirements are not uniform and it takes a considerable effort, much paperwork and many fees to meet and keep up with these requirements. We do not solicit donations in locations where we have not received written confirmation of compliance. To SEND DONATIONS or determine the status of compliance for any particular state visit www.gutenberg.org/donate. While we cannot and do not solicit contributions from states where we have not met the solicitation requirements, we know of no prohibition against accepting unsolicited donations from donors in such states who approach us with offers to donate. International donations are gratefully accepted, but we cannot make any statements concerning tax treatment of donations received from outside the United States. U.S. laws alone swamp our small staff. Please check the Project Gutenberg web pages for current donation methods and addresses. Donations are accepted in a number of other ways including checks, online payments and
credit card donations. To donate, please visit: www.gutenberg.org/donate.

Section 5. General Information About Project Gutenberg™ electronic works

Professor Michael S. Hart was the originator of the Project Gutenberg™ concept of a library of electronic works that could be freely shared with anyone. For forty years, he produced and distributed Project Gutenberg™ eBooks with only a loose network of volunteer support.

Project Gutenberg™ eBooks are often created from several printed editions, all of which are confirmed as not protected by copyright in the U.S. unless a copyright notice is included. Thus, we do not necessarily keep eBooks in compliance with any particular paper edition.

Most people start at our website which has the main PG search facility: www.gutenberg.org.

This website includes information about Project Gutenberg™, including how to make donations to the Project Gutenberg Literary Archive Foundation, how to help produce our new eBooks, and how to subscribe to our email newsletter to hear about new eBooks.